Miguel Casquilho (DEQB / IST / UTL) — mcasquilho@ist.utl.pt 2008-09-15 file=[LEQ.formulas.doc]
«Laboratórios de Engenharia Química II» (LEQ II)
T — true value M — measured value
p 6
Error: e= xmeasured −xtrue =M −T T
M
eabs = −
T T M T
e e −
=
= abs
rel
{1}
T is estimated by M .
p 6 T ≅ M → T =M {2}
p 7 Accuracy (Exactidão) — agreement between M and T.
Precision (Precisão) — agreement between several M’s (in the same
conditions), i.e., repetitions or replicates (statistical concept). Expresses reproducibility (reprodutibilidade). [(Numerical) precision (precisão numérica) resolution, number of significant figures (numerical concept).])
p 9
∑
=∞
= → n i
n xi
n 1
lim1
µ
∑ ( )
∞ =
→ −
= n
i
n xi
n 1
1 2
lim µ
σ
∑
=∞
→ −
= n
i
n xi
n 1
lim1 µ
δ (Gauss:) δ ≅0.8σ
{3}
p 11
( ) [ ( ) ] ( )
( )
( ) ( ) ( ) [
1( ) ]
2( )
1Pr Pr
Pr
inside Pr
− Φ
= Φ
−
− Φ
=
− Φ
− Φ
=
= +
<
<
−
=
− < = − <+
=
= +
<
<
−
=
±
∈
≡
k k
k k
k
k z k x k
z k
k x
k k
x k
P
σ µ
σ µ σ
µ σ
µ
{4}
p 11
( ) ( )
(
; ; ;TRUE)
NORMDIST NORMSDIST
Excel Excel
σ µ σ µ k x
k k
+
=
=
=
= Φ
= +
Φ +
=
+ Φ
= 2
NORMINV 1 2
1 2
2
1 inside inv inside Excel ins.
inv P P P
k
{5}
(See probabilities below.)
The classical convention is usually adopted here: lower case for pdf (probability density function), upper case for cdf (cumulative distribution function).
Gaussian distribution
x Prob. (%) z (=
σ µ
− x ) (k ≡ z) P = Φ(z) z = Φinv(P)
µ ± σ 68.3 1
90 1.64
95 1.96
µ ± 2 σ 95.4 2
98 2.33
99 2.58
µ ± 3 σ 99.7 3 µ ± 4 σ 99.98 4
Sample (statistics): average [média (amostral)], variance (variância), standard deviation (desvio-padrão)
p 12
∑
== n
i
xi
x n
1
1 T x
n→∞
=lim
( )
∑
=− −
= n
i
i x
n x s
1 2 2
1
1
∑ ( )
=
− −
=
= n
i
i x
n x s
s
1 2 2
1 1
{6}
Average deviation (desvio médio):
p 12
∑
=
−
= n
i
i x
n 1 x
δ 1 {7}
For a Gaussian variable (n→∞), δ→~0.80σ.
Coefficient of variation (coeficiente de variação):
p 12
x
Cv = s {8}
p 13
( )
X σnσ =
n t s x±
= µ
{9}
For large samples, i.e., if n → ∞ (or n > ~50), t → z.
Caution: the usual notation (following), tP,ν, may be confusing.
p 13
(
ν)
ν
ν
ν ν
; 1 TINV 2 ;
2 1
1 2 ;
2 1 1
2 ; 2 1
ins.
Excel inv ins.
nside inv
nside inv
,
P P T
P n T
P n T
t
t P i i
−
=
= +
=
− = −
−
+ = −
=
≡
{10}
Rejection of an outlier, say, xk
p 14
Calculate, without the (possible) outlier (so, n becomes n – 1):
x, s P s T
x s t
x P
+
±
≡
± ν ;ν
2 2
1 inside
inv ,
{11}
If xk is not in this interval, reject; otherwise, accept.
Kurtosis (curtose):
p 15
( )
( )
( )( )( ) ( )
(
2)(
1 3)
3 3 2 1
1 vector KURT
kurt
2
1
4 Excel
−
−
− −
−
−
−
−
= +
=
=
∑
= n nn s
x x n
n n
n
n n
i
i {12}
(See the Excel Help for kurtosis and skewness.) There are two trends for the use of kurtosis. The above definition makes kurt = 0 for a (large) Gaussian sample, instead of the classical value 3. Thus, relatively to the Gaussian, for kurt < 0, the distribution is flat; and for kurt > 0, it is peaked.
Skewness (enviesamento):
p 15
( ) ( )( ) ∑
=
−
−
= −
= n
i i
s x x n
n n
1
3 Excel
2 vector 1
SKEW
skew {13}
Indicates asymmetry (e.g., 0 for Gaussian). If positive, long tail to the positive side;
and vice-versa.
pp 16–24 (Measurements, error propagation (upper limit of error, probable error. Error propagation. Significant figures.)
Regression analysis
Free straight line (general case)
p 25 y=α0 +α1x {14}
Notation akin to Walpole & Myers [1989].
( )
∑
=−
= n
i i
xx x x
ss
1
2
∑ ( )
=
−
= n
i i
yy y y
ss
1
2
( )( )
∑
=−
−
= n
i
i i
xy x x y y
ss
1
{15}
Slope (α1), intercept (α0) (coeficiente angular, ordenada na origem):
p 28
xx xy
ss
= ss ˆ1
α αˆ0 = y−αˆ1x {16}
xx
xx ss
s = syy = ssyy {17}
Coefficient of determination (coeficiente de determinação):
yy xx
xy
ss ss R ss
2
2 = {18}
Variance of the correlation:
i i
i y y
e = ˆ −
SSE, sum of squares of the errors (about the regression line):
∑
== n
i
ei 1
SSE 2
2SSE var 1
= −
err n sxy = varerr
{19}
Standard errors:
( )
xx
xy ss
x s n
2 0
std_errα = 1+
( )
xx xy
s
= s
std_errα1 {20}
Confidence interval of the intercept and the slope, α0 and α1:
p 29 α0 =αˆ0 ±tP,ν std_err
( )
α0 α1 =αˆ1±tP,ν std_err( )
α1ν = n – 2 {21}
Confidence interval of (one value of) yi:
p 30
( ) ( )
xx i xy
i
i ss
x x s n
t y Y
2 1
1 1
ˆ = ˆ ± + + − {22}
Confidence interval of the average of (many values of) yi:
p 31
( ) ( )
xx i xy
i
i ss
x x s n
t y Y
2 ave.
ˆ 1
ˆ = ± + − {23}
Remark that err
( )
Yˆi ave.<err( )
Yˆi 1, i.e., of course, the average (of predicted values) varies less than the individual predicted value.Straight line through the origin
p 33 y=α1x {24}
Slope (α1) (coeficiente angular):
p 33
xx n
i i i
ss y
∑
x= =1
α1 {25}
Standard error:
( ) ∑
=
= n
i i xy
x s
1 2
std_errα1
{26}
Confidence interval of the slope, α1: as above.
R2 and some other statistics: not easily applicable (not recommended).
ANOVA (Analysis of variance)
p 45
( ) ( ) ( )
SSE 1
2
SSR 1
2
SST 1
2
∑
ˆ∑
ˆ∑
== +
= =
− +
−
=
− n
i
i i n
i i n
i
i y y y y y
y {27}
ANOVA (Analysis of variance)
Allows to test the hypothesis H0: α1 = 0 (null h.) against H1: α1≠ 0, i.e.:
does y really vary with x or is it just random fluctuation (chance)? Source of
variation
Degrees of freedom
Sum of
squares Mean square Computed F
Regression 1 SSR MST = SSR / 1
MSE
= MST F
Error n – 2 SSE MSE=
−2
= n
2 SSE s
Total n – 1 SST
ANOVA (Análise de variância) Fonte de
variação
Graus de liberdade
Soma dos quadr. dos desvios
Médias quadráticas F calculado Desvios da
regressão vs.
y médio
1 SSR MST = SSR / 1 SSR2
MSE MST F= = s Desvios entre
val.s exper. e calculados
n – 2 SSE MSE =
−2
=n
2 SSE s Total (desvios
dos val.s exper.
vs. y médio)
n – 1 SST
If F is “sufficiently” large (value from software or tables), then H0 is rejected (y does vary with x).
“F” (Fisher) is the ratio of two chi-square variables, each divided by its degrees of freedom: 1 for SSR, n – 2 for s2.
Reject if F > Fcritical = Finv(1-P; νnumer., νdenom.) References:
– WALPOLE, Ronald E. and Raymond H. MYERS, 1989, “Probability and statistics for engineers and scientists”, 4.th ed., Macmillan, New York, NY (USA) (ISBN 0-02-424210-1), pp 366 ff.