«Laboratórios de Engenharia Química II» (LEQ II)

(1)

Miguel Casquilho (DEQB / IST / UTL) — mcasquilho@ist.utl.pt 2008-09-15 file=[LEQ.formulas.doc]

«Laboratórios de Engenharia Química II» (LEQ II)

T — true value M — measured value

p 6

Error: e= x_measured −x_true =M −T T

M

eabs = −

T T M T

e e −

=

= ^abs

rel

{1}

T is estimated by M .

p 6 T ≅ M → T =M {2}

p 7 Accuracy (Exactidão) — agreement between M and T.

Precision (Precisão) — agreement between several M’s (in the same

conditions), i.e., repetitions or replicates (statistical concept). Expresses reproducibility (reprodutibilidade). [(Numerical) precision (precisão numérica) resolution, number of significant figures (numerical concept).])

p 9

∑

=

∞

= → ⁿ i

n xi

n 1

lim1

µ

∑ ( )

∞ =

→ −

= ⁿ

i

n xi

n 1

1 2

lim µ

σ

∑

=

∞

→ −

= ⁿ

i

n xi

n 1

lim1 µ

δ (Gauss:) _δ ≅0.8_σ

{3}

p 11

( ) [ ( ) ] ( )

( )

( ) ( ) ( ) [

¹

( ) ]

²

( )

¹

Pr Pr

Pr

inside Pr

− Φ

= Φ

−

− Φ

=

− Φ

=

= +

<

−

=



 



− < = − <+

=

= +

<

−

=

±

∈

≡

k k

k

k z k x k

z k

k x

k k

x k

P

σ µ

σ µ σ

µ σ

µ

{4}

p 11

( ) ( )

(

^; ^; ^;^TRUE

)

NORMDIST NORMSDIST

Excel Excel

σ µ σ µ k x

k k

+

=

= Φ



 



=  +



 

 Φ  +

=



 

 + Φ

= 2

NORMINV 1 2

1 2

2

1 inside inv inside Excel _ins.

inv P P P

k

{5}

(See probabilities below.)

The classical convention is usually adopted here: lower case for pdf (probability density function), upper case for cdf (cumulative distribution function).

(2)

Gaussian distribution

x Prob. (%) z (=

σ µ

− x ) (k ≡ z) P = Φ(z) z = Φ^inv(P)

µ ± σ 68.3 1

90 1.64

95 1.96

µ ± 2 σ 95.4 2

98 2.33

99 2.58

µ ± 3 σ 99.7 3 µ ± 4 σ 99.98 4

Sample (statistics): average [média (amostral)], variance (variância), standard deviation (desvio-padrão)

p 12

∑

=

= ⁿ

i

xi

x n

1

1 T x

n→∞

=lim

( )

∑

=

− −

= ⁿ

i

i x

n x s

1 2 2

1

∑ ( )

=

− −

=

= ⁿ

i

i x

n x s

s

1 2 2

1 1

{6}

Average deviation (desvio médio):

p 12

∑

=

−

= ⁿ

i

i x

n 1 x

δ 1 {7}

For a Gaussian variable (n→∞), δ→~0.80σ.

(3)

Coefficient of variation (coeficiente de variação):

p 12

x

Cv = s {8}

p 13

( )

X σn

σ =

n t s x±

= µ

{9}

For large samples, i.e., if n → ∞ (or n > ~50), t → z.

Caution: the usual notation (following), t_P,ν, may be confusing.

p 13

(

^ν

)

ν

ν ν

; 1 TINV 2 ;

2 1

1 2 ;

2 1 1

2 ; 2 1

ins.

Excel inv ins.

nside inv

,

P P T

P n T

t

t _P ⁱ ⁱ

−

=



 



=  +

=



 



 − = −

−



 



 + = −

=

≡

{10}

Rejection of an outlier, say, xk

p 14

Calculate, without the (possible) outlier (so, n becomes n – 1):

x, s P s T

x s t

x _P 



 

 +

±

≡

± _ν ;ν

2 2

1 _inside

inv ,

{11}

If xk is not in this interval, reject; otherwise, accept.

Kurtosis (curtose):

p 15

( )

( )( )( ) ( )

(

²

)(

¹ ³

)

3 3 2 1

1 vector KURT

kurt

2

1

4 Excel

−

− −



 



 −

−

= +

=

∑

= n n

n s

x x n

n n

n

n ⁿ

i

i {12}

(See the Excel Help for kurtosis and skewness.) There are two trends for the use of kurtosis. The above definition makes kurt = 0 for a (large) Gaussian sample, instead of the classical value 3. Thus, relatively to the Gaussian, for kurt < 0, the distribution is flat; and for kurt > 0, it is peaked.

Skewness (enviesamento):

p 15

( ) ( )( ) ∑

=



 



 −

−

= −

= ⁿ

i i

s x x n

n n

1

3 Excel

2 vector 1

SKEW

skew {13}

Indicates asymmetry (e.g., 0 for Gaussian). If positive, long tail to the positive side;

and vice-versa.

pp 16–24 (Measurements, error propagation (upper limit of error, probable error. Error propagation. Significant figures.)

(4)

Regression analysis

Free straight line (general case)

p 25 y=α₀ +α₁x {14}

Notation akin to Walpole & Myers [1989].

( )

∑

=

−

= ⁿ

i i

xx x x

ss

1

2

∑ ( )

=

−

= ⁿ

i i

yy y y

ss

1

2

( )( )

∑

=

−

= ⁿ

i

i i

xy x x y y

ss

1

{15}

Slope (α1), intercept (α0) (coeficiente angular, ordenada na origem):

p 28

xx xy

ss

= ss ˆ1

α αˆ₀ = y−αˆ₁x {16}

xx

xx ss

s = s_yy = ss_yy {17}

Coefficient of determination (coeficiente de determinação):

yy xx

xy

ss ss R ss

2

2 = {18}

Variance of the correlation:

i i

i y y

e = ˆ −

SSE, sum of squares of the errors (about the regression line):

∑

=

= ⁿ

i

ei 1

SSE 2

2SSE var 1

= −

err n s_xy = var_err

{19}

Standard errors:

( )

xx

xy ss

x s n

2 0

std_errα = 1+

( )

xx xy

s

= s

std_errα1 {20}

Confidence interval of the intercept and the slope, α0 and α1:

p 29 α0 =αˆ0 ±t_P,_ν std_err

( )

α0 α1 =αˆ1±t_P,_ν std_err

( )

α1

ν = n – 2 {21}

(5)

Confidence interval of (one value of) yi:

p 30

( ) ⁽ ⁾

xx i xy

i

i ss

x x s n

t y Y

2 1

1 1

ˆ = ˆ ± + + − {22}

Confidence interval of the average of (many values of) y_i:

p 31

( ) ⁽ ⁾

xx i xy

i

i ss

x x s n

t y Y

2 ave.

ˆ 1

ˆ = ± + − {23}

Remark that ^err

( )

^Y^ˆⁱ ^ave^.^<^err

( )

^Y^ˆⁱ ¹, i.e., of course, the average (of predicted values) varies less than the individual predicted value.

Straight line through the origin

p 33 y=α₁x {24}

Slope (α1) (coeficiente angular):

p 33

xx n

i i i

ss y

∑

x

= =¹

α1 {25}

Standard error:

( ) ∑

=

= _n

i i xy

x s

1 2

std_errα1

{26}

Confidence interval of the slope, α1: as above.

R² and some other statistics: not easily applicable (not recommended).

(6)

ANOVA (Analysis of variance)

p 45

( ) ( ) ( )

SSE 1

2

SSR 1

2

SST 1

2

∑

ˆ

∑

ˆ

∑

=

= +

= =

− +

−

=

− ⁿ

i

i i n

i

i y y y y y

y {27}

ANOVA (Analysis of variance)

Allows to test the hypothesis H₀: α1 = 0 (null h.) against H₁: α1≠ 0, i.e.:

does y really vary with x or is it just random fluctuation (chance)? Source of

variation

Degrees of freedom

Sum of

squares Mean square Computed F

Regression 1 SSR MST = SSR / 1

MSE

= MST F

Error n – 2 SSE MSE=

−2

= n

2 SSE s

Total n – 1 SST

ANOVA (Análise de variância) Fonte de

variação

Graus de liberdade

Soma dos quadr. dos desvios

Médias quadráticas F calculado Desvios da

regressão vs.

y médio

1 SSR MST = SSR / 1 SSR₂

MSE MST F= = s Desvios entre

val.s exper. e calculados

n – 2 SSE MSE =

−2

=n

2 SSE s Total (desvios

dos val.s exper.

vs. y médio)

n – 1 SST

If F is “sufficiently” large (value from software or tables), then H₀ is rejected (y does vary with x).

“F” (Fisher) is the ratio of two chi-square variables, each divided by its degrees of freedom: 1 for SSR, n – 2 for s².

Reject if F > Fcritical = F^inv(1-P; νnumer., νdenom.) References:

– WALPOLE, Ronald E. and Raymond H. MYERS, 1989, “Probability and statistics for engineers and scientists”, 4.th ed., Macmillan, New York, NY (USA) (ISBN 0-02-424210-1), pp 366 ff.