Quantitative Methods
Applied to Accounting
UnB/PPGCont 2019/1
Types of variables
• Types of data and notation: time series, cross-section, panel data
• Determination:
• endogenous variables • exogenous variables
• pre-determined variables
• Inflation: nominal vs real variables
• Behaviour over time: stationary vs non-stationary
• Spurious regressions • Cointegration
Simple regression
• The data
• Population and sample • Regression equation • DGP, PRF, SRF
• Regression Coefficients • Ordinary Least Squares
• Derive the OLS Estimator (Brooks) • Derive the OLS Estimator (Heij)
• Second-order conditions for a minimum: Hessian > 0 • Derive the variance of the OLS estimator (Kutner)
• Linearity
• Assumptions of CLRM
Computation of the OLS estimators α̂ and β̂
• The OLS estimators:
• β̂ = (Σx_t y_t − T·x̄·ȳ) / (Σx_t² − T·x̄²) and α̂ = ȳ − β̂·x̄
• 2nd order condition for a minimum
• Hessian matrix of the RSS must be positive definite
Maxima and minima of functions of several variables
(Appendix A7, p. 741)
Computation of the OLS estimators α̂ and β̂ (cont'd)
• A 2×2 symmetric matrix is positive definite iff all the elements in its main diagonal are positive and its determinant is positive.
• Determinant of the Hessian matrix: det H = 4[ T·Σx_t² − (Σx_t)² ] > 0
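The second-order condition can be checked numerically. A minimal sketch, taking T, Σx and Σx² from the worked example that appears later in these slides:

```python
# Second-order condition for the OLS minimum: the Hessian of the RSS,
#   H = [[2T, 2*sum_x], [2*sum_x, 2*sum_x2]],
# must be positive definite: positive diagonal and positive determinant.
T = 22             # number of observations (from the worked example)
sum_x = T * 416.5  # sum of x_t = T * x-bar
sum_x2 = 3919654   # sum of x_t^2

H = [[2 * T, 2 * sum_x],
     [2 * sum_x, 2 * sum_x2]]

det_H = H[0][0] * H[1][1] - H[0][1] * H[1][0]

# Positive definite iff both diagonal entries and the determinant are positive
assert H[0][0] > 0 and H[1][1] > 0 and det_H > 0
print("det(H) =", det_H, "-> the RSS has a minimum")
```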
Properties of the OLS Estimator: BLUE
• Best (smallest variance) • Linear
• Unbiased • Estimators
The Gauss-Markov Theorem
Properties of the OLS Estimator: BLUE
• Unbiasedness: E(α̂) = α, E(β̂) = β
• Consistency: plim β̂ = β as T → ∞
Estimating the Variance of the Disturbance Term
• The variance of the disturbances u_t is estimated from the residuals û_t:
• s² = Σ û_t² / (T − 2)

Precision and Standard Errors
• Standard errors of the OLS estimators:
• SE(α̂) = s·√[ Σx_t² / (T·Σ(x_t − x̄)²) ] and SE(β̂) = s·√[ 1 / Σ(x_t − x̄)² ]

Example: How to Calculate the Parameters and Standard Errors
• Given: Σx_t y_t = 830102, T = 22, x̄ = 416.5, ȳ = 86.65, Σx_t² = 3919654, RSS = Σû_t² = 130.6
• β̂ = (830102 − 22 × 416.5 × 86.65) / (3919654 − 22 × 416.5²) = 0.35
• α̂ = ȳ − β̂·x̄ = 86.65 − 0.35 × 416.5 = −59.12
• Fitted line: ŷ_t = −59.12 + 0.35·x_t
• s² = 130.6 / (22 − 2) = 6.53, so s = 2.55
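The worked example can be reproduced from the summary statistics alone. A minimal Python sketch (values as given in the slides; note the slides round β̂ to 0.35 before computing α̂, which this code imitates):

```python
import math

# Simple regression computed from summary statistics (not raw data).
T, xbar, ybar = 22, 416.5, 86.65
sum_xy, sum_x2 = 830102, 3919654
rss = 130.6  # residual sum of squares, given in the slides

beta = (sum_xy - T * xbar * ybar) / (sum_x2 - T * xbar**2)  # slope
alpha = ybar - round(beta, 2) * xbar  # slides round beta to 0.35 first

s2 = rss / (T - 2)   # estimator of var(u): s^2 = RSS / (T - 2)
s = math.sqrt(s2)

sxx = sum_x2 - T * xbar**2  # sum of (x_t - xbar)^2
se_alpha = s * math.sqrt(sum_x2 / (T * sxx))
se_beta = s * math.sqrt(1 / sxx)

print(f"y-hat = {alpha:.2f} + {round(beta, 2):.2f} x")
# Unrounded intermediate values give SE(alpha) ~ 3.36 rather than the
# slides' 3.35, which rounds s to 2.55 first.
print(f"SE(alpha) = {se_alpha:.2f}, SE(beta) = {se_beta:.4f}")
```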
• SE(α̂) = 2.55 × √[ 3919654 / (22 × (3919654 − 22 × 416.5²)) ] = 3.35
• SE(β̂) = 2.55 × √[ 1 / (3919654 − 22 × 416.5²) ] = 0.0079
• With standard errors in parentheses: ŷ_t = −59.12 + 0.35·x_t, (3.35) (0.0079)

An Introduction to Statistical Inference
• Hypothesis Testing: Some Concepts • One-Sided Hypothesis Tests
• Two-Sided Hypothesis Tests
• The Probability Distribution of the LS Estimators
• Testing Hypotheses: The Test of Significance Approach
• Determining the Rejection Region for a Test of Significance • The Confidence Interval Approach to Hypothesis Testing
• How to Carry out a Hypothesis Test Using Confidence Intervals • Confidence Intervals Versus Tests of Significance
The Probability Distribution of the Least Squares Estimators (cont’d)
• (α̂ − α)/√var(α̂) ~ N(0, 1) and (β̂ − β)/√var(β̂) ~ N(0, 1)
• Replacing the true variances with their estimates:
• (α̂ − α)/SE(α̂) ~ t(T−2) and (β̂ − β)/SE(β̂) ~ t(T−2)
The Student’s t-distribution
• The t distributions were discovered by William S. Gosset in 1908. Gosset was a statistician employed by the Guinness brewing company which had stipulated that he not publish under his own name. He therefore wrote under the pen name “Student”.
• These distributions arise in the following situation.
• Suppose we have a simple random sample of size n drawn from a Normal population with mean μ and standard deviation σ. Let x̄ denote the sample mean and s the sample standard deviation. Then the quantity t = (x̄ − μ)/(s/√n) has a t distribution with n − 1 degrees of freedom. (1)
• Note that there is a different t distribution for each sample size; in other words, it is a class of distributions. When we speak of a specific t distribution, we have to specify the degrees of freedom. The degrees of freedom for this t statistic come from the sample standard deviation s in the denominator of equation (1).
• The t density curves are symmetric and bell-shaped like the normal distribution and have their peak at 0. However, the spread is greater than that of the standard normal distribution. This is because, in equation (1), the denominator is s rather than σ. Since s is a random quantity varying from sample to sample, the variability in t is greater, resulting in a larger spread.
• The larger the degrees of freedom, the closer the t density is to the normal density. This reflects the fact that the standard deviation s approaches σ for large sample sizes n.
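As a quick illustration of the quantity in equation (1), using only the Python standard library (the sample values below are made up):

```python
import math
import statistics

# t = (xbar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom.
# The sample is hypothetical, purely for illustration.
sample = [9.2, 10.1, 9.8, 10.4, 9.5, 10.0, 9.9, 10.3]
mu0 = 10.0                      # hypothesised population mean

n = len(sample)
xbar = statistics.mean(sample)  # sample mean
s = statistics.stdev(sample)    # sample std. dev. (n - 1 in the denominator)

t = (xbar - mu0) / (s / math.sqrt(n))
print(f"t = {t:.4f} with {n - 1} degrees of freedom")
```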
Testing Hypotheses: The Test of Significance Approach
• For the model y_t = α + β·x_t + u_t, the test statistic for H0: β = β* is
• test statistic = (β̂ − β*) / SE(β̂)
The Test of Significance Approach: Drawing Conclusions
How to Carry out a Hypothesis Test Using Confidence Intervals
• Build the interval (β̂ − t_crit·SE(β̂), β̂ + t_crit·SE(β̂)) and reject H0 if the hypothesised value β* lies outside it.
Confidence Intervals Versus Tests of Significance
• The two approaches always agree: |β̂ − β*| / SE(β̂) > t_crit is equivalent to β* lying outside (β̂ − t_crit·SE(β̂), β̂ + t_crit·SE(β̂)).
Performing the Test
• With β̂ = 0.5091, SE(β̂) = 0.2561 and H0: β = 1, t_crit = 2.086 (5%, 2-sided):
• test stat = (0.5091 − 1) / 0.2561 = −1.917
• confidence interval: 0.5091 ± 2.086 × 0.2561 = (−0.0251, 1.0433)
• |−1.917| < 2.086 and the interval contains 1: do not reject H0
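A minimal sketch reproducing this test's numbers in Python:

```python
# Test of significance and confidence-interval approaches with the
# slide's numbers: beta-hat = 0.5091, SE = 0.2561, H0: beta = 1,
# t_crit = 2.086 (5% two-sided, 20 degrees of freedom).
beta_hat, se, beta0, t_crit = 0.5091, 0.2561, 1.0, 2.086

t_stat = (beta_hat - beta0) / se                       # test of significance
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)  # confidence interval

print(f"t-stat = {t_stat:.3f}")                 # about -1.917
print(f"CI = ({ci[0]:.4f}, {ci[1]:.4f})")       # about (-0.0251, 1.0433)

# Both approaches agree: |t| < t_crit and the CI contains 1 -> do not reject H0
assert abs(t_stat) < t_crit and ci[0] <= beta0 <= ci[1]
```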
Example 1
• Assume you are testing H0: β = 1 against H1: β > 1 at the 5% significance level, with DF = 30.
• By running the regression, you got a t-statistic of 1.7213.
• The critical value is 1.6973, so you reject H0, which means that you have evidence that β > 1.
• However, maybe somebody says that your sample is too small. Then you decide to increase your sample to DF = 60 and run the regression again.
• Suppose that with DF = 60 you get a t-stat = 1.6505. Your new critical value is 1.6706, so you don't reject H0 in this case.
• Assuming that the new regression is better (more DF) and its t-stat is closer to the true value, we can say that in the test applied to the 1st regression you committed a Type I error (a false positive: you rejected a true H0).
Example 2
• Assume you are testing H0: β = 0 against H1: β ≠ 0 at 10%.
• After estimating a simple linear regression with a sample of 42 observations, you get a t-statistic = 1.674.
• The appropriate 2-sided critical value is 1.6839, hence you don't reject H0, since you think there is no evidence that β ≠ 0.
• Suppose that another researcher points out that you have used an inappropriate estimator in your regression, e.g. OLS instead of 2SLS.
• Then you decide to estimate your regression with a more robust method (e.g. 2SLS) and now find that the t-stat = 1.7512.
Example 2: solution
• You didn't reject H0 because t-stat = 1.674 < crit. value = 1.6839
• But the more robust t-stat = 1.7512 > crit. value = 1.6839
• Hence, you committed a Type II error = false negative
• This occurs if we don't reject H0 when it is false; its probability is denoted β (not the regression β!)
• Don't reject a false H0: Type II error. Reject a false H0: correct decision
The probability of a Type II error: example 3
• The probability of not rejecting H0 when it is false is β
• Example:
H0: β = 1
H1: β ≠ 1
• If H0 is false, we have to consider H1: β ≠ 1, but we don't know what the value of β is
• Then we have to choose a value for β under H1, for example β = 0.9
• This means we are assuming that 0.9 is the actual value of β
Type II error
• Hence, compute the t-stats of the rejection boundaries assuming the true value β = 0.9: they are ±1.5264 (2-sided test)
• If β = 0.9 is the true value, what would be the p-values and the non-rejection region?
• To find the probability of a Type II error, we have to find the area of the non-rejection region between −1.5264 and +1.5264
[Figure: t distribution with rejection regions beyond ±1.5264 and the non-rejection region between them]
Type II error
• We can use the Excel function DIST.T (T.DIST in the English locale) to get the area of one rejection tail: DIST.T(1.5264, 20, FALSE) = 0.1239
• Since we have 2 rejection areas, the non-rejection area will be: 1 − 2 × 0.1239 = 0.7522
• Hence, the probability of not rejecting a false H0 is: β = 75.22%
Type II error: Example 4
• If the previous test were one-sided: H0: β = 1 versus H1: β < 1
• Assuming again β = 0.9, t-stat = −1.5264
• Hence the area of no rejection would be 1 − 0.1239 = 0.8761, so that β = P(Type II error) = 87.61%
[Figure: one-sided test with the single rejection region below −1.5264]
The power of a test
• By definition, the power of a test of hypothesis is the probability that a false null hypothesis will be rejected, i.e. power = 1 − β = 1 − P(Type II error)
• Example 3: power = 1 − 0.7522 = 0.2478
• Example 4: power = 1 − 0.8761 = 0.1239
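The β and power figures for Examples 3 and 4 follow mechanically from the per-tail area 0.1239 obtained in the Excel computation above (taken as given here):

```python
# Power = 1 - P(Type II error), using the per-tail area from the slides.
tail = 0.1239

# Example 3 (two-sided): both tails are rejection regions.
beta_two_sided = 1 - 2 * tail        # P(Type II) = non-rejection area
power_two_sided = 1 - beta_two_sided  # = 2 * tail

# Example 4 (one-sided): only one rejection tail.
beta_one_sided = 1 - tail
power_one_sided = 1 - beta_one_sided  # = tail

print(f"two-sided: beta = {beta_two_sided:.4f}, power = {power_two_sided:.4f}")
print(f"one-sided: beta = {beta_one_sided:.4f}, power = {power_one_sided:.4f}")
```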
The power of a test
• In general, for every hypothesis test that we conduct, we'll want to do the following:
(1) Minimize the probability of committing a Type I error. That is, minimize α = P(Type I error). Typically, a significance level of α ≤ 0.10 is desired.
(2) Maximize the power (at a value of the parameter under the alternative
hypothesis that is scientifically meaningful). Typically, we desire power to be 0.80 or greater. Alternatively, we could minimize β = P(Type II Error), aiming for a type II error rate of 0.20 or less.
Exercises
• Assuming regressions with k = 2, perform the tests below for a biased t-stat and for a robust (unbiased) t-stat, and check whether the 1st attempt was a Type I error, a Type II error, or no error:
1. Test: H0: β = −2 against H1: β < −2 at 5%, T = 72, biased t-stat = −1.68, unbiased t-stat = −1.64
2. Test: H0: β = −1 against H1: β ≠ −1 at 1%, T = 52, biased t-stat = −2.5, robust t-stat = −2.8
3. Test: H0: β = 3 against H1: β ≠ 3 at 10%, T = ,
Solutions
1. type I error 2. type II error 3. No error
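The first two solutions can be checked with a short sketch. The critical values below (t ≈ 1.667 one-sided at 5% with 70 DF; t ≈ 2.678 two-sided at 1% with 50 DF) are read from a t table and are my additions, not given in the slides:

```python
# Classify the first (biased) attempt against the robust decision,
# treating the robust t-stat as the "true" one, as the exercise does.
def error_type(biased_rejects: bool, robust_rejects: bool) -> str:
    if biased_rejects and not robust_rejects:
        return "Type I error"   # false positive
    if not biased_rejects and robust_rejects:
        return "Type II error"  # false negative
    return "No error"

# Exercise 1: H0: beta = -2 vs H1: beta < -2 at 5%, DF = 70 (one-sided)
crit1 = -1.667  # assumed table value
print(error_type(-1.68 < crit1, -1.64 < crit1))

# Exercise 2: H0: beta = -1 vs H1: beta != -1 at 1%, DF = 50 (two-sided)
crit2 = 2.678   # assumed table value
print(error_type(abs(-2.5) > crit2, abs(-2.8) > crit2))
```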
The Trade-off Between Type I and Type II Errors
• Reduce the size of the test → more strict criterion for rejection → reject the null hypothesis less often → less likely to falsely reject a true null, but more likely to incorrectly not reject a false null
A Special Type of Hypothesis Test: The t-ratio
• The test statistic for H0: β_i = β_i* is (β̂_i − β_i*) / SE(β̂_i)
• The t-ratio is the special case β_i* = 0: test stat = β̂_i / SE(β̂_i)

P-value: example 1
• In a simple linear regression with 22 observations, a t-statistic of 1.525 was obtained. Find the approximate p-value for a one-tailed test.
• Solution: from the t distribution table (DF = 20), we find 1.3253 at the 10% significance level and 1.7247 at the 5% significance level.
• Interpolating linearly between the two: p ≈ 0.10 − [(1.525 − 1.3253) / (1.7247 − 1.3253)] × (0.10 − 0.05) = 0.10 − (0.1997 / 0.3994) × 0.05 ≈ 0.075
P-value: example 2
• In a simple linear regression with 22 observations, a t-statistic of 1.90 was obtained. Find the approximate p-value for a two-tailed test.
• Solution: from the t distribution table (DF = 20), we find 1.7247 at the 10/2 = 5% significance level and 2.086 at the 5/2 = 2.5% significance level.
• Interpolating linearly for the one-tail area x at t = 1.9: x ≈ 0.025 + [(2.086 − 1.9) / (2.086 − 1.7247)] × (0.05 − 0.025) = 0.025 + (0.186 / 0.3613) × 0.025 ≈ 0.0379
• Hence the two-tailed p-value ≈ 2 × 0.0379 = 0.0757
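The table-interpolation method used in both examples can be wrapped in a small helper:

```python
def interp_pvalue(t, t_lo, p_lo, t_hi, p_hi):
    """Linearly interpolate a one-tail probability between two table
    entries (t_lo, p_lo) and (t_hi, p_hi). Crude, but it is exactly the
    method the two examples apply."""
    return p_lo + (t - t_lo) / (t_hi - t_lo) * (p_hi - p_lo)

# Example 1: t = 1.525, DF = 20, one-tailed
p1 = interp_pvalue(1.525, 1.3253, 0.10, 1.7247, 0.05)
print(f"one-tailed p ~ {p1:.4f}")    # about 0.075

# Example 2: t = 1.90, DF = 20, two-tailed (double the one-tail area)
p2 = 2 * interp_pvalue(1.90, 1.7247, 0.05, 2.086, 0.025)
print(f"two-tailed p ~ {p2:.4f}")    # about 0.0757
```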
Maximum Likelihood Estimation of β0, β1, and σ²
• The density of an observation y_i for the normal error regression model is:
• f(y_i) = [1/(σ√(2π))] · exp[ −(y_i − β0 − β1·x_i)² / (2σ²) ]
• The likelihood function for n observations Y1, Y2, …, Yn is the product of the n individual densities:
• L(β0, β1, σ²) = Π_{i=1..n} [1/(σ√(2π))] · exp[ −(Y_i − β0 − β1·x_i)² / (2σ²) ]
Maximum Likelihood Estimation of β0, β1, and σ² (cont'd)
• Normal equations: setting the partial derivatives of the log-likelihood with respect to β0 and β1 to zero gives the same normal equations as OLS.
• The ML estimators of β0 and β1 are therefore the same as the OLS estimators.
• But the ML estimator of the variance is σ̂²_ML = Σ ê_i² / n, which is biased (the unbiased estimator divides by n − 2).
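The bias of the ML variance estimator is easy to see numerically. Reusing RSS = 130.6 and T = 22 from the earlier worked example:

```python
# ML vs unbiased estimators of the disturbance variance. The ML
# estimator divides the RSS by n; the unbiased one divides by n - 2
# (two estimated coefficients). E[RSS/n] = sigma^2 * (n - 2) / n.
rss, T = 130.6, 22

sigma2_ml = rss / T    # biased downward
s2 = rss / (T - 2)     # unbiased

print(f"sigma^2_ML = {sigma2_ml:.3f}, s^2 = {s2:.3f}")
print(f"ML underestimates by the factor (n-2)/n = {(T - 2) / T:.3f}")
```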
GMM in simple regression
• The two moment conditions
• Suppose that we wish to estimate the parameters α and β in the model y_i = α + β·x_i + u_i
• We suppose that the functional form is correctly specified in the sense that the DGP has parameters (α0, β0) with the property that E(y_i − α0 − β0·x_i) = 0
GMM in simple regression
• Further we assume that the explanatory variable x_i satisfies the orthogonality condition E[ x_i · (y_i − α0 − β0·x_i) ] = 0
• This provides 2 moment conditions, so that the model is exactly identified.
The GMM estimators
• The GMM estimates of α and β are obtained by replacing the expectation E by the sample mean:
• (1/n) Σ (y_i − α̂ − β̂·x_i) = 0 and (1/n) Σ x_i·(y_i − α̂ − β̂·x_i) = 0
• These equations are equivalent to the 2 normal equations.
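A minimal sketch of this exactly identified case on made-up data: solve the two sample moment conditions (which coincide with the OLS normal equations) and verify that both hold at the estimates.

```python
# Sample moment conditions for y_i = a + b*x_i + u_i:
#   (1/n) * sum(y_i - a - b*x_i)       = 0
#   (1/n) * sum(x_i * (y_i - a - b*x_i)) = 0
# Two conditions, two parameters -> exactly identified; the solution
# is the OLS estimator. The data below are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar

# Check that both sample moment conditions hold at (a, b)
u = [yi - a - b * xi for xi, yi in zip(x, y)]
m1 = sum(u) / n
m2 = sum(xi * ui for xi, ui in zip(x, u)) / n
print(f"a = {a:.4f}, b = {b:.4f}, moments = ({m1:.1e}, {m2:.1e})")
```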
Homework: p-value exercises
• Simple regression, T = 62, t-statistic = 3.0, 1-tailed test, p-value =
• Simple regression, T = 152, t-statistic = 1.4, 2-tailed test, p-value =
• Simple regression, T = , t-statistic = 2.0, 2-tailed test, p-value =
Tasks
• Study:
• Brooks (2014): Chapters 1, 2 (maths and stats review) and 3, problems, exercises and other materials at:
https://www.cambridge.org/gb/academic/textbooks/introductory-econometrics
• Heij et al. (2004): Chapters 1 (maths and stats review) and 2, problems, exercises, and other materials at:
http://global.oup.com/booksites/content/0199268010/
• Kutner (2005): Chapters 1 and 2, problems and exercises.
• Gujarati, D.N. (2004): Chapters 1 and 2 and exercises.
Tasks
• Do the exercises from:
https://sydney.edu.au/stuserv/documents/maths_learning_centre/compositefunctionrule.pdf