SCREEN READER ADVISORY ON CASE SENSITIVITY:
In this document, the case of letters is vital:
UPPER CASE (e.g., X, Y, Z): Random Variables (population).
lower case (e.g., x, y, z): Specific realizations/observations (sample).
Subscripts: Usually denoted as "subscript i" or "subscript t".
Hats (e.g., beta-hat): Estimated values from your data.
1. Mathematical & Statistical Foundations
Expected Value (Expectation)
The "average" value of a random variable in the population.
Discrete RV: E[X] = Sum of [ x * Probability(X=x) ]
Continuous RV: E[X] = Integral from minus infinity to plus infinity of [ x * f(x) ] dx, where f(x) is the probability density function (PDF).
Linearity: E[aX + bY] = a E[X] + b E[Y]
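A minimal NumPy sketch of the discrete expectation formula and its linearity property; the values and probabilities below are invented for illustration.

import numpy as np

# Discrete random variable: support points and their probabilities (illustrative numbers)
x_values = np.array([0.0, 1.0, 2.0])
probs    = np.array([0.2, 0.5, 0.3])          # must sum to 1

E_X = np.sum(x_values * probs)                # E[X] = Sum of x * P(X = x)
print("E[X] =", E_X)

# Linearity check: E[aX + b] = a*E[X] + b
a, b = 3.0, 1.0
E_aXb = np.sum((a * x_values + b) * probs)
print(E_aXb, "equals", a * E_X + b)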
Variance and Standard Deviation
Measures the spread or "risk" of a variable.
Var(X) = E[X^2] - (E[X])^2
Properties: Var(aX + b) = a^2 * Var(X).
Covariance and Correlation
Measures the linear relationship between two variables.
Cov(X, Y) = E[XY] - E[X]E[Y]
Corr(X, Y) = Cov(X, Y) / [ SD(X) * SD(Y) ]
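A short NumPy sketch of the sample analogues of these formulas; the data are simulated, and np.corrcoef is used only as a cross-check on the hand computation.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)                    # y is built to co-move with x

var_x  = np.mean(x**2) - np.mean(x)**2               # Var(X) = E[X^2] - (E[X])^2
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)    # Cov(X,Y) = E[XY] - E[X]E[Y]
corr   = cov_xy / (np.sqrt(var_x) * np.std(y))       # Corr = Cov / (SD(X) * SD(Y))

print(var_x, cov_xy, corr)
print(np.corrcoef(x, y)[0, 1])                       # built-in cross-check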
Conditional Expectation & LIE
Law of Iterated Expectations (LIE): E[Y] = E[ E[Y|X] ]
Independence: If X and Y are independent, then E[Y|X] = E[Y] and Cov(X, Y) = 0.
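A tiny simulation of the Law of Iterated Expectations, assuming a binary X: the overall mean of Y equals the group means of Y averaged with the probability of each group.

import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=100_000)             # binary X
y = 1.0 + 2.0 * x + rng.normal(size=100_000)     # Y depends on X plus noise

m0, m1 = y[x == 0].mean(), y[x == 1].mean()      # E[Y|X=0], E[Y|X=1]
p1 = x.mean()                                    # P(X = 1)

print(y.mean())                                  # direct E[Y]
print((1 - p1) * m0 + p1 * m1)                   # E[ E[Y|X] ], matches up to sampling noise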
Indicator Function (The Math behind Dummies)
An indicator function, often written as 1(.) or I(.), translates a logical condition into a number.
1(A) = { 1 if A occurs; 0 if A does not occur }
In Econometrics: This is used to create dummy variables. For example, if we define D = 1(Female), then D=1 for female observations and D=0 for male observations. The expectation of an indicator function is the probability of the event: E[1(A)] = P(A).
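A pandas sketch of turning a condition into a dummy and of the fact that E[1(A)] = P(A); the tiny data frame below is invented for illustration.

import pandas as pd

df = pd.DataFrame({"gender": ["Female", "Male", "Female", "Female", "Male"]})

# D = 1(Female): the indicator becomes a 0/1 dummy column
df["female"] = (df["gender"] == "Female").astype(int)

# E[1(A)] = P(A): the sample mean of the dummy is the sample share of females
print(df["female"].mean())   # 0.6 in this made-up sample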
2. The Linear Regression Model
The Population Model
Y_i = beta_0 + beta_1 * X_1i + ... + u_i
beta_1: The causal effect of X_1 on Y, holding all other factors constant (ceteris paribus).
u_i (Error Term): Captures all factors other than the included X's that influence Y.
The Sample Model (Predictions)
Y_i_hat = beta_0_hat + beta_1_hat * X_1i + ...
e_i (Residual): The difference between observed and predicted value: e_i = Y_i - Y_i_hat.
Dummy Variables
Binary variables taking values 0 or 1. Used for categories (e.g., gender, region).
The Dummy Variable Trap: If you have G categories, you must only include G-1 dummy variables if you have an intercept (constant) in the model. Otherwise, you face perfect multicollinearity.
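A pandas sketch of avoiding the trap: pd.get_dummies with drop_first=True keeps G-1 columns, so the dummies are not perfectly collinear with the intercept. The category names are illustrative.

import pandas as pd

df = pd.DataFrame({"region": ["North", "South", "West", "South", "North"]})

# All G dummy columns together would be collinear with a constant; drop one as the base category
dummies = pd.get_dummies(df["region"], drop_first=True, dtype=int)
print(dummies)   # columns for South and West only; North is the omitted base group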
3. OLS Estimation & Derivation
The OLS Criterion
We choose the beta-hats to minimize the Sum of Squared Residuals (SSR).
Minimize: Sum [ (Y_i - beta_0 - beta_1*X_i)^2 ]
Simple OLS Estimators
beta_1_hat = Sample_Cov(X, Y) / Sample_Var(X)
beta_0_hat = Y_bar - (beta_1_hat * X_bar)
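A NumPy sketch of these two formulas computed by hand from the sample covariance and variance, on simulated data with a true intercept of 1 and a true slope of 2.

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)                      # true intercept 1, true slope 2

beta1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)    # Sample_Cov(X,Y) / Sample_Var(X)
beta0_hat = y.mean() - beta1_hat * x.mean()                   # Y_bar - beta_1_hat * X_bar

print(beta0_hat, beta1_hat)   # close to 1 and 2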
Goodness of Fit
R-squared (R^2): Percentage of variance in Y explained by the model.
Adjusted R-squared: Penalizes the addition of unnecessary regressors, so it is never higher than R-squared.
4. The OLS Assumptions (Gauss-Markov)
To ensure OLS is the Best Linear Unbiased Estimator (BLUE):
Linearity in Parameters: The model is a linear combination of betas.
Random Sampling: (X_i, Y_i) are i.i.d.
No Perfect Multicollinearity: Regressors aren't perfectly redundant.
Zero Conditional Mean (Orthogonality): E[u|X] = 0. This is the most critical assumption!
Homoskedasticity: Var(u|X) = sigma^2 (constant). If violated, we have Heteroskedasticity and need "Robust Standard Errors."
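A hedged statsmodels sketch: fit by OLS and request heteroskedasticity-robust (HC1) standard errors. The data are simulated so that the error variance depends on X.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=400)
u = rng.normal(size=400) * (1 + np.abs(x))     # error variance grows with |x|: heteroskedastic
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)                         # adds the intercept column
ols_usual  = sm.OLS(y, X).fit()                # classical (homoskedastic) standard errors
ols_robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors

print(ols_usual.bse)    # usual standard errors
print(ols_robust.bse)   # robust standard errors (typically larger here)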
5. Statistical Inference
Hypothesis Testing Recipe
Null Hypothesis (H0): Usually beta_1 = 0 (no effect).
Alternative (H1): beta_1 is not 0.
Calculate t-statistic:
t = (beta_hat - beta_null) / SE(beta_hat), where beta_null is the hypothesized value under H0 (usually 0).
Decision Rule: If |t| > 1.96, reject H0 at the 5% significance level (two-sided test).
p-values and Confidence Intervals
p-value: The probability of seeing a result as extreme as ours if H0 is true. Small p-value (< 0.05) = Reject H0.
95% Confidence Interval: [ beta_hat minus 1.96*SE, beta_hat plus 1.96*SE ].
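A small SciPy sketch of the recipe: build the t-statistic, the two-sided large-sample p-value, and the 95% confidence interval from an estimate and its standard error. The numbers are placeholders.

from scipy import stats

beta_hat  = 0.52      # illustrative estimate
se        = 0.20      # illustrative standard error
beta_null = 0.0       # value under H0

t = (beta_hat - beta_null) / se
p_value = 2 * (1 - stats.norm.cdf(abs(t)))            # two-sided p-value, normal approximation
ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)     # 95% confidence interval

print(t, p_value, ci)   # |t| = 2.6 > 1.96, so reject H0 at the 5% level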
The F-Statistic
Used to test multiple restrictions simultaneously (e.g., test if beta_1 = 0 AND beta_2 = 0).
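A hedged statsmodels sketch of a joint test; the regressor names x1 and x2 are made up, and f_test is given the restriction that both slopes are zero.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({"x1": rng.normal(size=300), "x2": rng.normal(size=300)})
df["y"] = 1 + 0.5 * df["x1"] + 0.0 * df["x2"] + rng.normal(size=300)

res = smf.ols("y ~ x1 + x2", data=df).fit()
print(res.f_test("x1 = 0, x2 = 0"))   # joint test of beta_1 = 0 AND beta_2 = 0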
6. Large Sample Theory (Asymptotics)
Law of Large Numbers (LLN)
Sample averages converge to population averages as n becomes large.
Y_bar ->p mu_Y
Assures that OLS is Consistent (beta_hat ->p beta).
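A one-look simulation of the LLN, assuming draws from a skewed distribution with population mean 5: the sample average settles near 5 as n grows.

import numpy as np

rng = np.random.default_rng(5)
for n in [10, 1_000, 100_000]:
    draws = rng.exponential(scale=5.0, size=n)   # population mean is 5
    print(n, draws.mean())                       # sample mean approaches 5 as n grows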
Central Limit Theorem (CLT)
The CLT is the reason we can perform hypothesis tests (like t-tests) without knowing the underlying distribution of the errors, provided the sample size n is large.
1. The Formal Definition
As the sample size n increases, the distribution of the standardized sample mean converges to a Standard Normal distribution N(0, 1).
sqrt(n) * ( (Y_bar - mu) / sigma ) ->d N(0, 1)
2. The "Root-n" Logic (√n)
Why do we multiply by √n? As n gets larger:
The variance of the sample mean, Var(Y_bar) = sigma^2 / n, shrinks toward zero.
By multiplying by √n, we "inflate" the shrinking variance back to a constant level (sigma^2), preventing the distribution from collapsing into a single point.
3. Application to OLS (The Derivation)
To see why OLS estimators are Asymptotically Normal, we decompose the slope estimator as beta_1_hat = beta_1 + [ Sum( (X_i - X_bar) * u_i ) ] / [ Sum( (X_i - X_bar)^2 ) ]:
Numerator: The term Sum( (X_i - X_bar) * u_i ) is a sum of independent random variables. By the CLT, this sum (when scaled by 1/√n) becomes Normally distributed.
Denominator: By the Law of Large Numbers (LLN), the average (1/n) * Sum( (X_i - X_bar)^2 ) converges to the population variance Var(X).
Conclusion: Since the numerator is Normal and the denominator is a constant (in large samples), their ratio is also Normal.
Bottom Line: In small samples, beta_hat is only normal if the errors u are normal. In large samples, beta_hat is normal even if the errors are not normal. This allows us to use the 1.96 critical value for almost any large-sample regression.
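A Monte Carlo sketch of this bottom line: even with very non-normal (exponential) errors, the distribution of the hand-computed OLS slope across repeated samples is centered at the true value and roughly bell-shaped once n is large. The sample size and replication count are arbitrary choices.

import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 2000
slopes = np.empty(reps)

for r in range(reps):
    x = rng.normal(size=n)
    u = rng.exponential(scale=1.0, size=n) - 1.0       # skewed, mean-zero errors
    y = 1.0 + 2.0 * x + u
    x_dev = x - x.mean()
    slopes[r] = np.sum(x_dev * y) / np.sum(x_dev**2)   # OLS slope by hand

print(slopes.mean(), slopes.std())            # centered near the true slope of 2
print(np.percentile(slopes, [2.5, 97.5]))     # close to mean plus or minus 1.96 * std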
7. Nonlinear Specifications
Logarithms (Percentages)
Log-Log Model: log(Y) = beta_0 + beta_1*log(X) + u. beta_1 is the Elasticity (1% change in X -> beta_1% change in Y).
Log-Level Model: log(Y) = beta_0 + beta_1*X + u. (1 unit change in X -> 100*beta_1% change in Y).
Level-Log Model: Y = beta_0 + beta_1*log(X) + u. (1% change in X -> 0.01*beta_1 unit change in Y).
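A small sketch of the log-log interpretation on data simulated so that the true elasticity is 0.8; the variable names and numbers are illustrative.

import numpy as np

rng = np.random.default_rng(7)
log_x = rng.normal(size=1000)
log_y = 3.0 + 0.8 * log_x + rng.normal(scale=0.1, size=1000)   # true elasticity 0.8

x_dev = log_x - log_x.mean()
elasticity_hat = np.sum(x_dev * log_y) / np.sum(x_dev**2)      # OLS slope of log(Y) on log(X)
print(elasticity_hat)   # about 0.8: a 1% rise in X raises Y by about 0.8%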
Polynomials (Curvatures)
Model: Y = beta_0 + beta_1*X + beta_2*X^2 + u.
To find the effect of X on Y, you must take the derivative: dY/dX = beta_1 + 2*beta_2*X.
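A tiny worked computation of the quadratic marginal effect with illustrative coefficient values, showing that the effect of X changes with the level of X.

# Illustrative estimated coefficients from Y = b0 + b1*X + b2*X^2 + u
beta1_hat, beta2_hat = 4.0, -0.5

for x in [1.0, 2.0, 4.0]:
    marginal_effect = beta1_hat + 2 * beta2_hat * x   # dY/dX = beta_1 + 2*beta_2*X
    print(x, marginal_effect)                         # 3.0, 2.0, 0.0: the effect shrinks as X rises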
Interactions
Model: Y = beta_0 + beta_1*X + beta_2*D + beta_3*(X*D) + u.
The effect of X on Y now depends on whether D is 0 or 1.
8. Internal Validity & Sources of Bias
Internal validity asks: "Is our estimate of the causal effect correct for our sample?"
1. Omitted Variable Bias (OVB)
Occurs if a factor (Z) that influences Y is correlated with X and left out of the regression.
Direction of Bias: the sign of the bias is the sign of beta_Z times the sign of Corr(X, Z).
Positive Bias means OLS overestimates; Negative Bias means it underestimates.
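A simulation sketch of OVB: here Z raises Y (beta_Z > 0) and is positively correlated with X, so omitting Z biases the short-regression slope upward. All numbers are made up to make the sign easy to see.

import numpy as np

rng = np.random.default_rng(8)
n = 100_000
z = rng.normal(size=n)
x = 0.7 * z + rng.normal(size=n)                    # X and Z are positively correlated
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)    # true effect of X is 2; beta_Z = 3

def slope(a, b):
    a_dev = a - a.mean()
    return np.sum(a_dev * b) / np.sum(a_dev**2)

print(slope(x, y))   # regression of Y on X alone: noticeably above 2 (positive bias)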
2. Measurement Error
Error in Y: Absorbed into the error term; estimates stay unbiased but standard errors rise (more noise).
Error in X: Causes Attenuation Bias (drags the estimate toward zero).
3. Simultaneous Causality (Reverse Causality)
Occurs if Y also affects X (e.g., price and quantity in a market).
4. Sample Selection Bias
Occurs if data is missing non-randomly (e.g., only surveyed people who survived a surgery).
9. Advanced Models
Binary Dependent Variables
When Y is 0 or 1 (e.g., yes/no, success/failure).
Linear Probability Model (LPM): OLS on a binary Y. beta_1 measures change in probability. Problem: Can predict probabilities < 0 or > 1.
Probit & Logit: Nonlinear models that keep predictions between 0 and 1 using the Cumulative Normal or Logistic curve. Interpret via Marginal Effects.
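A hedged statsmodels sketch comparing the LPM with a Logit fit and its average marginal effects; the data are simulated and the syntax assumes the statsmodels formula interface.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
df = pd.DataFrame({"x": rng.normal(size=2000)})
df["y"] = (1.0 * df["x"] + rng.logistic(size=2000) > 0).astype(int)   # binary outcome

lpm   = smf.ols("y ~ x", data=df).fit()         # Linear Probability Model
logit = smf.logit("y ~ x", data=df).fit()       # Logit keeps fitted probabilities between 0 and 1

print(lpm.params["x"])                  # change in probability per unit of x (can misbehave at extremes)
print(logit.get_margeff().summary())    # average marginal effect: the comparable quantity from Logit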
Instrumental Variables (IV)
Used to solve Endogeneity (when E[u|X] is not 0). Requires an Instrument (Z):
Relevance: Z must be correlated with X (Corr(Z, X) != 0).
Exogeneity: Z must affect Y only through its effect on X (Corr(Z, u) = 0).
beta_1_IV = Cov(Z, Y) / Cov(Z, X)
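A NumPy sketch of this formula on simulated data where X is endogenous: OLS is biased upward, while the IV ratio recovers the true slope of 2. The data-generating numbers are arbitrary.

import numpy as np

rng = np.random.default_rng(10)
n = 100_000
z = rng.normal(size=n)                          # instrument: relevant, unrelated to u
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)      # X is correlated with the error (endogenous)
y = 1.0 + 2.0 * x + u

cov = lambda a, b: np.mean(a * b) - a.mean() * b.mean()
beta_ols = cov(x, y) / cov(x, x)                # biased: picks up the X-u correlation
beta_iv  = cov(z, y) / cov(z, x)                # IV estimator

print(beta_ols, beta_iv)                        # OLS above 2, IV close to 2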
Panel Data (Fixed Effects)
Data tracking the same entities over multiple time periods.
Fixed Effects (FE): Controls for all unobserved factors that are constant for each entity (e.g., cultural traits of a country).
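A pandas sketch of the within (entity-demeaning) transformation behind fixed effects: subtracting each entity's own means removes anything constant within that entity. The entity effects and variable names are simulated.

import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
n_entities, n_periods = 200, 10
ids = np.repeat(np.arange(n_entities), n_periods)
alpha = np.repeat(rng.normal(scale=3.0, size=n_entities), n_periods)   # unobserved, time-constant

x = 0.5 * alpha + rng.normal(size=ids.size)     # x is correlated with the entity effect
y = 2.0 * x + alpha + rng.normal(size=ids.size)
df = pd.DataFrame({"id": ids, "x": x, "y": y})

# Within transformation: demean x and y by entity, then compute the OLS slope on the demeaned data
demeaned = df.groupby("id")[["x", "y"]].transform(lambda s: s - s.mean())
beta_fe = np.sum(demeaned["x"] * demeaned["y"]) / np.sum(demeaned["x"] ** 2)
print(beta_fe)   # close to the true 2; pooled OLS on the raw data would be biased upward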
10. Time Series Analysis
Autocorrelation
When a variable is correlated with its own past values, i.e., with its lags (Y subscript t minus 1).
Stationarity
A series is stationary if its mean and variance do not change over time. Non-stationary series (like stock prices) often have Unit Roots or Stochastic Trends.
AR(p) and ADL(p, q) Models
AR(1): Y_t = phi_0 + phi_1*Y_{t-1} + u_t.
ADL(1, 1): Autoregressive Distributed Lag. Uses lags of Y AND lags of another variable X.
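A pandas sketch of fitting an AR(1) by OLS: regress Y subscript t on its own first lag, built with shift(1). The simulated series has phi_1 = 0.7.

import numpy as np
import pandas as pd

rng = np.random.default_rng(12)
T = 2000
y = np.zeros(T)
for t in range(1, T):
    y[t] = 1.0 + 0.7 * y[t - 1] + rng.normal()   # AR(1) with phi_0 = 1, phi_1 = 0.7

df = pd.DataFrame({"y": y})
df["y_lag1"] = df["y"].shift(1)                  # Y subscript t minus 1
df = df.dropna()

x_dev = df["y_lag1"] - df["y_lag1"].mean()
phi1_hat = np.sum(x_dev * df["y"]) / np.sum(x_dev**2)   # OLS slope on the lag
print(phi1_hat)   # close to 0.7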
Cointegration
If two non-stationary series move together in the long run, so that some linear combination of them is stationary, they are "Cointegrated."