SCREEN READER ADVISORY ON CASE SENSITIVITY:
In this document, the case of letters is vital:
UPPER CASE (e.g., X, Y, Z): Random Variables (population).
lower case (e.g., x, y, z): Specific realizations/observations (sample).
Subscripts: Usually denoted as "subscript i" or "subscript t".
Hats (e.g., beta-hat): Estimated values from your data.
1. Mathematical & Statistical Foundations
Expected Value (Expectation)
The "average" value of a random variable in the population.
Discrete RV: E[X] = Sum of [ x * Probability(X=x) ]
Continuous RV: E[X] = Integral from minus infinity to plus infinity of [ x * f(x) ] dx, where f(x) is the probability density function (PDF).
Linearity: E[aX + bY] = a E[X] + b E[Y]
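A minimal NumPy sketch of the discrete expectation formula and its linearity property; the values and probabilities below are invented for illustration.

import numpy as np

# Discrete random variable: support points and their probabilities (illustrative numbers)
x_values = np.array([0.0, 1.0, 2.0])
probs    = np.array([0.2, 0.5, 0.3])          # must sum to 1

E_X = np.sum(x_values * probs)                # E[X] = Sum of x * P(X = x)
print("E[X] =", E_X)

# Linearity check: E[aX + b] = a*E[X] + b
a, b = 3.0, 1.0
E_aXb = np.sum((a * x_values + b) * probs)
print(E_aXb, "equals", a * E_X + b)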
Variance and Standard Deviation
Measures the spread or "risk" of a variable.
Var(X) = E[X^2] - (E[X])^2
Properties: Var(aX + b) = a^2 * Var(X).
Covariance and Correlation
Measures the linear relationship between two variables.
Cov(X, Y) = E[XY] - E[X]E[Y]
Corr(X, Y) = Cov(X, Y) / [ SD(X) * SD(Y) ]
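A short NumPy sketch of the sample analogues of these formulas; the data are simulated, and np.corrcoef is used only as a cross-check on the hand computation.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)                    # y is built to co-move with x

var_x  = np.mean(x**2) - np.mean(x)**2               # Var(X) = E[X^2] - (E[X])^2
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)    # Cov(X,Y) = E[XY] - E[X]E[Y]
corr   = cov_xy / (np.sqrt(var_x) * np.std(y))       # Corr = Cov / (SD(X) * SD(Y))

print(var_x, cov_xy, corr)
print(np.corrcoef(x, y)[0, 1])                       # built-in cross-check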
Conditional Expectation & LIE
Law of Iterated Expectations (LIE): E[Y] = E[ E[Y|X] ]
Independence: If X and Y are independent, then E[Y|X] = E[Y] and Cov(X, Y) = 0.
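A tiny simulation of the Law of Iterated Expectations, assuming a binary X: the overall mean of Y equals the group means of Y averaged with the probability of each group.

import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=100_000)             # binary X
y = 1.0 + 2.0 * x + rng.normal(size=100_000)     # Y depends on X plus noise

m0, m1 = y[x == 0].mean(), y[x == 1].mean()      # E[Y|X=0], E[Y|X=1]
p1 = x.mean()                                    # P(X = 1)

print(y.mean())                                  # direct E[Y]
print((1 - p1) * m0 + p1 * m1)                   # E[ E[Y|X] ], matches up to sampling noise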
Indicator Function (The Math behind Dummies)
An indicator function, often written as 1(.) or I(.), translates a logical condition into a number.
1(A) = { 1 if A occurs; 0 if A does not occur }
In Econometrics: This is used to create dummy variables. For example, if we define D = 1(Female), then D=1 for female observations and D=0 for male observations. The expectation of an indicator function is the probability of the event: E[1(A)] = P(A).
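A pandas sketch of turning a condition into a dummy and of the fact that E[1(A)] = P(A); the tiny data frame below is invented for illustration.

import pandas as pd

df = pd.DataFrame({"gender": ["Female", "Male", "Female", "Female", "Male"]})

# D = 1(Female): the indicator becomes a 0/1 dummy column
df["female"] = (df["gender"] == "Female").astype(int)

# E[1(A)] = P(A): the sample mean of the dummy is the sample share of females
print(df["female"].mean())   # 0.6 in this made-up sample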
2. The Linear Regression Model
The Population Model
Y_i = beta_0 + beta_1 * X_1i + ... + u_i
beta_1: The causal effect of X_1 on Y, holding all other factors constant (ceteris paribus).
u_i (Error Term): Captures all factors other than the included X's that influence Y.
The Sample Model (Predictions)
Y_i_hat = beta_0_hat + beta_1_hat * X_1i + ...
e_i (Residual): The difference between observed and predicted value: e_i = Y_i - Y_i_hat.
Dummy Variables
Binary variables taking values 0 or 1. Used for categories (e.g., gender, region).
The Dummy Variable Trap: If you have G categories, you must only include G-1 dummy variables if you have an intercept (constant) in the model. Otherwise, you face perfect multicollinearity.
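A pandas sketch of avoiding the trap: pd.get_dummies with drop_first=True keeps G-1 columns, so the dummies are not perfectly collinear with the intercept. The category names are illustrative.

import pandas as pd

df = pd.DataFrame({"region": ["North", "South", "West", "South", "North"]})

# All G dummy columns together would be collinear with a constant; drop one as the base category
dummies = pd.get_dummies(df["region"], drop_first=True, dtype=int)
print(dummies)   # columns for South and West only; North is the omitted base group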
3. OLS Estimation & Derivation
The OLS Criterion
We choose the beta-hats to minimize the Sum of Squared Residuals (SSR).
Minimize: Sum [ (Y_i - beta_0 - beta_1*X_i)^2 ]
Simple OLS Estimators
beta_1_hat = Sample_Cov(X, Y) / Sample_Var(X)
beta_0_hat = Y_bar - (beta_1_hat * X_bar)
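A NumPy sketch of these two formulas computed by hand from the sample covariance and variance, on simulated data with a true intercept of 1 and a true slope of 2.

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)                      # true intercept 1, true slope 2

beta1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)    # Sample_Cov(X,Y) / Sample_Var(X)
beta0_hat = y.mean() - beta1_hat * x.mean()                   # Y_bar - beta_1_hat * X_bar

print(beta0_hat, beta1_hat)   # close to 1 and 2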
Goodness of Fit
R-squared (R^2): Percentage of variance in Y explained by the model.
Adjusted R-squared: Penalizes the addition of unnecessary regressors, so it is never higher than R-squared.
4. The OLS Assumptions (Gauss-Markov)
To ensure OLS is the Best Linear Unbiased Estimator (BLUE):
Linearity in Parameters: The model is a linear combination of betas.
Random Sampling: (X_i, Y_i) are i.i.d.
No Perfect Multicollinearity: Regressors aren't perfectly redundant.
Zero Conditional Mean (Orthogonality): E[u|X] = 0. This is the most critical assumption!
Homoskedasticity: Var(u|X) = sigma^2 (constant). If violated, we have Heteroskedasticity and need "Robust Standard Errors."
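A hedged statsmodels sketch: fit by OLS and request heteroskedasticity-robust (HC1) standard errors. The data are simulated so that the error variance depends on X.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=400)
u = rng.normal(size=400) * (1 + np.abs(x))     # error variance grows with |x|: heteroskedastic
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)                         # adds the intercept column
ols_usual  = sm.OLS(y, X).fit()                # classical (homoskedastic) standard errors
ols_robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors

print(ols_usual.bse)    # usual standard errors
print(ols_robust.bse)   # robust standard errors (typically larger here)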
5. Statistical Inference
Hypothesis Testing Recipe
Null Hypothesis (H0): Usually beta_1 = 0 (no effect).
Alternative (H1): beta_1 is not 0.
Calculate t-statistic:
t = (beta_hat - beta_null) / SE(beta_hat), where beta_null is the hypothesized value under H0 (usually 0).
Decision Rule: If |t| > 1.96, reject H0 at the 5% significance level (two-sided test).
p-values and Confidence Intervals
p-value: The probability of seeing a result as extreme as ours if H0 is true. Small p-value (< 0.05) = Reject H0.
95% Confidence Interval: [ beta_hat minus 1.96*SE, beta_hat plus 1.96*SE ].
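A small SciPy sketch of the recipe: build the t-statistic, the two-sided large-sample p-value, and the 95% confidence interval from an estimate and its standard error. The numbers are placeholders.

from scipy import stats

beta_hat  = 0.52      # illustrative estimate
se        = 0.20      # illustrative standard error
beta_null = 0.0       # value under H0

t = (beta_hat - beta_null) / se
p_value = 2 * (1 - stats.norm.cdf(abs(t)))            # two-sided p-value, normal approximation
ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)     # 95% confidence interval

print(t, p_value, ci)   # |t| = 2.6 > 1.96, so reject H0 at the 5% level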
The F-Statistic
Used to test multiple restrictions simultaneously (e.g., test if beta_1 = 0 AND beta_2 = 0).
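A hedged statsmodels sketch of a joint test; the regressor names x1 and x2 are made up, and f_test is given the restriction that both slopes are zero.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({"x1": rng.normal(size=300), "x2": rng.normal(size=300)})
df["y"] = 1 + 0.5 * df["x1"] + 0.0 * df["x2"] + rng.normal(size=300)

res = smf.ols("y ~ x1 + x2", data=df).fit()
print(res.f_test("x1 = 0, x2 = 0"))   # joint test of beta_1 = 0 AND beta_2 = 0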
6. Large Sample Theory (Asymptotics)
Law of Large Numbers (LLN)
Sample averages converge to population averages as n becomes large.
Y_bar ->p mu_Y
Assures that OLS is Consistent (beta_hat ->p beta).
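A one-look simulation of the LLN, assuming draws from a skewed distribution with population mean 5: the sample average settles near 5 as n grows.

import numpy as np

rng = np.random.default_rng(5)
for n in [10, 1_000, 100_000]:
    draws = rng.exponential(scale=5.0, size=n)   # population mean is 5
    print(n, draws.mean())                       # sample mean approaches 5 as n grows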
Central Limit Theorem (CLT)
The CLT is the reason we can perform hypothesis tests (like t-tests) without knowing the underlying distribution of the errors, provided the sample size n is large.
1. The Formal Definition
As the sample size n increases, the distribution of the standardized sample mean converges to a Standard Normal distribution N(0, 1).
sqrt(n) * ( (Y_bar - mu) / sigma ) ->d N(0, 1)
2. The "Root-n" Logic (√n)
Why do we multiply by √n? As n gets larger:
The variance of the sample mean, Var(Y_bar) = sigma^2 / n, shrinks toward zero.
By multiplying by √n, we "inflate" the shrinking variance back to a constant level (sigma^2), preventing the distribution from collapsing into a single point.
3. Application to OLS (The Derivation)
To see why OLS estimators are Asymptotically Normal, we decompose the slope estimator as beta_1_hat = beta_1 + [ Sum( (X_i - X_bar) * u_i ) ] / [ Sum( (X_i - X_bar)^2 ) ]:
Numerator: The term Sum( (X_i - X_bar) * u_i ) is a sum of independent random variables. By the CLT, this sum (when scaled by 1/√n) becomes Normally distributed.
Denominator: By the Law of Large Numbers (LLN), the average (1/n) * Sum( (X_i - X_bar)^2 ) converges to the population variance Var(X).
Conclusion: Since the numerator is Normal and the denominator is a constant (in large samples), their ratio is also Normal.
Bottom Line: In small samples, beta_hat is only normal if the errors u are normal. In large samples, beta_hat is normal even if the errors are not normal. This allows us to use the 1.96 critical value for almost any large-sample regression.
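A Monte Carlo sketch of this bottom line: even with very non-normal (exponential) errors, the distribution of the hand-computed OLS slope across repeated samples is centered at the true value and roughly bell-shaped once n is large. The sample size and replication count are arbitrary choices.

import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 2000
slopes = np.empty(reps)

for r in range(reps):
    x = rng.normal(size=n)
    u = rng.exponential(scale=1.0, size=n) - 1.0       # skewed, mean-zero errors
    y = 1.0 + 2.0 * x + u
    x_dev = x - x.mean()
    slopes[r] = np.sum(x_dev * y) / np.sum(x_dev**2)   # OLS slope by hand

print(slopes.mean(), slopes.std())            # centered near the true slope of 2
print(np.percentile(slopes, [2.5, 97.5]))     # close to mean plus or minus 1.96 * std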
7. Nonlinear Specifications
Logarithms (Percentages)
Log-Log Model: log(Y) = beta_0 + beta_1*log(X) + u. beta_1 is the Elasticity (1% change in X -> beta_1% change in Y).
Log-Level Model: log(Y) = beta_0 + beta_1*X + u. (1 unit change in X -> 100*beta_1% change in Y).
Level-Log Model: Y = beta_0 + beta_1*log(X) + u. (1% change in X -> 0.01*beta_1 unit change in Y).
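A small sketch of the log-log interpretation on data simulated so that the true elasticity is 0.8; the variable names and numbers are illustrative.

import numpy as np

rng = np.random.default_rng(7)
log_x = rng.normal(size=1000)
log_y = 3.0 + 0.8 * log_x + rng.normal(scale=0.1, size=1000)   # true elasticity 0.8

x_dev = log_x - log_x.mean()
elasticity_hat = np.sum(x_dev * log_y) / np.sum(x_dev**2)      # OLS slope of log(Y) on log(X)
print(elasticity_hat)   # about 0.8: a 1% rise in X raises Y by about 0.8%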
Polynomials (Curvatures)
Model: Y = beta_0 + beta_1*X + beta_2*X^2 + u.
To find the effect of X on Y, you must take the derivative: dY/dX = beta_1 + 2*beta_2*X.
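A tiny worked computation of the quadratic marginal effect with illustrative coefficient values, showing that the effect of X changes with the level of X.

# Illustrative estimated coefficients from Y = b0 + b1*X + b2*X^2 + u
beta1_hat, beta2_hat = 4.0, -0.5

for x in [1.0, 2.0, 4.0]:
    marginal_effect = beta1_hat + 2 * beta2_hat * x   # dY/dX = beta_1 + 2*beta_2*X
    print(x, marginal_effect)                         # 3.0, 2.0, 0.0: the effect shrinks as X rises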
Interactions
Model: Y = beta_0 + beta_1*X + beta_2*D + beta_3*(X*D) + u.
The effect of X on Y now depends on whether D is 0 or 1.
8. Internal Validity & Sources of Bias
Internal validity asks: "Is our estimate of the causal effect correct for our sample?"
1. Omitted Variable Bias (OVB)
Occurs if a factor (Z) that influences Y is correlated with X and left out of the regression.
Direction of Bias: the sign of the bias is the sign of beta_Z times the sign of Corr(X, Z).
Positive Bias means OLS overestimates; Negative Bias means it underestimates.
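A simulation sketch of OVB: here Z raises Y (beta_Z > 0) and is positively correlated with X, so omitting Z biases the short-regression slope upward. All numbers are made up to make the sign easy to see.

import numpy as np

rng = np.random.default_rng(8)
n = 100_000
z = rng.normal(size=n)
x = 0.7 * z + rng.normal(size=n)                    # X and Z are positively correlated
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)    # true effect of X is 2; beta_Z = 3

def slope(a, b):
    a_dev = a - a.mean()
    return np.sum(a_dev * b) / np.sum(a_dev**2)

print(slope(x, y))   # regression of Y on X alone: noticeably above 2 (positive bias)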
2. Measurement Error
Error in Y: Absorbed into the error term; estimates stay unbiased but standard errors rise (more noise).
Error in X: Causes Attenuation Bias (drags the estimate toward zero).
3. Simultaneous Causality (Reverse Causality)
Occurs if Y also affects X (e.g., price and quantity in a market).
4. Sample Selection Bias
Occurs if data is missing non-randomly (e.g., only surveyed people who survived a surgery).
9. Advanced Models
Binary Dependent Variables
When Y is 0 or 1 (e.g., yes/no, success/failure).
Linear Probability Model (LPM): OLS on a binary Y. beta_1 measures change in probability. Problem: Can predict probabilities < 0 or > 1.
Probit & Logit: Nonlinear models that keep predictions between 0 and 1 using the Cumulative Normal or Logistic curve. Interpret via Marginal Effects.
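A hedged statsmodels sketch comparing the LPM with a Logit fit and its average marginal effects; the data are simulated and the syntax assumes the statsmodels formula interface.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
df = pd.DataFrame({"x": rng.normal(size=2000)})
df["y"] = (1.0 * df["x"] + rng.logistic(size=2000) > 0).astype(int)   # binary outcome

lpm   = smf.ols("y ~ x", data=df).fit()         # Linear Probability Model
logit = smf.logit("y ~ x", data=df).fit()       # Logit keeps fitted probabilities between 0 and 1

print(lpm.params["x"])                  # change in probability per unit of x (can misbehave at extremes)
print(logit.get_margeff().summary())    # average marginal effect: the comparable quantity from Logit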
Instrumental Variables (IV)
Used to solve Endogeneity (when E[u|X] is not 0). Requires an Instrument (Z):
Relevance: Z must be correlated with X (Corr(Z, X) != 0).
Exogeneity: Z must affect Y only through its effect on X (Corr(Z, u) = 0).
beta_1_IV = Cov(Z, Y) / Cov(Z, X)
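A NumPy sketch of this formula on simulated data where X is endogenous: OLS is biased upward, while the IV ratio recovers the true slope of 2. The data-generating numbers are arbitrary.

import numpy as np

rng = np.random.default_rng(10)
n = 100_000
z = rng.normal(size=n)                          # instrument: relevant, unrelated to u
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)      # X is correlated with the error (endogenous)
y = 1.0 + 2.0 * x + u

cov = lambda a, b: np.mean(a * b) - a.mean() * b.mean()
beta_ols = cov(x, y) / cov(x, x)                # biased: picks up the X-u correlation
beta_iv  = cov(z, y) / cov(z, x)                # IV estimator

print(beta_ols, beta_iv)                        # OLS above 2, IV close to 2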
Panel Data (Fixed Effects)
Data tracking the same entities over multiple time periods.
Fixed Effects (FE): Controls for all unobserved factors that are constant for each entity (e.g., cultural traits of a country).
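A pandas sketch of the within (entity-demeaning) transformation behind fixed effects: subtracting each entity's own means removes anything constant within that entity. The entity effects and variable names are simulated.

import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
n_entities, n_periods = 200, 10
ids = np.repeat(np.arange(n_entities), n_periods)
alpha = np.repeat(rng.normal(scale=3.0, size=n_entities), n_periods)   # unobserved, time-constant

x = 0.5 * alpha + rng.normal(size=ids.size)     # x is correlated with the entity effect
y = 2.0 * x + alpha + rng.normal(size=ids.size)
df = pd.DataFrame({"id": ids, "x": x, "y": y})

# Within transformation: demean x and y by entity, then compute the OLS slope on the demeaned data
demeaned = df.groupby("id")[["x", "y"]].transform(lambda s: s - s.mean())
beta_fe = np.sum(demeaned["x"] * demeaned["y"]) / np.sum(demeaned["x"] ** 2)
print(beta_fe)   # close to the true 2; pooled OLS on the raw data would be biased upward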
10. Time Series Analysis
Autocorrelation
When a variable is correlated with its own past values, i.e., with its lags (Y subscript t minus 1).
Stationarity
A series is stationary if its mean and variance do not change over time. Non-stationary series (like stock prices) often have Unit Roots or Stochastic Trends.
AR(p) and ADL(p, q) Models
AR(1): Y_t = phi_0 + phi_1*Y_{t-1} + u_t.
ADL(1, 1): Autoregressive Distributed Lag. Uses lags of Y AND lags of another variable X.
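A pandas sketch of fitting an AR(1) by OLS: regress Y subscript t on its own first lag, built with shift(1). The simulated series has phi_1 = 0.7.

import numpy as np
import pandas as pd

rng = np.random.default_rng(12)
T = 2000
y = np.zeros(T)
for t in range(1, T):
    y[t] = 1.0 + 0.7 * y[t - 1] + rng.normal()   # AR(1) with phi_0 = 1, phi_1 = 0.7

df = pd.DataFrame({"y": y})
df["y_lag1"] = df["y"].shift(1)                  # Y subscript t minus 1
df = df.dropna()

x_dev = df["y_lag1"] - df["y_lag1"].mean()
phi1_hat = np.sum(x_dev * df["y"]) / np.sum(x_dev**2)   # OLS slope on the lag
print(phi1_hat)   # close to 0.7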
Cointegration
If two non-stationary series move together in the long run, so that some linear combination of them is stationary, they are "Cointegrated."