Introducing linear algebra to regression analysis

This post explains how matrix algebra can greatly simplify the formula for the betas (the intercept and slope coefficients) in linear regression. It is aimed at anyone studying basic econometrics, with or without matrix algebra, who would like to see how matrices simplify the calculations in this discipline, or who simply needs some clarification on this topic.

I assume that the reader is familiar with basic linear regression and, most importantly, knows what the normal equations are, since they are the starting point of this post. The reader should also know some very basic matrix algebra: what a matrix and a vector are, roughly how matrix multiplication works, and what terms such as inverse and transpose mean. You don’t need to know how to carry out these operations, only what they are.

By the way, I think this post is a shining example of why linear algebra is awesome and how much it can simplify otherwise tedious calculations. So if you have doubts about the usefulness of linear algebra, please do read on. Without further ado, let’s begin.

Recall that ordinary least squares (OLS) regression chooses the coefficients that minimize the sum of squared residuals. Carrying out this minimization yields what are referred to as the normal equations: k equations, where k is the number of coefficients in the model. For this example I will set k = 3 (an intercept and two slopes), but the same method applies to any k. The normal equations in this case are:

$\sum{Y}_i = n \hat{\beta}_1 + \hat{\beta}_2 \sum{X}_{2i} + \hat{\beta}_3 \sum X_{3i},$

$\sum Y_i X_{2i} = \hat{\beta}_1 \sum X_{2i} + \hat{\beta}_2 \sum X^2_{2i} + \hat{\beta}_3 \sum X_{2i} X_{3i},$

$\sum Y_i X_{3i} = \hat{\beta}_1 \sum X_{3i} + \hat{\beta}_2 \sum X_{2i} X_{3i} + \hat{\beta}_3 \sum X^2_{3i}.$
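To make the normal equations concrete, here is a minimal Python sketch (standard library only) that computes every sum appearing in them. The data set and the helper name `normal_equation_sums` are made up for illustration; the Y values were constructed so that Y = 1 + 2X_2 + 3X_3 holds exactly, so the coefficients 1, 2 and 3 should satisfy all three equations.

```python
# Hypothetical data (not from the post): Y = 1 + 2*X2 + 3*X3 holds exactly,
# so the true coefficients are known to be 1, 2 and 3.
X2 = [1, 2, 3, 4, 5]
X3 = [2, 1, 4, 3, 6]
Y = [1 + 2 * a + 3 * b for a, b in zip(X2, X3)]


def normal_equation_sums(x2, x3, y):
    """Return the coefficient rows and right-hand sides of the three
    normal equations, one row/entry per equation."""
    n = len(y)
    s2, s3 = sum(x2), sum(x3)
    s22 = sum(a * a for a in x2)
    s33 = sum(b * b for b in x3)
    s23 = sum(a * b for a, b in zip(x2, x3))
    rows = [
        [n, s2, s3],     # coefficients of beta1, beta2, beta3 in equation 1
        [s2, s22, s23],  # ... in equation 2
        [s3, s23, s33],  # ... in equation 3
    ]
    rhs = [
        sum(y),
        sum(c * a for c, a in zip(y, x2)),
        sum(c * b for c, b in zip(y, x3)),
    ]
    return rows, rhs


rows, rhs = normal_equation_sums(X2, X3, Y)
print(rows)  # [[5, 15, 16], [15, 55, 58], [16, 58, 66]]
print(rhs)   # [83, 299, 330]

# The known coefficients satisfy all three equations.
beta = [1, 2, 3]
for row, r in zip(rows, rhs):
    assert sum(c * b for c, b in zip(row, beta)) == r
```

All arithmetic here is on integers, so the check is exact.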

The goal, of course, is to solve these equations for the betas. One could solve the first equation for beta 1, substitute the result into the second equation, solve that for beta 2, substitute both into the third equation, solve that for beta 3, and so on. However, the way it’s usually done without matrix algebra starts by dividing both sides of the first normal equation by n, which yields:

$\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{X}_2 + \hat{\beta}_3 \bar{X}_3 \implies \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3,$

where a bar above a variable denotes its sample mean. Next, recall that a variable in deviation form is obtained by subtracting the variable’s mean from each of its observations:

$x_i = X_i - \bar{X},$

where the lower-case x is the deviation and the upper-case X is the original observation. Rearranging this equation gives

$X_i = x_i + \bar{X}.$
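A quick numerical check of the deviation form, using a hypothetical column of observations and an illustrative helper named `deviations`. It also shows the property that makes the substitution manageable: deviations from the mean always sum to zero (exactly in this example, and up to floating-point rounding in general).

```python
# Hypothetical observations of one independent variable.
X2 = [1, 2, 3, 4, 5]


def deviations(values):
    """Deviation form: subtract the sample mean from every observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values], mean


x2, x2_bar = deviations(X2)
print(x2)       # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum(x2))  # 0.0, deviations from the mean sum to zero

# Going back to the observations: X_i = x_i + mean.
assert [d + x2_bar for d in x2] == X2
```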

Using $X_i = x_i + \bar{X}$ for each X variable (i.e. for X_2, X_3, etc.) and the analogous expression for Y, one can substitute for all the X’s and Y’s in the remaining two normal equations. Substituting the expression for beta 1 derived above as well, you get two equations of the following form:

$\sum (y_i + \bar{Y})(x_{2i} + \bar{X}_2) = (\bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3) \sum (x_{2i} + \bar{X}_2) + \hat{\beta}_2 \sum (x_{2i} + \bar{X}_2)^2 + \hat{\beta}_3 \sum (x_{2i} + \bar{X}_2)(x_{3i} + \bar{X}_3),$

$\sum (y_i + \bar{Y})(x_{3i} + \bar{X}_3) = (\bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3) \sum (x_{3i} + \bar{X}_3) + \hat{\beta}_2 \sum (x_{2i} + \bar{X}_2)(x_{3i} + \bar{X}_3) + \hat{\beta}_3 \sum (x_{3i} + \bar{X}_3)^2.$
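Expanding these products is less painful than it looks, because deviations from the mean sum to zero ($\sum x_{2i} = \sum x_{3i} = \sum y_i = 0$), so every cross term containing a single deviation drops out. For the equation coming from $\sum Y_i X_{2i}$, for example, the individual pieces become

$\sum (y_i + \bar{Y})(x_{2i} + \bar{X}_2) = \sum y_i x_{2i} + n \bar{Y} \bar{X}_2, \quad \sum (x_{2i} + \bar{X}_2) = n \bar{X}_2,$

$\sum (x_{2i} + \bar{X}_2)^2 = \sum x^2_{2i} + n \bar{X}^2_2, \quad \sum (x_{2i} + \bar{X}_2)(x_{3i} + \bar{X}_3) = \sum x_{2i} x_{3i} + n \bar{X}_2 \bar{X}_3.$

After substituting these, the terms $n \bar{Y} \bar{X}_2$, $n \hat{\beta}_2 \bar{X}^2_2$ and $n \hat{\beta}_3 \bar{X}_2 \bar{X}_3$ cancel, and the same happens in the equation coming from $\sum Y_i X_{3i}$. What remains is a much smaller system written entirely in deviation form:

$\sum y_i x_{2i} = \hat{\beta}_2 \sum x^2_{2i} + \hat{\beta}_3 \sum x_{2i} x_{3i},$

$\sum y_i x_{3i} = \hat{\beta}_2 \sum x_{2i} x_{3i} + \hat{\beta}_3 \sum x^2_{3i}.$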

These are now two equations in two unknowns (or k-1 equations in k-1 unknowns in general), which are, technically speaking, not hard to solve. You can work through this algebraic mess: solve one of the equations for beta 2, plug the result into the other one and solve for beta 3, then use that solution for beta 3 to get beta 2. Beta 1 was already expressed above in terms of beta 2, beta 3 and the sample means, none of which are unknown anymore, so this completes the calculation of the betas. With this method, beta 2, for instance, takes the form:

$\hat{\beta}_2 = \frac{(\sum y_i x_{2i})(\sum x^2_{3i}) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{(\sum x^2_{2i})(\sum x^2_{3i}) - (\sum x_{2i} x_{3i})^2}.$
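The deviation-form formula can be checked on a small hypothetical data set. In this sketch, `ols_deviation_form` and `mean` are illustrative names, beta 3 uses the analogous expression obtained by solving the same two-equation system for beta 3, and beta 1 comes from the sample means as shown earlier. The data were constructed so that Y = 1 + 2X_2 + 3X_3 exactly, so the result should be (up to rounding) 1, 2 and 3.

```python
# Hypothetical data, constructed so that Y = 1 + 2*X2 + 3*X3 exactly.
X2 = [1, 2, 3, 4, 5]
X3 = [2, 1, 4, 3, 6]
Y = [9, 8, 19, 18, 29]


def mean(values):
    return sum(values) / len(values)


def ols_deviation_form(x2_raw, x3_raw, y_raw):
    """Two-regressor OLS using the deviation-form formulas."""
    m2, m3, my = mean(x2_raw), mean(x3_raw), mean(y_raw)
    # Deviation form: lower-case variables have their sample mean removed.
    x2 = [v - m2 for v in x2_raw]
    x3 = [v - m3 for v in x3_raw]
    y = [v - my for v in y_raw]

    s22 = sum(a * a for a in x2)
    s33 = sum(b * b for b in x3)
    s23 = sum(a * b for a, b in zip(x2, x3))
    sy2 = sum(c * a for c, a in zip(y, x2))
    sy3 = sum(c * b for c, b in zip(y, x3))

    denom = s22 * s33 - s23 ** 2  # zero under perfect collinearity
    b2 = (sy2 * s33 - sy3 * s23) / denom
    b3 = (sy3 * s22 - sy2 * s23) / denom
    b1 = my - b2 * m2 - b3 * m3  # from the first normal equation
    return b1, b2, b3


print(ols_deviation_form(X2, X3, Y))  # approximately (1.0, 2.0, 3.0)
```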

Similar expressions exist for beta 3 and for any other betas in your model. You would not be wrong to think that these are quite ugly. So let me show you how matrix algebra condenses all the mess we’ve been through into one simple equation.

First, let us write the normal equations in matrix notation:

$\left(\begin{array}{ccc}n&\sum X_{2i}&\sum X_{3i} \\ \sum X_{2i}&\sum X^2_{2i}&\sum X_{2i} X_{3i} \\ \sum X_{3i}&\sum X_{2i} X_{3i}&\sum X^2_{3i} \end{array}\right) \left(\begin{array}{c}\hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{array}\right) = \left(\begin{array}{c}\sum Y_i \\ \sum Y_i X_{2i} \\ \sum Y_i X_{3i} \end{array}\right).$

You can already see that, using Cramer’s rule for instance, you could solve for any of the betas. But it is still somewhat messy, as you need to construct matrices full of complicated sums. Luckily for us, these two matrices arise naturally in regression analysis: they can be expressed in terms of two other matrices, one containing all the X values and one containing all the Y values. To see this, let us construct two very simple and intuitive matrices:

$X = \left(\begin{array}{ccc}1&X_{2,1}&X_{3,1} \\ 1&X_{2,2}&X_{3,2} \\ \vdots&\vdots&\vdots \\ 1&X_{2,n}&X_{3,n} \end{array}\right), \quad y = \left(\begin{array}{c}Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{array}\right).$

The X matrix simply collects all the X observations. Its second column contains all the observations of the first independent variable, and its third column all the observations of the second independent variable. The first column is full of 1’s because it represents the intercept term: you can think of the intercept as another independent variable (actually a constant) whose observations are all 1. Similarly, the y vector collects all the observations of the dependent variable. Note that y here holds the raw observations $Y_i$, not the deviations $y_i$ used earlier.

Using these definitions, and writing X’ for the transpose of X, you can verify with software such as R or Excel, or simply by hand, that

$X'X = \left(\begin{array}{ccc}n&\sum X_{2i}&\sum X_{3i} \\ \sum X_{2i}&\sum X^2_{2i}&\sum X_{2i} X_{3i} \\ \sum X_{3i}&\sum X_{2i} X_{3i}&\sum X^2_{3i} \end{array}\right).$

Similarly,

$X'y = \left(\begin{array}{c}\sum Y_i \\ \sum Y_i X_{2i} \\ \sum Y_i X_{3i} \end{array}\right).$
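The same verification can be sketched in plain Python instead of R or Excel. The helpers `transpose` and `matmul` are written out by hand here (an assumption made to keep the example dependency-free), and the data are hypothetical. The entries of X’X and X’y come out equal to the sums in the normal equations.

```python
# Hypothetical data, constructed so that Y = 1 + 2*X2 + 3*X3 exactly.
X2 = [1, 2, 3, 4, 5]
X3 = [2, 1, 4, 3, 6]
Y = [9, 8, 19, 18, 29]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Product of an (r x k) and a (k x c) matrix, both as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


# Design matrix: a column of ones for the intercept, then X2 and X3.
X = [[1, u, v] for u, v in zip(X2, X3)]
y = [[value] for value in Y]  # n x 1 column vector

XtX = matmul(transpose(X), X)
Xty = matmul(transpose(X), y)
print(XtX)  # [[5, 15, 16], [15, 55, 58], [16, 58, 66]]
print(Xty)  # [[83], [299], [330]]
```

For instance, the top-left entry of X’X is the dot product of the column of ones with itself, which is just n = 5.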

This is remarkable because these products are exactly the matrices that appear in the normal equations. Therefore, letting the vector b denote the vector of betas, we can rewrite the normal equations in matrix notation as

$(X'X)b = X'y.$

Recall now that multiplying a matrix by its inverse yields the identity matrix (denoted by I), which plays the role in matrix algebra that 1 plays in ordinary multiplication: just as multiplying a number p by 1 gives p, multiplying a matrix q by the identity matrix gives q. Provided that X’X is invertible, which holds when the columns of X are linearly independent (no perfect multicollinearity), this fact can be used to solve the above equation for b as follows:

$(X'X)^{-1}(X'X)b = (X'X)^{-1}X'y$ (multiply both sides on the left by the inverse of X’X; the order matters, since matrix multiplication is not commutative),

$Ib = (X'X)^{-1}X'y$ (X’X times its inverse will just equal the identity matrix I),

$b = (X'X)^{-1}X'y$ (Ib is just b as explained above).
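Finally, a sketch of the full formula $b = (X'X)^{-1}X'y$ on the same kind of hypothetical data. The `inverse` helper is an illustrative Gauss-Jordan elimination written for this example, not a library routine; it raises an error when X’X is singular, which is what happens under perfect multicollinearity.

```python
# Hypothetical data, constructed so that Y = 1 + 2*X2 + 3*X3 exactly.
X2 = [1, 2, 3, 4, 5]
X3 = [2, 1, 4, 3, 6]
Y = [9, 8, 19, 18, 29]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination on [m | I]."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        # Partial pivoting: use the row with the largest entry in column c.
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity)")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        # Eliminate column c from every other row.
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


X = [[1, u, v] for u, v in zip(X2, X3)]
y = [[value] for value in Y]
Xt = transpose(X)
b = matmul(inverse(matmul(Xt, X)), matmul(Xt, y))  # b = (X'X)^{-1} X'y
print([round(v[0], 10) for v in b])  # [1.0, 2.0, 3.0]
```

The explicit inverse mirrors the formula, but in practice numerical libraries usually solve the system $(X'X)b = X'y$ directly or work with X itself rather than forming the inverse, for example NumPy’s `numpy.linalg.solve` or `numpy.linalg.lstsq`.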

And that is it. Instead of the long formula full of deviation terms, you have one short formula containing only well-known, well-defined elements: the X matrix and the y vector. Nothing else enters this formula. I think this is a great example of how linear algebra can greatly simplify things in mathematics, and I hope you now agree.
