\documentclass[10pt]{article}
\usepackage{amsmath,graphicx,amssymb}
\usepackage{epstopdf}
\input calcmacs.tex
%\parindent=0pt
\def\disp#1{
\begin{center}
{\large\bf #1}\\[1ex]
\end{center}}
\def\prob#1.{\hspace*{\fill} \\[1ex]{\bf Problem #1. }}
\def\la{\langle}
\def\ra{\rangle}
\def\dydx{\frac{dy}{dx}}
\def\aa{\mathbf{a}}
\def\bb{\mathbf{b}}
\def\cc{\mathbf{c}}
\def\ii{\mathbf{i}}
\def\jj{\mathbf{j}}
\def\kk{\mathbf{k}}
\def\ll{\mathbf{l}}
\def\rr{\mathbf{r}}
\def\uu{\mathbf{u}}
\def\vv{\mathbf{v}}
\def\ww{\mathbf{w}}
\def\TT{\mathbf{T}}
\def\comp{\mathrm{comp}}
\def\proj{\mathrm{proj}}
\def\del{\partial}
\def\delx{\frac{\partial }{\partial x}}
\def\dely{\frac{\partial }{\partial y}}
\def\delzx{\frac{\partial z}{\partial x}}
\def\delzy{\frac{\partial z}{\partial y}}
\def\delyx{\frac{\partial y}{\partial x}}
\def\delux{\frac{\partial u}{\partial x}}
\def\deluy{\frac{\partial u}{\partial y}}
\def\delut{\frac{\partial u}{\partial t}}
\def\delvx{\frac{\partial v}{\partial x}}
\def\delvy{\frac{\partial v}{\partial y}}
\def\delvt{\frac{\partial v}{\partial t}}
\def\<{\langle}
\def\>{\rangle}
\def\colvec#1{\left[\begin{array}{c} #1 \end{array}\right]}
\textwidth 6.5 in \hoffset -.75in
\addtolength{\textheight}{.5in}
% template for putting graphics next to text
%\prob12. Beginning text \\[-1ex]
%\begin{tabular}{p{4in}@{\hspace{.4in}}c}
%\vspace*{-1.4in}
%some more text in a side paragraph&
%\includegraphics[scale=.5]{figure.eps}\\
%\end{tabular}
%
\begin{document}
\begin{center}
{\large\bf Least Squares and Curve Fitting}
\end{center}
These notes cover a portion of the material in Section 5.4 of our text; my hope is that a somewhat less thorough treatment may be easier to grasp. You may wish to fill in some of the details by reading that section.
The problem we will use for motivation is to find the straight line that lies closest to a set of data points in the plane. We will see that this leads directly to a question about solving linear equations. To set this up, consider a set of points $(a_1,b_1), \ldots, (a_n,b_n)$ in the plane. Maybe these are measurements from an experiment; for instance, we might be measuring the movement of some particle, where the first coordinate is time and the second coordinate is position. We observe in the figures below that the data seem to lie near a line, but not all on a line. We'd like to find an equation of a line that approximates the data set, as drawn on the left. To keep calculations to a minimum, we'll work with the smaller data set on the right, consisting of the points $(0,1),(2,4),(4,5),(6,7)$.
\begin{center}
\includegraphics[scale=.8]{ls1}
\end{center}
First, let's observe why finding the straight line has anything to do with solving linear equations. We are looking for a line of the form $y = c_0 + c_1 t$ that goes through all of the points. For each point $(a_i,b_i)$, we must have $b_i=c_0 + c_1a_i$, which is a linear equation for the {\sl unknowns} $c_0,c_1$. So if there were a line going through all four points on the right-hand graph, the coefficients $c_0,c_1$ would satisfy the system
$$
\begin{array}{r@{}l}
c_0 + 0 c_1 &{}= 1\\
c_0 + 2 c_1 &{}= 4\\
c_0 + 4 c_1 &{}= 5\\
c_0 + 6 c_1 &{}= 7
\end{array}
\quad \text{or, in matrix form,}\quad A\colvec{c_0\\c_1} = \colvec{1\\4\\5\\7}
\quad \text{where} \quad A = \begin{bmatrix}1&0\\1&2\\1&4\\1&6\end{bmatrix}.
$$
A quick reduction of the augmented matrix $[A \mid \bb]$ to rref shows that there is no solution to this system, which is pretty clear if you squint at the picture: the points just don't lie on a line. In other terms, the issue is that the vector $\bb =\colvec{1\\4\\5\\7}$ is not in the image of $A$.
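To make the reduction explicit, here is one way the row operations might go on the augmented matrix $[A \mid \bb]$ (a sketch of the arithmetic; the particular order of row operations is just one choice). Subtracting the first row from each of the others, and then subtracting $2$ and $3$ times the new second row from the third and fourth rows, gives
$$
\left[\begin{array}{cc|c} 1&0&1\\1&2&4\\1&4&5\\1&6&7\end{array}\right]
\longrightarrow
\left[\begin{array}{cc|c} 1&0&1\\0&2&3\\0&4&4\\0&6&6\end{array}\right]
\longrightarrow
\left[\begin{array}{cc|c} 1&0&1\\0&2&3\\0&0&-2\\0&0&-3\end{array}\right].
$$
The third row now reads $0 = -2$, so the system is inconsistent.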
This motivates the more general problem: given an $m \times n$ matrix $A$ (giving a linear transformation $\RR^n \to \RR^m$) and a vector $\bb \in \RR^m$, find the vector $\xx^* \in \RR^n$ that is {\sl closest to being a solution} to the equation $A \xx = \bb$. What do we mean by `closest to being a solution'? If $\xx$ is actually a solution (i.e., $A\xx =\bb$), then that would certainly qualify as closest. But, as in the curve fitting problem, there may not be an actual solution. The answer is provided by the following picture, in which we have written $V = \mathrm{Image}(A)$.
\begin{center}
\includegraphics[scale=.8]{ls2}
\end{center}
We are looking for the closest vector in $V$ to $\bb$; the closest vector is given by the projection of $\bb$ onto $V$. This looks pretty clear in the picture; the book suggests a geometric proof on pages 213--214. Since $\proj_V(\bb)$ lies in $V = \mathrm{Image}(A)$, we can write it as $A$ times some vector. That vector is $\xx^*$. So we have, in principle, achieved our goal: we find our best approximation to a solution to $A\xx = \bb$ by solving (for $\xx^*$) the equation $A\xx^* = \proj_V(\bb)$. We know that there is a solution, because the right-hand side is by definition an element of $V$.
We can develop this a little further, and there are two reasons for doing so. One is that projecting onto $V$ is kind of a pain in the neck, since we'd have to find an orthogonal basis for $V$ in order to use our formula. Also, it would be nice to have some kind of formula for $\xx^*$. We'll take care of both of these at the same time. (Obligatory disclaimer: in practice, especially with large systems, it's often better to just find an orthonormal basis for $V$ and then solve for $\xx^*$ by Gaussian elimination. But fortunately we are not worrying about such nasty details!)
The main fact that leads us to a new solution is the one proved in Monday's class: a vector $\ww$ is perpendicular to $V = \mathrm{Image}(A)$ if and only if $A^T\ww = \vec0$. How does this help? Well, since $A\xx^*$ already lies in $V$, to say that $A\xx^* = \proj_V(\bb)$ is equivalent to saying that $A\xx^*- \bb$ is perpendicular to $V$, which is (by what we just observed) equivalent to $A^T(A\xx^* -\bb) = \vec0$. The conclusion of all this is that $\xx^*$ is closest to a solution of $A\xx =\bb$ if and only if it is a solution to the equation
$$
A^TA\xx^* = A^T\bb
$$
Usually, we drop the $*$ and simply solve this equation for a variable called $\xx$. The equation is called the {\sl normal equation} of $A\xx = \bb$.
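As a side remark (not needed for what follows, and stated under the extra assumption that the columns of $A$ are linearly independent): in that case $A^TA$ is invertible, and the normal equation can be solved in one step, giving
$$
\xx^* = (A^TA)^{-1}A^T\bb \qquad\text{and hence}\qquad \proj_V(\bb) = A\xx^* = A(A^TA)^{-1}A^T\bb.
$$
If the columns of $A$ are dependent, this formula does not apply, but the normal equation still has solutions.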
Let's carry this out for the example we started with. We have $ A = \begin{bmatrix}1&0\\1&2\\1&4\\1&6\end{bmatrix}$, which gives $A^T = \begin{bmatrix}1&1&1&1\\0&2&4&6\end{bmatrix}$ and
$A^TA = \begin{bmatrix}4&12\\12&56\end{bmatrix}$. We also need
$A^T\bb = \begin{bmatrix}1&1&1&1\\0&2&4&6\end{bmatrix}\colvec{1\\4\\5\\7}
= \colvec{17\\70}$.
After some mildly messy arithmetic, we solve the system $ \begin{bmatrix}4&12\\12&56\end{bmatrix}\xx =\colvec{17\\70} $ to get $\xx = \colvec{1.4\\0.95}$. Going back to the original problem, the line that best fits the data is given by $y = 1.4 + 0.95 t$. If there were more data points, the problem would be pretty much the same, except that the matrix $A$, which was $4 \times 2$ in this example, would be $k \times 2$ if there were $k$ data points. If you like formulas, you can find the general solution to the normal equation at the end of Section 5.4.
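For the record, here is one way the `mildly messy arithmetic' might be organized (a sketch; any elimination order works). Writing $\xx = \colvec{c_0\\c_1}$, the normal equation is the pair $4c_0 + 12c_1 = 17$ and $12c_0 + 56c_1 = 70$. Subtracting three times the first equation from the second gives
$$
20c_1 = 70 - 51 = 19, \quad\text{so}\quad c_1 = 0.95 \quad\text{and}\quad c_0 = \frac{17 - 12(0.95)}{4} = \frac{5.6}{4} = 1.4.
$$
As a check, the error vector should be perpendicular to the image of $A$, and indeed
$$
\bb - A\xx = \colvec{1\\4\\5\\7} - \colvec{1.4\\3.3\\5.2\\7.1} = \colvec{-0.4\\0.7\\-0.2\\-0.1},
\qquad
A^T(\bb - A\xx) = \colvec{-0.4+0.7-0.2-0.1\\ 0+1.4-0.8-0.6} = \colvec{0\\0}.
$$
The same bookkeeping works for $k$ data points $(a_1,b_1),\ldots,(a_k,b_k)$: multiplying out the matrices gives
$$
A^TA = \begin{bmatrix} k & \sum a_i\\ \sum a_i & \sum a_i^2\end{bmatrix}
\qquad\text{and}\qquad
A^T\bb = \colvec{\sum b_i\\ \sum a_ib_i},
$$
which for our four points reproduces the entries $4, 12, 56$ and $17, 70$ found above.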
There are some other variations on this theme. Suppose, for instance, that a plot of our data looks like a parabola, suggesting that it might fit a quadratic equation. Then we would look for a function of the form $y = c_0 + c_1 t + c_2t^2$, and so $\xx = \colvec{c_0\\c_1\\c_2}$ has three unknown entries. For example, the four data points below look like they might fit a parabola.
\begin{center}
\includegraphics[scale=.8]{ls3}
\end{center}
Plugging in the given values, namely the data points $(-3,6)$, $(-2,4)$, $(1,-1)$, $(4,3)$, gives us the equations $6 = c_0 - 3c_1 +9c_2$, $4= c_0 -2c_1 +4c_2$, etc. Thus we look for a good approximation to solutions to $A\xx = \bb$ where $A = \bmat{1&-3&9\\ 1&-2&4\\1&1&1\\ 1&4&16} $ and $\bb = \colvec{6\\4\\-1\\3}$. As above, we can't solve the equation directly, so instead we solve the normal equation $A^TA \xx = A^T\bb$. This is $\bmat{4&0&30\\0&30&30\\30&30&354}\xx = \colvec{12\\-15\\117}$. After some work (I confess, I used a computer), I get $c_0= -\frac{2}{11}$, $c_1= -\frac{61}{66}$, and $c_2= \frac{14}{33}$. The corresponding parabola is drawn below (again, done on the computer).
\begin{center}
\includegraphics[scale=.7]{parabola}
\end{center}
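Without a computer, the elimination for the parabola might be organized as follows (a sketch of one possible route). The first two rows of the normal equation, $4c_0 + 30c_2 = 12$ and $30c_1 + 30c_2 = -15$, give $c_0 = 3 - \frac{15}{2}c_2$ and $c_1 = -\frac{1}{2} - c_2$. Substituting these into the third row, $30c_0 + 30c_1 + 354c_2 = 117$, gives
$$
90 - 225c_2 - 15 - 30c_2 + 354c_2 = 117, \quad\text{that is,}\quad 99c_2 = 42,
$$
so $c_2 = \frac{14}{33}$, and then $c_1 = -\frac{1}{2} - \frac{14}{33} = -\frac{61}{66}$ and $c_0 = 3 - \frac{15}{2}\cdot\frac{14}{33} = -\frac{2}{11}$, in agreement with the computer's answer.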
Homework problem 32 in this section leads to a more reasonable normal equation, where you can do the work by hand.
\end{document}