
## Goodness of Fit of a Straight Line to Data

Once the scatter diagram of the data has been drawn and the model assumptions described in the previous sections at least visually verified (and perhaps the correlation coefficient \(r\) computed to quantitatively verify the linear trend), the next step in the analysis is to find the straight line that best fits the data. We will explain how to measure how well a straight line fits a set of points by examining how well the line \(y=\frac{1}{2}x-1\) fits the data set

\[\begin{array}{c|ccccc} x & 2 & 2 & 6 & 8 & 10 \\ \hline y & 0 & 1 & 2 & 3 & 3 \end{array}\]

(which will serve as a running example for the next three sections). We will write the equation of this line as \(\hat{y}=\frac{1}{2}x-1\) with a hat on the \(y\) to indicate that the \(y\)-values computed using this equation are not from the data. We will do this with all lines approximating data sets. The line \(\hat{y}=\frac{1}{2}x-1\) was chosen as one that appears to fit the data reasonably well.

The idea for measuring the goodness of fit of a straight line to data is illustrated in Figure \(\PageIndex{1}\), in which the graph of the line \(\hat{y}=\frac{1}{2}x-1\) has been superimposed on the scatter plot for the sample data set.

Figure \(\PageIndex{1}\): **Plot of the Five-Point Data and the Line \(\hat{y}=\frac{1}{2}x-1\)**

To each point in the data set there is associated an “error,” the positive or negative vertical distance from the point to the line: positive if the point is above the line and negative if it is below it. The error can be computed as the actual \(y\)-value of the point minus the \(y\)-value \(\hat{y}\) that is “predicted” by inserting the \(x\)-value of the data point into the formula for the line:

\[\text{error at data point } (x,y)=(\text{true } y)-(\text{predicted } y)=y-\hat{y}\]

The computation of the error for each of the five points in the data set is shown in Table \(\PageIndex{1}\).

Table \(\PageIndex{1}\): The Errors in Fitting Data with a Straight Line

| | \(x\) | \(y\) | \(\hat{y}=\frac{1}{2}x-1\) | \(y-\hat{y}\) | \((y-\hat{y})^2\) |
|---|---|---|---|---|---|
| | 2 | 0 | 0 | 0 | 0 |
| | 2 | 1 | 0 | 1 | 1 |
| | 6 | 2 | 2 | 0 | 0 |
| | 8 | 3 | 3 | 0 | 0 |
| | 10 | 3 | 4 | −1 | 1 |
| \(\sum\) | - | - | - | 0 | 2 |

A first thought for a measure of the goodness of fit of the line to the data would be simply to add the errors at every point, but the example shows that this cannot work well in general. The line does not fit the data perfectly (no line can), but because of cancellation of positive and negative errors the sum of the errors (the fourth column of numbers) is zero. Instead goodness of fit is measured by the sum of the squares of the errors. Squaring eliminates the minus signs, so no cancellation can occur. For the data and line in Figure \(\PageIndex{1}\) the sum of the squared errors (the last column of numbers) is \(2\). This number measures the goodness of fit of the line to the data.
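The bookkeeping in Table \(\PageIndex{1}\) can be sketched in a few lines of Python (a minimal illustration of the computation, not a library routine):

```python
# Errors of the line y-hat = 0.5*x - 1 on the five-point running example.
xs = [2, 2, 6, 8, 10]
ys = [0, 1, 2, 3, 3]

# error at (x, y) = (true y) - (predicted y)
errors = [y - (0.5 * x - 1) for x, y in zip(xs, ys)]

print(sum(errors))                  # 0.0: positive and negative errors cancel
print(sum(e ** 2 for e in errors))  # 2.0: the sum of squared errors
```

The zero sum in the first line is exactly the cancellation problem described above; the second line is the goodness-of-fit measure.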

## The Least Squares Regression Line

Given any collection of pairs of numbers (except when all the \(x\)-values are the same) and the corresponding scatter diagram, there always exists exactly one straight line that fits the data better than any other, in the sense of minimizing the sum of the squared errors. It is called the least squares regression line. Moreover there are formulas for its slope and \(y\)-intercept.

Definition: Least Squares Regression Line

Given a collection of pairs \((x,y)\) of numbers (in which not all the \(x\)-values are the same), there is a line \(\hat{y}=\hat{\beta}_1x+\hat{\beta}_0\) that best fits the data in the sense of minimizing the sum of the squared errors. It is called the *least squares regression line*. Its slope \(\hat{\beta}_1\) and \(y\)-intercept \(\hat{\beta}_0\) are computed using the formulas

\[\hat{\beta}_1=\frac{SS_{xy}}{SS_{xx}}\]

and

\[\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}\]

where

\[SS_{xx}=\sum x^2-\frac{1}{n}\left(\sum x\right)^2\]

and

\[SS_{xy}=\sum xy-\frac{1}{n}\left(\sum x\right)\left(\sum y\right)\]

\(\bar{x}\) is the mean of all the \(x\)-values, \(\bar{y}\) is the mean of all the \(y\)-values, and \(n\) is the number of pairs in the data set.
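The slope and intercept formulas combine as in the following minimal Python sketch (the function name `least_squares` is ours, not from the text):

```python
def least_squares(xs, ys):
    """Slope and intercept of the least squares regression line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # SS_xx = sum(x^2) - (sum x)^2 / n ;  SS_xy = sum(xy) - (sum x)(sum y) / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    slope = ss_xy / ss_xx                      # beta-hat_1
    intercept = sum_y / n - slope * sum_x / n  # beta-hat_0 = y-bar - slope * x-bar
    return slope, intercept

# Five-point running example: slope ~ 0.34375, intercept ~ -0.125
slope, intercept = least_squares([2, 2, 6, 8, 10], [0, 1, 2, 3, 3])
```

In practice one would use a statistical package, but the direct translation makes the roles of \(SS_{xx}\) and \(SS_{xy}\) explicit.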

The equation

\[\hat{y}=\hat{\beta}_1x+\hat{\beta}_0\]

specifying the least squares regression line is called the least squares regression equation.

Recall from Section 10.3 that the line with the equation \(y=\beta_1x+\beta_0\) is called the population regression line. The numbers \(\hat{\beta}_1\) and \(\hat{\beta}_0\) are statistics that estimate the population parameters \(\beta_1\) and \(\beta_0\).

We will compute the least squares regression line for the five-point data set, then for a more practical example that will be another running example for the introduction of new concepts in this and the next three sections.

Example \(\PageIndex{2}\)

Find the least squares regression line for the five-point data set

\[\begin{array}{c|ccccc} x & 2 & 2 & 6 & 8 & 10 \\ \hline y & 0 & 1 & 2 & 3 & 3 \end{array}\]

and verify that it fits the data better than the line \(\hat{y}=\frac{1}{2}x-1\) considered in Section 10.4.1 above.

**Solution**:

In actual practice computation of the regression line is done using a statistical computation package. In order to clarify the meaning of the formulas we display the computations in tabular form.

| | \(x\) | \(y\) | \(x^2\) | \(xy\) |
|---|---|---|---|---|
| | 2 | 0 | 4 | 0 |
| | 2 | 1 | 4 | 2 |
| | 6 | 2 | 36 | 12 |
| | 8 | 3 | 64 | 24 |
| | 10 | 3 | 100 | 30 |
| \(\sum\) | 28 | 9 | 208 | 68 |

In the last row of the table we have the sum of the numbers in each column. Using them we compute:

\[\bar{x}=\frac{\sum x}{n}=\frac{28}{5}=5.6 \qquad \bar{y}=\frac{\sum y}{n}=\frac{9}{5}=1.8\]

so that

\[SS_{xx}=\sum x^2-\frac{1}{n}\left(\sum x\right)^2=208-\frac{1}{5}(28)^2=51.2\]

and

\[SS_{xy}=\sum xy-\frac{1}{n}\left(\sum x\right)\left(\sum y\right)=68-\frac{1}{5}(28)(9)=17.6\]

from which

\[\hat{\beta}_1=\frac{SS_{xy}}{SS_{xx}}=\frac{17.6}{51.2}=0.34375 \qquad \hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}=1.8-(0.34375)(5.6)=-0.125\]

The least squares regression line for these data is

\[\hat{y}=0.34375x-0.125\]
The computations for measuring how well it fits the sample data are given in Table \(\PageIndex{2}\). The sum of the squared errors is the sum of the numbers in the last column, which is \(0.75\). It is less than \(2\), the sum of the squared errors for the fit of the line \(\hat{y}=\frac{1}{2}x-1\) to this data set.

Table \(\PageIndex{2}\): The Errors in Fitting Data with the Least Squares Regression Line

| | \(x\) | \(y\) | \(\hat{y}=0.34375x-0.125\) | \(y-\hat{y}\) | \((y-\hat{y})^2\) |
|---|---|---|---|---|---|
| | 2 | 0 | 0.5625 | −0.5625 | 0.31640625 |
| | 2 | 1 | 0.5625 | 0.4375 | 0.19140625 |
| | 6 | 2 | 1.9375 | 0.0625 | 0.00390625 |
| | 8 | 3 | 2.6250 | 0.3750 | 0.14062500 |
| | 10 | 3 | 3.3125 | −0.3125 | 0.09765625 |
| \(\sum\) | - | - | - | 0 | 0.75 |
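The comparison of the two lines can be checked numerically; the sketch below (our helper function, using the fitted coefficients computed in this example) reproduces both sums of squared errors:

```python
def sse(xs, ys, slope, intercept):
    """Sum of squared errors of the line y-hat = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

xs = [2, 2, 6, 8, 10]
ys = [0, 1, 2, 3, 3]

print(sse(xs, ys, 0.34375, -0.125))  # 0.75 for the least squares line
print(sse(xs, ys, 0.5, -1))          # 2.0 for the line of Section 10.4.1
```

No other choice of slope and intercept can produce a sum of squared errors below 0.75 for this data set; that is what "least squares" means.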

Example \(\PageIndex{3}\)

Table \(\PageIndex{3}\) shows the age in years and the retail value in thousands of dollars of a random sample of ten automobiles of the same make and model.

1. Construct the scatter diagram.
2. Compute the linear correlation coefficient \(r\). Interpret its value in the context of the problem.
3. Compute the least squares regression line. Plot it on the scatter diagram.
4. Interpret the meaning of the slope of the least squares regression line in the context of the problem.
5. Suppose a four-year-old automobile of this make and model is selected at random. Use the regression equation to predict its retail value.
6. Suppose a \(20\)-year-old automobile of this make and model is selected at random. Use the regression equation to predict its retail value. Interpret the result.
7. Comment on the validity of using the regression equation to predict the price of a brand new automobile of this make and model.

Table \(\PageIndex{3}\): Data on Age and Value of Used Automobiles of a Specific Make and Model

| \(x\) | 2 | 3 | 3 | 3 | 4 | 4 | 5 | 5 | 5 | 6 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) | 28.7 | 24.8 | 26.0 | 30.5 | 23.8 | 24.6 | 23.8 | 20.4 | 21.6 | 22.1 |

**Solution**:

Figure \(\PageIndex{2}\): **Scatter Diagram for Age and Value of Used Automobiles**

We must first compute \(SS_{xx},\; SS_{xy},\; SS_{yy}\), which means computing \(\sum x,\; \sum y,\; \sum x^2,\; \sum y^2,\; \text{and } \sum xy\). Using a computing device we obtain

\[\sum x=40 \qquad \sum y=246.3 \qquad \sum x^2=174 \qquad \sum y^2=6154.15 \qquad \sum xy=956.5\]

Thus

\[SS_{xx}=\sum x^2-\frac{1}{n}\left(\sum x\right)^2=174-\frac{1}{10}(40)^2=14\]

\[SS_{xy}=\sum xy-\frac{1}{n}\left(\sum x\right)\left(\sum y\right)=956.5-\frac{1}{10}(40)(246.3)=-28.7\]

\[SS_{yy}=\sum y^2-\frac{1}{n}\left(\sum y\right)^2=6154.15-\frac{1}{10}(246.3)^2=87.781\]

so that

\[r=\frac{SS_{xy}}{\sqrt{SS_{xx}SS_{yy}}}=\frac{-28.7}{\sqrt{(14)(87.781)}}\approx -0.819\]

Using the values of \(\sum x\) and \(\sum y\) computed above,

\[\bar{x}=\frac{\sum x}{n}=\frac{40}{10}=4 \qquad \bar{y}=\frac{\sum y}{n}=\frac{246.3}{10}=24.63\]

Thus, using the values of \(SS_{xx}\) and \(SS_{xy}\),

\[\hat{\beta}_1=\frac{SS_{xy}}{SS_{xx}}=\frac{-28.7}{14}=-2.05 \qquad \hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}=24.63-(-2.05)(4)=32.83\]

The least squares regression equation is \(\hat{y}=-2.05x+32.83\).

Figure \(\PageIndex{3}\) shows the scatter diagram with the graph of the least squares regression line superimposed.
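The computations for this example can be reproduced with a short Python script (a sketch following the formulas above, not the statistical package the text alludes to); it also evaluates the regression equation at \(x=4\) and \(x=20\) for the prediction parts of the exercise:

```python
import math

ages   = [2, 3, 3, 3, 4, 4, 5, 5, 5, 6]
values = [28.7, 24.8, 26.0, 30.5, 23.8, 24.6, 23.8, 20.4, 21.6, 22.1]
n = len(ages)

sum_x, sum_y = sum(ages), sum(values)
ss_xx = sum(x * x for x in ages) - sum_x ** 2 / n
ss_yy = sum(y * y for y in values) - sum_y ** 2 / n
ss_xy = sum(x * y for x, y in zip(ages, values)) - sum_x * sum_y / n

r = ss_xy / math.sqrt(ss_xx * ss_yy)   # correlation, about -0.819
beta1 = ss_xy / ss_xx                  # slope, about -2.05
beta0 = sum_y / n - beta1 * sum_x / n  # intercept, about 32.83

print(beta1 * 4 + beta0)   # predicted value (thousands) of a 4-year-old car
print(beta1 * 20 + beta0)  # x = 20 lies far outside the data: extrapolation
```

The prediction at \(x=20\) comes out negative, which signals that the regression equation should not be trusted far outside the age range of the sampled cars.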
