How to Make and Interpret Residual Plots

What is a Residual?

A residual is the difference between the observed value and the value predicted by the model at a given data point. A positive residual means that the observed value is above the trendline and a negative residual means it is below the trendline. The larger the absolute value of the residual, the further the point is from the trendline.

In the example below, we see a scatter plot showing 5 data points and its corresponding residual plot.

The green line on the scatter plot is the linear regression line of best fit.

This line on the scatter plot corresponds to the x-axis of the residual plot (also shown in green).

  • Any points plotted above the regression line on the scatter plot are above the x-axis of the residual plot. These positive residuals are shown above the axis with a red arrow.
  • Any points plotted below the regression line on the scatter plot are below the x-axis of the residual plot. These negative residuals are shown below the axis with a blue arrow.

[Figure: definition of a residual]

The colour of each data point on the scatter plot shows the corresponding residual on the residual plot.

  • The first point (purple) is above the trendline by a distance of 1.5 and so, its residual is 1.5.
  • The second point (pink) is below the trendline by a distance of 2.15 and so, its residual is -2.15.
  • The third point (blue) is above the trendline by a distance of 0.7 and so, its residual is 0.7.
  • The fourth point (green) is below the trendline by a distance of 0.95 and so, its residual is -0.95.
  • The fifth point (orange) is above the trendline by a distance of 0.9 and so, its residual is 0.9.

[Figure: residuals example]

The trendline is a prediction of the y-value at each position. That is, the predicted value of y, written y (predicted), is given by the height (y-coordinate) of this line at each value of 𝑥.

Since the residual is the difference between the actual point and the trendline, we can say that the formula for calculating each residual is:

Residual = y (actual) – y (predicted)

[Figure: formula for calculating residuals]

The y (predicted) is calculated for each value of 𝑥 by substituting it into the least squares regression line equation, which for this data is y = 2.15𝑥 + 2.35.

  • When 𝑥 = 1, the y (predicted) is found using y = 2.15 × 1 + 2.35. We obtain y (predicted) = 4.5
  • When 𝑥 = 2, the y (predicted) is found using y = 2.15 × 2 + 2.35. We obtain y (predicted) = 6.65
  • When 𝑥 = 3, the y (predicted) is found using y = 2.15 × 3 + 2.35. We obtain y (predicted) = 8.8
  • When 𝑥 = 4, the y (predicted) is found using y = 2.15 × 4 + 2.35. We obtain y (predicted) = 10.95
  • When 𝑥 = 5, the y (predicted) is found using y = 2.15 × 5 + 2.35. We obtain y (predicted) = 13.1

How to Calculate a Residual

A residual is calculated for each data point using the formula: residual = (actual y value) – (predicted y value). The actual y value is the y value as seen in the data whilst the predicted y value is the value obtained from the regression line.

To calculate a residual, we use the formula:

Residual = y (actual) – y (predicted)

  • The y (actual) is the value of y (the dependent/response variable) that is given in the table of data. It is also known as the observed value of y.
  • The y (predicted) is the value of y that is obtained by substituting the given value of 𝑥 into the regression equation. It is also the height of the regression line at that particular value of 𝑥.

The steps to calculate a residual are:

  1. Calculate the equation of the regression line.
  2. Substitute each value of 𝑥 into the regression line equation to find each y (predicted).
  3. Calculate the residuals using the formula: Residuals = y (actual) – y (predicted).

For example, calculate the residuals of the least squares regression line for the data given by:

𝑥    1    2      3      4     5
y    6    4.5    9.5    10    14

Step 1. Calculate the equation of the regression line:

The equation of the regression line can be found using an online calculator or using a graphical calculator.

Simply enter the 𝑥 values as List 1 and the y values as List 2.

In this example, we are finding the residuals for the least squares regression line.

For this set of data, the equation of the least squares regression line is found to be: y = 2.15𝑥 + 2.35.

[Figure: how to calculate residuals]

Step 2. Substitute each value of 𝑥 into the regression line equation to find each y (predicted)

The least squares regression equation is y = 2.15𝑥 + 2.35.

The equation of this model calculates predicted values of y which are found by substituting each given value of 𝑥 into the equation.

  • When 𝑥 = 1, the y (predicted) is found using y = 2.15 × 1 + 2.35. We obtain y (predicted) = 4.5
  • When 𝑥 = 2, the y (predicted) is found using y = 2.15 × 2 + 2.35. We obtain y (predicted) = 6.65
  • When 𝑥 = 3, the y (predicted) is found using y = 2.15 × 3 + 2.35. We obtain y (predicted) = 8.8
  • When 𝑥 = 4, the y (predicted) is found using y = 2.15 × 4 + 2.35. We obtain y (predicted) = 10.95
  • When 𝑥 = 5, the y (predicted) is found using y = 2.15 × 5 + 2.35. We obtain y (predicted) = 13.1

Step 3. Calculate the residuals using the formula: Residuals = y (actual) – y (predicted)

For each column in the table, we subtract the y (predicted) from the y (actual) to obtain the residual for that given value of 𝑥.

  • For 𝑥 = 1, the residual = 6 – 4.5 = 1.5
  • For 𝑥 = 2, the residual = 4.5 – 6.65 = -2.15
  • For 𝑥 = 3, the residual = 9.5 – 8.8 = 0.7
  • For 𝑥 = 4, the residual = 10 – 10.95 = -0.95
  • For 𝑥 = 5, the residual = 14 – 13.1 = 0.9
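
The three steps above can also be sketched in Python. This is a minimal illustration rather than the method used by any particular calculator: the function names `fit_line` and `residuals` are invented for this example, and the slope and intercept are computed with the standard least squares formulas.

```python
def fit_line(xs, ys):
    """Step 1: least squares slope and intercept for paired data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Steps 2 and 3: residual = y (actual) - y (predicted)."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [6, 4.5, 9.5, 10, 14]

slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)

print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("residuals:", [round(r, 2) for r in res])
```

Rounded to two decimal places, the slope and intercept should agree with the regression equation used in the worked example, and the residuals with the hand calculation.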

How to Make a Residual Plot

A residual plot is a graph in which the residuals are plotted on the y-axis and the 𝑥 values (of the independent/explanatory variable) are displayed on the 𝑥-axis.

To make a residual plot:

  1. Calculate the residuals for each data point using residual = actual y – predicted y.
  2. Plot the residual on the y-axis for each given value of x.

For the bivariate data given by:

𝑥    1    2      3      4     5
y    6    4.5    9.5    10    14

We can calculate the residuals to be:

𝑥            1      2        3      4        5
Residuals    1.5    -2.15    0.7    -0.95    0.9

The process for calculating these residuals is shown in the ‘How to Calculate a Residual’ section above and is summarised in the images below.

A residual plot displays the 𝑥 values on the 𝑥-axis, with the residuals plotted on the y-axis.

[Figure: residual plot]

Here is the full process of calculating the residuals from the list of data and then plotting the residual plot.

[Figure: how to make a residual plot]

Here is another example of how to calculate the residuals for a set of data and then display the results in a residual plot.

We will calculate the residuals for the least squares regression line (linear regression) for the following set of data:

𝑥    3     5     7     9     11
y    10    11    13    16    25

The least squares regression equation is found to be y = 1.75𝑥 + 2.75.

We substitute each value of 𝑥 into this equation to obtain the predicted y values:

  • When 𝑥 = 3, the y (predicted) is found using y = 1.75 × 3 + 2.75. We obtain y (predicted) = 8
  • When 𝑥 = 5, the y (predicted) is found using y = 1.75 × 5 + 2.75. We obtain y (predicted) = 11.5
  • When 𝑥 = 7, the y (predicted) is found using y = 1.75 × 7 + 2.75. We obtain y (predicted) = 15
  • When 𝑥 = 9, the y (predicted) is found using y = 1.75 × 9 + 2.75. We obtain y (predicted) = 18.5
  • When 𝑥 = 11, the y (predicted) is found using y = 1.75 × 11 + 2.75. We obtain y (predicted) = 22

[Figure: example of calculating residuals, step 1]

The residuals are calculated for each 𝑥 value using the formula: Residual = y (actual) – y (predicted).

  • For 𝑥 = 3, the residual = 10 – 8 = 2
  • For 𝑥 = 5, the residual = 11 – 11.5 = -0.5
  • For 𝑥 = 7, the residual = 13 – 15 = -2
  • For 𝑥 = 9, the residual = 16 – 18.5 = -2.5
  • For 𝑥 = 11, the residual = 25 – 22 = 3

These residuals are plotted on the y-axis against each 𝑥 value on the 𝑥-axis to form the residual plot.

[Figure: how to make a residual plot by hand in steps]
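
As a rough cross-check of this second example, the short Python sketch below uses the regression equation quoted above (rather than re-fitting the line) to build the coordinate pairs that would be plotted. The names `predict` and `points` are invented for illustration.

```python
xs = [3, 5, 7, 9, 11]
ys = [10, 11, 13, 16, 25]


def predict(x):
    """Height of the regression line y = 1.75x + 2.75 at a given x."""
    return 1.75 * x + 2.75


# Each point on the residual plot is (x value, residual).
points = [(x, y - predict(x)) for x, y in zip(xs, ys)]

for x, r in points:
    if r > 0:
        side = "above the axis"
    elif r < 0:
        side = "below the axis"
    else:
        side = "on the axis"
    print(f"x = {x}: residual = {r} ({side})")
```

The pairs in `points` can be plotted by hand or passed to any plotting tool, with the 𝑥 values on the horizontal axis and the residuals on the vertical axis.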

How to Make a Residual Plot on a Calculator

The easiest way to produce a residual plot from two lists of data is to use a graphical calculator.

The steps below show how to produce a residual plot using two common graphical calculators, the TI-84 Plus and the Casio fx-CG50.

How to Make a Residual Plot on a TI-84 Calculator

To make a residual plot on a TI-84 calculator:

  1. Press ‘STAT’ to bring up the statistics menu and press ‘1’ for edit.
  2. Enter the two lists of data as List 1 and List 2.
  3. Perform regression (linear or exponential) with the data.
  4. Press ‘2nd’, ‘y=’ (stat plot), ‘1’ (plot 1) and then set the YList as RESID.
  5. Press ‘Zoom’ ‘9’ (Zoom Stat) to see the residual plot.
  6. Press ‘TRACE’ to read the different residual coordinates using the arrows.

For example, use the TI-84 calculator to make a residual plot for the data:

𝑥    2     4     6     8     10
y    45    53    60    71    76

Step 1. Press ‘STAT’ to bring up the statistics menu and press ‘1’ for edit

[Figure: residuals on the TI-84, step 1]

Step 2. Enter the two lists of data as List 1 and List 2

We have the data:

𝑥    2     4     6     8     10
y    45    53    60    71    76

We enter the 𝑥 values as List 1 and the y values as List 2.

[Figure: residuals on the TI-84, step 2]

Step 3. Perform regression (linear or exponential) with the data

[Figure: how to perform linear regression on the TI-84]

Step 4. Press ‘2nd’, ‘y=’ (stat plot), ‘1’ (plot 1) and then set the YList as RESID

By pressing ‘2nd’, ‘y=’, the ‘STAT PLOT’ screen is shown.

From this screen, press ‘1’ to select the Plot 1.

The following screen must be set up as shown in the image below.

The YList must be set to RESID. In order to select the RESID option, press ‘2nd’, ‘STAT’ and then select RESID from the choices shown (option 7).

[Figure: residuals on the TI-84, step 4]

Step 5. Press ‘Zoom’ ‘9’ (Zoom Stat) to see the residual plot

To read each residual coordinate, press the ‘TRACE’ button.

Use the arrow keys to move between each residual.

[Figure: residuals on the TI-84, step 5]

How to Make a Residual Plot on a Casio Calculator

To make a residual plot on a Casio calculator:

  1. Press ‘menu’ and then select the statistics option by pressing ‘2’.
  2. Enter the two lists of data as List 1 and List 2.
  3. Press ‘shift’, ‘menu’ (set up) and set the Resid List as ‘List3’. Press ‘exit’.
  4. Perform regression (linear or exponential) with the data.
  5. Press ‘exit’ four times, press ‘F1’ (GRAPH), ‘F6’ (SET) and set the XLIST as ‘List1’ and the YLIST as ‘List3’.
  6. Press ‘exit’ then ‘F1’ (graph) to see the residual plot.

For example, create a residual plot using a Casio graphical calculator for the following data:

𝑥    1     2     3     4     5     6
y    81    76    58    53    48    40

Step 1. Press ‘menu’ and then select the statistics option by pressing ‘2’

Step 2. Enter the two lists of data as List 1 and List 2

The 𝑥 (independent variable) values are written as List 1 and the y (dependent variable) values are written as List 2.

[Figure: how to make a residual plot on a Casio calculator]

Step 3. Press ‘shift’, ‘menu’ (set up) and set the Resid List as ‘List3’. Press ‘exit’

The residuals will be stored in List 3 once they are calculated.

To change the list to ‘List 3’, use the arrows to go down to Resid List and then press ‘F2’ to bring up a box in which the number ‘3’ should be typed. Then press ‘exe’.

[Figure: how to make a residual plot on a Casio calculator]

Step 4. Perform regression (linear or exponential) with the data

From the set-up menu in the previous step, simply press ‘exit’ to return to the main statistics menu where the two lists of data can be seen.

Alternatively, press ‘menu’, ‘2’ to return to this page.

Complete linear regression for the two data sets by pressing:

  • F2 for CALC
  • F3 for REG
  • F1 for X
  • F1 for ax+b

[Figure: how to do linear regression on a Casio fx-CG50 calculator]

The values of a and b shown on this screen tell us that the linear regression has been calculated.

In this example, a=-8.4 and b=88.7, therefore the linear regression equation is y = -8.4𝑥 + 88.7.

Step 5. Press ‘exit’ four times, press ‘F1’ (GRAPH), ‘F6’ (SET) and set the XLIST as ‘List1’ and the YLIST as ‘List3’

Pressing ‘exit’ four times will take us back to the main statistics screen. Alternatively, we can press ‘menu’, ‘2’.

This time we see three lists of data.

The first two lists of data are our original two lists that we entered.

The third list, List 3, contains the residuals that have now been calculated.

If you did not require a residual plot and simply wanted to calculate the numerical values of each residual, you can read these values off from List 3 now.
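
To check the values of a, b and the residuals in List 3 without a calculator, the following Python sketch applies the standard least squares formulas written in terms of sums. The variable names (`sum_xy`, `list3` and so on) are chosen for illustration, and the calculator works with unrounded values of a and b, so the residuals may differ slightly from ones computed with the rounded equation.

```python
xs = [1, 2, 3, 4, 5, 6]           # List 1
ys = [81, 76, 58, 53, 48, 40]     # List 2

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_xx = sum(x * x for x in xs)

# Least squares slope (a) and intercept (b) for y = ax + b.
a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b = (sum_y - a * sum_x) / n

# Residual = y (actual) - y (predicted), as stored in List 3.
list3 = [y - (a * x + b) for x, y in zip(xs, ys)]

print(f"a = {a:.1f}, b = {b:.1f}")
for x, r in zip(xs, list3):
    print(f"x = {x}: residual = {r:.2f}")
```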

To create the residual plot, press:

  • F1 for GRAPH
  • F6 for SET

Then change the YList to List3 by going down to YList and pressing F2 to bring up the list menu. Enter a ‘3’ here and press ‘exe’.

Step 6. Press ‘exit’ then ‘F1’ (graph) to see the residual plot

Press ‘exit’ to return to the main statistics menu showing the three lists and then press ‘F1’ (Graph) and then ‘F1’ (GRAPH 1) to graph the residual plot.

[Figure: how to make a residual plot on a Casio fx-CG50 calculator]

Press ‘SHIFT’ ‘F1’ (TRACE) to show the coordinates of each residual and use the arrow keys to move from one to the other.

What is a Residual Plot?

A residual plot is a graph of the residuals against the given x values. Positive residuals indicate points that are greater than the prediction of the model and negative residuals indicate points that are below the prediction of the model. Residual plots are used to identify trends in data and to ascertain if a model is an appropriate fit for the data.

[Figure: what is a residual plot]

A residual plot displays the residuals on the vertical axis and the 𝑥 values on the horizontal axis.

Residual plots are used to better observe trends in data. They are useful for identifying if the regression equation used is an appropriate fit for the data or not.

Potential outliers can be identified from residual plots by noting which points have the largest residuals in absolute value, that is, the points furthest above or below the axis.

How to Interpret Residual Plots

If the residuals are small, randomly scattered and have a similar number of points above and below the x-axis, then the model used is likely to be a good fit for the data. If the residuals are large or show a clear pattern, then the model is likely not a good fit. If there is a clear curved pattern in the residuals, a linear model is not appropriate.

Residual plots are used to indicate trends in data which may not be as obvious on a scatter plot.

Residual plots magnify the differences seen between the position of each point and the trendline and so, patterns in the data can be more evident.

Therefore, residual plots are used to decide if the regression line used is an appropriate fit for the data or not.

A model is a good fit if the residual plot shows no clear pattern

The regression line is an appropriate model for the data if:

  • The residuals are small
  • The residuals are randomly scattered with no clear pattern
  • There are a similar number of points above and below the axis

[Figure: how to interpret a residual plot, good fit]

The example residual plot above indicates that the model used is an appropriate fit for the data. This is because there is no clear pattern to the residuals and they are not too far from the axis.

We can see in the scatter plot that the points lie close to the trendline.

A model is not a good fit for the data if the residual plot shows a clear pattern

The regression line is not appropriate if:

  • The residuals are large
  • There is a clear curved pattern in the residuals
  • Most of the points lie above or below the axis

[Figure: interpreting a residual plot, bad fit]

At first sight, the linear trendline on the scatterplot may seem appropriate as it passes through the middle of the datapoints.

However, the residual plot shows the curved pattern to the data more clearly.

We can see the curved pattern to the data on the scatter plot itself but it may not be as obvious as on the residual plot.

We can also see that some of the residuals are a bit larger. This means that these points are further from the trendline and are not fitted well by the model.

Since the model used is a linear trendline, any curved pattern in the residuals indicates that the model is inappropriate.

In the image below, we see two curved patterns that indicate that the linear model is not appropriate.

[Figure: residual plot with a curved pattern]

A linear model is not appropriate if there is a clear curved ‘u’ or ‘n’ shaped pattern in the residuals.

Overall, a residual plot indicates that the model used for the data is good if there is no pattern. A residual plot containing a curved pattern indicates that the model may be inappropriate and a better model can be found.

If a clear curved pattern is found on the residual plot for a linear regression trendline, a non-linear model, such as an exponential regression model, may be more appropriate for the data.

How to Make a Residual Plot in Excel

A residual plot can be made in Excel using the following steps:

  1. Enter the data in two columns
  2. Select the data and insert a scatterplot
  3. Add a trendline and display the equation
  4. Use this equation to calculate predicted y-values using the given x-values
  5. Calculate the residuals by subtracting the predicted y-values from the given y-values
  6. Select the x-values and residuals and insert a scatterplot

For example, we will create a residual plot for the data:

𝑥    1    2     3     4     5     6     7     8
y    6    10    13    15    17    21    25    26

Step 1. Enter the two lists of data in two columns

We can enter the 𝑥 list in column A (cells A1:A8).

We can enter the y list in column B (cells B1:B8).

[Figure: entering bivariate data in Excel for a residual plot]

Step 2. Select the data and insert a scatterplot

Highlight both columns of data.

Click on ‘Insert’ from the top toolbar and then click on the ‘Scatter’ option from the charts list as shown.

Step 3. Add a trendline and display its equation

Right click on the scatterplot and click on the option to ‘Add Trendline’ from the list.
Choose the appropriate trendline (in this example we will use Linear) and select the box below to ‘Display Equation on Chart’.

Step 4. Use this equation to calculate predicted y-values using the given x-values

The equation displayed on the chart is the regression line.

This will be used to calculate predicted y values in this step.

The linear regression equation in this example is given by Excel as y = 2.869𝑥 + 3.7143.

We will represent this on Excel using the formula ‘=2.869*A1+3.7143’.

We need to start the equation with an equals sign and we need to use the asterisk symbol ‘*’ for the multiplication.

We are substituting the 𝑥 values in column A, so we replace 𝑥 in the equation with A1, which contains the first 𝑥 value.

In the next column (C) type ‘=2.869*A1+3.7143’ where A1 is the first 𝑥 value.
Hover over the bottom right of cell C1, then click and drag down on the thick cross that appears in order to copy this equation down column C.

Column C now contains the list of predicted y values.

Step 5. Calculate the residuals by subtracting the predicted y-values from the given y-values

In column E, we will calculate the residuals.

The residuals are equal to the actual y values that are given in column B subtract the predicted y values we calculated in column C.

In cell E1 we type ‘=B1-C1’. This will calculate the residual for the first data point.

Hover over the bottom right edge of cell E1 and click and drag down on the thick black cross in order to copy this calculation down the E column.

This column now contains all of the residual values.

Step 6. Select the x-values and residuals and insert a scatterplot

Select all of the 𝑥 values in column A and then select all of the residuals calculated in column E.

Hold the Ctrl button in order to select two columns like this.

Click on insert from the top toolbar and click on the scatter plot to produce the residual plot.

The produced residual plot is shown below.