Chapter 8 📈 Scatterplots and Correlation | Intro to R @ TJ (2024)

8.1 Interpreting Scatterplots

An Explanatory Variable is sometimes referred to as an independent variable or a predictor variable. This variable explains the variation in the response variable.

A Response Variable is sometimes referred to as a dependent variable or an outcome variable. The value of this variable responds to changes in the explanatory variable.

The simplest graph for displaying two quantitative variables simultaneously is a scatterplot, which uses an x-axis for (traditionally) the explanatory variable, and a y-axis for (traditionally) the response variable.

For each observational pair, a dot is placed at the intersection of its two values.

When we have two quantitative variables, we often want to investigate the relationship between them, that is, whether the two variables have an association with each other.

Let’s take a closer look at the scatterplot on the right. This is a scatterplot of the percent of high school graduates in each state who took the SAT and the state’s mean SAT Math score in a recent year. We think that “percent taking” will help explain “mean score.” So “percent taking” is the explanatory variable and “mean score” is the response variable. What do we see?

  • The graph shows a clear direction: the overall pattern moves from upper left to lower right. That is, states in which higher percents of high school graduates take the SAT tend to have lower mean SAT Math scores. We call this a negative association between the two variables.
  • The form of the relationship is slightly curved. More important, most states fall into one of two distinct clusters. In about half of the states, 25% or fewer graduates took the SAT. In the other half, more than 40% took the SAT.
  • The strength of a relationship in a scatterplot is determined by how closely the points follow a clear form. The overall relationship in the scatterplot is moderately strong: states with similar percents taking the SAT tend to have roughly similar mean SAT Math scores.

Ultimately, what we care about when analyzing scatterplots is linearity: whether the points roughly follow a straight line, so that fitting a straight line to them is reasonable.

8.2 Constructing Scatterplots

Let’s return to the Orange data. It looks like this:

Tree   age    circumference
1      118    30
1      484    58
1      664    87
1      1004   115
1      1231   120
1      1372   142

And you can find the documentation here:

?Orange
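Before plotting, it can help to glance at the data itself. The following is a minimal sketch that assumes only the built-in Orange dataset that ships with base R; the variable names first_rows and dims are introduced here for illustration.

```r
# Orange is a built-in dataset, so no packages need to be loaded.
first_rows <- head(Orange)   # the first six rows of the data
print(first_rows)

# Number of rows (observations) and columns (variables).
dims <- dim(Orange)
print(dims)

# Names and types of the three variables.
str(Orange)
```

Each row is one measurement of one tree, so the same tree appears several times at different ages.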

The age column represents the age of each tree. I want to investigate whether circumference is positively correlated with the age of the tree, so I should construct a scatter plot using the plot() command.

plot(y= Orange$circumference, x= Orange$age, main= "Age of Orange Trees vs. Circumference", xlab= "Age of Tree (Days since Dec. 31, 1968)", ylab= "Trunk Circumference (mm)")

FIGURE 8.1: A Scatter Plot of Orange Age and Orange Circumference
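As an alternative, the same plot can be requested with R's formula interface, which reads as "response ~ explanatory". This is only a sketch of an equivalent call, not a different method. The labels are copied from the plot() command above.

```r
# Same scatterplot, written with a formula: response ~ explanatory.
# The data argument tells plot() where to find the two columns.
plot(circumference ~ age, data = Orange,
     main = "Age of Orange Trees vs. Circumference",
     xlab = "Age of Tree (Days since Dec. 31, 1968)",
     ylab = "Trunk Circumference (mm)")
```

The formula form puts the response variable first, which makes it harder to swap the axes by accident.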

Let’s do DOFS (Direction, Outliers, Form, Strength):

  • The direction is clearly positive. As the age of the tree increases, the circumference increases too.
  • There don’t appear to be any major outliers.
  • The form of the scatterplot is roughly linear.
  • The strength of the scatterplot gets weaker as age increases. In other words, it is fan-shaped, which means the spread of trunk circumference increases as the age of the Orange trees increases. This is worrisome, because it makes it harder to judge linearity, and decide whether a linear model is appropriate. One explanation, in context, is that all the trees started off relatively the same, but there were small genetic variations between the trees that became magnified as they grew bigger.
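The fan shape described in the last bullet can also be checked numerically. The sketch below assumes that the standard deviation of circumference at each measured age is a reasonable summary of "spread"; the names spread_by_age, youngest, and oldest are introduced here for illustration.

```r
# Standard deviation of circumference among the trees measured at each age.
# tapply() groups the circumferences by age and applies sd() to each group.
spread_by_age <- tapply(Orange$circumference, Orange$age, sd)
print(round(spread_by_age, 1))

# Compare the spread at the youngest and the oldest measured ages.
youngest <- unname(spread_by_age[1])
oldest <- unname(spread_by_age[length(spread_by_age)])
print(oldest > youngest)
```

If the spread grows from the first age to the last, that agrees with the fan shape seen in the plot.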

At this point, we have to make a decision. Either we stop here, claiming that a linear model isn’t appropriate, or we judge that the increase in variance is mild enough for us to continue. We’ll continue.

8.3 Correlation

Look at the two scatterplots below. Which one has the stronger association?

Surprise! They actually have the same correlation. Look closely at the scales. One has been stretched out to make the points look “further apart”, even though numerically they are the same distance.

Since it’s easy to be fooled by different scales or by the amount of space around “clouds” of points in a scatterplot, we need a numerical measurement to supplement the graph. Correlation is the measure we use.
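To see why correlation is not fooled by scale, here is a small sketch using simulated data. The numbers are hypothetical and generated only for illustration. It rescales both variables, as a change of units would, and compares the correlation before and after.

```r
set.seed(1)                 # hypothetical data, for illustration only
x <- rnorm(50)
y <- 2 * x + rnorm(50)

r_original <- cor(x, y)

# Change the units: stretch x by 100, then shrink and shift y.
r_rescaled <- cor(100 * x, 0.5 * y + 10)

# The two correlations agree (up to floating-point rounding).
print(all.equal(r_original, r_rescaled))
```

Stretching or shifting an axis changes how the cloud of points looks, but not the value of r.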

The correlation coefficient r is always a number between -1 and 1. In addition:

  • r indicates the direction of a linear relationship by its sign: r>0 for a positive association and r<0 for a negative association.

  • Values of r near 0 indicate a very weak linear relationship.

  • The extreme values \(r=-1\) and \(r=1\) occur only in the case of a perfect linear relationship, when the points lie exactly along a straight line.
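These properties can be tried out on small made-up examples. The sketch below uses hypothetical values chosen only for illustration: two exact lines and one symmetric curve. The curve shows that r near 0 means a weak linear relationship, not necessarily no relationship at all.

```r
x <- 1:10

# Points exactly on an upward-sloping line: r equals 1 (up to rounding).
r_up <- cor(x, 3 * x + 2)

# Points exactly on a downward-sloping line: r equals -1 (up to rounding).
r_down <- cor(x, -0.5 * x + 7)

# A symmetric U-shaped curve: a clear pattern, but r is essentially 0.
r_curve <- cor(x, (x - 5.5)^2)

print(c(r_up, r_down, r_curve))
```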

To calculate the correlation coefficient in R, use the cor() command.

cor(y= Orange$circumference, x= Orange$age)
## [1] 0.9135

The correlation coefficient of about 0.91 indicates that the linear association between age and circumference is strong and positive.
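For readers curious about what cor() computes, the sketch below rebuilds the same number by hand. It assumes the standard definition of the Pearson correlation as the sum of products of z-scores divided by n - 1, which the chapter does not state explicitly. The names z_x, z_y, r_manual, and r_builtin are introduced here for illustration.

```r
x <- Orange$age
y <- Orange$circumference
n <- length(x)

# Standardize each variable: subtract its mean, divide by its standard deviation.
z_x <- (x - mean(x)) / sd(x)
z_y <- (y - mean(y)) / sd(y)

# Pearson correlation: sum of the products of z-scores, divided by n - 1.
r_manual <- sum(z_x * z_y) / (n - 1)
r_builtin <- cor(x, y)

print(c(r_manual, r_builtin))
print(all.equal(r_manual, r_builtin))
```

Because z-scores have no units, this also explains why rescaling either variable leaves r unchanged.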

8.3.1 Correlation is not Causation

If two variables have a strong correlation with each other, all it implies is that they are associated with each other. In other words, it suggests that the two variables have a relationship.

It does not guarantee that one variable causes the other. Ice cream sales and murder rates both tend to be higher in the summer, so the two are correlated. But that doesn’t mean that ice cream sends people into a murderous frenzy.

The only way to establish causation between two variables is with an experiment. You will learn more about experimental design later in this course.
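The ice cream example can be imitated with simulated data. Everything in the sketch below is hypothetical: the numbers are invented, and temperature is assumed to drive both variables. Neither variable is computed from the other, yet they end up strongly correlated.

```r
set.seed(42)  # simulated, hypothetical numbers; not real data
n <- 200
temperature <- rnorm(n, mean = 20, sd = 10)

# Each variable depends on temperature plus its own independent noise.
# Neither variable appears in the formula for the other.
ice_cream_sales <- 5 * temperature + rnorm(n, sd = 10)
murder_rate <- 0.3 * temperature + rnorm(n, sd = 1)

r_confounded <- cor(ice_cream_sales, murder_rate)
print(r_confounded)
```

A third variable that influences both, like temperature here, is enough to produce a strong correlation without any causal link between the two.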

8.4 Correlation Game

Your turn! Test your knowledge on correlation by guessing the correlation for each scatterplot.
