Chapter 9 - Seeing relationships
The table below shows fuel efficiency (miles per gallon, mpg) at different driving speeds for cars in 1984 and 1997.
Speed (mph) | Mpg in 1984 | Mpg in 1997 |
---|---|---|
15 | 21 | 24.5 |
20 | 25.5 | 28 |
25 | 30 | 30.5 |
30 | 32 | 32 |
35 | 33.5 | 31.5 |
40 | 33.5 | 31 |
45 | 33.5 | 32 |
50 | 32 | 32.5 |
55 | 30.5 | 32.5 |
60 | 27.5 | 31.5 |
65 | 25 | 29 |
70 | 22.5 | 27 |
75 | 20 | 24.5 |
Design a line chart comparing the MPG in the two years. Also, calculate Pearson’s correlation coefficient and Spearman’s rank correlation coefficient between: a) speed and 1984 mpg, and b) speed and 1997 mpg. Finally, discuss why the two types of coefficients (Pearson’s & Spearman’s) are very similar or dissimilar.
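As a starting point for the two coefficients, here is a minimal sketch in plain Python that uses the values from the table above. The helper names (`pearson`, `average_ranks`, `spearman`) are illustrative and not taken from any library. Spearman's coefficient is computed here as Pearson's r on the ranks, with tied values sharing the mean of their rank positions. If SciPy is available, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` should give matching values.

```python
from math import sqrt

# Values copied from the table above.
speed = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
mpg_1984 = [21, 25.5, 30, 32, 33.5, 33.5, 33.5, 32, 30.5, 27.5, 25, 22.5, 20]
mpg_1997 = [24.5, 28, 30.5, 32, 31.5, 31, 32, 32.5, 32.5, 31.5, 29, 27, 24.5]


def pearson(x, y):
    """Pearson's r: co-variation divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


for label, mpg in [("1984", mpg_1984), ("1997", mpg_1997)]:
    print(f"speed vs {label} mpg: Pearson r = {pearson(speed, mpg):.3f}, "
          f"Spearman rho = {spearman(speed, mpg):.3f}")
```

Looking at the line chart alongside these numbers helps with the discussion, since both coefficients summarize how consistently mpg moves in one direction as speed increases.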
For the SAT scores and ACT scores data sets, design two scatter plots with participation (%) on the x-axis. The scatter plots should color the states by Region. Fit and show the LOWESS curve on both scatter plots. Most importantly, choose three states (ones where SAT participation is extremely low, extremely high, or moderate) and mark these states with arrows in both scatter plots. Using your marked states as examples, compare and contrast the stories revealed by the two scatter plots.
Data: ScoresSAT.csv and ScoresACT.csv
From the Gapminder 2012 dataset below, select at least four variables that interest you and design a correlation matrix (scatter plot matrix). Each scatter plot should show the correlation (r). Alternatively, you are welcome to download the most recent data from gapminder.org and design your chart. In addition, draw a heatmap of the scatter plot matrix (similar to Figure 9.14 in the TTA book). Add a few sentences analyzing the findings revealed by your scatter plot matrix.
Data: Gapminder2012.csv
Design a parallel coordinates chart for the ScoresSAT or ScoresACT dataset showing all the variables. The state names need not be labeled. However, you may color the lines by region. Redesign your chart by reordering the variables, i.e., changing which axis comes after which. Also, discuss how the reordering makes it easier or harder to reveal the relationships among the variables. Submit both charts.
Hint: You are welcome to use the Plotly code here.
Data: ScoresSAT.csv and ScoresACT.csv
In a scatter plot showing the body weight and brain weight of 62 mammal species, it is not easy to spot a relationship because of a few outliers. The correlation (r) and coefficient of determination (r²) for these two variables are 0.93 and 0.86, respectively. Transform the two variables using log10 and redesign the plot. Also calculate the new r and r². Your plots should also include the marginal distributions of the two variables.
Data: BodyBrain.csv
The website ourworldindata.org publishes several informative data graphics on some of the world’s largest problems. Your task in this homework is to focus on a particular year (say, 2020) and a fixed set of countries (at least 10), and design a parallel coordinates chart displaying the relationships between at least five different variables. Some example variables include child mortality rate, consumption of animal products, and public social spending as a share of GDP. You are free to choose any variables that you believe are worth comparing. You will need to download the data for the variables of interest from the website (most charts have a ‘download’ button). Each line in your chart will represent a country. Since you are focusing on a particular year, time (years) should not be one of the variables.
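Because each variable is downloaded as a separate file, the data have to be filtered to one year and merged into one record per country before charting. The sketch below shows one way to do this in plain Python. It assumes each downloaded CSV has `Entity` and `Year` columns plus one value column; that layout and the function names are assumptions, so check the header of each file.

```python
def values_for_year(rows, year, value_column,
                    entity_column="Entity", year_column="Year"):
    """Map each entity to its value in the given year, skipping blank cells.

    `rows` is an iterable of dicts, for example a csv.DictReader.
    The default column names are assumptions about the downloaded files.
    """
    result = {}
    for row in rows:
        if int(row[year_column]) == year and row[value_column] not in ("", None):
            result[row[entity_column]] = float(row[value_column])
    return result


def combine(variables, countries):
    """Build one record per country, keeping only countries that have
    a value for every variable (a parallel coordinates line needs all axes)."""
    table = []
    for country in countries:
        if all(country in values for values in variables.values()):
            record = {"country": country}
            record.update({name: values[country] for name, values in variables.items()})
            table.append(record)
    return table
```

Each record returned by `combine` corresponds to one line in the parallel coordinates chart, with one key per axis.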