C4 - Exercise Sheet

This is the exercise sheet for Chapter 4: “Data Visualization”.

Getting ready

Task 1

Create an R project for solving this Exercise Sheet.

Task 2

Download the CSV file SSRC_data.csv and the R script SSRC_C4_template.R and put them in the R project folder you created in Task 1.

Task 3

Open the R script SSRC_C4_template.R.

Task 4

Load the tidyverse package.

Task 5

Use the read.csv() command to load the SSRC data into R and call the respective data object SSRC_data.

Task 6

Get a first impression of the dataset by inspecting its structure with the str() command.
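Tasks 5 and 6 can be sketched as follows. Because SSRC_data.csv is not reproduced here, the sketch first writes a tiny stand-in file to a temporary location; its values and the column name sex are invented for illustration, and only the other column names are taken from this sheet. In your own script you would pass "SSRC_data.csv" to read.csv() instead.

```r
# Hypothetical stand-in for SSRC_data.csv so that the sketch runs on its own.
# The values and the column name `sex` are assumptions, not the real data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "age,bmi,sex,education_level,physical_activity_level",
  "34,22.1,female,high,moderate",
  "51,27.8,male,low,low",
  "29,24.3,female,medium,high",
  "45,31.0,male,medium,low"
), csv_path)

# Task 5: read the CSV file into a data frame called SSRC_data
SSRC_data <- read.csv(csv_path)

# Task 6: compact overview of the variables and their types
str(SSRC_data)
```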

Task 7

Transform the three character variables in the dataset into factor variables. Make sure that the levels of the physical_activity_level and education_level variables are ordered in a reasonable way. (You learned how to do that in Chapter 3.)
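One possible way to approach Task 7 with base R's factor() is sketched below. The toy data frame, the column name sex, and the level labels ("low", "medium", "high", "moderate") are assumptions made for illustration; check the actual labels in your data with unique() before fixing the order, and use the approach from Chapter 3 if it differs.

```r
# Toy data with assumed labels; the real SSRC data may use different ones.
SSRC_data <- data.frame(
  sex = c("female", "male", "female", "male"),
  education_level = c("low", "high", "medium", "low"),
  physical_activity_level = c("high", "low", "moderate", "low"),
  stringsAsFactors = FALSE
)

# Unordered category: the default (alphabetical) level order is fine
SSRC_data$sex <- factor(SSRC_data$sex)

# Ordered categories: state the level order explicitly, from lowest to highest
SSRC_data$education_level <- factor(
  SSRC_data$education_level,
  levels = c("low", "medium", "high")
)
SSRC_data$physical_activity_level <- factor(
  SSRC_data$physical_activity_level,
  levels = c("low", "moderate", "high")
)

# Check the resulting level order
levels(SSRC_data$education_level)
levels(SSRC_data$physical_activity_level)
```

Setting the levels explicitly matters for the plots later on, because ggplot2 arranges categories on an axis in level order rather than in order of appearance.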

Task 8

What kind of plot would be useful to analyze the …

Bar Charts

Task 9

Use a bar chart to analyze the distribution of the physical_activity_level variable.

Task 10

Create the same bar chart as in Task 9 but with colored bars and a reduced bar width of 0.5.
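A minimal sketch for Tasks 9 and 10, assuming ggplot2 (which library(tidyverse) attaches). The toy data, the level labels, and the color "steelblue" are illustrative choices, not part of the exercise.

```r
library(ggplot2)

# Toy data with assumed level labels, ordered from low to high
SSRC_data <- data.frame(
  physical_activity_level = factor(
    c("low", "low", "low", "moderate", "moderate", "high"),
    levels = c("low", "moderate", "high")
  )
)

# geom_bar() counts the observations per level;
# fill sets the bar color, width narrows the bars (Task 10)
bar_plot <- ggplot(SSRC_data, aes(x = physical_activity_level)) +
  geom_bar(fill = "steelblue", width = 0.5)

print(bar_plot)
```

Leaving out the fill and width arguments gives the plain bar chart asked for in Task 9.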

Histograms

Task 11

Create a histogram to analyze the distribution of the variable age.

Task 12

Create the same histogram as in Task 11 but change the binwidth to 1.
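Tasks 11 and 12 might look like the sketch below, again assuming ggplot2 and using invented age values. With binwidth = 1, each bar covers exactly one year of age.

```r
library(ggplot2)

# Invented ages for illustration only
SSRC_data <- data.frame(age = c(21, 21, 22, 25, 25, 25, 30, 34, 34, 40))

# binwidth = 1 gives one bin per year of age (Task 12);
# without it, geom_histogram() falls back to 30 bins (Task 11)
hist_plot <- ggplot(SSRC_data, aes(x = age)) +
  geom_histogram(binwidth = 1)

print(hist_plot)
```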

Density Plots

Task 13

Create a density plot to analyze the distribution of the variable bmi.

Task 14

Create a plot that depicts the distributions of bmi for males and females in a single plot.
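One way to sketch Tasks 13 and 14 is with overlapping density curves. The column name sex and all values below are assumptions; use whatever name the third character variable has in your data.

```r
library(ggplot2)

# Invented values; `sex` is a hypothetical column name
SSRC_data <- data.frame(
  bmi = c(21.5, 23.0, 24.8, 27.1, 22.4, 25.6, 28.3, 30.2),
  sex = factor(rep(c("female", "male"), each = 4))
)

# Mapping sex to fill draws one density curve per group;
# alpha makes the areas semi-transparent so both remain visible
density_plot <- ggplot(SSRC_data, aes(x = bmi, fill = sex)) +
  geom_density(alpha = 0.5)

print(density_plot)
```

Dropping fill = sex from aes() yields the single density curve asked for in Task 13.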

Boxplots

Task 15

Create a set of parallel boxplots to describe the relationship between education_level and bmi.
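A possible sketch for Task 15, with invented values and assumed level labels. Putting the factor on the x-axis and the numeric variable on the y-axis produces one boxplot per education level.

```r
library(ggplot2)

# Invented values; the level labels are assumptions
SSRC_data <- data.frame(
  education_level = factor(
    rep(c("low", "medium", "high"), each = 3),
    levels = c("low", "medium", "high")
  ),
  bmi = c(27.5, 29.1, 31.0, 24.2, 25.8, 26.9, 21.7, 23.3, 24.0)
)

# One box per factor level, drawn side by side in level order
box_plot <- ggplot(SSRC_data, aes(x = education_level, y = bmi)) +
  geom_boxplot()

print(box_plot)
```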

Scatterplots

Task 16

Create a scatterplot to analyze the relationship between age and bmi.

Task 17

Create the same scatterplot as in Task 16 and add a line that approximates the relationship between age and bmi. (Use method = "lm".)

Task 18

Create the same scatterplot as in Task 17 and add three horizontal lines that indicate bmi levels of 18.5, 25 and 30.
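Tasks 16 to 18 build on each other, so they can be sketched as one plot with three layers. The data values and the dashed line type are illustrative assumptions.

```r
library(ggplot2)

# Invented values for illustration only
SSRC_data <- data.frame(
  age = c(22, 27, 31, 38, 44, 52, 59, 66),
  bmi = c(21.3, 22.8, 24.1, 23.9, 26.4, 27.0, 28.8, 29.5)
)

scatter_plot <- ggplot(SSRC_data, aes(x = age, y = bmi)) +
  geom_point() +                                   # Task 16: scatterplot
  geom_smooth(method = "lm") +                     # Task 17: fitted straight line
  geom_hline(yintercept = c(18.5, 25, 30),         # Task 18: three reference lines
             linetype = "dashed")

print(scatter_plot)
```

Passing a vector to yintercept draws all three horizontal lines with a single geom_hline() call; three separate calls would work equally well.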