C2 - Sample Solution

This page contains the sample solution for the Chapter 2 exercise sheet.

Start a project and import data

Task 1

Create an R project for solving this Exercise Sheet.

Task 2

Download the CSV file SSRC_data.csv and the R script SSRC_C2_template.R and put them in the R project folder you created in Task 1.

Task 3

Open the R script SSRC_C2_template.R.

Task 4

Use the read.csv() command to load the SSRC data into R and name the resulting data object SSRC_data.

# Load the dataset
SSRC_data <- read.csv("SSRC_data.csv")
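
If read.csv() fails with a "cannot open file" error, the file is usually not in the working directory. The following is a minimal sketch of checking the path before reading. It writes a tiny stand-in CSV to a temporary file so that it runs on its own; tmp_file and toy_csv are hypothetical names, and the stand-in file is not SSRC_data.csv.

```r
# Write a tiny stand-in CSV to a temporary file (hypothetical, not SSRC_data.csv)
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("age,gender,bmi", "64,male,27.9", "59,female,27.5"), tmp_file)

# Guard against a wrong path before reading
if (!file.exists(tmp_file)) {
  stop("File not found; check getwd() and the project folder.")
}

# Read the file and inspect the column types
toy_csv <- read.csv(tmp_file)
str(toy_csv)
```

For the exercise itself, the same check would be file.exists("SSRC_data.csv"), which only returns TRUE if the file sits in the project folder.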

Task 5

Get a first impression of the dataset by inspecting its first six rows and by viewing the data in spreadsheet mode.

# Check out the first 6 rows
head(SSRC_data)
  age gender education_level physical_activity_level  bmi
1  64   male          medium                     low 27.9
2  59 female             low                  medium 27.5
3  39 female            high                     low 27.4
4  30 female            high                     low 24.2
5  49   male          medium                     low 23.9
6  37   male          medium                  medium 30.7
# Open the data in spreadsheet mode
View(SSRC_data)
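
Besides head() and View(), a few base R functions give a quick structural overview. This is a minimal sketch that uses a stand-in data frame retyped from the six rows printed above, so it runs without the CSV file; toy_data is a hypothetical name.

```r
# Stand-in data: the six rows printed by head(SSRC_data), retyped by hand.
# toy_data is a hypothetical name, not part of the exercise.
toy_data <- data.frame(
  age = c(64, 59, 39, 30, 49, 37),
  gender = c("male", "female", "female", "female", "male", "male"),
  education_level = c("medium", "low", "high", "high", "medium", "medium"),
  physical_activity_level = c("low", "medium", "low", "low", "low", "medium"),
  bmi = c(27.9, 27.5, 27.4, 24.2, 23.9, 30.7)
)

dim(toy_data)      # number of rows and columns
names(toy_data)    # column names
str(toy_data)      # column types and first values
summary(toy_data)  # per-column summaries
```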

Task 6

Install and load the tidyverse package. (If you have already installed the package, loading it is sufficient.)

# Install tidyverse package (if you have not done it yet)
# install.packages("tidyverse")

# Load tidyverse package
library("tidyverse")

Isolating data

Task 7

Create a dataset that only contains the variables age and bmi and call this dataset SSRC_data_C2_task_7. Check out the first six rows of this dataset.

# Create the dataset 
SSRC_data_C2_task_7 <- select(SSRC_data, age, bmi)

# Check out dataset
head(SSRC_data_C2_task_7)
  age  bmi
1  64 27.9
2  59 27.5
3  39 27.4
4  30 24.2
5  49 23.9
6  37 30.7

Task 8

Create a dataset that only contains subjects with a bmi below 18.5 and call this dataset SSRC_data_C2_task_8. Check out the first six rows of this dataset.

# Create the dataset 
SSRC_data_C2_task_8 <- filter(SSRC_data, bmi < 18.5)

# Check out dataset
head(SSRC_data_C2_task_8)
  age gender education_level physical_activity_level  bmi
1  28   male          medium                     low 15.8
2  62 female            high                     low 18.3
3  31   male            high                     low 17.4
4  31   male            high                  medium 17.3
5  53   male             low                     low 16.8
6  27 female          medium                  medium 17.3

Task 9

Create a dataset that only contains individuals who have a low level of education and a bmi above 25 and call this dataset SSRC_data_C2_task_9. Check out the first six rows of this dataset.

# Create the dataset
SSRC_data_C2_task_9 <- filter(SSRC_data, education_level == "low" & bmi > 25)

# Check out dataset
head(SSRC_data_C2_task_9)
  age gender education_level physical_activity_level  bmi
1  59 female             low                  medium 27.5
2  57 female             low                     low 35.3
3  40 female             low                     low 26.3
4  56 female             low                  medium 33.7
5  67   male             low                     low 31.5
6  71 female             low                     low 29.6
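
In filter(), several conditions separated by commas are combined with a logical AND, so the comma form is an alternative to &. This is a minimal sketch on a stand-in data frame retyped from the head() output in Task 5; toy_data, with_ampersand and with_comma are hypothetical names.

```r
library(dplyr)  # dplyr is loaded as part of the tidyverse

# Stand-in data: the six rows printed in Task 5 (hypothetical helper object)
toy_data <- data.frame(
  age = c(64, 59, 39, 30, 49, 37),
  gender = c("male", "female", "female", "female", "male", "male"),
  education_level = c("medium", "low", "high", "high", "medium", "medium"),
  physical_activity_level = c("low", "medium", "low", "low", "low", "medium"),
  bmi = c(27.9, 27.5, 27.4, 24.2, 23.9, 30.7)
)

# Both calls keep rows where education is low AND bmi is above 25
with_ampersand <- filter(toy_data, education_level == "low" & bmi > 25)
with_comma     <- filter(toy_data, education_level == "low", bmi > 25)

# Compare the two results
all.equal(with_ampersand, with_comma)
```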

Task 10

Create a dataset that only contains individuals with a bmi between 18.5 and 25 (inclusive) and is restricted to the variables bmi and gender. Use the pipe operator to do so and call the dataset SSRC_data_C2_task_10. Check out the first six rows of this dataset.

# Create the dataset 
SSRC_data_C2_task_10 <- SSRC_data %>%
  filter(bmi >= 18.5 & bmi <= 25) %>%
  select(bmi, gender)

# Check out dataset
head(SSRC_data_C2_task_10)
   bmi gender
1 24.2 female
2 23.9   male
3 18.5 female
4 24.1   male
5 23.1 female
6 23.1   male
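
The range condition can also be written with dplyr's between(), which is a shortcut for x >= left & x <= right and therefore includes both boundaries. This is a minimal sketch on a stand-in data frame retyped from the head() output in Task 5; toy_data and toy_task_10 are hypothetical names.

```r
library(dplyr)  # dplyr is loaded as part of the tidyverse

# Stand-in data: selected columns of the six rows printed in Task 5
toy_data <- data.frame(
  age = c(64, 59, 39, 30, 49, 37),
  gender = c("male", "female", "female", "female", "male", "male"),
  bmi = c(27.9, 27.5, 27.4, 24.2, 23.9, 30.7)
)

# between(bmi, 18.5, 25) keeps 18.5 <= bmi <= 25
toy_task_10 <- toy_data %>%
  filter(between(bmi, 18.5, 25)) %>%
  select(bmi, gender)

toy_task_10
```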

Deriving information

Task 11

Use the summarize() command in combination with the filter() command to calculate the mean, maximum and minimum bmi of males with a low level of physical activity.

# Calculate mean, maximum and minimum bmi 
SSRC_data %>% 
  filter(gender == "male" & physical_activity_level == "low") %>% 
  summarize(
    mean_bmi = mean(bmi),
    maximum_bmi = max(bmi),
    minimum_bmi = min(bmi)
  )
  mean_bmi maximum_bmi minimum_bmi
1 27.46175          55        15.8
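
The result above contains no NA, so bmi evidently has no missing values in this subgroup. In other datasets, a single missing value makes mean(), max() and min() return NA unless na.rm = TRUE is set. This is a minimal sketch of that behaviour on a stand-in data frame in which one bmi is deliberately set to NA; toy_data and toy_summary are hypothetical names and the values are for illustration only.

```r
library(dplyr)  # dplyr is loaded as part of the tidyverse

# Stand-in data with one bmi deliberately set to NA (hypothetical values,
# not the real dataset)
toy_data <- data.frame(
  gender = c("male", "female", "male", "male"),
  physical_activity_level = c("low", "medium", "low", "medium"),
  bmi = c(27.9, 27.5, NA, 30.7)
)

toy_summary <- toy_data %>%
  filter(gender == "male" & physical_activity_level == "low") %>%
  summarize(
    mean_bmi_default = mean(bmi),              # NA because one value is missing
    mean_bmi_na_rm = mean(bmi, na.rm = TRUE),  # missing values dropped first
    maximum_bmi = max(bmi, na.rm = TRUE),
    minimum_bmi = min(bmi, na.rm = TRUE)
  )

toy_summary
```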

Task 12

Use the summarize() command in combination with the group_by() command to compare males and females with respect to their mean age and bmi.

# Compare males and females with respect to age and bmi 
SSRC_data %>% 
  group_by(gender) %>% 
  summarize(mean_bmi = mean(bmi), mean_age = mean(age))
# A tibble: 2 × 3
  gender mean_bmi mean_age
  <chr>     <dbl>    <dbl>
1 female     26.4     47.1
2 male       27.2     48.4
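
When comparing group means it often helps to report the group sizes as well, and across() avoids repeating mean() for every variable. This is a minimal sketch on a stand-in data frame retyped from the head() output in Task 5; toy_data and toy_by_gender are hypothetical names, and the numbers it produces describe only these six rows, not the full dataset.

```r
library(dplyr)  # dplyr is loaded as part of the tidyverse

# Stand-in data: selected columns of the six rows printed in Task 5
toy_data <- data.frame(
  age = c(64, 59, 39, 30, 49, 37),
  gender = c("male", "female", "female", "female", "male", "male"),
  bmi = c(27.9, 27.5, 27.4, 24.2, 23.9, 30.7)
)

toy_by_gender <- toy_data %>%
  group_by(gender) %>%
  summarize(
    n = n(),                                             # rows per group
    across(c(bmi, age), mean, .names = "mean_{.col}")    # mean_bmi, mean_age
  )

toy_by_gender
```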