Here you will find the exercise sheet for Chapter 6: “Functions and Loops”.
Task 1: Create an R project for solving this exercise sheet.
Task 2: Download the CSV file SSRC_data.csv and the R script SSRC_C6_template.R and put both files in the R project folder you created in Task 1.
Task 3: Open the R script SSRC_C6_template.R.
Task 4: Load the tidyverse package.
Task 5: Use the read.csv() command to load the SSRC data into R and name the resulting data object SSRC_data.
Task 6: Get a first impression of the dataset using the str() command.
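A minimal base-R sketch of Tasks 5 and 6. Because SSRC_data.csv is not reproduced here, the sketch writes a small hypothetical toy data frame to a temporary file and reads it back; only the column names come from the exercise, the values are invented. In your project you would simply call read.csv("SSRC_data.csv").

```r
# Hypothetical toy data standing in for SSRC_data.csv; only the column
# names come from the exercise, the values are invented.
toy <- data.frame(
  bmi = c(24.1, 27.9, 31.2, 22.5, 26.3),
  age = c(23, 45, 61, 38, 52),
  gender = c("female", "male", "female", "male", "female"),
  education_level = c("medium", "low", "high", "medium", "low"),
  physical_activity_level = c("high", "low", "medium", "medium", "low")
)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# Task 5: in the project this would be read.csv("SSRC_data.csv")
SSRC_data <- read.csv(csv_path)

# Task 6: one line per column with its type and first values
str(SSRC_data)
```

Since R 4.0.0, read.csv() no longer turns text columns into factors by default (stringsAsFactors = FALSE), so str() should list the categorical columns as chr; this is why Task 7 converts them explicitly.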
Task 7: Transform the three categorical variables in the dataset into factor variables. Make sure that the levels of the physical_activity_level and education_level variables follow their natural order (from the lowest to the highest level).
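One way to do Task 7 in base R, sketched on a hypothetical four-row toy data frame. The level names "low", "medium" and "high" are an assumption (the sheet only mentions "low" and "medium"); check the actual values with unique() before reusing the level vector.

```r
# Hypothetical toy data; the values and the level "high" are assumptions.
SSRC_data <- data.frame(
  bmi = c(24.1, 27.9, 31.2, 22.5),
  age = c(23, 45, 61, 38),
  gender = c("female", "male", "female", "male"),
  education_level = c("medium", "low", "high", "medium"),
  physical_activity_level = c("high", "low", "medium", "medium")
)

# gender has no natural order, so the default (alphabetical) levels are fine
SSRC_data$gender <- factor(SSRC_data$gender)

# Without an explicit levels argument, factor() sorts alphabetically
# ("high", "low", "medium"); spell out the intended order instead.
level_order <- c("low", "medium", "high")
SSRC_data$education_level <- factor(SSRC_data$education_level,
                                    levels = level_order)
SSRC_data$physical_activity_level <- factor(SSRC_data$physical_activity_level,
                                            levels = level_order)

str(SSRC_data)
```

Any value that is not listed in levels silently becomes NA, so anyNA() is a quick check afterwards. Adding ordered = TRUE would create an ordered factor, for which lm() uses polynomial contrasts by default instead of comparing each level with "low"; a plain factor with a set level order keeps the coefficients easy to read.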
Task 8: Estimate a linear regression model with bmi as the dependent variable and all other variables in the SSRC dataset as independent variables. Call this model lm_mod.
Task 9: Create a data frame with one observation containing the four variables age, gender, education_level and physical_activity_level. The observation is a 45-year-old female with a medium level of education and a medium level of physical activity. Call this data frame SSRC_data_new.
Task 10: Use the lm_mod model to predict the bmi for the new observation described in Task 9.
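A sketch of Tasks 8 to 10 on simulated data. The SSRC_data built below is a hypothetical stand-in (1000 rows, invented values, assumed level names), so the printed prediction says nothing about the real data.

```r
# Hypothetical simulated stand-in for SSRC_data: invented values and
# assumed level names. Replace with the data prepared in Tasks 5 to 7.
set.seed(42)
n <- 1000
lv <- c("low", "medium", "high")
SSRC_data <- data.frame(
  age = sample(18:80, n, replace = TRUE),
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  education_level = factor(sample(lv, n, replace = TRUE), levels = lv),
  physical_activity_level = factor(sample(lv, n, replace = TRUE), levels = lv)
)
SSRC_data$bmi <- 22 + 0.05 * SSRC_data$age + rnorm(n, sd = 3)

# Task 8: "." stands for every column except the response bmi
lm_mod <- lm(bmi ~ ., data = SSRC_data)

# Task 9: one new observation with the same column names as the data
SSRC_data_new <- data.frame(
  age = 45,
  gender = "female",
  education_level = "medium",
  physical_activity_level = "medium"
)

# Task 10: predicted bmi for the new observation
predict(lm_mod, newdata = SSRC_data_new)
```

predict() converts the character columns of SSRC_data_new to factors using the levels stored in lm_mod, so the spelling of each value must match an existing level exactly.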
Task 11: Build a function that enables a convenient bmi prediction for a particular set of covariates.
The prediction should be based on the same model that you estimated in Task 8 (the same specification, fitted to data_input).
The arguments of the function should be a data frame called data_input (default: SSRC_data) and the four variables age_input (default: 45), gender_input (default: “female”), education_input (default: “medium”) and physical_input (default: “medium”).
Call the function bmi_pred_funct.
Task 12: Run the bmi_pred_funct function with its default values.
Task 13: Use the bmi_pred_funct function to predict the bmi of a 59-year-old male with a low level of education and a low level of physical activity.
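A possible bmi_pred_funct for Tasks 11 to 13, again on hypothetical simulated data. Refitting the model inside the function on data_input is an assumption about the intended design, but it is what makes the data_input argument useful for the bootstrap tasks. The sketch spells out the four predictors instead of writing bmi ~ ., which gives the same model as Task 8 only as long as the data contains exactly these five columns.

```r
# Hypothetical simulated stand-in for SSRC_data (invented values,
# assumed level names).
set.seed(42)
n <- 1000
lv <- c("low", "medium", "high")
SSRC_data <- data.frame(
  age = sample(18:80, n, replace = TRUE),
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  education_level = factor(sample(lv, n, replace = TRUE), levels = lv),
  physical_activity_level = factor(sample(lv, n, replace = TRUE), levels = lv)
)
SSRC_data$bmi <- 22 + 0.05 * SSRC_data$age + rnorm(n, sd = 3)

# Task 11: refit the Task 8 model on data_input and predict bmi for the
# covariate values passed in the remaining arguments.
bmi_pred_funct <- function(data_input = SSRC_data,
                           age_input = 45,
                           gender_input = "female",
                           education_input = "medium",
                           physical_input = "medium") {
  model <- lm(bmi ~ age + gender + education_level + physical_activity_level,
              data = data_input)
  new_obs <- data.frame(
    age = age_input,
    gender = gender_input,
    education_level = education_input,
    physical_activity_level = physical_input
  )
  predict(model, newdata = new_obs)
}

# Task 12: all defaults (45-year-old female, medium education and activity)
bmi_pred_funct()

# Task 13: override every covariate argument
bmi_pred_funct(age_input = 59, gender_input = "male",
               education_input = "low", physical_input = "low")
```

The default data_input = SSRC_data is only evaluated when the function is called, so it picks up whatever SSRC_data exists in the global environment at that moment.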
Task 14: Use the bmi_pred_funct function to predict the bmi for five females aged 20, 30, 40, 50 and 60 years. All of them have a medium education level and a medium physical activity level.
Task 15: Repeat Task 14, but this time use a for loop.
Task 16: Repeat Task 15, but this time save the result of each iteration in a vector called bmi_predictions. Inspect the content of this vector after creating it.
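A sketch of Tasks 14 to 16. To stay runnable on its own, it repeats a hypothetical simulated SSRC_data and one possible bmi_pred_funct that refits bmi ~ age + gender + education_level + physical_activity_level on data_input; neither is taken from the exercise materials.

```r
# Hypothetical simulated stand-in for SSRC_data (invented values).
set.seed(42)
n <- 1000
lv <- c("low", "medium", "high")
SSRC_data <- data.frame(
  age = sample(18:80, n, replace = TRUE),
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  education_level = factor(sample(lv, n, replace = TRUE), levels = lv),
  physical_activity_level = factor(sample(lv, n, replace = TRUE), levels = lv)
)
SSRC_data$bmi <- 22 + 0.05 * SSRC_data$age + rnorm(n, sd = 3)

# One possible bmi_pred_funct (assumed design: refit on data_input)
bmi_pred_funct <- function(data_input = SSRC_data, age_input = 45,
                           gender_input = "female", education_input = "medium",
                           physical_input = "medium") {
  model <- lm(bmi ~ age + gender + education_level + physical_activity_level,
              data = data_input)
  new_obs <- data.frame(age = age_input, gender = gender_input,
                        education_level = education_input,
                        physical_activity_level = physical_input)
  predict(model, newdata = new_obs)
}

ages <- c(20, 30, 40, 50, 60)

# Task 14: data.frame() recycles the length-1 arguments, so a single call
# returns one prediction per age
bmi_pred_funct(age_input = ages)

# Task 15: values are not auto-printed inside a loop, hence print()
for (a in ages) {
  print(bmi_pred_funct(age_input = a))
}

# Task 16: preallocate the result vector and fill it by position
bmi_predictions <- numeric(length(ages))
for (i in seq_along(ages)) {
  bmi_predictions[i] <- bmi_pred_funct(age_input = ages[i])
}
bmi_predictions
```

Because this bmi_pred_funct refits the model on every call, the loops fit the same model five times. That is harmless here, and it is the same behaviour that later lets the function work on bootstrap samples.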
In the following tasks we always focus on the bmi prediction for a person who has the default values of our bmi_pred_funct function. We call such a person the “default person”.
Task 17: Use our bmi_pred_funct function to check the prediction for our default person.
Task 18: Use the sample_n() command from the dplyr package to draw a random sample (n = 1000) with replacement (replace = TRUE) from our original SSRC dataset. Call this sample SSRC_data_bootstrap.
Task 19: Apply bmi_pred_funct to the SSRC_data_bootstrap dataset to predict the bmi of our default person.
Task 20: Use a for loop to repeat Tasks 18 and 19 1000 times, drawing a new sample in each iteration. Store the results in a vector called bmi_pred_boot and inspect it.
Task 21: Create a histogram to analyse the distribution of the bmi_pred_boot vector (just use hist() from base R).
Task 22: Calculate the mean, the standard deviation and the 0.025 and 0.975 quantiles of the bmi_pred_boot vector.
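Tasks 17 to 22 together form a nonparametric bootstrap of the default-person prediction. The sketch below again uses hypothetical simulated data and one possible bmi_pred_funct (refitting the model on data_input). It swaps dplyr::sample_n() for the base-R row resampling SSRC_data[sample(nrow(SSRC_data), 1000, replace = TRUE), ] so that it runs without dplyr installed; with dplyr the draw would be sample_n(SSRC_data, 1000, replace = TRUE).

```r
# Hypothetical simulated stand-in for SSRC_data (invented values).
set.seed(42)
n <- 1000
lv <- c("low", "medium", "high")
SSRC_data <- data.frame(
  age = sample(18:80, n, replace = TRUE),
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  education_level = factor(sample(lv, n, replace = TRUE), levels = lv),
  physical_activity_level = factor(sample(lv, n, replace = TRUE), levels = lv)
)
SSRC_data$bmi <- 22 + 0.05 * SSRC_data$age + rnorm(n, sd = 3)

# One possible bmi_pred_funct (assumed design: refit on data_input)
bmi_pred_funct <- function(data_input = SSRC_data, age_input = 45,
                           gender_input = "female", education_input = "medium",
                           physical_input = "medium") {
  model <- lm(bmi ~ age + gender + education_level + physical_activity_level,
              data = data_input)
  new_obs <- data.frame(age = age_input, gender = gender_input,
                        education_level = education_input,
                        physical_activity_level = physical_input)
  predict(model, newdata = new_obs)
}

# Task 17: default person, model fitted on the full data
bmi_pred_funct()

# Task 18: one sample of 1000 rows drawn with replacement
# (base-R stand-in for dplyr::sample_n(SSRC_data, 1000, replace = TRUE))
SSRC_data_bootstrap <- SSRC_data[sample(nrow(SSRC_data), 1000, replace = TRUE), ]

# Task 19: default person, model refitted on the bootstrap sample
bmi_pred_funct(data_input = SSRC_data_bootstrap)

# Task 20: repeat Tasks 18 and 19 1000 times, one new resample per iteration
n_boot <- 1000
bmi_pred_boot <- numeric(n_boot)
for (b in seq_len(n_boot)) {
  boot_sample <- SSRC_data[sample(nrow(SSRC_data), 1000, replace = TRUE), ]
  bmi_pred_boot[b] <- bmi_pred_funct(data_input = boot_sample)
}
head(bmi_pred_boot)

# Task 21: distribution of the bootstrap predictions
hist(bmi_pred_boot,
     main = "Bootstrap distribution of the default-person prediction",
     xlab = "Predicted bmi")

# Task 22: summary statistics
mean(bmi_pred_boot)
sd(bmi_pred_boot)
quantile(bmi_pred_boot, probs = c(0.025, 0.975))
```

The 0.025 and 0.975 quantiles form a 95% percentile bootstrap interval for the model's predicted mean bmi of the default person; it describes the uncertainty of the fitted value, not the spread of individual bmi values. If a resample misses a factor level entirely, predict() stops with an error; that is very unlikely for the simulated data here but can happen with rare levels in real data.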