# Load tidyverse package
library("tidyverse")
# Load "knitr" package -> needed for kable() command
library("knitr")
# Load the dataset
SSRC_data <- read.csv("SSRC_data.csv")
Get a first impression of the dataset.
kable(head(SSRC_data))
age | gender | education_level | physical_activity_level | bmi |
---|---|---|---|---|
64 | male | medium | low | 27.9 |
59 | female | low | medium | 27.5 |
39 | female | high | low | 27.4 |
30 | female | high | low | 24.2 |
49 | male | medium | low | 23.9 |
37 | male | medium | medium | 30.7 |
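Besides printing the first rows, the column types and basic summary statistics give a quick overview. The following is a minimal sketch using base R only. The data frame `toy` is a hypothetical stand-in built from the six rows shown above, so that the example runs without the CSV file; with the real data you would pass `SSRC_data` instead.

```r
# Hypothetical stand-in: the six rows printed by head(SSRC_data) above
toy <- data.frame(
  age = c(64, 59, 39, 30, 49, 37),
  gender = c("male", "female", "female", "female", "male", "male"),
  education_level = c("medium", "low", "high", "high", "medium", "medium"),
  physical_activity_level = c("low", "medium", "low", "low", "low", "medium"),
  bmi = c(27.9, 27.5, 27.4, 24.2, 23.9, 30.7)
)

# Column names, types, and the first few values of each column
str(toy)

# Minimum, quartiles, mean, and maximum for the numeric columns
summary(toy[, c("age", "bmi")])
```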
Transform the three categorical variables in the dataset into factor variables.
# Transform into factor variables
SSRC_data <- mutate(SSRC_data,
                    gender = as.factor(gender),
                    education_level = as.factor(education_level),
                    physical_activity_level = as.factor(physical_activity_level))
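Note that `as.factor()` sorts the levels alphabetically, which for labels such as low/medium/high gives the order high, low, medium. If the natural order matters (for example in plot legends or tables), the levels can be set explicitly with `factor()`. The sketch below uses a hypothetical vector `education` holding the six `education_level` values shown in the table above; the level order low < medium < high is an assumption about how the categories are meant to be read.

```r
# Hypothetical stand-in: education_level values from the first six rows
education <- c("medium", "low", "high", "high", "medium", "medium")

# as.factor() orders the levels alphabetically
alphabetical <- as.factor(education)
levels(alphabetical)

# factor() with an explicit levels argument keeps the intended order
# (assumed here to be low < medium < high)
ordered_by_meaning <- factor(education, levels = c("low", "medium", "high"))
levels(ordered_by_meaning)
```

The same `factor(..., levels = ...)` call could be used inside `mutate()` in place of `as.factor()`.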
In the following, we will analyze the distribution of BMI graphically by means of a histogram and a density plot.
# Create histogram
ggplot(data = SSRC_data, mapping = aes(x = bmi)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
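The message above appears because no bin width was specified, so `geom_histogram()` falls back to 30 bins. A minimal sketch of setting the width explicitly follows. The value `binwidth = 1` (one BMI unit per bar) is an assumption chosen for illustration, and `toy` is a hypothetical stand-in holding the six BMI values shown earlier; with the real data you would use `SSRC_data`.

```r
library(ggplot2)

# Hypothetical stand-in: BMI values from the first six rows of the dataset
toy <- data.frame(bmi = c(27.9, 27.5, 27.4, 24.2, 23.9, 30.7))

# Setting binwidth explicitly suppresses the stat_bin() message;
# binwidth = 1 is an illustrative choice, not a recommendation
p_hist <- ggplot(data = toy, mapping = aes(x = bmi)) +
  geom_histogram(binwidth = 1)

p_hist
```

A smaller bin width shows more detail but also more noise, so it is worth trying a few values.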
# Create density plot
ggplot(data = SSRC_data, mapping = aes(x = bmi)) +
  geom_density()
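To compare the two views directly, the histogram can be rescaled to the density scale and the density curve drawn on top of it. This is a sketch under stated assumptions: it relies on `after_stat()`, which requires ggplot2 3.3.0 or later, uses an illustrative `binwidth = 1`, and works on a hypothetical stand-in `toy` with the six BMI values shown earlier rather than the full dataset.

```r
library(ggplot2)

# Hypothetical stand-in: BMI values from the first six rows of the dataset
toy <- data.frame(bmi = c(27.9, 27.5, 27.4, 24.2, 23.9, 30.7))

# Histogram on the density scale (bar areas sum to 1) with the
# kernel density estimate overlaid as a line
p_overlay <- ggplot(data = toy, mapping = aes(x = bmi)) +
  geom_histogram(mapping = aes(y = after_stat(density)), binwidth = 1) +
  geom_density()

p_overlay
```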
In the following, we will analyze the distribution of age graphically by means of: