# Load tidyverse package
library("tidyverse")
# Load "knitr" package -> needed for kable() command
library("knitr")
# Load the dataset
SSRC_data <- read.csv("SSRC_data.csv")
Get a first impression of the dataset.
kable(head(SSRC_data))
| age | gender | education_level | physical_activity_level | bmi |
|---|---|---|---|---|
| 64 | male | medium | low | 27.9 |
| 59 | female | low | medium | 27.5 |
| 39 | female | high | low | 27.4 |
| 30 | female | high | low | 24.2 |
| 49 | male | medium | low | 23.9 |
| 37 | male | medium | medium | 30.7 |
Transform the three categorical variables in the dataset into factor variables.
# Transform into factor variables
SSRC_data <- mutate(SSRC_data,
                    gender = as.factor(gender),
                    education_level = as.factor(education_level),
                    physical_activity_level = as.factor(physical_activity_level))
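Note that as.factor() sorts the factor levels alphabetically, so a variable with the values low, medium and high gets the level order high, low, medium. The following is a minimal sketch of how the levels could be set explicitly with factor() instead. It assumes that the natural order low, medium, high is wanted, which the analysis here does not require. The data frame `example_data` is a made-up toy example and not part of SSRC_data.

```r
# Hypothetical toy data, only for illustration (not SSRC_data)
example_data <- data.frame(
  education_level = c("medium", "low", "high", "high", "medium")
)

# as.factor() sorts the levels alphabetically
alphabetical <- as.factor(example_data$education_level)
levels(alphabetical)

# factor() with an explicit levels argument keeps the chosen order
natural_order <- factor(example_data$education_level,
                        levels = c("low", "medium", "high"))
levels(natural_order)
```

The level order does not change the data itself, but it determines the order in which categories appear in tables and plots.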
In the following, we will analyze the distribution of BMI graphically by means of a histogram and a density plot.
# Create histogram
ggplot(data = SSRC_data, mapping = aes(x = bmi)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
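The message above is printed because geom_histogram() falls back to its default of 30 bins when no bin width is given. As a sketch, the bin width can be set explicitly with the `binwidth` argument. The value of 1 BMI unit is an assumption chosen for illustration, and `bmi_example` is a small stand-in data frame that holds only the six BMI values shown in the table above.

```r
library(ggplot2)

# Stand-in data: the six BMI values from the head() output above
bmi_example <- data.frame(bmi = c(27.9, 27.5, 27.4, 24.2, 23.9, 30.7))

# Set the bin width explicitly (1 BMI unit is an assumed value)
histogram_plot <- ggplot(data = bmi_example, mapping = aes(x = bmi)) +
  geom_histogram(binwidth = 1)
histogram_plot
```

With the full dataset, it may be worth trying a few different bin widths, since the impression of the distribution can change with the choice.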
# Create density plot
ggplot(data = SSRC_data, mapping = aes(x = bmi)) +
  geom_density()
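To compare both views directly, the histogram and the density curve can also be drawn in one plot. The following is a minimal sketch, not part of the original analysis: it rescales the histogram to densities with after_stat(density) so that both layers share the same y-axis. The bin width of 1 is again an assumed value, and `bmi_example` holds only the six BMI values shown in the table above.

```r
library(ggplot2)

# Stand-in data: the six BMI values from the head() output above
bmi_example <- data.frame(bmi = c(27.9, 27.5, 27.4, 24.2, 23.9, 30.7))

combined_plot <- ggplot(data = bmi_example, mapping = aes(x = bmi)) +
  # Scale the histogram to densities so both layers share the y-axis
  geom_histogram(mapping = aes(y = after_stat(density)), binwidth = 1) +
  # Draw the density curve on top of the histogram
  geom_density()
combined_plot
```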
In the following, we will analyze the distribution of age graphically by means of: