C8 - Case Study

Analyzing and Predicting CVD Events

Imagine you are doing an internship in the data analytics team of a small health insurer with a portfolio of about 5,000 insurees. Recently, the health insurer has observed an increasing number of serious cardiovascular disease (CVD) events (e.g., strokes and myocardial infarctions). For that reason, the senior executives of the health insurer want to offer high-risk individuals in their portfolio the opportunity to participate in a lifestyle-change and sports program. The executives asked your supervisor, the head of data analytics, to identify the high-risk individuals in the portfolio.

Since your supervisor is always very busy, she has assigned this analytics project to you. So far, she has only acquired the two datasets needed for the project. The first dataset was obtained from an external data provider. It contains data on 10,000 individuals, including individual characteristics (e.g., gender and BMI) measured in 2010 and information on which of these individuals experienced a CVD event in the following 10 years. The second dataset covers all 5,000 insurees in the health insurer's portfolio. It contains only individual characteristics, which were measured recently. To explain the specific tasks and her expectations, she scheduled a short meeting with you. Before the meeting, she emailed you two CSV files containing the data, along with two codebooks:

Dataset 1: SSRC_dataset_case_study_1.csv

Codebook Dataset 1

Dataset 2: SSRC_dataset_case_study_2.csv

Codebook Dataset 2

In the meeting, your supervisor asked you to complete the following six tasks:

1. Data preparation

2. Data exploration

3. Regression analysis

4. Identification of high risks in the portfolio

5. Expectations regarding the number of CVD events in the next 10 years

6. Risk classification tool

Your supervisor asks you to document all analysis steps and their results in a well-structured R Markdown document.

Have fun!