A synthetic dataset for predicting cardiovascular risk in patient cohorts.
cvd_patient is a data frame with variables related to cardiovascular disease (CVD) risk. cvd_genodata is a smaller subset of the data that adds a few genetic covariates.
patientID: Patient identifier. Unique identifier for Health Hospital University patients, formed as "HHUID" followed by an 8-digit code.
age: Patient age category, given as an age range (0-20, 20-40, 40-55, 55-70, 70-90).
htn: Does the patient have hypertension? Y/N, using a systolic blood pressure threshold of 150 mm Hg.
treat: Is the patient receiving hypertension treatment? Y/N.
smoking: Smoking history, Y/N, using a threshold of 10 pack-years.
race: Race, based on a self-reported survey question. AmInd (American Indian), Asian/PI (Asian/Pacific Islander), Black/AfAm (Black/African American), White.
gender: Gender of the patient. M (male), F (female); NA means that the patient did not want gender recorded.
t2d: Does the patient have type 2 diabetes? Y/N.
numAge: Numerical age, in years.
bmi: Body mass index of the patient, in kg/m^2.
sbp: Systolic blood pressure, in mm Hg.
cvd: Cardiovascular disease, Y/N, extracted from patient billing codes.
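The coding rules above for the patient identifier (patientID), hypertension (htn), and smoking status (smoking) can be sketched in R as follows. This is a minimal illustration, not code from the package that ships these datasets: the helper names (is_valid_patient_id, as_yn, flag_htn, pack_years, flag_smoking) are hypothetical, both cutoffs are assumed to be inclusive (>=), and pack-years is assumed to follow the usual definition of packs smoked per day times years smoked. The documentation does not say whether htn was computed from the sbp column, so flag_htn(cvd_patient$sbp) is not guaranteed to match htn.

```r
# Hypothetical helpers sketching the coding rules described above.
# Assumptions (not stated in the source): cutoffs are inclusive (>=), and
# pack-years = packs smoked per day * years smoked.

# patientID: "HHUID" followed by exactly 8 digits
is_valid_patient_id <- function(id) {
  grepl("^HHUID[0-9]{8}$", id)
}

# Encode a logical vector as a Y/N factor, keeping NA as NA
as_yn <- function(x) {
  factor(ifelse(x, "Y", "N"), levels = c("N", "Y"))
}

# htn: Y when systolic blood pressure (mm Hg) meets the 150 cutoff
flag_htn <- function(sbp, threshold = 150) {
  as_yn(sbp >= threshold)
}

# smoking: Y when cumulative exposure meets the 10 pack-year cutoff
pack_years <- function(packs_per_day, years_smoked) {
  packs_per_day * years_smoked
}
flag_smoking <- function(packs_per_day, years_smoked, threshold = 10) {
  as_yn(pack_years(packs_per_day, years_smoked) >= threshold)
}

# Toy inputs for illustration only (not drawn from cvd_patient)
is_valid_patient_id(c("HHUID00000002", "HHUID123"))  # TRUE FALSE
flag_htn(c(120, 150, 165))                           # N Y Y
flag_smoking(c(0.5, 1, 2), c(10, 10, 3))             # N Y N (5, 10, 6 pack-years)
```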
cvd_genodata, which covers a subset of the patients, adds genetic covariates as one SNP genotype column per variant:
rs10757278: SNP genotype.
rs1333049: SNP genotype.
rs4665058: SNP genotype.
rs8055236: SNP genotype.
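Because cvd_genodata is a subset of the patients with SNP genotypes added, a natural first step is to confirm the overlap on patientID and to look at CVD rates by genotype. The sketch below relies only on the column names shown in the summary output (patientID, cvd, and the rs columns); the helpers is_patient_subset and cvd_rate_by_genotype and the toy data frames are hypothetical illustrations, not part of the package.

```r
# Hypothetical helper: TRUE if every patientID in `sub` also occurs in `full`
# (%in% also works when patientID is stored as a factor)
is_patient_subset <- function(full, sub, id = "patientID") {
  all(sub[[id]] %in% full[[id]])
}

# Hypothetical helper: share of patients with cvd == "Y" within each genotype
# of one SNP column (e.g. "rs1333049"); returns a named numeric vector
cvd_rate_by_genotype <- function(data, snp) {
  rates <- tapply(data$cvd == "Y", data[[snp]], mean)
  setNames(as.vector(rates), names(rates))
}

# Toy data frames for illustration only (not rows of the real datasets)
toy_full <- data.frame(
  patientID = sprintf("HHUID%08d", 1:6),
  cvd = c("N", "Y", "N", "N", "N", "N")
)
toy_geno <- data.frame(
  patientID = sprintf("HHUID%08d", c(2, 3, 5, 6)),
  rs1333049 = c("CC", "CC", "GG", "GG"),
  cvd = c("Y", "N", "N", "N")
)

is_patient_subset(toy_full, toy_geno)        # TRUE
cvd_rate_by_genotype(toy_geno, "rs1333049")  # CC 0.5, GG 0.0

# With the datasets loaded via data(cvd_patient) and data(cvd_genodata):
# is_patient_subset(cvd_patient, cvd_genodata)
# cvd_rate_by_genotype(cvd_genodata, "rs1333049")
```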
# load full dataset
data(cvd_patient)
# look at summary of data
summary(cvd_patient)
#>        patientID            age          htn       treat     smoking
#>  HHUID00000002:     1   0-20 : 42341   N:310265   N:358070   N:374322
#>  HHUID00000004:     1   20-40:141423   Y:114930   Y: 67125   Y: 50873
#>  HHUID00000007:     1   40-55:112907
#>  HHUID00000009:     1   55-70: 88033
#>  HHUID00000010:     1   70-90: 40491
#>  HHUID00000011:     1
#>  (Other)      :425189
#>        race            t2d       gender       numAge            bmi
#>  AmInd     :  2293   N:393906   F:243138   Min.   : 0.00   Min.   :15.00
#>  Asian/PI  : 77881   Y: 31289   M:182057   1st Qu.:29.00   1st Qu.:19.00
#>  Black/AfAm: 23888                         Median :44.00   Median :21.00
#>  White     :321133                         Mean   :44.01   Mean   :21.98
#>                                            3rd Qu.:59.00   3rd Qu.:24.00
#>                                            Max.   :90.00   Max.   :36.00
#>
#>      tchol            sbp          cvd
#>  Min.   :155.0   Min.   : 73.0   N:375325
#>  1st Qu.:160.0   1st Qu.:115.0   Y: 49870
#>  Median :180.0   Median :124.0
#>  Mean   :187.7   Mean   :135.5
#>  3rd Qu.:206.0   3rd Qu.:165.0
#>  Max.   :245.0   Max.   :222.0
#>
# load genotype dataset
data(cvd_genodata)
summary(cvd_genodata)
#>       patientID            age         htn      treat    smoking
#>  HHUID00000004:    1   0-20 : 1067   N:49454   N:53681   N:52992
#>  HHUID00000022:    1   20-40:47186   Y:10420   Y: 6193   Y: 6882
#>  HHUID00000032:    1   40-55:11621
#>  HHUID00000036:    1
#>  HHUID00000038:    1
#>  HHUID00000090:    1
#>  (Other)      :59868
#>        race           t2d     gender       numAge            bmi
#>  AmInd     :  344   N:55248   F:33348   Min.   :19.00   Min.   :15.00
#>  Asian/PI  :11106   Y: 4626   M:26526   1st Qu.:26.00   1st Qu.:19.00
#>  Black/AfAm: 3286                       Median :32.00   Median :21.00
#>  White     :45138                       Mean   :32.11   Mean   :22.04
#>                                         3rd Qu.:39.00   3rd Qu.:24.00
#>                                         Max.   :44.00   Max.   :36.00
#>
#>      tchol            sbp       rs10757278 rs1333049  rs4665058  rs8055236
#>  Min.   :155.0   Min.   : 78.0   AA:45719   CC:45719   AA: 4924   GG:20914
#>  1st Qu.:160.0   1st Qu.:114.0   GG:14155   GG:14155   CC:54950   TT:38960
#>  Median :180.0   Median :122.0
#>  Mean   :187.4   Mean   :129.8
#>  3rd Qu.:204.0   3rd Qu.:133.0
#>  Max.   :245.0   Max.   :219.0
#>
#>    cvd
#>  N:57973
#>  Y: 1901
#>
#>
#>
#>
#>
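The cvd counts in the summary output give the CVD prevalence of each dataset directly: 49870 of 425195 patients in cvd_patient and 1901 of 59874 in cvd_genodata. The short R calculation below copies those counts; the commented lines show the equivalent computation on the data frames, assuming they were loaded with data() as in the example. Per the numAge summaries, cvd_genodata only spans ages 19 to 44, so the two rates are not directly comparable.

```r
# Y/N counts of cvd copied from the summary() output above
cvd_counts <- list(
  cvd_patient  = c(N = 375325, Y = 49870),
  cvd_genodata = c(N = 57973,  Y = 1901)
)

# Proportion of patients with cvd == "Y" in each dataset
cvd_prevalence <- sapply(cvd_counts, function(x) unname(x["Y"] / sum(x)))
round(cvd_prevalence, 3)
# cvd_patient 0.117, cvd_genodata 0.032

# The same quantity computed from the loaded data frames:
# mean(cvd_patient$cvd == "Y")
# mean(cvd_genodata$cvd == "Y")
```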