A Bayesian network for generating categorical synthetic data for assessing cardiovascular risk.
data(cvd_bayes_net)
A Bayesian network of class CPTgrain, built with the gRain package, that represents the data. Variable types are as follows:
age
Age category of the patient, given as a string of age ranges.
htn
Does the patient have hypertension? Y/N, based on a systolic blood pressure threshold of 150 mm Hg.
treat
Is the patient receiving hypertension treatment? Y/N
smoking
Smoking status. Y/N, based on a threshold of 10 pack-years.
race
Self-reported race from a survey question: AmInd (American Indian), Asian/PI (Asian/Pacific Islander),
Black/AfAm (Black/African American), or White.
gender
Gender of the patient: Male or Female. NA means that the patient chose not to have gender recorded.
t2d
Does the patient have type 2 diabetes? Y/N
bmi
Body mass index of the patient, in kg/m^2.
sbp
Systolic blood pressure, in mm Hg.
rs10757278
SNP data. Associated with race and total cholesterol.
rs1333049
SNP data. Associated with race and total cholesterol. Always co-occurs with rs10757278.
rs4665058
SNP data. Associated with race and total cholesterol.
rs8055236
SNP data. Variant is associated with increased risk.
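Several of the variables above are categorical versions of underlying measurements (systolic blood pressure, pack-years, age, BMI). The helper below is a minimal sketch of that mapping under stated assumptions: categorize_patient is hypothetical and not part of the package, the thresholds are assumed to be inclusive (>=), and the age and BMI bin edges are read off the level labels that appear in the example output, not from the package source.

```r
# Hypothetical helper (not part of the package): maps raw measurements onto
# categorical levels like those described above. Inclusive thresholds (>=) and
# the exact bin edges are assumptions for illustration.
categorize_patient <- function(sbp, pack_years, age, bmi) {
  data.frame(
    # hypertension: "Y" at or above 150 mm Hg systolic (assumed inclusive)
    htn = factor(ifelse(sbp >= 150, "Y", "N"), levels = c("N", "Y")),
    # smoking: "Y" at or above 10 pack-years (assumed inclusive)
    smoking = factor(ifelse(pack_years >= 10, "Y", "N"), levels = c("N", "Y")),
    # age bins (0,20], (20,40], ..., with 0 included in the first bin
    age = cut(age, breaks = c(0, 20, 40, 55, 70, 90),
              labels = c("0-20", "20-40", "40-55", "55-70", "70-90"),
              include.lowest = TRUE),
    # BMI bins (15,18], (18,25], (25,31], (31,Inf), with 15 included in the first bin
    bmi = cut(bmi, breaks = c(15, 18, 25, 31, Inf),
              labels = c("15-18", "18-25", "25-31", "31+"),
              include.lowest = TRUE)
  )
}

# Two example patients
res <- categorize_patient(sbp = c(120, 160), pack_years = c(0, 25),
                          age = c(35, 62), bmi = c(22.5, 33))
print(res)
```

cut() is used for the multi-level variables because it keeps the result as a factor with a fixed level order, which matches how the simulated data are summarized.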
Note that not all covariates (including cardiovascular risk) are generated by this network.
Further details about how to generate the entire dataset from this network can be found in
generate_data_from_network, in the vignettes folder.
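Because cvd_bayes_net is a gRain CPTgrain object, it is assembled from conditional probability tables. The sketch below builds a toy two-node network (htn influencing treat) with the gRain functions cptable(), compileCPT(), grain(), querygrain() and setEvidence(), to show how such an object is put together and queried. The probabilities are hypothetical illustration values, not taken from cvd_bayes_net, and the toy structure is not a claim about how the shipped network is wired.

```r
library(gRain)

yn <- c("N", "Y")

# Hypothetical probabilities for illustration only; NOT the values in cvd_bayes_net.
# P(htn): marginal table for hypertension
p_htn <- cptable(~htn, values = c(0.65, 0.35), levels = yn)

# P(treat | htn): treat varies fastest, so the values are
# P(N | htn = N), P(Y | htn = N), P(N | htn = Y), P(Y | htn = Y)
p_treat <- cptable(~treat | htn, values = c(0.95, 0.05, 0.45, 0.55), levels = yn)

# Combine the tables into a network and compile it for inference
toy_net <- compile(grain(compileCPT(list(p_htn, p_treat))))
class(toy_net)

# Marginal distribution of treatment: P(Y) = 0.65 * 0.05 + 0.35 * 0.55
marg <- querygrain(toy_net, nodes = "treat")
marg$treat

# Distribution of treatment given evidence that htn = "Y"
cond <- querygrain(setEvidence(toy_net, nodes = "htn", states = "Y"), nodes = "treat")
cond$treat
```

The same querygrain() and simulate() calls apply to cvd_bayes_net itself; only the node names and tables differ.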
library(gRain)
data(cvd_bayes_net)
# generate categorical data for 1000 patients
testData <- simulate(cvd_bayes_net, nsim = 1000)
summary(testData)
#>     age       htn     treat   smoking    htn              race         bmi
#>  0-20 : 77   N:655   N:797   N:855    N   :   0   AmInd     :  4   15-18:163
#>  20-40:319   Y:345   Y:203   Y:145    Y   :   0   Asian/PI  :169   18-25:693
#>  40-55:246                            NA's:1000   Black/AfAm: 54   25-31: 92
#>  55-70:231                                        White     :773   31+  : 52
#>  70-90:127
#>
#>  t2d     genotype       tchol      gender
#>  N:926   1111: 13   <160   :278   M:451
#>  Y: 74   1110: 16   160-199:459   F:549
#>          1100:189   200-239:147
#>          0010: 48   240+   :116
#>          0001:349
#>          0000:385