vignettes/articles/Running-Table-Exporter.Rmd
Note: This functionality is currently experimental. I’m still working on getting it to run on UK Biobank RAP.
The RStudio version on UKB RAP needs an updated dx-toolkit to use this functionality. You can run the code below to update it and install pandas.
install.packages(c("vctrs", "stringr", "remotes", "rlang"))
remotes::install_github("laderast/xvhelper")
reticulate::use_python("/usr/bin/python3")
If we have a large number of fields (more than 15-20) to extract from the pheno data, then our call to extract_data() may fail.
This is because extract_data() depends on a shared resource called the Thrift Server, which enforces a hard limit of 2 minutes on query execution time.
If our query takes longer, we can launch Table Exporter, an app on the platform that will do the extraction for us. This vignette outlines how to launch Table Exporter from your R session, monitor it, and find the CSV file it generates.
The first thing we need to do is find the dataset id and build a vector of the fields we want. Once we have these two items, we can use launch_table_exporter() to start the Table Exporter app.
library(xvhelper)
ds_id <- find_dataset_id()
fields <- c("participant.eid", "participant.p31", "participant.p41202")
job_id <- launch_table_exporter(ds_id, fields)
#> → Job has been submitted as job-GY8YzK80Yq363Pz7Pbk7J5ZZ
#> → Use check_job("job-GY8YzK80Yq363Pz7Pbk7J5ZZ)") to monitor job
job_id
#> [1] "job-GY8YzK80Yq363Pz7Pbk7J5ZZ"
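The field vector above follows an entity.pNNN naming pattern, so it can also be built from numeric field IDs. Here is a minimal sketch, assuming that pattern holds for the fields of interest; fields with instance or array suffixes are not handled. make_fields() is a hypothetical helper for illustration and is not part of xvhelper.

```r
# Hypothetical helper (not part of xvhelper): build field names of the form
# "<entity>.p<ID>" and always include the participant ID column first.
make_fields <- function(field_ids, entity = "participant") {
  # Convert to integer so large IDs are not printed in scientific notation
  ids <- as.integer(field_ids)
  c(paste0(entity, ".eid"), paste0(entity, ".p", ids))
}

fields <- make_fields(c(31, 41202))
fields
#> [1] "participant.eid"    "participant.p31"    "participant.p41202"
```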
While our Table Exporter job is running, we can check on its status using check_job():
check_job(job_id)
#> ✔ Job is currently idle
#> NULL
Note that it also returns NULL. When our job finishes successfully, it will return a file-id (see below).
The states our job can be in are:
idle
runnable
running
failed
done
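Since check_job() returns NULL until the job is done, the check can be repeated in a loop. The following is a minimal sketch of such a polling loop, not a feature of xvhelper: wait_for_job() and its arguments are hypothetical, and it assumes only the behaviour shown in this vignette (NULL while unfinished, a file-id on success). What check_job() returns for a failed job is not covered here, so the loop simply gives up after max_tries attempts.

```r
# Hypothetical helper: poll a job until it yields a file ID or we give up.
# `check` is any function that returns NULL while the job is unfinished and
# a file ID once it is done (the behaviour shown for check_job() above).
wait_for_job <- function(job_id, check = xvhelper::check_job,
                         interval = 60, max_tries = 30) {
  for (i in seq_len(max_tries)) {
    result <- check(job_id)
    if (!is.null(result)) {
      return(result)
    }
    # Do not sleep after the final attempt
    if (i < max_tries) Sys.sleep(interval)
  }
  # Still NULL: the job may be running, failed, or terminated
  NULL
}
```

A call such as wait_for_job(job_id, interval = 120) would then block the R session until a file-id comes back or the attempts run out, so a generous interval is kinder to the platform than a tight loop.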
If we need to terminate our Table Exporter job, we can use terminate_job():
terminate_job(job_id)
#> → Job job-GY8YzK80Yq363Pz7Pbk7J5ZZ has been terminated
If our job finishes successfully or fails, we will receive an email notification. We can check on the current status of our job with check_job(). Here we’re passing in a job ID for a successful run.
file_id <- check_job("job-GY4Zj180Yq3BJyFzg2ygGVX2")
file_id
We can download this to our JupyterLab/RStudio storage using:
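As a hedged sketch, one option is the `dx download` command from the dx-toolkit, called from R with system2(). This assumes the dx command-line client is on the PATH and logged in to the right project; download_file() and its dry_run argument are hypothetical and not part of xvhelper.

```r
# Hypothetical helper (not part of xvhelper): download a platform file into
# the current working directory with the dx command-line client.
download_file <- function(file_id, dry_run = FALSE) {
  # Guard against passing the NULL that check_job() returns for unfinished jobs
  stopifnot(is.character(file_id), length(file_id) == 1,
            startsWith(file_id, "file-"))
  args <- c("download", file_id)
  if (dry_run) {
    # Return the arguments without contacting the platform
    return(args)
  }
  # system2() returns the exit status of the command; 0 means success
  system2("dx", args)
}
```

Calling download_file(file_id) with the value returned by check_job() should then place the CSV in the working directory of the session.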
We can see a list of all jobs and their states by using find_all_jobs():
job_frame <- find_all_jobs()
job_frame
#> job_id
#> 1 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY8YzK80Yq363Pz7Pbk7J5ZZ
#> 2 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY6Zqb00Yq311qkkV97xbyKf
#> 3 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61q000Yq3496Z4xxZj0V80
#> 4 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61p280Yq31qFxBFg9PYppQ
#> 5 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61kX00Yq34pvqfXz9kqzbf
#> 6 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61k4Q0Yq3BVj7B92Bq8pG4
#> 7 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61f700Yq3874f65jXbqz7Z
#> 8 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61bxQ0Yq37F3G25160P623
#> 9 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY6074j0Yq31ybPXPX6q1pvj
#> 10 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY5vk580Yq38xpKf11bp8QJ5
#> 11 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY5vj9j0Yq3JZFfZ96qGBqqV
#> 12 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY5FPy80Yq31ybPXPX6pyqy9
#> 13 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY5FJJj0Yq3FvB5zzkXffjzG
#> 14 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY5F7Zj0Yq3JJ3b2853K9XkF
#> 15 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4g4q00Yq362FYzZ9xQXxg2
#> 16 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4f6B00Yq349vVKqKQJJgyV
#> 17 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4f4kQ0Yq3Kj0ZpZYxKX9Qq
#> 18 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4bq3Q0Yq38x1xJqZKXF1K5
#> 19 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4bp0j0Yq36XYpx8QGx1VkX
#> 20 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4bQYj0Yq38b3Pf0PXzVk0j
#> 21 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4b72Q0Yq32vqFp89yqGk4p
#> 22 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4Zzg80Yq303pgXq7Qf5yPB
#> 23 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4ZyQj0Yq303pgXq7Qf5yGk
#> 24 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4Zj180Yq3BJyFzg2ygGVX2
#> 25 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4YBp80Yq3Kj0ZpZYxKX1X1
#> 26 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4Y1JQ0Yq38b3Pf0PXzVYQx
#> 27 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY20GQQ0Yq34KBfpbJZ0gQ3v
#> 28 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY1B2Xj0Yq3738KP3ZF2vfqQ
#> 29 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY19ZZ80Yq36122fZvq22Ykg
#> name state app
#> 1 Table exporter idle table-exporter
#> 2 Table exporter terminated table-exporter
#> 3 Table exporter terminated table-exporter
#> 4 Table exporter terminated table-exporter
#> 5 Table exporter terminated table-exporter
#> 6 Table exporter terminated table-exporter
#> 7 Table exporter terminated table-exporter
#> 8 Table exporter terminated table-exporter
#> 9 Table exporter terminated table-exporter
#> 10 Table exporter terminated table-exporter
#> 11 Table exporter terminated table-exporter
#> 12 Table exporter terminated table-exporter
#> 13 Table exporter terminated table-exporter
#> 14 Table exporter terminated table-exporter
#> 15 Table exporter terminated table-exporter
#> 16 Table exporter terminated table-exporter
#> 17 Table exporter terminated table-exporter
#> 18 Table exporter terminated table-exporter
#> 19 Table exporter terminated table-exporter
#> 20 Table exporter terminated table-exporter
#> 21 Table exporter terminated table-exporter
#> 22 Table exporter terminated table-exporter
#> 23 Table exporter terminated table-exporter
#> 24 Table exporter done table-exporter
#> 25 Table exporter done table-exporter
#> 26 Table exporter failed table-exporter
#> 27 JupyterLab - 7/27/2023 9:24 AM terminated dxjupyterlab
#> 28 JupyterLab - 7/27/2023 9:24 AM terminated dxjupyterlab
#> 29 JupyterLab - 7/27/2023 9:24 AM terminated dxjupyterlab
#> output_file
#> 1 <NA>
#> 2 <NA>
#> 3 <NA>
#> 4 <NA>
#> 5 <NA>
#> 6 <NA>
#> 7 <NA>
#> 8 <NA>
#> 9 <NA>
#> 10 <NA>
#> 11 <NA>
#> 12 <NA>
#> 13 <NA>
#> 14 <NA>
#> 15 <NA>
#> 16 <NA>
#> 17 <NA>
#> 18 <NA>
#> 19 <NA>
#> 20 <NA>
#> 21 <NA>
#> 22 <NA>
#> 23 <NA>
#> 24 file-GY4Zq9j0gJVQxgKGxV7YG6KJ
#> 25 file-GY4YK100fqYgbqyqF3YqbyfF
#> 26 <NA>
#> 27 <NA>
#> 28 <NA>
#> 29 <NA>
We can find the finished jobs by looking for state == "done":
job_frame |>
  dplyr::filter(state == "done")
#> job_id name
#> 1 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4Zj180Yq3BJyFzg2ygGVX2 Table exporter
#> 2 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4YBp80Yq3Kj0ZpZYxKX1X1 Table exporter
#> state app output_file
#> 1 done table-exporter file-GY4Zq9j0gJVQxgKGxV7YG6KJ
#> 2 done table-exporter file-GY4YK100fqYgbqyqF3YqbyfF
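To go from this filtered table to the files themselves, the output_file column can be pulled out alongside the bare job IDs. This is a minimal base R sketch that assumes only the columns shown in the find_all_jobs() output above (job_id, state, output_file); done_files() is a hypothetical helper and not part of xvhelper.

```r
# Hypothetical helper: collect output file IDs for finished jobs from a data
# frame shaped like the find_all_jobs() result.
done_files <- function(job_frame) {
  keep <- job_frame$state == "done" & !is.na(job_frame$output_file)
  done <- job_frame[keep, ]
  data.frame(
    # Drop the "project-...:" prefix to match the bare job IDs used with
    # check_job() earlier in this vignette
    job_id = sub("^project-[^:]+:", "", done$job_id),
    output_file = done$output_file,
    stringsAsFactors = FALSE
  )
}
```

Applied to the job_frame above, this would return the two rows with state "done" and their file-ids.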