Note: This functionality is currently experimental. I am still getting it to run reliably on UK Biobank RAP.

UKB RAP RStudio only

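The RStudio version on UKB RAP needs an updated dx-toolkit to use this functionality. You can run the code below to update it and install pandas. The pip3 lines are shell commands, so run them in the RStudio Terminal; the remaining lines go in the R console.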

pip3 install dxpy==0.354.0
pip3 install pandas
install.packages(c("vctrs", "stringr", "remotes", "rlang"))
remotes::install_github("laderast/xvhelper")
reticulate::use_python("/usr/bin/python3")

Running Table Exporter

If we have a large number of fields (more than 15-20) to extract from the pheno data, our call to extract_data() may fail. This is because extract_data() depends on a shared resource called the Thrift server, which enforces a hard limit of 2 minutes on query execution time.

If our query takes longer than that, we can launch Table Exporter, an app on the platform that does the extraction for us. This vignette outlines how to launch Table Exporter from your R session, monitor it, and find the CSV file it generates.
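As a rough illustration of the rule of thumb above, the choice between the two routes can be made from the number of fields. This is a minimal sketch: choose_route() is a hypothetical helper, not part of xvhelper, and the cutoff of 15 is an assumption taken from the lower end of the "15-20 fields" range.

```r
# Hypothetical helper (not part of xvhelper): pick an extraction route
# from the number of distinct fields requested.
# The default cutoff of 15 is an assumption, not a documented limit.
choose_route <- function(fields, max_fields = 15) {
  stopifnot(is.character(fields), length(fields) > 0)
  if (length(unique(fields)) > max_fields) {
    "table_exporter"
  } else {
    "extract_data"
  }
}

# A short request stays under the cutoff; a 40-field request does not.
choose_route(c("participant.eid", "participant.p31"))
choose_route(paste0("participant.p", 1:40))
```

The real limit is query time rather than field count, so a small request against a slow server can still time out.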

First we need the dataset ID and a character vector of the fields we want to extract. Once we have these two items, we can use launch_table_exporter() to start the Table Exporter app.

library(xvhelper)

ds_id <- find_dataset_id()
fields <- c("participant.eid", "participant.p31", "participant.p41202")
job_id <- launch_table_exporter(ds_id, fields)
#> → Job has been submitted as job-GY8YzK80Yq363Pz7Pbk7J5ZZ
#> → Use  check_job("job-GY8YzK80Yq363Pz7Pbk7J5ZZ)") to monitor job
job_id
#> [1] "job-GY8YzK80Yq363Pz7Pbk7J5ZZ"
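When the list of fields is long, typing each name by hand is error prone. The sketch below builds the vector from numeric field IDs, following the "participant.p<id>" pattern used in the example above. make_fields() is a hypothetical helper, not part of xvhelper, and it assumes every field lives on a single entity.

```r
# Hypothetical helper (not part of xvhelper): turn numeric field IDs into
# "<entity>.p<id>" names, always including the participant ID column.
make_fields <- function(field_ids, entity = "participant") {
  stopifnot(length(field_ids) > 0)
  # as.integer() avoids scientific notation for IDs such as 100000
  ids <- as.integer(field_ids)
  c(paste0(entity, ".eid"), paste0(entity, ".p", ids))
}

fields <- make_fields(c(31, 41202))
fields
```

The result has the same three names as the hand-written vector above and can be passed to launch_table_exporter() in the same way.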

When our Table Exporter job is running, we can check on its status using check_job():

check_job(job_id)
#>  Job is currently idle
#> NULL

Note that check_job() also returns NULL while the job has not finished. When our job finishes successfully, it returns a file ID instead (see below).

Our job can be in one of the following states:

  • idle
  • runnable
  • running
  • failed
  • done
  • terminated
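Rather than calling check_job() by hand, we could poll until a result appears. The following is a minimal sketch under stated assumptions: wait_for_result() is a hypothetical helper, not part of xvhelper, and it assumes the check function returns NULL in every unfinished state (this vignette only shows that for an idle job). A stand-in for check_job() is used so the example runs without the platform.

```r
# Hypothetical polling helper (not part of xvhelper). `check` is any function
# that returns NULL while the job is unfinished and a non-NULL value
# (for example a file ID) once it is done.
wait_for_result <- function(job_id, check, interval = 60, max_tries = 30) {
  for (i in seq_len(max_tries)) {
    result <- check(job_id)
    if (!is.null(result)) {
      return(result)
    }
    if (i < max_tries) {
      Sys.sleep(interval)  # wait before asking again
    }
  }
  # Gave up: the job may still be running, or it failed or was terminated
  NULL
}

# Stand-in for check_job(): returns NULL twice, then a made-up file ID.
calls <- 0
fake_check <- function(job_id) {
  calls <<- calls + 1
  if (calls < 3) NULL else "file-XXXXXXXXXXXXXXXXXXXXXXXX"
}

wait_for_result("job-example", fake_check, interval = 0)
```

On the platform the call would look something like wait_for_result(job_id, check_job). The max_tries cap matters because a failed or terminated job may never produce a file ID.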

If we need to terminate our Table Exporter job, we can use terminate_job():

terminate_job(job_id)
#> → Job job-GY8YzK80Yq363Pz7Pbk7J5ZZ has been terminated

Successful Table Exporter Run

Whether our job finishes successfully or fails, we will receive an email notification. We can check on the current status of our job with check_job(). Here we’re passing in a job ID for a successful run.

file_id <- check_job("job-GY4Zj180Yq3BJyFzg2ygGVX2")
file_id

We can download this to our JupyterLab/RStudio storage using:

system(glue::glue("dx download {file_id}"))
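Because check_job() returns NULL for an unfinished job, it can help to validate the file ID before handing it to the shell. The sketch below is one way to do that. dx_download_cmd() is a hypothetical helper, the output name pheno_data.csv is made up for illustration, and the ID pattern ("file-" followed by 24 alphanumeric characters) is an assumption based on the IDs shown in this vignette. It uses the -o option of dx download to choose the local file name.

```r
# Hypothetical helper (not part of xvhelper): build a `dx download` command
# for a single file ID, refusing NULL or malformed IDs.
# Assumes file IDs look like "file-" plus 24 alphanumeric characters.
dx_download_cmd <- function(file_id, out = "pheno_data.csv") {
  stopifnot(
    length(file_id) == 1,
    grepl("^file-[A-Za-z0-9]{24}$", file_id)
  )
  paste(
    "dx download", shQuote(file_id, type = "sh"),
    "-o", shQuote(out, type = "sh")
  )
}

cmd <- dx_download_cmd("file-GY4Zq9j0gJVQxgKGxV7YG6KJ")
cmd

# On the platform, something like the following would fetch and load the file:
# system(cmd)
# pheno <- read.csv("pheno_data.csv")
```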

Finding all jobs

We can see a list of all jobs and all their states by using find_all_jobs():

job_frame <- find_all_jobs()
job_frame
#>                                                           job_id
#> 1  project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY8YzK80Yq363Pz7Pbk7J5ZZ
#> 2  project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY6Zqb00Yq311qkkV97xbyKf
#> 3  project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61q000Yq3496Z4xxZj0V80
#> 4  project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61p280Yq31qFxBFg9PYppQ
#> 5  project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61kX00Yq34pvqfXz9kqzbf
#> 6  project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61k4Q0Yq3BVj7B92Bq8pG4
#> 7  project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61f700Yq3874f65jXbqz7Z
#> 8  project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY61bxQ0Yq37F3G25160P623
#> 9  project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY6074j0Yq31ybPXPX6q1pvj
#> 10 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY5vk580Yq38xpKf11bp8QJ5
#> 11 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY5vj9j0Yq3JZFfZ96qGBqqV
#> 12 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY5FPy80Yq31ybPXPX6pyqy9
#> 13 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY5FJJj0Yq3FvB5zzkXffjzG
#> 14 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY5F7Zj0Yq3JJ3b2853K9XkF
#> 15 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4g4q00Yq362FYzZ9xQXxg2
#> 16 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4f6B00Yq349vVKqKQJJgyV
#> 17 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4f4kQ0Yq3Kj0ZpZYxKX9Qq
#> 18 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4bq3Q0Yq38x1xJqZKXF1K5
#> 19 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4bp0j0Yq36XYpx8QGx1VkX
#> 20 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4bQYj0Yq38b3Pf0PXzVk0j
#> 21 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4b72Q0Yq32vqFp89yqGk4p
#> 22 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4Zzg80Yq303pgXq7Qf5yPB
#> 23 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4ZyQj0Yq303pgXq7Qf5yGk
#> 24 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4Zj180Yq3BJyFzg2ygGVX2
#> 25 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4YBp80Yq3Kj0ZpZYxKX1X1
#> 26 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4Y1JQ0Yq38b3Pf0PXzVYQx
#> 27 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY20GQQ0Yq34KBfpbJZ0gQ3v
#> 28 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY1B2Xj0Yq3738KP3ZF2vfqQ
#> 29 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY19ZZ80Yq36122fZvq22Ykg
#>                              name      state            app
#> 1                  Table exporter       idle table-exporter
#> 2                  Table exporter terminated table-exporter
#> 3                  Table exporter terminated table-exporter
#> 4                  Table exporter terminated table-exporter
#> 5                  Table exporter terminated table-exporter
#> 6                  Table exporter terminated table-exporter
#> 7                  Table exporter terminated table-exporter
#> 8                  Table exporter terminated table-exporter
#> 9                  Table exporter terminated table-exporter
#> 10                 Table exporter terminated table-exporter
#> 11                 Table exporter terminated table-exporter
#> 12                 Table exporter terminated table-exporter
#> 13                 Table exporter terminated table-exporter
#> 14                 Table exporter terminated table-exporter
#> 15                 Table exporter terminated table-exporter
#> 16                 Table exporter terminated table-exporter
#> 17                 Table exporter terminated table-exporter
#> 18                 Table exporter terminated table-exporter
#> 19                 Table exporter terminated table-exporter
#> 20                 Table exporter terminated table-exporter
#> 21                 Table exporter terminated table-exporter
#> 22                 Table exporter terminated table-exporter
#> 23                 Table exporter terminated table-exporter
#> 24                 Table exporter       done table-exporter
#> 25                 Table exporter       done table-exporter
#> 26                 Table exporter     failed table-exporter
#> 27 JupyterLab - 7/27/2023 9:24 AM terminated   dxjupyterlab
#> 28 JupyterLab - 7/27/2023 9:24 AM terminated   dxjupyterlab
#> 29 JupyterLab - 7/27/2023 9:24 AM terminated   dxjupyterlab
#>                      output_file
#> 1                           <NA>
#> 2                           <NA>
#> 3                           <NA>
#> 4                           <NA>
#> 5                           <NA>
#> 6                           <NA>
#> 7                           <NA>
#> 8                           <NA>
#> 9                           <NA>
#> 10                          <NA>
#> 11                          <NA>
#> 12                          <NA>
#> 13                          <NA>
#> 14                          <NA>
#> 15                          <NA>
#> 16                          <NA>
#> 17                          <NA>
#> 18                          <NA>
#> 19                          <NA>
#> 20                          <NA>
#> 21                          <NA>
#> 22                          <NA>
#> 23                          <NA>
#> 24 file-GY4Zq9j0gJVQxgKGxV7YG6KJ
#> 25 file-GY4YK100fqYgbqyqF3YqbyfF
#> 26                          <NA>
#> 27                          <NA>
#> 28                          <NA>
#> 29                          <NA>

We can find the finished jobs by looking for state == "done":

job_frame |>
  dplyr::filter(state == "done")
#>                                                          job_id           name
#> 1 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4Zj180Yq3BJyFzg2ygGVX2 Table exporter
#> 2 project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4YBp80Yq3Kj0ZpZYxKX1X1 Table exporter
#>   state            app                   output_file
#> 1  done table-exporter file-GY4Zq9j0gJVQxgKGxV7YG6KJ
#> 2  done table-exporter file-GY4YK100fqYgbqyqF3YqbyfF
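The job_id column carries a "project-...:" prefix, while the earlier examples pass a bare "job-..." ID to check_job(). The sketch below strips the prefix and pulls out the output files using base R only. It runs on a small mock data frame (jobs_demo, built from two rows of the output above) so that it is self-contained. Whether check_job() also accepts the prefixed form is not something this vignette shows.

```r
# Mock of three columns from find_all_jobs(), so the sketch runs on its own.
jobs_demo <- data.frame(
  job_id = c(
    "project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY8YzK80Yq363Pz7Pbk7J5ZZ",
    "project-GY19Qz00Yq34kBPz8jj0XKg0:job-GY4Zj180Yq3BJyFzg2ygGVX2"
  ),
  state = c("idle", "done"),
  output_file = c(NA, "file-GY4Zq9j0gJVQxgKGxV7YG6KJ")
)

# Keep finished jobs only
done <- jobs_demo[jobs_demo$state == "done", ]

# Drop everything up to and including the colon to get the bare job ID
done$job_only <- sub("^project-[^:]+:", "", done$job_id)

done$job_only
done$output_file
```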