8  Working with JSON on the DNAnexus Platform

Preparing for this Chapter

You will not need to login to the platform for this chapter.

You’ll want to cd into the JSON folder in your project.

cd JSON/

You’ll also need to install jq if it’s not yet on your system. If you’re on Ubuntu/WSL, I recommend installing via apt install. If you’re on Mac, I recommend installing via brew install.

You can check if jq is already installed by typing

which jq

8.1 Learning Objectives

By the end of this chapter, you should be able to:

  • Define and Explain what JSON is and its elements and structures
  • Explain how JSON is used on the DNAnexus platform
  • Explain the basic structure of a JSON file
  • Generate JSON output from dx find data and dx find jobs
  • Execute simple jq commands to extract information from a JSON file
  • Execute advanced jq filters using conditionals to process output from dx find files or dx find jobs.

8.2 What is JSON?

JSON is short for JavaScript Object Notation. It is a format used for storing information on the web and for interacting with APIs.

8.3 How is JSON used on the DNAnexus Platform?

JSON is used in multiple ways on the DNAnexus Platform, including:

  • Submitting Jobs with complex parameters/inputs
  • Specifying parameters of an app or workflow (dxapp.json and dxworkflow.json)
  • Output of commands such as dx find data or dx find jobs with the --json flag
  • Extracting environment variables from dx env

Underneath it all, all interactions with the DNAnexus API server are JSON submissions.

You can see that JSON is used in many places on the DNAnexus platforms, and for many purposes. So having basic knowledge of JSON can be really helpful.

8.4 Elements of a JSON file

Here are the main elements of a JSON file:

  • Key:Value Pair. Example: "name": "Ted Laderas". In this example, our key is “name” and our value is “Ted Laderas”
  • List [] - a collection of values. All values have to be the same data type. Example: ["mom", "dad"]
  • Object {} - A collection of key/value pairs, enclosed with curly brackets ({})

Here’s the example we’re going to use. We’ll do most of our processing of JSON on our own machine.

#| eval: false
#| filename: "json_data/example.json"
{
  "report_html": {
    "dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
  },
  "stats_txt": {
    "dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
  },
  "users": ["laderast", "ted", "tladeras"]
}
Check Yourself

What does the names value contain in the following JSON? Is it a list, object or key:value pair?

{
  "names": ["Ted", "Lisa", "George"]
}

It is a list. We know this because the value contains a [].

{
  "names": ["Ted", "Lisa", "George"]
}

8.5 Nestedness

JSON wouldn’t be helpful if it were only limited to a single level or key:values. Values can be lists or objects as well. For example, in our example JSON, we can see that the value of report_html is a JSON object:

"report_html": {
    "dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
  }

The object is:

{
    "dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
  }

When we work with extracting information, we’ll have to take this nested structure in mind.

8.6 Outputting JSON with dx find commands

We already encountered the dx find data command, which we used in the batch processing chapter.

If we use the --json option, then the file information will be outputted in json format. This command will return a list of JSON file objects.

For example:

#| eval: false
#| filename: 05-JSON/dx-find-data-json.sh
dx find data --path ted_demo:data/ --json

The output will look like this:

[
    {
        "project": "project-GGyyqvj0yp6B82ZZ9y23Zf6q",
        "id": "file-FvQGZb00bvyQXzG3250XGbgz",
        "describe": {
            "id": "file-FvQGZb00bvyQXzG3250XGbgz",
            "project": "project-GGyyqvj0yp6B82ZZ9y23Zf6q",
            "class": "file",
            "name": "small-celegans-sample.fastq",
            "state": "closed",
            "folder": "/json_data",
            "modified": 1665003035646,
            "size": 16801690
        }
    },
    {
        "project": "project-GGyyqvj0yp6B82ZZ9y23Zf6q",
        "id": "file-B5Q8z8V5g3bX5qQ9y9YQ006k",
        "describe": {
            "id": "file-B5Q8z8V5g3bX5qQ9y9YQ006k",
            "project": "project-GGyyqvj0yp6B82ZZ9y23Zf6q",
            "class": "file",
            "name": "NC_001422.fasta",
            "state": "closed",
            "folder": "/json_data",
            "modified": 1665003035645,
            "size": 5539
        }
    }
]
Test your knowledge

What is returned when we run this code? Is it a JSON object, or a list of JSON objects?

#| eval: false
#| filename: 05-JSON/dx-find-jobs-json.sh
dx find jobs --json

It’s hard to tell at first, but We are returning a list of JSON objects, each of which corresponds to a single job run within our project.

8.7 Learning jq gradually

As you can see, JSON can be very complicated to process and extract information from, depending on how many levels you go deep in a JSON document. That’s why jq exists

jq is a utility that is made to process JSON. All jq commands have this format:

#| eval: false
jq '<filter>' <JSON file>

Filters are the heart of processing data using jq. They let you extract JSON values or keys and process them with conditionals to filter data down. For example, you can do something like the following:

  1. Select all elements where the job status is failed
  2. For each of these elements, output the job-status id

You can see how jq can be extremely powerful.

You can also pipe JSON from standard output into jq. This will be really helpful for us when we start using pipes of data files from dx find data.

8.8 Our simplest filter: .

One of the biggest uses for jq is for more readable formatting. Oftentimes, the JSON returned by an API call is really hard to read. It can be returned as a single line of text, and it is really hard for humans to see the actual structure of the JSON response.

If we run jq . on a JSON file, we’ll see that it makes it much more readable.

#| eval: false
#| filename: JSON/jq-simple.sh
jq '.' json_data/example.json

8.9 Getting the keys

We can extract the keys from the top level JSON by using 'keys' as our filter.

#| eval: false
#| filename: JSON/jq-keys.sh
jq 'keys' json_data/example.json

8.10 Extracting a value from a container: jq .report_html

So, say we want to extract the value from the report_html key in the above.

We can specify the key that we’re interested in to extract the value from that key.

#| eval: false
#| filename: JSON/jq-report.sh
jq '.report_html' json_data/example.json
Try it out

This is the JSON file we’re going to be working with, in json_data/example.json.

#| eval: false
#| filename: "json_data/example.json"
{
  "report_html": {
    "dnanexus_link": "file-G4x7GX80VBzQy64k4jzgjqgY"
  },
  "stats_txt": {
    "dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
  },
  "users": ["laderast", "ted", "tladeras"]
}

In your terminal, try out:

jq '.stats_txt' json_data/example.json

What do you return?

#| eval: false
#| filename: JSON/jq-stats-txt.sh
jq '.stats_txt' json_data/example.json

We’ll return the following JSON object, which contains a single key-value pair.

{
  "dnanexus_link": "file-G4x7GXQ0VBzZxFxz4fqV120B"
}

8.10.1 Going one level deeper

We can extract the actual value associated with the dnanexus_link key within report_html by chaining onto our filter:

#| eval: false
#| filename: JSON/jq-nested.sh
jq '.report_html.dnanexus_link' json_data/example.json
Try It Out

What is returned when you run this code?

#| eval: false
#| filename: JSON/jq-nested.sh
jq '.report_html.dnanexus_link' json_data/example.json

Running this command should return the value of dnanexus-link within report_html:

"file-G4x7GX80VBzQy64k4jzgjqgY"

8.11 Conditional Filters using jq

One natural use case for using jq on the DNAnexus platform is to rerun failed jobs.

Failed jobs can occur when using normal priority, which focuses on using spot instances. So, if we ran a series of jobs, we would want to restart these failed jobs.

This is a bit of code that would allow us to select those jobs that have failed.

#| eval: false
#| filename: JSON/dx-find-jobs-jq-clone.sh
dx find jobs --json |\ 
jq '.[] | select (.state | contains("failed")) | .id' |\
xargs -I% sh -c "dx run --clone %"

The second line contains the jq filter that does the magic. Remember, the filter is contained within the single quotes ('').

The last line contains "dx run --clone %".

Let’s take apart the different parts of the jq filter (Figure 8.1):

graph LR;
  A[".[]"] --> B{"|"} 
  B --> C["select (.state | contains('failed'))"] 
  C --> D{"|"} 
  D --> E[".id"]
Figure 8.1: Taking apart the jq filter.

Note that the pipes in this filter apply only to the jq filter, so don’t mix them up with the other pipes in our overall Bash statement.

The first part of the filter, .[], says that we want to process the list (remember, dx find jobs returns a list of objects).

The second part of the filter, select (.state | contains('failed')) will let us select objects in the list that have a state of failed. This list of objects is then passed on the next part of the filter.

The last part of the filter, .id, returns the the file ids for our failed jobs.

This is a basic pattern for selecting objects that meet a criteria, and can be really helpful when you want more control of your batch processing.

Check Yourself

How would you modify the code below to terminate all jobs that had state running using dx terminate?

#| eval: false
dx find jobs --json |\ 
jq '.[] | select (.state | contains("failed")) | .id' |\
xargs -I% sh -c "dx run --clone %"
#| eval: false
dx find jobs --json | \ 
jq '.[] | select (.state | contains("running")) | .id' | \
xargs -I% sh -c "dx terminate %"

8.12 Using JSON as an Input

This section is made to help you in writing JSON files. If you build an app or a workflow, you will need to edit the dxapp.json or dxworkflow.json files to enable your executables to be runnable.

8.12.1 Writing and modifying JSON

I know that JSON is supposed to be human readable. However, there are a lot of little quibbles that don’t make it easily human writable.

I highly recommend using an editor such as VS Code, with the appropriate JSON plugin. A JSON Visualizer such as the JSON Crack Extension will be extremely helpful as well.

JSON Visualizer Plugin

Using the visualizer plugin and this tutorial will help you write well formed JSON, and point out any issues you might have. It’s easy to misplace a comma, or a bracket, and this tool helps you write well-formed JSON.