# MAGMA
MAGMA (Maximum Accessible Genome for Mtb Analysis) is a pipeline for comprehensive genomic analyses of Mycobacterium tuberculosis with a focus on clinical decision making as well as research.
## Salient features of the implementation
- Fine-grained control over resource allocation (CPU/memory/storage)
- Reliance on Bioconda for installing packages, for reproducibility
- Ease of use on a range of infrastructure (cloud, on-prem HPC clusters, servers, or local machines)
- Resumability for failed processes
- Centralized locations for specifying analysis parameters and hardware requirements
  - MAGMA parameters (`default_parameters.config`, which can be overridden using a `params.yaml` file)
  - Hardware requirements (`conf/server.config` or `conf/pbs.config` or `conf/low_memory.config`)
  - Execution (software) requirements (`conf/docker.config` or `conf/conda_local.config` or `conf/podman.config`)
## Prerequisites
- Nextflow
- `git`: the version control tool used by the pipeline
- `Java-11` or `Java-17` LTS release (preferred)

> :warning: **Check your `java` version!** The `java` version should NOT be an `internal jdk` release! You can check the release via `java --version`; notice the `LTS` next to the `OpenJDK` line.
```
$ java -version
openjdk version "17.0.7" 2023-04-18 LTS
OpenJDK Runtime Environment (build 17.0.7+7-LTS)
OpenJDK 64-Bit Server VM (build 17.0.7+7-LTS, mixed mode, sharing)
```
- Download Nextflow

  ```
  $ curl -s https://get.nextflow.io | bash
  ```

- Make Nextflow executable

  ```
  $ chmod +x nextflow
  ```

- Add `nextflow` to your `PATH` (for example `/usr/local/bin/`)

  ```
  $ mv nextflow /usr/local/bin
  ```

- Sanity check the `nextflow` installation

  ```
  $ nextflow info
  Version: 23.04.1 build 5866
  Created: 15-04-2023 06:51 UTC (08:51 SAST)
  System: Mac OS X 12.6.5
  Runtime: Groovy 3.0.16 on OpenJDK 64-Bit Server VM 17.0.7+7-LTS
  Encoding: UTF-8 (UTF-8)
  ```
:heavy_check_mark: With this you’re all set with Nextflow. Next stop, conda or docker: pick one!
## Samplesheet
A dummy samplesheet is provided here. The minimal samplesheet must contain the following fields:
```
Sample,R1,R2
S0001,/full_path_to_directory_of_fastq_files/S0001_01_R1.fastq.gz,/full_path_to_directory_of_fastq_files/S0001_01_R2.fastq.gz
S0002,/full_path_to_directory_of_fastq_files/S0002_01_R1.fastq.gz,/full_path_to_directory_of_fastq_files/S0002_01_R2.fastq.gz
S0003,/full_path_to_directory_of_fastq_files/S0003_01_R1.fastq.gz,
```
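Before launching a run, it can save time to verify that every FASTQ path in the samplesheet is absolute and points at an existing file. A minimal sketch of such a pre-flight check (a hypothetical helper, not part of MAGMA):

```shell
# Hypothetical pre-flight check (not part of MAGMA): warn about FASTQ paths in
# a minimal Sample,R1,R2 samplesheet that are missing or not absolute.
check_samplesheet() {
    sheet="$1"
    # Skip the header row; fields are Sample,R1,R2 (R2 may be empty)
    tail -n +2 "$sheet" | while IFS=',' read -r sample r1 r2; do
        for f in "$r1" "$r2"; do
            # Single-end rows have an empty R2 field; skip it
            [ -z "$f" ] && continue
            case "$f" in
                /*) ;;  # absolute path: fine
                *)  echo "WARN: $sample: not an absolute path: $f" ;;
            esac
            [ -f "$f" ] || echo "WARN: $sample: file not found: $f"
        done
    done
    echo "samplesheet check done"
}
```

Run it as `check_samplesheet your_samplesheet.csv` before `nextflow run` and fix any warnings first.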
If you have the metadata from the sequencing instrument, you can specify further information in the samplesheet:
```
Study,Sample,Library,Attempt,R1,R2,Flowcell,Lane,Index Sequence
Study_Name,S0001,1,1,/full_path_to_directory_of_fastq_files/S0001_01_R1.fastq.gz,/full_path_to_directory_of_fastq_files/S0001_01_R2.fastq.gz,1,1,1
Study_Name,S0002,1,1,/full_path_to_directory_of_fastq_files/S0002_01_R1.fastq.gz,/full_path_to_directory_of_fastq_files/S0002_01_R2.fastq.gz,1,1,1
Study_Name,S0003,1,1,/full_path_to_directory_of_fastq_files/S0003_01_R1.fastq.gz,/full_path_to_directory_of_fastq_files/S0003_01_R2.fastq.gz,1,1,1
Study_Name,S0004,1,1,/full_path_to_directory_of_fastq_files/S0004_01_R1.fastq.gz,/full_path_to_directory_of_fastq_files/S0004_01_R2.fastq.gz,1,1,1
```
Here’s a formatted version of the CSV above, including all optional fields:
| Study | Sample | Library | Attempt | R1 | R2 | Flowcell | Lane | Index Sequence |
|---|---|---|---|---|---|---|---|---|
| Study_Name | S0001 | 1 | 1 | /full_path_to_directory_of_fastq_files/S0001_01_R1.fastq.gz | /full_path_to_directory_of_fastq_files/S0001_01_R2.fastq.gz | 1 | 1 | 1 |
| Study_Name | S0002 | 1 | 1 | /full_path_to_directory_of_fastq_files/S0002_01_R1.fastq.gz | /full_path_to_directory_of_fastq_files/S0002_01_R2.fastq.gz | 1 | 1 | 1 |
| Study_Name | S0003 | 1 | 1 | /full_path_to_directory_of_fastq_files/S0003_01_R1.fastq.gz | /full_path_to_directory_of_fastq_files/S0003_01_R2.fastq.gz | 1 | 1 | 1 |
| Study_Name | S0004 | 1 | 1 | /full_path_to_directory_of_fastq_files/S0004_01_R1.fastq.gz | /full_path_to_directory_of_fastq_files/S0004_01_R2.fastq.gz | 1 | 1 | 1 |
## Customization
Note: We are currently working on the transition to nf-core standards (see https://github.com/TORCH-Consortium/MAGMA/issues/188), which would add standardized configurations and pipeline structure and let MAGMA benefit from the nf-core/modules and nf-core/configs projects.
The pipeline parameters are distinct from Nextflow parameters; it is therefore recommended that they be provided in a `yml` file, as shown below:
```yaml
# Sample contents of my_parameters_1.yml file
input_samplesheet: /path/to/your_samplesheet.csv
only_validate_fastqs: true
conda_envs_location: /path/to/folder/with/conda_envs
```
When running the pipeline, use profiles to ensure smooth execution on your computing system. The pipeline employs two types of profiles: execution environment and memory/computing requirements.

Execution environment profiles:
- `conda_local`
- `docker`
- `podman`

Memory/computing profiles:
- `pbs` (suited to high-performance computing clusters)
- `server` (suited to local servers)
- `low_memory` (can run on a laptop, even one limited to 8 cores and 8 GB of RAM)
**Advanced users**: The MAGMA pipeline has default parameters for the minimum QC thresholds that samples must reach to be included in the cohort analysis. These defaults are listed in `default_params.config`. Users wishing to adjust these parameters should specify the adjustments in the `params.yml` file supplied when launching the pipeline.

**Note**: The `-profile` mechanism is used to enable infrastructure-specific settings of the pipeline. The example below assumes you are using a `conda`-based setup.

The params file can be provided to the pipeline using the `-params-file` option, as shown below:
```
nextflow run 'https://github.com/TORCH-Consortium/MAGMA' \
    -profile conda_local,server \
    -r v1.1.1 \
    -params-file my_parameters_1.yml
```
## Analysis
### Running MAGMA using Nextflow Tower
You can also use Seqera Platform (aka Nextflow Tower) to run the pipeline on any of the supported cloud platforms and to monitor the pipeline execution. Please refer to the Tower docs for further information.
### Running MAGMA using conda
:warning::warning::warning: We discourage running MAGMA via conda: it is prone to challenging-to-reproduce errors.
You can run the pipeline using the Conda, Mamba or Micromamba package managers to install all the prerequisite software from popular repositories such as bioconda and conda-forge.
:information_source: **Conda environments and cheatsheet**: You can find the location of conda environments using `conda env list`. Here’s a useful cheatsheet for conda operations.
You can use the `conda`-based setup for running MAGMA:
- on a local Linux machine (e.g. your laptop or a university server)
- on an HPC cluster (e.g. SLURM, PBS) in case you don’t have access to container systems like Singularity, Podman or Docker

All the requisite software has been provided as `conda` recipes (i.e. `yml` files):
- `magma-env-1.yml`
- `magma-env-2.yml`
These files can be downloaded using the following commands:

```
wget https://raw.githubusercontent.com/TORCH-Consortium/MAGMA/master/conda_envs/magma-env-2.yml
wget https://raw.githubusercontent.com/TORCH-Consortium/MAGMA/master/conda_envs/magma-env-1.yml
```
The `conda` environments are expected by the `conda_local` profile of the pipeline; it is recommended that they be created prior to using the pipeline, with the following commands. Note that if you have `mamba` (or `micromamba`) available, you can rely on that instead of `conda`.
```
$ conda env create -n magma-env-1 --file magma-env-1.yml
$ conda env create -n magma-env-2 --file magma-env-2.yml
$ conda env create -n magma-tbprofiler-env --file magma-tbprofiler-env.yaml
$ conda env create -n magma-ntmprofiler-env --file magma-ntmprofiler-env.yaml
```
Optionally, you can run `bash ./conda/setup_conda_envs.sh` to build all the necessary conda environments.
Once the environments are created, you can use the pipeline parameter `conda_envs_location` to inform the pipeline of their names and location.
Next, you need to load the WHO resistance catalogue into `tb-profiler`; these are essentially the same instructions used to build the provided containers.
- Download `resistance_db_who_v1.zip` and unzip it

  ```
  wget https://github.com/TORCH-Consortium/MAGMA/files/14559680/resistance_db_who_v1.zip
  unzip resistance_db_who_v1.zip
  ```
- Activate `magma-env-1`, which has `tb-profiler`

  ```
  conda activate magma-env-1
  ```
- Move inside that folder and use the `tb-profiler load_library` functionality to load the database

  ```
  cd resistance_db_who
  tb-profiler load_library ./resistance_db_who
  ```
On success, `tb-profiler` prints a confirmation that the library was loaded.
### Running MAGMA using docker
:heavy_check_mark::heavy_check_mark::heavy_check_mark: This is the recommended execution strategy.
We provide two docker containers with the pipeline, so you can simply download and run it. There is NO need to build any docker containers yourself: just download them and enable the `docker` profile.
🚧 Container build script: The script used to build these containers is provided here.
Although you don’t need to pull the containers manually, should you wish to, you can use the following commands to pull the pre-built containers:
```
docker pull ghcr.io/torch-consortium/magma/magma-container-1:1.1.1
docker pull ghcr.io/torch-consortium/magma/magma-container-2:1.1.1
```
:memo: **Have Singularity or Podman instead?** If you have access to Singularity or Podman, then owing to their compatibility with Docker, you can still use the provided docker containers.

Here’s the command that should be used:
```
nextflow run 'https://github.com/torch-consortium/magma' \
    -params-file my_parameters_2.yml \
    -profile docker,pbs \
    -r v1.1.1
```
:bulb: **Hint**: You can use the `-r` option of Nextflow to work with any specific version/branch of the pipeline.
### Running MAGMA on HPC and cloud executors
## MAGMA samplesheets
In order to run the MAGMA pipeline, you must provide a samplesheet as input. Its structure should match the samplesheet described above.

:warning: Make sure to use full paths!
- **Library**: Certain samples may have had multiple libraries prepared. This column allows the pipeline to distinguish between different libraries of the same sample.
- **Attempt**: Certain libraries may need to be sequenced multiple times. This column allows the pipeline to distinguish between different attempts of the same library.
- **Flowcell/Lane/Index Sequence**: Providing this information may allow the VQSR filtering step to better distinguish between true variants and sequencing errors. Including these is optional; if unknown or irrelevant, just fill them in with a '1' as shown in example_MAGMA_samplesheet.csv.
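When the flowcell, lane and index metadata are unknown, the extended sheet can be generated mechanically from the minimal one, filling the optional columns with the placeholder '1'. A sketch of such a conversion (a hypothetical helper, not part of MAGMA; the `Study_Name` value is a stand-in you would replace):

```shell
# Hypothetical helper (not part of MAGMA): expand a minimal Sample,R1,R2 sheet
# into the extended nine-column layout, filling Study/Library/Attempt and the
# Flowcell/Lane/Index Sequence columns with placeholder values.

# Demo input: a minimal samplesheet
printf 'Sample,R1,R2\nS0001,/data/S0001_R1.fastq.gz,/data/S0001_R2.fastq.gz\n' > minimal_samplesheet.csv

awk -F',' '
NR == 1 { print "Study,Sample,Library,Attempt,R1,R2,Flowcell,Lane,Index Sequence"; next }
        { print "Study_Name," $1 ",1,1," $2 "," $3 ",1,1,1" }
' minimal_samplesheet.csv > extended_samplesheet.csv

cat extended_samplesheet.csv
```

Review the generated file and substitute real metadata wherever it is actually known.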
## (Optional) GVCF datasets
We also provide some reference GVCF files which you could use for specific use cases:

- For small datasets (20 samples or fewer), we recommend that you download the `EXIT_RIF GVCF` files from https://zenodo.org/record/8054182; this GVCF reference dataset of ~600 samples is provided for augmenting smaller datasets.
- For including Mtb lineages and an outgroup (M. canettii) in the phylogenetic tree, you can download the `LineagesAndOutgroup` files from https://zenodo.org/record/8233518.
The relevant parameters are:

```
use_ref_gvcf = false
ref_gvcf = "/path/to/FILE.g.vcf.gz"
ref_gvcf_tbi = "/path/to/FILE.g.vcf.gz.tbi"
```
:bulb: **Custom GVCF dataset**: For creating a custom GVCF dataset, you can refer to the discussion here.
## Citations
The MAGMA paper has been published here: https://doi.org/10.1371/journal.pcbi.1011648
The XBS variant calling core was published here: https://doi.org/10.1099%2Fmgen.0.000689
## Contributions and interactions with us
Contributions are warmly accepted! We encourage you to interact with us using the Discussions and Issues features of GitHub.
## License
Please refer to the GPL 3.0 LICENSE file. Here’s a quick TLDR of the license terms.