MAGMA Customizable Parameters

This document provides an overview of the customizable parameters for the MAGMA pipeline. Each parameter is listed with its default value and a description.

💡 Hint: A full parameters reference file is also available.


Common Parameters

Input Samplesheet

| Parameter | Default Value | Description |
|---|---|---|
| input_samplesheet | "samplesheet.magma.csv" | The input CSV file containing sample information. The study ID cannot start with XBS_REF_. |

💡 Hint: The samplesheet should include the fields [Sample, R1, R2]. Optionally, you can add [study, library, attempt, flowcell, lane, index_sequence].
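The samplesheet rules above can be checked before launching a run. The following is a minimal sketch, not part of MAGMA itself: the function name `check_samplesheet` is hypothetical, and it assumes the column headers are spelled exactly as in the hint (`Sample`, `R1`, `R2`, `study`).

```python
import csv
import io

REQUIRED_FIELDS = ["Sample", "R1", "R2"]
RESERVED_STUDY_PREFIX = "XBS_REF_"


def check_samplesheet(handle):
    """Return a list of problems found in a samplesheet opened as text.

    Assumption: header names match the documentation exactly.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [
        f"missing required column: {name}"
        for name in REQUIRED_FIELDS
        if name not in header
    ]
    if problems:
        return problems
    # Line 1 of the file is the header, so data rows start at line 2.
    for row_number, row in enumerate(reader, start=2):
        for name in REQUIRED_FIELDS:
            if not (row[name] or "").strip():
                problems.append(f"row {row_number}: empty {name}")
        # The study column is optional; treat a missing value as empty.
        study = (row.get("study") or "").strip()
        if study.startswith(RESERVED_STUDY_PREFIX):
            problems.append(
                f"row {row_number}: study ID starts with {RESERVED_STUDY_PREFIX}"
            )
    return problems


# Illustrative data only; the file names are made up.
example = io.StringIO(
    "Sample,R1,R2,study\n"
    "S1,S1_R1.fastq.gz,S1_R2.fastq.gz,pilot\n"
    "S2,S2_R1.fastq.gz,,XBS_REF_demo\n"
)
for problem in check_samplesheet(example):
    print(problem)
```

To check a real file, pass an open file handle instead of the in-memory example, for instance `check_samplesheet(open("samplesheet.magma.csv", newline=""))`.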


Output Directory

| Parameter | Default Value | Description |
|---|---|---|
| outdir | "magma-results" | The directory where all output files will be written. |
| vcf_name | "joint" | The name of the output folder for results. Used to derive JOINT_NAME. |

💡 Note: The vcf_name parameter is critical for naming conventions in downstream processes.


Adding Additional Samples

| Parameter | Default Value | Description |
|---|---|---|
| use_ref_gvcf | true | Whether to use a reference GVCF file to include additional samples. |
| ref_gvcf | "${projectDir}/resources/ref_gvcfs/LineagesAndOutgroupV2.g.vcf.gz" | Path to the reference GVCF file. |
| ref_gvcf_tbi | "${projectDir}/resources/ref_gvcfs/LineagesAndOutgroupV2.g.vcf.gz.tbi" | Path to the index file for the reference GVCF. |

💡 Hint: Use this feature if your dataset has low genetic diversity (e.g., clonal or fewer than 20 samples).


Quality Control Parameters

| Parameter | Default Value | Description |
|---|---|---|
| cutoff_median_coverage | 10 | The minimal median coverage required to process the sample. |
| cutoff_breadth_of_coverage | 0.90 | The minimal breadth of coverage required to process the sample. |
| cutoff_rel_abundance | 0.70 | The minimal relative abundance of the majority strain required to process the sample. |
| cutoff_ntm_fraction | 0.20 | The maximum fraction of NTM DNA allowed to process the sample. |

⚠️ Attention: Samples that do not meet these thresholds are not processed. Adjust the values to match the quality of your input data.
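To see how the four cutoffs interact, the sketch below applies them to one sample's metrics and reports which ones fail. It is illustrative only and not MAGMA's own implementation: `QcCutoffs` and `failed_checks` are hypothetical names, and treating a value exactly equal to a cutoff as passing is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QcCutoffs:
    """Defaults copied from the parameter table above."""
    median_coverage: float = 10
    breadth_of_coverage: float = 0.90
    rel_abundance: float = 0.70
    ntm_fraction: float = 0.20


def failed_checks(median_coverage, breadth_of_coverage, rel_abundance,
                  ntm_fraction, cutoffs=QcCutoffs()):
    """Return the names of the metrics that fall outside the cutoffs.

    Assumption: a value equal to its cutoff passes.
    """
    failures = []
    # The first three cutoffs are minimums.
    if median_coverage < cutoffs.median_coverage:
        failures.append("median_coverage")
    if breadth_of_coverage < cutoffs.breadth_of_coverage:
        failures.append("breadth_of_coverage")
    if rel_abundance < cutoffs.rel_abundance:
        failures.append("rel_abundance")
    # The NTM fraction cutoff is a maximum.
    if ntm_fraction > cutoffs.ntm_fraction:
        failures.append("ntm_fraction")
    return failures


# Made-up metrics: low coverage and too much NTM DNA.
print(failed_checks(median_coverage=8, breadth_of_coverage=0.98,
                    rel_abundance=0.95, ntm_fraction=0.30))
```

Passing a different `QcCutoffs(...)` instance shows how relaxing or tightening one threshold changes which samples would be kept.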


Skipping Pipeline Steps

| Parameter | Default Value | Description |
|---|---|---|
| only_validate_fastqs | false | Set to true to only validate input FASTQs and check their FASTQC reports. |
| skip_merge_analysis | false | Skip the final merge analysis step. |
| skip_variant_recalibration | false | Skip variant quality score recalibration (VQSR). |
| skip_base_recalibration | true | Skip base quality score recalibration (BQSR). BQSR is not suitable for low-coverage Mtb genomes. |
| skip_minor_variants_gatk | true | Skip minor variants detection with GATK. LoFreq is recommended for most purposes. |
| skip_phylogeny_and_clustering | false | Disable downstream phylogenetic analysis of merged GVCF. |
| skip_complex_regions | false | Disable downstream complex region analysis of merged GVCF. |
| skip_ntmprofiler | false | Disable execution of ntmprofiler on FASTQ files. |
| skip_tbprofiler_fastq | true | Disable tbprofiler analysis on FASTQ files. |
| skip_spotyping | false | Disable spoligotyping analysis. |

💡 Hint: Use these flags to customize the pipeline execution based on your specific requirements.
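One way to manage these flags is to keep only the values that differ from the defaults in a small parameters file. The sketch below builds such a set of overrides and rejects misspelled flag names. The helper `build_overrides` is hypothetical; the defaults are copied from the table above.

```python
import json

# Skip flags and their defaults, as listed in the table above.
DEFAULT_FLAGS = {
    "only_validate_fastqs": False,
    "skip_merge_analysis": False,
    "skip_variant_recalibration": False,
    "skip_base_recalibration": True,
    "skip_minor_variants_gatk": True,
    "skip_phylogeny_and_clustering": False,
    "skip_complex_regions": False,
    "skip_ntmprofiler": False,
    "skip_tbprofiler_fastq": True,
    "skip_spotyping": False,
}


def build_overrides(**changes):
    """Return only the flags whose requested value differs from the default."""
    unknown = sorted(set(changes) - set(DEFAULT_FLAGS))
    if unknown:
        raise ValueError(f"unknown flag(s): {', '.join(unknown)}")
    return {
        name: value
        for name, value in changes.items()
        if value != DEFAULT_FLAGS[name]
    }


# Example: skip spoligotyping and enable tbprofiler on FASTQ files.
overrides = build_overrides(skip_spotyping=True, skip_tbprofiler_fastq=False)
print(json.dumps(overrides, indent=2))
```

The printed JSON can be saved to a file and supplied through Nextflow's `-params-file` option, which accepts JSON or YAML.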


Reference Files

| Parameter | Default Value | Description |
|---|---|---|
| ref_fasta_basename | "NC-000962-3-H37Rv" | Basename of the reference FASTA file. |
| ref_fasta_dir | "${projectDir}/resources/genome" | Directory containing the reference FASTA file. |
| ref_fasta | "${params.ref_fasta_dir}/${params.ref_fasta_basename}.fa" | Full path to the reference FASTA file. |
| ref_fasta_dict | "${params.ref_fasta_dir}/${params.ref_fasta_basename}.dict" | Path to the reference FASTA dictionary file. |
| ref_fasta_gb | "${params.ref_fasta_dir}/${params.ref_fasta_basename}.gb" | Path to the reference GenBank file. |

⚠️ Warning: It is recommended to use the provided reference files to ensure compatibility with the pipeline.
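The three file paths above are all built from `ref_fasta_dir` and `ref_fasta_basename`. The sketch below mirrors that derivation so the expected files can be checked on disk before a run. The function names are hypothetical, and the directory passed in stands in for wherever `${projectDir}/resources/genome` resolves on your system.

```python
from pathlib import Path


def reference_paths(ref_fasta_dir, ref_fasta_basename="NC-000962-3-H37Rv"):
    """Build the derived reference paths from the directory and basename."""
    base = Path(ref_fasta_dir)
    return {
        "ref_fasta": base / f"{ref_fasta_basename}.fa",
        "ref_fasta_dict": base / f"{ref_fasta_basename}.dict",
        "ref_fasta_gb": base / f"{ref_fasta_basename}.gb",
    }


def missing_reference_files(ref_fasta_dir, ref_fasta_basename="NC-000962-3-H37Rv"):
    """Return the parameter names whose files do not exist on disk."""
    paths = reference_paths(ref_fasta_dir, ref_fasta_basename)
    return [name for name, path in paths.items() if not path.is_file()]


# "resources/genome" is a placeholder for the resolved ref_fasta_dir.
for name, path in reference_paths("resources/genome").items():
    print(f"{name}: {path}")
```

If you do substitute your own reference, `missing_reference_files` gives a quick way to confirm that the FASTA, dictionary, and GenBank files all share one basename and directory.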