Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads

GND
126922008X
Affiliated organization
Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany
Welzel, Marius;
GND
1293626988
LSF
55857
Affiliated organization
Department of Bioinformatics and Computational Biophysics, University of Duisburg-Essen, Essen, Germany
Lange, Anja;
GND
136776493
ORCID
0000-0002-3108-8311
Affiliated organization
Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany
Heider, Dominik;
Affiliated organization
Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany
Schwarz, Michael;
GND
171521188
ORCID
0000-0002-7205-8389
Affiliated organization
Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany
Freisleben, Bernd;
LSF
5694
Affiliated organization
Department of Biodiversity, University of Duisburg-Essen, Essen, Germany
Jensen, Manfred;
GND
122715500
LSF
52338
Affiliated organization
Department of Biodiversity, University of Duisburg-Essen, Essen, Germany
Boenigk, Jens;
GND
1021794848
ORCID
0000-0002-0679-6631
LSF
57561
Affiliated organization
Department of Biodiversity, University of Duisburg-Essen, Essen, Germany
Beisser, Daniela

Background: Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples at high sequencing depths generated by high throughput sequencing technologies requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system.

Results: We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow covers all analysis steps, from quality assessment, read assembly, dereplication, chimera detection, split-sample merging, and sequence representative assignment (OTUs or ASVs) to the taxonomic assignment of sequence representatives. It is written in Snakemake, a workflow management engine for developing data analysis workflows, which ensures reproducibility, while Conda provides version control of the utilized programs. The encapsulation of rules and their dependencies supports hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub (https://github.com/MW55/Natrix) or as a Docker container on DockerHub (https://hub.docker.com/r/mw55/natrix).
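
To illustrate one of the steps listed above, dereplication can be sketched in a few lines of Python. This is a minimal sketch of the general idea (collapsing identical reads into unique sequences with abundance counts), not the implementation used in Natrix; the function name `dereplicate` and the example reads are hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, abundance) pairs.

    Hypothetical helper for illustration only; not part of Natrix.
    """
    # Normalize case so that "acgt" and "ACGT" count as the same sequence.
    counts = Counter(seq.upper() for seq in reads)
    # Sort by decreasing abundance, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up example reads, assumed to be already quality-filtered and assembled.
example_reads = ["ACGT", "acgt", "TTGA", "ACGT", "TTGA", "GGCC"]
unique_sequences = dereplicate(example_reads)
```

Working with unique sequences and their abundances rather than raw reads reduces the input to later steps such as chimera detection and clustering.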

Conclusion: Natrix is a user-friendly and highly extensible workflow for processing Illumina amplicon data.

Rights

Rights holder:

© The Author(s) 2020

Use and reproduction:
This work may be used under a Creative Commons Attribution 4.0 License (CC BY 4.0).