ResearchDirect - Collection of datasets containing the TaxaSE bacterial taxonomic annotation pipeline, SILVA insilico datasets and Illumina sequencing data from sugarcane bacterial (16S) including subhabitats from soil, rhizosphere, stem and root


Description

This dataset contains the TaxaSE bacterial taxonomic annotation pipeline, including its source code and associated data files. In silico data generated from the SILVA Release 123 database are also provided. These cover both the whole-SILVA and the Removal of Taxa validation approaches, which were used to compare the Shannon entropy-based sequence similarity approach with percentage identity (computed via USEARCH v7.0.1090 32-bit; see Edgar 2010). Lastly, raw FASTQ files and processed FASTA files from sugarcane (Saccharum spp.) are included. They comprise samples from the soil, rhizosphere, root and stem sub-habitats, alongside results generated in QIIME 1.9.1 (Caporaso et al. 2010).
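The two similarity measures compared in the validation datasets can be illustrated on aligned sequences. The sketch below is a minimal, hypothetical Python illustration and not the implementation used by TaxaSE or USEARCH. `percent_identity` assumes one common definition: identical non-gap columns divided by all columns that are not gapped in both sequences. `column_entropy` computes the Shannon entropy, in bits, of a single alignment column, so conserved positions score near 0. All function names are assumptions made for illustration.

```python
import math
from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Percentage identity of two pre-aligned sequences (hypothetical definition:
    identical non-gap columns / columns not gapped in both sequences)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, gaps excluded."""
    counts = Counter(base for base in column if base != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum p * log2(1/p); every term is non-negative.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(seqs: list[str]) -> list[float]:
    """Per-column Shannon entropy for equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    return [column_entropy("".join(s[i] for s in seqs)) for i in range(length)]


if __name__ == "__main__":
    aligned = ["ACGT", "ACGA", "ACCT"]
    print(percent_identity(aligned[0], aligned[1]))  # 75.0
    print([round(h, 3) for h in alignment_entropy(aligned)])  # [0.0, 0.0, 0.918, 0.918]
```

Percentage identity reduces a pairwise alignment to a single number. Column entropy instead describes how variable each position is across many sequences, which is the kind of conservation signal an entropy-based approach can exploit.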

The quality of all Illumina R1 and R2 reads was assessed visually using FastQC (Andrews 2016). The reads were then merged using FLASH (Magoč & Salzberg 2011) and converted to FASTA format using QIIME’s “convert_fastaqual_fastq.py” script. Alpha and beta diversity analyses were performed in QIIME, with TaxaSE results converted to a QIIME-compatible format for comparison. In silico data were generated from the SILVA Release 123 database using the MicroSim simulator. Sugarcane leaf, stalk, root and rhizosphere soil samples were collected by Dr. Kelly Hamonts (Hawkesbury Institute for the Environment, Western Sydney University, Australia) in November 2014. The samples came from eight sugarcane fields near Ingham, Queensland, Australia, growing three sugarcane varieties (KQ228, MQ239 and Q240).
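The FASTQ-to-FASTA step can be illustrated with the hypothetical Python function below. It discards the quality lines from four-line FASTQ records and keeps the read identifier and sequence. This is a stand-in sketch that assumes unwrapped four-line records. It is not the QIIME script itself, which has its own options and output conventions.

```python
import io
from typing import Iterator, TextIO


def fastq_to_fasta(fastq: TextIO) -> Iterator[str]:
    """Yield FASTA records from four-line FASTQ records (hypothetical helper)."""
    while True:
        header = fastq.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = fastq.readline().rstrip("\r\n")
        plus = fastq.readline()
        qual = fastq.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header.strip())
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header.strip())
        # Keep the read ID and description; the quality string is dropped.
        yield f">{header[1:].rstrip()}\n{seq}\n"


if __name__ == "__main__":
    demo = io.StringIO("@read1 merged\nACGTAC\n+\nIIIIII\n@read2\nGGTA\n+\nHHHH\n")
    print("".join(fastq_to_fasta(demo)), end="")
```

Reading the file one record at a time keeps memory use flat, even for a large MiSeq run. Wrapped (multi-line) FASTQ would need a different parser.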

In each field, three stools were randomly selected, and samples were collected from two plants per stool. Samples were snap-frozen in liquid nitrogen in the field, transported to the laboratory on dry ice and stored at -80 °C. Frozen sugarcane tissue samples were ground using a mortar and pestle. DNA was then extracted from the resulting powder using the MO BIO PowerPlant DNA extraction kit, following the manufacturer’s instructions. The MO BIO PowerSoil DNA extraction kit was used to extract DNA from the soil samples. Bacterial 16S rRNA amplicon sequencing was performed by the NGS facility at Western Sydney University on an Illumina MiSeq (2 × 301 bp paired-end) using the 341F/805R primer set.

- Start

Data publication title Collection of datasets containing the TaxaSE bacterial taxonomic annotation pipeline, SILVA insilico datasets and Illumina sequencing data from sugarcane bacterial (16S) including subhabitats from soil, rhizosphere, stem and root

Description

Data type dataset

Keywords

NGS
Illumina
Taxonomy
Annotation
Pipeline
Community analysis
SILVA
Saccharum spp.

Funding source

Western Sydney University and CRC-CARE


- Coverage

Temporal (time) coverage

Start date 2013/02/01

End date 2017/02/28


- Data


The Data Manager is: Ali Ijaz

Access conditions Open

- Supplements

Related publications

Name Ijaz, AZ, Jeffries, T, Quince, C, Hamonts, K & Singh, B 2017, 'TaxaSE: Exploiting evolutionary conservation within 16S rDNA sequences for enhanced taxonomic annotation', PeerJ Preprints. DOI: 10.7287/peerj.preprints.2941v1

URL https://doi.org/10.7287/peerj.preprints.2941v1

Notes Preprint

Name Taxonomic and Environmental Annotation of Bacterial 16S rDNA sequences via Shannon Entropy and Database Metadata Terms

URL

Notes PhD thesis; add when deposited in repository

Name Edgar, RC 2010, 'Search and clustering orders of magnitude faster than BLAST', Bioinformatics, vol. 26, no. 19, pp. 2460-2461.

URL

Notes As mentioned in Description

Name Caporaso, JG, Kuczynski, J, Stombaugh, J, Bittinger, K, Bushman, FD, Costello, EK et al. 2010, 'QIIME allows analysis of high-throughput community sequencing data', Nature Methods, vol. 7, no. 5, pp. 335-336.

URL

Notes As mentioned in Description

Name Magoč, T & Salzberg, S 2011, 'FLASH: Fast length adjustment of short reads to improve genome assemblies', Bioinformatics, vol. 27, no. 21, pp. 2957-2963.

URL

Notes As mentioned in Description

Related website

Name Download HIE data

URL

Notes Add data URL when the paper is published

Name HIE | Hawkesbury Institute for the Environment

URL

Notes

Name FastQC: A quality control tool for high throughput sequence data (Andrews, S 2016)

URL

Notes


- License

The data will be licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0).


Statement of rights in data Copyright Western Sydney University

- Citation

Citation Ijaz, Ali; Hamonts, Kelly; Jeffries, Thomas (2017): Collection of datasets containing the TaxaSE bacterial taxonomic annotation pipeline, SILVA insilico datasets and Illumina sequencing data from sugarcane bacterial (16S) including subhabitats from soil, rhizosphere, stem and root. Western Sydney University.