Gossypium turneri (D10) genome NSF_v1_a2
Material and Methods
DNA was extracted from Gossypium turneri (accession D10-3) using a CTAB method (Kidwell and Osborn 1992). DNA concentration was measured with a Qubit Fluorometer (ThermoFisher, Inc.). The sequencing library was constructed according to PacBio recommendations at the BYU DNA Sequencing Center (DNASC). Fragments >18 kb were selected for sequencing using a BluePippin (Sage Science, LLC). Prior to sequencing, the size distribution of fragments in the library was evaluated using a Fragment Analyzer (Advanced Analytical Technologies, Inc). Eight and eleven PacBio cells were sequenced from a single library each for G. raimondii and G. turneri, respectively, on the Pacific Biosciences Sequel system. For both genomes, the raw PacBio sequencing reads were assembled with Canu v1.6 using default parameters (Koren et al. 2017).
Leaf tissue of G. turneri was shipped to Dovetail Genomics for DNA extraction and construction of Hi-C sequencing libraries. These Hi-C libraries were sequenced on the Illumina HiSeq 2500 (PE 125 bp) at the BYU DNASC. Reads were mapped to the G. raimondii (Paterson et al. 2012) reference genome, and a scaffolded assembly was created for G. turneri by Dovetail Genomics. Whole-genome alignments identified in silico assembly errors: a contiguous 25.7 Mb of Chromosome 9 (D10_09) was initially placed on D10_12, and the remainder of that chromosome was in smaller scaffolded pieces. As with the G. raimondii assembly, manual iterations of scaffolding in Juicebox (Durand et al. 2016) correctly assembled D10_09 and D10_11. The final genome sequence of G. turneri was constructed using a custom Python script developed by Phase Genomics, LLC and consists of 13 assembled chromosomes.
About the assembly
This is the first de novo genome sequence for G. turneri. The G. turneri genome was assembled from 73.2x coverage of raw PacBio sequence reads. The assembly consisted of 220 contigs with an N50 of 7.9 Mb (Table 1). As with the G. raimondii sequence, these contigs were scaffolded by Dovetail Genomics and the pseudomolecules were manually adjusted using Juicebox. Bionano data were not collected for G. turneri. The G. raimondii Bionano data were uninformative when aligned to the G. turneri genome sequence because the distances between labeled recognition sites differed too much. After the sequence assembly was created, the G. raimondii Hi-C sequence reads were also mapped to the G. turneri genome sequence (and vice versa). While the number of mapped reads was reduced significantly (29.90% and 12.67%, respectively), no association anomalies were detected between genomes.
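For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal Python sketch of how it is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembly length. The function name and the toy lengths are illustrative assumptions, not values or code from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("no contig lengths supplied")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This contig is the one that pushes the cumulative
            # length to at least 50% of the assembly.
            return length


# Toy example (hypothetical lengths, total = 300).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70, because 80 + 70 = 150 reaches half of 300
```

In practice the lengths would be read from the contig FASTA file; the 7.9 Mb figure reported above would correspond to the return value of such a calculation on the 220 Canu contigs.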
Table 1. Assembly metrics of the G. turneri genome, the current G. raimondii assembly (D5), and the previous G. raimondii assembly (Paterson et al. 2012).
Udall, et al., 2019. De novo genome sequence assemblies of Gossypium raimondii and G. turneri.
The Gossypium turneri NSF(BYU) D10-3 Genome v1.0 assembly files are available in FASTA and GFF3 formats.
All annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.
Functional annotation files for the Gossypium turneri D10-3 Genome v1.0 are available for download below. The Gossypium turneri NSF D10-3 Genome proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).
The Gossypium turneri NSF(BYU) D10-3 genome v1.0.a2 gene prediction files are available in FASTA and GFF3 formats.
Homology of the Gossypium turneri NSF Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against several protein databases. An expectation value cutoff of less than 1e-9 was used for NCBI nr (Release 2018-05), and less than 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01). The best-hit reports are available for download in Excel format.
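The cutoff-then-best-hit step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline actually used: it assumes blastp results in the standard 12-column tabular layout (query ID in column 1, subject ID in column 2, E-value in column 11, bit score in column 12), and it assumes "best hit" means the highest bit score per query. The gene and protein identifiers in the example are made up.

```python
import csv
import io

# E-value cutoffs taken from the text; the dictionary keys are hypothetical labels.
EVALUE_CUTOFFS = {"nr": 1e-9, "araport11": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}


def best_hits(handle, evalue_cutoff):
    """Return {query: (subject, evalue, bitscore)} for hits below the cutoff."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= evalue_cutoff:
            continue  # keep only hits strictly below the cutoff
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical tabular blastp output for two query proteins.
example = (
    "geneA\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200\n"
    "geneA\tP2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t250\n"
    "geneB\tP3\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-7\t50\n"
)
hits = best_hits(io.StringIO(example), EVALUE_CUTOFFS["nr"])
print(sorted(hits))  # ['geneA']; geneB's only hit (1e-7) fails the 1e-9 cutoff
```

With the 1e-6 cutoff used for the other databases, the same geneB hit would be retained, which shows how the choice of threshold changes which proteins receive an annotation.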
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium turneri genome assembly. Markers were required to align with 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and indels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. turneri genome assembly. Alignments covering 97% of the transcript length with 90% identity were retained. The available files are in GFF3 format.
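The marker acceptance rules above amount to a small decision function. The Python sketch below encodes them under explicit assumptions: the thresholds are treated as inclusive minimums (at least 90% identity, at least 97% of marker length), and each alignment is assumed to have already been summarized into the fields shown. The class, field names, and marker-type labels are hypothetical and are not part of BLAT or CottonGen.

```python
from dataclasses import dataclass

# Largest gap allowed (bp) per marker type, taken from the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "INDEL": 2}


@dataclass
class MarkerAlignment:
    marker_type: str        # "SSR", "RFLP", "SNP" or "INDEL" (hypothetical labels)
    percent_identity: float
    percent_aligned: float  # share of the marker length covered by the alignment
    gap_count: int
    largest_gap_bp: int


def passes_marker_filter(aln):
    """Apply the identity, length, gap-count and gap-size rules."""
    max_gap = MAX_GAP_BP[aln.marker_type.upper()]
    return (
        aln.percent_identity >= 90.0
        and aln.percent_aligned >= 97.0
        and aln.gap_count < 2            # "fewer than 2 gaps"
        and aln.largest_gap_bp <= max_gap
    )


# The same 500 bp gap is acceptable for an SSR but not for a SNP.
ssr = MarkerAlignment("SSR", 95.0, 99.0, 1, 500)
snp = MarkerAlignment("SNP", 95.0, 99.0, 1, 500)
print(passes_marker_filter(ssr), passes_marker_filter(snp))  # True False
```

Keeping the per-type gap limits in a lookup table makes the difference between sequence-based markers (SSRs, RFLPs) and point markers (SNPs, indels) explicit and easy to adjust.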
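To make the transcript thresholds concrete, the sketch below derives identity and coverage from a single BLAT PSL record and applies the 90% and 97% limits. It relies on the standard 21-column PSL layout (matches, misMatches, repMatches, ..., qSize, qStart, qEnd, ...). The two formulas are simple, commonly used approximations and are assumptions here; the text does not state how CottonGen computed these values. The example record is invented.

```python
def psl_metrics(psl_line):
    """Return (percent_identity, percent_coverage) for one PSL record."""
    f = psl_line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_size, q_start, q_end = int(f[10]), int(f[11]), int(f[12])
    aligned = matches + mismatches + rep_matches
    # Assumed definitions: identity over aligned bases, coverage over query span.
    identity = 100.0 * (matches + rep_matches) / aligned
    coverage = 100.0 * (q_end - q_start) / q_size
    return identity, coverage


def keep_transcript(psl_line, min_identity=90.0, min_coverage=97.0):
    """True when the alignment meets both thresholds from the text."""
    identity, coverage = psl_metrics(psl_line)
    return identity >= min_identity and coverage >= min_coverage


# Hypothetical record: a 1000 bp transcript aligned in two blocks on D10_01.
record = "\t".join([
    "970", "20", "0", "0",        # matches, misMatches, repMatches, nCount
    "0", "0", "1", "500",         # qNumInsert, qBaseInsert, tNumInsert, tBaseInsert
    "+", "tx1", "1000", "5", "995",
    "D10_01", "5000000", "1000", "2490",
    "2", "495,495,", "5,500,", "1000,1995,",
])
identity, coverage = psl_metrics(record)
print(round(identity, 2), round(coverage, 2), keep_transcript(record))
```

Separating the metric calculation from the threshold test makes it straightforward to substitute a different identity definition if the one assumed here does not match the original analysis.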