Gossypium turneri (D10) genome NSF_v1_a2

Overview
Analysis Name: Gossypium turneri (D10) genome NSF_v1_a2
Method: PacBio Sequel (Canu v1.6)
Source: gturneri_large_chr_c.fasta
Date performed: 2019-07-10

Material and Methods

DNA was extracted from Gossypium turneri (accession D10-3) using CTAB techniques (Kidwell and Osborn 1992). DNA concentration was measured with a Qubit fluorometer (ThermoFisher, Inc.). The sequencing library was constructed according to PacBio recommendations at the BYU DNA Sequencing Center (DNASC). Fragments >18 kb were size-selected for sequencing with a BluePippin (Sage Science, LLC). Prior to sequencing, the size distribution of library fragments was evaluated using a Fragment Analyzer (Advanced Analytical Technologies, Inc.). Eight and eleven PacBio cells were sequenced from a single library for G. raimondii and G. turneri, respectively, on the Pacific Biosciences Sequel system. For both genomes, the raw PacBio reads were assembled with Canu v1.6 using default parameters (Koren et al. 2017).

Leaf tissue of G. turneri was shipped to Dovetail Genomics for DNA extraction and construction of Hi-C sequencing libraries. These Hi-C libraries were sequenced on an Illumina HiSeq 2500 (PE 125 bp) at the BYU DNASC. Reads were mapped to the G. raimondii reference genome (Paterson et al. 2012), and a scaffolded assembly of G. turneri was created by Dovetail Genomics. Whole-genome alignments revealed in silico assembly errors: a contiguous 25.7 Mb segment of chromosome 9 (D10_09) had initially been placed on D10_12, and the remainder of that chromosome was split among smaller scaffolds. As with G. raimondii, manual iterations of scaffolding in Juicebox (Durand et al. 2016) correctly assembled D10_09 and D10_11. The final G. turneri genome sequence, consisting of 13 assembled chromosomes, was constructed using a custom Python script developed by Phase Genomics, LLC.

About the assembly

This is the first de novo genome sequence for G. turneri. The genome was assembled from 73.2x coverage of raw PacBio sequence reads. The assembly consisted of 220 contigs with an N50 of 7.9 Mb (Table 1). As with the G. raimondii sequence, these contigs were scaffolded by Dovetail Genomics, and the pseudomolecules were manually adjusted using Juicebox. Bionano data were not collected for G. turneri, and the G. raimondii Bionano data were uninformative when aligned to the G. turneri genome sequence because the distances between labeled recognition sites differed too much between the two genomes. After the sequence assembly was created, the G. raimondii Hi-C reads were also mapped to the G. turneri genome sequence (and vice versa). Although the number of mapped reads was substantially reduced (29.90% and 12.67%, respectively), no association anomalies were detected between the genomes.
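The contig N50 quoted here, like the N50 and N90 values in Table 1, follows the usual Nx definition: sort sequences from longest to shortest and report the length at which the running total first reaches x% of the total assembly length. The Python function below is a minimal sketch of that definition; it is not necessarily the tool used to compute Table 1, and the toy contig lengths are hypothetical.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: int = 50) -> int:
    """Return the Nx length: the length L such that sequences of length >= L
    together cover at least x% of the total length (x=50 -> N50, x=90 -> N90)."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    # Unreachable: the running total always reaches 100% of the total.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    toy_contigs = [10, 8, 6, 4, 2]  # hypothetical contig lengths
    print(nx(toy_contigs, 50), nx(toy_contigs, 90))
```

For the toy lengths (total 30), the running totals are 10, 18, 24, 28, 30, so N50 is 8 (18 is the first total at or above 15) and N90 is 4 (28 is the first at or above 27).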


Table 1. Assembly metrics of the G. turneri genome, our current G. raimondii assembly (D5-NSF), and the previous G. raimondii assembly (D5-JGI 2012; Paterson et al. 2012). Lengths are in bp; GC content is in %.

Metric | G. turneri (D10) | G. raimondii (D5-NSF) | G. raimondii (D5-JGI 2012)
--- | --- | --- | ---
Contigs | 220 | 187 | 16,924
Max contig | 23,475,487 | 24,216,129 | 1,162,971
Mean contig | 3,432,648 | 3,929,767 | 43,597
Contig N50 | 7,909,293 | 6,291,832 | 136,998
Contig N90 | 1,624,019 | 2,044,991 | 32,166
Total contig length | 755,182,540 | 734,866,495 | 737,837,083
Assembly GC (%) | 33.21 | 33.19 | 33.19
Max scaffold | 67,704,245 | 65,701,939 | 70,713,020
Mean scaffold | 58,092,557 | 56,529,546 | 57,632,930
Scaffold N50 | 60,464,062 | 58,819,159 | 62,175,169
Scaffold N90 | 50,570,303 | 46,322,098 | 45,765,648
Total scaffold length | 755,203,240 | 734,884,094 | 749,228,090
Captured gaps | 207 | 174 | 16,911
Max gap | 100 | 200 | 63,138
Mean gap | 100 | 101 | 674
Gap N50 | 100 | 100 | 2,607
Total gap length | 20,700 | 17,599 | 11,391,007
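The totals in Table 1 are internally consistent: total scaffold length equals total contig length plus total gap length, and, if every captured gap joins two contigs inside a scaffold, contigs minus gaps gives the number of scaffolds. The Python sketch below checks both relationships using values copied from the table; the "one gap joins two contigs" reading is an assumption about how gaps were counted, and the dictionary layout is purely illustrative.

```python
# Values copied from Table 1 (lengths in bp).
TABLE_1 = {
    "G. turneri (D10)": dict(
        contigs=220, gaps=207, contig_len=755_182_540,
        scaffold_len=755_203_240, gap_len=20_700, mean_scaffold=58_092_557),
    "G. raimondii (D5-NSF)": dict(
        contigs=187, gaps=174, contig_len=734_866_495,
        scaffold_len=734_884_094, gap_len=17_599, mean_scaffold=56_529_546),
    "G. raimondii (D5-JGI 2012)": dict(
        contigs=16_924, gaps=16_911, contig_len=737_837_083,
        scaffold_len=749_228_090, gap_len=11_391_007, mean_scaffold=57_632_930),
}


def lengths_add_up(row: dict) -> bool:
    """Scaffold length should equal contig length plus gap length."""
    return row["scaffold_len"] == row["contig_len"] + row["gap_len"]


def implied_scaffolds(row: dict) -> int:
    """Assumes each captured gap joins two contigs within one scaffold."""
    return row["contigs"] - row["gaps"]


if __name__ == "__main__":
    for name, row in TABLE_1.items():
        n = implied_scaffolds(row)
        mean = round(row["scaffold_len"] / n)
        print(f"{name}: lengths add up={lengths_add_up(row)}, scaffolds={n}, "
              f"mean scaffold={mean:,} (table: {row['mean_scaffold']:,})")
```

Every column resolves to 13 scaffolds, consistent with the 13 assembled chromosomes, and dividing total scaffold length by 13 reproduces the reported mean scaffold length.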

Publication

Udall, et al., 2019. De novo genome sequence assemblies of Gossypium raimondii and G. turneri.

Assembly

The Gossypium turneri NSF(BYU) D10-3 Genome v1.0 assembly files are available in FASTA and GFF3 formats.

Chromosome (FASTA file) Gossypium_turneri_NSF-v1.0.fasta.gz
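After downloading the chromosome FASTA file, per-sequence lengths and overall GC content can be compared with Table 1. The Python sketch below streams a gzip-compressed FASTA file using only the standard library. It computes GC over unambiguous A/C/G/T bases and excludes N and other ambiguity codes. That denominator is an assumption, so small differences from the reported Assembly GC value are possible. The function names are illustrative, not part of any published tool.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence); the ID is the header text up to the first whitespace."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize(records: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, int], float]:
    """Return per-record lengths and GC% computed over A/C/G/T bases only."""
    lengths: Dict[str, int] = {}
    gc = acgt = 0
    for name, seq in records:
        seq = seq.upper()
        lengths[name] = len(seq)
        g_c = seq.count("G") + seq.count("C")
        gc += g_c
        acgt += g_c + seq.count("A") + seq.count("T")
    return lengths, (100.0 * gc / acgt if acgt else 0.0)


def summarize_fasta_gz(path: str) -> Tuple[Dict[str, int], float]:
    """Open a .fasta.gz file in text mode and summarize all records."""
    with gzip.open(path, "rt") as handle:
        return summarize(parse_fasta(handle))
```

For example, `summarize_fasta_gz("Gossypium_turneri_NSF-v1.0.fasta.gz")` would be expected to return 13 entries if the file holds only the assembled chromosomes; the sequence IDs in the file are not assumed here.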
Downloads

All annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.

Functional Analysis

Functional annotation files for the Gossypium turneri D10-3 Genome v1.0 are available for download below. The Gossypium turneri NSF D10-3 Genome proteins were analyzed with InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan D10_NSF-v1.0_genes2GO.xlsx.gz
IPR assignments from InterProScan D10_NSF-v1.0_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs D10_NSF-v1.0_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways D10_NSF-v1.0_KEGG-pathways.xlsx.gz
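The GO assignments above come from InterProScan. As a sketch of how a gene-to-GO table of this kind can be built, the Python code below reads InterProScan 5 TSV output (run with GO lookup enabled, e.g. `--goterms`). Column 14 of that format holds GO identifiers separated by `|`, with `-` marking an empty value, and some versions append a source tag such as `(InterPro)` to each ID, which the regular expression ignores. This input layout is an assumption; the page does not say how D10_NSF-v1.0_genes2GO.xlsx.gz was actually produced.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches GO identifiers such as GO:0003677, ignoring any "(InterPro)"-style source tags.
GO_ID = re.compile(r"GO:\d{7}")

# 0-based index of the GO annotation column (column 14) in InterProScan 5 TSV output.
GO_COLUMN = 13


def genes_to_go(tsv_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the union of GO IDs per protein across all of its InterProScan matches."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= GO_COLUMN:
            continue  # row lacks the optional GO column
        mapping[fields[0]].update(GO_ID.findall(fields[GO_COLUMN]))
    # Drop proteins whose matches carried no GO terms (only "-" values).
    return {protein: terms for protein, terms in mapping.items() if terms}
```

Taking the union across rows reflects that one protein usually has several signature matches, each of which may contribute different GO terms.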


Gene Predictions


The Gossypium turneri NSF(BYU) D10-3 genome v1.0.a2 gene prediction files are available in FASTA and GFF3 formats.

Transcript sequences (FASTA file) Gossypium_turneri_NSF-v1.0.transcripts.fasta.gz
CDS sequences (FASTA file) Gossypium_turneri_NSF-v1.0.CDs.fasta.gz
Gene sequences (FASTA file) Gossypium_turneri_NSF-v1.0.genes.fasta.gz
Protein sequences (FASTA file) Gossypium_turneri_NSF-v1.0.proteins.fasta.gz
Genes (GFF3 file) Gossypium_turneri_NSF-v1.0.genes.gff3.gz
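The gene models in the GFF3 file use the standard nine-column GFF3 layout. The Python sketch below parses GFF3 lines with the standard library and decodes percent-escaped attribute values as the GFF3 specification requires. It also counts features by type (for example, gene versus mRNA), which makes a quick sanity check after download. The sequence IDs and feature types in the test rows are hypothetical; the types actually present in this file are not assumed.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, NamedTuple
from urllib.parse import unquote


class GffFeature(NamedTuple):
    seqid: str
    source: str
    feature_type: str
    start: int  # 1-based, inclusive
    end: int  # 1-based, inclusive
    strand: str
    attributes: Dict[str, str]


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse column 9 (tag=value pairs separated by ';'), undoing percent-escapes."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        attrs[unquote(key)] = unquote(value)
    return attrs


def iter_gff3(lines: Iterable[str]) -> Iterator[GffFeature]:
    """Yield features, skipping comments/directives and stopping at an embedded ##FASTA section."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}: {line!r}")
        yield GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                         cols[6], parse_attributes(cols[8]))


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count features by their type (column 3)."""
    return Counter(feature.feature_type for feature in iter_gff3(lines))
```

A gzip-compressed file can be passed directly, e.g. `count_feature_types(gzip.open("Gossypium_turneri_NSF-v1.0.genes.gff3.gz", "rt"))` after `import gzip`.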


Homology

Homology of the Gossypium turneri NSF Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against several protein databases. Hits were retained at an expectation value (E-value) below 1e-9 for NCBI nr (Release 2018-05) and below 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/Swiss-Prot (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01). The best-hit reports are available for download in Excel format.


Protein Homologs

Gossypium turneri NSF v1.0 proteins with NCBI nr homologs (EXCEL file) D10_NSF-v1.0_vs_nr.xlsx.gz
Gossypium turneri NSF v1.0 proteins with NCBI nr homologs (FASTA file) D10_NSF-v1.0_vs_nr_hit.fasta.gz
Gossypium turneri NSF v1.0 proteins without NCBI nr homologs (FASTA file) D10_NSF-v1.0_vs_nr_noHit.fasta.gz
Gossypium turneri NSF v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) D10_NSF-v1.0_vs_tair.xlsx.gz
Gossypium turneri NSF v1.0 proteins with Arabidopsis (Araport11) homologs (FASTA file) D10_NSF-v1.0_vs_tair_hit.fasta.gz
Gossypium turneri NSF v1.0 proteins without Arabidopsis (Araport11) homologs (FASTA file) D10_NSF-v1.0_vs_tair_noHit.fasta.gz
Gossypium turneri NSF v1.0 proteins with SwissProt homologs (EXCEL file) D10_NSF-v1.0_vs_swissprot.xlsx.gz
Gossypium turneri NSF v1.0 proteins with SwissProt homologs (FASTA file) D10_NSF-v1.0_vs_swissprot_hit.fasta.gz
Gossypium turneri NSF v1.0 proteins without SwissProt homologs (FASTA file) D10_NSF-v1.0_vs_swissprot_noHit.fasta.gz
Gossypium turneri NSF v1.0 proteins with TrEMBL homologs (EXCEL file) D10_NSF-v1.0_vs_trembl.xlsx.gz
Gossypium turneri NSF v1.0 proteins with TrEMBL homologs (FASTA file) D10_NSF-v1.0_vs_trembl_hit.fasta.gz
Gossypium turneri NSF v1.0 proteins without TrEMBL homologs (FASTA file) D10_NSF-v1.0_vs_trembl_noHit.fasta.gz
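The best-hit reports and hit/no-hit FASTA files listed above amount to keeping, for each protein, the highest-scoring blastp hit that passes that database's E-value cutoff. The Python sketch below applies that rule to BLAST+ tabular output in the default `-outfmt 6` column order, with the E-value in column 11 and the bit score in column 12. The input format, the dictionary keys, and ranking hits by bit score are assumptions; the page does not state how the reports were generated.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple

# E-value cutoffs stated on this page; the dictionary keys are hypothetical labels.
EVALUE_CUTOFFS = {"nr": 1e-9, "araport11": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}


class BestHit(NamedTuple):
    subject: str
    evalue: float
    bitscore: float


def best_hits(lines: Iterable[str], max_evalue: float) -> Dict[str, BestHit]:
    """Keep the highest-bit-score hit per query among hits with E-value < max_evalue.

    Assumes BLAST+ -outfmt 6 default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, BestHit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # the cutoff is strict: hits must be below the threshold
        current = best.get(query)
        if current is None or bitscore > current.bitscore:
            best[query] = BestHit(subject, evalue, bitscore)
    return best


def split_hit_no_hit(protein_ids: Iterable[str],
                     best: Dict[str, BestHit]) -> Tuple[List[str], List[str]]:
    """Partition protein IDs into those with and without a retained hit."""
    ids = list(protein_ids)
    return [p for p in ids if p in best], [p for p in ids if p not in best]
```

Because the nr cutoff (1e-9) is stricter than the others (1e-6), the same protein can fall in a `_noHit` file for nr while appearing in a `_hit` file for another database.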


Markers
Marker alignments were performed by the CottonGen team of the Main Bioinformatics Lab at WSU. BLAT was used to map marker sequences from CottonGen to the Gossypium turneri genome assembly. Markers required 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1,000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome G.turneri_D10_NSF-v1.0_SNP
CottonGen InDel markers mapped to genome G.turneri_D10_NSF-v1.0_InDel
CottonGen RFLP markers mapped to genome G.turneri_D10_NSF-v1.0_RFLP
CottonGen SSR markers mapped to genome G.turneri_D10_NSF-v1.0_SSR
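The marker criteria above can be expressed as a filter over BLAT's 21-column PSL output. The Python sketch below computes identity as (matches + repMatches) / (matches + misMatches + repMatches) and query coverage as (qEnd - qStart) / qSize. It counts insertions in either the query or the target as gaps. These definitions are simplifications chosen for illustration: BLAT's own web-style identity score is calculated differently, and the exact formulas used by the CottonGen team are not stated. The defaults follow the SSR/RFLP rule (fewer than 2 gaps, at most 1,000 bp), and passing `max_gap_bases=2` gives the dbSNP/InDel rule.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class PslHit:
    q_name: str
    t_name: str
    t_start: int  # 0-based start on the target (PSL convention)
    t_end: int
    identity: float  # fraction, simplified definition (see lead-in)
    coverage: float  # fraction of the query spanned by the alignment
    gap_count: int  # insertions in query plus target
    gap_bases: int  # total inserted bases in query plus target


def parse_psl_line(line: str) -> PslHit:
    """Convert one PSL data row into a PslHit."""
    f = line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_num_insert, q_base_insert = int(f[4]), int(f[5])
    t_num_insert, t_base_insert = int(f[6]), int(f[7])
    q_size, q_start, q_end = int(f[10]), int(f[11]), int(f[12])
    aligned = matches + mismatches + rep_matches
    return PslHit(
        q_name=f[9],
        t_name=f[13],
        t_start=int(f[15]),
        t_end=int(f[16]),
        identity=(matches + rep_matches) / aligned if aligned else 0.0,
        coverage=(q_end - q_start) / q_size if q_size else 0.0,
        gap_count=q_num_insert + t_num_insert,
        gap_bases=q_base_insert + t_base_insert,
    )


def passes(hit: PslHit, min_identity: float = 0.90, min_coverage: float = 0.97,
           max_gaps: int = 1, max_gap_bases: int = 1000) -> bool:
    """Defaults: SSR/RFLP rule. With at most one gap, total inserted bases equal that gap's size."""
    return (hit.identity >= min_identity and hit.coverage >= min_coverage
            and hit.gap_count <= max_gaps and hit.gap_bases <= max_gap_bases)


def filter_psl(lines: Iterable[str], **criteria) -> Iterator[PslHit]:
    """Yield hits from PSL lines that satisfy the given criteria."""
    for line in lines:
        fields = line.split("\t")
        # Skip psLayout header lines and anything that is not a 21-column data row.
        if len(fields) < 21 or not fields[0].strip().isdigit():
            continue
        hit = parse_psl_line(line)
        if passes(hit, **criteria):
            yield hit
```

The transcript alignments in the next section state only the identity and coverage thresholds; transcripts spanning introns would produce large target gaps, so the gap limits would need to be relaxed for that use.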


Transcript Alignments
Transcript alignments were performed by the CottonGen team of the Main Bioinformatics Lab at WSU. BLAT was used to map transcripts to the G. turneri genome assembly. Alignments covering at least 97% of the transcript length with at least 90% identity were retained. The available files are in GFF3 format.


G. arboreum CottonGen RefTrans v1 G.turneri_D10_NSF-v1.0_g.arboreum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 G.turneri_D10_NSF-v1.0_g.barbadense_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 G.turneri_D10_NSF-v1.0_g.hirsutum_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 G.turneri_D10_NSF-v1.0_g.raimondii_cottongen_reftransV1