Gossypium hirsutum (AD1) 'TM-1' genome NAU-NBI_v1.1

Overview
Analysis Name Gossypium hirsutum (AD1) 'TM-1' genome NAU-NBI_v1.1
Method SOAPdenovo (12)
Source Illumina HiSeq 2000 reads from various insert size libraries (NAU-NBI)
Date performed 2015-04-20

About the assembly
An allohaploid plant derived from the allotetraploid cotton TM-1 was used for genome sequencing. A total of 612 Gb (245× genome equivalent) of high-quality Illumina reads was produced and assembled using SOAPdenovo (12). The resulting contigs and scaffolds were integrated using 174,454 pairs of Sanger-sequenced BAC-end sequences comprising 116.5 Mb and assembled into the TM-1 genome sequence (V1.0). To correct misassemblies, classify the homoeologous segments and order the scaffolds, an ultradense genetic map was developed using genotyping by sequencing of 59 F2 individuals derived from TM-1 and G. barbadense cv. Hai7124. The map consisted of 4,999,048 single-nucleotide polymorphism (SNP) loci and 4,049 recombination bins spanning 4,042 cM in 26 linkage groups. Using the map, 218 misassembled scaffolds (442.2 Mb, or 17.6%, of the genome sequence) were corrected in assembly V1.0; most of these misassemblies were caused by ambiguous homoeologous sequences. The final assembly (V1.1) comprised 265,279 contigs (N50 = 34.0 kb) and 40,407 scaffolds (N50 = 1.6 Mb). The total scaffold length (2.4 Gb) spanned ~96% of the estimated allotetraploid genome (2.5 Gb). Of the total, 6,146 scaffolds (2.3 Gb) were aligned and organized into 26 pseudochromosomes, including 1.5 Gb (4,635 scaffolds) in the A subgenome and 0.8 Gb (1,511 scaffolds) in the D subgenome. Furthermore, 1.9 Gb (79.2%) was oriented based on linkage maps.

 

 

Summary | A subgenome | D subgenome | UN* | Total
Scaffold number | 4,635 | 1,511 | 34,261 | 40,407
Scaffold length | 1,477.1 Mb | 831.0 Mb | 124.6 Mb | 2,432.7 Mb
Scaffold N50 | 1.4 Mb | 2.5 Mb | 7,160 bp | 1.6 Mb
Oriented scaffold number | 955 | 501 | NA | 1,456
Oriented scaffold size | 1,150.8 Mb | 769.5 Mb | NA | 1,920.4 Mb
Contig number | 142,201 | 44,057 | 79,021 | 265,279
Contig length | 1,220.6 Mb | 746.8 Mb | 100.7 Mb | 2,068.1 Mb
Contig N50 | 30.7 kb | 47.2 kb | 2,542 bp | 34.0 kb
Gene number | 32,032 | 34,402 | 4,044 | 70,478
Total gene length | 107.2 Mb | 109.8 Mb | 3.1 Mb | 220 Mb
Transposable elements | 843.5 Mb | 433 Mb | 62.5 Mb | 1,339 Mb
GC content | 34.4% | 33.3% | 35.5% | 34.1%

*Un-anchored scaffolds
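
For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is conventionally computed from a list of contig or scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example with made-up lengths (hypothetical, not from the assembly).
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))  # prints 80: 100 + 80 = 180 >= 150 (half of 300)
```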

Publication
Zhang et al., Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nature Biotechnology. 33, 531–537. 2015

 

 

Assembly

The chromosomes (pseudomolecules) and scaffolds for Gossypium hirsutum (AD1) Genome NAU-NBI Assembly v1.1

Assembly pseudomolecules (FASTA format) NBI_Gossypium_hirsutum_v1.1.fa.gz
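
As a starting point for working with the pseudomolecule file, per-sequence lengths can be tallied with a short reader. This is a sketch under assumptions: `fasta_lengths` is a hypothetical helper name, and the code only assumes standard FASTA conventions (header lines beginning with `>`), not any particular sequence naming in this assembly.

```python
import gzip


def fasta_lengths(path):
    """Yield (sequence_id, length) for each record in a FASTA file.

    Files ending in .gz are read through gzip; anything else is read as
    plain text. The sequence ID is the first whitespace-delimited token
    of the header line.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    name, length = None, 0
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                tokens = line[1:].split()
                name, length = (tokens[0] if tokens else ""), 0
            elif line and name is not None:
                length += len(line)
    if name is not None:
        yield name, length
```

Calling `dict(fasta_lengths("NBI_Gossypium_hirsutum_v1.1.fa.gz"))` on a downloaded copy would give a mapping from sequence ID to length, which could then be summed or passed to an N50 calculation.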

 

Downloads

All assembly and annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.

Functional Annotation
Functional annotation for Gossypium hirsutum (AD1) Genome NBI Assembly v1.1 (performed by NBI)
Arabidopsis/Swissprot/TrEMBL/KEGG/Pfam/Interpro/GO Annotation NBI_TM1-annotation.xlsx

 

Functional annotation for Gossypium hirsutum (AD1) Genome NBI Assembly v1.1 (performed by the CottonGen Team of the Main Bioinformatics Lab at WSU)

 

Interpro Analysis G.hirsutum_NBI_v1.1_interpro.txt.gz
GO annotation G.hirsutum_NBI_v1.1_genes2GO.txt.gz
IPR Terms G.hirsutum_NBI_v1.1_genes2IPR.txt.gz
KEGG Orthologs G.hirsutum_NBI_v1.1_KEGG.orthologs.txt.gz
KEGG Pathways G.hirsutum_NBI_v1.1_KEGG.pathways.txt.gz
Genes

The predicted genes and proteins for Gossypium hirsutum (AD1) Genome NAU-NBI Assembly v1.1
 

Predicted gene coding sequences (CDS) (FASTA format) NBI_Gossypium_hirsutum_v1.1.cds.fa.gz
Predicted protein sequences (FASTA format) NBI_Gossypium_hirsutum_v1.1.pep.fa.gz
Predicted genes and CDS (GFF3 format) NBI_Gossypium_hirsutum_v1.1.gene.gff3.gz
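
A gene count per pseudomolecule or scaffold can be derived from the GFF3 file with a few lines of code. The sketch below relies only on the GFF3 specification (nine tab-separated columns, with the feature type in column 3); it assumes that gene models in this file are recorded with the type `gene`, which should be checked against the actual file. `count_genes_per_sequence` is a hypothetical helper name.

```python
import gzip
from collections import Counter


def count_genes_per_sequence(path, feature_type="gene"):
    """Count GFF3 features of one type per sequence ID (column 1)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded FASTA section ends the annotation part
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                continue  # ignore malformed rows rather than guessing
            if fields[2] == feature_type:
                counts[fields[0]] += 1
    return counts
```

Summing the returned counts over A-subgenome, D-subgenome and unanchored sequences would allow a comparison with the gene numbers in the summary table, provided the sequence naming scheme of the file is known.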

 

Markers
Marker alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the G. hirsutum genome assembly. Markers were required to align with 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1,000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen and CMap are linked to JBrowse.
 
CottonGen SNP markers mapped to genome G.hirsutum_NAU-NBI_v1.1_SNP.gff3.gz
CottonGen RFLP markers mapped to genome G.hirsutum_NAU-NBI_v1.1_RFLP.gff3.gz
CottonGen SSR markers mapped to genome G.hirsutum_NAU-NBI_v1.1_SSR.gff3.gz
CottonGen InDel markers mapped to genome G.hirsutum_NAU-NBI_v1.1_Indels.gff3.gz
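
The marker filtering criteria described above can be expressed as a small predicate. This is only an illustrative sketch, not the CottonGen pipeline: the `MarkerAlignment` record, its field names and the `passes_marker_filter` function are hypothetical, and the thresholds are read as minimums (identity, coverage) and maximums (gap size), which is an interpretation of the text.

```python
from dataclasses import dataclass


@dataclass
class MarkerAlignment:
    """Hypothetical summary of one marker-to-genome alignment."""
    marker_type: str   # "SSR", "RFLP", "SNP" or "InDel"
    identity: float    # percent identity of the alignment (0-100)
    coverage: float    # percent of the marker length aligned (0-100)
    gap_count: int     # number of gaps in the alignment
    max_gap_size: int  # size of the largest gap, in bp


# Largest permitted gap per marker type, as stated in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


def passes_marker_filter(aln):
    """Apply the identity, coverage and gap rules (assumed to be thresholds)."""
    if aln.identity < 90.0 or aln.coverage < 97.0:
        return False
    if aln.gap_count >= 2:  # "fewer than 2 gaps"
        return False
    return aln.max_gap_size <= MAX_GAP_BP[aln.marker_type]


# Made-up examples: an SSR with one 400 bp gap and a SNP with one 10 bp gap.
print(passes_marker_filter(MarkerAlignment("SSR", 95.0, 99.0, 1, 400)))  # True
print(passes_marker_filter(MarkerAlignment("SNP", 99.0, 100.0, 1, 10)))  # False
```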

 

Protein Homology
Homology of the Gossypium hirsutum NAU-NBI v1.1 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-9 was used for NCBI nr (Release 2018-05), and less than 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best-hit reports are available for download in Excel format.
G.hirsutum NAU-NBI_v1.1 transcripts with NCBI nr homologs (EXCEL file) G.hirsutum_NAU-NBI_v1.1_vs_nr.xlsx.gz
G.hirsutum NAU-NBI_v1.1 transcripts with NCBI nr homologs (FASTA file) G.hirsutum_NAU-NBI_v1.1_vs_nr_hit.fasta.gz
G.hirsutum NAU-NBI_v1.1 transcripts without NCBI nr homologs (FASTA file) G.hirsutum_NAU-NBI_v1.1_vs_nr_noHit.fasta.gz
G.hirsutum NAU-NBI_v1.1 transcripts with Arabidopsis (Araport11) homologs (EXCEL file) G.hirsutum_NAU-NBI_v1.1_vs_tair.xlsx.gz
G.hirsutum NAU-NBI_v1.1 transcripts with Arabidopsis (Araport11) homologs (FASTA file) G.hirsutum_NAU-NBI_v1.1_vs_tair_hit.fasta.gz
G.hirsutum NAU-NBI_v1.1 transcripts without Arabidopsis (Araport11) homologs (FASTA file) G.hirsutum_NAU-NBI_v1.1_vs_tair_noHit.fasta.gz
G.hirsutum NAU-NBI_v1.1 transcripts with SwissProt homologs (EXCEL file) G.hirsutum_NAU-NBI_v1.1_vs_swissprot.xlsx.gz
G.hirsutum NAU-NBI_v1.1 transcripts with SwissProt homologs (FASTA file) G.hirsutum_NAU-NBI_v1.1_vs_swissprot_hit.fasta.gz
G.hirsutum NAU-NBI_v1.1 transcripts without SwissProt homologs (FASTA file) G.hirsutum_NAU-NBI_v1.1_vs_swissprot_noHit.fasta.gz
G.hirsutum NAU-NBI_v1.1 transcripts with TrEMBL homologs (EXCEL file) G.hirsutum_NAU-NBI_v1.1_vs_trembl.xlsx.gz
G.hirsutum NAU-NBI_v1.1 transcripts with TrEMBL homologs (FASTA file) G.hirsutum_NAU-NBI_v1.1_vs_trembl_hit.fasta.gz
G.hirsutum NAU-NBI_v1.1 transcripts without TrEMBL homologs (FASTA file) G.hirsutum_NAU-NBI_v1.1_vs_trembl_noHit.fasta.gz
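
The best-hit selection with an expectation value cutoff described above can be sketched as follows. The page does not state which BLAST output format was used, so this example assumes the standard 12-column tabular output (`-outfmt 6`), in which the E-value is column 11 and the bit score is column 12. `best_hits` and the sequence IDs in the demo rows are hypothetical.

```python
def best_hits(lines, evalue_cutoff):
    """Return {query: (subject, evalue, bitscore)} for the top hit per query.

    Only hits with an E-value strictly below the cutoff are considered;
    among those, the hit with the highest bit score is kept.
    Assumes BLAST tabular rows (12 tab-separated columns).
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= evalue_cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up rows for illustration only.
rows = [
    "q1\tsA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "q1\tsB\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t250",
    "q2\tsC\t40.0\t50\t30\t0\t1\t50\t1\t50\t1e-7\t45",
]
print(sorted(best_hits(rows, 1e-9)))  # ['q1']: q2 fails the 1e-9 cutoff
print(sorted(best_hits(rows, 1e-6)))  # ['q1', 'q2']
```

With the stricter nr cutoff the weak `q2` hit is discarded, whereas the 1e-6 cutoff used for the other databases retains it.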
Publication
Authors Tianzhen Zhang, Yan Hu, Wenkai Jiang, Lei Fang, Xueying Guan, Jiedan Chen, Jinbo Zhang, Christopher A Saski, Brian E Scheffler, David M Stelly, Amanda M Hulse-Kemp, Qun Wan, Bingliang Liu, Chunxiao Liu, Sen Wang, Mengqiao Pan, Yangkun Wang, Dawei Wang, Wenxue Ye, Lijing Chang, Wenpan Zhang, Qingxin Song, Ryan C Kirkbride, Xiaoya Chen, Elizabeth Dennis, Danny J Llewellyn, Daniel G Peterson, Peggy Thaxton, Don C Jones, Qiong Wang, Xiaoyang Xu, Hua Zhang, Huaitong Wu, Lei Zhou, Gaofu Mei, Shuqi Chen, Yue Tian, Dan Xiang, Xinghe Li, Jian Ding, Qiyang Zuo, Linna Tao, Yunchao Liu, Ji Li, Yu Lin, Yuanyuan Hui, Zhisheng Cao, Caiping Cai, Xiefei Zhu, Zhi Jiang, Baoliang Zhou, Wangzhen Guo, Ruiqiang Li & Z Jeffrey Chen
Title Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement
Journal Nature Biotechnology
Volume 33
Pages 531-537
Year 2015

 

Transcript Alignments
Transcript alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. hirsutum genome assembly. Alignments with 98% identity over 97% of the transcript length were retained. The available files are in GFF3 format.

 

G. arboreum CottonGen RefTrans v1 G.hirsutum_NAU-NBI_v1.1_g.arboreum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 G.hirsutum_NAU-NBI_v1.1_g.barbadense_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 G.hirsutum_NAU-NBI_v1.1_g.hirsutum_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 G.hirsutum_NAU-NBI_v1.1_g.raimondii_cottongen_reftransV1
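
To illustrate how the identity and alignment-length thresholds for transcripts might be applied to raw BLAT output, the sketch below parses one line of BLAT's PSL format (21 tab-separated columns, header removed). The page does not say how identity or alignment length was computed, so the formulas here are assumptions: identity is taken as matching bases over aligned bases, and coverage as aligned bases over the query size. Both function names are hypothetical.

```python
def psl_identity_and_coverage(psl_line):
    """Return (query_name, percent_identity, percent_coverage) for a PSL row.

    PSL columns used: 1 matches, 2 misMatches, 3 repMatches,
    10 qName, 11 qSize.
    """
    fields = psl_line.rstrip("\n").split("\t")
    matches = int(fields[0])
    mismatches = int(fields[1])
    rep_matches = int(fields[2])
    q_name, q_size = fields[9], int(fields[10])
    aligned = matches + mismatches + rep_matches
    identity = 100.0 * (matches + rep_matches) / aligned if aligned else 0.0
    coverage = 100.0 * aligned / q_size if q_size else 0.0
    return q_name, identity, coverage


def keep_transcript_alignment(psl_line, min_identity=98.0, min_coverage=97.0):
    """Apply the 98% identity / 97% length thresholds (read as minimums)."""
    _, identity, coverage = psl_identity_and_coverage(psl_line)
    return identity >= min_identity and coverage >= min_coverage


# Made-up PSL row: 990 matches and 10 mismatches over a 1,000 bp transcript.
demo = "\t".join([
    "990", "10", "0", "0", "0", "0", "0", "0", "+",
    "t1", "1000", "0", "1000", "A01", "5000", "100", "1100",
    "1", "1000,", "0,", "100,",
])
print(psl_identity_and_coverage(demo))
print(keep_transcript_alignment(demo))
```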