Gossypium hirsutum (AD1) Genome NAU-NBI Assembly v1.1 & Annotation v1.1

Genome Overview
Analysis Name Gossypium hirsutum (AD1) Genome NAU-NBI Assembly v1.1 & Annotation v1.1
Method SOAPdenovo
Source Illumina HiSeq 2000 reads from various insert size libraries (NAU-NBI)
Date performed 2015-04-20

About the assembly
An allohaploid plant was derived from the allotetraploid cotton (TM-1) and used for genome sequencing. 612 Gb (245× genome equivalent) of high-quality Illumina reads were produced and assembled using SOAPdenovo. The resulting contigs and scaffolds were integrated using 174,454 pairs of Sanger-sequenced BAC-end sequences comprising 116.5 Mb, and assembled into the TM-1 genome sequence (V1.0). To correct misassemblies, classify the homoeologous segments and order the scaffolds, an ultradense genetic map was developed using genotyping by sequencing of 59 F2 individuals derived from TM-1 and G. barbadense cv. Hai7124. The map consisted of 4,999,048 single-nucleotide polymorphism (SNP) loci and 4,049 recombination bins spanning 4,042 cM in 26 linkage groups. Using the map, 218 misassembled scaffolds (442.2 Mb, or 17.6% of the genome sequence) were corrected in assembly V1.0; most misassemblies were caused by ambiguous homoeologous sequences. The final assembly (V1.1) comprised 265,279 contigs (N50 = 34.0 kb) and 40,407 scaffolds (N50 = 1.6 Mb). The total scaffold length (2.4 Gb) spanned ~96% of the estimated allotetraploid genome (2.5 Gb), of which 6,146 scaffolds (2.3 Gb) were aligned and organized into 26 pseudochromosomes, including 1.5 Gb (4,635 scaffolds) in the A subgenome and 0.8 Gb (1,511 scaffolds) in the D subgenome. Furthermore, 1.9 Gb (79.2%) was oriented based on linkage maps.



Summary                   A subgenome   D subgenome   UN*          Total
Scaffold number           4,635         1,511         34,261       40,407
Scaffold length           1,477.1 Mb    831.0 Mb      124.6 Mb     2,432.7 Mb
Scaffold N50              1.4 Mb        2.5 Mb        7,160 bp     1.6 Mb
Oriented scaffold number  955           501           NA           1,456
Oriented scaffold size    1,150.8 Mb    769.5 Mb      NA           1,920.4 Mb
Contig number             142,201       44,057        79,021       265,279
Contig length             1,220.6 Mb    746.8 Mb      100.7 Mb     2,068.1 Mb
Contig N50                30.7 kb       47.2 kb       2,542 bp     34.0 kb
Gene number               32,032        34,402        4,044        70,478
Total gene length         107.2 Mb      109.8 Mb      3.1 Mb       220 Mb
Transposable elements     843.5 Mb      433 Mb        62.5 Mb      1,339 Mb
GC content                34.4%         33.3%         35.5%        34.1%

*Un-anchored scaffolds
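
The N50 values reported above follow the usual definition: the length L such that sequences of length L or longer together contain at least half of the total assembled bases. A minimal Python sketch of that calculation is shown here. The function name and the toy lengths are illustrative assumptions, not the actual scaffold lengths of this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (total 150): 50 + 40 = 90 >= 75
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 40
```

Applied to the real scaffold or contig lengths, the same function should reproduce the values in the table, provided lengths are counted the same way as in the original analysis.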

Zhang et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nature Biotechnology 33, 531–537 (2015).




The chromosomes (pseudomolecules) and scaffolds for Gossypium hirsutum (AD1) Genome NAU-NBI Assembly v1.1

Assembly pseudomolecules (FASTA format) NBI_Gossypium_hirsutum_v1.1.fa.gz



The predicted genes and proteins for Gossypium hirsutum (AD1) Genome NAU-NBI Assembly v1.1

Predicted gene coding sequences (CDS) (FASTA format) NBI_Gossypium_hirsutum_v1.1.cds.fa.gz
Predicted gene proteins (FASTA format) NBI_Gossypium_hirsutum_v1.1.pep.fa.gz
Predicted genes and CDS (GFF3 format) NBI_Gossypium_hirsutum_v1.1.gene.gff3.gz
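
The FASTA files listed above are gzip-compressed. As a rough sketch, assuming a file has already been downloaded to the working directory, they can be read in Python without unpacking them first. The helper names `read_fasta` and `sequence_lengths` are hypothetical and only the standard library is used.

```python
import gzip


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open text handle in FASTA format."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # The record ID is taken as the first word of the header line.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def sequence_lengths(path):
    """Map each record ID in a gzipped FASTA file to its sequence length."""
    with gzip.open(path, "rt") as handle:
        return {name: len(seq) for name, seq in read_fasta(handle)}
```

For example, `sequence_lengths("NBI_Gossypium_hirsutum_v1.1.cds.fa.gz")` would return one length per predicted coding sequence, assuming the file sits in the current directory.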


Marker Alignments
Marker alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the G. hirsutum genome assembly. Marker alignments were required to have at least 90% identity over at least 97% of the marker length. For SSRs and RFLPs, gap size was restricted to 1,000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen and CMap are linked to JBrowse.
CottonGen SNP markers mapped to genome G.hirsutum_NAU-NBI_v1.1_SNP.gff3.gz
CottonGen RFLP markers mapped to genome G.hirsutum_NAU-NBI_v1.1_RFLP.gff3.gz
CottonGen SSR markers mapped to genome G.hirsutum_NAU-NBI_v1.1_SSR.gff3.gz
CottonGen InDel markers mapped to genome G.hirsutum_NAU-NBI_v1.1_CTG_Indels.gff3.gz
NCBI Cotton dbSNPs mapped to genome G.hirsutum_NAU-NBI_v1.1_SNP.gff3.gz
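
The marker filtering thresholds described above can be written as a small predicate. This is only a sketch of one reading of those thresholds, not the pipeline that was actually run: the `MarkerHit` record and its field names are hypothetical, identity is taken as matches divided by aligned length, and coverage as aligned length divided by marker length. Real BLAT output (PSL) uses different columns and would need to be converted first.

```python
from dataclasses import dataclass

# Largest gap allowed per marker type, in bp, as stated in the text above.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "dbSNP": 2, "InDel": 2}


@dataclass
class MarkerHit:
    """Hypothetical summary of one marker-to-genome alignment."""
    marker_type: str
    marker_length: int
    aligned_length: int
    matches: int
    gap_count: int
    largest_gap: int


def passes_marker_filter(hit):
    """Apply the identity, coverage and gap thresholds to one alignment."""
    if hit.aligned_length <= 0 or hit.marker_length <= 0:
        return False
    identity = hit.matches / hit.aligned_length
    coverage = hit.aligned_length / hit.marker_length
    return (
        identity >= 0.90
        and coverage >= 0.97
        and hit.gap_count < 2
        and hit.largest_gap <= MAX_GAP_BP[hit.marker_type]
    )
```

With these assumptions, an SSR alignment with a single 500 bp gap passes, while the same alignment for a dbSNP marker is rejected because the gap exceeds 2 bp.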


Transcript Alignments
Transcript alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. hirsutum genome assembly. Alignments covering at least 97% of the transcript length with at least 90% identity were retained. The available files are in GFF3 format.


dbEST for all Gossypium G.hirsutum_NAU-NBI_v1.1_dbEST.gff3.gz
dbEST for G. arboreum G.hirsutum_NAU-NBI_v1.1_dbEST_G-arboreum.gff3.gz
dbEST for G. barbadense G.hirsutum_NAU-NBI_v1.1_dbEST_G-barbadense.gff3.gz
dbEST for G. herbaceum G.hirsutum_NAU-NBI_v1.1_dbEST_G-herbaceum.gff3.gz
dbEST for G. hirsutum G.hirsutum_NAU-NBI_v1.1_dbEST_G-hirsutum.gff3.gz
dbEST for G. raimondii G.hirsutum_NAU-NBI_v1.1_dbEST_G-raimondii.gff3.gz
Unigene from NCBI for all Gossypium G.hirsutum_NAU-NBI_v1.1_NCBI-unigenes.gff3.gz
Unigene from NCBI for G. hirsutum G.hirsutum_NAU-NBI_v1.1_NCBI_G-hirsutum-unigene.gff3.gz
Unigene from NCBI for G. raimondii G.hirsutum_NAU-NBI_v1.1_NCBI_G-raimondii-unigene.gff3.gz
CottonGen unigene v1.0 G.hirsutum_NAU-NBI_v1.1_cottongen-unigenes.gff3.gz
J. Udall 2012 Unigene contigs G.hirsutum_NAU-NBI_v1.1_Udall2012-unigenes.gff3.gz
J. Udall 2012 Unigene contigs (CDS) G.hirsutum_NAU-NBI_v1.1_Udall2012-unigenes_cds.gff3.gz
PlantGDB Gossypium unigenes G.hirsutum_NAU-NBI_v1.1_PlantGDB_Gossypium.gff3.gz
PlantGDB G. arboreum unigenes G.hirsutum_NAU-NBI_v1.1_PlantGDB_arboreum.gff3.gz
PlantGDB G. barbadense unigenes G.hirsutum_NAU-NBI_v1.1_PlantGDB_barbadense.gff3.gz
PlantGDB G. hirsutum unigenes G.hirsutum_NAU-NBI_v1.1_PlantGDB_hirsutum.gff3.gz
PlantGDB G. raimondii unigenes G.hirsutum_NAU-NBI_v1.1_PlantGDB_raimondii.gff3.gz
TIGR CGI unigenes G.hirsutum_NAU-NBI_v1.1_TIGR_CGI.gff3.gz
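
The alignment files above are in GFF3 format, which stores one feature per line in nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The following sketch counts aligned features per sequence using only the Python standard library. The function names are hypothetical, and which attribute keys appear in these particular files is not assumed.

```python
from collections import Counter
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dictionary of its main fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    attributes = {}
    for field in cols[8].split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attributes[key] = unquote(value)  # GFF3 attribute values are URL-escaped
    return {
        "seqid": cols[0],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
        "attributes": attributes,
    }


def features_per_sequence(lines):
    """Count feature lines per seqid, skipping comments and any FASTA section."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        counts[parse_gff3_line(line)["seqid"]] += 1
    return counts
```

A downloaded file can be passed in directly with `gzip.open(path, "rt")`, since an open text handle iterates line by line.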


Protein Alignments
Protein alignments available below were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'exonerate' was used to map protein sequences onto the G. hirsutum NBI v1.1 genome. Only alignments with a percent identity of at least 90% were retained.


Protein Homology
Protein homology analysis was performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. Proteins from the G. hirsutum assembly were searched against proteins from other genomes and databases using blastp with an e-value cutoff of 1e-6. Only the best match was kept. The available files are in Excel 2007 format.
Theobroma cacao v1.1 proteins G.hirsutum_NAU-NBI_v1.1_vs_Cacao.xlsx
Arabidopsis thaliana TAIR10 proteins G.hirsutum_NAU-NBI_v1.1_vs_TAIR10.xlsx
Oryza sativa MSU v7.0 proteins G.hirsutum_NAU-NBI_v1.1_vs_Rice.xlsx
Populus trichocarpa v2.0 proteins G.hirsutum_NAU-NBI_v1.1_vs_Poplar.xlsx
Vitis vinifera proteins G.hirsutum_NAU-NBI_v1.1_vs_Grape.xlsx
Glycine max v1.0 proteins G.hirsutum_NAU-NBI_v1.1_vs_Soybean.xlsx
Uniprot SwissProt proteins G.hirsutum_NAU-NBI_v1.1_vs_SwissProt.xlsx
Uniprot TrEMBL proteins G.hirsutum_NAU-NBI_v1.1_vs_TrEMBL.xlsx
NCBI nr proteins G.hirsutum_NAU-NBI_v1.1_vs_nr.xlsx
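
The best-match selection described above can be illustrated on tabular blastp output. This sketch assumes the standard 12-column tabular format (query ID in column 1, subject ID in column 2, e-value in column 11, bit score in column 12) and assumes that "best" means the highest bit score among hits at or below the cutoff. The source does not state which criterion was used, and the distributed files are Excel workbooks rather than raw BLAST output.

```python
def best_hits(rows, max_evalue=1e-6):
    """Keep the highest-scoring hit per query among hits passing the e-value cutoff."""
    best = {}
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue
        cols = row.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue
        # Replace the stored hit only when the new one has a higher bit score.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

The result maps each query protein to a `(subject, evalue, bitscore)` tuple; queries with no hit passing the cutoff are absent from the dictionary.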


Authors Tianzhen Zhang, Yan Hu, Wenkai Jiang, Lei Fang, Xueying Guan, Jiedan Chen, Jinbo Zhang, Christopher A Saski, Brian E Scheffler, David M Stelly, Amanda M Hulse-Kemp, Qun Wan, Bingliang Liu, Chunxiao Liu, Sen Wang, Mengqiao Pan, Yangkun Wang, Dawei Wang, Wenxue Ye, Lijing Chang, Wenpan Zhang, Qingxin Song, Ryan C Kirkbride, Xiaoya Chen, Elizabeth Dennis, Danny J Llewellyn, Daniel G Peterson, Peggy Thaxton, Don C Jones, Qiong Wang, Xiaoyang Xu, Hua Zhang, Huaitong Wu, Lei Zhou, Gaofu Mei, Shuqi Chen, Yue Tian, Dan Xiang, Xinghe Li, Jian Ding, Qiyang Zuo, Linna Tao, Yunchao Liu, Ji Li, Yu Lin, Yuanyuan Hui, Zhisheng Cao, Caiping Cai, Xiefei Zhu, Zhi Jiang, Baoliang Zhou, Wangzhen Guo, Ruiqiang Li & Z Jeffrey Chen
Title Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement
Journal Nature Biotechnology
Volume 33
Pages 531-537
Year 2015


Functional Annotation
Functional annotation for Gossypium hirsutum (AD1) Genome NBI Assembly v1.1 (performed by NBI)
Arabidopsis/Swissprot/TrEMBL/KEGG/Pfam/Interpro/GO Annotation NBI_TM1-annotation.xlsx


Functional annotation for Gossypium hirsutum (AD1) Genome NBI Assembly v1.1 (performed by the CottonGen Team of the Main Bioinformatics Lab at WSU)


Interpro Analysis G.hirsutum_NBI_v1.1_interpro.txt.gz
GO annotation G.hirsutum_NBI_v1.1_genes2GO.txt.gz
IPR Terms G.hirsutum_NBI_v1.1_genes2IPR.txt.gz
KEGG Orthologs G.hirsutum_NBI_v1.1_KEGG.orthologs.txt.gz
KEGG Pathways G.hirsutum_NBI_v1.1_KEGG.pathways.txt.gz

All assembly and annotation files are available for download by selecting the desired data type in the left-hand "Resources" sidebar. Each data type page provides a description of the available files and links to download them. Alternatively, you can browse all available files on the FTP repository.