Gossypium barbadense (AD2) '3-79' genome HAU_v1

Analysis Name: Gossypium barbadense (AD2) '3-79' genome HAU_v1
Method: SOAPdenovo (2)
Source: Illumina paired-end (insert size of ~500 bp) and mate-pair (insert sizes of 5, 10 and 20 kb)
Date performed: 2019-06-07

About the assembly

Here, we report the genome sequence of the superior fibre quality tetraploid cotton, G. barbadense acc. 3-79, obtained using a whole-genome shotgun approach together with large-fragment DNA Paired-End Tag (DNA-PET) sequencing data. This is a high-quality assembly of the 2.57 gigabase genome of G. barbadense, including 80,876 protein-coding genes. The A (or At) subgenome (1.50 Gb) is almost double the size of the D (or Dt) subgenome (853 Mb), a difference that primarily resulted from the expansion of Gypsy elements, including the Peabody and Retrosat2 subclades in the Del clade, and the Athila subclade in the Athila/Tat clade. Substantial gene expansion and contraction were observed, and many homoeologous gene pairs with biased expression patterns were identified, suggesting that abundant gene sub-functionalization occurred through allopolyploidization. More specifically, the CesA gene family has adopted differential temporal expression patterns, suggesting an integrated regulatory mechanism of CesA genes from the At and Dt subgenomes for the primary and secondary cellulose biosynthesis of cotton fibre in a “relay race”-like fashion. It is anticipated that the G. barbadense genome sequence will advance the understanding of the mechanism of genome polyploidization and underpin genome-wide comparison research in this genus.

                                          Whole genome    Allocated to subgenomes
                                                          At-subgenome    Dt-subgenome    Ungrouped
Scaffold N50 (Mb)                         0.260           0.253           0.306           0.157
Maximum scaffold length (Mb)              2.15            1.63            2.15            0.96
Minimum scaffold length (Mb)              0.001           0.001           0.001           0.001
Number of scaffolds                       29,751          14,319          6,967           8,465
Total length of assemblies (Mb)           2,573.19        1,493.53        852.98          226.68
Total gaps in assemblies (Mb)             334.55          223.47          82.35           28.72
Number of protein-coding genes            80,876          36,947          34,575          9,354
Average gene density (per 100 kb)         3.14            2.47            4.05            4.12
Average exon/intron sizes (bp)            283.40/423.41   283.07/434.49   281.81/422.92   290.92/449.44
Total size of transposable elements (Mb)  1,778.62        1,097.99        541.57          139.06
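
As a quick consistency check, the gene-density row can be recomputed from the gene counts and total assembly lengths reported in the same table. The sketch below is illustrative only: the function and variable names are hypothetical, and the last digit may differ slightly from the published values because of rounding.

```python
def genes_per_100kb(n_genes: int, length_mb: float) -> float:
    """Return gene density as genes per 100 kb of assembled sequence."""
    # 1 Mb contains 10 windows of 100 kb.
    return n_genes / (length_mb * 10)


# Values copied from the table: (number of protein-coding genes, total length in Mb).
table = {
    "Whole genome": (80876, 2573.19),
    "At-subgenome": (36947, 1493.53),
    "Dt-subgenome": (34575, 852.98),
    "Ungrouped": (9354, 226.68),
}

densities = {name: genes_per_100kb(n, mb) for name, (n, mb) in table.items()}

for name, density in densities.items():
    print(f"{name}: {density:.2f} genes per 100 kb")

# The three partitions should add up to the whole-genome gene count.
partition_genes = sum(n for name, (n, _) in table.items() if name != "Whole genome")
```

The same pattern can be applied to the scaffold counts and total lengths, which should likewise sum across the At, Dt and ungrouped partitions to the whole-genome figures.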



Yuan et al. The genome sequence of Sea-Island cotton (Gossypium barbadense) provides insights into the allopolyploidization and development of superior spinnable fibres. Sci Rep. 2015 Dec 4;5:17662. doi: 10.1038/srep17662.


The chromosomes (pseudomolecules) and scaffolds for Gossypium barbadense (AD2) Genome HAU-NBI Assembly v1.0

Assembly (FASTA format) G.barbadense_HAU_v1.0_genome.gz
Assembly - chromosomes only (FASTA format) AD2-NBI_v1.0_chromosomes_only.fas.gz AD2-NBI_v1.0_chromosomes_only.fas.gz.md5
Assembly - scaffolds only (FASTA format) AD2-NBI_v1.0_scaffold_only.fas.gz AD2-NBI_v1.0_scaffold_only.fas.gz.md5
Assembly - other anchored sequences (FASTA format) AD2-NBI_v1.0_other_anchored.fas.gz AD2-NBI_v1.0_other_anchored.fas.gz.md5
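
Several of the assembly files are listed with a companion .md5 file, which can be used to confirm that a download is complete. The following is a minimal sketch, assuming each .md5 file follows the common md5sum layout in which the first whitespace-separated token is the hexadecimal digest; that layout is an assumption and is not stated on this page. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large genome files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(data_path, md5_path):
    """Return True if the file's digest equals the one stored in md5_path.

    Assumption: the first whitespace-separated token in the .md5 file
    is the expected hex digest (md5sum-style output).
    """
    expected = Path(md5_path).read_text().split()[0].lower()
    return md5_of_file(data_path) == expected
```

For example, after downloading both files, matches_md5_file("AD2-NBI_v1.0_chromosomes_only.fas.gz", "AD2-NBI_v1.0_chromosomes_only.fas.gz.md5") should return True for an intact download.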

All assembly and annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.

Functional Annotation

Functional annotation for Gossypium barbadense (AD2) Genome NBI Assembly v1.0 (Performed by NBI)

Interpro/Pfam/KEGG/Swissprot/TrEMBL/TAIR Annotation Gb.gene.annotation.info.txt.gz


Functional annotation for Gossypium barbadense (AD2) Genome NBI Assembly v1.0 (Performed by the CottonGen Team of the Main Bioinformatics Lab at WSU.)

Interpro Analysis G.barbadense_NBI_v1.0_interpro.txt.gz
GO annotation G.barbadense_NBI_v1.0_genes2GO.txt.gz
IPR Terms G.barbadense_NBI_v1.0_genes2IPR.txt.gz
KEGG Orthologs G.barbadense_NBI_v1.0_KEGG.orthologs.txt.gz
KEGG Pathways G.barbadense_NBI_v1.0_KEGG.pathways.txt.gz

The predicted genes and proteins for Gossypium barbadense (AD2) Genome HAU-NBI Assembly v1.0

Predicted genes coding sequences (CDS) (FASTA format) AD2-NBI_v1.0.cds.fas.gz
cDNA sequences (FASTA format) AD2-NBI_v1.0.cDNA.fas.gz
Predicted genes (FASTA format) AD2-NBI_v1.0.gene.fas.gz
Predicted genes proteins (FASTA format) AD2-NBI_v1.0.pep.fas.gz
Predicted gene model (GFF3 format) Gb.final.all.gene.model.gff3.gz
Marker alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the G. barbadense genome assembly. Marker alignments were required to have 90% identity over 97% of the marker length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen and CMap are linked to JBrowse.
CottonGen SNP markers mapped to genome G.barbadense_HAU-NBI_v1.0_SNP
CottonGen RFLP markers mapped to genome G.barbadense_HAU-NBI_v1.0_RFLP
CottonGen SSR markers mapped to genome G.barbadense_HAU-NBI_v1.0_SSR
CottonGen InDel markers mapped to genome G.barbadense_HAU-NBI_v1.0_Indels
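
The marker filtering criteria described above can be sketched as a filter over BLAT PSL output. This is an illustration under stated assumptions, not the pipeline actually used: it assumes tab-separated PSL records, defines identity as (matches + repMatches) / (matches + repMatches + misMatches), defines coverage as the aligned query span divided by query size, and counts gaps as target-side inserts (tNumInsert, tBaseInsert). The page does not specify these definitions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PslHit:
    matches: int
    mismatches: int
    rep_matches: int
    t_num_insert: int   # number of gaps on the genome (target) side
    t_base_insert: int  # total gap length in bp on the target side
    q_name: str
    q_size: int
    q_start: int
    q_end: int


def parse_psl_line(line: str) -> PslHit:
    """Parse one tab-separated PSL record (standard 21-column layout)."""
    f = line.rstrip("\n").split("\t")
    return PslHit(
        matches=int(f[0]), mismatches=int(f[1]), rep_matches=int(f[2]),
        t_num_insert=int(f[6]), t_base_insert=int(f[7]),
        q_name=f[9], q_size=int(f[10]), q_start=int(f[11]), q_end=int(f[12]),
    )


def passes_marker_filter(hit: PslHit, max_gap_bp: int,
                         min_identity: float = 0.90,
                         min_coverage: float = 0.97,
                         max_gaps: int = 1) -> bool:
    """Apply the identity, coverage and gap thresholds to one hit.

    max_gap_bp would be 1000 for SSRs/RFLPs and 2 for dbSNPs/InDels;
    max_gaps=1 encodes "fewer than 2 gaps". Both mappings are assumptions
    about how the stated criteria translate to PSL fields.
    """
    aligned = hit.matches + hit.mismatches + hit.rep_matches
    if aligned == 0 or hit.q_size == 0:
        return False
    identity = (hit.matches + hit.rep_matches) / aligned
    coverage = (hit.q_end - hit.q_start) / hit.q_size
    return (identity >= min_identity
            and coverage >= min_coverage
            and hit.t_num_insert <= max_gaps
            and hit.t_base_insert <= max_gap_bp)
```

A hit would then be kept for an SSR marker with passes_marker_filter(hit, max_gap_bp=1000) and for a SNP marker with passes_marker_filter(hit, max_gap_bp=2).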
Protein Homology
Homology of the Gossypium barbadense HAU v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-9 was used for the NCBI nr database (Release 2018-05) and less than 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best hit reports are available for download in Excel format.
G. barbadense HAU_v1.0 proteins with NCBI nr homologs (EXCEL file) G.barbadense_HAU_v1.0_vs_nr.xlsx.gz
G. barbadense HAU_v1.0 proteins with NCBI nr (FASTA file) G.barbadense_HAU_v1.0_vs_nr_hit.fasta.gz
G. barbadense HAU_v1.0 proteins without NCBI nr (FASTA file) G.barbadense_HAU_v1.0_vs_nr_noHit.fasta.gz
G. barbadense HAU_v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) G.barbadense_HAU_v1.0_vs_tair.xlsx.gz
G. barbadense HAU_v1.0 proteins with Arabidopsis (Araport11) (FASTA file) G.barbadense_HAU_v1.0_vs_tair_hit.fasta.gz
G. barbadense HAU_v1.0 proteins without Arabidopsis (Araport11) (FASTA file) G.barbadense_HAU_v1.0_vs_tair_noHit.fasta.gz
G. barbadense HAU_v1.0 proteins with SwissProt homologs (EXCEL file) G.barbadense_HAU_v1.0_vs_swissprot.xlsx.gz
G. barbadense HAU_v1.0 proteins with SwissProt (FASTA file) G.barbadense_HAU_v1.0_vs_swissprot_hit.fasta.gz
G. barbadense HAU_v1.0 proteins without SwissProt (FASTA file) G.barbadense_HAU_v1.0_vs_swissprot_noHit.fasta.gz
G. barbadense HAU_v1.0 proteins with TrEMBL homologs (EXCEL file) G.barbadense_HAU_v1.0_vs_trembl.xlsx.gz
G. barbadense HAU_v1.0 proteins with TrEMBL (FASTA file) G.barbadense_HAU_v1.0_vs_trembl_hit.fasta.gz
G. barbadense HAU_v1.0 proteins without TrEMBL (FASTA file) G.barbadense_HAU_v1.0_vs_trembl_noHit.fasta.gz
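
The best-hit selection described in this section can be illustrated with a short sketch. It assumes the blastp results are available in the standard 12-column tabular layout (BLAST+ output format 6: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the "best hit" is the one with the highest bit score; neither detail is stated on this page, and the names used are hypothetical.

```python
# E-value cutoffs taken from the description above.
NR_CUTOFF = 1e-9       # NCBI nr
OTHER_CUTOFF = 1e-6    # Araport11, SwissProt, TrEMBL


def best_hits(lines, max_evalue):
    """Return {query: (subject, evalue, bitscore)} keeping one hit per query.

    Hits with an e-value at or above max_evalue are discarded, matching
    the "less than" wording of the cutoff. Among the remaining hits, the
    one with the highest bit score is kept (an assumed tie-breaking rule).
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Queries absent from the returned dictionary correspond to the "without" (noHit) FASTA files, and queries present correspond to the "with" (hit) files.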
Authors Daojun Yuan, Zhonghui Tang, Maojun Wang, Wenhui Gao, Lili Tu, Xin Jin, Lingling Chen, Yonghui He, Lin Zhang, Longfu Zhu, Yang Li, Qiqi Liang, Zhongxu Lin, Xiyan Yang, Nian Liu, Shuangxia Jin, Yang Lei, Yuanhao Ding, Guoliang Li, Xiaoan Ruan, Yijun Ruan & Xianlong Zhang
Title The genome sequence of Sea-Island cotton (Gossypium barbadense) provides insights into the allopolyploidization and development of superior spinnable fibres
Journal Scientific Reports
Issue 5
doi 10.1038/srep17662
Year 2015
Transcript Alignments
Transcript alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. barbadense genome assembly. Alignments covering 97% of the transcript length with 98% identity were retained. The available files are in GFF3 format.


G. arboreum CottonGen RefTrans v1 G.barbadense_HAU_v1.0_g.arboreum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 G.barbadense_HAU_v1.0_g.barbadense_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 G.barbadense_HAU_v1.0_g.hirsutum_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 G.barbadense_HAU_v1.0_g.raimondii_cottongen_reftransV1