Gossypium hirsutum (AD1) 'TM-1' genome CGP-BGI_v1

Overview
Analysis Name: Gossypium hirsutum (AD1) 'TM-1' genome CGP-BGI_v1
Method: SOAPdenovo (na)
Source: Illumina HiSeq 2000 reads from various insert size libraries (CGP-BGI)
Date performed: 2015-04-20

About the assembly
The allotetraploid genome of Upland cotton G. hirsutum TM-1 has been estimated, using various methods, at 2.25–2.43 Gb. A total of 445.7 Gb of raw paired-end Illumina reads, or 181-fold haploid genome coverage, was generated by sequencing whole-genome shotgun (WGS) libraries of homozygous cv. 'TM-1' with fragment lengths ranging from 250 bp to 40 kb. Owing to abundant repetitive sequences and homeologous chromosomes, this allotetraploid genome could not be assembled satisfactorily from the WGS data alone. Supplemental use of a bacterial artificial chromosome (BAC-to-BAC) sequencing strategy substantially improved the assembly. A total of 100,187 BACs, corresponding to about fivefold genome coverage, were sequenced and used in the final assembly. Each BAC was assembled individually before genome assembly, which combined the sequenced BACs with the paired-end data.

A total of 2,173 Mb of the G. hirsutum genome sequence was assembled using SOAPdenovo, with the largest scaffold being 8.4 Mb. This corresponds to 96.7% of the previous estimate of nuclear DNA content (ref. 13), or 89.6% according to a more recent report (ref. 14). The N50 (the size above which 50% of the total length of the sequence assembly can be found) of the contigs and scaffolds was 80 kb and 764 kb, respectively, which was better than the assembly that used WGS data only (N50 of contigs and scaffolds was 20 kb and 107 kb, respectively).

Category                          Number   N50 (kb)   Longest (kb)   Size (Mb)   Percent of the assembly
Total contigs                     44,816   80         784            2,090       -
Total scaffolds                   8,591    764        8,400          2,173       100.0
Anchored and oriented scaffolds   4,023    853        8,400          1,923       88.5
Genes annotated                   76,943   -          -              220         9.5
miRNAs                            602      -          -              0.07        <0.01
rRNAs                             2,153    -          -              0.6         <0.01
tRNAs                             2,050    -          -              0.2         0.01
snRNAs                            8,325    -          -              0.9         0.04
Repeat sequences                  -        -          -              1,471       67.2
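
The N50 values quoted for contigs and scaffolds can be reproduced from a list of sequence lengths. The following is a minimal Python sketch of the usual N50 calculation, not the script used by the assembly authors; the function name n50 and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the sequence at which the
    running total first reaches half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example (hypothetical lengths, not taken from the assembly).
toy_lengths = [8, 8, 4, 3, 3, 2, 2]
print(n50(toy_lengths))
```

Applied to the real scaffold lengths in kb, a function of this kind would be expected to return a value close to the 764 kb reported in the table, provided the same set of scaffolds is used.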

Publication
Li et al. Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nature Biotechnology 33, 524–530 (2015).

Assembly

The chromosomes (pseudomolecules) and scaffolds for Gossypium hirsutum (AD1) Genome CGP-BGI Assembly v1.0

Assembly pseudomolecules (FASTA format) BGI_Gossypium_hirsutum_v1.0.gz
Assembly pseudomolecules (GFF3 format) BGI_Gossypium_hirsutum_v1.0.gff.gz
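
Once the FASTA file has been downloaded, sequence lengths can be tallied without any third-party library. The sketch below assumes the file is an ordinary gzip-compressed FASTA file saved locally; the function name fasta_lengths is a hypothetical helper, not part of any CottonGen tool.

```python
import gzip


def fasta_lengths(path):
    """Map each sequence ID in a gzipped FASTA file to its length in bases."""
    lengths = {}
    current = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the ID.
                tokens = line[1:].split()
                current = tokens[0] if tokens else ""
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths
```

For example, calling fasta_lengths("BGI_Gossypium_hirsutum_v1.0.gz") on a local copy of the assembly would return one entry per chromosome or scaffold, which can then be summed or sorted as needed.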

 

Downloads

All assembly and annotation files are available for download by selecting the desired data type in the left-hand sidebar. Each data type page provides a description of the available files and links to download them.

Functional Annotation
Functional annotation for Gossypium hirsutum (AD1) Genome BGI Assembly v1.0 (Performed by BGI)
 
GO Annotation BGI_Gossypium_hirsutum_gene.GO.result.gz
Interpro Annotation BGI_Gossypium_hirsutum_gene.InterPro.result.gz
KEGG Annotation BGI_Gossypium_hirsutum_gene.KEGG.result.gz
Swissprot Annotation BGI_Gossypium_hirsutum_gene.Swissprot.result.gz
TrEMBL Annotation BGI_Gossypium_hirsutum_gene.TrEMBL.result.gz

 

Functional annotation for Gossypium hirsutum (AD1) Genome BGI Assembly v1.0 (Performed by the CottonGen Team of the Main Bioinformatics Lab at WSU.)

 

Interpro Analysis G.hirsutum_BGI_v1.0_interpro.txt.gz
GO annotation G.hirsutum_BGI_v1.0_genes2GO.txt.gz
IPR Terms G.hirsutum_BGI_v1.0_genes2IPR.txt.gz
KEGG Orthologs G.hirsutum_BGI_v1.0_KEGG.orthologs.txt.gz
KEGG Pathways G.hirsutum_BGI_v1.0_KEGG.pathways.txt.gz
Genes

The predicted genes and proteins for Gossypium hirsutum (AD1) Genome CGP-BGI Assembly v1.0
 

Predicted gene coding sequences (CDS) (FASTA format) BGI_Gossypium_hirsutum_v1.0.cds.gz
Predicted gene proteins (FASTA format) BGI_Gossypium_hirsutum_v1.0.pep.gz
Predicted genes and CDS (GFF3 format) BGI_Gossypium_hirsutum_v1.0.cds.gff.gz

 

Markers
Marker alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the G. hirsutum genome assembly. Markers were required to align with at least 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1,000 bp or less, with fewer than 2 gaps. For dbSNPs and indels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen and CMap are linked to JBrowse.
 
CottonGen SNP markers mapped to genome G.hirsutum_CGP-BGI_v1.0_SNP.gff3.gz
CottonGen RFLP markers mapped to genome G.hirsutum_CGP-BGI_v1.0_RFLP.gff3.gz
CottonGen SSR markers mapped to genome G.hirsutum_CGP-BGI_v1.0_SSR.gff3.gz
CottonGen InDel markers mapped to genome G.hirsutum_CGP-BGI_v1.0_Indels.gff3.gz
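
The marker filtering criteria described above can be expressed as a small predicate. This is a sketch under stated assumptions, not the CottonGen pipeline: the MarkerAlignment record and its field names are hypothetical, identity is assumed to be matches divided by aligned length, and coverage is assumed to be aligned length divided by marker length.

```python
from dataclasses import dataclass


@dataclass
class MarkerAlignment:
    """Hypothetical summary of one marker-to-genome alignment."""
    marker_type: str     # "SSR", "RFLP", "SNP" or "InDel"
    marker_length: int   # length of the marker sequence (bp)
    aligned_length: int  # number of marker bases included in the alignment
    matches: int         # number of identical bases in the alignment
    gap_count: int       # number of gaps in the alignment
    max_gap_size: int    # size of the largest gap (bp)


# Maximum gap size per marker type, as stated in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


def passes_marker_filter(aln):
    """Apply the 90% identity, 97% length and gap limits to one alignment."""
    if aln.aligned_length == 0 or aln.marker_length == 0:
        return False
    identity = aln.matches / aln.aligned_length
    coverage = aln.aligned_length / aln.marker_length
    return (
        identity >= 0.90
        and coverage >= 0.97
        and aln.gap_count < 2
        and aln.max_gap_size <= MAX_GAP_BP[aln.marker_type]
    )
```

Keeping the gap limits in a lookup table makes the difference between the two marker classes explicit: the same alignment can pass as an SSR yet fail as a SNP purely because of its gap size.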
Protein Alignments
Protein alignments available below were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'exonerate' was used to map protein sequences onto the G. hirsutum CGP-BGI v1.0 genome. Only alignments with a percent identity of at least 90% were retained.
 

 

Protein Homology
Homology of the Gossypium hirsutum CGP-BGI_v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for NCBI nr (Release 2018-05), and a cutoff of less than 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01) databases. The best hit reports are available for download in Excel format.
G.hirsutum CGP-BGI_v1.0 proteins with NCBI nr homologs (EXCEL file) G.hirsutum_CGP-BGI_v1.0_vs_nr.xlsx.gz
G.hirsutum CGP-BGI_v1.0 proteins with NCBI nr (FASTA file) G.hirsutum_CGP-BGI_v1.0_vs_nr_hit.fasta.gz
G.hirsutum CGP-BGI_v1.0 proteins without NCBI nr (FASTA file) G.hirsutum_CGP-BGI_v1.0_vs_nr_noHit.fasta.gz
G.hirsutum CGP-BGI_v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) G.hirsutum_CGP-BGI_v1.0_vs_tair.xlsx.gz
G.hirsutum CGP-BGI_v1.0 proteins with Arabidopsis (Araport11) (FASTA file) G.hirsutum_CGP-BGI_v1.0_vs_tair_hit.fasta.gz
G.hirsutum CGP-BGI_v1.0 proteins without Arabidopsis (Araport11) (FASTA file) G.hirsutum_CGP-BGI_v1.0_vs_tair_noHit.fasta.gz
G.hirsutum CGP-BGI_v1.0 proteins with SwissProt homologs (EXCEL file) G.hirsutum_CGP-BGI_v1.0_vs_swissprot.xlsx.gz
G.hirsutum CGP-BGI_v1.0 proteins with SwissProt (FASTA file) G.hirsutum_CGP-BGI_v1.0_vs_swissprot_hit.fasta.gz
G.hirsutum CGP-BGI_v1.0 proteins without SwissProt (FASTA file) G.hirsutum_CGP-BGI_v1.0_vs_swissprot_noHit.fasta.gz
G.hirsutum CGP-BGI_v1.0 proteins with TrEMBL homologs (EXCEL file) G.hirsutum_CGP-BGI_v1.0_vs_trembl.xlsx.gz
G.hirsutum CGP-BGI_v1.0 proteins with TrEMBL (FASTA file) G.hirsutum_CGP-BGI_v1.0_vs_trembl_hit.fasta.gz
G.hirsutum CGP-BGI_v1.0 proteins without TrEMBL (FASTA file) G.hirsutum_CGP-BGI_v1.0_vs_trembl_noHit.fasta.gz
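
The best-hit selection described in this section can be sketched as follows. The source does not state which blastp output format was used, so this example assumes the standard 12-column tabular output (BLAST+ -outfmt 6), in which the E-value is column 11 and the bit score is column 12. The function name best_hits and the choice of bit score as the ranking key are assumptions.

```python
# E-value cutoffs stated in the text, keyed by database.
EVALUE_CUTOFF = {"nr": 1e-9, "araport11": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}


def best_hits(lines, max_evalue):
    """Return {query: (subject, evalue, bitscore)} for the top hit per query.

    Hits with an E-value at or above max_evalue are discarded; among the
    remaining hits, the one with the highest bit score is kept.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            raise ValueError("expected 12 tab-separated columns")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

A call such as best_hits(open("hits.tsv"), EVALUE_CUTOFF["nr"]) would then give one record per protein with a qualifying nr homolog; proteins absent from the result correspond to the "without" (noHit) FASTA files. The file name hits.tsv is a placeholder.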
Publication
Authors  Fuguang Li, Guangyi Fan, Cairui Lu, Guanghui Xiao, Changsong Zou, Russell J Kohel, Zhiying Ma, Haihong Shang, Xiongfeng Ma, Jianyong Wu, Xinming Liang, Gai Huang, Richard G Percy, Kun Liu, Weihua Yang, Wenbin Chen, Xiongming Du, Chengcheng Shi, Youlu Yuan, Wuwei Ye, Xin Liu, Xueyan Zhang, Weiqing Liu, Hengling Wei, Shoujun Wei, Guodong Huang, Xianlong Zhang, Shuijin Zhu, He Zhang, Fengming Sun, Xingfen Wang, Jie Liang, Jiahao Wang, Qiang He, Leihuan Huang, Jun Wang, Jinjie Cui, Guoli Song, Kunbo Wang, Xun Xu, John Z Yu, Yuxian Zhu & Shuxun Yu
Title  Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution
Journal Nature Biotechnology
Volume 33
Pages 524-530
Year 2015

 

Transcript Alignments
Transcript alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. hirsutum genome assembly. Alignments with an alignment length of 97% and 98% identity were retained. The available files are in GFF3 format.

 

G. arboreum CottonGen RefTrans v1 G.hirsutum_CGP-BGI_v1.0_g.arboreum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 G.hirsutum_CGP-BGI_v1.0_g.barbadense_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 G.hirsutum_CGP-BGI_v1.0_g.hirsutum_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 G.hirsutum_CGP-BGI_v1.0_g.raimondii_cottongen_reftransV1
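
Because the transcript and marker alignments are distributed in GFF3 format, a short parser is often enough to summarize them. The sketch below relies only on the nine tab-separated columns defined by the GFF3 specification; the function names are hypothetical, and attribute values are left as written in the file (no URL-decoding).

```python
from collections import Counter


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dictionary of its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GFF3 feature line has 9 tab-separated columns")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    # Column 9 holds semicolon-separated key=value pairs.
    attributes = dict(
        pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
    )
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end), "score": score,
        "strand": strand, "phase": phase, "attributes": attributes,
    }


def count_features_by_seqid(lines):
    """Count feature lines per chromosome or scaffold, skipping comments."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        counts[parse_gff3_line(line)["seqid"]] += 1
    return counts
```

Passing an open file handle for one of the alignment files to count_features_by_seqid gives a quick view of how the aligned transcripts are distributed across the chromosomes and unanchored scaffolds.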