|Analysis Name||Gossypium hirsutum (AD1) TM-1 Genome CGP-BGI v1_a1 |
|Method||SOAPdenovo (na) |
|Source||Illumina HiSeq 2000 reads from various insert size libraries (CGP-BGI) |
|Date performed||2015-04-20 |
About the assembly
The allotetraploid genome of Upland cotton (G. hirsutum) has been estimated, using various methods, at 2.25–2.43 Gb. Whole-genome shotgun (WGS) libraries of the homozygous cultivar 'TM-1', with fragment lengths ranging from 250 bp to 40 kb, were sequenced to generate a total of 445.7 Gb of raw paired-end Illumina reads, or 181-fold haploid genome coverage. Because of abundant repetitive sequences and homeologous chromosomes, this allotetraploid genome could not be assembled satisfactorily from the WGS data alone.
Supplementing the WGS data with a bacterial artificial chromosome (BAC-to-BAC) sequencing strategy substantially improved the assembly. A total of 100,187 BACs, corresponding to about fivefold genome coverage, were sequenced and used in the final assembly. Each BAC was assembled individually, and the BAC assemblies were then combined with the paired-end data for the genome assembly.
A total of 2,173 Mb of G. hirsutum genome sequence was assembled using SOAPdenovo, with the largest scaffold being 8.4 Mb. This corresponds to 96.7% of the earlier estimate of nuclear DNA content (ref. 13), or 89.6% according to a more recent report (ref. 14). The contig and scaffold N50 values (the length such that sequences of that length or longer contain 50% of the total assembled length) were 80 kb and 764 kb, respectively, an improvement over the assembly that used WGS data only (contig and scaffold N50 of 20 kb and 107 kb, respectively).
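The N50 statistic quoted above can be made concrete with a short Python sketch. The function name `n50` and the toy lengths are illustrative assumptions; this is the textbook definition, not code taken from SOAPdenovo.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled length."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # the running sum reaches half of the total.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise RuntimeError("unreachable: the running sum always reaches the total")
```

For the toy contig lengths `[80, 70, 50, 40, 30, 20, 10]` (total 300), the running sum first reaches half the total (150) at the contig of length 70, so `n50` returns 70. The same function applies to scaffolds by passing scaffold lengths instead.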
||Percent of the assembly
|Anchored and oriented scaffolds
Li et al. Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nature Biotechnology 33, 524–530 (2015).
Marker alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool BLAT was used to map marker sequences from CottonGen to the G. hirsutum genome assembly. Markers were required to align with at least 90% identity over at least 97% of their length. For SSRs and RFLPs, gap size was restricted to 1,000 bp or less, with fewer than 2 gaps. For dbSNPs and indels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen and CMap are linked to JBrowse.
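The marker filtering criteria above can be sketched in Python against BLAT's PSL output. This is a hypothetical helper, not the CottonGen pipeline: the page does not say how identity, gap count, or gap size were computed, so the formulas below (identity from the match and mismatch columns, gaps as query plus target inserts) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PslHit:
    matches: int
    mismatches: int
    rep_matches: int
    q_num_insert: int
    q_base_insert: int
    t_num_insert: int
    t_base_insert: int
    q_size: int
    q_start: int
    q_end: int


def parse_psl_line(line: str) -> PslHit:
    f = line.rstrip("\n").split("\t")
    # PSL columns: 0 matches, 1 misMatches, 2 repMatches, 3 nCount,
    # 4 qNumInsert, 5 qBaseInsert, 6 tNumInsert, 7 tBaseInsert,
    # 8 strand, 9 qName, 10 qSize, 11 qStart, 12 qEnd, ...
    return PslHit(
        matches=int(f[0]), mismatches=int(f[1]), rep_matches=int(f[2]),
        q_num_insert=int(f[4]), q_base_insert=int(f[5]),
        t_num_insert=int(f[6]), t_base_insert=int(f[7]),
        q_size=int(f[10]), q_start=int(f[11]), q_end=int(f[12]),
    )


# Maximum total gap bases per marker type (hypothetical keys; thresholds from the text).
MAX_GAP_BASES = {"SSR": 1000, "RFLP": 1000, "dbSNP": 2, "indel": 2}


def keep_marker_hit(hit: PslHit, marker_type: str) -> bool:
    """Apply >= 90% identity, >= 97% query coverage, fewer than 2 gaps,
    and the per-type gap-size limit (assumed gap definitions)."""
    aligned = hit.matches + hit.rep_matches + hit.mismatches
    if aligned == 0 or hit.q_size == 0:
        return False
    identity = (hit.matches + hit.rep_matches) / aligned
    coverage = (hit.q_end - hit.q_start) / hit.q_size
    gap_count = hit.q_num_insert + hit.t_num_insert
    gap_bases = hit.q_base_insert + hit.t_base_insert
    return (
        identity >= 0.90
        and coverage >= 0.97
        and gap_count < 2
        and gap_bases <= MAX_GAP_BASES[marker_type]  # KeyError for unknown types
    )
```

With a single 150 bp target gap, a hit at 97.5% identity and full query coverage would pass as an SSR but be rejected as a dbSNP, because only the gap-size limit differs between the two marker classes.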
Transcript alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool BLAT was used to map transcripts to the G. hirsutum genome assembly. Alignments covering at least 97% of the transcript length with at least 98% identity were retained. The available files are in GFF3 format.
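Because the marker and transcript alignment files are distributed as GFF3, a minimal Python reader may help when working with them. This sketch follows the general GFF3 layout (nine tab-separated columns, percent-encoded `key=value` attributes); the example line, its seqid, and its attribute keys are hypothetical, since the exact contents of the CottonGen files are not described here.

```python
from typing import Optional
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line: str) -> Optional[dict]:
    """Parse one GFF3 feature line; return None for blank, comment,
    or directive lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])  # 1-based, inclusive
    record["end"] = int(record["end"])
    attrs = {}
    if fields[8] != ".":  # "." means no attributes
        for pair in fields[8].split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            # Values are percent-encoded; multiple values are comma-separated.
            attrs[unquote(key)] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attrs
    return record
```

Decoding the attributes with `unquote` matters because reserved characters such as `;` or `=` inside a marker name must be escaped in GFF3 and would otherwise break naive splitting.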
Homology of the Gossypium hirsutum CGP-BGI_v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against several protein databases. Hits were retained if their E-value was below 1e-9 for NCBI nr (Release 2018-05), or below 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/Swiss-Prot (Release 2019-01), and UniProtKB/TrEMBL (Release 2019-01). The best-hit reports are available for download in Excel format.
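The per-database E-value cutoffs and best-hit selection described above can be sketched in Python over BLAST+ tabular output (`-outfmt 6`, default 12 columns). The page does not define "best hit", so ranking by highest bit score (ties broken by lowest E-value) is an assumption, as are the database keys in `CUTOFFS`.

```python
from typing import Dict, Iterable, Tuple

# Default BLAST+ -outfmt 6 columns.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Hypothetical database keys; cutoff values are taken from the text above.
CUTOFFS = {"nr": 1e-9, "araport11": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}


def best_hits(lines: Iterable[str], evalue_cutoff: float) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} keeping, per query, the
    highest-scoring hit whose E-value is strictly below the cutoff."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        rec = dict(zip(COLUMNS, line.split("\t")))
        evalue = float(rec["evalue"])
        bitscore = float(rec["bitscore"])
        if evalue >= evalue_cutoff:
            continue
        current = best.get(rec["qseqid"])
        # Prefer higher bit score; on ties, prefer the lower E-value.
        if current is None or (bitscore, -evalue) > (current[2], -current[1]):
            best[rec["qseqid"]] = (rec["sseqid"], evalue, bitscore)
    return best
```

For example, `best_hits(open("proteins_vs_nr.tsv"), CUTOFFS["nr"])` (hypothetical file name) would yield one row per protein, which is the shape of a best-hit report.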
|| Fuguang Li, Guangyi Fan, Cairui Lu, Guanghui Xiao, Changsong Zou, Russell J Kohel, Zhiying Ma, Haihong Shang, Xiongfeng Ma, Jianyong Wu, Xinming Liang, Gai Huang, Richard G Percy, Kun Liu, Weihua Yang, Wenbin Chen, Xiongming Du, Chengcheng Shi, Youlu Yuan, Wuwei Ye, Xin Liu, Xueyan Zhang, Weiqing Liu, Hengling Wei, Shoujun Wei, Guodong Huang, Xianlong Zhang, Shuijin Zhu, He Zhang, Fengming Sun, Xingfen Wang, Jie Liang, Jiahao Wang, Qiang He, Leihuan Huang, Jun Wang, Jinjie Cui, Guoli Song, Kunbo Wang, Xun Xu, John Z Yu, Yuxian Zhu & Shuxun Yu
||Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution
Functional annotation for Gossypium hirsutum (AD1) Genome BGI Assembly v1.0 (Performed by BGI)
Functional annotation for Gossypium hirsutum (AD1) Genome BGI Assembly v1.0 (Performed by the CottonGen Team of the Main Bioinformatics Lab at WSU)
All assembly and annotation files are available for download by selecting the desired data type in the left-hand "Resources" sidebar. Each data type page provides a description of the available files and links to download them. Alternatively, you can browse all available files on the FTP repository.