Gossypium arboreum (A2) 'SXY1' genome HAU v1
About the assembly
In this study, we applied Oxford Nanopore sequencing technology to assemble the genomes of G. rotundifolium (K2*) 'K201', G. arboreum (A2) 'SXY1' and G. raimondii (D5) 'D502'. The G. arboreum and G. raimondii genomes had previously been assembled de novo from Illumina and PacBio reads, but both assemblies contain a number of sequence gaps and would benefit from improved contiguity. We generated 304 Gb, 212 Gb and 125 Gb of Nanopore sequencing data, corresponding to genome coverages of 124×, 131× and 167× for K2*, A2 and D5, respectively. We assembled 3,593, 1,173 and 366 contigs for G. rotundifolium, G. arboreum and G. raimondii, with total contig lengths of 2.44 Gb, 1.62 Gb and 0.75 Gb, respectively (Table 1). These initial contigs were polished using Illumina paired-end reads at genome coverages of 108×, 118× and 132× for K2*, A2 and D5. The contig N50 is 5.33 Mb, 11.69 Mb and 17.04 Mb, and the longest contigs are 32.72 Mb, 58.57 Mb and 43.74 Mb for K2*, A2 and D5, respectively. After polishing the contigs with Illumina reads, we used high-throughput chromosome conformation capture (Hi-C) data to order and orient the contigs into pseudochromosomes for each species. In the Hi-C-assisted assembly, 2,559, 485 and 201 contigs were placed on the 13 chromosomes of the K2*, A2 and D5 genomes, covering over 99% of the genome length.
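The coverage and contig N50 figures above rest on simple arithmetic. The sketch below is a minimal illustration, not the authors' pipeline: it computes N50 (the length L such that contigs of length L or longer hold at least half of the assembly) and the genome size implied by dividing data volume by fold coverage. The function names and toy contig lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def implied_genome_size_gb(data_gb, coverage):
    """Genome size implied by a data volume and a fold coverage (size = data / coverage)."""
    return data_gb / coverage


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths, not real data
    print(n50(toy_contigs))  # total 300; 80 + 70 = 150 reaches half, so N50 = 70
    # Figures from the text: 212 Gb of A2 Nanopore data at 131x coverage
    print(round(implied_genome_size_gb(212, 131), 2))
```

With the A2 figures, 212 Gb at 131× implies roughly 1.62 Gb, in line with the total A2 contig length reported above; the same check gives about 2.45 Gb for K2* and 0.75 Gb for D5.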
*K2 should read K12.
Table 1. Summary of genome assemblies and annotations of G. rotundifolium, G. arboreum and G. raimondii.
Supplementary Table 4. Comparison of the A2 genome with the previously published genome version.
Wang, M. et al. Comparative genome analyses highlight transposon-mediated genome expansion and the evolutionary architecture of 3D genomic folding in cotton. Mol Biol Evol. 2021 May 11:msab128. doi: 10.1093/molbev/msab128.
Additional information about this analysis:
Functional annotation files for the Gossypium arboreum HAU Genome v1.0 are available for download below. The Gossypium arboreum HAU Genome v1.0 proteins were analyzed using InterProScan in order to assign InterPro domains and Gene Ontology (GO) terms. Pathways analysis was performed using the KEGG Automatic Annotation Server (KAAS).
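To work with the GO assignments, the InterProScan tab-separated output can be grouped by protein. The sketch below assumes the standard TSV layout, with the protein accession in column 1 and GO annotations in column 14, which is populated only when InterProScan is run with GO lookup enabled. It extracts GO IDs with a regular expression, so it tolerates both plain `GO:0000001|GO:0000002` lists and newer entries with a source suffix such as `GO:0005515(InterPro)`. The function name and example protein ID are hypothetical.

```python
import re
from collections import defaultdict

GO_ID = re.compile(r"GO:\d{7}")


def go_terms_by_protein(tsv_lines):
    """Collect sorted GO IDs per protein from InterProScan TSV lines.

    Assumes column 1 (index 0) is the protein accession and column 14
    (index 13) holds the GO annotations when present.
    """
    terms = defaultdict(set)
    for line in tsv_lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        ids = terms[cols[0]]  # register every protein, even without GO terms
        if len(cols) > 13:
            ids.update(GO_ID.findall(cols[13]))
    return {protein: sorted(ids) for protein, ids in terms.items()}


if __name__ == "__main__":
    row = ["Ga_hypothetical_1", "md5", "350", "Pfam", "PF00069",
           "Protein kinase domain", "10", "270", "1.2E-50", "T", "01-01-2021",
           "IPR000719", "Protein kinase domain",
           "GO:0004672|GO:0005524|GO:0006468", "-"]
    print(go_terms_by_protein(["\t".join(row)]))
```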
Predicted gene models, their alignments and the encoded proteins for the Gossypium arboreum (A2) genome. These files belong to the HAU G. arboreum Assembly v1.0.
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium arboreum HAU genome assembly. Markers were required to align with 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1,000 bp or less with fewer than 2 gaps. For dbSNPs and indels, gap size was restricted to 2 bp with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
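The marker thresholds can be illustrated as a filter over BLAT's PSL output. This is a sketch under stated assumptions, not the CottonGen Team's code: identity is taken as (matches + repMatches) / (matches + misMatches + repMatches), coverage as aligned bases over query size, "fewer than 2 gaps" as at most one combined query or target insert, and gap size as the combined inserted bases. The helper names and the example PSL record are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PslHit:
    matches: int
    mismatches: int
    rep_matches: int
    q_num_insert: int
    q_base_insert: int
    t_num_insert: int
    t_base_insert: int
    q_name: str
    q_size: int
    t_name: str
    t_start: int
    t_end: int


def parse_psl_line(line):
    """Parse one 21-column PSL data line (headers must be skipped beforehand)."""
    f = line.rstrip("\n").split("\t")
    return PslHit(
        matches=int(f[0]), mismatches=int(f[1]), rep_matches=int(f[2]),
        q_num_insert=int(f[4]), q_base_insert=int(f[5]),
        t_num_insert=int(f[6]), t_base_insert=int(f[7]),
        q_name=f[9], q_size=int(f[10]),
        t_name=f[13], t_start=int(f[15]), t_end=int(f[16]),
    )


def passes_marker_filter(hit, max_gap_bp, min_identity=0.90,
                         min_coverage=0.97, max_gaps=1):
    """Apply identity, coverage and gap thresholds (definitions are assumptions)."""
    aligned = hit.matches + hit.mismatches + hit.rep_matches
    if aligned == 0 or hit.q_size == 0:
        return False
    identity = (hit.matches + hit.rep_matches) / aligned
    coverage = aligned / hit.q_size
    n_gaps = hit.q_num_insert + hit.t_num_insert
    gap_bp = hit.q_base_insert + hit.t_base_insert
    return (identity >= min_identity and coverage >= min_coverage
            and n_gaps <= max_gaps and gap_bp <= max_gap_bp)


# Hypothetical per-marker-type gap limits taken from the thresholds in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "dbSNP": 2, "Indel": 2}

if __name__ == "__main__":
    fields = ["195", "3", "0", "0", "0", "0", "1", "150", "+",
              "hypothetical_SSR_1", "200", "0", "198", "Chr01", "1000000",
              "5000", "5348", "2", "100,98,", "0,100,", "5000,5250,"]
    hit = parse_psl_line("\t".join(fields))
    print(passes_marker_filter(hit, MAX_GAP_BP["SSR"]))    # one 150 bp gap allowed
    print(passes_marker_filter(hit, MAX_GAP_BP["dbSNP"]))  # 150 bp exceeds 2 bp
```

The example record has 98.5% identity over 99% of a 200 bp query with a single 150 bp target gap, so under these assumptions it would be kept as an SSR alignment but rejected under the dbSNP/indel gap limit.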
Homology of the Gossypium arboreum HAU Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against several protein databases. An expectation value cutoff of 1e-9 was used for NCBI nr (Release 2018-05), and 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/Swiss-Prot (Release 2019-01) and UniProtKB/TrEMBL (Release 2019-01). The best-hit reports are available for download in Excel format.
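A best-hit report of this kind can be derived from BLAST+ tabular output. The sketch below assumes the default 12-column `-outfmt 6` layout, keeps hits whose E-value is strictly below the database-specific cutoff, and picks the highest-bitscore hit per query. The database keys, function name and example records are hypothetical labels, not CottonGen's actual pipeline.

```python
# Column order of the default BLAST+ "-outfmt 6" tabular output
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Hypothetical database labels mapped to the E-value cutoffs given in the text
EVALUE_CUTOFF = {"nr": 1e-9, "araport11": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}


def best_hits(lines, database):
    """Return {query: (subject, bitscore, evalue)} for the best hit per query."""
    cutoff = EVALUE_CUTOFF[database]
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if evalue >= cutoff:  # keep only hits strictly below the cutoff (assumption)
            continue
        query = row["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (row["sseqid"], bitscore, evalue)
    return best


if __name__ == "__main__":
    rows = [
        "Ga_hyp_1\tsubject_A\t60.0\t300\t100\t5\t1\t300\t1\t300\t1e-20\t90",
        "Ga_hyp_1\tsubject_B\t75.0\t320\t70\t3\t1\t320\t1\t320\t1e-40\t150",
        "Ga_hyp_2\tsubject_C\t30.0\t80\t50\t2\t1\t80\t1\t80\t1e-3\t40",
    ]
    print(best_hits(rows, "araport11"))
```

In this toy input, `Ga_hyp_2` is dropped because its E-value misses the 1e-6 cutoff, and `subject_B` wins for `Ga_hyp_1` on bitscore.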
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. arboreum genome assembly. Alignments covering at least 97% of the transcript length with at least 90% identity were retained. The available files are in GFF3 format.
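The marker and transcript alignment files are distributed as GFF3, a nine-column tab-separated format whose last column holds `key=value` attributes separated by semicolons, with special characters percent-encoded. The minimal reader below is a sketch, not a full GFF3 validator: it skips comment and directive lines, stops at an embedded `##FASTA` section and leaves multi-value attributes as raw comma-separated strings. The function names and the example feature are hypothetical.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[unquote(key)] = unquote(value)  # decode %XX escapes
    return {
        "seqid": unquote(seqid), "source": source, "type": ftype,
        "start": int(start), "end": int(end),  # 1-based, inclusive
        "score": None if score == "." else float(score),
        "strand": strand, "phase": phase, "attributes": attributes,
    }


def iter_features(lines):
    """Yield parsed features, skipping comments and stopping at ##FASTA."""
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        yield parse_gff3_line(line)


if __name__ == "__main__":
    example = ["##gff-version 3",
               "Chr01\tBLAT\tmatch\t5001\t5348\t.\t+\t.\t"
               "ID=hypothetical_transcript_1;Name=hypothetical%20transcript"]
    for feature in iter_features(example):
        print(feature["seqid"], feature["start"], feature["end"],
              feature["attributes"]["Name"])
```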