Gossypium rotundifolium (K2) 'Grot K201' genome HAU_v1

Overview
Analysis Name: Gossypium rotundifolium (K2) 'Grot K201' genome HAU_v1
Method: Illumina and PacBio
Source: (v1)
Date performed: 2021-08-09

About the assembly

In this study, we applied Oxford Nanopore sequencing technology to assemble the G. rotundifolium (K2*) 'K201', G. arboreum (A2) 'SXY1', and G. raimondii (D5) 'D502' genomes. The G. arboreum and G. raimondii genomes had previously been assembled de novo from Illumina and PacBio reads, but both assemblies contain a number of sequence gaps and needed improved contiguity. We generated a total of 304 Gb, 212 Gb and 125 Gb of Nanopore sequencing data, with genome coverage of 124×, 131× and 167× for K2*, A2 and D5, respectively. We assembled 3,593, 1,173 and 366 contigs for G. rotundifolium, G. arboreum and G. raimondii, with total contig lengths of 2.44 Gb, 1.62 Gb and 0.75 Gb, respectively (Table 1). These initial contigs were polished using Illumina paired-end reads with genome coverage of 108×, 118× and 132× for K2*, A2 and D5. The contig N50 is 5.33 Mb, 11.69 Mb and 17.04 Mb for K2*, A2 and D5, respectively, and the longest contigs are 32.72 Mb, 58.57 Mb and 43.74 Mb.

After polishing the contigs with Illumina reads, we used high-throughput chromosome conformation capture (Hi-C) data to order and orient the contigs and construct pseudochromosomes for each species. In the Hi-C assisted assembly, 2,559, 485 and 201 contigs were placed on the 13 chromosomes of the K2*, A2 and D5 genomes, accounting for over 99% of the genome length.
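
The coverage figures quoted above can be reproduced with simple arithmetic. The sketch below assumes that coverage means sequencing yield divided by total contig length (the exact contig lengths are taken from Table 1); the source does not state which genome size was used as the denominator, so this is an illustration rather than the authors' calculation. The dictionary and function names are hypothetical.

```python
# Nanopore yield in bases (from the text) and total contig length in bp (Table 1).
ASSEMBLIES = {
    "K2": {"nanopore_bp": 304e9, "contig_bp": 2_444_364_209},
    "A2": {"nanopore_bp": 212e9, "contig_bp": 1_621_008_062},
    "D5": {"nanopore_bp": 125e9, "contig_bp": 750_197_587},
}


def coverage(data_bp, genome_bp):
    """Fold coverage, assuming coverage = sequenced bases / genome length."""
    return data_bp / genome_bp


for name, values in ASSEMBLIES.items():
    fold = coverage(values["nanopore_bp"], values["contig_bp"])
    print(f"{name}: {fold:.1f}x")
```

Rounded to whole numbers, these ratios match the 124×, 131× and 167× reported for K2, A2 and D5.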

*The genome designation should be K2, not K12.

 

Table 1. Summary of genome assemblies and annotations of G. rotundifolium, G. arboreum and G. raimondii.

Genomic feature                       G. rotundifolium 'Grot K201'  G. arboreum 'Shixiya1'  G. raimondii 'Grai D502'
Total length of contigs, bp           2,444,364,209                 1,621,008,062           750,197,587
Total length of scaffolds, bp         2,444,484,509                 1,621,030,562           750,205,487
Total length of gaps, bp              120,300                       22,500                  7,900
Percentage of anchoring               99.28%                        99.47%                  99.57%
Percentage of anchoring and ordering  93.16%                        98.84%                  99.01%
Number of contigs                     3,593                         1,173                   366
Number of scaffolds                   2,390                         948                     287
Contig N50, bp                        5,326,689                     11,691,474              17,043,680
Contig N90, bp                        621,066                       2,910,421               3,537,560
Scaffold N50, bp                      177,839,665                   129,592,444             57,716,579
Scaffold N90, bp                      115,394,628                   93,157,762              49,929,625
Maximum contig length, bp             32,728,186                    58,575,076              43,739,617
Maximum scaffold length, bp           205,722,655                   143,367,608             63,188,200
GC content                            36.38%                        35.16%                  33.23%
Percentage of repeat sequences        80.92%                        68.05%                  57.04%
Number of genes                       41,590                        41,778                  40,820
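
As a quick sanity check on Table 1, the sketch below verifies that contig length plus gap length equals scaffold length for each assembly, and derives the average gap size per contig join. It assumes that every contig belongs to exactly one scaffold, so the number of joins is the contig count minus the scaffold count; the dictionary layout and function name are illustrative only.

```python
# Values copied from Table 1 (lengths in bp).
TABLE1 = {
    "G. rotundifolium": dict(contig_bp=2_444_364_209, scaffold_bp=2_444_484_509,
                             gap_bp=120_300, contigs=3_593, scaffolds=2_390),
    "G. arboreum": dict(contig_bp=1_621_008_062, scaffold_bp=1_621_030_562,
                        gap_bp=22_500, contigs=1_173, scaffolds=948),
    "G. raimondii": dict(contig_bp=750_197_587, scaffold_bp=750_205_487,
                         gap_bp=7_900, contigs=366, scaffolds=287),
}


def check_row(row):
    """Return (lengths_consistent, average_gap_bp_per_join) for one assembly."""
    # Assumption: each join between two contigs in a scaffold adds one gap.
    joins = row["contigs"] - row["scaffolds"]
    consistent = row["contig_bp"] + row["gap_bp"] == row["scaffold_bp"]
    return consistent, row["gap_bp"] / joins


for species, row in TABLE1.items():
    consistent, gap_per_join = check_row(row)
    print(f"{species}: consistent={consistent}, gap per join={gap_per_join:.0f} bp")
```

With the Table 1 values, the lengths are consistent for all three assemblies and each works out to 100 bp per join, which would fit fixed-size gap placeholders, although the source does not state this.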

 

Publication

Wang, M. et al. Comparative genome analyses highlight transposon-mediated genome expansion and the evolutionary architecture of 3D genomic folding in cotton. Mol Biol Evol. 2021 May 11:msab128. doi: 10.1093/molbev/msab128.

Assembly

The chromosomes (pseudomolecules) and scaffolds for the Gossypium rotundifolium (K2) genome. This file belongs to the HAU G. rotundifolium Assembly v1.0.

Chromosomes & scaffolds (FASTA format) G.rotundifolium_HAU.fa.gz
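
Once downloaded, the assembly file can be summarized with the Python standard library alone. The following is a minimal sketch, assuming the file is an ordinary gzip-compressed multi-FASTA; the function names `fasta_lengths` and `nxx` are hypothetical helpers, not part of any CottonGen tooling.

```python
import gzip


def fasta_lengths(path):
    """Map each sequence name in a gzipped FASTA file to its length in bases."""
    lengths = {}
    name = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def nxx(lengths, percent=50):
    """Length L such that sequences of length >= L hold at least percent% of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    raise ValueError("no sequence lengths supplied")


# Example usage (requires the downloaded file in the working directory):
# lengths = fasta_lengths("G.rotundifolium_HAU.fa.gz")
# print(len(lengths), nxx(lengths.values(), 50), nxx(lengths.values(), 90))
```

Run on the scaffold-level FASTA, `nxx(..., 50)` and `nxx(..., 90)` should be comparable to the scaffold N50 and N90 in Table 1, provided the file contains all scaffolds.
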
Functional Analysis

Functional annotation files for the Gossypium rotundifolium HAU Genome v1.0 are available for download below. The Gossypium rotundifolium HAU Genome v1.0 proteins were analyzed using InterProScan in order to assign InterPro domains and Gene Ontology (GO) terms. Pathways analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan K2_HAU_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan K2_HAU_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs K2_HAU_v1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways K2_HAU_v1_KEGG-pathways.xlsx.gz

 

Genes

The predicted gene models, their alignments, and proteins for the Gossypium rotundifolium (K2) genome. These files belong to the HAU G. rotundifolium Assembly v1.0.

Predicted gene models with exons (GFF3 format) G.rotundifolium_HAU.gff3.gz
Coding sequences, CDS (FASTA format) G.rotundifolium_HAU.cds.fa.gz
Protein sequences (FASTA format) G.rotundifolium_HAU.pep.fa.gz
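
A simple way to inspect the gene model file is to count features by type. The sketch below assumes a standard nine-column, tab-delimited GFF3 file; whether this particular file labels its gene features with the type `gene` is an assumption, and `count_feature_types` is a hypothetical helper name.

```python
import gzip
from collections import Counter


def count_feature_types(path):
    """Count GFF3 features by the value in column 3 (the feature type)."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                # Anything after this directive is sequence data, not annotation.
                break
            if line.startswith("#") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            if len(columns) != 9:
                continue  # skip malformed lines
            counts[columns[2]] += 1
    return counts


# Example usage (requires the downloaded file in the working directory):
# counts = count_feature_types("G.rotundifolium_HAU.gff3.gz")
# print(counts.most_common())
```

If the file follows the usual conventions, the count for `gene` should be comparable to the 41,590 genes listed in Table 1.
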
Homology

Homology of the Gossypium rotundifolium HAU Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for NCBI nr (Release 2021-09), and less than 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09) databases. The best hit reports are available for download in Excel format.
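
The cutoff-and-best-hit logic described above can be sketched as follows. This assumes blastp results in the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), which the source does not confirm; the tie-breaking rule (lowest E-value, then highest bit score) and all names here are illustrative assumptions.

```python
import csv

# E-value cutoffs from the text; the dictionary keys are hypothetical labels.
CUTOFFS = {"nr": 1e-9, "araport11": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}


def best_hits(lines, cutoff):
    """Return {query: (subject, evalue, bitscore)} for hits with evalue < cutoff."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 12:
            continue  # not a complete tabular record
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue
        current = best.get(query)
        # Prefer the lower E-value; break ties with the higher bit score.
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up records for illustration only.
example = [
    "geneA\tsubj1\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-30\t250",
    "geneA\tsubj2\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-50\t400",
    "geneB\tsubj3\t60.0\t80\t32\t0\t1\t80\t1\t80\t1e-7\t55",
]
print(best_hits(example, CUTOFFS["nr"]))         # geneB is excluded at 1e-9
print(best_hits(example, CUTOFFS["swissprot"]))  # geneB passes at 1e-6
```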

 

Protein Homologs

G. rotundifolium HAU Genome v1.0 proteins with NCBI nr homologs (EXCEL file) K2_HAU_v1_vs_nr.xlsx.gz
G. rotundifolium HAU Genome v1.0 proteins with NCBI nr hits (FASTA file) K2_HAU_v1_vs_nr_hit.fasta.gz
G. rotundifolium HAU Genome v1.0 proteins without NCBI nr hits (FASTA file) K2_HAU_v1_vs_nr_noHit.fasta.gz
G. rotundifolium HAU Genome v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) K2_HAU_v1_vs_tair.xlsx.gz
G. rotundifolium HAU Genome v1.0 proteins with Arabidopsis (Araport11) hits (FASTA file) K2_HAU_v1_vs_tair_hit.fasta.gz
G. rotundifolium HAU Genome v1.0 proteins without Arabidopsis (Araport11) hits (FASTA file) K2_HAU_v1_vs_tair_noHit.fasta.gz
G. rotundifolium HAU Genome v1.0 proteins with SwissProt homologs (EXCEL file) K2_HAU_v1_vs_swissprot.xlsx.gz
G. rotundifolium HAU Genome v1.0 proteins with SwissProt hits (FASTA file) K2_HAU_v1_vs_swissprot_hit.fasta.gz
G. rotundifolium HAU Genome v1.0 proteins without SwissProt hits (FASTA file) K2_HAU_v1_vs_swissprot_noHit.fasta.gz
G. rotundifolium HAU Genome v1.0 proteins with TrEMBL homologs (EXCEL file) K2_HAU_v1_vs_trembl.xlsx.gz
G. rotundifolium HAU Genome v1.0 proteins with TrEMBL hits (FASTA file) K2_HAU_v1_vs_trembl_hit.fasta.gz
G. rotundifolium HAU Genome v1.0 proteins without TrEMBL hits (FASTA file) K2_HAU_v1_vs_trembl_noHit.fasta.gz

 

Markers
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium rotundifolium HAU assembly. Markers were required to align at 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome G.rotundifolium_HAU_K2_SNP
CottonGen RFLP markers mapped to genome G.rotundifolium_HAU_K2_RFLP
CottonGen SSR markers mapped to genome G.rotundifolium_HAU_K2_SSR
CottonGen InDel markers mapped to genome G.rotundifolium_HAU_K2_InDel
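
The marker filtering rules described in this section can be expressed as a small predicate. This is a sketch under stated assumptions, not the CottonGen pipeline: identity is taken as matches divided by aligned length, coverage as aligned length divided by marker length, and the thresholds are read as minimums. The `MarkerAlignment` class and its fields are hypothetical.

```python
from dataclasses import dataclass

# Maximum allowed size of a single gap, in bp, per marker type (from the text).
GAP_LIMIT_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


@dataclass
class MarkerAlignment:
    marker_type: str       # one of the keys in GAP_LIMIT_BP
    marker_length: int     # length of the marker sequence, bp
    aligned_length: int    # number of marker bases in the alignment
    matches: int           # number of identical aligned bases
    gap_sizes: tuple = ()  # size in bp of each gap in the alignment


def passes_filter(aln):
    """Apply the identity, coverage, and gap rules to one alignment."""
    if aln.aligned_length <= 0 or aln.marker_length <= 0:
        return False
    # Integer comparisons avoid floating-point rounding at the thresholds.
    identity_ok = aln.matches * 100 >= 90 * aln.aligned_length
    coverage_ok = aln.aligned_length * 100 >= 97 * aln.marker_length
    few_gaps = len(aln.gap_sizes) < 2
    small_gaps = all(size <= GAP_LIMIT_BP[aln.marker_type] for size in aln.gap_sizes)
    return identity_ok and coverage_ok and few_gaps and small_gaps


ssr = MarkerAlignment("SSR", 200, 198, 190, (500,))
snp = MarkerAlignment("SNP", 200, 198, 190, (5,))
print(passes_filter(ssr))  # True: one 500 bp gap is allowed for SSRs
print(passes_filter(snp))  # False: a 5 bp gap exceeds the 2 bp limit for SNPs
```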

 

Transcript Alignments
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. rotundifolium genome assembly. Alignments with an alignment length of 97% and 90% identity were retained. The available files are in GFF3 format.

 

G. arboreum CottonGen RefTrans v1 G.rotundifolium_HAU_K2_g.arboreum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 G.rotundifolium_HAU_K2_G.barbadense_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 G.rotundifolium_HAU_K2_g.hirsutum_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 G.rotundifolium_HAU_K2_g.raimondii_cottongen_reftransV1