Gossypium tomentosum (AD3) genome HAU_v1

Overview
Analysis Name: Gossypium tomentosum (AD3) genome HAU_v1
Method: PacBio; Canu (1.3)
Source (v1)
Date performed: 2022-06-08

About the assembly

Here, we sequenced the allotetraploid wild species G. tomentosum (AD3), germplasm 'HAZU-2020-AD3', generating 229.32 gigabases (Gb) of PacBio single-molecule long reads (read N50 of 21.44 kb), equivalent to 102.90-fold genome coverage (Table 1). The resulting assembly comprised 2,229 megabases (Mb) of genome sequence in 1,286 contigs (contig N50 = 11.98 Mb; Table 1). Assessment of assembly completeness, sequence consistency, accuracy and heterozygosity indicated a high-quality G. tomentosum genome assembly (Table 1). Of the 2.23 Gb final assembly, 98.40% was ordered, oriented and assigned to 26 chromosomes using Hi-C data, and 96.06% of the remaining scaffolds were shorter than 0.1 Mb. The assembly covers approximately 94.64% of the genome size of 2.36 Gb estimated by k-mer genome survey analysis. The integrity of genic regions was supported by the recovery of 98.68% (1,421) of the highly conserved core proteins in the BUSCO dataset, further confirming the high completeness and quality of the current genome assembly.
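
The coverage and completeness percentages quoted above follow from simple ratios. The sketch below recomputes them from the figures in the text and Table 1; it assumes the fold coverage is taken relative to the assembled length and that the k-mer estimate is exactly the rounded 2.36 Gb, neither of which the page states, so the results match the reported 102.90-fold and 94.64% only to within rounding.

```python
# Figures quoted in the text and in Table 1.
PACBIO_BASES = 229.32e9        # total PacBio read bases
ASSEMBLY_BP = 2_228_718_597    # total assembly length
KMER_ESTIMATE_BP = 2.36e9      # k-mer genome size estimate, as rounded in the text


def fold_coverage(read_bases: float, genome_bp: float) -> float:
    """Sequencing depth: total read bases divided by genome length."""
    return read_bases / genome_bp


def fraction_of_estimate(assembly_bp: float, estimated_bp: float) -> float:
    """Share of the estimated genome size captured by the assembly."""
    return assembly_bp / estimated_bp


# Assumption: the assembled length is used as the denominator for depth.
coverage = fold_coverage(PACBIO_BASES, ASSEMBLY_BP)
covered = fraction_of_estimate(ASSEMBLY_BP, KMER_ESTIMATE_BP)

print(f"fold coverage: about {coverage:.1f}x")
print(f"assembly / k-mer estimate: about {covered:.1%}")
```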

 Table 1: Summary of G. tomentosum genome assembly and annotation.

Genomic feature G. tomentosum (AD3)
Total PacBio reads (Gb) 229.32
Total Hi-C reads (Gb) 172.16
Total length of assembly (bp) 2,228,718,597
Longest scaffold (bp) 122,188,960
Number of contigs 1,286
Contig N50 (bp) 11,978,058
Contig N90 (bp) 2,703,290
Number of scaffolds 914
Scaffold N50 (bp) 103,048,976
Scaffold N90 (bp) 57,603,400
Percentage of repeat sequence 75.25%
Complete BUSCOs 96.39% (1388)
GC content 34.2%
Number of genes 72,620
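
For readers unfamiliar with the N50 and N90 rows in Table 1, the following is a minimal sketch of the standard Nx statistic: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches x% of all bases. The function name `nx` and the toy lengths are illustrative only; they are not taken from the assembly.

```python
from typing import Iterable


def nx(lengths: Iterable[int], x: float = 50.0) -> int:
    """Return the Nx length: the length of the sequence at which the
    cumulative sum (longest first) reaches at least x% of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    threshold = sum(ordered) * x / 100.0
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Toy contig lengths (total 300), not real data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(nx(toy_lengths, 50))  # 70: 80 + 70 reaches half of 300
print(nx(toy_lengths, 90))  # 30: 80 + 70 + 50 + 40 + 30 reaches 270
```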

 

Publication: Shen, et al. (2021). Gossypium tomentosum genome and interspecific ultra-dense genetic maps reveal genomic structures, recombination landscape and flowering depression in cotton. Genomics 113, 1999–2009. doi: 10.1016/j.ygeno.2021.04.036

Assembly

The chromosomes (pseudomolecules) of the Gossypium tomentosum genome. These files belong to the HAU-AD3 Assembly v1 and NCBI BioProject PRJNA629964.

 Chromosomes (FASTA format) HAU v1 G.tomentosum_HAU-AD3_assembly_v1.fas.gz
 Chromosomes (FASTA format) NCBI v1 G.tomentosum_HAU-AD3_assembly_NCBI_v1.fas.gz
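
After downloading a chromosome file, its per-sequence lengths and GC content can be checked against Table 1. The sketch below streams a plain or gzip-compressed FASTA file using only the Python standard library. The function name `summarize_fasta` is hypothetical, and computing GC over A/C/G/T bases only (ignoring N) is a choice made here, not a method stated on this page.

```python
import gzip
from typing import Dict


def summarize_fasta(path: str) -> Dict[str, object]:
    """Stream a (optionally gzipped) FASTA file and collect basic statistics."""
    opener = gzip.open if path.endswith(".gz") else open
    lengths: Dict[str, int] = {}
    gc = 0      # count of G and C bases
    acgt = 0    # count of unambiguous bases (A, C, G, T)
    name = None
    with opener(path, "rt") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the sequence name.
                name = (line[1:].split() or [""])[0]
                lengths[name] = 0
            elif name is not None:
                seq = line.upper()
                lengths[name] += len(seq)
                gc_here = seq.count("G") + seq.count("C")
                gc += gc_here
                acgt += gc_here + seq.count("A") + seq.count("T")
    return {
        "lengths": lengths,
        "total_bp": sum(lengths.values()),
        # Assumption: GC fraction is taken over A/C/G/T only, excluding N.
        "gc_fraction": gc / acgt if acgt else 0.0,
    }
```

A call such as `summarize_fasta("G.tomentosum_HAU-AD3_assembly_v1.fas.gz")` would return the dictionary for the downloaded file; the file must be present locally.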
Functional Analysis

Functional annotation files for the Gossypium tomentosum HAU Genome v1.0 are available for download below. The Gossypium tomentosum HAU Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan AD3_HAU_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan AD3_HAU_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs AD3_HAU_v1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways AD3_HAU_v1_KEGG-pathways.xlsx.gz

 

Genes

The predicted gene models, their alignments and proteins for the G. tomentosum genome. These files belong to the HAU-AD3 Assembly v1.

 Predicted gene models with exons (GFF3 format) Gossypium_tomentosum_HAU.gene.gff.gz
 CDS sequences (FASTA format) Gossypium_tomentosum_HAU.gene.cds.gz
 Protein sequences (FASTA format) Gossypium_tomentosum_HAU.gene.pep.gz
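
As a quick sanity check on the gene model file, the feature types in a GFF3 file can be tallied and the gene count compared with the number reported in Table 1. The sketch below is a minimal standard-library reader; the function name `count_feature_types` is hypothetical, and which feature type labels (for example `gene`, `mRNA`, `exon`) this particular file uses is an assumption to verify against the file itself.

```python
import gzip
from collections import Counter


def count_feature_types(path: str) -> Counter:
    """Count rows per feature type (column 3) in a plain or gzipped GFF3 file."""
    opener = gzip.open if path.endswith(".gz") else open
    counts: Counter = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded sequence section ends the feature table
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9:
                continue  # not a well-formed GFF3 feature row
            counts[cols[2]] += 1
    return counts
```

For example, `count_feature_types("Gossypium_tomentosum_HAU.gene.gff.gz")["gene"]` would give the number of rows typed `gene`, if that label is the one used in the file.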
Homology

Homology of the Gossypium tomentosum HAU Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-9 was used for the NCBI nr database (Release 2021-09), and a cutoff of less than 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09) and UniProtKB/TrEMBL (Release 2021-09) databases. The best hit reports are available for download in Excel format.
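
The best-hit selection described above can be sketched as a filter over BLAST tabular output. The example assumes the standard 12-column tabular layout (query ID, subject ID, ..., E-value in column 11, bit score in column 12) and picks the highest bit score per query among hits below the cutoff; the page does not state how best hits were ranked, so that rule, the database labels and all identifiers in the toy input are assumptions.

```python
import csv
import io
from typing import Dict, Iterable, Sequence, Tuple

# E-value cutoffs quoted in the text, keyed by hypothetical database labels.
EVALUE_CUTOFF = {"nr": 1e-9, "araport11": 1e-6, "swissprot": 1e-6, "trembl": 1e-6}


def best_hits(
    rows: Iterable[Sequence[str]], max_evalue: float
) -> Dict[str, Tuple[str, float, float]]:
    """Map each query to (subject, evalue, bitscore) of its best passing hit."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in rows:
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= max_evalue:
            continue  # the text requires an E-value below the cutoff
        current = best.get(query)
        # Assumption: "best" means the highest bit score.
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy input with made-up identifiers in the 12-column tabular layout.
toy = (
    "prot1\tsubjA\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-50\t400\n"
    "prot1\tsubjB\t80.0\t180\t36\t0\t1\t180\t1\t180\t1e-20\t150\n"
    "prot2\tsubjC\t40.0\t90\t54\t0\t1\t90\t1\t90\t1e-7\t60\n"
)
rows = list(csv.reader(io.StringIO(toy), delimiter="\t"))
nr_hits = best_hits(rows, EVALUE_CUTOFF["nr"])
araport_hits = best_hits(rows, EVALUE_CUTOFF["araport11"])
print(sorted(nr_hits))       # prot2 fails the stricter nr cutoff
print(sorted(araport_hits))  # prot2 passes the 1e-6 cutoff
```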

 

Protein Homologs

G.tomentosum HAU Genome v1.0 proteins with NCBI nr homologs (EXCEL file) AD3_HAU_v1_vs_nr.xlsx.gz
G.tomentosum HAU Genome v1.0 proteins with NCBI nr (FASTA file) AD3_HAU_v1_vs_nr_hit.fasta.gz
G.tomentosum HAU Genome v1.0 proteins without NCBI nr (FASTA file) AD3_HAU_v1_vs_nr_noHit.fasta.gz
G.tomentosum HAU Genome v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) AD3_HAU_v1_vs_tair.xlsx.gz
G.tomentosum HAU Genome v1.0 proteins with Arabidopsis (Araport11) (FASTA file) AD3_HAU_v1_vs_tair_hit.fasta.gz
G.tomentosum HAU Genome v1.0 proteins without Arabidopsis (Araport11) (FASTA file) AD3_HAU_v1_vs_tair_noHit.fasta.gz
G.tomentosum HAU Genome v1.0 proteins with SwissProt homologs (EXCEL file) AD3_HAU_v1_vs_swissprot.xlsx.gz
G.tomentosum HAU Genome v1.0 proteins with SwissProt (FASTA file) AD3_HAU_v1_vs_swissprot_hit.fasta.gz
G.tomentosum HAU Genome v1.0 proteins without SwissProt (FASTA file) AD3_HAU_v1_vs_swissprot_noHit.fasta.gz
G.tomentosum HAU Genome v1.0 proteins with TrEMBL homologs (EXCEL file) AD3_HAU_v1_vs_trembl.xlsx.gz
G.tomentosum HAU Genome v1.0 proteins with TrEMBL (FASTA file) AD3_HAU_v1_vs_trembl_hit.fasta.gz
G.tomentosum HAU Genome v1.0 proteins without TrEMBL (FASTA file) AD3_HAU_v1_vs_trembl_noHit.fasta.gz

 

Markers
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium tomentosum HAU genome assembly. Markers were required to align with at least 90% identity over at least 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp or less, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome G.tomentosum_AD3_HAU_SNP
CottonGen RFLP markers mapped to genome G.tomentosum_AD3_HAU_RFLP
CottonGen SSR markers mapped to genome G.tomentosum_AD3_HAU_SSR
CottonGen InDel markers mapped to genome G.tomentosum_AD3_HAU_InDel
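
The marker filtering criteria described in this section can be expressed as a small predicate. This is a sketch under stated assumptions, not the CottonGen pipeline: the `Alignment` record and its field names are hypothetical, identity is taken as matches over aligned bases, coverage as aligned bases over marker length, and "gap size" as the total gap bases in the alignment.

```python
from dataclasses import dataclass

# Maximum gap size in bp per marker type, as quoted in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


@dataclass
class Alignment:
    """Hypothetical summary of one marker-to-genome alignment."""
    marker_type: str   # one of the keys of MAX_GAP_BP
    query_size: int    # marker sequence length in bp
    matches: int       # aligned bases that match
    mismatches: int    # aligned bases that differ
    gap_count: int     # number of gaps in the alignment
    gap_bases: int     # total bases spanned by gaps


def passes_filter(aln: Alignment, min_identity: float = 0.90,
                  min_coverage: float = 0.97) -> bool:
    """Apply the identity, length and gap thresholds to one alignment."""
    aligned = aln.matches + aln.mismatches
    if aligned == 0 or aln.query_size <= 0:
        return False
    identity = aln.matches / aligned        # assumption: gaps excluded
    coverage = aligned / aln.query_size     # assumption: fraction of marker aligned
    return (
        identity >= min_identity
        and coverage >= min_coverage
        and aln.gap_count < 2               # "fewer than 2 gaps"
        and aln.gap_bases <= MAX_GAP_BP[aln.marker_type]
    )


ssr = Alignment("SSR", query_size=200, matches=190, mismatches=6,
                gap_count=1, gap_bases=500)
snp = Alignment("SNP", query_size=200, matches=190, mismatches=6,
                gap_count=1, gap_bases=5)
print(passes_filter(ssr))  # True: a 500 bp gap is allowed for SSRs
print(passes_filter(snp))  # False: a 5 bp gap exceeds the 2 bp limit for SNPs
```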

 

Publications

Shen, C., Wang, N., Zhu, D., Wang, P., Wang, M., Wen, T., et al. (2021). Gossypium tomentosum genome and interspecific ultra-dense genetic maps reveal genomic structures, recombination landscape and flowering depression in cotton. Genomics 113, 1999–2009. doi: 10.1016/j.ygeno.2021.04.036

Repeats

Total repeats of the chromosomes (pseudomolecules) of the Gossypium tomentosum genome. This file belongs to the HAU-AD3 Assembly v1.

 Total Repeat (GFF format) HAU G.tomentosum_HAU-AD3_assembly_total_repeat.gff.gz
Transcript Alignments
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. tomentosum genome assembly. Alignments covering at least 97% of the transcript length with at least 90% identity were retained. The available files are in GFF3 format.

 

G. arboreum CottonGen RefTrans v1 G.tomentosum_AD3_HAU_g.arboreum_cottongen_refTransV1
G. hirsutum CottonGen RefTrans v1 G.tomentosum_AD3_HAU_g.hirsutum_cottongen_refTransV1
G. barbadense CottonGen RefTrans v1 G.tomentosum_AD3_HAU_g.barbadense_cottongen_refTransV1
G. raimondii CottonGen RefTrans v1 G.tomentosum_AD3_HAU_g.raimondii_cottongen_refTransV1