Gossypium herbaceum (A1) 'A1a wild' genome HAU_v1

Overview
Analysis Name: Gossypium herbaceum (A1) 'A1a wild' genome HAU_v1
Method: Oxford Nanopore; Illumina (Canu v. 1.3)
Source (v1)
Date performed: 2023-09-08

About the assembly

New assemblies and annotations of seven diploid cotton genomes were reported in Wang et al., Nat. Genet. 2022 Dec. They comprise two G. herbaceum (A1) genomes (a wild form 'A1a' and an A1 cultivar 'ZhongCao1') and one genome each of G. anomalum (B1), G. sturtianum (C1), G. stocksii (E1), G. longicalyx (F1) and G. bickii (G1). The seven genomes were assembled by integrating Nanopore long reads (126–161×), Illumina short reads (52–79×) and high-throughput chromosome conformation capture (Hi-C) data. Table 1 summarizes the seven genome assemblies.

Table 1. Summary of detailed information in 7 genome assemblies.

Assembly metric (lengths in bp)  G. herbaceum (A1a)  G. herbaceum (A1)   G. anomalum (B1)    G. sturtianum (C1)  G. stocksii (E1)    G. longicalyx (F1)  G. bickii (G1)
Total length of all contigs      1,518,491,120       1,621,008,062       1,202,727,438       1,903,530,088       1,442,088,789       1,198,534,575       1,606,432,167
Number of contigs                759                 848                 162                 564                 126                 148                 154
Contig N50                       10,515,852          11,199,225          19,802,926          7,671,138           28,801,805          20,138,000          22,888,721
Contig N90                       3,279,696           3,426,106           4,795,930           1,970,427           6,479,102           5,689,122           5,951,034
Minimum contig length            1,618               3,584               254,410             30,321              39,072              40,977              31,914
Average contig length            1,785,825           1,910,051           3,836,759           3,375,053           11,445,149          8,098,206           10,431,377
Maximum contig length            46,901,260          44,454,407          10,090,100          31,024,449          64,292,180          56,164,400          74,215,947
Total length of scaffolds        1,518,510,620       1,514,399,279       1,202,738,338       1,903,576,288       1,442,098,089       1,198,547,475       1,606,445,567
Number of scaffolds              600                 651                 53                  122                 33                  19                  20
Scaffold N50                     123,512,226         124,037,455         98,351,605          156,041,810         115,821,386         95,944,508          132,272,514
Scaffold N90                     94,341,082          94,910,960          73,514,615          110,824,594         88,888,218          76,618,481          97,521,415
Minimum scaffold length          3,584               1,618               30,275              30,321              39,072              66,792              31,914
Average scaffold length          2,530,851           2,326,266           22,693,176          15,603,084          43,699,942          63,081,446          80,322,278
Maximum scaffold length          134,223,852         137,970,533         107,526,953         173,404,516         129,798,129         110,469,152         152,440,763
Anchored length                  1,496,892,602       1,490,728,467       1,197,579,856       1,891,333,840       1,438,429,494       1,197,329,593       1,605,310,299
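
The N50 and N90 rows in Table 1 follow the usual assembly-statistics definition: sort the sequences from longest to shortest and report the length of the sequence at which the running total first reaches 50% (or 90%) of the total assembly length. The following Python sketch illustrates that calculation. The function name `nx` and the toy lengths are made up for illustration and are not taken from these assemblies.

```python
def nx(lengths, percent):
    """Return the Nx statistic for a collection of sequence lengths.

    Sequences are visited longest first; the result is the length of the
    sequence at which the cumulative sum reaches `percent` % of the total.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length


# Toy contig lengths (hypothetical, not from Table 1).
toy_lengths = [80, 70, 50, 40, 30, 20]
print("N50:", nx(toy_lengths, 50))
print("N90:", nx(toy_lengths, 90))
```

Because N90 requires accumulating more of the assembly than N50, it can never exceed N50, which matches the pattern in every column of the table.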

Publication

Wang M et al., Genomic innovation and regulatory rewiring during evolution of the cotton genus Gossypium. Nat Genet, 2022 Dec;54(12):1959-1971

Assembly

The chromosomes (pseudomolecules) and scaffolds for the Gossypium herbaceum (A1) wild genome. This file belongs to the HAU G. herbaceum A1 wild Assembly v1.0.

Chromosomes & scaffolds (FASTA format) G.herbaceum_ A1_wild_genome_HAU.fa.gz
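
Since the assembly is distributed as a gzip-compressed FASTA file, it can be read without decompressing it to disk first. The sketch below, using only the Python standard library, collects the length of every chromosome and scaffold. The function name `fasta_lengths` is hypothetical, and the path passed to it is whatever local name the downloaded file was saved under.

```python
import gzip


def fasta_lengths(path):
    """Map each sequence ID in a gzip-compressed FASTA file to its length."""
    lengths = {}
    current = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # The ID is the first whitespace-delimited token of the header.
                parts = line[1:].split()
                current = parts[0] if parts else ""
                lengths[current] = 0
            elif line and current is not None:
                lengths[current] += len(line)
    return lengths
```

Calling `fasta_lengths` on the downloaded file returns a dictionary whose values can be summed or sorted to compare against the scaffold rows of Table 1.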
Functional Analysis

Functional annotation files for the Gossypium herbaceum HAU Genome v1.0 are available for download below. The Gossypium herbaceum HAU Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan A1_HAU_wild_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan A1_HAU_wild_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs A1_HAU_wild_v1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways A1_HAU_wild_v1_KEGG-pathways.xlsx.gz
Genes

The predicted gene models, their alignments and proteins for the Gossypium herbaceum (A1) wild genome. These files belong to the HAU G. herbaceum A1 wild Assembly v1.0.

Predicted gene models with exons (GFF3 format) G.herbaceum_ A1_wild_HAU.gff3.gz
Coding sequences, CDS (FASTA format) G.herbaceum_ A1_wild_HAU.cds.fa.gz
Protein sequences (FASTA format) G.herbaceum_A1_wild_HAU.pep.fa.gz
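
GFF3 is a tab-delimited format with nine columns, the third of which holds the feature type (for example gene, mRNA, exon or CDS). As a quick sanity check on the gene model file, the sketch below tallies the features by type. It assumes only the standard GFF3 layout, not any particular feature types in this file, and the function name `count_feature_types` is hypothetical.

```python
import gzip
from collections import Counter


def count_feature_types(path):
    """Count GFF3 records by feature type (column 3) in a gzipped file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                # Anything after this directive is embedded sequence, not features.
                break
            if line.startswith("#") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            if len(columns) == 9:
                counts[columns[2]] += 1
    return counts
```

The resulting counter can be printed with `most_common()` to see, for instance, how many gene records the annotation contains.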
Homology

Homology of the Gossypium herbaceum HAU Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-9 was used for the NCBI nr database (Release 2021-09) and less than 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09) databases. The best hit reports are available for download in Excel format.
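
To make the cutoff and best-hit logic concrete, here is a minimal Python sketch. It assumes BLAST tabular output with the 12 default columns (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), which is not necessarily the format used to produce the reports above. The function name `best_hits`, the choice of bit score to rank hits, and the example rows are all illustrative assumptions.

```python
def best_hits(lines, max_evalue):
    """Keep, per query, the highest-scoring hit with E-value below max_evalue.

    `lines` is an iterable of tab-delimited BLAST rows in the default
    12-column tabular layout (an assumption of this sketch).
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # fails the "less than cutoff" requirement
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Invented rows for illustration only; the IDs are not real results.
rows = [
    "geneA\tAT1G01010.1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180",
    "geneA\tAT2G02020.1\t60.0\t150\t60\t0\t1\t150\t1\t150\t1e-12\t70",
    "geneB\tAT3G03030.1\t40.0\t80\t48\t0\t1\t80\t1\t80\t1e-4\t35",
]
print(best_hits(rows, max_evalue=1e-6))
```

With the 1e-6 cutoff, the second query has no qualifying hit and would therefore fall into a "noHit" set, mirroring how the download lists separate proteins with and without homologs.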

Protein Homologs

G. herbaceum HAU Genome v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) A1_HAU_wild_v1_vs_tair.xlsx.gz
G. herbaceum HAU Genome v1.0 proteins with Arabidopsis (Araport11) homologs (FASTA file) A1_HAU_wild_v1_vs_tair_hit.fasta.gz
G. herbaceum HAU Genome v1.0 proteins without Arabidopsis (Araport11) homologs (FASTA file) A1_HAU_wild_v1_vs_tair_noHit.fasta.gz
G. herbaceum HAU Genome v1.0 proteins with SwissProt homologs (EXCEL file) A1_HAU_wild_v1_vs_swissprot.xlsx.gz
G. herbaceum HAU Genome v1.0 proteins with SwissProt homologs (FASTA file) A1_HAU_wild_v1_vs_swissprot_hit.fasta.gz
G. herbaceum HAU Genome v1.0 proteins without SwissProt homologs (FASTA file) A1_HAU_wild_v1_vs_swissprot_noHit.fasta.gz
G. herbaceum HAU Genome v1.0 proteins with TrEMBL homologs (EXCEL file) A1_HAU_wild_v1_vs_trembl.xlsx.gz
G. herbaceum HAU Genome v1.0 proteins with TrEMBL homologs (FASTA file) A1_HAU_wild_v1_vs_trembl_hit.fasta.gz
G. herbaceum HAU Genome v1.0 proteins without TrEMBL homologs (FASTA file) A1_HAU_wild_v1_vs_trembl_noHit.fasta.gz
Markers

Marker alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium herbaceum HAU v1.0 assembly. Markers required 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome G.herbaceum_A1_HAU_wild_SNP
CottonGen RFLP markers mapped to genome G.herbaceum_A1_HAU_wild_RFLP
CottonGen SSR markers mapped to genome G.herbaceum_A1_HAU_wild_SSR
CottonGen InDel markers mapped to genome G.herbaceum_A1_HAU_wild_InDel
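
The marker thresholds described above can be expressed as a simple filter over BLAT's PSL output. The following Python sketch is one possible reading, not the CottonGen pipeline itself. It assumes identity is matched bases over aligned bases, coverage is aligned bases over marker length, and that "gaps" refers to inserts in the genome (the PSL columns tNumInsert and tBaseInsert). The function name `passes_marker_filter` and the example line are hypothetical.

```python
def passes_marker_filter(psl_line, max_gap_bases,
                         min_identity=0.90, min_coverage=0.97, max_gaps=1):
    """Apply identity, coverage and gap limits to one PSL alignment line.

    PSL columns used (0-based): 0 matches, 1 misMatches, 2 repMatches,
    6 tNumInsert, 7 tBaseInsert, 10 qSize.
    Treating target inserts as the "gaps" is an assumption of this sketch.
    """
    fields = psl_line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = (int(fields[i]) for i in (0, 1, 2))
    gap_count, gap_bases = int(fields[6]), int(fields[7])
    query_size = int(fields[10])

    aligned = matches + mismatches + rep_matches
    if aligned == 0 or query_size == 0:
        return False
    identity = (matches + rep_matches) / aligned
    coverage = aligned / query_size
    return (identity >= min_identity
            and coverage >= min_coverage
            and gap_count <= max_gaps          # "fewer than 2 gaps"
            and gap_bases <= max_gap_bases)


# Invented PSL line: a 100 bp marker with 98 matches and one 500 bp gap.
example = "\t".join([
    "98", "2", "0", "0", "0", "0", "1", "500", "+", "marker1", "100",
    "0", "100", "chr1", "1000000", "10", "610", "2",
    "50,50,", "0,50,", "10,560,",
])
print(passes_marker_filter(example, max_gap_bases=1000))  # SSR/RFLP limit
print(passes_marker_filter(example, max_gap_bases=2))     # SNP/InDel limit
```

Passing the gap limit as a parameter lets the same function serve both marker classes, since they differ only in the permitted gap size.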
Transcript Alignments

Transcript alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. herbaceum genome assembly. Alignments with an alignment length of at least 97% and at least 90% identity were retained. The available files are in GFF3 format.

G. arboreum CottonGen RefTrans v1 G.herbaceum_A1_HAU_wild_g.arboreum_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 G.herbaceum_A1_HAU_wild_g.hirsutum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 G.herbaceum_A1_HAU_wild_g.barbadense_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 G.herbaceum_A1_HAU_wild_g.raimondii_cottongen_reftransV1