Gossypium hirsutum (AD1) 'TM-1' genome HAU_v2.0

Analysis Name: Gossypium hirsutum (AD1) 'TM-1' genome HAU_v2.0
Method: ONT and PacBio; Canu v2.1.1; Hi-C reads based on DpnII
Source (v2)
Date performed: 2024-11-08

About the Assembly

We assembled the TM-1 and 3-79 genomes with Canu (version 2.1.1), running its three steps (correction, trimming, and assembly) manually. First, ONT and PacBio reads from both genomes were corrected and trimmed using Canu with default parameters (correctedErrorRate = 0.045 for PacBio reads; correctedErrorRate = 0.144 for Nanopore reads). The trimmed high-quality ONT (~40x) and PacBio (~40x) reads were then supplied together to Canu as mixed-format input with default parameters. To improve base quality, we aligned Illumina paired-end reads (~50x) to the contigs using BWA-MEM and polished the contigs with Pilon (version 1.23) (--fix bases --mindepth 10 --minmq 30). High-quality paired-end Hi-C reads based on DpnII for G. hirsutum TM-1 and G. barbadense 3-79 were mapped to the two contig-level assemblies using Juicer (version 1.6). The original contigs were organized into chromosomes with the 3D-DNA pipeline (version 180419) (-r 2 -i 15000 --build-gapped-map). Finally, we used Juicebox Assembly Tools (v1.11.08) to manually correct and refine the connections.
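
The polishing and scaffolding parameters quoted above can be written out as full command lines. The following is a minimal sketch, not the authors' actual scripts: the option values come from the text, while the file names, the jar name, and the helper function names are hypothetical placeholders. It only builds the argument lists and does not run anything.

```python
import shlex


def pilon_command(contigs, bam, out_prefix, jar="pilon-1.23.jar"):
    """Build a Pilon call using the options quoted in the text.

    The jar name and all file paths are hypothetical placeholders.
    """
    return [
        "java", "-jar", jar,
        "--genome", contigs,      # contigs to polish
        "--frags", bam,           # Illumina paired-end alignments (e.g. from BWA-MEM)
        "--output", out_prefix,
        "--fix", "bases",
        "--mindepth", "10",
        "--minmq", "30",
    ]


def three_d_dna_command(draft_fasta, merged_nodups, script="run-asm-pipeline.sh"):
    """Build a 3D-DNA call using the options quoted in the text.

    The second positional argument is assumed to be the deduplicated
    Hi-C contact list produced by Juicer.
    """
    return [
        script,
        "-r", "2",
        "-i", "15000",
        "--build-gapped-map",
        draft_fasta,
        merged_nodups,
    ]


# Print shell-ready command lines for inspection (hypothetical file names).
print(shlex.join(pilon_command("contigs.fasta", "illumina.sorted.bam", "polished")))
print(shlex.join(three_d_dna_command("polished.fasta", "merged_nodups.txt")))
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` later without shell quoting problems.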

Summary of the G. hirsutum and G. barbadense genome assemblies

Metric                         G. hirsutum      G. barbadense
Contig N90                     5,021,880        1,985,260
Contig N80                     9,197,101        4,198,208
Contig N70                     13,863,239       6,962,112
Contig N60                     18,410,734       12,139,909
Contig N50                     21,961,441       12,139,909
Longest contig length          84,707,317       49,101,280
Contig number                  1,418            2,064
Total size                     2,282,609,487    2,216,666,023
Scaffold N50                   108,550,191      93,110,895
Scaffold number                582              1,688
Genome size                    2,324,185,275    2,254,770,940
Pseudochromosomes size (bp)    2,302,180,022    2,169,217,327
Percentage of anchoring        99.05%           97.20%
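
For readers who want to recompute statistics of this kind, the sketch below shows the standard Nx definition (N50, N90, and so on) and an anchoring percentage taken as pseudochromosome size divided by genome size. The function names and the toy length list are illustrative assumptions; the only figures taken from the table are the G. hirsutum pseudochromosome and genome sizes.

```python
def nx(lengths, x):
    """Return Nx: the largest length L such that sequences of length >= L
    together cover at least x percent of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    raise ValueError("lengths must be a non-empty list of positive integers")


def anchoring_percentage(pseudochromosome_bp, genome_bp):
    """Share of the genome placed on pseudochromosomes, in percent."""
    return 100 * pseudochromosome_bp / genome_bp


# Toy contig lengths (illustrative only, total = 300).
toy = [80, 70, 50, 40, 30, 20, 10]
print(nx(toy, 50))  # 70
print(nx(toy, 90))  # 30

# G. hirsutum values from the table above.
print(round(anchoring_percentage(2_302_180_022, 2_324_185_275), 2))  # 99.05
```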

The chromosomes (pseudomolecules) and scaffolds for the G. hirsutum 'TM-1' genome. These files belong to the Gossypium hirsutum (AD1) 'TM-1' genome HAU_v2.0.

Chromosomes & scaffolds (FASTA format) G.hirsutum_HAU-TM1_assembly_v2.0.fasta.gz
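
Once downloaded, the assembly file can be inspected without extra dependencies. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and it assumes an ordinary (optionally gzip-compressed) FASTA file in which the sequence ID is the first whitespace-delimited token of each header line.

```python
import gzip


def fasta_lengths(path):
    """Return {sequence_id: length} for a plain or gzip-compressed FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = {}
    name = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first token of the header as the sequence ID.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths
```

Calling `fasta_lengths("G.hirsutum_HAU-TM1_assembly_v2.0.fasta.gz")` on the downloaded file would give per-sequence lengths, which can then be summed or sorted to check against the summary table.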

Functional Annotation

Functional annotation files for the Gossypium hirsutum TM-1 Genome v2.0 are available for download below. The Gossypium hirsutum TM-1 Genome v2.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).


GO assignments from InterProScan AD1_HAU_v2_genes2GO.xlsx.gz
IPR assignments from InterProScan AD1_HAU_v2_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs AD1_HAU_v2_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways AD1_HAU_v2_KEGG-pathways.xlsx.gz

The predicted gene models, their alignments, and proteins for the G. hirsutum 'TM-1' genome. These files belong to the Gossypium hirsutum (AD1) 'TM-1' genome HAU_v2.0.

Predicted gene models with exons (GFF3 format) G.hirsutum_HAU-AD1_v2.0_gene.gff3.gz
Coding sequences, CDS (FASTA format)  
Protein sequences (FASTA format)  
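
A quick way to sanity-check the gene model file is to tally feature types from the third GFF3 column. The sketch below assumes the file follows the standard nine-column, tab-separated GFF3 layout; the function name is hypothetical, and which feature types (gene, mRNA, exon, CDS, and so on) actually appear depends on the file itself.

```python
import gzip
from collections import Counter


def count_feature_types(path):
    """Count GFF3 records per feature type (column 3)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            # An embedded FASTA section, if present, ends the annotation records.
            if line.startswith("##FASTA"):
                break
            # Skip directives, comments, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                continue  # not a well-formed GFF3 record
            counts[fields[2]] += 1
    return counts
```

For example, `count_feature_types("G.hirsutum_HAU-AD1_v2.0_gene.gff3.gz")` would return a `Counter` keyed by feature type for the downloaded annotation.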

Chang, Xing, Xin He, Jianying Li, Zhenping Liu, Ruizhen Pi, Xuanxuan Luo, Ruipeng Wang et al. "High-quality Gossypium hirsutum and Gossypium barbadense genome assemblies reveal the landscape and evolution of centromeres." Plant Communications 5, no. 2 (2024). doi.org/10.1016/j.xplc.2023.100722