Analysis Name: Gossypium hirsutum (AD1) 'TX-1000' genome CRI_v1
Method: PacBio Sequel, FALCON v. 1 (v.1)
Source (v1)
Date performed: 2023-03-22

Three allotetraploid cotton genomes (Gossypium ekmanianum [(AD)6, Ge], Gossypium stephensii [(AD)7, Gs], and one early form of domesticated Gossypium hirsutum, race punctatum [(AD)1, Ghp]) were sequenced and assembled using a combination of sequencing technologies, including single-molecule real-time (PacBio) sequencing, paired-end Illumina sequencing, and chromatin conformation capture (Hi-C). An initial assembly was generated with FALCON (22) using at least 20.83 million PacBio long reads for each cotton species and was subsequently corrected using Illumina paired-end data (average 120-fold coverage). These megabase-scale contig assemblies (contig N50 of 1.57, 1.23, and 11.49 Mb for Ge, Gs, and Ghp, respectively) were combined with Hi-C interaction information to produce chromosome-scale scaffolds, yielding final assemblies of 2.34, 2.29, and 2.29 Gb for Ge, Gs, and Ghp, respectively. These high-quality assemblies had scaffold N50 values of about 107 to 108 Mb (Table 1), with 99% of bases anchored onto chromosomes and with 99% of mapped Illumina reads covering about 97% of the genomes. Nearly all of the 1,614 Embryophyta benchmarking universal single-copy orthologs (BUSCOs, embryophyta_odb10) were complete in the Ge (99.2%), Gs (97.4%), and Ghp (99.3%) assemblies, and the long terminal repeat (LTR) assembly index (LAI; 13.7 in Ge, 12.8 in Gs, and 12.7 in Ghp) further indicated that these three assemblies could be considered "reference quality".
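The contig and scaffold N50 values quoted above follow the usual definition: the length L such that sequences of length L or longer hold at least half of the assembled bases. A minimal sketch of that calculation is given below; the function name and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example with made-up lengths (not values from these assemblies).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 70: the two longest sequences reach half of 300
```

Applied to contig lengths this gives the contig N50; applied to scaffold lengths it gives the scaffold N50.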

Table 1: Features of three tetraploid cotton assemblies

Genomic feature                 G. ekmanianum (AD6)   G. stephensii (AD7)   G. hirsutum (AD1-TX1000)
Genome size (Mb)                2,341.87              2,291.84              2,292.48
Scaffold number                 160                   243                   277
Scaffold N50 (Mb)               108.06                108.2                 106.96
Contig size (Mb)                2,341.51              2,291.47              2,292.40
Contig number                   3,781                 3,927                 1,111
Contig N50 (Mb)                 1.57                  1.23                  11.49
Gap number                      3,621                 3,684                 834
Gap length (Mb)                 0.36                  0.37                  0.08
Pseudochromosomes size (Mb)     2,337.03              2,272.89              2,283.07
TE percentage                   64.86                 63.01                 64.89
Gene number                     74,978                74,970                74,520
Genes in pseudochromosomes      74,038                74,324                74,283
Complete BUSCOs (%)             95.50                 97.10                 95.40
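Two rows of Table 1 can be cross-checked against the others: the gap length should equal the genome size minus the contig size, and the share of anchored bases is the pseudochromosome size divided by the genome size. The sketch below repeats that arithmetic with the values copied from the table; the dictionary keys and the helper name are illustrative assumptions.

```python
# Values copied from Table 1, all in Mb. Keys follow the short names in the text.
table = {
    "Ge":  {"genome": 2341.87, "contig": 2341.51, "gap": 0.36, "pseudo": 2337.03},
    "Gs":  {"genome": 2291.84, "contig": 2291.47, "gap": 0.37, "pseudo": 2272.89},
    "Ghp": {"genome": 2292.48, "contig": 2292.40, "gap": 0.08, "pseudo": 2283.07},
}


def derived(row):
    """Return (gap length in Mb, percent of bases in pseudochromosomes)."""
    gap = round(row["genome"] - row["contig"], 2)
    anchored = round(100 * row["pseudo"] / row["genome"], 2)
    return gap, anchored


for name, row in table.items():
    gap, anchored = derived(row)
    print(f"{name}: gap = {gap} Mb (table: {row['gap']}), anchored = {anchored}%")
```

For all three assemblies the recomputed gap length matches the tabulated one, and the anchored share is above 99%, in line with the description.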

Chromosomes (pseudomolecules) of the Gossypium hirsutum race punctatum genome, accession no. Punctatum 25 (TX-1000). These files belong to the Gossypium hirsutum (AD1) 'TX-1000' genome CRI_v1.

Chromosomes (FASTA format) G.hirsutum_CRI_AD1-TX1000.genome.fasta
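As a quick sanity check after download, the length of each sequence in the chromosome FASTA file can be tallied and compared with the sizes in Table 1. The sketch below is a minimal pure-Python reader under stated assumptions: the function name is hypothetical, the toy records are made up, and the sequence identifiers used in the actual file are not known here.

```python
import io


def fasta_lengths(handle):
    """Yield (sequence_id, length) for each record in a FASTA text stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            # The identifier is the first word of the header line.
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


# Toy input (made-up records). For the real file, an uncompressed download
# could be opened with open("G.hirsutum_CRI_AD1-TX1000.genome.fasta").
toy = io.StringIO(">chrA example\nACGT\nAC\n>chrB\nGGG\n")
print(list(fasta_lengths(toy)))  # [('chrA', 6), ('chrB', 3)]
```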

Predicted gene models, their alignments, and proteins for the Gossypium hirsutum race punctatum genome, accession no. Punctatum 25 (TX-1000). These files belong to the Gossypium hirsutum (AD1) 'TX-1000' genome CRI_v1.

Predicted gene models with exons (GFF3 format) G.hirsutum_CRI_AD1-TX1000.gff3.gz
Coding sequences, CDS (FASTA format) G.hirsutum_CRI_AD1-TX1000.cds.fa.gz
Protein sequences (FASTA format) G.hirsutum_CRI_AD1-TX-1000.pep.fa.gz
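The gene counts in Table 1 can in principle be checked against the GFF3 file by counting features per sequence. The sketch below assumes that gene features carry the type "gene" in column 3, which is the common GFF3 convention but is not confirmed for this file; the function name and the toy records are illustrative.

```python
from collections import Counter


def count_genes(lines, feature_type="gene"):
    """Count features of one type per sequence ID in GFF3 text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a standard nine-column feature line
        if cols[2] == feature_type:
            counts[cols[0]] += 1
    return counts


# Toy records (made up). For the real file, the lines could come from
# gzip.open("G.hirsutum_CRI_AD1-TX1000.gff3.gz", "rt").
toy_gff = [
    "##gff-version 3\n",
    "chrA\ttoy\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "chrA\ttoy\tmRNA\t1\t100\t.\t+\t.\tID=g1.t1;Parent=g1\n",
    "chrA\ttoy\tgene\t200\t300\t.\t-\t.\tID=g2\n",
    "chrB\ttoy\tgene\t5\t50\t.\t+\t.\tID=g3\n",
]
genes = count_genes(toy_gff)
print(dict(genes), sum(genes.values()))
```

Summing the counts over all sequences gives a figure comparable to "Gene number"; restricting the sum to chromosome sequences gives one comparable to "Genes in pseudochromosomes".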

