Gossypium barbadense (AD2) 'Yuma' genome HAU_v1

Overview
Analysis Name: Gossypium barbadense (AD2) 'Yuma' genome HAU_v1
Method: HiFi, HiFiasm, BUSCO, LTR Assembly Index
Source: Y2013.zip
Date performed: 2024-03-08

We selected 12 representative G. barbadense accessions from around the world, including 7 primitively domesticated accessions from South America, 2 Sea Island landrace accessions from the Caribbean region, and 3 cultivated accessions. An average of 63.1 Gb of high-fidelity (HiFi) reads was generated per accession. The reads were initially assembled with HiFiasm into individual genomes ranging from 2.21 to 2.25 Gb in size (Table 1), with contig N50 values ranging from 55.0 Mb to 102.7 Mb (average = 70.4 Mb), longer than that of the previously published G. barbadense genome. On average, 98.40% (range 97.48% to 98.93%) of the assembled contigs were then anchored and ordered into 26 pseudo-chromosomes based on the 3-79 reference genome. Assembly completeness was high: all assemblies contained more than 99.5% complete BUSCOs, and the LTR Assembly Index (LAI) scores ranged from 13.78 to 15.18 per genome, which falls within the reference-quality range for LAI (Table 1).

Table 1. Statistics of the genomic assembly and annotation of 12 G. barbadense accessions

Sample_ID                  Y2003     Y2005     Y2010     Y2013     Y2016     Y2029     Y2031     Y2032     Y2033     Y2034     Y2036     Y3048
Germplasm name             GB249     AZB51     AZB375    Yuma      AZK101    Giza 7    AZB339    AZB634    GB660     GB776     Junhai-1  CEG
Total length (Mb)          2,240     2,299     2,237     2,244     2,213     2,237     2,247     2,252     2,235     2,269     2,267     2,251
Anchored and oriented (%)  99.5      99.5      99.6      99.6      99.6      99.6      99.5      99.6      99.5      99.5      99.6      99.6
Contig N50 (Mb)            64.72     67.25     77.88     102.72    65.91     56.54     61.99     102.73    56.31     61.89     68.79     64.62
GC content (%)             34.42     34.56     34.37     34.39     34.18     34.36     34.43     34.54     34.41     34.63     34.55     34.44
BUSCO (%)                  99.5      99.5      99.6      99.6      99.6      99.6      99.5      99.6      99.5      99.5      99.6      99.6
LTR Assembly Index (LAI)   14.51     14.69     15.01     14.46     14.83     14.17     14.18     14.54     14.08     13.78     15.18     15.11
Repeat content (%)         72.63     69.88     69.61     72.63     68.92     72.62     72.69     71.46     70.97     72.16     72.89     72.51
Number of genes            71,549    74,094    71,963    74,046    71,312    76,012    72,247    72,870    72,281    74,527    75,774    75,755
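
For readers who want to recompute the contig N50 and GC content statistics reported in Table 1, the following is a minimal Python sketch of the standard definitions. It is not the pipeline used to produce the table; the function names are hypothetical, and excluding ambiguous bases (N) from the GC denominator is an assumption.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def gc_percent(sequence):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Assumption: ambiguous bases such as N are left out of the denominator.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        return 0.0
    return 100.0 * gc / acgt
```

Note that applying `n50` to the pseudo-chromosome lengths in the assembly FASTA would give a scaffold-level N50, not the contig N50 listed in the table.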
Assembly

Chromosomes (pseudomolecules) of the Gossypium barbadense 'Yuma' genome. These files belong to the Gossypium barbadense (AD2) 'Yuma' genome HAU_v1.

Chromosomes (FASTA format) G.barbadense_HAU-12GB-Yuma.genome.fasta
Functional Analysis

Functional annotation files for the Gossypium barbadense Yuma Genome v1.0 are available for download below. The Gossypium barbadense Yuma Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan AD2_Y2013_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan AD2_Y2013_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs AD2_Y2013_v1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways AD2_Y2013_v1_KEGG-pathways.xlsx.gz
Genes

The predicted gene models, their alignments, and proteins for the Gossypium barbadense 'Yuma' genome. These files belong to the Gossypium barbadense (AD2) 'Yuma' genome HAU_v1.

Predicted gene models with exons (GFF3 format) G.barbadense_HAU_Yuma.gene.gff.gz
Coding sequences, CDS (FASTA format) G.barbadense_HAU_Yuma.gene.cds.fa.gz
Protein sequences (FASTA format) G.barbadense_HAU_Yuma.gene.pep.fa.gz
Homology

Homology of the Gossypium barbadense Yuma genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2023-07), and UniProtKB/TrEMBL (Release 2023-07) databases. The best hit reports are available for download in Excel format.
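
The best-hit selection described above can be sketched in Python as follows. This is an illustration only: it assumes the blastp results are in the 12-column tabular layout (query ID in column 1, subject ID in column 2, e-value in column 11, bit score in column 12), and it assumes "best hit" means the highest bit score per query. The page does not state either detail, and the function name is hypothetical.

```python
E_VALUE_CUTOFF = 1e-6  # cutoff stated in the text


def best_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)} for the best hit per query.

    Assumptions: tab-separated 12-column BLAST tabular rows; hits with an
    e-value at or above the cutoff are discarded; the best hit is the one
    with the highest bit score.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= cutoff:
            continue
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Queries with no entry in the returned dictionary would correspond to the "noHit" protein sets listed below.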

Protein Homologs

G. barbadense Yuma Genome v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) AD2_Y2013_v1_vs_tair.xlsx.gz
G. barbadense Yuma Genome v1.0 proteins with Arabidopsis (Araport11) (FASTA file) AD2_Y2013_v1_vs_tair_hit.fasta.gz
G. barbadense Yuma Genome v1.0 proteins without Arabidopsis (Araport11) (FASTA file) AD2_Y2013_v1_vs_tair_noHit.fasta.gz
G. barbadense Yuma Genome v1.0 proteins with SwissProt homologs (EXCEL file) AD2_Y2013_v1_vs_swissprot.xlsx.gz
G. barbadense Yuma Genome v1.0 proteins with SwissProt (FASTA file) AD2_Y2013_v1_vs_swissprot_hit.fasta.gz
G. barbadense Yuma Genome v1.0 proteins without SwissProt (FASTA file) AD2_Y2013_v1_vs_swissprot_noHit.fasta.gz
G. barbadense Yuma Genome v1.0 proteins with TrEMBL homologs (EXCEL file) AD2_Y2013_v1_vs_trembl.xlsx.gz
G. barbadense Yuma Genome v1.0 proteins with TrEMBL (FASTA file) AD2_Y2013_v1_vs_trembl_hit.fasta.gz
G. barbadense Yuma Genome v1.0 proteins without TrEMBL (FASTA file) AD2_Y2013_v1_vs_trembl_noHit.fasta.gz
Markers
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium barbadense Yuma v1.0 assembly. Markers required 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome AD2_Y2013_SNP
CottonGen RFLP markers mapped to genome AD2_Y2013_RFLP
CottonGen SSR markers mapped to genome AD2_Y2013_SSR
CottonGen InDel markers mapped to genome AD2_Y2013_InDel
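
The marker filtering criteria described above can be expressed as a small Python predicate. This is a sketch under stated assumptions, not the CottonGen pipeline: the `MarkerAlignment` record and its field names are hypothetical, identity is assumed to be matches divided by aligned length, and the thresholds are treated as inclusive minimums.

```python
from dataclasses import dataclass

# Maximum allowed gap size in bp per marker type, as stated in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}

MIN_IDENTITY = 0.90   # 90% identity
MIN_COVERAGE = 0.97   # over 97% of the marker length
MAX_GAP_COUNT = 1     # "fewer than 2 gaps"


@dataclass
class MarkerAlignment:
    """Hypothetical summary of one BLAT alignment of a marker sequence."""
    marker_type: str      # "SSR", "RFLP", "SNP", or "InDel"
    query_length: int     # full length of the marker sequence
    aligned_length: int   # number of marker bases included in the alignment
    matches: int          # number of identical aligned bases
    gap_count: int        # number of gaps in the alignment
    max_gap_bp: int       # size of the largest gap


def passes_filter(aln):
    """Return True if the alignment meets the stated marker criteria."""
    limit = MAX_GAP_BP.get(aln.marker_type)
    if limit is None:
        raise ValueError(f"unknown marker type: {aln.marker_type}")
    if aln.query_length <= 0 or aln.aligned_length <= 0:
        return False
    coverage = aln.aligned_length / aln.query_length
    identity = aln.matches / aln.aligned_length
    if coverage < MIN_COVERAGE or identity < MIN_IDENTITY:
        return False
    return aln.gap_count <= MAX_GAP_COUNT and aln.max_gap_bp <= limit
```

For example, an SSR alignment with a single 500 bp gap would pass, while a SNP alignment with the same gap would be rejected because its gap limit is 2 bp.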
Transcript Alignments
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. barbadense Yuma genome assembly. Alignments with an alignment length of 97% and 90% identity were retained. The available files are in GFF3 format.
G. arboreum CottonGen RefTrans v1 AD2_Y2013_g.arboreum_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 AD2_Y2013_g.hirsutum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 AD2_Y2013_g.barbadense_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 AD2_Y2013_g.raimondii_cottongen_reftransV1