Gossypium harknessii (D2-2) genome CRI_v1

Overview
Analysis Name: Gossypium harknessii (D2-2) genome CRI_v1
Method: Illumina Hi-C; PacBio SMRTbell; HiSeq X Ten; NovaSeq
Source (v1.0)
Date performed: 2023-10-07

To delve into the speciation history of the Gossypium genus, the authors introduced four novel assemblies: G. harknessii (Ghar), G. gossypioides (Ggos), G. trilobum (Gtri), and G. klotzschianum (Gklo).

All samples for DNA-seq and RNA-seq were collected from adult plants in the National Wild Cotton Nursery at the Institute of Cotton Research (ICR), Chinese Academy of Agricultural Sciences (CAAS), in Sanya, China. Briefly, the leaves, flowers, stems, apex, bolls, and flower buds were collected, immediately frozen in liquid nitrogen, and stored at −80 °C. The leaves of 15-day-old seedlings were also collected for a high-throughput chromosome conformation capture (Hi-C) experiment.

High molecular weight genomic DNA (gDNA) was extracted with a standard CTAB protocol from the leaves of four cotton species: Ghar (D2-2), Ggos (D6), Gtri (D8), and Gklo. PacBio SMRTbell long-read sequencing libraries were prepared from Gklo DNA by fragmenting extracted DNA with a Covaris® g-TUBE® Shearing Device. Illumina (San Diego, CA, USA) paired-end sequencing libraries (with an insert size of 350 bp) were generated from the same extracted Gklo gDNA following the manufacturer’s protocol. Short-insert (220 bp and 500 bp insert size) paired-end and large-insert (3, 4, 5 kb) mate pair libraries were prepared from Ggos, Ghar, and Gtri gDNA. Illumina Hi-C was conducted following a previously published protocol (Berkum et al., 2010). Briefly, fresh leaves from Gklo seedlings were fixed in a 1% formaldehyde solution. The nuclei and chromatin were extracted, then digested with DpnII. The overhangs resulting from DpnII digestion were filled in with biotin-14-dCTP (Invitrogen) and Klenow (New England Biolabs [NEB]). After chromatin dilution and re-ligation, gDNA was extracted and purified; purified DNA was sheared to 300-500 bp with a Bioruptor (Diagenode). Finally, the Hi-C library was prepared as previously described (Servant et al., 2015). Total RNA was extracted from six different tissues using a TRIzol® Reagent RNA isolation kit (Invitrogen). RNA-seq libraries were prepared using the standard Illumina mRNA-seq library preparation kit.

The PacBio SMRTbell long-read sequencing library was sequenced on the PacBio Sequel Ⅰ platform. The Hi-C library was sequenced on the Illumina HiSeq X Ten platform. PE150 sequencing was conducted on the Illumina HiSeq X Ten platform for the Gklo library. PE125 sequencing was conducted on the Illumina HiSeq X Ten platform for the prepared Ghar, Ggos, and Gtri libraries. Paired-end 150 bp reads were generated on the Illumina NovaSeq platform for the RNA libraries.

 

Publication:

Xu et al., Widespread Incomplete Lineage Sorting and Introgression Shaped Adaptive Radiation in the Gossypium genus. Plant Communications, 2 October 2023.

Assembly

The chromosomes (pseudomolecules) and scaffolds for the Gossypium harknessii genome are available for download below. These files belong to the CRI Assembly v1.0.

Downloads

Chromosomes (FASTA file) Gossypium_harknessii_D22_CRI_v1.0.fasta.gz
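
Once downloaded, the chromosome file can be inspected without decompressing it to disk. The following is a minimal sketch using only the Python standard library; the helper names `fasta_lengths` and `fasta_lengths_from_gz` are hypothetical, and it assumes the file was saved locally under the name listed above.

```python
import gzip
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: length} for FASTA-formatted text lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def fasta_lengths_from_gz(path: str) -> Dict[str, int]:
    """Stream a gzip-compressed FASTA file and return sequence lengths."""
    with gzip.open(path, "rt") as handle:
        return fasta_lengths(handle)
```

For example, `fasta_lengths_from_gz("Gossypium_harknessii_D22_CRI_v1.0.fasta.gz")` would return one entry per pseudomolecule or scaffold, which is a quick way to check that a download is complete.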

 

Functional Analysis

Functional annotation files for the Gossypium harknessii CRI Genome v1.0 are available for download below. The Gossypium harknessii CRI Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan D22_CRI_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan D22_CRI_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs D22_CRI_v1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways D22_CRI_v1_KEGG-pathways.xlsx.gz
Genes

The Gossypium harknessii D2-2 CRI v1.0 genome gene prediction files are available in FASTA and GFF3 formats.

Downloads

Genes (GFF3 file) Gossypium_harknessii_D22_CRI_v1.0.genes.gff3.gz
CDS (FASTA file) Gossypium_harknessii_D22_CRI_v1.0.cds.fasta.gz
Protein sequences (FASTA file) Gossypium_harknessii_D22_CRI_v1.0.proteins.fasta.gz
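
GFF3 is a nine-column, tab-delimited text format, so the gene prediction file can be summarized with a few lines of standard-library Python. The sketch below counts features by type (column 3); the function names are hypothetical, and the feature types present in this particular file are not stated on this page.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 features by the value in column 3 (feature type)."""
    counts: Counter = Counter()
    for line in lines:
        # Skip blank lines, directives ("##...") and comments ("#...").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        counts[fields[2]] += 1
    return counts


def count_feature_types_gz(path: str) -> Counter:
    """Stream a gzip-compressed GFF3 file and count its feature types."""
    with gzip.open(path, "rt") as handle:
        return count_feature_types(handle)
```

Calling `count_feature_types_gz("Gossypium_harknessii_D22_CRI_v1.0.genes.gff3.gz")` on the downloaded file would give a quick tally of the annotated feature types.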
Homology

Homology of the Gossypium harknessii CRI genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2023-07), and UniProtKB/TrEMBL (Release 2023-07) databases. The best hit reports are available for download in Excel format.
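
The best-hit selection described above can be illustrated as follows. This is a sketch under stated assumptions, not the pipeline actually used: it assumes blastp results in the standard 12-column tabular layout (BLAST+ `-outfmt 6`), and it takes "best hit" to mean the hit with the highest bit score among those passing the E-value cutoff. The function name `best_hits` is hypothetical.

```python
from typing import Dict, Iterable, Tuple

EVALUE_CUTOFF = 1e-6  # cutoff stated in the text


def best_hits(
    lines: Iterable[str], cutoff: float = EVALUE_CUTOFF
) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} with one best hit per query.

    Assumes 12-column tabular BLAST output: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= cutoff:
            continue  # keep only hits with E-value below the cutoff
        previous = best.get(query)
        # Assumption: the best hit is the one with the highest bit score.
        if previous is None or bitscore > previous[2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Under this reading, proteins that appear as keys in the result have a homolog in the searched database, and proteins absent from it have no hit.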

Protein Homologs

G. harknessii CRI Genome v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) D22_CRI_v1_vs_tair.xlsx.gz
G. harknessii CRI Genome v1.0 proteins with Arabidopsis (Araport11) homologs (FASTA file) D22_CRI_v1_vs_tair_hit.fasta.gz
G. harknessii CRI Genome v1.0 proteins without Arabidopsis (Araport11) homologs (FASTA file) D22_CRI_v1_vs_tair_noHit.fasta.gz
G. harknessii CRI Genome v1.0 proteins with SwissProt homologs (EXCEL file) D22_CRI_v1_vs_swissprot.xlsx.gz
G. harknessii CRI Genome v1.0 proteins with SwissProt homologs (FASTA file) D22_CRI_v1_vs_swissprot_hit.fasta.gz
G. harknessii CRI Genome v1.0 proteins without SwissProt homologs (FASTA file) D22_CRI_v1_vs_swissprot_noHit.fasta.gz
G. harknessii CRI Genome v1.0 proteins with TrEMBL homologs (EXCEL file) D22_CRI_v1_vs_trembl.xlsx.gz
G. harknessii CRI Genome v1.0 proteins with TrEMBL homologs (FASTA file) D22_CRI_v1_vs_trembl_hit.fasta.gz
G. harknessii CRI Genome v1.0 proteins without TrEMBL homologs (FASTA file) D22_CRI_v1_vs_trembl_noHit.fasta.gz
Markers
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium harknessii CRI v1.0 assembly. Markers required 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome G.harknessii_D22_CRI_SNP
CottonGen RFLP markers mapped to genome G.harknessii_D22_CRI_RFLP
CottonGen SSR markers mapped to genome G.harknessii_D22_CRI_SSR
CottonGen InDel markers mapped to genome G.harknessii_D22_CRI_InDel
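
The marker filtering criteria above can be expressed as a small predicate. This is an illustrative sketch only: the `MarkerAlignment` record and its field names are hypothetical, not BLAT's output format, and it assumes "90% identity over 97% of their length" means at least 97% of the marker is aligned and at least 90% of the aligned bases match.

```python
from dataclasses import dataclass

# Maximum allowed gap size (bp) per marker class, as stated in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


@dataclass
class MarkerAlignment:
    """Hypothetical summary of one marker-to-genome alignment."""
    marker_type: str    # "SSR", "RFLP", "SNP" or "InDel"
    marker_length: int  # length of the marker sequence (bp)
    aligned_bases: int  # marker bases covered by the alignment
    matches: int        # aligned bases identical to the genome
    gap_count: int      # number of gaps in the alignment
    max_gap_size: int   # size of the largest gap (bp), 0 if none


def passes_filter(aln: MarkerAlignment) -> bool:
    """Apply the identity, coverage, and gap thresholds described above."""
    if aln.marker_length <= 0 or aln.aligned_bases <= 0:
        return False
    coverage = aln.aligned_bases / aln.marker_length
    identity = aln.matches / aln.aligned_bases
    return (
        coverage >= 0.97
        and identity >= 0.90
        and aln.gap_count < 2  # "fewer than 2 gaps"
        and aln.max_gap_size <= MAX_GAP_BP[aln.marker_type]
    )
```

For instance, an SSR alignment with a single 800 bp gap can pass, whereas the same alignment for a SNP marker fails the 2 bp gap limit.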
Transcript Alignments
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. harknessii genome assembly. Alignments with an alignment length of 97% and 90% identity were preserved. The available files are in GFF3 format.
G. arboreum CottonGen RefTrans v1 G.harknessii_D22_CRI_g.arboreum_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 G.harknessii_D22_CRI_g.hirsutum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 G.harknessii_D22_CRI_g.barbadense_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 G.harknessii_D22_CRI_g.raimondii_cottongen_reftransV1