Gossypium hirsutum (AD1) 'PSC355' genome USDA_v1

Overview
Analysis Name: Gossypium hirsutum (AD1) 'PSC355' genome USDA_v1
Method: Hi-C libraries; matlock; Juicebox
Source: (v1)
Date performed: 2023-10-15

The sequencing libraries were constructed at the Brigham Young University DNA Sequencing Center (DNASC). High-molecular-weight (HMW) DNA was partitioned into 13 bins using the Sage Elf (Sage Science, Beverly, MA, USA), and the top five bins were run on a Fragment Analyzer (Agilent Technologies, Santa Clara, CA, USA) to select the appropriate bin size range (15-18 kb). Libraries were made using the SMRTbell Express Template Prep kit as recommended by Pacific Biosciences (PacBio, Menlo Park, CA, USA). Five PacBio cells were sequenced on the Pacific Biosciences Sequel II system, and the reads were assembled using hifiasm (Cheng et al. 2021) with default parameters. Manual scaffolding was used to create pseudomolecules of PSC355.

Hi-C libraries were constructed from the same seedling tissue using the Plant Hi-C Kit (Phase Genomics, Seattle, WA, USA). Short-read sequencing (Illumina, San Diego, CA, USA; 150 bp paired-end) of the libraries was performed by BGI Americas Corp (Cambridge, MA, USA). The Hi-C reads were mapped to the assembled genome sequence using bwa mem (Li and Durbin 2009). From the mapped reads, matlock (https://github.com/phasegenomics/matlock) was used to identify linkages between different genomic regions in the BAM file. Juicebox (Durand et al. 2016) was used to visualize the linkages along the pseudomolecules.
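
To make the idea of Hi-C linkages concrete, the following is a minimal Python sketch of how read-pair links can be counted per pair of genomic bins, which is the kind of contact map a viewer such as Juicebox displays. It is an illustration only, not the matlock or Juicebox implementation. The function name bin_contacts, the simplified (chrom1, pos1, chrom2, pos2) link tuples, the chromosome names, and the 100 kb bin size are all assumptions made for this example.

```python
from collections import Counter


def bin_contacts(links, bin_size):
    """Count Hi-C links per pair of genomic bins.

    `links` is an iterable of (chrom1, pos1, chrom2, pos2) tuples, a
    hypothetical simplified form of read-pair linkages (not matlock's format).
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in links:
        a = (chrom1, pos1 // bin_size)
        b = (chrom2, pos2 // bin_size)
        # Store each pair in sorted order so (a, b) and (b, a) share one cell.
        key = (a, b) if a <= b else (b, a)
        counts[key] += 1
    return counts


# Hypothetical links; chromosome names and positions are made up.
example_links = [
    ("A01", 1_200, "A01", 1_800),
    ("A01", 1_500, "A01", 250_000),
    ("A01", 260_000, "A01", 900),
    ("A01", 5_000, "D01", 70_000),
]
contact_counts = bin_contacts(example_links, bin_size=100_000)
for (bin_a, bin_b), n in sorted(contact_counts.items()):
    print(bin_a, bin_b, n)
```

In a correctly ordered pseudomolecule, most links fall in or near the diagonal of this matrix, so off-diagonal blocks of high counts are what manual review looks for.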

Centromeric locations were estimated by mapping CENH3 ChIP-seq reads from leaf tissue (Hu et al. 2019). Reads from immunoprecipitated samples were compared to control samples using the deepTools program bamCompare (Ramírez et al. 2016), with visualization in IGV. Telomeric repeats and their respective lengths were manually identified in Geneious (Biomatters, Ltd.).
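
The comparison of immunoprecipitated and control samples can be pictured as a per-bin log2 ratio of read counts. The Python sketch below shows that calculation under simple assumptions: counts are already summarized per fixed-size bin, the control is scaled to the library size of the immunoprecipitated sample, and a pseudocount of 1 avoids division by zero. The function name log2_enrichment and the example counts are hypothetical, and bamCompare offers its own normalization options that may differ from this sketch.

```python
import math


def log2_enrichment(ip_counts, control_counts, pseudocount=1.0):
    """Return per-bin log2(IP / control) after scaling control to the IP total.

    Illustrative only; this is not the bamCompare implementation.
    """
    if len(ip_counts) != len(control_counts):
        raise ValueError("IP and control must have the same number of bins")
    ip_total = sum(ip_counts)
    control_total = sum(control_counts)
    if ip_total == 0 or control_total == 0:
        raise ValueError("both samples need at least one read")
    scale = ip_total / control_total
    return [
        math.log2((ip + pseudocount) / (ctrl * scale + pseudocount))
        for ip, ctrl in zip(ip_counts, control_counts)
    ]


# Hypothetical read counts for four consecutive bins along one chromosome.
ip = [10, 10, 100, 10]
control = [30, 35, 30, 35]
ratios = log2_enrichment(ip, control)
peak_bin = max(range(len(ratios)), key=ratios.__getitem__)
print(peak_bin, [round(r, 2) for r in ratios])
```

Bins with a clearly positive ratio are candidates for CENH3 enrichment; the centromere estimate comes from where such bins cluster along a pseudomolecule.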

Assembly

Chromosome (pseudomolecule) sequences for the Gossypium hirsutum cv. 'Phytogen PSC 355' genome are available below. These files belong to the Gossypium hirsutum (AD1) 'PSC355' genome USDA_v1 assembly.

Chromosomes (FASTA format) G.hirsutum_USDA_AD1-PSC355.genome.fasta

Functional Analysis

Functional annotation files for the Gossypium hirsutum USDA Genome v1.0 are available for download below. The Gossypium hirsutum USDA Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan AD1_USDA_PSC355_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan AD1_USDA_PSC355_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs AD1_USDA_PSC355_v1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways AD1_USDA_PSC355_v1_KEGG-pathways.xlsx.gz

Genes

Predicted gene models, their alignments, and proteins for the Gossypium hirsutum cv. 'Phytogen PSC 355' genome are available below. These files belong to the Gossypium hirsutum (AD1) 'PSC355' genome USDA_v1 assembly.

Predicted gene models with exons (GFF3 format) G.hirsutum_USDA_AD1-PSC355.gff3.gz
Coding sequences, mRNA (FASTA format) G.hirsutum_USDA_AD1-PSC355.mRNA.fa.gz
Protein sequences (FASTA format) G.hirsutum_USDA_AD1-PSC355.pep.fa.gz

Homology

Homology of the Gossypium hirsutum USDA-PSC355 genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value (E-value) cutoff of less than 1e-6 was used for the Arabidopsis proteins (Araport11, 2022-09), UniProtKB/SwissProt (Release 2023-07), and UniProtKB/TrEMBL (Release 2023-07) databases. The best hit reports are available for download in Excel format.
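
As an illustration of the best-hit selection described above, the following Python sketch filters BLAST tabular output (the standard 12-column "outfmt 6" layout) by E-value and keeps the highest-scoring hit per query. It assumes the tabular format and uses bit score to rank hits; the source does not state how the best hit was chosen, so that ranking is an assumption. The function name best_hits and all sequence identifiers in the example are hypothetical placeholders.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-6):
    """Return {query: (subject, evalue, bitscore)} for the top hit per query.

    Expects BLAST tabular columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        # Keep only hits with an E-value below the cutoff.
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits; identifiers and scores are placeholders.
example = "\n".join([
    "\t".join(["geneA.1", "subject_1", "62.5", "200", "75", "0",
               "1", "200", "1", "200", "1e-80", "250"]),
    "\t".join(["geneA.1", "subject_2", "45.0", "180", "99", "0",
               "1", "180", "1", "180", "1e-20", "95.3"]),
    "\t".join(["geneB.1", "subject_3", "30.0", "60", "42", "0",
               "1", "60", "1", "60", "0.002", "35.1"]),
])
hits = best_hits(example)
print(hits)
```

Queries left out of the result, such as geneB.1 here, correspond to the proteins that would land in a "noHit" file.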

Protein Homologs

G.hirsutum USDA Genome v1.0 proteins with Arabidopsis (Araport11) homologs (EXCEL file) AD1_USDA_PSC355_v1_vs_tair.xlsx.gz
G.hirsutum USDA Genome v1.0 proteins with Arabidopsis (Araport11) (FASTA file) AD1_USDA_PSC355_v1_vs_tair_hit.fasta.gz
G.hirsutum USDA Genome v1.0 proteins without Arabidopsis (Araport11) (FASTA file) AD1_USDA_PSC355_v1_vs_tair_noHit.fasta.gz
G.hirsutum USDA Genome v1.0 proteins with SwissProt homologs (EXCEL file) AD1_USDA_PSC355_v1_vs_swissprot.xlsx.gz
G.hirsutum USDA Genome v1.0 proteins with SwissProt (FASTA file) AD1_USDA_PSC355_v1_vs_swissprot_hit.fasta.gz
G.hirsutum USDA Genome v1.0 proteins without SwissProt (FASTA file) AD1_USDA_PSC355_v1_vs_swissprot_noHit.fasta.gz
G.hirsutum USDA Genome v1.0 proteins with TrEMBL homologs (EXCEL file) AD1_USDA_PSC355_v1_vs_trembl.xlsx.gz
G.hirsutum USDA Genome v1.0 proteins with TrEMBL (FASTA file) AD1_USDA_PSC355_v1_vs_trembl_hit.fasta.gz
G.hirsutum USDA Genome v1.0 proteins without TrEMBL (FASTA file) AD1_USDA_PSC355_v1_vs_trembl_noHit.fasta.gz

Markers

Marker alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium hirsutum USDA v1.0 assembly. Markers were required to align with 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome G.hirsutum_AD1_PSC355_USDA_SNP
CottonGen RFLP markers mapped to genome G.hirsutum_AD1_PSC355_USDA_RFLP
CottonGen SSR markers mapped to genome G.hirsutum_AD1_PSC355_USDA_SSR
CottonGen InDel markers mapped to genome G.hirsutum_AD1_PSC355_USDA_InDel
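
The marker filtering criteria described above can be sketched in Python as follows. This is one possible reading of those criteria, not the CottonGen pipeline itself: coverage is taken as aligned length divided by marker length, identity as matching bases divided by aligned length, and "fewer than 2 gaps" as at most one gap. The MarkerAlignment class, its fields, and the function name passes_marker_filter are hypothetical.

```python
from dataclasses import dataclass

# Maximum gap size (bp) per marker type, taken from the criteria in the text.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


@dataclass
class MarkerAlignment:
    marker_type: str       # one of the keys of MAX_GAP_BP
    marker_length: int     # length of the marker sequence (bp)
    aligned_length: int    # marker bases covered by the alignment
    matches: int           # identical bases within the aligned region
    gap_sizes: tuple = ()  # size in bp of each alignment gap


def passes_marker_filter(aln, min_identity=0.90, min_coverage=0.97):
    """Apply the identity, coverage, and gap rules (assumed interpretation)."""
    if aln.marker_length <= 0 or aln.aligned_length <= 0:
        return False
    coverage = aln.aligned_length / aln.marker_length
    identity = aln.matches / aln.aligned_length
    if coverage < min_coverage or identity < min_identity:
        return False
    # "Fewer than 2 gaps" is read here as zero or one gap.
    if len(aln.gap_sizes) >= 2:
        return False
    return all(gap <= MAX_GAP_BP[aln.marker_type] for gap in aln.gap_sizes)


# Hypothetical alignments: the same statistics pass as an SSR but fail as a SNP.
ssr = MarkerAlignment("SSR", 200, 198, 190, (350,))
snp = MarkerAlignment("SNP", 200, 198, 190, (350,))
print(passes_marker_filter(ssr), passes_marker_filter(snp))
```

The much tighter gap limit for SNPs and InDels reflects that their flanking sequences are expected to align almost contiguously, while SSR and RFLP sequences may span an intron or a repeat-length difference.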

Transcript Alignments

Transcript alignments were performed by the CottonGen Team of the Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. hirsutum USDA genome assembly. Alignments covering 97% of the transcript length with 90% identity were retained. The available files are in GFF3 format.

G. arboreum CottonGen RefTrans v1 AD1_PSC355_USDA_g.arboreum_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 AD1_PSC355_USDA_g.hirsutum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 AD1_PSC355_USDA_g.barbadense_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 AD1_PSC355_USDA_g.raimondii_cottongen_reftransV1
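
For readers who want to reproduce a comparable filter on their own BLAT output, the Python sketch below computes coverage and identity from one line of PSL output and applies the 97% and 90% thresholds. It assumes the standard 21-column PSL layout. The identity formula used here (matching bases divided by all aligned bases) is a simplification chosen for the example, not necessarily the calculation used by CottonGen. The function names and the example record are hypothetical.

```python
def psl_coverage_identity(psl_line):
    """Return (query coverage, identity) for one tab-separated PSL record."""
    f = psl_line.rstrip("\n").split("\t")
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_size, q_start, q_end = int(f[10]), int(f[11]), int(f[12])
    aligned = matches + mismatches + rep_matches
    coverage = (q_end - q_start) / q_size
    # Simplified identity: matching bases over all aligned bases.
    identity = (matches + rep_matches) / aligned if aligned else 0.0
    return coverage, identity


def keep_alignment(psl_line, min_coverage=0.97, min_identity=0.90):
    """True if the alignment meets both thresholds (assumed as minimums)."""
    coverage, identity = psl_coverage_identity(psl_line)
    return coverage >= min_coverage and identity >= min_identity


# Hypothetical PSL record: a 1000 bp transcript aligned in two blocks.
example_fields = [
    "950", "30", "0", "0",           # matches, misMatches, repMatches, nCount
    "0", "0", "1", "500",            # qNumInsert, qBaseInsert, tNumInsert, tBaseInsert
    "+", "transcript_1", "1000", "10", "990",   # strand, qName, qSize, qStart, qEnd
    "A01", "100000", "5000", "6480",            # tName, tSize, tStart, tEnd
    "2", "480,500,", "10,490,", "5000,5980,",   # blockCount, blockSizes, qStarts, tStarts
]
example_line = "\t".join(example_fields)
print(psl_coverage_identity(example_line), keep_alignment(example_line))
```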