Gossypium hirsutum (AD1) 'Bar32' genome NSF_v1

Overview
Analysis Name: Gossypium hirsutum (AD1) 'Bar32' genome NSF_v1
Method: PacBio Sequel, hifiasm v. 0.13-r307
Source: Bar32
Date performed: 2022-01-11

Acid-delinted seeds of BARBREN-713 and BAR 32-30 (G. hirsutum) were chipped and rolled in moist germination towels. Rolled towels were placed in an incubator set at 30 °C for 4 days to germinate the seeds. DNA was extracted from the resulting seedlings using the Qiagen Genomic Tip kit (Qiagen, Hilden, Germany). The sequencing libraries were constructed at the Brigham Young University DNA Sequencing Center (DNASC). DNA for both libraries was sheared on a Megaruptor2 (~20 kb) (Diagenode Inc., Denville, NJ, USA). HMW DNA was partitioned into 13 bins using the Sage Elf (Sage Science, Beverly, MA, USA), and the top 5 bins were run on a Fragment Analyzer (Agilent Technologies, Santa Clara, CA, USA) to select the appropriate bin size range (15–18 kb). Libraries were made using the SMRTbell Express Template Prep kit as recommended by Pacific Biosciences (PacBio; Menlo Park, CA, USA). Five PacBio cells were sequenced from each library on the Pacific Biosciences Sequel 2 system. The PacBio reads were assembled using hifiasm (Cheng et al. 2021) with default parameters. Both assembled genomes were aligned to previously assembled genomes of G. hirsutum using minimap2 (Li 2018; Chen et al. 2020) and visualized with dotPlotly (https://github.com/tpoorten/dotPlotly, last accessed 8/17/21). Manual scaffolding was used to create pseudomolecules of both genomes.

Hi-C libraries were constructed from the same seedling tissue using the Plant Hi-C Kit (Phase Genomics, Seattle, WA, USA). Short-read sequencing (Illumina, San Diego, CA, USA; 150PE) of the libraries was performed by BGI Americas Corp (Cambridge, MA, USA). The Hi-C data of both genomes were mapped to their respective assembled genome sequence using bwa mem (Li and Durbin 2009). The Hi-C interactions were used as evidence for contig proximity and in scaffolding contig sequences. Within their respective set of mapped reads, matlock (https://github.com/phasegenomics/matlock, last accessed 8/17/21) was used to identify linkages between different genomic regions in the bam file. Juicebox (Robinson et al. 2018) was used to visualize the linkages along the pseudomolecules.

Table 1. The assembled genomes of BARBREN-713, BAR 32-30, and TM-1 (Chen et al. 2020)

Genomic feature               BARBREN-713   BAR 32-30     UTX-TM1_v2.1
Number of contigs (a)         6,403         4,716         6,733
Max contig (MB)               126.1         125.5         9.0
Mean contig (MB)              0.4           0.5           0.3
Contig N50L (MB)              64.8          75.3          0.8
Contig N90L (MB)              56.7          56.0          0.2
Total contig length (MB)      2,558.7       2,454.3       2,302.3
Assembly GC (%)               36.1          35.1          34.4
Number of scaffolds (b)       26            26            1,025
Max scaffold (MB)             127.5         127.4         128.2
Mean scaffold (MB)            88.3          88.3          2.2
Scaffold N50L (MB)            108.0         108.3         108.1
Scaffold N90L (MB)            61.06         61.15         5.94
Total scaffold length (MB)    2,296         2,296         2,305
Number of genes               75,723        75,988        75,376
Repeat sequences (%)          76.0          76.9          73.2

  (a) Contig metrics reported are from the raw output file of hifiasm.
  (b) Scaffold metrics reported are after manual scaffolding.
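
For readers who want to check N50-style metrics against their own copy of an assembly, the following is a minimal sketch. It assumes that "N50L" and "N90L" in Table 1 denote the conventional N50 and N90 lengths (the length of the sequence at which the cumulative sum of lengths, sorted from longest to shortest, first reaches 50% or 90% of the total). The function name `nx_length` and the toy lengths are hypothetical and are not taken from the assemblies described here.

```python
def nx_length(lengths, percent=50):
    """Return the Nx length for a list of sequence lengths.

    Sequences are sorted from longest to shortest; the result is the length
    of the sequence at which the running total first reaches `percent` of
    the summed length. Integer arithmetic avoids floating-point edge cases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= percent * total:
            return length


# Hypothetical toy contig lengths (illustration only).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = nx_length(toy_lengths, 50)
toy_n90 = nx_length(toy_lengths, 90)
print(toy_n50, toy_n90)
```

Lengths for a real assembly could be collected from the FASTA file before calling the function; the metrics in Table 1 should be treated as the authoritative values.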

 Publication: Perkin et al., Genome assembly of two nematode-resistant cotton lines (Gossypium hirsutum L.). G3 5 August 2021.

Assembly

The chromosomes (pseudomolecules) for the Gossypium hirsutum BAR 32-30 genome. These files belong to the Gossypium hirsutum (AD1) 'Bar32' genome NSF_v1.

Chromosomes (FASTA format) G.hirsutum_Bar32.genome.fasta

Functional Analysis

Functional annotation files for the Gossypium hirsutum NSF Genome v1.0 are available for download below. The Gossypium hirsutum NSF Genome v1.0 proteins were analyzed using InterProScan to assign InterPro domains and Gene Ontology (GO) terms. Pathway analysis was performed using the KEGG Automatic Annotation Server (KAAS).

Downloads

GO assignments from InterProScan AD1_NSF-Bar32_v1_genes2GO.xlsx.gz
IPR assignments from InterProScan AD1_NSF-Bar32_v1_genes2IPR.xlsx.gz
Proteins mapped to KEGG Orthologs AD1_NSF-Bar32_v1_KEGG-orthologis.xlsx.gz
Proteins mapped to KEGG Pathways AD1_NSF-Bar32_v1_KEGG-pathways.xlsx.gz

 

Genes

The predicted gene models, their alignments, and proteins for the Gossypium hirsutum (AD1) BAR 32-30 genome. These files belong to the Gossypium hirsutum (AD1) 'Bar32' genome NSF_v1.

Predicted gene models with exons (GFF3 format) G.hirsutum_NSF_AD1-Bar32.gene.gff3.gz
Coding sequences, CDS (FASTA format) G.hirsutum_NSF_AD1-Bar32.transcripts.fa.gz
Protein sequences (FASTA format) G.hirsutum_NSF_AD1-Bar32.protein.fa.gz
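
As a quick sanity check after downloading the gene model file, the feature types in a GFF3 file can be tallied and the gene count compared with Table 1. The sketch below assumes a standard nine-column, tab-delimited GFF3 layout with the feature type in column 3; the helper names and the in-memory example records are hypothetical.

```python
import gzip
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 feature types (column 3), skipping comments and blanks."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) == 9:  # well-formed GFF3 records have nine columns
            counts[columns[2]] += 1
    return counts


def count_feature_types_in_file(path):
    """Open a plain or gzip-compressed GFF3 file and count its feature types."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_feature_types(handle)


# Hypothetical records for illustration; real IDs and coordinates will differ.
example_lines = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=g1.1;Parent=g1\n",
    "chr1\tsrc\texon\t1\t400\t.\t+\t.\tParent=g1.1\n",
    "chr1\tsrc\texon\t600\t900\t.\t+\t.\tParent=g1.1\n",
]
example_counts = count_feature_types(example_lines)
print(dict(example_counts))
```

For the downloaded file, `count_feature_types_in_file("G.hirsutum_NSF_AD1-Bar32.gene.gff3.gz")` would be the corresponding call, assuming the file sits in the working directory.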

Homology

Homology of the Gossypium hirsutum NSF Genome v1.0 proteins was determined by pairwise sequence comparison using the blastp algorithm against various protein databases. An expectation value cutoff of less than 1e-9 was used for the NCBI nr (Release 2021-09) database and 1e-6 for the Arabidopsis proteins (Araport11), UniProtKB/SwissProt (Release 2021-09), and UniProtKB/TrEMBL (Release 2021-09) databases. The best hit reports are available for download in Excel format.
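
The cutoff-and-best-hit selection described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline that produced the reports: it assumes blastp results in the standard 12-column tabular layout (query ID in column 1, subject ID in column 2, e-value in column 11, bit score in column 12) and takes "best hit" to mean the highest bit score per query. The function name and the example identifiers are hypothetical.

```python
import csv
import io


def best_hits(tabular_text, evalue_cutoff):
    """Keep the highest-scoring hit per query with e-value below the cutoff."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= evalue_cutoff:  # cutoff is strict: keep only e-value < cutoff
            continue
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits for illustration only.
example = "\n".join([
    "\t".join(["geneA", "subj1", "98.0", "200", "4", "0", "1", "200", "1", "200", "1e-50", "380"]),
    "\t".join(["geneA", "subj2", "80.0", "200", "40", "0", "1", "200", "1", "200", "1e-20", "150"]),
    "\t".join(["geneB", "subj3", "40.0", "60", "36", "0", "1", "60", "1", "60", "1e-7", "45"]),
])

nr_style_hits = best_hits(example, 1e-9)    # stricter cutoff, as used for NCBI nr
tair_style_hits = best_hits(example, 1e-6)  # looser cutoff, as used for Araport11
print(sorted(nr_style_hits), sorted(tair_style_hits))
```

With the stricter cutoff the weak hit for the second query is dropped, which is why a protein can appear in a "noHit" file for one database and still have a reported homolog in another.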

 

Protein Homologs

G.hirsutum NSF Genome v1.0 proteins with NCBI nr homologs (EXCEL file) AD1_NSF-Bar32_v1_vs_nr.xlsx.gz
G.hirsutum NSF Genome v1.0 proteins with NCBI nr (FASTA file) AD1_NSF-Bar32_v1_vs_nr_hit.fasta.gz
G.hirsutum NSF Genome v1.0 proteins without NCBI nr (FASTA file) AD1_NSF-Bar32_v1_vs_nr_noHit.fasta.gz
G.hirsutum NSF Genome v1.0 proteins with arabidopsis (Araport11) homologs (EXCEL file) AD1_NSF-Bar32_v1_vs_tair.xlsx.gz
G.hirsutum NSF Genome v1.0 proteins with arabidopsis (Araport11) (FASTA file) AD1_NSF-Bar32_v1_vs_tair_hit.fasta.gz
G.hirsutum NSF Genome v1.0 proteins without arabidopsis (Araport11) (FASTA file) AD1_NSF-Bar32_v1_vs_tair_noHit.fasta.gz
G.hirsutum NSF Genome v1.0 proteins with SwissProt homologs (EXCEL file) AD1_NSF-Bar32_v1_vs_swissprot.xlsx.gz
G.hirsutum NSF Genome v1.0 proteins with SwissProt (FASTA file) AD1_NSF-Bar32_v1_vs_swissprot_hit.fasta.gz
G.hirsutum NSF Genome v1.0 proteins without SwissProt (FASTA file) AD1_NSF-Bar32_v1_vs_swissprot_noHit.fasta.gz
G.hirsutum NSF Genome v1.0 proteins with TrEMBL homologs (EXCEL file) AD1_NSF-Bar32_v1_vs_trembl.xlsx.gz
G.hirsutum NSF Genome v1.0 proteins with TrEMBL (FASTA file) AD1_NSF-Bar32_v1_vs_trembl_hit.fasta.gz
G.hirsutum NSF Genome v1.0 proteins without TrEMBL (FASTA file) AD1_NSF-Bar32_v1_vs_trembl_noHit.fasta.gz

 

Markers
Marker alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map marker sequences from CottonGen to the Gossypium hirsutum NSF genome assembly. Markers required 90% identity over 97% of their length. For SSRs and RFLPs, gap size was restricted to 1000 bp or less, with fewer than 2 gaps. For dbSNPs and InDels, gap size was restricted to 2 bp, with fewer than 2 gaps. The available files are in GFF3 format. Markers available in CottonGen are linked to JBrowse.
 
CottonGen SNP markers mapped to genome G.hirsutum_AD1_NSF-Bar32_SNP
CottonGen RFLP markers mapped to genome G.hirsutum_AD1_NSF-Bar32_RFLP
CottonGen SSR markers mapped to genome G.hirsutum_AD1_NSF-Bar32_SSR
CottonGen InDel markers mapped to genome G.hirsutum_AD1_NSF-Bar32_InDel
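
The marker filtering criteria above can be expressed as a small predicate. This is a hedged sketch only: the page does not state how identity and coverage were computed from the BLAT output, so the definitions used here (coverage as aligned length over marker length, identity as matches over aligned length) are assumptions, as are the record fields, the type labels, and the example values.

```python
from dataclasses import dataclass


@dataclass
class MarkerAlignment:
    marker_type: str      # assumed labels: "SSR", "RFLP", "SNP", or "InDel"
    marker_length: int    # length of the marker sequence in bp
    aligned_length: int   # marker bases covered by the alignment
    matches: int          # identical bases within the aligned region
    gap_count: int        # number of gaps in the alignment
    max_gap_size: int     # size of the largest gap in bp


# Maximum gap size per marker type, taken from the description above.
MAX_GAP_BP = {"SSR": 1000, "RFLP": 1000, "SNP": 2, "InDel": 2}


def passes_marker_filter(aln):
    """Apply the 97% length, 90% identity, and gap limits to one alignment."""
    if aln.marker_length <= 0 or aln.aligned_length <= 0:
        return False
    coverage = aln.aligned_length / aln.marker_length  # assumed definition
    identity = aln.matches / aln.aligned_length        # assumed definition
    return (
        coverage >= 0.97
        and identity >= 0.90
        and aln.gap_count < 2
        and aln.max_gap_size <= MAX_GAP_BP[aln.marker_type]
    )


# Hypothetical alignments for illustration only.
ssr_ok = MarkerAlignment("SSR", 200, 198, 190, 1, 500)
snp_big_gap = MarkerAlignment("SNP", 200, 198, 190, 1, 500)
ssr_short = MarkerAlignment("SSR", 200, 180, 180, 0, 0)
print(passes_marker_filter(ssr_ok),
      passes_marker_filter(snp_big_gap),
      passes_marker_filter(ssr_short))
```

The same alignment can pass as an SSR and fail as a SNP because only the gap size limit differs between the two marker classes.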

 

Transcript Alignments
Transcript alignments were performed by the CottonGen Team of Main Bioinformatics Lab at WSU. The alignment tool 'BLAT' was used to map transcripts to the G. hirsutum genome assembly. Alignments with an alignment length of 97% and 90% identity were preserved. The available files are in GFF3 format.

 

G. arboreum CottonGen RefTrans v1 G.hirsutum_AD1_NSF-Bar32_g.arboreum_cottongen_reftransV1
G. hirsutum CottonGen RefTrans v1 G.hirsutum_AD1_NSF-Bar32_g.hirsutum_cottongen_reftransV1
G. barbadense CottonGen RefTrans v1 G.hirsutum_AD1_NSF-Bar32_g.barbadense_cottongen_reftransV1
G. raimondii CottonGen RefTrans v1 G.hirsutum_AD1_NSF-Bar32_g.raimondii_cottongen_reftransV1