Description of Sequence Datasets

  • Curated Genes:
    • CottonGen Gene Database: A single non-redundant list of Gossypium genes with gene symbols. Most entries are parsed from the NCBI nucleotide database, with some user-contributed data. Gene sequences from different sources are associated with their respective genes in the CottonGen Gene Database.
    • NCBI Gossypium gene and mRNA sequences: All gene, mRNA, and nucleotide sequences parsed from the NCBI nucleotide database. Gene and mRNA sequences from NCBI for all Gossypium species are anchored to the cotton whole-genome assemblies using BLAT, with criteria of >98% percent identity (PID) and >95% aligned length.
       
  • Predicted Genes: Genes and mRNAs from whole-genome assemblies. The CottonGen team adds computational annotation to these predicted genes: homology to genes of closely related or model plant species, and assignment of InterPro protein domains, Gene Ontology (GO) terms, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and ortholog terms.
     
  • RefTrans: RefTrans combines published RNA-Seq and EST data sets to create a reference transcriptome and provides putative gene function identified by homology to known proteins. RefTrans is available at CottonGen for four species: G. arboreum, G. barbadense, G. hirsutum, and G. raimondii.
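
The BLAT criteria given for the NCBI sequences (>98% PID and >95% aligned length) can be illustrated with a short sketch. This is not CottonGen's actual pipeline. It assumes BLAT output in the standard 21-column PSL format, and it assumes simple definitions that the text above does not spell out: percent identity as matching bases over aligned bases (gaps ignored), and aligned length as the fraction of the query covered by the alignment. The names PslHit, parse_psl_line, and passes_filter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PslHit:
    """Subset of the PSL columns needed for filtering."""
    matches: int
    mismatches: int
    rep_matches: int
    q_name: str
    q_size: int
    q_start: int
    q_end: int
    t_name: str


def parse_psl_line(line: str) -> PslHit:
    """Parse one tab-separated PSL data line (no header lines)."""
    f = line.rstrip("\n").split("\t")
    return PslHit(
        matches=int(f[0]),
        mismatches=int(f[1]),
        rep_matches=int(f[2]),
        q_name=f[9],
        q_size=int(f[10]),
        q_start=int(f[11]),
        q_end=int(f[12]),
        t_name=f[13],
    )


def percent_identity(hit: PslHit) -> float:
    """Assumed definition: matching bases / aligned bases, gaps ignored."""
    aligned = hit.matches + hit.rep_matches + hit.mismatches
    if aligned == 0:
        return 0.0
    return 100.0 * (hit.matches + hit.rep_matches) / aligned


def percent_aligned(hit: PslHit) -> float:
    """Assumed definition: share of the query spanned by the alignment."""
    if hit.q_size == 0:
        return 0.0
    return 100.0 * (hit.q_end - hit.q_start) / hit.q_size


def passes_filter(hit: PslHit, min_pid: float = 98.0,
                  min_aligned: float = 95.0) -> bool:
    """Keep hits strictly above both thresholds, as in '>98%' and '>95%'."""
    return (percent_identity(hit) > min_pid
            and percent_aligned(hit) > min_aligned)
```

To apply it, read a PSL file line by line, skip any header lines, and keep the hits for which passes_filter returns True. The thresholds are parameters so other cutoffs can be tried.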