Gossypium unigene v1.0

Analysis Name  Gossypium unigene v1.0
Source  GenBank Gossypium ESTs (September 16, 2012)
Date performed  2012-09-27

This is the first version of the Gossypium unigene. The build used ESTs for the genus Gossypium from the NCBI dbEST database, downloaded on September 16, 2012.

Not all of the Gossypium ESTs are of high quality. To filter them, we cross-matched the public sequences against NCBI's UniVec database and used BLAST sequence similarity searches to remove species-specific chloroplast, mitochondrial, tRNA, and rRNA sequences. To reduce redundancy and create longer transcripts, we assembled the remaining ESTs using the CAP3 [1] program. The final assembly has been annotated by BLAST sequence similarity searching against Swiss-Prot [2], TrEMBL [3], TAIR [4] Arabidopsis proteins, Prunus persica [5], Populus trichocarpa [6] and Vitis vinifera [7].

For more information on this project please contact the Cottongen development team.

 Processing Summary
 Number of ESTs available  442,954
 Number of ESTs available after filtering  437,185
 Average Length  647
 Number of Contigs (CAP3 assembly, -p 90)  21,698
 Average Length of Contigs  1080
 Number of Singlets  128,218
 Number of Putative Unigenes  149,916
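
The figures in the processing summary are related by simple arithmetic: the putative unigene count is the number of contigs plus the number of singlets. The following minimal Python sketch reproduces that relationship from the reported numbers. The function names are illustrative only and are not part of the original pipeline.

```python
def putative_unigenes(num_contigs, num_singlets):
    """Unigene count: assembled contigs plus ESTs left as singlets."""
    return num_contigs + num_singlets


def removed_by_filtering(num_raw, num_filtered):
    """Return (number of ESTs dropped, percentage of the raw set dropped)."""
    removed = num_raw - num_filtered
    return removed, 100.0 * removed / num_raw


# Values taken from the Processing Summary.
unigenes = putative_unigenes(21_698, 128_218)
removed, removed_pct = removed_by_filtering(442_954, 437_185)

print(unigenes)   # 149916
print(removed)    # 5769
print(round(removed_pct, 2))
```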


  1. Huang, X. and Madan, A. (1999). CAP3: A DNA sequence assembly program. Genome Research, 9, 868-877.
  2. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. J Mol Biol. 215(3):403-10.
  3. Boeckmann B., Bairoch A., Apweiler R., Blatter M.-C., Estreicher A., Gasteiger E., Martin M.J., Michoud K., O'Donovan C., Phan I., Pilbout S., and Schneider M. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research. 31:365-370.
  4. Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, Miller N, Mueller LA, Mundodi S, Reiser L, Tacklind J, Weems DC, Wu Y, Xu I, Yoo D, Yoon J, Zhang P. (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Research. 31(1):224-8.
  5. http://services.appliedgenomics.org/projects/drupomics/
  6. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, Schein J, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen GL, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, Cunningham R, Davis J, Degroeve S, Déjardin A, Depamphilis C, Detter J, Dirks B, Dubchak I, Duplessis S, Ehlting J, Ellis B, Gendler K, Goodstein D, Gribskov M, Grimwood J, Groover A, Gunter L, Hamberger B, Heinze B, Helariutta Y, Henrissat B, Holligan D, Holt R, Huang W, Islam-Faridi N, Jones S, Jones-Rhoades M, Jorgensen R, Joshi C, Kangasjärvi J, Karlsson J, Kelleher C, Kirkpatrick R, Kirst M, Kohler A, Kalluri U, Larimer F, Leebens-Mack J, Leplé JC, Locascio P, Lou Y, Lucas S, Martin F, Montanini B, Napoli C, Nelson DR, Nelson C, Nieminen K, Nilsson O, Pereda V, Peter G, Philippe R, Pilate G, Poliakov A, Razumovskaya J, Richardson P, Rinaldi C, Ritland K, Rouzé P, Ryaboy D, Schmutz J, Schrader J, Segerman B, Shin H, Siddiqui A, Sterky F, Terry A, Tsai CJ, Uberbacher E, Unneberg P, Vahala J, Wall K, Wessler S, Yang G, Yin T, Douglas C, Marra M, Sandberg G, Van de Peer Y, Rokhsar D. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). (2006) Science. Sep 15; 313(5793):1596-604
  7. French-Italian Public Consortium for Grapevine Genome Characterization. (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. Sep 27; 449(7161):463-7.




Additional information about this analysis:
Property Name  Value
Analysis Type  tripal_analysis_unigene
Analysis unigene name  Gossypium unigene v1.0
Analysis unigene num contigs
Analysis unigene num reads
Analysis unigene avg length
Analysis unigene num clusters
Analysis unigene num singlets
 Contact Details
 Name  Main, Dorrie
 Lab  Department of Horticulture
 Organization  Washington State University
 Address  45 Johnson Hall, Pullman, WA 99164
 Telephone  509-335-2774
 Fax  509-335-8690
 Email  dorrie@wsu.edu


Original EST sequences from NCBI (442,954 sequences) Gossypium_spp_NCBI_091212.EST.fasta.gz
Filtered and trimmed EST sequences (437,185 sequences) Gossypium_spp_NCBI_091212.EST.trim.fasta.gz
EST Summary Gossypium_spp_NCBI_091212.EST.summary.xlsx
Contigs from CAP3 assembly (21,698 contig sequences) Gossypium_Unigene_v1.0.cap3_contigs.fasta.gz
Singlets from CAP3 assembly (128,218 singlet sequences) Gossypium_Unigene_v1.0.cap3_singlets.fasta.gz
Ace file from CAP3 assembly Gossypium_Unigene_v1.0.cap3.ace.gz
CAP3 results in GFF3 format Gossypium_Unigene_v1.0.cap3.gff3.gz

Microsatellite Analysis

SSRs found in contigs (with primer predictions) Gossypium_Unigene_v1.0.contig_SSRs.xls

Functional Analysis

Gene Ontology annotations by contig Gossypium_Unigene_v1.0.contigs2GO.txt
InterPro annotations by contig Gossypium_Unigene_v1.0.contigs2IPR.txt
Raw output results from InterProScan Gossypium_Unigene_v1.0.contigs_interpro.raw.txt.gz
KEGG pathway annotations by contig Gossypium_Unigene_v1.0.contigs2KEGG_pathways.txt
KEGG ortholog annotations by contig Gossypium_Unigene_v1.0.contigs2KEGG_orthologs.txt
KEGG/KAAS server hier output file (visualize using KegHier) Gossypium_Unigene_v1.0.KEGG.hier.tar.gz

Gossypium unigene v1.0 contigs blastx vs protein databases. Best hit reports in Excel

BLAST of contigs to UniProtKB/Swiss-Prot (1.9MB) Gossypium_blastx_Swiss-Prot.xlsx
BLAST of contigs to UniProtKB/TrEMBL (2.4MB) Gossypium_blastx_TrEMBL.xlsx
BLAST of contigs to TAIR10 Arabidopsis proteins (2.5MB) Gossypium_blastx_TAIR10.xlsx
BLAST of contigs to Prunus persica (peach) v1.0 proteins (2.1MB) Gossypium_blastx_peach.xlsx
BLAST of contigs to Vitis vinifera (grape) proteins (2.0MB) Gossypium_blastx_grape.xlsx
BLAST of contigs to Populus trichocarpa (poplar) v2.0 proteins (2.1MB) Gossypium_blastx_poplar.xlsx

Gossypium unigene v1.0 singlets blastx vs protein databases. Best hit reports in Excel

BLAST of singlets to UniProtKB/Swiss-Prot (8.2MB) Gossypium_blastx_Swiss-Prot.xlsx
BLAST of singlets to UniProtKB/TrEMBL Gossypium_blastx_TrEMBL.xlsx
BLAST of singlets to TAIR10 Arabidopsis proteins (11MB) Gossypium_blastx_TAIR10.xlsx
BLAST of singlets to Prunus persica (peach) v1.0 proteins (9.8MB) Gossypium_blastx_peach.xlsx
BLAST of singlets to Vitis vinifera (grape) proteins (9.5MB) Gossypium_blastx_grape.xlsx
BLAST of singlets to Populus trichocarpa (poplar) v2.0 proteins (10MB) Gossypium_blastx_poplar.xlsx


Functional Analysis

KEGG Analysis
All Gossypium unigene v1.0 contigs were uploaded to the KEGG/KAAS server at http://www.genome.jp/kaas-bin/kaas_main. The SBH (single-directional best hit) method was selected under the category "Assignment method". All other settings were left at their defaults. Results were downloaded as the hier.tar.gz hierarchy file and uploaded to the website. Click the Downloads link on the sidebar for downloadable files. Analysis details can be found here.

InterPro Analysis
InterPro domains and Gene Ontology terms were assigned to Gossypium unigene v1.0 contigs using InterProScan on a computational cluster at Washington State University by the Main Bioinformatics Lab. Click the Downloads link on the sidebar for downloadable files. Analysis details can be found here.


Homology was determined using the BLASTx algorithm for the Gossypium contigs and singlets against the Swiss-Prot, TrEMBL, TAIR Arabidopsis, Prunus persica, Populus trichocarpa and Vitis vinifera protein sets. Only matches with an E-value of 1.0e-6 or lower were recorded. Swiss-Prot is a curated protein database with a high level of annotation and a minimal level of redundancy; TrEMBL is a computer-annotated supplement to Swiss-Prot that contains the translations of all EMBL nucleotide sequence entries not yet integrated into Swiss-Prot. The homology results can be downloaded as Excel spreadsheets from the Downloads page.
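
The E-value cutoff and best-hit reporting described above can be illustrated with a short Python sketch. It assumes the hits are available as 12-column tab-separated BLAST output (query ID in column 1, subject ID in column 2, E-value in column 11, bit score in column 12). The format actually used for this analysis is not stated here, so that layout, the function name, and the sequence IDs in the example are all assumptions, as is the choice of highest bit score to define the "best" hit.

```python
import csv
import io

E_VALUE_CUTOFF = 1.0e-6  # threshold stated in the text


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)} for the best hit per query.

    Assumes 12-column tab-separated BLAST output; this layout is an
    assumption, not a documented detail of the original analysis.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # discard matches weaker than the cutoff
        # Keep the hit with the highest bit score for each query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example rows (IDs and values are made up for illustration).
example = (
    "contig1\tP12345\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t190\n"
    "contig1\tQ67890\t60.0\t150\t60\t0\t1\t450\t1\t150\t1e-10\t80\n"
    "contig2\tP99999\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t25\n"
)
hits = best_hits(example)
print(hits)
```

In this example only contig1 is retained, because the single hit for contig2 has an E-value above the cutoff.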


Library Information
The Gossypium ESTs used for this assembly were downloaded on September 16, 2012.


 EST Libraries
 Number of ESTs available  442,453
 # of Species  5
 # of Libraries  130
 # of Tissues  43
 # of Development Stages  49



 Gossypium arboreum  42,493
 Gossypium barbadense  39,115
 Gossypium herbaceum subsp. africanum  247
 Gossypium hirsutum  297,021
 Gossypium raimondii  63,577


Microsatellite Analysis

The type and frequency of simple sequence repeats (SSRs) in Gossypium unigene v1.0 contigs were determined using the MainLabssr.pl program. For these searches, SSRs are defined as dinucleotides repeated at least 5 times, trinucleotides repeated at least 4 times, tetranucleotides repeated at least 3 times, or pentanucleotides repeated at least 3 times. The SSRs of Gossypium unigene v1.0 contigs can be downloaded from the Downloads link on the left sidebar.
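
The SSR definition above can be expressed as a small regular-expression search. The Python sketch below is not the MainLabssr.pl program, whose internals are not described here; it only applies the stated repeat thresholds. Skipping motifs that are themselves repeats of a shorter unit (for example "AA" or "ATAT") is an added assumption, and the function names are illustrative.

```python
import re

# Minimum repeat counts per motif length, as stated in the text.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(sequence):
    """Return a sorted list of (start, motif, repeat_count), with 0-based start."""
    seq = sequence.upper()
    found = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                found.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(found)


print(find_ssrs("GGCATATATATATCCGAAGAAGAAGAAGTT"))
# [(3, 'AT', 5), (15, 'GAA', 4)]
```

Because each motif length is scanned independently and matches are non-overlapping, this sketch can miss an SSR that sits directly after a discarded non-primitive run; a production tool would need to handle that case.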

Sequence information
Number of Sequences 21,698
Number of Sequences Having One Or More SSRs 5,349
Percentage of Sequences Having One Or More SSRs 24.65%
Total Number of SSRs Found 6,979
Number of Motifs 493

Frequency of Motif Type

Motif Length  Frequency  Percentage Frequency
2bp  1883  26.98%
3bp  3165  45.35%
4bp  1448  20.75%
5bp  483  6.92%
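
The percentages in the two tables follow directly from the counts. As a quick check, this Python sketch recomputes them from the reported figures; the variable and function names are illustrative.

```python
# Counts taken from the tables above.
motif_counts = {"2bp": 1883, "3bp": 3165, "4bp": 1448, "5bp": 483}
num_sequences = 21_698
num_sequences_with_ssr = 5_349


def percentage(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


total_ssrs = sum(motif_counts.values())
shares = {motif: percentage(n, total_ssrs) for motif, n in motif_counts.items()}
seq_share = percentage(num_sequences_with_ssr, num_sequences)

print(total_ssrs)  # 6979
print(shares)      # {'2bp': 26.98, '3bp': 45.35, '4bp': 20.75, '5bp': 6.92}
print(seq_share)   # 24.65
```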