Search Genes and Transcripts

[ Video Tutorial: How to Search for Genes and Transcripts ]

Search Genes and Transcripts can be accessed through the Search menu in the header. On this page, users can search for genes and transcripts from various datasets available in CottonGen.

Genes can be searched in three kinds of datasets: predicted genes from whole genome assemblies, a single non-redundant list of cotton genes with gene symbols (Cotton Gene Database), or gene and mRNA sequences parsed out from the NCBI nucleotide database. The genes and mRNAs parsed from the NCBI sequences are aligned to the reference whole genome sequences when possible. When expert-contributed information is available, these gene names are associated directly with the predicted genes from the whole genome assembly. Users can also search for transcripts from RefTrans sets (reference transcriptome sets built from all publicly available transcripts). For more details, refer to the 'Description of Gene and Transcript Dataset' page.

The search interface of Search Genes and Transcripts. The Query section is on the left and the Downloadable Fields section is on the right. See the detailed explanation below.

Search Genes and Transcripts results can be limited by four categories (red arrows on the left): by dataset, such as genome and RefTrans assemblies or NCBI genes, by gene or transcript name, or by functional annotation. The sections below describe how to search Gene/Transcript data:

The search interface has two sections: Query (left) and Downloadable Fields (right).

Query (Please Note: All the search categories below, except the file upload, can be combined.)

1. Genome

  • Genome Group: Users can limit their results of predicted genes by genome group (such as AD, A, D, etc.).
    • When a genome group is chosen in the drop-down menu next to 'Genome Group':
      • The corresponding genomes are dynamically displayed in the 'Genome' drop-down menu.
      • The corresponding standardized chromosome names are dynamically displayed in the 'Standardized Chromosome' list.
  • Genome and Chromosome/Scaffold: Users can limit their results of predicted genes by their genome location.
    • When a genome assembly is chosen in the drop-down menu next to 'Genome', the corresponding chromosome or scaffold names are dynamically displayed in the 'Chromosome/Scaffold' drop-down menu. Choose an option and then type the position (in bp) in the text boxes.
  • Standardized Chromosome: The standardized chromosome names (given by CottonGen) for each genome group.
    • When a genome group is chosen in the drop-down list next to 'Genome Group' and an option is chosen from the 'Standardized Chromosome' list:
      • The search result contains genes or transcripts on the same standardized chromosome from all genomes in the genome group.
  • Start and Stop: Users can limit their results of predicted genes by their genome location.
    • When a genome assembly is chosen in the drop-down menu next to 'Dataset', the corresponding chromosome or scaffold names are dynamically displayed in the 'Genome Location' drop-down menu. Choose any option and then type in the position in bp in the text boxes.

2. Transcriptome/Other Dataset: Use this drop-down menu to limit the results to sequences from a specific dataset. 

3. Gene/Transcript Name: Users can search genes and transcripts by name. The drop-down menu selects how the input is matched against names: exact match, contains, starts with, or ends with. The search is case-insensitive.

4. Functional Annotation: Users can limit their results by associated functional terms. Predicted genes from whole genome assemblies and transcripts have been annotated with some of the following: homology to genes of closely related or plant model species, InterPro protein domains, GO terms, KEGG pathways, and ortholog terms. Users can enter any protein name (e.g., polygalacturonase), KEGG term/EC number (e.g., resistance, EC:1.4.1.3), GO term (e.g., cell cycle, ATP binding), or InterPro term (e.g., zinc finger) in the text box to limit the results to entries associated with those functional annotation terms.
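
To illustrate how the Query categories combine, here is a minimal Python sketch of the filtering logic. It is not CottonGen's implementation. The `Feature` record, the `search` and `name_matches` functions, and the sample data are all hypothetical. Two behaviors are assumptions made for the sketch: a feature must lie entirely within the start/stop range, and annotation terms match as case-insensitive substrings.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical gene/transcript record; field names are illustrative only."""
    name: str
    chromosome: str
    start: int       # position in bp
    stop: int        # position in bp
    annotation: str  # free-text functional annotation


def name_matches(name, query, mode):
    """Case-insensitive comparison in one of the four modes named on the page."""
    n, q = name.lower(), query.lower()
    if mode == "exact":
        return n == q
    if mode == "contains":
        return q in n
    if mode == "starts with":
        return n.startswith(q)
    if mode == "ends with":
        return n.endswith(q)
    raise ValueError(f"unknown match mode: {mode!r}")


def search(features, chromosome=None, start=None, stop=None,
           name=None, mode="contains", annotation=None):
    """Keep features that pass every filter that was given (filters are ANDed)."""
    hits = []
    for f in features:
        if chromosome is not None and f.chromosome != chromosome:
            continue
        # Assumption: a feature must lie entirely inside [start, stop].
        if start is not None and f.start < start:
            continue
        if stop is not None and f.stop > stop:
            continue
        if name is not None and not name_matches(f.name, name, mode):
            continue
        # Assumption: annotation terms match as case-insensitive substrings.
        if annotation is not None and annotation.lower() not in f.annotation.lower():
            continue
        hits.append(f)
    return hits


# Invented example data, not real CottonGen records.
features = [
    Feature("geneA_001", "Chr01", 1000, 2500, "zinc finger protein"),
    Feature("geneA_002", "Chr01", 90000, 95000, "ATP binding"),
    Feature("geneB_001", "Chr02", 1200, 2200, "Zinc finger, RING-type"),
]

hits = search(features, chromosome="Chr01", start=1, stop=50000,
              name="GENEA", mode="starts with", annotation="zinc finger")
print([f.name for f in hits])
```

Each filter left as `None` is skipped, which mirrors leaving that category blank in the Query section.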

Downloadable Fields

1. View|FASTA|CSV|TSV: Users can 'View' the query results on the web interface or download them as 'FASTA' (the gene/transcript sequences), 'CSV' (Comma-Separated Values), or 'TSV' (Tab-Separated Values).

2. The All Fields List: A list of fields from which users can choose which ones to include in the search results they view or download.
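
A downloaded CSV or TSV result can be read with standard tools. The Python sketch below uses only the standard library. The column names (`Name`, `Type`, `Location`) and the sample rows are invented for illustration. The actual columns depend on the fields selected in the All Fields list.

```python
import csv
import io


def read_results(handle, fmt):
    """Parse a CSV or TSV results file into a list of dicts keyed by column name."""
    delimiter = {"csv": ",", "tsv": "\t"}[fmt.lower()]
    return list(csv.DictReader(handle, delimiter=delimiter))


# Invented sample standing in for a downloaded TSV file.
sample_tsv = (
    "Name\tType\tLocation\n"
    "geneA_001\tgene\tChr01:1000..2500\n"
    "geneA_002\tgene\tChr01:90000..95000\n"
)

rows = read_results(io.StringIO(sample_tsv), "tsv")
for row in rows:
    print(row["Name"], row["Location"])
```

For a real download, replace the `io.StringIO` object with `open(path, newline="")`, where `path` is the location of the saved file.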

 

If you have any questions/comments/feedback about this search page, please let us know via the contact form.