Transcriptome analysis of extant cotton progenitors revealed tetraploidization and identified genome-specific single nucleotide polymorphism in diploid and allotetraploid cotton

Publication Overview
Title             Transcriptome analysis of extant cotton progenitors revealed tetraploidization and identified genome-specific single nucleotide polymorphism in diploid and allotetraploid cotton
Authors           Guan X, Nah G, Song Q, Udall JA, Stelly DM, Chen ZJ
Type              Journal Article
Journal Name      BMC research notes
Volume            7
Issue             1
Year              2014
Page(s)           493
Citation          Guan X, Nah G, Song Q, Udall JA, Stelly DM, Chen ZJ. Transcriptome analysis of extant cotton progenitors revealed tetraploidization and identified genome-specific single nucleotide polymorphism in diploid and allotetraploid cotton. BMC research notes. 2014 Aug 6; 7(1):493.
Publication Code  BMCRN-7-493

Abstract

BACKGROUND
The most widely cultivated cotton (Gossypium hirsutum L., AD-genome) is derived from tetraploidization between A- and D-genome species. G. arboreum L. (A-genome) and G. raimondii Ulbr. (D-genome) are two of the closely related extant progenitors. Gene expression studies in allotetraploid cotton are complicated by the homoeologous loci of A- and D-genome origins. To develop genomic resources for gene expression and cotton breeding, we sequenced and assembled expressed sequence tags (ESTs) derived from G. arboreum and G. raimondii.

RESULTS
Roche/454 FLX sequencing technology was employed to sequence normalized cDNA libraries prepared from leaves, roots, bolls, ovules, and fibers of G. arboreum and G. raimondii. Sequencing reads from two independent libraries in each species were combined to assemble high-quality EST contigs. The combined sequencing reads included 1,699,776 from the A-genome and 1,464,815 from the D-genome, which were clustered into 89,588 contigs in the A-genome and 65,542 contigs in the D-genome. These contigs represented ~80% of the EST collections in Cotton Gene Index 11 (CGI11, March 2011). Compared to the D-genome transcript database, 27,537 and 10,452 contigs were unique transcripts in the A and D genomes, respectively. Further analysis using self-blastn reduced the unigene contig number by 52% in the A-genome and 57% in the D-genome, suggesting that 50% or more of the contigs are paralogs or isoforms within each species. The majority of EST contigs (73-81%) were conserved between the A- and D-genomes, whereas 27% and 19% of contigs were specific to the A- and D-genomes, respectively. Using these ESTs, we generated a total of 75,754 genome-specific single nucleotide polymorphisms (gSNPs or GNPs), or homoeologous-specific SNPs (hSNPs), in 10,885 contigs or genes between the A and D genomes, indicating the possibility of separating allelic expression for those genes in allotetraploid cotton.

CONCLUSIONS
Expressed genes are highly redundant within each diploid progenitor and between A and D progenitor species, suggesting that diploid progenitors in cotton are likely ancient tetraploids. This large set of A- and D-genome ESTs and GNPs will be valuable resources for genome annotation, gene expression, and crop improvement in allotetraploid cotton.

Properties
Additional details for this publication include:
Property Name         Value
eISSN                 1756-0500
ISSN                  1756-0500
Journal Abbreviation  BMC Res Notes
Language              English
Language Abbr         ENG
Publication Code      BMCRN-7-493
Publication Date      2014 Aug 6
Publication Model     Print-Electronic
Publication Type      Journal Article
URL                   http://bmcresnotes.biomedcentral.com/articles/10.1186/1756-0500-7-493

Cross References
This publication is also available in the following databases:
Database      Accession
PMID: PubMed  PMID:25099166