Comparative Genomics and Bioinformatics

CottonGen: an Integrated Genomics, Genetics and Breeding Database for the Cotton Research Community

Presentation type: 
0
Abstract: 
CottonGen is a new database for the cotton community, created by consolidating the existing databases CottonDB and the Cotton Marker Database (CMD) and expanded to include annotated transcriptome, genome sequence, marker-trait-locus and breeding data. Enhanced tools for easy querying and visualization of research data greatly increase the accessibility of CottonGen over its predecessors. The database is built using the open-source Tripal database infrastructure developed for the Cacao Genome Database, the Genome Database for Rosaceae and the Citrus Genome Database. This presentation will provide an overview of the functionality currently available and under development.
ICGI working group session: 

Genome Resequencing of Select Cotton Diploid Accessions

Presentation type: 
0
Abstract: 
The cotton tetraploid genome contains two types of SNPs: SNPs between alleles (allele-SNPs) and SNPs between genomes (homoeo-SNPs). These two types of SNPs represent different levels of DNA variation, and discriminating between them has been a serious obstacle for plant improvement programs that use molecular markers. In this project, we have identified a robust set of putative homoeo-SNPs by re-sequencing five A-genome and five D-genome diploid cotton accessions. By aligning these reads to the D-reference genome, we were able to identify numerous SNPs between the A and D genomes that were consistent within both sets of diploids (homoeo-SNPs). We were also able to identify homozygous and heterozygous SNPs within subsets of accessions (allele-SNPs). Because the genomes within tetraploid cotton are related to these diploids, many of these homoeo-SNPs are also found between the AT and DT genomes of cultivated cotton. This SNP collection will allow researchers to discriminate homoeo-SNPs from allele-SNPs during sequence and genetic analyses of tetraploid cotton.
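The distinction drawn above can be illustrated with a minimal sketch. It assumes each accession's genotype at a site is given as a two-letter string (for example "AA" or "AG") with no missing data; the function name and the labels it returns are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Sequence


def classify_site(a_genotypes: Sequence[str], d_genotypes: Sequence[str]) -> str:
    """Classify one site from genotypes of the A-genome and D-genome diploids.

    Assumption: every genotype is a string of allele letters such as "AA"
    (homozygous) or "AG" (heterozygous), and no calls are missing.
    """
    # Collect every allele seen in each group of diploids.
    a_alleles = set("".join(a_genotypes))
    d_alleles = set("".join(d_genotypes))

    # Each group is fixed for a single allele.
    if len(a_alleles) == 1 and len(d_alleles) == 1:
        # Fixed for different alleles: consistent A-versus-D difference.
        if a_alleles != d_alleles:
            return "homoeo-SNP"
        return "invariant"

    # Variation inside at least one group: a difference between alleles.
    return "allele-SNP"


if __name__ == "__main__":
    print(classify_site(["AA"] * 5, ["GG"] * 5))
    print(classify_site(["AA", "AA", "AG", "AA", "AA"], ["AA"] * 5))
```

A real analysis would also need to handle missing calls, read depth and alignment quality, none of which are modelled here.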
ICGI working group session: 

Comparison and Evaluation of Cotton SNPs Developed by Transcriptome, Genome Reduction on Restriction Site Conservation and RAD-based Sequencing

Presentation type: 
0
Abstract: 
NGS technologies are facilitating genome-wide SNP discovery in many organisms, including crop species. A few approaches have been proposed to identify SNPs in a high-throughput fashion. Among the common strategies are genome-wide transcriptome sequencing, genome reduction on restriction site conservation (GR-RSC) followed by NGS, and selection of gene-enriched regions using methylation-sensitive digestion of genomic DNA followed by NGS and bioinformatics analyses. In cotton (Gossypium sp.), we have used normalized transcriptome sequences generated by 454 Roche Biosciences and legacy Sanger-EST sequences from GenBank for hybrid de novo transcriptome assembly of upland cotton (G. hirsutum cv. TM-1). In addition, transcriptome libraries of five G. hirsutum lines including TM-1 as well as five other cotton species were sequenced by Illumina Genome Analyzer (IGA). Our hybrid assembly of TM-1 was used as a reference sequence for aligning the Illumina reads and identifying SNPs among the five G. hirsutum lines as well as between G. hirsutum (TM-1) and each of G. barbadense, G. longicalyx, G. armourianum, G. mustelinum, and G. tomentosum. Over 10,000 putative SNP markers were identified for differences among five upland cotton lines and relative to the other cultivated tetraploid species, G. barbadense; similar results were obtained for the other AD species. Many more SNPs were identified for the diploid species, e.g., ~70,000 for G. longicalyx. Using GR-RSC, a combined inter-specific assembly of G. hirsutum (Acala Maxxa and TX2094) and G. barbadense (Pima-S6 and K101) was developed. Within this assembly 11,834 and 1,679 SNPs were identified in 6,467 and 965 contigs, respectively. As the third method, to assess the Floragenex Restriction Site Associated DNA (RAD) platform for intraspecific SNP development, we compared TM-1 and Acala Maxxa. Using sequences well represented in both Illumina HiSeq samples, we identified ~1500 simple SNPs that were monomorphic within each cultivar, and close to 2000 others that were polymorphic in one parent but not the other (presumably "hemi-SNPs" or "genome-specific polymorphisms" (GSPs)). The distribution of putative SNPs identified by the three technologies on the cotton D genome (G. raimondii) will be discussed. A three-way comparison of the three SNP discovery methods will be presented, and the relative advantages and disadvantages of each method will be discussed.
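The two RAD categories described above (simple SNPs versus putative hemi-SNPs) can be sketched as follows. The sketch assumes that the set of alleles observed at a site has already been called for each cultivar; the function name, the labels, and the handling of missing data are hypothetical and are not the Floragenex or authors' method.

```python
from typing import Set


def classify_rad_site(alleles_1: Set[str], alleles_2: Set[str]) -> str:
    """Classify a RAD site from the alleles observed in two cultivars.

    Assumption: each argument is the set of bases called at the site in one
    cultivar, e.g. {"A"} (monomorphic) or {"A", "G"} (polymorphic).
    """
    # No call in one of the samples: the site cannot be compared.
    if not alleles_1 or not alleles_2:
        return "missing"

    mono_1 = len(alleles_1) == 1
    mono_2 = len(alleles_2) == 1

    if mono_1 and mono_2:
        # Monomorphic within each cultivar; a SNP only if the alleles differ.
        return "simple SNP" if alleles_1 != alleles_2 else "monomorphic"

    if mono_1 != mono_2:
        # Polymorphic in one parent but not the other.
        return "hemi-SNP"

    # Polymorphic in both cultivars: not covered by the two categories above.
    return "other"


if __name__ == "__main__":
    print(classify_rad_site({"A"}, {"G"}))
    print(classify_rad_site({"A", "G"}, {"A"}))
```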
ICGI working group session: 

De novo SNP Discovery and Development of an Interspecific Cotton Genome Map Using a Simplified Genotyping-by-sequencing (GBS) Approach

Presentation type: 
0
Abstract: 
Recent developments in next-generation sequencing (NGS) technology have lowered the cost of sequencing per base and enabled whole genome re-sequencing, genome-wide association studies, and for some species, unprecedented discovery of molecular markers. For species with large, complex genomes, genotyping-by-sequencing (GBS) of reduced representation libraries has emerged as a promising tool for association studies and genomics-assisted breeding. We used a simplified GBS approach to genotype recombinant inbred lines (RILs) of an interspecific cross, Gossypium hirsutum x G. barbadense (TM-1 x 3-79), to discover novel SNPs and develop an interspecific linkage map. The GBS approach presented here provides a powerful tool for developing informative markers in species without a sequenced genome, while also providing a valuable resource for anchoring physical maps and ordering scaffolds from whole genome sequencing projects. The GBS approach will have broad applications in SNP marker-assisted breeding in taxa with limited prior knowledge of the genome or genetic diversity.
ICGI working group session: 

Identification and Comparison of Conserved and Novel Mature miRNAs, miRNA Precursors, and miRNA*s in the Genus Gossypium

Presentation type: 
0
Abstract: 
Using over 40 small RNA libraries obtained by deep sequencing of small RNAs from leaves, roots, flowers, and bolls of Gossypium arboreum, Gossypium raimondii, and Gossypium hirsutum, we have identified as many mature miRNAs, miRNA precursors, and miRNA*s as possible, using available genomic sequences and EST collections as references and a small RNA pipeline that we developed to identify miRNAs and genes in plants. Within each species, these putative miRNAs were classified into 4 groups: 1) conserved plant miRNAs (found in the miRBase18 database in one or more plants, no mismatches); 2) novel plant miRNAs (miRNAs not found in miRBase18, but meeting the criteria for annotation of plant miRNAs, i.e. having a miRNA* sequence found in the small RNA library); 3) candidate novel plant miRNAs (miRNAs not found in miRBase18, and lacking predicted miRNA* sequences in any library, but having read numbers greater than 10 reads per million in 2 or more libraries in a species); and 4) miRNA-like sequences (miRNAs not found in miRBase18, and lacking predicted miRNA* sequences in any library, but having read numbers less than 10 reads per million in all libraries in a species). All miRNAs in groups 1, 2, and 3 above in each of the 3 species were collapsed into miRNA families, and family representation among the 3 Gossypium species was compared. Phylogenetic analysis was performed on the precursors of these miRNAs together with all plant miRNA precursors found in miRBase18. Tissue-specific expression of the miRNA elements in each family will be summarized, and select sequence comparisons of the miRNA precursors from the diploid relatives and tetraploid cotton will be shown. Those sequences which can be mapped to the sequenced D-genome of cotton will be examined, and “hot spots” containing multiple miRNA genes will be identified and discussed.
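The four-group scheme above can be written as a small decision function. This is a minimal sketch, not the authors' pipeline: the function name and arguments are hypothetical, and inputs are assumed to be precomputed (a miRBase18 match flag, a miRNA* flag, and reads per million in each library of one species). The stated criteria leave some cases unassigned, for example a sequence above 10 reads per million in exactly one library; the sketch returns None for those rather than guessing.

```python
from typing import Optional, Sequence

RPM_THRESHOLD = 10.0  # reads per million, as stated in the abstract
MIN_LIBRARIES = 2     # libraries that must exceed the threshold for group 3


def classify_mirna(in_mirbase: bool,
                   has_star: bool,
                   rpm_per_library: Sequence[float]) -> Optional[str]:
    """Assign a putative miRNA to one of the four groups described above."""
    # Group 1: exact match to a plant miRNA in miRBase18.
    if in_mirbase:
        return "conserved"

    # Group 2: not in miRBase18, but a miRNA* sequence was found.
    if has_star:
        return "novel"

    # Group 3: no miRNA*, but more than 10 RPM in 2 or more libraries.
    n_above = sum(1 for rpm in rpm_per_library if rpm > RPM_THRESHOLD)
    if n_above >= MIN_LIBRARIES:
        return "candidate novel"

    # Group 4: no miRNA*, and fewer than 10 RPM in every library.
    if all(rpm < RPM_THRESHOLD for rpm in rpm_per_library):
        return "miRNA-like"

    # Not covered by the stated criteria.
    return None


if __name__ == "__main__":
    print(classify_mirna(False, False, [12.0, 15.0, 0.5]))
    print(classify_mirna(False, False, [2.0, 9.9]))
```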
ICGI working group session: 

Small RNA Analysis of Resistant and Susceptible Cotton Roots Infested with Reniform Nematodes

Abstract: 
Cotton (Gossypium hirsutum L.) is an important crop plant that is susceptible to multiple species of parasitic nematodes, including Rotylenchulus reniformis, the reniform nematode (RN). There is much emphasis on breeding resistance to RN into commercially viable cotton varieties, and in 2007 LONREN-1 (LR1) and LONREN-2 (LR2), putative RN-resistant germplasm lines derived from Gossypium longicalyx, were released. More recently, implementation of these materials in commercially useful germplasm has proven problematic. A resistant line, BARBAREN-713 (B-713), has been developed with RN resistance based on Gossypium barbadense, which shows promise. To facilitate this implementation, the gene regulatory networks involved in RN-resistant and susceptible genotypes are under investigation, with a particular focus on miRNAs. Initially, the role of small regulatory RNAs (srRNAs) in these regulatory networks is under investigation. In this study, LR1, LR2, and B-713 were selected as nematode-resistant genotypes, and DeltaPine 90 and Suregrow 747 as susceptible lines. Small RNA libraries were prepared from greenhouse-grown plants of each of these 5 genotypes either pre- or post-infection with reniform nematodes. RNA-seq was conducted on these small RNA libraries using the ABI SOLiD platform with single-end sequencing, and the resulting sequence libraries were submitted to a small RNA library pipeline that we have developed. The abundance of each small RNA found in each library was determined. We found a total of 2,976 putative small RNAs (2,263 identical to Viridiplantae miRNA sequences in miRBase18, and 713 novel unique miRNAs) in the datasets. All of these putative microRNAs fall into 181 existing (miRBase18) miRNA families. Of these, 72 families (involving 1,286 individual putative mature miRNA sequences) are up-regulated by nematode infection by more than 3-fold in at least 1 genotype, while 54 families (involving 1,216 individual putative mature miRNA sequences) are down-regulated by more than 3-fold by nematode infection. Further analysis and characterization of these changes in the 5 genotypes will be used to define regulatory networks that are involved in gene regulation during reniform nematode infestation of cotton roots.
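The 3-fold criterion above can be illustrated with a short sketch. It assumes normalized abundances for one miRNA family before and after infection in each genotype; the function name, the pseudocount used to avoid division by zero, and the example values are all hypothetical and do not come from the study.

```python
from typing import Dict, Tuple


def regulation_calls(abundance: Dict[str, Tuple[float, float]],
                     fold: float = 3.0,
                     pseudocount: float = 1.0) -> Dict[str, str]:
    """Call up- or down-regulation per genotype for one miRNA family.

    abundance maps genotype name -> (pre-infection, post-infection) abundance.
    The pseudocount is an assumption added to keep ratios finite.
    """
    calls = {}
    for genotype, (pre, post) in abundance.items():
        ratio = (post + pseudocount) / (pre + pseudocount)
        if ratio > fold:
            calls[genotype] = "up"
        elif ratio < 1.0 / fold:
            calls[genotype] = "down"
        else:
            calls[genotype] = "unchanged"
    return calls


def up_in_any_genotype(calls: Dict[str, str]) -> bool:
    """True if the family is up-regulated in at least one genotype."""
    return "up" in calls.values()


if __name__ == "__main__":
    # Made-up abundances for illustration only.
    example = {"LR1": (9.0, 39.0), "DP90": (39.0, 9.0), "B-713": (10.0, 12.0)}
    print(regulation_calls(example))
```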
ICGI working group session: 

Data Search and Visualization Tools at the Comparative Evolutionary Genomics of Cotton Web Resource

Presentation type: 
1
Abstract: 
The “Comparative Evolutionary Genomics of Cotton” (CEGC) Web-database resource (http://cottonevolution.info) has been developed as one component of an NSF plant genome center (PI: J. Wendel). The resource contains search and display tools for data types ranging from BAC and genetic map data to EST and gene expression data. Genetic and physical mapping resources include tools such as CMAP for consensus genetic map data and BACman for BAC and related information. Over the past two years, its transcriptome-related data have grown with the addition of new gene expression, EST assembly, functional annotation and marker data. Tools for these data types include those for Cotton EST assemblies that facilitate comparative searches based on significant BLAST sequence alignments to Arabidopsis thaliana (TAIR10) genes. These also facilitate searches for Cotton EST contigs based on TAIR10 annotations. EST assembly views are available along with sequence alignments via the genome browser, GBrowse. The volume of diversity data is increasing and related tools are being added. For example, Web tools for Cotton SNP detection are in development. Bulk data downloads are available and, where appropriate, data are submitted to public repositories at NCBI. With respect to gene expression data, our spotted oligo and NimbleGen microarray datasets are available via the resource’s implementation of the Stanford Microarray Database (SMD). Data displays are in familiar formats, since our array of related search tools incorporates community-developed tools like CMAP, GBrowse and SMD for data views and additional search functionality.
ICGI working group session: 

The Genome of a Diploid Cotton Gossypium raimondii

Presentation type: 
0
Abstract: 
We sequenced and assembled the draft genome of Gossypium raimondii, which is widely known as a contributor of the D-subgenome to the economically important natural textile fiber producer, G. hirsutum. Next-generation Illumina paired-end (PE) sequencing was used to obtain 103.6-fold genome coverage of cleaned DNA sequence from various shotgun libraries with insert sizes ranging from 170 bp to 40 kbp. Over 73% of the assembled sequences were anchored on 13 G. raimondii chromosomes or linkage groups. The genome was predicted to contain 40,976 protein-coding genes, with 92.2% of them further confirmed by transcriptome data. We observed two whole genome duplication (WGD) events, one about 56 to 63 and the other about 13 to 20 million years ago, and identified 2,355 synteny blocks in the G. raimondii genome. About 40% of the gene models are present in more than one block, suggesting that the G. raimondii genome has undergone substantial chromosome rearrangements. Nearly 57% of the genome is composed of transposable elements, most of which may come from the expansion of long terminal repeats (LTRs) over the past 4 million years. The G. raimondii genome not only provides a major source of candidate genes for cotton research, but may also serve as a platform for assembly of the tetraploid G. hirsutum genome.
ICGI working group session: