Comparative Genomics and Bioinformatics

How to Effectively Search and Download Data in CottonGen

Authors: 
Cheng, Chun-Huai
Yu, Jing
Jung, Sook
Zheng, Ping
Humann, Jodi
Lee, Taein
McGaughey, Deah
Mohan, Amita
Frank, Morgan
Hough, Heidi
Udall, Josh
Jones, Don
Main, Dorrie
Abstract: 
CottonGen (www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding resources for cotton research discovery and crop improvement. CottonGen contains annotated whole genome sequences, reference transcriptomes, gene sequences from NCBI, pathways, genetic maps, trait, germplasm, marker, breeding, publication and community member data. In this presentation we show how to effectively search, filter and download data in CottonGen. CottonGen is directly supported by Cotton Incorporated, USDA-ARS, the cotton industry and USDA NIFA National Research Support Project 10.

CottonGen: A Database Resource for Genomics, Genetics and Breeding Research

Authors: 
Main, Dorrie
Abstract: 
CottonGen (www.cottongen.org) is the worldwide community database for cotton, housing data and tools that facilitate research discovery and cultivar improvement. It provides access to publicly available genomics, genetics and breeding (GGB) data for cotton that is curated and integrated with a suite of analysis and visualization tools. These include genome, map and pathway viewers such as JBrowse, MapViewer and CottonCyc, as well as many search interfaces that allow researchers to readily search and retrieve data. CottonGen recently released a Cotton Trait Ontology, CottonGen Reference Transcriptomes (RefTrans) and a new MapViewer. The Breeding Information Management System (BIMS) is now available in CottonGen, providing both a public and a private site that individual breeders can use to manage and analyze their breeding program data while also leveraging the publicly available GGB data in their decision-making. In this presentation the role of CottonGen in research is explored and future plans are elaborated. CottonGen is directly supported by Cotton Incorporated, USDA-ARS, the cotton industry, NIFA USDA NRSP10, US Land Grant Universities and indirectly by NSF and USDA SCRI awards.

Visualization of Conserved Syntenic Blocks Among Six Cotton Genomes in CottonGen

Authors: 
Zheng, Ping
Jung, Sook
Cheng, Chun-Huai
Yu, Jing
Hough, Heidi
Udall, Josh
Jones, Don
Main, Dorrie
Abstract: 
Assessment of synteny between related genome sequences, coupled with software to visualize these relationships, has become a powerful tool in facilitating our understanding of genome and gene family evolution. As a resource for genomics, genetics and breeding research, the CottonGen research team has developed new synteny resources in the database using publicly available genome sequences, analysis software and graphical visualization tools. Synteny between Gossypium raimondii D5_BGI, Gossypium raimondii D5_JGI, Gossypium arboreum A2_BGI, Gossypium hirsutum AD1_BGI, Gossypium hirsutum AD1_NBI and Gossypium barbadense AD2_HAU was identified using MCScanX v1.0. Results are displayed using the Tripal Syntenic Viewer at the CottonGen website (https://www.cottongen.org/) with various highlighted features. The overall pattern of synteny and collinearity between chromosomes is displayed in a circular layout, and the homologous gene pairs in each block are displayed both graphically and in tables with hyperlinks to gene pages.
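
As a rough illustration of the kind of data behind such a viewer, the sketch below reads syntenic blocks and their homologous gene pairs from a MCScanX-style collinearity file. The file layout ("## Alignment" header lines followed by tab-separated gene-pair lines) is an assumption based on typical MCScanX output, and the chromosome and gene names in EXAMPLE are hypothetical; this is not the CottonGen pipeline.

```python
import re

# Assumed block header layout, e.g.:
# ## Alignment 0: score=150.0 e_value=1e-10 N=3 chrA&chrB plus
HEADER = re.compile(
    r"^## Alignment (\d+): score=(\S+) e_value=(\S+) N=(\d+) (\S+)&(\S+) (plus|minus)"
)


def parse_collinearity(lines):
    """Return {block_id: {"chrom_a", "chrom_b", "orientation", "pairs"}}."""
    blocks = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        match = HEADER.match(line)
        if match:
            current = int(match.group(1))
            blocks[current] = {
                "chrom_a": match.group(5),
                "chrom_b": match.group(6),
                "orientation": match.group(7),
                "pairs": [],
            }
            continue
        # Skip parameter/comment lines, blank lines, and anything before a header.
        if line.startswith("#") or not line.strip() or current is None:
            continue
        # Assumed gene-pair layout: "  0-  0:<TAB>geneA<TAB>geneB<TAB>e-value"
        fields = line.split("\t")
        if len(fields) >= 3:
            blocks[current]["pairs"].append((fields[1].strip(), fields[2].strip()))
    return blocks


# Hypothetical input used only to exercise the parser.
EXAMPLE = """############### Parameters ###############
# MATCH_SCORE: 50
## Alignment 0: score=150.0 e_value=1e-10 N=3 chrA&chrB plus
  0-  0:\tgeneA1\tgeneB1\t  1e-50
  0-  1:\tgeneA2\tgeneB2\t  2e-40
  0-  2:\tgeneA3\tgeneB3\t  0
## Alignment 1: score=100.0 e_value=1e-05 N=2 chrA&chrC minus
  1-  0:\tgeneA7\tgeneC9\t  1e-30
  1-  1:\tgeneA8\tgeneC8\t  1e-20
""".splitlines()

blocks = parse_collinearity(EXAMPLE)
for block_id, block in blocks.items():
    print(block_id, block["chrom_a"], block["chrom_b"],
          block["orientation"], len(block["pairs"]))
```

Each parsed block carries the two chromosomes, the orientation and the gene pairs, which is the information a circular overview and a per-block gene table would need.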

CottonGen BIMS for Effective and Efficient Management of Breeding Data

Authors: 
Yu, Jing
Lee, Taein
Jung, Sook
Campbell, B. Todd
Gasic, Ksenija
Humann, Jodi
Hough, Heidi
Jones, Don
Percy, Richard
Main, Dorrie
Abstract: 
Advances in sequencing, sensor, drone and computational technology have led to increasing volumes of genotype and phenotype data being collected and tracked by modern breeding programs. To store, manage and integrate these large private and public research data sets so that breeders can use them efficiently in decision-making, we are developing the Tripal Breeding Information Management System (BIMS). Now available in CottonGen, BIMS v1.0 allows breeders to create and manage access to their breeding programs, upload phenotype data from the Field Book App or Excel templates, generate input files for Field Book, archive all of their data in BIMS, search and filter accessions/lines by name, trial, location, cross, parent and trait, and perform basic statistical analysis. We also have an open version of BIMS where the community can search, filter and download publicly available phenotype data housed in CottonGen. The use of Field Book and BIMS promotes the use and development of standard trait descriptors and metadata collection. We demonstrate current functionality and highlight future plans for BIMS development as a data management and analysis solution for cotton breeding.

Using TripalMap for Genetics Research

Authors: 
Buble, Katheryn
Yu, Jing
Jung, Sook
Humann, Jodi
Cheng, Chun-Huai
Lee, Taein
Hough, Heidi
McGaughey, Deah
Frank, Morgan
Main, Dorrie
Abstract: 
Tripal MapViewer is a new interactive visualization tool for viewing and comparing genetic maps in CottonGen, replacing CMap. Genetic maps provided by authors in peer-reviewed papers are curated in CottonGen with standardized marker, primer and QTL names and integrated into the database with publication, contact and associated trait information. MapViewer displays complete linkage groups and allows dynamic selection and magnification of desired regions. Linkage groups from different maps and organisms can be compared and correspondences between common features are shown with hyperlinks to the individual page for each of these features, such as marker or QTL. The control panel can be used to select different reference and comparison maps, configure options for marker and QTL display patterns, and change color display of different marker types. Correspondence Matrix and Dot Plot options give a more comprehensive representation, showing where correspondences exist across multiple maps and organisms. Providing several integration points with Tripal, MapViewer offers a map overview displaying a summary graphic of all linkage groups and links to more detailed views. In this presentation we demonstrate how to use this new tool in CottonGen to answer example research questions.

Maximizing Utility of Community-based Resources: CottonSNP Arrays

Authors: 
Hulse-Kemp, Amanda M
Yu, Jing
Scheffler, Jodi
Main, Dorrie
Scheffler, Brian
Abstract: 
The amount of data being produced and stored in databases has increased exponentially as the speed of data generation continues to increase. Utilization of these data sets depends on excellent data stewardship and management by the crop community. The CottonGen database has suggested and monitored such practices, but the potential utility of the database ultimately depends on individual researchers within the community contributing their samples. This is especially true for arrays: their use in research communities (including the cotton community) has been steadily increasing, but array data fall outside the traditional data types required for submission to a standardized database such as NCBI. As collection of array data from the CottonSNP63K (public) and other arrays (CottonSNP80K) increases, the utility of the collected data can be maximized through good data stewardship and a community-based standard for releasing both raw and processed data into the public domain. We will discuss data stewardship and good practices for inclusion and utilization of array-based data in coordination with the CottonGen database.

The origin, diversity, and domestication of tetraploid cotton (G. hirsutum L. and G. barbadense L.)

Authors: 
Udall, Joshua A.
Yuan, Daojun
Ramaraj, Thiruvarangan
Hinze, Lori
Conover, Justin L.
Pan, Mengqiao
Percy, Richard
Wendel, Jonathan F.
Abstract: 
Genome resequencing of cotton germplasm can uncover the natural history of cotton from its origination to its domestication. Numerous accessions were selected from the USDA collection for resequencing with high coverage, including wild, feral, and cultivar tetraploid and diploid cotton. A comprehensive variation map of more than 900 G. hirsutum and G. barbadense accessions has been constructed, including SNPs and InDels. SNPs were used to create a phylogenetic tree, illuminate population structure, and identify putative introgression regions. Four main groups of G. hirsutum were identified (domesticated, Landrace 1, Landrace 2, and wild) while three groups were identified in G. barbadense. The cultivars of G. hirsutum appear to be derived from a single landrace, suggesting a single domestication event. The two genomes were scanned for regions of domestication (pi and Fst) and putative selective sweeps were identified. Our new understanding of genetic variation in the cotton genome will assist cotton breeders' efforts to improve the fiber quality, disease resistance, and yield of modern cotton varieties.
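
For readers unfamiliar with the two scan statistics mentioned above, the following sketch computes nucleotide diversity (pi) in a window and Fst between two groups from alternate-allele counts at biallelic SNPs. The abstract does not state which estimators were used; Hudson's Fst (computed as a ratio of sums across sites) is assumed here, and all function names and counts are hypothetical.

```python
def site_pi(alt_count, n):
    """Unbiased per-site diversity for a biallelic SNP.

    alt_count: number of sampled chromosomes carrying the alternate allele
    n: total number of sampled chromosomes at the site (n >= 2)
    """
    p = alt_count / n
    return 2.0 * p * (1.0 - p) * n / (n - 1)


def window_pi(alt_counts, n, window_bp):
    """Per-base pi in a window: sum of per-site values over the window length."""
    return sum(site_pi(c, n) for c in alt_counts) / window_bp


def hudson_fst(counts1, n1, counts2, n2):
    """Hudson's Fst across sites, combined as a ratio of sums (an assumed choice)."""
    numerator = 0.0
    denominator = 0.0
    for c1, c2 in zip(counts1, counts2):
        p1, p2 = c1 / n1, c2 / n2
        numerator += ((p1 - p2) ** 2
                      - p1 * (1.0 - p1) / (n1 - 1)
                      - p2 * (1.0 - p2) / (n2 - 1))
        denominator += p1 * (1.0 - p2) + p2 * (1.0 - p1)
    return numerator / denominator if denominator > 0 else float("nan")


# Hypothetical counts: three SNPs in a 1 kb window, 20 chromosomes per group.
cultivar_counts = [20, 19, 2]
wild_counts = [3, 10, 9]
print(window_pi(cultivar_counts, 20, 1000))
print(window_pi(wild_counts, 20, 1000))
print(hudson_fst(cultivar_counts, 20, wild_counts, 20))
```

In a scan of this kind, windows with low pi in cultivars relative to wild accessions and high Fst between the groups would be flagged as candidate sweep regions; any thresholds are a study-specific choice not given in the abstract.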

Kokia, Gossypioides, and Their Sister, Gossypium

Authors: 
Peterson, Daniel G.
Grover, Corrinne E.
Wendel, Jonathan F.
Arick 2nd, Mark A.
Conover, Justin L.
Thrash, Adam
Hu, Guanjing
Sanders, William S.
Hsu, Chuan-Yu
Zahara Naqvi, Rubab
Farooq, Muhammad
Li, Xiaochong
Gong, Lei
Mudge, Joann
Ramaraj, Thiruvarangan
Udall, Joshua A.
Abstract: 
Kokia and Gossypioides are sister genera to Gossypium. Kokia is limited to the Hawaiian Islands while Gossypioides is found only in Madagascar and East Africa. Recently, we sequenced the genome of Kokia drynarioides and annotated the K. drynarioides and existing Gossypioides kirkii draft genomes. Comparison of these two genomes with each other and with that of Gossypium raimondii allowed us to generate estimates of divergence based on 13,000 gene orthologs. Our analysis indicates that the Kokia/Gossypioides lineage diverged from Gossypium roughly 10 MYA, while Kokia and Gossypioides diverged about 5 MYA after a 17,500 km transoceanic dispersal event delivered a Kokia ancestor to Hawaii. Study of the gene contents of the three species indicates that both Kokia and Gossypioides have experienced significant net gene losses (30% or about 10,000 genes) since their divergence from Gossypium raimondii. As the vast majority of gene deletions are common to both G. kirkii and K. drynarioides, it is likely that most gene deletion events occurred prior to the Gossypioides/Kokia split. With regard to repeat sequences, Kokia and Gossypioides have statistically indistinguishable repeat sequence contents and distributions despite representing different genera. In contrast, G. raimondii has a greatly reduced repeat content compared to G. herbaceum and G. arboreum. Not surprisingly, the major repeat sequence differences between Gossypioides kirkii, K. drynarioides, G. raimondii, and the A-genome cottons are a reflection of the relative abundance of LTR/Gypsy retrotransposons. The genome sequences of Kokia and Gossypioides are of interest not only as outgroups to Gossypium, but also because their high levels of gene deletion make them interesting subjects for studying gene reduction in non-parasitic plants.

Genome-wide analysis of four upland cotton germplasms with different types of pigment glands based on next-generation sequencing

Authors: 
Tianlun, Zhao
Cheng, Li
Cong, Li
Fan, Zhang
Lei, Mei
Elmon, Chindudzi
Jinhong, Chen
Shuijin, Zhu
Abstract: 
Cotton (Gossypium spp.) is one of the most important economic crops in the world. It not only produces natural fiber for the textile industry, but also provides a large quantity of cottonseed rich in high-quality protein and oil. However, the presence of gossypol limits the utilization of cottonseed. Two pairs of cotton near isogenic lines (NILs) with different types of pigment glands, CRI12/CRI12W and Coker 312/Coker 312W, exhibit different gossypol contents in plants. The glandless traits of CRI12W and Coker 312W are mainly controlled by dominant and recessive genes. However, less is known about genomic differences in the NILs. In the current study, next-generation sequencing was used to investigate the relationship between phenotypes and DNA polymorphisms in these NILs. The whole genomes of CRI12, CRI12W, Coker 312 and Coker 312W were resequenced. The sequencing depths were 34.01, 37.60, 36.27 and 35.88, respectively. Genomic variations of the NILs were identified by comparison with the TM-1 reference genome. A total of 2,371,614, 2,045,733, 2,013,021 and 1,974,262 SNPs, 181,706, 180,714, 177,324 and 167,971 Indels, 4,208, 4,026, 3,963 and 3,771 SVs, and 46,741, 35,976, 36,388 and 34,765 CNVs were uncovered for CRI12, CRI12W, Coker 312 and Coker 312W, respectively. Gene Ontology (GO) analysis of the genes with differential SNPs and Indels in the two NIL pairs revealed that variations were enriched in different ontology terms. KEGG pathway analysis showed that genes with differential SNPs and Indels were mainly enriched in the biosynthesis of secondary metabolites pathway and the sesquiterpenoid and triterpenoid biosynthesis pathway. Quantitative RT-PCR (qRT-PCR) analysis revealed that key genes with variations participating in the pathways of gossypol biosynthesis and pigment gland formation had different expression patterns in the two NIL pairs. Through next-generation sequencing, a large number of genomic variations were revealed. These DNA polymorphisms provide deeper insight into cotton lines with different types of pigment glands. Further comprehensive analysis revealed that genomic variations were tightly associated with phenotypic differences.

Genomic screening for artificial selection during domestication and improvement in Upland cotton

Authors: 
Guozhong, Zhu
Wangzhen, Guo
Abstract: 
Upland cotton (Gossypium hirsutum) is the most important natural fiber crop in the world. Modern cultivated Upland cotton was domesticated from wild allotetraploid accessions. However, a genome-wide, evolutionary understanding of the effects of human selection is still lacking. Here, 416 cotton accessions were genotyped using the CottonSNP80K array. Phylogenetic analysis showed clear differentiation between semi-wild relatives and modern cultivars, whereas the differentiation between landraces and modern cultivars was not clear, implying a relatively short period of cultivar improvement. We detected 785 selective sweeps occupying 138.07 Mb of the genome and 4,420 genes related to domestication by artificial selection. The candidate selected genes with putative functions are mainly involved in fiber development and plant stress resistance. Combined with genome-wide association and transcriptome analysis, we provide evidence that fiber yield improved significantly during artificial selection. Nevertheless, stress resistance has decreased in modern cultivars, which may be due to favorable cultivation conditions. The present study provides a genomic basis for cotton improvement and further evolutionary analysis of polyploid crops.
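
A summary such as "785 selective sweeps occupying 138.07 Mb" is typically obtained by collapsing candidate windows into non-overlapping regions and summing their lengths. The sketch below shows one way to do this; the abstract does not describe the authors' procedure, so the merging rule, the half-open coordinates and the example regions are all assumptions for illustration.

```python
def merge_regions(regions):
    """Merge overlapping or touching (chrom, start, end) regions.

    Coordinates are treated as half-open [start, end) in base pairs.
    """
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_mb(regions):
    """Total length of non-overlapping regions in megabases."""
    return sum(end - start for _, start, end in regions) / 1e6


# Hypothetical candidate windows on two chromosomes.
candidates = [
    ("A01", 0, 100_000),
    ("A01", 50_000, 200_000),
    ("D05", 1_000_000, 1_250_000),
]
sweeps = merge_regions(candidates)
print(len(sweeps), total_mb(sweeps))
```

Merging before summing avoids counting overlapping windows twice, so the reported footprint reflects distinct genomic positions.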