CottonGen

Comparative analysis of 3D genome organization in cotton reveals the topological basis for transcriptional divergence during polyploidization

Authors:

Maojun Wang, Pengcheng Wang, Min Lin, Guoliang Li, Qingyong Yang, and Xianlong Zhang*

Abstract:

The spatial partition of the genome into higher-order structures underpins pivotal regulatory functions, but knowledge of three-dimensional (3D) genome structure and its dynamics during polyploidization remains poor. Here, we characterize 3D genome architectures for diploid and tetraploid cotton and find A/B compartments and topologically associated domains (TADs). By comparing chromatin interaction maps in cotton, we examine how chromatin is reorganized during polyploidization. We also identify inter-subgenomic interactions and link them to the expression of homoeologous genes. This study offers evolutionary insights into 3D genome organization and transcriptional regulation in polyploids.

Keywords: Cotton; 3D genome; Hi-C data analysis; transcriptional regulation; polyploidization

CottonGen: A Database Resource for Genomics, Genetics and Breeding Research

Comparative Genomics and Bioinformatics

Authors:

Main, Dorrie

Abstract:

CottonGen (www.cottongen.org) is the worldwide community database for cotton, housing data and tools that facilitate research discovery and cultivar improvement. It provides access to publicly available genomics, genetics and breeding (GGB) data for cotton, curated and integrated with a suite of analysis and visualization tools. These include genome, map and pathway viewers such as JBrowse, MapViewer and CottonCyc, as well as many search interfaces that allow researchers to readily search and retrieve data. CottonGen recently released a Cotton Trait Ontology, CottonGen Reference Transcriptomes (RefTrans) and a new MapViewer. The Breeding Information Management System (BIMS) is now available in CottonGen, providing both a public and a private site that individual breeders can use to manage and analyze their breeding program data while also leveraging the publicly available GGB data in their decision-making. In this presentation, the role of CottonGen in research is explored and future plans are elaborated. CottonGen is directly supported by Cotton Incorporated, USDA-ARS, the cotton industry, NIFA USDA NRSP10 and US Land Grant Universities, and indirectly by NSF and USDA SCRI awards.

CottonGen BIMS for Effective and Efficient Management of Breeding Data

Comparative Genomics and Bioinformatics

Authors:

Yu, Jing

Lee, Taein

Jung, Sook

Campbell, B. Todd

Gasic, Ksenija

Humann, Jodi

Hough, Heidi

Jones, Don

Percy, Richard

Main, Dorrie

Abstract:

Advances in sequencing, sensor, drone and computational technology have led to increasing volumes of genotype and phenotype data being collected and tracked by modern breeding programs. To store, manage and integrate these large private and public research data sets so that breeders can use them efficiently in decision-making, we are developing the Tripal Breeding Information Management System (BIMS). Now available in CottonGen, BIMS v1.0 allows breeders to create and manage access to their breeding programs, upload phenotype data from the Field Book App or Excel templates, generate input files for Field Book, archive all of their data in BIMS, search and filter accessions/lines by name, trial, location, cross, parent and trait, and perform basic statistical analysis. We also have an open version of BIMS where the community can search, filter and download publicly available phenotype data housed in CottonGen. The use of Field Book and BIMS promotes the use and development of standard trait descriptors and metadata collection. We demonstrate current functionality and highlight future plans for BIMS development as a data management and analysis solution for cotton breeding.

Genome-wide analysis of four upland cotton germplasms with different types of pigment glands based on next-generation sequencing

Comparative Genomics and Bioinformatics

Authors:

Tianlun, Zhao

Cheng, Li

Cong, Li

Fan, Zhang

Lei, Mei

Elmon, Chindudzi

Jinhong, Chen

Shuijin, Zhu

Abstract:

Cotton (Gossypium spp.) is one of the most important economic crops in the world. It not only produces natural fiber for the textile industry but also provides a large quantity of cottonseed rich in high-quality protein and oil. However, the presence of gossypol limits the utilization of cottonseed. Two pairs of cotton near-isogenic lines (NILs) with different types of pigment glands, CRI12 and CRI12W, and Coker 312 and Coker 312W, exhibit different gossypol contents in plants. The glandless traits of CRI12W and Coker 312W are mainly controlled by dominant and recessive genes. However, little is known about the genomic differences between the NILs. In the current study, next-generation sequencing was used to explore the relationship between phenotypes and DNA polymorphisms in these NILs. The whole genomes of CRI12, CRI12W, Coker 312 and Coker 312W were resequenced. The sequencing depths were 34.01, 37.60, 36.27 and 35.88, respectively. Genomic variations of the NILs were identified in comparison with the reference genome of TM-1. A total of 2371614, 2045733, 2013021 and 1974262 SNPs, 181706, 180714, 177324 and 167971 Indels, 4208, 4026, 3963 and 3771 SVs, and 46741, 35976, 36388 and 34765 CNVs were uncovered for CRI12, CRI12W, Coker 312 and Coker 312W, respectively. Gene Ontology (GO) analysis of the genes with differential SNPs and Indels in the two NIL pairs revealed that variations were enriched in different ontology terms. KEGG pathway analysis showed that genes with differential SNPs and Indels were mainly enriched in the biosynthesis of secondary metabolites pathway and the sesquiterpenoid and triterpenoid biosynthesis pathway. Quantitative RT-PCR (qRT-PCR) analysis revealed that key genes with variations participating in the pathways of gossypol biosynthesis and pigment gland formation had different expression patterns in the two NIL pairs. Through next-generation sequencing, a large number of genomic variations were revealed. These DNA polymorphisms provided deeper insight into cotton lines with different types of pigment glands. Further comprehensive analysis revealed that genomic variations were tightly associated with phenotypic differences.

Genomic screening for artificial selection during domestication and improvement in Upland cotton

Comparative Genomics and Bioinformatics

Authors:

Guozhong, Zhu

Wangzhen, Guo

Abstract:

Upland cotton (Gossypium hirsutum) is the most important natural fiber crop in the world. Modern cultivated Upland cotton was domesticated from its allotetraploid wild accessions. However, a genome-wide, evolutionary understanding of the effects of human selection is still lacking. Here, 416 cotton accessions are genotyped with the CottonSNP80K array. The phylogenetic relationships indicate clear differentiation between semi-wild relatives and modern cultivars. In contrast, the differentiation between landraces and modern cultivars is not clear, implying a relatively short period of cultivar improvement. We detect 785 selective sweeps occupying 138.07 Mb of the genome and 4420 genes related to domestication by artificial selection. The candidate selected genes with putative functions are mainly involved in fiber development and plant stress resistance. Combining genome-wide association and transcriptome analysis, we provide evidence that fiber yield was significantly improved during artificial selection. Nevertheless, stress resistance is decreased in modern cultivars, which may be due to favorable artificial planting conditions. The present study provides a genomic basis for cotton improvement and further evolutionary analysis of polyploid crops.

GrTEdb: the first web-based database of transposable elements in cotton (Gossypium raimondii)

Comparative Genomics and Bioinformatics

Authors:

Xu, Zhenzhen

Liu, Jing

Xu, Peng

Guo, Qi

Zhang, Xianggui

Du, Jianchang

Shen, Xinlian

Abstract:

Transposable elements (TEs) are the most abundant DNA components in most characterized genomes of higher eukaryotes. Based on their structural features and transposition mechanisms, TEs are generally classified into two classes: retrotransposons and DNA transposons. In plants, retrotransposons are further classified into two distinct orders, long terminal repeat (LTR) retrotransposons (Ty1/Copia and Ty3/Gypsy) and non-LTR retrotransposons (LINE and SINE), whereas DNA transposons are traditionally separated into two main orders, terminal inverted repeat (TIR) (Tc1-Mariner, hAT, Mutator, PIF/Harbinger and CACTA) and Helitron (Helitron). Although TEs are often considered ‘junk DNA’ due to their continuous reproduction and potential disruption of regular host genes, growing evidence has unambiguously shown that they play important roles in altering gene structures, regulating gene expression, shaping genome evolution and creating new genes. Thus, complete identification and characterization of TEs has become a priority in genome sequencing projects; it contributes to accurate annotation of protein-coding genes and other genomic components and supports investigation of potential interactions between TEs and functional genes. Recently, the genomes of several diploid and tetraploid Gossypium species have been sequenced, and the availability of their draft genome sequences has provided an unprecedented opportunity for identification, structural and functional characterization and evolutionary analysis of TEs in this economically important crop. Gossypium raimondii (DD; 2n = 26), one of the putative D-genome parents of tetraploid cotton species (such as G. hirsutum L. and G. barbadense L.), has a smaller genome (∼737.8 Mb). We therefore characterized almost all families of TEs in the G. raimondii genome using comprehensive methods and constructed a comprehensive, specific and user-friendly web-based database, the Gossypium raimondii transposable elements database (GrTEdb). A total of 14 332 TEs were structurally annotated and clearly categorized in the G. raimondii genome. These elements were classified into seven distinct superfamilies based on the order of protein-coding domains, structures and/or sequence similarity: 2929 Copia-like elements, 10 368 Gypsy-like elements, 299 L1, 12 Mutators, 435 PIF-Harbingers, 275 CACTAs and 14 Helitrons. Web-based sequence browsing, searching, downloading and BLAST tools were implemented to help users easily and effectively annotate TEs or TE fragments in genomic sequences from G. raimondii and other closely related Gossypium species. GrTEdb thus provides the first user-friendly web-based database of TEs in Gossypium species and will facilitate genome evolution analysis within or across Gossypium species, evaluation of the impact of TEs on their host genomes, and investigation of potential interactions between TEs and protein-coding genes.
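
The superfamily counts reported in this abstract can be tallied to check the stated total and to see how strongly LTR retrotransposons dominate the annotation. The following is a minimal sketch using only the numbers given above. The variable names are illustrative, and the grouping of each superfamily into a class follows the classification described in the abstract (L1 is treated as a LINE, i.e. a non-LTR retrotransposon). It is not part of GrTEdb itself.

```python
# Superfamily counts exactly as reported in the GrTEdb abstract.
te_counts = {
    "Copia": 2929,
    "Gypsy": 10368,
    "L1": 299,
    "Mutator": 12,
    "PIF-Harbinger": 435,
    "CACTA": 275,
    "Helitron": 14,
}

# Class assignment per the abstract's classification scheme
# (illustrative mapping, not a GrTEdb data structure).
te_class = {
    "Copia": "retrotransposon",
    "Gypsy": "retrotransposon",
    "L1": "retrotransposon",
    "Mutator": "DNA transposon",
    "PIF-Harbinger": "DNA transposon",
    "CACTA": "DNA transposon",
    "Helitron": "DNA transposon",
}

# Total number of annotated elements and each superfamily's share of it.
total = sum(te_counts.values())
shares = {name: n / total for name, n in te_counts.items()}

# Totals per class.
class_totals = {}
for name, n in te_counts.items():
    class_totals[te_class[name]] = class_totals.get(te_class[name], 0) + n

# Print superfamilies from most to least abundant.
for name, n in sorted(te_counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:<14}{n:>7}{shares[name]:>8.1%}")
print(f"{'Total':<14}{total:>7}")
for cls, n in class_totals.items():
    print(f"{cls:<16}{n:>7}")
```

The seven counts sum to the 14 332 elements stated in the abstract, with Gypsy-like elements alone accounting for roughly 72% of the annotation.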

How to Effectively Search and Download Data in CottonGen

Comparative Genomics and Bioinformatics

Authors:

Cheng, Chun-Huai

Yu, Jing

Jung, Sook

Zheng, Ping

Humann, Jodi

Lee, Taein

McGaughey, Deah

Mohan, Amita

Frank, Morgan

Hough, Heidi

Udall, Josh

Jones, Don

Main, Dorrie

Abstract:

CottonGen (www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding resources for cotton research discovery and crop improvement. CottonGen contains annotated whole genome sequences, reference transcriptomes, gene sequences from NCBI, pathways, genetic maps, trait, germplasm, marker, breeding, publication and community member data. In this presentation we show how to effectively search, filter and download data in CottonGen. CottonGen is directly supported by Cotton Incorporated, USDA-ARS, the cotton industry and USDA NIFA National Research Support Project 10.

Kokia, Gossypioides, and Their Sister, Gossypium

Comparative Genomics and Bioinformatics

Authors:

Peterson, Daniel G.

Grover, Corrinne E.

Wendel, Jonathan F.

Arick 2nd, Mark A.

Conover, Justin L.

Thrash, Adam

Hu, Guanjiang

Sanders, William S.

Hsu, Chuan-Yu

Zahara Naqvi, Rubab

Farooq, Muhammad

Li, Xiaochong

Gong, Lei

Mudge, Joann

Ramaraj, Thiruvarangan

Udall, Joshua A.

Abstract:

Kokia and Gossypioides are sister genera to Gossypium. Kokia is limited to the Hawaiian Islands, while Gossypioides is found only in Madagascar and East Africa. Recently, we sequenced the genome of Kokia drynarioides and annotated the K. drynarioides and existing Gossypioides kirkii draft genomes. Comparison of these two genomes with each other and with that of Gossypium raimondii allowed us to generate estimates of divergence based on 13,000 gene orthologs. Our analysis indicates that the Kokia/Gossypioides lineage diverged from Gossypium roughly 10 MYA, while Kokia and Gossypioides diverged about 5 MYA after a 17,500 km transoceanic dispersal event delivered a Kokia ancestor to Hawaii. Study of the gene contents of the three species indicates that both Kokia and Gossypioides have experienced significant net gene losses (30% or about 10,000 genes) since their divergence from Gossypium raimondii. As the vast majority of gene deletions are common to both G. kirkii and K. drynarioides, it is likely that most gene deletion events occurred prior to the Gossypioides/Kokia split. With regard to repeat sequences, Kokia and Gossypioides have statistically indistinguishable repeat sequence contents and distributions despite representing different genera. In contrast, G. raimondii has a greatly reduced repeat content compared to G. herbaceum and G. arboreum. Not surprisingly, the major repeat sequence differences between Gossypioides kirkii, K. drynarioides, G. raimondii, and the A-genome cottons reflect the relative abundance of LTR/Gypsy retrotransposons. The genome sequences of Kokia and Gossypioides are of interest not only as outgroups to Gossypium, but also because their high levels of gene deletion make them interesting subjects for studying gene reduction in non-parasitic plants.

Maximizing Utility of Community-based Resources: CottonSNP Arrays

Comparative Genomics and Bioinformatics

Authors:

Hulse-Kemp, Amanda M

Yu, Jing

Scheffler, Jodi

Main, Dorrie

Scheffler, Brian

Abstract:

The amount of data produced and stored in databases has increased exponentially as the speed of data generation continues to rise. Use of these data sets depends on excellent data stewardship and management by the crop community. The CottonGen database has suggested and monitored such practices, but realizing the database's potential through the inclusion of samples ultimately falls to individual researchers within the community. Array data are especially dependent on this stewardship: array use in crop communities (including the cotton community) has been steadily increasing, yet array data fall outside the traditional data types required for submission to a standardized database such as NCBI. As collection of array data from the CottonSNP63K (public) and other arrays (CottonSNP80K) increases, the community can maximize the utility of the collected data through good data stewardship and a community-based standard for placing both raw and processed data in the public domain. We will discuss data stewardship and good practices for the inclusion and use of array-based data in coordination with the CottonGen database.

The origin, diversity, and domestication of tetraploid cotton (G. hirsutum L. and G. barbadense L.)

Comparative Genomics and Bioinformatics

Authors:

Udall, Joshua A.

Yuan, Daojun

Ramaraj, Thiruvarangan

Hinze, Lori

Conover, Justin L.

Pan, Mengqiao

Percy, Richard

Wendel, Jonathan F.

Abstract:

Genome resequencing of cotton germplasm can uncover the natural history of cotton from its origin to its domestication. Numerous accessions were selected from the USDA collection for resequencing with high coverage, including wild, feral, and cultivated tetraploid and diploid cotton. A comprehensive variation map of more than 900 G. hirsutum and G. barbadense accessions has been constructed, including SNPs and InDels. SNPs were used to create a phylogenetic tree, illuminate population structure, and identify putative introgression regions. Four main groups of G. hirsutum were identified (domesticated, Landrace 1, Landrace 2, and wild), while three groups were identified in G. barbadense. The cultivars of G. hirsutum appear to be derived from a single landrace, suggesting a single domestication event. The two genomes were scanned for regions of domestication (pi and Fst), and putative selective sweeps were identified. Our new understanding of genetic variation in the cotton genome will assist cotton breeders' efforts to improve the fiber quality, disease resistance, and yield of modern cotton varieties.
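
The abstract mentions scanning for domestication regions with pi and Fst but does not describe the procedure. As a rough illustration of what these two statistics measure, here is a minimal sketch. It assumes biallelic SNPs, a simple heterozygosity-based Fst for two equally weighted populations, and non-overlapping windows. The function names and the example numbers are hypothetical and do not represent the authors' pipeline or data.

```python
def site_pi(alt_count, n):
    """Unbiased per-site nucleotide diversity for a biallelic SNP,
    given the alternate-allele count among n sampled chromosomes."""
    if n < 2:
        return 0.0
    p = alt_count / n
    return 2 * p * (1 - p) * n / (n - 1)


def site_fst(p1, p2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations
    with alternate-allele frequencies p1 and p2."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                                     # site is monomorphic overall
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return (h_t - h_s) / h_t


def windowed_mean(positions, values, window):
    """Average a per-site statistic in non-overlapping windows.
    Returns {window_start: mean_value}."""
    sums = {}
    for pos, value in zip(positions, values):
        key = pos // window
        s, c = sums.get(key, (0.0, 0))
        sums[key] = (s + value, c + 1)
    return {k * window: s / c for k, (s, c) in sorted(sums.items())}


# Hypothetical example: three SNPs, wild vs. domesticated allele frequencies.
positions = [10, 20, 150]
fst_values = [site_fst(0.5, 0.5), site_fst(0.2, 0.8), site_fst(0.0, 1.0)]
print(windowed_mean(positions, fst_values, window=100))
```

In a scan of this kind, windows where diversity is low in the domesticated group and Fst between wild and domesticated groups is high are the usual candidates for selective sweeps.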
