Comparative Genomics and Bioinformatics

Updates to CottonGen: the cotton community database for basic, translational and applied research

Presentation type: 
0
Abstract: 
CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing centralized access to publicly available genomic, genetic and breeding data for cotton. Superseding CottonDB and the Cotton Marker Database, CottonGen has been built using the Tripal database infrastructure and has enhanced tools for easier sharing, mining, visualization and retrieval of cotton research data. CottonGen contains annotated whole genome sequences, unigene transcripts, markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Since becoming publicly available in October 2012, many new data and tools have been added to CottonGen. Major additions include the JGI reference genome for G. raimondii with additional annotation; a digital image library of the USDA-ARS National Cotton Germplasm Collection; 39,099 CIR SNPs and 66,444 NBRI SNPs; 55,467 new SSRs from CIR and NBRI; germplasm evaluation data from 3,027 Chinese and 848 Uzbekistan accessions; the cotton metabolic pathway database (CottonCyc); and the genome browser JBrowse. New querying functionality includes an advanced marker search, gene search, publications search and a sequence retrieval tool. Data submission templates, tutorials and a frequently asked questions section have also been added. Future development will include implementation of a breeder's toolbox and the addition of synteny and gene/genome curation tools, as well as more map, marker and trait data.
ICGI working group session: 

The Cotton 70K SNP Chip developed from International Consortium-based Collaborations

Presentation type: 
0
Abstract: 
An international collaborative effort has developed a SNP chip for cotton. This Illumina Infinium genotyping assay will allow genotyping of up to 90,000 attempted bead types, with 70,000 included in the public “fixed content”. The remaining 20,000 attempted bead types are available as “add-on content”. The consortium effort has included all currently published SNP development work as well as some in-press SNP data sets. The fixed content will primarily target intra-specific Gossypium hirsutum SNPs, but will also contain markers that target inter-specific SNPs for G. barbadense, G. tomentosum, G. mustelinum, G. armourianum, and G. longicalyx. The inter-specific SNPs were all developed relative to G. hirsutum and will greatly facilitate introgression work. To assist the cotton community, a cluster file has been developed for the 70,000 SNPs included in the fixed content, which will allow automated genotyping. The Cotton 70K SNP Chip will be a resource used globally by public and private breeders, geneticists, and other researchers to enhance cotton genetic analysis, breeding, genome sequence assembly and many other applications.
ICGI working group session: 

Genomic Analysis of Cotton (Gossypium hirsutum) Root Responses to Reniform Nematode (Rotylenchulus reniformis) Infestation

Presentation type: 
1
Abstract: 
Reniform nematodes (RN, Rotylenchulus reniformis) are a semi-endoparasitic nematode species that causes significant yield loss in cotton. In response to parasitic nematode infestation, plants demonstrate susceptible, resistant, and/or hypersensitive responses. The nature of the plant response relies on the coordination of different resistance mechanisms, including specific resistance genes or proteins, several plant hormone pathways, reactive oxygen species (ROS), and pathogenesis-related effector proteins. These resistance-related elements and signaling pathways cross-talk with each other and can be seen as an integrated signaling network mediated by transcription factors and small regulatory RNAs (sRNAs) at the transcriptional (epigenetic), posttranscriptional, and/or translational levels. The objective of this study is to identify and characterize such regulatory networks in cotton genotypes that are susceptible, resistant, and hypersensitive to RN infestation. To accomplish this objective, six sRNA and six cDNA libraries were constructed from RN-susceptible cotton genotypes (DP90 and SG747 combined as one sample), a resistant genotype (BARBREN-713), and a hypersensitive genotype (LONREN-1), with or without RN infestation. The sRNA and cDNA libraries were submitted for deep sequencing on the Illumina platform, generating 6 corresponding datasets. Various classes of sRNAs were identified in the sRNA datasets using an sRNA bioinformatics pipeline, including microRNAs (miRNAs) and other hairpin sRNAs, repeat-associated small interfering RNAs (ra-siRNAs), phased and nonphased secondary siRNAs, and cis- and trans-natural antisense small interfering RNAs. Among them, 28 conserved and 24 novel miRNAs exhibited statistically significant differential expression in response to RN infestation, and over 90% of these RN-responsive miRNAs were distinctly regulated in different genotypes.
The spectrum of miRNA target genes includes both expected and novel genes previously implicated in plant innate immunity, hormone signaling, ROS generation and related signaling. In addition, secondary siRNAs initiated by the RN-responsive miR482 accumulated distinctly in response to RN infestation in susceptible, resistant, and hypersensitive cotton genotypes. Notably, NBS-LRR protein-coding genes are the targets of miR482 and generate the secondary siRNAs. This type of sRNA regulatory network has been implicated in plant innate immunity to a variety of pathogens, and thus we have produced insights into the specific aspects of this network regulated during RN infestation of cotton roots. For cDNA sequencing, over 150 million raw reads were generated on the Illumina sequencing platform from the 6 cDNA libraries above. These reads were trimmed, cleaned and de novo assembled into putative transcripts that ranged from 200 bp to greater than 5000 bp. Gene expression and biological function analyses have been used to determine up- and down-regulated RN-responsive genes. Expression of genes that are targets of RN-responsive sRNAs and/or are involved in sRNA regulatory networks will be compared with the results predicted from the regulatory networks to determine the molecular basis of susceptible, resistant, and hypersensitive responses in cotton.
ICGI working group session: 

Bemisia tabaci: A sole vector of Cotton leaf curl virus: Its host range, development and relation to CLCuV incidence

Presentation type: 
1
Abstract: 
Whitefly, Bemisia tabaci (Homoptera: Aleyrodidae), a vector of cotton leaf curl virus (CLCuV), is one of the key pests of cotton in Pakistan. Studies were conducted to assess its host plants and its survival on CLCuV-affected plants, along with the impact of sowing time on population buildup. About 50 plants were recorded as major hosts in Multan District, including 17 crops and vegetables, 5 fruit and forest plants, and 14 each of ornamentals and weeds. Moreover, 94 plants were recorded as minor host plants. B. tabaci prevalence was higher on crops and vegetables, while weeds were found to be less preferred host plants. Survival from egg to adult stage was highest on Malvastrum coromandelianum (50%), followed by Gossypium hirsutum (31%), under laboratory conditions at 25±2°C and 70-75% RH. Under field conditions, the fecundity rate per 10 cm2 leaf portion was highest on Solanum melongena (4.2 eggs), followed by Xanthium strumarium (3.2 eggs), and Lantana camara and G. hirsutum (2.1 eggs). The lowest fecundity rates were recorded on Citrullus lanatus (1.3 eggs) and Achyranthes aspera (1.9 eggs). On these plants the hatchability rate was 70-91%, the pupation rate 62-81%, and adult emergence 77-88%. Total development was in the range of 51-72%. The survival rate was 73% on healthy and 60% on CLCuV-stressed cotton plants. A life table of B. tabaci on the cotton crop showed that, out of 1035 eggs, 26% desiccated and 65% hatched; of the resulting nymphs, about 36% were killed by predation or parasitism or went missing, and only 64% pupated. About 38% of the total pupae either dried out or were predated, and 62% emerged as adults. This means that about 73% of the pest population was killed and 27% developed into adults. Time of sowing had a profound impact on B. tabaci, with the maximum population on mid-May sown cotton as compared with mid-March and mid-April sown cotton. CLCuV incidence was also found to be 100% in mid-May sown cotton.
This information may help to devise management strategies for B. tabaci to minimize incidence of CLCuV.
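The life-table figures above can be chained into a cumulative survival estimate. The following is a minimal sketch, assuming the three stage percentages quoted in the abstract (65% hatch, 64% pupation, 62% adult emergence) apply sequentially to the same cohort; the function and variable names are illustrative only. Under that assumption the product comes out near 26%, close to the roughly 27% reported, with the small gap plausibly due to rounding of the stage percentages.

```python
def cumulative_survival(stage_survival):
    """Return the running fraction of the initial cohort alive after each stage."""
    remaining = 1.0
    running = []
    for stage, fraction in stage_survival:
        remaining *= fraction
        running.append((stage, remaining))
    return running


# Stage-specific survival fractions as quoted in the abstract.
stages = [
    ("egg -> nymph (hatched)", 0.65),
    ("nymph -> pupa", 0.64),
    ("pupa -> adult", 0.62),
]

initial_eggs = 1035  # cohort size given in the abstract

for stage, fraction in cumulative_survival(stages):
    # Approximate head count at the end of each stage.
    print(f"{stage}: {fraction:.1%} of eggs, about {round(initial_eggs * fraction)} individuals")
```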
ICGI working group session: 

Mixed linear model approach for dissecting complex genetic architecture of seed traits in multiple environments

Presentation type: 
1
Abstract: 
The seeds of flowering plants develop from double fertilization and play a vital role in reproduction and in supplying food for humans and animals. Multiple genetic systems, e.g., the maternal, embryo and/or endosperm genomes, are jointly involved in the genetic variation of seed traits. Understanding the genetic mechanism of seed traits is a major challenge because multiple genetic systems are involved, especially given epistasis within or between different genomes and their interaction with the environment. In this study, two statistical models were proposed for mapping QTLs with epistasis and QTL-by-environment (QE) interactions underlying endosperm and embryo traits, respectively. Our models, which integrate the maternal and offspring genomes into one mapping framework, can accurately analyze the maternal additive and dominance effects, the endosperm/embryo additive and dominance effects, the epistatic effects of two loci in one or two different genomes, and the interaction of each genetic component of a QTL with the environment. Extensive simulations under different sampling strategies, heritabilities and model parameters were performed to investigate the statistical properties of the models. A set of real data on cotton seed was used as an example to demonstrate our methods.
ICGI working group session: 

Identification of single nucleotide polymorphisms from the EST data of Gossypium hirsutum

Presentation type: 
0
Abstract: 
Single nucleotide polymorphisms (SNPs) are the markers of choice. SNPs can be identified experimentally by sequencing approaches or in silico. Identification of SNPs through sequencing is expensive and time-consuming, whereas in silico identification is a cost-effective and efficient approach. In the present study, an in silico approach was adopted to identify SNPs in Gossypium hirsutum L. using publicly available expressed sequence tag (EST) data. EST sequences of the cultivar Bikaneri narma, derived from fibre tissue, were downloaded from the Gossypium hirsutum EST database. Sequences similar to these ESTs were identified by a Blastn search; for each EST sequence, about 250 similar sequences were retrieved from GenBank. The online tool HaploSNPer, which incorporates the CAP3 program for contig construction, was used to identify SNPs from 499 EST sequences of Gossypium hirsutum cv. Bikaneri narma. The maximum number of potential SNPs (1,051) was identified for the EST sequence JG453752.1, and the maximum number of reliable SNPs (718) was identified for the EST sequence JG453733.1. In these 499 EST sequences, a total of 15809 potential SNPs were discovered, and the total number of reliable SNPs identified was 9990. Among the potential SNPs, the numbers of transitions, transversions and InDels were 7272, 7857 and 2336, respectively. Of the 7272 transitions, 3457 were C/T and 3815 were A/G. Of the 7857 transversions, 2845 were A/T, 1875 were A/C, 1185 were C/G, and 1952 were T/G.
Among the 9990 reliable SNPs, the numbers of transitions, transversions and InDels were 4702, 3919 and 930, respectively. Of the 4702 transitions, 2425 were C/T and 2277 were A/G. Of the 3919 transversions, 1368 were A/T, 920 were A/C, 673 were C/G, and 958 were T/G. Among the reliable SNPs, transitions were therefore the most frequent class, followed by transversions, while InDels were markedly fewer. SNP markers show polymorphism at the single-nucleotide level. The identified SNPs will be a valuable resource for the genetic evaluation of existing cotton germplasm for fiber potential, and they can be employed in molecular breeding efforts to develop elite cotton cultivars.
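The transition/transversion/InDel breakdown reported above rests on a simple rule: a transition swaps two purines (A/G) or two pyrimidines (C/T), and any other base substitution is a transversion. The sketch below illustrates that rule only; it is not the HaploSNPer procedure, and the example allele pairs are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_variant(allele_1, allele_2):
    """Classify a two-allele variant as 'transition', 'transversion' or 'indel'."""
    a, b = allele_1.upper(), allele_2.upper()
    if a == "-" or b == "-":
        return "indel"  # a gap character marks an insertion/deletion
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid or a == b:
        raise ValueError(f"not a valid substitution: {allele_1}/{allele_2}")
    same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


# Hypothetical allele pairs, for illustration only.
example_pairs = [("A", "G"), ("C", "T"), ("A", "T"), ("C", "G"), ("T", "G"), ("-", "A")]
counts = Counter(classify_variant(a, b) for a, b in example_pairs)
print(dict(counts))
```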
ICGI working group session: 

Association mapping for epistasis and environmental interaction of yield and fiber traits using SSR markers in 323 cotton cultivars under 9 environments

Presentation type: 
1
Abstract: 
Cotton yield and fiber quality are complex traits controlled by polygenes, involving epistasis and environmental interaction. Association mapping was performed to detect associations between agronomic traits and 142 SSRs in the A sub-genome and 178 SSRs in the D sub-genome across 323 accessions of Gossypium hirsutum L. in 9 environments. A mixed linear model including epistasis and environmental interaction was used to screen for associated loci. In total, 252 SSRs were detected for yield and its three components (99 for lint yield, 171 for boll number, 164 for boll weight, and 150 for lint percentage). These significant loci explained about 62.05% of the phenotypic variance (h2(G+GE)=49.06~72.29%). The high contribution of environmental interaction to the phenotypic variance for lint yield and boll number indicated that the genetic effects of SSR loci were sensitive to environmental factors. For the three fiber traits, a total of 320 SSRs were identified (203 for fiber length, 159 for fiber strength, and 73 for micronaire value). Additive and epistasis effects were both significant for the three traits, but epistasis effects were the leading genetic effects for fiber length (h2(AA)=49.64), whereas additive effects were the primary genetic component of fiber strength (h2(A)=30.39) and fiber micronaire value (h2(A)=12.38). The heritability of environmental interaction effects was relatively small, indicating that the three fiber traits could be expressed stably in various environments. The additive loci tended to be located in the A sub-genome, and the digenic epistasis loci were more likely to involve interactions within the D sub-genome as well as between the A and D sub-genomes. These results provide insights into the genetic basis of complex traits in cotton and may be useful for marker-assisted breeding to improve cotton production.
ICGI working group session: 

The Gene Families RPW8, HSP20 and FMO in Gossypium raimondii

Presentation type: 
0
Abstract: 
Plants are resistant if they inhibit pathogen development, while tolerance refers to plant productivity despite the presence of a pathogen. Basal resistance is the low level of resistance that may be present in susceptible plants. The genes related to these mechanisms may be present in considerable numbers even in plant-pathogen interactions where resistance has been shown to be triggered by a single gene. To investigate the presence of Gossypium raimondii genes that are not R genes but may be related to resistance, basal resistance or tolerance to pathogens, one Arabidopsis gene from each of three selected gene families, RPW8 (recognition of powdery mildew), HSP20 (20 kDa heat shock proteins) and FMO (flavin-dependent monooxygenases), was used in a local blastp search against the complete G. raimondii proteome using a cut-off e-value of 10^-10. All the downloaded proteins were submitted to the Pfam database (http://pfam.sanger.ac.uk/) to confirm the gene families. Multiple alignments of complete protein sequences were conducted in ClustalW, and phylogenetic trees were constructed using the neighbor-joining method with pairwise deletion and p-distance in MEGA 4.1. Five RPW8 homologs were found in G. raimondii, the same number as in Arabidopsis ecotype Ms-0. RPW8 is a family of disease resistance genes with a putative N-terminal transmembrane domain and a coiled-coil domain. In Arabidopsis, by triggering a hypersensitive response, they confer resistance to all four powdery mildew species, and thus differ from most disease resistance genes, which are specific to pathogen races. The G. raimondii genome showed a total of 24 HSP20-like proteins. One of the genes on chromosome 04 grouped with one of the genes on chromosome 08 in the MEGA analyses, indicating segmental duplication.
HSP20 proteins are thought to have evolved from their original role as chaperones to take on different functions in response to environmental stresses, such as those caused by salt, alcohol, chilling, oxidative injury or heavy metals. In tomato, the HSP20 RSI2 interacts with the resistance protein that confers resistance to Fusarium oxysporum, and the hypersensitive resistance response was not normally activated in the presence of the resistance gene when this particular chaperone was absent. F. oxysporum is also an important pathogen of cotton. The number of FMO-like proteins in G. raimondii is 26, while in Arabidopsis 29 proteins have been annotated as FMOs. The greater number of genes in Arabidopsis can be explained by the absence in G. raimondii of the YUCCA family of FMOs, which apparently is present exclusively in crucifers and is responsible for the biosynthesis of glucosinolates. One Arabidopsis FMO gene was shown to increase basal resistance to pathogens through the accumulation of salicylic acid and possibly through inactivation of pathogen-derived virulence factors, analogous to the detoxification function of FMO proteins in animals. FMO-like proteins are distributed among 12 chromosomes. Two pairs of closely related genes were found on chromosomes 09 and 10, indicating segmental duplication, consistent with the synteny between segments of those chromosomes found in dot plot analysis and with the hypothesis of a whole genome duplication.
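The homolog search described above keeps only blastp hits at or below an e-value cut-off of 1e-10. A minimal sketch of that filtering step follows, assuming the hits were exported in BLAST's standard 12-column tabular format, where the e-value is the 11th column. The sequence identifiers and scores in the example are hypothetical, and the abstract does not state which output format the authors actually used.

```python
import csv
import io


def filter_hits(tabular_text, max_evalue=1e-10):
    """Return (query, subject, evalue) for tabular BLAST hits with evalue <= max_evalue."""
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])  # 11th column of the standard tabular layout
        if evalue <= max_evalue:
            kept.append((row[0], row[1], evalue))
    return kept


# Hypothetical hits: query, subject, %identity, length, mismatches, gap opens,
# query start/end, subject start/end, e-value, bit score.
example_rows = [
    ["query_RPW8", "subject_001", "45.2", "180", "90", "3", "1", "178", "5", "182", "3e-40", "150"],
    ["query_RPW8", "subject_002", "31.0", "120", "75", "4", "10", "125", "20", "138", "2e-05", "41.2"],
    ["query_RPW8", "subject_003", "38.7", "160", "88", "5", "3", "160", "8", "166", "1e-10", "62.0"],
]
example_text = "\n".join("\t".join(row) for row in example_rows)

for query, subject, evalue in filter_hits(example_text):
    print(query, subject, evalue)
```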
ICGI working group session: 

New Methods of Mapping QTX Based on Omics Data and Their Applications in Crop Breeding

Presentation type: 
0
Abstract: 
Jun Zhu, Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, PR China. Corresponding author: jzhu@zju.edu.cn
It is a challenge to develop efficient statistical methods and computing software for mapping QTX underlying complex traits based on omics data. Recently, we have developed new mapping approaches for detecting gene-to-gene and gene-to-environment interactions through GWAS of quantitative traits with markers (QTLs), SNPs (QTSs), transcripts (QTTs), proteins (QTPs), and metabolites (QTMs). The genetic model of complex traits can include cofactors (e.g., block, sex, age), genetic main effects (additive A, dominance D), epistasis effects (additive by additive AA, additive by dominance AD, dominance by dominance DD), and gene-to-environment interactions (AE, DE, AAE, ADE, and DDE). Mixed linear model approaches are used for unbiased prediction of all these genetic main effects, epistasis effects, and gene-to-environment interaction effects. The heritability of individual effects is estimated. The mapping software GMDR-GPU and QTXNetwork have also been developed and can be used under different operating systems (Windows, Unix, Mac). The software uses GPU computation technology and can be 250X faster than CPU computation. Worked examples and crop breeding applications will be demonstrated for mapping QTSs in a corn NAM population, QTTs in cotton diallel crosses, and QTXs in a tobacco variety test.
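The effect terms listed above (A, D, AA, AD, DD and their environment interactions) can be pictured as columns of a design matrix built from genotype codes. The sketch below uses one common coding, which is an assumption here and not necessarily the parameterization used in GMDR-GPU or QTXNetwork: additive codes of -1, 0, 1 for 0, 1, 2 copies of a reference allele, a dominance code of 1 for heterozygotes, products of these codes for epistasis, and per-environment copies of each column for the interaction terms. All function names are illustrative.

```python
def additive(allele_count):
    """Additive code: -1, 0, 1 for 0, 1, 2 copies of the reference allele."""
    if allele_count not in (0, 1, 2):
        raise ValueError("allele_count must be 0, 1 or 2")
    return allele_count - 1


def dominance(allele_count):
    """Dominance code: 1 for heterozygotes, 0 for homozygotes."""
    if allele_count not in (0, 1, 2):
        raise ValueError("allele_count must be 0, 1 or 2")
    return 1 if allele_count == 1 else 0


def effect_columns(locus_1, locus_2):
    """Main-effect and digenic epistasis codes for one individual at two loci."""
    a1, a2 = additive(locus_1), additive(locus_2)
    d1, d2 = dominance(locus_1), dominance(locus_2)
    return {
        "A1": a1, "A2": a2, "D1": d1, "D2": d2,
        "AA": a1 * a2, "AD": a1 * d2, "DA": d1 * a2, "DD": d1 * d2,
    }


def by_environment(columns, environment, environments):
    """Expand each effect into one column per environment (AE, DE, AAE, ...)."""
    return {
        f"{name}x{env}": (value if env == environment else 0)
        for name, value in columns.items()
        for env in environments
    }


# One individual: homozygous for the reference allele at locus 1, heterozygous at locus 2.
main = effect_columns(2, 1)
print(main)
print(by_environment(main, "E1", ["E1", "E2"]))
```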
ICGI working group session: 

Application of Index Selection in Upland Cotton Breeding Program Using QuLine Simulation

Abstract: 
When selection is applied to improve the economic value of plants, it is generally applied to several characters simultaneously and not just to one, because economic value depends on more than one character. How, then, should selection be applied to the component traits in order to achieve the maximum improvement of economic value? There are several possible procedures. One might select for each trait singly in turn in successive generations (tandem selection), or one might select for all the traits at the same time but independently, rejecting all individuals that fail to meet a certain standard for each trait regardless of their values for any of the other traits (independent culling levels). In this study we compared four index selection methods, which are applied simultaneously to all the component traits together, with tandem selection to investigate the effectiveness of these methods in giving rapid improvement of economic value in an Upland cotton breeding program. Lint yield, pre-frost harvest ratio, fiber length, uniformity ratio, fiber strength, fiber fineness, Fusarium wilt, Verticillium wilt and days to boll opening were chosen as component traits of index selection, with appropriate weight given to each trait according to its relative economic importance based on the genetic information gained so far. One environment type, four inheritance models (additive, dominance, over-dominance, and epistasis) and two levels of linkage strength between lint yield and fiber strength formed a total of eight GE systems. A computer simulation approach was then used to generate the starting population of genotypes with the QU-GENE engine, followed by manipulation of the starting population according to the breeding and selection strategy to be tested using the application module QuLine, which was developed in a collaborative project between The University of Queensland and CIMMYT.
Results from the analysis of variance showed that inheritance model (IM), linkage strength (LS), among- or within-family selection involved in index selection strategies (AFS) and the two-way interaction between inheritance model and linkage strength (IM*LS) were the four main contributors to the overall variation in lint yield gain, whereas IM, AFS and IM*LS contributed most to the variation in the addition of simple traits (AST), and improvement of the optimum index (OI) was mainly attributed to AFS, IM and IM*AFS. Among the four genetic models, the highest genetic gains of lint yield, AST and OI were all obtained from the epistasis model after one breeding cycle; however, no significant difference was observed between the dominance and over-dominance models for optimum index selection (OIS). Genetic gain of lint yield was 2.93 percent higher when genes were tightly linked (recombination frequency of 0.001) than when genes were independently inherited; in contrast, genetic gain of OIS was 3.02 percent lower under high linkage strength between the genes for lint yield and fiber strength than under low linkage strength. OIS produced more than double the genetic gain of OI compared with tandem selection for the component traits of the selection index. Ten percent more genetic gain was achieved for OI when within-family selection of OI was implemented than when within-family selection of lint yield was conducted from F2 through F7.
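Index selection, as contrasted with tandem selection above, ranks candidates on a single weighted score across all component traits. The following is a minimal sketch of that idea, assuming a simple linear index on standardized trait values with user-supplied economic weights; it is not the QuLine implementation, and the trait values and weights are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw trait values to z-scores so traits on different scales are comparable."""
    centre, spread = mean(values), pstdev(values)
    return [(value - centre) / spread for value in values]


def index_scores(traits, weights):
    """Weighted sum of standardized trait values for each candidate."""
    z_scores = {name: standardize(values) for name, values in traits.items()}
    n_candidates = len(next(iter(traits.values())))
    return [
        sum(weights[name] * z_scores[name][i] for name in traits)
        for i in range(n_candidates)
    ]


def select_top(scores, k):
    """Indices of the k candidates with the highest index score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical data for five candidate lines.
traits = {
    "lint_yield": [1200, 1350, 1100, 1500, 1280],
    "fiber_strength": [30.0, 28.5, 32.0, 27.0, 31.0],
}
weights = {"lint_yield": 0.6, "fiber_strength": 0.4}  # assumed economic weights

scores = index_scores(traits, weights)
print("selected on index:", select_top(scores, 2))
print("selected on lint yield alone:", select_top(traits["lint_yield"], 2))
```

With these hypothetical numbers the two rankings differ: selecting on the index favors a line with balanced yield and strength over one that is high-yielding but weak in fiber strength, which is the trade-off an index is designed to capture.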
ICGI working group session: