Draft genome of the peanut A-genome progenitor (Arachis duranensis) provides insights into geocarpy, oil biosynthesis and allergens

Xiaoping Chena,1, Hongjie Lib,1, Manish K. Pandeyc,1, Qingli Yangd,e,1, Xiyin Wangf,1, Vanika Gargc, Haifen Lia, Xiaoyuan Chid, Dadakhalandar Doddamanic, Yanbin Honga, Hari D. Upadhyayac, Hui Guof, Aamir W. Khanc, Fanghe Zhua, Xiaoyan Zhangb, Lijuan Pand, Gary J. Piercef, Guiyuan Zhoua, Katta AVS Krishnamohanc, Mingna Chend, Ni Zhonga, Gaurav Agarwalc, Shuanzhu Lib, Annapurna Chitikinenic, Guoqiang Zhangg, Shivali Sharmac, Na Chend, Haiyan Liua, Pasupuleti Janilac, Shaoxiong Lia, Min Wangb, Tong Wangd, Jie Sund, Xingyu Lia, Chunyan Lib, Mian Wangd, Lina Yud, Shijie Wena, Sube Singhc, Zhen Yangd, Jinming Zhaob, Chushu Zhangd, Yue Yuh, Jie Bid, Xiaojun Zhange, Zhongjian Liug,2, Andrew H. Patersonf,2, Shuping Wangb,2, Xuanqiang Lianga,2, Rajeev K. Varshneyc,i,j,2, Shanlin Yud,2

aCrops Research Institute, Guangdong Academy of Agricultural Sciences, South China Peanut Sub-center of National Center of Oilseed Crops Improvement, Guangdong Key Laboratory for Crops Genetic Improvement, Guangzhou 510640, China; bShandong Shofine Seed Company, Jiaxiang 272400, China; cInternational Crops Research Institute for the Semi-Arid Tropics, Hyderabad 502324, India; dShandong Peanut Research Institute, Shandong Academy of Agricultural Sciences, Qingdao 266000, China; eCollege of Food Science and Engineering, Qingdao Agricultural University, Qingdao 266000, China; fPlant Genome Mapping Laboratory, University of Georgia, Athens, GA 30605; gShenzhen Key Laboratory for Orchid Conservation and Utilization, National Orchid Conservation Center of China and Orchid Conservation and Research Center of Shenzhen, Shenzhen 518000, China; hMacrogen Millennium Genomics Company, Shenzhen 518000, China; iSchool of Plant Biology, University of Western Australia, Crawley, WA 6009, Australia; and jThe Institute of Agriculture, University of Western Australia, Crawley, WA 6009, Australia

1X. Chen, Hongjie Li, M.K.P., Q.Y. and X.W. contributed equally to this work.

2To whom correspondence may be addressed. Email: liuzj@sinicaorchid.org, paterson@uga.edu, wsp@shofine.com, liang-804@163.com, r.k.varshney@cgiar.org or yshanlin1956@163.com

Significance: We present a draft genome of the peanut A-genome progenitor, Arachis duranensis, providing details on the total genes present in the genome. Genome analysis suggests that the peanut lineage was affected by at least three polyploidizations since the origin of eudicots. Resequencing of synthetic Arachis tetraploids reveals extensive gene conversion since their formation by human hands. The A. duranensis genome provides a major source of candidate genes for fructification, oil biosynthesis, and allergens, expanding knowledge of understudied areas of plant biology and human health impacts of plants. This study also provides millions of structural variations that can be used as genetic markers for the development of improved peanut varieties through genomics-assisted breeding.

Abstract

Peanut or groundnut (Arachis hypogaea L.), a legume of South American origin, has high seed oil content (45-56%), and is a staple crop in semiarid tropical and subtropical regions, partially because of drought tolerance conferred by its geocarpic reproductive strategy. We present a draft genome of the peanut A-genome progenitor, Arachis duranensis, and 50,324 protein-coding gene models. Patterns of gene duplication suggest the peanut lineage has been affected by at least three polyploidizations since the origin of eudicots. Re-sequencing of synthetic Arachis tetraploids reveals extensive gene conversion in only three seed-to-seed generations since their formation by human hands, indicating that this process begins virtually immediately following polyploid formation. Expansion of some specific gene families suggests roles in the unusual subterranean fructification of Arachis. For example, the S1Fa-like transcription factor family has 126 Arachis members, in contrast to no more than five members in other examined plant species, and is more highly expressed in roots and etiolated seedlings than in green leaves. The A. duranensis genome provides a major source of candidate genes for fructification, oil biosynthesis, and allergens, expanding knowledge of understudied areas of plant biology and human health impacts of plants, informing peanut genetic improvement and aiding deeper sequencing of Arachis diversity.

Please use the following citation when using any information downloaded from this page or the PNAS website.

Citation: Chen X, Li H, Pandey MK, Yang Q, Wang X, Garg V, Li H, Chi X, Doddamani D, Hong Y, Upadhyaya HD, Guo H, Khan AW, Zhu F, Zhang X, Pan L, Pierce GJ, Zhou G, Katta AVSK, Chen M, Zhong N, Agarwal G, Li S, Chitikineni A, Zhang G-Q, Sharma S, Chen N, Liu H, Janila P, Li S, Wang M, Wang T, Sun J, Li X, Li C, Wang M, Yu L, Wen S, Singh S, Yang Z, Zhao J, Zhang C, Yu Y, Bi J, Zhang X, Liu Z-J, Paterson AH, Wang S, Liang X, Varshney RK, Yu S (2016) Draft genome of the peanut A-genome progenitor (Arachis duranensis) provides insights into geocarpy, oil biosynthesis, and allergens. Proceedings of the National Academy of Sciences (USA), 113(24): 6785-6790, doi: 10.1073/pnas.1600899113

Click here to download the full research article published in PNAS (May 31, 2016)

Supplementary information

SI Appendix: Details on Materials and Methods, SI Tables and SI Figures

Dataset S1: Information on the primers used for evaluation of the A. duranensis assembled genome

Dataset S2: Enriched GO terms for biological process

Dataset S3: Enriched GO terms for molecular functions

Dataset S4: Enriched GO terms for cellular components

Dataset S5: Transcription factors identified in A. duranensis and other plant species

Dataset S6: Conserved/known miRNA gene loci in the A. duranensis genome

Dataset S7: Details on the simple repeat regions and masking in the A. duranensis genome

Dataset S8: Occurrence and number of SSR repeat motifs in A. duranensis

Dataset S9: List of primer pairs designed for the SSRs identified in A. duranensis

Dataset S10: The mapped features of the re-sequencing data

Dataset S11: Identification and distribution of nucleotide variation (SNPs and INDELs) across the complete genome sequence

Dataset S12: Summary of the syntenic blocks between the scaffolds of A. duranensis and chromosomes of other genomes

Dataset S13: Likely converted sites in scaffolds of A. duranensis

Dataset S14: Paired genes involved in gravitropism between A. duranensis and soybean

Dataset S15: Gravitropism genes identified in Arabidopsis in previous studies and their homologs in A. duranensis, soybean and Medicago

Dataset S16: Manual annotation of genes involved in fatty acid biosynthesis and triacylglycerol assembly

Dataset S17: Putative allergen encoding genes in the A. duranensis genome

Arachis_duranensis_Genome_V1.0.tar

The compressed file ‘Arachis_duranensis_Genome_V1.0.tar.gz’ contains the genome assembly (Arachis_duranensis_Genome_V1.0.tar.gz), gene sequences (Arachis_duranensis_Genome_V1.0_genes.fa.gz), CDS sequences (Arachis_duranensis_Genome_V1.0_CDS.fa.gz), gene models (Arachis_duranensis_Genome_V1.0_genes.gff.gz), proteins (Arachis_duranensis_Genome_V1.0_proteome.fa) and repeats (Arachis_duranensis_Genome_V1.0_repeats.gff.gz). This version of the assembly was used for all analyses presented in the paper except the description of the genome and the evolutionary analysis.
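For readers who want to check a downloaded copy of the archive, the following is a minimal Python sketch using only the standard library. It assumes the archive has already been downloaded and unpacked locally. The function names (list_archive_members, count_fasta_records, count_gff_features) are hypothetical helpers introduced here for illustration, and the assumption that gene rows in the GFF files carry the feature type "gene" in the third column has not been verified against the actual files.

```python
import gzip
import tarfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the member names stored in a tar or tar.gz archive."""
    # Mode "r:*" lets tarfile detect the compression (none, gzip, ...) itself.
    with tarfile.open(archive_path, mode="r:*") as archive:
        return archive.getnames()


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode="rt")
    return open(path, mode="rt")


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open_text(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_gff_features(path, feature_type="gene"):
    """Count GFF rows whose third column equals feature_type.

    The default "gene" is an assumption about how these files are annotated.
    """
    count = 0
    with open_text(path) as handle:
        for line in handle:
            # Skip comment/directive lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            if len(columns) >= 3 and columns[2] == feature_type:
                count += 1
    return count
```

For example, list_archive_members("Arachis_duranensis_Genome_V1.0.tar.gz") would list the files named above, and count_fasta_records("Arachis_duranensis_Genome_V1.0_proteome.fa") gives a protein count that can be compared with the 50,324 gene models reported in the abstract. The two numbers need not agree exactly if the release differs from the version described in the paper.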

Arachis_duranensis_Genome_V2.0.tar
This is an updated version of the genome assembly published in the PNAS paper. Please note that this assembly was used to prepare Table 1 (genome assembly) and Figure 2D (evolutionary analysis) in the PNAS paper.

For any questions, please contact Rajeev Varshney (r.k.varshney@cgiar.org), Xuanqiang Liang (liang-804@163.com), Xiaoping Chen (xpchen1011@qq.com) or Manish Pandey (m.pandey@cgiar.org).

Please contact: Dr Rajeev Varshney, Director, Center of Excellence in Genomics, ICRISAT, Patancheru, 502 324, India
Office: +91 40 3071 3305; Email: r.k.varshney@cgiar.org
© Center of Excellence in Genomics. 2008