The Nicotiana attenuata data hub (NaDH) combines different data sources for analysis, such as genomic, transcriptomic and metabolomic data. Additionally, NaDH provides unique tools to analyze these large data sets. Here, we describe how the data was collected and how the analyses can be performed.
Genomic data (release version 2.0):
Assembly version r2.0
For a detailed description of the assembly, see the corresponding genome paper (manuscript in submission).
Genome annotation version r2.0
In total, 33,449 high-confidence gene models were predicted for N. attenuata. The functions of these genes were predicted using Blast2GO, InterProScan and MapMan. Additionally, a large set of gene models was manually corrected.
Genes of 11 published dicot species were clustered into homologous groups (HG) based on their sequence similarity. In total, all genes were clustered into 23,340 HG with at least two homologous sequences. The phylogenetic trees for all HG were constructed using an in-house pipeline that includes PhyML and jModelTest. Furthermore, the structures of all these gene trees were analyzed to detect duplication events. This step was performed using the species-overlapping algorithm implemented in Notung 2.6, and the results were post-processed with an in-house pipeline to detect the most recent gene duplication events with high support values.
|Species||Version||# of gene models||Reference||URL|
|A. thaliana||TAIR 10||27,416||2)||http://phytozome.jgi.doe.gov/arabidopsis|
1) Xu S, Brockmöller T, Navarro-Quezada A, Kuhl H, Gase K, Ling Z, et al. Wild tobacco genomes reveal the evolution of prolific nicotine production. Submitted.
2) Swarbreck,D., Wilks,C., Lamesch,P., Berardini,T.Z., Garcia-Hernandez,M., Foerster,H., Li,D., Meyer,T., Muller,R., Ploetz,L., et al. (2008) The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res., 36, D1009–14.
3) Qin,C., Yu,C., Shen,Y., Fang,X., Chen,L., Min,J., Cheng,J., Zhao,S., Xu,M., Luo,Y., et al. (2014) Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc. Natl. Acad. Sci. U. S. A., 111, 5135–40.
4) Huang,S., Li,R., Zhang,Z., Li,L., Gu,X., Fan,W., Lucas,W.J., Wang,X., Xie,B., Ni,P., et al. (2009) The genome of the cucumber, Cucumis sativus L. Nat. Genet., 41, 1275–81.
5) Hellsten,U., Wright,K.M., Jenkins,J., Shu,S., Yuan,Y., Wessler,S.R., Schmutz,J., Willis,J.H. and Rokhsar,D.S. (2013) Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing. Proc. Natl. Acad. Sci. U. S. A., 110, 19478–82.
6) Tuskan,G.A., Difazio,S., Jansson,S., Bohlmann,J., Grigoriev,I., Hellsten,U., Putnam,N., Ralph,S., Rombauts,S., Salamov,A., et al. (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 313, 1596–604.
7) The Tomato Genome Consortium (2012) The tomato genome sequence provides insights into fleshy fruit evolution. Nature, 485, 635–41.
8) Hirakawa,H., Shirasawa,K., Miyatake,K., Nunome,T., Negoro,S., Ohyama,A., Yamaguchi,H., Sato,S., Isobe,S., Tabata,S., et al. (2014) Draft genome sequence of eggplant (Solanum melongena L.): the representative solanum species indigenous to the old world. DNA Res., 21, 649–60.
9) Xu,X., Pan,S., Cheng,S., Zhang,B., Mu,D., Ni,P., Zhang,G., Yang,S., Li,R., Wang,J., et al. (2011) Genome sequence and analysis of the tuber crop potato. Nature, 475, 189–95.
10) Jaillon,O., Aury,J.-M., Noel,B., Policriti,A., Clepet,C., Casagrande,A., Choisne,N., Aubourg,S., Vitulo,N., Jubin,C., et al. (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature, 449, 463–7.
Genomic data (release version 1.0):
Assembly version 7
|Species||N50 (kb)||Count (x1000)||Longest (kb)||Total length (Gb)||Completeness|
Genome annotation version 1.0
The gene models of N. attenuata were annotated using a combination of MAKER2 and AUGUSTUS. In total, 34,941 high-confidence gene models were predicted. The functions of these genes were predicted using Blast2GO, InterProScan and MapMan. Additionally, a large set of gene models was manually corrected.
The RNA-Seq data used contains expression data for 20 different tissues of Nicotiana attenuata: leaf control/treated, root treated, stem treated, flower buds, opening flower, corolla early/late, nectaries, ovary, style selfed/outcrossed/without pollination, pollen tubes, pedicels, stigma, anthers, seeds dry/watered/smoked. The raw dataset contains expression data for 22,637 expressed genes with a TPM larger than 5.
|Tissue||Treatment/development stage||# of expressed genes|
|Root||Rosette stage plants, treated with 5 µL 1:1 diluted M. sexta oral secretion three times in leaves||15,499|
|Leaf||Rosette stage plants, treated with 5 µL 1:1 diluted M. sexta oral secretion three times in leaves||12,179|
|Leaf||Rosette stage plants, no treatment||11,840|
|Stem||Rosette stage plants, treated with 5 µL 1:1 diluted M. sexta oral secretion three times in leaves||14,682|
|Corolla||Early developmental stage, no treatment||13,662|
|Corolla||Late developmental stage, no treatment||13,486|
|Stigma||Mature stigma, no treatment||14,485|
|Pollen tube||No treatment||3,490|
|Style||Mature style without pollination||13,492|
|Style||Mature style, pollinated with pollen from a different genotype||13,365|
|Style||Mature style, self-pollinated||13,533|
|Nectary||Mature nectary, no treatment||12,928|
|Anther||Mature anther, no treatment||11,550|
|Ovary||Mature ovary, no treatment||13,960|
|Pedicel||Mature pedicel, no treatment||14,550|
|Flower||Fully opened flowers, no treatment||14,390|
|Flower bud||Two early developmental stages of flowers, no treatment||14,543|
|Seed||Treated with liquid smoke||9,227|
|Seed||Treated with water||8,872|
Further analyses were performed with log-transformed TPM values. This data was used for the eFP Browser and the co-expression network analysis.
Currently, this database includes more than 222 microarrays from 10 different experiments. All expression values were log2 transformed, and the mean of the probe values of all samples was used. For more detailed information, see the description of each experiment in the eFP Browser. All microarray probes were mapped to their corresponding genes in the N. attenuata genome; only high-confidence probes were used.
The eFP Browser visualizes this data.
|Dataset ID||Genotype used for microarray||Treatment||Tissues||Developmental stage||# of arrays||Data source||Reference|
|NaHER1||WT, irHER1||Wounding + OS||Leaves||Rosette stage||6||Molecular Ecology Department in MPI-CE||-|
|Cytokines and senescence||WT, SAG:IPT||M. sexta neonates feeding||Young rosette leaves||Early flowering stage||12||Molecular Ecology Department in MPI-CE||-|
|Coronatine spray on irAOC||irAOC||H2O ethanol spray, 1µM coronatine spray||Corolla, pistil, nectary||Flower buds before opening||6||GSE52765||11)|
|NaMYB5 transcription factor||WT, irMYB5||Wounding + OS||Leaves||Rosette stage||6||Molecular Ecology Department in MPI-CE||-|
|NaMYC2 transcription factors||EV, MYC2-VIGS, MYC2-like VIGS, MYC2-MYC2-like double VIGS||Wounding + OS||Leaves||Rosette stage||12||GSE45608||12)|
|N. attenuata under herbivore attack||WT||Control, wounding, wounding + OS||Leaves and roots||Rosette stage||138||GSE30287||13)|
|Volatile exposure - 6 h||WT, irLOX2/3, 35s::TPS10, irlox2/3x35S::TPS10||VOC exposure for 6 h||Leaves||Rosette stage||12||Molecular Ecology Department in MPI-CE||-|
|Volatile exposure - 30 min||WT, irLOX2||VOC exposure for 30 min and GLV supplementation||Leaves||Rosette stage||12||Molecular Ecology Department in MPI-CE||-|
|WRKY3/6||WT, irWRKY3, irWRKY6, irWRKY3/6||M. sexta neonates feeding||Leaves||Rosette stage||12||Molecular Ecology Department in MPI-CE||-|
|WRKY9||WT, irWRKY9||M. sexta neonates feeding||Leaves||Rosette stage||6||Molecular Ecology Department in MPI-CE||-|
11) Stitz,M., Hartl,M., Baldwin,I.T., Gaquerel,E. (2014) Jasmonoyl-l-Isoleucine Coordinates Metabolic Networks Required for Anthesis and Floral Attractant Emission in Wild Tobacco (Nicotiana attenuata). The Plant Cell. 26:10:3964-3983.
12) Woldemariam,M.G., Dinh,S.T., Oh,Y., Gaquerel,E., Baldwin,I.T., Galis,I. (2013) NaMYC2 transcription factor regulates a subset of plant defense responses in Nicotiana attenuata. BMC Plant Biology. 13:73.
13) Kim,S.G., Yon,F., Gaquerel,E., Gulati,J., Baldwin,I.T. (2011) Tissue specific diurnal rhythms of metabolites and their regulation during herbivore attack in a native Tobacco, Nicotiana attenuata. PLoS ONE. 6.
Metabolites were analyzed in 14 different tissues of Nicotiana attenuata: leaf, root, stem, flower bud, corolla, pedicels, limb, nectaries, ovary, sepals, filament, style, anthers and seeds.
Data was obtained with UHPLC-ESI/qTOF-MS under the following conditions:
The column used was an Acclaim column (150×2.1 mm, particle size 2.2 μm) with a 4 mm×4 mm i.d. guard column of the same material. The following binary gradient was applied using a Dionex Ultimate 3000 UHPLC system: 0–1 min, isocratic 90% (vol/vol) A (de-ionized water, 0.1% acetonitrile, and 0.05% formic acid), 10% B (acetonitrile and 0.05% formic acid); 1–40 min, gradient phase to reach 10% A, 90% B; 40–45 min, isocratic 10% A, 90% B. Flow rate was 300 μL/min. Eluted compounds were detected by a high-resolution MicroToF mass spectrometer (Bruker Daltonics, Bremen, Germany) equipped with an electrospray ionization source operating in positive ionization mode. Typical instrument settings were as follows: capillary voltage 4500 V, capillary exit 130 V, dry gas temperature 200°C, dry gas flow of 8 L/min. Ions were detected from m/z 50 to 1400 at a repetition rate of 1 Hz. Mass calibration was performed using sodium formate clusters (10 mM solution of NaOH in 50/50% v/v isopropanol/water containing 0.2% formic acid). Raw data files were converted to netCDF format using the export function of the Data Analysis v4.0 software (Bruker Daltonics, Bremen, Germany).
Further analyses were performed only with binary data.
The RNA-Seq data described above was used to calculate expression similarity between genes.
In a first step, genes that are not expressed were removed, that is, genes that do not have a TPM value larger than 5 in at least one tissue.
In a second step, constantly expressed genes were removed. To achieve this, genes with an expression variance of less than 1 were removed.
The resulting dataset contains the expression of 15,311 informative genes.
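The two filtering steps can be sketched as follows. This is a minimal illustration, not the pipeline used for NaDH: the gene names and TPM values are hypothetical, and the script assumes that the variance is computed as a sample variance on log2(TPM + 1) values, which the text does not specify.

```python
import math
from statistics import variance

TPM_CUTOFF = 5.0       # from the text: TPM larger than 5 in at least one tissue
VARIANCE_CUTOFF = 1.0  # from the text: genes with a variance below 1 are removed

# Hypothetical toy input: gene ID -> TPM values across tissues.
tpm = {
    "geneA": [0.0, 1.2, 3.9, 0.4],       # never above 5 TPM: not expressed
    "geneB": [50.0, 52.0, 49.0, 51.0],   # expressed, but nearly constant
    "geneC": [0.5, 6.0, 120.0, 900.0],   # expressed and variable
}


def log_tpm(values):
    # Assumption: log2 with a pseudocount of 1; the exact transform is not stated.
    return [math.log2(v + 1.0) for v in values]


def is_informative(values):
    """Keep a gene only if it passes both the expression and variance filters."""
    expressed = any(v > TPM_CUTOFF for v in values)
    variable = variance(log_tpm(values)) >= VARIANCE_CUTOFF
    return expressed and variable


informative = {gene: values for gene, values in tpm.items() if is_informative(values)}
print(sorted(informative))
```

With the toy values above, only the expressed and variable gene passes both filters.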
The gene expression similarity is calculated among all 20 tissues. As a measure of gene expression similarity, the Gini correlation coefficient (GSS) [Ma et al], the Pearson correlation or the Spearman correlation can be chosen. The tissue specificity is calculated only among four tissues, which were selected based on biological relevance: treated leaf and root, flower bud and dry seed. The tissue specificity is calculated in two ways: Tau and Shannon entropy [Gerstberger et al].
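As an illustration of the two tissue-specificity measures, the sketch below uses the commonly cited definitions of the Tau index and of Shannon entropy. Whether NaDH applies exactly these formulas, or a variant such as a rescaled entropy score, is an assumption; the expression values are hypothetical and must be non-negative.

```python
import math


def tau(values):
    """Tau index: 0 for uniform expression, 1 for expression in one tissue only."""
    x_max = max(values)
    if x_max <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1.0 - v / x_max for v in values) / (len(values) - 1)


def shannon_entropy(values):
    """Entropy in bits of the expression distribution; low values mean specific."""
    total = sum(values)
    if total <= 0:
        raise ValueError("gene is not expressed in any tissue")
    probs = [v / total for v in values if v > 0]
    return sum(p * math.log2(1.0 / p) for p in probs)


# Hypothetical expression values in the order:
# treated leaf, root, flower bud, dry seed.
broad_gene = [10.0, 10.0, 10.0, 10.0]
seed_gene = [0.0, 0.0, 0.0, 40.0]

for name, profile in [("broad", broad_gene), ("seed-specific", seed_gene)]:
    print(name, tau(profile), shannon_entropy(profile))
```

The two measures point in opposite directions: a tissue-specific gene has a Tau close to 1 but an entropy close to 0, whereas a broadly expressed gene has a Tau close to 0 and an entropy close to log2 of the number of tissues.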
The metabolomic data described above was used to calculate expression similarity between metabolites.
To overcome the limitations in quantifying metabolite expression, the expression was transformed into binary values. To this end, the expression was ZMAD transformed and a cutoff of 2 was applied to convert it to binary data. The expression similarity between metabolites was calculated using the Ochiai coefficient among 14 tissues.
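The binarization and the similarity calculation can be sketched as below. Several details are assumptions rather than documented behavior: ZMAD is taken here as the deviation from the median divided by the unscaled median absolute deviation (MAD), computed per metabolite across tissues; only values above the cutoff are set to 1; and a zero MAD is handled by treating any value above the median as present. The intensities are hypothetical.

```python
import math
from statistics import median

ZMAD_CUTOFF = 2.0  # cutoff stated in the text for the metabolite data


def zmad(values):
    """Robust z-score: distance from the median in units of the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Assumption: with a zero MAD, a value above the median exceeds any
        # cutoff, and a value below the median falls under any cutoff.
        return [math.inf if v > med else -math.inf if v < med else 0.0
                for v in values]
    return [(v - med) / mad for v in values]


def binarize(values, cutoff=ZMAD_CUTOFF):
    """1 where the ZMAD score is above the cutoff, else 0 (one-sided, assumed)."""
    return [1 if z > cutoff else 0 for z in zmad(values)]


def ochiai(a, b):
    """Ochiai coefficient of two binary vectors: shared / sqrt(|a| * |b|)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    n_a, n_b = sum(a), sum(b)
    if n_a == 0 or n_b == 0:
        return 0.0  # assumption: similarity is 0 if either profile is empty
    return shared / math.sqrt(n_a * n_b)


# Hypothetical intensities for two metabolites across six tissues
# (the real data set covers 14 tissues).
metabolite_1 = [1.0, 1.0, 1.0, 1.0, 50.0, 60.0]
metabolite_2 = [2.0, 3.0, 2.0, 4.0, 3.0, 80.0]

b1 = binarize(metabolite_1)
b2 = binarize(metabolite_2)
print(b1, b2, ochiai(b1, b2))
```

The Ochiai coefficient only considers tissues in which at least one of the two metabolites is marked as present, so tissues in which both are absent do not inflate the similarity.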
Gene-metabolite & metabolite-gene co-expression:
The same binary data as in the metabolite-metabolite co-expression network was used to overcome the limitations in quantifying metabolite expression.
Additionally, the gene expression data based on the RNA-Seq data (described above) was ZMAD transformed and a cutoff of 3 was applied to convert the expression to binary data.
Only 12 tissues from both data sets were used for the comparison: anthers, nectaries, ovary, pedicels, root treated, leaf (control and treated merged), stem treated, seed (dry, watered and smoked merged), corolla (early and late merged), style (selfed, outcrossed and no pollination merged), open flower and flower bud from the RNA-Seq data, and anthers, flower bud, corolla, leaf, limb, nectaries, ovary, pedicels, root, seed, stem and style from the metabolomic data. The expression similarity between genes and metabolites was calculated using the Ochiai coefficient among these 12 tissues.
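The merging of RNA-Seq samples into the 12 shared tissues could look like the sketch below. The groupings follow the list above, but the sample keys are hypothetical, and the merge rule (here the mean of the expression values, applied before binarization) is an assumption because the text does not say how samples were merged. Pairing "open flower" from the RNA-Seq data with "limb" from the metabolomic data is likewise inferred only from the two 12-item lists.

```python
from statistics import mean

# Merged tissue -> RNA-Seq samples. Groupings are taken from the text;
# the sample keys themselves are hypothetical names.
RNASEQ_GROUPS = {
    "anthers": ["anthers"],
    "flower bud": ["flower_bud"],
    "corolla": ["corolla_early", "corolla_late"],
    "leaf": ["leaf_control", "leaf_treated"],
    "open flower": ["open_flower"],  # assumed counterpart of "limb"
    "nectaries": ["nectaries"],
    "ovary": ["ovary"],
    "pedicels": ["pedicels"],
    "root": ["root_treated"],
    "seed": ["seed_dry", "seed_watered", "seed_smoked"],
    "stem": ["stem_treated"],
    "style": ["style_selfed", "style_outcrossed", "style_unpollinated"],
}


def merge_tissues(expression):
    """Collapse a sample -> value mapping into the 12 merged tissues.

    Assumption: merged samples are averaged. Samples that belong to no
    group (stigma, pollen tubes) are ignored.
    """
    return {
        tissue: mean([expression[sample] for sample in samples])
        for tissue, samples in RNASEQ_GROUPS.items()
    }


# Hypothetical expression profile of one gene across the RNA-Seq samples.
profile = {s: 1.0 for samples in RNASEQ_GROUPS.values() for s in samples}
profile.update({"seed_dry": 3.0, "seed_watered": 6.0, "seed_smoked": 9.0,
                "stigma": 2.0, "pollen_tubes": 4.0})

merged = merge_tissues(profile)
print(len(merged), merged["seed"])
```

Once both data sets are expressed over the same ordered set of 12 tissues, each gene and each metabolite can be binarized with its respective cutoff and the two binary vectors compared with the Ochiai coefficient.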