Data information

The Nicotiana attenuata data hub (NaDH) combines different data sources for analysis, such as genomic, transcriptomic and metabolomic data. In addition, NaDH provides unique tools to analyze these large data sets. Here, we describe how the data were collected and how the analyses can be performed.

Genomic data (release version 2.0):

Genome:

Assembly version r2.0

For a detailed description of the assembly, see the corresponding genome paper (manuscript submitted).

Gene annotation:

Genome annotation version r2.0

In total, 33,449 high-confidence gene models were predicted for N. attenuata. The functions of these genes were predicted using Blast2GO, InterProScan and MapMan. In addition, a large set of gene models was manually corrected.

Phylogeny:

Genes of 11 dicot species were clustered into homologous groups (HGs) based on their sequence similarity. In total, the genes were clustered into 23,340 HGs containing at least two homologous sequences. The phylogenetic trees for all HGs were constructed using an in-house pipeline including PhyML and jModelTest. All gene trees were then analyzed to detect duplication events. This step was performed using the species-overlap algorithm implemented in Notung 2.6, and the results were post-processed with an in-house pipeline to detect the most recent gene duplication events with high support values.
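The duplication test can be illustrated with a minimal Python sketch of the species-overlap rule: an internal node of a gene tree is labelled a duplication when its two child subtrees share at least one species. This is not Notung's implementation; the nested-tuple tree format, the hypothetical `species|gene` leaf naming and the function names are assumptions, and branch support values and the post-processing step are ignored.

```python
"""Sketch of the species-overlap rule for labelling gene duplication nodes.

Assumptions (not taken from NaDH or Notung): binary trees are nested 2-tuples,
leaves are strings of the hypothetical form "species|gene", support is ignored.
"""
from typing import Union

Tree = Union[str, tuple]


def species_of(leaf: str) -> str:
    # Hypothetical leaf naming convention: "Nattenuata|g1" -> "Nattenuata"
    return leaf.split("|", 1)[0]


def find_duplications(tree: Tree, out: list) -> set:
    """Return the species set below `tree`; append duplication nodes to `out`."""
    if isinstance(tree, str):
        return {species_of(tree)}
    left, right = tree
    left_species = find_duplications(left, out)
    right_species = find_duplications(right, out)
    # Species-overlap rule: if both child subtrees contain the same species,
    # the node is inferred to be a gene duplication rather than a speciation.
    if left_species & right_species:
        out.append(tree)
    return left_species | right_species


gene_tree = (
    ("Nattenuata|g1", "Slycopersicum|g1"),
    ("Nattenuata|g2", "Slycopersicum|g2"),
)
duplications = []
find_duplications(gene_tree, duplications)
print(len(duplications))  # 1: only the root node is a duplication
```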

The following table shows the included plant species:

| Species | Version | No. of gene models | Reference | URL |
|---|---|---|---|---|
| N. attenuata | r2.0 | 33,449 | 1) | http://nadh.ice.mpg.de/NaDH/download/overview |
| N. obtusifolia | r1.0 | 27,911 | 1) | http://nadh.ice.mpg.de/NaDH/download/overview |
| A. thaliana | TAIR 10 | 27,416 | 2) | http://phytozome.jgi.doe.gov/arabidopsis |
| C. annuum | v2.0 | 35,336 | 3) | http://peppersequence.genomics.cn/page/species/download.jsp |
| C. sativus | v1.0 | 21,503 | 4) | http://phytozome.jgi.doe.gov/cucumber |
| M. guttatus | v2.0 | 28,140 | 5) | http://phytozome.jgi.doe.gov/mimulus |
| P. trichocarpa | v3.0 | 41,335 | 6) | http://phytozome.jgi.doe.gov/poplar |
| S. lycopersicum | ITAG2.3 | 34,727 | 7) | http://phytozome.jgi.doe.gov/tomato |
| S. melongena | v2.5.1 | 42,035 | 8) | ftp://ftp.kazusa.or.jp/pub/eggplant/ |
| S. tuberosum | v3.4 | 35,119 | 9) | http://phytozome.jgi.doe.gov/potato |
| V. vinifera | Genoscope.12X | 26,346 | 10) | http://phytozome.jgi.doe.gov/grape |

References:
1) Xu,S., Brockmöller,T., Navarro-Quezada,A., Kuhl,H., Gase,K., Ling,Z., et al. Wild tobacco genomes reveal the evolution of prolific nicotine production. Submitted.
2) Swarbreck,D., Wilks,C., Lamesch,P., Berardini,T.Z., Garcia-Hernandez,M., Foerster,H., Li,D., Meyer,T., Muller,R., Ploetz,L., et al. (2008) The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res., 36, D1009–14.
3) Qin,C., Yu,C., Shen,Y., Fang,X., Chen,L., Min,J., Cheng,J., Zhao,S., Xu,M., Luo,Y., et al. (2014) Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc. Natl. Acad. Sci. U. S. A., 111, 5135–40.
4) Huang,S., Li,R., Zhang,Z., Li,L., Gu,X., Fan,W., Lucas,W.J., Wang,X., Xie,B., Ni,P., et al. (2009) The genome of the cucumber, Cucumis sativus L. Nat. Genet., 41, 1275–81.
5) Hellsten,U., Wright,K.M., Jenkins,J., Shu,S., Yuan,Y., Wessler,S.R., Schmutz,J., Willis,J.H. and Rokhsar,D.S. (2013) Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing. Proc. Natl. Acad. Sci. U. S. A., 110, 19478–82.
6) Tuskan,G.A., Difazio,S., Jansson,S., Bohlmann,J., Grigoriev,I., Hellsten,U., Putnam,N., Ralph,S., Rombauts,S., Salamov,A., et al. (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 313, 1596–604.
7) The Tomato Genome Consortium (2012) The tomato genome sequence provides insights into fleshy fruit evolution. Nature, 485, 635–41.
8) Hirakawa,H., Shirasawa,K., Miyatake,K., Nunome,T., Negoro,S., Ohyama,A., Yamaguchi,H., Sato,S., Isobe,S., Tabata,S., et al. (2014) Draft genome sequence of eggplant (Solanum melongena L.): the representative solanum species indigenous to the old world. DNA Res., 21, 649–60.
9) Xu,X., Pan,S., Cheng,S., Zhang,B., Mu,D., Ni,P., Zhang,G., Yang,S., Li,R., Wang,J., et al. (2011) Genome sequence and analysis of the tuber crop potato. Nature, 475, 189–95.
10) Jaillon,O., Aury,J.-M., Noel,B., Policriti,A., Clepet,C., Casagrande,A., Choisne,N., Aubourg,S., Vitulo,N., Jubin,C., et al. (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature, 449, 463–7.

Genomic data (release version 1.0):

Genome:

Assembly version 7

| Species | N50 (kb) | Count (×1,000) | Longest (kb) | Total length (Gb) | Completeness |
|---|---|---|---|---|---|
| N. attenuata | 187 | 253 | 981 | 2.26 | 81% |
| N. obtusifolia | 95 | 114 | 694 | 1.23 | 85% |
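N50, reported in the table above, is the length L such that sequences of length L or longer together contain at least half of the total assembly length. The minimal sketch below computes it from a list of sequence lengths; the table does not state whether N50 refers to scaffolds or contigs, and the function name and input are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (any consistent unit)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty list of positive numbers")


print(n50([100, 80, 50, 40, 30]))  # 80: 100 + 80 = 180 >= 300 / 2
```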

Gene annotation:

Genome annotation version 1.0

The gene models of N. attenuata were annotated using a combination of MAKER2 and AUGUSTUS. In total, 34,941 high-confidence gene models were predicted. The functions of these genes were predicted using Blast2GO, InterProScan and MapMan. In addition, a large set of gene models was manually corrected.

Expression data:

RNA-Seq data:

The RNA-Seq data used here contain expression profiles of 20 different tissues of Nicotiana attenuata: leaf (control/treated), root (treated), stem (treated), flower buds, opening flower, corolla (early/late), nectaries, ovary, style (selfed/outcrossed/without pollination), pollen tubes, pedicels, stigma, anthers and seeds (dry/watered/smoked). The raw dataset contains expression data for 22,637 expressed genes with a TPM greater than 5.
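TPM (transcripts per million) normalizes read counts first by transcript length and then by library size, so that the values of each library sum to one million. The sketch below uses this standard definition to show what the "TPM greater than 5" criterion refers to; the quantification tool and effective-length correction used for NaDH are not described here, and the function name and example inputs are hypothetical.

```python
def tpm(counts, lengths_kb):
    """Convert read counts of one library to TPM values.

    counts[i] is the number of reads assigned to gene i and lengths_kb[i]
    its (effective) transcript length in kilobases.
    """
    rates = [c / l for c, l in zip(counts, lengths_kb)]  # reads per kilobase
    total = sum(rates)
    if total == 0:
        raise ValueError("library has no assigned reads")
    return [r / total * 1e6 for r in rates]  # values sum to one million


values = tpm(counts=[500, 1000, 0], lengths_kb=[1.0, 4.0, 2.0])
expressed = [v > 5 for v in values]  # the "TPM greater than 5" criterion
print(expressed)
```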

The following table lists the RNA-Seq libraries:

| Tissue | Treatment/developmental stage | No. of expressed genes |
|---|---|---|
| Root | Rosette-stage plants, treated three times in leaves with 5 µL 1:1 diluted M. sexta oral secretion | 15,499 |
| Leaf | Rosette-stage plants, treated three times in leaves with 5 µL 1:1 diluted M. sexta oral secretion | 12,179 |
| Leaf | Rosette-stage plants, no treatment | 11,840 |
| Stem | Rosette-stage plants, treated three times in leaves with 5 µL 1:1 diluted M. sexta oral secretion | 14,682 |
| Corolla | Early developmental stage, no treatment | 13,662 |
| Corolla | Late developmental stage, no treatment | 13,486 |
| Stigma | Mature stigma, no treatment | 14,485 |
| Pollen tube | No treatment | 3,490 |
| Style | Mature style without pollination | 13,492 |
| Style | Mature style, pollinated with pollen from a different genotype | 13,365 |
| Style | Mature style, self-pollinated | 13,533 |
| Nectary | Mature nectary, no treatment | 12,928 |
| Anther | Mature anther, no treatment | 11,550 |
| Ovary | Mature ovary, no treatment | 13,960 |
| Pedicel | Mature pedicel, no treatment | 14,550 |
| Flower | Fully opened flowers, no treatment | 14,390 |
| Flower bud | Two early developmental stages of flowers, no treatment | 14,543 |
| Seed | Treated with liquid smoke | 9,227 |
| Seed | Treated with water | 8,872 |
| Seed | Dry seeds | 8,681 |

Further analyses were performed on log-transformed TPM values. These data were used for the eFP Browser and the co-expression network analysis.

Microarray data:

Currently, this database includes 222 microarrays from 10 different experiments. All expression values were log2-transformed, and the mean probe value across all samples was used. For more detailed information, see the description of each experiment in the eFP Browser. All microarray probes were mapped to their corresponding genes in the N. attenuata genome; only high-confidence probes were used.

The eFP Browser visualizes this data.

| Dataset ID | Genotype used for microarray | Treatment | Tissues | Developmental stage | No. of arrays | Data source | Reference |
|---|---|---|---|---|---|---|---|
| NaHER1 | WT, irHER1 | Wounding + OS | Leaves | Rosette stage | 6 | Molecular Ecology Department in MPI-CE | - |
| Cytokines and senescence | WT, SAG:IPT | M. sexta neonates feeding | Young rosette leaves | Early flowering stage | 12 | Molecular Ecology Department in MPI-CE | - |
| Coronatine spray on irAOC | irAOC | H2O ethanol spray, 1 µM coronatine spray | Corolla, pistil, nectary | Flower buds before opening | 6 | GSE52765 | 11) |
| NaMYB5 transcription factor | WT, irMYB5 | Wounding + OS | Leaves | Rosette stage | 6 | Molecular Ecology Department in MPI-CE | - |
| NaMYC2 transcription factors | EV, MYC2-VIGS, MYC2-like VIGS, MYC2-MYC2-like double VIGS | Wounding + OS | Leaves | Rosette stage | 12 | GSE45608 | 12) |
| N. attenuata under herbivore attack | WT | Control, wounding, wounding + OS | Leaves and roots | Rosette stage | 138 | GSE30287 | 13) |
| Volatile exposure - 6 h | WT, irLOX2/3, 35S::TPS10, irLOX2/3×35S::TPS10 | VOC exposure for 6 h | Leaves | Rosette stage | 12 | Molecular Ecology Department in MPI-CE | - |
| Volatile exposure - 30 min | WT, irLOX2 | VOC exposure for 30 min and GLV supplementation | Leaves | Rosette stage | 12 | Molecular Ecology Department in MPI-CE | - |
| WRKY3/6 | WT, irWRKY3, irWRKY6, irWRKY3/6 | M. sexta neonates feeding | Leaves | Rosette stage | 12 | Molecular Ecology Department in MPI-CE | - |
| WRKY9 | WT, irWRKY9 | M. sexta neonates feeding | Leaves | Rosette stage | 6 | Molecular Ecology Department in MPI-CE | - |

References:
11) Stitz,M., Hartl,M., Baldwin,I.T. and Gaquerel,E. (2014) Jasmonoyl-L-Isoleucine Coordinates Metabolic Networks Required for Anthesis and Floral Attractant Emission in Wild Tobacco (Nicotiana attenuata). Plant Cell, 26, 3964–83.
12) Woldemariam,M.G., Dinh,S.T., Oh,Y., Gaquerel,E., Baldwin,I.T. and Galis,I. (2013) NaMYC2 transcription factor regulates a subset of plant defense responses in Nicotiana attenuata. BMC Plant Biol., 13, 73.
13) Kim,S.G., Yon,F., Gaquerel,E., Gulati,J. and Baldwin,I.T. (2011) Tissue specific diurnal rhythms of metabolites and their regulation during herbivore attack in a native tobacco, Nicotiana attenuata. PLoS ONE, 6.

Metabolomic data:

Metabolites were analyzed in 14 different tissues of Nicotiana attenuata: leaf, root, stem, flower bud, corolla, pedicels, limb, nectaries, ovary, sepals, filament, style, anthers and seeds.

Data were obtained by UHPLC-ESI/qTOF-MS under the following conditions:
The column used was an Acclaim column (150 × 2.1 mm, particle size 2.2 μm) with a 4 mm × 4 mm i.d. guard column of the same material. The following binary gradient was applied using a Dionex Ultimate 3000 UHPLC system: 0–1 min, isocratic 90% (vol/vol) A (deionized water, 0.1% acetonitrile and 0.05% formic acid), 10% B (acetonitrile and 0.05% formic acid); 1–40 min, gradient phase to reach 10% A, 90% B; 40–45 min, isocratic 10% A, 90% B. The flow rate was 300 μL/min.

Eluted compounds were detected by a high-resolution MicroToF mass spectrometer (Bruker Daltonics, Bremen, Germany) equipped with an electrospray ionization source operating in positive ionization mode. Typical instrument settings were as follows: capillary voltage 4500 V, capillary exit 130 V, dry gas temperature 200 °C and dry gas flow 8 L/min. Ions were detected from m/z 50 to 1400 at a repetition rate of 1 Hz. Mass calibration was performed using sodium formate clusters (10 mM NaOH in 50/50% (v/v) isopropanol/water containing 0.2% formic acid). Raw data files were converted to netCDF format using the export function of the Data Analysis v4.0 software (Bruker Daltonics, Bremen, Germany).
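The gradient program above can be expressed as a small lookup that returns the solvent composition at any time point. This sketch assumes that %B changes linearly during the 1–40 min gradient phase, which the description does not state explicitly; the names `GRADIENT` and `percent_b` are hypothetical.

```python
# Gradient program from the method description as (time in min, % solvent B).
# Assumption: %B changes linearly between consecutive time points.
GRADIENT = [(0.0, 10.0), (1.0, 10.0), (40.0, 90.0), (45.0, 90.0)]


def percent_b(t_min: float) -> float:
    """Return the % of solvent B at time t_min by linear interpolation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-45 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: GRADIENT covers the whole range")


for t in (0.5, 20.5, 42.0):
    b = percent_b(t)
    print(f"{t:5.1f} min: {100 - b:.1f}% A, {b:.1f}% B")
```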

Further analyses were performed only on binary data.

Network analysis:

Gene-gene co-expression:

The RNA-Seq data described above were used to calculate expression similarity between genes. First, non-expressed genes were removed: genes without a TPM value greater than 5 in at least one tissue were excluded. Second, genes with nearly constant expression were removed: genes with an expression variance of less than 1 were excluded.
The resulting dataset contains the expression of 15,311 informative genes.
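The two filtering steps can be sketched as follows. The source does not specify whether the variance is computed on raw or log-transformed TPM values, nor which variance estimator is used; this sketch assumes the sample variance of log2(TPM + 1) values, and the function and constant names are hypothetical.

```python
import math
from statistics import variance

TPM_CUTOFF = 5.0       # keep genes with TPM > 5 in at least one tissue
VARIANCE_CUTOFF = 1.0  # drop genes whose expression variance is below 1


def informative_genes(tpm_table):
    """Filter a {gene_id: [TPM per tissue]} table in the two steps above.

    Assumption: the variance is the sample variance of log2(TPM + 1).
    Returns {gene_id: [log2(TPM + 1) per tissue]} for the kept genes.
    """
    kept = {}
    for gene, values in tpm_table.items():
        # Step 1: drop genes that never exceed 5 TPM in any tissue.
        if max(values) <= TPM_CUTOFF:
            continue
        # Step 2: drop genes whose (log-scale) expression barely varies.
        logged = [math.log2(v + 1) for v in values]
        if variance(logged) < VARIANCE_CUTOFF:
            continue
        kept[gene] = logged
    return kept


kept = informative_genes({
    "g1": [0, 1, 2, 3],       # never above 5 TPM: removed in step 1
    "g2": [50, 52, 49, 51],   # nearly constant: removed in step 2
    "g3": [1, 10, 100, 300],  # variable: kept
})
print(list(kept))
```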

Gene expression similarity is calculated across all 20 tissues. The similarity measure can be chosen from the Gini correlation coefficient (GCC) [Ma et al], Pearson correlation and Spearman correlation. Tissue specificity is calculated across only four tissues, selected for their biological relevance: treated leaf, treated root, flower bud and dry seed. Tissue specificity is calculated in two ways: as Tau and as Shannon entropy [Gerstberger et al].
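The similarity and specificity measures named above can be sketched in plain Python. These are textbook definitions, not NaDH's code: the Gini correlation is written as cov(x, rank(y)) / cov(x, rank(x)), which is asymmetric, and how NaDH combines the two directions is not stated here. Tau and Shannon entropy follow their usual definitions; which expression scale (raw or log TPM) is passed to them is left to the caller as an assumption.

```python
import math


def ranks(x):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)


def pearson(x, y):
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))


def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


def gini_corr(x, y):
    """Gini correlation of x with respect to y (not symmetric in x and y)."""
    return cov(x, ranks(y)) / cov(x, ranks(x))


def tau(x):
    """Tau: 0 for uniform expression, 1 for expression in a single tissue."""
    m = max(x)
    return sum(1 - v / m for v in x) / (len(x) - 1)


def shannon_entropy(x):
    """Entropy (bits) of expression proportions; low values = tissue specific."""
    total = sum(x)
    return -sum((v / total) * math.log2(v / total) for v in x if v > 0)


profile = [120.0, 3.0, 1.0, 0.5]  # hypothetical values in the four tissues
print(round(tau(profile), 3), round(shannon_entropy(profile), 3))
```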

Metabolite-metabolite co-expression:

The metabolomic data described above was used to calculate expression similarity between metabolites.
Because metabolite abundances are difficult to quantify, the expression values were converted to binary values: they were ZMAD-transformed, and a cutoff of 2 was applied to convert them to binary data. The expression similarity between metabolites was then calculated across the 14 tissues using the Ochiai coefficient.
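A minimal sketch of the binarization and the Ochiai coefficient follows. Assumptions not stated in the text: ZMAD is computed per metabolite across tissues as (x − median) / MAD with an unscaled MAD (no 1.4826 factor), a tissue counts as "present" only when its score is strictly greater than the cutoff, and profiles whose MAD is zero are rejected rather than handled in some other way. The function names are hypothetical.

```python
import math
from statistics import median


def zmad(values):
    """Median/MAD z-scores (assumption: unscaled MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; ZMAD scores are undefined")
    return [(v - med) / mad for v in values]


def binarize(values, cutoff=2.0):
    """1 where the ZMAD score exceeds the cutoff, else 0."""
    return [1 if z > cutoff else 0 for z in zmad(values)]


def ochiai(a, b):
    """Ochiai coefficient |A and B| / sqrt(|A| * |B|) for two 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    n_a, n_b = sum(a), sum(b)
    if n_a == 0 or n_b == 0:
        return 0.0
    return both / math.sqrt(n_a * n_b)


# Hypothetical intensities of two metabolites in six tissues.
m1 = binarize([1, 2, 3, 4, 5, 40])
m2 = binarize([2, 1, 4, 3, 30, 50])
print(m1, m2, round(ochiai(m1, m2), 3))
```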

Gene-metabolite & metabolite-gene co-expression:

The same binary metabolite data as for the metabolite-metabolite co-expression network were used, again to overcome the difficulty of quantifying metabolite abundances. In addition, the gene expression data based on RNA-Seq (described above) were also ZMAD-transformed, and a cutoff of 3 was applied to convert them to binary data.
For the comparison, 12 tissues were used from each data set: anthers, nectaries, ovary, pedicels, root treated, leaf (control and treated merged), stem treated, seed (dry, watered and smoked merged), corolla (early and late merged), style (selfed, outcrossed and no pollination merged), open flower and flower bud from the RNA-Seq data, and anthers, flower bud, corolla, leaf, limb, nectaries, ovary, pedicels, root, seed, stem and style from the metabolomic data. The expression similarity between genes and metabolites was calculated across these 12 tissues using the Ochiai coefficient.
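The preparation of the gene data for this comparison could be sketched as below. The merge groups follow the list above, but the library labels are hypothetical, merging is assumed to mean averaging the member libraries, and the ZMAD uses an unscaled MAD; none of this is confirmed by the text. The resulting binary gene profiles would then be compared to the binary metabolite profiles with the Ochiai coefficient.

```python
from statistics import mean, median

# Merge groups from the text; library labels are hypothetical.
# Assumption: "merged" means the mean of the member libraries.
MERGED = {
    "leaf": ["leaf control", "leaf treated"],
    "seed": ["seed dry", "seed watered", "seed smoked"],
    "corolla": ["corolla early", "corolla late"],
    "style": ["style selfed", "style outcrossed", "style no pollination"],
}
# RNA-Seq tissues used unchanged (stigma and pollen tubes are not used).
KEPT = ["anthers", "nectaries", "ovary", "pedicels", "root treated",
        "stem treated", "open flower", "flower bud"]


def merge_tissues(expr):
    """Collapse one gene's {library: value} dict to the 12 compared tissues."""
    merged = {name: mean([expr[lib] for lib in libs])
              for name, libs in MERGED.items()}
    for name in KEPT:
        merged[name] = expr[name]
    return merged


def binarize_gene(expr, cutoff=3.0):
    """ZMAD-binarize a merged gene profile using the gene cutoff of 3."""
    merged = merge_tissues(expr)
    values = list(merged.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; ZMAD scores are undefined")
    return {tissue: int((v - med) / mad > cutoff)
            for tissue, v in merged.items()}
```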