Data information

The Nicotiana attenuata data hub (NaDH) combines different data sources for analysis, such as genomic, transcriptomic and metabolomic data. Additionally, NaDH provides unique tools to analyze these large data sets. Here, we describe how the data were collected and how the analyses can be performed.

Genomic data (release version 2.0):

Genome:

Assembly version r2.0

For a detailed description of the assembly, see the corresponding genome paper (manuscript in submission).

Gene annotation:

Genome annotation version r2.0

In total, 33,449 high-confidence gene models were predicted for N. attenuata. The functions of these genes were predicted using Blast2GO, InterProScan and MapMan. Additionally, a large set of gene models was manually corrected.

Phylogeny:

Genes of 11 published dicot species were clustered into homologous groups (HG) based on their sequence similarity. In total, the genes were clustered into 23,340 HG with at least two homologous sequences each. The phylogenetic trees for all HG were constructed using an in-house pipeline including PhyML and jModelTest. Furthermore, the structures of all gene trees were analyzed to detect duplication events. This step was performed using the species-overlapping algorithm implemented in Notung 2.6, and the results were post-processed using an in-house pipeline to detect the most recent gene duplication events with high support values.

The following table shows the included plant species:

Species	Version	# of gene models	Reference	URL
N. attenuata	r2.0	33,449	1)	http://nadh.ice.mpg.de/NaDH/download/overview
N. obtusifolia	r1.0	27,911	1)	http://nadh.ice.mpg.de/NaDH/download/overview
A. thaliana	TAIR 10	27,416	2)	http://phytozome.jgi.doe.gov/arabidopsis
C. annuum	v2.0	35,336	3)	http://peppersequence.genomics.cn/page/species/download.jsp
C. sativus	v1.0	21,503	4)	http://phytozome.jgi.doe.gov/cucumber
M. guttatus	v2.0	28,140	5)	http://phytozome.jgi.doe.gov/mimulus
P. trichocarpa	v3.0	41,335	6)	http://phytozome.jgi.doe.gov/poplar
S. lycopersicum	ITAG2.3	34,727	7)	http://phytozome.jgi.doe.gov/tomato
S. melongena	v2.5.1	42,035	8)	ftp://ftp.kazusa.or.jp/pub/eggplant/
S. tuberosum	v3.4	35,119	9)	http://phytozome.jgi.doe.gov/potato
V. vinifera	Genoscope.12X	26,346	10)	http://phytozome.jgi.doe.gov/grape

References:
1) Xu S, Brockmöller T, Navarro-Quezada A, Kuhl H, Gase K, Ling Z, et al. Wild tobacco genomes reveal the evolution of prolific nicotine production. Submitted.
2) Swarbreck,D., Wilks,C., Lamesch,P., Berardini,T.Z., Garcia-Hernandez,M., Foerster,H., Li,D., Meyer,T., Muller,R., Ploetz,L., et al. (2008) The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res., 36, D1009–14.
3) Qin,C., Yu,C., Shen,Y., Fang,X., Chen,L., Min,J., Cheng,J., Zhao,S., Xu,M., Luo,Y., et al. (2014) Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc. Natl. Acad. Sci. U. S. A., 111, 5135–40.
4) Huang,S., Li,R., Zhang,Z., Li,L., Gu,X., Fan,W., Lucas,W.J., Wang,X., Xie,B., Ni,P., et al. (2009) The genome of the cucumber, Cucumis sativus L. Nat. Genet., 41, 1275–81.
5) Hellsten,U., Wright,K.M., Jenkins,J., Shu,S., Yuan,Y., Wessler,S.R., Schmutz,J., Willis,J.H. and Rokhsar,D.S. (2013) Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing. Proc. Natl. Acad. Sci. U. S. A., 110, 19478–82.
6) Tuskan,G.A., Difazio,S., Jansson,S., Bohlmann,J., Grigoriev,I., Hellsten,U., Putnam,N., Ralph,S., Rombauts,S., Salamov,A., et al. (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science, 313, 1596–604.
7) Tomato Genome Consortium (2012) The tomato genome sequence provides insights into fleshy fruit evolution. Nature, 485, 635–41.
8) Hirakawa,H., Shirasawa,K., Miyatake,K., Nunome,T., Negoro,S., Ohyama,A., Yamaguchi,H., Sato,S., Isobe,S., Tabata,S., et al. (2014) Draft genome sequence of eggplant (Solanum melongena L.): the representative solanum species indigenous to the old world. DNA Res., 21, 649–60.
9) Xu,X., Pan,S., Cheng,S., Zhang,B., Mu,D., Ni,P., Zhang,G., Yang,S., Li,R., Wang,J., et al. (2011) Genome sequence and analysis of the tuber crop potato. Nature, 475, 189–95.
10) Jaillon,O., Aury,J.-M., Noel,B., Policriti,A., Clepet,C., Casagrande,A., Choisne,N., Aubourg,S., Vitulo,N., Jubin,C., et al. (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature, 449, 463–7.

Genomic data (release version 1.0):

Genome:

Assembly version 7

Species	N50 (kb)	Count (x1000)	Longest (kb)	Total length (Gb)	Completeness
N. attenuata	187	253	981	2.26	81%
N. obtusifolia	95	114	694	1.23	85%

Gene annotation:

Genome annotation version 1.0

The gene models of N. attenuata were annotated using a combination of MAKER2 and AUGUSTUS. In total, 34,941 high-confidence gene models were predicted. The functions of these genes were predicted using Blast2GO, InterProScan and MapMan. Additionally, a large set of gene models was manually corrected.

Expression data:

RNA-Seq data:

The RNA-Seq data contain expression profiles of 20 different tissues of Nicotiana attenuata: leaf control/treated, root treated, stem treated, flower buds, opening flower, corolla early/late, nectaries, ovary, style selfed/outcrossed/without pollination, pollen tubes, pedicels, stigma, anthers, seeds dry/watered/smoked. The raw dataset contains expression data for 22,637 expressed genes with a TPM greater than 5.

The following table shows the RNA libraries:

Tissue	Treatment/development stage	# of expressed genes
Root	Rosette stage plants, treated with 5 µL 1:1 diluted M. sexta oral secretion three times in leaves	15,499
Leaf	Rosette stage plants, treated with 5 µL 1:1 diluted M. sexta oral secretion three times in leaves	12,179
Leaf	Rosette stage plants, no treatment	11,840
Stem	Rosette stage plants, treated with 5 µL 1:1 diluted M. sexta oral secretion three times in leaves	14,682
Corolla	Early developmental stage, no treatment	13,662
Corolla	Late developmental stage, no treatment	13,486
Stigma	Mature stigma, no treatment	14,485
Pollen tube	No treatment	3,490
Style	Mature style without pollination	13,492
Style	Mature style, pollinated with pollen from a different genotype	13,365
Style	Mature style, self-pollinated	13,533
Nectary	Mature nectary, no treatment	12,928
Anther	Mature anther, no treatment	11,550
Ovary	Mature ovary, no treatment	13,960
Pedicel	Mature pedicel, no treatment	14,550
Flower	Fully opened flowers, no treatment	14,390
Flower bud	Two early developmental stages of flowers, no treatment	14,543
Seed	Treated with liquid smoke	9,227
Seed	Treated with water	8,872
Seed	Dry seeds	8,681

Further analyses were performed with log-transformed TPM values. These data were used for the eFP Browser and the co-expression network analysis.
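As a minimal sketch of the threshold and transformation mentioned above, the following Python snippet counts the genes above the TPM threshold of 5 in each library and log-transforms the values. The base-2 logarithm, the pseudocount of 1 and the toy matrix are assumptions made for illustration; the text only states that the values were log-transformed.

```python
import numpy as np

# Toy TPM matrix: rows are genes, columns are libraries (hypothetical values).
tpm = np.array([
    [0.0, 12.0, 150.0],
    [3.0,  4.9,   5.0],
    [8.0,  0.5,  30.0],
])

# A gene counts as expressed in a library if its TPM is greater than 5.
expressed = tpm > 5
expressed_per_library = expressed.sum(axis=0)

# Assumed transformation: log2 with a pseudocount of 1, so TPM = 0 maps to 0.
log_tpm = np.log2(tpm + 1.0)
```

With a pseudocount, genes with zero TPM stay defined after the transformation; a different base or pseudocount would only rescale or shift the values.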

Microarray data:

Currently, this database includes more than 222 microarrays from 10 different experiments. All expression values were log2 transformed and the mean of the probe values across all samples was used. For more detailed information, see the description part of the experiments in the eFP Browser. All microarray probes were mapped to their corresponding genes in the N. attenuata genome; only high-confidence probes were used.

The eFP Browser visualizes this data.

Dataset ID	Genotype used for microarray	Treatment	Tissues	Developmental stage	# of arrays	Data source	Reference
NaHER1	WT, irHER1	Wounding + OS	Leaves	Rosette stage	6	Molecular Ecology Department in MPI-CE	-
Cytokines and senescence	WT, SAG:IPT	M. sexta neonates feeding	Young rosette leaves	Early flowering stage	12	Molecular Ecology Department in MPI-CE	-
Coronatine spray on irAOC	irAOC	H2O ethanol spray, 1µM coronatine spray	Corolla, pistil, nectary	Flower buds before opening	6	GSE52765	11)
NaMYB5 transcription factor	WT, irMYB5	Wounding + OS	Leaves	Rosette stage	6	Molecular Ecology Department in MPI-CE	-
NaMYC2 transcription factors	EV, MYC2-VIGS, MYC2-like VIGS, MYC2-MYC2-like double VIGS	Wounding + OS	Leaves	Rosette stage	12	GSE45608	12)
N. attenuata under herbivore attack	WT	Control, wounding, wounding + OS	Leaves and roots	Rosette stage	138	GSE30287	13)
Volatile exposure - 6 h	WT, irLOX2/3, 35s::TPS10, irlox2/3x35S::TPS10	VOC exposure for 6 h	Leaves	Rosette stage	12	Molecular Ecology Department in MPI-CE	-
Volatile exposure - 30 min	WT, irLOX2	VOC exposure for 30 min and GLV supplementation	Leaves	Rosette stage	12	Molecular Ecology Department in MPI-CE	-
WRKY3/6	WT, irWRKY3, irWRKY6, irWRKY3/6	M. sexta neonates feeding	Leaves	Rosette stage	12	Molecular Ecology Department in MPI-CE	-
WRKY9	WT, irWRKY9	M. sexta neonates feeding	Leaves	Rosette stage	6	Molecular Ecology Department in MPI-CE	-

References:
11) Stitz,M., Hartl,M., Baldwin,I.T., Gaquerel,E. (2014) Jasmonoyl-l-Isoleucine Coordinates Metabolic Networks Required for Anthesis and Floral Attractant Emission in Wild Tobacco (Nicotiana attenuata). The Plant Cell. 26:10:3964-3983.
12) Woldemariam,M.G., Dinh,S.T., Oh,Y., Gaquerel,E., Baldwin,I.T., Galis,I. (2013) NaMYC2 transcription factor regulates a subset of plant defense responses in Nicotiana attenuata. BMC Plant Biology. 13:73.
13) Kim,S.G., Yon,F., Gaquerel,E., Gulati,J., Baldwin,I.T. (2011) Tissue specific diurnal rhythms of metabolites and their regulation during herbivore attack in a native tobacco, Nicotiana attenuata. PLoS ONE. 6.

Metabolomic data

Metabolites were analyzed in 14 different tissues of Nicotiana attenuata: leaf, root, stem, flower bud, corolla, pedicels, limb, nectaries, ovary, sepals, filament, style, anthers and seeds.

Data were obtained with UHPLC-ESI/qTOF-MS under the following conditions:
The column used was an Acclaim column (150×2.1 mm, particle size 2.2 μm) with a 4 mm×4 mm i.d. guard column of the same material. The following binary gradient was applied using a Dionex Ultimate 3000 UHPLC system: 0–1 min, isocratic 90% (vol/vol) A (de-ionized water, 0.1% acetonitrile, and 0.05% formic acid), 10% B (acetonitrile and 0.05% formic acid); 1–40 min, gradient phase to reach 10% A, 90% B; 40–45 min, isocratic 10% A, 90% B. The flow rate was 300 μL/min. Eluted compounds were detected by a high-resolution MicroToF mass spectrometer (Bruker Daltonics, Bremen, Germany) equipped with an electrospray ionization source operating in positive ionization mode. Typical instrument settings were as follows: capillary voltage 4500 V, capillary exit 130 V, dry gas temperature 200°C, dry gas flow of 8 L/min. Ions were detected from m/z 50 to 1400 at a repetition rate of 1 Hz. Mass calibration was performed using sodium formate clusters (10 mM solution of NaOH in 50/50% v/v isopropanol/water containing 0.2% formic acid). Raw data files were converted to netCDF format using the export function of the Data Analysis v4.0 software (Bruker Daltonics, Bremen, Germany).
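To make the gradient programme easier to follow, the sketch below returns the percentage of solvent B at a given time. The time points and percentages come from the description above; the linear shape of the 1–40 min ramp and the function name `percent_b` are assumptions of this illustration.

```python
def percent_b(t_min):
    """Percentage of solvent B at time t_min (minutes) in the 45-min run."""
    if not 0 <= t_min <= 45:
        raise ValueError("time outside the 0-45 min gradient programme")
    if t_min <= 1:
        # 0-1 min: isocratic 90% A / 10% B
        return 10.0
    if t_min <= 40:
        # 1-40 min: ramp from 10% B to 90% B (assumed to be linear)
        return 10.0 + (t_min - 1) * (90.0 - 10.0) / (40 - 1)
    # 40-45 min: isocratic 10% A / 90% B
    return 90.0
```

The percentage of solvent A at any time is simply 100 minus the returned value.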

Further analyses were performed only with binary data.

Network analysis:

Gene-gene co-expression:

The RNA-Seq data described above were used to calculate expression similarity between genes. In a first step, non-expressed genes were removed, that is, genes without a TPM value greater than 5 in at least one tissue. In a second step, constantly expressed genes were removed, that is, genes with an expression variance of less than 1.
The resulting dataset contains the expression of 15,311 informative genes.
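The two filtering steps can be sketched as follows. This is an illustration under stated assumptions rather than the NaDH implementation: the function name `filter_informative` is hypothetical, and computing the variance on log2(TPM + 1) values is an assumption, since the text does not state the scale on which the variance was measured.

```python
import numpy as np

def filter_informative(tpm, min_tpm=5.0, min_variance=1.0):
    """Return a boolean mask of genes that pass both filters.

    tpm: 2-D array-like, rows are genes and columns are tissues.
    """
    tpm = np.asarray(tpm, dtype=float)
    # Step 1: keep genes with TPM > min_tpm in at least one tissue.
    is_expressed = (tpm > min_tpm).any(axis=1)
    # Step 2: drop constantly expressed genes. The variance is computed on
    # log2(TPM + 1) values here, which is an assumption of this sketch.
    variance = np.log2(tpm + 1.0).var(axis=1, ddof=1)
    is_variable = variance >= min_variance
    return is_expressed & is_variable

# Toy example: an unexpressed gene, a constant gene and a variable gene.
toy = [
    [0, 1, 2, 3],
    [100, 100, 100, 100],
    [0, 50, 500, 5],
]
mask = filter_informative(toy)
```

Only the third toy gene passes: the first never exceeds the TPM threshold and the second shows no variation across tissues.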

The gene expression similarity is calculated across all 20 tissues. As a measure of gene expression similarity, users can choose among the Gini correlation coefficient (GSS) [Ma et al], Pearson correlation and Spearman correlation. Tissue specificity is calculated across only four tissues, which were selected based on biological relevance: treated leaf and root, flower bud and dry seed. Tissue specificity is calculated in two ways: Tau and Shannon entropy [Gerstberger et al].
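The two tissue-specificity measures can be sketched with their commonly used definitions: Tau as the mean of 1 - x_i / max(x) over n - 1 tissues, and Shannon entropy of the expression values normalized to sum to 1. Whether NaDH uses exactly these variants (for example, the same logarithm base or input scale) is not stated in the text, so the functions below are an assumption-laden illustration.

```python
import numpy as np

def tau(expr):
    """Tau index: 0 for uniform expression, 1 for single-tissue expression."""
    x = np.asarray(expr, dtype=float)
    if x.max() <= 0:
        raise ValueError("gene has no expression in any tissue")
    x_hat = x / x.max()
    return float((1.0 - x_hat).sum() / (x.size - 1))

def shannon_entropy(expr):
    """Entropy in bits of the expression distribution across tissues.

    0 means expression in a single tissue; log2(n) means uniform expression.
    """
    x = np.asarray(expr, dtype=float)
    total = x.sum()
    if total <= 0:
        raise ValueError("gene has no expression in any tissue")
    p = x / total
    p = p[p > 0]  # terms with p = 0 contribute nothing to the entropy
    return float(-(p * np.log2(p)).sum())

# Four tissues, as in the text: treated leaf, treated root, flower bud, dry seed.
specific = [0.0, 0.0, 0.0, 10.0]
uniform = [5.0, 5.0, 5.0, 5.0]
```

Note that the two measures point in opposite directions: a tissue-specific gene has a high Tau but a low entropy.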

Metabolite-metabolite co-expression:

The metabolomic data described above were used to calculate expression similarity between metabolites.
To overcome the limitations in quantifying metabolite expression, the expression values were transformed into binary values. To do so, the expression was ZMAD transformed and a cutoff of 2 was applied to convert it to binary data. The expression similarity between metabolites was calculated using the Ochiai coefficient across 14 tissues.
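A minimal sketch of the binarization and similarity calculation is given below. It assumes that ZMAD means (x - median) / MAD computed per metabolite across tissues without a scaling constant, and that values strictly above the cutoff are set to 1; neither detail is specified in the text. The Ochiai coefficient for two binary vectors is the number of shared 1s divided by the square root of the product of their individual counts of 1s.

```python
import numpy as np

def zmad(values):
    """Median/MAD-based z-score of one feature across tissues (assumed form)."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # No spread around the median: treat every tissue as background.
        return np.zeros_like(x)
    return (x - median) / mad

def binarize(values, cutoff=2.0):
    """True where the ZMAD score is strictly above the cutoff (assumption)."""
    return zmad(values) > cutoff

def ochiai(a, b):
    """Ochiai coefficient between two binary presence vectors."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = np.sqrt(float(a.sum()) * float(b.sum()))
    if denom == 0:
        return 0.0  # convention used here when one vector is all zeros
    return float((a & b).sum() / denom)

# Toy intensities for one metabolite across five tissues.
calls = binarize([1.0, 2.0, 3.0, 4.0, 100.0])
```

For gene expression data the same `binarize` function could be called with `cutoff=3.0`.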

Gene-metabolite & metabolite-gene co-expression:

The same binary data from the metabolite-metabolite co-expression network were used to overcome the limitations in quantifying metabolite expression. Additionally, the gene expression data based on RNA-Seq (described above) were also ZMAD transformed and a cutoff of 3 was applied to convert the expression to binary data.
Only 12 tissues from each data set were used for comparison: anthers, nectaries, ovary, pedicels, root treated, leaf (control and treated merged), stem treated, seed (dry, watered and smoked merged), corolla (early and late merged), style (selfed, outcrossed and no pollination merged), open flower and flower bud from the RNA-Seq data, and anthers, flower bud, corolla, leaf, limb, nectaries, ovary, pedicels, root, seed, stem and style from the metabolomic data. The expression similarity between genes and metabolites was calculated using the Ochiai coefficient across these 12 tissues.
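The grouping of the RNA-Seq libraries into the 12 tissues listed above can be written down as a simple mapping. The library identifiers are hypothetical labels, and merging by the arithmetic mean of the TPM values is an assumption of this sketch, since the text does not say how merged libraries were combined.

```python
from statistics import mean

# Merged tissue -> RNA-Seq libraries, following the grouping in the text.
# The library identifiers are hypothetical labels, not NaDH names.
TISSUE_GROUPS = {
    "anthers": ["anthers"],
    "nectaries": ["nectaries"],
    "ovary": ["ovary"],
    "pedicels": ["pedicels"],
    "root": ["root_treated"],
    "leaf": ["leaf_control", "leaf_treated"],
    "stem": ["stem_treated"],
    "seed": ["seed_dry", "seed_watered", "seed_smoked"],
    "corolla": ["corolla_early", "corolla_late"],
    "style": ["style_selfed", "style_outcrossed", "style_unpollinated"],
    "open_flower": ["open_flower"],
    "flower_bud": ["flower_bud"],
}

def merge_libraries(tpm_by_library):
    """Collapse the per-library TPM values of one gene into the 12 tissues.

    Merging by the arithmetic mean is an assumption of this sketch.
    Libraries not listed in TISSUE_GROUPS (pollen tubes, stigma) are ignored.
    """
    return {
        tissue: mean(tpm_by_library[lib] for lib in libraries)
        for tissue, libraries in TISSUE_GROUPS.items()
    }
```

The merged 12-value profile of each gene would then be binarized and compared with the binary metabolite profiles across the same tissues.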