An elastic network model to identify characteristic stress response genes

June 19, 2017 | Autor: H. Ellinger-ziege... | Categoría: Biological Sciences, Gene Regulatory Networks, CHEMICAL SCIENCES, Biological markers, Physiological Stress Markers, Biology Chemistry, Gene Expression Regulation, Gene expression profiling, Biology Chemistry, Gene Expression Regulation, Gene expression profiling

Share Embed

Laporkan tautan ini

Descripción

Computational Biology and Chemistry 34 (2010) 193–202

Contents lists available at ScienceDirect

Computational Biology and Chemistry journal homepage: www.elsevier.com/locate/compbiolchem

Research article

An elastic network model to identify characteristic stress response genes Sebastian Schneckener a,1 , Linus Görlitz a,1 , Heidrun Ellinger-Ziegelbauer b , Hans-Jürgen Ahr b , Andreas Schuppert a,c,∗ a b c

PT-AS-SBCS, Bayer Technology Services, 51368 Leverkusen, Germany Department of Special Toxicology, Bayer Schering Pharma AG, 42096 Wuppertal, Germany RWTH, 52056 Aachen, Germany

a r t i c l e

i n f o

Article history: Received 16 November 2009 Received in revised form 15 June 2010 Accepted 16 June 2010 Keywords: Gene expression Microarray data Toxicology prediction Stress response Pharmacology Drug development

a b s t r a c t Exposing eukaryotic cells to a toxic compound and subsequent gene expression proﬁling may allow the prediction of selected toxic effects based on changes in gene expression. This objective is complicated by the observation that compounds with different modes of toxicity cause similar changes in gene expression and that a global stress response affects many genes. We developed an elastic network model of global stress response with nodes representing genes which are connected by edges of graded coexpression. The expression of only few genes have to be known to model the global stress response of all but a few atypical responder genes. Those required genes and the atypical response genes are shown to be good biomarker for tox predictions. In total, 138 experiments and 13 different compounds were used to train models for different toxicity classes. The deduced biomarkers were shown to be biologically plausible. A neural network was trained to predict the toxic effects of compounds from proﬁling experiments. On a validation data set of 189 experiments with 16 different compounds the accuracy of the predictions was assessed: 14 out of 16 compounds have been classiﬁed correctly. Derivation of model based biomarkers through the elastic network approach can naturally be extended to other areas beyond toxicology since subtle signals against a broad response background are common in biological studies. © 2010 Elsevier Ltd. All rights reserved.

1. Introduction The state of a biological cell is well approximated by a genome-wide gene expression proﬁle—frequently measured by microarrays. A search in the literature reveals thousands of gene expression proﬁling experiments in recent years (Barrett et al., 2009). The popularity is due to the efﬁciency of the technology in terms of the number of measurements per experiment and the cost per measurement. It is common practice to use gene expression proﬁling for comparing the states of two cell populations that have been exposed to different conditions. This type of experiment is used to delineate differentially expressed probesets, genes or pathways. The result allows speculation about properties of the different conditions, reasoning about the functions of differentially expressed genes, or postulation of biomarkers as predictors of the

∗ Corresponding author. E-mail addresses: [email protected] (S. Schneckener), [email protected] (L. Görlitz), [email protected] (H. Ellinger-Ziegelbauer), [email protected] (H.-J. Ahr), [email protected] (A. Schuppert). 1 These authors contributed equally to this work. 1476-9271/$ – see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.compbiolchem.2010.06.004

applied conditions. A straightforward approach to discover differentially expressed genes is the use of p-values from a t-test (or ANOVA if more than two conditions are to be compared) as the criterion. Several advancements upon this basic approach have been developed, e.g. (Kadota et al., 2008). Performing such an analysis for full genome expression data requires correcting the p-values for multiple testing. Common approaches use the Bonferroni (Gordon et al., 2007) or Benjamini Hochberg FDR correction (Benjamini and Hochberg, 1995; Storey and Tibshirani, 2003). For many biological conditions under examination, the level of change to the transcriptome is quite minute, making it difﬁcult to detect differentially expressed genes at all, or, if tightly coregulated genes alter their expression level, the response is very broad (Khil and CameriniOtero, 2002; Luscombe et al., 2004), involving hundreds of genes. In particular, the situation with large numbers of differentially expressed genes is difﬁcult to interpret. Often the affected genes are spread over many different pathways or biological processes. While it is possible to identify pathways that are connected to the biological question studied, many other pathways are not directly related. The obvious challenge is to generate new hypotheses from this type of result. A gene-wise or probeset-wise analysis of expression data has the drawback that it does not take into account the coregulation structure of gene expression data. The coregulation structure is a

194

S. Schneckener et al. / Computational Biology and Chemistry 34 (2010) 193–202

fundamental property of biological networks (Stuart et al., 2003) and e.g. caused by a set of transcription factors binding sites that is shared among coregulated genes (Brivanlou and Darnell, 2002; Blais and Dynlacht, 2005). To improve statistical inference from microarray studies, the complex dependence structure between genes can be embraced. If several genes show differential gene expression and belong to the same biological theme, e.g. metabolic pathway, weak statistical signals of single genes sum up to more solid evidence on the level of gene sets (Subramanian et al., 2005). That method, commonly referred to as gene set enrichment analysis, relies on gene lists ranked according to p-values from a t-test statistic from gene set comparisons. In practice, these approaches have proven to be very powerful with the clear disadvantage of requiring prior knowledge on sets of related genes. 1.1. An elastic network model for genome-wide stress response If the response of a cell to external stress is measured by gene expression proﬁling, often a large fraction of genes shows differential expression, despite a distinct and localised mode of toxicity. The grade of coherent response between two different genes is governed by their degree of coregulated in the cellular stress reaction program. This article regards gene expression in close analogy to linear network theory. Genes are represented as nodes of a weighted network. The links between genes are weighed by their pairwise coexpression if stress is put on some nodes. A change in environmental conditions thus leads to an adjustment of the expression levels of many genes. The expression proﬁle x is modeled as x =A·u

(1)

with coexpression matrix A and a perturbation vector u. A detailed derivation of the model can be found in Appendix A. Consequently, searching for good biomarkers which appropriately characterize the response must exploit detailed knowledge of the interaction structure of genes. It is well known that this complex coregulation can be inferred from multiple measurements by calculating the coexpression matrix (Gardner et al., 2003). Experimentally inferring all entries of A requires an unrealistic number of experiments. It is known that a cell’s genome-wide stress response is caused by a small number of stress response modes (López-Maury et al., 2008). In eukaryotes such modes may be e.g. related to anti-oxidation, heat shock, or energy generation (Girardot et al., 2004). Each mode is represented by several coexpressed genes. In order to characterize the stress response of a cell it is thus sufﬁcient to reconstruct only those few modes. In this paper, we develop a method which focuses on the identiﬁcation of these stress response modes. More speciﬁcally, given two sets of cells (controls and stressed cells) the expression value xi of a probeset from stressed cells is modeled as the sum of a characteristic expression value x¯ i,control in controls, a shift component xi,char induced by applying stress and a noise component i . xi = x¯ i,control + xi,char +

i

(2)

The standard deviation of the noise component i is denoted by i . The noise component, though a mixture of technical and biological noise, is solely interpreted as biological. As the relative inﬂuence of technical noise decreases with increasing expression value it is advisable to restrict analysis to highly expressed probesets. Using Eq. (1) gives a linear model for the global stress response xchar which is dominated by a small number of response modes. Given the validity of the linear model and due to the coexpression among the genes within a response mode, the shift component xi,char can be represented by few probesets. These representative probesets are identiﬁed by performing a forward stagewise feature selection

procedure with probesets ranked by least-angle regression (LARS) (Efron et al., 2004) according to their importance for accurately modeling the characteristic shift component. This results in a vector u with the minimum number of nonzero entries required for appropriate modeling. The linear model is limited in that it does not cover on/off regulation of pathways or feedback loops that may sharply amplify the expression of pathways. Therefore, probesets whose expression value signiﬁcantly deviates from the linear model indicate changes in the coregulation structure of the respective genes induced by non-generic stress response mechanisms. These outlier probesets and those having a nonzero coefﬁcient in the perturbation vector u completely describe the cellular response. The model provides a background allowing the quantitative separation of the experimental, genome-wide data into the contributions of the generic core response mechanisms and the contributions of compound-speciﬁc, local or non-linear atypical stress response mechanisms (López-Maury et al., 2008). 1.2. Objectives of the study In toxicology, it is common praxis to assess the carcinogenicity of a chemical compound with a 2-year rodent bioassay. As a shortcut, it is envisioned to use gene expression proﬁling after as little as 2 weeks of experimentation. It is assumed that an early cellular response is predictive for the toxicological outcome. The toxicity classes under consideration, here genotoxic carcinogen and non-genotoxic carcinogen, display similar, genome-wide stress responses. Hence it is important to develop a method that can delineate speciﬁc and general aspects of stress responses. As outlined above, a linear elastic network model allows to identify markers for general and speciﬁc stress responses. We used this approach to analyze data from a toxicogenomic experiment (Ellinger-Ziegelbauer et al., 2008, 2005). A test set of 138 Affymetrix GeneChip experiments (13 different training compounds, replicas and controls) was used to develop the method and to train the model. An independent validation set of 189 experiments (16 validation compounds) was used to validate the accuracy of the approach. The test set was used to derive a network model for linear stress response for each toxicity class. Each model was deﬁned by a small set of probesets Gu , called attractors, which had a nonzero coefﬁcient in u. Some probesets deviated from the prediction and are denoted as outlier genes Go . Since Go and Gu completely describe the observed cellular response to stress, they are, together, denoted as biomarkers. The biomarkers were analyzed with respect to the biological role of their genes. In addition, they were evaluated as classiﬁers for each toxicity class using an artiﬁcial neural network. The performance of the classiﬁer was assessed using the validation data set. Fig. 1 presents a ﬂow chart of experimental and computational steps. 2. Methods 2.1. Expression proﬁles induced by genotoxic and non-genotoxic carcinogens in rat liver In this study, a methodology was developed to analyze gene expression response upon exposure to stress. The experimental data was previously generated with the ambition to identify biomarkers to classify the carcinogenic potential of chemical compounds (Ellinger-Ziegelbauer et al., 2008). Three different classes of xenobiotic compounds – genotoxic carcinogens, non-genotoxic carcinogens, and non-hepatocarcinogens – were analyzed. Male Wistar rats were treated for up to 14 days with selected compounds and expression proﬁles of the liver were measured with the Affymetrix GeneChip RAE230A platform.

S. Schneckener et al. / Computational Biology and Chemistry 34 (2010) 193–202

195

Fig. 1. Flow chart of experimental and computational steps. Workﬂow diagram for performed experimental and computational steps. Abbreviations: G, genotoxic hepatocarcinogen; NG, non-genotoxic hepatocarcinogen; NH, non-hepatocarcinogen; d, days; ANN, artiﬁcial neural network.

Genotoxic compounds are compounds that damage DNA, and by this action introduce mutations into the DNA which may cause cancer. Non-genotoxic carcinogenic compounds do not damage the DNA directly, but a carcinogenic effect can be observed. Non-hepatocarcinogenic compounds do not display a carcinogenic effect in the liver, but may potentially be toxic for hepatocytes depending on the dosing regime. Within the training set, the non-hepatocarcinogenic compounds were registered drugs for two different indications. Cefuroxime (Skosyreva et al., 1986) is an

antibiotic; Nifedipime (Kubo et al., 1981) and Propanolol (Igarashi et al., 1977) are used for the treatment of hypertension. The experimental doses were adapted from pharmacological dosing. The data set for method development and training (“training set”) encompassed ﬁve different genotoxic carcinogens, ﬁve non-genotoxic carcinogens and three non-hepatocarcinogens. An independent validation data set contained four different genotoxic carcinogens, six non-genotoxic carcinogens and six non-hepatocarcinogens. Experimental details have been published

196

S. Schneckener et al. / Computational Biology and Chemistry 34 (2010) 193–202

Table 1 Test and validation compounds. Class Training compounds Genotoxic carcinogen

Non-genotoxic carcinogen

Non-hepatocarcinogen

Validation compounds Genotoxic carcinogen

Non-genotoxic carcinogen

Non-hepatocarcinogen

Compound

Abbreviation

CAS number

Time course

2-Nitroﬂuorene Dimethylnitrosamine Aﬂatoxin B1 N-Nitrososmorpholine C.I Direct Black Methapyrilene HCl Thioacetamid Diethyl-stilbestrol Wy-14643 Piperonyl-butoxide Cefuroxime Nifedipine Propranolol

2-NF DMN AB1 NNM CIDB MPy TAA DES Wy PBO CFX Nif Prop

607-57-8 56-23-5 1162-65-8 59-89-2 1937-37-7 135-23-9 62-55-5 56-53-1 50892-23-4 51-03-6 55268-75-2 21829-25-4 525-66-6

3, 7 days 3, 7 days 3, 7 days 3, 7 days 3, 7 days 3, 7 days 3, 7 days 1, 3 days 1, 3 days 1, 3 days 1, 3, 7, 14 days 1, 3, 7, 14 days 1, 3, 7, 14 days

4-(Methylnitrosamino)-1(3-pyridyl)-1-butanone 2-Acetylaminoﬂuorene N-Nitrosopiperidine Methylenedianiline Methylcarbamate Acetamide Dehydroepiandrosterone Ethionine Acetaminophen Cyproterone acetate Clonidine Prazosin Ibuprofen Allyl alcohol 1,4-Dichlorobenzene 3-Methylcholanthrene

NNK

64091-91-4

7, 14 days

2-AAF NPip MDA Mcarb AA DHEA ETH Aap CPA Clon Praz Ibup AlAl DCB 3-MC

53-96-3 100-75-4 101-77-9 589-55-0 60-35-5 53-43-0 67-21-0 103-90-2 427-51-0 4205-90-7 19216-56-9 15687-27-1 107-18-6 106-46-7 56-49-5

1, 3, 7, 14 days 1, 3, 7, 14 days 1, 3, 7, 14 days 1, 3, 7, 14 days 1, 3, 7, 14 days 1, 3, 7, 14 days 1, 3, 7, 14 days 6, 12, 24, 48 h 1, 3, 7, 14 days 1, 3, 7, 14 days 1, 3, 7, 14 days 1, 3, 7, 14 days 1, 3, 7, 14 days 1, 3, 7, 14 days 1, 3, 7, 14 days

previously (Ellinger-Ziegelbauer et al., 2008). See Table 1 for details on compounds used in this study. The experimental data is available as Supplementary ﬁle in S2. 2.2. Linear network model

as those are believed to be potent biomarkers. To this end, the characteristic expression value of stressed cells is decomposed into the characteristic value in the controls x¯ i,control and a characteristic shift xi,char . x¯ i,stress = x¯ i,control + xi,char

We view the measured expression value of a probeset as the sum of a ‘true’ or characteristic expression value given by the current cell state and a mixture of biological noise. Strictly speaking the noise component contains biological and technical noise. As the relative inﬂuence of technical noise can be expected to decrease with increasing expression level, only the 4000 probesets which had the largest mean expression value across all experiments in the training set were included in this analysis. Due to coregulation of genes we propose to model the vector of characteristic expression values x¯ via a linear model. x¯ = A · u with A denoting the pairwise coexpression matrix of the probesets and u a vector of perturbations. The matrix A is estimated using the covariance matrix of probesets inferred from multiple measurements (Gardner et al., 2003) of cells exposed to the same type of stress. This way linear models are constructed for controls and stressed cells. Assuming that the coregulation is almost identical in stressed and unstressed cell populations leads to identical matrices A for both cases with different perturbation vectors ucontrol and ustress , respectively. x¯ i,control = A · ucontrol

(3)

x¯ i,stress = A · ustress

(4)

For this study A was inferred from multiple measurements of the stressed cells. The focus of this analysis was to identify probesets explaining the expression shift between control and stressed cells

(5)

x¯ i,control is estimated using the mean expression value of probeset i in all measurements from the control group and likewise x¯ i,stress using only measurements from the stressed group. xi,char is then given as the difference of these two values. Eqs. (3)–(5) then give a linear model for the characteristic shift component: xi,char = x¯ i,stress − x¯ i,control = A · (ustress − ucontrol ).

(6)

uchar

The aim of our analysis of gene expression data is to identify genes characterizing changes in the expression values between control and stressed cells. There are two types of possible changes. On the one hand the cell can up- or downregulate already activated pathways and on the other hand it might be necessary for the cell to completely activate or deactivate cellular functionality to cope with new environmental conditions. 2.3. Selection of attractor probesets It was recently shown that a cell’s stress response is dominated by only a few modes (López-Maury et al., 2008), i.e. that only a few columns of A allow sufﬁcient description of x. As up- or downregulation does not change the structure of the coregulation matrix A the model described by Eq. (6) can be used to identify those few probesets sufﬁcient to characterize the cellular stress response. This is performed by searching for the smallest model that sufﬁciently explains the compound induced expression shift and not explaining noise contained within the data. This model-selection

S. Schneckener et al. / Computational Biology and Chemistry 34 (2010) 193–202

197

compound class. The network output is a numerical value at each classiﬁer node. The larger this numerical value is the more likely do the input values represent a given compound class. The neural network was trained with the software “NN-Tool” (Bärmann, 2009) which uses the FBFBK-algorithm described in Bärmann and Biegler-König (1992). Compared to using “backpropagation” for training neural networks it shows superior convergence speed, higher reproducibility and reduced sensitivity to parameter initialization. All weights were initialized with 0. The training was performed with a set of 95 experiments each with 94 expression values. Fivefold cross-validation was used to estimate the optimal number of nodes in the hidden layer. Attention was paid to ensure that all experiments of a certain compound were grouped within one CV-set to avoid overﬁtting for a compound. This optimized network structure was trained with the full training set. The neural network classiﬁer was evaluated with an independent expression data set. Per experiment, the classiﬁer node with highest value was selected as the predictor of the toxicity class. Per compound, the prediction with the highest output value among all experiments was selected as compound prediction. Fig. 2. Error of elastic response network model for genotoxic stress. The dependency of the residual sum of squares (RSS, blue line) and its change compared to the preceding model (black dotted line) on the number of genes n in the model The cut off is indicated by a red vertical line.

task is tackled by ﬁrst running a method known as LARS (Efron et al., 2004). This method, applicable to linear models, is a highly efﬁcient means to create a sequence of nested models by sequentially adding the variable having the largest correlation with the residuals of the previous model and identifying all parameters of the resulting extended linear model. For any such model the predictive accuracy measured via the RSS value is computed. By construction, the RSS is monotonically decreasing in the number of variables. To ﬁnd a model which sufﬁciently explains the expression values of the cells the resulting RSS vs. n curve is searched for the minimum of the ﬁnite differences of the RSS values subject to application speciﬁc assumptions on the minimum model complexity (4 in this study). This point is visually distinct as a kink in the RSS vs. n curve. See Fig. 2 for an example. The probesets of the selected model are called attractor genes and are potential biomarkers for the type of stress.

3. Results Attractor and outlier probesets were estimated for each class of compounds using a linear elastic response model of gene expression as described in the methods section. The discrepancy between the measured gene expression and the network model, measured by its residual sum of squares (RSS) value, was calculated with respect to the allowed number of attractor genes n. Fig. 2 shows an example of the relationship between the model quality and number of genes. We chose a cut off for the number of genes by visual inspection of the RSS vs. n plots such that the beneﬁt on the RSS by an additional probeset signiﬁcantly declined. The number of genes picked were 6, 14, and 8 for the genotoxic (G) carcinogen, non-genotoxic (NG) carcinogen and non-hepatocarcinogen (NH) data sets, respectively. Fig. 3 displays the measured normalized expression shift xchar / (as shortcut for (xi,char /i )i and the normalized model prediction A/ · u (as shortcut for (Aij /i )i,j · u) for that probeset.

2.4. Outlier identiﬁcation Activation or deactivation of cellular functionality leads to structural changes in the coexpression matrix (Kostka and Spang, 2004; Fujita et al., 2008) and therefore, A will be locally different between controls and stressed cells. Thus it can not be assumed that the simple linear model (6) is sufﬁcient to explain all changes between stressed cells and controls. Consequently, not only attractors but also those probesets not fulﬁlling the linear model are of importance and characterize the response of the cell. The probesets which are not compatible with the model will be called outliers. They are identiﬁed by performing outlier statistics on the model’s normalized residuals. According to the theory of least-squares these residuals are i.i.d. samples from a univariate normal distribution. The analysis assigns each residual a p-value under this distribution. All probesets with assigned p-value less than 0.001 were identiﬁed as outliers. 2.5. Attractor and outlier genes as classiﬁers In the present study, an artiﬁcial neural network (ANN) was trained. Gene expression values of the selected probesets were employed as input to the neural network. The ANN had a hidden layer and three output nodes. Each output node is assigned to a

Fig. 3. Linear response model for genotoxic stress—outliers and attractors. This visualization of the network model displays for each gene the measured normalized expression shift xi /i at the y-axis and the normalized model prediction A/ · u (as shortcut for (Aij /i )i,j · u) at the x-axis. Grey dots and blue stars denote genes which are, within a error margin, correctly approximated by the model. Blue stars denote attractor genes Gu , and red crosses denote outlier genes.

198

S. Schneckener et al. / Computational Biology and Chemistry 34 (2010) 193–202

Table 2 Number of attractor and outlier genes per model.

Number of attractor genes Number of outlier genes Number of differentially expressed genes

Genotoxic compounds

Non-genotoxic compounds

Non-hepatocarcinogenic compounds

6 32 1136

14 12 1423

8 24 291

Each compound class had its speciﬁc model with distinct attractor and outlier genes. The number of differentially expressed genes between the treatment and the control group is substantially larger than the outlier/attractor group and overlap exists. Differentially expressed genes are identiﬁed by a t-test, p < 0.0001, without correction for multiple testing.

Table 3 Summary of classiﬁcation results. Compound

‘True’ classiﬁcation

Predicted G

Compound prediction

Benchmark performance

NNK 2AAF Npip MDA

G G G G

6 4 10 0

Predicted NG 0 1 0 4

Predicted NH 0 7 2 8

G G G NG

TP TP TP FP

Mcarb AA DEHA ETH Aap CPA

NG NG NG NG NG NG

0 0 0 3 0 0

3 4 12 8 8 12

9 8 0 4 4 0

NH NG NG NG NG NG

FN TP TP TP TP TP

Clon Praz Ibup AlAl DCB 3-MC

NH NH NH NH NH NH

0 0 0 0 0 0

0 0 0 0 1 2

12 12 12 12 11 10

NH NH NH NH NH NH

TP TP TP TP TP TP

Comparison of the accepted, literature based classiﬁcation of compounds, with the predicted classiﬁcation. The numbers refer to the number of experiments which were predicted for a certain toxicity class. Best prediction is the prediction with the highest value among all experiments for a compound. Compound prediction aggregates all predictions to a compound prediction by selecting the best (i.e. largest output value) single prediction. The last column compares compound and true classiﬁcation. For compound abbreviations consult Table 1. Other abbreviations: G, genotoxic hepatocarcinogen; NG, non-genotoxic hepatocarcinogen; NH, non-hepatocarcinogen; FP, false positive, FN; false negative, TP; true positive.

One network model for each class (genotoxic, non-genotoxic, and non-hepatocarcinogen) was estimated. Each model has a different number of biomarkers: Six attractors and 32 outliers for the genotox model, 14 attractors and 12 outlier for the non-genotox model, and 8 attractors and 24 outliers for the non-hepatocarcinogen model. All biomarker probesets and their mapping to genes are listed with additional annotation details in Appendix A. Table 2 summarizes the number of attractor and outlier probesets for each network model and compound class, respectively. Only two probesets were shared across more than one network model. The total number of non-redundant biomarkers is 94. This is the set of the most informative genes and was used for the training of the classiﬁers. The number of genes that describe a network model is compared to the number of differentially expressed genes as calculated from a Student’s t-test, without correction for multiple testing. The set of differentially expressed genes between stress condition and control is substantially larger then the set of attractor/outlier genes. The overlaps between the sets of differentially expressed genes consists of 660 genes (G vs. NG), 88 genes (G vs. NH), 123 genes (NG vs. NH), whereas 57 genes are in common for all three classes. 3.1. Classiﬁer A three layer artiﬁcial neural network (ANN) was used as a classiﬁer for each experiment. The ANN had 94 inputs, namely the log expression values of the selected probesets Gu and Go of all toxicity classes, a hidden layer with two neurons and the output layer with one neuron for each compound class. The ANN was trained on expression proﬁling experiments representing 13 compounds and was evaluated on a validation set consisting of experiments representing 16 additional compounds. Each compound was measured

at two to four time points, and each measurement was performed in up to three replicas, i.e. using three independent animals. It is not expected that a speciﬁc toxic effect can be observed at arbitrary time points, but the best time point for a measurement is not known a-priori. Therefore, each compound was assigned to the class having the largest output value over all replicas and timepoints. Table 3 summarizes the predictions for all validation compounds. Each compound was predicted twelve times, with the exception of 4-(methylnitrosamino)-1-(3-pyridul)-1-butanone (NNK) and ethionine (ETH) where six respectively 15 experiments, and predictions, were performed. Two compounds, methylcarbamate (Mcarb) and methylenedianiline (MDA) were misclassiﬁed but the majority of 14 out of 16 compounds was correctly classiﬁed. The accuracy of the compound-wise classiﬁcation is 88%. Analyzing the classiﬁcations on a per-experiment basis showed a still remarkable 72% accuracy on the independent validation set. 4. Discussion Our work was based on expression proﬁles of stressed and unstressed liver tissue. From gene expression data a linear elastic network model was calculated and biomarkers were derived. The validation of the elastic response network model is twofold: (A) The biological role of the biomarkers is examined, in particular in comparison with a-priori expectations for the different compound classes. (B) Biomarkers are used to build classiﬁers of toxicity classes. The classiﬁer is validated on an independent data set. 4.1. The biological plausibility of the elastic network models Attractor genes of the elastic network model can be used to explain the majority of changes in the gene expression after the

S. Schneckener et al. / Computational Biology and Chemistry 34 (2010) 193–202

exposure to xenobiotic compounds. Some genes, model outliers, do not follow the predicted expression shift. This is possibly due to a stress-speciﬁc regulation program of the cell that leads to an atypical or non-linear response in the expression of certain genes. Supplementary Table S1 lists both types of biomarkers for all three types of stress. The expected biological response upon stress exposure has several differences: Genotoxic carcinogens may induce biological processes related to oxidative stress/DNA damage (Ellinger-Ziegelbauer et al., 2008), and cell cycle control (Shackelford et al., 1999). For non-genotoxic carcinogens increased mitogenesis, decreased apoptosis, interference with gap junction intercellular communication, interference with tubulin polymerization (Nie et al., 2006), immunosuppression, and/or disturbance of endocrine functions (Ellinger-Ziegelbauer et al., 2008) are expected. Non-hepatocarcinogenic compounds are recognized as xenobiotics and are expected to interact at least with their pharmacological targets present in the liver. However, the response should be subtle due to the dosing regime that mimics pharmacological concentrations. In addition, the response should be compound speciﬁc due to different modes of action. 4.1.1. Genotoxic carcinogens Table S1 shows the biological annotation of attractor and outlier genes. The genotoxic stress response model is characterized by genes related to oxidative stress and DNA damage response represented by up-regulation of target genes of the p53 transcription factor (CDKN1A (p21, Cyclin-dependent kinase inhibitor 1A) (Hattinger et al., 2002), CCNG1 (Cyclin G1) (Ohtsuka et al., 2004), PHLDA3 (Pleckstrin homology-like domain family A member 3)) and cell cycle regulation (RHOB (Rhos homolog gene family, member B) (Ashburner et al., 2000), CCNG1, and CDKN1A) which is in good concordance with the expectations. In addition several genes related to a detoxiﬁcation response (Abcb1a/Abcb1b (Pglycoprotein 1; MDR), Aldh1a1 (Aldehyde dehydrogenase family 1, A1), Cyp2c12 (Cytochrome P450 2C12), Akr7a3 (Aldo-keto reductase family 7, A3) (Bodreddigari et al., 2008), Abcc2 (Multidrug resistance-associated protein 2), Sult1b1 (Sulfotransferase family 1B, 1), Ces2 (Carboxylesterase 2)) are expressed at increased levels. Down-regulation of several genes encoding metabolic enzymes (LOC368066 (similar to indolethylamine N-methyltransferase), AKR1C6 (aldo-keto reductase family 1, member C6), HSD17B9 (hydroxysteroid (17-beta) dehydrogenase 9), DPYS (dihydropyrimidinase), PAPSS2/ATPSK2 (Similar to ATP sulfurylase/APS kinase 2), HEBP1/HBP (heme binding protein 1 (predicted))) likely indicate dedifferentiation. Also, up-regulated genes with functions in cell adhesion, cell migration, and cytoskeletal regulation (LIMK (LIM motif-containing protein kinase 2) TUBB2C (Tubulin beta 2C) and HYAL3 (hyaluronoglucosaminidase 3)) are part of this model. Overall, the observed genes are coherently describing the possible response of the liver to genotoxic stress: Detoxiﬁcation of the agent is induced, and DNA damage is counteracted by signs of DNA damage response genes. Cell cycle progression is tightly controlled in cells with DNA damage to prevent proliferation. As additional response to damage, the hepatocytes start to dedifferentiate. 4.1.2. Non-genotoxic carcinogens The derived biomarkers of the non-genotoxic stress linear model are characteristic for regeneration, antiproliferative response, dedifferentiation, and reduced inﬂammatory competence. Five genes related to regeneration/cell cycle progression are up-regulated: CDK4 (Cyclin dependent kinase 4), Eif3s9 (Eukaryotic translation initiation factor 3 subunit 9), Eif5a, Eif2s2, Strap (serine/threonine kinase receptor associated protein), and Lsmd1 (RGD1559617 pre-

199

dicted, LSM domain containing 1). These genes are mostly involved in transcription and translation. The initial regeneration may be paralleled or even supported by a concurrent dedifferentiation as evidenced by the down-regulation of enzymes or membrane transporters vital to the function of hepatocytes, e.g. Mgat4b (Mannoside acetylglucosaminyltransferase 4B), Slco1b2 (Solute carrier organic anion transporter 1b2), and Cyp2t1 (Cytochrome P450 monooxigenase). This is possibly controlled by an antiproliferative response as indicated by an up-regulation of Phb (Prohibitin) and Rbaf600 (ZUBR1 zinc ﬁnger, UBR1 type 1). It is interesting to note that the decreased expression of genes related to blood coagulation was relevant to the non-genotox network model. In particular, F2 (coagulation Factor II), F10 (coagulation Factor X), and Plg (Plasminogen) were genes in the outlier set of the non-genotox net model and consistently down-regulated in the treated animal group. This is possibly indicative for a reduced inﬂammatory competence, a hypothesis supported by the observation that two further genes with functions in an inﬂammatory response, Faah (Fatty acid amide hydrolase) (Cravatt and Lichtmann, 2002) and Cfb (Complement factor B) are down-regulated and among the outlier genes. While the biomarkers can be interpreted as important for a response to non-genotoxic carcinogenic compounds, some aspects could not be observed, e.g. increased mitogenesis. However, the set of training compounds are not complete in the coverage of all possible modes of action for carcinogenic substances. Hence not all categories of how a substance can trigger uncontrolled cell growth are expected to be covered. The listed genes are a conservative view; many detected genes are poorly annotated with respect to biological pathways. In addition, several probesets map to genes that lack any description. 4.1.3. Non-hepatocarcinogenic compounds The response when exposed to non-hepatocarcinogenic compounds is subtle, due to a dosing comparable to a dosing at a safe therapeutic level aiming for minimal side effects. In addition, the biological response is very diverse due to a variety of different modes of action. Targets for the three non-hepatocarcinogenic compounds CFX, Nif, and Prop are bacterial transpeptidases, calcium channels, and beta adrenoreceptors, respectively. The derived biomarkers did not display a clear pattern but instead each gene, while to a certain degree plausible, was representing different aspects of the cellular responses. An ordinary statistical analysis of differential gene expression between the treated and the control state provided additional evidence for a more restrained response to the exposure to non-hepatocarcinogenic compounds: It was observed that upon genotoxic stress, 1136 out of 4000 genes were differentially expressed (p < 0.001 in a Student’s t-test, not corrected for multiple testing). Upon non-genotoxic carcinogenic stress 1423 out of 4000 genes showed differential expression (p < 0.001). This is a substantial fraction and demonstrates the broad global response upon toxic stress. However, upon exposure to nonhepatocarcinogenic compounds only 291 out of 4000 genes were differentially expressed (p < 0.001). 4.2. Quality of classiﬁers based on genes selected by elastic response network models Attractor and outlier genes are postulated to be characteristic for the type of stress applied and therefore possibly suitable biomarkers to build classiﬁers. Classiﬁers were trained with the attractor and outlier genes of all three compound classes and the expression data of those genes from the set of training experiments. As demonstrated, the classiﬁer performed well for a three class classiﬁcation problem of the three toxicity classes. Overall, for 72% of all valida-

200

S. Schneckener et al. / Computational Biology and Chemistry 34 (2010) 193–202

tion experiments the correct toxicological class was predicted. This is better than expected considering that gene expression measurements were spread over a long time period, ranging from 6 h to 14 days. This reﬂects a real world challenge for new, unclassiﬁed compounds, where the optimal time for measurement is not known in advance. The time from animal feeding to seeing full physiological effects in the liver is different between rat strains, compounds, and doses, and as such unknown for new substances. Aggregating all predictions for a certain validation compound to a single prediction improves the accuracy to 88%—only two out of 16 compounds were wrongly predicted. Methylenecarbamate (Mcarb) was tested negative in the Ames test for genotoxicity, is known to be mutagenic in Drosophila, a carcinogen in rat but a non-carcinogen in mouse (Foureman et al., 1994). Due to the negative Ames test and the carcinogenicity in rat Mcarb was classiﬁed here as non-genotoxic carcinogen. The prediction was – wrongly – non-hepatocarcinogen. Methylenedianiline (MDA) is generally classiﬁed as genotoxic in literature, but was wrongly predicted as non-genotoxic carcinogen. The classiﬁcation of compounds as genotoxic carcinogen, nongenotoxic carcinogen, and non-hepatocarcinogen is very useful from a pre-clinical development perspective. However, a classiﬁcation based on the biological response is challenged by the multitude of events that may result in the same clinical readout. For example, non-genotoxic carcinogens may act via the inhibition of spindle formation, or through decreased apoptosis. Both modes of action are fundamentally different and may result in different cellular responses. If a certain cellular response was not encountered during the training phase of a classiﬁer, the predictive power is restricted. 4.2.1. Comparison of classiﬁcations with other computational approaches Several alternative approaches to select genes to build classiﬁers of cellular stress have been evaluated previously (EllingerZiegelbauer et al., 2008). In a knowledge based approach, 141 genes were selected based on biological insight, e.g. genes related to cell survival/proliferation or oxidative protein damage response (Ellinger-Ziegelbauer et al., 2005). The accuracy of the classiﬁer based on those 141 genes was 63% per experiment and 81% for predictions aggregated on compound levels. A support vector machine (Schölkopf and Smola, 2001) with a linear kernel and a penalty parameter C set to 10 was based on 512 genes and accomplished an accuracy of 70% per experiment and 88% per compound. Recursive feature elimination (RFE, Guyon et al., 2002) was applied to select 256 genes. Here, the accuracy per experiment was 73% and per compound 88%. Using an analysis of variance (ANOVA) achieved an accuracy of 73% per experiment and 75% per compound. The overall performance of the classiﬁer based on the presented methodology is on par with previously published classiﬁers, but is based on substantially less probesets. This is true for accuracy per experiment and per aggregated compound level (72 and 88%, respectively). The network based approach is superior to all other methods in terms of sensitivity (Fig. 4). It classiﬁes the compounds with a mean sensitivity of 0.897 over all three toxicity classes compared to 0.870 for SVM and RFE, 0.803 for the knowledge based approach and 0.753 for ANOVA. The speciﬁcity of the network method is only slightly inferior to SVM and RFE (0.940 vs. 0.943) but superior compared to ANOVA (0.877) and the knowledge based approach (0.907). In a detailed analysis of the different classiﬁcation approaches, it can be observed that the same compound is prevalently misclassiﬁed across several methods. It can be speculated, that these compounds possess properties which have not been observed in the training set, making it challenging for any approach to predict them correctly.

Fig. 4. Sensitivity and speciﬁcity of different classiﬁcation approaches. Comparing the mean sensitivity and speciﬁcity values over genotoxic, non-genotoxic and nonhepatocarcinogen compounds demonstrates that using the network based approach improves the sensitivity to 0.897 compared to the optimum of 0.870 for RFE and SVM. The speciﬁcity of these three methods is almost identical (0.940 for the network based approach and 0.943 for SVM and RFE). The ANOVA and knowledge based approaches showed decreased sensitivity and speciﬁcity.

5. Conclusion Here a new method to identify biomarkers for speciﬁc stress states was developed. This method is based on describing global gene expression in terms of a linear network model. Validating this approach was two fold. It is claimed that the set of genes that deﬁne the network model, Gu , and the outliers to the model, Go , in conjunction deﬁne the response of the cell. If this is true, it shall be expected that the set of genes is biologically meaningful for the type of stress exerted. It is therefore assumed that the set of genes is suited to predict this particular type of stress and to differentiate from other stress types. The ﬁrst assumptions could be conﬁrmed by review of the observed gene sets; a remarkably high number of genes that are related to speciﬁc aspects of genotoxic and non-genotoxic carcinogenic stress could be observed. The second assumption was conﬁrmed by the satisfying prediction accuracy of 72% per experiment (p = 1.2 × 10−5 under random classiﬁcation of experiments) and 88% accuracy per compound (p < 10−6 under random classiﬁcation of compounds). The methodology has the potential to illuminate the principal biological response to a stress factor by focusing on the most relevant subset of genes. The strength of the methodology is that the correlation structure of gene expression is taken into consideration. As a result of building a linear network model of gene expression, the set of attractor and outlier genes constitute potent biomarkers. Acknowledgements Dr. Thomas Mrziglod supported the authors in the use of artiﬁcial neural networks. John Vitovsky kindly proof-read the manuscript. We acknowledge ﬁnancial support by the Bundesministerium für Bildung und Forschung (BMBF) for funding parts of this work through the HepatoSys network. Appendix A. Derivation of the elastic network model The current expression proﬁle x for all n genes of a cell is assumed to be a linear response of the cell to internal (e.g. processes

S. Schneckener et al. / Computational Biology and Chemistry 34 (2010) 193–202

required for survival of the cell) and external (e.g. toxic compounds) stress factors u invoked upon different genes with ui denoting the stress for gene i. Due to coregulation of genes a localized stress can alter the expression of many other genes. This dependence structure is given by a response matrix A. Thus the current expression is modeled via x =A·u

x: {x1 , . . . , xn } the vector of expression values of all n genes. A: the elastic response matrix. It is a symmetric n × n matrix, where the components aij quantitatively describe the mean quasi-elastic response of the gene i on a stress uj applied on gene j. A therefore represents the cell’s response to external stress. u: {u1 , . . . , un } the vector describing the stress acting on each gene caused by the external stress factors. The (i, j) th element Aij of A is nonzero if the genes i and j are coexpressed under the current stress conditions. Let {∗k } be the set of eigenvalues of A and {k } the corresponding set of eigenvectors. Then the stiffness of the network (in direction k ) can be expressed by the inverse k of the k th eigenvalue ∗k . Expressing x in terms of these eigenvectors gives: x=

1 k

bution of the energy follows the Boltzmann distribution, we get for the variance of the elongations in direction of eigenvector k:

k

k

k k , u

(A.2)

k

d = A + (t) dt

(A.3)

i

T

i,

(A.4)

k

The stochastic external stress (t) can be interpreted as coupling of the system to a heat bath with temperature T, corresponding to the intensity of the noise. Then, according to the laws of statistical thermodynamics (Honerkamp, 1998), the variation over the time of the stochastic elongations of the system in direction of the k th eigenvector of A will follow a Boltzmann distribution with the probability density (without normalization): p(E) = e−E/T

2k exp ⎝−

1 2

⎞ j 2n ⎠ d =

j=1,...,n

1 k

(A.7)

=

k

j T

(ki )

1

=

2

(A.8)

j

k

(ki k )

(A.9)

with ki denoting the ith component of the k th eigenvector. This results in

i , j = i j corT (i , j )

(A.10)

with corT denoting sample correlation. Applying this to A.2 we got for the elongation xi due to non-random, stationary, tensile stress:

xj =

∗k k u, k j

k

∗k k j

k

xj =

ui

i

(A.11)

ui ki

(A.12)

∗k ki k

(A.13)

i j

k

Substitution A.8 in A.14 and using the correlation of the noise induced elongations around steady states A.10, we get for the expected expression shift of the gene i:

xj =

k = , k

k

with

1 = Z

k

xj =

(A.6)

⎛

1

2

The shift component results in a distribution of local stress on possibly large parts of the network. The intensity of the distributed stress can be expressed in analogy to elongated elastic networks as an “elastic” energy E of the system, which is given by 1 E= k 2k 2

T

=0

with normalization constant Z and T donting sample means. From the variations 2k we calculate the statistical moments of the distributions for the noise induced elongations i of the single genes around the steady state by projection on the eigenvectors:

xj =

Let us assume that the system is coupled to changes in environmental conditions inducing an additive, time dependent stress (t) on the system (in analogy to the Brownian motion of particles). The stress (t) shall be a stochastic process in time with the characteristic of white noise. It induces a dynamic shift component around a steady state, whose dynamics is given by the solution of the Langevin equation (Honerkamp, 1998):

T

2

(A.1)

with

201

1 i

ui i , j

T

1 ui i corT (i , j ) j

(A.14)

(A.15)

i

Thus we have found a substitution for the unknown matrix A by the covariance matrix for all gene pairs, where at least one gene is involved in the stress response reaction. If the number of genes involved in the stress response is small (often the case if differences between two states are investigated) compared to the total number of genes, Eq. (A.14) provides a reduction of complexity. Appendix B. Supplementary data

(A.5)

If the environmental changes show a very high variability, we can assume that the system is ergodic. Then a randomly selected ﬁnite, sufﬁciently large, set of m samples will loose the dynamic information, but the ensemble will represent the main features of the entire set of variations of the systems which are relevant for the response behavior of the system on external stress. Because of the ergodicity, the set k1 ,l=1,...,m for the elongations in direction of the eigenvector k1 will be uncorrelated to the set k2 ,l=1,...,m for the elongations in direction of eigenvector k2 . Moreover, without loss generality, for the further analysis we set T = 1. Due to ergodicity, the energy in the system is equally distributed on the elongations in direction of all eigenvectors. Since the distri-

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.compbiolchem.2010.06.004. References Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G., 2000. Gene ontology: tool for the uniﬁcation of biology. the gene ontology consortium. Nat. Genet. 25 (May (1)), 25–29. Bärmann, F., 2009. Nn-tool. http://www.baermann.de/. Bärmann, F., Biegler-König, F., 1992. On a class of efﬁcient learning algorithms for neural networks. Neural Netw. 5 (1), 139–144. Barrett, T., Troup, D.B., Wilhite, S.E., Ledoux, P., Rudnev, D., Evangelista, C., Kim, I.F., Soboleva, A., Tomashevsky, M., Marshall, K.A., Phillippy, K.H., Sherman, P.M.,

202

S. Schneckener et al. / Computational Biology and Chemistry 34 (2010) 193–202

Muertter, R.N., Edgar, R., 2009. Ncbi geo: archive for high-throughput functional genomic data. Nucleic Acids Res. 37 (January (Suppl. 1)), D885–D890, http://dx.doi.org/10.1093/nar/gkn764. Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B: Methodol. 57 (1), 289–300. Blais, A., Dynlacht, B.D., 2005. Constructing transcriptional regulatory networks. Genes Dev. 19 (July (13)), 1499–1511. Bodreddigari, S., Jones, L.K., Egner, P.A., Groopman, J.D., Sutter, C.H., Roebuck, B.D., Guengerich, F.P., Kensler, T.W., Sutter, T.R., 2008. Protection against aﬂatoxin b1-induced cytotoxicity by expression of the cloned aﬂatoxin b1-aldehyde reductases rat akr7a1 and human akr7a3. Chem. Res. Toxicol. 21 (May (5)), 1134–1142. Brivanlou, A.H., Darnell, J.E., 2002. Signal transduction and the control of gene expression. Science 295 (February (5556)), 813–818. Cravatt, B., Lichtmann, A., 2002. The enzymatic inactivation of the fatty acid amide class of signaling lipids. Chem. Phys. Lipids 121 (December (1–2)), 135–148. Efron, B., Hastie, T., Johnstone, L., Tibshirani, R., 2004. Least angle regression. Ann. Stat. 32, 407–499. Ellinger-Ziegelbauer, H., Gmuender, H., Bandenburg, A., Ahr, H.J., 2008. Prediction of a carcinogenic potential of rat hepatocarcinogens using toxicogenomics analysis of short-term in vivo studies. Mutat. Res. 637 (January (1–2)), 23–39. Ellinger-Ziegelbauer, H., Stuart, B., Wahle, B., Bomann, W., Ahr, H.J., 2005. Comparison of the expression proﬁles induced by genotoxic and nongenotoxic carcinogens in rat liver. Mutat. Res. 575 (August (1–2)), 61–84. Foureman, P., Mason, J.M., Valencia, R., Zimmering, S., 1994. Chemical mutagenesis testing in drosophila. ix. Results of 50 coded compounds tested for the national toxicology program. Environ. Mol. Mutagen. 23 (1), 51–63. Fujita, A., Gomes, L., Sato, J., Yamaguchi, R., Thomaz, C., Sogayar, M., Miyano, S., 2008. Multivariate gene expression analysis reveals functional connectivity changes between normal/tumoral prostates. BMC Syst. Biol. 2 (2), 106–119. Gardner, T.S., di Bernardo, D., Lorenz, D., Collins, J.J., 2003. Inferring genetic networks and identifying compound mode of action via expression proﬁling. Science 301 (July (5629)), 102–105. Girardot, F., Monnier, V., Tricoire, H., 2004. Genome wide analysis of common and speciﬁc stress responses in adult drosophila melanogaster. BMC Genomics 5 (September (1)), 74–89. Gordon, A., Glazko, G., Qiu, X., Yakovlev, A., 2007. Control of the mean number of false discoveries, bonferroni and stability of multiple testing. Ann. Appl. Stat. 1 (September(1)), 179–190. Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for cancer classiﬁcation using support vector machines. Mach. Learn. 46 (January (1–3)), 389–422. Hattinger, C.M., Jochemsen, A.G., Tanke, H.J., Dirks, R.W., 2002. Induction of p21 mrna synthesis after short-wavelength uv light visualized in individual cells by rna ﬁsh. J. Histochem. Cytochem. 50 (January (1)), 81–89.

Honerkamp, J., 1998. Statistical Physics, 1st edition. Springer. Igarashi, T., Nakajima, Y., Ohtake, S., 1977. Antihypertensive effect of combined treatment with alpha- and beta-adrenergic blockers in the spontaneously hypertensive rat. Jpn. Circ. J. 41 (August (8)), 903–911. Kadota, K., Nakai, Y., Shimizu, K., 2008. A weighted average difference method for detecting differentially expressed genes from microarray data. Algorithms Mol. Biol. 3 (June), 1–12. Khil, P.P., Camerini-Otero, R.D., 2002. Over 1000 genes are involved in the dna damage response of Escherichia coli. Mol. Microbiol. 44 (April (1)), 89–105. Kostka, D., Spang, R., 2004. Finding disease speciﬁc alterations in the co-expression of genes. Bioinformatics 20 (August (1)), 194–199. Kubo, T., Fujie, K., Yamashita, M., Misu, Y., 1981. Antihypertensive effects of nifedipine on conscious normotensive and hypertensive rats. J. Pharmacobiodyn. 4 (4), 294–300. López-Maury, L., Marguerat, S., Bähler, J., 2008. Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nat. Rev. Genet. 9 (August (8)), 583–593. Luscombe, N.M., Babu, M.M., Yu, H., Snyder, M., Teichmann, S.A., Gerstein, M., 2004. Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 431 (September (7006)), 308–312. Nie, A.Y., McMillian, M., Parker, J.B., Leone, A., Bryant, S., Yieh, L., Bittner, A., Nelson, J., Carmen, A., Wan, J., Lord, P.G., 2006. Predictive toxicogenomics approaches reveal underlying molecular mechanisms of nongenotoxic carcinogenicity. Mol. Carcinog. 45 (December (12)), 914–933. Ohtsuka, T., Jensen, M.R., Kim, H.G., Kim, K.T., Lee, S.W., 2004. The negative role of cyclin g in atm-dependent p53 activation. Oncogene 23 (July (31)), 5405–5408. Schölkopf, B., Smola, A.J., 2001. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning). The MIT Press. Shackelford, R.E., Kaufmann, W.K., Paules, R.S., 1999. Cell cycle control, checkpoint mechanisms, and genotoxic stress. Environ. Health Perspect. 107 (February), 5–24. Skosyreva, A.M., Akhtamova, Z.M., Golovanova, I.V., Iavorskii˘, A.N., 1986. Effect of cefuroxime on embryonic and fetal development in an experiment. Antibiot. Med. Biotekhnol. 31 (April (4)), 277–280. Storey, J.D., Tibshirani, R., 2003. Statistical signiﬁcance for genomewide studies. PNAS 100 (August (16)), 9440–9445. Stuart, J.M., Segal, E., Koller, D., Kim, S.K., 2003. A gene-coexpression network for global discovery of conserved genetic modules. Science 302 (October (5643)), 249–255. Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., Mesirov, J.P., 2005. Gene set enrichment analysis: a knowledge-based approach for interpreting genomewide expression proﬁles. PNAS 102 (October (43)), 15545–15550.

Lihat lebih banyak...

An elastic network model to identify characteristic stress response genes

Descripción

Comentarios