Classification of dendritic cell phenotypes from gene expression data

Tuana, Giacomo; Volpato, Viola; Ricciardi-Castagnoli, Paola; Zolezzi, Francesca; Stella, Fabio; Foti, Maria

doi:10.1186/1471-2172-12-50

Research article
Open access
Published: 29 August 2011

Giacomo Tuana¹,
Viola Volpato^2,4,
Paola Ricciardi-Castagnoli³,
Francesca Zolezzi^1,3,
Fabio Stella² &
Maria Foti^1,4

BMC Immunology volume 12, Article number: 50 (2011)

Abstract

Background

The selection of relevant genes for sample classification is a common task in many gene expression studies. Although a number of tools have been developed to identify optimal gene expression signatures, they often generate gene lists that are too long to be exploited clinically. Consequently, researchers in the field try to identify the smallest set of genes that provides good sample classification. We investigated the genome-wide expression of the inflammatory phenotype in dendritic cells. Dendritic cells are a complex group of cells that play a critical role in vertebrate immunity. Therefore, the prediction of the inflammatory phenotype in these cells may help with the selection of immune-modulating compounds.

Results

A data mining protocol was applied to microarray data for murine cell lines treated with various inflammatory stimuli. The learning and validation data sets consisted of 155 and 49 samples, respectively. The data mining protocol reduced the number of probe sets from 5,802 to 10, then from 10 to 6 and finally from 6 to 3. The performances of a set of supervised classification models were compared. With the six genes Il12b, Cd40, Socs3, Irgm1, Plin2 and Lgals3bp, the best accuracy (91.8%) was obtained by Tree Augmented Naïve Bayes and Nearest Neighbour. With the smallest set of three genes (Il12b, Cd40 and Socs3), the performance remained satisfactory and the best accuracy (95.9%) was obtained by Support Vector Machine. These data mining models, using data for the genes Il12b, Cd40 and Socs3, were validated with a human data set consisting of 27 samples. Support Vector Machines (71.4%) and Nearest Neighbour (92.6%) gave the worst performances, but the remaining models correctly classified all 27 samples.

Conclusions

The genes selected by the proposed data mining protocol were shown to be informative for discriminating between inflammatory and steady-state phenotypes in dendritic cells. The robustness of the data mining protocol was confirmed by the accuracy for a human data set, when using only the following three genes: Il12b, Cd40 and Socs3. In summary, we analysed the longitudinal pattern of expression in dendritic cells stimulated with activating agents with the aim of identifying signatures that would predict or explain the dendritic cell response to an inflammatory agent.

Background

Genome-wide screening of expression profiles has provided a broad perspective on gene regulation in health and disease. Gene expression is controlled over a wide range through complex interplay between DNA regulatory proteins, microRNA molecules and epigenetic modifications determining transcript production [1–3]. For example, gene expression profiles in mouse dendritic cells (DCs) in response to microbial organisms and their components have been studied using a functional genomics approach and the molecular patterns involved in DC activation have been determined [4–7]. However, the high dimensionality inherent in genome-wide analyses makes it difficult to extract biologically useful information from gene expression data. Early attempts at genome-wide expression analysis used unsupervised methods to identify groups of genes or conditions with similar expression profiles [8–10]; the observation that functionally related or co-regulated genes often cluster together was used to provide biological insight. Classification studies in the field of microarray analysis have become important for the development of diagnostic tests. One of the most common approaches for supervised classification is binary classification, which distinguishes between two types of phenotype: positive, for example compound A-treated samples, and negative, often control or compound B-treated samples. A collection of samples with known type labels is used to train a classifier that is then used to classify new samples. For example, the supervised classification models Support Vector Machines [11], Classification Trees [12] and Artificial Neural Networks [13] have led to the generation of functional gene signatures for haematological malignancies [8, 14–16], and to the identification of molecular markers that provide accurate diagnosis, prognosis and selection of treatment regimens for human diseases [17–20]. These methods are able to identify genes and, consequently, gene networks associated with particular phenotypes. More recently, supervised classification models combining cross validation and heuristic search strategies have been used to discover optimal expression signatures in cancer [21–23]. However, despite the number of classification methods that have been developed for this kind of knowledge extraction, such knowledge has not yet been widely used in diagnostic or prognostic decision-support systems [13]. This is partly due to the variability of the results obtained [24] and also to the different data sets used [25, 26].

Few methods have been used to identify specific expression signatures that could contribute to the molecular diagnosis of inflammatory-based diseases. The Random Forests method has been used to generate a 44-gene signature in DCs to distinguish between inflammatory and non-inflammatory stimuli, but this gene signature is too large for clinical exploitation [5]. Here, we report a data mining protocol developed through the analysis of a database generated from microarray experiments with DCs exposed to various stimuli able to induce cell activation. This protocol allowed the selection of a small set of genes which were subsequently used by supervised classification models to make inferences concerning the inflammatory state of the samples.

Results

The Knowledge Extraction Protocol (KEP), depicted in Figure 1, was used to select relevant probe sets (genes) and to train supervised classification models to discriminate between "inflammatory" and "not inflammatory" phenotypes of DCs.

Data Selection

Mouse data: two microarray data sets, namely the Learning Data Set and the Validation Data Set, were defined. The Learning Data Set included the results obtained from microarray experiments performed with: Affymetrix MGU74Av2 arrays (89 samples - 9 different stimuli) [5], Affymetrix MOE430A arrays (44 samples - 4 different stimuli) and MOE430A 2.0 arrays (22 samples - 2 different stimuli). The Validation Data Set included the results of microarray experiments performed with: Affymetrix MGU74Av2 arrays (43 samples - 6 different stimuli) [5] and MOE430A 2.0 arrays (6 samples - 1 stimulus; this is the only stimulus that was not applied to the DC cell line D1 [27], but to bone marrow-derived DCs (BMDC) [28]).

Pre-processing

The differences in array formats required the data to be standardised. GeneChip Mouse Expression 430 (MOE430A 2.0) is the latest version of Affymetrix mouse arrays and contains 22,600 probe sets. All the probe sets of the MOE430A array are included in the MOE430A 2.0 array. The older mouse array, MGU74Av2, contains 12,488 probe sets that only partially match the probe sets of its more recent releases. Affymetrix provides "best match" probe set tables which allow the mapping of equivalent probe sets between different array releases.

The following pre-processing steps were performed: a) Probe set best matching between MOE430A and MGU74Av2. This resulted in 8,904 probe sets, also included in the MOE430A 2.0 array; b) Probe set filtering based on Affymetrix grading A annotation. This step retained 8,349 probe sets out of the 8,904 available; c) Probe set filtering based on expression signals. Every probe set whose expression signal was below 100 was discarded, such that 5,802 probe sets of the 8,349 available were retained; d) per sample Z-score computation.
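
Steps c) and d) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the text does not say how the cut-off of 100 is applied across samples, so the sketch keeps a probe set if its signal reaches 100 in at least one sample, and it uses the population standard deviation for the Z-score. The function names and the dictionary layout (probe set identifier mapped to one value per sample) are hypothetical.

```python
from statistics import mean, pstdev

SIGNAL_THRESHOLD = 100.0  # cut-off stated in the text


def filter_low_signal(expr, threshold=SIGNAL_THRESHOLD):
    """Keep probe sets whose signal reaches the threshold in >= 1 sample.

    Assumption: the aggregation rule across samples is not given in the
    text; "maximum over samples" is one plausible reading.
    """
    return {probe: values for probe, values in expr.items()
            if max(values) >= threshold}


def zscore_per_sample(expr):
    """Standardise each sample (column) to zero mean and unit variance.

    Assumption: population standard deviation (pstdev) is used.
    """
    if not expr:
        return {}
    probes = list(expr)
    n_samples = len(expr[probes[0]])
    scaled = {probe: [] for probe in probes}
    for j in range(n_samples):
        column = [expr[p][j] for p in probes]
        mu, sigma = mean(column), pstdev(column)
        for p in probes:
            # A constant column carries no information; map it to 0.
            scaled[p].append((expr[p][j] - mu) / sigma if sigma > 0 else 0.0)
    return scaled
```

Computing the Z-score per sample rather than per probe set puts arrays of different formats and overall intensities on a common scale before they are pooled into one learning set.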

The pre-processing procedure generated the Pre-processed Learning Data Set, which consisted of 155 samples (15 different stimuli), and the Pre-processed Validation Data Set, which consisted of 49 samples (7 different stimuli). Both data sets contained the same 5,802 probe sets. The class counts for the two data sets are summarised in Table 1 and the detailed list of the experiments and array types is reported in Additional file 1.

Table 1 Frequency of the class variable for Pre-processed Data Sets.

Feature Selection

Feature selection involves the identification and removal of non-significant features. The probe sets that provide no information to help discriminate between "inflammatory" and "not inflammatory" states of the samples are thereby removed from the analysis.

The Weka software environment was used for feature selection [29]. The feature selection task was performed through an ADTree-based wrapper schema (default parameter values) applied to the Pre-processed Learning Data Set. This step selected an expression signature of ten probe sets (Table 2) from among the initial 5,802, which generated the Selected Features Learning Data Set.
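
The general idea of a wrapper schema is that candidate feature subsets are scored by the estimated accuracy of a classifier trained on them. The sketch below illustrates that idea only; it is not Weka's implementation and does not use ADTree. As labelled assumptions, it swaps in a nearest-centroid classifier, a leave-one-out accuracy estimate and a greedy forward search, and all function names are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT ADTree): assign x to the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    # sorted() makes tie-breaking between classes deterministic.
    return min(sorted(centroids), key=lambda lab: sq_dist(centroids[lab]))


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy using only the given feature indices."""
    hits = 0
    for i in range(len(X)):
        train_X = [[row[f] for f in features]
                   for k, row in enumerate(X) if k != i]
        train_y = [lab for k, lab in enumerate(y) if k != i]
        test_x = [X[i][f] for f in features]
        hits += nearest_centroid_predict(train_X, train_y, test_x) == y[i]
    return hits / len(X)


def forward_wrapper_selection(X, y, max_features):
    """Greedily add the feature that most improves the wrapped estimate."""
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(len(X[0])) if f not in selected]
        if not candidates:
            break
        # Highest score wins; ties go to the lowest feature index.
        score, neg_f = max((loo_accuracy(X, y, selected + [f]), -f)
                           for f in candidates)
        if score <= best_score:
            break  # no candidate improves the estimate
        selected.append(-neg_f)
        best_score = score
    return selected, best_score
```

Because every candidate subset requires retraining the classifier, a wrapper is far more expensive than a filter on 5,802 probe sets, which is one reason the search is heuristic rather than exhaustive.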

Table 2 Selected Genes.

Model Training and Performance Estimation

This task, implemented through the Weka software environment, used the Selected Features Learning Data Set to train, evaluate and compare the performance of the following supervised classification models: ZeroR, IB-3, C4.5, Logistic, Multi Layer Perceptron (MLP), Naïve Bayes (NB), Random Forest (RF), Support Vector Machines (SMO-puk) and Tree Augmented Naïve Bayes (TAN).

These models were chosen because they are state-of-the-art for solving supervised classification problems. ZeroR uses the majority criterion to classify a sample, i.e. it assigns every sample to the most frequent class. The weighted averages, estimated through ten repeated 10-fold cross validations, of the following performance measures are reported in Table 3: Precision, Recall, F-measure, ROC and Accuracy. ZeroR was used as the baseline measure of performance, and the performance of the other models was assessed from ROC values: 100% for MLP, 99.9% for IB-3, 99.8% for RF, 99.2% for TAN, 99.0% for SMO-puk, 98.6% for both Logistic and NB, and 97.5% for C4.5. However, using accuracy to compare the supervised classification models, a different picture is obtained. The model with the highest accuracy value was RF (99.1%). The other accuracy values were 98.6% for both SMO-puk and MLP, 98.1% for IB-3, 96.3% for both TAN and C4.5, 95.5% for Logistic and 94.2%, the lowest value, for NB.
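
The ZeroR baseline and the weighted-average measures can be written out as a minimal sketch. It assumes that "weighted average" means each class's precision, recall and F-measure is weighted by that class's share of the samples; the function names are hypothetical and the ROC measure is left out for brevity.

```python
from collections import Counter


def zero_r(train_labels):
    """Return a classifier that always predicts the majority class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _sample: majority


def weighted_report(y_true, y_pred):
    """Class-frequency-weighted precision, recall, F-measure, plus accuracy."""
    n = len(y_true)
    report = {"precision": 0.0, "recall": 0.0, "f_measure": 0.0}
    for label, support in Counter(y_true).items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        weight = support / n  # class share of the samples
        report["precision"] += weight * precision
        report["recall"] += weight * recall
        report["f_measure"] += weight * f_measure
    report["accuracy"] = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return report
```

With this weighting, the weighted recall is always equal to the accuracy, so the two measures carry the same information; precision and F-measure are what separate a model that favours the majority class from one that handles both classes well.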

Table 3 Learning Performance Report.

Validation

To be useful, supervised classification models built on the selected gene expression signature need to be able to classify data sets other than the one on which they were trained. Therefore, the performance of the supervised classification models was evaluated by exploiting the Selected Features Validation Data Set (Table 4). The Bayesian models, NB (93.0%) and TAN (92.8%), attained the highest ROC values and both IB-3 (92.6%) and C4.5 (91.2%) gave good ROC values. However, the ROC values were substantially lower for RF (89.6%), MLP (88.1%), SMO-puk (86.7%) and Logistic (86.6%). The ZeroR model gave an ROC value of 50%, confirming, as expected, that it behaves like a random guessing model. A different picture emerged when the accuracy performance measure was used. Indeed, the best accuracy value (93.9%) was for C4.5 and RF. The accuracy value for the TAN model was 91.8% and that for SMO-puk was 89.8%. The accuracy values were lower for NB (87.8%), IB-3 (85.7%) and Logistic (81.6%). The model with the worst accuracy value was MLP (77.6%).

Table 4 Validation Performance Report.

Functional Gene Selection

The annotations of the ten selected genes (Table 2) indicate that four, namely Socs3, Irgm1, Il12b and Cd40, are associated with known immune-related functions. Expression of six of the ten selected genes differs between the "not inflammatory" and "inflammatory" classes with an absolute Log2 FoldChange (LogFC) greater than 1. A heatmap (Figure 2) was generated for the LogFC of the average signal intensities of the selected genes for the "not inflammatory" and "inflammatory" experiments, calculated on the median expression value for that gene. Il12b and Socs3 are up-regulated with LogFC values of 4.1 and 2.7, respectively. Irgm1, Plin2, Lgals3bp and Smarcc1 are down-regulated with LogFC values of -1.1, -5.6, -2.7 and -2.9, respectively, in the samples induced with inflammatory stimuli. The remaining four genes, namely Cd40, Dock5, Rnf34 and Rab24, show a level of up-regulation or down-regulation corresponding to an absolute LogFC smaller than 1. To characterize the selected gene expression signature further, the ten genes were examined with Ingenuity® Pathway Analysis (IPA) software and the Ingenuity® Knowledge Base (IKB). The IPA software was queried to find the biological interactions (direct and indirect) among the ten genes. The top network retrieved (IPA score equal to 16), depicted in Figure 3, contains six genes of the selected gene expression signature (grey nodes in Figure 3) and 25 further genes (white nodes in Figure 3) that were added by the IKB to build the network. The biological functions associated with this network are the following: Cellular Growth and Proliferation, Haematological System Development and Function, Humoral Immune Response.
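
As a worked illustration of the LogFC criterion, the sketch below computes a log2 fold change as the log2 ratio of the mean "inflammatory" signal to the mean "not inflammatory" signal for one gene. This is an assumption about the exact formula, since the text also mentions the median expression value without giving the full calculation; the function names are hypothetical.

```python
from math import log2
from statistics import mean


def log2_fold_change(inflammatory, not_inflammatory):
    """log2 of the ratio of mean signals (assumed definition of LogFC)."""
    return log2(mean(inflammatory) / mean(not_inflammatory))


def is_modulated(logfc, cutoff=1.0):
    """True when |LogFC| exceeds the cut-off, i.e. more than a 2-fold change."""
    return abs(logfc) > cutoff
```

Under this definition a LogFC of 1 is a doubling and a LogFC of -1 a halving, so the value of 4.1 reported for Il12b corresponds to a change of roughly 17-fold, whereas the 0.45 reported for Cd40 in the learning data is well under 2-fold.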

The molecular and cellular functions of the genes included in the selected gene expression signature were analysed with IPA (Table 5). This identified the Infection Mechanism to be the top function related to "Diseases and Disorders", the Cellular Growth and Proliferation to be the top function related to "Molecular and Cellular Functions" and the Haematological System Development and Function to be the top function related to "Physiological System Development and Function".

Table 5 Biological functions related to the selected genes.

A smaller set of genes (Table 6) was obtained by removing those genes not included in the IPA top network (Figure 3). The performances of the classification models which exploit this reduced set of genes on the Selected Features Validation Data Set are reported in Table 7. The ROC values of RF, MLP, SMO-puk and IB-3 were not significantly affected by the functional gene selection step. However, the ROC values for NB, TAN and C4.5 increased whereas that for Logistic decreased. The accuracy values of TAN, SMO-puk and NB were not affected by the functional gene selection step; they increased from 85.7% to 91.8% for IB-3, from 77.6% to 81.6% for MLP and from 81.6% to 83.7% for Logistic, but decreased from 93.9% to 85.7% for both C4.5 and RF. The heatmap in Figure 4 shows the modulation of the six genes in the Selected Features Validation Data Set. Il12b, Socs3 and Cd40 were also up-regulated in the Selected Features Validation Data Set, with Cd40 more strongly up-regulated (LogFC = 4.5) than in the Selected Features Learning Data Set (LogFC = 0.45). Furthermore, Irgm1 was up-regulated (LogFC = 1.9) in the Selected Features Validation Data Set but down-regulated in the Selected Features Learning Data Set (LogFC = -5.6). Plin2, Lgals3bp and Smarcc1 were not modulated in the Selected Features Validation Data Set but were down-regulated in the Selected Features Learning Data Set (Figure 2). The best classification models, i.e. IB-3 and TAN, misclassified four of the 49 samples belonging to the Selected Features Validation Data Set. One sample was genuinely misclassified, whereas two were known to be labelled with the wrong class and one was known to be an outlier.

Table 6 Reduced set of Genes.

Table 7 Post-processing Performance (Functional Gene Selection I).

Reducing the number of genes from ten to six on the basis of the information derived from the top network generated by IPA gave satisfactory accuracy values. Therefore, a further Functional Gene Selection step was performed. Three of the selected genes were directly linked to each other in the IPA top network: Cd40, Il12b and Socs3 (Figure 5). The results of the Validation task, when only these three genes were used, are reported in Table 8. The model giving the best accuracy value was SMO-puk (95.9%). The second best accuracy value (91.8%) was obtained with IB-3 and NB. Logistic and TAN gave the same, satisfactory, accuracy value (89.8%). That for MLP was 87.8% and the lowest value (85.7%) was for C4.5 and RF. The best model, i.e. SMO-puk, misclassified two of the 49 samples. These samples were those known to be labelled in the wrong class. These findings suggest that, apart from the two samples known to be mislabelled, the three genes are sufficient for correct classification of the samples of the Selected Features Validation Data Set.
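
The accuracy values quoted for the 49-sample validation set follow directly from the number of misclassified samples, which makes it easy to translate between the two. A small sketch of that arithmetic (the function names are hypothetical):

```python
def accuracy_percent(n_samples, n_errors):
    """Accuracy in percent, rounded to one decimal place as in the text."""
    return round(100.0 * (n_samples - n_errors) / n_samples, 1)


def errors_from_accuracy(n_samples, accuracy_pct):
    """Number of misclassified samples implied by a reported accuracy."""
    return round(n_samples * (1.0 - accuracy_pct / 100.0))
```

For example, `accuracy_percent(49, 2)` gives 95.9 (SMO-puk with three genes) and `accuracy_percent(49, 4)` gives 91.8 (IB-3 and TAN with six genes), while `errors_from_accuracy(49, 85.7)` shows that the lowest three-gene accuracy corresponds to seven misclassified samples.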

Table 8 Post-processing Performance (Functional Gene Selection II).

A 3-gene signature associated with inflammation in Human Dendritic Cells

Human Data. To test the general applicability of the proposed protocol, Affymetrix HGU133A gene expression microarray data for 27 human samples (corresponding to nine time series) were used to validate the performance of the 3-gene signature classifiers in human dendritic cells as well. A data set for human monocyte-derived dendritic cells treated with Mycobacterium tuberculosis was derived from a previous study [30] and tested (Table 9). All the supervised classification models, with the exception of IB-3 and SMO-puk, achieved an accuracy of 100%, indicating that the 3-gene signature selected on mouse DCs indeed corresponds to a general signature of inflammation in dendritic cells in both human and mouse systems. Therefore, we suggest that CD40, Il12b and Socs3 can be considered to be the master genes of inflammation and activation in DCs.

Table 9 Performance of 3-gene signature classifiers on the human data set.

Discussion

In this study, we used advanced supervised analysis to derive specific transcriptional signatures from differentially activated DCs and assessed whether these molecular signatures can define DC phenotypes in vitro. DCs form the connection between innate and adaptive mechanisms of the immune system. Studies in mice have demonstrated that cellular vaccination with antigen-bearing DCs is efficient in stimulating antigen-specific T cell responses. Because of the immune-regulating functions of DCs, the therapeutic use of DCs in medicine to control immune responses is an attractive strategy. DCs are indeed regarded as a powerful tool for anti-cancer immunotherapy [31]. In addition, to treat patients suffering from autoimmune or inflammatory diseases, it is desirable to downregulate immune responses in an antigen-specific or a tissue-specific manner without causing systemic immunosuppression. Moreover, graft-versus-host disease (GVHD) and graft rejection are the most serious problems in transplantation medicine, and control of alloreactive immune responses is the key to overcoming these problems. Therefore, antigen-specific negative regulation by DCs with immunosuppressive function is considered to be a promising treatment method in the field of transplantation medicine as well [32, 33]. In summary, a number of studies describe the generation of DCs from various sources for use in cell therapy [34, 35]. Nevertheless, no methods exist today to test the quality of the cell type generated. Therefore, a molecular test that could confirm DC quality before clinical use would provide valuable information to the field of DC therapies.

The problem of sample classification via gene signatures derived from transcriptional profiling has received increasing attention in the context of DNA microarrays. We examined various aspects of the evaluation of gene selection approaches by combining the analysis of different measures of performance. First, we selected a list of genes, from whole-genome profiling of DCs, able to discriminate the DC activation state. Second, to reduce the bias due to the classification model, we estimated different parameters through optimisation on an independent validation data set.

The Knowledge Extraction Protocol (KEP) (Figure 1) selected ten genes that, on the Selected Features Validation Data Set, discriminated between "inflammatory" and "not inflammatory" stimuli with an accuracy of 93.9% for C4.5 and RF and of 91.8% for TAN.

Six of the ten genes selected were modulated in the Selected Features Learning Data Set between the "not inflammatory" and "inflammatory" classes with an absolute Log2 FoldChange (LogFC) greater than 1. The heatmap of the selected genes is shown in Figure 2 and revealed that two of them were up-regulated and four were down-regulated. Il12b, Socs3 and Cd40 were also up-regulated in the Selected Features Validation Data Set (Figure 4); notably, Cd40 was up-regulated (4.5 LogFC) in the inflammatory state samples of the Selected Features Validation Data Set, compared to 0.45 LogFC in the Selected Features Learning Data Set. Plin2, Lgals3bp and Smarcc1 were not substantially modulated in the Selected Features Validation Data Set and were down-regulated in the Selected Features Learning Data Set. Modulation of these selected genes should be further investigated biologically to validate these findings.

KEP misclassified four of the 49 samples of the Selected Features Validation Data Set: one sample derived from D1 cells treated with Listeria monocytogenes EGD for 4 h (replicate A), and three samples from D1 cells treated with Listeria innocua (0 h replicates A and B, and 8 h replicate A). The two 0 h samples of the Listeria innocua experiment were known to be mislabelled, and the 8 h sample was found to be an outlier. Hierarchical clustering analysis of the samples from the Listeria monocytogenes EGD experiment did not show any anomaly that might provide an explanation for the misclassification (data not shown). Remarkably, in the Selected Features Validation Data Set, samples from experiments involving cells from different sources (e.g. bone marrow-derived DCs) were not misclassified. This suggests that the KEP presented in this work may discriminate inflammatory signatures for DCs from diverse sources.

Several methods, including traditional statistical techniques and state-of-the-art computer-intensive methodologies, have been investigated to predict inflammatory signatures in DCs. Activation of DCs with LPS and with IFN-β has been shown to generate cells prone to produce Th1 attractants that are effective for adoptive immune cancer therapy [36, 37]. It has also been demonstrated that DCs exposed to supernatants derived from tumours treated with some cytotoxic drugs are capable of modulating co-stimulatory markers and triggering T cell responses [38]. A 44-gene signature in DCs, able to discriminate between different functional states, is described in [5]. Here, we report a significant improvement over the previous work by reducing the number of genes in the signature and by testing their performance with DCs derived from different hosts, namely mouse and human. We selected a signature of inflammation based on the expression of ten genes and demonstrated that this list could be further reduced to three genes without significantly affecting the classification performance. The three genes, namely CD40, Il12b and Socs3, can thus be considered to be the master genes of activation/inflammation in DCs. CD40 mediates a broad variety of immune and inflammatory responses, and the ligand-receptor interaction is responsible for immune activation; Il12b is a part of the IL12 cytokine complex, a cytokine that acts on T and natural killer cells and has a broad range of biological activities, the most important being the induction of Th1 cell development; the Socs3 gene encodes a member of the STAT-induced STAT inhibitor (SSI) family, also known as the suppressor of cytokine signalling (SOCS) family. SSI family members are cytokine-inducible negative regulators of cytokine signalling [39–42]. Therefore, the regulation of these genes in concert in DCs suggests that they may serve as molecular markers of inflammation/activation in both human and murine DCs.

Conclusions

Experimental and bioinformatics strategies of this type may be used to improve treatment decisions for other inflammatory contexts, particularly chronic diseases. The whole-genome approach holds the promise of defining the functional quality of DCs, resulting in a better prediction of the stimulatory capacity of the cells. This approach may become a powerful strategy in personalised medicine.

Methods

The Knowledge Extraction Protocol (Figure 1) is based on Data Mining (DM) [43, 44] and consists of the following tasks: Data Selection, Pre-processing, Feature Selection, Model Training and Performance Estimation, Validation and Functional Gene Selection.