A (protein-coding) gene is said to be expressed in a cell or group of cells when its transcribed messenger RNA (mRNA), or the resulting protein product, is detected. There is a wide variety of techniques for detecting and quantifying gene expression, and most of them have substantial analytical components.
We measure gene expression in order to compare the expression levels of one or more genes in cells from different sources. Comparisons of interest include tumor versus normal cells; cells from a specific organ in a mutant or genetically modified organism versus cells from the same organ in a normal organism of the same strain; and cells before and after an intervention such as a drug treatment.
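A common way to summarize such a comparison for each gene is the log2 ratio, or log fold change, between the two conditions. The sketch below is only an illustration. It assumes one non-negative expression value per gene in each condition and adds a pseudocount of 1 so that the log of zero is never taken. The function name, gene names and values are hypothetical. A real analysis would also involve normalization and replicates, which this sketch ignores.

```python
import math


def log2_fold_changes(treated, control, pseudocount=1.0):
    """Return {gene: log2((treated + c) / (control + c))} for genes in both inputs.

    `treated` and `control` map gene names to non-negative expression values;
    the pseudocount c avoids division by zero and log(0).
    """
    shared = treated.keys() & control.keys()
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in sorted(shared)
    }


# Hypothetical expression values (arbitrary units), for illustration only.
tumor = {"GENE_A": 799.0, "GENE_B": 99.0, "GENE_C": 49.0}
normal = {"GENE_A": 99.0, "GENE_B": 99.0, "GENE_C": 199.0}

for gene, lfc in log2_fold_changes(tumor, normal).items():
    print(f"{gene}: {lfc:+.2f}")
# GENE_A: +3.00   (eight-fold increase)
# GENE_B: +0.00   (no change)
# GENE_C: -2.00   (four-fold decrease)
```

The log2 scale makes increases and decreases symmetric: a doubling is +1 and a halving is -1.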
There are many techniques for measuring gene expression, but perhaps the most common at present are those that rely on DNA-RNA or DNA-DNA hybridization. This is the process by which single-stranded DNA or RNA molecules find and base-pair with their complementary sequences amid a complex mixture of many molecules of the same kind.
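Complementary base pairing can be illustrated with a short sketch. A single-stranded DNA probe can hybridize to a target wherever the target contains the probe's reverse complement, because the two strands pair antiparallel, with A pairing with T (or U in RNA) and G with C. The sketch assumes exact matches only. Real hybridization tolerates some mismatches and depends on conditions such as temperature and salt concentration, none of which are modeled here. The function names and sequences are hypothetical.

```python
# Watson-Crick partners for the DNA bases of a probe.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(probe: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(probe.upper()))


def complementary_site(probe: str, target: str) -> int:
    """Return the 0-based start of the first stretch of `target` that is
    perfectly complementary (antiparallel) to the DNA `probe`, or -1.

    `target` may be RNA (with U) or DNA (with T); U is treated as T so that
    a DNA probe can be matched against an mRNA sequence.
    """
    target_as_dna = target.upper().replace("U", "T")
    return target_as_dna.find(reverse_complement(probe))


# Hypothetical probe and mRNA targets, for illustration only.
probe = "AAGC"
targets = {"mRNA_1": "UUGCUUAG", "mRNA_2": "CCCAAAGG"}

for name, seq in targets.items():
    site = complementary_site(probe, seq)
    print(f"{name}: site at {site}" if site >= 0 else f"{name}: no site")
# mRNA_1: site at 2
# mRNA_2: no site
```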
The older cell-wide method for measuring gene expression at the protein level was two-dimensional gel (2D-Gel) analysis. In this method, complex protein mixtures are separated by isoelectric point (the pH at which a protein carries no net charge) using isoelectric focusing, and by size using polyacrylamide gel electrophoresis (PAGE). The technique was combined with mass spectrometry (MS) in the 1990s, and there are now a number of electrophoresis-free, MS-based approaches to measuring protein levels. More recently, protein arrays have been developed; some of these will be discussed later in the year in Workshop 4.
On what scale do we measure gene expression? Much of the recent interest by statisticians in this area stems from the availability of data sets giving expression measurements on tens of thousands of genes: so-called microarray gene expression data. However, nylon membrane filters with thousands of genes spotted on them have been around for over a decade, and smaller-scale quantitative expression data have existed for much longer. Similarly, 2D-Gel data are quite extensive, and MS techniques, especially when combined with other separation techniques, can produce up to 10^8 data points per sample. There are many differences between these technologies, but from the analytical viewpoint there are many similarities as well.
In this workshop, we will survey some of the computational, mathematical, and statistical models and methods used in analyzing gene expression data. Much of our focus will be on approaches to quantifying mRNA, as these are the most highly developed. We shall also present a small sample of the extensive biological and technological background to gene expression analysis.
Monday, October 11

| Time | Session |
|---|---|
| 8:45-9:15am | Coffee and Registration |
| 9:15-9:30am | Welcome and Introduction: Avner Friedman, Shili Lin, and Terry Speed |
| 9:30-10:30am | Earl Hubbell: Designing estimators for low-level expression analysis |
| 11:00-11:30am | M. Kathleen Kerr: Comparison of Affymetrix and quantitative rtPCR measurements of relative gene expression |
| 2:00-3:00pm | David Kreil: From spot to biology: challenges in microarray data analysis |

Tuesday, October 12

| Time | Session |
|---|---|
| 9:00-10:00am | Darlene Goldstein: Strategies for quantifying GeneChip expression for large studies |
| 10:30-11:30am | W. Evan Johnson: Adjusting for the batch effect: an empirical Bayes approach to combining microarray data from multiple sources |
| 2:00-3:00pm | Raymond Carroll: Efficient estimation of gene-environment interactions in case-control studies with quantitative gene information |

Wednesday, October 13

| Time | Session |
|---|---|
| 9:00-10:00am | Jason Hsu: Statistically designing microarray experiments and analyzing gene expression data in a decision-making processes |
| 10:30-11:30am | Susmita Datta: Significant analysis using P-values for multiple hypotheses testing in microarray experiments |
| 2:00-3:00pm | David Allison: Opportunities, challenges, and issues posed by massive multiple inference in high dimensional biology |
| 3:30-4:30pm | Eric Schadt: Complex systems to understand complex traits: beyond reagent driven science |

Thursday, October 14

| Time | Session |
|---|---|
| 9:00-10:00am | Kim-Anh Do: A Bayesian mixture model for differential gene expression |
| 10:30-11:30am | Rainer Spang: Differential co-expression of genes |
| 2:00-3:00pm | Ina Hoeschele: Genetical genomics analysis to infer gene regulatory networks |

Friday, October 15

| Time | Session |
|---|---|
| 9:00-10:00am | Harmen Bussemaker: Inferring regulatory circuitry through model-based analysis of mRNA expression and ChIP data |
| 10:30-11:30am | Hongyu Zhao: Integrated statistical analysis of gene expression data |
| 2:00-3:00pm | Terry Speed: Overview and open problems in the analysis of gene expression microarray data |