BIOSTATISTICS 140.688: STATISTICS FOR GENE EXPRESSION, Spring 2003
MIDTERM EXAM


The midterm exam is a take-home data analysis project. The experiment to be analyzed consists of 12 oligonucleotide arrays, 6 in each of two groups. The goal is to identify genes that may be differentially expressed between the two groups. A project should contain two parts: a) identification of differentially expressed genes and b) assessment of the stability of the list (e.g. FDRs, significance values, or posterior probabilities).
You are free to use any software package you like for the project, but you should explain the procedure the software follows and why that procedure is appropriate for the data at hand.

Data are available in R format as midterm.data.dget, which you can load into R directly using the command
   XX  <- dget("midterm.data.dget")
which will generate an object called XX that contains the gene expression values.
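
As a rough illustration of what dget does and how to inspect the result, the sketch below writes a tiny made-up matrix with dput and reads it back. The toy values, the temporary file, and the assumption that XX behaves like a matrix are all illustrative; the real midterm.data.dget may have a different shape or class.

```r
# Stand-in for the real file: a tiny made-up matrix written with dput().
# The real midterm.data.dget may have a different shape or class.
toy <- matrix(c(7.1, 8.3, 6.9, 7.4, 8.0, 7.2), nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("C1", "E1")))
tmp <- tempfile(fileext = ".dget")
dput(toy, file = tmp)

# dget() parses the file and returns the object it describes.
XX <- dget(tmp)

# Quick checks before any analysis.
print(class(XX))      # what kind of object came back
print(dim(XX))        # rows (genes) by columns (arrays)
print(colnames(XX))   # array labels
print(head(XX))       # first few rows
```

With the real file, replacing tmp by "midterm.data.dget" and running the same checks should show whether all 12 arrays are present.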

Data are also available as tab-delimited ASCII in midterm.data.txt, if you want to import them into Excel or other programs.

Columns are labeled C1-C6 for the control group and E1-E6 for the experimental group.
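
The column labels make it easy to split the arrays into the two groups. The sketch below shows one possible starting point, not a required or recommended method: a per-gene Welch t-test followed by a Benjamini-Hochberg adjustment. The data are simulated here (200 made-up genes, with an arbitrary shift added to the first 10), and the values are assumed to be on a log scale; whether the real data need a transformation first is for you to check.

```r
set.seed(1)

# Simulated stand-in for XX: 200 genes x 12 arrays, assumed log scale.
# The first 10 genes get a made-up shift in the experimental group.
n_genes <- 200
XX <- matrix(rnorm(n_genes * 12, mean = 8, sd = 0.5), nrow = n_genes)
colnames(XX) <- c(paste0("C", 1:6), paste0("E", 1:6))
rownames(XX) <- paste0("gene", seq_len(n_genes))
XX[1:10, 7:12] <- XX[1:10, 7:12] + 2

# Split the arrays into groups using the column labels.
ctrl <- grep("^C", colnames(XX))
expt <- grep("^E", colnames(XX))

# Per-gene difference in means and Welch t-test p-value.
diff_means <- rowMeans(XX[, expt]) - rowMeans(XX[, ctrl])
p_values <- apply(XX, 1, function(x) t.test(x[expt], x[ctrl])$p.value)

# Benjamini-Hochberg adjustment, aimed at controlling the false discovery rate.
q_values <- p.adjust(p_values, method = "BH")

# Collect the results and sort genes by raw p-value.
results <- data.frame(gene = rownames(XX), diff = diff_means,
                      p = p_values, q = q_values)
results <- results[order(results$p), ]
print(head(results, 10))
```

With only 6 arrays per group, a plain t-test has unstable variance estimates, so a write-up based on something like this would still need to justify the choice of test.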

Don't try to read the .txt file using dget. You can read the .txt file into R using read.table, but if you dget the file "midterm.data.dget", the results will already be in the format you need, without additional work.
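
If you do go the read.table route, a minimal sketch might look like the following. The file layout is an assumption (a header row of array labels, with gene identifiers in the first column), and the small temporary file is made up for illustration; check the real midterm.data.txt and adjust the arguments accordingly.

```r
# Made-up tab-delimited file standing in for midterm.data.txt.
# Assumed layout: a header row of array labels, gene IDs in the first column.
tmp <- tempfile(fileext = ".txt")
writeLines(c("gene\tC1\tC2\tE1\tE2",
             "g1\t7.1\t7.4\t8.0\t8.3",
             "g2\t6.9\t7.2\t6.8\t7.0"), tmp)

# header = TRUE takes column names from the first line;
# row.names = 1 uses the first column as row names.
dat <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1)

# Convert the data frame to a numeric matrix for row-wise computations.
mat <- as.matrix(dat)
print(dim(mat))
print(colnames(mat))
```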

The deadline for the midterm is April 28th, by class time. Please email your project.

Here is an R dump midterm.data.dget with the true fold changes that were used in simulating the data you analyzed.