A Calder mobile is a good way to imagine how a dendrogram, which is like a photo of the mobile, may be a deceptive map of the positions of objects in higher-dimensional spaces.
OVERVIEW
The goal of this class is to introduce statistical concepts and tools
necessary to interpret and critically evaluate the literature on gene expression
array data. Advanced statistical material will be presented at an intuitive
level. However, knowledge of basic hypothesis testing, ANOVA, and linear and
logistic regression is a prerequisite.
MIDTERM EXAM
The midterm exam is a take-home data analysis project. The experiment
to be analyzed consists of 8 two-channel arrays, 4 in each of two groups.
The green channel is always reference RNA. The red channel is from
wild type mice in group 1 and from knockout mice in group 2. The
reference is unrelated to either type. The goal is to identify genes that
may be differentially expressed in the two groups. A project should contain
two parts: a) normalization; and b) identification of differentially expressed
genes.
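The two parts of the project can be sketched in base R as follows. This is only an illustration on simulated intensities: the data, the planted effects, and the object names are hypothetical, and global median centering plus a per-gene two-sample t-test are simple stand-ins chosen for brevity, not necessarily the methods expected in the project (sma, for example, offers more refined normalization).

```r
set.seed(1)
n_genes <- 200

# Simulated intensities (hypothetical data); columns mimic arrays 1-8.
G <- matrix(rlnorm(n_genes * 8, meanlog = 8), n_genes, 8)        # reference channel
R <- G * matrix(rlnorm(n_genes * 8, sdlog = 0.2), n_genes, 8)    # sample channel
R[1:10, 5:8] <- R[1:10, 5:8] * 4   # plant an effect in 10 genes, knockout arrays only

# a) Normalization: log ratios, then subtract each array's median log ratio.
M <- log2(R / G)
M_norm <- sweep(M, 2, apply(M, 2, median))

# b) Differential expression: Welch t statistic per gene, knockout vs wild type.
group <- rep(c("wt", "ko"), each = 4)
t_stat <- apply(M_norm, 1, function(m) {
  unname(t.test(m[group == "ko"], m[group == "wt"])$statistic)
})

# Genes ranked by absolute t statistic; the top of the list are candidates.
ranked <- order(abs(t_stat), decreasing = TRUE)
head(ranked, 10)
```

With only 4 arrays per group, the per-gene variance estimates are noisy, so a ranking like this is a starting point rather than a final list.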
Data are available in R format as midterm.data.dget, which you can load into R directly using the dget command.

Data are also available as tab-delimited ASCII as midterm.data.txt, if you want to import them into Excel or other programs. Columns are labeled R1-R8 and G1-G8. R and G indicate the red and green channels. Numbers indicate arrays. Arrays 1-4 are the wild type group, arrays 5-8 are the knockout group. The layout of the array in sma format is in midterm.setup.dget. If you don't use sma, the layout is included in the tab-delimited file and is the same as that of the arrays described in the Yang et al. paper linked to the Apr 3 lecture.

I tried things out on R 1.4.1 on Linux and on R 1.4.0 on Windows 98. The way I got it to work on Windows is by first changing the working directory from the File pull-down menu at the top left and then loading the data from the R prompt as described above. Don't forget to assign the result of the dget command to an object.
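A minimal sketch of the assignment pattern, using a small made-up object and temporary files so that it runs anywhere. The object names (toy, midterm, midterm_txt) and the file contents are hypothetical; with the course files you would pass the actual file name to dget or read.table instead.

```r
# Hypothetical stand-in for the course data: a small list of two matrices.
toy <- list(R = matrix(1:8, nrow = 2), G = matrix(8:1, nrow = 2))

# Write it in dput format to a temporary file (stand-in for a .dget file).
dget_path <- tempfile(fileext = ".dget")
dput(toy, file = dget_path)

# Assign the result of dget(); without the assignment, R only prints
# the object to the console and nothing is stored.
midterm <- dget(dget_path)

# A tab-delimited text file is read with read.table(), not dget().
txt_path <- tempfile(fileext = ".txt")
write.table(data.frame(R1 = c(10, 20), G1 = c(11, 19)), txt_path,
            sep = "\t", quote = FALSE, row.names = FALSE)
midterm_txt <- read.table(txt_path, header = TRUE, sep = "\t")
```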
Also, don't try to read the .txt file using dget. You can read the .txt file into R using read.table, but if you dget the file "midterm_data.dget", the results will already be in the format you need for sma, without additional work.

The deadline for the midterm is May 8 by class time. Please email your project.

The list of truly differentially expressed genes in the simulated data set is here

FINAL EXAM
The final exam is a take-home data analysis project. The experiment to be analyzed consists of 75 samples, each hybridized to one array. Samples are from one of two known classes. The goal is to construct and evaluate a classification algorithm to predict class based on the expression profiles. I have divided the samples into a training set of 50 arrays and a validation set of 25 arrays. Data files are here:
You can use any approach you like and any software you like. Many of the packages under "data mining" in this List have classification tools. Excel users may want to check out the BRB site.

Please develop your classifier in the 50 training samples, and report to me how you developed your classifier and how the classifier performs on the validation set. You can present more than one classifier (although one is enough). The only rule is this: please do not use the validation sample to train the classifier, and do not go back and retrain the classifier if it does poorly in the validation sample. I will not grade you on how well you do on the validation sample. R has software for all the classification algorithms described in class.
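The train-then-validate workflow can be sketched in base R as below. Everything here is an assumption for illustration: the data are simulated, the helper names (make_set, predict_nc) are hypothetical, and the nearest-centroid rule is just one simple classifier chosen for brevity, not a recommendation over the methods covered in class.

```r
set.seed(2)
n_genes <- 100

# Simulate one data set: n samples in rows, genes in columns (hypothetical data).
make_set <- function(n) {
  y <- rep(c(0, 1), length.out = n)
  x <- matrix(rnorm(n * n_genes), n, n_genes)
  x[y == 1, 1:5] <- x[y == 1, 1:5] + 1.5   # 5 informative genes
  list(x = x, y = y)
}
train <- make_set(50)   # used to build the classifier
valid <- make_set(25)   # used once, only to evaluate it

# Fit: class centroids estimated from the training samples only.
centroids <- rbind(colMeans(train$x[train$y == 0, ]),
                   colMeans(train$x[train$y == 1, ]))

# Predict: assign each sample to the class with the nearer centroid.
predict_nc <- function(x, centroids) {
  d0 <- rowSums(sweep(x, 2, centroids[1, ])^2)
  d1 <- rowSums(sweep(x, 2, centroids[2, ])^2)
  as.numeric(d1 < d0)
}

# Evaluate on the validation set, without refitting afterwards.
pred <- predict_nc(valid$x, centroids)
confusion <- table(predicted = pred, actual = valid$y)
accuracy <- mean(pred == valid$y)
```

The key design point is that valid is never touched until the final two steps; any gene selection or tuning would also have to happen inside the training set.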
To load the data in R,
The deadline for the final is Friday, May 24, by 11:30 pm. Please
email your project.
SOFTWARE
Students are free to use any software they like to analyze array data in this class. Some software packages will be illustrated in class, and occasionally a live R analysis will be shown step by step. These links include a variety of free, state-of-the-art tools for array analysis.
INSTRUCTOR
The instructor for this class is Giovanni Parmigiani. He is available via email at gp@jhu.edu,
after class every Wednesday until 1pm, and by appointment.
RELATED LINKS
The web site of the microarray working group "Hopkins Expressionists", with links to software and papers by Hopkins faculty.
A description of and information for the course "Introduction to Bioinformatics" M.E:440.714