Introduction to R and Bioconductor
Instructors
- Aedin Culhane contact: aedin@jimmy.harvard.edu
- Daniel Gusenleitner
- Benjamin Haibe-Kains
- Teaching Assistant: Markus Schroeder
Additional manuals
(these will not be covered in the course, but may be helpful if you are new to R)
Lecture notes from Bio503 Programming and Statistical Modeling in R (Jan 2011)
Download An Introduction to R and Bioconductor (Basic Introduction to R and Bioconductor), which includes information about installation and getting help.
Apologies: links to some of the R and Rnw files are currently broken (Aug 2011) because we moved the website. I will fix these as soon as possible.
Schedule
May 23 – 9:30am – 5:00pm
May 24 – 9:30am – 12:30pm
1. History and Background
- A brief history of R and Bioconductor
- Installing R and Bioconductor
- The concept of libraries in R and Bioconductor
- Resources for R: RStudio, R website, Bioconductor website
1. History and Background (Slides)
Basic Introduction to R and Bioconductor (Installing R and Bioconductor Help file)
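You install a package once with install.packages() (or the Bioconductor installer). You then load it in every new R session with library(). The following is a minimal sketch of a helper that installs a CRAN package only if it is missing and then loads it. The name loadOrInstall is hypothetical and is not part of R or Bioconductor.

```r
# Hypothetical helper (not part of R): install a CRAN package if it is
# missing, then attach it to the current session
loadOrInstall <- function(pkg) {
  # requireNamespace() returns FALSE instead of an error when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  # TRUE if the package now appears on the search path
  invisible(paste0("package:", pkg) %in% search())
}

# splines ships with R, so this call only loads it
loadOrInstall("splines")
search()   # attached packages, now including "package:splines"
```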
2. Introduction to the R language
- Objects – types of objects, classes, creating and accessing objects
- Arithmetic and matrix operations
- Reading and writing data
- Saving the R session and R history
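The following short tour covers the object types and operations listed above using only base R. The values are arbitrary examples.

```r
# Vectors: arithmetic is element-wise
x <- c(2, 4, 6)
y <- x * 2 + 1               # 5 9 13
class(x)                     # "numeric"

# Matrices: * is element-wise, %*% is matrix multiplication
m <- matrix(1:6, nrow = 2)   # filled column by column
dim(m)                       # 2 3
m[2, 3]                      # 6 (row 2, column 3)
mm <- m %*% t(m)             # 2 x 2 product of m and its transpose

# Data frames hold columns of different types; access with $ or [row, col]
df <- data.frame(id = c("a", "b", "c"), value = y)
df$value[2]                  # 9
df[df$value > 6, ]           # rows where value exceeds 6

# Lists can hold objects of any type and length
lst <- list(numbers = x, table = df)
names(lst)                   # "numbers" "table"
```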
1. Introduction to Programming in R (talk PDF or Rnw file)
Data for Exercise 1
womenStats.txt file
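The layout of womenStats.txt is not described here, so this sketch assumes a tab-delimited text file with a header row. It uses R's built-in women dataset (height and weight of 15 women) as a stand-in: the data are written to a temporary file, read back, and then saved and restored as an .RData file.

```r
# Write the built-in 'women' data frame to a tab-delimited text file
f <- file.path(tempdir(), "women_example.txt")
write.table(women, file = f, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back: header = TRUE uses the first line as column names
dat <- read.table(f, header = TRUE, sep = "\t")
str(dat)
summary(dat$weight)

# Save selected objects to an .RData file and restore them later
rdata <- file.path(tempdir(), "session_example.RData")
save(dat, file = rdata)
rm(dat)
load(rdata)   # recreates 'dat' in the workspace
# save.image() saves the whole workspace; savehistory() saves the command
# history (interactive sessions only)
```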
3. Brief Introduction to Graphics in R
- Basic plotting
- Manipulating the plotting window
- Saving plots
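The sketch below covers all three topics with base graphics, again using the built-in women dataset. It opens a PDF device, splits the plotting window into two panels with par(mfrow), draws two plots, and closes the device so that the file is written.

```r
# Open a PDF device so plots go to a file instead of the screen
pdfFile <- file.path(tempdir(), "women_plots.pdf")
pdf(pdfFile, width = 8, height = 4)

# par(mfrow = c(1, 2)) splits the plotting window into 1 row, 2 columns
op <- par(mfrow = c(1, 2))
plot(women$height, women$weight, xlab = "Height (in)", ylab = "Weight (lb)",
     main = "Scatterplot", pch = 19, col = "blue")
abline(lm(weight ~ height, data = women), col = "red")  # fitted line
hist(women$weight, main = "Histogram", xlab = "Weight (lb)")
par(op)    # restore the previous graphical parameters

dev.off()  # close the device; the file is complete only after this
```

Replacing pdf() with png() writes a bitmap instead. Without the pdf()/dev.off() pair, the same plotting commands draw in the interactive plotting window.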
Exercise 2: 11:30-12:00
4. Getting Genomics and Gene Expression Data into R
- Workflows
- Gene Expression Data Packages for different platforms
- Other genomics data: next-generation sequencing and SNP data workflows
3. Workflows – Getting Data into R/Bioconductor (slides)
Function to create an ExpressionSet from two data matrices (or data.frames) containing 1) expression data and 2) sample annotation
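library(Biobase)  # provides ExpressionSet, AnnotatedDataFrame, exprs() and varLabels()
makeEset <- function(eSet, annt) {
  # Create an ExpressionSet from eSet, a normalized gene expression matrix
  # (genes in rows, samples in columns), and annt, a data.frame of sample
  # annotation whose row names match the column names of eSet
  metadata <- data.frame(labelDescription = colnames(annt),
                         row.names = colnames(annt))
  phenoData <- new("AnnotatedDataFrame", data = annt, varMetadata = metadata)
  # Accept a data.frame or an existing ExpressionSet as well as a matrix
  if (inherits(eSet, "data.frame")) eSet <- as.matrix(eSet)
  if (inherits(eSet, "ExpressionSet")) eSet <- exprs(eSet)
  data.eSet <- new("ExpressionSet", exprs = eSet, phenoData = phenoData)
  print(varLabels(data.eSet))  # list the annotation variables
  return(data.eSet)
}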
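makeEset() assumes that the column names of the expression matrix and the row names of annt list the same samples in the same order. Biobase's validity check is expected to reject an ExpressionSet whose sample names disagree. The following is a base-R sketch of a check-and-reorder step to run first. The helper name alignAnnotation is hypothetical, and the toy data are random.

```r
# Hypothetical helper: reorder the rows of annt to match colnames(exprMat),
# stopping with an error if any sample has no annotation row
alignAnnotation <- function(exprMat, annt) {
  idx <- match(colnames(exprMat), rownames(annt))
  if (anyNA(idx)) {
    stop("No annotation for sample(s): ",
         paste(colnames(exprMat)[is.na(idx)], collapse = ", "))
  }
  annt[idx, , drop = FALSE]
}

# Toy example: 3 genes x 3 samples; annotation rows are in a different order
set.seed(10)
exprMat <- matrix(rnorm(9), nrow = 3,
                  dimnames = list(paste0("gene", 1:3), c("s1", "s2", "s3")))
annt <- data.frame(group = c("treated", "control", "control"),
                   row.names = c("s3", "s1", "s2"))
annt2 <- alignAnnotation(exprMat, annt)
identical(rownames(annt2), colnames(exprMat))  # TRUE
```

With Biobase loaded, annt2 can then be passed to makeEset(exprMat, annt2).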
Lunch
5. Exploratory Data Analysis
- Importance of EDA
- Clustering Data using hierarchical cluster analysis
- Dimension reduction using principal component analysis
- Interpreting Results of EDA
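The following is a base-R sketch of both EDA techniques listed above. It runs on a simulated matrix, not the course data, in which samples 4 to 6 carry a shift in the first 20 genes. Using 1 minus Pearson correlation as the distance and average linkage are assumptions; the course tutorial may use other settings.

```r
set.seed(1)
# Toy expression matrix: 100 genes (rows) x 6 samples (columns)
expr <- matrix(rnorm(600), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
expr[1:20, 4:6] <- expr[1:20, 4:6] + 3   # simulated group effect

# Hierarchical clustering of samples with 1 - Pearson correlation distance
d <- as.dist(1 - cor(expr))
hc <- hclust(d, method = "average")
plot(hc, main = "Sample dendrogram")
groups <- cutree(hc, k = 2)              # cut the tree into 2 clusters

# PCA of samples: transpose so samples are rows; genes are centered
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
summary(pca)                             # variance explained per component
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2", pch = 19,
     col = groups)
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)
```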
4. Bioc – Exploratory Data Analysis (slides)
Exercise/tutorial 3 (Exploratory Data Analysis)
## Install R packages for this tutorial
install.packages("RCurl")
install.packages("gplots")
install.packages("scatterplot3d")
## Install Bioconductor Packages
source("http://www.bioconductor.org/biocLite.R")
# biocLite()  # uncomment to install the default set of core Bioconductor packages
biocLite("made4")
biocLite("hgu95av2.db")
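The biocLite.R script above was the Bioconductor installer at the time of this course. In current Bioconductor releases it has been replaced by the CRAN package BiocManager and its install() function. The following is a sketch of installing the same packages that way. The wrapper installCoursePackages and its dryRun argument are hypothetical; only install.packages() and BiocManager::install() are real APIs.

```r
# Hypothetical wrapper: install only the course packages that are missing
installCoursePackages <- function(cran = c("RCurl", "gplots", "scatterplot3d"),
                                  bioc = c("made4", "hgu95av2.db"),
                                  dryRun = FALSE) {
  # requireNamespace() returns FALSE for packages that are not installed
  missingCran <- cran[!vapply(cran, requireNamespace, logical(1), quietly = TRUE)]
  missingBioc <- bioc[!vapply(bioc, requireNamespace, logical(1), quietly = TRUE)]
  # With dryRun = TRUE, just report what would be installed
  if (dryRun) return(list(cran = missingCran, bioc = missingBioc))
  if (length(missingCran)) install.packages(missingCran)
  if (length(missingBioc)) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install(missingBioc)
  }
  invisible(list(cran = missingCran, bioc = missingBioc))
}
```

Calling installCoursePackages(dryRun = TRUE) lists what is missing. Calling installCoursePackages() installs it.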
This code is available in the following R script
Exercise
Course Data Files
- Normalised (vsn) fibroblast data, data.vsn.csv. To load it directly into R, use
read.csv("http://dl.dropbox.com/u/23527371/data.vsn.csv")
- Sample annotation, annt.txt. To read it directly into R, use
read.table("http://dl.dropbox.com/u/23527371/annt.txt", header=TRUE)
- Download the raw data (9 CEL files) as a tar.gz or zip compressed file
Break
6. Feature Selection
- The simple t-test
- Moderated variance t-tests using limma
- Visualizing results: producing boxplots, dotplots, Venn diagrams, etc.
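The following is a base-R sketch of the simple t-test approach on simulated data: one Welch t-test per gene, a Benjamini-Hochberg adjustment, and a boxplot with overlaid points for the top gene. The limma moderated t-test covered in the session needs the Bioconductor limma package (lmFit() followed by eBayes()) and is not shown here.

```r
set.seed(2)
# Toy data: 200 genes x 8 samples, 4 controls then 4 treated
expr <- matrix(rnorm(1600), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("s", 1:8)))
group <- factor(rep(c("control", "treated"), each = 4))
expr[1:10, group == "treated"] <- expr[1:10, group == "treated"] + 3

# Welch two-sample t-test for every gene (row)
pvals <- apply(expr, 1, function(g) {
  t.test(g[group == "treated"], g[group == "control"])$p.value
})

# Adjust for multiple testing (Benjamini-Hochberg false discovery rate)
padj <- p.adjust(pvals, method = "BH")
top <- names(sort(padj))[1:10]
head(sort(padj))

# Boxplot of the top gene by group, with individual samples overlaid
boxplot(expr[top[1], ] ~ group, main = top[1], ylab = "Expression")
stripchart(expr[top[1], ] ~ group, vertical = TRUE, add = TRUE, pch = 19)
```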
Resources
Papers comparing different feature selection methods
- Jeffery IB, Higgins DG, Culhane AC. (2006) Comparison and evaluation of microarray feature selection methods. BMC Bioinformatics 7:359.
- Murie C, Woody O, Lee AY, Nadon R. (2009) Comparison of small n statistical tests of differential expression applied to microarrays. BMC Bioinformatics 10:45.
7. Annotating Biological Data
- Annotation Packages in Bioconductor
- biomaRt
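Whichever annotation source you use (an annotation package or a biomaRt query), the last step is usually to join gene identifiers onto a results table. The following is a base-R sketch with merge(). The probe IDs and gene symbols are placeholders, not real annotation.

```r
# Toy results table keyed by probe ID (placeholder values)
results <- data.frame(probe = c("probeA", "probeB", "probeC", "probeD"),
                      logFC = c(2.1, -1.3, 0.4, 0.9))
# Toy annotation table; in practice this comes from an annotation source
annot <- data.frame(probe = c("probeC", "probeA", "probeB"),
                    symbol = c("GENE3", "GENE1", "GENE2"))

# all.x = TRUE keeps every result row; unannotated probes get NA
annotated <- merge(results, annot, by = "probe", all.x = TRUE)
annotated
```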
6-Annotation using annaffy and biomaRt (pdf)
For help browsing BiocViews to find annotation packages, or for help installing them, see Introduction and Installing R and Bioconductor
Day 2:
8. Gene Set Analysis (presented by Dan)
Links to Gene Set Analysis Resources
- GeneSigDB http://compbio.dfci.harvard.edu/genesigdb
- MSigDB http://www.broadinstitute.org/gsea/msigdb/index.jsp
- KEGG http://www.genome.jp/kegg/
- GO http://www.geneontology.org/
Exercise Files
- RData Files
- breastCancer.RData
- mSigDB.RData
- RScript
- gsea_tutorial.R
- preprocessGeneSet.R
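A simple and widely used form of gene set analysis is the over-representation test. It asks whether a gene set contains more of your selected (for example, differentially expressed) genes than chance would predict. This is a different technique from GSEA. The sketch below uses made-up counts and computes the hypergeometric upper-tail p-value, which equals a one-sided Fisher's exact test on the corresponding 2x2 table.

```r
# Made-up counts for illustration
N <- 10000  # genes measured (the universe)
K <- 200    # genes in the gene set
n <- 500    # selected genes
k <- 25     # selected genes that are in the gene set

# P(X >= k) when X is hypergeometric (expected overlap is n * K / N = 10)
pHyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Same test as a one-sided Fisher's exact test; matrix() fills by column
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2,
              dimnames = list(selected = c("yes", "no"),
                              inSet = c("yes", "no")))
pFisher <- fisher.test(tab, alternative = "greater")$p.value
c(hypergeometric = pHyper, fisher = pFisher)
```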
Break
9. Network Analysis in R (presented by Ben)
Materials for Exercise
install.packages(c("penalized", "igraph", "catnet", "network"))
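The packages above support richer network models. As a much simpler illustration in base R, the sketch below builds a correlation (co-expression) network from simulated data. Two genes are linked when their absolute Pearson correlation exceeds a threshold. The 0.6 cut-off is an arbitrary assumption, and this is not necessarily the method used in the session.

```r
set.seed(3)
# Toy data: 50 samples x 8 genes; genes 1-4 share a common signal
signal <- rnorm(50)
expr <- cbind(sapply(1:4, function(i) signal + rnorm(50, sd = 0.5)),
              matrix(rnorm(50 * 4), ncol = 4))
colnames(expr) <- paste0("gene", 1:8)

# Adjacency matrix: 1 if |correlation| > 0.6, else 0
cc <- cor(expr)
adj <- (abs(cc) > 0.6) * 1
diag(adj) <- 0                 # no self-loops

degree <- rowSums(adj)         # number of neighbours per gene
degree
# With igraph installed, the matrix can become a graph object:
# g <- igraph::graph_from_adjacency_matrix(adj, mode = "undirected")
```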
Other files
Rnw file to regenerate the R script and PDF of the tutorial
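An Rnw file mixes LaTeX and R code chunks. Stangle() (in R's utils package) extracts the R code into an .R script. Sweave() runs the chunks and writes a .tex file, which LaTeX then turns into a PDF. The following is a sketch on a tiny throwaway Rnw file; the file names are arbitrary.

```r
# Write a minimal Rnw file to a temporary directory
rnw <- file.path(tempdir(), "tutorial.Rnw")
writeLines(c("\\documentclass{article}", "\\begin{document}",
             "<<example>>=", "x <- 1:10", "mean(x)", "@",
             "\\end{document}"), rnw)

# Extract the R code chunks into an R script
rFile <- file.path(tempdir(), "tutorial.R")
Stangle(rnw, output = rFile, quiet = TRUE)

# Run the chunks and write LaTeX output
texFile <- file.path(tempdir(), "tutorial.tex")
Sweave(rnw, output = texFile, quiet = TRUE)

# Compiling the PDF requires a LaTeX installation:
# tools::texi2pdf(texFile)
```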
10. Recap/Advanced R
R Resources
- An excellent beginner’s guide to R by Emmanuel Paradis
- Introduction to R classes and objects on the R website
- Tom Short’s R reference card and other contributed documents, available from the R contributed documentation, are useful
- Stephen Eglen’s article in PLoS Computational Biology, A Quick Guide to Teaching R Programming to Computational Biology Students, includes links to lecture notes and an overview of useful introductory books on R.
- A simple introduction to linear models (view)
Bioconductor Resources
- Bioconductor Courses
- An excellent starting point for Affymetrix data analysis: Jean Wu’s lab on Affymetrix data analysis
- Guide to importing GEO SOFT data files into Bioconductor
- Thomas Girke’s (UC Riverside) introduction to R and Bioconductor
Updated May 2011. Aedin Culhane