Course Description
This course is an introduction to R, a powerful and flexible statistical language, and to Bioconductor, a collection of R packages for the analysis of genetic and genomic data (http://www.bioconductor.org/). The course will introduce attendees to the basics of using R for statistical programming, computation, graphics, and modeling, especially for analyzing high-throughput genomic data. We will start with a basic introduction to the R language, reading and writing data, and plotting data. Case studies and data will all be based on real gene expression and genomics data. We will introduce the main classes and packages in Bioconductor. Our goal is to get attendees up and running with R and Bioconductor so that they can use them in their research and are in a good position to expand their knowledge of R and Bioconductor on their own.
Instructors
- Aedin Culhane (contact: aedin@jimmy.harvard.edu)
- Benjamin Haibe-Kains
Required Software
I recommend the following software.
- R. Download R from the R home page
- The integrated development environment (IDE) RStudio, available for Windows, Mac or Linux
- Install Bioconductor. Start R and type the following commands (this requires an internet connection)
    source("http://www.bioconductor.org/biocLite.R")
    biocLite()
- LaTeX
- LaTeX editor (a comparison of editors)
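On current versions of R, biocLite() has been superseded by the BiocManager package. The following is a minimal sketch of an equivalent install step, assuming a recent R installation and an internet connection. The helper names missing_pkgs() and install_course_pkgs() are made up for illustration; the package list is taken from the install commands on this page.

```r
# Packages named in the install commands on this page
course_pkgs <- c("arrayQualityMetrics", "GEOquery", "made4",
                 "hgu95av2.db", "survcomp")

# Return the subset of `pkgs` that is not yet installed
# (hypothetical helper, not part of Bioconductor)
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install whatever is missing through BiocManager
# (hypothetical helper; needs an internet connection when called)
install_course_pkgs <- function(pkgs = course_pkgs) {
  todo <- missing_pkgs(pkgs)
  if (length(todo) == 0) return(invisible(character(0)))
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(todo)
  invisible(todo)
}
```

Nothing is installed until install_course_pkgs() is called, so the block can be sourced safely and run once when you are online.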
Schedule
Dec 14th: 9:30am – 5:00pm
Dec 15th: 9:30am – 12:00pm
Agenda
Day 1
- History and background to R, installing R (website, slides)
- Introduction to R language (classes, subsetting etc)
- slides
- Function to create an ExpressionSet given 2 matrices (or data.frames) containing 1) expression data and 2) annotation
    library(Biobase)  # provides ExpressionSet, AnnotatedDataFrame, exprs(), varLabels()

    makeEset <- function(eSet, annt) {
      # Create an ExpressionSet from eSet, a normalized gene expression matrix,
      # and annt, a data.frame containing annotation.
      # rownames(annt) must match colnames(eSet).
      metadata <- data.frame(labelDescription = colnames(annt), row.names = colnames(annt))
      phenoData <- new("AnnotatedDataFrame", data = annt, varMetadata = metadata)
      # Accept a data.frame or an existing ExpressionSet as well as a matrix
      if (inherits(eSet, "data.frame")) eSet <- as.matrix(eSet)
      if (inherits(eSet, "ExpressionSet")) eSet <- exprs(eSet)
      data.eSet <- new("ExpressionSet", exprs = eSet, phenoData = phenoData)
      print(varLabels(data.eSet))
      return(data.eSet)
    }
    ## Install Bioconductor Packages
    source("http://www.bioconductor.org/biocLite.R")
    biocLite("arrayQualityMetrics")
    biocLite("GEOquery")
- Install packages for this tutorial
    ## Install Bioconductor Packages
    source("http://www.bioconductor.org/biocLite.R")
    biocLite("made4")
    biocLite("hgu95av2.db")
- Feature Selection and Gene Annotation: Notes on Feature Selection using Limma and Annotating Genes in R
- R code for Feature Selection/Annotation Notes
- References which compare different feature selection approaches
- Jeffery IB, Higgins DG, Culhane AC. (2006) Comparison and evaluation of microarray feature selection methods. BMC Bioinformatics 7:359.
- Murie C, Woody O, Lee AY, Nadon R. (2009) Comparison of small n statistical tests of differential expression applied to microarrays. BMC Bioinformatics. 10:45.
- Gene Annotation HTML output: results of aafTableAnn()
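As a rough illustration of the feature selection step with limma, here is a minimal sketch on simulated data. It is not the course data set: the matrix, the gene and sample names, and the two groups are all invented for the example, and it assumes the limma package is installed.

```r
library(limma)

set.seed(42)
# Simulated log-expression matrix: 100 hypothetical genes x 6 hypothetical samples
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
# Shift the first five genes upwards in the second group
expr[1:5, 4:6] <- expr[1:5, 4:6] + 3

# Two groups of three samples each
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Fit a linear model per gene, then moderate the t-statistics
fit <- eBayes(lmFit(expr, design))

# Ten top-ranked genes for the group B vs group A coefficient
top <- topTable(fit, coef = 2, number = 10)
head(top)
```

The adj.P.Val column of the result holds p-values adjusted for multiple testing (Benjamini-Hochberg by default), which is usually the column to filter on when selecting features.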
For today, please install the required packages by copying these commands into R
    source("http://www.bioconductor.org/biocLite.R")
    biocLite("survcomp")
Files required for both survival and predictionNetworks are available as a compressed zip file
- Survival Analysis (Survcomp, Ben)
- Networks (Ben)
- Install packages from local zip file predictionet_1.0.0.zip
- Notes on Prediction Networks
- R code for Prediction Networks
- Rnw file for Prediction Networks
- CCCB_course_netinf_hkb
Additional manuals
These will not be covered in the course, but may be helpful if you are new to R.
Lecture notes from Bio503 Programming and Statistical Modeling in R (Jan 2011)
New to R: Installation and getting help. Basic Introduction to R and Bioconductor
R Resources
- An excellent beginner's guide to R is from Emmanuel Paradis
- Introduction to R classes and objects on the R site
- Tom Short’s R reference card and other useful contributed documents are available from the R contributed documentation
- Stephen Eglen’s publication in PLoS Computational Biology, A Quick Guide to Teaching R Programming to Computational Biology Students, includes links to lecture notes and an overview of useful introductory books on R.
- Simple Intro to Linear Models
Bioconductor Resources
- Bioconductor Courses
- Jean Wu’s excellent lab on Affymetrix data analysis, a good starting point
- Guide to importing GEO SOFT data files into Bioconductor
- Thomas Girke’s (UC Riverside) intro to R and Bioconductor
Methods comparing different feature selection approaches
- Jeffery IB, Higgins DG, Culhane AC. (2006) Comparison and evaluation of microarray feature selection methods. BMC Bioinformatics 7:359.
- Murie C, Woody O, Lee AY, Nadon R. (2009) Comparison of small n statistical tests of differential expression applied to microarrays. BMC Bioinformatics 10:45.
Links to Gene Set Analysis Resources
- GeneSigDB http://compbio.dfci.harvard.edu/genesigdb
- MSigDB http://www.broadinstitute.org/gsea/msigdb/index.jsp
- KEGG http://www.genome.jp/kegg/
- GO http://www.geneontology.org/
A few Code Tips
Function to create an ExpressionSet given 2 data matrices (or data.frames) containing 1) expression data and 2) annotation
    library(Biobase)  # provides ExpressionSet, AnnotatedDataFrame, exprs(), varLabels()

    makeEset <- function(eSet, annt) {
      # Create an ExpressionSet from eSet, a normalized gene expression matrix,
      # and annt, a data.frame containing annotation.
      # rownames(annt) must match colnames(eSet).
      metadata <- data.frame(labelDescription = colnames(annt), row.names = colnames(annt))
      phenoData <- new("AnnotatedDataFrame", data = annt, varMetadata = metadata)
      # Accept a data.frame or an existing ExpressionSet as well as a matrix
      if (inherits(eSet, "data.frame")) eSet <- as.matrix(eSet)
      if (inherits(eSet, "ExpressionSet")) eSet <- exprs(eSet)
      data.eSet <- new("ExpressionSet", exprs = eSet, phenoData = phenoData)
      print(varLabels(data.eSet))
      return(data.eSet)
    }
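To see what inputs makeEset() expects, here is a small sketch on toy data, assuming the Biobase package is installed. The gene names, sample names and the "group" column are invented for illustration. The construction steps are written out directly so the block runs on its own; they mirror what the function does internally.

```r
library(Biobase)

set.seed(1)
# Toy expression matrix: 5 hypothetical genes (rows) x 4 hypothetical samples (columns)
expr <- matrix(rnorm(20), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

# Sample annotation: its row names must match the column names of the matrix
annt <- data.frame(group = c("ctrl", "ctrl", "treated", "treated"),
                   row.names = colnames(expr))

# The same construction steps that makeEset() performs
metadata <- data.frame(labelDescription = colnames(annt),
                       row.names = colnames(annt))
pd <- new("AnnotatedDataFrame", data = annt, varMetadata = metadata)
eset <- new("ExpressionSet", exprs = expr, phenoData = pd)

dim(exprs(eset))   # genes x samples
varLabels(eset)    # names of the annotation columns

# Subsetting by column keeps expression data and annotation in sync
treated <- eset[, eset$group == "treated"]
sampleNames(treated)
```

Keeping the expression matrix and the sample annotation in one object is the main reason to use an ExpressionSet: a subset of samples carries its annotation with it.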
Starting with Biomart
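    library(biomaRt)
    # Connect to Ensembl and select the human gene dataset (requires an internet connection)
    mart <- useMart("ensembl")
    mart <- useDataset("hsapiens_gene_ensembl", mart)
    # Look up annotation for three Affymetrix HG-U95Av2 probe set IDs
    # (in newer Ensembl releases the "entrezgene" attribute is named "entrezgene_id")
    geneAnnt <- getBM(attributes = c("affy_hg_u95av2", "hgnc_symbol", "chromosome_name", "band", "entrezgene"),
                      filters = "affy_hg_u95av2",
                      values = c("1939_at", "1503_at", "1454_at"),
                      mart = mart)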
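getBM() returns a data.frame, and a probe set can map to more than one gene, so the result may have several rows per probe. The sketch below shows one way to collapse such rows and join them to a table of per-probe results using base R only. The geneAnnt mock stands in for the real query result; its gene symbols are placeholders, and the logFC values are invented.

```r
# Mock of the data.frame shape returned by getBM(); symbols are placeholders
geneAnnt <- data.frame(
  affy_hg_u95av2 = c("1939_at", "1503_at", "1503_at", "1454_at"),
  hgnc_symbol    = c("GENE_A", "GENE_B", "GENE_B2", "GENE_C"),
  stringsAsFactors = FALSE
)

# Hypothetical per-probe results, e.g. from a feature selection step
results <- data.frame(
  probe = c("1939_at", "1503_at", "1454_at"),
  logFC = c(1.2, -0.8, 0.3),
  stringsAsFactors = FALSE
)

# Collapse multiple symbols per probe into one ";"-separated string
symbols <- aggregate(hgnc_symbol ~ affy_hg_u95av2, data = geneAnnt,
                     FUN = paste, collapse = ";")

# Left join: keep every probe in `results`, annotated where a symbol exists
annotated <- merge(results, symbols,
                   by.x = "probe", by.y = "affy_hg_u95av2", all.x = TRUE)
annotated
```

Using all.x = TRUE keeps probes that have no annotation, with NA in the symbol column, so no result rows are silently dropped.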
Updated Dec 2011. Aedin Culhane