CCCB Introduction to R and Bioconductor, May 2011

Introduction to R and Bioconductor

Syllabus


Instructors

  • Aedin Culhane (contact: aedin@jimmy.harvard.edu)
  • Daniel Gusenleitner
  • Benjamin Haibe-Kains
  • Teaching Assistant: Markus Schroeder

Additional manuals

(These will not be covered in the course, but may be helpful if you are new to R.)

Lecture notes from Bio503 Programming and Statistical Modeling in R (Jan 2011)

Download an introduction to R and Bioconductor (Basic Introduction to R and Bioconductor), which includes information about installation and getting help.


Apologies: links to some of the R and Rnw files are currently broken (Aug 2011) because we moved the website to a new location. I will fix these as soon as possible.

Schedule

May 23 – 9:30am – 5:00pm

May 24 – 9:30am – 12:30pm


1. History and Background

  • A brief history of R and Bioconductor
  • Installing R and Bioconductor
  • The concept of libraries in R and Bioconductor
  • Resources for R: RStudio, the R website, the Bioconductor website

1. History and Background (slides)

Basic Introduction to R and Bioconductor (Installing R and Bioconductor Help file)


2. Introduction to the R language

  • Objects – types of objects, classes, creating and accessing objects
  • Arithmetic and matrix operations
  • Reading and writing data
  • Saving R session, R history
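The objects, operations and file input/output listed above can be sketched in a few lines of base R. This is a minimal illustration, not the course's own exercise code: the vectors are made up, and the files are written to a temporary directory so nothing in your working directory is overwritten.

```r
# Objects: a numeric vector, a factor, a matrix and a data.frame
x   <- c(2.5, 3.1, 4.7)                   # numeric vector
g   <- factor(c("ctrl", "trt", "trt"))    # factor with two levels
m   <- matrix(1:6, nrow = 2)              # 2 x 3 matrix, filled column by column
dat <- data.frame(value = x, group = g)   # toy data (hypothetical)
class(x); class(g); class(m); class(dat)  # inspect each object's class

# Accessing elements
x[2]                       # second element of the vector
m[2, 3]                    # row 2, column 3 of the matrix
dat$group                  # a data.frame column by name
dat[dat$group == "trt", ]  # rows where group is "trt"

# Arithmetic is vectorised; t() transposes and %*% multiplies matrices
x * 2
m %*% t(m)                 # (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix

# Writing and reading a tab-delimited text file
f <- tempfile(fileext = ".txt")
write.table(dat, file = f, sep = "\t", quote = FALSE, row.names = FALSE)
dat2 <- read.table(f, header = TRUE, sep = "\t")

# Saving objects to an .RData file, removing them, then loading them back
rdata <- tempfile(fileext = ".RData")
save(dat, m, file = rdata)
rm(dat, m)
load(rdata)                # restores dat and m into the workspace
```

save() and load() store R objects; the command history of an interactive session is saved separately (for example with savehistory()).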

1. Introduction to Programming in R (talk: pdf or Rnw file)

R code

Data for Exercise 1

womenStats.txt file


3. Brief Introduction to Graphics in R

  • Basic plotting
  • Manipulating the plotting window
  • Saving plots
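A minimal sketch of the three topics above: basic plots, splitting the plotting window with par(mfrow = ...), and saving plots by drawing them on a pdf() device instead of the screen. The height and weight values are simulated toy data, not the course data.

```r
set.seed(1)                                   # reproducible toy data
height <- rnorm(50, mean = 165, sd = 8)       # simulated heights (cm)
weight <- 0.9 * height - 90 + rnorm(50, sd = 5)
group  <- factor(rep(c("A", "B"), each = 25))

out <- file.path(tempdir(), "plots.pdf")
pdf(out, width = 9, height = 3)   # open a PDF device; plots go to the file
old <- par(mfrow = c(1, 3))       # split the page into 1 row x 3 panels
plot(height, weight, col = as.integer(group), pch = 19,
     xlab = "Height (cm)", ylab = "Weight (kg)", main = "Scatter plot")
legend("topleft", legend = levels(group), col = 1:2, pch = 19)
hist(height, main = "Histogram", xlab = "Height (cm)")
boxplot(weight ~ group, main = "Boxplot", ylab = "Weight (kg)")
par(old)                          # restore the previous layout settings
dev.off()                         # close the device so the file is written
```

The same pattern works with png() or jpeg() in place of pdf(); forgetting dev.off() is a common reason for an empty or unreadable plot file.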

2. Plotting in R

ColorChart – Colors in R

R code

Exercise 2: 11:30-12:00


4. Getting Genomics and Gene Expression Data into R

  • Workflows
  • Gene Expression Data Packages for different platforms
  • Workflows for other genomics data: next-generation sequencing and SNP data

3. Workflows – Getting Data into R/Bioconductor (slides)

Function to create an ExpressionSet from two objects: 1) expression data, as a matrix or data.frame (an existing ExpressionSet is also accepted), and 2) a data.frame of sample annotation

library(Biobase)  # provides the ExpressionSet and AnnotatedDataFrame classes

makeEset <- function(eSet, annt) {
    # Create an ExpressionSet from eSet, a normalized gene expression matrix
    # (genes in rows, samples in columns), and annt, a data.frame of sample
    # annotation with one row per sample
    if (inherits(eSet, "ExpressionSet")) eSet <- exprs(eSet)
    if (inherits(eSet, "data.frame")) eSet <- as.matrix(eSet)
    # describe each annotation column (here simply by its own name)
    metadata <- data.frame(labelDescription = colnames(annt), row.names = colnames(annt))
    phenoData <- new("AnnotatedDataFrame", data = annt, varMetadata = metadata)
    data.eSet <- new("ExpressionSet", exprs = eSet, phenoData = phenoData)
    print(varLabels(data.eSet))  # list the annotation variables
    return(data.eSet)
}
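As far as I understand Biobase, the ExpressionSet validity check compares the sample names of the expression matrix (its column names) with those of the phenoData (the row names of the annotation), so the annotation rows should be in the same order as the expression columns before a function like makeEset() is called. A base-R sketch of lining them up, using hypothetical toy objects expr and annt:

```r
# Toy data (hypothetical): 4 genes x 3 samples, and an annotation table
# whose rows are in a different order from the expression columns
expr <- matrix(rnorm(12), nrow = 4,
               dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))
annt <- data.frame(type = c("tumour", "normal", "tumour"),
                   row.names = c("s3", "s1", "s2"))

# Reorder the annotation rows to follow the expression matrix columns;
# drop = FALSE keeps annt a data.frame even though it has one column
annt <- annt[colnames(expr), , drop = FALSE]
stopifnot(identical(rownames(annt), colnames(expr)))
```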

Lunch


5. Exploratory Data Analysis

  • Importance of EDA
  • Clustering Data using hierarchical cluster analysis
  • Dimension reduction using principal component analysis
  • Interpreting Results of EDA
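Both EDA methods listed above can be tried in base R with hclust() and prcomp(). This sketch uses a simulated matrix (100 genes x 6 samples, with an artificial shift in 20 genes for samples 4-6), not the fibroblast data, and the choice of average linkage is an assumption. Expression matrices usually have genes in rows, but dist() measures distances between rows and prcomp() treats rows as observations, so the matrix is transposed to put samples in rows.

```r
set.seed(42)
# Simulated expression matrix (hypothetical): 100 genes x 6 samples
x <- matrix(rnorm(600), nrow = 100,
            dimnames = list(NULL, paste0("s", 1:6)))
x[1:20, 4:6] <- x[1:20, 4:6] + 3          # shift 20 genes in samples 4-6

# Hierarchical clustering of samples (Euclidean distance, average linkage)
hc <- hclust(dist(t(x)), method = "average")
clusters <- cutree(hc, k = 2)             # cut the tree into 2 groups
plot(hc, main = "Average-linkage clustering of samples")

# Principal component analysis of samples (genes are centred by default)
pca <- prcomp(t(x))
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 2)                   # proportion of variance per PC
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2",
     pch = 19, col = clusters)            # colour samples by cluster
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)
```

Colouring the PCA plot by the hierarchical clusters is one simple way to check whether the two methods tell a consistent story about the samples.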

4. Bioc – Exploratory Data Analysis (slides)


Exercise/tutorial 3 (Exploratory Data Analysis)

## Install R packages for this tutorial

install.packages("RCurl")
install.packages("gplots")
install.packages("scatterplot3d")

## Install Bioconductor packages
source("http://www.bioconductor.org/biocLite.R")
#biocLite()   # uncomment to install the default set of core packages
biocLite("made4")
biocLite("hgu95av2.db")

This code is available in the following R script

Exercise

Course Data Files

  • Normalised (vsn) fibroblast data: data.vsn.csv. To load it directly into R, use
      read.csv("http://dl.dropbox.com/u/23527371/data.vsn.csv")
  • Sample annotation: annt.txt. To read it directly into R, use
    read.table("http://dl.dropbox.com/u/23527371/annt.txt", header=TRUE)
  • Download the raw data (9 CEL files) as a tar.gz or zip compressed file

Break


6. Feature Selection

  • The simple t-test
  • Moderated variance t-tests using limma
  • Visualizing results: boxplots, dotplots, Venn diagrams, etc.
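As a minimal sketch of the simple t-test approach (not the limma moderated t-test covered in the slides), the code below runs a t-test for every gene with apply(), adjusts the p-values for multiple testing with the Benjamini-Hochberg method, and draws a boxplot of the top gene. The data are simulated; note that t.test() performs the Welch test by default (var.equal = FALSE).

```r
set.seed(7)
# Simulated data (hypothetical): 200 genes x 8 samples, 4 per group;
# the first 10 genes are shifted upwards in group B
x <- matrix(rnorm(1600), nrow = 200,
            dimnames = list(paste0("gene", 1:200), NULL))
grp <- factor(rep(c("A", "B"), each = 4))
x[1:10, grp == "B"] <- x[1:10, grp == "B"] + 2

# One t-test per gene: apply() passes each row (gene) as a vector
pvals <- apply(x, 1, function(y) t.test(y[grp == "A"], y[grp == "B"])$p.value)

# Benjamini-Hochberg false discovery rate adjustment
padj <- p.adjust(pvals, method = "BH")
top  <- head(order(pvals), 10)            # indices of the 10 smallest p-values
data.frame(gene = rownames(x)[top], p = signif(pvals[top], 3),
           fdr = signif(padj[top], 3))

# Boxplot of the most significant gene by group
best <- rownames(x)[top[1]]
boxplot(x[best, ] ~ grp, main = best, ylab = "Expression")
```

With only a few samples per group, per-gene variance estimates are noisy, which is the motivation for moderated t-tests such as those in limma.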

5. Feature Selection (pdf)

R code

Resources
Methods comparing different feature selection approaches


7. Annotating Biological Data

  • Annotation Packages in Bioconductor
  • biomaRt

6. Annotation using annaffy and biomaRt (pdf)

R code

HTML results of aafTableAnn()

For help browsing BiocViews to find annotation packages, or for help installing them, see Introduction and Installing R and Bioconductor.



Day 2:

8. Gene Set Analysis (presented by Dan)

7. GSEA (slides)

7. GSEA (tutorial)

Links to Gene Set Analysis Resources

Exercise Files


Break


9. Network Analysis in R (presented by Ben)

Network Inference Talk

Exercise/Tutorial

Materials for Exercise

You will need to install predictionet. To do this, first install its dependencies:
install.packages(c("penalized", "igraph", "catnet", "network"))
Then right-click the .zip file and save it to your hard drive. Within R, install it from the local zip file (in the Windows R GUI: Packages > Install package(s) from local zip files).

Other files

Rnw file to regenerate the R code and PDF of the tutorial


10. Recap/Advanced R


R Resources


Bioconductor Resources


Updated May 2011. Aedin Culhane