Instructor information
Aedin Culhane contact: aedin@jimmy.harvard.edu
Install and Set up script
Install script
Datasets for Exercises
Women height and weight statistics
ToothGrowth Data
scripts
FUN
Manual and R code
Questions from Class
- How do I sort a matrix by 2 columns, one in decreasing order, the second ascending?
There are 2 ways to do this. First create an example dataset:
x <- matrix(c(2,1,1,3,.5,.3,.5,.2), ncol=2) ## create an example dataset
##      [,1] [,2]
## [1,]    2  0.5
## [2,]    1  0.3
## [3,]    1  0.5
## [4,]    3  0.2
Sort the data in 2 steps:
# Sort the second column in decreasing order
x1 <- x[order(x[,2], decreasing=TRUE),]
# Sort the first column in the partially sorted matrix
x2 <- x1[order(x1[,1]),]
Or, if both columns are numeric, negate the column that should be in decreasing order, since negatives sort in the reverse order of positives:
x[order(x[,1], -x[,2]),]
If the values aren't known to be numeric, convert them to numeric with xtfrm before sorting:
x[order(xtfrm(x[,1]), -xtfrm(x[,2])),]
Note that with both of these, NA will be appended to the end of the list:
z.vec <- c(5,NA,8,2,3.2)
order(z.vec)
z.vec[order(z.vec)] ## Results in 2.0 3.2 5.0 8.0 NA
z.vec[order(z.vec, decreasing=TRUE)] ## Results in 8.0 5.0 3.2 2.0 NA
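Two optional arguments of order() relate to the points above. The sketch below is an illustration rather than part of the original answer: it assumes a recent version of R in which order() accepts method = "radix" with one decreasing flag per sort key, and the names ord, ord2, na_first and na_dropped are made up for this example.

```r
# Same example matrix as above
x <- matrix(c(2,1,1,3,.5,.3,.5,.2), ncol=2)

# Column 1 decreasing, column 2 ascending, in a single call.
# With method = "radix", decreasing may hold one flag per sort key.
ord <- order(x[,1], x[,2], decreasing = c(TRUE, FALSE), method = "radix")
x_sorted <- x[ord, ]

# For numeric columns, negating the first key gives the same ordering
ord2 <- order(-x[,1], x[,2])

# na.last controls where NA goes: TRUE (default) = end, FALSE = start, NA = dropped
z.vec <- c(5, NA, 8, 2, 3.2)
na_first   <- z.vec[order(z.vec, na.last = FALSE)]
na_dropped <- z.vec[order(z.vec, na.last = NA)]
```

Printing x_sorted, na_first and na_dropped shows the effect of each option.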
- Reading compressed data into R
Files compressed with gzip can be read through connections created by the function gzfile, whereas files compressed with bzip2 can be read via bzfile. If your data is in a gzip-compressed file, you can use the R gzfile function to decompress it on the fly (a .tar.gz archive must first be unpacked, for example with untar, because it holds a tar archive rather than a single data file). Do this:
myDataFrame <- read.table(gzfile("myData.gz"), header=TRUE)
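To try this without an existing compressed file, a small round trip can be sketched as follows. The temporary file and the data frame df are made up for illustration; only gzfile, write.table and read.table come from base R.

```r
# Create a small example data frame (hypothetical data)
df <- data.frame(id = 1:3, height = c(58, 59, 60))

# Write it to a temporary gzip-compressed text file
tmp <- tempfile(fileext = ".gz")
con <- gzfile(tmp, "w")
write.table(df, con, row.names = FALSE)
close(con)

# Read it back, decompressing on the fly
back <- read.table(gzfile(tmp), header = TRUE)
```

Replacing gzfile with bzfile in both places should give the same round trip for bzip2 compression.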
- Functions for analyzing survey data are available in the survey package
- Reading specialized data types: SEER data with SEER2R
- Merging data frames or matrices: see attached Reports doc
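As a quick illustration of the merging question, here is a minimal sketch using base R only. It is not taken from the attached Reports doc; the data frames heights and weights and the matrices m1 and m2 are invented for the example.

```r
# Two small data frames sharing an "id" column (hypothetical data)
heights <- data.frame(id = c(1, 2, 3), height = c(58, 59, 60))
weights <- data.frame(id = c(2, 3, 4), weight = c(117, 120, 123))

# Keep only ids present in both data frames
inner <- merge(heights, weights, by = "id")

# Keep all ids; missing values are filled with NA
outer <- merge(heights, weights, by = "id", all = TRUE)

# Matrices with matching dimensions can be joined by column or by row
m1 <- matrix(1:4, ncol = 2)
m2 <- matrix(5:8, ncol = 2)
by_col <- cbind(m1, m2)  # 2 rows, 4 columns
by_row <- rbind(m1, m2)  # 4 rows, 2 columns
```

merge also accepts all.x = TRUE or all.y = TRUE to keep unmatched rows from only one side.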
Making Reports in R
- Examples in
- Also see our recent course in reproducible research
Resources for Spatial Data Analysis
- Tutorial that I wrote on the usage of the maps and ggmap packages. Open as a webpage or in microsoft
- Recent seminar from David Unwin, "Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in R". Slides (you need to enter your contact details and Revolution Analytics will email you their webinar series information; it's only a few emails and most are interesting)
- Using Google static maps: Introduction to package RgoogleMaps
- ggmap combines the power of ggplot2 with the spatial contextual information in Google Maps or OpenStreetMaps (via RgoogleMaps) to provide an easy, consistent and modular framework for spatial graphics. To get started with ggmap, see ?ggmap for several examples of its use.
- A recent tutorial on spatial data analysis: broomspatial
Migrating from SAS or SPSS?
Resources and Tools to help users of other stats packages in learning R
- SAS and R: a blog devoted to examples of tasks (and their code) replicated in SAS and R. http://sas-and-r.blogspot.com/
- R for SAS and SPSS Users. Download a free 80-page document, R for SAS and SPSS Users, which contains over 30 programs written in all three languages. http://rforsasandspssusers.com/
Link to early version of this book - Convert SAS or SPSS to R examples
- From the December 2009 issue of the R Journal: Transitioning to R: Replicating SAS, Stata, and SUDAAN Analysis Techniques in Health Policy Data. Anthony Damico. http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf
R Resources
- An excellent beginner's guide to R is from Emmanuel Paradis
- Introduction to R classes and objects on R site
- Tom Short’s R reference card and other contributed documents are useful, from the R contributed documentation
- Stephen Eglen’s publication in PLoS Computational Biology, A Quick Guide to Teaching R Programming to Computational Biology Students. It includes links to lecture notes and an overview of useful introductory books in R.
- Simple Intro to Linear Models view
- Books for learning R and Bioconductor: my Amazon wish list
If you have your own laptop, I recommend the following software for this course:
- R Software: Download R from the R home page and, if you wish, the integrated development environment (IDE) RStudio, which is available for Windows, Mac or Linux OS
- Windows software: Download MiKTeX and an editor such as TeXworks or TeXnicCenter, or simply use an enhanced notepad like Notepad++
- Windows: There is also an easy-to-install TeX software bundle called proTeXt which includes MiKTeX, TeXnicCenter and Ghostscript
- MacOS software: Download MacTeX and TeXShop for editing, or
- Texmaker as a free cross-platform TeX editor
- Linux: I tend to use either Kate (within KDE), Emacs or TeXworks, which is cross-platform
- More on LaTeX editors from Wikipedia
- Convert TeX to an MS Word document using TeX4ht
These should each be pretty straightforward to download and install, but more detailed instructions are provided in installing R (from May 2011)
Exploratory Data Analysis
- Importance of EDA
- Clustering Data using hierarchical cluster analysis
- Dimension reduction using principal components analysis
- Interpreting Results of EDA
Exploratory Data Analysis (slides)
Exercise/tutorial
## Install R packages for this tutorial
install.packages("RCurl")
install.packages("gplots")
install.packages("scatterplot3d")
## Install Bioconductor Packages
## (newer Bioconductor releases use BiocManager::install() instead of biocLite())
source("http://www.bioconductor.org/biocLite.R")
#biocLite()
biocLite("made4")
biocLite("hgu95av2.db")
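The two EDA techniques listed above, hierarchical cluster analysis and principal components analysis, can be tried with base R alone before moving on to the packages installed here. The following is only a rough sketch and not the course exercise: it assumes the built-in USArrests dataset, and the choices of average linkage and 4 clusters are arbitrary.

```r
# Standardize the built-in USArrests data so each variable has equal weight
dat <- scale(USArrests)

# Hierarchical cluster analysis on Euclidean distances between states
hc <- hclust(dist(dat), method = "average")

# Cut the tree into 4 groups (arbitrary choice for illustration)
groups <- cutree(hc, k = 4)

# Principal components analysis on the scaled variables
pca <- prcomp(USArrests, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```

Calling plot(hc) draws the dendrogram and biplot(pca) shows the states and variables on the first two components, which is a natural starting point for interpreting the results.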
Reproducible Research
- Importance of Reproducible Research
- Why should we perform reproducible research?
- A survey of reproducibility, case studies
- Using Sweave
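To give a feel for what using Sweave involves, here is a minimal sketch that writes a tiny Sweave source file and weaves it into LaTeX from within R. The file name report.Rnw, the chunk label and the text are made up for illustration; it assumes only the Sweave function from base R and the built-in women dataset. Turning the resulting .tex file into a PDF additionally needs a LaTeX installation.

```r
# Contents of a minimal Sweave (.Rnw) file: LaTeX text with one R code chunk
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Mean height in the women data:",
  "<<mean-height, echo=TRUE>>=",
  "mean(women$height)",
  "@",
  "The data has \\Sexpr{nrow(women)} rows.",
  "\\end{document}"
)

# Work in a temporary directory so no files are left in the project
workdir <- tempfile("sweave-demo")
dir.create(workdir)
old <- setwd(workdir)
writeLines(rnw, "report.Rnw")

# Run the code chunks and write report.tex with code and results included
Sweave("report.Rnw")
setwd(old)

tex <- readLines(file.path(workdir, "report.tex"))
```

Because the numbers in the report are computed when the document is woven, rerunning Sweave after the data changes updates the text and results together.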
Reproducible Research (A short course)
Updated June 2012. Aedin Culhane