Sweave and Reproducible Research, Nov 20th 2012.
Reproducible Research
- Importance of Reproducible Research
- Why should we perform reproducible research?
- A survey of reproducibility, case studies
- Using Sweave
Instructor information
Aedin Culhane, contact: aedin@jimmy.harvard.edu
Lecture Notes and Manual
Slides Reproducible Research
Making Reports in R (Nov 20th 2012)
We will use the following markdown file to create each of the files below: R Markdown File of reports
- Examples of documents created from markdown file
- pdf File of reports
- MSOffice File of reports
- HTML reports
- Markdown File of reports
- R code from reports
Manual on producing documents using Sweave and creating Bioconductor packages
- List of R packages that are useful for Reproducible Research (from CRAN task views)
- A review of R resources for Reproducible Research (from R Task Views)
R2HTML Results of Demo
Useful within text
Add R code within text
The number of rows in matrix is N= \Sexpr{nrow(myMatrix)}
The number of rows in matrix is N= 10
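A minimal sketch of how the \Sexpr{} substitution above can be checked, assuming only base R (Sweave ships with the utils package). The file name sexprDemo.Rnw and the 10 x 3 matrix are illustrative assumptions, not part of the course files.

```r
# Weave a tiny Rnw document and look at how \Sexpr{} is replaced in the tex output.
old_wd <- setwd(tempdir())   # Sweave writes its output into the working directory

rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<makeMatrix, echo=FALSE>>=",
  "myMatrix <- matrix(0, nrow = 10, ncol = 3)",   # illustrative data
  "@",
  "The number of rows in matrix is N= \\Sexpr{nrow(myMatrix)}",
  "\\end{document}"
)
writeLines(rnw_lines, "sexprDemo.Rnw")

# Weave: evaluates the chunk, then substitutes the value of the \Sexpr{} expression
Sweave("sexprDemo.Rnw", quiet = TRUE)

tex_lines  <- readLines("sexprDemo.tex")
sexpr_line <- grep("number of rows", tex_lines, value = TRUE)
print(sexpr_line)

setwd(old_wd)   # restore the original working directory
```

The chunk that defines myMatrix has to appear before the \Sexpr{} that uses it, because Sweave evaluates the document from top to bottom.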
Exercise Data Files
Download this Sweave style file and place it in your current working directory Sweave.sty
Load the example Sweave file exampleSweave.rnw (this is the rnw file; edit this one), then use the commands
Sweave(file="exampleSweave.rnw")
tools::texi2dvi(file="exampleSweave.tex", pdf=TRUE)
Stangle(file="exampleSweave.rnw")
which will build the tex file, convert the tex file to pdf, and extract the R code chunks, respectively.
- Results of Sweave("exampleSweave.rnw") on Example Sweave File (tex file.. do not edit) exampleSweave.tex
- Results of tools::texi2dvi("exampleSweave.tex", pdf=TRUE) on tex file exampleSweave.pdf
- Results of Stangle("exampleSweave.rnw") on Example Sweave File exampleSweave.R
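The weave, compile and tangle cycle listed above can be tried on a throwaway file. This is only a sketch: workflowDemo.Rnw is a hypothetical file name, and the pdf step is attempted only when a pdflatex executable is found, since it depends on a local LaTeX installation.

```r
# Sketch of the Sweave / texi2dvi / Stangle cycle on a temporary file.
old_wd <- setwd(tempdir())

writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<example>>=",
  "x <- 1 + 1",
  "x",
  "@",
  "\\end{document}"
), "workflowDemo.Rnw")

Sweave("workflowDemo.Rnw", quiet = TRUE)    # Rnw -> workflowDemo.tex
Stangle("workflowDemo.Rnw", quiet = TRUE)   # Rnw -> workflowDemo.R (code chunks only)

# tex -> pdf needs LaTeX, so guard the call and do not stop if it fails
if (nzchar(Sys.which("pdflatex"))) {
  try(tools::texi2dvi("workflowDemo.tex", pdf = TRUE), silent = TRUE)
}

produced <- file.exists(c("workflowDemo.tex", "workflowDemo.R", "workflowDemo.pdf"))
names(produced) <- c("tex", "R", "pdf")
print(produced)

tangled_code <- readLines("workflowDemo.R")
setwd(old_wd)
```

Only the .Rnw file should be edited by hand; the .tex and .R files are regenerated on every run.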
Embedding R code into other document types
- HTML document: Sweave can produce HTML output. Edit a basic HTML template document
and then use the HTML Sweave driver from the R2HTML package:
Sweave("filename.rnw", driver=RweaveHTML)
- You can also use odfWeave to weave R code into OpenOffice documents. There is a nice tutorial on using odfWeave by Graham Williams in his book DATA MINING Desktop Survival Guide, which is available online
- R code can be embedded in Excel documents using Statconn. I haven't tried this but it looks promising. They have a long (30 min) demo online
- Embedding R code in MS Office documents, spreadsheets (Excel) and presentations (PowerPoint) using Inference for R. This is commercial software, but they offer a 1 year academic licence for free if you register with them. A commercial advertising Inference for R
What is knitr
knitr is a relatively new R package that extends Sweave, pgfSweave and cacheSweave. It can process R code embedded in many different document formats, which are summarized below.

Format: Rnw
Source file ending: .Rnw
Output: tex, pdf
R code chunk:
<<R example>>=
x <- 1+1
rnorm(5)
@
R expression: \Sexpr{pi}

Format: GitHub format markdown
Source file ending: .Rmd or .md
Output: md, html*
R code chunk:
``` {r example}
x <- 1+1
rnorm(5)
```
R expression: `r pi`

Format: HTML
Source file ending: .Rhtml
Output: .html
R code chunk:
<!--begin.rcode example
x <- 1+1
rnorm(5)
end.rcode-->
R expression: <!--rinline pi -->

Format: reStructuredText
Source file ending: .Rst
Output: .rst
R code chunk (NOTE: include a space after the ..):
.. {R example}
.. x <- 1+1
.. rnorm(5)
.. ..
R expression: :r:`pi`

*(GitHub does the job of parsing the md file to HTML)
library(knitr)
knit('knitr-minimal.Rnw')
knit('knitr-minimal.Rhtml')
knit('knitr-minimal.Rmd')
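To see what knit does to a chunk and an inline expression without creating any files, the markdown source can be passed as text. This is a rough sketch that assumes the knitr package is installed; the snippet content is illustrative and is not one of the knitr-minimal files.

```r
# Knit a small R Markdown snippet supplied as a character vector.
rmd_text <- c(
  "```{r example}",
  "x <- 1 + 1",
  "x",
  "```",
  "",
  "The value of x is `r x`."
)

md_text <- NULL
if (requireNamespace("knitr", quietly = TRUE)) {
  # knit() returns the woven markdown as a string when the input is given via text=
  md_text <- knitr::knit(text = rmd_text, quiet = TRUE)
  cat(md_text, "\n")
} else {
  message("knitr is not installed; try install.packages('knitr')")
}
```

In the woven output the chunk appears as a code block followed by its printed result, and the inline expression is replaced by its value.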
BUT the best thing is that RStudio knows all about knitr, so this is really easy to do. Just use File->New and select the document type you want. RStudio has an "insert code chunk" button which creates the correct formatting around your R code, so it works with the document style (tex, html, markdown etc)
Really cool new feature using markdown
One of the nicest recent developments is the ease with which one can convert documents from markdown. You can use pandoc to convert a markdown file to MS Word, OpenOffice, LaTeX, HTML or many other file formats.
Pandoc is a universal document converter which will convert markdown,
and (subsets of) LaTeX, HTML or reStructuredText to rich text (MS Office Word), OpenOffice Writer, LaTeX, MediaWiki markup, Slidy HTML slide shows and many more formats.
- In RStudio File->New->R Markdown
- Type your Text and R code. For example:
First attempt
==============
Test using knitr to make a .doc file
``` {r test1}
a<-10
b<-20
myVec<-rnorm(5)
a+b
range(myVec)
```
Now for text within the document, the sum of the above analysis is `r sum(a+b)`. Isn't that **grand**

Text in *italic* or **bold** is marked by asterisks. An R code chunk is enclosed by lines of three back-quotes, and an R expression within the text is enclosed in single back-quotes. See the RStudio help for more about markdown.
- Save the file as .Rmd
- Use knitr to create a markdown (.md) file. (You can also create an html or pdf file directly using knit2html and knit2pdf. The latter sometimes fails; if so, try using pandoc as described below.)
# Create markdown (.md) file
knit("example.Rmd")
knit2html("example.Rmd")
knit2pdf("example.Rmd")
- To convert to other formats, either use pandoc from within R (in the knitr library) or call it directly from the command line. Within R the commands are:
Assuming you have an R markdown file called "example.Rmd" in your current working directory:
require(knitr)
# Produce the markdown (.md) file
knit("example.Rmd")
# Help on the accepted input and output formats, which include json, html, html5, odt, docx and epub, and the slide formats slidy, beamer, dzslides etc
system("pandoc -h")
# pdf file
pandoc("example.md", format="latex")
# html file
pandoc("example.md", format="html5+lhs")
# OpenOffice file
pandoc("example.md", format="odt")
# Microsoft Word
pandoc("example.md", format="docx")
From the command line:
# pdf file, -t is the "to" format, -o is the output filename
pandoc example.md -t latex -o example.pdf
# html file
pandoc example.md -o example.html
# OpenOffice file
pandoc example.md -o example.odt
# Microsoft Word
pandoc example.md -o example.docx
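The command-line calls above can also be driven from R with base functions only. The sketch below uses two hypothetical helpers, pandoc_args and convert_with_pandoc, which are not part of knitr or pandoc; it assumes a pandoc executable may or may not be on the PATH and skips the conversion when it is missing.

```r
# Hypothetical helper: build the arguments for "pandoc <input> -o <output>".
pandoc_args <- function(input, to_ext) {
  output <- paste0(tools::file_path_sans_ext(input), ".", to_ext)
  c(input, "-o", output)
}

# Hypothetical helper: run pandoc if it is installed, otherwise report and skip.
convert_with_pandoc <- function(input, to_ext) {
  if (!nzchar(Sys.which("pandoc"))) {
    message("pandoc not found on the PATH; skipping ", to_ext)
    return(invisible(NA_integer_))
  }
  # shQuote protects file names that contain spaces
  system2("pandoc", args = shQuote(pandoc_args(input, to_ext)))
}

print(pandoc_args("example.md", "docx"))

# Convert to several formats, but only if the markdown file is actually present
if (file.exists("example.md")) {
  for (ext in c("html", "odt", "docx")) convert_with_pandoc("example.md", ext)
}
```

Pandoc infers the output format from the extension given to -o, which is why no -t flag is needed for html, odt or docx here.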
- These files:
- Input file created in RStudio editor: example.Rmd
- Result of running knit on example.Rmd example.md
- Pandoc output -pdf file example.pdf
- Pandoc output -html file example.html
- Pandoc output -OpenOffice file example.odt
- Pandoc output -MSword file example.docx
- RStudio provides a free website called RPubs for you to share your R results. Simply
- create a markdown (.Rmd) as described using File -> New -> R Markdown.
- click the Knit HTML button in the doc toolbar to preview your document.
- in the preview window, click the Publish button. It will send the results directly to your Rpubs account
Creating Slides
There are several resources for creating slides from R. Among the most widely used is Beamer.
- To create slides using simple markdown, create a .Rmd file and use the hash (pound) symbol, #, to title each slide
- Example slide markdown file slides.Rmd
- Output in html5 slides.html created using
knit("slides.Rmd")
system("pandoc -s -S -i -t dzslides --mathjax slides.md -o slides.html")
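As a rough illustration of the slide source layout described above, the following sketch writes a two-slide .Rmd file to a temporary directory and lists the slide titles. The slide titles and content are made up for illustration and are not the course's slides.Rmd.

```r
# Each line starting with "# " begins a new slide and supplies its title.
slide_source <- c(
  "# First slide",
  "",
  "- A bullet point",
  "",
  "# Second slide",
  "",
  "```{r plot1}",
  "hist(rnorm(100))",
  "```"
)

slides_file <- file.path(tempdir(), "slides.Rmd")
writeLines(slide_source, slides_file)

# Recover the slide titles by stripping the leading "# " marker
slide_titles <- sub("^# ", "", grep("^# ", slide_source, value = TRUE))
print(slide_titles)
```

The resulting file can then be passed through knit and pandoc in the same way as any other .Rmd file.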
Links and resources
- Description of how to embed R code in Latex Documents using Sweave from Friedrich Leisch
- Creating tables and Figures using Sweave
- More on making tables
- A template rnw file (from Keith Baggerly) template.rnw
- Converting Latex to Word
- Keith Baggerly's resources
- TeX User Group (TUG) Website. If you get more involved in TeX, you may wish to attend the TUG annual meeting, which will be held in Boston on July 16-18, 2012
- Using Lyx graphical user interface to latex
- Converting Sweave to other formats, including doc and html
Software
- R Software: Download R from the R home page and, if you wish, the integrated development environment (IDE) RStudio, which is available for Windows, Mac or Linux OS
- Windows software: Download MiKTeX and an editor such as TeXworks or TeXnicCenter, or simply use an enhanced notepad like Notepad++
- TexMaker as a free cross-platform TeX editor (recommended by our course student Thomas Wallis ;-)
- Windows: There is also an easy-to-install TeX software bundle called proTeXt which includes MiKTeX, TeXnicCenter and Ghostscript
- MacOS software: Download MacTeX and TeXShop for editing
- Linux: I tend to use either Kate (within KDE), Emacs or Texworks which is cross platform
- More on Latex Editors from Wikipedia
- Convert Tex to a MSword Document using TeX4ht
Additional Resources and Manuals
(These will not be covered in the course, but may be helpful if you are new to R)
- Introduction to R Lecture notes from Bio503 Programming and Statistical Modeling in R (Jan 2011)
- Experimental Design Experimental design (slides)
To install slidify directly from its development version on GitHub (NOTE: you can install packages directly from GitHub ;-))
library(devtools)
install_github('ramnathv/slidify')
References
- Dupuy A & Simon RM (2007) Critical Review of Published Microarray Studies for Cancer Outcome and Guidelines on Statistical Analysis and Reporting JNCI 99:147-57 Simon's review in JNCI
- Ioannidis JP, Allison DB, Ball CA, Coulibaly I, Cui X, Culhane AC, Falchi M, Furlanello C, Game L, Jurman G, Mangion J, Mehta T, Nitzberg M, Page GP, Petretto E, van Noort V. (2009) Repeatability of published microarray gene expression analyses. Nat Genet. 41(2):149-155. Paper and accompanying Editorial