Sweave and Reproducible Research, Nov 20th 2012.


Reproducible Research

  • Importance of Reproducible Research
  • Why should we perform reproducible research
  • A survey of reproducibility, case studies
  • Using Sweave

Instructor information

Aedin Culhane contact: aedin@jimmy.harvard.edu

Lecture Notes and Manual

Slides Reproducible Research

Making Reports in R (Nov 20th 2012)

We will use the following markdown file to create each of the files below: R Markdown File of reports

  • Examples of documents created from markdown file

    Manual on producing documents using Sweave and creating Bioconductor packages

  • List of R packages that are useful for Reproducible Research (from CRAN task views)
  • A review of R resources for Reproducible Research (from R Task Views)
  • R2HTML Results of Demo

    Useful within text

    Add R code within text

    The number of rows in matrix is N= \Sexpr{nrow(myMatrix)}
    

    The number of rows in matrix is N= 10
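
    As a rough illustration of what \Sexpr does, the sketch below builds a hypothetical 10-row matrix named myMatrix (the name comes from the example above, its contents are assumed) and pastes the value of nrow(myMatrix) into a sentence, much as Sweave substitutes the evaluated result into the text.

```r
# Hypothetical data: a 10 x 5 matrix of random values (contents assumed for illustration)
myMatrix <- matrix(rnorm(50), nrow = 10, ncol = 5)

# \Sexpr{} evaluates its expression and inserts the result as text;
# paste0() mimics that substitution here
sentence <- paste0("The number of rows in matrix is N= ", nrow(myMatrix))
print(sentence)
```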
    

    Exercise Data Files

    Download this Sweave style file and place it in your current working directory: Sweave.sty

    Load the example Sweave file (the .Rnw file; edit this one): exampleSweave.rnw. Then use the commands:

    Sweave(file="exampleSweave.Rnw")
    tools::texi2dvi(file="exampleSweave.tex", pdf=TRUE)
    Stangle(file="exampleSweave.Rnw")
    

    which will build the tex file, convert the tex file to pdf, and extract the R code chunks, respectively.

    • Results of Sweave("exampleSweave.rnw") on Example Sweave File (tex file; do not edit) exampleSweave.tex
    • Results of tools::texi2dvi("exampleSweave.tex", pdf=TRUE) on tex file exampleSweave.pdf
    • Results of Stangle("exampleSweave.rnw") on Example Sweave File exampleSweave.R
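
    If you want to try the weave and tangle steps without downloading anything, the following is a minimal sketch. It writes a tiny made-up Sweave document (miniSweave.Rnw, a hypothetical file name) to a temporary directory and runs Sweave and Stangle on it. The texi2dvi step is left out because it needs a working LaTeX installation.

```r
# Work in a temporary directory so no existing files are touched
workDir <- tempfile("sweaveDemo")
dir.create(workDir)
oldDir <- setwd(workDir)

# A tiny Sweave document: one code chunk and one \Sexpr expression
rnwLines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<demoChunk>>=",
  "myMatrix <- matrix(1:20, nrow = 10)",
  "nrow(myMatrix)",
  "@",
  "The number of rows in matrix is N= \\Sexpr{nrow(myMatrix)}",
  "\\end{document}"
)
writeLines(rnwLines, "miniSweave.Rnw")

utils::Sweave("miniSweave.Rnw", quiet = TRUE)   # builds miniSweave.tex
utils::Stangle("miniSweave.Rnw", quiet = TRUE)  # extracts the R code to miniSweave.R

texLines <- readLines("miniSweave.tex")
codeLines <- readLines("miniSweave.R")

# Return to the original working directory
setwd(oldDir)
```

    After this runs, texLines holds the woven LaTeX (with the \Sexpr expression replaced by its value) and codeLines holds the extracted R code.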

    Embedding R code into other document types

    HTML Document Office Documents
    • You can also use odfWeave to weave R code into OpenOffice documents. There is a nice tutorial on using odfWeave by Graham Williams in his book DATA MINING Desktop Survival Guide, which is available online.
    • R code can be embedded in Excel documents using Statconn. I haven't tried this, but it looks promising. They have a long (30 min) demo online.
    • R code can be embedded in MS Office documents, spreadsheets (Excel) and presentations (PowerPoint) using Inference for R. This is commercial software, but they offer a free 1-year academic licence if you register with them. See the commercial advertisement for Inference for R.



      What is knitr

      knitr is a relatively new R package that extends Sweave, pgfSweave and cacheSweave, and can process R code embedded in many different document formats, which are summarized below.

      Format: Rnw
      Source file ending: .Rnw
      Output: tex, pdf
      R code chunk:
      <<R example>>=
      x <- 1+1
      rnorm(5)
      @
      R expression:
      \Sexpr{pi}

      Format: GitHub format markdown
      Source file ending: .Rmd or .md
      Output: md, html
      R code chunk:
      ``` {r example}
      x <- 1+1
      rnorm(5)
      ```
      R expression:
      `r pi`

      Format: HTML
      Source file ending: .Rhtml
      Output: .html
      R code chunk:
      <!--begin.rcode example
       x <- 1+1
       rnorm(5)
       end.rcode-->
      R expression:
      <!--rinline pi -->

      Format: reStructuredText
      Source file ending: .Rst
      Output: .rst
      R code chunk (NOTE: include a space after the ..):
      .. {r example}
      .. x <- 1+1
      .. rnorm(5)
      .. ..
      R expression:
      :r:`pi`

      *(GitHub does the job of parsing the md file to HTML)

      library(knitr)
      knit('knitr-minimal.Rnw')
      knit('knitr-minimal.Rhtml')
      knit('knitr-minimal.Rmd')
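
      The knit calls above need the knitr-minimal files on disk. As a small self-contained sketch, knit can also take the document through its text argument and return the result as a string. The markdown content here is made up for illustration, and the code is skipped if knitr is not installed.

```r
# Build the three-backquote fence programmatically
fence <- strrep("`", 3)

# A small R Markdown document held in a character vector (content assumed)
rmdLines <- c(
  "Knitr test",
  "==========",
  "",
  paste0(fence, "{r example}"),
  "x <- 1+1",
  "x",
  fence,
  "",
  "The value of x is `r x`."
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # text= knits from memory and returns the markdown output as a string
  mdOut <- knitr::knit(text = rmdLines, quiet = TRUE)
  cat(mdOut)
} else {
  mdOut <- NA_character_
  message("knitr is not installed; try install.packages('knitr')")
}
```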

      BUT the best thing is that RStudio knows all about knitr, so this is really easy to do. Just use File->New and select the document type you want. RStudio has an "insert code chunk" button that creates the correct formatting around your R code, so it works with the document style (tex, html, markdown etc).

      Really cool new feature using markdown

      One of the nicest recent developments is the ease with which one can convert documents from markdown. You can use pandoc to convert a markdown file to MS Word, OpenOffice, LaTeX, html or many other file formats.
      Pandoc is a universal document converter which will convert markdown, and (subsets of) LaTeX, HTML or reStructuredText, to rich text (MS Office Word), OpenOffice Writer, LaTeX, MediaWiki markup, Slidy HTML slide shows and many more formats.

      • In RStudio File->New->R Markdown
      • Type your Text and R code. For example:
        First attempt
        ==============
        
        Test using knitr to make a .doc file
        
        
        ``` {r test1}
        a<-10
        b<-20
        myVec<-rnorm(5)
        a+b
        range(myVec)
        ```
        
        
        Now for text within the document, the sum of the above analysis is `r sum(a+b)`. Isn't that **grand**
        
        
        Font *italic* or **bold** is marked by asterisks. An R code chunk is enclosed in three backquotes, and an R expression within the text is enclosed in single backquotes. See the RStudio help for more about markdown.
      • Save the file as .Rmd
      • Use knitr to create a markdown (.md) file. (You can also create an html or pdf file directly using knit2html and knit2pdf. The latter sometimes fails; if so, try using pandoc as described below.)
        # Create the markdown (.md) file
        knit("example.Rmd")
        # Create an html file directly
        knit2html("example.Rmd")
        # Create a pdf file directly (may fail; see the pandoc route below)
        knit2pdf("example.Rmd")
        
        
      • To convert to other formats, either use pandoc from within R (via the pandoc function in the knitr library) or call it directly from the command line. Within R the commands are as follows,
        assuming you have an R markdown file called "example.Rmd" in your current working directory:
        require(knitr)
        # Produce the markdown (.md) file
        knit("example.Rmd")
        
        # help on the accepted input and output formats, which include json, html, html5, odt, docx and epub, and the slide formats slidy, beamer, dzslides etc
        system("pandoc -h")
        # pdf file
        pandoc("example.md", format="latex")
        # html file
        pandoc("example.md", format="html5+lhs")
        # OpenOffice File
        pandoc("example.md", format="odt")
        # Microsoft Word
        pandoc("example.md", format="docx")
        

        From the command line:
        # pdf file; -t is the "to" (output) format, -o is the output filename
        pandoc example.md -t latex -o example.pdf
        
        # html file
        pandoc example.md -o example.html
        
        # OpenOffice File
        pandoc example.md -o example.odt
        
        # Microsoft Word
        pandoc example.md -o example.docx
        

      • These files:
      • RStudio provides a free website called RPubs for you to share your R results. Simply:
        1. Create an R Markdown (.Rmd) file as described, using File -> New -> R Markdown.
        2. Click the Knit HTML button in the doc toolbar to preview your document.
        3. In the preview window, click the Publish button. It will send the results directly to your RPubs account.
      For more information on creating a .Rmd file in RStudio, see the RStudio "Using Markdown" help and the Wikipedia markdown information.
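
      The command-line pandoc conversions listed earlier in this section can also be scripted from R. The sketch below uses a hypothetical helper, pandocCommand (not part of knitr or pandoc), that builds the command string for a given output extension. It only runs the commands if pandoc is found on the PATH and example.md exists in the working directory.

```r
# Hypothetical helper: build the pandoc command line for a given output extension
pandocCommand <- function(mdFile, outExt) {
  outFile <- paste0(tools::file_path_sans_ext(mdFile), ".", outExt)
  paste("pandoc", shQuote(mdFile), "-o", shQuote(outFile))
}

# One command per target format (html, OpenOffice, Microsoft Word)
cmds <- vapply(c("html", "odt", "docx"),
               function(ext) pandocCommand("example.md", ext),
               character(1))
print(cmds)

# Only run the conversions if pandoc is on the PATH and the input file exists
if (nzchar(Sys.which("pandoc")) && file.exists("example.md")) {
  for (cmd in cmds) system(cmd)
}
```

      Building the strings first makes it easy to inspect exactly what will be run before anything is passed to system().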

      Creating Slides

      There are several resources for creating slides from R. Among the most widely used is Beamer.
        To create slides using simple markdown, create a .Rmd file and use the hash (pound) symbol to title each slide.
      • Example slide markdown file slides.Rmd
      • Output in html5 slides.html created using
        knit("slides.Rmd")
         system("pandoc -s -S -i -t dzslides --mathjax slides.md -o slides.html")
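
      To make the slide layout concrete, here is a small sketch that writes a two-slide .Rmd file in which each slide title starts with a hash. The file name slidesDemo.Rmd and the slide contents are assumptions for illustration; this is not the course's slides.Rmd.

```r
# Build the three-backquote fence programmatically
fence <- strrep("`", 3)

# Each line starting with "# " titles a new slide
slideLines <- c(
  "# First slide",
  "",
  "- a bullet point",
  "- another bullet point",
  "",
  "# Second slide",
  "",
  paste0(fence, "{r slideSummary}"),
  "summary(rnorm(5))",
  fence
)

# Write the file to a temporary directory and list the slide titles it contains
slideFile <- file.path(tempdir(), "slidesDemo.Rmd")
writeLines(slideLines, slideFile)
slideTitles <- grep("^# ", readLines(slideFile), value = TRUE)
print(slideTitles)
```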
         

      Links and resources

      Software

      • R Software: Download R from the R home page and, if you wish, the integrated development environment (IDE) RStudio, which is available for Windows, Mac or Linux OS
      • Windows software: Download MiKTeX and an editor such as TeXworks, TeXnicCenter or simply use an enhanced notepad like Notepad++
      • Texmaker is a free cross-platform TeX editor (recommended by our course student Thomas Wallis ;-)
      • Windows: There is also an easy-to-install TeX software bundle called proTeXt which includes MiKTeX, TeXnicCenter and Ghostscript
      • MacOS software: Download MacTeX and TeXShop for editing
      • Linux: I tend to use either Kate (within KDE), Emacs or TeXworks, which is cross platform
      • More on LaTeX editors from Wikipedia
      • Convert TeX to an MS Word document using TeX4ht

      Additional Resources and Manuals

      (These will not be covered in the course, but may be helpful if you are new to R.) One recent package that looks interesting is called Slidify, which was recently discussed on R-bloggers. The default style is html5slides. It supports other frameworks (deck.js, dzslides, html5slides, shower, and slidy), and allows different themes, transitions and highlight styles. For more information see the slidify help.
      To install the development version directly from github (NOTE.... you can install directly from github ;-))
      library(devtools)
      # Current devtools expects "username/repo"; older versions used install_github('slidify', 'ramnathv')
      install_github('ramnathv/slidify')
      

      References

      • Dupuy A & Simon RM (2007) Critical Review of Published Microarray Studies for Cancer Outcome and Guidelines on Statistical Analysis and Reporting JNCI 99:147-57 Simon's review in JNCI
      • Ioannidis JP, Allison DB, Ball CA, Coulibaly I, Cui X, Culhane AC, Falchi M, Furlanello C, Game L, Jurman G, Mangion J, Mehta T, Nitzberg M, Page GP, Petretto E, van Noort V. (2009) Repeatability of published microarray gene expression analyses. Nat Genet. 41(2):149-155. Paper and accompanying Editorial