I had been programming in R for about nine months when I asked a computational biologist friend to review my first R project before I released it. His reply both surprised and fascinated me: if my functions and their side effects were not all documented, he would pass on the code. He added that there is so much undocumented R code out there that no one has time to work out what is useful. Great feedback, time to implement it.

R’s Roxygen2 package generates documentation from annotated source code. However, it only works when you are writing a package. No matter how hard I tried, I could not get its functions to work on a collection of R files in a project directory. Transforming my project into an R package would be a good learning experience.

Here’s the list of changes I made:

  1. Renamed project directory to conform to R’s package naming conventions.
  2. Moved .R files into a subdirectory named R.
  3. Installed the devtools package with install.packages("devtools").
  4. Executed devtools::create("mypackage") to generate a bare bones DESCRIPTION file.
  5. Converted R comments to Roxygen comments. You do have comments, right?
  6. Built my package in RStudio, Build->Build and Reload. I changed Build->Configure Build Tools… to generate documentation on Build and Reload.

The fifth step was where the real work was. I used RStudio’s Code->Insert Roxygen Skeleton to create the template for each function. Then I moved any relevant comments into the proper section and changed R’s comment delimiter, #, to Roxygen’s #'. I created a package-level documentation page by adding an R file named in the form myPackage-package.R. It’s a good spot to document any package options. I was psyched that I could now split my code into multiple files for better organization.
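Here’s a minimal sketch of what a converted function can look like. The function name, fahrenheit_to_celsius, and its argument are hypothetical and are not taken from my project; the tags shown (@param, @return, @examples) are standard Roxygen tags, and your functions may need others.

```r
#' Convert temperatures from Fahrenheit to Celsius
#'
#' Illustrative example only; the name and argument are made up.
#'
#' @param temp_f Numeric vector of temperatures in degrees Fahrenheit.
#' @return Numeric vector of the same length, in degrees Celsius.
#' @examples
#' fahrenheit_to_celsius(c(32, 212))
fahrenheit_to_celsius <- function(temp_f) {
  # Plain arithmetic, so the function has no side effects to document
  (temp_f - 32) * 5 / 9
}
```

The lines that start with #' are still ordinary comments to R itself, so the file can be sourced as before. Roxygen is what reads them when the documentation is built.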

The Roxygen comments also include @import and @export directives so the build process will generate the NAMESPACE file. Any library() or require() calls must be converted to @import or @importFrom directives. Any function available outside the package needs the @export directive.
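As a rough illustration of those directives, here is a sketch with a hypothetical function, summarise_values, which is not from my package. It assumes the function needs median() and sd() from the stats package and should be callable by users of the package.

```r
#' Summarise a numeric vector
#'
#' Illustrative example only; the function name is made up.
#'
#' @param x Numeric vector.
#' @return Named numeric vector holding the median and standard deviation.
#' @importFrom stats median sd
#' @export
summarise_values <- function(x) {
  # median() and sd() come from stats, declared via @importFrom above
  c(median = median(x), sd = sd(x))
}
```

With these tags in place, the generated NAMESPACE file should contain an export() entry for the function and importFrom() entries for the two stats functions, so there is no library(stats) call anywhere in the package code.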

The next benefit is how easy it becomes to share your code. devtools::install_github() will install your package from GitHub, and devtools::install_git() will install it from any other git repo. That’s all there is to it.

The last benefit is a more explicit coding style, a la Python, which makes your R analysis scripts more portable. A Python file typically starts with a section of import statements to gain access to code in other modules. R’s analog is the library() function. Adopting this approach in your R code eliminates the need to set the defaultPackages option in .Rprofile or Rprofile.site, and makes your code simpler to migrate to production.
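A small sketch of that style in an analysis script follows. The data and variable names are invented for illustration, and I have used stats and utils only because they ship with R; in a real script these lines would name whatever packages the analysis depends on.

```r
# analysis.R (hypothetical): declare every dependency at the top,
# the way a Python file opens with its import statements
library(stats)  # median()
library(utils)  # head()

# Invented sample data
measurements <- c(4.2, 5.1, 3.8, 6.0, 4.9)

# Three largest values, then the median of all values
top_three <- head(sort(measurements, decreasing = TRUE), 3)
center <- median(measurements)
```

Anyone reading the first few lines can see what the script needs, and it behaves the same on a machine with no custom .Rprofile.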

This project exceeded my expectations. My initial purpose was simply to produce documentation from annotated source code, but it quickly became clear that writing a package had practical benefits beyond documentation. The basics are easy to learn, and RStudio’s features make it easy to code. I’m already working on my second package.