Subsetting and Functions in R

I’ve started a new blog (and hopefully I will write more than two articles this time)!

As a first post I thought I would write about something I learnt recently, something I wish I had learnt long ago: the different ways of subsetting in R. For a long time I thought there were only two ways of subsetting: [] for atomic vectors/lists and $ for data frames. If I wanted to extract a particular element I would use x[1], and if I wanted a column from a data frame I would use mtcars$mpg.
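For comparison, here is a minimal sketch of how the operators differ, using a small made-up list `x` and the built-in mtcars data frame. `[` keeps the container type (a list stays a list, a data frame stays a data frame), while `[[` and `$` pull out a single element.

```r
# A small illustrative list (hypothetical data)
x <- list(a = 1:3, b = "hello")

is.list(x[1])     # TRUE: `[` returns a shorter list
is.list(x[[1]])   # FALSE: `[[` returns the element itself, here 1:3
x$a               # the same element as x[["a"]]

# The same distinction applies to data frames
class(mtcars["mpg"])    # "data.frame": a one-column data frame
class(mtcars[["mpg"]])  # "numeric": the column vector itself
```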

What I never realised was that $ is essentially shorthand for [[: mtcars$mpg gives the same result as mtcars[["mpg"]]. Perhaps this was because $ is so convenient when working with data frames, as it allows tab completion of column names. Typing mtcars[["mpg"]] takes much longer and is clunky when you want to explore and model data quickly.
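A quick way to check this is with identical(). One difference worth knowing: on a list, $ also partially matches names, whereas [[ only does so when asked with exact = FALSE. The list `settings` below is a made-up example.

```r
# `$` and `[[` return the same column of mtcars
identical(mtcars$mpg, mtcars[["mpg"]])   # TRUE

# A hypothetical list to illustrate partial matching
settings <- list(verbose = TRUE)

settings$verb                      # TRUE: `$` partially matches "verb" to "verbose"
settings[["verb"]]                 # NULL: `[[` requires an exact name by default
settings[["verb", exact = FALSE]]  # TRUE: partial matching only on request
```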

However, the $ operator does not work well inside functions when the column name is passed in as an argument (the fact that I have never really written functions in R is probably another reason I never worked out that $ and [[ were related). I realised this recently when I was trying to write a function that I could then apply to multiple columns in a data frame. What I was trying to do was something like this:

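double <- function(x) {
    # Does not work: $ does not evaluate x, so this looks for
    # a column literally named "x" (mtcars has none)
    print(mtcars$x * 2)
}

double(mpg)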

But this doesn’t work! Since I had run into this problem in the past, I thought I would try to resolve it once and for all. It turned out all I had to do was read chapter 2 of Hadley Wickham’s Advanced R textbook! Although it doesn’t deal with functions specifically, it made me realise that I could write the above function like this:

double <- function(x) {
    # [[ evaluates x, so x can hold a column name as a string
    print(mtcars[[x]] * 2)
}

double("mpg")
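To see why the first version failed, compare the two operators when the column name is stored in a variable. $ treats whatever follows it as a literal name, so mtcars$x looks for a column called "x"; there isn't one, so it returns NULL, and NULL * 2 is an empty numeric vector. [[ evaluates x first. A minimal sketch:

```r
x <- "mpg"

# `$` does not evaluate x; it looks for a column literally named "x"
is.null(mtcars$x)                    # TRUE

# `[[` evaluates x to "mpg" and extracts that column
identical(mtcars[[x]], mtcars$mpg)   # TRUE

# Doubling NULL gives a zero-length numeric vector, which is what
# the broken version of double() ends up printing
NULL * 2                             # numeric(0)
```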

Not only does this work, but I can now iterate over all the variables in mtcars:

# Extract column names (stored as col_names to avoid shadowing base R's names())
col_names <- colnames(mtcars)

# Apply the double function to each column in mtcars
lapply(col_names, double)
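If you want to keep the results rather than just print them, one option is to have the function return the doubled column and let sapply() attach the column names. This is only a sketch; double_column() is a hypothetical name, chosen partly so it does not mask base R's double().

```r
# Return the doubled column instead of printing it (hypothetical helper)
double_column <- function(col, df = mtcars) {
  df[[col]] * 2
}

# sapply() over a character vector with simplify = FALSE returns a list
# whose names are the column names
doubled <- sapply(colnames(mtcars), double_column, simplify = FALSE)

names(doubled)       # the same names as colnames(mtcars)
head(doubled$mpg)    # first six doubled mpg values

# Collect the results back into a data frame
doubled_df <- as.data.frame(doubled)
```

Returning values also avoids every column being shown twice at the console, which happens with the printing version: print() outputs each column, and then the list returned by lapply() is auto-printed as well.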

As a side note, I feel this is one of those areas where R involves a trade-off: its ease of adoption and use can come at the expense of learning the fundamentals of the language. Personally, I have been using R for the last three years and have only learnt this now (hopefully this post helps someone learn the lesson earlier in their journey). But the positive side of the trade-off is that, by being so accessible, R introduced me to the world of programming, and in trying to learn more I have come to care about the fundamentals of the R language!