Categories
dataframe r r-faq

Dynamically select data frame columns using $ and a character value

167

I have a vector of different column names and I want to be able to loop over each of them to extract that column from a data.frame. For example, consider the data set mtcars and some variable names stored in a character vector cols. When I try to select a variable from mtcars using a dynamic subset of cols, neither of these works:

cols <- c("mpg", "cyl", "am")
col <- cols[1]
col
# [1] "mpg"

mtcars$col
# NULL
mtcars$cols[1]
# NULL

How can I get these to return the same values as

mtcars$mpg

Furthermore, how can I loop over all the columns in cols to get their values?

for(x in seq_along(cols)) {
   value <- mtcars[ order(mtcars$cols[x]), ]
}

0

    224

    You can’t do that kind of subsetting with $. In the source code (R/src/main/subset.c) it states:

    /*The $ subset operator.
    We need to be sure to only evaluate the first argument.
    The second will be a symbol that needs to be matched, not evaluated.
    */

    Second argument? What?! You have to realise that $, like everything else in R (including, for instance, (, +, ^, etc.), is a function that takes arguments and is evaluated. df$V1 could be rewritten as

    `$`(df , V1)
    

    or indeed

    `$`(df , "V1")
    

    But…

    `$`(df , paste0("V1") )
    

    …for instance, will never work, nor will anything else that must first be evaluated in the second argument. You may only pass a symbol or a literal string, neither of which is evaluated.
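
One way to check this behaviour is to call $ in its function form and catch the error. This is only a sketch: it assumes the built-in mtcars data set, and the names col and res are illustrative rather than part of the answer above.

```r
col <- "mpg"

# `$` looks for a column literally named "col"; mtcars has none, so this is NULL
is.null(mtcars$col)

# A symbol or a literal string is matched by name, never evaluated
identical(`$`(mtcars, mpg), mtcars$mpg)
identical(`$`(mtcars, "mpg"), mtcars$mpg)

# A call in the second position is not evaluated first; `$` signals an error
res <- tryCatch(`$`(mtcars, paste0("mpg")), error = function(e) e)
inherits(res, "error")
```

Capturing the condition with tryCatch keeps the script running, so the failure can be inspected instead of halting the session.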

    Instead use [ (or [[ if you want to extract only a single column as a vector).

    For example,

    var <- "mpg"
    #Doesn't work
    mtcars$var
    #These both work, but note that what they return is different
    # the first is a vector, the second is a data.frame
    mtcars[[var]]
    mtcars[var]
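
Applied to the loop in the question, the same idea might look like the sketch below. It assumes the built-in mtcars data set and the cols vector from the question; sorted is an illustrative name for the list of results.

```r
cols <- c("mpg", "cyl", "am")

# [[ ]] evaluates its argument, so a character variable can name the column
is.numeric(mtcars[[cols[1]]])     # a plain vector
is.data.frame(mtcars[cols[1]])    # a one-column data.frame

# One sorted copy of mtcars per column name, collected in a named list
sorted <- lapply(cols, function(col) mtcars[order(mtcars[[col]]), ])
names(sorted) <- cols

head(sorted[["mpg"]])
```

Collecting the results in a list avoids overwriting value on every pass, which is what the for loop in the question would do.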
    

    You can perform the ordering without loops, using do.call to construct the call to order. Here is a reproducible example:

    #  set seed for reproducibility
    set.seed(123)
    df <- data.frame( col1 = sample(5, 10, replace = TRUE) , col2 = sample(5, 10, replace = TRUE) , col3 = sample(5, 10, replace = TRUE) )
    
    #  We want to sort by 'col3' then by 'col1'
    sort_list <- c("col3","col1")
    
    #  Use 'do.call' to call order. The second argument in do.call is a list of arguments
    #  to pass to the first argument, in this case 'order'.
    #  Since a data.frame is really a list, we just subset the data.frame
    #  according to the columns we want to sort by, in that order
    df[ do.call( order , df[ , match( sort_list , names(df) ) ]  ) , ]
    
       col1 col2 col3
    10    3    5    1
    9     3    2    2
    7     3    2    3
    8     5    1    3
    6     1    5    4
    3     3    4    4
    2     4    3    4
    5     5    1    4
    1     2    5    5
    4     5    3    5
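
To see what do.call is building here, the sketch below compares it with a hand-written call to order. The small data frame is made up for illustration so that the result does not depend on the random number generator, and unname() is an extra precaution, assumed here, to stop a column name from being matched to one of order's named arguments.

```r
# Hypothetical data, fixed by hand instead of sampled
df <- data.frame(col1 = c(2, 4, 3, 5, 5),
                 col2 = c(5, 3, 4, 3, 1),
                 col3 = c(5, 4, 4, 5, 4))
sort_list <- c("col3", "col1")

# do.call(order, list(a, b)) is the same as writing order(a, b) by hand
idx_docall <- do.call(order, unname(as.list(df[sort_list])))
idx_manual <- order(df$col3, df$col1)
identical(idx_docall, idx_manual)

df[idx_docall, ]
```

Because the sort keys live in a character vector, changing the sort only requires editing sort_list, not the call itself.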
    

    2

    • 3

      Has this situation changed in the years since?

      – Dunois

      Feb 20, 2020 at 22:23

    • I just came across the same problem; 'do.call' helps a lot. Here is my code: df[do.call(order, df[cols]), ]

      – Ibrahimli

      Oct 30, 2021 at 9:08


    6

    dplyr provides an easy syntax for sorting data frames:

    library(dplyr)
    mtcars %>% arrange(gear, desc(mpg))
    

    It might be useful to use the standard-evaluation (SE) version, arrange_, to build the sort list dynamically. Note that arrange_ is deprecated in current dplyr releases.

    sort_list <- c("gear", "desc(mpg)")
    mtcars %>% arrange_(.dots = sort_list)
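
With a recent dplyr (1.0.0 or later is assumed here), the same dynamic sort can be sketched without the underscore verbs, using across() with all_of() for a vector of column names and the .data pronoun for a single name. The variable names sort_cols, desc_col, by_both and by_gear_desc_mpg are illustrative, and the guard simply skips the example when dplyr is not installed.

```r
sort_cols <- c("gear", "mpg")
desc_col  <- "mpg"

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Ascending sort on every column named in sort_cols
  by_both <- dplyr::arrange(mtcars, dplyr::across(dplyr::all_of(sort_cols)))

  # Ascending gear, then descending on the column named in desc_col
  by_gear_desc_mpg <- dplyr::arrange(
    mtcars,
    .data[["gear"]],
    dplyr::desc(.data[[desc_col]])
  )

  print(head(by_gear_desc_mpg))
}
```

Keeping the column names as plain strings avoids having to parse expressions such as "desc(mpg)" out of a character vector.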
    

    3

    4

    If I understand correctly, you have a vector containing variable names and would like to loop through each name and sort your data frame by them. If so, this example should illustrate a solution for you. The primary issue in yours (the example isn't complete, so I'm not sure what else you may be missing) is that it should be order(Q1_R1000[,parameter[X]]) instead of order(Q1_R1000$parameter[X]), since parameter is an external object that contains a variable name, as opposed to a direct column of your data frame (in which case $ would be appropriate).

    set.seed(1)
    dat <- data.frame(var1=round(rnorm(10)),
                       var2=round(rnorm(10)),
                       var3=round(rnorm(10)))
    param <- paste0("var",1:3)
    dat
    #   var1 var2 var3
    #1    -1    2    1
    #2     0    0    1
    #3    -1   -1    0
    #4     2   -2   -2
    #5     0    1    1
    #6    -1    0    0
    #7     0    0    0
    #8     1    1   -1
    #9     1    1    0
    #10    0    1    0
    
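    # Sort by the last key first; order() leaves ties in their existing order,
    # so after the final pass var1 is the primary key, then var2, then var3
    for (p in rev(param)) {
      dat <- dat[order(dat[, p]), ]
    }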
    dat
    #   var1 var2 var3
    #3    -1   -1    0
    #6    -1    0    0
    #1    -1    2    1
    #7     0    0    0
    #2     0    0    1
    #10    0    1    0
    #5     0    1    1
    #8     1    1   -1
    #9     1    1    0
    #4     2   -2   -2
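
The loop works because order() keeps tied rows in their existing order, so sorting by the least important key first gives the same result as one multi-key sort. The sketch below checks this on the values printed above, typed in by hand so that it does not depend on the seed; looped and direct are illustrative names.

```r
# The ten rows shown in the first printout of dat
dat <- data.frame(var1 = c(-1, 0, -1, 2, 0, -1, 0, 1, 1, 0),
                  var2 = c(2, 0, -1, -2, 1, 0, 0, 1, 1, 1),
                  var3 = c(1, 1, 0, -2, 1, 0, 0, -1, 0, 0))
param <- paste0("var", 1:3)

# Repeated single-key sorts, least important key first
looped <- dat
for (p in rev(param)) {
  looped <- looped[order(looped[[p]]), ]
}

# One multi-key sort with var1 as the primary key
direct <- dat[do.call(order, unname(as.list(dat[param]))), ]

identical(rownames(looped), rownames(direct))
rownames(direct)
```

The single call sorts the data once instead of once per key, which matters more as the number of rows or keys grows.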
    

    0