Categories
r r-faq reshape reshape2 tidyr

Reshaping multiple sets of measurement columns (wide format) into single columns (long format)

53

I have a dataframe in a wide format, with repeated measurements taken within different date ranges. In my example there are three different periods, all with their corresponding values. E.g. the first measurement (Value1) was measured in the period from DateRange1Start to DateRange1End:

ID DateRange1Start DateRange1End Value1 DateRange2Start DateRange2End Value2 DateRange3Start DateRange3End Value3
1 1/1/90 3/1/90 4.4 4/5/91 6/7/91 6.2 5/5/95 6/6/96 3.3 

I’m looking to reshape the data to a long format such that the DateRangeXStart and DateRangeXEnd columns are grouped. Thus, what was 1 row in the original table becomes 3 rows in the new table:

ID DateRangeStart DateRangeEnd Value
1 1/1/90 3/1/90 4.4
1 4/5/91 6/7/91 6.2
1 5/5/95 6/6/96 3.3

I know there must be a way to do this with reshape2/melt/recast/tidyr, but I can’t seem to figure out how to map the multiple sets of measure variables into single sets of value columns in this particular way.


  • As a general practice, you might want to have a nicer naming pattern in the future. For example, it would be much easier/cleaner to work with “DateRangeStart1”, “DateRangeEnd1”, “Value1” (in other words, “VariableMeasurement”) than having the measurement value stuck somewhere in a variable name.

    Jan 31, 2018 at 8:07

  • Must the answer use reshape2/melt/recast/tidyr? (This question makes a better, more general dupe target if not)

    – smci

    Jan 24, 2020 at 19:09


23

Reshaping from wide to long format with multiple value/measure columns is possible with the function pivot_longer() of the tidyr package since version 1.0.0.

This is superior to the previous tidyr strategy of gather() followed by spread() (see the answer by @AndrewMacDonald), because attributes are no longer dropped (dates remain dates and numerics remain numerics in the example below).

library("tidyr")
library("magrittr")

a <- structure(list(ID = 1L, 
                    DateRange1Start = structure(7305, class = "Date"), 
                    DateRange1End = structure(7307, class = "Date"), 
                    Value1 = 4.4, 
                    DateRange2Start = structure(7793, class = "Date"),
                    DateRange2End = structure(7856, class = "Date"), 
                    Value2 = 6.2, 
                    DateRange3Start = structure(9255, class = "Date"), 
                    DateRange3End = structure(9653, class = "Date"), 
                    Value3 = 3.3),
               row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))

pivot_longer() (counterpart: pivot_wider()) works similarly to gather().
However, it offers additional functionality, such as support for multiple value columns.
With only one value column, all column names of the wide data set go into one long column with the name given in names_to.
For multiple value columns, names_to may receive several names, and the special entry ".value" marks the part of each column name that becomes the name of an output value column.
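
To make the difference concrete, here is a minimal sketch on a made-up two-row table. The data frame `wide` and the column name `measurement` are illustrative assumptions, not part of the original example. A plain string in names_to stacks every column name into one column, whereas ".value" keeps the text before the separator as the name of an output column.

```r
library(tidyr)

# Hypothetical toy data, for illustration only
wide <- data.frame(ID = 1:2,
                   Start_1 = c(1, 3), Value_1 = c(4.4, 1.0),
                   Start_2 = c(5, 7), Value_2 = c(6.2, 2.0))

# One value column: all old column names are stacked into `measurement`
one_col <- pivot_longer(wide, cols = -ID,
                        names_to = "measurement", values_to = "Value")

# ".value": the text before "_" names the output columns (Start, Value),
# the text after "_" goes into `group`
by_group <- pivot_longer(wide, cols = -ID,
                         names_to = c(".value", "group"), names_sep = "_")
```

In this sketch `one_col` has one row per ID and original column, while `by_group` has one row per ID and group with `Start` and `Value` side by side.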

This is easiest if all column names follow a specific pattern like Start_1, End_1, Start_2, etc.
Therefore, I renamed the columns in the first step.

(names(a) <- sub("(\\d)(\\w*)", "\\2_\\1", names(a)))
#>  [1] "ID"               "DateRangeStart_1" "DateRangeEnd_1"  
#>  [4] "Value_1"          "DateRangeStart_2" "DateRangeEnd_2"  
#>  [7] "Value_2"          "DateRangeStart_3" "DateRangeEnd_3"  
#> [10] "Value_3"

pivot_longer(a, 
             cols = -ID, 
             names_to = c(".value", "group"),
             # names_prefix = "DateRange",
             names_sep = "_")
#> # A tibble: 3 x 5
#>      ID group DateRangeEnd DateRangeStart Value
#>   <int> <chr> <date>       <date>         <dbl>
#> 1     1 1     1990-01-03   1990-01-01       4.4
#> 2     1 2     1991-07-06   1991-05-04       6.2
#> 3     1 3     1996-06-06   1995-05-05       3.3

Alternatively, the reshape may be done using a pivot spec that offers finer control (see link below):

spec <- a %>%
    build_longer_spec(cols = -ID) %>%
    dplyr::transmute(.name = .name,
                     group = readr::parse_number(name),
                     .value = stringr::str_extract(name, "Start|End|Value"))

pivot_longer(a, spec = spec)

Created on 2019-03-26 by the reprex package (v0.2.1)

See also: https://tidyr.tidyverse.org/articles/pivot.html


  • This is actually an answer to a slightly different question, namely how to avoid loss of attributes with tidy-methods. The originally accepted answer (to use stats::reshape) never had that problem. And the original question clearly did not have Date-classed variables, either. The reshape function preserved factor levels and Date classes.

    – IRTFM

    Jun 15, 2019 at 15:55


  • I totally agree that your stats::reshape() solution (+1) does the job equally well.

    Jun 19, 2019 at 9:07

  • The regex can be simplified to names(a) <- sub("(\\d)(\\w*)", "\\2_\\1", names(a))

    Aug 21, 2019 at 9:49
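
For readers who want to try the stats::reshape() route mentioned in the comments above, the following is a minimal sketch rather than a copy of that answer. It assumes the sample row from the question with the dates stored as character strings; the object name `dat` and the name `group` for the time variable are illustrative choices.

```r
# Sample row from the question; dates kept as character strings (assumption)
dat <- data.frame(
  ID = 1L,
  DateRange1Start = "1/1/90", DateRange1End = "3/1/90", Value1 = 4.4,
  DateRange2Start = "4/5/91", DateRange2End = "6/7/91", Value2 = 6.2,
  DateRange3Start = "5/5/95", DateRange3End = "6/6/96", Value3 = 3.3,
  stringsAsFactors = FALSE
)

# Each element of `varying` lists the column positions of one measure set;
# `v.names` gives the name of the long column built from that set
long <- reshape(dat, direction = "long", idvar = "ID",
                varying = list(c(2, 5, 8), c(3, 6, 9), c(4, 7, 10)),
                v.names = c("DateRangeStart", "DateRangeEnd", "Value"),
                timevar = "group")
long
```

Because reshape() only moves columns around, each stacked column keeps the class of its source columns.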

35

data.table's melt function can melt into multiple columns. Using that, we can simply do:

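library(data.table)
# setDT() converts dat to a data.table by reference;
# each pattern selects one set of measure columns
melt(setDT(dat), id.vars = 1L,
     measure.vars = patterns("Start$", "End$", "^Value"),
     value.name = c("DateRangeStart", "DateRangeEnd", "Value"))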

#    ID variable DateRangeStart DateRangeEnd Value
# 1:  1        1         1/1/90       3/1/90   4.4
# 2:  1        2         4/5/91       6/7/91   6.2
# 3:  1        3         5/5/95       6/6/96   3.3

Alternatively, you can reference the three sets of measure columns by column position:

melt(setDT(dat), id.vars = 1L,
     measure.vars = list(c(2, 5, 8), c(3, 6, 9), c(4, 7, 10)),
     value.name = c("DateRangeStart", "DateRangeEnd", "Value"))
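
The calls above expect an existing object `dat`. A self-contained sketch, assuming the sample row from the question with the dates stored as character strings (the construction of `dat` is my assumption, not part of the answer), could look like this:

```r
library(data.table)

# Sample row from the question; dates kept as character strings (assumption)
dat <- data.frame(
  ID = 1L,
  DateRange1Start = "1/1/90", DateRange1End = "3/1/90", Value1 = 4.4,
  DateRange2Start = "4/5/91", DateRange2End = "6/7/91", Value2 = 6.2,
  DateRange3Start = "5/5/95", DateRange3End = "6/6/96", Value3 = 3.3,
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so the original data frame stays untouched
long <- melt(as.data.table(dat), id.vars = "ID",
             measure.vars = patterns("Start$", "End$", "^Value"),
             value.name = c("DateRangeStart", "DateRangeEnd", "Value"))
long
```

Using as.data.table() instead of setDT() is a deliberate choice in this sketch: setDT() converts `dat` in place, which is faster but changes the class of the original object.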
