How to reshape data from long to wide format

344

I’m having trouble rearranging the following data frame:

set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each=4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
    )

dat1
        name numbers      value
1  firstName       1  0.3407997
2  firstName       2 -0.7033403
3  firstName       3 -0.3795377
4  firstName       4 -0.7460474
5 secondName       1 -0.8981073
6 secondName       2 -0.3347941
7 secondName       3 -0.5013782
8 secondName       4 -0.1745357

I want to reshape it so that each unique “name” variable is a rowname, with the “values” as observations along that row and the “numbers” as colnames. Sort of like this:

        name          1          2          3          4
1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

I’ve looked at melt and cast and a few other things, but none seem to do the job.

4

  • 3

    possible duplicate of Reshape three column data frame to matrix

    – Frank

    Oct 8, 2013 at 20:53

  • 7

    @Frank: this is a much better title. Long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.

    – smci

    Apr 11, 2014 at 5:21

  • A much more canonical answer can be found at the question linked above, now with the name Reshape three column data frame to matrix (“long” to “wide” format). In my opinion, it would have been better for this one to have been closed as a duplicate of that.

    Oct 14, 2021 at 17:36


  • The fact that the other question has one answer with a lot of options doesn’t necessarily make it better than this one, which also has a lot of options, but spread across several answers. Furthermore, the definition of a duplicate is “This question already has an answer here” (with a link to another, earlier-asked question).

    – Jaap

    Oct 15, 2021 at 12:08

330

Using the reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
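
For reference, here is a minimal sketch of what that call returns on the question's data, and one possible way to get the bare 1 to 4 headers asked for. The names wide and the sub() clean-up step are illustrative additions, not part of the original answer.

```r
# Rebuild the question's data so the example is self-contained
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

# reshape() names each new column <value column>.<timevar value>
names(wide)
# "name" "value.1" "value.2" "value.3" "value.4"

# Optional clean-up (an assumption about what is wanted):
# strip the "value." prefix so the headers are just 1, 2, 3, 4
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

The row names of the result (1 and 5) are the row numbers of the first occurrence of each name in dat1, which matches the sketch in the question.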

8

  • 17

    +1 and you don’t need to rely on external packages, since reshape comes with stats. Not to mention that it’s faster! =)

    – aL3xa

    May 5, 2011 at 0:07


  • 8

    reshape is an outstanding example of a horrible function API. It is very close to useless.

    Oct 26, 2017 at 15:18

  • 25

    The reshape comments and similar argument names aren’t all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = "wide", and sep = "_". Clear enough? 😉

    – Brian D

    Nov 17, 2017 at 17:11


  • 5

    I would say base R still wins vote-wise by a factor of about 2 to 1

    – vonjd

    Nov 22, 2018 at 15:14

  • 1

    Sometimes there are two idvars=, in this case we can do the following: reshape(dat1, idvar=c("name1", "name2"), timevar="numbers", direction="wide")

    – jay.sf

    Jul 12, 2021 at 16:54
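
The argument roles described in the comments above can be sketched with every argument spelled out. The data frame dat2 and its columns name1 and name2 are hypothetical, made up here only to show two id columns; they do not come from the question.

```r
set.seed(45)
# Hypothetical data: two identifying columns (name1, name2)
dat2 <- data.frame(
  name1 = rep(c("firstName", "secondName"), each = 4),
  name2 = rep(c("a", "b"), each = 2, times = 2),
  numbers = rep(1:2, 4),
  value = rnorm(8)
)

wide2 <- reshape(
  data = dat2,
  idvar = c("name1", "name2"),  # columns that together identify one output row
  timevar = "numbers",          # its values are appended to the v.names
  v.names = "value",            # column(s) spread across the new wide columns
  direction = "wide",
  sep = "_"                     # separator between v.names and the timevar value
)

names(wide2)
# "name1" "name2" "value_1" "value_2"
```

Each distinct name1/name2 pair becomes one row, so this toy data frame yields four rows.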


157

The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.

Edit: Now, in 2019, tidyr v 1.0 has launched and set spread and gather on a deprecation path, preferring instead pivot_wider and pivot_longer, which you can find described in this answer. Read on if you want a brief glimpse into the brief life of spread/gather.

library(tidyr)
spread(dat1, key = numbers, value = value)
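
For comparison, the pivot_wider() replacement mentioned in the edit note might look like the sketch below. It assumes tidyr 1.0.0 or later is installed; the variable name wide is illustrative.

```r
library(tidyr)

# Rebuild the question's data so the example is self-contained
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# names_from: column whose values become the new column names
# values_from: column whose values fill those new columns
wide <- pivot_wider(dat1, names_from = numbers, values_from = value)
wide
```

The result has one row per name and columns named 1 to 4, the same layout spread() gives for this data.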

From GitHub:

tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It’s designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.

1

  • 7

    Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.

    – Jake

    Apr 12, 2017 at 13:01

83

You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is:

library(reshape)
cast(dat1, name ~ numbers)

Or using reshape2:

library(reshape2)
dcast(dat1, name ~ numbers)
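
The calls above work because dat1 happens to have a column literally named value. When no such column exists, dcast() has to guess which column holds the values, so it is safer to name it with value.var. A small sketch, using a made-up data frame dat with a value column called blah (assumes reshape2 is installed):

```r
library(reshape2)

# Illustrative data: the value column is called "blah", not "value"
dat <- data.frame(
  id = c(1, 1, 2, 2),
  blah = c(8, 4, 7, 6),
  index = c(1, 2, 1, 2)
)

# Name the value column explicitly instead of letting dcast() guess
wide <- dcast(dat, id ~ index, value.var = "blah")
wide
#   id 1 2
# 1  1 8 4
# 2  2 7 6
```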

3

  • 3

    It might be worth noting that just using cast or dcast will not work nicely if you don’t have a clear “value” column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly specify the value/value.var: cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah"), for instance.

    Jun 21, 2017 at 22:37

  • Note that reshape2 is deprecated and you should be migrating your code away from using it.

    – dpel

    Jan 21, 2021 at 9:54

  • 5

    @dpel A more optimistic spin is to say that reshape2 is finally done and you can now use it without fear that Hadley will change it again and break your code!

    – Ista

    Jan 22, 2021 at 22:48