How to reshape data from long to wide format

Question

I'm having trouble rearranging the following data frame:

set.seed(45)
dat1 <- data.frame(
    name = rep(c("firstName", "secondName"), each=4),
    numbers = rep(1:4, 2),
    value = rnorm(8)
    )

dat1
       name  numbers      value
1  firstName       1  0.3407997
2  firstName       2 -0.7033403
3  firstName       3 -0.3795377
4  firstName       4 -0.7460474
5 secondName       1 -0.8981073
6 secondName       2 -0.3347941
7 secondName       3 -0.5013782
8 secondName       4 -0.1745357

I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:

     name          1          2          3         4
1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

I've looked at melt and cast and a few other things, but none seem to do the job.

possible duplicate of Reshape three column data frame to matrix — Frank, Commented Oct 8, 2013 at 20:53
@Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms. — smci, Commented Apr 11, 2014 at 5:21
A much more canonical answer can be found at the question linked above, now with the name Reshape three column data frame to matrix ("long" to "wide" format). In my opinion, it would have been better for this one to have been closed as a duplicate of that. — Aaron - mostly inactive, Commented Oct 14, 2021 at 17:36
The fact that the other question has one answer with a lot of options doesn't make it necessarily better than this; which has also a lot of options but in several answers. Furthermore, the definition of a duplicate is "This question already has answer here" (with a link to another earlier asked question). — Jaap, Commented Oct 15, 2021 at 12:08
I check in every so often to see if Stack Overflow is still more annoying than helpful -- yup, it is. I'll stay gone. — Aaron - mostly inactive, Commented Aug 1, 2023 at 1:06

Chase · Accepted Answer · 2011-05-04 23:20:03Z

369

Answer recommended by R Language Collective

Using the base R reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
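
With this call, reshape() names the new columns value.1 through value.4 and keeps the row names of the first row of each group. A minimal sketch of tidying that up, assuming you want the bare numbers as column names as in the question (the object name wide is only illustrative):

```r
set.seed(45)
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value   = rnorm(8)
)

wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

# Strip the "value." prefix that reshape() prepends to each new column
names(wide) <- sub("^value\\.", "", names(wide))

# Reset the row names carried over from the long data
rownames(wide) <- NULL
wide
```

Column names such as 1 are not syntactic in R, so they need backticks or [[ ]] when accessed, for example wide[["1"]].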

answered May 4, 2011 at 23:20

Chase


+1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)
– aL3xa
Commented May 5, 2011 at 0:07

reshape is an outstanding example for a horrible function API. It is very close to useless.
– NoBackingDown
Commented Oct 26, 2017 at 15:18

The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)
– Brian D
Commented Nov 17, 2017 at 17:11

I would say base R still wins vote-wise by a factor of about 2 to 1
– vonjd
Commented Nov 22, 2018 at 15:14

Sometimes there are two idvars=, in this case we can do the following: reshape(dat1, idvar=c("name1", "name2"), timevar="numbers", direction="wide")
– jay.sf
Commented Jul 12, 2021 at 16:54


Community · Accepted Answer · 2020-06-20 09:12:55Z

The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.

Edit: Now, in 2019, tidyr v 1.0 has launched and set spread and gather on a deprecation path, preferring instead pivot_wider and pivot_longer, which you can find described in this answer. Read on if you want a brief glimpse into the brief life of spread/gather.

library(tidyr)
spread(dat1, key = numbers, value = value)

From GitHub:

tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.

Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations. — Jake, Commented Apr 12, 2017 at 13:01

David Arenburg · Accepted Answer · 2015-05-26 14:52:24Z

87

You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is

library(reshape)
cast(dat1, name ~ numbers)

Or using reshape2

library(reshape2)
dcast(dat1, name ~ numbers)

edited May 26, 2015 at 14:52

David Arenburg


answered May 4, 2011 at 22:42

Ista


It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.
– thelatemail
Commented Jun 21, 2017 at 22:37
Note that reshape2 is deprecated and you should be migrating your code away from using it.
– dpel
Commented Jan 21, 2021 at 9:54

@dpel A more optimistic spin is to say that reshape2 is finally done and you can now use it without fear that Hadley will change it again and break your code!
– Ista
Commented Jan 22, 2021 at 22:48


SymbolixAU · Accepted Answer · 2016-03-27 22:51:39Z

65

Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions

(Reference: Efficient reshaping using data.tables)

library(data.table)

setDT(dat1)
dcast(dat1, name ~ numbers, value.var = "value")

#          name          1          2         3         4
# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078
# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814

And, as of data.table v1.9.6 we can cast on multiple columns

## add an extra column
dat1[, value2 := value * 2]

## cast multiple value columns
dcast(dat1, name ~ numbers, value.var = c("value", "value2"))

#          name    value_1    value_2   value_3   value_4   value2_1   value2_2 value2_3  value2_4
# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078  0.3672866 -1.6712572 3.190562 0.6590155
# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814 -1.6409368  0.9748581 1.476649 1.1515627

edited Mar 27, 2016 at 22:51

answered Mar 27, 2016 at 22:35

SymbolixAU


data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!
– joel.wilson
Commented Aug 31, 2017 at 12:06
Great answer. Thank you. For multiple columns, I got "Error in .subset2(x, i, exact = exact)", and could fix this by forcing the use of data.table dcast: see stackoverflow.com/a/44271092/190791
– Timothée HENRY
Commented Jul 3, 2019 at 7:07


zephryl · Accepted Answer · 2022-11-01 13:25:32Z

With tidyr, there is pivot_wider() and pivot_longer() which are generalized to do reshaping from long -> wide or wide -> long, respectively. Using the OP's data:

single column long -> wide

library(tidyr)

dat1 %>% 
    pivot_wider(names_from = numbers, values_from = value)

# # A tibble: 2 x 5
#   name          `1`    `2`    `3`    `4`
#   <fct>       <dbl>  <dbl>  <dbl>  <dbl>
# 1 firstName   0.341 -0.703 -0.380 -0.746
# 2 secondName -0.898 -0.335 -0.501 -0.175

multiple columns long -> wide

pivot_wider() is also capable of more complex pivot operations. For example, you can pivot multiple columns simultaneously:

# create another column for showing the functionality
dat2 <- dat1 %>% 
    dplyr::rename(valA = value) %>%
    dplyr::mutate(valB = valA * 2) 

dat2 %>% 
    pivot_wider(names_from = numbers, values_from = c(valA, valB))

# # A tibble: 2 × 9
#   name       valA_1 valA_2 valA_3 valA_4 valB_1 valB_2 valB_3 valB_4
#   <chr>       <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 firstName   0.341 -0.703 -0.380 -0.746  0.682 -1.41  -0.759 -1.49 
# 2 secondName -0.898 -0.335 -0.501 -0.175 -1.80  -0.670 -1.00  -0.349

There is much more functionality to be found in the docs.
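
Two of those options come up often in practice: names_prefix, which prepends text to the new column names, and values_fill, which supplies a value for combinations that are missing from the long data. A small sketch, assuming one row has been dropped from dat1 to create a gap (dat_gap and the prefix "n_" are illustrative choices):

```r
library(tidyr)

set.seed(45)
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value   = rnorm(8)
)

# Drop the (secondName, 3) row so the long data has a missing combination
dat_gap <- dat1[!(dat1$name == "secondName" & dat1$numbers == 3), ]

wide <- pivot_wider(
  dat_gap,
  names_from   = numbers,
  values_from  = value,
  names_prefix = "n_",  # columns become n_1, n_2, ...
  values_fill  = 0      # fill the missing cell with 0 instead of NA
)
wide
```

Leaving values_fill at its default keeps the missing cell as NA, which is usually the safer choice when 0 is a plausible real value.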

zx8754 · Accepted Answer · 2016-09-02 07:37:07Z

31

Using your example dataframe, we could:

xtabs(value ~ name + numbers, data = dat1)
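
Note that xtabs() returns a table rather than a data frame, sums the values of any duplicated name/numbers pair, and fills absent pairs with 0 rather than NA. A minimal sketch of turning the result into an ordinary wide data frame, assuming each pair occurs at most once (xt, m and wide_df are illustrative names):

```r
set.seed(45)
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value   = rnorm(8)
)

xt <- xtabs(value ~ name + numbers, data = dat1)

# Dropping the "xtabs"/"table" classes leaves a plain numeric matrix,
# which as.data.frame() converts column by column instead of back to long
m <- unclass(xt)

# Move the row names into a regular "name" column
wide_df <- data.frame(name = rownames(m), as.data.frame(m), check.names = FALSE)
rownames(wide_df) <- NULL
wide_df
```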

edited Sep 2, 2016 at 7:37

zx8754


answered May 4, 2011 at 22:58

user666993


This one is good, but the result is of class table, which may not be as easy to handle as a data.frame or data.table, both of which are supported by plenty of packages.
– cloudscomputes
Commented Oct 20, 2017 at 4:44
The result is just a matrix with a fancy class name. When x is the result of xtabs, attr(x,"class")=NULL;class(x) returns [1] "matrix" "array". This makes it look like a regular matrix: attr(x,"class")=NULL;attr(x,"call")=NULL;dimnames(x)=unname(dimnames(x)).
– nisetama
Commented Aug 10, 2022 at 13:08
This converts the result of xtabs to a dataframe: class(x)=NULL;as.data.frame(x). Without class(x)=NULL, the result is converted back to long format.
– nisetama
Commented Aug 10, 2022 at 13:35


mpalanco · Accepted Answer · 2015-07-14 17:44:08Z

28

Two other options:

Base package:

df <- unstack(dat1, form = value ~ numbers)
rownames(df) <- unique(dat1$name)
df

sqldf package:

library(sqldf)
sqldf('SELECT name,
      MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1, 
      MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,
      MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,
      MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4
      FROM dat1
      GROUP BY name')

answered Jul 14, 2015 at 17:44

mpalanco


Instead of hardcoding numbers, the query can be set up like this: ValCol <- unique(dat1$numbers);s <- sprintf("MAX(CASE WHEN numbers = %s THEN value ELSE NULL END) `%s`,", ValCol, ValCol);mquerym <- gsub('.{1}$','',paste(s, collapse = "\n"));mquery <- paste("SELECT name,", mquerym, "FROM dat1", "GROUP BY name", sep = "\n");sqldf(mquery)
– M--
Commented Apr 29, 2019 at 17:58


Onyambu · Accepted Answer · 2017-12-25 04:05:05Z

20

Using base R aggregate function:

aggregate(value ~ name, dat1, I)

# name           value.1  value.2  value.3  value.4
#1 firstName      0.4145  -0.4747   0.0659   -0.5024
#2 secondName    -0.8259   0.1669  -0.8962    0.1681
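
Two caveats apply to this approach: the column order depends on the row order within each name (numbers is never consulted), and the result holds a single matrix column called value rather than four separate columns. A sketch that addresses both, assuming every name has the same set of numbers (sorted, res and flat are illustrative names):

```r
set.seed(45)
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value   = rnorm(8)
)

# Sort so that the values within each name follow the order of numbers
sorted <- dat1[order(dat1$name, dat1$numbers), ]

res <- aggregate(value ~ name, data = sorted, FUN = I)
is.matrix(res$value)  # the values are stored as one matrix column

# Expand the matrix column into ordinary columns value.1, value.2, ...
flat <- do.call(data.frame, res)
flat
```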

edited Dec 25, 2017 at 4:05

Onyambu


answered Sep 2, 2016 at 7:52

Ronak Shah


SymbolixAU · Accepted Answer · 2018-08-29 03:00:45Z

The base reshape function works perfectly fine:

df <- data.frame(
  year   = c(rep(2000, 12), rep(2001, 12)),
  month  = rep(1:12, 2),
  values = rnorm(24)
)
df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")
df_wide

Where

idvar is the column of classes that separates rows
timevar is the column of classes to cast wide
v.names is the column containing numeric values
direction specifies wide or long format
the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

If no idvar exists, create one before using the reshape() function:

df$id   <- c(rep("year1", 12), rep("year2", 12))
df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")
df_wide

Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.
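
reshape() also records how it produced the wide data in a reshapeWide attribute, so passing the wide result back to reshape() with no other arguments returns it to long format. A short sketch using the same example data (the seed and the name df_long are added here for illustration):

```r
set.seed(1)
df <- data.frame(
  year   = c(rep(2000, 12), rep(2001, 12)),
  month  = rep(1:12, 2),
  values = rnorm(24)
)

df_wide <- reshape(df, idvar = "year", timevar = "month",
                   v.names = "values", direction = "wide", sep = "_")

# The stored attribute lets reshape() undo its own work
df_long <- reshape(df_wide)
dim(df_long)
```

This only works while the attribute is intact; operations that rebuild the data frame may drop it, in which case varying has to be given explicitly.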

dmi3kno · Accepted Answer · 2017-12-23 23:01:37Z

There's a very powerful new package called cdata from the data scientists at Win-Vector (the folks who made vtreat, seplyr and replyr). It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless of how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's an excerpt from the recent blog post by John Mount:

The whole system is based on two primitives or operators cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These operators have pivot, un-pivot, one-hot encode, transpose, moving multiple rows and columns, and many other transforms as simple special cases.

It is easy to write many different operations in terms of the cdata primitives. These operators can work-in memory or at big data scale (with databases and Apache Spark; for big data use the cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN() variants). The transforms are controlled by a control table that itself is a diagram of (or picture of) the transform.

We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.

library(cdata)
# first build the control table
pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset
                        columnToTakeKeysFrom = 'numbers', # this will become column headers
                        columnToTakeValuesFrom = 'value', # this contains data
                        sep="_")                          # optional for making column names

# perform the move of data to columns
dat_wide <- moveValuesToColumnsD(tallTable =  dat1, # reference to dataset
                    keyColumns = c('name'),         # this(these) column(s) should stay untouched 
                    controlTable = pivotControlTable# control table above
                    ) 
dat_wide

#>         name  numbers_1  numbers_2  numbers_3  numbers_4
#> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

Answer needs updating, since the package seems to be rewritten (and links are dead) — runr, Commented Jan 6, 2022 at 11:59

zhang jing · Accepted Answer · 2019-07-26 05:47:41Z

A much easier way:

devtools::install_github("yikeshu0611/onetree") #install onetree package

library(onetree)
widedata=reshape_toWide(data = dat1,id = "name",j = "numbers",value.var.prefix = "value")
widedata

        name     value1     value2     value3     value4
   firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
  secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

If you want to go back from wide to long, just change Wide to Long in the function name; the arguments stay the same.

reshape_toLong(data = widedata,id = "name",j = "numbers",value.var.prefix = "value")

        name numbers      value
   firstName       1  0.3407997
  secondName       1 -0.8981073
   firstName       2 -0.7033403
  secondName       2 -0.3347941
   firstName       3 -0.3795377
  secondName       3 -0.5013782
   firstName       4 -0.7460474
  secondName       4 -0.1745357

nisetama · Accepted Answer · 2022-08-10 15:38:10Z

This works even if you have missing pairs and it doesn't require sorting (as.matrix(dat1)[,1:2] can be replaced with cbind(dat1[,1],dat1[,2])):

> set.seed(45);dat1=data.frame(name=rep(c("firstName","secondName"),each=4),numbers=rep(1:4,2),value=rnorm(8))
> u1=unique(dat1[,1]);u2=unique(dat1[,2])
> m=matrix(nrow=length(u1),ncol=length(u2),dimnames=list(u1,u2))
> m[as.matrix(dat1)[,1:2]]=dat1[,3]
> m
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

This doesn't work if you have missing pairs and it requires sorting, but it's a bit shorter in case the pairs are already sorted:

> u1=unique(dat1[,1]);u2=unique(dat1[,2])
> dat1=dat1[order(dat1[,1],dat1[,2]),] # not actually needed in this case
> matrix(dat1[,3],length(u1),,T,list(u1,u2))
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

Here's a function version of the first approach (add as.data.frame to make it work with tibbles):

l2w=function(x,row=1,col=2,val=3,sort=F){
  u1=unique(x[,row])
  u2=unique(x[,col])
  if(sort){u1=sort(u1);u2=sort(u2)}
  out=matrix(nrow=length(u1),ncol=length(u2),dimnames=list(u1,u2))
  out[cbind(x[,row],x[,col])]=x[,val]
  out
}
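
To see the missing-pair behaviour concretely, here is a sketch of the same indexing idea applied to data with one row removed; the absent combination stays NA. The function name long_to_matrix and the as.character() coercion are additions for illustration, not part of the function above:

```r
long_to_matrix <- function(x, row = 1, col = 2, val = 3) {
  r <- as.character(x[[row]])
  k <- as.character(x[[col]])
  out <- matrix(NA_real_, nrow = length(unique(r)), ncol = length(unique(k)),
                dimnames = list(unique(r), unique(k)))
  # A two-column character matrix indexes cells by (row name, column name)
  out[cbind(r, k)] <- x[[val]]
  out
}

set.seed(45)
dat1 <- data.frame(
  name    = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value   = rnorm(8)
)

dat_gap <- dat1[-7, ]  # remove the (secondName, 3) row
m <- long_to_matrix(dat_gap)
m
```

Using [[ ]] instead of [, ] for column access also lets the sketch accept tibbles without an as.data.frame() call.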

Or if you only have the values of the lower triangle, you can do this:

> euro=as.matrix(eurodist)[1:3,1:3]
> lower=data.frame(V1=rownames(euro)[row(euro)[lower.tri(euro)]],V2=colnames(euro)[col(euro)[lower.tri(euro)]],V3=euro[lower.tri(euro)])
> lower
         V1        V2   V3
1 Barcelona    Athens 3313
2  Brussels    Athens 2963
3  Brussels Barcelona 1318
> n=unique(c(lower[,1],lower[,2]))
> full=rbind(lower,setNames(lower[,c(2,1,3)],names(lower)),data.frame(V1=n,V2=n,V3=0))
> full
         V1        V2   V3
1 Barcelona    Athens 3313
2  Brussels    Athens 2963
3  Brussels Barcelona 1318
4    Athens Barcelona 3313
5    Athens  Brussels 2963
6 Barcelona  Brussels 1318
7    Athens    Athens    0
8 Barcelona Barcelona    0
9  Brussels  Brussels    0
> l2w(full,sort=T)
          Athens Barcelona Brussels
Athens         0      3313     2963
Barcelona   3313         0     1318
Brussels    2963      1318        0

Or here's another approach:

> rc=as.matrix(lower[-3])
> n=sort(unique(c(rc)))
> m=matrix(0,length(n),length(n),,list(n,n))
> m[rc]=lower[,3]
> m[rc[,2:1]]=lower[,3]
> m
          Athens Barcelona Brussels
Athens         0      3313     2963
Barcelona   3313         0     1318
Brussels    2963      1318        0

Another simple method in base R is to use xtabs. The result of xtabs is basically just a matrix with a fancy class name, but you can make it look like a regular matrix with class(x)=NULL;attr(x,"call")=NULL;dimnames(x)=unname(dimnames(x)):

> x=xtabs(value~name+numbers,dat1);x
            numbers
name                  1          2          3          4
  firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
  secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
> str(x)
 'xtabs' num [1:2, 1:4] 0.341 -0.898 -0.703 -0.335 -0.38 ...
 - attr(*, "dimnames")=List of 2
  ..$ name   : chr [1:2] "firstName" "secondName"
  ..$ numbers: chr [1:4] "1" "2" "3" "4"
 - attr(*, "call")= language xtabs(formula = value ~ name + numbers, data = dat1)
> class(x)
[1] "xtabs" "table"
> class(as.matrix(x)) # `as.matrix` has no effect because `x` is already a matrix
[1] "xtabs" "table"
> class(x)=NULL;class(x)
[1] "matrix" "array"
> attr(x,"call")=NULL;dimnames(x)=unname(dimnames(x))
> x # now it looks like a regular matrix
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
> str(x)
 num [1:2, 1:4] 0.341 -0.898 -0.703 -0.335 -0.38 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "firstName" "secondName"
  ..$ : chr [1:4] "1" "2" "3" "4"

Normally as.data.frame(x) converts the result of xtabs back to long format, but you can avoid it with class(x)=NULL:

> x=xtabs(value~name+numbers,dat1);as.data.frame(x)
        name numbers       Freq
1  firstName       1  0.3407997
2 secondName       1 -0.8981073
3  firstName       2 -0.7033403
4 secondName       2 -0.3347941
5  firstName       3 -0.3795377
6 secondName       3 -0.5013782
7  firstName       4 -0.7460474
8 secondName       4 -0.1745357
> class(x)=NULL;as.data.frame(x)
                    1          2          3          4
firstName   0.3407997 -0.7033403 -0.3795377 -0.7460474
secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

This converts data in wide format to long format (unlist converts a dataframe to a vector and c converts a matrix to a vector):

w2l=function(x)data.frame(V1=rownames(x)[row(x)],V2=colnames(x)[col(x)],V3=unname(c(unlist(x))))
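
As a usage sketch, the function can be applied to a small wide matrix built from the question's data; the matrix m below is constructed only for illustration:

```r
w2l <- function(x) data.frame(V1 = rownames(x)[row(x)],
                              V2 = colnames(x)[col(x)],
                              V3 = unname(c(unlist(x))))

set.seed(45)
value <- rnorm(8)

# Wide matrix: one row per name, one column per number
m <- matrix(value, nrow = 2, byrow = TRUE,
            dimnames = list(c("firstName", "secondName"), 1:4))

long <- w2l(m)
long
```

The rows come out in column-major order, so both names for column 1 appear first, then both names for column 2, and so on.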

BKS · Accepted Answer · 2022-10-31 23:40:17Z

Came here via a linked question, Reshape three column data frame to matrix ("long" to "wide" format). That question is closed, so I am writing an alternative solution here.

I found an alternative solution that may be useful for someone looking to convert three columns to a matrix: the pivot_wider_profile() function from the decoupleR (2.3.2) package. The following is copied from their site:

Generates a kind of table where the rows come from id_cols, the columns from names_from and the values from values_from.

Usage

pivot_wider_profile(
data,
id_cols,
names_from,
values_from,
values_fill = NA,
to_matrix = FALSE,
to_sparse = FALSE,
...
)

Ronak Shah · Accepted Answer · 2024-06-14 15:20:23Z

0

I have recently started looking into the collapse package, which is super fast and useful. In collapse we can use the pivot function to do this transformation.

library(collapse)
pivot(dat1, "name", "value", "numbers", how = "wider")

#        name          1          2          3          4
#1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
#2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

answered Jun 14 at 15:20

Ronak Shah


fmassica · Accepted Answer · 2021-11-08 11:29:10Z

Using only dplyr and purrr (map_dfc):

library(dplyr)
library(purrr)
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each=4),
  numbers = rep(1:4, 2), value = rnorm(8)
)
longer_to_wider <- function(data, name_from, value_from){
  group <- colnames(data)[!(colnames(data) %in% c(name_from,value_from))]
  data %>% group_by(.data[[group]]) %>%
    summarise( name = list(.data[[name_from]]), 
               value = list(.data[[value_from]])) %>%
    {
      d <- data.frame(
        name = .[[name_from]] %>% unlist() %>% unique()
      )
      e <- map_dfc(.[[group]],function(x){
          y <- tibble(
            x = data %>% filter(.data[[group]] == x) %>% pull(value_from)
          )
          colnames(y) <- x
          y
      })
      cbind(d,e)
    }
}
longer_to_wider(dat1, "name", "value")
#    name          1          2          3          4
# 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474
# 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
