
How to convert a factor to integer/numeric without loss of information?

679

When I convert a factor to a numeric or integer, I get the underlying level codes, not the values as numbers.

f <- factor(sample(runif(5), 20, replace = TRUE))
##  [1] 0.0248644019011408 0.0248644019011408 0.179684827337041 
##  [4] 0.0284090070053935 0.363644931698218  0.363644931698218 
##  [7] 0.179684827337041  0.249704354675487  0.249704354675487 
## [10] 0.0248644019011408 0.249704354675487  0.0284090070053935
## [13] 0.179684827337041  0.0248644019011408 0.179684827337041 
## [16] 0.363644931698218  0.249704354675487  0.363644931698218 
## [19] 0.179684827337041  0.0284090070053935
## 5 Levels: 0.0248644019011408 0.0284090070053935 ... 0.363644931698218

as.numeric(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

as.integer(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

I have to resort to paste to get the real values:

as.numeric(paste(f))
##  [1] 0.02486440 0.02486440 0.17968483 0.02840901 0.36364493 0.36364493
##  [7] 0.17968483 0.24970435 0.24970435 0.02486440 0.24970435 0.02840901
## [13] 0.17968483 0.02486440 0.17968483 0.36364493 0.24970435 0.36364493
## [19] 0.17968483 0.02840901

Is there a better way to convert a factor to numeric?

3

  • 7

    The levels of a factor are stored as character data type anyway (attributes(f)), so I don’t think there is anything wrong with as.numeric(paste(f)). Perhaps it would be better to think why (in the specific context) you are getting a factor in the first place, and try to stop that. E.g., is the dec argument in read.table set correctly?

    – CJB

    Jan 25, 2016 at 9:44


  • If you use a dataframe you can use convert from hablar. df %>% convert(num(column)). Or if you have a factor vector you can use as_reliable_num(factor_vector)

    – davsjob

    Nov 1, 2018 at 9:53

  • Thank God for this question. It is SO frustrating to see numbers get transformed into other numbers, seemingly at random.

    May 11 at 18:06
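One comment above suggests checking the dec argument of read.table before converting anything. A minimal sketch of that idea, assuming the input uses a decimal comma; the inline data is invented for illustration, and stringsAsFactors = TRUE is set explicitly because it is no longer the default in recent R versions:

```r
# Illustrative input with decimal commas (made-up values).
txt <- "x\n1,5\n2,25\n3,0"

# With the default dec = ".", the column cannot be parsed as numbers
# and ends up as a factor when stringsAsFactors = TRUE.
bad <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
is.factor(bad$x)
# [1] TRUE

# Declaring the decimal separator yields a numeric column directly.
good <- read.table(text = txt, header = TRUE, dec = ",")
good$x
# [1] 1.50 2.25 3.00
```

Fixing the import this way avoids the factor-to-numeric conversion entirely.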

817

See the Warning section of ?factor:

In particular, as.numeric applied to
a factor is meaningless, and may
happen by implicit coercion. To
transform a factor f to
approximately its original numeric
values, as.numeric(levels(f))[f] is
recommended and slightly more
efficient than
as.numeric(as.character(f)).

The FAQ on R has similar advice.
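A minimal sketch of the recommended idiom on a small hand-made factor (the values are illustrative, not taken from the question):

```r
# A factor whose levels are numeric strings (illustrative values).
f <- factor(c("10.5", "3", "10.5", "0.25", "3"))

# Level codes: positions in the sorted level set, not the values.
codes <- as.integer(f)
codes
# [1] 2 3 2 1 3

# Convert the levels once, then index the result by the factor.
vals <- as.numeric(levels(f))[f]
vals
# [1] 10.50  3.00 10.50  0.25  3.00
```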


Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?

as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values, rather than on nlevels(f) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won’t be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don’t worry too much about it.
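To make the difference concrete, the sketch below builds a long factor with only a few levels (sizes and seed chosen arbitrarily) and shows how many strings each approach has to parse:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
f <- factor(sample(c("0.1", "0.2", "0.3"), 1e5, replace = TRUE))

nlevels(f)  # strings parsed by as.numeric(levels(f))[f]
length(f)   # strings parsed by as.numeric(as.character(f))

a <- as.numeric(levels(f))[f]
b <- as.numeric(as.character(f))
identical(a, b)  # both routes give the same numbers
```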


Some timings

library(microbenchmark)
# Note: x is not defined in this snippet; presumably it is the same factor as f.
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),
  paste(x),
  times = 1e5
)
## Unit: microseconds
##                         expr   min    lq      mean median     uq      max neval
##     as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##     as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
##  as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                    paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                     paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05

6

  • 5

    For timings see this answer: stackoverflow.com/questions/6979625/…

    Aug 8, 2011 at 11:27

  • 3

    Many thanks for your solution. Can I ask why the as.numeric(levels(f))[f] is more precise and faster? Thanks.

    – Sam

    Apr 18, 2014 at 0:25

  • 7

    @Sam as.character(f) requires a “primitive lookup” to find the function as.character.factor(), which is effectively levels(f)[f], so as.numeric() is then applied to every element rather than once per level.

    – Jonathan

    Jun 27, 2014 at 19:12


  • 18

    When I apply as.numeric(levels(f))[f] or as.numeric(as.character(f)), I get a warning message: “NAs introduced by coercion”. Do you know what the problem could be? Thank you!

    – maycca

    Apr 13, 2016 at 21:23

  • 1

    @user08041991 I have the same issue as maycca. I suspect this is from gradual changes in R over time (this answer was posted in 2010), and this answer is now outdated

    – MBorg

    Dec 13, 2020 at 18:27
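The warning mentioned in the comments above typically appears when at least one level is not a string that R can parse as a number. A hedged sketch for locating such levels; the factor contents and the name bad_levels are made up for illustration:

```r
# Illustrative factor: two levels are not valid numbers.
f <- factor(c("1.5", "2", "n/a", "2", "1,7"))

lv <- levels(f)
num <- suppressWarnings(as.numeric(lv))

# Levels that turned into NA without being NA to begin with.
bad_levels <- lv[is.na(num) & !is.na(lv)]
bad_levels
# [1] "1,7" "n/a"
```

Once the offending levels are known, they can be cleaned or recoded before the conversion.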


105

R has a number of (undocumented) convenience functions for converting factors:

  • as.character.factor
  • as.data.frame.factor
  • as.Date.factor
  • as.list.factor
  • as.vector.factor

But annoyingly, there is nothing to handle the factor -> numeric conversion. As an extension of Joshua Ulrich’s answer, I would suggest working around this omission by defining your own helper function:

as.double.factor <- function(x) {as.numeric(levels(x))[x]}

that you can store at the beginning of your script, or even better in your .Rprofile file.
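If defining as.double.factor feels too invasive, an ordinary function does the same job without touching S3 dispatch. A minimal sketch; the name factor_to_numeric is hypothetical and not part of base R:

```r
# Hypothetical helper (not part of base R): factor labels -> numeric values.
factor_to_numeric <- function(x) {
  stopifnot(is.factor(x))
  as.numeric(levels(x))[x]
}

f <- factor(c("2.5", "10", "2.5"))
res <- factor_to_numeric(f)
res
# [1]  2.5 10.0  2.5
```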

7

  • 14

    There’s nothing to handle the factor-to-integer (or numeric) conversion because it’s expected that as.integer(factor) returns the underlying integer codes (as shown in the examples section of ?factor). It’s probably okay to define this function in your global environment, but you might cause problems if you actually register it as an S3 method.

    Apr 18, 2014 at 12:03

  • 2

    That’s a good point and I agree: a complete redefinition of the factor->numeric conversion is likely to mess up a lot of things. I found myself writing the cumbersome factor->numeric conversion a lot before realizing that it is in fact a shortcoming of R: some convenience function should be available… Calling it as.numeric.factor makes sense to me, but YMMV.

    – Jealie

    Apr 18, 2014 at 20:11

  • 7

    If you find yourself doing that a lot, then you should do something upstream to avoid it altogether.

    Apr 18, 2014 at 22:44

  • 2

    as.numeric.factor returns NA?

    – jO.

    Aug 8, 2014 at 7:56

  • 1

    Per @rui-barradas’s comment: as a historical anomaly, R has two names for its floating-point vectors, numeric and double. According to the documentation, S3 methods should be written for as.double, so as.double.factor seems like the more proper name. Link to documentation: stat.ethz.ch/R-manual/R-devel/library/base/html/numeric.html . Thanks @rui-barradas !

    – Jealie

    Oct 24, 2021 at 15:45

41

Note: this particular answer is not for converting numeric-valued factors to numerics, it is for converting categorical factors to their corresponding level numbers.


Every answer in this post failed for me: the results were all NA.

y2 <- factor(c("A", "B", "C", "D", "A"))
as.numeric(levels(y2))[y2]
# [1] NA NA NA NA NA
# Warning message:
# NAs introduced by coercion

What worked for me is this –

as.integer(y2)
# [1] 1 2 3 4 1
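The two situations are easy to confuse, so here is a small side-by-side sketch (example data invented for illustration): as.integer() is the right tool when you want level codes for categorical labels, and the wrong one when the labels themselves are numbers.

```r
# Categorical labels: the integer codes are what you want.
y2 <- factor(c("A", "B", "C", "D", "A"))
codes <- as.integer(y2)
codes
# [1] 1 2 3 4 1

# Numeric-looking labels: codes and values differ.
y <- factor(c("5", "15", "20", "2"))
y_codes <- as.integer(y)            # level codes
y_codes
# [1] 4 1 3 2
y_vals <- as.numeric(levels(y))[y]  # original values
y_vals
# [1]  5 15 20  2
```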

8

  • Are you sure you had a factor? Look at this example: y <- factor(c("5","15","20","2")); unclass(y) %>% as.numeric. This returns 4,1,3,2, not 5,15,20,2. This seems like incorrect information.

    – MrFlick

    Feb 22, 2017 at 19:19

  • 5

    OK, well that’s not the question that was asked above. In this question the factor levels are all “numeric”. In your case, as.numeric(y) should have worked just fine, no need for the unclass(). But again, that’s not what this question was about. This answer isn’t appropriate here.

    – MrFlick

    Feb 22, 2017 at 19:37

  • 6

    Well, I really hope it helps someone who was in a hurry like me and read just the title!

    – Indi

    Feb 22, 2017 at 19:45

  • 1

    If you have characters representing the integers as factors, this is the one I would recommend. This is the only one that worked for me.

    – aimme

    Dec 12, 2019 at 16:14

  • 1

    This is the answer so many of us are after and the first hit in Google. I can’t find a similar question.

    Sep 22, 2021 at 18:50