
Convert strings to float in all pandas columns, where this is possible

I created a pandas DataFrame from a list of lists:

import numpy as np
import pandas as pd
df_list = [["a", "1", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns=list("ABC"))
print(df)
   A  B    C
0  a  1    2
1  b  3  NaN

Is there a way to convert to float every column that can be converted, i.e. B and C? The following works if you know which columns to convert:

  df[["B", "C"]] = df[["B", "C"]].astype("float")

But what if you don't know in advance which columns contain numbers? When I tried

  df = df.astype("float", errors = "ignore")

all columns are still strings/objects. Similarly,

df[["B", "C"]] = df[["B", "C"]].apply(pd.to_numeric)

converts both columns (B becomes int64 and C becomes float64 because of the NaN value), but

df = df.apply(pd.to_numeric)

raises an error because column A cannot be parsed as a number, and I don't see a way to suppress it.

Is there a way to perform this string-to-float conversion without looping through each column and trying .astype("float", errors="ignore") on each one?

I think you need the parameter errors="ignore" in to_numeric:

df = df.apply(pd.to_numeric, errors="ignore")
print(df.dtypes)
A     object
B      int64
C    float64
dtype: object
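Note that the errors="ignore" option of pd.to_numeric has been deprecated since pandas 2.2 in favour of catching the exception yourself. A minimal sketch of an equivalent that avoids the deprecation warning, assuming a helper named try_numeric (a hypothetical name, not part of pandas) that returns the column unchanged when parsing fails:

```python
import numpy as np
import pandas as pd


def try_numeric(col: pd.Series) -> pd.Series:
    """Hypothetical helper: parse col with pd.to_numeric, or return it unchanged."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        # At least one value (e.g. "a") cannot be parsed as a number.
        return col


df_list = [["a", "1", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns=list("ABC"))
# DataFrame.apply calls try_numeric once per column (axis=0 by default).
df = df.apply(try_numeric)
print(df.dtypes)
```

For the example frame this should give the same dtypes as the errors="ignore" version. DataFrame.apply still iterates over the columns in Python internally, so neither version avoids a per-column loop; it only hides it.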

This works well as long as a column does not mix numeric values with strings:

df_list = [["a", "t", "2"], ["b", "3", np.nan]]  # "t" added to column B to mix values
df = pd.DataFrame(df_list, columns=list("ABC"))
df = df.apply(pd.to_numeric, errors="ignore")
print(df)
   A  B    C
0  a  t  2.0
1  b  3  NaN
print(df.dtypes)
A     object
B     object
C    float64
dtype: object
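When a column stays object, like B here, it can help to see which entries block the conversion. A minimal sketch, assuming a hypothetical helper unparseable_values (not a pandas function); it relies on errors="coerce", which turns unparseable entries into NaN, and then keeps only the entries that were not already missing:

```python
import numpy as np
import pandas as pd


def unparseable_values(col: pd.Series) -> pd.Series:
    """Hypothetical helper: non-missing entries of col that pd.to_numeric cannot parse."""
    parsed = pd.to_numeric(col, errors="coerce")  # unparseable -> NaN
    # NaN after parsing, but not NaN before, means the value itself was the problem.
    return col[parsed.isna() & col.notna()]


df_list = [["a", "t", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns=list("ABC"))
bad = {name: unparseable_values(df[name]).tolist() for name in df.columns}
print(bad)  # {'A': ['a', 'b'], 'B': ['t'], 'C': []}
```

A column with an empty list is fully convertible; the genuine NaN in C is not reported because it was missing before parsing.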

EDIT:

For the original df (without the "t"), you can also downcast ints to floats:

df = df.apply(pd.to_numeric, errors="ignore", downcast="float")
print(df.dtypes)
A     object
B    float32
C    float32
dtype: object

This is the same as:

df = df.apply(lambda x: pd.to_numeric(x, errors="ignore", downcast="float"))
print(df.dtypes)
A     object
B    float32
C    float32
dtype: object
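Because errors="ignore" is deprecated in recent pandas (2.2 and later), the downcast variant can also be written with a try/except. A sketch, assuming a hypothetical helper try_numeric that forwards extra keyword arguments to pd.to_numeric; DataFrame.apply passes keyword arguments such as downcast on to the function it calls:

```python
import numpy as np
import pandas as pd


def try_numeric(col: pd.Series, **kwargs) -> pd.Series:
    """Hypothetical helper: pd.to_numeric(col, **kwargs), or col unchanged on failure."""
    try:
        return pd.to_numeric(col, **kwargs)
    except (ValueError, TypeError):
        # Column holds values that cannot be parsed; leave it as is.
        return col


df_list = [["a", "1", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns=list("ABC"))
# Extra keyword arguments to DataFrame.apply are forwarded to try_numeric.
df = df.apply(try_numeric, downcast="float")
print(df.dtypes)
```

pandas only downcasts to float32 when the values survive the conversion within a small tolerance, so very large integers may stay in a wider dtype.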