Categories
categorical-data pandas python

How do I create a categorical column in a pandas DataFrame?

I am working my way through Wes McKinney’s Python for Data Analysis, and I’ve run into a strange problem that the book does not address.

In the code below, based on page 199 of the book, I create a DataFrame and then use pd.cut() to create cat_obj. According to the book, cat_obj is

“a special Categorical object. You can treat it like an array of
strings indicating the bin name; internally it contains a levels array
indicating the distinct category names along with a labeling for the
ages data in the labels attribute”

Awesome! However, if I use the exact same pd.cut() call (In [5] below) to create a new DataFrame column, df['cat'], that column does not appear to be treated as a special categorical variable, only as a regular pandas Series.

How, then, do I create a column in a dataframe that is treated as a categorical variable?

In [4]:
import pandas as pd
raw_data = {'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
                     'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'score': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns=['name', 'score'])
bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']
In [5]:
cat_obj = pd.cut(df['score'], bins, labels=group_names)
df['cat'] = pd.cut(df['score'], bins, labels=group_names)
In [7]:
type(cat_obj)
Out[7]:
pandas.core.categorical.Categorical
In [8]:
type(df['cat'])
Out[8]:
pandas.core.series.Series
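
One note on the check itself: type() on a DataFrame column always reports Series, whatever the column holds, so the column's dtype is probably the more telling thing to look at. Below is a minimal sketch of that check, assuming pandas 0.15 or later (where the category dtype exists). The shortened score list is illustrative rather than the full data above, and the astype('category') call is only there to show an explicit conversion.

```python
import pandas as pd

# Shortened, illustrative version of the score data
df = pd.DataFrame({'score': [25, 94, 57, 62, 70]})
bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']

df['cat'] = pd.cut(df['score'], bins, labels=group_names)

# type() describes the container: a column is always a Series
print(type(df['cat']))

# dtype describes how the values are stored
print(df['cat'].dtype)

# Explicit conversion, in case a column is not already categorical
df['cat'] = df['cat'].astype('category')

# The .cat accessor exposes the category names and the integer codes
print(list(df['cat'].cat.categories))
print(list(df['cat'].cat.codes))
```

If the dtype prints as category, the column is being stored as a categorical even though type() still says Series.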