
Check reference list in pandas column using numpy vectorization

I have a reference list:

ref = ['September', 'August', 'July', 'June', 'May', 'April', 'March']

and a DataFrame whose Month_List column holds lists of month names:

import pandas as pd

df = pd.DataFrame({'Month_List': [['July'], ['August'], ['July', 'June'], ['May', 'April', 'March']]})
df
            Month_List
0               [July]
1             [August]
2         [July, June]
3  [May, April, March]

For each row, I want to check which elements of the reference list are present and encode the result as a binary list in the order of ref (1 if the month is present, 0 otherwise).
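To pin down the required encoding, here is a sketch for a single row in plain Python, assuming the output follows the order of ref. encode_row is a hypothetical helper, not part of the original code. Building a set once per row and testing membership avoids constructing a pd.Series for every row, which is likely a large part of the cost of the apply version.

```python
ref = ['September', 'August', 'July', 'June', 'May', 'April', 'March']

def encode_row(months, ref):
    # hypothetical helper: 1 if the reference month appears in the row, else 0
    present = set(months)  # one set per row, O(1) membership tests
    return [int(m in present) for m in ref]

print(encode_row(['July', 'June'], ref))  # [0, 0, 1, 1, 0, 0, 0]
```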

I can achieve this using apply:

def convert_month_to_binary(ref, lst):
    # 1 where the reference month occurs in lst, else 0 (in the order of ref)
    s = pd.Series(ref)
    return s.isin(lst).astype(int).tolist()

df['Binary_Month_List'] = df['Month_List'].apply(lambda x: convert_month_to_binary(ref, x))
df
            Month_List      Binary_Month_List
0               [July]  [0, 0, 1, 0, 0, 0, 0]
1             [August]  [0, 1, 0, 0, 0, 0, 0]
2         [July, June]  [0, 0, 1, 1, 0, 0, 0]
3  [May, April, March]  [0, 0, 0, 0, 1, 1, 1]

However, apply is very slow on large datasets, so I am looking for an approach based on NumPy vectorization. How can I improve the performance?
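A minimal sketch of one vectorized approach, assuming every entry of Month_List is a list of strings; months that are not in ref, and empty lists, are simply ignored. It flattens the column once with Series.explode, maps each month to its column position in ref, and sets the matching cells of a zero matrix with NumPy fancy indexing. binary_matrix is a hypothetical helper name, not a pandas or NumPy function.

```python
import numpy as np
import pandas as pd

ref = ['September', 'August', 'July', 'June', 'May', 'April', 'March']
df = pd.DataFrame({'Month_List': [['July'], ['August'], ['July', 'June'], ['May', 'April', 'March']]})

def binary_matrix(month_lists, ref):
    """Return an (n_rows, len(ref)) int array with 1 where ref[j] occurs in row i."""
    col_of = {m: j for j, m in enumerate(ref)}        # month -> column position
    s = month_lists.reset_index(drop=True).explode()  # one entry per month; index = row position
    cols = s.map(col_of)                              # NaN for unknown months and empty lists
    valid = cols.notna().to_numpy()
    out = np.zeros((len(month_lists), len(ref)), dtype=int)
    # fancy indexing: set out[row, col] = 1 for every (row, month) pair found in ref
    out[s.index.to_numpy()[valid], cols.to_numpy()[valid].astype(int)] = 1
    return out

mat = binary_matrix(df['Month_List'], ref)
df['Binary_Month_List'] = mat.tolist()  # only needed if a list per row is really required
print(df)
```

Most of the work here happens in explode, map and a single NumPy assignment instead of one pd.Series per row. Keeping the 2-D array (mat) rather than converting back to lists is usually more convenient for follow-up computations.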

Extension:

I want to use NumPy vectorization because I now need to apply another function to this list: counting how many entries in each row are 1.

This is what I am trying, but it is also very slow, with timings similar to apply:

import numpy as np

def count_one(lst):
    # number of non-zero entries in the binary list
    index = [i for i, e in enumerate(lst) if e != 0]
    return len(index)

vfunc = np.vectorize(count_one)
df['Value'] = vfunc(df['Binary_Month_List'])
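NumPy's documentation notes that np.vectorize is provided for convenience rather than performance and is essentially a for loop, so timings similar to apply are expected. A sketch, assuming every list in Binary_Month_List has the same length (len(ref)): stack the lists into one 2-D array in a single pass, then count the non-zero entries per row with np.count_nonzero. If the binary matrix is already available as an array, its row sums give the same counts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Binary_Month_List': [
    [0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1],
]})

# equal-length lists -> one (n_rows, 7) int array
arr = np.array(df['Binary_Month_List'].tolist())
# count non-zero entries per row without a Python-level loop
df['Value'] = np.count_nonzero(arr, axis=1)
print(df['Value'].tolist())  # [1, 1, 2, 3]
```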