Categories
array-broadcasting pandas python

When does Pandas default to broadcasting Series and Dataframes?

I came across something curious (to me) while trying to answer this question.

Say I want to compare a series of shape (10,) to a df of shape (10,10):

np.random.seed(0)
my_ser = pd.Series(np.random.randint(0, 100, size=10))
my_df = pd.DataFrame(np.random.randint(0, 100, size=100).reshape(10,10))
my_ser > 10 * my_df

yields, as expected, a matrix of the shape of the df (10,10). The comparison seems to be row-wise.

However consider this case:

df = pd.DataFrame({'cell1':[0.006209, 0.344955, 0.004521, 0, 0.018931, 0.439725, 0.013195, 0.009045, 0, 0.02614, 0],
'cell2':[0.048043, 0.001077, 0,0.010393, 0.031546, 0.287264, 0.016732, 0.030291, 0.016236, 0.310639,0],
'cell3':[0,0,0.020238, 0, 0.03811, 0.579348, 0.005906, 0,0,0.068352, 0.030165],
'cell4':[0.016139, 0.009359, 0,0,0.025449, 0.47779, 0, 0.01282, 0.005107, 0.004846, 0],
'cell5': [0,0,0,0.012075, 0.031668, 0.520258, 0,0,0,2.728218, 0.013418]})
i = 0
df.iloc[:,i].shape
>(11,)
(10 * df.drop(df.columns[i], axis=1)).shape
>(11,4)
(df.iloc[:,i] > (10 * df.drop(df.columns[i], axis=1))).shape
>(11,15)

As far as I can tell, here Pandas broadcasts the Series with the df. Why is this?

The desired behaviour can be gotten with:

(10 * df.drop(df.columns[i], axis=1)).lt(df.iloc[:,i], axis=0).shape
>(11,4)
pd.__version__
'0.24.0'