Categories

# Pandas DataFrame.unstack() Changes Order of Row and Column Headers

I have run into the following problem of sorting the row and column headers.

Here is how to reproduce this:

``X =pd.DataFrame(dict(x=np.random.normal(size=100), y=np.random.normal(size=100))) A=pd.qcut(X['x'], [0,0.25,0.5,0.75,1.0]) #create a factor B=pd.qcut(X['y'], [0,0.25,0.5,0.75,1.0]) # create another factor g = X.groupby([A,B])['x'].mean() #do a two-way bucketing print g  #this gives the following and so far so good x                 y                [-2.315, -0.843]  [-2.58, -0.567]    -1.041167                   (-0.567, 0.0321]   -1.722926                   (0.0321, 0.724]    -1.245856                   (0.724, 3.478]     -1.240876 (-0.843, -0.228]  [-2.58, -0.567]    -0.576264                   (-0.567, 0.0321]   -0.501709                   (0.0321, 0.724]    -0.522697                   (0.724, 3.478]     -0.506259 (-0.228, 0.382]   [-2.58, -0.567]     0.175768                   (-0.567, 0.0321]    0.214353                   (0.0321, 0.724]     0.113650                   (0.724, 3.478]     -0.013758 (0.382, 2.662]    [-2.58, -0.567]     0.983807                   (-0.567, 0.0321]    1.214640                   (0.0321, 0.724]     0.808608                   (0.724, 3.478]      1.515334 Name: x, dtype: float64 #Now let's make a two way table and here is the problem: HTML(g.unstack().to_html()) ``

This shows:

``y                 (-0.567, 0.0321]  (0.0321, 0.724]  (0.724, 3.478]  [-2.58, -0.567] x                                                                                    (-0.228, 0.382]           0.214353         0.113650       -0.013758         0.175768 (-0.843, -0.228]         -0.501709        -0.522697       -0.506259        -0.576264 (0.382, 2.662]            1.214640         0.808608        1.515334         0.983807 [-2.315, -0.843]         -1.722926        -1.245856       -1.240876        -1.041167 ``

Note how the headers are no longer sorted. I am wondering what is a good way to solve this problem so as to making interactive work easy.

To further track down where the problem is, run the following:

``g.unstack().columns ``

It gives me this:
Index([(-0.567, 0.0321], (0.0321, 0.724], (0.724, 3.478], [-2.58, -0.567]], dtype=object)

Now compare this with B.levels:

``B.levels Index([[-2.58, -0.567], (-0.567, 0.0321], (0.0321, 0.724], (0.724, 3.478]], dtype=object) ``

Obviously the order originally in Factor is lost.

Now to make the matter even worse, let’s do a multi-level cross table:

``g2 = X.groupby([A,B]).agg('mean') g3 = g2.stack().unstack(-2) HTML(g3.to_html()) ``

It shows something like:

``y                   (-0.567, 0.0321]  (0.0321, 0.724]  (0.724, 3.478]   x                                                                        (-0.228, 0.382]  x          0.214353         0.113650       -0.013758                     y         -0.293465         0.321836        1.180369    (-0.843, -0.228] x         -0.501709        -0.522697       -0.506259                     y         -0.204811         0.324571        1.167005    (0.382, 2.662]   x          1.214640         0.808608        1.515334                     y         -0.195446         0.161198        1.074532    [-2.315, -0.843] x         -1.722926        -1.245856       -1.240876                     y         -0.392896         0.335471        1.730513   ``

With both the row and column labels sorted incorrectly.

Thanks.