Categories
aggregate dataframe group-by pandas python

How do I use Pandas group-by to get the sum?

351

I am using this data frame:

Fruit   Date      Name  Number
Apples  10/6/2016 Bob    7
Apples  10/6/2016 Bob    8
Apples  10/6/2016 Mike   9
Apples  10/7/2016 Steve 10
Apples  10/7/2016 Bob    1
Oranges 10/7/2016 Bob    2
Oranges 10/6/2016 Tom   15
Oranges 10/6/2016 Mike  57
Oranges 10/6/2016 Bob   65
Oranges 10/7/2016 Tony   1
Grapes  10/7/2016 Bob    1
Grapes  10/7/2016 Tom   87
Grapes  10/7/2016 Bob   22
Grapes  10/7/2016 Bob   12
Grapes  10/7/2016 Tony  15

I want to aggregate this by Name and then by Fruit to get a total number of Fruit per Name. For example:

Bob,Apples,16

I tried grouping by Name and Fruit but how do I get the total number of Fruit?

397

Use GroupBy.sum:

df.groupby(['Fruit','Name']).sum()

Out[31]: 
               Number
Fruit   Name         
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1
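For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the question's data and runs the same group-by. Passing numeric_only=True is an assumption added here, not part of the answer above: depending on the pandas version, the string column Date may otherwise be concatenated or dropped, so stating it explicitly keeps the result to Number only.

```python
import pandas as pd

# Rows copied from the table in the question.
rows = [
    ("Apples", "10/6/2016", "Bob", 7),
    ("Apples", "10/6/2016", "Bob", 8),
    ("Apples", "10/6/2016", "Mike", 9),
    ("Apples", "10/7/2016", "Steve", 10),
    ("Apples", "10/7/2016", "Bob", 1),
    ("Oranges", "10/7/2016", "Bob", 2),
    ("Oranges", "10/6/2016", "Tom", 15),
    ("Oranges", "10/6/2016", "Mike", 57),
    ("Oranges", "10/6/2016", "Bob", 65),
    ("Oranges", "10/7/2016", "Tony", 1),
    ("Grapes", "10/7/2016", "Bob", 1),
    ("Grapes", "10/7/2016", "Tom", 87),
    ("Grapes", "10/7/2016", "Bob", 22),
    ("Grapes", "10/7/2016", "Bob", 12),
    ("Grapes", "10/7/2016", "Tony", 15),
]
df = pd.DataFrame(rows, columns=["Fruit", "Date", "Name", "Number"])

# numeric_only=True (an explicit choice for this sketch) restricts the sum
# to numeric columns, so the string column Date is left out of the result.
totals = df.groupby(["Fruit", "Name"]).sum(numeric_only=True)
print(totals)

# The group keys form a MultiIndex, so one total is looked up with a tuple.
bob_apples = totals.loc[("Apples", "Bob"), "Number"]
print(bob_apples)  # 16
```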

  • 158

    How does pandas know that I want to sum the column named Number?

    – Kingname

    Oct 23, 2017 at 12:32

  • 25

    @Kingname It's the only numeric column left once you take out Name and Fruit. If two numeric columns were left, it would sum both.

    – Steven G

    Oct 23, 2017 at 16:51

  • 36

    How do I specify which column to sum?

    – tgdn

    Nov 5, 2019 at 14:38

  • 129

    @tgdn df.groupby(['Name', 'Fruit'])['Number'].sum()

    – Steven G

    Nov 8, 2019 at 17:34

  • 8

    @StevenG Summing a specific column this way returns a pandas Series instead of a DataFrame. As Jakub Kukul notes in a comment on the answer below, putting double square brackets around 'Number' returns a DataFrame.

    Jan 16, 2020 at 10:41
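The Series versus DataFrame point raised in these comments can be checked with a short sketch. It uses only Bob's Apples and Oranges rows from the question; the variable names are illustrative.

```python
import pandas as pd

# Bob's Apples and Oranges rows from the question.
df = pd.DataFrame({
    "Fruit": ["Apples", "Apples", "Apples", "Oranges", "Oranges"],
    "Date": ["10/6/2016", "10/6/2016", "10/7/2016", "10/7/2016", "10/6/2016"],
    "Name": ["Bob", "Bob", "Bob", "Bob", "Bob"],
    "Number": [7, 8, 1, 2, 65],
})

grouped = df.groupby(["Name", "Fruit"])

# Single brackets select one column: the result is a Series.
as_series = grouped["Number"].sum()

# Double brackets select a list of columns: the result is a DataFrame.
as_frame = grouped[["Number"]].sum()

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```

Either way, only Number is summed, so the Date column never enters the calculation.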


247

You can also use the agg function:

df.groupby(['Name', 'Fruit'])['Number'].agg('sum')
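As a rough illustration of why agg can be worth reaching for, the sketch below uses only Bob's and Mike's rows from the question. It shows that agg('sum') gives the same totals as sum(), and that agg also accepts a list of functions or named aggregations. The output column name total is a hypothetical label chosen for this example.

```python
import pandas as pd

# Bob's and Mike's rows from the question (Date omitted for brevity).
df = pd.DataFrame({
    "Fruit": ["Apples", "Apples", "Apples", "Apples", "Oranges",
              "Oranges", "Oranges", "Grapes", "Grapes", "Grapes"],
    "Name": ["Bob", "Bob", "Mike", "Bob", "Bob",
             "Mike", "Bob", "Bob", "Bob", "Bob"],
    "Number": [7, 8, 9, 1, 2, 57, 65, 1, 22, 12],
})

numbers = df.groupby(["Name", "Fruit"])["Number"]

# agg('sum') and sum() produce the same Series of totals.
same_totals = numbers.agg("sum").equals(numbers.sum())

# A list of functions yields a DataFrame with one column per function.
summary = numbers.agg(["sum", "count"])

# Named aggregation: output column "total" (hypothetical name) = sum of Number.
named = df.groupby(["Name", "Fruit"]).agg(total=("Number", "sum"))

print(same_totals)
print(summary)
print(named)
```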

  • 3

    This differs from the accepted answer in that this returns a Series, whereas the other returns a DataFrame.

    May 8, 2019 at 15:53

  • 45

    @GaurangTandon To get a DataFrame object instead (as in the accepted answer), use double square brackets around 'Number', i.e.: df.groupby(['Name', 'Fruit'])[['Number']].agg('sum')

    Aug 21, 2019 at 17:05


  • 1

    Very helpful in cleaning up a badly encoded query report.

    Oct 9, 2019 at 20:39

182

If you want to keep the original columns Fruit and Name, use reset_index(). Otherwise Fruit and Name will become part of the index.

df.groupby(['Fruit','Name'])['Number'].sum().reset_index()

Fruit   Name       Number
Apples  Bob        16
Apples  Mike        9
Apples  Steve      10
Grapes  Bob        35
Grapes  Tom        87
Grapes  Tony       15
Oranges Bob        67
Oranges Mike       57
Oranges Tom        15
Oranges Tony        1
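A possible alternative to reset_index(), sketched here on a few rows from the question, is to pass as_index=False to groupby so the keys stay as ordinary columns from the start. The variable names are illustrative.

```python
import pandas as pd

# A few rows from the question (Date omitted for brevity).
df = pd.DataFrame({
    "Fruit": ["Apples", "Apples", "Apples", "Apples", "Oranges", "Oranges"],
    "Name": ["Bob", "Bob", "Mike", "Bob", "Bob", "Bob"],
    "Number": [7, 8, 9, 1, 2, 65],
})

# Option 1: aggregate, then move the group keys out of the index.
via_reset = df.groupby(["Fruit", "Name"])["Number"].sum().reset_index()

# Option 2: ask groupby not to put the keys in the index at all.
flat = df.groupby(["Fruit", "Name"], as_index=False)["Number"].sum()

print(list(flat.columns))  # ['Fruit', 'Name', 'Number']
print(via_reset)
print(flat)
```

Both results are flat DataFrames with one row per (Fruit, Name) pair, which is usually the easier shape for merging or exporting.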

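Without reset_index(), as seen in the other answers, Fruit and Name form the index (double brackets around 'Number' keep the result a DataFrame):

df.groupby(['Fruit','Name'])[['Number']].sum()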

               Number
Fruit   Name         
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1