Categories
greatest-n-per-group group-by mysql ranking sql

Using LIMIT within GROUP BY to get N results per group?

485

The following query:

SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC

yields:

year    id  rate
2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2009    p01 4.4
2002    p01 3.9
2004    p01 3.5
2005    p01 2.1
2000    p01 0.8
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7
2006    p02 4.6
2007    p02 3.3

What I’d like is only the top 5 results for each id:

2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7

Is there a way to do this using some kind of LIMIT like modifier that works within the GROUP BY?

3

  • 13

    This can be done in MySQL, but it is not as simple as adding a LIMIT clause. Here is an article that explains the problem in detail: How to select the first/least/max row per group in SQL It’s a good article – he introduces an elegant but naïve solution to the “Top N per group” problem, and then gradually improves on it.

    – danben

    Jan 25, 2010 at 1:31


  • SELECT * FROM (SELECT year, id, rate FROM h WHERE year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2) GROUP BY id, year ORDER BY id, rate DESC) LIMIT 5

    – Mixcoatl

    Feb 3, 2016 at 20:05

  • This problem has been resolved by introducing sql windowing functions as it is explained in this answer. stackoverflow.com/a/38854846/2723942

    Feb 1 at 14:52


147

You want to find top n rows per group. This answer provides a generic solution using example data that is different from OP.

In MySQL 8 or later you can use the ROW_NUMBER, RANK or DENSE_RANK function depending on the exact definition of top 5. Below are the numbers generated by these functions based on value sorted descending. Notice how ties are handled:

pkidcatidvaluerow_numberrankdense_rank
1p01100*1*1*1
2p0190*2*2*2
3p0190*3*2*2
4p0180*4*4*3
5p0180*5*4*3
6p01806*4*3
7p017077*4
8p016088*5
9p0150996
10p014010107

Once you have chosen the function, use it like so:

SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY value DESC) AS n
    FROM t
) AS x
WHERE n <= 5

DB<>Fiddle


In MySQL 5.x you can use poor man’s rank over partition to achieve desired result: outer join the table with itself and for each row, count the number of rows before it (e.g. the before row could be the one with higher value).

The following will produce results similar to RANK function:

SELECT t.pkid, t.catid, t.value, COUNT(b.value) + 1 AS rank
FROM t
LEFT JOIN t AS b ON b.catid = t.catid AND b.value > t.value
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(b.value) + 1 <= 5
ORDER BY t.catid, t.value DESC, t.pkid

Make the following change to produce results similar to DENSE_RANK function:

COUNT(DISTINCT b.value)

Or make the following change to produce results similar to ROW_NUMBER function:

ON b.catid = t.catid AND (b.value > t.value OR b.value = t.value AND b.pkid < t.pkid)

DB<>Fiddle

2

  • 1

    +1 your answer rewrite is very valid, as modern MySQL/MariaDB versions follow the ANSI/ISO SQL 1992/1999/2003 standards more where it was never really was allowed to use ORDER BY in deliverd/subqueries like that.. That is the reason why modern MySQL/MariaDB versions ignore the ORDER BY in subquery without using LIMIT, i believe ANSI/ISO SQL Standards 2008/2011/2016 makes ORDER BY in deliverd/subqueries legal when using it in combination with FETCH FIRST n ROWS ONLY

    May 4, 2019 at 23:05


  • Great, this works perfectly … I came across another solution (stackoverflow.com/a/48593547) which uses a correlated subquery, that one also works and yields the same results, however I think your solution (with a join) runs a lot faster.

    – leo

    Mar 14, 2020 at 15:05

23

For me something like

SUBSTRING_INDEX(group_concat(col_name order by desired_col_order_name), ',', N) 

works perfectly. No complicated query.


for example: get top 1 for each group

SELECT 
    *
FROM
    yourtable
WHERE
    id IN (SELECT 
            SUBSTRING_INDEX(GROUP_CONCAT(id
                            ORDER BY rate DESC),
                        ',',
                        1) id
        FROM
            yourtable
        GROUP BY year)
ORDER BY rate DESC;

1

  • Your solution worked perfectly, but I also want to retrieve year and other columns from the subquery, How can we do that?

    – MaNn

    Jul 17, 2019 at 13:57