Categories
delimiter postgresql postgresql-8.4 split sql

Split comma separated column data into additional columns

86

I have comma separated data in a column:

Column 
------- 
a,b,c,d 

I want to split the comma separated data into multiple columns to get this output:

Column1  Column2 Column3 Column4 
-------  ------- ------- -------
a        b       c       d 

How can this be achieved?

    94

    If the number of fields in the CSV is constant then you could do something like this:

    select a[1], a[2], a[3], a[4]
    from (
        select regexp_split_to_array('a,b,c,d', ',')
    ) as dt(a)
    

    For example:

    => select a[1], a[2], a[3], a[4] from (select regexp_split_to_array('a,b,c,d', ',')) as dt(a);
     a | a | a | a 
    ---+---+---+---
     a | b | c | d
    (1 row)
    

    If the number of fields in the CSV is not constant then you could get the maximum number of fields with something like this:

    select max(array_length(regexp_split_to_array(csv, ','), 1))
    from your_table
    

    and then build the appropriate a[1], a[2], ..., a[M] column list for your query. So if the above gave you a max of 6, you’d use this:

    select a[1], a[2], a[3], a[4], a[5], a[6]
    from (
        select regexp_split_to_array(csv, ',')
        from your_table
    ) as dt(a)
    

    You could combine those two queries into a function if you wanted.

    For example, give this data (that’s a NULL in the last row):

    => select * from csvs;
         csv     
    -------------
     1,2,3
     1,2,3,4
     1,2,3,4,5,6
    
    (4 rows)
    
    => select max(array_length(regexp_split_to_array(csv, ','), 1)) from csvs;
     max 
    -----
       6
    (1 row)
    
    => select a[1], a[2], a[3], a[4], a[5], a[6] from (select regexp_split_to_array(csv, ',') from csvs) as dt(a);
     a | a | a | a | a | a 
    ---+---+---+---+---+---
     1 | 2 | 3 |   |   | 
     1 | 2 | 3 | 4 |   | 
     1 | 2 | 3 | 4 | 5 | 6
       |   |   |   |   | 
    (4 rows)
    

    Since your delimiter is a simple fixed string, you could also use string_to_array instead of regexp_split_to_array:

    select ...
    from (
        select string_to_array(csv, ',')
        from csvs
    ) as dt(a);
    

    Thanks to Michael for the reminder about this function.

    You really should redesign your database schema to avoid the CSV column if at all possible. You should be using an array column or a separate table instead.

    7

    • 15

      Consider using string_to_array instead of regexp_split_to_array; it should be faster since it doesn’t have the overhead of regular expression processing.

      – Michael

      Sep 15, 2014 at 6:46

    • 1

      @Michael You could add that as another answer if you’d like. Or I could add string_to_array as an option in mine, not sure how I missed that.

      Sep 16, 2014 at 4:53

    • 2

      @DennisBauszus: Nice. Did you check split_part too? Just curious.

      Mar 15, 2016 at 19:52

    • 1

      Smashing. Faster by factor 3 over string_to_array. Should be marked as the answer. Note to myself: Must read all the answers.

      Mar 16, 2016 at 14:13

    • 1

      @DennisBauszus Wouldn’t argue with that. It is usually a good idea to look for what Erwin Brandsetter or Craig Ringer have to say on PostgreSQL questions, they really know their stuff and their answers tend to be quite thorough.

      Mar 16, 2016 at 18:19

    94

    If the number of fields in the CSV is constant then you could do something like this:

    select a[1], a[2], a[3], a[4]
    from (
        select regexp_split_to_array('a,b,c,d', ',')
    ) as dt(a)
    

    For example:

    => select a[1], a[2], a[3], a[4] from (select regexp_split_to_array('a,b,c,d', ',')) as dt(a);
     a | a | a | a 
    ---+---+---+---
     a | b | c | d
    (1 row)
    

    If the number of fields in the CSV is not constant then you could get the maximum number of fields with something like this:

    select max(array_length(regexp_split_to_array(csv, ','), 1))
    from your_table
    

    and then build the appropriate a[1], a[2], ..., a[M] column list for your query. So if the above gave you a max of 6, you’d use this:

    select a[1], a[2], a[3], a[4], a[5], a[6]
    from (
        select regexp_split_to_array(csv, ',')
        from your_table
    ) as dt(a)
    

    You could combine those two queries into a function if you wanted.

    For example, give this data (that’s a NULL in the last row):

    => select * from csvs;
         csv     
    -------------
     1,2,3
     1,2,3,4
     1,2,3,4,5,6
    
    (4 rows)
    
    => select max(array_length(regexp_split_to_array(csv, ','), 1)) from csvs;
     max 
    -----
       6
    (1 row)
    
    => select a[1], a[2], a[3], a[4], a[5], a[6] from (select regexp_split_to_array(csv, ',') from csvs) as dt(a);
     a | a | a | a | a | a 
    ---+---+---+---+---+---
     1 | 2 | 3 |   |   | 
     1 | 2 | 3 | 4 |   | 
     1 | 2 | 3 | 4 | 5 | 6
       |   |   |   |   | 
    (4 rows)
    

    Since your delimiter is a simple fixed string, you could also use string_to_array instead of regexp_split_to_array:

    select ...
    from (
        select string_to_array(csv, ',')
        from csvs
    ) as dt(a);
    

    Thanks to Michael for the reminder about this function.

    You really should redesign your database schema to avoid the CSV column if at all possible. You should be using an array column or a separate table instead.

    7

    • 15

      Consider using string_to_array instead of regexp_split_to_array; it should be faster since it doesn’t have the overhead of regular expression processing.

      – Michael

      Sep 15, 2014 at 6:46

    • 1

      @Michael You could add that as another answer if you’d like. Or I could add string_to_array as an option in mine, not sure how I missed that.

      Sep 16, 2014 at 4:53

    • 2

      @DennisBauszus: Nice. Did you check split_part too? Just curious.

      Mar 15, 2016 at 19:52

    • 1

      Smashing. Faster by factor 3 over string_to_array. Should be marked as the answer. Note to myself: Must read all the answers.

      Mar 16, 2016 at 14:13

    • 1

      @DennisBauszus Wouldn’t argue with that. It is usually a good idea to look for what Erwin Brandsetter or Craig Ringer have to say on PostgreSQL questions, they really know their stuff and their answers tend to be quite thorough.

      Mar 16, 2016 at 18:19

    -1

    You can use split function.

        SELECT 
        (select top 1 item from dbo.Split(FullName,',') where id=1 ) Column1,
        (select top 1 item from dbo.Split(FullName,',') where id=2 ) Column2,
        (select top 1 item from dbo.Split(FullName,',') where id=3 ) Column3,
        (select top 1 item from dbo.Split(FullName,',') where id=4 ) Column4,
        FROM MyTbl
    

    1

    • 1

      The question refers to PostgreSQL – that’s SQL Server syntax?

      – Vérace

      Aug 30, 2021 at 9:48