Categories
c++ split string

How do I iterate over the words of a string?

3270

How do I iterate over the words of a string composed of words separated by whitespace?

Note that I’m not interested in C string functions or that kind of character manipulation/access. I prefer elegance over efficiency. My current solution:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

8

  • 676

    Dude… Elegance is just a fancy way to say “efficiency-that-looks-pretty” in my book. Don’t shy away from using C functions and quick methods to accomplish anything just because it is not contained within a template 😉

    – user19302

    Oct 25, 2008 at 9:04

  • 17

    while (iss) { string subs; iss >> subs; cout << "Substring: " << subs << endl; }

    – pyon

    Sep 29, 2009 at 15:47

  • 26

    @Eduardo: that’s wrong too… you need to test iss between trying to stream another value and using that value, i.e. string sub; while (iss >> sub) cout << "Substring: " << sub << '\n';

    Apr 11, 2012 at 2:24

  • 13

    Various options in C++ to do this by default: cplusplus.com/faq/sequences/strings/split

    – hB0

    Oct 31, 2013 at 0:23

  • 24

    There’s more to elegance than just pretty efficiency. Elegant attributes include low line count and high legibility. IMHO, elegance is a proxy not for efficiency but for maintainability.

    – Matt

    Mar 31, 2017 at 13:22
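The fix described in the comments above (test the stream between extracting a value and using it) can be sketched as follows. This is only a minimal illustration; the helper name `split_words` is an assumption made for this example and does not appear in the original post.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for this sketch): collects the
// whitespace-separated words of `text` into a vector.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream iss(text);
    std::vector<std::string> words;
    std::string word;
    // The extraction itself is the loop condition, so the failed read at
    // the end of the stream ends the loop before `word` is used again.
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}
```

Calling `split_words("Somewhere down the road")` should give four words, without the trailing empty "Substring:" line that the do/while version in the question prints.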

1490

For what it’s worth, here’s another way to extract tokens from an input string, relying only on standard library facilities. It’s an example of the power and elegance behind the design of the STL.

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));

… or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};

28

  • 176

    Is it possible to specify a delimiter for this? Like for instance splitting on commas?

    – l3dx

    Aug 6, 2009 at 11:49

  • 17

    @Jonathan: \n is not the delimiter in this case, it’s the delimiter for outputting to cout.

    – huy

    Feb 3, 2010 at 12:37

  • 795

    This is a poor solution as it doesn’t take any other delimiter, and is therefore not scalable and not maintainable.

    Jan 10, 2011 at 3:57

  • 41

    Actually, this can work just fine with other delimiters (though doing so is somewhat ugly). You create a ctype facet that classifies the desired delimiters as whitespace, create a locale containing that facet, then imbue the stringstream with that locale before extracting strings.

    Dec 19, 2012 at 20:30

  • 63

    @Kinderchocolate “The string can be assumed to be composed of words separated by whitespace” – Hmm, doesn’t sound like a poor solution to the question’s problem. “not scalable and not maintainable” – Hah, nice one.

    Feb 7, 2013 at 15:08
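The ctype-facet approach mentioned in the comments above might look roughly like this. It is a sketch under stated assumptions, not a definitive implementation: the names `comma_is_space` and `split_commas_and_spaces` are made up for this example, and the facet keeps the classic whitespace characters as delimiters in addition to the comma.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet (name chosen for this sketch): classifies ',' as
// whitespace, on top of the classic "C" locale whitespace characters.
struct comma_is_space : std::ctype<char> {
    comma_is_space() : std::ctype<char>(make_table()) {}

    static const mask* make_table() {
        // Copy the classic classification table, then mark ',' as space.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[static_cast<unsigned char>(',')] |= space;
        return table.data();
    }
};

// Hypothetical helper: splits `text` at commas and ordinary whitespace.
std::vector<std::string> split_commas_and_spaces(const std::string& text) {
    std::istringstream iss(text);
    // The locale takes ownership of the facet and deletes it when done.
    iss.imbue(std::locale(iss.getloc(), new comma_is_space));
    std::vector<std::string> tokens{std::istream_iterator<std::string>{iss},
                                    std::istream_iterator<std::string>{}};
    return tokens;
}
```

Because formatted string extraction skips whatever the stream's locale classifies as whitespace, `split_commas_and_spaces("a,b, c")` should produce the three tokens a, b and c. To split on commas only, the table would have to start from all zeros instead of the classic table.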

867

A possible solution using Boost might be:

#include <boost/algorithm/string.hpp>
#include <string>
#include <vector>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since boost::split is a generic function template, it can be used to split other types of strings (wide-character strings, UTF-8, etc.) using all kinds of delimiters.

See the documentation for details.

18

  • 36

    Speed is irrelevant here, as both of these cases are much slower than a strtok-like function.

    – Tom

    Mar 1, 2009 at 16:51

  • 53

    And for those who don’t already have boost… bcp copies over 1,000 files for this 🙂

    Jun 9, 2010 at 20:12

  • 13

    Warning: when given an empty string (“”), this method returns a vector containing the “” string. So add an “if (!string_to_split.empty())” check before the split.

    – Offirmo

    Oct 11, 2011 at 13:10

  • 29

    @Ian Embedded developers aren’t all using boost.

    Jan 31, 2012 at 18:23

  • 33

    As an addendum: I use Boost only when I must; normally I prefer to add to my own library of code, which is standalone and portable, so that I can achieve small, precise, specific code that accomplishes a given aim. That way the code is non-public, performant, trivial and portable. Boost has its place, but I would suggest that it’s a bit of overkill for tokenising strings: you wouldn’t have your whole house transported to an engineering firm to get a new nail hammered into the wall to hang a picture… They may do it extremely well, but the pros are by far outweighed by the cons.

    – GMasucci

    May 22, 2013 at 8:19
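For readers who would rather not pull in Boost, here is a standalone sketch of the same idea using only std::string::find_first_of. It is not Boost's implementation; `split_any` is a hypothetical name, and the helper deliberately keeps empty tokens so that it mirrors the behavior described in the comment above (an empty input yields a vector holding one empty string).

```cpp
#include <string>
#include <vector>

// Hypothetical standalone helper (not part of Boost): splits `text` at
// every character that appears in `delims`. Empty tokens are kept, so an
// empty input produces a vector containing a single empty string.
std::vector<std::string> split_any(const std::string& text,
                                   const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = text.find_first_of(delims, start);
        if (end == std::string::npos) {
            // No further delimiter: the rest of the string is the last token.
            tokens.push_back(text.substr(start));
            break;
        }
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;  // continue just past the delimiter
    }
    return tokens;
}
```

`split_any("string to split", "\t ")` should return three tokens. As with the Boost call, guarding with `if (!text.empty())` before splitting avoids the single empty token for empty input.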