[Java] Splitting a comma-separated string but ignoring commas in quotes
Mar 24, 2018

Shenghui (Samuel) Gu
Sometimes we need to parse strings like this:
"1234567890","James",man,"New York, NY, USA"
And the output we need is as follows:
"1234567890"
"James"
man
"New York, NY, USA"
We can try the following code (it needs import java.util.Arrays):
String line = "\"1234567890\",\"James\",man,\"New York, NY, USA\"";
String[] tokens = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
Arrays.stream(tokens).forEach(System.out::println);
In other words: split on a comma only if it is followed by zero or an even number of double quotes, which means the comma lies outside any quoted field. This assumes the quotes in the line are balanced.
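To make the parity rule concrete, here is a minimal sketch that applies the same check by hand instead of with a regex. The class and method names (QuoteParityDemo, quotesAfter, splitPositions) are made up for illustration, and the sketch assumes balanced quotes with no escaped quotes inside a field.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteParityDemo {

    // Counts the double quotes that appear after position `index` in `line`.
    public static int quotesAfter(String line, int index) {
        int count = 0;
        for (int i = index + 1; i < line.length(); i++) {
            if (line.charAt(i) == '"') {
                count++;
            }
        }
        return count;
    }

    // Returns the indexes of the commas followed by an even number of quotes,
    // i.e. the commas the lookahead regex would split on.
    public static List<Integer> splitPositions(String line) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == ',' && quotesAfter(line, i) % 2 == 0) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String line = "\"1234567890\",\"James\",man,\"New York, NY, USA\"";
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == ',') {
                int quotes = quotesAfter(line, i);
                System.out.printf("comma at %d: %d quotes ahead -> %s%n",
                        i, quotes, quotes % 2 == 0 ? "split" : "keep");
            }
        }
    }
}
```

For the sample line, the two commas inside "New York, NY, USA" each have exactly one quote ahead of them, so they are kept; the other three commas have four or two quotes ahead and become split points.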
A bit friendlier for the eyes:
String line = "\"1234567890\",\"James\",man,\"New York, NY, USA\"";
String otherThanQuote = " [^\"] ";
String quotedString = String.format(" \" %s* \" ", otherThanQuote);
String regex = String.format("(?x) "+ // enable comments, ignore white spaces
        ", "+ // match a comma
        "(?= "+ // start positive look ahead
        " (?: "+ // start non-capturing group 1
        " %s* "+ // match 'otherThanQuote' zero or more times
        " %s "+ // match 'quotedString'
        " )* "+ // end group 1 and repeat it zero or more times
        " %s* "+ // match 'otherThanQuote'
        " $ "+ // match the end of the string
        ") ", // stop positive look ahead
        otherThanQuote, quotedString, otherThanQuote);
String[] tokens = line.split(regex, -1);
Arrays.stream(tokens).forEach(System.out::println);
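If many lines are split with the same expression, it may be convenient to compile it once and wrap it in a helper. The following is only a sketch: CsvLineSplitter, COMMA_OUTSIDE_QUOTES and splitLine are hypothetical names, and the same assumptions apply (balanced quotes, no escaped quotes inside fields).

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CsvLineSplitter {

    // The compact expression from earlier in the post, compiled once for reuse.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper: splits one line on commas outside quotes.
    // The limit -1 keeps empty fields, including trailing ones.
    public static String[] splitLine(String line) {
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        String sample = "\"1234567890\",\"James\",man,\"New York, NY, USA\"";
        System.out.println(Arrays.toString(splitLine(sample)));

        // A line with an empty middle field and an empty trailing field.
        System.out.println(Arrays.toString(splitLine("a,,\"b,c\",")));
    }
}
```

The second call is there to show why the limit matters: with -1 the empty field after the final comma is kept, so the number of tokens still matches the number of columns.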
About the split(String regex, int limit) method
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.
- If the limit n is greater than zero, then the pattern will be applied at most n - 1 times, the array’s length will be no greater than n, and the array’s last entry will contain all input beyond the last matched delimiter.
- If n is negative, then the pattern will be applied as many times as possible and the array can have any length.
- If n is zero, then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
The string "boo:and:foo", for example, yields the following results with these parameters:
Regex | Limit | Result |
---|---|---|
: | 2 | { "boo", "and:foo" } |
: | 5 | { "boo", "and", "foo" } |
: | -2 | { "boo", "and", "foo" } |
o | 5 | { "b", "", ":and:f", "", "" } |
o | -2 | { "b", "", ":and:f", "", "" } |
o | 0 | { "b", "", ":and:f" } |
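The rows of the table can be reproduced with a few lines of code. This is just a small sketch; SplitLimitDemo and splitWithLimit are illustrative names, and the wrapper does nothing beyond calling String.split.

```java
import java.util.Arrays;

public class SplitLimitDemo {

    // Thin wrapper around String.split so each table row can be reproduced.
    public static String[] splitWithLimit(String input, String regex, int limit) {
        return input.split(regex, limit);
    }

    public static void main(String[] args) {
        String input = "boo:and:foo";
        String[] regexes = {":", ":", ":", "o", "o", "o"};
        int[] limits = {2, 5, -2, 5, -2, 0};

        for (int i = 0; i < regexes.length; i++) {
            String[] parts = splitWithLimit(input, regexes[i], limits[i]);
            System.out.printf("regex=%s limit=%d -> %d parts: %s%n",
                    regexes[i], limits[i], parts.length, Arrays.toString(parts));
        }
    }
}
```

Printing the array length next to the contents makes the trailing empty strings visible, since they are easy to miss in the output of Arrays.toString.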