Regular expressions can be used to break a string into fields. The split function does this, and the join function glues the pieces back together.
The split function takes a regular expression and a string, and looks for all occurrences of the regular expression within that string. The parts of the string that don't match the regular expression are returned in sequence as a list of values. For example, here's something to parse colon-separated fields, such as in UNIX /etc/passwd files:
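$line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
@fields = split(/:/,$line); # split $line, using : as delimiter
# now @fields is ("merlyn","","118","10","Randal",
#                 "/home/merlyn","/usr/bin/perl")
Note how the empty second field became an empty string. If you don't want this, match runs of colons as a single delimiter:
@fields = split(/:+/, $line);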
This matches one or more adjacent colons together, so there is no empty second field.
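To make the difference between the two patterns concrete, here is a minimal sketch that splits the same line both ways and compares the results. The variable names @single and @run are chosen for illustration only.

```perl
use strict;
use warnings;

my $line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";

# A single colon as the delimiter keeps the empty password field.
my @single = split(/:/, $line);

# One or more colons as the delimiter treats "::" as one separator.
my @run = split(/:+/, $line);

print scalar(@single), " fields with /:/\n";
print scalar(@run),    " fields with /:+/\n";
print "second field: '$single[1]' versus '$run[1]'\n";
```

Keep in mind that collapsing the delimiters shifts every later field one position to the left, so with /:+/ the second element is the user ID rather than the password. Whether that is acceptable depends on how the fields are used afterward.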
One common string to split is the $_ variable, and that turns out to be the default:
$_ = "some string";
@words = split(/ /); # same as @words = split(/ /, $_);
For this split, consecutive spaces in the string to be split will cause null fields (empty strings) in the result. A better pattern would be / +/, or ideally /\s+/, which matches one or more whitespace characters together. In fact, this pattern is the default pattern, so if you're splitting the $_ variable on whitespace, you can use all the defaults and merely say:
@words = split; # same as @words = split(/\s+/, $_);
(Actually, the " " string is the default pattern, and this will cause leading whitespace to be ignored, but that's still close enough for this discussion.)
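The whitespace patterns discussed here behave differently when the string has leading or repeated spaces. The following sketch compares them on a hypothetical sample string; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "  alpha  beta gamma";   # hypothetical input with extra spaces

my @by_space = split(/ /,   $text); # every single space is a delimiter
my @by_ws    = split(/\s+/, $text); # a run of whitespace is one delimiter
my @by_str   = split(' ',   $text); # like /\s+/, but skips leading whitespace

# With no arguments, split works on $_ using the " " behavior.
$_ = $text;
my @implicit = split;

print scalar(@by_space), " fields from / /\n";
print scalar(@by_ws),    " fields from /\\s+/\n";
print scalar(@by_str),   " fields from ' '\n";
print scalar(@implicit), " fields from bare split\n";
```

The / / version yields empty strings for the leading spaces and for the doubled space, /\s+/ yields a single empty leading field, and the " " form yields only the three words.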
Empty trailing fields do not normally become part of the list. This is not generally a concern. A solution like this,
$line = "merlyn::118:10:Randal:/home/merlyn:";
($name,$password,$uid,$gid,$gcos,$home,$shell) =
    split(/:/,$line); # split $line, using : as delimiter
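The dropping of empty trailing fields can be observed directly by collecting the result in an array and counting its elements. This sketch also uses split's optional third argument (a limit), which the text above does not cover; a negative limit is a way to keep the trailing empty fields when they matter.

```perl
use strict;
use warnings;

my $line = "merlyn::118:10:Randal:/home/merlyn:";

# No limit: the empty field after the final colon is discarded.
my @trimmed = split(/:/, $line);

# Negative limit (not discussed in the text above): trailing
# empty fields are kept.
my @all = split(/:/, $line, -1);

print scalar(@trimmed), " fields without a limit\n";
print scalar(@all),     " fields with a limit of -1\n";
```

Note that the empty second field survives in both cases; only empty fields at the end of the list are removed.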
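The join function takes a list of values and glues them together, placing a glue string between each pair of list elements. It looks like this:
$bigstring = join($glue,@list);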
For example, to rebuild the password line, try something like:
$outline = join(":", @fields);
Note that the glue string is not a regular expression - just an ordinary string of zero or more characters.
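As a quick check of both points, the sketch below splits a password line, joins it again with the same delimiter, and then shows that characters with special meaning in a regular expression are plain text when used as glue. The names $dotted and $packed are illustrative.

```perl
use strict;
use warnings;

my $line   = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
my @fields = split(/:/, $line);

# Joining with the same delimiter reproduces the original line,
# because this line has no trailing empty fields for split to drop.
my $outline = join(":", @fields);
print "round trip ok\n" if $outline eq $line;

# The glue is a literal string: "." is a plain dot, not a wildcard.
my $dotted = join(".", @fields[2, 3]);

# An empty glue string simply concatenates the elements.
my $packed = join("", "a", "b", "c");

print "$dotted\n$packed\n";
```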
If you need to get glue ahead of every item instead of just between items, a simple cheat suffices:
$result = join ("+", "", @fields);
Here, the extra "" is treated as an empty element, to be glued together with the first data element of @fields. This results in glue ahead of every element. Similarly, you can get trailing glue with an empty element at the end of the list, like so:
$output = join ("\n", @data, "");
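Both tricks can be tried on a small list. This is a minimal sketch with made-up data, where @items stands in for @fields or @data.

```perl
use strict;
use warnings;

my @items = ("a", "b", "c");   # hypothetical sample data

# Empty element first: glue appears ahead of every item.
my $leading = join("+", "", @items);

# Empty element last: glue appears after every item, which is
# handy for ending each line of output with a newline.
my $trailing = join("\n", @items, "");

print "$leading\n";
print $trailing;
```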