This class provides methods for creating and using regular expressions. It is based on the regular expressions of the JDK since 1.4.
Its main goal is to extract strings that match a pattern, or the subgroups that make it up. For that reason, it is usually used with for comprehensions and matching (see methods for examples).
A Regex is created from a java.lang.String representation of the regular expression pattern¹. That pattern is compiled during construction, so frequently used patterns should be declared outside loops if performance is a concern. They can also be declared in a companion object, so that they are initialized only once.
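As a minimal sketch of the companion-object advice above: the class name LogLine, its member names, and the sample lines are hypothetical and chosen only for illustration. The pattern is compiled when the companion object is first initialized and is then shared by every instance.

```scala
import scala.util.matching.Regex

// Hypothetical class, used only for illustration.
class LogLine(val text: String) {
  // Reuses the pattern held by the companion object; nothing is recompiled here.
  def date: Option[String] = LogLine.DatePattern.findFirstIn(text)
}

object LogLine {
  // Compiled a single time, when the companion object is first initialized.
  val DatePattern: Regex = """\d\d\d\d-\d\d-\d\d""".r
}

val lines = List("started 2011-07-15", "no date here", "stopped 2011-07-16")

// Each LogLine shares the same compiled pattern.
val dates = lines.flatMap(line => new LogLine(line).date)
```

Here dates contains only the two date strings, since the middle line has no match.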
The canonical way of creating regex patterns is by using the method r, provided on java.lang.String through an implicit conversion into scala.collection.immutable.WrappedString. Using triple quotes to write these strings avoids having to quote the backslash character (\).
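To illustrate the point about triple quotes, the sketch below writes one pattern twice, once as an ordinary string literal and once as a triple-quoted literal. The pattern and the sample text are illustrative assumptions, not taken from the library documentation.

```scala
// Ordinary literal: every backslash in the pattern has to be doubled.
val escaped = "(\\d+)\\.(\\d+)".r

// Triple-quoted literal: backslashes are written once, as in the pattern itself.
val raw = """(\d+)\.(\d+)""".r

// Both literals produce the same pattern source.
val sameSource = escaped.pattern.pattern() == raw.pattern.pattern()

val found = raw.findFirstIn("version 2.13")
```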
Using the constructor directly, on the other hand, makes it possible to declare names for subgroups in the pattern.
For example, both declarations below generate the same regex, but the second one associates names with the subgroups.
    val dateP1 = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
    val dateP2 = new scala.util.matching.Regex("""(\d\d\d\d)-(\d\d)-(\d\d)""", "year", "month", "day")
There are two ways of using a Regex to find a pattern: calling methods on Regex, such as findFirstIn or findAllIn, or using it as an extractor in a pattern match.
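The two approaches can be sketched side by side as follows. The pattern name timeP and the sample strings are hypothetical.

```scala
val timeP = """(\d\d):(\d\d)""".r
val text = "open 09:30, close 17:45"

// 1. Calling methods on the Regex: these search inside the text.
val first: Option[String] = timeP.findFirstIn(text)
val all: List[String] = timeP.findAllIn(text).toList

// 2. Using the Regex as an extractor: the subgroups are bound to names.
val description = "09:30" match {
  case timeP(hours, minutes) => s"$hours hours, $minutes minutes"
  case _                     => "not a time"
}
```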
Note that, when calling findAllIn, the resulting scala.util.matching.Regex.MatchIterator needs to be initialized (by calling hasNext or next(), or causing these to be called) before information about a match can be retrieved:
    val msg = "I love Scala"
    // val start = " ".r.findAllIn(msg).start // throws an IllegalStateException
    val matches = " ".r.findAllIn(msg)
    matches.hasNext // initializes the matcher
    val start = matches.start
When Regex is used as an extractor in a pattern match, note that it only succeeds if the whole text can be matched. For this reason, one usually calls a method to find the matching substrings, and then uses the Regex as an extractor to break each match into subgroups.
As an example, the above patterns can be used like this:
    val dateP1(year, month, day) = "2011-07-15"
    // val dateP1(year, month, day) = "Date 2011-07-15" // throws an exception at runtime

    val copyright: String = dateP1 findFirstIn "Date of this document: 2011-07-15" match {
      case Some(dateP1(year, month, day)) => "Copyright "+year
      case None => "No copyright"
    }

    val copyright: Option[String] = for {
      dateP1(year, month, day) <- dateP1 findFirstIn "Last modified 2011-07-15"
    } yield year

    def getYears(text: String): Iterator[String] = for (dateP1(year, _, _) <- dateP1 findAllIn text) yield year

    def getFirstDay(text: String): Option[String] = for (m <- dateP2 findFirstMatchIn text) yield m group "day"
Regex does not provide a method that returns a scala.Boolean. One can use the matches method of java.lang.String, or, if Regex is preferred, either ignore the return value or test the Option for emptiness. For example:
    def hasDate(text: String): Boolean = (dateP1 findFirstIn text).nonEmpty

    def printLinesWithDates(lines: Traversable[String]) {
      lines foreach { line =>
        dateP1 findFirstIn line foreach { _ => println(line) }
      }
    }
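The java.lang.String route mentioned above can be sketched as follows. Note that String.matches takes the pattern as a string and succeeds only if the whole string matches, so it is not a substring search. The value names are hypothetical.

```scala
val datePattern = """\d\d\d\d-\d\d-\d\d"""

// The whole string is a date, so this is true.
val wholeMatch: Boolean = "2011-07-15".matches(datePattern)

// Extra text around the date makes the whole-string test fail.
val partialMatch: Boolean = "Date 2011-07-15".matches(datePattern)

// Testing the Option returned by findFirstIn checks for a match anywhere in the text.
val containsDate: Boolean = datePattern.r.findFirstIn("Date 2011-07-15").nonEmpty
```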
There are also methods that replace matches of the pattern in a text. The substitutions can be simple replacement strings, or functions that compute the replacement from each match. For example:
    val months = Map( 1 -> "Jan", 2 -> "Feb", 3 -> "Mar",
                      4 -> "Apr", 5 -> "May", 6 -> "Jun",
                      7 -> "Jul", 8 -> "Aug", 9 -> "Sep",
                      10 -> "Oct", 11 -> "Nov", 12 -> "Dec")

    import scala.util.matching.Regex.Match

    def reformatDate(text: String) = dateP2 replaceAllIn (
      text,
      (m: Match) => "%s %s, %s" format (months(m group "month" toInt), m group "day", m group "year")
    )
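For the simpler case of plain replacement strings, a short sketch might look like this. The pattern name dateP and the sample text are hypothetical. In a replacement string, $1, $2 and so on refer to the subgroups of the match.

```scala
val dateP = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
val text = "From 2011-07-15 to 2011-08-01"

// Replace every match with a fixed string.
val masked = dateP.replaceAllIn(text, "<date>")

// Replace only the first match.
val firstOnly = dateP.replaceFirstIn(text, "<date>")

// Reorder the subgroups using group references in the replacement string.
val reordered = dateP.replaceAllIn(text, "$3/$2/$1")
```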
You can use special pattern syntax constructs like (?idmsux-idmsux)¹ to switch various regex compilation options like CASE_INSENSITIVE or UNICODE_CASE.
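As a small sketch of these embedded flags, (?i) turns on CASE_INSENSITIVE and (?m) turns on MULTILINE, as described in java.util.regex.Pattern. The sample strings and value names are hypothetical.

```scala
val caseSensitive = "scala".r
val caseInsensitive = "(?i)scala".r

// Without the flag the upper-case text is not found.
val strict = caseSensitive.findFirstIn("I love SCALA")

// With (?i) the same text matches regardless of case.
val relaxed = caseInsensitive.findFirstIn("I love SCALA")

// With (?m), ^ matches at the start of every line, not only at the start of the input.
val lineStarts = """(?m)^\w+""".r.findAllIn("one two\nthree four").toList
```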
1.1, 29/01/2008
¹ A detailed description is available in java.util.regex.Pattern.
See also: java.util.regex.Pattern
This object defines inner classes that describe regex matches and helper objects.