scala.util.matching

Regex

class Regex extends Serializable

A regular expression is used to determine whether a string matches a pattern and, if it does, to extract or transform the parts that match.

This class delegates to the java.util.regex package of the Java Platform. See the documentation for java.util.regex.Pattern for details about the regular expression syntax for pattern strings.

An instance of Regex represents a compiled regular expression pattern. Since compilation is expensive, frequently used Regexes should be constructed once, outside of loops and perhaps in a companion object.
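As a rough sketch of that advice, the hypothetical DateParser object below compiles the pattern once, when the object is initialized, and reuses it on every call. The object name and its isoYears helper are illustrative assumptions, not part of the library.

```scala
import scala.util.matching.Regex

// Hypothetical helper object; only Regex and its methods come from the library.
object DateParser {
  // Compiled once, when the object is first initialized.
  private val IsoDate: Regex = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

  // Reuses the compiled pattern for every line instead of rebuilding it per call.
  def isoYears(lines: Seq[String]): Seq[String] =
    lines.flatMap(line => IsoDate.findFirstMatchIn(line).map(_.group(1)))
}
```

Calling DateParser.isoYears repeatedly never recompiles the pattern, because IsoDate is a field of the object rather than a local value.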

The canonical way to create a Regex is by using the method r, provided implicitly for strings:

val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

Since escapes are not processed in multi-line string literals, using triple quotes avoids having to escape the backslash character, so that "\\d" can be written """\d""".
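The equivalence of the two spellings can be checked directly. The sketch below wraps the values in an illustrative EscapeDemo object (a name assumed here only for the example) and compares the pattern text of both forms.

```scala
object EscapeDemo {
  // In an ordinary literal the backslash must be doubled...
  val escaped = "\\d+".r
  // ...whereas a triple-quoted literal passes it through unchanged.
  val raw = """\d+""".r

  // Both literals denote the same pattern text.
  val samePattern: Boolean = escaped.regex == raw.regex
}
```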

To extract the capturing groups when a Regex is matched, use it as an extractor in a pattern match:

"2004-01-20" match {
  case date(year, month, day) => s"$year was a good year for PLs."
}

To check only whether the Regex matches, ignoring any groups, use a sequence wildcard:

"2004-01-20" match {
  case date(_*) => "It's a date!"
}

That works because a Regex extractor produces a sequence of strings. Extracting only the year from a date could also be expressed with a sequence wildcard:

"2004-01-20" match {
  case date(year, _*) => s"$year was a good year for PLs."
}

In a pattern match, Regex normally matches the entire input. However, an unanchored Regex finds the pattern anywhere in the input.

val embeddedDate = date.unanchored
"Date: 2004-01-20 17:25:18 GMT (10 years, 28 weeks, 5 days, 17 hours and 51 minutes ago)" match {
  case embeddedDate("2004", "01", "20") => "A Scala is born."
}

To find or replace matches of the pattern, use the various find and replace methods. There is a flavor of each method that produces matched strings and another that produces Match objects.

For example, pattern matching with an unanchored Regex, as in the previous example, can also be accomplished with findFirstMatchIn. The findFirst methods return an Option that is non-empty if a match is found, or None if there is no match:

val dates = "Important dates in history: 2004-01-20, 1958-09-05, 2010-10-06, 2011-07-15"
val firstDate = date findFirstIn dates getOrElse "No date found."
val firstYear = for (m <- date findFirstMatchIn dates) yield m group 1

To find all matches:

val allYears = for (m <- date findAllMatchIn dates) yield m group 1

But findAllIn returns a special iterator of strings that can be queried for the MatchData of the last match:

val mi = date findAllIn dates
val oldies = mi filter (_ => (mi group 1).toInt < 1960) map (s => s"$s: An oldie but goodie.")

Note that findAllIn finds matches that don't overlap. (See findAllIn for more examples.)

val num = """(\d+)""".r
val all = (num findAllIn "123").toList  // List("123"), not List("123", "23", "3")

Text replacement can be performed unconditionally or as a function of the current match:

import java.util.Calendar

val redacted    = date replaceAllIn (dates, "XXXX-XX-XX")
val yearsOnly   = date replaceAllIn (dates, m => m group 1)
val months      = (0 to 11) map { i => val c = Calendar.getInstance; c.set(2014, i, 1); f"$c%tb" }
val reformatted = date replaceAllIn (dates, _ match { case date(y,m,d) => f"${months(m.toInt - 1)} $d, $y" })

Pattern matching the Match against the Regex that created it does not reapply the Regex. In the expression for reformatted, each date match is computed once. But it is possible to apply a Regex to a Match resulting from a different pattern:

val docSpree = """2011(?:-\d{2}){2}""".r
val docView  = date replaceAllIn (dates, _ match {
  case docSpree() => "Historic doc spree!"
  case _          => "Something else happened"
})
Self Type
Regex
Annotations
@SerialVersionUID()
Source
Regex.scala
Version

1.1, 29/01/2008

See also

java.util.regex.Pattern

Linear Supertypes
Serializable, java.io.Serializable, AnyRef, Any

Instance Constructors

  1. new Regex(regex: String, groupNames: String*)

    Compile a regular expression, supplied as a string, into a pattern that can be matched against inputs.

    If group names are supplied, they can be used this way:

    val namedDate  = new Regex("""(\d\d\d\d)-(\d\d)-(\d\d)""", "year", "month", "day")
    val namedYears = for (m <- namedDate findAllMatchIn dates) yield m group "year"

    This constructor does not support options as flags, which must be supplied as inline flags in the pattern string: (?idmsux-idmsux).

    regex

    The regular expression to compile.

    groupNames

    Names of capturing groups.
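A minimal sketch of supplying an option as an inline flag, here (?i) for case-insensitive matching. The InlineFlagDemo object and its sample input are assumptions made for illustration.

```scala
import scala.util.matching.Regex

object InlineFlagDemo {
  // (?i) at the start of the pattern turns on case-insensitive matching.
  val greeting: Regex = new Regex("""(?i)hello, (\w+)""")

  // The upper-case input still matches because of the inline flag.
  val name: Option[String] =
    greeting.findFirstMatchIn("HELLO, World").map(_.group(1))
}
```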

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. def +(other: String): String

    Implicit information
    This member is added by an implicit conversion from Regex to any2stringadd[Regex] performed by method any2stringadd in scala.Predef.
    Definition Classes
    any2stringadd
  4. def ->[B](y: B): (Regex, B)

    Implicit information
    This member is added by an implicit conversion from Regex to ArrowAssoc[Regex] performed by method ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc
    Annotations
    @inline()
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  6. def anchored: Regex

  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def ensuring(cond: (Regex) ⇒ Boolean, msg: ⇒ Any): Regex

    Implicit information
    This member is added by an implicit conversion from Regex to Ensuring[Regex] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  10. def ensuring(cond: (Regex) ⇒ Boolean): Regex

    Implicit information
    This member is added by an implicit conversion from Regex to Ensuring[Regex] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  11. def ensuring(cond: Boolean, msg: ⇒ Any): Regex

    Implicit information
    This member is added by an implicit conversion from Regex to Ensuring[Regex] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  12. def ensuring(cond: Boolean): Regex

    Implicit information
    This member is added by an implicit conversion from Regex to Ensuring[Regex] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. def findAllIn(source: CharSequence): MatchIterator

    Return all non-overlapping matches of this Regex in the given character sequence as a scala.util.matching.Regex.MatchIterator, which is a special scala.collection.Iterator that returns the matched strings but can also be queried for more data about the last match, such as capturing groups and start position.

    A MatchIterator can also be converted into an iterator that returns objects of type scala.util.matching.Regex.Match, such as is normally returned by findAllMatchIn.

    Where potential matches overlap, the first possible match is returned, followed by the next match that follows the input consumed by the first match:

    val hat  = "hat[^a]+".r
    val hathaway = "hathatthattthatttt"
    val hats = (hat findAllIn hathaway).toList                     // List(hath, hattth)
    val pos  = (hat findAllMatchIn hathaway map (_.start)).toList  // List(0, 7)

    To return overlapping matches, it is possible to formulate a regular expression with lookahead (?=) that does not consume the overlapping region.

    val madhatter = "(h)(?=(at[^a]+))".r
    val madhats   = (madhatter findAllMatchIn hathaway map {
      case madhatter(x,y) => s"$x$y"
    }).toList                                       // List(hath, hatth, hattth, hatttt)

    Attempting to retrieve match information before performing the first match or after exhausting the iterator results in java.lang.IllegalStateException. See scala.util.matching.Regex.MatchIterator for details.

    source

    The text to match against.

    returns

    A scala.util.matching.Regex.MatchIterator of matched substrings.

    Example:
    1. for (words <- """\w+""".r findAllIn "A simple example.") yield words
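The conversion mentioned above is available through the matchData method of MatchIterator. The following sketch, with an illustrative MatchDataDemo object, contrasts the string view with the Match view of the same matches.

```scala
object MatchDataDemo {
  val word = """\w+""".r

  // findAllIn yields the matched strings...
  val strings: List[String] = word.findAllIn("A simple example.").toList

  // ...while matchData exposes each match as a Match, used here for start offsets.
  val starts: List[Int] =
    word.findAllIn("A simple example.").matchData.map(_.start).toList
}
```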
  17. def findAllMatchIn(source: CharSequence): Iterator[Match]

    Return all non-overlapping matches of this regexp in the given character sequence as a scala.collection.Iterator of scala.util.matching.Regex.Match.

    source

    The text to match against.

    returns

    A scala.collection.Iterator of scala.util.matching.Regex.Match for all matches.

    Example:
    1. for (words <- """\w+""".r findAllMatchIn "A simple example.") yield words.start
  18. def findFirstIn(source: CharSequence): Option[String]

    Return an optional first matching string of this Regex in the given character sequence, or None if there is no match.

    source

    The text to match against.

    returns

    A scala.Option of the first matching string in the text.

    Example:
    1. """\w+""".r findFirstIn "A simple example." foreach println // prints "A"
  19. def findFirstMatchIn(source: CharSequence): Option[Match]

    Return an optional first match of this Regex in the given character sequence, or None if it does not exist.

    If the match is successful, the scala.util.matching.Regex.Match can be queried for more data.

    source

    The text to match against.

    returns

    A scala.Option of scala.util.matching.Regex.Match of the first matching string in the text.

    Example:
    1. ("""[a-z]""".r findFirstMatchIn "A simple example.") map (_.start) // returns Some(2), the index of the first match in the text
  20. def findPrefixMatchOf(source: CharSequence): Option[Match]

    Return an optional match of this Regex at the beginning of the given character sequence, or None if it matches no prefix of the character sequence.

    Unlike findFirstMatchIn, this method will only return a match at the beginning of the input.

    source

    The text to match against.

    returns

    A scala.Option of the scala.util.matching.Regex.Match of the matched string.

    Example:
    1. """\w+""".r findPrefixMatchOf "A simple example." map (_.after) // returns Some(" simple example.")
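To make the contrast with findFirstMatchIn concrete, the sketch below runs both methods on the same input. The PrefixDemo object and its inputs are assumed for illustration only.

```scala
object PrefixDemo {
  val digits = """\d+""".r
  val input  = "abc 123"

  // Searches the whole input for the first match.
  val anywhere: Option[String] = digits.findFirstMatchIn(input).map(_.matched)

  // Only considers a match that starts at the beginning of the input.
  val atStart: Option[String] = digits.findPrefixMatchOf(input).map(_.matched)

  // A prefix match does not need to cover the rest of the input.
  val prefixOk: Option[String] = digits.findPrefixMatchOf("123 abc").map(_.matched)
}
```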
  21. def findPrefixOf(source: CharSequence): Option[String]

    Return an optional match of this Regex at the beginning of the given character sequence, or None if it matches no prefix of the character sequence.

    Unlike findFirstIn, this method will only return a match at the beginning of the input.

    source

    The text to match against.

    returns

    A scala.Option of the matched prefix.

    Example:
    1. """\p{Lower}""".r findPrefixOf "A simple example." // returns None, since the text does not begin with a lowercase letter
  22. def formatted(fmtstr: String): String

    Returns a string formatted according to the given format string. Format strings are as for String.format (see java.lang.String.format).

    Implicit information
    This member is added by an implicit conversion from Regex to StringFormat[Regex] performed by method StringFormat in scala.Predef.
    Definition Classes
    StringFormat
    Annotations
    @inline()
  23. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  24. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  25. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  26. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  27. final def notify(): Unit

    Definition Classes
    AnyRef
  28. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  29. val pattern: Pattern

    The compiled pattern

  30. def regex: String

  31. def replaceAllIn(target: CharSequence, replacer: (Match) ⇒ String): String

    Replaces all matches using a replacer function. The replacer function takes a scala.util.matching.Regex.Match so that extra information can be obtained from the match. For example:

    import scala.util.matching.Regex
    val datePattern = new Regex("""(\d\d\d\d)-(\d\d)-(\d\d)""", "year", "month", "day")
    val text = "From 2011-07-15 to 2011-07-17"
    val repl = datePattern replaceAllIn (text, m => s"${m group "month"}/${m group "day"}")

    In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character is an error. The backslash (\) character will be interpreted as an escape character and can be used to escape the dollar sign. Use Regex.quoteReplacement to escape these characters.

    target

    The string to match.

    replacer

    The function which maps a match to another string.

    returns

    The target string after replacements.

  32. def replaceAllIn(target: CharSequence, replacement: String): String

    Replaces all matches by a string.

    In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character is an error. The backslash (\) character will be interpreted as an escape character and can be used to escape the dollar sign. Use Regex.quoteReplacement to escape these characters.

    target

    The string to match

    replacement

    The string that will replace each match

    returns

    The resulting string

    Example:
    1. """\d+""".r replaceAllIn ("July 15", "") // returns "July "
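The group references and escaping rules described above can be sketched as follows. The ReplacementDemo object and the sample strings are assumptions for the example; replaceAllIn and Regex.quoteReplacement are the documented library calls.

```scala
import scala.util.matching.Regex

object ReplacementDemo {
  val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

  // $1, $2 and $3 refer to the captured year, month and day.
  val dayFirst: String = date.replaceAllIn("Released 2011-07-15", "$3/$2/$1")

  // quoteReplacement escapes $ and \ so the replacement text is inserted literally.
  val literal: String =
    """\d+""".r.replaceAllIn("cost: 42", Regex.quoteReplacement("$1"))
}
```

Without quoteReplacement, the second call would try to resolve $1 against a pattern that has no capturing group.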
  33. def replaceFirstIn(target: CharSequence, replacement: String): String

    Replaces the first match by a string.

    In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character is an error. The backslash (\) character will be interpreted as an escape character and can be used to escape the dollar sign. Use Regex.quoteReplacement to escape these characters.

    target

    The string to match

    replacement

    The string that will replace the match

    returns

    The resulting string
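A small sketch contrasting replaceFirstIn with replaceAllIn on the same input. The ReplaceFirstDemo object is an illustrative wrapper, not part of the library.

```scala
object ReplaceFirstDemo {
  val digits = """\d+""".r

  // Only the first run of digits is replaced.
  val firstOnly: String = digits.replaceFirstIn("a1b22c333", "#")

  // For comparison, replaceAllIn replaces every run.
  val all: String = digits.replaceAllIn("a1b22c333", "#")
}
```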

  34. def replaceSomeIn(target: CharSequence, replacer: (Match) ⇒ Option[String]): String

    Replaces some of the matches using a replacer function that returns a scala.Option. The replacer function takes a scala.util.matching.Regex.Match so that extra information can be obtained from the match. For example:

    import scala.util.matching.Regex._
    
    val vars = Map("x" -> "a var", "y" -> """some $ and \ signs""")
    val text = "A text with variables %x, %y and %z."
    val varPattern = """%(\w+)""".r
    val mapper = (m: Match) => vars get (m group 1) map (quoteReplacement(_))
    val repl = varPattern replaceSomeIn (text, mapper)

    In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character is an error. The backslash (\) character will be interpreted as an escape character and can be used to escape the dollar sign. Use Regex.quoteReplacement to escape these characters.

    target

    The string to match.

    replacer

    The function which optionally maps a match to another string.

    returns

    The target string after replacements.
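As a further sketch of the Option-based contract, the replacer below returns Some for even numbers and None for odd ones, so only some matches are rewritten. The ReplaceSomeDemo object and its input are assumed for illustration.

```scala
object ReplaceSomeDemo {
  val number = """\d+""".r

  // Some(...) replaces the match; None leaves the matched text untouched.
  val result: String = number.replaceSomeIn(
    "1 2 3 4",
    m => if (m.matched.toInt % 2 == 0) Some("even") else None
  )
}
```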

  35. def runMatcher(m: Matcher): Boolean

    Attributes
    protected
  36. def split(toSplit: CharSequence): Array[String]

    Splits the provided character sequence around matches of this regexp.

    toSplit

    The character sequence to split

    returns

    The array of strings computed by splitting the input around matches of this regexp
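A minimal sketch of split, using a separator pattern that also swallows whitespace around each comma. The SplitDemo object and the input string are illustrative assumptions.

```scala
object SplitDemo {
  // A comma with optional surrounding whitespace acts as the separator.
  val separator = """\s*,\s*""".r

  // The matched separators are dropped; the text between them is returned.
  val fields: List[String] = separator.split("a, b ,c").toList
}
```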

  37. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  38. def toString(): String

    The string defining the regular expression

    Definition Classes
    Regex → AnyRef → Any
  39. def unanchored: UnanchoredRegex

    Create a new Regex with the same pattern, but no requirement that the entire String matches in extractor patterns.

    Normally, matching on date behaves as though the pattern were enclosed in anchors, "^pattern$".

    The unanchored Regex behaves as though those anchors were removed.

    Note that this method does not actually strip any matchers from the pattern.

    Calling anchored returns the original Regex.

    val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r.unanchored
    
    val date(year, month, day) = "Date 2011-07-15"                       // OK
    
    val copyright: String = "Date of this document: 2011-07-15" match {
      case date(year, month, day) => s"Copyright $year"                  // OK
      case _                      => "No copyright"
    }
    returns

    The new unanchored regex
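The relationship between unanchored and anchored can be sketched as a round trip. The AnchoringDemo object and its matches helper are hypothetical names introduced only for this example.

```scala
import scala.util.matching.Regex

object AnchoringDemo {
  val date      = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
  val anywhere  = date.unanchored     // may match inside a longer string
  val wholeOnly = anywhere.anchored   // requires the entire input to match again

  // Hypothetical helper: true if the extractor pattern succeeds on s.
  def matches(r: Regex, s: String): Boolean = s match {
    case r(_*) => true
    case _     => false
  }

  val embedded      = "Date 2011-07-15"
  val unanchoredHit = matches(anywhere, embedded)
  val anchoredMiss  = matches(wholeOnly, embedded)
  val anchoredHit   = matches(wholeOnly, "2011-07-15")
}
```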

  40. def unapplySeq(m: Match): Option[List[String]]

    Tries to match on a scala.util.matching.Regex.Match.

    A previously failed match results in None.

    If a successful match was made against the current pattern, then that result is used.

    Otherwise, this Regex is applied to the previously matched input, and the result of that match is used.
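A short sketch of this overload in use: a Match produced by findFirstMatchIn is destructured with the same Regex that created it. The MatchExtractorDemo object and the sample sentence are assumptions for illustration.

```scala
object MatchExtractorDemo {
  val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

  // findFirstMatchIn yields a Match; the same Regex then extracts its groups.
  val year: Option[String] =
    date.findFirstMatchIn("Signed on 2011-07-15.") match {
      case Some(date(y, _, _)) => Some(y)
      case _                   => None
    }
}
```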

  41. def unapplySeq(c: Char): Option[List[Char]]

    Tries to match the String representation of a scala.Char.

    If the match succeeds, the result is the first matching group if any groups are defined, or an empty Sequence otherwise.

    For example:

    val cat = "cat"
    // the case must consume the group to match
    val r1 = """(\p{Lower})""".r
    cat(0) match { case r1(x) => true }
    cat(0) match { case r1(_) => true }
    cat(0) match { case r1(_*) => true }
    cat(0) match { case r1() => true }     // no match
    
    // there is no group to extract
    val r2 = """\p{Lower}""".r
    cat(0) match { case r2(x) => true }    // no match
    cat(0) match { case r2(_) => true }    // no match
    cat(0) match { case r2(_*) => true }   // matches
    cat(0) match { case r2() => true }     // matches
    
    // even if there are multiple groups, only one is returned
    val r3 = """((.))""".r
    cat(0) match { case r3(_) => true }    // matches
    cat(0) match { case r3(_,_) => true }  // no match
    c

    The Char to match

    returns

    The match

  42. def unapplySeq(s: CharSequence): Option[List[String]]

    Tries to match a java.lang.CharSequence.

    If the match succeeds, the result is a list of the matching groups (or a null element if a group did not match any input). If the pattern specifies no groups, then the result will be an empty list on a successful match.

    This method attempts to match the entire input by default; to find the next matching subsequence, use an unanchored Regex.

    For example:

    val p1 = "ab*c".r
    val p1Matches = "abbbc" match {
      case p1() => true               // no groups
      case _    => false
    }
    val p2 = "a(b*)c".r
    val p2Matches = "abbbc" match {
      case p2(_*) => true             // any groups
      case _      => false
    }
    val numberOfB = "abbbc" match {
      case p2(b) => Some(b.length)    // one group
      case _     => None
    }
    val p3 = "b*".r.unanchored
    val p3Matches = "abbbc" match {
      case p3() => true               // find the b's
      case _    => false
    }
    val p4 = "a(b*)(c+)".r
    val p4Matches = "abbbcc" match {
      case p4(_*) => true             // multiple groups
      case _      => false
    }
    val allGroups = "abbbcc" match {
      case p4(all @ _*) => all mkString "/" // "bbb/cc"
      case _            => ""
    }
    val cGroup = "abbbcc" match {
      case p4(_, c) => c
      case _        => ""
    }
    s

    The string to match

    returns

    The matches
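The null element mentioned above shows up when a group sits in an alternative that did not take part in the match. The sketch below, with an illustrative OptionalGroupDemo object, wraps each group in Option to make that visible.

```scala
object OptionalGroupDemo {
  val either = """(a)|(b)""".r

  // Only one alternative participates; the other group comes back as null,
  // which Option(...) turns into None.
  val groups: (Option[String], Option[String]) = "b" match {
    case either(a, b) => (Option(a), Option(b))
    case _            => (None, None)
  }
}
```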

  43. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  45. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  46. def →[B](y: B): (Regex, B)

    Implicit information
    This member is added by an implicit conversion from Regex to ArrowAssoc[Regex] performed by method ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc

Deprecated Value Members

  1. def unapplySeq(target: Any): Option[List[String]]

    Tries to match target.

    target

    The string to match

    returns

    The matches

    Annotations
    @deprecated
    Deprecated

    (Since version 2.11.0) Extracting a match result from anything but a CharSequence or Match is deprecated
