public final class CaseCanonicalize
extends java.lang.Object
From section 15.10.2.8,
The abstract operation Canonicalize takes a character parameter ch and performs the following steps:
- If IgnoreCase is false, return ch.
- Let u be ch converted to upper case as if by calling the standard built-in method String.prototype.toUpperCase on the one-character String ch.
- If u does not consist of a single character, return ch.
- Let cu be u's character.
- If ch's code unit value is greater than or equal to decimal 128 and cu's code unit value is less than decimal 128, then return ch.
- Return cu.
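The steps above can be sketched in Java as follows. This is a minimal illustration, not the class's actual implementation: `CanonicalizeSketch` and its `canonicalize` method are hypothetical names, and Java's `String.toUpperCase(Locale.ROOT)` is assumed to stand in for `String.prototype.toUpperCase`. Java follows a newer Unicode version than 3.0.0, so results may differ for some code units.

```java
import java.util.Locale;

/** Hypothetical helper, not part of the documented API. */
final class CanonicalizeSketch {
  /** Follows the listed steps for a single UTF-16 code unit. */
  static char canonicalize(char ch, boolean ignoreCase) {
    if (!ignoreCase) {
      return ch;
    }
    // Assumption: Java's upper-casing approximates String.prototype.toUpperCase.
    String u = String.valueOf(ch).toUpperCase(Locale.ROOT);
    if (u.length() != 1) {
      return ch; // the upper-cased form expands to several characters
    }
    char cu = u.charAt(0);
    if (ch >= 128 && cu < 128) {
      return ch; // a non-ASCII code unit never canonicalizes to an ASCII one
    }
    return cu;
  }

  public static void main(String[] args) {
    System.out.println(canonicalize('a', true));
    System.out.println(canonicalize('a', false));
    // Sharp s upper-cases to two characters, so it is returned unchanged.
    System.out.println(canonicalize('\u00DF', true) == '\u00DF');
  }
}
```

The check against 128 is what keeps code units such as the dotless i (U+0131), whose upper case is the ASCII letter I, from matching ASCII letters under case-insensitive comparison.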
Modifier and Type | Field and Description |
---|---|
static com.google.javascript.jscomp.regex.CharRanges | CASE_SENSITIVE: Set of code units that are case-insensitively equivalent to some other code unit according to the EcmaScript Canonicalize operation described in section 15.10.2.8. |
Modifier and Type | Method and Description |
---|---|
static char | caseCanonicalize(char ch): Returns the case canonical version of the given code-unit. |
static java.lang.String | caseCanonicalize(java.lang.String s): Returns the case canonical version of the given string. |
static com.google.javascript.jscomp.regex.CharRanges | expandToAllMatched(com.google.javascript.jscomp.regex.CharRanges ranges): Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes all the code-units in the input and those that are case-insensitively equivalent to a code-unit in the input. |
static com.google.javascript.jscomp.regex.CharRanges | reduceToMinimum(com.google.javascript.jscomp.regex.CharRanges ranges): Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes the minimal set of code units such that for every code unit in the input there is a case-insensitively equivalent canonical code unit in the output. |
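To make the two range operations in the table above concrete, here is a small sketch over plain `TreeSet<Character>` values rather than `CharRanges`, whose API is not shown on this page. Everything in it is an assumption for illustration: the class `CaseRangeSketch`, the `range` helper, and the choice of the upper-cased form as the canonical representative (the real class may pick a different one).

```java
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical stand-in that uses sets of code units instead of CharRanges. */
final class CaseRangeSketch {
  /** Canonical form per the spec steps, with IgnoreCase assumed true. */
  static char canon(char ch) {
    String u = String.valueOf(ch).toUpperCase(Locale.ROOT);
    if (u.length() != 1) {
      return ch;
    }
    char cu = u.charAt(0);
    return (ch >= 128 && cu < 128) ? ch : cu;
  }

  /** Builds the inclusive range [lo-hi] as a set. */
  static TreeSet<Character> range(char lo, char hi) {
    TreeSet<Character> out = new TreeSet<>();
    for (int c = lo; c <= hi; c++) {
      out.add((char) c);
    }
    return out;
  }

  /** Input plus every code unit that shares a canonical form with an input member. */
  static TreeSet<Character> expandToAllMatched(TreeSet<Character> in) {
    TreeSet<Character> canonicalForms = reduceToMinimum(in);
    TreeSet<Character> out = new TreeSet<>(in);
    for (int cc = 0; cc <= 0xFFFF; cc++) {
      if (canonicalForms.contains(canon((char) cc))) {
        out.add((char) cc);
      }
    }
    return out;
  }

  /** One canonical representative for each input code unit. */
  static TreeSet<Character> reduceToMinimum(TreeSet<Character> in) {
    TreeSet<Character> out = new TreeSet<>();
    for (char c : in) {
      out.add(canon(c));
    }
    return out;
  }

  public static void main(String[] args) {
    TreeSet<Character> in = range('0', '9');
    in.addAll(range('B', 'M'));
    System.out.println(expandToAllMatched(in)); // [0-9B-M] grows to include b-m
    System.out.println(reduceToMinimum(expandToAllMatched(in)));
  }
}
```

Under these assumptions the two operations are inverses on canonical input: expanding [0-9B-M] adds the lower-case letters b-m, and reducing the result returns the original set.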
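public static final com.google.javascript.jscomp.regex.CharRanges CASE_SENSITIVE
Set of code units that are case-insensitively equivalent to some other code unit according to the EcmaScript Canonicalize operation described in section 15.10.2.8.
The set is defined in terms of String.prototype.toUpperCase, which is itself based on Unicode 3.0.0 as specified at UnicodeData-3.0.0 and SpecialCasings-2.txt.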
This table was generated by running the below on Chrome:
for (var cc = 0; cc < 0x10000; ++cc) {
  var ch = String.fromCharCode(cc);
  var u = ch.toUpperCase();
  if (ch != u && u.length === 1) {
    var cu = u.charCodeAt(0);
    if (cc <= 128 || u.charCodeAt(0) > 128) {
      print('0x' + cc.toString(16) + ', 0x' + cu.toString(16) + ',');
    }
  }
}
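The script above prints pairs of a code unit and its canonical form. One plausible way to derive a set like CASE_SENSITIVE from such a mapping is sketched below: a code unit belongs to the set when at least one other code unit shares its canonical form. This is an assumption about how the data could be used, not the documented construction. `CaseSensitiveSetSketch` is a hypothetical name, and Java's `toUpperCase(Locale.ROOT)` again stands in for the JavaScript method, so the resulting set may not match the Unicode 3.0.0 table exactly.

```java
import java.util.BitSet;
import java.util.Locale;

/** Hypothetical reconstruction of a "case sensitive" set of code units. */
final class CaseSensitiveSetSketch {
  /** Canonical form per the spec steps, with IgnoreCase assumed true. */
  static char canonicalize(char ch) {
    String u = String.valueOf(ch).toUpperCase(Locale.ROOT);
    if (u.length() != 1) {
      return ch;
    }
    char cu = u.charAt(0);
    return (ch >= 128 && cu < 128) ? ch : cu;
  }

  /** Marks every code unit whose canonical form is shared with another code unit. */
  static BitSet caseSensitiveCodeUnits() {
    char[] canon = new char[0x10000];
    int[] classSize = new int[0x10000];
    for (int cc = 0; cc < 0x10000; cc++) {
      canon[cc] = canonicalize((char) cc);
      classSize[canon[cc]]++; // count members of each equivalence class
    }
    BitSet result = new BitSet(0x10000);
    for (int cc = 0; cc < 0x10000; cc++) {
      if (classSize[canon[cc]] > 1) {
        result.set(cc);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    BitSet set = caseSensitiveCodeUnits();
    System.out.println("case-sensitive code units: " + set.cardinality());
  }
}
```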
public static java.lang.String caseCanonicalize(java.lang.String s)
Returns the case canonical version of the given string.
public static char caseCanonicalize(char ch)
Returns the case canonical version of the given code-unit.
public static com.google.javascript.jscomp.regex.CharRanges expandToAllMatched(com.google.javascript.jscomp.regex.CharRanges ranges)
Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes all the code-units in the input and those that are case-insensitively equivalent to a code-unit in the input.
public static com.google.javascript.jscomp.regex.CharRanges reduceToMinimum(com.google.javascript.jscomp.regex.CharRanges ranges)
Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes the minimal set of code units such that for every code unit in the input there is a case-insensitively equivalent canonical code unit in the output.
Copyright © 2009-2019 Google. All Rights Reserved.