Class DcvDomainName
An internet domain name, such as com or foo.co.uk. Only syntactic analysis is performed; no DNS
lookups or other network interactions take place. Thus there is no guarantee that the domain
actually exists on the internet.
One common use of this class is to determine whether a given string is likely to represent an
addressable domain on the web -- that is, for a candidate string "xxx", might browsing to
"http://xxx/" result in a webpage being displayed? In the past, this test was frequently
done by determining whether the domain ended with a public suffix but was not itself a public
suffix. However, this test is no longer accurate. There are many domains which are both public
suffixes and addressable as hosts; "uk.com" is one example.
Using the subset of public suffixes that are registry suffixes, one can get a better result, as
only a few registry suffixes are addressable. However, the most useful test to determine if a
domain is a plausible web host is hasPublicSuffix(). This will return true for many domains
which (currently) are not hosts, such as "com", but given that any public suffix may become a
host without warning, it is better to err on the side of permissiveness and thus avoid spurious
rejection of valid sites. Of course, to actually determine addressability of any host, clients
of this class will need to perform their own DNS lookups.
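The difference between the older test and the permissive one can be illustrated with a minimal sketch. It does not use this class; the class name PlausibleHostCheck, its methods, and the three-entry TOY_SUFFIXES set are assumptions made for illustration, and the real public suffix list is far larger.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class PlausibleHostCheck {
    // Toy stand-in for the public suffix list (assumption for this sketch).
    static final Set<String> TOY_SUFFIXES =
            new HashSet<>(Arrays.asList("com", "co.uk", "uk.com"));

    // True if the domain is a listed suffix or ends with one.
    static boolean endsWithSuffix(String domain) {
        for (String suffix : TOY_SUFFIXES) {
            if (domain.equals(suffix) || domain.endsWith("." + suffix)) {
                return true;
            }
        }
        return false;
    }

    // Older test: ends with a public suffix but is not itself a public suffix.
    static boolean strictTest(String domain) {
        return endsWithSuffix(domain) && !TOY_SUFFIXES.contains(domain);
    }

    // Permissive test, in the spirit of hasPublicSuffix().
    static boolean permissiveTest(String domain) {
        return endsWithSuffix(domain);
    }
}
```

With these toy entries, the strict test rejects "uk.com" even though it is addressable, while the permissive test accepts it and still rejects "google.invalid".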
During construction, names are normalized in two ways:
- ASCII uppercase characters are converted to lowercase.
- Unicode dot separators other than the ASCII period ('.') are converted to the ASCII period.
The normalized values will be returned from toString() and parts(), and will be reflected in
the result of equals(Object).
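The two normalization steps can be sketched as follows. This is an illustration rather than the class's own code: the name NameNormalizer is hypothetical, and the set of dot-like separators (U+3002, U+FF0E, U+FF61) is an assumption.

```java
final class NameNormalizer {
    // Assumed dot-like separators: ASCII period plus U+3002, U+FF0E and U+FF61.
    private static final String DOTS = "[.\u3002\uFF0E\uFF61]";

    static String normalize(String name) {
        // Step 1: map every dot-like separator to the ASCII period.
        String dotted = name.replaceAll(DOTS, ".");
        // Step 2: lowercase ASCII letters only; other characters are left untouched.
        StringBuilder out = new StringBuilder(dotted.length());
        for (int i = 0; i < dotted.length(); i++) {
            char c = dotted.charAt(i);
            out.append(c >= 'A' && c <= 'Z' ? (char) (c + 32) : c);
        }
        return out.toString();
    }
}
```

Lowercasing only the ASCII range keeps the step independent of the default locale.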
Internationalized domain names such as 网络.cn are supported, as are the equivalent IDNA
Punycode-encoded versions.
- Since:
- 5.0
- Author:
- Catherine Berry
-
Field Summary
Fields (Modifier and Type, Field, Description):
- private static final Pattern ASCII_PATTERN: The regex to test for ASCII characters.
- private static final String DASH_REGEX: The regex to test for different dashes.
- private static final String DOTS_REGEX: The regular expression for the separators between domain parts.
- private static final int MAX_DOMAIN_PART_LENGTH: Maximum size of a single part of a domain name.
- private static final int MAX_LENGTH: Maximum length of a full domain name, including separators, and leaving room for the root label.
- private static final int MAX_PARTS: Maximum parts (labels) in a domain name.
- private final String name: The full domain name, converted to lower case.
- private static final int NO_SUFFIX_FOUND: Value of publicSuffixIndex() or registrySuffixIndex() which indicates that no relevant suffix was found.
- parts: The parts of the domain name, converted to lower case.
- private int publicSuffixIndexCache: Cached value of publicSuffixIndex().
- private int registrySuffixIndexCache: Cached value of registrySuffixIndex().
- private static final int SUFFIX_NOT_INITIALIZED: Value of publicSuffixIndexCache or registrySuffixIndexCache which indicates that they were not initialized yet.
-
Constructor Summary
Constructors (Modifier, Constructor, Description):
- (package private) DcvDomainName(String name): Constructor used to implement from(String), and from subclasses.
- private DcvDomainName(String name, List<String> parts): Internal constructor that skips validations when creating an instance from parts of an already-validated DcvDomainName.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- (package private) DcvDomainName ancestor(int levels): Returns the ancestor of the current domain at the given number of levels "higher" (rightward) in the subdomain list.
- static void checkArgument(boolean expression, Object errorMessage): Ensures the truth of an expression involving one or more parameters to the calling method.
- static void checkArgument(boolean expression, String errorMessageTemplate, Object p1): Ensures the truth of an expression involving one or more parameters to the calling method.
- static <T> T checkNotNull(T reference): Checks that the specified object reference is not null.
- static void checkState(boolean expression, String errorMessageTemplate, Object errorMessageArg): Ensures the truth of an expression involving the state of the calling instance, but not involving any parameters to the calling method.
- child: Creates and returns a new DcvDomainName by prepending the argument and a dot to the current name.
- private static String dotJoin: Joins the components of a domain name with dots.
- dotSplit: Splits a domain name into its component parts.
- boolean equals(Object): Equality testing is based on the text supplied by the caller, after normalization as described in the class documentation.
- private int findSuffixOfType(PublicSuffixType desiredType): Returns the index of the leftmost part of the suffix, or -1 if not found.
- static DcvDomainName from(String): Returns an instance of DcvDomainName after lenient validation.
- int hashCode(): Returns a hash code for this domain name.
- boolean hasParent(): Indicates whether this domain is composed of two or more parts.
- boolean hasPublicSuffix(): Indicates whether this domain name ends in a public suffix, including if it is a public suffix itself.
- boolean hasRegistrySuffix(): Indicates whether this domain name ends in a registry suffix, including if it is a registry suffix itself.
- private static boolean isExceptionFound(PslData pslData, String ancestorName): Tests if the ancestor name is an exception in the PSL data.
- boolean isPublicSuffix(): Indicates whether this domain name represents a public suffix, as defined by the Mozilla Foundation's Public Suffix List (PSL).
- boolean isRegistrySuffix(): Indicates whether this domain name represents a registry suffix, as defined by a subset of the Mozilla Foundation's Public Suffix List (PSL).
- private static boolean isSuffixFound(PublicSuffixType desiredType, PslData pslData, String ancestorName): Tests if the desired type of suffix is found in the PSL data.
- boolean isTopDomainUnderRegistrySuffix(): Indicates whether this domain name is composed of exactly one subdomain component followed by a registry suffix.
- boolean isTopPrivateDomain(): Indicates whether this domain name is composed of exactly one subdomain component followed by a public suffix.
- boolean isUnderPublicSuffix(): Indicates whether this domain name ends in a public suffix, while not being a public suffix itself.
- boolean isUnderRegistrySuffix(): Indicates whether this domain name ends in a registry suffix, while not being a registry suffix itself.
- static boolean isValid: Indicates whether the argument is a syntactically valid domain name using lenient validation.
- private static boolean isWildcardFound(PublicSuffixType desiredType, PslData pslData, String ancestorName): Tests if a wildcard is found in the PSL data.
- parent(): Returns a DcvDomainName that is the immediate ancestor of this one; that is, the current domain with the leftmost part removed.
- parts(): Returns the individual components of this domain name, normalized to all lower case.
- publicSuffix: Returns the public suffix portion of the domain name, or null if no public suffix is present.
- private int publicSuffixIndex(): The index in the parts() list at which the public suffix begins.
- registrySuffix: Returns the registry suffix portion of the domain name, or null if no registry suffix is present.
- private int registrySuffixIndex(): The index in the parts() list at which the registry suffix begins.
- topDomainUnderRegistrySuffix: Returns the portion of this domain name that is one level beneath the registry suffix.
- topPrivateDomain: Returns the portion of this domain name that is one level beneath the public suffix.
- toString(): Returns the domain name, normalized to all lower case.
- private static boolean validatePart(String part, boolean isFinalPart): Helper method for validateSyntax(List).
- (package private) static boolean validateSyntax(List<String> parts): Validation method used by from to ensure that the domain name is syntactically valid according to RFC 1035.
-
Field Details
-
DOTS_REGEX
The regular expression for the separators between domain parts. This includes the ASCII period ('.') and the Unicode forms '。' (U+3002), '．' (U+FF0E), and '｡' (U+FF61).
-
NO_SUFFIX_FOUND
private static final int NO_SUFFIX_FOUND
Value of publicSuffixIndex() or registrySuffixIndex() which indicates that no relevant suffix was found.
-
SUFFIX_NOT_INITIALIZED
private static final int SUFFIX_NOT_INITIALIZED
Value of publicSuffixIndexCache or registrySuffixIndexCache which indicates that they were not initialized yet.
-
MAX_PARTS
private static final int MAX_PARTS
Maximum parts (labels) in a domain name. This value arises from the 255-octet limit described in RFC 2181 part 11, combined with the fact that the encoding of each part occupies at least two bytes (dot plus label externally, length byte plus label internally). Thus, if all labels have the minimum size of one byte, 127 of them will fit.
-
MAX_LENGTH
private static final int MAX_LENGTH
Maximum length of a full domain name, including separators, and leaving room for the root label. See RFC 2181 part 11.
-
MAX_DOMAIN_PART_LENGTH
private static final int MAX_DOMAIN_PART_LENGTH
Maximum size of a single part of a domain name. See RFC 2181 part 11.
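The three limits can be related with a short sketch. The numeric values are assumptions derived from the RFC 2181 limits quoted above (63 octets per label, 255 octets per name on the wire); this page does not show the constants' actual values, and the class name LengthLimits is hypothetical.

```java
final class LengthLimits {
    // Assumed values; the library's own constants are not shown on this page.
    static final int WIRE_LIMIT_OCTETS = 255;
    static final int MAX_DOMAIN_PART_LENGTH = 63;
    // Each label costs at least two octets, so at most 255 / 2 labels fit.
    static final int MAX_PARTS = WIRE_LIMIT_OCTETS / 2;
    // Text form omits the first length octet and the root label.
    static final int MAX_LENGTH = WIRE_LIMIT_OCTETS - 2;

    // Checks only the length limits, assuming an ASCII name with '.' separators.
    static boolean withinLimits(String name) {
        if (name.isEmpty() || name.length() > MAX_LENGTH) {
            return false;
        }
        String[] parts = name.split("\\.", -1); // -1 keeps empty trailing labels
        if (parts.length > MAX_PARTS) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty() || part.length() > MAX_DOMAIN_PART_LENGTH) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, 127 one-character labels joined by dots give a 253-character name, which is the largest name that passes.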
-
name
The full domain name, converted to lower case. -
parts
The parts of the domain name, converted to lower case. -
publicSuffixIndexCache
private int publicSuffixIndexCache
Cached value of publicSuffixIndex(). Do not use directly.
Since this field isn't volatile, if an instance of this class is shared across threads before it is initialized, then each thread is likely to compute its own copy of the value.
-
registrySuffixIndexCache
private int registrySuffixIndexCache
Cached value of registrySuffixIndex(). Do not use directly.
Since this field isn't volatile, if an instance of this class is shared across threads before it is initialized, then each thread is likely to compute its own copy of the value.
-
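The non-volatile caching described for the two cache fields resembles the common lazy-initialization idiom sketched below. The class LazyIndexHolder, its sentinel value, and the computeCalls counter are hypothetical and exist only to make the behaviour observable; this is not the class's actual implementation.

```java
final class LazyIndexHolder {
    private static final int NOT_INITIALIZED = -2; // hypothetical sentinel value
    private final String text;
    private int cachedIndex = NOT_INITIALIZED;     // deliberately not volatile
    int computeCalls = 0;                          // instrumentation for this sketch only

    LazyIndexHolder(String text) {
        this.text = text;
    }

    int lastDotIndex() {
        int local = cachedIndex; // read the field once
        if (local == NOT_INITIALIZED) {
            local = compute();
            cachedIndex = local; // racing threads would store the same value
        }
        return local;
    }

    private int compute() {
        computeCalls++;
        return text.lastIndexOf('.'); // -1 when there is no dot
    }
}
```

The idiom is safe without synchronization only because the computation is deterministic and the cached value is an int, so a duplicate computation costs time but cannot produce a different result.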
DASH_REGEX
The regex to test for different dashes.
-
ASCII_PATTERN
The Regex to test for ASCII characters
-
-
Constructor Details
-
DcvDomainName
DcvDomainName(String name)
Constructor used to implement from(String), and from subclasses.
- Parameters:
name - the domain name
-
DcvDomainName
Internal constructor that skips validations when creating an instance from parts of an already-validated DcvDomainName.
- Parameters:
name - the domain name
parts - the parts of the domain name
-
-
Method Details
-
publicSuffixIndex
private int publicSuffixIndex()
The index in the parts() list at which the public suffix begins. For example, for the domain name myblog.blogspot.co.uk, the value would be 1 (the index of the blogspot part). The value is negative (specifically, NO_SUFFIX_FOUND) if no public suffix was found.
- Returns:
- the index of the leftmost part of the suffix, or -1 if not found
-
registrySuffixIndex
private int registrySuffixIndex()
The index in the parts() list at which the registry suffix begins. For example, for the domain name myblog.blogspot.co.uk, the value would be 2 (the index of the co part). The value is negative (specifically, NO_SUFFIX_FOUND) if no registry suffix was found.
- Returns:
- the index of the leftmost part of the suffix, or -1 if not found
-
findSuffixOfType
Returns the index of the leftmost part of the suffix, or -1 if not found. Note that the value defined as a suffix may not produce true results from isPublicSuffix() or isRegistrySuffix() if the domain ends with an excluded domain pattern such as "nhs.uk".
If a desiredType is specified, this method only finds suffixes of the given type. Otherwise, it finds the first suffix of any type.
- Parameters:
desiredType - the desired type of suffix to find
- Returns:
- the index of the leftmost part of the suffix, or -1 if not found
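A minimal sketch of a leftmost-suffix search follows, reproducing the myblog.blogspot.co.uk index examples given for publicSuffixIndex() and registrySuffixIndex(). It assumes two small hand-written suffix sets in place of the PSL data and ignores wildcard and exception rules; the class SuffixIndexSketch and its members are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SuffixIndexSketch {
    static final int NO_SUFFIX_FOUND = -1;

    // Toy suffix sets (assumptions); the real data comes from the Public Suffix List.
    static final Set<String> TOY_REGISTRY =
            new HashSet<>(Arrays.asList("com", "uk", "co.uk"));
    static final Set<String> TOY_PUBLIC =
            new HashSet<>(Arrays.asList("com", "uk", "co.uk", "blogspot.co.uk"));

    // Scans from the full name rightward and returns the first index whose
    // remaining parts, joined with dots, appear in the suffix set.
    static int findSuffixIndex(List<String> parts, Set<String> suffixes) {
        for (int i = 0; i < parts.size(); i++) {
            String candidate = String.join(".", parts.subList(i, parts.size()));
            if (suffixes.contains(candidate)) {
                return i;
            }
        }
        return NO_SUFFIX_FOUND;
    }
}
```

Scanning from the left means the longest matching suffix wins, which is why the public suffix index (1) is smaller than the registry suffix index (2) for the same name.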
-
isSuffixFound
private static boolean isSuffixFound(PublicSuffixType desiredType, PslData pslData, String ancestorName)
Tests if the desired type of suffix is found in the PSL data.
- Parameters:
desiredType - the desired type of suffix to find
pslData - the PSL data to use
ancestorName - the name of the ancestor domain
- Returns:
- boolean indicating if the suffix was found
-
isExceptionFound
Tests if the ancestor name is an exception in the PSL data.
- Parameters:
pslData - the PSL data to use
ancestorName - the name of the ancestor domain
- Returns:
- boolean indicating if the ancestor name is an exception
-
isWildcardFound
private static boolean isWildcardFound(PublicSuffixType desiredType, PslData pslData, String ancestorName)
Tests if a wildcard is found in the PSL data.
- Parameters:
desiredType - the desired type of suffix to find
pslData - the PSL data to use
ancestorName - the name of the ancestor domain
- Returns:
- boolean indicating if a wildcard was found
-
from
Returns an instance of DcvDomainName after lenient validation. Specifically, validation against RFC 3490 ("Internationalizing Domain Names in Applications") is skipped, while validation against RFC 1035 is relaxed in the following ways:
- Any part containing non-ASCII characters is considered valid.
- Underscores ('_') are permitted wherever dashes ('-') are permitted.
- Parts other than the final part may start with a digit, as mandated by RFC 1123.
- Parameters:
domain - A domain name (not IP address)
- Returns:
- An instance of DcvDomainName for the given domain
- Since:
- 10.0 (previously named fromLenient)
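One possible reading of the relaxations listed for from is sketched below as a per-label check. It is not the library's validation code: the class LenientPartCheck is hypothetical, and the 63-character limit, the decision to skip non-ASCII characters rather than inspect them, and the rule that a label may not start or end with a dash or underscore are all assumptions.

```java
final class LenientPartCheck {
    static final int MAX_PART_LENGTH = 63; // assumed label limit

    static boolean isDashLike(char c) {
        return c == '-' || c == '_'; // underscores allowed wherever dashes are
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static boolean isValidPart(String part, boolean isFinalPart) {
        if (part.isEmpty() || part.length() > MAX_PART_LENGTH) {
            return false;
        }
        for (int i = 0; i < part.length(); i++) {
            char c = part.charAt(i);
            if (c > 127) {
                continue; // non-ASCII characters are accepted without inspection
            }
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (!letter && !isAsciiDigit(c) && !isDashLike(c)) {
                return false;
            }
        }
        // Assumed rule: no dash-like character at either end of a label.
        if (isDashLike(part.charAt(0)) || isDashLike(part.charAt(part.length() - 1))) {
            return false;
        }
        // Only the final (rightmost) part is barred from starting with a digit.
        return !(isFinalPart && isAsciiDigit(part.charAt(0)));
    }
}
```

Under this reading, "3com" is accepted as a non-final label but rejected as the final one.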
-
validateSyntax
Validation method used by from to ensure that the domain name is syntactically valid according to RFC 1035.
- Parameters:
parts - The parts of the domain name
- Returns:
- Is the domain name syntactically valid?
-
validatePart
Helper method for validateSyntax(List). Validates that one part of a domain name is valid.
- Parameters:
part - The domain name part to be validated
isFinalPart - Is this the final (rightmost) domain part?
- Returns:
- Whether the part is valid
-
parts
Returns the individual components of this domain name, normalized to all lower case. For example, for the domain name mail.google.com, this method returns the list ["mail", "google", "com"].
- Returns:
- A list of the individual components of this domain name
-
isPublicSuffix
public boolean isPublicSuffix()
Indicates whether this domain name represents a public suffix, as defined by the Mozilla Foundation's Public Suffix List (PSL). A public suffix is one under which Internet users can directly register names, such as com, co.uk or pvt.k12.wy.us. Examples of domain names that are not public suffixes include google.com, foo.co.uk, and myblog.blogspot.com.
Public suffixes are a proper superset of registry suffixes. The list of public suffixes additionally contains privately owned domain names under which Internet users can register subdomains. An example of a public suffix that is not a registry suffix is blogspot.com. Note that it is true that all public suffixes have registry suffixes, since domain name registries collectively control all internet domain names.
For considerations on whether the public suffix or registry suffix designation is more suitable for your application, see this article.
- Returns:
- true if this domain name appears exactly on the public suffix list
- Since:
- 6.0
-
hasPublicSuffix
public boolean hasPublicSuffix()
Indicates whether this domain name ends in a public suffix, including if it is a public suffix itself. For example, returns true for www.google.com, foo.co.uk and com, but not for invalid or google.invalid. This is the recommended method for determining whether a domain is potentially an addressable host.
Note that this method is equivalent to hasRegistrySuffix() because all registry suffixes are public suffixes and all public suffixes have registry suffixes.
- Returns:
- true if this domain name ends in a public suffix
- Since:
- 6.0
-
publicSuffix
Returns the public suffix portion of the domain name, or null if no public suffix is present.
- Returns:
- the public suffix of the domain name, or null if no public suffix is present
- Since:
- 6.0
-
isUnderPublicSuffix
public boolean isUnderPublicSuffix()
Indicates whether this domain name ends in a public suffix, while not being a public suffix itself. For example, returns true for www.google.com, foo.co.uk and myblog.blogspot.com, but not for com, co.uk, google.invalid, or blogspot.com.
This method can be used to determine whether it will probably be possible to set cookies on the domain, though even that depends on individual browsers' implementations of cookie controls. See RFC 2109 for details.
- Returns:
- true if the domain name ends in a public suffix, but is not a public suffix
- Since:
- 6.0
-
isTopPrivateDomain
public boolean isTopPrivateDomain()
Indicates whether this domain name is composed of exactly one subdomain component followed by a public suffix. For example, returns true for google.com, foo.co.uk, and myblog.blogspot.com, but not for www.google.com, co.uk, or blogspot.com.
This method can be used to determine whether a domain is probably the highest level for which cookies may be set, though even that depends on individual browsers' implementations of cookie controls. See RFC 2109 for details.
- Returns:
- true if the domain name is a top private domain
- Since:
- 6.0
-
topPrivateDomain
Returns the portion of this domain name that is one level beneath the public suffix. For example, for x.adwords.google.co.uk it returns google.co.uk, since co.uk is a public suffix. Similarly, for myblog.blogspot.com it returns the same domain, myblog.blogspot.com, since blogspot.com is a public suffix.
If isTopPrivateDomain() is true, the current domain name instance is returned.
This method can be used to determine the probable highest level parent domain for which cookies may be set, though even that depends on individual browsers' implementations of cookie controls.
- Returns:
- the top private domain of the domain name
- Since:
- 6.0
-
isRegistrySuffix
public boolean isRegistrySuffix()
Indicates whether this domain name represents a registry suffix, as defined by a subset of the Mozilla Foundation's Public Suffix List (PSL). A registry suffix is one under which Internet users can directly register names via a domain name registrar, and have such registrations lawfully protected by internet-governing bodies such as ICANN. Examples of registry suffixes include com, co.uk, and pvt.k12.wy.us. Examples of domain names that are not registry suffixes include google.com and foo.co.uk.
Registry suffixes are a proper subset of public suffixes. The list of public suffixes additionally contains privately owned domain names under which Internet users can register subdomains. An example of a public suffix that is not a registry suffix is blogspot.com. Note that it is true that all public suffixes have registry suffixes, since domain name registries collectively control all internet domain names.
For considerations on whether the public suffix or registry suffix designation is more suitable for your application, see this article.
- Returns:
- true if this domain name appears exactly on the public suffix list as part of the registry suffix section (labelled "ICANN").
- Since:
- 23.3
-
hasRegistrySuffix
public boolean hasRegistrySuffix()
Indicates whether this domain name ends in a registry suffix, including if it is a registry suffix itself. For example, returns true for www.google.com, foo.co.uk and com, but not for invalid or google.invalid.
Note that this method is equivalent to hasPublicSuffix() because all registry suffixes are public suffixes and all public suffixes have registry suffixes.
- Returns:
- true if this domain name ends in a registry suffix
- Since:
- 23.3
-
registrySuffix
Returns the registry suffix portion of the domain name, or null if no registry suffix is present.
- Returns:
- The domain name of the registry suffix, or null if no registry suffix is present
- Since:
- 23.3
-
isUnderRegistrySuffix
public boolean isUnderRegistrySuffix()
Indicates whether this domain name ends in a registry suffix, while not being a registry suffix itself. For example, returns true for www.google.com, foo.co.uk and blogspot.com, but not for com, co.uk, or google.invalid.
- Returns:
- true if the domain name ends in a registry suffix, but is not a registry suffix
- Since:
- 23.3
-
isTopDomainUnderRegistrySuffix
public boolean isTopDomainUnderRegistrySuffix()
Indicates whether this domain name is composed of exactly one subdomain component followed by a registry suffix. For example, returns true for google.com, foo.co.uk, and blogspot.com, but not for www.google.com, co.uk, or myblog.blogspot.com.
Warning: This method should not be used to determine whether a domain is probably the highest level for which cookies may be set. Use isTopPrivateDomain() for that purpose.
- Returns:
- true if the domain name is a top domain under a registry suffix
- Since:
- 23.3
-
topDomainUnderRegistrySuffix
Returns the portion of this domain name that is one level beneath the registry suffix. For example, for x.adwords.google.co.uk it returns google.co.uk, since co.uk is a registry suffix. Similarly, for myblog.blogspot.com it returns blogspot.com, since com is a registry suffix.
If isTopDomainUnderRegistrySuffix() is true, the current domain name instance is returned.
Warning: This method should not be used to determine the probable highest level parent domain for which cookies may be set. Use topPrivateDomain() for that purpose.
- Returns:
- the top domain under the registry suffix of the domain name
- Since:
- 23.3
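The contrast between the domain beneath a public suffix and the domain beneath a registry suffix can be sketched with toy suffix sets. The class TopDomainSketch, its method, and both sets are assumptions for illustration; returning null for names that are not under a suffix is a simplification of this sketch, not documented behaviour of the class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TopDomainSketch {
    // Toy suffix sets (assumptions); blogspot.com is public but not a registry suffix.
    static final Set<String> TOY_REGISTRY =
            new HashSet<>(Arrays.asList("com", "uk", "co.uk"));
    static final Set<String> TOY_PUBLIC =
            new HashSet<>(Arrays.asList("com", "uk", "co.uk", "blogspot.com"));

    // Returns the name made of the leftmost matching suffix plus one more label,
    // or null if the domain is itself a suffix or has no suffix at all.
    static String oneLevelBeneath(String domain, Set<String> suffixes) {
        List<String> parts = Arrays.asList(domain.split("\\."));
        for (int i = 0; i < parts.size(); i++) {
            String candidate = String.join(".", parts.subList(i, parts.size()));
            if (suffixes.contains(candidate)) {
                return i == 0 ? null : String.join(".", parts.subList(i - 1, parts.size()));
            }
        }
        return null;
    }
}
```

For myblog.blogspot.com the public set yields myblog.blogspot.com while the registry set yields blogspot.com, which is why the registry-suffix variant is unsuitable for cookie scoping.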
-
hasParent
public boolean hasParent()
Indicates whether this domain is composed of two or more parts.
- Returns:
- true if the domain has a parent
-
parent
Returns a DcvDomainName that is the immediate ancestor of this one; that is, the current domain with the leftmost part removed. For example, the parent of www.google.com is google.com.
- Returns:
- the immediate ancestor of this domain
-
ancestor
Returns the ancestor of the current domain at the given number of levels "higher" (rightward) in the subdomain list. The number of levels must be non-negative, and less than N-1, where N is the number of parts in the domain.
- Parameters:
levels - the number of levels to ascend
- Returns:
- the ancestor of the domain at the given number of levels higher
-
child
Creates and returns a new DcvDomainName by prepending the argument and a dot to the current name. For example, DcvDomainName.from("foo.com").child("www.bar") returns a new DcvDomainName with the value www.bar.foo.com. Only lenient validation is performed, as described here.
- Parameters:
leftParts - the parts to prepend to the current domain name
- Returns:
- a new DcvDomainName with the combined parts
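The prepend operation, together with the parent and ancestor navigation it mirrors, can be sketched on plain strings. NameNavigation and its methods are hypothetical stand-ins rather than the class's implementation; the bounds check only requires that at least one label remains, which is this sketch's own choice.

```java
import java.util.Arrays;
import java.util.List;

final class NameNavigation {
    // Drops the given number of leftmost labels; levels = 1 yields the parent.
    static String ancestor(String name, int levels) {
        List<String> parts = Arrays.asList(name.split("\\."));
        if (levels < 0 || levels >= parts.size()) {
            throw new IllegalArgumentException("levels out of range: " + levels);
        }
        return String.join(".", parts.subList(levels, parts.size()));
    }

    // Prepends the new labels and a dot to the existing name.
    static String child(String name, String leftParts) {
        return leftParts + "." + name;
    }
}
```

Because child may add several labels at once, undoing child("foo.com", "www.bar") takes ancestor(..., 2) rather than a single parent step.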
-
isValid
Indicates whether the argument is a syntactically valid domain name using lenient validation. Specifically, validation against RFC 3490 ("Internationalizing Domain Names in Applications") is skipped.
The following two code snippets are equivalent:
domainName = DcvDomainName.isValid(name) ? DcvDomainName.from(name) : DEFAULT_DOMAIN;
try {
  domainName = DcvDomainName.from(name);
} catch (IllegalArgumentException e) {
  domainName = DEFAULT_DOMAIN;
}
- Parameters:
name - the domain name to validate
- Returns:
- true if the argument is a syntactically valid domain name
- Since:
- 8.0 (previously named isValidLenient)
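The equivalence of the check-first and catch-the-exception forms can be demonstrated with a self-contained stand-in. ValidOrDefault, its deliberately simple validation rule (no empty labels), and the DEFAULT_DOMAIN value are hypothetical; only the relationship between the two forms is the point.

```java
import java.util.Locale;

final class ValidOrDefault {
    static final String DEFAULT_DOMAIN = "example.com"; // hypothetical default

    // Stand-in rule: non-empty input with no empty labels. Not the library's rules.
    static boolean isValid(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (String part : name.split("\\.", -1)) {
            if (part.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Stand-in factory: throws on invalid input, otherwise returns the lowercased name.
    static String from(String name) {
        if (!isValid(name)) {
            throw new IllegalArgumentException("Not a valid domain name: '" + name + "'");
        }
        return name.toLowerCase(Locale.ROOT);
    }

    static String viaCheck(String name) {
        return isValid(name) ? from(name) : DEFAULT_DOMAIN;
    }

    static String viaCatch(String name) {
        try {
            return from(name);
        } catch (IllegalArgumentException e) {
            return DEFAULT_DOMAIN;
        }
    }
}
```

The check-first form validates twice on the success path, so the try/catch form may be preferable when most inputs are expected to be valid.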
-
dotSplit
Splits a domain name into its component parts.
- Parameters:
name - the domain name
- Returns:
- the parts of the domain name in a list of strings
-
dotJoin
Joins the components of a domain name with dots.
- Parameters:
name - the components of a domain name
- Returns:
- the domain name as a string
-
checkArgument
Ensures the truth of an expression involving one or more parameters to the calling method.
- Parameters:
expression - a boolean expression
errorMessage - the exception message to use if the check fails; will be converted to a string using String.valueOf(Object)
- Throws:
IllegalArgumentException - if expression is false
-
checkArgument
Ensures the truth of an expression involving one or more parameters to the calling method.
- Parameters:
expression - The expression to check
errorMessageTemplate - A template for the exception message should the check fail.
p1 - The argument to be substituted into the message template.
- Since:
- 20.0 (varargs overload since 2.0)
-
checkNotNull
public static <T> T checkNotNull(T reference)
Checks that the specified object reference is not null.
- Type Parameters:
T - the type of the reference
- Parameters:
reference - an object reference
- Returns:
- the non-null reference that was validated
- Throws:
NullPointerException - if reference is null
-
checkState
public static void checkState(boolean expression, String errorMessageTemplate, Object errorMessageArg)
Ensures the truth of an expression involving the state of the calling instance, but not involving any parameters to the calling method.
- Parameters:
expression - a boolean expression
errorMessageTemplate - a template for the exception message should the check fail. The message is formed by replacing the %s placeholder in the template with the argument. An unmatched argument will be appended to the formatted message in square braces. Unmatched placeholders will be left as-is.
errorMessageArg - the argument to be substituted into the message template. The argument is converted to a string using String.valueOf(Object).
- Throws:
IllegalStateException - if expression is false
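The template behaviour described for errorMessageTemplate can be sketched for the single-argument case. TemplateChecks and its format helper are hypothetical and only approximate the described rules: the first %s is replaced by the argument, and an argument with no placeholder is appended in square braces.

```java
final class TemplateChecks {
    // Substitutes the argument for the first %s, or appends it in square braces.
    static String format(String template, Object arg) {
        String text = String.valueOf(arg);
        int at = template.indexOf("%s");
        if (at < 0) {
            return template + " [" + text + "]";
        }
        // Any further %s placeholders are left as-is.
        return template.substring(0, at) + text + template.substring(at + 2);
    }

    static void checkState(boolean expression, String template, Object arg) {
        if (!expression) {
            throw new IllegalStateException(format(template, arg));
        }
    }
}
```

Plain substring replacement avoids String.format, so a stray percent sign in the template cannot itself raise a formatting exception while an error is being reported.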
-
toString
Returns the domain name, normalized to all lower case. -
equals
Equality testing is based on the text supplied by the caller, after normalization as described in the class documentation. For example, a non-ASCII Unicode domain name and the Punycode version of the same domain name would not be considered equal. -
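Since the Unicode and Punycode spellings of a name are distinct texts, a caller who wants them to compare equal could convert both to one form first. The sketch below uses the JDK's java.net.IDN for that; IdnCanonicalForm is a hypothetical helper, not part of this class.

```java
import java.net.IDN;
import java.util.Locale;

final class IdnCanonicalForm {
    // Converts a name to its ASCII (Punycode) spelling, lowercased.
    static String toAsciiForm(String name) {
        return IDN.toASCII(name).toLowerCase(Locale.ROOT);
    }

    // Compares two names after converting both to the ASCII spelling.
    static boolean sameAfterCanonicalization(String first, String second) {
        return toAsciiForm(first).equals(toAsciiForm(second));
    }
}
```

Converting to the ASCII form before constructing the domain name object keeps equals and hashCode consistent for both spellings.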
hashCode
public int hashCode()
Returns a hash code for this domain name.
-