public interface NCModel
A model generally defines:
getId()
getName()
getVersion()
query(NCQueryContext)
- the main method that the user implements to provide the result.
getElements()
should provide at least one user-defined element.
initialize(NCProbeContext)
- called only once during the model deployment
to initialize the model.
discard()
- called only once to discard the model during the orderly shutdown of the probe.
This method may not be called if the probe process was killed.
See NCModelAdapter or NCModelFileAdapter (the latter for loading JSON or YAML models). All JSON properties correspond to their counterparts in this interface. For example:
{
    "id": "user.defined.id",
    "name": "User Defined Name",
    "version": "1.0",
    "description": "Short model description.",
    "enabledTokens": ["google:person", "google:location"],
    "examples": [],
    "macros": [],
    "metadata": {
        "myConfig": "myProperty"
    },
    "elements": [
        {
            "id": "x:id",
            "group": "default",
            "parentId": null,
            "excludedSynonyms": [],
            "synonyms": [],
            "relations": {},
            "metadata": {},
            "values": []
        }
    ],
    "additionalStopwords": [],
    "excludedStopwords": [],
    "suspiciousWords": []
}
Note that many examples shipped with NLPCraft use external JSON or YAML model configuration.
NCModelAdapter, NCModelFileAdapter
Modifier and Type | Field and Description |
---|---|
static Set<String> | DFLT_ENABLED_TOKENS - Default set of enabled built-in tokens. |
static boolean | DFLT_IS_DUP_SYNONYMS_ALLOWED - Default value returned from isDupSynonymsAllowed() method. |
static boolean | DFLT_IS_NO_NOUNS_ALLOWED - Default value returned from isNoNounsAllowed() method. |
static boolean | DFLT_IS_NO_USER_TOKENS_ALLOWED - Default value returned from isNoUserTokensAllowed() method. |
static boolean | DFLT_IS_NON_ENGLISH_ALLOWED - Default value returned from isNonEnglishAllowed() method. |
static boolean | DFLT_IS_NOT_LATIN_CHARSET_ALLOWED - Default value returned from isNotLatinCharsetAllowed() method. |
static boolean | DFLT_IS_PERMUTATE_SYNONYMS - Default value returned from isPermutateSynonyms() method. |
static boolean | DFLT_IS_SWEAR_WORDS_ALLOWED - Default value returned from isSwearWordsAllowed() method. |
static int | DFLT_JIGGLE_FACTOR - Default value returned from getJiggleFactor() method. |
static int | DFLT_MAX_FREE_WORDS - Default value returned from getMaxFreeWords() method. |
static int | DFLT_MAX_SUSPICIOUS_WORDS - Default value returned from getMaxSuspiciousWords() method. |
static int | DFLT_MAX_TOKENS - Default value returned from getMaxTokens() method. |
static int | DFLT_MAX_TOTAL_SYNONYMS - Default value returned from getMaxTotalSynonyms() method. |
static int | DFLT_MAX_UNKNOWN_WORDS - Default value returned from getMaxUnknownWords() method. |
static int | DFLT_MAX_WORDS - Default value returned from getMaxWords() method. |
static NCMetadata | DFLT_METADATA - Default value returned from getMetadata() method. |
static int | DFLT_MIN_NON_STOPWORDS - Default value returned from getMinNonStopwords() method. |
static int | DFLT_MIN_TOKENS - Default value returned from getMinTokens() method. |
static int | DFLT_MIN_WORDS - Default value returned from getMinWords() method. |
static Function<NCQueryContext,NCQueryResult> | DFLT_QRY_FUNCTION - Default query method implementation that throws an exception. |
Modifier and Type | Method and Description |
---|---|
default void | discard() - A callback before this model instance gets discarded. |
default Set<String> | getAdditionalStopWords() - Gets an optional list of stopwords to add to the built-in ones. |
default String | getDescription() - Gets optional short model description. |
default Set<NCElement> | getElements() - Gets a set of model elements. |
default Set<String> | getEnabledTokens() - Gets set of IDs for built-in tokens that should be enabled and detected for this model. |
default Set<String> | getExamples() - Gets an optional list of example sentences demonstrating what can be asked with this model. |
default Set<String> | getExcludedStopWords() - Gets an optional list of stopwords to exclude from the built-in list of stopwords. |
String | getId() - Gets unique, immutable ID of this model. |
default int | getJiggleFactor() - Measure of how much sparsity is allowed when user input words are reordered in an attempt to match the multi-word synonyms. |
default Map<String,String> | getMacros() - Gets an optional map of macros to be used in this model. |
default int | getMaxFreeWords() - Gets maximum number of free words until automatic rejection. |
default int | getMaxSuspiciousWords() - Gets maximum number of suspicious words until automatic rejection. |
default int | getMaxTokens() - Gets maximum number of all tokens (system and user defined) above which user input will be automatically rejected as too long. |
default int | getMaxTotalSynonyms() - Total number of synonyms allowed per model. |
default int | getMaxUnknownWords() - Gets maximum number of unknown words until automatic rejection. |
default int | getMaxWords() - Gets maximum word count (including stopwords) above which user input will be automatically rejected as too long. |
default NCMetadata | getMetadata() - Gets optional user-specific model metadata that can be set by the developer and accessed later. |
default int | getMinNonStopwords() - Gets minimum word count (excluding stopwords) below which user input will be automatically rejected as an ambiguous sentence. |
default int | getMinTokens() - Gets minimum number of all tokens (system and user defined) below which user input will be automatically rejected as too short. |
default int | getMinWords() - Gets minimum word count (including stopwords) below which user input will be automatically rejected as too short. |
String | getName() - Gets descriptive name of this model. |
default NCCustomParser | getParser() - Gets optional custom user parser for model elements. |
default Set<String> | getSuspiciousWords() - Gets an optional list of suspicious words. |
String | getVersion() - Gets the version of this model using semantic versioning. |
default void | initialize(NCProbeContext probeCtx) - Probe calls this method to initialize the model when it gets deployed in the probe. |
default boolean | isDupSynonymsAllowed() - Whether or not duplicate synonyms are allowed. |
default boolean | isNonEnglishAllowed() - Whether or not to allow non-English language in user input. |
default boolean | isNoNounsAllowed() - Whether or not to allow user input without a single noun. |
default boolean | isNotLatinCharsetAllowed() - Whether or not to allow non-Latin charset in user input. |
default boolean | isNoUserTokensAllowed() - Whether or not to allow the user input with no user token detected. |
default boolean | isPermutateSynonyms() - Whether or not to permutate multi-word synonyms. |
default boolean | isSwearWordsAllowed() - Whether or not to allow known English swear words in user input. |
default NCQueryResult | query(NCQueryContext ctx) - Processes user input provided in the given query context and either returns the query result or throws an exception. |
static final int DFLT_JIGGLE_FACTOR
Default value returned from getJiggleFactor() method.
static final NCMetadata DFLT_METADATA
Default value returned from getMetadata() method.
static final int DFLT_MAX_UNKNOWN_WORDS
Default value returned from getMaxUnknownWords() method.
static final int DFLT_MAX_FREE_WORDS
Default value returned from getMaxFreeWords() method.
static final int DFLT_MAX_SUSPICIOUS_WORDS
Default value returned from getMaxSuspiciousWords() method.
static final int DFLT_MIN_WORDS
Default value returned from getMinWords() method.
static final int DFLT_MAX_WORDS
Default value returned from getMaxWords() method.
static final int DFLT_MIN_TOKENS
Default value returned from getMinTokens() method.
static final int DFLT_MAX_TOKENS
Default value returned from getMaxTokens() method.
static final int DFLT_MIN_NON_STOPWORDS
Default value returned from getMinNonStopwords() method.
static final boolean DFLT_IS_NON_ENGLISH_ALLOWED
Default value returned from isNonEnglishAllowed() method.
static final boolean DFLT_IS_NOT_LATIN_CHARSET_ALLOWED
Default value returned from isNotLatinCharsetAllowed() method.
static final boolean DFLT_IS_SWEAR_WORDS_ALLOWED
Default value returned from isSwearWordsAllowed() method.
static final boolean DFLT_IS_NO_NOUNS_ALLOWED
Default value returned from isNoNounsAllowed() method.
static final boolean DFLT_IS_PERMUTATE_SYNONYMS
Default value returned from isPermutateSynonyms() method.
static final boolean DFLT_IS_DUP_SYNONYMS_ALLOWED
Default value returned from isDupSynonymsAllowed() method.
static final int DFLT_MAX_TOTAL_SYNONYMS
Default value returned from getMaxTotalSynonyms() method.
static final boolean DFLT_IS_NO_USER_TOKENS_ALLOWED
Default value returned from isNoUserTokensAllowed() method.
static final Function<NCQueryContext,NCQueryResult> DFLT_QRY_FUNCTION
Default query method implementation that throws an exception.
String getId()
Note that model IDs are immutable while name and version can be changed freely. Changing model ID is equal to creating a completely new model. Model IDs (unlike name and version) are not exposed to the end user and only serve a technical purpose. ID's max length is 32 characters.
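The 32-character limit can be checked before a model is deployed. The following is a minimal sketch only: `ModelIdCheck` and `isValidModelId` are hypothetical names that are not part of NLPCraft, and rejecting blank IDs is an assumption of this example.

```java
// Hypothetical helper for illustration; not part of the NLPCraft API.
final class ModelIdCheck {
    // Maximum ID length stated in the getId() documentation.
    static final int MAX_ID_LENGTH = 32;

    // Returns true if the ID is non-null, non-blank and at most 32 characters long.
    // Rejecting blank IDs is an assumption of this sketch.
    static boolean isValidModelId(String id) {
        return id != null && !id.trim().isEmpty() && id.length() <= MAX_ID_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(isValidModelId("my.model.id"));
        System.out.println(isValidModelId("a.model.id.that.is.definitely.longer.than.allowed"));
    }
}
```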
JSON
If using JSON/YAML model presentation this is set by id
property:
{ "id": "my.model.id" }
String getName()
JSON
If using JSON/YAML model presentation this is set by name
property:
{ "name": "My Model" }
String getVersion()
JSON
If using JSON/YAML model presentation this is set by version
property:
{ "version": "1.0.0" }
default String getDescription()
JSON
If using JSON/YAML model presentation this is set by description
property:
{ "description": "Model description..." }
default int getMaxUnknownWords()
Default
If not provided by the model the default value DFLT_MAX_UNKNOWN_WORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxUnknownWords
property:
{ "maxUnknownWords": 2 }
default int getMaxFreeWords()
Default
If not provided by the model the default value DFLT_MAX_FREE_WORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxFreeWords
property:
{ "maxFreeWords": 2 }
default int getMaxSuspiciousWords()
Default
If not provided by the model the default value DFLT_MAX_SUSPICIOUS_WORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxSuspiciousWords
property:
{ "maxSuspiciousWords": 2 }
default int getMinWords()
Default
If not provided by the model the default value DFLT_MIN_WORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by minWords
property:
{ "minWords": 2 }
default int getMaxWords()
Default
If not provided by the model the default value DFLT_MAX_WORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxWords
property:
{ "maxWords": 50 }
default int getMinTokens()
Default
If not provided by the model the default value DFLT_MIN_TOKENS
will be used.
JSON
If using JSON/YAML model presentation this is set by minTokens
property:
{ "minTokens": 1 }
default int getMaxTokens()
Default
If not provided by the model the default value DFLT_MAX_TOKENS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxTokens
property:
{ "maxTokens": 100 }
default int getMinNonStopwords()
Default
If not provided by the model the default value DFLT_MIN_NON_STOPWORDS
will be used.
JSON
If using JSON/YAML model presentation this is set by minNonStopwords
property:
{ "minNonStopwords": 2 }
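To make the word-count thresholds concrete, the sketch below applies minWords, maxWords and minNonStopwords to a sentence. It is an illustration under stated assumptions, not NLPCraft's rejection logic: words are split on whitespace, `STOPWORDS` is a tiny hypothetical stand-in for the built-in list, and the class and method names are invented for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Illustrative only: NLPCraft's real tokenization and rejection logic are not shown here.
final class WordLimitSketch {
    // Hypothetical stand-in for the built-in stopword list.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "a", "is", "in", "of"));

    // Returns a rejection reason, or an empty Optional if the input passes all three checks.
    static Optional<String> check(String input, int minWords, int maxWords, int minNonStopwords) {
        String trimmed = input.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.toLowerCase().split("\\s+");
        long nonStop = Arrays.stream(words).filter(w -> !STOPWORDS.contains(w)).count();

        if (words.length < minWords) return Optional.of("too short");
        if (words.length > maxWords) return Optional.of("too long");
        if (nonStop < minNonStopwords) return Optional.of("ambiguous");
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(check("weather", 2, 50, 2));
        System.out.println(check("what is the weather in London", 2, 50, 2));
    }
}
```

The thresholds are passed as parameters here so that the values from the JSON examples (2, 50, 2) can be tried directly.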
default boolean isNonEnglishAllowed()
Default
If not provided by the model the default value DFLT_IS_NON_ENGLISH_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by nonEnglishAllowed
property:
{ "nonEnglishAllowed": false }
default boolean isNotLatinCharsetAllowed()
Whether or not to allow non-Latin charset in user input. If false, such user input will be automatically rejected.
Default
If not provided by the model the default value DFLT_IS_NOT_LATIN_CHARSET_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by nonLatinCharsetAllowed
property:
{ "nonLatinCharsetAllowed": false }
default boolean isSwearWordsAllowed()
Whether or not to allow known English swear words in user input. If false, user input with
detected known English swear words will be automatically rejected.
Default
If not provided by the model the default value DFLT_IS_SWEAR_WORDS_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by swearWordsAllowed
property:
{ "swearWordsAllowed": false }
default boolean isNoNounsAllowed()
Whether or not to allow user input without a single noun. If false, such user input
will be automatically rejected. Typically, for command or query-oriented models this should be set to
false, as any command or query should have at least one noun subject. However, for conversational
models this can be set to true to allow for small talk and one-liners.
Default
If not provided by the model the default value DFLT_IS_NO_NOUNS_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by noNounsAllowed
property:
{ "noNounsAllowed": false }
default boolean isPermutateSynonyms()
Default
If not provided by the model the default value DFLT_IS_PERMUTATE_SYNONYMS
will be used.
JSON
If using JSON/YAML model presentation this is set by permutateSynonyms
property:
{ "permutateSynonyms": true }
default boolean isDupSynonymsAllowed()
Whether or not duplicate synonyms are allowed. If true, the model will pick a random
model element when multiple elements are found due to duplicate synonyms. If false, the model
will print an error message and will not deploy.
Default
If not provided by the model the default value DFLT_IS_DUP_SYNONYMS_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by dupSynonymsAllowed
property:
{ "dupSynonymsAllowed": true }
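A deployment-time check for duplicate synonyms might look like the sketch below. It assumes a simplified view of the model (a map from element ID to its synonyms instead of NCElement objects) and case-insensitive comparison; `DupSynonymSketch` and its methods are hypothetical names, and the random element selection that applies when duplicates are allowed is not modeled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: a simplified stand-in for the duplicate synonym check.
final class DupSynonymSketch {
    // Maps each synonym declared by more than one element to the IDs of those elements.
    // Case-insensitive comparison is an assumption of this sketch.
    static Map<String, Set<String>> findDuplicates(Map<String, List<String>> synonymsByElement) {
        Map<String, Set<String>> owners = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : synonymsByElement.entrySet())
            for (String syn : e.getValue())
                owners.computeIfAbsent(syn.toLowerCase(), k -> new TreeSet<>()).add(e.getKey());
        // Keep only synonyms shared by two or more elements.
        owners.values().removeIf(ids -> ids.size() < 2);
        return owners;
    }

    // Duplicates are fatal only when dupSynonymsAllowed is false.
    static void validate(Map<String, List<String>> synonymsByElement, boolean dupSynonymsAllowed) {
        Map<String, Set<String>> dups = findDuplicates(synonymsByElement);
        if (!dups.isEmpty() && !dupSynonymsAllowed)
            throw new IllegalStateException("Duplicate synonyms: " + dups);
    }

    public static void main(String[] args) {
        Map<String, List<String>> model = new LinkedHashMap<>();
        model.put("x:a", Arrays.asList("history", "past"));   // hypothetical element IDs
        model.put("x:b", Arrays.asList("History", "forecast"));
        System.out.println(findDuplicates(model));
    }
}
```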
default int getMaxTotalSynonyms()
Default
If not provided by the model the default value DFLT_MAX_TOTAL_SYNONYMS
will be used.
JSON
If using JSON/YAML model presentation this is set by maxTotalSynonyms
property:
{ "maxTotalSynonyms": 10000 }
default boolean isNoUserTokensAllowed()
Whether or not to allow the user input with no user token detected. If false, such user
input will be automatically rejected. Note that this property only applies to user-defined
tokens (i.e. model elements). Even if there are no user-defined tokens, the user input may still
contain system tokens like nlpcraft:geo or nlpcraft:date. In many cases models
should be built to allow user input without user tokens. However, set it to false if the presence
of at least one user token is mandatory.
Default
If not provided by the model the default value DFLT_IS_NO_USER_TOKENS_ALLOWED
will be used.
JSON
If using JSON/YAML model presentation this is set by noUserTokensAllowed
property:
{ "noUserTokensAllowed": false }
default int getJiggleFactor()
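Measure of how much sparsity is allowed when user input words are reordered in an attempt to
match the multi-word synonyms. A value of 2 has proved to be a good default in most cases. Note that larger
values mean that synonym words can be in almost any random place in the user input, which makes
synonym matching practically meaningless. Maximum value is 4.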
Default
If not provided by the model the default value DFLT_JIGGLE_FACTOR
will be used.
JSON
If using JSON/YAML model presentation this is set by jiggleFactor
property:
{ "jiggleFactor": 2 }
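One way to picture the jiggle factor is as a cap on how many extra words may sit between the words of a multi-word synonym. The sketch below implements that reading only: it is an assumption made for illustration, not NLPCraft's matching algorithm. `JiggleSketch` and `matches` are hypothetical names, and synonyms with repeated words are not handled.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative interpretation only; the real matching algorithm is not described in this document.
final class JiggleSketch {
    // Returns true if all synonym words occur, in any order, inside some window of the input
    // that contains at most jiggleFactor additional words.
    static boolean matches(List<String> input, List<String> synonym, int jiggleFactor) {
        int n = synonym.size();
        for (int start = 0; start + n <= input.size(); start++) {
            int maxEnd = Math.min(input.size(), start + n + jiggleFactor);
            for (int end = start + n; end <= maxEnd; end++)
                if (input.subList(start, end).containsAll(synonym))
                    return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("show", "history", "of", "the", "weather");
        List<String> synonym = Arrays.asList("weather", "history");
        System.out.println(matches(input, synonym, 2));
        System.out.println(matches(input, synonym, 1));
    }
}
```

Under this reading, a large factor lets the window grow until almost any placement of the synonym words matches, which is the effect the note on large values warns about.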
default NCMetadata getMetadata()
default Set<String> getAdditionalStopWords()
A stopword is an individual word (i.e. a sequence of characters excluding whitespace) that contributes no semantic meaning to the sentence. For example, 'the', 'wow', or 'hm' provide no semantic meaning to the sentence and can be safely excluded from semantic analysis.
NLPCraft comes with a carefully selected list of English stopwords which should be sufficient for a majority of use cases. However, you can add additional stopwords to this list. The typical use for user-defined stopwords is jargon filler words that are specific to the model's domain.
JSON
If using JSON/YAML model presentation this is set by additionalStopwords
property:
{ "additionalStopwords": [ "stopword1", "stopword2" ] }
default Set<String> getExcludedStopWords()
Just like you can add additional stopwords via getAdditionalStopWords(), you can exclude
certain words from the list of stopwords. This can be useful in rare cases when a default built-in
stopword has a specific meaning in your model. In order to process such words you need to exclude them
from the list of stopwords.
JSON
If using JSON/YAML model presentation this is set by excludedStopwords
property:
{ "excludedStopwords": [ "excludedStopword1", "excludedStopword2" ] }
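The combined effect of additionalStopwords and excludedStopwords can be sketched as plain set arithmetic. This is an illustration, not NLPCraft's implementation: `StopwordSketch` is a hypothetical name, the built-in list is a three-word stand-in, and letting exclusions win over additions when a word appears in both sets is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: effective stopwords = (built-in + additional) - excluded.
final class StopwordSketch {
    static Set<String> effectiveStopwords(Set<String> builtIn, Set<String> additional, Set<String> excluded) {
        Set<String> result = new TreeSet<>(builtIn); // sorted for stable printing
        result.addAll(additional);
        result.removeAll(excluded);                  // assumption: exclusions are applied last
        return result;
    }

    public static void main(String[] args) {
        Set<String> builtIn = new HashSet<>(Arrays.asList("the", "wow", "hm")); // stand-in list
        Set<String> additional = new HashSet<>(Arrays.asList("stopword1"));
        Set<String> excluded = new HashSet<>(Arrays.asList("the"));
        System.out.println(effectiveStopwords(builtIn, additional, excluded));
    }
}
```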
default Set<String> getExamples()
JSON
If using JSON/YAML model presentation this is set by examples
property:
{ "examples": [ "Example questions one", "Another sample sentence" ] }
default Set<String> getSuspiciousWords()
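Gets an optional list of suspicious words. User input containing any of these words is rejected
automatically when the MAX_SUSPICIOUS_WORDS property is set to zero.
Note that by setting the model's metadata MAX_SUSPICIOUS_WORDS property to a non-zero value you can
adjust the sensitivity of the suspicious words auto-rejection logic.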
JSON
If using JSON/YAML model presentation this is set by suspiciousWords
property:
{ "suspiciousWords": [ "sex", "porn" ] }
default Map<String,String> getMacros()
See NCElement for documentation on macros.
JSON
If using JSON/YAML model presentation this is set by macros
property:
{ "macros": [ { "name": "<OF>", "macro": "{of|for|per}" }, { "name": "<CUR>", "macro": "{current|present|moment|now}" } ] }
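As a rough illustration of how a macro name relates to its body, the sketch below substitutes each macro name in a synonym string with the macro's text. It assumes simple one-pass textual replacement without nested macros; the actual macro processing is documented under NCElement and may differ. `MacroSketch` and `expand` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: one-pass textual substitution of macro names by their bodies.
final class MacroSketch {
    static String expand(String synonym, Map<String, String> macros) {
        String result = synonym;
        for (Map.Entry<String, String> m : macros.entrySet())
            result = result.replace(m.getKey(), m.getValue());
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> macros = new LinkedHashMap<>();
        macros.put("<OF>", "{of|for|per}");
        macros.put("<CUR>", "{current|present|moment|now}");
        System.out.println(expand("<CUR> price <OF> item", macros));
    }
}
```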
default Set<NCElement> getElements()
An element is the main building block of the semantic model. User data model element defines an entity that will be automatically recognized in the user input either by one of its synonyms or values, or directly by its ID.
Note that unless model elements are loaded dynamically it is highly recommended to declare model
elements in the external JSON/YAML model configuration (under elements
property):
{ "elements": [ { "id": "wt:hist", "synonyms": [ "{<WEATHER>|*} <HISTORY>", "<HISTORY> {<OF>|*} <WEATHER>" ], "description": "Past weather conditions." } ] }
default void discard()
Note that if the model has important state, it is highly recommended to store that state periodically instead of relying on this method, since this method may not be called if the probe process is killed.
Default
Default implementation is a no-op.
JSON
If using JSON/YAML model presentation this method will have no-op implementation.
default void initialize(NCProbeContext probeCtx)
Default
Default implementation stores provided probe context in the metadata under __NC_PROBE_CTX
name:
default void initialize(NCProbeContext probeCtx) { getMetadata().put("__NC_PROBE_CTX", probeCtx); }
probeCtx
- Probe context.
default NCQueryResult query(NCQueryContext ctx) throws NCRejection
Processes user input provided in the given query context and either returns the query result or throws an exception. See NCIntentSolver for intent-based user input processing for a simplified way to encode that processing logic.
ctx
- Query context containing parsed user input and all associated data.
Returns the query result, which should not be null. In case of any errors this method should throw an NCRejection exception.
NCRejection
- Thrown when user input cannot be processed as is and should be rejected.
See also NCIntentSolver.
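The return-or-reject contract of query can be sketched with stand-in types. `Ctx`, `Result` and `Rejection` below are hypothetical placeholders for NCQueryContext, NCQueryResult and NCRejection, whose real APIs are richer and are not reproduced here; making `Rejection` unchecked is a simplification of this example.

```java
import java.util.function.Function;

// Illustrative only: stand-in types, not the NLPCraft classes.
final class QuerySketch {
    static final class Ctx { final String text; Ctx(String text) { this.text = text; } }
    static final class Result { final String body; Result(String body) { this.body = body; } }
    static final class Rejection extends RuntimeException { Rejection(String msg) { super(msg); } }

    // Similar in spirit to a default query function that always throws.
    static final Function<Ctx, Result> DEFAULT_QUERY = ctx -> { throw new Rejection("Not implemented."); };

    // A user-supplied implementation: either returns a result or rejects the input.
    static Result query(Ctx ctx) {
        if (ctx.text == null || ctx.text.trim().isEmpty())
            throw new Rejection("Empty input.");
        return new Result("You asked: " + ctx.text.trim());
    }

    public static void main(String[] args) {
        System.out.println(query(new Ctx("weather in London")).body);
        try {
            DEFAULT_QUERY.apply(new Ctx("anything"));
        } catch (Rejection e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```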
default Set<String> getEnabledTokens()
Default
The following built-in tokens are enabled by default implementation of this method:
nlpcraft:date
nlpcraft:geo
nlpcraft:num
nlpcraft:coordinate
nlpcraft:function
See NCToken for the list of all supported built-in tokens.
JSON
If using JSON/YAML model presentation this is set by enabledTokens
property:
{ "enabledTokens": [ "google:person", "google:location", "stanford:money" ] }
default NCCustomParser getParser()
By default the semantic data model detects its elements by their synonyms declared in the model. However, in some cases the synonyms (or the regular expressions) are simply not expressive enough. In such cases, a user-defined custom parser can be defined for the model that would allow the user to define its own logic to detect the model elements in the user input programmatically. Note that there can be only one custom parser per model and it can detect any number of model elements.
Returns the custom user parser for model elements, or null if not used (default).
Copyright © 2013-2019 NLPCraft Project. All rights reserved.