A model that performs epsilon-greedy exploration. It chooses a random action with probability epsilon, or the action returned by the defaultPolicy with probability 1 - epsilon. The default policy MUST return a value between 1 and the number of actions (inclusive); otherwise an exception is thrown.
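The selection rule above can be sketched in Scala as follows. This is a minimal illustration, not the library's implementation. The names `EpsilonGreedySketch`, `choose` and `Decision` are hypothetical. Exploration is assumed to pick uniformly among all actions, since the description only says "a random action". Under that assumption, the returned action has probability 1 - epsilon + epsilon / n when it equals the default action and epsilon / n otherwise. That figure only holds if the default policy is deterministic.

```scala
import scala.util.Random

// Hypothetical sketch: EpsilonGreedySketch, choose and Decision are illustrative
// names, not the library's API. Uniform exploration is an assumption.
object EpsilonGreedySketch {

  /** The selected 1-based action and the probability with which it was selected. */
  final case class Decision(action: Int, probability: Double)

  /**
   * @param defaultAction 1-based action proposed by the (deterministic) default policy
   * @param numActions    number of actions, i.e. classLabels.size
   * @param epsilon       exploration probability in [0, 1]
   * @param rng           source of randomness
   */
  def choose(defaultAction: Int, numActions: Int, epsilon: Double, rng: Random): Decision = {
    require(epsilon >= 0.0 && epsilon <= 1.0, s"epsilon must be in [0, 1], got $epsilon")
    if (defaultAction < 1 || defaultAction > numActions)
      throw new RuntimeException(s"default policy returned $defaultAction, expected 1 to $numActions")

    // Explore with probability epsilon (nextDouble is in [0, 1), so epsilon = 0
    // never explores and epsilon = 1 always explores); otherwise exploit.
    val action =
      if (rng.nextDouble() < epsilon) rng.nextInt(numActions) + 1
      else defaultAction

    // The default action is reachable by exploitation and by exploration,
    // so its probability includes both terms.
    val uniform = epsilon / numActions
    val p = if (action == defaultAction) 1.0 - epsilon + uniform else uniform
    Decision(action, p)
  }
}
```

With epsilon = 0 the default action is always returned with probability 1. With epsilon = 1 every action has probability 1 / n regardless of what the default policy proposes.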
model input type
model output type
a model identifier
the model to use for exploitation. This MUST be deterministic for the probability of the chosen action to be correct. The model must return a value in the range 1 to classLabels.size (inclusive).
the exploration/exploitation tradeoff parameter. epsilon must be in the interval [0, 1]. 0 means an action is never selected randomly; 1 means an action is always selected randomly.
a function that generates a salt for the randomization layer. This salt allows the random choice of which policy to follow to be repeatable.
a list of class labels to output for the final type. Its size also determines the number of actions. If the submodel returns a score < 1 or > classLabels.size (note the 1-based offset), a RuntimeException is thrown.
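The parameters above can be tied together in a small sketch. It rests on several assumptions made for illustration only. The class name `SaltedEpsilonGreedy`, the salt signature `A => Long`, seeding `scala.util.Random` with the salt, and uniform exploration are all hypothetical choices. The sketch shows three things: the 1-based mapping from the default policy's output to `classLabels`, the RuntimeException on an out-of-range score, and how a salt derived from the input makes the random choice repeatable.

```scala
import scala.util.Random

// Hypothetical sketch: SaltedEpsilonGreedy, the salt signature A => Long and
// uniform exploration are illustrative assumptions, not the library's API.
final class SaltedEpsilonGreedy[A, B](
    defaultPolicy: A => Int,    // deterministic; returns a 1-based action
    epsilon: Double,            // exploration probability in [0, 1]
    salt: A => Long,            // assumed: derives a seed from the input
    classLabels: IndexedSeq[B]  // its size is the number of actions
) {
  require(epsilon >= 0.0 && epsilon <= 1.0, s"epsilon must be in [0, 1], got $epsilon")
  require(classLabels.nonEmpty, "classLabels must not be empty")

  def apply(a: A): B = {
    // Validate the default policy's 1-based score even when exploring.
    val exploit = defaultPolicy(a)
    if (exploit < 1 || exploit > classLabels.size)
      throw new RuntimeException(s"score $exploit is outside 1 to ${classLabels.size}")

    // Seeding with the salt makes both the explore/exploit decision and the
    // random action repeatable for the same input.
    val rng = new Random(salt(a))
    val action =
      if (rng.nextDouble() < epsilon) rng.nextInt(classLabels.size) + 1
      else exploit

    // Convert the 1-based action to a 0-based index into classLabels.
    classLabels(action - 1)
  }
}
```

Because the generator is seeded per input, calling the model twice on the same input yields the same label. That is the repeatability the salt is meant to provide.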