Holds attributes of an Anomaly Detection Model.
Anomaly detection (also outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a data set.
The Association Rule model represents rules where some set of items is associated with another set of items.
Association rules have the form "<antecedent itemset> => <consequent itemset>".
Input attributes for each scorecard characteristic are defined in terms of predicates.
For a discrete field, each BayesInput contains the counts pairing the discrete values of that field with those of the target field.
Contains several BayesInput elements.
Contains the counts associated with the values of the target field.
Defines the point allocation strategy for each scorecard characteristic (numeric or categorical).
Envelope for all scorecard characteristics.
A cluster is defined by its center vector or by statistics.
A cluster model basically consists of a set of clusters.
Coefficient α_i is described
Used to store the support vector coefficients α_i and b.
Comparisons is a matrix which contains the similarity values or distance values, depending on the attribute modelClass in ClusteringModel.
Defines ComplexPartialScore; the actual partial score is the value returned by the EXPRESSION (see org.pmml4s.transformations for more information).
CompoundRule consists of a predicate and one or more rules.
Defines the connections coming into that parent element.
Stores coordinate-by-coordinate variances (diagonal cells) and covariances (non-diagonal cells).
List of covariate names.
DataModel is a container for all metadata; it is the parent model of all predictive models.
Model Composition
List of factor (categorical predictor) names.
Definition of a general regression model.
Serves as an envelope for all the fields included in the training instances.
The id of an Item must be unique.
Item references point to elements of type Item.
Encapsulates several KNNInput elements which define the fields used to query the k-NN model, one KNNInput element per field.
The element KohonenMap is appropriate for clustering models that were produced by a Kohonen map algorithm.
Linear basis functions which lead to a hyperplane as classifier.
Contains an array of non-negative real values; it is required when the algorithm type is clusterMeanDist.
The element MiningModel allows precise specification of the usage of multiple models within one PMML file.
MissingValueWeights is used to adjust distance or similarity measures for missing data.
Abstract class that represents a PMML model.
Naïve Bayes uses Bayes' Theorem, combined with a ("naive") presumption of conditional independence, to predict the value of a target (output), from evidence given by one or more predictor (input) fields.
k-Nearest Neighbors (k-NN) is an instance-based learning algorithm.
Defines how input fields are normalized so that the values can be processed in the neural network.
An input neuron represents the normalized value for an input field.
A neural network has one or more input nodes and one or more neurons.
Defines how the output of the neural network must be interpreted.
Contains an identifier id which must be unique in all layers.
This element is an encapsulation for either defining a split or a leaf in a tree model.
Cell in the ParamMatrix.
Matrix of Parameter estimate covariances.
Cell in the PPMatrix.
Predictor-to-Parameter correlation matrix.
PairCounts lists, for a field I_i's discrete value I_ij, the TargetValueCounts that pair the value I_ij with each value of the target field.
Parameter matrix.
Each Parameter contains a required name and optional label.
Lists all Parameters.
Polynomial basis functions which lead to a polynomial classifier.
Describes a categorical (factor) or a continuous (covariate) predictor for the model.
Radial basis functions, the most common kernel type: K(x,y) = exp(-gamma * ||x - y||^2)
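To make the radial basis kernel concrete, here is a minimal Python sketch that evaluates K(x,y) = exp(-gamma * ||x - y||^2) for two dense vectors. The function name `rbf_kernel`, the default `gamma`, and the plain-list vector representation are assumptions made for illustration; they are not part of PMML or of any library API.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Evaluate exp(-gamma * ||x - y||^2) for two equal-length vectors.

    Hypothetical helper for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Squared Euclidean distance between x and y.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; a larger gamma makes the decay faster.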
The regression functions are used to determine the relationship between the dependent variable (target field) and one or more independent variables.
Describes how rules are selected to apply the model to a new case.
Ruleset models can be thought of as flattened decision tree models.
A data mining model contains a set of input fields which are used to predict a certain target value.
Holds attributes of a Scorecard.
Sigmoid kernel functions for some models of Neural Network type: K(x,y) = tanh(gamma * <x,y> + coef0)
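The sigmoid kernel formula can be sketched in a few lines of Python, where <x,y> is the inner product of the two vectors. The function name `sigmoid_kernel`, the default parameter values, and the plain-list vector representation are assumptions made for illustration; they are not part of PMML or of any library API.

```python
import math


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """Evaluate tanh(gamma * <x, y> + coef0) for two equal-length vectors.

    Hypothetical helper for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Inner (dot) product <x, y>.
    inner_product = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * inner_product + coef0)
```

Because tanh is bounded, the kernel value always lies between -1 and 1.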
SimpleRule consists of an identifier, a predicate, a score and information on rule performance.
SupportVector has only the attribute vectorId, the reference to the support vector in VectorDictionary.
Holds a single instance of an SVM.
Support Vector Machine models for classification and regression are considered.
Contains all support vectors required for the respective SVM instance.
Lists the counts associated with each value of the target field. However, a TargetValueCount whose count is zero may be omitted.
Used for a continuous input field I_i to define statistical measures associated with each value of the target field.
Serves as the envelope for element TargetValueStat.
Encapsulates the definition of the fields included in the training instances as well as their values.
Holds attributes of a Tree model.
The TreeModel in PMML allows for defining either a classification or prediction structure.
Contains the set of support vectors which are of the type VectorInstance.
Defines which entries in the vectors correspond to which fields.
A data vector given in dense or sparse array format.
Defines model types used by the anomaly model.
An informational string describing the technique used by the model designer to establish the baseline scores.
Specifies a cumulative link function used in the ordinalMultinomial model.
The probability distribution of the dependent variable for the generalizedLinear model.
Specifies the type of regression model in use.
Specifies the type of link function to use when the generalizedLinear model type is specified.
The missing prediction treatment options are used when at least one model for which the predicate in the Segment evaluates to true has a missing result.
Defines a strategy for dealing with missing values.
Specifies how all the models applicable to a record should be combined.
A normalization method softmax (p_j = exp(y_j) / Sum_i(exp(y_i))) or simplemax (p_j = y_j / Sum_i(y_i)) can be applied to the computed activation values.
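The two normalization methods can be sketched in Python as follows. The function names `softmax` and `simplemax` mirror the method names above, but the signatures and the list-based representation of activation values are assumptions made for illustration. Subtracting the maximum activation before exponentiating is a common numerical-stability step added here; it leaves the softmax result unchanged.

```python
import math


def softmax(activations):
    """Exponentiate each activation and divide by the sum of the exponentials."""
    # Shifting by the maximum avoids overflow and does not change the ratios.
    shift = max(activations)
    exps = [math.exp(y - shift) for y in activations]
    total = sum(exps)
    return [e / total for e in exps]


def simplemax(activations):
    """Divide each activation by the sum of all activations."""
    total = sum(activations)
    if total == 0:
        raise ValueError("activations sum to zero; simplemax is undefined")
    return [y / total for y in activations]
```

In both cases the normalized values sum to 1; for non-negative inputs they can be read as probabilities.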
Defines what to do in situations where scoring cannot reach a leaf node.
Describes how reason codes shall be ranked.
Specifies the type of a regression model.
Describes how the prediction is converted into a confidence value (aka probability).
The two most popular methods for multi-class classification are one-against-all (also known as one-against-rest) and one-against-one.
Usually the SVM model uses support vectors to define the model function.
Indicates whether non-leaf Nodes in the tree model have exactly two children, or an unrestricted number of children.
PMML is a standard for XML documents which express trained instances of analytic models. The following classes of model are addressed: