org.pmml4s.transformations

At various places the mining models use simple functions in order to map user data to values that are easier to use in the specific model. For example, neural networks internally work with numbers, usually in the range from 0 to 1. Numeric input data are mapped to the range [0..1], and categorical fields are mapped to series of 0/1 indicators.

PMML defines various kinds of simple data transformations:

  • Normalization: map values to numbers; the input can be continuous or discrete.
  • Discretization: map continuous values to discrete values.
  • Value mapping: map discrete values to discrete values.
  • Text Indexing: derive a frequency-based value for a given term.
  • Functions: derive a value by applying a function to one or more parameters.
  • Aggregation: summarize or collect groups of values, e.g., compute average.
  • Lag: use a previous value of the given input field.

Type members

Classlikes

object ACos extends UnaryArithmetic
object ASin extends UnaryArithmetic
object ATan extends UnaryArithmetic
object Abs extends UnaryArithmetic
object Add extends BinaryArithmetic
object And extends MultipleBoolean
class Apply(val function: Function, val children: Array[Expression], val mapMissingTo: Option[Any], val defaultValue: Option[Any], val invalidValueTreatment: InvalidValueTreatment) extends Expression

Apply defines the application of a function. The function itself is identified by name with the function attribute. The actual parameters of the function application are given in the content of the element. Each actual argument value is given by an EXPRESSION, and the arguments are mapped by position to the formal parameters in the corresponding function definition.
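
The positional mapping of arguments to formal parameters can be sketched in plain Scala. This is a minimal illustration, not the pmml4s implementation: FunctionDef and applyFunction are hypothetical names, and values are restricted to Double for brevity.

```scala
object ApplySketch {
  // Hypothetical stand-in for a function definition (not a pmml4s class):
  // formal parameter names plus a body that sees only those parameters.
  final case class FunctionDef(params: Seq[String], body: Map[String, Double] => Double)

  // Bind the actual arguments to the formal parameters by position,
  // then evaluate the body against that binding.
  def applyFunction(f: FunctionDef, args: Seq[Double]): Double = {
    require(args.length == f.params.length, "argument count must match parameter count")
    f.body(f.params.zip(args).toMap)
  }
}
```

For example, a two-parameter function whose body divides "a" by "b" receives the first argument as "a" and the second as "b", so swapping the arguments changes the result.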

object Avg extends MultipleArithmetic
trait BinaryFunction extends Function
object Ceil extends UnaryArithmetic
object Concat extends Function
class Constant(val value: Any, val dataType: Option[DataType], val missing: Boolean) extends LeafExpression

Constant values can be used in expressions which have multiple arguments. The actual value of a constant is given by the content of the element. For example, <Constant>1.05</Constant> represents the number 1.05. The dataType of Constant can be optionally specified.

object Cos extends UnaryArithmetic
object CosH extends UnaryArithmetic
object CountHits extends Enumeration
  • allHits: count all hits
  • bestHits: count all hits with the lowest Levenshtein distance
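
The difference between the two counting modes can be sketched as follows. This is an illustrative simplification under stated assumptions: the term is a single word, the text is already tokenized, and a token counts as a hit when its Levenshtein distance to the term does not exceed a maximum. The names levenshtein and countHits are hypothetical helpers, not pmml4s API.

```scala
object CountHitsSketch {
  // Classic dynamic-programming edit distance, keeping one row at a time.
  def levenshtein(a: String, b: String): Int = {
    var prev = Array.tabulate(b.length + 1)(j => j)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  // bestOnly = false mirrors allHits; bestOnly = true mirrors bestHits.
  def countHits(tokens: Seq[String], term: String, maxDistance: Int, bestOnly: Boolean): Int = {
    val distances = tokens.map(levenshtein(_, term)).filter(_ <= maxDistance)
    if (distances.isEmpty) 0
    else if (bestOnly) distances.count(_ == distances.min)
    else distances.size
  }
}
```

With the tokens "brown", "crown", "brow", "brown", the term "brown", and a maximum distance of 1, every token is a hit, but only the two exact matches share the lowest distance.
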
class DefineFunction(val name: String, val parameterFields: Array[ParameterField], val expr: Expression, val opType: OpType, val dataType: DataType) extends Function with HasOpType with HasDataType with PmmlElement

Defines new (user-defined) functions as variations or compositions of existing functions or transformations. The function's name must be unique and must not conflict with other function names, either defined by PMML or other user-defined functions. The EXPRESSION in the content of DefineFunction is the function body that actually defines the meaning of the new function. The function body must not refer to fields other than the parameter fields.

class DerivedField(val name: String, val displayName: Option[String], val dataType: DataType, val opType: OpType, val values: Array[Value], val expr: Expression) extends DataField with Expression

Provides a common element for the various mappings. Derived fields can also appear at several places in the definition of specific models, such as neural network or Naive Bayes models. Transformed fields have a name so that statistics and the model can refer to them.

class Discretize(val discretizeBins: Array[DiscretizeBin], val field: Field, val mapMissingTo: Option[Any], val defaultValue: Option[Any], val dataType: Option[DataType]) extends FieldExpression

Discretization of numerical input fields is a mapping from continuous to discrete values using intervals.
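
A minimal sketch of interval-based discretization in plain Scala follows. Bin and discretize are hypothetical stand-ins rather than the pmml4s classes, and the treatment of mapMissingTo (used when the input is missing) and defaultValue (used when no bin matches) is an assumption based on the usual PMML semantics.

```scala
object DiscretizeSketch {
  // Hypothetical simplified interval: a missing margin means unbounded on that side.
  final case class Bin(left: Option[Double], right: Option[Double],
                       leftClosed: Boolean, rightClosed: Boolean, value: String) {
    def contains(x: Double): Boolean = {
      val aboveLeft = left.forall(l => if (leftClosed) x >= l else x > l)
      val belowRight = right.forall(r => if (rightClosed) x <= r else x < r)
      aboveLeft && belowRight
    }
  }

  // Return the value of the first bin whose interval contains the input.
  def discretize(bins: Seq[Bin], input: Option[Double],
                 mapMissingTo: Option[String], defaultValue: Option[String]): Option[String] =
    input match {
      case None    => mapMissingTo
      case Some(x) => bins.find(_.contains(x)).map(_.value).orElse(defaultValue)
    }
}
```

With one bin for values below 0 and another for values from 0 upward, the input 0.0 falls into the second bin because its left margin is closed.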

class DiscretizeBin(val interval: Interval, val binValue: Any) extends PmmlElement
object Divide extends BinaryArithmetic
object Equal extends BinaryBoolean
object Erf extends UnaryArithmetic
object Exp extends UnaryArithmetic
object Expm1 extends UnaryArithmetic
trait Expression extends Evaluator with PmmlElement

Trait of Expression that defines how the values of the new field are computed.

Companion: object
object Expression
Companion: class
class FieldColumnPair(val field: Field, val column: String) extends PmmlElement
class FieldRef(val field: Field, val mapMissingTo: Option[Any]) extends FieldExpression with MixedEvaluator

Field references are simply pass-throughs to fields previously defined in the DataDictionary, a DerivedField, or a result field. For example, they are used in clustering models in order to define center coordinates for fields that don't need further normalization.

A missing input will produce a missing result. The optional attribute mapMissingTo may be used to map a missing result to the value specified by the attribute. If the attribute is not present, the result remains missing.

object Floor extends UnaryArithmetic
trait Function extends PmmlElement
object GreaterThan extends BinaryCompare
object Hypot extends BinaryArithmetic
object If extends Function
object IsIn extends Function
object IsMissing extends UnaryBoolean
object IsNotIn extends Function
object IsNotMissing extends UnaryBoolean
object IsNotValid extends UnaryBoolean
object IsValid extends UnaryBoolean
object LessOrEqual extends BinaryCompare
object LessThan extends BinaryCompare
class LinearNorm(val orig: Double, val norm: Double) extends PmmlElement
object Ln extends UnaryArithmetic
object Ln1p extends UnaryArithmetic
object LocalTermWeights extends Enumeration
  • termFrequency: use the number of times the term occurs in the document (x = freq_i).
  • binary: use 1 if the term occurs in the document or 0 if it doesn't (x = χ(freq_i)).
  • logarithmic: take the logarithm (base 10) of 1 + the number of times the term occurs in the document (x = log10(1 + freq_i)).
  • augmentedNormalizedTermFrequency: this formula adds to the binary frequency a "normalized" component expressing the frequency of a term relative to the highest frequency of terms observed in that document (x = 0.5 * (χ(freq_i) + freq_i / max_k(freq_k))).
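
The four weighting formulas above can be written out directly. The sketch below is illustrative only: the object and method names are hypothetical, freq is the number of occurrences of the term in the document, and maxFreq stands for the highest term frequency observed in that document.

```scala
object LocalTermWeightsSketch {
  // Indicator function χ: 1 if the term occurs at all, otherwise 0.
  def chi(freq: Int): Double = if (freq > 0) 1.0 else 0.0

  def termFrequency(freq: Int): Double = freq.toDouble

  def binary(freq: Int): Double = chi(freq)

  // Base-10 logarithm of 1 + frequency.
  def logarithmic(freq: Int): Double = math.log10(1.0 + freq)

  // 0.5 * (χ(freq) + freq / maxFreq)
  def augmentedNormalizedTermFrequency(freq: Int, maxFreq: Int): Double = {
    require(maxFreq > 0, "maxFreq must be positive")
    0.5 * (chi(freq) + freq.toDouble / maxFreq)
  }
}
```

For a term occurring twice in a document whose most frequent term occurs four times, the augmented normalized weight is 0.5 * (1 + 2/4) = 0.75.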

LocalTransformations holds derived fields that are local to the model.

object Log10 extends UnaryArithmetic
object Lowercase extends UnaryString
class MapValues(val fieldColumnPairs: Array[FieldColumnPair], val table: Table, val outputColumn: String, val mapMissingTo: Option[Any], val defaultValue: Option[Any], val dataType: Option[DataType]) extends Expression

Any discrete value can be mapped to any possibly different discrete value by listing the pairs of values. This list is implemented by a table, so it can be given inline by a sequence of XML markups or by a reference to an external table.
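
The table lookup can be sketched with ordinary Scala collections. This is not the pmml4s implementation: the table is modeled as a sequence of column-to-value maps, all values are strings, and the handling of mapMissingTo (any input missing) and defaultValue (no matching row) is an assumption based on the usual PMML semantics. All names are hypothetical.

```scala
object MapValuesSketch {
  type Row = Map[String, String]

  // fieldColumnPairs pairs an input field name with the table column it is matched against.
  def mapValues(table: Seq[Row], fieldColumnPairs: Seq[(String, String)], outputColumn: String,
                inputs: Map[String, Option[String]],
                mapMissingTo: Option[String], defaultValue: Option[String]): Option[String] = {
    val values = fieldColumnPairs.map { case (field, _) => inputs.getOrElse(field, None) }
    if (values.exists(_.isEmpty)) mapMissingTo
    else {
      val keys = fieldColumnPairs.map(_._2).zip(values.flatten)
      // The first row whose key columns all match supplies the output value.
      table.find(row => keys.forall { case (col, v) => row.get(col).contains(v) })
        .flatMap(_.get(outputColumn))
        .orElse(defaultValue)
    }
  }
}
```

With a two-row table mapping "m" to "male" and "f" to "female", an input of "f" yields "female", while an unlisted input falls back to defaultValue.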

object Matches extends BinaryBoolean
object Max extends MultipleArithmetic
object Median extends MultipleArithmetic
object Min extends MultipleArithmetic
object Modulo extends BinaryArithmetic
object Multiply extends BinaryArithmetic
class NormContinuous(val linearNorms: Array[LinearNorm], val field: Field, val mapMissingTo: Option[Double], val outliers: OutlierTreatmentMethod) extends NumericFieldExpression

Normalization provides a basic framework for mapping input values to specific value ranges, usually the numeric range [0 .. 1]. Normalization is used, e.g., in neural networks and clustering models.

Defines how to normalize an input field by piecewise linear interpolation. The mapMissingTo attribute defines the value the output is to take if the input is missing. If the mapMissingTo attribute is not specified, then missing input values produce a missing result.
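
Piecewise linear interpolation over (orig, norm) pairs can be sketched as below. The names Point and normalize are hypothetical, and the sketch assumes that inputs outside the covered range are extrapolated from the nearest segment; other outlier treatments are not modeled.

```scala
object NormContinuousSketch {
  // Hypothetical stand-in for an (orig, norm) pair.
  final case class Point(orig: Double, norm: Double)

  def normalize(points: Seq[Point], input: Option[Double],
                mapMissingTo: Option[Double]): Option[Double] =
    input match {
      case None => mapMissingTo
      case Some(x) =>
        require(points.length >= 2, "need at least two points")
        val sorted = points.sortBy(_.orig)
        val segments = sorted.zip(sorted.tail)
        // Pick the first segment whose upper end covers x; fall back to the last one.
        val (a, b) = segments.find { case (_, hi) => x <= hi.orig }.getOrElse(segments.last)
        Some(a.norm + (x - a.orig) / (b.orig - a.orig) * (b.norm - a.norm))
    }
}
```

With the points (0, 0), (10, 0.5), and (20, 1), an input of 15 lies on the second segment and is mapped to 0.5 + (5 / 10) * 0.5 = 0.75.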

class NormDiscrete(val field: Field, val value: Any, val mapMissingTo: Option[Double]) extends FieldExpression

Encode string values into numeric values in order to perform mathematical computations. For example, regression and neural network models often split categorical and ordinal fields into multiple dummy fields. This kind of normalization is supported in PMML by the element NormDiscrete.

An element (f, v) defines that the unit has value 1.0 if the value of input field f is v, otherwise it is 0.

The set of NormDiscrete instances that refer to a certain input field defines a fan-out function that maps a single input field to a set of normalized fields.

If the input value is missing and the attribute mapMissingTo is not specified then the result is a missing value as well. If the input value is missing and the attribute mapMissingTo is specified then the result is the value of the attribute mapMissingTo.
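
The indicator and fan-out behavior described above can be sketched in a few lines. The names indicator and fanOut are hypothetical, and input values are modeled as optional strings, with None standing for a missing value.

```scala
object NormDiscreteSketch {
  // 1.0 if the input equals the given value, 0.0 otherwise;
  // a missing input yields mapMissingTo (which may itself be absent).
  def indicator(input: Option[String], value: String,
                mapMissingTo: Option[Double]): Option[Double] =
    input match {
      case None    => mapMissingTo
      case Some(v) => Some(if (v == value) 1.0 else 0.0)
    }

  // One indicator per category: a single input field fans out to several normalized fields.
  def fanOut(input: Option[String], categories: Seq[String]): Seq[Option[Double]] =
    categories.map(c => indicator(input, c, None))
}
```

For the categories "a", "b", and "c", the input "b" fans out to the indicators 0.0, 1.0, and 0.0.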

object Not extends UnaryFunction
object NotEqual extends BinaryBoolean
object Or extends MultipleBoolean
class ParameterField(val name: String, val opType: OpType, val dataType: DataType, displayName: Option[String]) extends AbstractField
object Pow extends BinaryArithmetic
object RInt extends UnaryArithmetic
object Replace extends TernaryFunction
object Round extends UnaryArithmetic
<DefineFunction name="SAS-EM-String-Normalize" optype="categorical" dataType="string">
<ParameterField name="FMTWIDTH" optype="continuous"/>
<ParameterField name="AnyCInput" optype="categorical"/>
<Apply function="trimBlanks">
  <Apply function="uppercase">
    <Apply function="substring">
    <FieldRef field="AnyCInput"/>
    <Constant>1</Constant>
    <Constant>FMTWIDTH</Constant>
    </Apply>
  </Apply>
</Apply>
</DefineFunction>
<DefineFunction name="SAS-FORMAT-$CHARw" optype="categorical" dataType="string">
<ParameterField name="FMTWIDTH" optype="continuous"/>
<ParameterField name="AnyCInput" optype="continuous"/>
<Apply function="substring">
  <FieldRef field="AnyCInput"/>
  <Constant>1</Constant>
  <Constant>FMTWIDTH</Constant>
</Apply>
</DefineFunction>
<DefineFunction name="SAS-FORMAT-BESTw" optype="categorical" dataType="string">
<ParameterField name="FMTWIDTH" optype="continuous"/>
<ParameterField name="AnyNInput" optype="continuous"/>
<Apply function="formatNumber">
  <FieldRef field="AnyNInput"/>
  <Constant>FMTWIDTH</Constant>
</Apply>
</DefineFunction>
object Sin extends UnaryArithmetic
object SinH extends UnaryArithmetic
object Sqrt extends UnaryArithmetic
object Substring extends TernaryFunction
object Subtract extends BinaryArithmetic
object Sum extends MultipleArithmetic
object Tan extends UnaryArithmetic
object TanH extends UnaryArithmetic
class TextIndex(val field: Field, val expression: Expression, val textIndexNormalizations: Array[TextIndexNormalization], val localTermWeights: LocalTermWeights, val isCaseSensitive: Boolean, val maxLevenshteinDistance: Int, val countHits: CountHits, val wordSeparatorCharacterRE: String, val tokenize: Boolean) extends NumericFieldExpression

The TextIndex element fully configures how the text in textField should be processed and translated into a frequency metric for a particular term of interest. The actual frequency metric to be returned is defined through the localTermWeights attribute.

Companion: object
object TextIndex
Companion: class
class TextIndexNormalization(val table: Table, val isCaseSensitive: Option[Boolean], val maxLevenshteinDistance: Option[Int], val wordSeparatorCharacterRE: Option[String], val tokenize: Option[Boolean], val inField: String, val outField: String, val regexField: String, val recursive: Boolean) extends PmmlElement

A TextIndexNormalization element offers more advanced ways of normalizing text input into a more controlled vocabulary that corresponds to the terms being used in invocations of this indexing function. The normalization operation is defined through a translation table, specified through a TableLocator or InlineTable element.

class TransformationDictionary(val fields: Array[DerivedField], val defineFunctions: Array[DefineFunction]) extends Dictionary[DerivedField] with Transformer with FunctionProvider with PmmlElement

The TransformationDictionary allows for transformations to be defined once and used by any model element in the PMML document.

object TrimBlanks extends UnaryString
trait UnaryFunction extends Function
object Uppercase extends UnaryString

Defines several user-defined functions produced by various vendors. A well-defined DefineFunction is fully supported by pmml4s, but some vendor-produced definitions are not well defined. This is the place for those user-defined functions that are not well defined.
