Class BinaryEncoder

java.lang.Object
smile.feature.extraction.BinaryEncoder
All Implemented Interfaces:
Function<smile.data.Tuple,int[]>

public class BinaryEncoder extends Object implements Function<smile.data.Tuple,int[]>
Encodes categorical features using a sparse one-hot scheme. The categorical attributes are converted to binary dummy variables in a compact representation that stores only the indices of the nonzero elements in an integer array. The Maximum Entropy Classifier expects its input data in this format.
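The compact representation can be illustrated without the library. The following is a minimal sketch, not the library's implementation: it assumes that each categorical column's levels occupy a contiguous block of dummy-variable indices, laid out in column order. The class `SparseOneHotSketch`, the method `encode`, and the parameters `levelCounts` and `levels` are hypothetical names introduced for illustration only.

```java
import java.util.Arrays;

public class SparseOneHotSketch {
    /**
     * Hypothetical helper: returns the indices of the nonzero dummy variables for one row.
     *
     * @param levelCounts number of levels of each categorical column (assumed known)
     * @param levels      zero-based level code of the row in each column
     */
    public static int[] encode(int[] levelCounts, int[] levels) {
        int[] features = new int[levels.length];
        int base = 0; // first dummy-variable index of the current column
        for (int i = 0; i < levels.length; i++) {
            features[i] = base + levels[i]; // exactly one nonzero dummy per column
            base += levelCounts[i];         // next column starts after this block
        }
        return features;
    }

    public static void main(String[] args) {
        int[] levelCounts = {3, 2}; // column 0 has 3 levels, column 1 has 2
        int[] row = {2, 1};         // level 2 in column 0, level 1 in column 1
        System.out.println(Arrays.toString(encode(levelCounts, row))); // prints [2, 4]
    }
}
```

Under this layout a row always yields as many indices as there are encoded columns, regardless of how many levels each column has.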
  • Constructor Summary

    Constructors
    Constructor
    Description
    BinaryEncoder(smile.data.type.StructType schema, String... columns)
    Creates an encoder for the given schema and categorical columns.
  • Method Summary

    Modifier and Type
    Method
    Description
    int[][]
    apply(smile.data.DataFrame data)
    Generates the compact representation of sparse binary features for a data frame.
    int[]
    apply(smile.data.Tuple x)
    Generates the compact representation of sparse binary features for a given object.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface java.util.function.Function

    andThen, compose
  • Constructor Details

    • BinaryEncoder

      public BinaryEncoder(smile.data.type.StructType schema, String... columns)
      Creates an encoder for the given schema and categorical columns.
      Parameters:
      schema - the data frame schema.
      columns - the column names of categorical variables. If empty, all categorical columns will be used.
  • Method Details

    • apply

      public int[] apply(smile.data.Tuple x)
      Generates the compact representation of sparse binary features for a given object.
      Specified by:
      apply in interface Function<smile.data.Tuple,int[]>
      Parameters:
      x - the tuple to encode.
      Returns:
      an integer array holding the indices of the nonzero binary features.
    • apply

      public int[][] apply(smile.data.DataFrame data)
      Generates the compact representation of sparse binary features for a data frame.
      Parameters:
      data - a data frame.
      Returns:
      the binary feature vectors, one index array per row of the data frame.
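To make the relationship between a returned index array and an ordinary one-hot vector concrete, the sketch below expands the indices back into a dense 0/1 vector. It is illustrative only: `DenseExpansionSketch`, `toDense`, and the `dimension` parameter (the total number of dummy variables, assumed to be known to the caller) are hypothetical and not part of this class.

```java
import java.util.Arrays;

public class DenseExpansionSketch {
    /**
     * Hypothetical helper: expands nonzero indices into a dense 0/1 vector.
     *
     * @param indices   indices of the nonzero binary features
     * @param dimension total number of binary features (assumed known)
     */
    public static int[] toDense(int[] indices, int dimension) {
        int[] dense = new int[dimension]; // all zeros initially
        for (int index : indices) {
            dense[index] = 1; // mark each stored index as a nonzero feature
        }
        return dense;
    }

    public static void main(String[] args) {
        int[] sparse = {2, 4};
        System.out.println(Arrays.toString(toDense(sparse, 5))); // prints [0, 0, 1, 0, 1]
    }
}
```

The sparse form stores two integers here where the dense form stores five; the saving grows with the number of levels per column.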