public class CRF extends java.lang.Object implements SequenceLabeler<double[]>, java.io.Serializable
A CRF (conditional random field) is a Markov random field that is trained discriminatively. Because it models only the labels conditioned on the observations, it does not need to model the distribution over the observed variables, which makes it possible to include arbitrarily complicated features of the observed variables in the model. This class implements an algorithm that trains CRFs via gradient tree boosting. In tree boosting, the CRF potential functions are represented as weighted sums of regression trees, which provide compact representations of feature interactions, so the algorithm never has to consider the potentially large parameter space explicitly. As a result, gradient tree boosting scales linearly in the order of the Markov model and in the order of the feature interactions, rather than exponentially as in previous algorithms based on iterative scaling and gradient descent.
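To make the "weighted sum of regression trees" representation concrete, here is a minimal sketch. It shows only how such a potential could be evaluated, not how the trees are fitted, and it is not the internals of this class. The class name `BoostedPotentialSketch`, the `shrinkage` weight, and the `stump` helper are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only; all names here are hypothetical, not part of CRF.
final class BoostedPotentialSketch {
    // Each function stands in for one regression tree: feature vector -> real value.
    private final List<ToDoubleFunction<double[]>> trees = new ArrayList<>();
    // Assumed: a single weight (step size) shared by every tree.
    private final double shrinkage;

    BoostedPotentialSketch(double shrinkage) {
        this.shrinkage = shrinkage;
    }

    // A boosting iteration would append one newly fitted tree (fitting is omitted here).
    void add(ToDoubleFunction<double[]> tree) {
        trees.add(tree);
    }

    // The potential is the weighted sum of the outputs of all trees.
    double value(double[] x) {
        double sum = 0.0;
        for (ToDoubleFunction<double[]> tree : trees) {
            sum += shrinkage * tree.applyAsDouble(x);
        }
        return sum;
    }

    // A depth-one tree (stump): one split on one feature, two leaf values.
    static ToDoubleFunction<double[]> stump(int feature, double threshold,
                                            double left, double right) {
        return x -> x[feature] <= threshold ? left : right;
    }

    public static void main(String[] args) {
        BoostedPotentialSketch potential = new BoostedPotentialSketch(0.5);
        potential.add(stump(0, 1.0, -2.0, 2.0));
        potential.add(stump(1, 0.0, 1.0, -1.0));
        // 0.5 * 2.0 + 0.5 * 1.0
        System.out.println(potential.value(new double[] {3.0, -1.0})); // prints 1.5
    }
}
```

Each added tree can split on a different combination of features, which is why the model can capture feature interactions without enumerating one parameter per interaction.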
Modifier and Type | Class and Description |
---|---|
static class | CRF.Trainer: Trainer for CRF. |
Modifier and Type | Method and Description |
---|---|
double[] | featureset(double[] features, int label): Returns a feature set with the class label of the previous position. |
int[] | featureset(int[] features, int label): Returns a feature set with the class label of the previous position. |
boolean | isViterbi(): Returns true if the Viterbi algorithm is used for sequence labeling. |
int[] | predict(double[][] x): Predicts the sequence labels. |
int[] | predict(int[][] x) |
CRF | setViterbi(boolean viterbi): Sets whether the Viterbi algorithm is used for sequence labeling. |
public double[] featureset(double[] features, int label)
Parameters:
features - the indices of the nonzero features.
label - the class label of previous position as a feature.

public int[] featureset(int[] features, int label)
Parameters:
features - the indices of the nonzero features.
label - the class label of previous position as a feature.

public boolean isViterbi()

public CRF setViterbi(boolean viterbi)
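
The two featureset overloads documented above both return the features of a position extended with the class label of the previous position. The sketch below illustrates one way this could work and is not taken from this class's source. In particular, the `numFeatures` parameter and the choice to encode the label as the index `numFeatures + label` in the sparse case are assumptions, as are the class and method names.

```java
import java.util.Arrays;

// Illustrative sketch only; names and the sparse encoding are assumptions.
final class FeatureSetSketch {
    // Dense case: copy the feature values and append the previous label as one extra value.
    static double[] withPreviousLabel(double[] features, int label) {
        double[] fs = Arrays.copyOf(features, features.length + 1);
        fs[features.length] = label;
        return fs;
    }

    // Sparse case: features holds the indices of the nonzero features.
    // Assumed encoding: the previous label becomes an extra index located
    // after the numFeatures indices used by the original features.
    static int[] withPreviousLabel(int[] features, int label, int numFeatures) {
        int[] fs = Arrays.copyOf(features, features.length + 1);
        fs[features.length] = numFeatures + label;
        return fs;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withPreviousLabel(new double[] {0.5, 2.0}, 1)));
        System.out.println(Arrays.toString(withPreviousLabel(new int[] {3, 7}, 2, 10)));
    }
}
```

Feeding the previous label in as a feature is what lets a potential function depend on the label transition as well as on the observation.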
public int[] predict(double[][] x)
Specified by:
predict in interface SequenceLabeler<double[]>
Parameters:
x - a sequence. At each position, it may be the original symbol or a feature set about the symbol, its neighborhood, and/or other information.

public int[] predict(int[][] x)
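
When isViterbi() returns true, sequence labeling uses the Viterbi algorithm. As a rough illustration of that algorithm for a first-order chain, the sketch below finds the label sequence with the highest total score. It is not this class's implementation: the `node` and `trans` score arrays are hypothetical inputs, whereas a trained CRF would derive its scores from its potential functions.

```java
// Illustrative sketch only; the score arrays are hypothetical inputs.
final class ViterbiSketch {
    // node[t][j]:  score of label j at position t.
    // trans[i][j]: score of moving from label i to label j.
    static int[] decode(double[][] node, double[][] trans) {
        int n = node.length;
        if (n == 0) return new int[0];
        int k = node[0].length;

        double[][] best = new double[n][k]; // best score of any path ending in label j at t
        int[][] back = new int[n][k];       // previous label on that best path
        best[0] = node[0].clone();

        for (int t = 1; t < n; t++) {
            for (int j = 0; j < k; j++) {
                int arg = 0;
                double max = best[t - 1][0] + trans[0][j];
                for (int i = 1; i < k; i++) {
                    double s = best[t - 1][i] + trans[i][j];
                    if (s > max) {
                        max = s;
                        arg = i;
                    }
                }
                best[t][j] = max + node[t][j];
                back[t][j] = arg;
            }
        }

        // Pick the best final label, then follow the back pointers.
        int[] labels = new int[n];
        int last = 0;
        for (int j = 1; j < k; j++) {
            if (best[n - 1][j] > best[n - 1][last]) last = j;
        }
        labels[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            labels[t - 1] = back[t][labels[t]];
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] node = {{1.0, 0.0}, {0.0, 0.5}, {0.0, 1.0}};
        double[][] trans = {{1.0, -1.0}, {-1.0, 1.0}};
        // prints [1, 1, 1]
        System.out.println(java.util.Arrays.toString(decode(node, trans)));
    }
}
```

In this example, choosing the best label at each position from the node scores alone would give [0, 1, 1]. The transition scores make [1, 1, 1] the best sequence overall, which is the kind of joint decision Viterbi decoding is designed to make.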