public class CompoundUtils
extends java.lang.Object

| Constructor and Description |
|---|
| CompoundUtils() |

| Modifier and Type | Method and Description |
|---|---|
| static java.util.List<Component> | allSizeComponents(Word word) Returns all possible components for a compound word by combining its atomic components. |
| static java.util.Set<Component> | getPossibleComponentsAt(Word word, int i) Gives all possible components of a compound word at the given index. |
| static java.util.List<Pair<Component>> | innerComponentPairs(Word word) Produces the set of all pairs of non-overlapping components for a given word. |
| static java.util.Set<Pair<Component>> | innerContiguousComponentPairs(Word word) WARNING: This method does not behave as innerComponentPairs(Word). |
| static Component | merge(Word word, int begin, int end) |
| static Component | merge(Word word, java.lang.Iterable<? extends Component> components) Merges n consecutive components of a compound word into a single Component object. |
| static java.lang.String | toClassString(java.lang.String s1, java.lang.String s2) |
| static java.util.Set<java.lang.String> | toIndexStrings(Pair<Component> pair) |

public static java.util.List<Component> allSizeComponents(Word word)
word - the compound word

public static Component merge(Word word, java.lang.Iterable<? extends Component> components)
Merges n consecutive components of a compound word into a single Component object.
The lemma of the returned Component is the concatenation of the substrings of the first n-1 components and the lemma of the last component.
word - The compound word
components - The list of consecutive components of the word to merge
java.lang.IllegalArgumentException - when the components param is empty
java.lang.IllegalArgumentException - when the components are not consecutive
java.lang.IllegalArgumentException - when the component offsets do not match the word size

public static Component merge(Word word, int begin, int end)
word -
begin -
end -
See also: merge(Word, Iterable)

public static java.util.List<Pair<Component>> innerComponentPairs(Word word)
word - the compound word

public static java.lang.String toClassString(java.lang.String s1, java.lang.String s2)

public static java.util.Set<Pair<Component>> innerContiguousComponentPairs(Word word)
WARNING: This method does not behave as innerComponentPairs(Word).
This method enforces that returned pairs cover the input word completely and
without any overlap.
Example 1: with a word that is not a compound, it returns an empty set.
Example 2: with a word that is a size-2 compound, it returns the only pair of lemmas possible:
w = "ab|cd"
returnedPairs are [["ab","cd"]]
Example 3: with a word that is a size-3 compound, it returns two pairs of lemmas:
w = "ab|cd|ef"
returnedPairs are [["ab","cdef"], ["abcd","ef"]]
Example 4: with a word that is a size-n compound, it returns n-1 pairs of lemmas:
w = "comp1|comp2|...|compn"
returnedPairs are [
["comp1","comp2comp3...compn"],
["comp1comp2","comp3comp4...compn"],
...,
["comp1comp2...compn-1","compn"]
]
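
The n-1 splits listed in the examples above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: it works on plain strings rather than on the library's Word, Component and Pair types, and the class name ContiguousPairsSketch and the method contiguousPairs are hypothetical names, not part of CompoundUtils.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContiguousPairsSketch {

    /**
     * Returns the n-1 [left, right] splits of a compound whose atomic
     * components are given as plain strings (an assumption: the real
     * method works on Component lemmas).
     */
    public static List<String[]> contiguousPairs(List<String> atoms) {
        List<String[]> pairs = new ArrayList<>();
        // Split point k puts atoms[0..k) on the left and atoms[k..n) on the right,
        // so both sides together cover the word with no overlap.
        for (int k = 1; k < atoms.size(); k++) {
            String left = String.join("", atoms.subList(0, k));
            String right = String.join("", atoms.subList(k, atoms.size()));
            pairs.add(new String[] {left, right});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Mirrors Example 3: w = "ab|cd|ef"
        for (String[] p : contiguousPairs(Arrays.asList("ab", "cd", "ef"))) {
            System.out.println(p[0] + " + " + p[1]);
        }
        // prints:
        // ab + cdef
        // abcd + ef
    }
}
```

A word with a single atomic component never enters the loop, which matches Example 1.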
word - The input compound word

public static java.util.Set<Component> getPossibleComponentsAt(Word word, int i)
word -
i -
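
The lemma rule documented for merge(Word, Iterable) and the "combining atomic components" behaviour of allSizeComponents(Word) can be illustrated together. The sketch below is not the library's implementation: Atom, mergedLemma and allRunLemmas are hypothetical names, Atom is a stand-in for Component that carries only a surface substring and a lemma, and it assumes that "all possible components" means every contiguous run of atomic components, including the full word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergedRangesSketch {

    /** Hypothetical stand-in for a component: surface substring plus lemma. */
    public static final class Atom {
        final String substring;
        final String lemma;

        public Atom(String substring, String lemma) {
            this.substring = substring;
            this.lemma = lemma;
        }
    }

    /** Lemma of a merged run: substrings of all but the last atom, then the last atom's lemma. */
    public static String mergedLemma(List<Atom> run) {
        if (run.isEmpty()) {
            throw new IllegalArgumentException("components must not be empty");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < run.size() - 1; i++) {
            sb.append(run.get(i).substring);
        }
        sb.append(run.get(run.size() - 1).lemma);
        return sb.toString();
    }

    /** Merged lemmas for every contiguous run atoms[begin..end] with begin <= end (assumed semantics). */
    public static List<String> allRunLemmas(List<Atom> atoms) {
        List<String> lemmas = new ArrayList<>();
        for (int begin = 0; begin < atoms.size(); begin++) {
            for (int end = begin; end < atoms.size(); end++) {
                lemmas.add(mergedLemma(atoms.subList(begin, end + 1)));
            }
        }
        return lemmas;
    }

    public static void main(String[] args) {
        List<Atom> atoms = Arrays.asList(
            new Atom("wind", "wind"),
            new Atom("mills", "mill"));
        System.out.println(allRunLemmas(atoms)); // prints [wind, windmill, mill]
    }
}
```

With n atomic components this enumeration yields n(n+1)/2 runs. Whether the real allSizeComponents(Word) includes the full-word run, or orders results this way, is not stated in this page.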