public class CompoundUtils
extends java.lang.Object
TermValueProviders).

Constructor and Description |
---|
CompoundUtils() |
Modifier and Type | Method and Description |
---|---|
static java.util.List<Component> | allSizeComponents(Word word) Returns all possible components for a compound word by combining its atomic components. |
static java.util.List<Pair<java.lang.String>> | asLemmaPairs(Word word) WARNING: This method does not behave as innerComponentPairs(Word). |
static java.util.List<Pair<Component>> | innerComponentPairs(Word word) Produces the set of all pairs of non-overlapping components for a given word. |
static Component | merge(Word word, java.lang.Iterable<? extends Component> components) Merges n consecutive components of a compound word into a single Component object. |
static java.lang.String | toIndexString(Pair<Component> pair) |
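The summary of allSizeComponents(Word) says it combines atomic components into all possible components. One plausible reading, sketched below, is that every contiguous run of atomic components yields one candidate. This is an illustration on plain strings only: the class AllSizeComponentsSketch and the method allRuns are hypothetical names, the real Word and Component types are not used, and whether the actual method includes the single atoms or the full word in its result is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AllSizeComponentsSketch {

    /**
     * Hypothetical stand-in for the idea behind allSizeComponents(Word):
     * returns the surface string of every contiguous run of atomic parts.
     */
    public static List<String> allRuns(List<String> atoms) {
        List<String> runs = new ArrayList<>();
        for (int start = 0; start < atoms.size(); start++) {
            StringBuilder run = new StringBuilder();
            // Extend the run one atom at a time and record each extension.
            for (int end = start; end < atoms.size(); end++) {
                run.append(atoms.get(end));
                runs.add(run.toString());
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // Atoms of the compound "ab|cd|ef"
        System.out.println(allRuns(Arrays.asList("ab", "cd", "ef")));
        // prints [ab, abcd, abcdef, cd, cdef, ef]
    }
}
```

A compound with n atomic parts produces n(n+1)/2 runs under this reading, so the result grows quadratically with the number of parts.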
public static java.util.List<Component> allSizeComponents(Word word)

word - the compound word

public static Component merge(Word word, java.lang.Iterable<? extends Component> components)

Merges n consecutive components of a compound word into a single Component object.

The lemma of the returned Component is the concatenation of the substrings of the first n-1 given components and the lemma of the last given component.

word - The compound word
components - The list of consecutive components of the word to merge

java.lang.IllegalArgumentException - when the components param is empty
java.lang.IllegalArgumentException - when the components are not consecutive
java.lang.IllegalArgumentException - when the component offsets do not match the word size

public static java.util.List<Pair<Component>> innerComponentPairs(Word word)

word - the compound word

public static java.util.List<Pair<java.lang.String>> asLemmaPairs(Word word)

WARNING: This method does not behave as innerComponentPairs(Word).
This method enforces that returned pairs cover the input word completely and
without any overlap.
Example 1: with a word that is not a compound, it returns an empty list.
Example 2: with a word that is a size-2 compound, it returns the only pair of lemmas possible:
w = "ab|cd"
returnedPairs are [["ab","cd"]]
Example 3: with a word that is a size-3 compound, it returns two pairs of lemmas:
w = "ab|cd|ef"
returnedPairs are [["ab","cdef"], ["abcd","ef"]]
Example 4: with a word that is a size-n compound, it returns n-1 pairs of lemmas:
w = "comp1|comp2|...|compn"
returnedPairs are [
["comp1","comp2comp3...compn"],
["comp1comp2","comp3comp4...compn"],
...,
["comp1comp2...compn-1","compn"]
]
word - The input compound word
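The four examples above follow one pattern: cut the compound at each internal boundary and join the parts on either side. A minimal sketch of that pattern on plain strings is given here. The class LemmaPairsSketch and the method splitPairs are hypothetical names, not part of CompoundUtils, and the sketch joins surface strings directly, whereas the real method works on Word objects and lemmas, so its output may differ wherever a lemma is not identical to the surface substring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LemmaPairsSketch {

    /**
     * Hypothetical stand-in for the behaviour described for asLemmaPairs(Word):
     * for n parts, returns the n-1 (left, right) pairs that cover the whole
     * word without overlap. A single part yields an empty list.
     */
    public static List<List<String>> splitPairs(List<String> parts) {
        List<List<String>> pairs = new ArrayList<>();
        // One cut position per internal boundary: 1 .. n-1
        for (int cut = 1; cut < parts.size(); cut++) {
            String left = String.join("", parts.subList(0, cut));
            String right = String.join("", parts.subList(cut, parts.size()));
            pairs.add(Arrays.asList(left, right));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(splitPairs(Arrays.asList("abcd")));           // prints []
        System.out.println(splitPairs(Arrays.asList("ab", "cd")));       // prints [[ab, cd]]
        System.out.println(splitPairs(Arrays.asList("ab", "cd", "ef"))); // prints [[ab, cdef], [abcd, ef]]
    }
}
```

Because every pair is built from a single cut, the two halves always cover the input completely and never overlap, which is the property the method description states.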