The ArcEagerGuidedCostFunction uses a gold parse tree to make deterministic decisions about which transition to apply in any given state. Since the decision is uniquely determined by the gold parse, the returned map will have only a single mapping that assigns zero cost to the correct transition (all other transitions therefore have an implicit cost of infinity).
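The single-entry cost map described above can be illustrated with a minimal sketch. The `Transition` alias and the `guidedCosts` and `costOf` helpers are hypothetical names introduced for illustration; they are not the library's API.

```scala
object GuidedCostSketch {
  // Hypothetical stand-in for the library's transition type.
  type Transition = String

  // The gold parse determines exactly one transition, so the map has one entry.
  def guidedCosts(goldTransition: Transition): Map[Transition, Double] =
    Map(goldTransition -> 0.0)

  // Any transition absent from the map is treated as infinitely costly.
  def costOf(costs: Map[Transition, Double], transition: Transition): Double =
    costs.getOrElse(transition, Double.PositiveInfinity)
}
```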
The ArcEagerInvertedLeftArc operator creates an inverse arc from the next buffer item to the stack top and then performs a Reduce (see above).
the label to attach to the created arc
The ArcEagerInvertedRightArc operator creates an inverse arc from the stack top to the next buffer item and then performs a Shift (see above).
the label to attach to the created arc
The ArcEagerLeftArc operator creates an arc from the next buffer item to the stack top and then performs a Reduce (see above).
the label to attach to the created arc
The ArcEagerRightArc operator creates an arc from the stack top to the next buffer item and then performs a Shift (see above).
the label to attach to the created arc
The ArcInverter takes a PolytreeParse and inverts arcs whose labels are in the argument set inverseArcLabels. Note that this operation should only affect the children field of a PolytreeParse, since the other fields only care about the underlying undirected tree.
The purpose of this class is to convert standard dependency parses into polytree dependency parses. For instance, we may wish to invert all arcs x ---> y for which the arc label is 'det (effectively this would invert the relationship between a determiner and its noun to say that the determiner "requires" the noun, rather than vice-versa).
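As a rough illustration of the inversion, the sketch below flips the direction of every arc whose label is in `inverseArcLabels`. It assumes a flat set of hypothetical `Arc` values rather than the children field of a real PolytreeParse, so it shows the idea only, not the class's actual implementation.

```scala
object ArcInverterSketch {
  // Hypothetical simplified arc: head ---> tail, with a label.
  final case class Arc(head: Int, tail: Int, label: Symbol)

  // Flip the direction of every arc whose label is in inverseArcLabels.
  def invert(arcs: Set[Arc], inverseArcLabels: Set[Symbol]): Set[Arc] =
    arcs.map { arc =>
      if (inverseArcLabels.contains(arc.label)) arc.copy(head = arc.tail, tail = arc.head)
      else arc
    }
}
```

Only the direction changes; the pair of tokens joined by each arc, and hence the undirected tree, stays the same.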
A BreadcrumbRef is a StateRef (see above) whose apply operation returns the breadcrumb of the index-th element of the stack, if it exists.
the desired stack element, counting from 0 (i.e. 0 is the stack top)
Maps the tokens of a neighborhood to their respective Brown clusters.
the Brown clusters
the maximum granularity we want to consider for a Brown cluster (i.e. the depth in the Brown cluster tree)
a label for the transform
A BufferRef is a StateRef (see above) whose apply operation returns the index-th element of the buffer, if it exists.
the desired buffer element, counting from 0 (i.e. 0 is the front of the buffer)
Generates a feature for each neighborhood histogram and transform in the argument list.
the neighborhood histograms
the neighborhood transforms
Iterates through all neighborhoods from all parses in a PolytreeParseSource.
Creates a data source from a file of parse trees.
the file containing the parse trees
the file format
A ForbiddenArcLabel constraint designates a transition as illegal if it would directly create an arc (in either direction) with the specified label between the tokens at the given indices. It also implicitly creates a RequestedArc constraint for the specified arc (basically it says that we DO want an arc between the specified indices, just not with this label).
Note that argument order (of the token indices) does not matter for the constructor.
index of the first token
index of the second token
label that is forbidden between the two tokens
A ForbiddenEdge constraint designates a transition as illegal if it would directly create an arc (in either direction) between the tokens at the given indices.
Note that argument order does not matter for the constructor.
index of the first token
index of the second token
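One way to make the constructor's argument order irrelevant is to store the edge as an unordered set. The sketch below assumes a hypothetical `forbids` method and a pared-down constructor; the real constraint class may expose a different interface.

```scala
object ForbiddenEdgeSketch {
  // Hypothetical constraint: the edge is kept as an unordered pair of token indices.
  final case class ForbiddenEdge(token1: Int, token2: Int) {
    private val edge: Set[Int] = Set(token1, token2)

    // True if creating an arc between head and tail (in either direction) is illegal.
    def forbids(head: Int, tail: Int): Boolean = Set(head, tail) == edge
  }
}
```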
A GoldParseSource reduces parse trees to states of a finite-state machine.
the source for the parse trees
the transition system to use (for generating states)
A GoldParseTrainingVectorSource reduces a gold parse tree to a set of feature vectors for classifier training.
Essentially, we derive the 2*n parser states that lead to the gold parse. Each of these states becomes a feature vector (using the apply method of the provided TransitionParserFeature), labeled with the transition executed from that state in the gold parse.
One of the constructor arguments is a TaskIdentifier. This will dispatch the feature vectors to train different classifiers. For instance, if taskIdentifier(state) != taskIdentifier(state2), then their respective feature vectors (i.e. feature(state) and feature(state2)) will be used to train different classifiers.
the data source for the parse trees
identifies the ClassificationTask associated with each feature vector
the transition system to use (for generating states)
a trained cost function to adapt (optional)
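The dispatch of feature vectors by task can be pictured as a grouping step. The following is a sketch under stated assumptions: `State`, `Labeled`, and `partitionByTask` are hypothetical names, and the task is represented as a plain string rather than a ClassificationTask.

```scala
object TaskDispatchSketch {
  // Hypothetical stand-ins: a state carries a POS tag and a precomputed feature vector.
  final case class State(stackTopPos: String, features: Vector[Double])
  final case class Labeled(features: Vector[Double], transition: String)

  // Group labeled feature vectors by task; each group would train its own classifier.
  def partitionByTask(
    examples: Seq[(State, String)],
    taskIdentifier: State => String
  ): Map[String, Seq[Labeled]] =
    examples
      .groupBy { case (state, _) => taskIdentifier(state) }
      .map { case (task, group) =>
        task -> group.map { case (state, transition) => Labeled(state.features, transition) }
      }
}
```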
The KeywordFeature maps a token to its word representation, if its word appears in the argument set keywords. Otherwise its apply function will return an empty set.
See the definition of TokenFeature (above) for more details about the interface.
The KeywordTransform maps a token to its word representation, if its word appears in the argument set keywords. Otherwise its apply function will return an empty set (if the StateRef points to a valid token) or TokenTransform.noTokenHere (if the StateRef points to an invalid token).
See the definition of TokenTransform (above) for more details about the interface.
Scores parses based on a linear combination of features.
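A linear combination of features amounts to a weighted sum. The sketch below assumes features and weights are keyed by name in plain maps, which is an illustrative choice and not necessarily how the library represents them.

```scala
object LinearScoreSketch {
  // Score = sum over features of weight * value; features without a weight contribute 0.
  def score(features: Map[String, Double], weights: Map[String, Double]): Double =
    features.foldLeft(0.0) { case (total, (name, value)) =>
      total + weights.getOrElse(name, 0.0) * value
    }
}
```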
Gets the n-best greedy parses for a given sentence.
A Neighborhood is a sequence of tokens, generally taken from a parse tree.
For instance, one might want to consider neighborhoods like:
- a node and its children
- a node and its parents
- a node and its breadcrumb
a sequence of tokens, usually associated in some way (see NeighborhoodExtractors for examples of such associations)
Collects statistics over "neighborhood events."
An example might help. A neighborhood is a collection of tokens, e.g. a node and its children in a dependency parse. A neighborhood event is a mapping of these tokens to a sequence of strings, e.g. we might map each token to its part-of-speech tag.
Given a corpus of dependency parses, we might want to collect a histogram that tells us how many times each neighborhood event like (VERB, NOUN, NOUN) occurs in the corpus. This is what the NeighborhoodEventStatistic does.
a label for this object
a histogram over observed neighborhoods
a transformation from neighborhoods to events
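The histogram over neighborhood events can be sketched as follows. The `Token` type, the `posTransform` value, and the `histogram` helper are hypothetical simplifications introduced for illustration.

```scala
object NeighborhoodEventSketch {
  // Hypothetical token with just a word and a part-of-speech tag.
  final case class Token(word: String, pos: String)
  type Neighborhood = Seq[Token]
  type Event = Seq[String]

  // Transform: map every token of the neighborhood to its POS tag.
  val posTransform: Neighborhood => Event = _.map(_.pos)

  // Histogram: count how often each event occurs across the neighborhoods.
  def histogram(
    neighborhoods: Seq[Neighborhood],
    transform: Neighborhood => Event
  ): Map[Event, Int] =
    neighborhoods
      .map(transform)
      .groupBy(identity)
      .map { case (event, hits) => event -> hits.size }
}
```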
Maps a parse tree to an iterator over its neighborhoods.
Different extractors will define "neighborhood" in different ways. For instance, one might want to consider neighborhoods like:
- a node and its children
- a node and its parents
- a node and its breadcrumb
TODO: create unit tests for all inheriting instances.
A data source for neighborhoods.
A NeighborhoodTransform maps a Neighborhood into an "event" (a sequence of strings).
An example might help. Suppose that we have a neighborhood consisting of (node, child1, child2), i.e. three nodes of a parse tree. A transform might map these to the sequence of their POS tags, e.g. ("VERB", "NOUN", "NOUN").
The NumChildrenToTheLeft transform maps a token to how many of its children appear to its left in the state's tokens sequence.
It takes an argument max which allows you to specify an upper bound. For instance, if max = 3 and a token has 5 children to its left, then applying this transform to that token will return Set(Symbol("3")), not Set(Symbol("5")).
See the definition of TokenTransform (above) for more details about the interface.
an upper bound on the number of children (anything higher will round down to max)
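The capping behavior can be sketched in a few lines. This assumes that "to its left" means "has a smaller token index", and the `numChildrenToTheLeft` helper with its flat arguments is hypothetical; the real transform operates on a parser state and a StateRef.

```scala
object NumChildrenSketch {
  // Count the children whose index precedes the token's own index, capped at max.
  def numChildrenToTheLeft(token: Int, children: Set[Int], max: Int): Set[Symbol] = {
    val count = children.count(_ < token)
    Set(Symbol(math.min(count, max).toString))
  }
}
```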
The NumChildrenToTheRight transform maps a token to how many of its children appear to its right in the state's tokens sequence. This will only be relevant for nodes on the stack (it is impossible for a buffer node to be associated with nodes to its right).
It takes an argument max which allows you to specify an upper bound. For instance, if max = 3 and a token has 5 children to its right, then applying this transform to that token will return Set(Symbol("3")), not Set(Symbol("5")).
See the definition of TokenTransform (above) for more details about the interface.
an upper bound on the number of children (anything higher will round down to max)
A ParsePool is a collection of parse candidates for the same input sentence.
a sequence of parse trees
A data source for ParsePool objects.
Contains the key components of a parser (for serialization purposes).
the cost function for the transition parser
the cost function for the arc labeler
the cost function for parse reranking
the size of the n-best list to generate for reranking
A PolytreeParse is a polytree-structured dependency parse. A polytree is a directed graph whose undirected structure is a tree. The nodes of this graph will correspond to an indexed sequence of tokens (think the words from a sentence), whose zeroth element is a reserved 'nexus' token which does not correspond to a word in the original sentence. The nexus must be one of the roots of the directed graph (i.e. it cannot be the child of any node).
Since the undirected structure is a tree, every node (other than the nexus) has a unique neighbor which is one step closer to the nexus than itself (this may be the nexus itself). This neighbor is referred to as the node's 'breadcrumb'.
It has four major fields:
- tokens is a vector of Token objects (in the order that they appear in the associated sentence). The zeroth element is assumed to be the nexus.
- breadcrumb tells you the unique neighbor that is closer to the nexus in the undirected tree (this can be the nexus itself); for instance, if breadcrumb(5) = 3, then token 3 is one step closer to the nexus from token 5. The breadcrumb of the nexus should be -1.
- children tells you the set of children of a node in the polytree; for instance, if children(5) = Set(3,6,7), then token 5 has three children: tokens 3, 6, and 7.
- arclabels tells you the labeled neighbors of a node in the undirected tree; for instance, if arclabels(5) = Set((4, 'det), (7, 'amod)), then token 5 has two neighbors, reached with arcs labeled 'det and 'amod (the labels are Scala Symbol objects).
the parsed sentence (the zeroth token of which should be the nexus)
the breadcrumb of each token (see above definition)
the set of children of each token in the polytree
the set of labeled neighbors of each token in the undirected tree
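To make the breadcrumb definition concrete, the sketch below follows breadcrumbs from a token back to the nexus. The `pathToNexus` helper is hypothetical, and it assumes a well-formed breadcrumb vector (the nexus at index 0 has breadcrumb -1, and there are no cycles).

```scala
object BreadcrumbSketch {
  // breadcrumb(i) is the neighbor of token i that is one step closer to the nexus;
  // the nexus (token 0) has breadcrumb -1, which ends the walk.
  def pathToNexus(breadcrumb: Vector[Int], token: Int): List[Int] =
    if (token < 0) Nil
    else token :: pathToNexus(breadcrumb, breadcrumb(token))
}
```

For example, with breadcrumbs `Vector(-1, 2, 0, 2)`, the walk from token 3 visits token 2 and then the nexus.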
Maps a scored parse into a feature vector.
A PolytreeParseFeatureUnion merges the output of a list of features.
a list of the features we want to merge into a single feature
A data source for PolytreeParse objects.
The PrefixFeature maps a token to the set of its prefixes that are contained in a set of "key" prefixes.
See the definition of TokenFeature (above) for more details about the interface.
the set of prefixes to treat as "key" prefixes
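The prefix lookup can be sketched as a filter over the key prefixes. This assumes case-sensitive matching on the raw word and returns plain strings; the hypothetical `keyPrefixesOf` helper is not the library's interface.

```scala
object PrefixSketch {
  // All non-empty key prefixes that the word starts with.
  def keyPrefixesOf(word: String, keyPrefixes: Set[String]): Set[String] =
    keyPrefixes.filter(prefix => prefix.nonEmpty && word.startsWith(prefix))
}
```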
The PrefixTransform maps a token to the set of its prefixes that are contained in a set of "key" prefixes.
See the definition of TokenTransform (above) for more details about the interface.
the set of prefixes to treat as "key" prefixes
A RequestedArc constraint requests that the output parse MUST contain the requested arc.
The arc is specified using the index of the token at the arc's head followed by the index of the token at the arc's tail.
Note: currently this constraint does not pay attention to the arc direction, nor the arc label. It only enforces that there is some edge between the two specified tokens.
index of the first token
index of the second token
desired label for the arc
Uses the parser model to create an n-best list, then chooses the best parse from this n-best list (according to the reranking function).
configuration object for the parser
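The selection step of reranking can be sketched generically. This assumes the reranking function is a cost, so that lower values are better; the `rerank` helper and its signature are hypothetical.

```scala
object RerankSketch {
  // Pick the candidate with the lowest reranking cost; None if the n-best list is empty.
  def rerank[P](nbest: Seq[P], rerankingCost: P => Double): Option[P] =
    if (nbest.isEmpty) None else Some(nbest.minBy(rerankingCost))
}
```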
Extracts neighborhoods of the form (node, breadcrumb, grandcrumb, ..., root) from a parse tree.
A StackRef is a StateRef (see above) whose apply operation returns the index-th element of the stack, if it exists.
the desired stack element, counting from 0 (i.e. 0 is the stack top)
A StateRef allows you to figure out the token that corresponds to a particular aspect of a TransitionParserState.
For instance, we may want to know what token is at the top of the stack for a given state. Applying StackRef(0) to the state will return the index of the token. More accurately, a set is returned, which will be empty if the StateRef refers to a non-existent element of the state. For instance, applying StackRef(3) to a state whose stack has 3 or fewer elements will return the empty set.
This set of classes is used primarily to facilitate feature creation (e.g. see StateRefFeature).
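The set-returning behavior of StackRef and BufferRef can be sketched against a pared-down state. The `State` case class here is a hypothetical simplification (it tracks only the stack, the buffer position, and the sentence length), not the library's TransitionParserState.

```scala
object StateRefSketch {
  // Hypothetical, pared-down parser state: stack.head is the stack top and
  // bufferPosition is the index of the token at the front of the buffer.
  final case class State(stack: Vector[Int], bufferPosition: Int, numTokens: Int)

  trait StateRef {
    // Returns the referenced token index, or the empty set if it does not exist.
    def apply(state: State): Set[Int]
  }

  final case class StackRef(index: Int) extends StateRef {
    def apply(state: State): Set[Int] = state.stack.lift(index).toSet
  }

  final case class BufferRef(index: Int) extends StateRef {
    def apply(state: State): Set[Int] = {
      val position = state.bufferPosition + index
      if (index >= 0 && position < state.numTokens) Set(position) else Set.empty
    }
  }
}
```

Returning a set rather than an option means a missing element simply contributes nothing when results are combined.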
The StateRefProperty is a ClassificationTask for which we are trying to predict the next transition, given that we know some property of a particular token of the parser state.
The StateRefPropertyIdentifier identifies the ClassificationTask of a parser state according to the coarse part-of-speech tag of a particular word of the state (as identified by a StateRef).
The SuffixFeature maps a token to the set of its suffixes that are contained in a set of "key" suffixes.
See the definition of TokenFeature (above) for more details about the interface.
the set of suffixes to treat as "key" suffixes
The SuffixTransform maps a token to the set of its suffixes that are contained in a set of "key" suffixes.
See the definition of TokenTransform (above) for more details about the interface.
the set of suffixes to treat as "key" suffixes
Maps the tokens of a neighborhood to a particular property in their token's property map.
The TokenPropertyFeature maps a token to one of its properties.
See the definition of TokenFeature (above) for more details about the interface.
The TokenPropertyTransform maps a token to one of its properties.
See the definition of TokenTransform (above) for more details about the interface.
A TokenTransform is a function that maps a token to a set of symbols.
The token is described using a TransitionParserState and a StateRef (see the definition of StateRef for details). For instance, using StackRef(0) will cause the TokenTransform to operate on the token at the top of the stack in the current parser state.
The purpose of a TokenTransform is primarily to facilitate feature creation (e.g. see StackRefFeature) by allowing us, for instance, to map the token at the top of the state's stack to its word representation. This would be achieved with:
WordTransform(state, StackRef(0))
A TokenTransformFeature creates a TransitionParserFeature from a TokenTransform and a StateRef.
Essentially it simply applies the TokenTransform to the token referenced by the StateRef (see definitions of TokenTransform and StateRef for details).
For instance, suppose we want a binary feature that gives us the word at the top of the stack. We can achieve this with TokenTransformFeature(StackRef(0), WordTransform).
the StateRef that refers to our desired token
the transformation we want to perform on our desired token
A TransitionParser implements a parsing algorithm for a transition-based parser.
A TransitionParserState captures the current state of a transition-based parser (i.e. it corresponds to a partially constructed PolytreeParse). It includes the following fields:
- the stack holds the indices of the tokens (note: the index of a token is its index in the tokens vector) on the stack. It is a vector of integers. The head of the vector represents the top of the stack.
- the bufferPosition is an integer representing the index of the token that is currently at the front of the buffer.
- breadcrumb maps the index of a token to its breadcrumb (see org.allenai.nlpstack.parse.poly.polyparser.PolytreeParse for the definition of breadcrumb). If a token index does not appear as a key in breadcrumb, then its breadcrumb has not yet been determined.
- children maps the index of a token to the indices of its children (in the partially constructed polytree).
- arcLabels maps a pair of token indices to the label of the arc between them. This presupposes that the two tokens are neighbors in the partially constructed polytree. Note that the pair of token indices is represented as a Set, so order is irrelevant.
- tokens is the sequence of tokens in the sentence we are trying to parse. This will be invariant for all states of a given parsing process.
the indices of the tokens on the 'stack' (stack.head is the stack top)
the index of the token at the front of the 'buffer'
the breadcrumbs of the partially constructed PolytreeParse
the children of the partially constructed PolytreeParse
the arc labels of the partially constructed PolytreeParse
the sentence we want to parse
The ApplicabilitySignatureIdentifier identifies the ClassificationTask of a parser state according to the state's applicability signature.
The ArcEagerReduce operator pops the top stack item.
The ArcEagerShift operator pops the next buffer item and pushes it onto the stack.
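The effect of Shift and Reduce on the stack and buffer can be sketched with a minimal state. The `State` case class is a hypothetical simplification, and the sketch omits any check of whether a transition is applicable in the given state.

```scala
object ArcEagerSketch {
  // Hypothetical minimal state: stack.head is the stack top.
  final case class State(stack: Vector[Int], bufferPosition: Int)

  // Shift: move the token at the front of the buffer onto the stack.
  def shift(state: State): State =
    State(state.bufferPosition +: state.stack, state.bufferPosition + 1)

  // Reduce: discard the token at the top of the stack.
  def reduce(state: State): State =
    state.copy(stack = state.stack.drop(1))
}
```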
Simply passes along the original score of the parse as a feature.
The BreadcrumbAssigned transform maps a token to whether its breadcrumb has been assigned.
See the definition of TokenTransform (above) for more details about the interface.
Extracts neighborhoods of the form (node, breadcrumb) from a parse tree.
Extracts neighborhoods of the form (node, child1, ..., childN) from a parse tree.
A FirstRef is a StateRef (see above) whose apply operation returns the first element of the sentence.
The IsBracketedTransform maps a token to a symbol which is 'yes if its word appears between a pair of parentheses, 'no if it is outside of all parentheses pairs, '( if it is a left paren and ') if it is a right paren. It will return a TokenTransform.noTokenHere if the StateRef points to an invalid token.
See the definition of TokenTransform (above) for more details about the interface.
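The four possible labels can be illustrated by a single pass over the words of a sentence. This sketch assumes that any word following an unclosed left paren counts as bracketed and that unmatched right parens are ignored; how the actual transform treats unbalanced input is not stated here. The `bracketLabels` helper is hypothetical.

```scala
object BracketSketch {
  // Label each word: parens map to themselves, words inside at least one open
  // parenthesis map to 'yes, and everything else maps to 'no.
  def bracketLabels(words: Seq[String]): Seq[Symbol] = {
    var depth = 0 // number of currently open parentheses
    words.toVector.map {
      case "(" =>
        depth += 1
        Symbol("(")
      case ")" =>
        depth = math.max(0, depth - 1)
        Symbol(")")
      case _ =>
        if (depth > 0) Symbol("yes") else Symbol("no")
    }
  }
}
```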
A LastRef is a StateRef (see above) whose apply operation returns the final element of the sentence.
Extracts neighborhoods of the form (node, leftChild1, ..., leftChildN) from a parse tree.
A function that adds new token properties to a sentence if that token appears within a multi-word expression in the dictionary. The new properties are
MultiWordTagger.mweSymbol -> MultiWordTagger.mweValue
and
MultiWordTagger.symbolFor(mwe) -> MultiWordTagger.mweValue
The first property encodes the fact that the token appears within any MWE. The second property encodes the fact that the token appears within a particular MWE. Tokens that do not occur within a particular MWE will not be given any additional properties.
Extracts neighborhoods of the form (node, parent1, ..., parentN) from a parse tree.
Extracts neighborhoods of the form (node, rightChild1, ..., rightChildN) from a parse tree.
Simply passes along the length of the sentence as a feature.
The WordFeature maps a token to its word representation.
See the definition of TokenFeature (above) for more details about the interface.
The WordTransform maps a token to its word representation.
See the definition of TokenTransform (above) for more details about the interface.
The ApplicabilitySignature is a ClassificationTask for which we are trying to predict the next transition, given that only a subset of possible transitions are applicable.
If we choose this as our ClassificationTask, we will train separate classifiers for parser states that have different ApplicabilitySignatures.
true iff Shift is applicable
true iff Reduce is applicable
true iff LeftArc and InvertedLeftArc are both applicable (for any labeling)
true iff RightArc and InvertedRightArc are both applicable (for any labeling)
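A signature of this shape can be sketched as a case class of four booleans, which makes it usable directly as a key for selecting a classifier. The field names and the `applicableFamilies` helper are hypothetical and chosen to mirror the four parameters above.

```scala
object ApplicabilitySketch {
  final case class ApplicabilitySignature(
    shift: Boolean,
    reduce: Boolean,
    left: Boolean,
    right: Boolean
  ) {
    // Names of the transition families a classifier for this task must choose among.
    def applicableFamilies: Set[String] =
      Seq("Shift" -> shift, "Reduce" -> reduce, "LeftArc" -> left, "RightArc" -> right)
        .collect { case (name, true) => name }
        .toSet
  }
}
```

Because case classes compare by value, two parser states with the same applicable transitions produce equal signatures and so map to the same classifier.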