Parsing with Unification Constraints

July 4, 2021 § Leave a comment


We now have all the pieces necessary to integrate feature structures and unification into a parser. Fortunately, the order-independent nature of unification allows us to largely ignore the actual search strategy used in the parser. Once we have associated unification constraints with the context-free rules of the grammar, and feature structures with the states of the search, we can use any of the standard search algorithms.

Integration of Unification into an Earley Parser

The first change involves the various representations used in the original code. Recall that the Earley algorithm operates by using a set of unadorned context-free grammar rules to fill in a data structure, called a chart, with a set of states. At the end of the parse, the states that make up this chart represent all possible parses of the input. Therefore, we begin our changes by altering the representations of both the context-free grammar rules and the states of the chart.

The rules are altered in such a way that in addition to their current components, they also include a feature structure derived from their unification constraints. More specifically, we use the constraints listed with a rule to build a feature structure, represented as a DAG, for use with that rule during parsing.

The final change to the original algorithm concerns the check for states already contained in the chart. In the original algorithm, the ENQUEUE function refused to enter into the chart any state that was identical to one already present in the chart. “Identical” means the same rule, with the same start and finish positions, and the same position of the °. It was this check that allowed the algorithm to, among other things, avoid the infinite recursion problems associated with left-recursive rules.

The problem, of course, is that our states are now more complex since they have complex feature structures associated with them. States that appeared identical under the original criteria might in fact now be different since their associated DAG may differ. One solution to this problem is to extend the identity check to include the DAGs associated with the states, but it turns out that we can improve on this solution.

The motivation for the improvement lies in the motivation for the identity check. Its purpose is to prevent the wasteful addition of a state into the chart whose effect on the parse would be accomplished by an already existing state. Put another way, we want to prevent the entry into the chart of any state that would duplicate the work that will eventually be done by other states. Of course, thi will clearly be the case with identical states, but it turns out it is also the case for states in the chart that are more general than new states being considered.

In sum nothing worthwhile is accomplished by entering into the chart a state that is more specific than a state already in the chart.

Example

A python example of the earley parser (with instruction about how to try it) can be found in this repo.

Tagged: ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

What’s this?

You are currently reading Parsing with Unification Constraints at aboutcomments.

meta

%d bloggers like this: