Feature Structure and Unification Formalism

May 18, 2021 § Leave a comment

From a reductionist perspective, the history of the natural sciences over the last few hundred years can be seen as an attempt to explain the behaviour of larger structures by the combined action of smaller primitives.

In this chapter we introduce the idea that grammatical categories like Vpto, Staht, Non3sgAux, or 3spNP, as well as grammatical rules like S→ NP VP that make use of them, should be thought of as objects that can have complex sets of properties associated with them. The information in these properties is represented as constraints, and so these kinds of models are often called constraint based formalisms.

Naive models of grammatical phenomena such as agreement and subcategorization can lead to overgeneration problems. These new categories led , in turn, to an explosion in the number of grammar rules and a corrsponding loss of generality of the grammar. A constraint-based representation scheme will allow us to represent fine-grained information about number and person agreement, and subcategorization, as well as semantic categories like mass/count.

Constraint-based formalism have other advantages, such as the ability to model more complex phenomena than does context-free grammar, and the ability to efficiently and conveniently compute semantics for synctactic representations.

Of course, simply associating these properties with various words and phrases does not solve any of our overgeneration problems. To make these properties useful, we need the ability to perform simple operations, such as equality tests, on them. By pairing such tests with our grammar rules, we can add various constraints to help ensure that only grammatical strings are generated by the grammar.

The remainder of this chapter provides the details of one computational implementation of a constraint-based formalism, based on feature structures and unification. The next section describes feature structures, the representation used to capture the kinds of grammatical properties we have in mind.

Feature structures

One of the simplest ways to encode the kind of properties that we have in mind is through the use of feature structures. These are simply sets of feature-value pairs, where features are unanalyzable atomic symbols drawn from some finite set, and values are either atomic symbols or feature structures themselves. Such feature structures are illustrated with the following kind of diagram, called attribute-value matrix or AVM:

FEATURE1 value1

FEATURE2 value2




FEATUREn valuen

As mentioned earlier, features are not limited to atomic symbols as their values;they can also have other features structures as their values. This lumping together can be captured by an AGREEMENT feature that takes a feature structure consisting of the NUMBER and PERSON feature-value pairs as its value.

This ability to use feature structures as values leads fairly directly to the notion of feature path. A feature path is noting more than a sequence of features through a feature structure leading to a particular value.

In these diagrams, feature structures are depicted as directed graphs where features appear as labeled edges and values as nodes.

An additional important kind of feature structure: one that contains features that actually share some feature structure as value. Such feature structures are referred to as reentrant structures. What we have in mind here is not the simple idea that two features might have equal values, but rather that they share precisely the same feature structure (or node of the graph). These two cases can be distinguished clearly if we think in terms of path through a graph. In the case of a reentrant structure, two features paths actually lead to the same node in the structure.

These simple structure give us the ability to express linguistic generalizations in surprisingly compact and elegant ways.

Unification of Features Structures

As noted earlier, features structures would be of little use without being able to perform reasonably efficient and powerful operations on them. The two principal operations we need to perform are merging the information content of two structures and rejecting the merger of structures that are incompatible. A single computational technique, called unification, suffices for both of these purposes.

Unification is a binary operation (represented here as U) that accepts two feature structures as arguments and returns a feature structure when it succeeds.

The value [] in the second structure indicates that the value has been left unspecified. A feature with such a [] value can be successfully matched to any value in a corresponding feature in antoher structure. The result of this type of unification is a structure with the value provided by the more specific, not null, value.

The result of the unification can be a merger of the original two structures into one larger structure. It is important to understand why these structures are judged to be compatible: they are compatible because they contain no features that are explicitly incompatible – that they each contain a feature value pair that the other lacks does not cause the unification to fail.

Unification can be seen as a way of merging the information in each feature structure or of the describing object that satisfy both sets of constraints. Intuitively, unifying two feature structures produces a new feature structure that is more specific (has more information) than, or is identical to, either of the input feature structures. We say that a less specific (more abstract) feature structure subsumes an equally of more specific one.

Since every feature structure is subsumed by the empty structure [], the relation among feature structures can be defines as a semilattice. Unification can be defined in terms of the subsumtion semilattice.

Since the information ordering defined by subsumption is a semilattice, the unification operation is monotonic. This means that if some description is true of a feature struture, unifying it with another feature structure results in a feature structure that still satisfies the original description. The unification operation is therefore associative; given a finite set of feature strutures to unify, we can check them in any order and get the same result.

To summarize, unification is a way of implementing the integration of knowledge from different constraints. Given two compatible feature structures as input, unification produces tha most general feature structure that nonetheless contains all the information in the input. Given two incompatible feature structures, it fails.

Feature Structures in the Grammar

Our primary purpose in introducing feature structures and unification has been to provide a way to elegantly express syntactic constrains that would be difficult to express with the mechanism of context-free grammars alone. Our next step, therefore, is to specify a way to integrate feature structures and unification operations into the specification of a grammar. We can accomplish this by augmenting the rules of ordinary context-free grammars with attachments that specify feature structures for the constituents of the rules, along with appropriate unification operations that express constraints on those constituents.

Taking a step back from these notation, it is important to note that in this approach the simple generative nature of context-free rules has been fundamentally changed by this augmentation. Ordinary context-free rules are based on the simple notion of concatenation. In the new scheme, this concatenation must be accomplished by a successful unification operation. This leads naturally to questions about the computational complexity of the unification operation and its effect on the generative power of this new grammar.

  • The elements of context-free grammar rules will have feature-based constraints associated with them. This reflects a shift from atomic grammatical categories to more complex categories with properties.
  • The constraints associated with individual rules can make reference to the feature structures associated with the parts of the rule to which they are attached.

Head Features

The preceding section introduced the notion of copying feature structures from phrase-structure children to their parents. This turn out to be a specific instance of a much more general phenomenon in constraint based grammar. Specifically, the features for most grammatical categories are copied from one of the children to the parent. This child that provides the features is called the head of the phrase, and the features copied are referred to as head features.

As a result, we can say that the agreement feature structure is a head feature. We can rewrite our rules to reflect these generalizations by placing the agreement feature structure under a HEAD feature and then copying the feature upward.


Recall that subcategorization is the notion that verbs can be picky about the patterns of arguments they will allow themselves to appear with.

The proper way to avoid this proliferation is to introduce feature structures to distinguish among the various members of the verb category. We can accomplish this goal by associating with each of the verbs of the lexicon an atomic feature, called SUBCAT, with an appropriate value.

This is a somewhat clumsy approach since these unanalyzable SUBCAT symbols to not directly encode either the number or type of the arguments that the verb expects to take. To see this, note that we cannot simply examine a verb’s entry in the lexicon and know what its subcategorization frame is. Rather, we must use the value of the SUBCAT feature indirectly as a pointer to those verb phrase rules in the grammar that can accept the verb in question. A more elegant solution, which makes better use of the expressive power of feature structures, allows the verb entries to directly specify the order and type of their arguments.

Combining even a limited set of phrase types results ina very large set of possible subcategorization frames.

Although the notion of subcategorization, or valence as it is often called, was originally conceived for verbs, more recent work has focused on the fact that many other kinds of words exhibit forms of valence-like behavious.

Manu adjectives and nouns also have subcategorization frames.

Verbs express subcategorization constraints on their subjects as well as their complments.

Long-Distance Dependencies

The model of subcategorization we have developed so far has two components. Each headword has a SUBCAT feature that contains a list of the complements it expects. Then, phrasal rules like the VP rule match up each expected complement in the SUBCAT list with an actual constituents. The mechanism works fine when the complements of a verb are in fact to be found in the verb phrase.

Sometimes, however, a constituent subcategorized for by the verb is not locally instantiated but stands in a long-distance relationship with its predicate.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

What’s this?

You are currently reading Feature Structure and Unification Formalism at aboutcomments.


%d bloggers like this: