Context-free Grammars

October 18, 2020 § Leave a comment

Formal Grammars of English

In thi chapter, we introduce three main new ideas: constituency, grammatical relations, and  subcategorization and dependency.

The fundamental idea of constituency is that groups of words may behave as a single unit or phrase, called a constituent.

This chapter introduces the use of context free grammars, a formalism that will allow us to model these constituency facts.

Grammatical relations are a formalization of ideas from a traditional grammar such as subjects and objects and other related notions.

Subcategorization and dependency relations refer to certain kinds of relation between words and phrases. For example, the verb want can be followed by any infinitive, but the verb find cannot be followed by an infinitive. These are called facts about the subcategorization of the verb.

As we show, none of the syntactic mechanism that we’ve discussed up until now can easily capture such phenomena. They can be modeled much more naturally by grammars that are based on context-free-grammars. Context-free grammars are thus the backbone of many formal models of the syntax of natural language (and, for that matter, of computer languages). They are powerful enough to express sophisticated relations among the words in a sentence, yet computationally tractable enough that efficient algorithms exist for parsing sentences with them.


How do words group together in ENglish? Consider the noun phrase, a sequence of word surrounding at least one noun. How do we know that these words group together (or “form constituents”)? One piece of evidence is that they can all appear in similar syntactic environments, for example, before a verb.

But while the whole noun phrase can occur before a verb, this is not true of each of the individual words that make up a noun phrase.

Context-Free Grammars

The most commonly used mathematical system for modeling constituent structure in ENglish and in other natural languages is the Context-Free Grammar, or CFG. Context Free grammars are also called Phrase-Structure Grammars, and the formalism is equivalent to what is also called Backus-Naur Form, BNF.

A context free grammar consist of a set of rules or productions, each of which expresses the ways that symbols of the language can be grouped and ordered together, and a lexicon of words and symbols.

Context free rules can be hierarchically embedded, so we can combine the previous rules with others. Th symbols that are used in a CFG are divided into two classes. The symbols that correspond to words in the language (“the”, “nightclub”) are called terminal symbols; the lexicon is the set of rules that introduces these terminal symbols. The symbols that express clusters or generalizations of these are called non-terminals. In each context free rule, the itam to the right of the arrow (->) is an ordered list of one or more terminals and nonn terminals; to the left of the arrow is a single non-terminal symbol expressing some cluster or generalization. Notice that in the lexicon, the non-terminal associated with each word is its lexical category, or part of speech (POS).

A CFG can be thought of in two ways: as a device for generating sentences and as a device for assigning a structure to a given sentence.

The sequence of rules expansion is called a derivation of the string of words. It is common to represent a derivation by a parse tree:

enn Treebank), a phrase is placed at the beginning of the sentence for discourse purposes.

What differentiates sentence constructions from the rest of the grammar is the notion that they are in some sense complete. In this way they correspond to the notion of clause, which traditional grammars often describe as forming a complete thought. One way of making this notion of “complete thought” more precise is to say an S is a node of the parse tree below which the main verb of the S has all its arguments.

Noun phrases can begin with simple lexical determiners. The role of the determiner in English noun phrases can also be filled by more complex expressions, like a possessive expression. Mass nouns also don’t require determination.

The nominal construction follows the determiner and contains any pre- and post-head noun modifiers. 

Nominal -> Noun

This rule also provides the basis for the bottom of various recursive rules used to capture more complex nominal constructions. A number of different kinds of word classes can appear before the head noun (“the post determiners”). These include cardinal numbers, ordinal numbers and quantifiers. Adjectives occur after quantifiers but before nouns. Adjectives can also be grouped into a phrase called an adjective phrase or AP. APs can have an adverb before the adjective. A head noun can be followed by post modifiers. The three most common kinds of non-finite post modifiers are the gerundive and the infinitive forms.

A post nominal relative clause (more correctly a restrictive relative clause) is a clause that often begins with a relative pronoun. The relative pronoun functions as the subject of the embedded verb. Word classes that modify and appear before Noun phrases are called pre determiners.

Rule proliferation will also have to happen for the noun’s case: for example, English pronouns have nominative and accusative versions. This allows agreement.

The verb phrase consists of the verb and a number of other constituents. In the simple rules we have built so far, these other constituents include NPs and PPs and combination of the two.

The verb phrase can be significantly more complicated than this. Many othe kind of constituents, such an entire embedded sentence, can follow the verb. These are called sentential complements.

Verbs can also be followed by particles, words that resemble a preposition but tha combine with the verb to form a phrasal verb like take off.

Where traditional grammars subcategorize verbs into these two categories (transitive and intransitive), modern grammars distinguish as many as 100 subcategories. We say tha verbs like find subcategorizes for NP, and a verb like want subcategorizes for an NP or a non-finite VP. We also call these constituents the complements of the verb. These possible set of complements are called the subcategorization frame of the verb.

The problem with this approach, as with the same solution to the agreement feature problem, is a vast explosion in the number of rules. The standard solution to both af these problems is the feature structure.

The subclass of verbs called auxiliaries or helping verbs have particular syntactic constraints that can be viewed as a kind of subcategorization. Auxiliaries include the modal verbs can, could, may, might, must, will would, shall, and should. The perfect auxiliary have, the progressive auxiliary be, and the passive auxiliary be. Each of these verbs places a constraint on the form of the following verb, and each of these must also combine in a particular order.

A sentence can have multiple auxiliary verbs, but they must occur in a particular order: modal < perfect < progressive < passive. One way of capturing the ordering constraints among auxiliaries, commonly used in the systemic grammar of Halliday, is to introduce a special constituent called verb group, whose sub constituents include all the auxiliaries as well as the main verb.

The major phrase types discussed here can be conjoined with conjunctions like and, or, and but to form larger constructions of the same type. For example, a coordinate noun phrase can consist of two other noun phrases separated by a conjunction. This relation can be defined through metarules.

X->X and X

This metarule simply states that any non-terminal can be conjoined with the same non terminal to yield a constituent of the same type. Of course, the variable X must be designated as a variable that stands for any non terminal rather than a non-terminal itself.

Context-free grammar rules of the type that we have explored so far can in principle be used to assign a parse tree to any sentence. This means that it is possible to build a corpus in which every sentence is syntactically annotated with a parse tree.

We suggested informally earlier that syntactic constituents could be associated with a lexical head.

Grammar Equivalence and Normal From

A formal language is defined as a (possibly infinite) set of strings of words. This suggest that we could ask if two grammars are equivalent by asking if they generate the same set of strings. In fact, it is possible to have two distinct context-free grammars generate the same language.

We usually distinguish two kinds of grammar equivalence: weak equivalence and strong equivalence. Two grammars are strongly equivalent if they generate the same set of strings and if they assign the same phrase structure to each sentence (allowing merely for renaming of the non terminal symbols). Two grammars are weakly equivalent if they generate the same set of strings but do not assign the same phrase structure to each sentence.

It is sometimes useful to have a normal form for grammars, in which each of the productions takes a particular form. For example, a context free grammar is in Chomsky normal form(CNF) if it is e-free and if in addition each production is either in the form:

A->B C or A-> a

That is, the right hand side of each rule contains two non-terminal symbols or one terminal symbol. Chomsky normal form grammars are binary branching, that is they have binary trees.

The generation of a symbol A with a potentially infinite sequence of symbols B with a rule in the form A -> A B is known as Chomsky-adjunction.

Finite-State and Context-Free Grammars

We argued that adequate models of grammr need to be able to represent complex interrelated facts about constituency, subcategorization, and dependency relations, and we implied that at the least the power of context-free grammars is needed to accomplish this. But why is it that we cannot just use finite state methods to capture these syntactic facts? The answer to this question is critical since a considerable price is paid in term of processing speed when we switch from regular languages to context-free ones.

There are two answers for this question. The first is mathematical: certain syntactic structures present in English and other natural languages make them not regular languages. The second answer is more subjective and has to do with notions of expressiveness; even when finite state methods are capable of dealing with the syntactic facts in question, they often don’t express them in ways that make generalization obvious, lead to understandable formalism, or produce structures of immediate use in subsequent semantic processing.

In other words, a language ca be generated by a finite state machine if and only if the grammar that generates it does not have any center-embedded recursion of this form. Intuitively this is because grammar rules in which the non terminal symbols are always on either the right or Left edge of the rule can be processed iteratively rather than recursively. Such center-embedding rules are needed to deal with artificial problems (formal languages) or for practical problems, such as checking for correctly matching delimiters in programming and markup languages.

So is there no role for finite state methods in syntactic analysis? A quick review of the rules used for noun phrases in this chapter, as well as those used in the Penn Treebank grammar, reveals that a considerable portion of them can be handled by finite state methods. For many practical purposes for which matching syntactic and semantic rules aren’t necessary finite -state rules are quite sufficient.

Dependency Grammars

We have focused in this chapter on context-free grammars because many available treebanks and parsers produce this kind of syntactic representations. But in a class of grammar formalisms called dependency grammars, which are becoming quite important in speech and language processing, constituents and phrase structure rules do not play any important rule. Instead, the syntactic structure of a sentence is described purely in terms of words and binary semantic or syntactic relations between these words.

Indeed the notion in traditional grammar of “parsing a sentence into subject and predicate” is based on lexical relations rather than constituent relations.

Other dependency-based computational grammars, such as Link Grammar, use different but roughly overlapping links. In untyped dependency parsers the links are unlabeled. One advantage of dependency formalism is the strong predictive parsing power that words have for their dependents.

Dependency grammar researchers argue that another advantage of dependency grammars is their ability to handle languages with entirely free word order. A phrase structure grammar would need a separate rule for each possible place in the parse tree where such an adverbial phrase could occur.

The reader may have noticed the similarity between dependency graphs and head structures. In Fact, an unlabeled dependency graph can be automatically derived from a context-free parse through the use of head rules

Categorial grammar is an early lexicalized grammar model. We focus on  combinatory categorial grammar. A categorial grammar has two components. The categorial lexicon associates each word with a syntactic and semantic category. The combinatory rules allow functions and arguments to be combined. There are two types of categories: functors and arguments. Arguments, like nouns, have simple categories like N. Verbs or determiners act as funtors.

Modern categorial grammars include more complex combinatorial rules that are needed for coordination and other complex phenomena, and also include composition of semantic categories as well as syntactic ones.

Do people use context-free grammars in their mental processing of language? It has proved very difficult to find clear-cut evidence that they do. For example, some early experiments asked subjects to judge which word in a sequence were more closely connected, finding that their intuitive groupings corresponded to syntactic constituents.

Distransitive alternation A distransitive verb is one like give that can take two arguments.

Tower of Hanoi

December 24, 2010 § Leave a comment

This is a well known little problem of procedural thinking. I’m on it nearly from the whole afternoon (just to underline how much i’m slow), and i only found out that the solution i can find online is different from the one my teacher gave to the problem.

A visualization of the solution for 4 disks

Tower of Hanoi

I’m starting to wonder how much is he against the free software, so this is the main reason which holds me from speaking about every little difference between the two approaches, given that some of my reader would like this boring list. So i’ll go over..
« Read the rest of this entry »

Latex include and input

December 18, 2010 § Leave a comment

It seems like there is not a complete manual online (and i’m referring to the wikibook manual) about how to use include and input in a LaTeX document. As you probably know LeTeX is a typesetting system, used mostly by computer scientists and academics.

The (l)on(e)ly information that is given in the wikibook manual about LaTeX is about how the include command works but not about how the included document should be structured in order to obtain a sequence in page numbers and section numbers in the toc.

The trick is quite easy to understand if you try yourself all the possibilities but it is better to know how to do it from the first time: you should firstly omit all the commands which deals with the main document structure, for example, « Read the rest of this entry »

Inverse image of a function

October 31, 2010 § Leave a comment

The first post of the blog is about sets and applications. The problem could be easily explained by saying that the inverse image of a function is actually the image of the “inverse” of this function, but let us say that things are not so, cause there are rules, in mathematics, which state that a function can have an inverse only if it respects some standard. Those standards are not met by every function (it should be bijective), but a function does not need to meet those standards to have an inverse image. There is a post in google docs which explains the problem in mathematical terms; the inverse image of a function results in a subset of the domain of the function, so the two exercises are correct by definition. However I do not think you can understand clearly the idea from a mathematical explanation so I am going to repeat the problem in a more specific manner.

There are many academics of the communication who have done a functional research about languages, and who have arrived to some specific conclusion, which I cannot explain in this post. However those assumptions let you define mathematical functions in communicative terms, as if they were operation on elements of a language. The language could be constituted by various elements, even not the spoken language; I will use two metaphors to explain the mathematical argument of this post, one with the language of design, and the second with the language of narration.
Capitan America Comic

They are both different languages from the natural one, which we speak every day, even though both have a syntagmatic and paradigmatic side. The syntagmatic side of a particular design, for instance, associates to a certain content, its typography and graphical elements, as the violet image at the beginning of this post, which is made by a “comic” character setting and a subsequence of images from right to left, a Japanese order. Another kind of design could be the one of a bank report, which usually is more formal in the typography and uses text and images from left to right.

In the language of narrative, and at most in the narrative for tabloids, a syntagm could be for instance built by the death and the consequent investigation from one or more characters, as in thrillers and spy stories, in which you can find a recursive plot, with multiple recursive steps even. In both these examples the syntagms can be considered as functions with a specific domain (the terms which compose them) and a co-domain equally specific, which is the code or the rhetorical meaning they refer to (manga comic or thriller story).
« Read the rest of this entry »

%d bloggers like this: