Parsing with Unification Constraints

July 4, 2021 § Leave a comment


We now have all the pieces necessary to integrate feature structures and unification into a parser. Fortunately, the order-independent nature of unification allows us to largely ignore the actual search strategy used in the parser. Once we have associated unification constraints with the context-free rules of the grammar, and feature structures with the states of the search, we can use any of the standard search algorithms.

Integration of Unification into an Earley Parser

The first change involves the various representations used in the original code. Recall that the Earley algorithm operates by using a set of unadorned context-free grammar rules to fill in a data structure, called a chart, with a set of states. At the end of the parse, the states that make up this chart represent all possible parses of the input. Therefore, we begin our changes by altering the representations of both the context-free grammar rules and the states of the chart.

The rules are altered in such a way that in addition to their current components, they also include a feature structure derived from their unification constraints. More specifically, we use the constraints listed with a rule to build a feature structure, represented as a DAG, for use with that rule during parsing.

The final change to the original algorithm concerns the check for states already contained in the chart. In the original algorithm, the ENQUEUE function refused to enter into the chart any state that was identical to one already present in the chart. “Identical” means the same rule, with the same start and finish positions, and the same position of the dot (•). It was this check that allowed the algorithm to, among other things, avoid the infinite recursion problems associated with left-recursive rules.

The problem, of course, is that our states are now more complex since they have complex feature structures associated with them. States that appeared identical under the original criteria might in fact now be different since their associated DAG may differ. One solution to this problem is to extend the identity check to include the DAGs associated with the states, but it turns out that we can improve on this solution.

The motivation for the improvement lies in the motivation for the identity check. Its purpose is to prevent the wasteful addition of a state into the chart whose effect on the parse would be accomplished by an already existing state. Put another way, we want to prevent the entry into the chart of any state that would duplicate the work that will eventually be done by other states. Of course, this will clearly be the case with identical states, but it turns out it is also the case for states in the chart that are more general than new states being considered.

In sum, nothing worthwhile is accomplished by entering into the chart a state that is more specific than a state already in the chart.
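
To make this concrete, here is a minimal Python sketch of the modified ENQUEUE check; the State fields and the simplified subsumes test (which ignores reentrancy) are my own illustration rather than code taken from the repo mentioned below.

from dataclasses import dataclass

@dataclass
class State:                 # illustrative state record: rule, span, dot position, DAG
    rule: str
    start: int
    end: int
    dot: int
    dag: dict

def subsumes(general, specific):
    # True if `general` carries no information that is absent from `specific`.
    # Simplified: atomic values are strings, complex values are dicts, reentrancy is ignored.
    if general is None or general == {}:
        return True
    if not isinstance(general, dict):
        return general == specific
    if not isinstance(specific, dict):
        return False
    return all(f in specific and subsumes(v, specific[f]) for f, v in general.items())

def enqueue(state, chart_entry):
    # Skip the new state if an equal-or-more-general state with the same rule,
    # span and dot position is already in this chart entry.
    for old in chart_entry:
        if ((old.rule, old.start, old.end, old.dot) ==
                (state.rule, state.start, state.end, state.dot)
                and subsumes(old.dag, state.dag)):
            return
    chart_entry.append(state)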

Example

A Python example of the Earley parser (with instructions on how to try it) can be found in this repo.

Implementation of Unification

June 5, 2021 § Leave a comment


The unification operator takes two feature structures as input and returns a single merged feature structure if successful or a failure signal if the inputs are not compatible.

A notable aspect of this algorithm is that rather than constructing a new feature structure with the unified information from the two arguments, it destructively alters the arguments so that in the end they point to exactly the same information. Thus, the result of a successful call to the unification operator consists of suitably altered versions of the arguments.

Unification data structures

To facilitate the destructive merger aspect of the algorithm, we add a small complication to the DAGs used to represent the input feature structures: feature structures are represented by DAGs with additional edges, or fields. Specifically, each feature structure consists of two fields: a content field and a pointer field. The content field may be null or contain an ordinary feature structure. Similarly, the pointer field may be null or contain a pointer to another feature structure. If the pointer field of the DAG is null, then the content field of the DAG contains the actual structure to be processed. If, on the other hand, the pointer field is non-null, then the destination of the pointer represents the actual feature structure to be processed. The merger aspects of unification are achieved by the alteration of the pointer field of DAGs during processing.

def unify(f1_orig, f2_orig):
    # Dereference: follow pointer chains to reach the structures to process.
    f1 = f1_orig
    f2 = f2_orig
    while f1.pointer is not None:
        f1 = f1.pointer
    while f2.pointer is not None:
        f2 = f2.pointer
    if f1.content is None:
        # f1 is unspecified: it simply takes on the value of f2.
        f1.pointer = f2
        return f2
    elif f2.content is None:
        f2.pointer = f1
        return f1
    elif f1.content == f2.content:
        # Identical (atomic) values: make one point to the other.
        f1.pointer = f2
        return f2
    elif isinstance(f1.content, dict) and isinstance(f2.content, dict):
        # Both are complex structures: destructively merge f2 into f1.
        f2.pointer = f1
        for feature, value2 in f2.content.items():
            value1 = f1.content.get(feature)
            if value1 is None:
                # The feature is missing in f1: copy it across.
                f1.content[feature] = value2
            elif unify(value1, value2) == "failure":
                return "failure"
        return f1
    else:
        return "failure"

P.S. To collaborate on the creation of the algorithm you can subscribe to this repo by emailing me at the address on the Contacts page.

Feature Structure and Unification Formalism

May 18, 2021 § Leave a comment


From a reductionist perspective, the history of the natural sciences over the last few hundred years can be seen as an attempt to explain the behaviour of larger structures by the combined action of smaller primitives.

In this chapter we introduce the idea that grammatical categories like VPto, Sthat, Non3sgAux, or 3sgNP, as well as grammatical rules like S → NP VP that make use of them, should be thought of as objects that can have complex sets of properties associated with them. The information in these properties is represented as constraints, and so these kinds of models are often called constraint-based formalisms.

Naive models of grammatical phenomena such as agreement and subcategorization can lead to overgeneration problems; handling them by multiplying grammatical categories leads, in turn, to an explosion in the number of grammar rules and a corresponding loss of generality in the grammar. A constraint-based representation scheme will allow us to represent fine-grained information about number and person agreement and subcategorization, as well as semantic categories like mass/count.

Constraint-based formalisms have other advantages, such as the ability to model more complex phenomena than context-free grammars can, and the ability to compute semantics for syntactic representations efficiently and conveniently.

Of course, simply associating these properties with various words and phrases does not solve any of our overgeneration problems. To make these properties useful, we need the ability to perform simple operations, such as equality tests, on them. By pairing such tests with our grammar rules, we can add various constraints to help ensure that only grammatical strings are generated by the grammar.

The remainder of this chapter provides the details of one computational implementation of a constraint-based formalism, based on feature structures and unification. The next section describes feature structures, the representation used to capture the kinds of grammatical properties we have in mind.

Feature structures

One of the simplest ways to encode the kind of properties that we have in mind is through the use of feature structures. These are simply sets of feature-value pairs, where features are unanalyzable atomic symbols drawn from some finite set, and values are either atomic symbols or feature structures themselves. Such feature structures are illustrated with the following kind of diagram, called an attribute-value matrix, or AVM:

[ FEATURE1   value1
  FEATURE2   value2
  ...
  FEATUREn   valuen ]

As mentioned earlier, features are not limited to atomic symbols as their values; they can also have other feature structures as their values. For example, the NUMBER and PERSON features can be lumped together under an AGREEMENT feature that takes a feature structure consisting of the NUMBER and PERSON feature-value pairs as its value.
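
In AVM form this looks roughly as follows (the values sg and 3 are just an illustration):

[ AGREEMENT  [ NUMBER  sg
               PERSON  3 ] ]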

This ability to use feature structures as values leads fairly directly to the notion of a feature path. A feature path is nothing more than a sequence of features through a feature structure leading to a particular value. For example, in the structure above the path ⟨AGREEMENT NUMBER⟩ leads to the value sg.

In these diagrams, feature structures are depicted as directed graphs where features appear as labeled edges and values as nodes.

There is an additional important kind of feature structure: one that contains features that actually share some feature structure as a value. Such feature structures are referred to as reentrant structures. What we have in mind here is not the simple idea that two features might have equal values, but rather that they share precisely the same feature structure (or node of the graph). These two cases can be distinguished clearly if we think in terms of paths through a graph: in a reentrant structure, two feature paths actually lead to the same node in the structure.
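
For instance (the HEAD and SUBJECT features here are purely illustrative), the tag (1) below marks a single shared node: the paths ⟨HEAD AGREEMENT⟩ and ⟨SUBJECT AGREEMENT⟩ both lead to it.

[ HEAD     [ AGREEMENT  (1) [ NUMBER  sg
                              PERSON  3 ] ]
  SUBJECT  [ AGREEMENT  (1) ] ]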

These simple structures give us the ability to express linguistic generalizations in surprisingly compact and elegant ways.

Unification of Feature Structures

As noted earlier, feature structures would be of little use without being able to perform reasonably efficient and powerful operations on them. The two principal operations we need to perform are merging the information content of two structures and rejecting the merger of structures that are incompatible. A single computational technique, called unification, suffices for both of these purposes.

Unification is a binary operation (represented here as U) that accepts two feature structures as arguments and returns a feature structure when it succeeds.

The value [] in the second structure indicates that the value has been left unspecified. A feature with such a [] value can be successfully matched to any value of the corresponding feature in another structure. The result of this type of unification is a structure with the value provided by the more specific (non-null) value.

The result of the unification can be a merger of the original two structures into one larger structure. It is important to understand why these structures are judged to be compatible: they are compatible because they contain no features that are explicitly incompatible – that they each contain a feature value pair that the other lacks does not cause the unification to fail.
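
A few small examples in the AVM notation illustrate unification with an unspecified value, a clash, and the merger of structures that each contain a pair the other lacks:

[ NUMBER sg ]  U  [ NUMBER [] ]  =  [ NUMBER sg ]
[ NUMBER sg ]  U  [ NUMBER pl ]  =  failure
[ NUMBER sg ]  U  [ PERSON 3 ]   =  [ NUMBER sg
                                      PERSON 3 ]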

Unification can be seen as a way of merging the information in each feature structure, or of describing objects that satisfy both sets of constraints. Intuitively, unifying two feature structures produces a new feature structure that is more specific (has more information) than, or is identical to, either of the input feature structures. We say that a less specific (more abstract) feature structure subsumes an equally or more specific one.

Since every feature structure is subsumed by the empty structure [], the subsumption relation among feature structures can be defined as a semilattice. Unification can be defined in terms of this subsumption semilattice.

Since the information ordering defined by subsumption is a semilattice, the unification operation is monotonic. This means that if some description is true of a feature structure, unifying it with another feature structure results in a feature structure that still satisfies the original description. The unification operation is therefore order-independent: given a finite set of feature structures to unify, we can combine them in any order and get the same result.

To summarize, unification is a way of implementing the integration of knowledge from different constraints. Given two compatible feature structures as input, unification produces the most general feature structure that nonetheless contains all the information in the inputs. Given two incompatible feature structures, it fails.

Feature Structures in the Grammar

Our primary purpose in introducing feature structures and unification has been to provide a way to elegantly express syntactic constraints that would be difficult to express with the mechanism of context-free grammars alone. Our next step, therefore, is to specify a way to integrate feature structures and unification operations into the specification of a grammar. We can accomplish this by augmenting the rules of ordinary context-free grammars with attachments that specify feature structures for the constituents of the rules, along with appropriate unification operations that express constraints on those constituents.
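
For example, writing feature paths with angle brackets ⟨…⟩ as is standard in these formalisms, an S rule can be annotated to require its NP and VP constituents to agree:

S → NP VP
⟨NP AGREEMENT⟩ = ⟨VP AGREEMENT⟩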

Taking a step back from this notation, it is important to note that this augmentation fundamentally changes the simple generative nature of context-free rules. Ordinary context-free rules are based on the simple notion of concatenation. In the new scheme, this concatenation must be accompanied by a successful unification operation. This leads naturally to questions about the computational complexity of the unification operation and its effect on the generative power of this new grammar.

  • The elements of context-free grammar rules will have feature-based constraints associated with them. This reflects a shift from atomic grammatical categories to more complex categories with properties.
  • The constraints associated with individual rules can make reference to the feature structures associated with the parts of the rule to which they are attached.

Head Features

The preceding section introduced the notion of copying feature structures from phrase-structure children to their parents. This turns out to be a specific instance of a much more general phenomenon in constraint-based grammars. Specifically, the features for most grammatical categories are copied from one of the children to the parent. The child that provides the features is called the head of the phrase, and the features copied are referred to as head features.

As a result, we can say that the agreement feature structure is a head feature. We can rewrite our rules to reflect these generalizations by placing the agreement feature structure under a HEAD feature and then copying the feature upward.
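
A sketch of what this looks like for a VP rule (the exact constituents are only illustrative):

VP → Verb NP
⟨VP HEAD⟩ = ⟨Verb HEAD⟩

Here the VP inherits its HEAD feature, and with it the agreement information, from its head child, the verb.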

Subcategorization

Recall that subcategorization is the notion that verbs can be picky about the patterns of arguments they will allow themselves to appear with.

The proper way to avoid this proliferation is to introduce feature structures to distinguish among the various members of the verb category. We can accomplish this goal by associating with each of the verbs of the lexicon an atomic feature, called SUBCAT, with an appropriate value.

This is a somewhat clumsy approach since these unanalyzable SUBCAT symbols do not directly encode either the number or the type of the arguments that the verb expects to take. To see this, note that we cannot simply examine a verb’s entry in the lexicon and know what its subcategorization frame is. Rather, we must use the value of the SUBCAT feature indirectly as a pointer to those verb phrase rules in the grammar that can accept the verb in question. A more elegant solution, which makes better use of the expressive power of feature structures, allows the verb entries to directly specify the order and type of their arguments.
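
For instance, a lexical entry for want in its infinitival-complement use might look roughly like this (the exact attribute names vary between formalisms and are only illustrative here):

[ ORTH    want
  CAT     Verb
  SUBCAT  ⟨ VP[inf] ⟩ ]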

Combining even a limited set of phrase types results in a very large set of possible subcategorization frames.

Although the notion of subcategorization, or valence as it is often called, was originally conceived for verbs, more recent work has focused on the fact that many other kinds of words exhibit forms of valence-like behaviour.

Many adjectives and nouns also have subcategorization frames.

Verbs express subcategorization constraints on their subjects as well as their complements.

Long-Distance Dependencies

The model of subcategorization we have developed so far has two components. Each head word has a SUBCAT feature that contains a list of the complements it expects. Then, phrasal rules like the VP rule match up each expected complement in the SUBCAT list with an actual constituent. The mechanism works fine when the complements of a verb are in fact to be found in the verb phrase.

Sometimes, however, a constituent subcategorized for by the verb is not locally instantiated but stands in a long-distance relationship with its predicate.

First Order Logic

May 11, 2021 § Leave a comment


First Order Logic (FOL) is a flexible, well-understood, and computationally tractable approach to the representation of knowledge that satisfies many of the desiderata for a meaning representation language.

In addition, an attractive feature of FOL is that it makes very few specific commitments as to how things  ought to be represented.

Basic elements of First Order Logic

We will explore FOL in a bottom-up fashion by first examining its various atomic elements and then showing how they can be composed to create larger meaning representations.

Constants in FOL refer to specific objects in the world being described. Like programming-language constants, FOL constants refer to exactly one object. Objects can, however, have multiple constants that refer to them.

Functions in FOL correspond to concepts that are often expressed in English as genitives.

Functions provide a convenient way to refer to specific objects without having to associate a named constant with them. This is particularly convenient in cases in which many named objects, like restaurants, have a unique concept, such as a location, associated with them.

The notion of a variable is our final FOL mechanism for referring to objects. Variables, which are normally depicted as single lower-case letters, let us make assertions and draw inferences about objects without having to make reference to any particular named object.

Predicates are symbols that refer to, or name, the relations that hold among some fixed number of objects in a given domain.

Restaurant(Maharani)

Restaurant here is a one-place predicate, used not to relate multiple objects but rather to assert a property of a single object. In this case, it encodes the category membership of Maharani.

Larger composite representations can also be put together through the use of logical connectives.

Variables and Quantifiers

Variables are used in two ways in FOL: to refer to a particular anonymous object and to refer generically to all objects in a collection. These two uses are made possible through the use of operators known as quantifiers.

To review, variables in logical formulas must be either existentially or universally quantified. To satisfy an existentially quantified variable, at least one substitution must result in a true sentence. Sentences with universally quantified variables must be true under all possible substitutions.
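
For example, using the Restaurant predicate from above (Serves and VegetarianFood are illustrative names of my own, not taken from the post):

∃x Restaurant(x) ∧ Serves(x, VegetarianFood)      (some restaurant serves vegetarian food)
∀x Restaurant(x) ⇒ Serves(x, VegetarianFood)      (every restaurant serves vegetarian food)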

Lambda Notation

The final element we need to complete our discussion of FOL is called lambda notation. This notation provides a way to abstract from fully specified FOL formulas in a way that will be particularly useful for semantic analysis.

λx.P(x)

Such expressions consist of the Greek letter lambda (λ), followed by one or more variables, followed by an FOL formula that makes use of those variables.

This process is known as lambda reduction and consists of a simple textual replacement of the lambda variables with the specified FOL terms, accompanied by the subsequent removal of the lambda.

This general technique, called currying, is a way of converting a predicate with multiple arguments into a sequence of single-argument predicates. The lambda notation provides a way to incrementally gather arguments to a predicate when they do not all appear together as daughters of the predicate in a parse tree.
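
A small example using the constant already introduced (Near is an illustrative two-place predicate of my own). Applying a lambda expression to an argument and reducing gives:

λx.Restaurant(x)(Maharani)   reduces to   Restaurant(Maharani)

and a curried two-place predicate consumes its arguments one at a time:

λx.λy.Near(x, y)(Maharani)   reduces to   λy.Near(Maharani, y)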

The Semantics of First Order Logic

The model theoretic approach: this approach employs simple set-theoretic notions to provide a truth conditional mapping from the expressions in a meaning representation to the state of affairs being modeled.

We can start by asserting that the objects in our world, FOL terms, denote elements in a domain, and asserting that atomic formulas are captured either as sets of domain elements for properties, or as sets of tuples of elements for relations.

There are no variables in our set based models, only elements of the domain and relations that hold among them. We can provide model based account for formulas with variables by employing the notion of a substitution introduced earlier in this page. Formulas involving the existential quantifier are true if a substitution of terms for variables results in a formula that is true in the model. Formulas involving the universal quantifier must be true under all possible substitutions.

Inference

One of the most important desiderata given for a meaning representation language is that it should support inference, or deduction: that is, the ability to add valid new propositions to a knowledge base or to determine the truth of propositions not explicitly contained within a knowledge base. This section briefly discusses modus ponens, the most widely implemented inference method provided by FOL.

Modus ponens is a familiar form of inference that corresponds to what is informally known as if-then reasoning. We can abstractly define modus ponens as follows, where alpha and beta should be taken as FOL formulas.

alpha
alpha => beta
_____________
beta

Modus ponens simply states that if the left-hand side of an implication rule is true, then the right-hand side of the rule can be inferred. In the following discussions, we refer to the left-hand side of an implication as the antecedent and the right-hand side as the consequent.
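
For instance, from the fact Restaurant(Maharani) and an illustrative rule ∀x Restaurant(x) ⇒ Serves(x, Food) (Serves and Food are made-up names), modus ponens lets us add Serves(Maharani, Food) to the knowledge base.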

Modus ponens can be put to practical use in one of two ways: forward chaining and backward chaining. In forward chaining systems, as individual facts are added to the knowledge base, modus ponens is used to fire all applicable implication rules.

The forward chaining approach has the advantage that facts will be present in the knowledge base when needed because, in a sense, all inference is performed in advance. This can substantially reduce the time needed to answer subsequent queries, since they should all amount to simple lookups. The disadvantage of this approach is that facts that will never be needed may be inferred and stored.

Production systems, which are used extensively in cognitive modeling research, are forward chaining inference systems augmented with additional control knowledge that governs which rules are to be fired.

In backward chaining, modus ponens is run in reverse to prove specific propositions called queries. The first step is to see whether the query formula is true by determining whether it is present in the knowledge base. If it is not, the next step is to search for applicable implication rules present in the knowledge base. An applicable rule is one whose consequent matches the query formula. The Prolog programming language is a backward chaining system that implements this strategy.

Note that it is critical to distinguish between backward chaining, which reasons from queries back to known facts, and reasoning backwards from known consequents to unknown antecedents. To be specific, by reasoning backwards we mean that if the consequent of a rule is known to be true, we assume that the antecedent will be as well.

While backward chaining is a sound method of reasoning, reasoning backwards is an invalid, though frequently useful, form of plausible reasoning. Plausible reasoning from consequents to antecedents is known as abduction, and is often useful in accounting for many of the inferences people make while analyzing extended discourses.


While forward and backward reasoning are sound, neither is complete. This means that there are valid inferences that cannot be found by systems using these methods alone. Fortunately, there is an alternative inference technique, called resolution, that is both sound and complete.

New NLP project BETA

January 10, 2021 § Leave a comment


This project is only a sketch so far, but it has big potential, and I’ve been working on it over the last year in my free hours with some friends and volunteer collaborators. http://www.word-b.com

There are still some fixes to make to the latest version; however, if you want to collaborate on the project just contact me at agnese.camellini@gmail.com. The project is open source: you can find the repository of the back end (with NLTK and Django) at this link, and the repository of the front end (with VueJS and Vuetify) at this link.

Context-free Grammars

October 18, 2020 § Leave a comment


Formal Grammars of English

In this chapter, we introduce three main new ideas: constituency, grammatical relations, and subcategorization and dependency.

The fundamental idea of constituency is that groups of words may behave as a single unit or phrase, called a constituent.

This chapter introduces the use of context free grammars, a formalism that will allow us to model these constituency facts.

Grammatical relations are a formalization of ideas from traditional grammar, such as subjects and objects, and other related notions.

Subcategorization and dependency relations refer to certain kinds of relation between words and phrases. For example, the verb want can be followed by any infinitive, but the verb find cannot be followed by an infinitive. These are called facts about the subcategorization of the verb.

As we show, none of the syntactic mechanisms that we’ve discussed up until now can easily capture such phenomena. They can be modeled much more naturally by grammars based on context-free grammars. Context-free grammars are thus the backbone of many formal models of the syntax of natural language (and, for that matter, of computer languages). They are powerful enough to express sophisticated relations among the words in a sentence, yet computationally tractable enough that efficient algorithms exist for parsing sentences with them.

Constituency

How do words group together in English? Consider the noun phrase, a sequence of words surrounding at least one noun. How do we know that these words group together (or “form constituents”)? One piece of evidence is that they can all appear in similar syntactic environments, for example, before a verb.

But while the whole noun phrase can occur before a verb, this is not true of each of the individual words that make up a noun phrase.

Context-Free Grammars

The most commonly used mathematical system for modeling constituent structure in English and other natural languages is the Context-Free Grammar, or CFG. Context-free grammars are also called Phrase-Structure Grammars, and the formalism is equivalent to what is also called Backus-Naur Form, or BNF.

A context-free grammar consists of a set of rules or productions, each of which expresses the ways that symbols of the language can be grouped and ordered together, and a lexicon of words and symbols.

Context-free rules can be hierarchically embedded, so we can combine the previous rules with others. The symbols that are used in a CFG are divided into two classes. The symbols that correspond to words in the language (“the”, “nightclub”) are called terminal symbols; the lexicon is the set of rules that introduces these terminal symbols. The symbols that express clusters or generalizations of these are called non-terminals. In each context-free rule, the item to the right of the arrow (->) is an ordered list of one or more terminals and non-terminals; to the left of the arrow is a single non-terminal symbol expressing some cluster or generalization. Notice that in the lexicon, the non-terminal associated with each word is its lexical category, or part of speech (POS).
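
A tiny made-up fragment shows both kinds of rules; the last two act as the lexicon, introducing the terminals “the” and “nightclub”:

NP -> Det Nominal
Nominal -> Noun
Det -> the
Noun -> nightclub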

A CFG can be thought of in two ways: as a device for generating sentences and as a device for assigning a structure to a given sentence.

The sequence of rule expansions is called a derivation of the string of words. It is common to represent a derivation by a parse tree:
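
For instance, with the toy rules above, a derivation of “the nightclub” can be written in bracketed form as:

(NP (Det the) (Nominal (Noun nightclub)))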

In some constructions, such as topicalization (marked in the Penn Treebank), a phrase is placed at the beginning of the sentence for discourse purposes.

What differentiates sentence constructions from the rest of the grammar is the notion that they are in some sense complete. In this way they correspond to the notion of clause, which traditional grammars often describe as forming a complete thought. One way of making this notion of “complete thought” more precise is to say an S is a node of the parse tree below which the main verb of the S has all its arguments.

Noun phrases can begin with simple lexical determiners. The role of the determiner in English noun phrases can also be filled by more complex expressions, such as a possessive expression. Mass nouns, moreover, don’t require a determiner at all.

The nominal construction follows the determiner and contains any pre- and post-head noun modifiers. 

Nominal -> Noun

This rule also provides the basis for the bottom of various recursive rules used to capture more complex nominal constructions. A number of different kinds of word classes can appear before the head noun (the “postdeterminers”). These include cardinal numbers, ordinal numbers, and quantifiers. Adjectives occur after quantifiers but before nouns. Adjectives can also be grouped into a phrase called an adjective phrase, or AP. APs can have an adverb before the adjective. A head noun can be followed by postmodifiers. The three most common kinds of non-finite postmodifiers are the gerundive, -ed, and infinitive forms.

A post-nominal relative clause (more correctly, a restrictive relative clause) is a clause that often begins with a relative pronoun. The relative pronoun functions as the subject of the embedded verb. Word classes that modify and appear before noun phrases are called predeterminers.

Rule proliferation will also have to happen for the noun’s case: for example, English pronouns have nominative and accusative versions. This allows agreement.

The verb phrase consists of the verb and a number of other constituents. In the simple rules we have built so far, these other constituents include NPs and PPs and combinations of the two.

The verb phrase can be significantly more complicated than this. Many other kinds of constituents, such as an entire embedded sentence, can follow the verb. These are called sentential complements.

Verbs can also be followed by particles, words that resemble a preposition but that combine with the verb to form a phrasal verb like take off.

Where traditional grammars subcategorize verbs into these two categories (transitive and intransitive), modern grammars distinguish as many as 100 subcategories. We say that a verb like find subcategorizes for an NP, and a verb like want subcategorizes for an NP or a non-finite VP. We also call these constituents the complements of the verb. These possible sets of complements are called the subcategorization frame of the verb.

The problem with this approach, as with the same solution to the agreement feature problem, is a vast explosion in the number of rules. The standard solution to both of these problems is the feature structure.

The subclass of verbs called auxiliaries, or helping verbs, has particular syntactic constraints that can be viewed as a kind of subcategorization. Auxiliaries include the modal verbs can, could, may, might, must, will, would, shall, and should, as well as the perfect auxiliary have, the progressive auxiliary be, and the passive auxiliary be. Each of these verbs places a constraint on the form of the following verb, and each of these must also combine in a particular order.

A sentence can have multiple auxiliary verbs, but they must occur in a particular order: modal < perfect < progressive < passive. One way of capturing the ordering constraints among auxiliaries, commonly used in the systemic grammar of Halliday, is to introduce a special constituent called verb group, whose sub constituents include all the auxiliaries as well as the main verb.

The major phrase types discussed here can be conjoined with conjunctions like and, or, and but to form larger constructions of the same type. For example, a coordinate noun phrase can consist of two other noun phrases separated by a conjunction. This relation can be defined through metarules.

X->X and X

This metarule simply states that any non-terminal can be conjoined with the same non terminal to yield a constituent of the same type. Of course, the variable X must be designated as a variable that stands for any non terminal rather than a non-terminal itself.

Context-free grammar rules of the type that we have explored so far can in principle be used to assign a parse tree to any sentence. This means that it is possible to build a corpus in which every sentence is syntactically annotated with a parse tree.

We suggested informally earlier that syntactic constituents could be associated with a lexical head.

Grammar Equivalence and Normal Form

A formal language is defined as a (possibly infinite) set of strings of words. This suggests that we could ask whether two grammars are equivalent by asking whether they generate the same set of strings. In fact, it is possible for two distinct context-free grammars to generate the same language.

We usually distinguish two kinds of grammar equivalence: weak equivalence and strong equivalence. Two grammars are strongly equivalent if they generate the same set of strings and if they assign the same phrase structure to each sentence (allowing merely for renaming of the non terminal symbols). Two grammars are weakly equivalent if they generate the same set of strings but do not assign the same phrase structure to each sentence.

It is sometimes useful to have a normal form for grammars, in which each of the productions takes a particular form. For example, a context-free grammar is in Chomsky normal form (CNF) if it is ε-free and, in addition, each production is in one of the two forms:

A -> B C or A -> a

That is, the right-hand side of each rule contains either two non-terminal symbols or one terminal symbol. Chomsky normal form grammars are binary branching, that is, they produce binary parse trees.
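
For example, a ternary rule of the form A -> B C D can be converted to Chomsky normal form by introducing a new non-terminal X: A -> B X and X -> C D.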

The generation of a symbol A with a potentially infinite sequence of symbols B, via a rule of the form A -> A B, is known as Chomsky-adjunction.

Finite-State and Context-Free Grammars

We argued that adequate models of grammar need to be able to represent complex interrelated facts about constituency, subcategorization, and dependency relations, and we implied that at least the power of context-free grammars is needed to accomplish this. But why is it that we cannot just use finite-state methods to capture these syntactic facts? The answer to this question is critical, since a considerable price is paid in terms of processing speed when we switch from regular languages to context-free ones.

There are two answers to this question. The first is mathematical: certain syntactic structures present in English and other natural languages make them not regular languages. The second answer is more subjective and has to do with notions of expressiveness; even when finite-state methods are capable of dealing with the syntactic facts in question, they often don’t express them in ways that make generalizations obvious, lead to understandable formalisms, or produce structures of immediate use in subsequent semantic processing.

In other words, a language can be generated by a finite-state machine if and only if the grammar that generates it does not have any center-embedded recursion of this form. Intuitively, this is because grammar rules in which the non-terminal symbols are always on either the right or the left edge of a rule can be processed iteratively rather than recursively. Such center-embedding rules are needed only to deal with artificial problems (formal languages) or for practical problems, such as checking for correctly matching delimiters in programming and markup languages.

So is there no role for finite-state methods in syntactic analysis? A quick review of the rules used for noun phrases in this chapter, as well as those used in the Penn Treebank grammar, reveals that a considerable portion of them can be handled by finite-state methods. For many practical purposes for which matching syntactic and semantic rules isn’t necessary, finite-state rules are quite sufficient.

Dependency Grammars

We have focused in this chapter on context-free grammars because many available treebanks and parsers produce this kind of syntactic representation. But in a class of grammar formalisms called dependency grammars, which are becoming quite important in speech and language processing, constituents and phrase-structure rules do not play any important role. Instead, the syntactic structure of a sentence is described purely in terms of words and binary semantic or syntactic relations between these words.

Indeed the notion in traditional grammar of “parsing a sentence into subject and predicate” is based on lexical relations rather than constituent relations.

Other dependency-based computational grammars, such as Link Grammar, use different but roughly overlapping links. In untyped dependency parsers the links are unlabeled. One advantage of dependency formalism is the strong predictive parsing power that words have for their dependents.

Dependency grammar researchers argue that another advantage of dependency grammars is their ability to handle languages with entirely free word order. A phrase structure grammar would need a separate rule for each possible place in the parse tree where such an adverbial phrase could occur.

The reader may have noticed the similarity between dependency graphs and head structures. In fact, an unlabeled dependency graph can be automatically derived from a context-free parse through the use of head rules.

Categorial grammar is an early lexicalized grammar model. We focus on combinatory categorial grammar. A categorial grammar has two components. The categorial lexicon associates each word with a syntactic and semantic category. The combinatory rules allow functions and arguments to be combined. There are two types of categories: functors and arguments. Arguments, like nouns, have simple categories like N. Verbs or determiners act as functors.

Modern categorial grammars include more complex combinatorial rules that are needed for coordination and other complex phenomena, and also include composition of semantic categories as well as syntactic ones.

Do people use context-free grammars in their mental processing of language? It has proved very difficult to find clear-cut evidence that they do. For example, some early experiments asked subjects to judge which words in a sequence were more closely connected, finding that their intuitive groupings corresponded to syntactic constituents.

Ditransitive alternation: a ditransitive verb is one like give that can take two arguments.

Tower of Hanoi

December 24, 2010 § Leave a comment


This is a well-known little problem of procedural thinking. I’ve been on it for nearly the whole afternoon (just to underline how slow I am), and I only found out that the solution I can find online is different from the one my teacher gave to the problem.

Tower of Hanoi: a visualization of the solution for 4 disks
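
For reference, here is a minimal recursive sketch of the standard textbook solution in Python (it may well differ from both the solution I found online and the one my teacher gave):

def hanoi(n, source, target, spare):
    # Move n disks from the source peg to the target peg, using spare as the auxiliary peg.
    if n == 0:
        return
    hanoi(n - 1, source, spare, target)    # park the n-1 smaller disks on the spare peg
    print("move disk", n, "from", source, "to", target)
    hanoi(n - 1, spare, target, source)    # bring the smaller disks back on top of disk n

hanoi(4, "A", "C", "B")                    # the 4-disk case shown in the visualization above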


I’m starting to wonder how much he is against free software, and this is the main reason that keeps me from discussing every little difference between the two approaches, given that some of my readers would like this boring list. So I’ll move on…

LaTeX include and input

December 18, 2010 § Leave a comment


It seems like there is not a complete manual online (and I’m referring to the wikibook manual) about how to use include and input in a LaTeX document. As you probably know, LaTeX is a typesetting system, used mostly by computer scientists and academics.

The (l)on(e)ly information that is given in the wikibook manual about LaTeX is about how the include command works, but not about how the included document should be structured in order to obtain a continuous sequence of page numbers and section numbers in the table of contents.

The trick is quite easy to understand if you try all the possibilities yourself, but it is better to know how to do it right from the start: you should first omit all the commands that deal with the main document structure, for example…

Inverse image of a function

October 31, 2010 § Leave a comment



The first post of the blog is about sets and mappings. The problem could be easily explained by saying that the inverse image of a function is actually the image of the “inverse” of this function, but things are not quite so, because there are rules in mathematics which state that a function can have an inverse only if it meets certain standards. Those standards are not met by every function (it should be bijective), but a function does not need to meet them in order to have an inverse image. For instance, f(x) = x² on the real numbers is not bijective and has no inverse function, yet the inverse image of {4} is perfectly well defined: it is {-2, 2}, a subset of the domain. There is a post in Google Docs which explains the problem in mathematical terms; the inverse image of a function is always a subset of the domain of the function, so the two exercises are correct by definition. However, I do not think you can clearly understand the idea from a mathematical explanation, so I am going to restate the problem in a more specific manner.

There are many communication scholars who have done functional research on languages and who have arrived at some specific conclusions, which I cannot explain in this post. However, those assumptions let you define mathematical functions in communicative terms, as if they were operations on elements of a language. The language could be made up of various elements, not necessarily the spoken language; I will use two metaphors to explain the mathematical argument of this post, one with the language of design, and the second with the language of narration.
Captain America comic

They are both languages different from the natural one, which we speak every day, even though both have a syntagmatic and a paradigmatic side. The syntagmatic side of a particular design, for instance, associates a certain content with its typography and graphical elements, as in the violet image at the beginning of this post, which is made up of a “comic” character setting and a sequence of images running from right to left, in the Japanese order. Another kind of design could be that of a bank report, which is usually more formal in its typography and uses text and images from left to right.

In the language of narrative, and especially in tabloid narratives, a syntagm could for instance be built from a death and the consequent investigation by one or more characters, as in thrillers and spy stories, in which you can find a recursive plot, even with multiple recursive steps. In both these examples the syntagms can be considered as functions with a specific domain (the terms which compose them) and an equally specific co-domain, which is the code or the rhetorical meaning they refer to (manga comic or thriller story).
