Royal Society Publishing

Formal grammar and information theory: together again?

Fernando Pereira


In the last 40 years, research on models of spoken and written language has been split between two seemingly irreconcilable traditions: formal linguistics in the Chomsky tradition, and information theory in the Shannon tradition. Zellig Harris had advocated a close alliance between grammatical and information-theoretic principles in the analysis of natural language, and early formal-language theory provided another strong link between information theory and linguistics. Nevertheless, in most research on language and computation, grammatical and information-theoretic approaches had moved far apart.

Today, after many years on the defensive, the information-theoretic approach has gained new strength and achieved practical successes in speech recognition, information retrieval, and, increasingly, in language analysis and machine translation. The exponential increase in the speed and storage capacity of computers is the proximate cause of these engineering successes, allowing the automatic estimation of the parameters of probabilistic models of language by counting occurrences of linguistic events in very large bodies of text and speech. However, I will argue that information-theoretic and computational ideas are also playing an increasing role in the scientific understanding of language, and will help bring together formal-linguistic and information-theoretic perspectives.
