Bi-Weekly BTP Meeting Bhargava Reddy 110050078

Bi-Weekly BTP Meeting

Bhargava Reddy

110050078

Contents

• Origin and importance of Language
• Linguistic Tree
• Formal Definition of a Language
• Is English a finite or infinite Language
• How do we learn a language

Origin of Language?

• We know that all human beings share the same ancestors
• We have evolved into the most intelligent beings
• But we need to remember that languages have evolved too
• Can we say that, like human beings, all languages have a common ancestor?
• We know that Telugu, Tamil, Kannada and Malayalam have the same origin, the Dravidian family of languages
• Do all languages descend from a common one?

Atkinson QD, Meade A, Venditti C, Greenhill SJ, Pagel M (2008) Languages evolve in punctuational bursts. Science 319: 588.

Importance of Languages

• The emergence of the human language faculty represented one of the major transitions in the evolution of life
• For the first time, the exchange of highly complex information between individuals became possible
• Parallels between genetic and language evolution were noticed by Charles Darwin, but the issue is debatable
• It is generally accepted that language has evolved and diversified obeying mechanisms similar to those of biological evolution
• The extant languages number nearly 7000 and are divided into 19 linguistic families

Atkinson QD, Meade A, Venditti C, Greenhill SJ, Pagel M (2008) Languages evolve in punctuational bursts. Science 319: 588.

Linguistic Tree

• The numbers in red show the estimated ages of the languages
• From the diagram we can infer that the language family originated around 8,700 years ago
• This says that our Indian languages are not older than 3000 years
• Our Indian languages are still the oldest languages in the tree
• All the languages are broadly classified as: Celtic, Italic, French/Iberian, West Germanic, North Germanic, Baltic, Slavic, Indic, Iranian, Albanian, Greek, Armenian, Tocharian, Anatolian

Controversies in the origin of Sanskrit

• The main key to the Indo-Iranian languages is the Sanskrit language
• People believe that it originated around 1500 BCE (radiocarbon dating of the scripts is cited as proof)
• That makes it nearly 3500 years old
• But according to the Ramayana, Sita resided in the ashram of Valmiki, who wrote the Ramayana in Sanskrit
• Some estimates place the Ramayana nearly 1.6 lakh years ago, which contradicts this dating

Formal Definition of Language

• Alphabet: An alphabet is a finite set of symbols. Without loss of generality, we can consider the binary alphabet, {0,1}, by enumerating the actual alphabet in binary code

• Sentence: A sentence is defined as a string of symbols. The set of all sentences over the binary alphabet is {0,1,00,01,10,11,000,...}. There are infinitely many sentences, as many as integers; the set of all sentences is ‘countable’.

Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
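The countability claim above can be made concrete with a short Python sketch. It lists binary sentences shortest first and pairs each one with a positive integer. The function names `sentences` and `index_of` are illustrative choices, not taken from the cited paper.

```python
from itertools import count, islice, product


def sentences(alphabet=("0", "1")):
    """Yield every non-empty string over `alphabet`, shortest first."""
    for length in count(1):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


def index_of(sentence):
    """Position (starting at 1) of a binary sentence in the listing above."""
    # Prefixing '1' and reading the result as a binary number gives a
    # distinct integer >= 2 for every non-empty string; subtract 1 so the
    # numbering starts at 1.
    return int("1" + sentence, 2) - 1


first = list(islice(sentences(), 7))
print(first)
print([index_of(s) for s in first])
```

Every sentence receives exactly one integer and every positive integer is used, which is what "as many as integers" means here.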

Formal Definition of a Language

• Language: A language is a set of sentences. Among all possible sentences some are part of the language and some are not. A finite language contains a finite number of sentences. An infinite language contains an infinite number of sentences.

• Grammar: A grammar is a finite list of rules specifying a language. A grammar is expressed in terms of ‘rewrite rules’: a certain string can be rewritten as another string. Strings contain elements of the alphabet together with ‘non-terminals’, which are place holders. After iterated application of the rewrite rules, the final string will only contain symbols of the alphabet.

Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
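As a rough illustration of rewrite rules, the sketch below uses a made-up grammar with one non-terminal, S, and two rules, S → 01 and S → 0S1. The grammar, the name `generate` and the length bound are assumptions chosen for the example; they are not from the cited paper.

```python
from collections import deque

# Hypothetical example grammar: 'S' is the only non-terminal (placeholder).
RULES = {"S": ["01", "0S1"]}


def generate(rules, start="S", max_len=6):
    """Return all terminal strings of length <= max_len derivable from `start`."""
    results = set()
    queue = deque([start])
    while queue:
        string = queue.popleft()
        # Locate the leftmost non-terminal, if there is one.
        pos = next((i for i, ch in enumerate(string) if ch in rules), None)
        if pos is None:
            results.add(string)  # only alphabet symbols remain
            continue
        for replacement in rules[string[pos]]:
            rewritten = string[:pos] + replacement + string[pos + 1:]
            # Rewriting never removes alphabet symbols, so strings that
            # already carry too many of them can be dropped. The search
            # stops because every rule here adds at least one symbol.
            if sum(ch not in rules for ch in rewritten) <= max_len:
                queue.append(rewritten)
    return sorted(results, key=lambda s: (len(s), s))


print(generate(RULES))
```

The two rules are a finite list, yet without the length bound they would specify an infinite language (n zeros followed by n ones).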

Is the English Language finite or infinite?

• The number of words in English is nearly 500,000 as per the Oxford dictionary
• We also know that every letter is one of the 26 characters of the alphabet
• Practically speaking, a spoken English sentence is at most about 100 words long (though even that number is absurd)
• But if we consider written English, the length of a sentence can be unbounded (e.g., given a sentence, a longer one can be made by joining it with another)
• So if we consider written English, the cardinality of the set of all English sentences is infinite, which makes English an infinite language

Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
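The joining argument can be sketched in a few lines of Python. The helper `join` and the sample sentence are hypothetical, and the sketch simply assumes that two sentences linked by "and" still form a sentence.

```python
def join(first, second, conjunction="and"):
    """Build one longer sentence out of two sentences."""
    return f"{first.rstrip('.')}, {conjunction} {second}"


sentence = "the sky is blue."
lengths = []
for _ in range(4):
    lengths.append(len(sentence.split()))  # word count before extending
    sentence = join(sentence, "the sky is blue.")

print(lengths)
print(sentence)
```

Each step yields a strictly longer sentence, so no fixed bound on sentence length can hold under this assumption.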

Learning Theory of a Language

• The learner is presented with data and has to infer the rules that generate these data
• The difference between ‘learning’ and ‘memorization’ is the ability to generalize beyond one’s own experience to novel circumstances
• So if we consider language: the child will generalize to novel sentences never heard before
• Learning theory describes the mathematics of learning with the aim of outlining conditions for successful generalization

Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova

Learning theory of a Language

• Children learn their native language by hearing grammatical sentences from their parents or others
• From this ‘environmental input’, children construct an internal representation of the underlying grammar
• Children are not told the grammatical rules
• Neither children nor adults are ever aware of the grammatical rules that specify their own language

Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova

Learnability

• Imagine a speaker-hearer pair. The speaker uses grammar, G, to construct sentences of language L
• The hearer receives sentences and should, after some time, be able to use grammar G to construct other sentences of L
• Mathematically speaking, the hearer is described by an algorithm, A, which takes a list of sentences as input and generates a language as output
• Furthermore, a set of languages is learnable by an algorithm if each language of this set is learnable

Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
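One way to picture the hearer as an algorithm A is the Python sketch below. It assumes a small, fixed list of candidate languages (all hypothetical) and returns the first candidate that contains every sentence heard so far. This is only one possible learner, not the one the slides or the cited paper prescribe.

```python
def is_balanced(s):
    """Membership test for the language of n zeros followed by n ones."""
    n = len(s) // 2
    return len(s) > 0 and len(s) % 2 == 0 and s == "0" * n + "1" * n


# Hypothetical candidate languages, ordered from smallest to largest so that
# the learner prefers the most restrictive language that fits the data.
CANDIDATES = [
    ("balanced", is_balanced),
    ("even_length", lambda s: len(s) % 2 == 0),
    ("all_strings", lambda s: True),
]


def learner(sentences):
    """Algorithm A: map the sentences heard so far to a guessed language."""
    for name, contains in CANDIDATES:
        if all(contains(s) for s in sentences):
            return name
    return None


def guesses(text):
    """Guess made after each successive sentence of `text`."""
    return [learner(text[:i]) for i in range(1, len(text) + 1)]


print(guesses(["01", "0011", "10", "1100"]))
```

With sentences drawn from the even-length language, the guess changes once and then stays put, which is the sense in which this learner settles on a language within this small set.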

Learnability

• We are interested in what set of languages, L = {L1, L2, …}, can be learned by a given algorithm
• Gold’s theorem implies that there exists no algorithm that can learn the set of regular languages
• This in turn implies that no algorithm can learn the set of context-free languages, context-sensitive languages or computable languages
• Formally, Gold’s theorem states that there exists no algorithm that can learn a set of ‘super-finite’ languages
• Such a set includes all finite languages and at least one infinite language

Intuitively, if the learner infers that the target language is an infinite language, whereas the actual target language is a finite language that is contained in the infinite language, then the learner will not encounter any contradicting evidence, and will never converge onto the correct language.
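The intuition can be checked on a toy case in Python. The infinite language (all non-empty strings of zeros) and the finite target inside it are assumptions made for the example, and the learner only ever hears sentences of the target.

```python
def in_infinite(s):
    """Hypothetical infinite guess: every non-empty string made only of '0'."""
    return len(s) > 0 and set(s) == {"0"}


# Hypothetical finite target language, contained in the infinite one.
FINITE_TARGET = {"0", "00", "000"}


def contradictions(guess_contains, text):
    """Sentences in `text` that the guessed language does not contain."""
    return [s for s in text if not guess_contains(s)]


# A stream of sentences from the target, with repetitions.
text = ["0", "000", "00", "0", "00"] * 3

print(contradictions(in_infinite, text))                  # nothing refutes the guess
print(in_infinite("0000"), "0000" in FINITE_TARGET)       # yet the guess overshoots
```

However long the stream runs, the first list stays empty, so a learner holding the infinite guess has no reason to give it up even though it is wrong.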

Learning a Language

• We might think this is a problem only in the context of infinite languages. Let us look at finite languages
• In the context of statistical learning theory, the set of all finite languages cannot be learned
• In the Gold framework, the set of all finite languages can be learned, but only by memorization: the learner will identify the correct language only after having heard all sentences of this language

Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova
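Learning by memorization can be sketched as a learner whose guess is exactly the set of sentences heard so far. The target language and the input stream below are made up for illustration.

```python
def memorizing_learner(sentences):
    """Guess that the language is exactly the set of sentences heard so far."""
    return set(sentences)


# Hypothetical finite target language and a stream of its sentences.
TARGET = {"0", "01", "011"}
text = ["0", "01", "0", "011", "01"]

# Is the guess correct after each successive sentence?
correct = [memorizing_learner(text[:i]) == TARGET for i in range(1, len(text) + 1)]
print(correct)
```

The guess becomes correct only at the fourth sentence, the point where every sentence of the target has been heard, and it never generalizes beyond its input.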

References

• Atkinson QD, Meade A, Venditti C, Greenhill SJ, Pagel M (2008) Languages evolve in punctuational bursts. Science 319: 588.
• Montemurro MA, Zanette DH (2011) Universal Entropy of Word Ordering Across Linguistic Families. PLoS ONE, e19875. doi:10.1371/journal.pone.0019875.
• Computational and evolutionary aspects of language (2011), Martin A Nowak, Natalia L Komarova