Vedic Sanskrit has a number of linguistic features which are alien to other Indo-European languages but common to other language families in South Asia. Prominent examples include: phonologically, the introduction of retroflexes, which alternate with dentals; morphologically, the formation of gerunds; and syntactically, the use of a quotative marker ("iti").[1]

Such features, as well as the presence of non-Indo-European vocabulary, are attributed to a local substratum of the languages spoken around 1000 BCE in northwestern and northern South Asia: Dravidian, Munda and proto-Burushaski,[2] as well as another, lost, prefixing language which has been dubbed "Para-Munda".[3]

A few words in the Rigveda, and progressively more words in later Vedic texts, have been identified as loanwords. Most are from Dravidian, and some forms are traceable to Munda[4] or Proto-Burushaski,[2] but quite a few have no plausible basis in any of these families, indicating a source in one or more lost languages, such as Para-Munda.[2]


Retroflex phonemes are now found throughout the Burushaski,[5][6] Nuristani,[7] Dravidian and Munda families. They are reconstructed for proto-Burushaski, proto-Dravidian and (to a minimal extent) for proto-Munda,[8] and are thus clearly an areal feature of the Indian subcontinent. They are not reconstructible for either Proto-Indo-European or Proto-Indo-Iranian, and they are also not found in Mitanni-Indo-Aryan loan words.

The acquisition of the phonological trait by early Indo-Aryan is thus unsurprising, but it does not immediately permit identification of the donor language. Since the adoption of a retroflex series does not affect poetic meter, it is impossible to say if it predates the early portions of the Rigveda or was a part of Indo-Aryan when the Rigvedic verses were being composed; however, it is certain that at the time of the redaction of the Rigveda (ca. 500 BCE), the retroflex series had become part of Sanskrit phonology. There is a clear predominance of retroflexion in the Northwest (Nuristani, Dardic, Khotanese Saka, Burushaski), involving affricates, sibilants and even vowels (in Kalasha), compared to other parts of the subcontinent. It has been suggested that this points to the regional, northwestern origin of the phenomenon in Rigvedic Sanskrit.[9] More commonly, a linguistic shift from Dravidian to Indic has been suggested as an explanation.


Burrow compiled a list of approximately 500 foreign words in Sanskrit that he considered to be loans predominantly from Dravidian. Later in his career, he retracted a substantial number of them.[10] Kuiper identified 383 Rigvedic words as non-Indo-Aryan (roughly 4% of its vocabulary), borrowed from Old Dravidian, Old Munda, and several other, lost languages. The Indo-Europeanist and Indologist Thieme has questioned the Dravidian etymologies proposed for Vedic words, for most of which he gives Indo-Aryan or Sanskrit etymologies, and condemned what he characterizes as a misplaced “zeal for hunting up Dravidian loans in Sanskrit”. Das even contends that there is “not a single case in which a communis opinio has been found confirming the foreign origin of a Rgvedic word”. Kuiper has answered that charge.[11] Burrow in turn has criticized the "resort to tortuous reconstructions in order to find, by hook or by crook, Indo-European explanations for Sanskrit words". Kuiper reasons that, given the abundance of Indo-European comparative material and the scarcity of Dravidian or Munda material, the inability to clearly confirm an Indo-European etymology for a Vedic word implies that the word is not Indo-European.[12] Recent reconstructions of Proto-Dravidian and Proto-Munda (Krishnamurti 2003, Anderson 2008) now help in distinguishing the traits of these languages from those of Indo-European in the evaluation of substrate and loan words.

Mayrhofer identifies a "prefixing" language as the source of many non-Indo-European words in the Rigveda, based on recurring prefixes like ka- or ki-, which have been compared to the Austro-Asiatic article (as seen in Khasi). Examples include: kavandha "barrel", kākambīra a certain tree, kavaṣa "straddle-legged", kakardu "wooden stick", kapardin "with a hair-knot", kimīda a demon, śimidā a demoness, kilāsa "spotted, leprous", kiyāmbu a water plant, kīnāśa "ploughman", kumāra "boy", kulāya "nest", kuliśa "ax", kuluṅga "antelope" (Kuruṅga name of a chieftain of the Turvaśa).

However, post-Vedic words such as nāraṅgaḥ "orange" (first attested in the Sushruta Samhita, ca. 4th century AD) are often taken to be straightforward loans from Dravidian into Sanskrit. Since they belong to a later period, they are unsuited to establish the origin of the loans in Rigvedic Sanskrit.

Krishnamurti states: "Besides, the Ṛg Veda has used the gerund, not found in Avestan, with the same grammatical function as in Dravidian, as a non-finite verb for 'incomplete' action. Ṛg Vedic language also attests the use of iti as a quotative clause complementizer. All these features are not a consequence of simple borrowing but they indicate substratum influence (Kuiper 1991: ch 2)". However, such features are also found in the indigenous Burushaski language of the Pamirs and cannot be attributed only to Dravidian influence on the early Rigveda (Witzel 1999, Tikkanen 1987, 2005).

Vijendra Kashyap, one of the authors of Sahoo et al. (2006), states that the people of the Indian subcontinent are indigenous to South Asia, but that Indo-European languages are not, and that language change resulted from the migration of numerically small superstrate groups that are difficult to trace genetically.[13] Cavalli-Sforza (2000) states that "Archeology can verify the occurrence of migration only in exceptional cases" and identifies the introduction of Indo-European languages to India as an instance of language replacement, in which the language of a population changes with only modest genetic effects.

In work specifically on language contact phenomena, it has been argued that there is strong evidence that Dravidian influenced Indic through "shift", that is, native Dravidian speakers learning and adopting Indic languages. Elst (1999) claims that the presence of the Brahui language, similarities between Elamite and Harappan script, and similarities between Indo-Aryan and Dravidian indicate that these languages may have interacted prior to the spread of Indo-Aryans southwards and the resultant interaction of languages. This statement, however, neglects the undeciphered nature of both the Proto-Elamite and Indus inscriptions as well as the late introduction of Brahui into Baluchistan from Central India, about a thousand years ago (Hock, Elfenbein). It has likewise been argued that the most plausible explanation for the presence of Dravidian structural features in Old Indo-Aryan is that the majority of early Old Indo-Aryan speakers had a Dravidian mother tongue which they gradually abandoned. Even though each of the innovative traits in Indic could be given its own internal explanation, early Dravidian influence is the only explanation that can account for all of the innovations at once, which makes it a question of explanatory parsimony; moreover, early Dravidian influence accounts for several of the innovative traits in Indic better than any internal explanation that has been proposed.[14] This statement, however, neglects the influence of local Burushaski in the Northwest, the scene of the early Rigvedic hymns. That influence is already seen in some Rigvedic loanwords such as kilāṭa "beestings" (Witzel 1999).


A concern raised in the identification of the substrate is that there is a large time gap between the comparative materials, which can be seen as a serious methodological drawback: the syntax of the Rigveda is being compared with a reconstructed proto-Dravidian.[15] The first completely intelligible, datable, and sufficiently long and complete epigraphs that might be of some use in linguistic comparison are the Tamil inscriptions of the Pallava dynasty of about 550 CE (Zvelebil 1990) and the early Tamil Brahmi inscriptions starting in the second century BCE,[16] about one millennium after the commonly accepted date for the Rigveda. Similarly, there is much less material available for comparative Munda, and the interval in that case is at least three millennia. However, the use of historical linguistics closes the perceived time gap between the actually preserved Dravidian, Munda, etc. texts and the Rigveda. The comparison of securely reconstructed phonetic features and word formation (Krishnamurti 2003, Anderson 2008) allows statements regarding the donor languages that go well beyond what a comparison of actually attested later syntactical features may suggest. Non-Indo-Aryan elements (such as -s- following -u- in Rigvedic busa) are clearly in evidence (Kuiper 1991, Witzel 1999).


Although in modern times speakers of the various Dravidian languages have mainly occupied the southern portion of India, nothing definite is known about the ancient domain of the Dravidian parent speech. It is, however, a well-established and well-supported hypothesis that Dravidian speakers must have been widespread throughout India, including the northwest region.[17]

A number of features of the Dravidian languages appear in the Rigveda, the earliest known Indo-Aryan literary work, thus showing that the Dravidian languages must have been present in the area of the Indo-Aryan ones. Several scholars have demonstrated that pre-Indo-Aryan and pre-Dravidian bilingualism in India provided conditions for the far-reaching influence of Dravidian on the Indo-Aryan tongues in the spheres of phonology (e.g., the retroflex consonants, made with the tongue curled upward toward the palate), syntax (e.g., the frequent use of gerunds, which are nonfinite verb forms of nominal character, as in “by the falling of the rain”), and vocabulary (a number of Dravidian loanwords apparently appearing in the Rigveda itself). Thus a form of Proto-Dravidian, or perhaps Proto-North Dravidian, must have been extensive in northern India before the advent of Indo-Aryan languages. Apart from the survival of some islands of Dravidian speech, however, the process of replacement of the Dravidian languages by the Aryan tongues was entirely completed before the beginning of the Christian Era, after a period of bilingualism that must have lasted many centuries. Finally, the almost universal adoption of Indo-Aryan in the north and of Dravidian in the south has covered up the original linguistic diversity of India.[18]

Dravidian languages show extensive lexical (vocabulary) borrowing, but only a few traits of structural (either phonological or grammatical) borrowing, from the Indo-Aryan tongues. On the other hand, Indo-Aryan shows rather large-scale structural borrowing from Dravidian, but relatively few loanwords.[18]

While Dravidian languages are primarily confined to the South of India, there is a striking exception: Brahui, which is spoken in parts of Baluchistan. It has been taken by some as the linguistic equivalent of a relict population, perhaps indicating that Dravidian languages were formerly much more widespread and were supplanted by the incoming Indo-Aryan languages.[19] However, it has also been suggested that the present-day northern location of Brahui may result from later migration, and Elfenbein (1987) argues that the presence of Brahui in Baluchistan is explained by a late immigration that took place within the last thousand years.

Oraon (Kurukh) and Malto, too, are Dravidian languages not confined to South India; their range is Central and East India. In this case too, however, the prehistory and ancient history of the populations speaking these languages is unknown.

Dravidian loans have been identified only from the middle Rigvedic period onward, suggesting that linguistic contact between Indo-Aryan and Dravidian speakers only occurred as the Indo-Aryans expanded well into and beyond the Punjab. Even so, the Rigveda does have a "small but precious handful of Vedic forms for which Dravidian etymologies are certain" (Zvelebil 1990:81), including kulāya "nest", kulpha "ankle", daṇḍa "stick", kūla "slope", bila "hollow", khala "threshing floor".

Munda and Para-Munda

Kuiper (1991) identifies one of the donor languages as Proto-Munda. Witzel prefers "Para-Munda", denoting a hypothetical language related but not ancestral to modern Munda languages, and identifies it as "Harappan", the language of the Harappan civilization. He argues that the Rigveda shows signs of Para-Munda influence in the earliest level and Dravidian only in later levels, suggesting that speakers of Para-Munda were the original inhabitants of Punjab and that the Indo-Aryans did not encounter speakers of Dravidian before middle Rigvedic times. Krishnamurti deems the evidence too meagre for this proposal. Regarding Witzel's methodology in claiming Para-Munda origins, Krishnamurti states: "The main flaw in Witzel's argument is his inability to show a large number of complete, unanalyzed words from Munda borrowed into the first phase of the Ṛgveda. Such an extensive lexical borrowing must precede any effort on the part of the borrowers proceeding to the next stage of isolating prefixes and using them creatively with native stems. It would have been better if he said we did not know the true source of 300 or so early borrowings into the Ṛgveda."[20] This statement, however, confuses Proto-Munda and Para-Munda and neglects the several hundred "complete, unanalyzed words" from Para-Munda or a similar prefixing language, adduced by Kuiper (1991) and Witzel (1999). Their number even increases in post-Rigvedic texts.

Munda linguist Gregory D. Anderson[8] states: "It is surprising that nothing in the way of quotations from a Munda language turned up in (the hundreds and hundreds of) Sanskrit and middle-Indic texts. There is also a surprising lack of borrowings of names of plants/animal/bird, etc. into Sanskrit (Zide and Zide 1976). Much of what has been proposed for Munda words in older Indic (e.g. Kuiper 1948) has been rejected by careful analysis. Some possible Munda names have been proposed, for example, Savara (Sora) or Khara, but ethnonymy is notoriously messy for the identification of language groups, and a single ethnonym may be adopted and used for linguistically rather different or entirely unrelated groups".

Substratum vs. adstratum

Hock (1975/1984/1996) and Tikkanen (1987) are quite open to the possibility that various syntactical developments in Indo-Aryan resulted from adstratum rather than substrate influence.

Regarding retroflexion, which has been described as most pronounced in the Northwest, Tikkanen states that "in view of the strictly areal implications of retroflexion and the occurrence of retroflexes in many early loanwords, it is hardly likely that Indo-Aryan retroflexion arose in a region that did not have a substratum with retroflexes."



