The development of typological databases, as carried out at the Max Planck Institutes in Nijmegen and Leipzig, makes it possible to test classification and reconstruction hypotheses with quantitative methods. The goal of this strand is to describe language diversity and to model it in terms of dynamic systems. A major issue is to find a balance between qualitative and quantitative approaches, and to integrate historical, regional and social variation as well as discourse factors. The studies will be pursued in close collaboration with strands 1 (evolutionary phonology), 2 (grammar), 4 (language acquisition and bilingualism) and 6 (language resources). The research teams involved have first-hand expertise in over 150 languages worldwide. The various studies will share the following methodology:
·fieldwork methods for data collection, including (a) task-oriented elicitation of semi-spontaneous speech and (b) instrumental phonetic data collection with lightweight equipment such as a portable echograph.
·processing and archiving oral data according to international standards (cf. DoBeS, ELDP), as is done for the LaCiTO archives with CRDO/Adonis.
·using lexical and morphological databases, which can be indexed with cartographic systems, for statistical studies (to test language universals or phyla).
Our work will be organized into three substrands with a total of 10 work packages.
Regarding historical variation, we will re-examine Greenberg's (1963) classification of the languages of Africa, using both the traditional comparative method and new quantitative methods applied to more and better descriptive data on African languages, in order to prove (or disprove) the genetic unity of Central Sudanic on the one hand, and of Niger-Congo and its alleged lower branches on the other. This will result in new proposals for classifications and proto-languages. The same methods will be used to identify possible external connections of the Austronesian family and to improve the internal classification of the languages of Taiwan. We will study significant typological variation within a number of language families for which we have the necessary expertise, viz. Austronesian, Afro-Asiatic, Tibeto-Burman, Dravidian and Iranian, and propose paths and models of typological change. We will also carry out documentary and descriptive work on previously undescribed languages of Africa, Asia and Oceania. Applying sophisticated quantitative techniques, including machine learning on sizable databases, to classification and reconstruction will be a major goal and a privileged domain of collaboration within and beyond the Labex teams.
Regarding language contact, doubts about strictly genetic models of language change have arisen ever since Indo-European linguistics and Sprachbund or belt theories began describing areas where genetically unrelated languages share crucial linguistic features as a result of continuous contact. However, little work has been done on the precise modeling of areal change, despite the considerable development of contact linguistics over the last 15 years. Some researchers involved in this program have started building and processing various corpora for multifactorial studies, especially of languages spoken in French Guiana; developing this approach and extending it to other areas will be the goal of WP LC1. This requires dedicated tools and new corpora annotated for the relevant linguistic features (in collaboration with strand 6). The question of the Sprachbund will be examined in two areas where the very notion is debated. WPs LC2 and LC3 will deal with the Macro-Sudan area (oral languages) on the one hand, and with the Caucasus-Iran-Anatolia area (languages with a long written tradition but strong dialectal fragmentation) on the other. Both will rely on a quantitative approach (inventories of features, linguistic maps, lexical databases as in the ANR program ReFlex) and on qualitative studies (fine-grained grammatical description, and studies of the categories or features involved and of their impact on the grammatical system as a whole and on language variation). The collaboration with strand 4 on Creole formation as second language acquisition will address the question of language emergence. These work packages will provide new quantitative data and shed new light on morphological and syntactic analyses and models.
Fieldwork, especially on oral languages, has increasingly taken discourse factors into account in grammatical description and linguistic typology. Phenomena usually referred to as information structure, and sometimes considered a separate level of linguistic organization, play a crucial role in the grammatical organization of many languages: they determine not only word order but also various types of verb or argument marking. WP GD1 studies these grammatical markers in several sizable corpora annotated for information structure. This is innovative both theoretically and methodologically, since it puts typological comparability to the test. It draws on the results of the ESF project CorpAfroAs and extends them to other languages studied in the Labex. Studying the grammar-discourse interface will also make it possible to better account for phenomena neglected by strictly descriptive approaches, such as information structure. GD2 will explore notions such as focus, backgrounding/foregrounding, distinguishedness and salience, and their role in the grammatical categorization of various languages. Building on the Labex corpora and on previous studies of well-described languages (Armenian, Greek, Russian, Romanian, Wolof, Hindi), we plan to complement corpus-based results with experimental tests such as those used in psycholinguistics to assess cognitive salience. GD3 studies discourse markers from a cross-linguistic perspective, using large annotated corpora as described in strand 6 (which give access to extended contexts), either already available, as for Russian (the Russian National Corpus, RNC), or still to be annotated, as for Armenian.
GD4 focuses on the Tense/Aspect/Mood/Evidentiality (TAME) cluster of verbal markers, which are deeply rooted in cross-linguistic, diachronic and synchronic phenomena (such as the formation of the perfect) and are well known for their highly context-sensitive interpretations. The use of quantitative measures will allow the project to make substantial progress in the study of TAME categories, in collaboration with strands 2 and 6.
This strand brings together over 60 faculty members and researchers and 15 PhD students from 10 partner teams, with numerous international collaborations. Over the 10-year period, we will launch new annotation projects for corpora of various languages in collaboration with strand 6.
RT1 Reconstruction, internal classification and grammatical description in the world's two biggest phyla: Niger-Congo and Austronesian (resp. V. Vydrin, I. Bril, L. Sagart): CRLAO, LLACAN, LACITO
RT2 The Central Sudanic languages: genetic unit or affinity group? (resp. P. Boyeldieu)
RT3 Long-term typological changes in languages (resp. F. Jacquesson): LACITO, HTL, MII
LC1 Multifactorial analysis of language change (resp. I. Leglise): SEDYL, LACITO, LLACAN, LLF
LC2 Areal phenomena in northern sub-Saharan Africa (resp. D. Idiatov & M. Van de Velde): LLACAN, LPP-P3
LC3 Caucasus-Iran-Anatolia belt (resp. A. Donabedian, P. Samvelian): SEDYL, MII, LLF
LC4 The varieties of the Croissant: a contact area between Oc and Oïl (resp. N. Quint)
GD1 Typology and annotation of information structure and grammatical relations (resp. K. Haude, M. Vanhove): LLACAN, SEDYL, LACITO, CRLAO
GD2 The syntax of complex sentences in Creole languages (resp. S. Manfredi & N. Quint)
GD3 A cross-linguistic approach to discourse markers (resp. C. Bonnot): SEDYL, LLF