Linguistic Dynamics in the Greater Tunis Area: A Corpus-based Approach
Disciplines
Linguistics and Literature (100%)
Keywords
Arabic dialectology,
Tunisia,
Lexicography,
Corpus linguistics
The majority of publications on the dialect of the Tunisian capital focus on sociolinguistic, phonological and morphological issues. In-depth studies on syntax are very scarce, and there is no up-to-date dictionary based on authentic spoken data. There are also very few relevant studies dedicated to the linguistic dynamics caused by recent demographic changes in the metropolitan area of Tunis. Today, the variety of Arabic spoken by most inhabitants of the city has become a koiné that has not only spread to the vicinity of the city but is widely used throughout Tunisia. The project focuses on contemporary language. We will therefore strive to gather data from field recordings made with young speakers who have grown up in the city of Tunis but whose parents, for the most part, came to the capital from other regions. As part of the project, we will create two digital language resources: (1) a corpus of unmonitored speech that will contain both conversations and narratives and (2) a dictionary based on this corpus and on previously published resources. Hitherto, no digital corpora for Arabic dialects have been made available that provide both linguistic transcriptions and translations. Besides serving as the primary source for the planned dictionary, the corpus will be used to investigate a number of selected topics dealing with the morphology and syntax of contemporary Tunis Arabic. The dictionary will contain not only all the lexicographic data of the corpus but also material from two additional sources: data elicited from complementary interviews with young Tunisians and lexicographical material compiled from various published sources. The diachronic nature of the dictionary (the printed sources also contain material from the middle of the 20th century and earlier) will enable us to analyse the linguistic dynamics in the realm of the lexicon as well.
The project has been designed as an attempt to combine dialectological approaches with up-to-date text technological methodologies. The tools being developed and tested in the project will be beneficial for similar research questions in the field of Arabic studies and beyond. A particular feature of the project is the importance of the dictionary/corpus interface, which will allow the researcher to navigate from the corpus to the dictionary and vice versa. The project will be conducted in the spirit of open source and open access. Therefore, both the corpus and the lexicographical data of the project will be made available to the scientific community through a publicly accessible web interface, which will enable other scholars to do further analyses and to add their own material.
In terms of phonology and morphology, the dialect of Tunis is among the best-documented Arabic varieties in North Africa. However, many syntactic phenomena have remained largely unexplored so far. Among other reasons, this is due to the absence of adequate digital corpora which could serve as a basis for such investigations. The same holds true of the linguistic dynamics of recent decades (brought about by far-reaching demographic changes in the greater Tunis area), which research has largely neglected. Another goal of the project was a modern dictionary containing not only the basic vocabulary but also neologisms of the Internet generation.
During field trips to Tunis, we produced more than 30 hours of recordings, on the basis of which we created a digital, annotated and freely available corpus of roughly 95,000 words. This collection mainly reflects the language of the younger generation, which differs from the speech described until the 1980s not only in lexicon and phraseology but also in pronunciation and word formation. In contrast to the narrative texts that are quite common in Arabic dialectology, the corpus is mainly made up of spontaneous conversations. These allow for the investigation of a linguistic register characterised by the frequent use of discourse particles and other phenomena typical of oral communication. Research results published during the project illustrate the kinds of research questions, particularly in the field of syntax, for which the new corpus can be used. The new infrastructure allows users to search for single lexemes and to retrieve information about the frequency, word class or etymology of particular lexical items. A novel feature of the project's set-up is that the texts are directly linked to the digital dictionary, in which words from the corpus as well as from the main historical grammar of Tunisian Arabic were systematically incorporated.
Currently, the dictionary (which will be further expanded in the future) contains almost 8,500 entries, which makes it the most sizeable freely available online dictionary of an Arabic dialect. The innovative strength of the project lies in the combination of traditional linguistic research in the field of Arabic Studies with modern text technology, which opened up entirely new perspectives. All the data were prepared and documented in accordance with up-to-date standards accepted by the community (in particular TEI), which ensures a high degree of reusability for future projects, in linguistic research as well as in the wider field of digital humanities. In the field of research-oriented lexicography, the TUNICO project made considerable progress with respect to digital infrastructures. All tools and data are freely available for future research.
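The corpus/dictionary link and the frequency lookup described above can be illustrated with a minimal sketch. It assumes that corpus tokens are encoded as TEI `<w>` elements whose `lemmaRef` attribute points to the `xml:id` of a dictionary `<entry>`; this is one common TEI pattern and not necessarily the encoding the project actually uses. The two words, their glosses, the entry identifiers and all function names are illustrative toy data, not taken from the project's corpus or dictionary.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
NS = {"tei": TEI_NS}

# Toy corpus (assumption): each token carries a pointer to a dictionary entry.
CORPUS_XML = f"""
<TEI xmlns="{TEI_NS}"><text><body>
  <u who="#speaker1">
    <w lemma="bāhi" lemmaRef="#bahi_001">bāhi</w>
    <w lemma="barša" lemmaRef="#barsha_001">barša</w>
  </u>
  <u who="#speaker2">
    <w lemma="bāhi" lemmaRef="#bahi_001">bāhi</w>
  </u>
</body></text></TEI>
"""

# Toy dictionary (assumption): entry ids match the lemmaRef targets above.
DICT_XML = f"""
<TEI xmlns="{TEI_NS}"><text><body>
  <entry xml:id="bahi_001">
    <form type="lemma"><orth>bāhi</orth></form>
    <gramGrp><pos>adjective</pos></gramGrp>
    <sense><cit type="translation" xml:lang="en"><quote>good</quote></cit></sense>
  </entry>
  <entry xml:id="barsha_001">
    <form type="lemma"><orth>barša</orth></form>
    <gramGrp><pos>adverb</pos></gramGrp>
    <sense><cit type="translation" xml:lang="en"><quote>a lot</quote></cit></sense>
  </entry>
</body></text></TEI>
"""


def lemma_frequencies(corpus_root):
    """Count corpus tokens per dictionary entry id (taken from lemmaRef)."""
    return Counter(
        w.get("lemmaRef").lstrip("#")
        for w in corpus_root.iterfind(".//tei:w", NS)
        if w.get("lemmaRef")
    )


def index_entries(dict_root):
    """Map xml:id -> <entry> element (corpus-to-dictionary direction)."""
    return {
        e.get(f"{{{XML_NS}}}id"): e
        for e in dict_root.iterfind(".//tei:entry", NS)
    }


def describe(entry):
    """Return (headword, word class, English gloss) for one entry."""
    orth = entry.findtext("tei:form/tei:orth", namespaces=NS)
    pos = entry.findtext("tei:gramGrp/tei:pos", namespaces=NS)
    gloss = entry.findtext("tei:sense/tei:cit/tei:quote", namespaces=NS)
    return orth, pos, gloss


def corpus_hits(corpus_root, entry_id):
    """Dictionary-to-corpus direction: utterances containing the entry."""
    hits = []
    for u in corpus_root.iterfind(".//tei:u", NS):
        words = u.findall("tei:w", NS)
        if any(w.get("lemmaRef") == f"#{entry_id}" for w in words):
            hits.append(" ".join(w.text for w in words))
    return hits


corpus = ET.fromstring(CORPUS_XML)
entries = index_entries(ET.fromstring(DICT_XML))
freq = lemma_frequencies(corpus)

for entry_id, count in freq.most_common():
    print(entry_id, count, describe(entries[entry_id]))
```

Because the link is stored on the token, the same identifier serves both directions: `entries[entry_id]` goes from a corpus token to its dictionary entry, and `corpus_hits(corpus, entry_id)` goes from an entry back to the utterances that attest it.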
- Veronika Ritt-Benmimoun, Universität Wien, national collaboration partner
- Karlheinz Mörth, Österreichische Akademie der Wissenschaften, associated research partner
- Martine Vanhove, Centre National de la Recherche Scientifique - France
- Giuliano Mion, Università degli Studi "G. d'Annunzio" - Italy
Research Output
- 9 Publications
- 2016. Title: Overabundance in the Arabic Dialect of Tunis: A Diachronic Study of Plural Formation. Type: Conference Proceeding Abstract. Author: Dallaji I. Conference: Grigore, George & Bituna, Gabriel: Arabic Varieties: Far and Wide. Proceedings of the 11th Conference of AIDA - Bucharest, 2015. Bucharest: Editura Universitatii din Bucuresti.
- 2014. Title: Laying the Foundations for a Diachronic Dictionary of Tunis Arabic. A First Glance at an Evolving New Language Resource. Type: Conference Proceeding Abstract. Author: Dallaji I et al. Conference: Abel, Andrea; Vettori, Chiara; Ralli, Natascia (eds.): Proceedings of the XVI EURALEX International Congress: The User in Focus, Bolzano/Bozen, 15-19 July 2014. Bozen: EURAC Research.
- 2015. Title: Towards a Diatopic Dictionary of Spoken Arabic Varieties: Challenges in Compiling the VICAV Dictionaries. Type: Conference Proceeding Abstract. Author: Moerth Kh. Conference: Grigore, George & Bituna, Gabriel: Arabic Varieties: Far and Wide. Proceedings of the 11th Conference of AIDA - Bucharest, 2015. Bucharest: Editura Universitatii din Bucuresti.
- n.d. Title: Forms and Functions of Diminutives in Tunis Arabic (see AIDA 12). Type: Other. Author: Dallaji I.
- n.d. Title: Interrogation in Tunis Arabic: The syntax of yes/no questions. Type: Other. Author: Dallaji I.
- n.d. Title: Conditional clauses in Tunis Arabic. Type: Other. Author: Dallaji I.
- n.d. Title: A Corpus of Tunis Arabic. Compiled by Ines Dallaji and Ines Gabsi. Type: Other. Author: Dallaji I.
- n.d. Title: A Digital Dictionary of Tunis Arabic. Compiled by Ines , Ines Gabsi, Ines, Stephan Prochazka and Veronika. Type: Other. Author: Dallaji I.
- n.d. Title: Linguistic Dynamics in Tunis - A first Approach. Type: Other. Author: Dallaji I.