TSU Arnold Chikobava Institute of Linguistics

The Department of the Computational Processing of Linguistic Data

The department was founded in the spring of 2006 on the basis of two scientific laboratories: the laboratory of the dialectal atlas and the laboratory of computational linguistics. Accordingly, two main directions of development were defined for this important structural unit of the institute: creating a linguistic database and elaborating the theoretical basis of research.
The main priorities of the department are: computational processing of Georgian; preparation of linguistic data for inter- and multidisciplinary research and ensuring its availability; building a text corpus; and development of “corpus linguistics”.
The purpose of the department is to unite and develop the intellectual resources existing in computational linguistics, that is, in the sphere of computational processing of linguistic data. The department continues the course set at the institute in the 1990s. Preparation for researching language with new technology began in this period. Fundamental dictionaries, which served as the basis for creating formal language systems, models and computer programs, were compiled under the leadership of the then director, Besarion Jorbenadze. These dictionaries are: “Verbal stems dictionary” (G. Gogolashvili, Ts. Kvantaliani, D. Shengelia), “Deverbative noun stems dictionary” (G. Gogolashvili, Ts. Kvantaliani, D. Shengelia), “Dictionary of Georgian formants and modal elements” (B. Jorbenadze, M. Kobaidze, M. Beridze), and “Georgian noun roots dictionary” (B. Jorbenadze, N. Loladze, M. Kikonishvili) (prepared for publication). Work on the dictionary of Megrelian and Svan formants and modal elements (B. Jorbenadze, M. Kobaidze, M. Beridze) was also begun.

In parallel with the preparation of the dictionaries, work also began on the problems of database creation and language modeling. The activity of Tedo Uturgaidze and his group (L. Chkhaidze, M. Tandashvili, K. Datukishvili, M. Manjgaladze and others) underpinned the formation and development of a new direction at the institute.
In 1989 a computational laboratory was founded (headed by Vladimer Kikilashvili), on the basis of which a laboratory of computational linguistics was founded in 1995. Manana Tandashvili headed the laboratory in 1995-2002 and Ketevan Datukishvili in 2002-2006.
The conferences “Conceptual and computational models of a language” (1996, 1997, 1998) were held at the department. In 2003 the conferences were resumed under the title “Natural language processing (the Georgian language and computational technologies)” (2003, 2004, 2005). The conference materials were published. Besides the Institute of Linguistics, other institutions took part in the conferences: Iv. Javakhishvili Tbilisi State University, the Institute of Control Systems, the Institute of Literature, the Institute of Manuscripts, the National Library, etc.
In 2004-2006 the laboratory of computational linguistics, together with the laboratory of speech culture, took part in the Georgian Academy of Sciences grant project “Orthographic dictionary of proper names”.
The database of an electronic dictionary was created within the framework of the project. It was compiled from the following dictionaries: “Dictionary of geographic names of the USSR”, “Dictionary of geographic names of foreign countries”, “Dictionary of foreign names”, “Orthographic dictionary of the Greek and Roman names”.
In 1991 the “Linguistic atlas” laboratory was founded on the basis of the Kartvelian languages department; its main function was to prepare the textual and lexical base of the dialects of the Kartvelian languages. The laboratory planned to collect and process dialectal material, but the developments that took place in the country blocked the planned field work for a long time. Nevertheless, the laboratory continued the processing and theoretical study of the material accumulated during the expeditions of previous years.
M. Kobaidze headed the laboratory from 1991 and G. Tsotsanidze from 1995.

Since 2004 M. Beridze has been the head of the laboratory. From 1998 to 2006 the laboratory prepared a database of dialectal texts and dialectal vocabulary. The work was financed by the Georgian Academy of Sciences.
G. Tsotsanidze’s “Tushian dictionary” was prepared at the laboratory; several projects were carried out with the financial support of other organizations; and, for the first time, the interdisciplinary significance of material collected for linguistic purposes was brought out and studied: G. Tsotsanidze’s “Tushian chronicles” (with Tushian texts) (2004) and M. Beridze’s “Direct reports from past” – Meskheti and Meskhians (with Meskhian texts) (1918-1944).
Work has already begun on preparing the cartographic base of the dialectal vocabulary. The laboratory's collaborators G. Tsotsanidze, N. Surmava, M. Beridze and L. Bakuradze are engaged in the project financed by the “Volkswagen Stiftung” (project director Jost Gippert).
In the spring of 2006 the group submitted a three-year project, “Georgian linguistic portrait – corpus of the Georgian dialectal texts” (director M. Beridze), in response to a call announced by the Georgian scientific fund. The project was funded and is being carried out successfully.
The main aims of the department for the future:
1. Computational linguistics
2. Corpus linguistics

In 2009 Marine Beridze, Candidate of Philological Sciences, was elected head of the Department of the Computational Processing of Linguistic Data.
In October 2009 the following scientific workers were elected:

Chief scientific worker:
Senior research worker:
Lia Bakuradze
Nargiza Surmava
Tsitsino Kvantaliani

Research workers:
Rusudan Landia
Maia Barikhashvili
