:: Lexical Semantics in Building the Croatian WordNet ::
News and information:

Project meeting:
2007-05-XX, 9:30, FFZG, B-003


The paper "Computational Linguistic Models and Language Technologies for Croatian" by Bojana Dalbelo Bašić, Zdravko Dovedan, Ida Raffaelli, Sanja Seljan and Marko Tadić was published in the Proceedings of the ITI2007 conference, within the Language Technologies section.

Lexical Semantics in Building the Croatian WordNet

There are two basic starting points for this project:

  1. There is so far no wide-ranging description of lexical relations in Croatian within an elaborated theoretical framework of lexical semantics. There are various theoretical and methodological studies of semantic fields and lexical-semantic relations, but systematic and detailed semantic research on wide areas of Croatian vocabulary is still missing.
  2. There is no computational lexical database that could serve as the basis for such wide-coverage studies of vocabulary and at the same time provide data for the development of NLP tools for Croatian.

The first hypothesis of this research project is that the comprehensive elaboration of theoretical postulates of lexical semantics on wide language samples is necessary for the elaboration of theoretical conceptions and a broader perspective of language as a system. The second hypothesis is that such insights are necessary for the building of a computational lexical database (Croatian WordNet). The third hypothesis is that such a lexical database is a fundamental prerequisite for the further development of tools for the natural language processing of Croatian.

The aim of the project is to build Croatian WordNet, a lexical database for the Croatian language. To retain compatibility with the wordnets from EuroWordNet 1, EuroWordNet 2 and BalkaNet, and to enable multilingual browsing, Croatian WordNet will be based on elements common to the wordnets involved in these projects (the set of base concepts, the Top Ontology, ILI records). Systematic attention will be paid to language-specific lexical-semantic structures in the later phases of building Croatian WordNet. In its conceived design and contents, Croatian WordNet will be at the same time a thesaurus, a dictionary of synonyms and a valency lexicon of Croatian verbs in digital form. Croatian WordNet will reflect the lexical and conceptual structures of the Croatian language and provide data for the possible confirmation or reinterpretation of theoretical postulates of lexical semantics relevant to Croatian, while also enabling the elaboration of postulates and principles of semantics in general. Such a lexical database will also be a source of data for the further development of natural language processing tools for Croatian.
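The role of shared ILI (Inter-Lingual Index) records in multilingual browsing can be illustrated with a minimal sketch. It is not the project's actual data model: the class names, the `translate` helper, the identifier `ili-0001` and the sample entries are all hypothetical and chosen only for illustration. The sketch assumes that each wordnet groups synonymous words into sets and that sets from different languages point to the same ILI record when they express the same concept.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Synset:
    """A set of synonymous words in one language (illustrative structure only)."""
    ili_id: str          # hypothetical identifier of the shared ILI record
    language: str        # language code, e.g. "hr" or "en"
    literals: List[str]  # words that express the concept in this language


class MultilingualIndex:
    """Groups synsets from different wordnets by the ILI record they share."""

    def __init__(self) -> None:
        # Maps ILI id -> language code -> synset.
        self._by_ili: Dict[str, Dict[str, Synset]] = {}

    def add(self, synset: Synset) -> None:
        self._by_ili.setdefault(synset.ili_id, {})[synset.language] = synset

    def translate(self, word: str, source: str, target: str) -> List[str]:
        """Return target-language words for every concept that `word` expresses."""
        result: List[str] = []
        for per_language in self._by_ili.values():
            src = per_language.get(source)
            tgt = per_language.get(target)
            if src is not None and tgt is not None and word in src.literals:
                result.extend(tgt.literals)
        return result


# Invented sample data: one concept linked across two languages.
index = MultilingualIndex()
index.add(Synset(ili_id="ili-0001", language="hr", literals=["pas"]))
index.add(Synset(ili_id="ili-0001", language="en", literals=["dog", "domestic dog"]))

print(index.translate("pas", source="hr", target="en"))
```

Because every language links to the same index rather than to each other, adding a new wordnet in this sketch requires only mappings to ILI records, not pairwise mappings to each existing language.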

Building a national WordNet is important because no higher level of development of language technologies can be reached without such a computational model of the lexico-semantic system. Besides maintaining compatibility with the languages of previous WordNet projects, Croatian WordNet will investigate and maintain the specific lexico-conceptual structures of Croatian, laying the foundations for further development and possible reinterpretation of the theoretical principles of the lexical semantics of Croatian, and it will also enable researchers to gain deeper insight into wider semantic problems.

Project number: MZOŠ 130-1300646-1002

Scientific area: humanities

Type of research: directed basic research

Priority research areas: Social sciences and humanities and Croatian identity

Project leader: professor Ida Raffaelli

Contribution to the short- and long-term goals of the development of the Republic of Croatia:

Keywords: WordNet, Croatian, Croatian WordNet, lexical semantics

Published works: bib.irb.hr

Design: LABOO WEB DESIGN and MARKO TADIĆ. Valid XHTML and CSS.