News:
Project meeting:
2007-05-XX, 9:30, FFZG, B-003
The paper "Computational Linguistic Models and Language Technologies for Croatian" by Bojana Dalbelo Bašić, Zdravko Dovedan, Ida Raffaelli, Sanja Seljan and Marko Tadić has been published in the Proceedings of the ITI2007 conference, within the Language Technologies section.
This project has three basic starting points:
The first hypothesis of this research project is that a comprehensive elaboration of the theoretical postulates of lexical semantics on large language samples is necessary for developing theoretical conceptions of language as a system and a broader perspective on it. The second hypothesis is that such insights are necessary for building a computational lexical database (Croatian WordNet). The third hypothesis is that such a lexical database is a fundamental prerequisite for the further development of natural language processing tools for Croatian.
The aim of the project is to build Croatian WordNet, a lexical database for the Croatian language. To remain compatible with the wordnets from EuroWordNet 1, EuroWordNet 2 and BalkaNet, and to enable multilingual browsing, Croatian WordNet will be based on the elements common to the wordnets built in these projects (the set of Base Concepts, the Top Ontology and the Inter-Lingual Index (ILI) records). In later phases of building Croatian WordNet, systematic attention will be paid to language-specific lexical-semantic structures. By design and content, Croatian WordNet will simultaneously be a thesaurus, a dictionary of synonyms and a valency lexicon of Croatian verbs in digital form. It will reflect the lexical and conceptual structures of the Croatian language and provide data for confirming or reinterpreting the theoretical postulates of lexical semantics relevant to Croatian, while also enabling the elaboration of the postulates and principles of semantics in general. Such a lexical database will also be a source of data for the further development of natural language processing tools for Croatian.
Building a national WordNet is important because no higher level of language technology development can be reached without such a computational model of the lexico-semantic system. Besides being compatible with the languages from previous WordNet projects, Croatian WordNet will investigate and preserve the specific lexico-conceptual structures of Croatian, laying the foundations for further development and possible reinterpretation of the theoretical principles of Croatian lexical semantics. It will also enable researchers to gain deeper insight into broader semantic problems.
Project number: MZOŠ 130-1300646-1002
Scientific area: humanities
Type of research: directed basic research
Priority research areas: Social sciences and humanities and Croatian identity
Project leader: Professor Ida Raffaelli
Contribution to the short- and long-term goals of the development of the Republic of Croatia:
Keywords: WordNet, Croatian, Croatian WordNet, lexical semantics
Published works: bib.irb.hr