In recent years, statistical machine translation (SMT) has become the leading paradigm in machine translation. SMT systems are built by analyzing huge volumes of parallel corpora and learning translation models from this data, so the quality of an SMT system depends largely on the size of its training data. Since the majority of parallel data exists for major languages, SMT systems for those languages are of much better quality than systems for smaller languages. This quality gap is deepened further by the complex linguistic structure of many smaller languages: languages such as Latvian, Lithuanian and Croatian (to name just a few) have rich morphology and free word order, and learning this complexity from corpus data requires much larger volumes of training material. Current systems are built on data accessible on the web, but this is only a fraction of all parallel texts; most still reside in the local systems of corporations, public and private institutions, and on the desktops of individual users. The cost and the know-how required for building custom MT solutions deter many small-to-medium companies from utilizing the power of MT technologies.
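To illustrate how translation models are learned from parallel data, the following is a minimal sketch of the classic IBM Model 1 word-alignment training step (expectation maximization over word translation probabilities). The toy English–Croatian corpus and all names here are illustrative assumptions, not the project's data or code:

```python
from collections import defaultdict

# Toy parallel corpus (English -> Croatian); real systems use millions of pairs.
corpus = [
    ("the house".split(), "kuća".split()),
    ("the book".split(), "knjiga".split()),
    ("a book".split(), "jedna knjiga".split()),
]

src_vocab = {w for e, _ in corpus for w in e}
tgt_vocab = {w for _, f in corpus for w in f}

# Uniform initialization of the translation probability t(f|e).
t = {(f, e): 1.0 / len(tgt_vocab) for e in src_vocab for f in tgt_vocab}

for _ in range(10):  # EM iterations
    count = defaultdict(float)  # expected co-occurrence counts
    total = defaultdict(float)
    for e_sent, f_sent in corpus:
        for f in f_sent:
            z = sum(t[(f, e)] for e in e_sent)  # normalization over the source sentence
            for e in e_sent:
                c = t[(f, e)] / z               # expected alignment count (E-step)
                count[(f, e)] += c
                total[e] += c
    for (f, e) in t:                            # re-estimate t(f|e) (M-step)
        t[(f, e)] = count[(f, e)] / total[e]
```

After a few iterations the probability mass concentrates on the correct pairs (e.g. t("knjiga" | "book") grows large), which is the effect that only materializes reliably when the corpus is big enough — the data-size dependence discussed above.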
Short description of the task performed by the Croatian partner