Bilingual and Multilingual Electronic Language Resources. Ivan A. Derzhanski ([email_address]), Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Department of Mathematical Linguistics
Resources for language engineering: lexical databases (LDBs); electronic dictionaries; monolingual, bilingual and multilingual corpora.
Corpus annotation. Def.: the process of adding linguistic information in electronic form to a text corpus. Most common types: morphosyntactic (grammatical, PoS) annotation; lemma annotation.
PoS tagging. Def.: the task of labelling each word in a sequence of words with its appropriate part of speech. Ambiguity: вероятно ‘probable (sg. n.), probably’. вероятно -> PoS: adjective, Gender: neuter, Number: singular, Definiteness: no; вероятно -> PoS: adverb, Type: adjectival. Def.: tagset: the set of PoS tags.
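To make the ambiguity concrete, here is a minimal Python sketch that assumes a toy in-memory lexicon. A lookup returns both analyses of вероятно. One hand-written context rule then picks one of them: adjective before a neuter noun, adverb otherwise. The lexicon, the `NEUTER_NOUNS` list and the rule are hypothetical illustrations. They are not the MULTEXT-East tools or tagset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str
    features: tuple  # (attribute, value) pairs, e.g. ("PoS", "adverb")

    def get(self, attr):
        return dict(self.features).get(attr)


# Toy lexicon (hypothetical): the ambiguous form from the slide with both analyses.
LEXICON = {
    "вероятно": [
        Analysis("вероятен", (("PoS", "adjective"), ("Gender", "neuter"),
                              ("Number", "singular"), ("Definiteness", "no"))),
        Analysis("вероятно", (("PoS", "adverb"), ("Type", "adjectival"))),
    ],
}

# Hypothetical mini-list of neuter singular nouns used by the toy context rule.
NEUTER_NOUNS = {"решение", "място"}


def candidates(form):
    """All analyses the lexicon offers for a word form (the tagger's search space)."""
    return LEXICON.get(form.lower(), [])


def disambiguate(tokens, i):
    """Choose one analysis for tokens[i] with a single hand-written context rule:
    adjective reading if the next word is a known neuter noun, else adverb."""
    options = candidates(tokens[i])
    if len(options) <= 1:
        return options[0] if options else None
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
    wanted = "adjective" if nxt in NEUTER_NOUNS else "adverb"
    for analysis in options:
        if analysis.get("PoS") == wanted:
            return analysis
    return options[0]


print(disambiguate(["вероятно", "решение"], 0).get("PoS"))             # adjective
print(disambiguate(["той", "вероятно", "ще", "дойде"], 1).get("PoS"))  # adverb
```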
Electronic corpora of Bulgarian. The first two electronic corpora of Bulgarian were created within two EU projects on language technologies: MULTEXT-East (http://nl.ijs.si/IME) and CONCEDE.
MULTEXT-East. The project MULTEXT-East (Multilingual Text Tools and Corpora for Eastern and Central European Languages, 1995–1997) produced resources for six Central and Eastern European languages (Bulgarian, Slovene, Czech, Romanian, Hungarian and Estonian), as well as for English as the ‘hub language’ of the project.
MULTEXT-East (continued). The extended results of the project were made available in 1998, first on CD-ROM and then via TRACTOR, the TELRI (Trans-European Language Resources Infrastructure) Research Archive of Computational Tools and Resources. Version 3 (2004) adds material in five more languages (Croatian, Lithuanian, Resian, Russian, Serbian).
MULTEXT-East (continued). The corpus of Bulgarian was developed according to the methodology and requirements of the project. It contains three parts: the Bulgarian Language-Specific Resources, a Parallel Annotated 1984 Corpus and a Comparative Corpus.
The Parallel Annotated 1984 Corpus. The Parallel Annotated 1984 Corpus consists of the Bulgarian translation of George Orwell’s novel Nineteen Eighty-Four (approximately 87,000 words), aligned with the English text.
The Parallel Annotated 1984 Corpus (continued). The material was formatted as a well-structured, lemmatised Corpus Encoding Standard (CES) corpus (Ide, 1998). Each word form is accompanied by its lemma and by the grammatical information that together constitute its standard lexical description.
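The idea that every word form carries its lemma and grammatical description can be sketched as follows. The `<w lemma=… ana=…>` markup below is a simplified, hypothetical encoding chosen for readability. It is not the exact CES element set. The sample sentence comes from the CONCEDE dictionary entry shown later in these slides, not from the 1984 corpus.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical encoding: one <w> element per word form, with its
# lemma and grammatical description as attributes (NOT the exact CES markup).
SAMPLE = """
<s id="s1">
  <w lemma="вървя" ana="Verb, 1st person, singular, present">Вървя</w>
  <w lemma="без" ana="Preposition">без</w>
  <w lemma="цел" ana="Noun, feminine, singular, indefinite">цел</w>
  <c>.</c>
</s>
"""


def read_tokens(xml_text):
    """Yield (word form, lemma, description) triples for every <w> in the sentence."""
    root = ET.fromstring(xml_text.strip())
    for w in root.iter("w"):
        yield w.text, w.get("lemma"), w.get("ana")


for form, lemma, ana in read_tokens(SAMPLE):
    print(f"{form}\t{lemma}\t{ana}")
```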
The Parallel Annotated 1984 Corpus (continued). The lexical descriptions for Bulgarian follow the terminology and methodology used by MULTEXT. The corpus was marked up and validated for alignment and sentence boundaries.
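Alignment here means pairing each Bulgarian sentence with its English counterpart or counterparts. One common family of methods compares sentence lengths. The sketch below is a simplified length-based dynamic-programming aligner written only for illustration. It is not the tool used in MULTEXT-East. The bead shapes and penalty values are arbitrary assumptions.

```python
def align(src, tgt):
    """Length-based sentence alignment by dynamic programming.

    Returns a list of beads (source indices, target indices). The allowed bead
    shapes and their penalties are illustrative choices, not tuned values.
    """
    beads = {(1, 1): 0.0, (2, 1): 2.0, (1, 2): 2.0, (1, 0): 10.0, (0, 1): 10.0}
    n, m = len(src), len(tgt)
    src_len = [len(s) for s in src]
    tgt_len = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]  # best cost for src[:i], tgt[:j]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in beads.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                a, b = sum(src_len[i:ni]), sum(tgt_len[j:nj])
                # relative length mismatch: 0 for equal lengths, 10 if one side is empty
                c = cost[i][j] + penalty + 10 * abs(a - b) / max(a + b, 1)
                if c < cost[ni][nj]:
                    cost[ni][nj], back[ni][nj] = c, (i, j)
    # follow back-pointers from the end to recover the beads
    result, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        result.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return result[::-1]


bg = ["Вървя без цел.", "Постигнах целта си."]
en = ["I walk aimlessly.", "I achieved my goal."]
print(align(bg, en))  # [([0], [0]), ([1], [1])]
```

When one English sentence covers two short Bulgarian ones, the 2-1 bead wins. For example, `align(["Вървя.", "Без цел."], ["I walk aimlessly."])` yields `[([0, 1], [0])]`.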
The Comparative Corpus. The Comparative Corpus contains two subsets of about 100,000 words each: fiction (excerpts from two contemporary Bulgarian novels) and newspaper text. The data were comparable across the six languages in terms of the number and size of texts.
The Comparative Corpus (continued). The entire multilingual Comparative Corpus was prepared in CES format, either manually or with ad-hoc tools. The project tools then annotated it automatically for tokenisation, sentence boundaries and part of speech.
Bulgarian Language-Specific Resources. The Bulgarian Language-Specific Resources are the data required by the segmentation procedure, the morphological analyser and the disambiguator. They include a lexical list and lists of special tokens (frequent abbreviations and names, titles, patterns for proper names, etc.) with their types.
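A minimal sketch shows why tokenisation and sentence segmentation need the list of special tokens. It assumes a tiny hypothetical abbreviation list (`SPECIAL_TOKENS`). A full stop that belongs to a listed abbreviation stays inside the token, so only a bare '.', '!' or '?' ends a sentence.

```python
import re

# Hypothetical excerpt of a special-token list: token -> type.
SPECIAL_TOKENS = {"ул.": "ABBR", "напр.": "ABBR", "т.е.": "ABBR", "г.": "ABBR"}

# Leading punctuation, word core, trailing punctuation of a whitespace-delimited chunk.
CHUNK_RE = re.compile(r"(\W*)(.*?)(\W*)")


def tokenize(text):
    """Split text into (token, type) pairs, keeping listed special tokens whole."""
    tokens = []
    for chunk in text.split():
        if chunk in SPECIAL_TOKENS:
            tokens.append((chunk, SPECIAL_TOKENS[chunk]))
            continue
        lead, core, trail = CHUNK_RE.fullmatch(chunk).groups()
        tokens.extend((ch, "PUNCT") for ch in lead)
        if core:
            tokens.append((core, "WORD"))
        tokens.extend((ch, "PUNCT") for ch in trail)
    return tokens


def split_sentences(tokens):
    """A sentence ends at a bare '.', '!' or '?'; abbreviations keep their dot inside the token."""
    sentences, current = [], []
    for tok, typ in tokens:
        current.append(tok)
        if typ == "PUNCT" and tok in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


text = "Живея на ул. Шипка. Вървя без цел."
print(split_sentences(tokenize(text)))
# [['Живея', 'на', 'ул.', 'Шипка', '.'], ['Вървя', 'без', 'цел', '.']]
```

If `ул.` were missing from the list, the same text would be split after `ул` into three sentences.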
The lexicon. The lexical list (lexicon) contains about 242,000 lemmata. Each lemma in the lexicon is associated with its part(s) of speech and lexical characteristics. 156,000 morphosyntactic descriptions were provided for Bulgarian.
The lexicon (continued). Each lexicon entry includes the following information: word form; lemma; part of speech; further morphological information (feature values).
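A lexicon with one line per word form can be indexed in both directions. One index goes from a word form to its analyses, which is what an analyser needs. The other goes from a lemma to its word forms. The tab-separated layout and feature notation in this sketch are assumptions for illustration, not the MULTEXT-East file format. The sample shows how one form, цели, can belong to two lemmata: цел ‘goal’ and цял ‘whole’.

```python
from collections import defaultdict

# Hypothetical lexicon lines: word form <TAB> lemma <TAB> PoS and feature values.
LEXICON_TSV = """\
цел\tцел\tNoun Gender=feminine Number=singular Definiteness=no
целта\tцел\tNoun Gender=feminine Number=singular Definiteness=yes
цели\tцел\tNoun Gender=feminine Number=plural Definiteness=no
целите\tцел\tNoun Gender=feminine Number=plural Definiteness=yes
цели\tцял\tAdjective Number=plural Definiteness=no
"""


def load(tsv):
    """Build two indexes: word form -> analyses, and lemma -> word forms."""
    by_form, by_lemma = defaultdict(list), defaultdict(list)
    for line in tsv.splitlines():
        if not line.strip():
            continue
        form, lemma, desc = line.split("\t")
        pos, *feats = desc.split()
        features = dict(f.split("=", 1) for f in feats)
        by_form[form].append({"lemma": lemma, "pos": pos, **features})
        by_lemma[lemma].append(form)
    return by_form, by_lemma


by_form, by_lemma = load(LEXICON_TSV)
print([a["lemma"] for a in by_form["цели"]])  # ['цел', 'цял']
print(by_lemma["цел"])                       # ['цел', 'целта', 'цели', 'целите']
```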
The lexicon (continued). Part of speech: the traditional set of 10 parts of speech, plus punctuation, abbreviations, numbers written in digits and unidentified objects (residuals). The same system is used for all languages of the project, though with different interpretations.
Lexicography ↔ Linguistic Theory. Lexicography requires linguistic theory (analysis, methodology). It also serves as a touchstone for that theory, because whatever is to be represented must first have been studied, understood and formalised to a sufficient extent. Lexicography supports linguistic theory (data for research).
Dictionary ↔ Grammar. Mutually complementary, mutually indispensable components of an integrated linguistic description: lexicographic type (unification); lexicographic portrait (individualisation).
Computational lexicography. Digital (machine-readable) dictionaries: digital versions of traditional dictionaries for human use; computer dictionaries as components of information systems.
Advantages of digital dictionaries. Size is not an issue: there is potential for infinite growth in depth and breadth (a dictionary needn’t be small, medium or large by design). Many purposes are served: explanatory dictionary, grammatical dictionary, dictionary of synonyms, antonyms, phraseology, etymology, etc., all as one integrated system.
Advantages of digital dictionaries (continued). Easy updating is possible, incl. by continued distributed collective effort (wiki-style). Search (incl. bidirectional) and presentation of results are flexible. Audio, video and other material can be added. Requirement: definitions must be simpler, but at the same time more comprehensive.
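Bidirectional search can be derived from a one-way dictionary by inverting it. Wildcard search can be done by matching headwords against a pattern. Below is a minimal sketch using Python's standard `fnmatch` module. The data is the Ru–Et entry for цель from the ABBYY Lingvo slide. The function names are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Ru–Et sample: the entry for цель from the ABBYY Lingvo slide.
RU_ET = {"цель": ["eesmärk", "märk", "otstarve", "siht"]}


def reverse_index(dictionary):
    """Invert headword -> equivalents into equivalent -> headwords (here: Et–Ru)."""
    rev = defaultdict(list)
    for head, equivalents in dictionary.items():
        for eq in equivalents:
            rev[eq].append(head)
    return dict(rev)


def search(dictionary, pattern):
    """Case-sensitive shell-style wildcard search over headwords, e.g. 'цел*' or '*märk'."""
    return sorted(h for h in dictionary if fnmatchcase(h, pattern))


ET_RU = reverse_index(RU_ET)
print(ET_RU["siht"])           # ['цель']
print(search(ET_RU, "*märk"))  # ['eesmärk', 'märk']
```

A real inversion would also need to carry sense numbers and labels across. A bare reversed list of equivalents loses the distinctions made in the original entry.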
Dictionary (definition): an aggregate of linguistic units (forms) established in the language system as represented by the usage of a certain language community, put in a predetermined order and accompanied by formal (orthographic, phonetic, grammatical, etymological, stylistic, etc.) and semantic information on the linguistic units themselves or on the denoted entities or phenomena,
Dictionary (definition, continued): an aggregate of linguistic units (forms) put in a predetermined order and accompanied by formal and semantic information, arranged and ordered in a certain way within the entry, …; almost always supplemented by auxiliary material: introduction, criteria, sources, list of abbreviations, structure of the dictionary entry, grammar tables.
Structure of the dictionary entry: the register part (on the left) and the interpretation part (on the right). All the register parts together form the dictionary’s register (the headword list). The set of rules and methods used when composing the entries forms the metalanguage.
The register. Designing the register (needn’t be a one-time event in the case of an electronic dictionary): from other dictionaries; from a corpus of texts. Editing the register: eliminating obsolete words, arbitrary neologisms and suspected non-words. Automatic extension: productive derivation turned into procedures.
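One way to support designing and editing the register from a corpus is to compare corpus frequencies with an existing headword list. Below is a minimal sketch, assuming the corpus has already been lemmatised. The frequency threshold is an assumed heuristic. Absence from a corpus only flags a headword for review as possibly obsolete; it does not prove that the word is obsolete.

```python
from collections import Counter


def review_register(corpus_lemmas, register, min_freq=2):
    """Compare a lemmatised corpus with an existing register (headword list).

    Returns (candidates to add, headwords not attested in the corpus).
    Lemmas seen fewer than min_freq times are held back as possible typos or
    non-words; unattested headwords are only flagged for review (possibly obsolete).
    """
    counts = Counter(corpus_lemmas)
    to_add = sorted(lemma for lemma, n in counts.items()
                    if n >= min_freq and lemma not in register)
    unattested = sorted(h for h in register if counts[h] == 0)
    return to_add, unattested


corpus = ["цел", "цел", "вървя", "вървя", "без", "без", "целт"]
print(review_register(corpus, {"цел", "без", "претор"}))  # (['вървя'], ['претор'])
```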
Structural aspects of lexicography. Macrostructure: nature and purpose of the dictionary, place within the typology of dictionaries, choice of register, choice of illustrations, order, metalanguage. Mediostructure: relations between language units, e.g., derivation, families of words. Microstructure: setup of the entry, hierarchy of meanings. Requirements: standardisation, economy, simplicity, completeness.
An example of a lexical entry: CONCEDE Bulgarian dictionary
<entry>
  <hw>цел</hw>
  <gen>ж.</gen>
  <struc type="Sense" n="1">
    <def>Това, към което е насочена някаква дейност, към което някой се стреми; умисъл, намерение.</def>
    <eg><q>С каква цел отиваш в града?</q></eg>
    <eg><q>Вървя без цел.</q></eg>
    <eg><q>Постигнах целта си.</q></eg>
    <eg><q>Целта оправдава средствата.</q></eg>
  </struc>
  <struc type="Sense" n="2">
    <def>Предмет или точка, в която някой стреля, към която е насочено определено действие, движение, удар и под.; прицел.</def>
    <eg><q>Улучих целта.</q></eg>
  </struc>
  <struc type="Phrases">
    <struc type="Phrase" n="1">
      <orth>Имам (нямам) [за] цел.</orth>
      <def>стремя се (не се стремя) към нещо.</def>
      <eg><q>Нямам за цел да му навредя.</q></eg>
    </struc>
    <struc type="Phrase" n="2">
      <orth>Попадам в целта.</orth>
      <def>улучвам, умервам.</def>
    </struc>
  </struc>
  <etym><lang>нем.</lang>&gt;<lang>рус.</lang></etym>
</entry>
An example of a lexical entry (zoom, part 1: headword, gender)
<entry>
  <hw>цел</hw>
  <gen>ж.</gen>
  […]
</entry>
An example of a lexical entry (zoom, part 2)
<struc type="Sense" n="1">
  <def>Това, към което е насочена някаква дейност, към което някой се стреми; умисъл, намерение.</def>
  <eg><q>С каква цел отиваш в града?</q></eg>
  <eg><q>Вървя без цел.</q></eg>
  <eg><q>Постигнах целта си.</q></eg>
  <eg><q>Целта оправдава средствата.</q></eg>
</struc>
An example of a lexical entry (zoom, part 3)
<struc type="Sense" n="2">
  <def>Предмет или точка, в която някой стреля, към която е насочено определено действие, движение, удар и под.; прицел.</def>
  <eg><q>Улучих целта.</q></eg>
</struc>
An example of a lexical entry (zoom, part 4)
<struc type="Phrases">
  <struc type="Phrase" n="1">
    <orth>Имам (нямам) [за] цел.</orth>
    <def>стремя се (не се стремя) към нещо.</def>
    <eg><q>Нямам за цел да му навредя.</q></eg>
  </struc>
  <struc type="Phrase" n="2">
    <orth>Попадам в целта.</orth>
    <def>улучвам, умервам.</def>
  </struc>
</struc>
An example of a lexical entry (zoom, part 5: etymology)
<entry>
  […]
  <etym><lang>нем.</lang>&gt;<lang>рус.</lang></etym>
</entry>
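Because the entry is XML, its parts can be read with a standard parser. Below is a minimal sketch using Python's `xml.etree.ElementTree`. The element and attribute names are those of the CONCEDE entry above. The choice of what to extract, and the `summarise` helper itself, are illustrative.

```python
import xml.etree.ElementTree as ET

ENTRY = """<entry>
  <hw>цел</hw>
  <gen>ж.</gen>
  <struc type="Sense" n="1">
    <def>Това, към което е насочена някаква дейност, към което някой се стреми; умисъл, намерение.</def>
    <eg><q>С каква цел отиваш в града?</q></eg>
    <eg><q>Вървя без цел.</q></eg>
    <eg><q>Постигнах целта си.</q></eg>
    <eg><q>Целта оправдава средствата.</q></eg>
  </struc>
  <struc type="Sense" n="2">
    <def>Предмет или точка, в която някой стреля, към която е насочено определено действие, движение, удар и под.; прицел.</def>
    <eg><q>Улучих целта.</q></eg>
  </struc>
  <struc type="Phrases">
    <struc type="Phrase" n="1"><orth>Имам (нямам) [за] цел.</orth>
      <def>стремя се (не се стремя) към нещо.</def>
      <eg><q>Нямам за цел да му навредя.</q></eg></struc>
    <struc type="Phrase" n="2"><orth>Попадам в целта.</orth>
      <def>улучвам, умервам.</def></struc>
  </struc>
  <etym><lang>нем.</lang>&gt;<lang>рус.</lang></etym>
</entry>"""


def summarise(xml_text):
    """Pull headword, gender, senses, phrases and etymology out of one entry."""
    root = ET.fromstring(xml_text)
    senses = root.findall("./struc[@type='Sense']")
    phrases = root.findall("./struc[@type='Phrases']/struc[@type='Phrase']")
    return {
        "headword": root.findtext("hw"),
        "gender": root.findtext("gen"),
        # (sense number, definition, number of usage examples)
        "senses": [(s.get("n"), s.findtext("def"), len(s.findall("eg"))) for s in senses],
        "phrases": [p.findtext("orth") for p in phrases],
        # languages in the order given in <etym>
        "etymology": [lang.text for lang in root.iter("lang")],
    }


info = summarise(ENTRY)
print(info["headword"], info["gender"], len(info["senses"]))  # цел ж. 2
```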
ABBYY Lingvo (Ru–It)
ABBYY Lingvo (Ru–Et): цель [m1][trn]eesmärk, märk, otstarve, siht[/trn][/m]
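The Lingvo card above uses bracketed tags, and `[trn]…[/trn]` encloses the equivalents. Below is a minimal sketch for extracting them from such a card. It assumes the headword is on the first line, the body lines follow, and equivalents inside `[trn]` are comma-separated. It is not a full parser for the format.

```python
import re

# Card layout assumed: headword on the first line, indented body below.
DSL_ENTRY = "цель\n\t[m1][trn]eesmärk, märk, otstarve, siht[/trn][/m]"

TRN_RE = re.compile(r"\[trn\](.*?)\[/trn\]", re.S)  # translation zones
TAG_RE = re.compile(r"\[/?[^\]]+\]")                # any remaining bracketed tag


def parse_card(text):
    """Return (headword, equivalents) from one card."""
    head, *body = text.split("\n")
    translations = []
    for zone in TRN_RE.findall("\n".join(body)):
        plain = TAG_RE.sub("", zone)  # drop any nested formatting tags
        translations.extend(t.strip() for t in plain.split(",") if t.strip())
    return head.strip(), translations


print(parse_card(DSL_ENTRY))  # ('цель', ['eesmärk', 'märk', 'otstarve', 'siht'])
```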
Why is order important?
 
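Why is order important? (continued) Ингредиенты: сахар, глюкоза, мука, милая, корица, какао, сода, маргарин (‘Ingredients: sugar, glucose, flour, sweetheart [sic, for honey], cinnamon, cocoa, soda, margarine’)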
Why is order important? (continued) Ингредиенты: бикарбонат натрия, ароматы, студень, молочный порошок, эмульгатор (‘Ingredients: sodium bicarbonate, flavourings, aspic [sic, for jelly], milk powder, emulsifier’)
wash (En–Ru)
honey (En–Ru)
jelly (En–Ru)
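The ingredient lists above are consistent with a tool that simply took the first equivalent offered for honey and jelly. Below is a minimal sketch of why sense order and labels matter. The entries, their order and the domain labels are hypothetical; they are not copied from the dictionaries shown on these slides.

```python
# Hypothetical En–Ru entries: equivalents in an assumed order, each with domain labels.
ENTRIES = {
    "honey": [("милая", {"endearment"}), ("мёд", {"food", "confectionery"})],
    "jelly": [("студень", {"food", "meat dishes"}), ("желе", {"food", "confectionery"})],
}


def first_sense(word):
    """Naive strategy: always take the first equivalent listed."""
    return ENTRIES[word][0][0]


def sense_for_domain(word, domain):
    """Prefer the first equivalent labelled with the text's domain; else fall back to the first."""
    for equivalent, domains in ENTRIES[word]:
        if domain in domains:
            return equivalent
    return first_sense(word)


print([first_sense(w) for w in ("honey", "jelly")])                       # ['милая', 'студень']
print([sense_for_domain(w, "confectionery") for w in ("honey", "jelly")])  # ['мёд', 'желе']
```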
Digital grammatical dictionaries. Modelling of inflexion (essential for inflecting languages): word form ↔ lemma + grammatical meaning. Built upon a formal model of inflexion: a division of the set of words into inflexional paradigmatic classes (non-intersecting subsets with algorithmically described rules).
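Below is a minimal sketch of such a model. It assumes two hypothetical inflexional classes for Bulgarian feminine nouns: consonant-final nouns like цел, and nouns in -а like жена and книга. Each class is a rule that derives all word forms from the lemma. Analysis runs the same rules in reverse. Real Bulgarian paradigms need many more classes and stem alternations.

```python
# Hypothetical inflexional classes: the ending removed from the lemma,
# and the ending added for each grammatical meaning.
CLASSES = {
    "F1": {"strip": "", "endings": {"sg.indef": "", "sg.def": "та",
                                    "pl.indef": "и", "pl.def": "ите"}},   # e.g. цел
    "F2": {"strip": "а", "endings": {"sg.indef": "а", "sg.def": "ата",
                                     "pl.indef": "и", "pl.def": "ите"}},  # e.g. жена, книга
}

# Every lemma is assigned to exactly one class, so the classes do not intersect.
LEMMAS = {"цел": "F1", "жена": "F2", "книга": "F2"}


def paradigm(lemma):
    """Generate all word forms of a lemma: {grammatical meaning: word form}."""
    cls = CLASSES[LEMMAS[lemma]]
    stem = lemma[: len(lemma) - len(cls["strip"])]
    return {meaning: stem + ending for meaning, ending in cls["endings"].items()}


def analyse(form):
    """Inverse direction: word form -> list of (lemma, grammatical meaning)."""
    return [(lemma, meaning)
            for lemma in LEMMAS
            for meaning, f in paradigm(lemma).items()
            if f == form]


print(paradigm("цел"))     # {'sg.indef': 'цел', 'sg.def': 'целта', 'pl.indef': 'цели', 'pl.def': 'целите'}
print(analyse("жената"))   # [('жена', 'sg.def')]
```

For a large lexicon, a form-to-analysis index would be precomputed once. `analyse` regenerates every paradigm on each lookup, which is fine only for illustration.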
Bi- and multilingual dictionaries. Translation: the most general member(s) of the corresponding synset; grammatical semantics (incl. valency, subcategorisation); pragmatic context (sublanguage of most frequent usage).
Bi- and multilingual dictionaries (continued). Bilingual dictionary: two integrated linguistic systems (explanatory dictionary, grammatical dictionary, dictionary of synonyms, of antonyms, of phraseology). They are complemented by comparable monolingual corpora and a parallel bilingual corpus, and linked by an interface.
Bi- and multilingual dictionaries (continued). Integrating a synonym system and a translation system: EuroWordNet (an assembly of WordNets using a common ontology and indexing).
Bi- and multilingual dictionaries (continued). Multilingual dictionary: either a set of pairs of bilingual dictionaries, or an interlingua. The interlingua can be one of the target languages; an external natural language; an artificial but speakable language (e.g., Esperanto); or a semantic interlingua (a digital concept dictionary).
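The last option, a semantic interlingua, can be sketched as per-language lexicons that map words to shared concept identifiers. Translation then goes word → concept → word. The concept ID `C_GOAL` is hypothetical, and real concepts rarely map one-to-one across languages. The sketch also counts the directed bilingual dictionaries the pairwise alternative would need: n(n − 1) for n languages, against n mappings to the interlingua.

```python
from itertools import permutations

# Hypothetical concept IDs standing in for a semantic interlingua.
LEXICONS = {
    "bg": {"цел": "C_GOAL"},
    "pl": {"cel": "C_GOAL"},
    "uk": {"мета": "C_GOAL"},
    "lt": {"tikslas": "C_GOAL"},
}


def translate(word, src, tgt):
    """Go word -> concept -> words of the target language."""
    concept = LEXICONS[src].get(word)
    return sorted(w for w, c in LEXICONS[tgt].items() if c == concept) if concept else []


def directed_pairs(languages):
    """Directed bilingual dictionaries needed without an interlingua: n(n - 1)."""
    return len(list(permutations(languages, 2)))


print(translate("цел", "bg", "lt"))                 # ['tikslas']
print(directed_pairs(LEXICONS), len(LEXICONS))      # 12 4
```

With Bulgarian, Polish, Ukrainian and Lithuanian, this gives 12 directed dictionaries against 4 mappings to the interlingua.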
Plans of the joint research project “Semantics and Contrastive linguistics with a focus on a bilingual electronic dictionary” between IMI-BAS and ISS-PAS: Bulgarian–Polish/Polish–Bulgarian dictionaries; Bulgarian–Polish–Ukrainian dictionary; Bulgarian–Polish–Ukrainian–Lithuanian …; … more?
Bulgarian–Polish/Polish–Bulgarian dictionaries … on the basis of (1) the most recent paper bilingual dictionaries (1987, 1988): volume ≈60,000 words; already dated; of questionable reliability to boot.
Bulgarian–Polish/Polish–Bulgarian dictionaries … on the basis of (2) a bilingual corpus (3,000,000 words envisaged) consisting of:
- fiction: Polish originals translated into Bulgarian (easy to find); Bulgarian originals translated into Polish (hard to find); originals in a third language translated into both Bulgarian and Polish;
- EU/EC documents;
- texts in Bulgarian and Polish of similar sizes: excerpts from newspapers, literary works available on the Internet.
Bulgarian–Polish dictionary (after OCR and proofreading)
претовар|я, -иш vp. v. претоварям
претоп|я, -иш vp. v. претапям, претопявам
претопява|м, -ш vi. przetapiać; przen. asymilować
претор, -и m hist. pretor m
преториан|ец, -ци m pretorianin m
преториански adi. pretoriański
I преточ|а, -иш vp. v. претакам
II преточ|а, -иш vp. v. II преточвам
I преточвам v. претакам
II преточва|м, -ш vi. ostrzyć nadmiernie
претрайва|м, -ш vi. v. претрая
претра|я, -еш vp. lud. przetrwać
претрива|м, -ш vi. przecierać, przecinać, przepiłowywać; ~м праговете wycieram (obijam) cudze progi
претри|я, -еш vp. v. претривам
Bulgarian–Polish dictionary (after first round of markup)
[b]претовар|я, -иш[/b] [i]vp.[/i] v. [b]претоварям[/b]
[b]претоп|я, -иш[/b] [i]vp.[/i] v. [b]претапям, претопявам[/b]
[b]претопява|м, -ш[/b] [i]vi.[/i] przetapiać; [i]przen.[/i] asymilować
[b]претор, -и[/b] [i]m[/i] [i]hist.[/i] [b]pretor[/b] [i]m[/i]
[b]преториан|ец, -ци[/b] [i]m[/i] pretorianin [i]m[/i]
[b]преториански[/b] [i]adi.[/i] pretoriański
[b]I преточ|а, -иш[/b] [i]vp.[/i] v. [b]претакам[/b]
[b]II преточ|а, -иш[/b] [i]vp.[/i] v. [b]II преточвам[/b]
[b]I преточвам[/b] v. [b]претакам[/b]
[b]II преточва|м, -ш[/b] [i]vi.[/i] [b]ostrzyć nadmiernie[/b]
[b]претрайва|м, -ш[/b] [i]vi.[/i] v. [b]претрая[/b]
[b]претра|я, -еш[/b] [i]vp.[/i] [i]lud.[/i] przetrwać
[b]претрива|м, -ш[/b] [i]vi.[/i] przecierać, przecinać, przepiłowywać; [b]~м праговете[/b] wycieram (obijam) cudze progi
[b]претри|я, -еш[/b] [i]vp.[/i] v. [b]претривам[/b]
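The markup makes the entries machine-readable. Below is a minimal sketch that parses one marked-up line and expands the bar notation. It assumes that the part before `|` is the stem and that the ending after the comma replaces whatever follows the bar, so претопява|м, -ш gives претопявам, претопяваш. The field names are hypothetical.

```python
import re

# Optional homonym numeral, headword group in [b]…[/b], then the rest of the line.
ENTRY_RE = re.compile(r"\[b\]\s*(?:(?P<hom>I{1,3})\s+)?(?P<head>[^\[]+?)\s*\[/b\]\s*(?P<rest>.*)")
LABEL_RE = re.compile(r"\[i\]\s*(.*?)\s*\[/i\]")  # italic grammatical/usage labels
BOLD_RE = re.compile(r"\[/?b\]")


def expand_headword(head):
    """'претопява|м, -ш' -> ('претопявам', 'претопяваш'): the text before '|' is the
    stem, and the ending after the comma replaces whatever follows the bar."""
    first, _, second = head.partition(",")
    first, ending = first.strip(), second.strip().lstrip("-")
    stem = first.split("|")[0]
    return first.replace("|", ""), (stem + ending if ending else None)


def parse_line(line):
    """Split one marked-up entry into homonym number, forms, italic labels and gloss."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    rest = m.group("rest")
    gloss = BOLD_RE.sub(" ", LABEL_RE.sub(" ", rest))
    return {
        "homonym": m.group("hom"),
        "forms": expand_headword(m.group("head")),
        "labels": LABEL_RE.findall(rest),
        "gloss": " ".join(gloss.split()),
    }


SAMPLE = [
    "[b]претопява|м, -ш[/b] [i]vi.[/i] przetapiać; [i]przen.[/i] asymilować",
    "[b]претор, -и[/b] [i]m[/i] [i]hist.[/i] [b]pretor[/b] [i]m[/i]",
    "[b]I преточ|а, -иш[/b] [i]vp.[/i] v. [b]претакам[/b]",
]
for entry in SAMPLE:
    print(parse_line(entry))
```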
Adding procedurality?
по газва|м, -ш vi. deptać, brodzić (trochę)
по гор|я, -иш vp. popalić się (trochę, krótko); […]
по гъделичква|м, -ш vi. łaskotać, łechtać (trochę, lekko)
по гълта|м, -ш vp. łyknąć trochę
по гърмява|м, -ш vi. pogrzmiewać, grzmieć od czasu do czasu, […]
по дадва|м, -ш vi. lud. dawać po trochę, od czasu do czasu
Polyprefixation
по за газ|я, -иш vp. zabrnąć, wpaść w ciężkie położenie (trochę)
по за гатн|а, -еш vp. napomknąć, wspomnieć mimochodem
по за гледа|м, -ш vp. spoglądnąć, spojrzeć, popatrzyć (trochę, od czasu do czasu)
по на тежава|м, -ш vi. stawać się trochę cięższym, ciążyć trochę
по на тисн|а, -еш vp. nacisnąć, przycisnąć trochę
по на товар|я, -иш vp. naładować trochę, obciążyć, obarczyć trochę
Adding procedurality? (continued)
пре търкаля|м, -ш vp. przetoczyć, przesunąć tocząc
Likewise, perhaps: evaluatives; words for females; abstract nouns; … and other productive derivatives.
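Turning productive derivation into a procedure could start with drafting entries mechanically. Below is a minimal sketch with hypothetical base entries and glosses: the prefix по- is attached to the headword, and the Polish gloss is marked "(trochę)".

```python
# Hypothetical base entries (Bulgarian headword -> Polish gloss), assumed for illustration.
BASE = {"горя": "palić się", "гълтам": "łykać"}


def draft_derivative(headword, gloss, prefix="по", note="trochę"):
    """Draft an entry for a prefixed derivative: attach the prefix to the headword and
    mark the gloss with the attenuating note. A candidate for an editor to check."""
    return prefix + headword, f"{gloss} ({note})"


for head, gloss in BASE.items():
    print(*draft_derivative(head, gloss), sep="\t")
# погоря	palić się (trochę)
# погълтам	łykać (trochę)
```

The drafts are only candidates. The dictionary gives погоря as "popalić się (trochę, krótko)". It gives погълтам as "łyknąć trochę", using a perfective Polish verb rather than "łykać (trochę)". It gives позагатна as "napomknąć, wspomnieć mimochodem", without "trochę" at all. Aspect and lexicalised meanings therefore still need an editor.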
Applications of the electronic LDB. Lexicography: creation of electronic bilingual dictionaries for research and teaching; specialised reference works, e.g., valency dictionaries. Education: training skills of independent investigation with the help of the computer.