Reduction of lexical ambiguity


Éric Laporte

IGM, University of Marne-la-Vallée - CNRS
The resolution of lexical ambiguity is a prerequisite for many automatic procedures on written texts, even the simpler ones. However, it is not an easily automatable task. Using concrete examples, we will examine the issues faced when developing lexical disambiguators. To estimate the potential of the various approaches, we will consider how disambiguating written texts before processing them improves the relevant applications. In this study we will take into account both linguistic and computational problems and show how they are connected.
1. Lexical tagging of texts
Written text cannot undergo linguistic processing unless the system has access to linguistic information about words. In order to make such information quickly and conveniently available, computer programs usually attach it to the words of the text themselves in the form of lexical tags. The lexical tag for a word therefore gathers all the information that is available about it and useful for the task to be performed, ranging from the very form occurring in the text to grammatical, morphological, syntactic and semantic data, according to the nature of the task. A basic step consists in segmenting the text, identifying minimal units and annotating them with tags. This task is called lexical analysis, lexical tagging or annotation.

The technical means of attaching lexical information to words can be classified into two types, depending on whether the information comes from an electronic dictionary or is deduced from information present in the text.

Dictionary-based tagging is simple: the program looks up the words in a dictionary that associates tags with all the words in the language. This approach was widely put to the test in the 1990s and yields the most reliable results, to the extent that the dictionary conforms to actual usage of the language and is comprehensive enough. For inflected languages, like most European languages, dictionaries of inflected forms are used. The number of entries in inflected-form dictionaries is larger than in conventional dictionaries, in which verbs are present only in the infinitive. Highly inflected languages, e.g. Polish, have several million inflected forms. Even so, there exist dictionaries reasonably close to exhaustive that can be compressed into files on the order of 1 Mb, making it possible to tag thousands of words per second. The Intex system contains efficient tools for dictionary compression and text tagging (M. Silberztein 1994). In this article, we will use the usual Intex conventions for lexical tags: thus, in French, <actif,A:fp> represents the adjective actif in the feminine plural, i.e. actives.
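
As an illustration of the lookup principle only, the following minimal Python sketch tags a text against a toy inflected-form dictionary. It is not the Intex implementation: the names Tag, DICTIONARY, tokenize and tag_text are hypothetical, the segmentation is deliberately naive, and the feature codes other than fp are assumed by analogy with the tag <actif,A:fp> given above.

```python
import re
from typing import NamedTuple


class Tag(NamedTuple):
    """A lexical tag: lemma, part of speech, inflectional features."""
    lemma: str
    pos: str
    features: str

    def __str__(self) -> str:
        # Render in the <lemma,POS:features> style used in this article.
        return f"<{self.lemma},{self.pos}:{self.features}>"


# Toy inflected-form dictionary (hypothetical contents): each inflected
# form maps to the list of tags it may receive. A real dictionary would
# hold several tags for ambiguous forms and far more entries.
DICTIONARY = {
    "actif": [Tag("actif", "A", "ms")],    # "ms" is an assumed code
    "active": [Tag("actif", "A", "fs")],   # "fs" is an assumed code
    "actifs": [Tag("actif", "A", "mp")],   # "mp" is an assumed code
    "actives": [Tag("actif", "A", "fp")],  # tag given in the text
}


def tokenize(text: str) -> list[str]:
    """Naive segmentation: lower-cased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def tag_text(text: str) -> list[tuple[str, list[Tag]]]:
    """Pair each token with its dictionary tags (empty list if unknown)."""
    return [(token, DICTIONARY.get(token, [])) for token in tokenize(text)]


if __name__ == "__main__":
    for token, tags in tag_text("Actives"):
        print(token, " ".join(str(tag) for tag in tags) or "(unknown)")
```

Each form maps to a list of tags rather than a single tag, so that an ambiguous form can keep all its candidate analyses until a later disambiguation step.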

Approaches to tagging without a dictionary were implemented in numerous systems during the 1980s and 1990s. Such systems exploit information present in the text, such as word endings and contexts. For example, many French words in -ives are adjectives in the feminine plural. This rule correctly assigns the tag <A:fp> to the word actives in the sentence:
