Corpus-based ambiguity resolution of biomedical terms using knowledge bases and machine learning.


Title
Corpus-based ambiguity resolution of biomedical terms using knowledge bases and machine learning.
Identifier
AAI3063853
Identifier
3063853
Creator
Liu, Hongfang.
Contributor
Adviser: Carol Friedman
Date
2002
Language
English
Publisher
City University of New York.
Subject
Computer Science | Engineering, Biomedical | Information Science
Abstract
With the widespread use of natural language processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, there is a corresponding need for a method that efficiently and accurately assigns the correct sense to an ambiguous biomedical term in a given context. Currently, ambiguity in the biomedical domain is resolved with handcrafted rules based on contextual information. This approach has several disadvantages: (i) generating disambiguation rules manually is time-consuming and tedious, (ii) rule sets become increasingly difficult to maintain over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains with specialized vocabularies and different genres of text. We propose a two-phase method to build a word sense disambiguation (WSD) classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W using a biomedical terminology knowledge base, the Unified Medical Language System (UMLS), together with free-text databases. When not enough sense-tagged instances of W can be extracted automatically, this phase may include a semi-automatic process that combines clustering analysis with human supervision. The second phase automatically derives a classifier for W through supervised machine learning, using the derived sense-tagged corpus as the training set. Experimental results show that the method can generally be used to construct WSD classifiers for abbreviations with high precision and without human supervision. It can also be used to construct high-precision WSD classifiers for general biomedical terms whose senses are unrelated, provided that enough instances are extracted for each sense. When human supervision is needed, clustering analysis can reduce the cost of human annotation.
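The two phases described in the abstract can be illustrated with a minimal Python sketch for an abbreviation. This is not the author's implementation: the `SENSES` dictionary is a hypothetical stand-in for UMLS sense information, the four-sentence corpus is invented toy data, and multinomial naive Bayes with add-one smoothing is a swapped-in learner chosen only for brevity, since the abstract does not state which features or supervised learners were used. The semi-automatic clustering step is not shown. Phase 1 labels a context with a sense whenever that sense's long form appears in it, then blanks out the long form; phase 2 trains a classifier on those automatically labeled contexts.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical sense inventory standing in for UMLS: abbreviation -> long forms.
SENSES = {"PA": ["pulmonary artery", "physician assistant"]}

def tokenize(text):
    """Lowercase bag-of-words tokenizer (alphabetic tokens only)."""
    return re.findall(r"[a-z]+", text.lower())

def auto_tag(abbrev, documents):
    """Phase 1 sketch: tag each document with the first long form it contains,
    blanking that long form so the classifier must learn from the context."""
    tagged = []
    for doc in documents:
        lowered = doc.lower()
        for sense in SENSES[abbrev]:
            if sense in lowered:
                tagged.append((tokenize(lowered.replace(sense, " ")), sense))
                break
    return tagged

def train_nb(tagged):
    """Phase 2 sketch: collect counts for multinomial naive Bayes."""
    sense_counts = Counter(sense for _, sense in tagged)
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, sense in tagged:
        word_counts[sense].update(context)
        vocab.update(context)
    return sense_counts, word_counts, vocab

def classify(model, context):
    """Return the sense with the highest log prior plus add-one-smoothed
    log likelihood; words unseen in training are ignored."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            if w in vocab:
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best

# Toy usage: contexts spelling out the long form become training data.
corpus = [
    "Catheter placed in the pulmonary artery to measure pressure.",
    "Pressure in the pulmonary artery was elevated on catheter study.",
    "The physician assistant reviewed the clinic schedule.",
    "Patient seen by the physician assistant in clinic today.",
]
model = train_nb(auto_tag("PA", corpus))
print(classify(model, tokenize("PA pressure measured by catheter")))
```

On this toy corpus the query prints `pulmonary artery`, because "pressure" and "catheter" occur only in the artery contexts. Blanking out the long form matters: otherwise the training contexts would contain the answer itself, which never appears when the abbreviation is used alone.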
Type
dissertation
Source
PQT Legacy CUNY.xlsx
Degree
Ph.D.
Item sets
CUNY Legacy ETDs