Corpus-based ambiguity resolution of biomedical terms using knowledge bases and machine learning.
Title
-
Corpus-based ambiguity resolution of biomedical terms using knowledge bases and machine learning.
-
Identifier
-
AAI3063853
-
identifier
-
3063853
-
Creator
-
Liu, Hongfang.
-
Contributor
-
Adviser: Carol Friedman
-
Date
-
2002
-
Language
-
English
-
Publisher
-
City University of New York.
-
Subject
-
Computer Science | Engineering, Biomedical | Information Science
-
Abstract
-
With the widespread use of natural language processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, there is a corresponding need for a method that efficiently and accurately assigns the correct sense to an ambiguous biomedical term in a given context. Currently, ambiguity in the biomedical domain is resolved mainly with handcrafted rules based on contextual information. This approach has three disadvantages: (i) writing disambiguation rules manually is time-consuming and tedious, (ii) rule sets become increasingly difficult to maintain over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains that have specialized vocabularies and different genres of text. We propose a two-phase method to build a classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W using a biomedical terminology knowledge base, the Unified Medical Language System (UMLS), together with free-text databases. When not enough sense-tagged instances of W can be extracted automatically, this phase may include a semi-automatic process that combines clustering analysis with human supervision. The second phase automatically derives a classifier for W through supervised machine learning techniques, using the derived sense-tagged corpus as the training set. Experimental results show that the method can generally be used to construct word sense disambiguation (WSD) classifiers for abbreviations with high precision and without human supervision. It can also be used to construct high-precision WSD classifiers for general biomedical terms whose senses are unrelated, provided that enough instances are extracted for each sense. When human supervision is needed, clustering analysis can reduce the cost of human annotation.
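The two phases can be illustrated with a minimal sketch. It assumes a hypothetical sense inventory in which each sense of an abbreviation is paired with an unambiguous surrogate phrase (its long form), a few invented toy sentences standing in for free-text databases, and a multinomial naive Bayes classifier chosen only as one example of a supervised learner. None of the names (`SURROGATES`, `build_tagged_corpus`, `train_naive_bayes`, `classify`) come from the dissertation, the sketch does not query the UMLS, and the semi-automatic clustering step is not shown.

```python
import math
import re
from collections import Counter, defaultdict

# HYPOTHETICAL sense inventory for the ambiguous abbreviation W = "PCA".
# Each sense label maps to an unambiguous surrogate phrase (its long form).
# In the dissertation the senses come from the UMLS; this dict is a stand-in.
SURROGATES = {
    "PCA_stats": "principal component analysis",
    "PCA_pain": "patient-controlled analgesia",
}

# Toy "free text" invented for illustration; not data from the dissertation.
RAW_TEXT = [
    "We applied principal component analysis to reduce the dimensionality of the gene expression data.",
    "Principal component analysis of the microarray data revealed two clusters.",
    "Postoperative pain was managed with patient-controlled analgesia using morphine.",
    "Patients receiving patient-controlled analgesia reported lower pain scores after surgery.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tagged_corpus(sentences, surrogates):
    """Phase 1 (sketch): tag each sentence containing a surrogate phrase with
    that phrase's sense, keeping the remaining words as the context."""
    corpus = []
    for sentence in sentences:
        lowered = sentence.lower()
        for sense, phrase in surrogates.items():
            if phrase in lowered:
                # Drop the surrogate itself so only the surrounding context is learned.
                corpus.append((tokenize(lowered.replace(phrase, " ")), sense))
    return corpus


def train_naive_bayes(corpus):
    """Phase 2 (sketch): collect counts for a multinomial naive Bayes model."""
    sense_counts = Counter(sense for _, sense in corpus)
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, sense in corpus:
        word_counts[sense].update(context)
        vocab.update(context)
    return sense_counts, word_counts, vocab


def classify(model, context):
    """Return the sense with the highest add-one-smoothed log posterior."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(count / total)  # log prior
        for word in context:
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


corpus = build_tagged_corpus(RAW_TEXT, SURROGATES)
model = train_naive_bayes(corpus)
print(classify(model, tokenize("PCA of the expression data showed clusters")))  # → PCA_stats
print(classify(model, tokenize("Pain after surgery was controlled with PCA")))  # → PCA_pain
```

Removing the surrogate phrase before training mirrors the idea that, at prediction time, only the abbreviation and its context are visible; any other supervised learner could replace naive Bayes in the second phase.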
-
Type
-
dissertation
-
Source
-
PQT Legacy CUNY.xlsx
-
degree
-
Ph.D.