Genetic algorithms for optical character recognition.

Item

Title: Genetic algorithms for optical character recognition.
Identifier: AAI3310646
identifier: 3310646
Creator: Svitak, Joseph John, Jr.
Contributor: Adviser: Robert Goldberg
Date: 2008
Language: English
Publisher: City University of New York.
Subject: Computer Science
Abstract: This thesis addresses the application of genetic algorithms to optical character recognition (OCR). The first problem considered is recognizing characters. Agents (finite state machines) evolve that dedicate themselves to particular pathway segments of a noisy (possibly handwritten) character. The fitness of an agent is the amount of the path it consumes. Collectively, these agents have the means to recognize such characters, since the automata themselves encapsulate the underlying structure of some or all of the curves of a signature. The five experiments for this problem studied the feasibility of agents as descriptors for signatures, using data representing a large number of types of handwritten signature shapes. The first data set comprises 10,000 paths in which no part of a path can cross over any other part of the path. The second data set relaxes this criterion and considers 10,000 paths in which any part of a path can cross over any other part. The results from both data sets were quite promising on randomly generated script. The question then arose as to how to properly locate and align scanned characters with the assumed positions of font-generated characters stored in a digital image database. This line recognition problem utilized a genetic algorithm to determine the number of lines of text in the image and their positions relative to a world coordinate system. Forty pages of text skewed at different angles were used to test the line recognition genetic algorithm, with a high degree of success. The final problem investigated is moment-based character recognition. Since OCR systems employ matching algorithms, statistical moment values are typically calculated. The current system computes nineteen intrinsic values for each potential character on a scanned page: seven Hu moments, ten Flusser-Suk moments, and the areas of the axis-aligned and minimum-area rectangles. The character in the database with the shortest Euclidean distance to the measured values is recognized as the match for the potential character. The question becomes how to determine the smallest set of measures (or moments) that can still enable character recognition. Here too a genetic algorithm was employed, but the results indicated that this is a difficult problem.
Type: dissertation
Source: PQT Legacy CUNY.xlsx
degree: Ph.D.

Item sets: CUNY Legacy ETDs

Media: Genetic algorithms for optical character recognition.
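
Note: The abstract describes matching by shortest Euclidean distance between a character's measured values and those stored in a database. A minimal Python sketch of that idea is given below. It is not the thesis's implementation: the function name `nearest_character`, the optional `mask` argument (standing in for a candidate subset of measures, such as one a genetic algorithm might propose), and the three-value toy vectors are all assumptions made for illustration.

```python
import math


def nearest_character(measured, database, mask=None):
    """Return the label whose stored feature vector is closest to `measured`.

    `database` maps a label to a feature vector of the same length as
    `measured`. `mask` is an optional sequence of booleans selecting which
    features take part in the distance (hypothetical, for illustration).
    """
    if mask is None:
        mask = [True] * len(measured)

    def distance(stored):
        # Euclidean distance restricted to the selected features.
        return math.sqrt(sum((m - s) ** 2
                             for m, s, keep in zip(measured, stored, mask)
                             if keep))

    return min(database, key=lambda label: distance(database[label]))


# Toy three-value vectors with made-up numbers (not data from the thesis).
database = {"A": [0.0, 1.0, 5.0], "B": [1.0, 0.0, 0.0]}
measured = [0.1, 0.9, 0.0]

all_features = nearest_character(measured, database)
subset_only = nearest_character(measured, database, mask=[True, True, False])
print(all_features, subset_only)
```

In this toy case the full vector and the two-feature subset select different labels, which hints at why choosing the smallest adequate set of measures can be hard.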