String pattern matching and lossless data compression.
Item
-
Title
-
String pattern matching and lossless data compression.
-
Identifier
-
AAI9325077
-
identifier
-
9325077
-
Creator
-
Chang, Daniel Kuo-Yee.
-
Contributor
-
Adviser: Charles Giardina
-
Date
-
1993
-
Language
-
English
-
Publisher
-
City University of New York.
-
Subject
-
Computer Science
-
Abstract
-
A new string pattern matching algorithm has been designed that outperforms the Boyer-Moore algorithm for certain types of patterns. It is particularly efficient for searching case-insensitive patterns or patterns containing don't-care symbols, because the extra work is all done while preprocessing the pattern on the fly; the search phase of the algorithm needs no conversion or additional work. String pattern matching is used in many applications, including text editing, information retrieval, and lossless data compression.

A new lossless data compression/expansion algorithm has been designed that overcomes several shortcomings of the best algorithms known. The algorithm uses multiple dictionaries to store previously encountered strings of the data file to be compressed. Each of these dictionaries (one of which is short and needs fewer bits to index an entry) is initialized with frequently occurring strings. The input data file is compared against the dictionaries, and the longest substring found in a dictionary is encoded as its position in that dictionary. Compression and expansion use the same procedure to form and update the dictionaries. A compressed result consists of a series of pointers into the dictionaries. During compression/expansion, the more frequently occurring strings are dynamically swapped into the short dictionary. The dictionaries serve as scratch pads during program execution and need not be stored or transmitted.

A damaged compressed file, which contains little redundancy, is difficult if not impossible to decipher. Using the idea of Efficient Dispersal of Information originated by Dr. Michael O. Rabin of Harvard University, a practical implementation of the algorithm has been designed.
A file F of length L is dispersed into n subfiles F_i, 1 ≤ i ≤ n, each of length L/m, where m is an integer smaller than n. If no more than n-m subfiles have been damaged, any remaining m subfiles can be used to reconstruct the original file F quite easily.
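The abstract does not give the matcher's details; as a minimal sketch of the idea it describes (case and don't-care handling folded entirely into pattern preprocessing, so the scan over the text does no per-character conversion), here is a Boyer-Moore-Horspool variant. The `?` wildcard and the function names are illustrative assumptions, not the dissertation's actual algorithm.

```python
def preprocess(pattern, wildcard="?"):
    """All case and wildcard handling happens here, on the pattern only.
    char_sets[j] holds both cases of pattern[j] (None = don't-care);
    shift/default form a Horspool bad-character table that already
    accounts for case and wildcards, so the search loop never converts
    or special-cases a text character."""
    m = len(pattern)
    char_sets = [None if c == wildcard else {c.lower(), c.upper()}
                 for c in pattern]
    shift, default = {}, m
    for i in range(m - 1):
        d = m - 1 - i
        if char_sets[i] is None:
            # a wildcard matches every character, so every shift shrinks to d
            shift, default = {}, d
        else:
            for c in char_sets[i]:
                shift[c] = d
    return char_sets, shift, default


def search(text, pattern):
    """First case-insensitive occurrence of pattern in text, or -1.
    '?' in the pattern matches any single character."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    char_sets, shift, default = preprocess(pattern)
    i = 0
    while i + m <= n:
        # compare right to left; no case folding of text characters here
        j = m - 1
        while j >= 0 and (char_sets[j] is None or text[i + j] in char_sets[j]):
            j -= 1
        if j < 0:
            return i
        i += shift.get(text[i + m - 1], default)
    return -1
```

For example, `search("Hello World", "w?rld")` returns 6: the lowercase, wildcarded pattern matches `"World"` without the text ever being case-converted during the scan.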
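The compression scheme can likewise only be sketched from the abstract. The toy compressor below assumes LZ78-style dictionary growth and a hit-count swap rule; the seed strings, dictionary sizes, and promotion policy are all guesses, not the dissertation's method. It emits (dictionary, index) pairs; in a real bitstream, hits in the short dictionary would cost fewer index bits. The matching expander is omitted, since per the abstract it would run the same dictionary-update procedure in reverse.

```python
from collections import Counter

def compress(data, short_size=256):
    """Greedy longest-match coding against two dictionaries,
    dynamically swapping frequently used strings into the short one."""
    # seed both dictionaries with all single bytes (the abstract's
    # "frequently occurring strings" would be better seeds)
    long_dict = [bytes([b]) for b in range(256)]
    short_dict = list(long_dict)[:short_size]
    counts = Counter()
    codes = []
    i = 0
    while i < len(data):
        # longest match across both dictionaries, preferring the short one
        best, code = b"", None
        for which, d in ((0, short_dict), (1, long_dict)):
            for idx, s in enumerate(d):
                if len(s) > len(best) and data.startswith(s, i):
                    best, code = s, (which, idx)
        codes.append(code)
        counts[best] += 1
        # LZ78-style learning: store the match extended by one byte
        nxt = data[i:i + len(best) + 1]
        if len(nxt) > len(best) and nxt not in long_dict:
            long_dict.append(nxt)
        # dynamic swap: promote a hot string into the short dictionary,
        # evicting its least-used entry
        if best not in short_dict:
            coldest = min(range(len(short_dict)),
                          key=lambda k: counts[short_dict[k]])
            if counts[best] > counts[short_dict[coldest]]:
                short_dict[coldest] = best
        i += len(best)
    return codes
```

On `b"abababab"` this emits five codes instead of eight byte literals, as the learned strings `"ab"` and `"aba"` are matched and promoted.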
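Rabin's dispersal scheme itself can be illustrated concretely. The sketch below works over GF(257), a common textbook choice and an assumption here, not necessarily the dissertation's implementation: each group of m bytes becomes the coefficients of a polynomial, piece i stores its value at x = i+1, and any m pieces are recombined by solving the resulting Vandermonde system mod 257.

```python
P = 257  # prime just above the byte range, so every byte is a field element

def disperse(data, n, m):
    """Split data into n pieces of len(data)/m values each; any m
    pieces suffice to reconstruct."""
    # pad to a multiple of m (a real implementation records the true length)
    padded = data + bytes(-len(data) % m)
    pieces = [[] for _ in range(n)]
    for g in range(0, len(padded), m):
        group = padded[g:g + m]
        for i in range(n):
            y = 0
            for b in group:          # Horner evaluation at x = i+1, mod P
                y = (y * (i + 1) + b) % P
            pieces[i].append(y)
    return pieces

def reconstruct(available, m, length):
    """Rebuild the original bytes from any m (index, piece) pairs by
    Gauss-Jordan elimination mod P on each group's Vandermonde system."""
    idxs, cols = zip(*available[:m])
    out = bytearray()
    for g in range(len(cols[0])):
        # augmented rows: [x^(m-1), ..., x, 1 | y] for each available piece
        rows = [[pow(i + 1, m - 1 - k, P) for k in range(m)] + [col[g]]
                for i, col in zip(idxs, cols)]
        for c in range(m):
            pivot = next(r for r in range(c, m) if rows[r][c])
            rows[c], rows[pivot] = rows[pivot], rows[c]
            inv = pow(rows[c][c], P - 2, P)   # modular inverse of the pivot
            rows[c] = [v * inv % P for v in rows[c]]
            for r in range(m):
                if r != c and rows[r][c]:
                    f = rows[r][c]
                    rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[c])]
        out.extend(rows[k][m] for k in range(m))
    return bytes(out[:length])
```

With n = 5 and m = 3, each piece holds L/3 values and any two pieces may be lost, matching the n-m damage bound above. One practical caveat: values mod 257 need slightly more than one byte each, which is why production implementations typically use GF(2^8) instead.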
-
Type
-
dissertation
-
Source
-
PQT Legacy CUNY.xlsx
-
degree
-
Ph.D.