INDEXING AND COMPRESSING FULL-TEXT DATABASES FOR CD-ROM
Date
1990-11-01
Abstract
CD-ROM is an attractive delivery vehicle for full-text databases.
Because of its large storage capacity and low access speed, carefully designed
indexing structures--including a concordance--are necessary to enable the
text to be retrieved efficiently. However, the indexes are sufficiently
large that they tax the ability of main store to hold them when processing
queries. The use of compression techniques can substantially increase the
volume of text that a disk can accommodate, and substantially decrease
the amount of primary storage needed to hold the indexes.
This paper describes a suitable indexing mechanism and examines its compression
potential using modern compression methods. It is possible to double the
amount of text that can be stored on a CD-ROM disk and include
a full concordance and indexes as well. A single disk can accommodate
around 180 million words of text--equivalent to a library of 1000-1500
books--and provide rapid response to a variety of queries involving
multiple search terms and word fragments.
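
To make the idea of a compressed concordance concrete, the following is a minimal sketch and not the paper's own method. The abstract does not say which index layout or compression scheme is used, so the sketch assumes a concordance that maps each word to its word positions, stores the gaps between successive positions, and packs each gap with variable-byte coding, a simple stand-in technique. All function names here (build_concordance, compress_postings, and so on) are hypothetical.

```python
import re
from collections import defaultdict


def build_concordance(text):
    """Map each lowercased word to the ascending list of word positions where it occurs."""
    positions = defaultdict(list)
    for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        positions[word].append(pos)
    return dict(positions)


def vbyte_encode(numbers):
    """Variable-byte coding: 7 data bits per byte, high bit set on the final byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)   # low 7 bits, more bytes follow
            n >>= 7
        out.append(n | 0x80)       # last byte of this number
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers


def compress_postings(postings):
    """Store the first position, then the gaps between successive positions."""
    if not postings:
        return b""
    gaps = [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild absolute positions by summing the decoded gaps."""
    out, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        out.append(total)
    return out


# Assumed sample text, for illustration only.
concordance = build_concordance("the cat sat on the mat the end")
packed = {word: compress_postings(p) for word, p in concordance.items()}
```

Gaps between occurrences of a frequent word are small numbers, so most of them fit in a single byte; this is one reason an index of this kind compresses well. A multi-term query would decompress only the posting lists of the terms involved.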
Keywords
Computer Science