INDEXING AND COMPRESSING FULL-TEXT DATABASES FOR CD-ROM

dc.contributor.author: Witten, Ian H.
dc.contributor.author: Bell, Timothy C.
dc.contributor.author: Nevill, Craig G.
dc.date.accessioned: 2008-02-27T22:29:07Z
dc.date.available: 2008-02-27T22:29:07Z
dc.date.computerscience: 1999-05-27
dc.date.issued: 1990-11-01
dc.description.abstract: CD-ROM is an attractive delivery vehicle for full-text databases. Because of its large storage capacity and low access speed, carefully designed indexing structures, including a concordance, are necessary to enable the text to be retrieved efficiently. However, the indexes are large enough to tax the ability of main store to hold them when processing queries. The use of compression techniques can substantially increase the volume of text that a disk can accommodate, and substantially decrease the amount of primary storage needed to hold the indexes. This paper describes a suitable indexing mechanism and its compression potential using modern compression methods. It is possible to double the amount of text that can be stored on a CD-ROM disk and include a full concordance and indexes as well. A single disk can accommodate around 180 million words of text, equivalent to a library of 1000-1500 books, and provide rapid response to a variety of queries involving multiple search terms and word fragments.
dc.description.notes: We are currently acquiring citations for the work deposited into this collection. We recognize that the distribution rights of this item may have been assigned to an entity other than the author(s) of the work. If you can provide the citation for this work, or you think you own the distribution rights to this work, please contact the Institutional Repository Administrator at digitize@ucalgary.ca
dc.identifier.department: 1990-412-36
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/31163
dc.identifier.uri: http://hdl.handle.net/1880/46182
dc.language.iso: Eng
dc.publisher.corporate: University of Calgary
dc.publisher.faculty: Science
dc.subject: Computer Science
dc.title: INDEXING AND COMPRESSING FULL-TEXT DATABASES FOR CD-ROM
dc.type: unknown
thesis.degree.discipline: Computer Science
Files
Original bundle
Name:
1990-412-36.pdf
Size:
1.62 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.86 KB
Format:
Plain Text