MODELS FOR COMPRESSION IN FULL-TEXT RETRIEVAL SYSTEMS
dc.contributor.author | Witten, Ian H. | eng |
dc.contributor.author | Nevill, Craig G. | eng |
dc.contributor.author | Bell, Timothy C. | eng |
dc.date.accessioned | 2008-02-27T22:29:02Z | |
dc.date.available | 2008-02-27T22:29:02Z | |
dc.date.computerscience | 1999-05-27 | eng |
dc.date.issued | 1990-08-01 | eng |
dc.description.abstract | Text compression systems operate in a stream-oriented fashion that is inappropriate for databases that need to be accessed through a variety of retrieval mechanisms. This paper develops models for full-text retrieval systems which (a) compress the main text so that it can be randomly accessed via synchronization points; (b) store the text's lexicon in a compressed form that can be efficiently searched for concordancing and decoding purposes; (c) include a lexicon of word fragments that can be used to implement retrieval based on partial word matches; and (d) store the text's concordance in highly compressed form. All compression is based on the method of arithmetic coding, in conjunction with static models derived from the text itself. This contrasts with contemporary stream-oriented compression techniques that use adaptive models, and with database compression techniques that use ad hoc codes rather than principled models. A number of design trade-offs are identified and investigated on a 2.7-million-word sample of English text. The paper is intended to assist designers of full-text retrieval systems by defining, documenting, and evaluating pertinent design decisions. | eng |
dc.description.notes | We are currently acquiring citations for the work deposited into this collection. We recognize the distribution rights of this item may have been assigned to an entity other than the author(s) of the work. If you can provide the citation for this work, or you think you own the distribution rights to this work, please contact the Institutional Repository Administrator at digitize@ucalgary.ca | eng |
dc.identifier.department | 1990-403-27 | eng |
dc.identifier.doi | http://dx.doi.org/10.11575/PRISM/31172 | |
dc.identifier.uri | http://hdl.handle.net/1880/46181 | |
dc.language.iso | Eng | eng |
dc.publisher.corporate | University of Calgary | eng |
dc.publisher.faculty | Science | eng |
dc.subject | Computer Science | eng |
dc.title | MODELS FOR COMPRESSION IN FULL-TEXT RETRIEVAL SYSTEMS | eng |
dc.type | unknown | |
thesis.degree.discipline | Computer Science | eng |