THE ZERO FREQUENCY PROBLEM: ESTIMATING THE PROBABILITIES OF NOVEL EVENTS IN ADAPTIVE TEXT COMPRESSION
dc.contributor.author | Witten, Ian H. | eng |
dc.contributor.author | Bell, Timothy C. | eng |
dc.date.accessioned | 2008-05-26T20:47:09Z | |
dc.date.available | 2008-05-26T20:47:09Z | |
dc.date.computerscience | 1999-05-27 | eng |
dc.date.issued | 1989-04-01 | eng |
dc.description.abstract | The zero-frequency problem is the problem of estimating the likelihood of a novel event occurring. It is important in adaptive statistical text compression because it is almost always necessary to reserve a small part of the code space for the unexpected (say, the appearance of a new word); the alternative of allocating code space to every possible event (say, a code for each ASCII character) invariably impairs coding efficiency since not all possible events actually occur. This paper reviews approaches that have been taken to the problem in adaptive text compression. Although several methods have been used, their suitability has been based on empirical evaluation rather than a well-founded model. We propose the application of a Poisson process model of novelty. Its ability to predict novel tokens is evaluated, and it consistently outperforms existing methods. It is also applied to a practical statistical coding scheme, where a slight modification is required to avoid divergence. The result is a well-founded zero-frequency model that explains observed differences in the performance of existing methods, and offers a small improvement in the coding efficiency of text compression over the best method previously known. | eng |
dc.description.notes | We are currently acquiring citations for the work deposited into this collection. We recognize the distribution rights of this item may have been assigned to another entity, other than the author(s) of the work. If you can provide the citation for this work, or you think you own the distribution rights to this work, please contact the Institutional Repository Administrator at digitize@ucalgary.ca | eng |
dc.identifier.department | 1989-347-09 | eng |
dc.identifier.doi | http://dx.doi.org/10.11575/PRISM/31207 | |
dc.identifier.uri | http://hdl.handle.net/1880/46607 | |
dc.language.iso | Eng | eng |
dc.publisher.corporate | University of Calgary | eng |
dc.publisher.faculty | Science | eng |
dc.subject | Computer Science | eng |
dc.title | THE ZERO FREQUENCY PROBLEM: ESTIMATING THE PROBABILITIES OF NOVEL EVENTS IN ADAPTIVE TEXT COMPRESSION | eng |
dc.type | unknown | |
thesis.degree.discipline | Computer Science | eng |
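
The abstract describes reserving a small part of the code space for events not yet seen. As a rough illustration only, the sketch below computes three simple estimates of that reserved "escape" probability from a token history. The function name, the dictionary keys, and the choice of estimators are assumptions made for this example; they are simple estimators of the kind discussed in the adaptive-compression literature, and this record does not say which of them, if any, corresponds to the method the paper proposes.

```python
from collections import Counter


def escape_probabilities(tokens):
    """Return illustrative estimates of P(next token is novel).

    Hypothetical helper for illustration; not taken from the paper.
    Assumes `tokens` is a non-empty sequence of hashable items.
    """
    counts = Counter(tokens)
    n = len(tokens)   # total tokens seen so far
    q = len(counts)   # number of distinct tokens seen so far
    t1 = sum(1 for c in counts.values() if c == 1)  # tokens seen exactly once

    return {
        # Treat "novel" as one extra event with a count of 1.
        "one_extra_count": 1 / (n + 1),
        # Give the novel event as many counts as there are distinct tokens.
        "distinct_types": q / (n + q),
        # Use the fraction of the history made up of once-seen tokens.
        "singletons": t1 / n,
    }


if __name__ == "__main__":
    history = "the cat sat on the mat".split()
    for name, p in escape_probabilities(history).items():
        print(f"{name}: {p:.3f}")
```

Whatever probability is chosen for the novel event, the remaining probability mass (one minus the escape estimate) is what a coder would share among the tokens already seen, which is why a poor estimate costs coding efficiency in either direction.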