THE ZERO FREQUENCY PROBLEM: ESTIMATING THE PROBABILITIES OF NOVEL EVENTS IN ADAPTIVE TEXT COMPRESSION
Date
1989-04-01
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
The zero-frequency problem is the problem of estimating the likelihood
of a novel event occurring. It is important in adaptive statistical text
compression because it is almost always necessary to reserve a small
part of the code space for the unexpected (say, the appearance of a new
word); the alternative of allocating code space to every possible event (say,
a code for each ASCII character) invariably impairs coding efficiency since
not all possible events actually occur.
This paper reviews approaches that have been taken to the problem in
adaptive text compression. Although several methods have been used, their
suitability has been based on empirical evaluation rather than a well-founded
model. We propose the application of a Poisson process model of novelty.
Its ability to predict novel tokens is evaluated, and it consistently
outperforms existing methods. It is also applied to a practical statistical
coding scheme, where a slight modification is required to avoid divergence.
The result is a well-founded zero-frequency model that explains observed
differences in the performance of existing methods, and offers a small
improvement in the coding efficiency of text compression over the best
method previously known.
Description
Keywords
Computer Science