Compressed Full-Text Indexes
Full-text indexes provide fast substring search over large text collections.A serious problem of these indexes has traditionally been their space consumption. A recent trend isto develop indexes that exploit the compressibility of the text, so that their size is a functionof the compressed text length. This concept has evolved into self-indexes, which in addition containenough information to reproduce any text portion, so they replace the text. The excitingpossibility of an index that takes space close to that of the compressed text, replaces it, and inaddition provides fast search over it, has triggered a wealth of activity and produced surprisingresults in a very short time, which radically changed the status of this area in less than 5 years.The most successful indexes nowadays are able to obtain almost optimal space and search timesimultaneously. In this article we present the main concepts underlying (compressed) self-indexes.We explain the relationship between text entropy and regularities that show up in index structuresand permit compressing them. Then we cover the most relevant self-indexes, focusing on how theyexploit text compressibility to achieve compact structures that can efficiently solve various searchproblems. Our aim is to give the background to understand and follow the developments in this area.
GONZALO NAVARRO VELI MAEKINEN
text indexing text compression entropy










