English | Japanese

Brief Introduction to Suffix Array

Last Modified: 2000-11-14 (Since: 2000-11-14)


Suffix array is a data structure designed for efficient searching of a large text. The data structure is simply an array containing all the pointers to the text suffixes sorted in lexicographical (alphabetical) order. Each suffix is a string starting at a certain poinsition in the text and ending at the end of the text. Searching a text can be performed by binary search using the suffix array.

Let's get started with the suffix array construction. Suppose that we have the sample text ``abracadabra'' and wish to construct the suffix array for the sample text.

First, we should assign index points to the sample text. Index points specify positions where search can be performed. In our example, index points are assigned character by character. Thus, we can search the sample text with the suffix array at any positions later.

Second, we should sort the index points according to thier corresponding suffixes. The correspondance between the index points and the suffixes looks like

After sorting:

Finally, The resulting index points become the suffix array for the sample text.

Search of the sample text can be performed by binary search using the created suffix array. The following figure shows the process of searching the sample text for `ra'. Numbered arrows shows the order of processing.

References


Satoru Takabayashi