What is sary?
sary is a suffix array library with accompanying tools. It provides fast full-text search for text files on the order of 10 to 100 MB using a data structure called a suffix array. It can also search specific fields in a text file by assigning index points to those fields.
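To make the idea concrete, here is a minimal C sketch of a suffix array built over a set of index points. Each index point is a byte offset into the text, and the array holds those offsets sorted by the text that follows them. Indexing every byte gives a plain suffix array. Indexing only line starts, as the hypothetical helper collect_line_starts does, makes each line a field, so a lookup finds a pattern only where it begins a line. All function names here are hypothetical, this is not libsary's API, and the naive qsort comparison is much slower than what a real indexer would use on 100 MB inputs.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() has no context argument, so this sketch keeps the text being
   indexed in file-scope variables (not thread-safe). */
static const char *g_text;
static size_t g_len;

/* Order two index points by the suffixes that start at them. */
static int compare_suffixes(const void *a, const void *b)
{
    size_t i = *(const size_t *)a;
    size_t j = *(const size_t *)b;
    size_t li = g_len - i, lj = g_len - j;
    int c = memcmp(g_text + i, g_text + j, li < lj ? li : lj);
    if (c != 0)
        return c;
    /* One suffix is a prefix of the other: the shorter one sorts first. */
    return (li > lj) - (li < lj);
}

/* Sort the given index points (byte offsets into text) in place,
   turning them into a suffix array over those points. */
void sort_index_points(const char *text, size_t len, size_t *points, size_t n)
{
    g_text = text;
    g_len = len;
    qsort(points, n, sizeof *points, compare_suffixes);
}

/* Hypothetical helper: every byte offset is an index point.
   Returns a malloc'd array of len offsets, or NULL. */
size_t *collect_all_points(size_t len)
{
    size_t *points = malloc((len ? len : 1) * sizeof *points);
    if (points == NULL)
        return NULL;
    for (size_t k = 0; k < len; k++)
        points[k] = k;
    return points;
}

/* Hypothetical helper: only offsets where a line begins are index points,
   so each line acts as a searchable field. Stores the count in *n. */
size_t *collect_line_starts(const char *text, size_t len, size_t *n)
{
    size_t *points = malloc((len ? len : 1) * sizeof *points);
    if (points == NULL)
        return NULL;
    *n = 0;
    for (size_t k = 0; k < len; k++)
        if (k == 0 || text[k - 1] == '\n')
            points[(*n)++] = k;
    return points;
}
```

For "banana" with every byte indexed, the sorted offsets are 5, 3, 1, 0, 4, 2 (a, ana, anana, banana, na, nana). The same sorting routine serves both cases; only the choice of index points changes.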
Table of Contents
- What's New
- Characteristics
- Brief Introduction to Suffix Array
- libsary Reference Manual
- Using the Included Tools
- FAQ
- Download
- TODO
- Links
What's New
- 2005-03-30: sary 1.2.0 Released!
- Changed ABI.
- Fixed some minor bugs.
- 2002-09-18: sary 1.0.4 Released!
- Improved the performance of displaying search results.
- Modified help messages.
- 2000-11-06: sary 0.1.0 Released!
This is the first release.
Characteristics
- Fast full-text search for huge text files.
  Uses a suffix array; `mmap' is used for performance.
- Flexible construction of suffix arrays.
  The indexer can be easily extended.
- Useful included tools.
  - mksary: constructs a suffix array.
  - sary: performs full-text search using a suffix array.
- Easy-to-understand source code. (I hope so.)
  Written in C, but in an object-oriented style. Simple and compact.
- Only minimal functionality is implemented. New functions may be added in the future.
- GLib is required.
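The list above notes that `mmap' is used for performance. As an illustration of that general technique, not libsary's actual code, the following POSIX sketch maps a text file read-only so a suffix array can be built or searched over it without first copying the whole file into a heap buffer. The names map_text_file and unmap_text_file are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file at path read-only. On success, returns the mapping and
   stores its size in *len. Returns NULL on error or for an empty file,
   because mmap() rejects a zero-length mapping. */
const char *map_text_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Release a mapping returned by map_text_file(). */
void unmap_text_file(const char *text, size_t len)
{
    munmap((void *)text, len);
}
```

The operating system pages the file in on demand, so a binary search over a large index touches only the parts of the text it actually compares.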
Reference Manual
Using the Included Tools
mksary
mksary constructs a suffix array. On a machine with limited memory, try the -b option. Run mksary --help to see the other options.
    # Create a suffix array for HUGE-TEXT.
    % mksary HUGE-TEXT
    % ls HUGE-TEXT*
    HUGE-TEXT  HUGE-TEXT.ary    # the .ary file is created
sary
sary searches a text file using the suffix array created by mksary. The -i, -A, -B, -C, and -c options work as in GNU grep. Run sary --help to see the other options.
    # Search HUGE-TEXT for PATTERN.
    % sary PATTERN HUGE-TEXT
    (search results follow...)
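Searching a suffix array comes down to binary search: all suffixes that begin with PATTERN sit in one contiguous run of the sorted array, so two binary searches (a lower and an upper bound) locate the run, and its length is the number of occurrences. The C sketch below assumes the suffix array is given as sorted byte offsets into the text; sa_search and line_bounds are hypothetical names, not sary's implementation. line_bounds expands a match offset to its enclosing line, as a grep-style display does.

```c
#include <stddef.h>
#include <string.h>

/* Compare the suffix at off with pat over at most m bytes: < 0, 0 (the
   suffix starts with pat), or > 0. A suffix shorter than pat that matches
   as far as it goes sorts before pat. */
static int compare_prefix(const char *text, size_t len, size_t off,
                          const char *pat, size_t m)
{
    size_t avail = len - off;
    int c = memcmp(text + off, pat, avail < m ? avail : m);
    if (c != 0)
        return c;
    return avail < m ? -1 : 0;
}

/* Locate the suffixes starting with pat in the suffix array sa (n entries).
   Stores the index of the first one in *first and returns how many exist. */
size_t sa_search(const char *text, size_t len, const size_t *sa, size_t n,
                 const char *pat, size_t *first)
{
    size_t m = strlen(pat);
    size_t lo = 0, hi = n;

    /* Lower bound: first entry whose suffix is not less than pat. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (compare_prefix(text, len, sa[mid], pat, m) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    *first = lo;

    /* Upper bound: first entry whose suffix no longer starts with pat. */
    hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (compare_prefix(text, len, sa[mid], pat, m) <= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo - *first;
}

/* Expand the match at offset off to the line containing it, as the
   half-open range [*start, *end); *end excludes the newline. */
void line_bounds(const char *text, size_t len, size_t off,
                 size_t *start, size_t *end)
{
    size_t s = off, e = off;
    while (s > 0 && text[s - 1] != '\n')
        s--;
    while (e < len && text[e] != '\n')
        e++;
    *start = s;
    *end = e;
}
```

With the "banana" suffix array {5, 3, 1, 0, 4, 2}, searching for "ana" returns 2 with the run starting at index 1 (offsets 3 and 1). Each lookup costs O(m log n) character comparisons, independent of how large the text is.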
Download
sary is free software distributed under the terms of the GNU Lesser General Public License.
Stable
TODO
- Implement limited approximate searches.
- Implement limited regular expression searches.
- Prepare more language bindings.
Links
- SUFARY
  Another suffix array library.
- McIlroy's suffix sort
  A highly developed suffix sorting algorithm. sary does not use this code.