Efficient Stemmer Generation
Christopher Fox and Brian Fox
The efficient stemmer generation project has created a program to
generate stemmers from stemmer specification files. This approach to stemmer
implementation has two advantages:
- Human Resource Efficiency: Stemmers are specified in simple textual
stemmer specification files. Creating and modifying a stemmer this way is
much easier than writing custom stemmer code (the traditional way to create a
stemmer).
- Computational Efficiency: Generated stemmers use finite state
machines to do stemming, which is very fast. Generated stemmers are usually
faster than all but the most highly optimized custom stemmer programs.
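To make the two points above concrete, here is a minimal sketch of the general idea: a textual rule list is compiled into a finite state machine, which then stems a word in a single right-to-left pass. It is not the project's generator. The `suffix -> replacement` rule format, the function names `compile_spec` and `stem`, and the sample rules are all assumptions made for illustration.

```python
# Hypothetical specification format (an assumption, not the project's format):
# one "suffix -> replacement" rule per line; "#" starts a comment line.
SPEC = """
# a few illustrative suffix rules
sses -> ss
ies -> i
ing ->
s ->
"""


def compile_spec(spec_text):
    """Compile suffix rules into a finite state machine stored as a trie.

    The trie is keyed on the suffix characters in reverse order, so the
    machine reads a word from its last character toward its first.
    """
    root = {}
    for line in spec_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        suffix, _, replacement = (part.strip() for part in line.partition("->"))
        state = root
        for ch in reversed(suffix):
            state = state.setdefault(ch, {})
        # The None key marks an accepting state and stores the rule's action.
        state[None] = (len(suffix), replacement)
    return root


def stem(machine, word):
    """Apply the longest matching suffix rule, scanning the word once."""
    state = machine
    best = None
    for ch in reversed(word):
        state = state.get(ch)
        if state is None:
            break  # no transition: stop scanning
        if None in state:
            best = state[None]  # a longer suffix rule matched
    if best is None:
        return word
    length, replacement = best
    return word[:len(word) - length] + replacement


machine = compile_spec(SPEC)
print(stem(machine, "caresses"))  # caress
print(stem(machine, "ponies"))    # poni
print(stem(machine, "walking"))   # walk
```

In this sketch, changing the stemmer means editing the rule text rather than the code, and the scan stops as soon as no rule can match, so the work per word is bounded by the length of the longest suffix in the rules.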
Details of this project are available in "Efficient Stemmer Generation"
by Brian Fox and Christopher Fox, in press.
This project has used/generated the following code:
A corpus of words to use for experimentation is also available.
Copyright 2001 Christopher Fox and Brian Fox