It seems that, more and more often these days, search applications must support a large array of European languages. Part of supporting a language is analysing words to find their stem or root form. An example of stemming is the reduction of the words “run”, “running”, “runs” and “ran” to their stem “run”. In the past, all this required in Lucene was the use of the Analyzers for the desired languages. Yet nowadays developers are presented with a plethora of TokenFilter alternatives, from the traditional Snowball-based filters, through the recently added Hunspell filter, to the vaguely named Light and Minimal filters. In this blog I want to compare and contrast each of these options, focusing on their algorithms and how they achieve their goals, and hopefully giving you enough information to make an informed decision about which option fits your use case.

Snowball – De facto no more?

For most of Lucene’s history, the de facto analysis framework for European languages has been driven by Snowball. Snowball is a domain-specific language for defining stemming algorithms for European languages, from which ANSI C and Java implementations can be generated. Started by Martin Porter, the creator of the Porter algorithm for stemming English, Snowball provides stemmers for commonly used European languages such as French, Dutch, German, Spanish and Italian. Snowball stemmers use an algorithm which allows them to stem any word in a language, whether or not the word would appear in an official dictionary.
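To make the contrast between algorithmic and dictionary-based stemming concrete, here is a minimal Python sketch. It is not the Porter or Snowball algorithm, and it does not use Lucene: the suffix rules, the tiny dictionary, and the function names `toy_stem` and `dictionary_stem` are all assumptions invented for illustration. It only shows why a rule-based stemmer can handle a word it has never seen, while a lookup-based one cannot.

```python
# Illustrative suffix rules, tried in order. These are made-up toy rules,
# NOT the rules used by the Porter or Snowball stemmers.
SUFFIX_RULES = ["ing", "es", "s"]

# Never strip a suffix if fewer than this many characters would remain.
MIN_STEM_LENGTH = 3

# A tiny hypothetical dictionary mapping surface forms to stems.
TOY_DICTIONARY = {"running": "run", "runs": "run", "ran": "run"}


def toy_stem(word):
    """Strip the first matching suffix; works on any input string."""
    word = word.lower()
    for suffix in SUFFIX_RULES:
        remaining = len(word) - len(suffix)
        if word.endswith(suffix) and remaining >= MIN_STEM_LENGTH:
            stem = word[:remaining]
            # After removing "ing", collapse a doubled final consonant
            # so that "runn" becomes "run".
            if (suffix == "ing" and stem[-1] == stem[-2]
                    and stem[-1] not in "aeiou"):
                stem = stem[:-1]
            return stem
    return word


def dictionary_stem(word):
    """Look the word up; unknown words are returned unchanged."""
    word = word.lower()
    return TOY_DICTIONARY.get(word, word)


for w in ["running", "runs", "ran", "blorfing"]:
    print(w, toy_stem(w), dictionary_stem(w))
```

In this sketch the invented word "blorfing" is reduced by the rule-based function but passes through the dictionary lookup untouched, whereas the irregular form "ran" is only resolved by the dictionary. Real stemmers use far more careful rules than these, so treat the output as a demonstration of the trade-off rather than of actual Snowball behaviour.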