andyr.jtokeniser
Class BreakIteratorTokeniser
java.lang.Object
andyr.jtokeniser.Tokeniser
andyr.jtokeniser.BreakIteratorTokeniser
public class BreakIteratorTokeniser
- extends Tokeniser
The BreakIteratorTokeniser class uses a BreakIterator to find each word instance according to a specified locale.
The following is one example of the use of the tokeniser. The code:
BreakIteratorTokeniser bit = new BreakIteratorTokeniser("the cat sat on the mat");
while (bit.hasMoreTokens()) {
System.out.println(bit.nextToken());
}
prints the following output:
the
cat
sat
on
the
mat
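The description above says that the tokeniser relies on a BreakIterator word instance. The following is a minimal, self-contained sketch of that technique using only java.text.BreakIterator from the JDK. It is not the source of BreakIteratorTokeniser: the class name WordBreakSketch, the method words, and the rule for skipping whitespace and punctuation segments are assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of andyr.jtokeniser.
class WordBreakSketch {

    // Collects the word-like segments that a word-instance BreakIterator finds in the input.
    public static List<String> words(String input, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(input);

        List<String> tokens = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String candidate = input.substring(start, end);
            // Assumption: a segment counts as a word only if it starts with a letter or digit,
            // so whitespace and punctuation segments are skipped.
            if (Character.isLetterOrDigit(candidate.codePointAt(0))) {
                tokens.add(candidate);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints each word of the sample sentence on its own line.
        for (String word : words("the cat sat on the mat", Locale.ENGLISH)) {
            System.out.println(word);
        }
    }
}
```

The BreakIterator reports every boundary, including those around spaces, so some filtering step of this kind is needed before the segments can be treated as tokens. How the real class filters them is not stated in this documentation.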
- Version:
- 1.2 (01-Aug-2005)
- Author:
- Andrew Roberts
Constructor Summary
BreakIteratorTokeniser(java.lang.String input)
Creates a BreakIteratorTokeniser that tokenises the input.
BreakIteratorTokeniser(java.lang.String input, java.util.Locale locale)
Creates a BreakIteratorTokeniser that tokenises the input according to a given locale.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
BreakIteratorTokeniser
public BreakIteratorTokeniser(java.lang.String input,
java.util.Locale locale)
- Creates a BreakIteratorTokeniser that tokenises the input according to a given locale.
- Parameters:
input
- a string from which the tokens will be extracted.
locale
- the locale that the BreakIterator will use for finding word instances.
BreakIteratorTokeniser
public BreakIteratorTokeniser(java.lang.String input)
- Creates a BreakIteratorTokeniser that tokenises the input. The BreakIterator will use the default locale as returned by Locale.getDefault().
- Parameters:
input
- a string from which the tokens will be extracted.
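The relationship between the two constructors can be sketched as plain constructor chaining: the single-argument form passes Locale.getDefault() to the two-argument form. The class below is a hypothetical stand-in written against the JDK only. Its name (DefaultLocaleSketch), its fields, and its choice to throw NoSuchElementException when no tokens remain are assumptions, not documented behaviour of BreakIteratorTokeniser.

```java
import java.text.BreakIterator;
import java.util.Locale;
import java.util.NoSuchElementException;

// Hypothetical stand-in; not the andyr.jtokeniser implementation.
class DefaultLocaleSketch {
    private final String input;
    private final BreakIterator boundaries;
    private int start;
    private int end;

    // Single-argument form: falls back to the JVM default locale.
    public DefaultLocaleSketch(String input) {
        this(input, Locale.getDefault());
    }

    // Two-argument form: the caller chooses the locale for word breaking.
    public DefaultLocaleSketch(String input, Locale locale) {
        this.input = input;
        this.boundaries = BreakIterator.getWordInstance(locale);
        this.boundaries.setText(input);
        this.start = boundaries.first();
        this.end = boundaries.next();
        skipNonWords();
    }

    // Advances past segments that do not start with a letter or digit (an assumed rule).
    private void skipNonWords() {
        while (end != BreakIterator.DONE
                && !Character.isLetterOrDigit(input.codePointAt(start))) {
            start = end;
            end = boundaries.next();
        }
    }

    public boolean hasMoreTokens() {
        return end != BreakIterator.DONE;
    }

    public String nextToken() {
        if (!hasMoreTokens()) {
            throw new NoSuchElementException("no more tokens");
        }
        String token = input.substring(start, end);
        start = end;
        end = boundaries.next();
        skipNonWords();
        return token;
    }
}
```

Passing an explicit locale, for example new DefaultLocaleSketch(text, Locale.ENGLISH), keeps the result independent of the machine's default locale settings.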