andyr.jtokeniser
Class BreakIteratorTokeniser

java.lang.Object
  extended by andyr.jtokeniser.Tokeniser
      extended by andyr.jtokeniser.BreakIteratorTokeniser

public class BreakIteratorTokeniser
extends Tokeniser

The BreakIteratorTokeniser class uses a java.text.BreakIterator to locate word boundaries in the input according to a specified locale, and returns each word found as a token.

The following example shows how the tokeniser is used. The code:

     BreakIteratorTokeniser bit = new BreakIteratorTokeniser("the cat sat on the mat");
     while (bit.hasMoreTokens()) {
         System.out.println(bit.nextToken());
     }
 

prints the following output:

     the
     cat
     sat
     on
     the
     mat
 

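The word-finding step described above can be illustrated with the standard java.text.BreakIterator alone. The following is a minimal sketch, not the andyr.jtokeniser source: the class name WordBreakSketch, the method words, and the rule of discarding segments that contain no letter or digit are assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; not part of andyr.jtokeniser.
class WordBreakSketch {

    // Collects the text between successive word boundaries, skipping any
    // segment with no letter or digit (spaces, punctuation). That filter is
    // an assumption; the library's own rule may differ.
    static List<String> words(String input, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(input);

        List<String> tokens = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String segment = input.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : words("the cat sat on the mat", Locale.ENGLISH)) {
            System.out.println(token);
        }
    }
}
```

Run as written, the sketch prints the six words of the input, one per line. A word BreakIterator reports a boundary on both sides of every space and punctuation mark, so some filter of this kind is needed to keep only the words.
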
Version:
1.2 (01-Aug-2005)
Author:
Andrew Roberts

Field Summary
 
Fields inherited from class andyr.jtokeniser.Tokeniser
currentTokenPosition, tokens
 
Constructor Summary
BreakIteratorTokeniser(java.lang.String input)
          Creates a BreakIteratorTokeniser that tokenises the input.
BreakIteratorTokeniser(java.lang.String input, java.util.Locale locale)
          Creates a BreakIteratorTokeniser that tokenises the input according to a given locale.
 
Method Summary
 
Methods inherited from class andyr.jtokeniser.Tokeniser
countTokens, getTokens, hasMoreTokens, nextToken, numberOfTokens
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BreakIteratorTokeniser

public BreakIteratorTokeniser(java.lang.String input,
                              java.util.Locale locale)
Creates a BreakIteratorTokeniser that tokenises the input according to a given locale.

Parameters:
input - a string from which the tokens will be extracted.
locale - the locale that the BreakIterator will use for finding word instances.

BreakIteratorTokeniser

public BreakIteratorTokeniser(java.lang.String input)
Creates a BreakIteratorTokeniser that tokenises the input. The BreakIterator will use the default locale as returned by Locale.getDefault().

Parameters:
input - a string from which the tokens will be extracted.
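
The relationship between the two constructors, and the role of the inherited tokens and currentTokenPosition fields, might look like the sketch below. It is a hypothetical stand-in rather than the library's code: the class name SketchTokeniser, the filtering rule, and the exception thrown when no tokens remain are all assumptions.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NoSuchElementException;

// Hypothetical stand-in; not the andyr.jtokeniser source.
class SketchTokeniser {
    // Field names mirror those listed under "Fields inherited"; their real
    // types and visibility are not documented here and are assumed.
    private final List<String> tokens = new ArrayList<>();
    private int currentTokenPosition = 0;

    // Single-argument form: falls back to the default locale.
    SketchTokeniser(String input) {
        this(input, Locale.getDefault());
    }

    // Two-argument form: the locale selects the word-boundary rules.
    SketchTokeniser(String input, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(input);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            String segment = input.substring(start, end);
            // Assumed rule: keep only segments containing a letter or digit.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
    }

    boolean hasMoreTokens() {
        return currentTokenPosition < tokens.size();
    }

    String nextToken() {
        if (!hasMoreTokens()) {
            // Assumed behaviour; the real class may signal exhaustion differently.
            throw new NoSuchElementException("no more tokens");
        }
        return tokens.get(currentTokenPosition++);
    }

    public static void main(String[] args) {
        SketchTokeniser st = new SketchTokeniser("le chat dort", Locale.FRENCH);
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
    }
}
```

Delegating from the one-argument constructor to the two-argument one keeps the tokenising logic in a single place, so the default-locale case cannot drift from the explicit-locale case.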