java.lang.Object
  andyr.jtokeniser.Tokeniser
      andyr.jtokeniser.RegexSeparatorTokeniser
public class RegexSeparatorTokeniser
The RegexSeparatorTokeniser class uses regular expressions to define the
separation between tokens. Whereas RegexTokeniser uses a regular expression to
define a "word" or token, RegexSeparatorTokeniser uses a regular expression to
define what delimits tokens. All matching is performed via Java's Pattern and
Matcher classes.
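The difference between the two approaches can be illustrated with plain java.util.regex calls. The following is only a sketch: the class and method names are hypothetical and are not part of andyr.jtokeniser, and it makes no claim about how either tokeniser is implemented internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of andyr.jtokeniser.
class TokenVersusSeparatorSketch {

    // The regex describes the tokens themselves: collect every match.
    static List<String> tokensByMatch(String input, String regex) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // The regex describes the separators: keep the text between matches.
    static List<String> tokensBySeparator(String input, String regex) {
        return Arrays.asList(Pattern.compile(regex).split(input));
    }

    public static void main(String[] args) {
        String input = "abc123def456ghi";
        System.out.println(tokensByMatch(input, "\\d+"));      // [123, 456]
        System.out.println(tokensBySeparator(input, "\\d+"));  // [abc, def, ghi]
    }
}
```

The same regular expression therefore yields complementary results depending on whether it is read as a token definition or as a separator definition.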
The following is one example of the use of the tokeniser, which defines the token delimiter as one or more whitespace characters. The code:
    RegexSeparatorTokeniser rest = new RegexSeparatorTokeniser(
            "the cat sat on the mat", "\\s+");
    while (rest.hasMoreTokens()) {
        System.out.println(rest.nextToken());
    }
prints the following output:
    the
    cat
    sat
    on
    the
    mat
It is also possible to keep the delimiters found between tokens, should that be necessary. By default these are discarded. For example, take the string "abc123def456ghi" and the separator regular expression "\\d+" (one or more digits):
    RegexSeparatorTokeniser rest = new RegexSeparatorTokeniser(
            "abc123def456ghi", "\\d+", true);
    while (rest.hasMoreTokens()) {
        System.out.println(rest.nextToken());
    }
prints the following output:
    abc
    123
    def
    456
    ghi
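The effect of keeping or discarding delimiters can be reproduced with Pattern and Matcher alone. This is a minimal sketch under stated assumptions: the class KeepDelimiterSketch and its split method are hypothetical, empty tokens are skipped by choice, and the library's own handling of such edge cases is not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not the library's implementation.
class KeepDelimiterSketch {

    static List<String> split(String input, String regex, boolean keepDelim) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int start = 0;
        while (m.find()) {
            // Text between the previous delimiter and this one is a token.
            if (m.start() > start) {
                tokens.add(input.substring(start, m.start()));
            }
            // Optionally return the (non-empty) delimiter as a token too.
            if (keepDelim && m.end() > m.start()) {
                tokens.add(m.group());
            }
            start = m.end();
        }
        // Anything after the last delimiter is the final token.
        if (start < input.length()) {
            tokens.add(input.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("abc123def456ghi", "\\d+", false)); // [abc, def, ghi]
        System.out.println(split("abc123def456ghi", "\\d+", true));  // [abc, 123, def, 456, ghi]
    }
}
```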
Field Summary |
---|
Fields inherited from class andyr.jtokeniser.Tokeniser |
---|
currentTokenPosition, tokens |
Constructor Summary | |
---|---|
RegexSeparatorTokeniser(java.lang.String input) | Creates a RegexSeparatorTokeniser that tokenises the input. |
RegexSeparatorTokeniser(java.lang.String input, java.lang.String regex) | Creates a RegexSeparatorTokeniser that tokenises the input according to a regular expression that defines what separates "words" or tokens. |
RegexSeparatorTokeniser(java.lang.String input, java.lang.String regex, boolean keepDelim) | Creates a RegexSeparatorTokeniser that tokenises the input according to a regular expression that defines what separates "words" or tokens, optionally returning the delimiters as tokens. |
Method Summary |
---|
Methods inherited from class andyr.jtokeniser.Tokeniser |
---|
countTokens, getTokens, hasMoreTokens, nextToken, numberOfTokens |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public RegexSeparatorTokeniser(java.lang.String input, java.lang.String regex, boolean keepDelim)
Creates a RegexSeparatorTokeniser that tokenises the input according to a
regular expression that defines what separates "words" or tokens.
Parameters:
input - a string from which the tokens will be extracted.
regex - the regular expression.
keepDelim - flag indicating whether to return the delimiters as tokens.
See Also:
Pattern
public RegexSeparatorTokeniser(java.lang.String input, java.lang.String regex)
Creates a RegexSeparatorTokeniser that tokenises the input according to a
regular expression that defines what separates "words" or tokens.
Parameters:
input - a string from which the tokens will be extracted.
regex - the regular expression.
See Also:
Pattern
public RegexSeparatorTokeniser(java.lang.String input)
Creates a RegexSeparatorTokeniser that tokenises the input. The default
separator regular expression is "\\s+", which defines one or more
whitespace characters as the token delimiter.
Parameters:
input - a string from which the tokens will be extracted.
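To show what the default separator "\\s+" matches, the following sketch applies the same expression through java.util.regex.Pattern directly. The class name is hypothetical and the code does not call andyr.jtokeniser; it only illustrates that a run of mixed whitespace (spaces, tabs, newlines) counts as a single delimiter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; it does not use andyr.jtokeniser.
class DefaultWhitespaceSketch {

    // The default separator named in the constructor documentation.
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static List<String> split(String input) {
        return Arrays.asList(WHITESPACE.split(input));
    }

    public static void main(String[] args) {
        // A double space, a tab and a newline each act as one delimiter.
        System.out.println(split("the  cat\tsat\non the mat"));
        // [the, cat, sat, on, the, mat]
    }
}
```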