nsISemanticUnitScanner

Provides a language-independent way to break Unicode
text into meaningful semantic units (e.g. words).

Methods

start(characterSet)

Starts up the semantic unit scanner with an optional
character set, which acts as a hint to optimize the heuristics
used to determine the language(s) of the processed text.

Parameters

characterSet

the character set the text was originally encoded in (can be NULL)

next(text, length, pos, isLastBuffer, begin, end)

Gets the begin and end offsets of the next unit in the current text.

Parameters

text	the text to be scanned
length	the number of characters in the text to be processed
pos	the current position
isLastBuffer	whether this buffer is the last one
begin	the begin offset of the next unit
end	the end offset of the next unit

Returns

true if there are more units in the current text
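
The calling pattern implied by start() and next() can be sketched as a loop that feeds each returned end offset back in as the next pos. The code below is a minimal illustration only, not the real implementation: ToyUnitScanner and ScanAll are hypothetical names, the toy splits on plain spaces instead of applying any language heuristics, and it assumes that a true return value means begin and end were filled in with a half-open range.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in that mirrors the documented method shapes.
// It treats every run of non-space characters as one unit.
class ToyUnitScanner {
 public:
  // characterSet is only a hint and may be null; this toy ignores it.
  void Start(const char* characterSet) { (void)characterSet; }

  // Writes the [begin, end) offsets of the next unit at or after pos.
  // Returns true if a unit was written, false otherwise (assumed semantics).
  bool Next(const char16_t* text, int32_t length, int32_t pos,
            bool isLastBuffer, int32_t* begin, int32_t* end) {
    // Skip separators before the next unit.
    while (pos < length && text[pos] == u' ') ++pos;
    if (pos >= length) return false;

    int32_t stop = pos;
    while (stop < length && text[stop] != u' ') ++stop;

    // A unit that touches the end of a non-final buffer might continue
    // in the following buffer, so the toy does not report it yet.
    if (stop == length && !isLastBuffer) return false;

    *begin = pos;
    *end = stop;
    return true;
  }
};

// Collects all unit ranges from a single, final buffer.
std::vector<std::pair<int32_t, int32_t>> ScanAll(const std::u16string& text) {
  ToyUnitScanner scanner;
  scanner.Start(nullptr);  // no character set hint

  std::vector<std::pair<int32_t, int32_t>> units;
  const int32_t length = static_cast<int32_t>(text.size());
  int32_t pos = 0, begin = 0, end = 0;
  while (scanner.Next(text.data(), length, pos, /*isLastBuffer=*/true,
                      &begin, &end)) {
    units.emplace_back(begin, end);
    pos = end;  // resume scanning where the previous unit ended
  }
  return units;
}
```

Passing isLastBuffer as false lets a caller hold back a trailing unit until more text arrives, which is one plausible reading of why the flag exists.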