nsISemanticUnitScanner

Provides a language-independent way to break Unicode
text into meaningful semantic units (e.g. words).

Methods

start(characterSet)

start()

Starts up the semantic unit scanner with an optional
character set, which acts as a hint to optimize the heuristics
used to determine the language(s) of the processed text.

Parameters

characterSet the character set the text was originally encoded in (can be NULL)

next(text, length, pos, isLastBuffer, begin, end)

next()

Gets the begin and end offsets of the next unit in the current text.

Parameters

text the text to be scanned
length the number of characters in the text to be processed
pos the current position
isLastBuffer whether the buffer is the last one
begin the begin offset of the next unit
end the end offset of the next unit

Returns

Whether there are more units in the current text.
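
Example

The calling pattern described above (call start() once, then call next() repeatedly, advancing pos to the returned end offset until the return value is false) can be sketched as follows. This is a minimal illustration only, not the real scanner: ToySemanticUnitScanner and scan_units are hypothetical names, the toy treats any run of non-whitespace characters as a unit, it returns begin and end as part of a tuple instead of through out parameters, and it assumes that a false return value means no unit was found.

```python
from typing import List, Optional, Tuple


class ToySemanticUnitScanner:
    """Hypothetical stand-in for the scanner: a unit is a run of non-whitespace."""

    def start(self, character_set: Optional[str] = None) -> None:
        # The documented scanner uses the character set as a language hint.
        # The toy only stores it and never consults it.
        self.character_set = character_set

    def next(self, text: str, length: int, pos: int,
             is_last_buffer: bool) -> Tuple[bool, int, int]:
        """Return (has_more, begin, end) for the first unit at or after pos."""
        # Skip separators in front of the next unit.
        begin = pos
        while begin < length and text[begin].isspace():
            begin += 1
        if begin >= length:
            # Assumption: no unit left is reported as has_more == False.
            return False, begin, begin
        # Extend the unit until the next separator or the end of the buffer.
        end = begin
        while end < length and not text[end].isspace():
            end += 1
        # is_last_buffer is accepted for signature parity but ignored by the toy.
        return True, begin, end


def scan_units(text: str, character_set: Optional[str] = None) -> List[str]:
    """Collect every unit in a single buffer by looping over next()."""
    scanner = ToySemanticUnitScanner()
    scanner.start(character_set)
    units: List[str] = []
    pos = 0
    while True:
        has_more, begin, end = scanner.next(text, len(text), pos, True)
        if not has_more:
            break
        units.append(text[begin:end])
        pos = end  # resume scanning where the previous unit ended
    return units


if __name__ == "__main__":
    print(scan_units("break this text"))  # ['break', 'this', 'text']
```

The loop passes isLastBuffer as true because the whole text is supplied in one buffer. How a real implementation handles a unit that straddles two buffers is not covered by this sketch.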