AnalyzerPipe

textmogrify.lucene.AnalyzerPipe
See the AnalyzerPipe companion object
sealed abstract case class AnalyzerPipe[F[_]](readerF: Reader => Resource[F, TokenGetter])(implicit F: Async[F])

AnalyzerPipe provides methods to tokenize a possibly very long Stream[F, String] or Stream[F, Byte], such as the contents of a file. When possible, prefer starting with a Stream[F, Byte] and using tokenizeBytes.

Attributes

Companion
object
Source
AnalyzerPipe.scala
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def tokenizeBytes(in: Stream[F, Byte], tokenN: Int): Stream[F, String]

Emits a string for every token, as determined by the Analyzer, in the input stream. Decoding from bytes to strings is done using the default charset.

Value parameters

in

input stream to tokenize

tokenN

maximum number of tokens to read at a time
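
Because decoding uses the default charset, the same bytes can produce different text on differently configured JVMs. The sketch below uses only the Java standard library, not textmogrify, and assumes that "default charset" here means the JVM-wide `Charset.defaultCharset()`. The object name and sample text are made up for illustration.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object DefaultCharsetSketch {
  // Hypothetical sample input containing one non-ASCII character (e-acute).
  val text: String = "caf\u00e9 au lait"

  // Bytes as they might sit in a UTF-8 encoded file.
  val utf8Bytes: Array[Byte] = text.getBytes(StandardCharsets.UTF_8)

  // Decoding with the charset that produced the bytes round-trips.
  val decodedUtf8: String = new String(utf8Bytes, StandardCharsets.UTF_8)

  // Decoding with the JVM default is only guaranteed to round-trip
  // when that default is UTF-8.
  val decodedDefault: String = new String(utf8Bytes, Charset.defaultCharset())

  // Decoding with a mismatched charset garbles the accented character:
  // its two UTF-8 bytes become two separate Latin-1 characters.
  val decodedLatin1: String = new String(utf8Bytes, StandardCharsets.ISO_8859_1)
}
```

If the tokens coming out of tokenizeBytes look garbled, comparing the file's encoding with the JVM default is a reasonable first check.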

Attributes

Source
AnalyzerPipe.scala
def tokenizeStrings(in: Stream[F, String], tokenN: Int): Stream[F, String]

Emits a string for every token, as determined by the Analyzer, in the input stream. A space is inserted between consecutive elements of the input stream to avoid accidentally combining words. See tokenizeStringsRaw to avoid this behaviour.

Value parameters

in

input stream to tokenize

tokenN

maximum number of tokens to read at a time
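
The effect of the inserted space can be sketched without the library. In the example below, whitespace splitting is a stand-in for the Analyzer, and the object, helper, and sample elements are hypothetical; it is an illustration of the documented behaviour, not textmogrify's implementation.

```scala
object IntersperseSketch {
  // Stand-in for an Analyzer: lowercase, then split on whitespace.
  // A real Lucene Analyzer applies its own, usually richer, rules.
  def tokenize(text: String): List[String] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty).toList

  // Hypothetical stream elements, e.g. lines of a file with newlines stripped.
  val elements: List[String] = List("The quick", "brown fox")

  // With a space between elements, as tokenizeStrings is documented to do:
  // List("the", "quick", "brown", "fox")
  val spaced: List[String] = tokenize(elements.mkString(" "))

  // Without a separator, "quick" and "brown" fuse into a single token:
  // List("the", "quickbrown", "fox")
  val fused: List[String] = tokenize(elements.mkString)
}
```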

Attributes

Source
AnalyzerPipe.scala
def tokenizeStringsRaw(in: Stream[F, String], tokenN: Int): Stream[F, String]

Emits a string for every token, as determined by the Analyzer, in the input stream. Be careful: the end of one string is joined directly to the beginning of the next before the Analyzer sees it. See tokenizeStrings to automatically intersperse spaces.

Value parameters

in

input stream to tokenize

tokenN

maximum number of tokens to read at a time
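
Raw joining is plausibly the better fit when the stream elements are arbitrary chunks of a larger text rather than whole words or lines. The sketch below illustrates this with plain Scala collections. Whitespace splitting stands in for the Analyzer, and the object, helper, and chunk values are hypothetical.

```scala
object RawChunksSketch {
  // Stand-in for an Analyzer: split on whitespace only.
  def tokenize(text: String): List[String] =
    text.split("\\s+").filter(_.nonEmpty).toList

  // Hypothetical chunks that cut "hello" and "world" at arbitrary offsets.
  val chunks: List[String] = List("hel", "lo wor", "ld")

  // Raw concatenation restores the original words:
  // List("hello", "world")
  val raw: List[String] = tokenize(chunks.mkString)

  // Interspersed spaces would break each word at the chunk boundary:
  // List("hel", "lo", "wor", "ld")
  val spaced: List[String] = tokenize(chunks.mkString(" "))
}
```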

Attributes

Source
AnalyzerPipe.scala

Inherited methods

Attributes

Inherited from:
Product