tensorflow

Type Members

class ClassifierDatasetEncoder extends Serializable
case class ClassifierDatasetEncoderParams(tags: Array[String]) extends Product with Serializable
case class DatasetEncoderParams(tags: List[String], chars: List[Char], emptyVector: List[Float], embeddingsDim: Int, defaultTag: String = "O") extends Product with Serializable

tags
list of unique tags
chars
list of unique characters
emptyVector
list of embeddings
embeddingsDim
dimension of embeddings
defaultTag
the default tag
trait Logging extends AnyRef
class NerBatch extends AnyRef

Batch that contains data in Tensorflow input format.
class NerDatasetEncoder extends Serializable
trait ReadTensorflowModel extends AnyRef
case class SentenceGrouper[T](getLength: (T) ⇒ Int, sizes: Array[Int] = Array(5, 10, 20, 50))(implicit evidence$1: ClassTag[T]) extends Product with Serializable
class TensorResources extends AnyRef

This class is being used to initialize Tensors of different types and shapes for Tensorflow operations
class TensorflowAlbert extends Serializable

This class is used to calculate ALBERT embeddings for For Sequence Batches of WordpieceTokenizedSentence.
This class is used to calculate ALBERT embeddings for For Sequence Batches of WordpieceTokenizedSentence. Input for this model must be tokenzied with a SentencePieceModel,
This Tensorflow model is using the weights provided by https://tfhub.dev/google/albert_base/3 * sequence_output: representations of every token in the input sequence with shape [batch_size, max_sequence_length, hidden_size].
ALBERT: A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS - Google Research, Toyota Technological Institute at Chicago This these embeddings represent the outputs generated by the Albert model. All offical Albert releases by google in TF-HUB are supported with this Albert Wrapper:
TF-HUB Models : albert_base = https://tfhub.dev/google/albert_base/3 | 768-embed-dim, 12-layer, 12-heads, 12M parameters albert_large = https://tfhub.dev/google/albert_large/3 | 1024-embed-dim, 24-layer, 16-heads, 18M parameters albert_xlarge = https://tfhub.dev/google/albert_xlarge/3 | 2048-embed-dim, 24-layer, 32-heads, 60M parameters albert_xxlarge = https://tfhub.dev/google/albert_xxlarge/3 | 4096-embed-dim, 12-layer, 64-heads, 235M parameters
This model requires input tokenization with SentencePiece model, which is provided by Spark NLP
For additional information see : https://arxiv.org/pdf/1909.11942.pdf https://github.com/google-research/ALBERT https://tfhub.dev/s?q=albert
Tips:
ALBERT uses repeating layers which results in a small memory footprint, however the computational cost remains similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same number of (repeating) layers.
class TensorflowBert extends Serializable

BERT (Bidirectional Encoder Representations from Transformers) provides dense vector representations for natural language by using a deep, pre-trained neural network with the Transformer architecture
BERT (Bidirectional Encoder Representations from Transformers) provides dense vector representations for natural language by using a deep, pre-trained neural network with the Transformer architecture
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/BertEmbeddingsTestSpec.scala for further reference on how to use this API. Sources:
class TensorflowClassifier extends Serializable with Logging
class TensorflowElmo extends Serializable

Embeddings from a language model trained on the 1 Billion Word Benchmark.
Embeddings from a language model trained on the 1 Billion Word Benchmark.
Note that this is a very computationally expensive module compared to word embedding modules that only perform embedding lookups. The use of an accelerator is recommended.
word_emb: the character-based word representations with shape [batch_size, max_length, 512]. == word_emb
lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024]. === lstm_outputs1
lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024]. === lstm_outputs2
elmo: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024] == elmo
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/ElmoEmbeddingsTestSpec.scala for further reference on how to use this API.
class TensorflowLD extends Serializable

Language Identification and Detection by using CNNs and RNNs architectures in TensowrFlow
Language Identification and Detection by using CNNs and RNNs architectures in TensowrFlow
The models are trained on large datasets such as Wikipedia and Tatoeba The output is a language code in Wiki Code style: https://en.wikipedia.org/wiki/List_of_Wikipedias
class TensorflowMarian extends Serializable

MarianTransformer: Fast Neural Machine Translation
MarianTransformer: Fast Neural Machine Translation
MarianTransformer uses models trained by MarianNMT.
Marian is an efficient, free Neural Machine Translation framework written in pure C++ with minimal dependencies. It is mainly being developed by the Microsoft Translator team. Many academic (most notably the University of Edinburgh and in the past the Adam Mickiewicz University in Poznań) and commercial contributors help with its development.
It is currently the engine behind the Microsoft Translator Neural Machine Translation services and being deployed by many companies, organizations and research projects (see below for an incomplete list).
Sources : MarianNMT https://marian-nmt.github.io/ Marian: Fast Neural Machine Translation in C++ https://www.aclweb.org/anthology/P18-4020/
class TensorflowMultiClassifier extends Serializable with Logging
class TensorflowNer extends Serializable with Logging
class TensorflowSentenceDetectorDL extends Serializable with Logging
class TensorflowSentiment extends Serializable with Logging
class TensorflowSpell extends Logging with Serializable
class TensorflowT5 extends Serializable

This class is used to run T5 model for For Sequence Batches of WordpieceTokenizedSentence.
This class is used to run T5 model for For Sequence Batches of WordpieceTokenizedSentence. Input for this model must be tokenized with a SentencePieceModel,
class TensorflowUSE extends Serializable

The Universal Sentence Encoder encodes text into high dimensional vectors that can be used for text classification, semantic similarity, clustering and other natural language tasks.
The Universal Sentence Encoder encodes text into high dimensional vectors that can be used for text classification, semantic similarity, clustering and other natural language tasks.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/UniversalSentenceEncoderTestSpec.scala for further reference on how to use this API.
class TensorflowWrapper extends Serializable
class TensorflowXlnet extends Serializable

XlnetEmbeddings (XLNet): Generalized Autoregressive Pretraining for Language Understanding
XlnetEmbeddings (XLNet): Generalized Autoregressive Pretraining for Language Understanding
Note that this is a very computationally expensive module compared to word embedding modules that only perform embedding lookups. The use of an accelerator is recommended.
XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. Additionally, XLNet employs Transformer-XL as the backbone model, exhibiting excellent performance for language tasks involving long context. Overall, XLNet achieves state-of-the-art (SOTA) results on various downstream language tasks including question answering, natural language inference, sentiment analysis, and document ranking.
XLNet-Large = https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip | 24-layer, 1024-hidden, 16-heads XLNet-Base = https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip | 12-layer, 768-hidden, 12-heads. This model is trained on full data (different from the one in the paper).
case class Variables(variables: Array[Byte], index: Array[Byte]) extends Product with Serializable
trait WriteTensorflowModel extends AnyRef

Value Members

object NerBatch
object TensorResources
object TensorflowWrapper extends Serializable
package sentencepiece

package tensorflow

Type Members

class ClassifierDatasetEncoder extends Serializable

case class ClassifierDatasetEncoderParams(tags: Array[String]) extends Product with Serializable

case class DatasetEncoderParams(tags: List[String], chars: List[Char], emptyVector: List[Float], embeddingsDim: Int, defaultTag: String = "O") extends Product with Serializable

trait Logging extends AnyRef

class NerBatch extends AnyRef

class NerDatasetEncoder extends Serializable

trait ReadTensorflowModel extends AnyRef

case class SentenceGrouper[T](getLength: (T) ⇒ Int, sizes: Array[Int] = Array(5, 10, 20, 50))(implicit evidence$1: ClassTag[T]) extends Product with Serializable

class TensorResources extends AnyRef

class TensorflowAlbert extends Serializable

class TensorflowBert extends Serializable

class TensorflowClassifier extends Serializable with Logging

class TensorflowElmo extends Serializable

class TensorflowLD extends Serializable

class TensorflowMarian extends Serializable

class TensorflowMultiClassifier extends Serializable with Logging

class TensorflowNer extends Serializable with Logging

class TensorflowSentenceDetectorDL extends Serializable with Logging

class TensorflowSentiment extends Serializable with Logging

class TensorflowSpell extends Logging with Serializable

class TensorflowT5 extends Serializable

class TensorflowUSE extends Serializable

class TensorflowWrapper extends Serializable

class TensorflowXlnet extends Serializable

case class Variables(variables: Array[Byte], index: Array[Byte]) extends Product with Serializable

trait WriteTensorflowModel extends AnyRef

Value Members

object NerBatch

object TensorResources

object TensorflowWrapper extends Serializable

package sentencepiece

Ungrouped