:: Experimental :: K-means clustering with support for k-means|| initialization proposed by Bahmani et al.
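The k-means|| idea can be sketched without Spark. Unlike k-means++, which samples one new center per pass, each pass samples every point independently with probability proportional to its squared distance from the nearest already-chosen center, collecting roughly `l` candidates per pass. A minimal, illustrative sketch (not Spark's implementation; the helper names are hypothetical):

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cost(point, centers):
    # Squared distance from a point to its nearest chosen center.
    return min(sq_dist(point, c) for c in centers)

def kmeans_parallel_init(points, k, l=None, passes=5, seed=0):
    """Sketch of k-means|| initialization (Bahmani et al.).

    Returns a list of candidate centers; a full implementation would then
    weight the candidates and re-cluster them down to exactly k centers.
    """
    rng = random.Random(seed)
    if l is None:
        l = 2 * k  # oversampling factor; 2k is a common illustrative choice
    centers = [rng.choice(points)]  # start from one uniformly sampled point
    for _ in range(passes):
        total = sum(cost(p, centers) for p in points)
        if total == 0:
            break  # every point coincides with a chosen center
        # Sample each point with probability ~ l * cost / total_cost.
        new = [p for p in points
               if rng.random() < min(1.0, l * cost(p, centers) / total)]
        centers.extend(new)
    return centers
```

In the full algorithm, the oversampled candidate set is then reduced to exactly k centers (for example, by running a weighted k-means++ over the candidates).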
:: Experimental :: Model fitted by KMeans.
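Conceptually, a fitted k-means model stores the learned cluster centers and assigns each new point to the nearest one. A toy sketch of that assignment step (a hypothetical helper, not the Spark API):

```python
def predict_cluster(point, centers):
    """Return the index of the nearest center by squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

# Example: (0.9, 1.1) is closer to center 1 at (1, 1) than to center 0.
# predict_cluster((0.9, 1.1), [(0.0, 0.0), (1.0, 1.0)]) -> 1
```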
:: Experimental :: Latent Dirichlet Allocation (LDA), a topic model designed for text documents.
Terminology:
 - "term" = "word": an element of the vocabulary
 - "token": an instance of a term appearing in a document
 - "topic": a multinomial distribution over terms representing some concept
References:
 - Original LDA paper (journal version): Blei, Ng, and Jordan. "Latent Dirichlet Allocation." JMLR, 2003.
Input data (featuresCol): LDA is given a collection of documents as input data, via the featuresCol parameter. Each document is specified as a Vector of length vocabSize, where each entry is the count for the corresponding term (word) in the document. Feature transformers such as org.apache.spark.ml.feature.Tokenizer and org.apache.spark.ml.feature.CountVectorizer can be useful for converting text to word count vectors.
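The word-count representation described above can be sketched without Spark: build a vocabulary over the corpus, then map each document to a vector of length vocabSize whose entries count the corresponding terms. This is only a toy stand-in for the Tokenizer + CountVectorizer pipeline, not the actual feature transformers:

```python
def build_vocab(docs):
    """Assign each distinct term an index; vocabSize == len(vocab)."""
    vocab = {}
    for doc in docs:
        for term in doc.lower().split():
            vocab.setdefault(term, len(vocab))
    return vocab

def count_vector(doc, vocab):
    """Length-vocabSize vector; entry i is the count of term i in the doc."""
    vec = [0] * len(vocab)
    for term in doc.lower().split():
        if term in vocab:
            vec[vocab[term]] += 1
    return vec
```

Each resulting vector is a valid LDA input row in the sense described above: one count per vocabulary term.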
:: Experimental :: Model fitted by LDA.
:: Experimental :: Local (non-distributed) model fitted by LDA.
This model stores only the inferred topics; it does not store information about the training dataset.
:: Experimental :: Distributed model fitted by LDA. This type of model is currently only produced by Expectation-Maximization (EM).
This model stores the inferred topics, the full training dataset, and the topic distribution for each training document.
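As an illustration of what "inferred topics" means: each topic is a distribution over the vocabulary, and a topic summary simply takes the highest-weight terms per topic. A toy sketch with made-up weights (hypothetical helper and data, not Spark's API):

```python
def top_terms(topic_matrix, vocabulary, n=2):
    """topic_matrix[t][w] = weight of vocabulary word w in topic t.

    Returns, for each topic, the n highest-weight terms.
    """
    result = []
    for weights in topic_matrix:
        ranked = sorted(range(len(weights)), key=lambda w: -weights[w])
        result.append([vocabulary[w] for w in ranked[:n]])
    return result
```

For example, with vocabulary ["spark", "lda", "topic", "model"] and two topics weighted toward different terms, `top_terms` reports each topic's dominant words.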