incremental training
LDA training
RDD of documents, which are term (word) count vectors paired with IDs. The term count vectors are "bags of words" with a fixed-size vocabulary (where the vocabulary size is the length of the vector). Document IDs must be unique and >= 0.
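The input format described above can be sketched in plain Python. This is a minimal illustration that assumes tokenized documents and a pre-built vocabulary. It does not use Spark and is not this library's API, and `to_count_vectors` is a hypothetical helper. Each document becomes a `(doc_id, counts)` pair whose vector length equals the vocabulary size, and the document IDs are checked to be unique and >= 0.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def to_count_vectors(
    docs: Sequence[Tuple[int, Sequence[str]]],
    vocab: Sequence[str],
) -> List[Tuple[int, List[int]]]:
    """Turn (doc_id, tokens) pairs into (doc_id, term count vector) pairs.

    Hypothetical helper: every vector has length len(vocab), and tokens
    outside the fixed vocabulary are ignored.
    """
    index: Dict[str, int] = {term: i for i, term in enumerate(vocab)}
    seen_ids = set()
    result: List[Tuple[int, List[int]]] = []
    for doc_id, tokens in docs:
        # Document IDs must be unique and non-negative.
        if doc_id < 0:
            raise ValueError(f"document ID must be >= 0, got {doc_id}")
        if doc_id in seen_ids:
            raise ValueError(f"duplicate document ID {doc_id}")
        seen_ids.add(doc_id)
        # Bag of words: count each in-vocabulary term at its fixed position.
        counts = [0] * len(vocab)
        for term, n in Counter(tokens).items():
            if term in index:
                counts[index[term]] += n
        result.append((doc_id, counts))
    return result


vocab = ["apple", "banana", "cherry"]
docs = [(0, ["apple", "apple", "cherry"]), (1, ["banana", "durian"])]
print(to_count_vectors(docs, vocab))  # [(0, [2, 0, 1]), (1, [0, 1, 0])]
```

In a Spark job, pairs like these would typically be distributed as an RDD of (ID, vector) pairs, often using sparse vectors for large vocabularies. The exact element types this training method expects are an assumption here.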
the number of iterations
the number of topics (5000 or more for large datasets)
recommended value: 5.0 / numTopics
recommended range: 0.001 to 0.1
recommended range: 0.01 to 1.0
whether to use the LightLDA sampling algorithm; false is recommended for short texts
the trained DistributedLDAModel
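The recommended settings for the parameters above can be collected into a small pre-flight check. This is a sketch only. The parameter names `beta` and `alpha_as` are hypothetical labels, because the original descriptions do not name the parameters they document, and neither function is part of the library.

```python
from typing import List


def recommended_alpha(num_topics: int) -> float:
    """Prior recommended as 5.0 / numTopics."""
    if num_topics <= 0:
        raise ValueError("num_topics must be positive")
    return 5.0 / num_topics


def check_lda_params(
    beta: float,           # hypothetical name: recommended range 0.001 to 0.1
    alpha_as: float,       # hypothetical name: recommended range 0.01 to 1.0
    use_light_lda: bool,   # whether LightLDA sampling is enabled
    short_text: bool,      # caller's judgment that the corpus is short texts
) -> List[str]:
    """Return warnings for values outside the recommended settings."""
    warnings: List[str] = []
    if not 0.001 <= beta <= 0.1:
        warnings.append(f"beta={beta} is outside the recommended range 0.001 to 0.1")
    if not 0.01 <= alpha_as <= 1.0:
        warnings.append(f"alpha_as={alpha_as} is outside the recommended range 0.01 to 1.0")
    if short_text and use_light_lda:
        warnings.append("LightLDA sampling is not recommended for short texts")
    return warnings


print(recommended_alpha(100))
print(check_lda_params(beta=0.5, alpha_as=0.1, use_light_lda=True, short_text=True))
```

The check only warns rather than rejecting values, since the descriptions give recommendations, not hard limits.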