Class WordVectorSerializer
- java.lang.Object
- org.deeplearning4j.models.embeddings.loader.WordVectorSerializer

public class WordVectorSerializer extends Object
Nested Class Summary
Modifier and Type, Class, Description
protected static class WordVectorSerializer.BinaryReader
protected static class WordVectorSerializer.CSVReader
protected static interface WordVectorSerializer.Reader
static class WordVectorSerializer.ReadHelper
Helper static methods to read data from input stream.
-
Method Summary
Modifier and Type, Method, Description
static Word2Vec fromPair(org.nd4j.common.primitives.Pair<InMemoryLookupTable,VocabCache> pair)
Load word vectors from the given pair.
static WordVectors fromTableAndVocab(WeightLookupTable table, VocabCache vocab)
Load word vectors for the given vocab and table.
protected static TokenizerFactory getTokenizerFactory(VectorsConfiguration configuration)
static Word2Vec loadFullModel(@NonNull String path)
Deprecated. Use readWord2VecModel() or loadStaticModel() method instead.
static WordVectors loadStaticModel(@NonNull File file)
This method restores previously saved w2v model.
static WordVectors loadStaticModel(InputStream inputStream)
This method restores previously saved w2v model.
static org.nd4j.common.primitives.Pair<InMemoryLookupTable,VocabCache> loadTxt(@NonNull File file)
static org.nd4j.common.primitives.Pair<InMemoryLookupTable,VocabCache> loadTxt(@NonNull InputStream inputStream)
Loads an in-memory cache from the given input stream (sets syn0 and the vocab).
static WordVectors loadTxtVectors(@NonNull InputStream stream, boolean skipFirstLine)
Deprecated. Use readWord2VecModel() or loadStaticModel() method instead.
static WordVectors loadTxtVectors(File vectorsFile)
Deprecated.
static void printOutProjectedMemoryUse(long numWords, int vectorLength, int numTables)
This method prints memory usage to log.
static Word2Vec readAsBinary(@NonNull File file)
static Word2Vec readAsBinary(@NonNull InputStream inputStream)
This method loads Word2Vec model from binary input stream.
static Word2Vec readAsBinaryNoLineBreaks(@NonNull File file)
static Word2Vec readAsBinaryNoLineBreaks(@NonNull InputStream inputStream)
static Word2Vec readAsCsv(@NonNull File file)
static Word2Vec readAsCsv(@NonNull InputStream inputStream)
This method loads Word2Vec model from csv file.
static Word2Vec readBinaryModel(InputStream inputStream, boolean linebreaks, boolean normalize)
Read a binary word2vec from input stream.
static <T extends SequenceElement> WeightLookupTable<T> readLookupTable(File file)
static <T extends SequenceElement> WeightLookupTable<T> readLookupTable(InputStream stream)
static ParagraphVectors readParagraphVectors(File file)
This method restores ParagraphVectors model previously saved with writeParagraphVectors().
static ParagraphVectors readParagraphVectors(InputStream stream)
This method restores ParagraphVectors model previously saved with writeParagraphVectors().
static ParagraphVectors readParagraphVectors(String path)
This method restores ParagraphVectors model previously saved with writeParagraphVectors().
static ParagraphVectors readParagraphVectorsFromText(@NonNull File file)
Deprecated.
static ParagraphVectors readParagraphVectorsFromText(@NonNull InputStream stream)
Deprecated.
static ParagraphVectors readParagraphVectorsFromText(@NonNull String path)
Deprecated.
static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull File file, boolean readExtendedTables)
This method loads SequenceVectors from specified file.
static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull InputStream stream, boolean readExtendedTables)
This method loads SequenceVectors from specified input stream.
static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull String path, boolean readExtendedTables)
This method loads SequenceVectors from specified file path.
static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull SequenceElementFactory<T> factory, @NonNull File file)
This method loads previously saved SequenceVectors model from File.
static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull SequenceElementFactory<T> factory, @NonNull InputStream stream)
This method loads previously saved SequenceVectors model from InputStream.
static VocabCache<VocabWord> readVocabCache(@NonNull File file)
This method reads vocab cache from provided file.
static VocabCache<VocabWord> readVocabCache(@NonNull InputStream stream)
This method reads vocab cache from provided InputStream.
static Word2Vec readWord2Vec(@NonNull File file, boolean readExtendedTables)
This method loads Word2Vec model from file.
static Word2Vec readWord2Vec(@NonNull InputStream stream, boolean readExtendedTable)
This method loads Word2Vec model from input stream.
static Word2Vec readWord2Vec(@NonNull String path, boolean readExtendedTables)
This method restores Word2Vec model from file.
static Word2Vec readWord2Vec(File file)
Deprecated.
static Word2Vec readWord2VecFromText(@NonNull File vectors, @NonNull File hs, @NonNull File h_codes, @NonNull File h_points, @NonNull VectorsConfiguration configuration)
This method allows you to read a Word2Vec model from externally originated vectors and syn1.
static Word2Vec readWord2VecModel(File file)
This method loads a Word2Vec model from one of the supported formats: binary model (compressed or not), CSV word2vec text format, or DL4j compressed format.
static Word2Vec readWord2VecModel(File file, boolean extendedModel)
This method loads a Word2Vec model from one of the supported formats: binary model (compressed or not), CSV word2vec text format, or DL4j compressed format.
static Word2Vec readWord2VecModel(String path)
This method loads a Word2Vec model from one of the supported formats: binary model (compressed or not), CSV word2vec text format, or DL4j compressed format.
static Word2Vec readWord2VecModel(String path, boolean extendedModel)
This method loads a Word2Vec model from one of the supported formats: binary model (compressed or not), CSV word2vec text format, or DL4j compressed format.
static FastText readWordVectors(File path)
This method loads FastText model from file.
static void writeFullModel(@NonNull Word2Vec vec, @NonNull String path)
Deprecated. Use writeWord2VecModel() method instead.
static <T extends SequenceElement> void writeLookupTable(WeightLookupTable<T> weightLookupTable, @NonNull File file)
This method saves table of weights to file.
static void writeParagraphVectors(ParagraphVectors vectors, File file)
This method saves ParagraphVectors model into compressed zip file.
static void writeParagraphVectors(ParagraphVectors vectors, OutputStream stream)
This method saves ParagraphVectors model into compressed zip file and sends it to output stream.
static void writeParagraphVectors(ParagraphVectors vectors, String path)
This method saves ParagraphVectors model into compressed zip file located at path.
static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors, @NonNull OutputStream stream)
This method saves specified SequenceVectors model to target OutputStream.
static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors, @NonNull SequenceElementFactory<T> factory, @NonNull File file)
This method saves specified SequenceVectors model to target file.
static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors, @NonNull SequenceElementFactory<T> factory, @NonNull OutputStream stream)
This method saves specified SequenceVectors model to target OutputStream.
static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors, @NonNull SequenceElementFactory<T> factory, @NonNull String path)
This method saves specified SequenceVectors model to target file path.
static void writeTsneFormat(Word2Vec vec, org.nd4j.linalg.api.ndarray.INDArray tsne, File csv)
Write the tsne format.
static void writeVocabCache(@NonNull VocabCache<VocabWord> vocabCache, @NonNull File file)
This method saves vocab cache to provided File.
static void writeVocabCache(@NonNull VocabCache<VocabWord> vocabCache, @NonNull OutputStream stream)
This method saves vocab cache to provided OutputStream.
static void writeWord2Vec(@NonNull Word2Vec word2Vec, @NonNull OutputStream stream)
This method saves Word2Vec model to output stream.
static void writeWord2VecModel(Word2Vec vectors, File file)
This method saves Word2Vec model into compressed zip file. PLEASE NOTE: This method saves FULL model, including syn0 AND syn1.
static void writeWord2VecModel(Word2Vec vectors, OutputStream stream)
This method saves Word2Vec model into compressed zip file and sends it to output stream. PLEASE NOTE: This method saves FULL model, including syn0 AND syn1.
static void writeWord2VecModel(Word2Vec vectors, String path)
This method saves Word2Vec model into compressed zip file. PLEASE NOTE: This method saves FULL model, including syn0 AND syn1.
static void writeWordVectors(@NonNull FastText vectors, @NonNull File path)
This method saves FastText model to file.
static void writeWordVectors(@NonNull ParagraphVectors vectors, @NonNull File path)
Deprecated.
static void writeWordVectors(@NonNull ParagraphVectors vectors, @NonNull String path)
Deprecated.
static void writeWordVectors(@NonNull Word2Vec vec, @NonNull BufferedWriter writer)
Deprecated.
static void writeWordVectors(@NonNull Word2Vec vec, @NonNull File file)
Deprecated.
static void writeWordVectors(@NonNull Word2Vec vec, @NonNull OutputStream outputStream)
Deprecated.
static void writeWordVectors(@NonNull Word2Vec vec, @NonNull String path)
Deprecated.
static void writeWordVectors(InMemoryLookupTable lookupTable, InMemoryLookupCache cache, String path)
Deprecated. Use writeWord2VecModel(Word2Vec, File) instead.
static <T extends SequenceElement> void writeWordVectors(WeightLookupTable<T> lookupTable, File file)
This method writes word vectors to the given file.
static <T extends SequenceElement> void writeWordVectors(WeightLookupTable<T> lookupTable, OutputStream stream)
This method writes word vectors to the given OutputStream.
static <T extends SequenceElement> void writeWordVectors(WeightLookupTable<T> lookupTable, String path)
This method writes word vectors to the given path.
static void writeWordVectors(ParagraphVectors vectors, OutputStream stream)
Deprecated.
-
-
-
Method Detail
-
readBinaryModel
public static Word2Vec readBinaryModel(InputStream inputStream, boolean linebreaks, boolean normalize) throws NumberFormatException, IOException
Read a binary word2vec model from an input stream.
- Parameters:
inputStream - input stream to read
linebreaks - if true, the reader expects each word/vector pair to be on a separate line, terminated by a line break
normalize -
- Returns:
a Word2Vec model
- Throws:
NumberFormatException
IOException
FileNotFoundException
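The role of the linebreaks flag is easier to see against a concrete byte layout. The sketch below uses only the JDK and assumes the layout written by the original word2vec C tool: an ASCII header "<vocabSize> <dim>" ending in a newline, then for each word the word, a space, and dim little-endian 32-bit floats, optionally followed by a newline. BinaryVectorsSketch and its methods are hypothetical names, not part of WordVectorSerializer, and the library's own reader may differ in details.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of WordVectorSerializer.
class BinaryVectorsSketch {

    // Reads bytes up to (but not including) the delimiter and decodes them as UTF-8.
    private static String readToken(InputStream in, char delimiter) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != delimiter) {
            buf.write(b);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Assumed layout: "<vocabSize> <dim>\n", then per word "<word> " + dim little-endian floats (+ '\n').
    static Map<String, float[]> read(InputStream raw, boolean linebreaks) {
        try {
            DataInputStream in = new DataInputStream(raw);
            int vocabSize = Integer.parseInt(readToken(in, ' '));
            int dim = Integer.parseInt(readToken(in, '\n'));
            Map<String, float[]> vectors = new LinkedHashMap<>();
            byte[] chunk = new byte[dim * Float.BYTES];
            for (int i = 0; i < vocabSize; i++) {
                String word = readToken(in, ' ');
                in.readFully(chunk);
                ByteBuffer bb = ByteBuffer.wrap(chunk).order(ByteOrder.LITTLE_ENDIAN);
                float[] vector = new float[dim];
                for (int j = 0; j < dim; j++) {
                    vector[j] = bb.getFloat();
                }
                vectors.put(word, vector);
                if (linebreaks) {
                    in.read(); // consume the '\n' that terminates the entry
                }
            }
            return vectors;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a two-word, three-dimensional sample in the assumed layout.
    static byte[] sample(boolean linebreaks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] header = "2 3\n".getBytes(StandardCharsets.UTF_8);
        out.write(header, 0, header.length);
        writeEntry(out, "cat", new float[] {1f, 2f, 3f}, linebreaks);
        writeEntry(out, "dog", new float[] {-1f, 0.5f, 0f}, linebreaks);
        return out.toByteArray();
    }

    private static void writeEntry(ByteArrayOutputStream out, String word, float[] v, boolean linebreaks) {
        byte[] prefix = (word + " ").getBytes(StandardCharsets.UTF_8);
        out.write(prefix, 0, prefix.length);
        ByteBuffer bb = ByteBuffer.allocate(v.length * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (float f : v) {
            bb.putFloat(f);
        }
        out.write(bb.array(), 0, bb.capacity());
        if (linebreaks) {
            out.write('\n');
        }
    }

    public static void main(String[] args) {
        Map<String, float[]> vectors = read(new ByteArrayInputStream(sample(true)), true);
        System.out.println(vectors.keySet());
    }
}
```

If the flag does not match the file, the reader either swallows the first byte of the next word or leaves a stray newline in front of it, which is why the two variants are exposed separately.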
-
writeWordVectors
public static <T extends SequenceElement> void writeWordVectors(WeightLookupTable<T> lookupTable, String path) throws IOException
This method writes word vectors to the given path. Please note: this method does not load the whole vocab/lookup table into memory, so it can process large vocabularies served over a network.
- Type Parameters:
T -
- Parameters:
lookupTable -
path -
- Throws:
IOException
-
writeWordVectors
public static <T extends SequenceElement> void writeWordVectors(WeightLookupTable<T> lookupTable, File file) throws IOException
This method writes word vectors to the given file. Please note: this method does not load the whole vocab/lookup table into memory, so it can process large vocabularies served over a network.
- Type Parameters:
T -
- Parameters:
lookupTable -
file -
- Throws:
IOException
-
writeWordVectors
public static <T extends SequenceElement> void writeWordVectors(WeightLookupTable<T> lookupTable, OutputStream stream) throws IOException
This method writes word vectors to the given OutputStream. Please note: this method does not load the whole vocab/lookup table into memory, so it can process large vocabularies served over a network.
- Type Parameters:
T -
- Parameters:
lookupTable -
stream -
- Throws:
IOException
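The note about not loading the whole table into memory describes a streaming write: elements are pulled one at a time and written immediately. A minimal JDK-only sketch of that pattern follows. StreamingWriteSketch is a hypothetical name, and the one-line-per-word, space-separated text layout is an assumption for illustration rather than a statement of the library's exact output.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of WordVectorSerializer.
class StreamingWriteSketch {

    // Writes "<word> <v1> <v2> ..." per line, holding only one entry in memory at a time.
    static void write(Iterator<Map.Entry<String, float[]>> entries, Writer out) {
        try {
            while (entries.hasNext()) {
                Map.Entry<String, float[]> entry = entries.next();
                StringBuilder line = new StringBuilder(entry.getKey());
                for (float value : entry.getValue()) {
                    line.append(' ').append(value);
                }
                out.write(line.append('\n').toString());
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An in-memory map stands in for a remote or lazily loaded source.
        Map<String, float[]> source = new LinkedHashMap<>();
        source.put("cat", new float[] {1f, 2f});
        source.put("dog", new float[] {0.5f, -1f});
        StringWriter out = new StringWriter();
        write(source.entrySet().iterator(), out);
        System.out.print(out);
    }
}
```

Because the writer only depends on an Iterator, the source can be backed by a network call or a disk scan without changing the write loop.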
-
writeWordVectors
@Deprecated public static void writeWordVectors(@NonNull ParagraphVectors vectors, @NonNull File path)
Deprecated.
This method saves paragraph vectors to the given file.
-
writeWordVectors
@Deprecated public static void writeWordVectors(@NonNull ParagraphVectors vectors, @NonNull String path)
Deprecated.
This method saves paragraph vectors to the given path.
-
writeParagraphVectors
public static void writeParagraphVectors(ParagraphVectors vectors, File file)
This method saves ParagraphVectors model into compressed zip file.
- Parameters:
file -
-
writeParagraphVectors
public static void writeParagraphVectors(ParagraphVectors vectors, String path)
This method saves ParagraphVectors model into compressed zip file located at path.
- Parameters:
path -
-
-
writeWord2VecModel
public static void writeWord2VecModel(Word2Vec vectors, File file)
This method saves Word2Vec model into compressed zip file.
PLEASE NOTE: This method saves FULL model, including syn0 AND syn1.
-
writeWord2VecModel
public static void writeWord2VecModel(Word2Vec vectors, String path)
This method saves Word2Vec model into compressed zip file.
PLEASE NOTE: This method saves FULL model, including syn0 AND syn1.
-
writeWord2VecModel
public static void writeWord2VecModel(Word2Vec vectors, OutputStream stream) throws IOException
This method saves Word2Vec model into compressed zip file and sends it to output stream.
PLEASE NOTE: This method saves FULL model, including syn0 AND syn1.
- Throws:
IOException
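Writing a zip archive to an arbitrary OutputStream, as the stream overload describes, can be pictured with the JDK's own zip classes. The sketch below bundles several named text parts into one zip stream and reads them back. ZipModelSketch is a hypothetical name, and the entry names used in main ("syn0.txt", "syn1.txt") are illustrative assumptions, not a specification of the archive the library produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of WordVectorSerializer.
class ZipModelSketch {

    // Writes each map entry as one zip entry and leaves the target stream open.
    static void writeParts(Map<String, String> parts, OutputStream target) {
        try {
            ZipOutputStream zip = new ZipOutputStream(target);
            for (Map.Entry<String, String> part : parts.entrySet()) {
                zip.putNextEntry(new ZipEntry(part.getKey()));
                zip.write(part.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            zip.finish(); // writes the central directory without closing the target
            zip.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads every entry of a zip stream back into a name-to-content map.
    static Map<String, String> readParts(InputStream source) {
        Map<String, String> parts = new LinkedHashMap<>();
        try {
            ZipInputStream zip = new ZipInputStream(source);
            byte[] buffer = new byte[4096];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, n);
                }
                parts.put(entry.getName(), new String(content.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("syn0.txt", "cat 1.0 2.0\n"); // assumed entry name
        parts.put("syn1.txt", "0.1 0.2\n");     // assumed entry name
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeParts(parts, out);
        System.out.println(readParts(new ByteArrayInputStream(out.toByteArray())).keySet());
    }
}
```

Keeping syn0 and syn1 as separate entries is what would let a reader skip the parts it does not need, which matches the distinction this page draws between full models and weights-only loading.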
-
writeParagraphVectors
public static void writeParagraphVectors(ParagraphVectors vectors, OutputStream stream) throws IOException
This method saves ParagraphVectors model into compressed zip file and sends it to output stream.
- Throws:
IOException
-
readParagraphVectors
public static ParagraphVectors readParagraphVectors(String path) throws IOException
This method restores ParagraphVectors model previously saved with writeParagraphVectors().
- Returns:
the restored ParagraphVectors model
- Throws:
IOException
-
readParagraphVectors
public static ParagraphVectors readParagraphVectors(File file) throws IOException
This method restores ParagraphVectors model previously saved with writeParagraphVectors().
- Returns:
the restored ParagraphVectors model
- Throws:
IOException
-
readWord2Vec
@Deprecated public static Word2Vec readWord2Vec(File file) throws IOException
Deprecated.
This method restores Word2Vec model previously saved with writeWord2VecModel.
PLEASE NOTE: This method loads FULL model, so don't use it if you're only going to use weights.
- Parameters:
file -
- Returns:
the restored Word2Vec model
- Throws:
IOException
-
readParagraphVectors
public static ParagraphVectors readParagraphVectors(InputStream stream) throws IOException
This method restores ParagraphVectors model previously saved with writeParagraphVectors().
- Returns:
the restored ParagraphVectors model
- Throws:
IOException
-
readWord2VecFromText
public static Word2Vec readWord2VecFromText(@NonNull File vectors, @NonNull File hs, @NonNull File h_codes, @NonNull File h_points, @NonNull VectorsConfiguration configuration) throws IOException
This method allows you to read a Word2Vec model from externally originated vectors and syn1. So, technically, this method is compatible with any other w2v implementation.
- Parameters:
vectors - text file with words and their weights, aka Syn0
hs - text file with HS layers, aka Syn1
h_codes - text file with Huffman tree codes
h_points - text file with Huffman tree points
- Returns:
the Word2Vec model
- Throws:
IOException
-
readParagraphVectorsFromText
@Deprecated public static ParagraphVectors readParagraphVectorsFromText(@NonNull String path)
Deprecated.
Restores previously serialized ParagraphVectors model.
Deprecation note: Please consider using the readParagraphVectors() method instead.
- Parameters:
path - Path to file that contains previously serialized model
- Returns:
the restored ParagraphVectors model
-
readParagraphVectorsFromText
@Deprecated public static ParagraphVectors readParagraphVectorsFromText(@NonNull File file)
Deprecated.
Restores previously serialized ParagraphVectors model.
Deprecation note: Please consider using the readParagraphVectors() method instead.
- Parameters:
file - File that contains previously serialized model
- Returns:
the restored ParagraphVectors model
-
readParagraphVectorsFromText
@Deprecated public static ParagraphVectors readParagraphVectorsFromText(@NonNull InputStream stream)
Deprecated.
Restores previously serialized ParagraphVectors model.
Deprecation note: Please consider using the readParagraphVectors() method instead.
- Parameters:
stream - InputStream that contains previously serialized model
-
writeWordVectors
@Deprecated public static void writeWordVectors(ParagraphVectors vectors, OutputStream stream)
Deprecated.
This method saves paragraph vectors to the given output stream.
-
writeWordVectors
@Deprecated public static void writeWordVectors(InMemoryLookupTable lookupTable, InMemoryLookupCache cache, String path) throws IOException
Deprecated. Use writeWord2VecModel(Word2Vec, File) instead.
Writes the word vectors to the given path. Note that this assumes an in-memory cache.
- Parameters:
lookupTable -
cache -
path - the path to write
- Throws:
IOException
-
writeFullModel
@Deprecated public static void writeFullModel(@NonNull Word2Vec vec, @NonNull String path)
Deprecated. Use writeWord2VecModel() method instead.
Saves the full Word2Vec model in a way that allows model updates without rebuilding it from scratch.
- Parameters:
vec - the Word2Vec instance to be saved
path - the path where the JSON is saved
-
loadFullModel
@Deprecated public static Word2Vec loadFullModel(@NonNull String path) throws FileNotFoundException
Deprecated. Use readWord2VecModel() or loadStaticModel() method instead.
This method loads the full w2v model previously saved with a writeFullModel() call.
- Parameters:
path - path to previously stored w2v JSON model
- Returns:
Word2Vec instance
- Throws:
FileNotFoundException
-
writeWordVectors
@Deprecated public static void writeWordVectors(@NonNull Word2Vec vec, @NonNull String path) throws IOException
Deprecated.
Writes the word vectors to the given path. Note that this assumes an in-memory cache.
- Parameters:
vec - the word2vec to write
path - the path to write
- Throws:
IOException
-
writeWordVectors
@Deprecated public static void writeWordVectors(@NonNull Word2Vec vec, @NonNull File file) throws IOException
Deprecated.
Writes the word vectors to the given file. Note that this assumes an in-memory cache.
- Parameters:
vec - the word2vec to write
file - the file to write
- Throws:
IOException
-
writeWordVectors
@Deprecated public static void writeWordVectors(@NonNull Word2Vec vec, @NonNull OutputStream outputStream) throws IOException
Deprecated.
Writes the word vectors to the given OutputStream. Note that this assumes an in-memory cache.
- Parameters:
vec - the word2vec to write
outputStream - OutputStream that all data is written to
- Throws:
IOException
-
writeWordVectors
@Deprecated public static void writeWordVectors(@NonNull Word2Vec vec, @NonNull BufferedWriter writer) throws IOException
Deprecated.
Writes the word vectors to the given BufferedWriter. Note that this assumes an in-memory cache. The BufferedWriter can target a local file, an HDFS file, or any other Java-compatible destination.
- Parameters:
vec - the word2vec to write
writer - BufferedWriter that all data is written to
- Throws:
IOException
-
fromTableAndVocab
public static WordVectors fromTableAndVocab(WeightLookupTable table, VocabCache vocab)
Load word vectors for the given vocab and table.
- Parameters:
table - the weights to use
vocab - the vocab to use
- Returns:
word vectors based on the given parameters
-
fromPair
public static Word2Vec fromPair(org.nd4j.common.primitives.Pair<InMemoryLookupTable,VocabCache> pair)
Load word vectors from the given pair.
- Parameters:
pair - the given pair
- Returns:
a read-only word vectors implementation based on the given lookup table and vocab
-
loadTxtVectors
@Deprecated public static WordVectors loadTxtVectors(File vectorsFile) throws IOException
Deprecated.
Loads an in-memory cache from the given path (sets syn0 and the vocab).
Deprecation note: Please consider using the readWord2VecModel() or loadStaticModel() method instead.
- Parameters:
vectorsFile - the path of the file to load
- Returns:
the loaded word vectors
- Throws:
FileNotFoundException - if the file does not exist
IOException
-
loadTxt
public static org.nd4j.common.primitives.Pair<InMemoryLookupTable,VocabCache> loadTxt(@NonNull File file)
-
loadTxt
public static org.nd4j.common.primitives.Pair<InMemoryLookupTable,VocabCache> loadTxt(@NonNull InputStream inputStream)
Loads an in-memory cache from the given input stream (sets syn0 and the vocab).
- Parameters:
inputStream - input stream
- Returns:
a Pair holding the lookup table and the vocab cache
-
loadTxtVectors
@Deprecated public static WordVectors loadTxtVectors(@NonNull InputStream stream, boolean skipFirstLine) throws IOException
Deprecated. Use readWord2VecModel() or loadStaticModel() method instead.
This method can be used to load a previously saved model from an InputStream (such as an HDFS stream).
- Parameters:
stream - InputStream that contains previously serialized model
skipFirstLine - set this to TRUE if the first line contains a CSV header, FALSE otherwise
- Returns:
the loaded word vectors
- Throws:
IOException
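What skipFirstLine changes is easiest to see on a small text input. The following JDK-only sketch parses whitespace-separated lines of the form "word v1 v2 ..." and optionally drops the first line. TextVectorsSketch is a hypothetical name, and the line layout is the common word2vec text convention assumed for illustration; the library's parser may accept more variants.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of WordVectorSerializer.
class TextVectorsSketch {

    // Parses "<word> <v1> <v2> ..." lines; the first line is dropped when skipFirstLine is true.
    static Map<String, float[]> parse(InputStream stream, boolean skipFirstLine) {
        Map<String, float[]> vectors = new LinkedHashMap<>();
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                boolean isHeader = first && skipFirstLine;
                first = false;
                if (isHeader || line.trim().isEmpty()) {
                    continue;
                }
                String[] fields = line.trim().split("\\s+");
                float[] vector = new float[fields.length - 1];
                for (int i = 1; i < fields.length; i++) {
                    vector[i - 1] = Float.parseFloat(fields[i]);
                }
                vectors.put(fields[0], vector);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return vectors;
    }

    public static void main(String[] args) {
        byte[] text = "2 3\ncat 1.0 2.0 3.0\ndog 0.5 0 -1\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(new ByteArrayInputStream(text), true).keySet());
    }
}
```

With the flag set to false on the same input, the header line "2 3" would be read as a word named "2" with a one-element vector, which is the kind of silent corruption the flag exists to prevent.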
-
writeTsneFormat
public static void writeTsneFormat(Word2Vec vec, org.nd4j.linalg.api.ndarray.INDArray tsne, File csv) throws Exception
Write the tsne format.
- Parameters:
vec - the word vectors to use for labeling
tsne - the tsne array to write
csv - the file to use
- Throws:
Exception
-
writeSequenceVectors
public static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors, @NonNull SequenceElementFactory<T> factory, @NonNull String path) throws IOException
This method saves specified SequenceVectors model to target file path.
- Type Parameters:
T -
- Parameters:
vectors - SequenceVectors model
factory - SequenceElementFactory implementation for your objects
path - Target output file path
- Throws:
IOException
-
writeSequenceVectors
public static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors, @NonNull SequenceElementFactory<T> factory, @NonNull File file) throws IOException
This method saves specified SequenceVectors model to target file.
- Type Parameters:
T -
- Parameters:
vectors - SequenceVectors model
factory - SequenceElementFactory implementation for your objects
file - Target output file
- Throws:
IOException
-
writeSequenceVectors
public static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors, @NonNull SequenceElementFactory<T> factory, @NonNull OutputStream stream) throws IOException
This method saves specified SequenceVectors model to target OutputStream.
- Type Parameters:
T -
- Parameters:
vectors - SequenceVectors model
factory - SequenceElementFactory implementation for your objects
stream - Target output stream
- Throws:
IOException
-
writeSequenceVectors
public static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors, @NonNull OutputStream stream) throws IOException
This method saves specified SequenceVectors model to target OutputStream.
- Type Parameters:
T -
- Parameters:
vectors - SequenceVectors model
stream - Target output stream
- Throws:
IOException
-
readSequenceVectors
public static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull String path, boolean readExtendedTables) throws IOException
This method loads SequenceVectors from specified file path.
- Type Parameters:
T -
- Parameters:
path - String
readExtendedTables - boolean
- Throws:
IOException
-
readSequenceVectors
public static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull File file, boolean readExtendedTables) throws IOException
This method loads SequenceVectors from specified file.
- Type Parameters:
T -
- Parameters:
file - File
readExtendedTables - boolean
- Throws:
IOException
-
readSequenceVectors
public static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull InputStream stream, boolean readExtendedTables) throws IOException
This method loads SequenceVectors from specified input stream.
- Type Parameters:
T -
- Parameters:
stream - InputStream
readExtendedTables - boolean
- Throws:
IOException
-
readSequenceVectors
public static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull SequenceElementFactory<T> factory, @NonNull File file) throws IOException
This method loads previously saved SequenceVectors model from File.
- Type Parameters:
T -
- Parameters:
factory -
file -
- Returns:
the restored SequenceVectors model
- Throws:
IOException
-
readSequenceVectors
public static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull SequenceElementFactory<T> factory, @NonNull InputStream stream) throws IOException
This method loads previously saved SequenceVectors model from InputStream.
- Type Parameters:
T -
- Parameters:
factory -
stream -
- Returns:
the restored SequenceVectors model
- Throws:
IOException
-
writeVocabCache
public static void writeVocabCache(@NonNull VocabCache<VocabWord> vocabCache, @NonNull File file) throws IOException
This method saves vocab cache to provided File. Please note: it saves only vocab content, so it is mostly suitable for BagOfWords/TF-IDF vectorizers.
- Parameters:
vocabCache -
file -
- Throws:
UnsupportedEncodingException
IOException
-
writeVocabCache
public static void writeVocabCache(@NonNull VocabCache<VocabWord> vocabCache, @NonNull OutputStream stream) throws IOException
This method saves vocab cache to provided OutputStream. Please note: it saves only vocab content, so it is mostly suitable for BagOfWords/TF-IDF vectorizers.
- Parameters:
vocabCache -
stream -
- Throws:
UnsupportedEncodingException
IOException
-
readVocabCache
public static VocabCache<VocabWord> readVocabCache(@NonNull File file) throws IOException
This method reads vocab cache from provided file. Please note: it reads only vocab content, so it is mostly suitable for BagOfWords/TF-IDF vectorizers.
- Parameters:
file -
- Returns:
the vocab cache read from the file
- Throws:
IOException
-
readVocabCache
public static VocabCache<VocabWord> readVocabCache(@NonNull InputStream stream) throws IOException
This method reads vocab cache from provided InputStream. Please note: it reads only vocab content, so it is mostly suitable for BagOfWords/TF-IDF vectorizers.
- Parameters:
stream -
- Returns:
the vocab cache read from the stream
- Throws:
IOException
-
readWord2VecModel
public static Word2Vec readWord2VecModel(String path)
This method loads a Word2Vec model from a file in one of the following formats:
1) Binary model, either compressed or not, like the well-known Google model
2) Popular CSV word2vec text format
3) DL4j compressed format
Please note: Only weights will be loaded by this method.
- Parameters:
path -
- Returns:
the Word2Vec model
-
readWord2VecModel
public static Word2Vec readWord2VecModel(String path, boolean extendedModel)
This method loads a Word2Vec model from a file in one of the following formats:
1) Binary model, either compressed or not, like the well-known Google model
2) Popular CSV word2vec text format
3) DL4j compressed format
Please note: unless extendedModel is TRUE, only weights will be loaded by this method.
- Parameters:
path - path to model file
extendedModel - if TRUE, we'll try to load HS states & Huffman tree info; if FALSE, only weights will be loaded
- Returns:
the Word2Vec model
-
readWord2VecModel
public static Word2Vec readWord2VecModel(File file)
This method loads a Word2Vec model from a file in one of the following formats:
1) Binary model, either compressed or not, like the well-known Google model
2) Popular CSV word2vec text format
3) DL4j compressed format
Please note: Only weights will be loaded by this method.
- Parameters:
file -
- Returns:
the Word2Vec model
-
readWord2VecModel
public static Word2Vec readWord2VecModel(File file, boolean extendedModel)
This method loads a Word2Vec model from a file in one of the following formats:
1) Binary model, either compressed or not, like the well-known Google model
2) Popular CSV word2vec text format
3) DL4j compressed format
Please note: if extended data isn't available, only weights will be loaded instead.
- Parameters:
file - model file
extendedModel - if TRUE, we'll try to load HS states & Huffman tree info; if FALSE, only weights will be loaded
- Returns:
word2vec model
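Since one entry point accepts several on-disk formats, a caller may want to know which container a file uses before loading it. How the library itself distinguishes the formats is not stated on this page; the sketch below only shows one way a caller could pre-check a file by its leading magic bytes (zip archives start with "PK\3\4", gzip streams with 0x1F 0x8B). FormatSniffSketch and its Container enum are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of WordVectorSerializer.
class FormatSniffSketch {

    enum Container { ZIP, GZIP, PLAIN }

    // Classifies a file by its first bytes; PLAIN covers uncompressed binary and text models alike.
    static Container sniff(byte[] head) {
        if (head.length >= 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4) {
            return Container.ZIP;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B) {
            return Container.GZIP;
        }
        return Container.PLAIN;
    }

    // Reads up to four leading bytes of the file and classifies them.
    static Container sniff(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[4];
            int total = 0;
            int n;
            while (total < head.length && (n = in.read(head, total, head.length - total)) != -1) {
                total += n;
            }
            byte[] exact = new byte[total];
            System.arraycopy(head, 0, exact, 0, total);
            return sniff(exact);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println(sniff(Paths.get(args[0])));
        }
    }
}
```

A PLAIN result still leaves binary versus CSV text undecided; telling those apart needs a look at the content itself, for example by attempting to parse the header line.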
-
readAsBinaryNoLineBreaks
public static Word2Vec readAsBinaryNoLineBreaks(@NonNull File file)
-
readAsBinaryNoLineBreaks
public static Word2Vec readAsBinaryNoLineBreaks(@NonNull InputStream inputStream)
-
readAsBinary
public static Word2Vec readAsBinary(@NonNull InputStream inputStream)
This method loads Word2Vec model from binary input stream.
- Parameters:
inputStream - binary input stream
- Returns:
Word2Vec
-
readAsCsv
public static Word2Vec readAsCsv(@NonNull InputStream inputStream)
This method loads Word2Vec model from csv file.
- Parameters:
inputStream - input stream
- Returns:
Word2Vec model
-
getTokenizerFactory
protected static TokenizerFactory getTokenizerFactory(VectorsConfiguration configuration)
-
loadStaticModel
public static WordVectors loadStaticModel(InputStream inputStream) throws IOException
This method restores previously saved w2v model. The file can be in one of the following formats:
1) Binary model, either compressed or not, like the well-known Google model
2) Popular CSV word2vec text format
3) DL4j compressed format
In return you get a StaticWord2Vec model, which may be used as a lookup table only, in a multi-GPU environment.
- Parameters:
inputStream - InputStream should point to previously saved w2v model
- Returns:
the StaticWord2Vec model
- Throws:
IOException
-
loadStaticModel
public static WordVectors loadStaticModel(@NonNull File file)
This method restores previously saved w2v model. The file can be in one of the following formats:
1) Binary model, either compressed or not, like the well-known Google model
2) Popular CSV word2vec text format
3) DL4j compressed format
In return you get a StaticWord2Vec model, which may be used as a lookup table only, in a multi-GPU environment.
- Parameters:
file - File
- Returns:
the StaticWord2Vec model
-
writeWord2Vec
public static void writeWord2Vec(@NonNull Word2Vec word2Vec, @NonNull OutputStream stream) throws IOException
This method saves Word2Vec model to output stream.
- Parameters:
word2Vec - Word2Vec
stream - OutputStream
- Throws:
IOException
-
readWord2Vec
public static Word2Vec readWord2Vec(@NonNull String path, boolean readExtendedTables)
This method restores Word2Vec model from file.
- Parameters:
path -
readExtendedTables -
- Returns:
Word2Vec
-
writeLookupTable
public static <T extends SequenceElement> void writeLookupTable(WeightLookupTable<T> weightLookupTable, @NonNull File file) throws IOException
This method saves table of weights to file.
- Parameters:
weightLookupTable - WeightLookupTable
file - File
- Throws:
IOException
-
readLookupTable
public static <T extends SequenceElement> WeightLookupTable<T> readLookupTable(File file) throws IOException
- Throws:
IOException
-
readLookupTable
public static <T extends SequenceElement> WeightLookupTable<T> readLookupTable(InputStream stream) throws IOException
- Throws:
IOException
-
readWord2Vec
public static Word2Vec readWord2Vec(@NonNull File file, boolean readExtendedTables)
This method loads Word2Vec model from file.
- Parameters:
file - File
readExtendedTables - boolean
- Returns:
Word2Vec
-
readWord2Vec
public static Word2Vec readWord2Vec(@NonNull InputStream stream, boolean readExtendedTable) throws IOException
This method loads Word2Vec model from input stream.
- Parameters:
stream - InputStream
readExtendedTable - boolean
- Returns:
Word2Vec
- Throws:
IOException
-
writeWordVectors
public static void writeWordVectors(@NonNull FastText vectors, @NonNull File path) throws IOException
This method saves FastText model to file.
- Parameters:
vectors - FastText
path - File
- Throws:
IOException
-
readWordVectors
public static FastText readWordVectors(File path)
This method loads FastText model from file.
- Parameters:
path - File
-
printOutProjectedMemoryUse
public static void printOutProjectedMemoryUse(long numWords, int vectorLength, int numTables)
This method prints memory usage to log.
- Parameters:
numWords -
vectorLength -
numTables -
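The three parameters suggest a simple product: words times vector length times number of tables, scaled by the size of one element. The sketch below computes that estimate with the JDK alone. MemoryEstimateSketch is a hypothetical name, and both the formula and the bytesPerElement argument (4 for 32-bit floats) are assumptions about what such a projection involves, not the library's documented calculation.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of WordVectorSerializer.
class MemoryEstimateSketch {

    // Assumed formula: numWords * vectorLength * numTables * bytesPerElement, with overflow checks.
    static long projectedBytes(long numWords, int vectorLength, int numTables, int bytesPerElement) {
        long elements = Math.multiplyExact(Math.multiplyExact(numWords, (long) vectorLength), (long) numTables);
        return Math.multiplyExact(elements, (long) bytesPerElement);
    }

    // Formats a byte count in GiB with two decimals.
    static String inGiB(long bytes) {
        return String.format(Locale.ROOT, "%.2f GiB", bytes / (1024.0 * 1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        // Example: 3 million words, 300 dimensions, one table, 32-bit floats.
        long bytes = projectedBytes(3_000_000L, 300, 1, 4);
        System.out.println(bytes + " bytes, about " + inGiB(bytes));
    }
}
```

Loading extended tables (for example syn1 alongside syn0) would raise numTables and scale the estimate accordingly, which is one reason to load weights only when the rest is not needed.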