Class WordVectorSerializer


  • public class WordVectorSerializer
    extends Object
    • Method Detail

      • writeWordVectors

        public static <T extends SequenceElement> void writeWordVectors(WeightLookupTable<T> lookupTable,
                                                                        String path)
                                                                 throws IOException
        This method writes word vectors to the given path. Please note: this method doesn't load the whole vocab/lookupTable into memory, so it can process large vocabularies served over a network.
        Type Parameters:
        T - the sequence element type
        Parameters:
        lookupTable - the lookup table whose vectors are written
        path - the target file path
        Throws:
        IOException
      • writeWordVectors

        public static <T extends SequenceElement> void writeWordVectors(WeightLookupTable<T> lookupTable,
                                                                        File file)
                                                                 throws IOException
        This method writes word vectors to the given file. Please note: this method doesn't load the whole vocab/lookupTable into memory, so it can process large vocabularies served over a network.
        Type Parameters:
        T - the sequence element type
        Parameters:
        lookupTable - the lookup table whose vectors are written
        file - the target file
        Throws:
        IOException
      • writeWordVectors

        public static <T extends SequenceElement> void writeWordVectors(WeightLookupTable<T> lookupTable,
                                                                        OutputStream stream)
                                                                 throws IOException
        This method writes word vectors to the given OutputStream. Please note: this method doesn't load the whole vocab/lookupTable into memory, so it can process large vocabularies served over a network.
        Type Parameters:
        T - the sequence element type
        Parameters:
        lookupTable - the lookup table whose vectors are written
        stream - the target output stream
        Throws:
        IOException
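The streaming behavior described above can be illustrated with a small, library-independent sketch. It assumes the common word2vec text layout (one line per word: the word followed by its vector components, separated by spaces) and pulls entries from an Iterator, so only one entry needs to be held at a time. StreamingVectorWriter is hypothetical and not part of DL4J; the exact layout written by writeWordVectors() may differ (for example, it may include a header line).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch, not DL4J code: assumes a "word v1 v2 ... vn" line layout.
class StreamingVectorWriter {
    // Writes entries one at a time, so the iterator can be backed by a lazily
    // loaded source instead of a fully materialized table.
    static void write(Iterator<Map.Entry<String, float[]>> entries, OutputStream out) throws IOException {
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        while (entries.hasNext()) {
            Map.Entry<String, float[]> entry = entries.next();
            StringBuilder line = new StringBuilder(entry.getKey());
            for (float v : entry.getValue()) {
                line.append(' ').append(v);
            }
            writer.write(line.append('\n').toString());
        }
        // Flush but do not close: the OutputStream belongs to the caller.
        writer.flush();
    }
}
```

Flushing instead of closing mirrors the OutputStream overloads above, where the caller owns the stream and may still need it afterwards.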
      • writeParagraphVectors

        public static void writeParagraphVectors(ParagraphVectors vectors,
                                                 File file)
        This method saves a ParagraphVectors model into a compressed zip file.
        Parameters:
        vectors - the ParagraphVectors model to save
        file - the target file
      • writeParagraphVectors

        public static void writeParagraphVectors(ParagraphVectors vectors,
                                                 String path)
        This method saves a ParagraphVectors model into a compressed zip file located at path.
        Parameters:
        vectors - the ParagraphVectors model to save
        path - the target file path
      • writeWord2VecModel

        public static void writeWord2VecModel(Word2Vec vectors,
                                              File file)
        This method saves a Word2Vec model into a compressed zip file. PLEASE NOTE: This method saves the FULL model, including syn0 AND syn1.
      • writeWord2VecModel

        public static void writeWord2VecModel(Word2Vec vectors,
                                              String path)
        This method saves a Word2Vec model into a compressed zip file. PLEASE NOTE: This method saves the FULL model, including syn0 AND syn1.
      • writeWord2VecModel

        public static void writeWord2VecModel(Word2Vec vectors,
                                              OutputStream stream)
                                       throws IOException
        This method saves a Word2Vec model into a compressed zip file and sends it to the output stream. PLEASE NOTE: This method saves the FULL model, including syn0 AND syn1.
        Throws:
        IOException
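A zip archive can hold several model parts as separate compressed entries, which is presumably how a single file can carry both syn0 and syn1. The sketch below uses only java.util.zip; ZipModelSketch is hypothetical, and the entry names used when exercising it (syn0.txt, syn1.txt) are illustrative assumptions, not DL4J's actual archive layout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch, not DL4J code: one compressed zip entry per model part.
class ZipModelSketch {
    // Writes each (name, payload) pair as a separate DEFLATE-compressed entry.
    static void writeEntries(Map<String, byte[]> parts, OutputStream out) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(out);
        for (Map.Entry<String, byte[]> part : parts.entrySet()) {
            zip.putNextEntry(new ZipEntry(part.getKey()));
            zip.write(part.getValue());
            zip.closeEntry();
        }
        // finish() writes the central directory but leaves 'out' open for the caller.
        zip.finish();
    }

    // Lists entry names, e.g. to check which parts a saved archive contains.
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            names.add(entry.getName());
        }
        return names;
    }
}
```

Calling finish() rather than close() completes the archive without closing a stream that belongs to the caller.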
      • writeParagraphVectors

        public static void writeParagraphVectors(ParagraphVectors vectors,
                                                 OutputStream stream)
                                          throws IOException
        This method saves a ParagraphVectors model into a compressed zip file and sends it to the output stream.
        Throws:
        IOException
      • readParagraphVectors

        public static ParagraphVectors readParagraphVectors(String path)
                                                     throws IOException
        This method restores a ParagraphVectors model previously saved with writeParagraphVectors().
        Parameters:
        path - path to the saved model
        Returns:
        the restored ParagraphVectors model
        Throws:
        IOException
      • readParagraphVectors

        public static ParagraphVectors readParagraphVectors(File file)
                                                     throws IOException
        This method restores a ParagraphVectors model previously saved with writeParagraphVectors().
        Parameters:
        file - the saved model file
        Returns:
        the restored ParagraphVectors model
        Throws:
        IOException
      • readWord2Vec

        @Deprecated
        public static Word2Vec readWord2Vec(File file)
                                     throws IOException
        This method restores a Word2Vec model previously saved with writeWord2VecModel().

        PLEASE NOTE: This method loads the FULL model, so don't use it if you only need the weights.

        Parameters:
        file - the model file
        Returns:
        the restored Word2Vec model
        Throws:
        IOException
      • readWord2VecFromText

        public static Word2Vec readWord2VecFromText(@NonNull File vectors,
                                                    @NonNull File hs,
                                                    @NonNull File h_codes,
                                                    @NonNull File h_points,
                                                    @NonNull VectorsConfiguration configuration)
                                             throws IOException
        This method allows you to read a Word2Vec model from externally originated vectors and syn1, so technically it is compatible with any other w2v implementation.
        Parameters:
        vectors - text file with words and their weights, aka syn0
        hs - text file with HS layers, aka syn1
        h_codes - text file with Huffman tree codes
        h_points - text file with Huffman tree points
        configuration - the VectorsConfiguration for the restored model
        Returns:
        the restored Word2Vec model
        Throws:
        IOException
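The h_codes and h_points files describe a Huffman tree over the vocabulary. As background, the sketch below builds such a tree from word frequencies in the style of the original word2vec C tool: codes[w] holds the branch bits from the root down to word w, and points[w] holds the ids of the inner nodes visited on the way (the root gets id n - 2). HuffmanSketch is hypothetical; DL4J's tie-breaking, bit assignment, and the text layout of the h_codes/h_points files may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of word2vec-style Huffman codes/points, not DL4J code.
class HuffmanSketch {
    final int[][] codes;   // codes[w]: branch bits from the root down to word w
    final int[][] points;  // points[w]: inner-node ids from the root down to w's parent

    HuffmanSketch(long[] counts) {
        int n = counts.length;
        if (n < 2) throw new IllegalArgumentException("need at least two words");
        long[] weight = new long[2 * n - 1];   // leaves 0..n-1, inner nodes n..2n-2
        int[] parent = new int[2 * n - 1];
        int[] bit = new int[2 * n - 1];
        System.arraycopy(counts, 0, weight, 0, n);
        // Lowest weight first; ties broken by node index for determinism.
        PriorityQueue<Integer> queue = new PriorityQueue<>((a, b) ->
                weight[a] != weight[b] ? Long.compare(weight[a], weight[b]) : Integer.compare(a, b));
        for (int i = 0; i < n; i++) queue.add(i);
        for (int next = n; next < 2 * n - 1; next++) {
            int left = queue.poll();
            int right = queue.poll();
            weight[next] = weight[left] + weight[right];
            parent[left] = next;
            bit[left] = 0;
            parent[right] = next;
            bit[right] = 1;
            queue.add(next);
        }
        int root = 2 * n - 2;
        codes = new int[n][];
        points = new int[n][];
        for (int w = 0; w < n; w++) {
            List<Integer> codeUp = new ArrayList<>();
            List<Integer> pointUp = new ArrayList<>();
            // Walk from the leaf up to the root, then reverse to get root-first order.
            for (int node = w; node != root; node = parent[node]) {
                codeUp.add(bit[node]);
                pointUp.add(parent[node] - n); // inner nodes renumbered from 0
            }
            Collections.reverse(codeUp);
            Collections.reverse(pointUp);
            codes[w] = codeUp.stream().mapToInt(Integer::intValue).toArray();
            points[w] = pointUp.stream().mapToInt(Integer::intValue).toArray();
        }
    }
}
```

Frequent words end up with short codes, which is what makes hierarchical softmax cheaper than a full softmax over the vocabulary.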
      • readParagraphVectorsFromText

        @Deprecated
        public static ParagraphVectors readParagraphVectorsFromText(@NonNull String path)
        Restores a previously serialized ParagraphVectors model.

        Deprecation note: Please consider using the readParagraphVectors() method instead.

        Parameters:
        path - path to the file that contains the previously serialized model
        Returns:
        the restored ParagraphVectors model
      • readParagraphVectorsFromText

        @Deprecated
        public static ParagraphVectors readParagraphVectorsFromText(@NonNull File file)
        Restores a previously serialized ParagraphVectors model.

        Deprecation note: Please consider using the readParagraphVectors() method instead.

        Parameters:
        file - File that contains the previously serialized model
        Returns:
        the restored ParagraphVectors model
      • readParagraphVectorsFromText

        @Deprecated
        public static ParagraphVectors readParagraphVectorsFromText(@NonNull InputStream stream)
        Restores a previously serialized ParagraphVectors model.

        Deprecation note: Please consider using the readParagraphVectors() method instead.

        Parameters:
        stream - InputStream that contains the previously serialized model
      • writeFullModel

        @Deprecated
        public static void writeFullModel(@NonNull Word2Vec vec,
                                          @NonNull String path)
        Deprecated.
        Use the writeWord2VecModel() method instead.
        Saves the full Word2Vec model in a way that allows the model to be updated without being rebuilt from scratch.

        Deprecation note: Please consider using the writeWord2VecModel() method instead.

        Parameters:
        vec - the Word2Vec instance to be saved
        path - the path where the JSON is saved
      • loadFullModel

        @Deprecated
        public static Word2Vec loadFullModel(@NonNull String path)
                                      throws FileNotFoundException
        Deprecated.
        Use the readWord2VecModel() or loadStaticModel() method instead.
        This method loads a full w2v model previously saved with a writeFullModel() call.

        Deprecation note: Please consider using the readWord2VecModel() or loadStaticModel() method instead.

        Parameters:
        path - path to the previously stored w2v JSON model
        Returns:
        the Word2Vec instance
        Throws:
        FileNotFoundException
      • writeWordVectors

        @Deprecated
        public static void writeWordVectors(@NonNull Word2Vec vec,
                                            @NonNull OutputStream outputStream)
                                     throws IOException
        Writes the word vectors to the given OutputStream. Note that this assumes an in-memory cache.
        Parameters:
        vec - the word2vec model to write
        outputStream - the OutputStream to which all data is written
        Throws:
        IOException
      • writeWordVectors

        @Deprecated
        public static void writeWordVectors(@NonNull Word2Vec vec,
                                            @NonNull BufferedWriter writer)
                                     throws IOException
        Writes the word vectors to the given BufferedWriter. Note that this assumes an in-memory cache. The BufferedWriter can write to a local file, an HDFS file, or any other Java-compatible target.
        Parameters:
        vec - the word2vec model to write
        writer - the BufferedWriter to which all data is written
        Throws:
        IOException
      • fromTableAndVocab

        public static WordVectors fromTableAndVocab(WeightLookupTable table,
                                                    VocabCache vocab)
        Load word vectors for the given vocab and table
        Parameters:
        table - the weights to use
        vocab - the vocab to use
        Returns:
        word vectors based on the given parameters
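Conceptually, combining a vocab with a weight table means mapping each word to a row index and reading that row. The plain-Java sketch below shows that mapping, with a List<String> standing in for VocabCache and a float[][] for the lookup table; TableAndVocabSketch is hypothetical and does not reproduce fromTableAndVocab() itself.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: List<String> stands in for the vocab, float[][] for the lookup table.
class TableAndVocabSketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final float[][] table; // row i holds the vector of vocab word i

    TableAndVocabSketch(List<String> vocab, float[][] table) {
        if (vocab.size() != table.length) {
            throw new IllegalArgumentException("vocab size and table rows differ");
        }
        for (int i = 0; i < vocab.size(); i++) {
            indexOf.put(vocab.get(i), i);
        }
        this.table = table;
    }

    boolean hasWord(String word) {
        return indexOf.containsKey(word);
    }

    // Returns the row for the given word, or null for out-of-vocabulary words.
    float[] vector(String word) {
        Integer row = indexOf.get(word);
        return row == null ? null : table[row];
    }
}
```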
      • fromPair

        public static Word2Vec fromPair(org.nd4j.common.primitives.Pair<InMemoryLookupTable, VocabCache> pair)
        Load word vectors from the given pair
        Parameters:
        pair - the pair holding the lookup table and the vocab
        Returns:
        a read-only word vectors implementation based on the given lookup table and vocab
      • loadTxtVectors

        @Deprecated
        public static WordVectors loadTxtVectors(File vectorsFile)
                                          throws IOException
        Deprecated.
        Loads an in-memory cache from the given path (sets syn0 and the vocab).

        Deprecation note: Please consider using the readWord2VecModel() or loadStaticModel() method instead.

        Parameters:
        vectorsFile - the path of the file to load
        Returns:
        the loaded word vectors
        Throws:
        FileNotFoundException - if the file does not exist
        IOException
      • loadTxt

        public static org.nd4j.common.primitives.Pair<InMemoryLookupTable, VocabCache> loadTxt(@NonNull InputStream inputStream)
        Loads an in-memory cache from the given input stream (sets syn0 and the vocab).
        Parameters:
        inputStream - input stream
        Returns:
        a Pair holding the lookup table and the vocab cache.
      • loadTxtVectors

        @Deprecated
        public static WordVectors loadTxtVectors(@NonNull InputStream stream,
                                                 boolean skipFirstLine)
                                          throws IOException
        Deprecated.
        Use the readWord2VecModel() or loadStaticModel() method instead.
        This method can be used to load a previously saved model from an InputStream (such as an HDFS stream).

        Deprecation note: Please consider using the readWord2VecModel() or loadStaticModel() method instead.

        Parameters:
        stream - InputStream that contains the previously serialized model
        skipFirstLine - set to TRUE if the first line contains a CSV header, FALSE otherwise
        Returns:
        the loaded word vectors
        Throws:
        IOException
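To illustrate what skipFirstLine controls, here is a minimal parser for a whitespace-separated text model. It assumes each line has the form `word v1 v2 ... vn` and that an optional first line holds a header such as `numWords vectorLength`. TextVectorReader is a hypothetical helper, not part of DL4J; for real models, prefer readWord2VecModel() or loadStaticModel(), as the deprecation note advises.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: parses "word v1 v2 ... vn" lines into an ordered map.
class TextVectorReader {
    static Map<String, float[]> read(InputStream in, boolean skipFirstLine) throws IOException {
        Map<String, float[]> vectors = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        boolean skip = skipFirstLine;
        String line;
        while ((line = reader.readLine()) != null) {
            if (skip) {          // drop the header line exactly once
                skip = false;
                continue;
            }
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            String[] parts = trimmed.split("\\s+");
            float[] vec = new float[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                vec[i - 1] = Float.parseFloat(parts[i]);
            }
            vectors.put(parts[0], vec);
        }
        return vectors;
    }
}
```

Passing FALSE for a file that does have a header would typically make the first "word" a number followed by a one-element vector, which is an easy symptom to check for.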
      • writeTsneFormat

        public static void writeTsneFormat(Word2Vec vec,
                                           org.nd4j.linalg.api.ndarray.INDArray tsne,
                                           File csv)
                                    throws Exception
        Writes the t-SNE array in CSV format, labeled with words from the given word vectors.
        Parameters:
        vec - the word vectors to use for labeling
        tsne - the tsne array to write
        csv - the file to use
        Throws:
        Exception
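One possible CSV layout for t-SNE output is one row per word, with the projected coordinates first and the word label last. The sketch below assumes that column order and uses plain arrays in place of Word2Vec and INDArray; TsneCsvSketch is hypothetical and is not guaranteed to match the exact output of writeTsneFormat().

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical sketch: assumes "x1,x2,...,word" rows, one per word.
class TsneCsvSketch {
    static void write(double[][] coords, List<String> words, Writer out) throws IOException {
        if (coords.length != words.size()) {
            throw new IllegalArgumentException("row count mismatch");
        }
        for (int i = 0; i < coords.length; i++) {
            StringBuilder row = new StringBuilder();
            for (double c : coords[i]) {
                row.append(c).append(',');
            }
            // The label goes last so rows with any number of dimensions parse the same way.
            row.append(words.get(i)).append('\n');
            out.write(row.toString());
        }
        out.flush();
    }
}
```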
      • writeSequenceVectors

        public static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors,
                                                                            @NonNull SequenceElementFactory<T> factory,
                                                                            @NonNull String path)
                                                                     throws IOException
        This method saves the specified SequenceVectors model to the target file path.
        Type Parameters:
        T - the sequence element type
        Parameters:
        vectors - SequenceVectors model
        factory - SequenceElementFactory implementation for your objects
        path - Target output file path
        Throws:
        IOException
      • writeSequenceVectors

        public static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors,
                                                                            @NonNull SequenceElementFactory<T> factory,
                                                                            @NonNull File file)
                                                                     throws IOException
        This method saves the specified SequenceVectors model to the target file.
        Type Parameters:
        T - the sequence element type
        Parameters:
        vectors - SequenceVectors model
        factory - SequenceElementFactory implementation for your objects
        file - Target output file
        Throws:
        IOException
      • writeSequenceVectors

        public static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors,
                                                                            @NonNull SequenceElementFactory<T> factory,
                                                                            @NonNull OutputStream stream)
                                                                     throws IOException
        This method saves the specified SequenceVectors model to the target OutputStream.
        Type Parameters:
        T - the sequence element type
        Parameters:
        vectors - SequenceVectors model
        factory - SequenceElementFactory implementation for your objects
        stream - Target output stream
        Throws:
        IOException
      • writeSequenceVectors

        public static <T extends SequenceElement> void writeSequenceVectors(@NonNull SequenceVectors<T> vectors,
                                                                            @NonNull OutputStream stream)
                                                                     throws IOException
        This method saves the specified SequenceVectors model to the target OutputStream.
        Type Parameters:
        T - the sequence element type
        Parameters:
        vectors - SequenceVectors model
        stream - Target output stream
        Throws:
        IOException
      • readSequenceVectors

        public static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull String path,
                                                                                         boolean readExtendedTables)
                                                                                  throws IOException
        This method loads SequenceVectors from the specified file path.
        Type Parameters:
        T - the sequence element type
        Parameters:
        path - the file path to load from
        readExtendedTables - whether to also read the extended tables
        Throws:
        IOException
      • readSequenceVectors

        public static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull File file,
                                                                                         boolean readExtendedTables)
                                                                                  throws IOException
        This method loads SequenceVectors from the specified file.
        Type Parameters:
        T - the sequence element type
        Parameters:
        file - the file to load from
        readExtendedTables - whether to also read the extended tables
        Throws:
        IOException
      • readSequenceVectors

        public static <T extends SequenceElement> SequenceVectors<T> readSequenceVectors(@NonNull InputStream stream,
                                                                                         boolean readExtendedTables)
                                                                                  throws IOException
        This method loads SequenceVectors from the specified input stream.
        Type Parameters:
        T - the sequence element type
        Parameters:
        stream - the InputStream to load from
        readExtendedTables - whether to also read the extended tables
        Throws:
        IOException
      • writeVocabCache

        public static void writeVocabCache(@NonNull VocabCache<VocabWord> vocabCache,
                                           @NonNull File file)
                                    throws IOException
        This method saves a vocab cache to the provided File. Please note: it saves only the vocab content, so it's suitable mostly for BagOfWords/TF-IDF vectorizers.
        Parameters:
        vocabCache - the vocab cache to save
        file - the target file
        Throws:
        UnsupportedEncodingException
        IOException
      • writeVocabCache

        public static void writeVocabCache(@NonNull VocabCache<VocabWord> vocabCache,
                                           @NonNull OutputStream stream)
                                    throws IOException
        This method saves a vocab cache to the provided OutputStream. Please note: it saves only the vocab content, so it's suitable mostly for BagOfWords/TF-IDF vectorizers.
        Parameters:
        vocabCache - the vocab cache to save
        stream - the target output stream
        Throws:
        UnsupportedEncodingException
        IOException
      • readVocabCache

        public static VocabCache<VocabWord> readVocabCache(@NonNull File file)
                                                    throws IOException
        This method reads a vocab cache from the provided file. Please note: it reads only the vocab content, so it's suitable mostly for BagOfWords/TF-IDF vectorizers.
        Parameters:
        file - the file to read from
        Returns:
        the restored vocab cache
        Throws:
        IOException
      • readVocabCache

        public static VocabCache<VocabWord> readVocabCache(@NonNull InputStream stream)
                                                    throws IOException
        This method reads a vocab cache from the provided InputStream. Please note: it reads only the vocab content, so it's suitable mostly for BagOfWords/TF-IDF vectorizers.
        Parameters:
        stream - the input stream to read from
        Returns:
        the restored vocab cache
        Throws:
        IOException
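Since only vocab content (words and their counts) is stored, a vocab round trip can be sketched with a simple line-based format. The format below (a Base64-encoded word, a tab, then the count) is a hypothetical choice that lets words containing spaces, tabs, or non-ASCII characters survive the round trip; VocabSketch is not part of DL4J, and this is not the format writeVocabCache() produces.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: "base64(word)<TAB>count" per line, words only, no vectors.
class VocabSketch {
    static void write(Map<String, Long> counts, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            String encoded = Base64.getEncoder().encodeToString(e.getKey().getBytes(StandardCharsets.UTF_8));
            w.write(encoded + "\t" + e.getValue() + "\n");
        }
        w.flush(); // leave the caller's stream open
    }

    static Map<String, Long> read(InputStream in) throws IOException {
        Map<String, Long> counts = new LinkedHashMap<>();
        BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = r.readLine()) != null) {
            if (line.isEmpty()) continue;
            int tab = line.indexOf('\t');
            if (tab < 0) throw new IOException("malformed vocab line: " + line);
            String word = new String(Base64.getDecoder().decode(line.substring(0, tab)), StandardCharsets.UTF_8);
            counts.put(word, Long.parseLong(line.substring(tab + 1)));
        }
        return counts;
    }
}
```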
      • readWord2VecModel

        public static Word2Vec readWord2VecModel(String path)
        This method restores a Word2Vec model stored in one of the following formats: 1) binary model, either compressed or not, like the well-known Google model; 2) the popular CSV word2vec text format; 3) the DL4J compressed format.

        Please note: only weights will be loaded by this method.

        Parameters:
        path - path to the model file
        Returns:
        the restored Word2Vec model
      • readWord2VecModel

        public static Word2Vec readWord2VecModel(String path,
                                                 boolean extendedModel)
        This method restores a Word2Vec model stored in one of the following formats: 1) binary model, either compressed or not, like the well-known Google model; 2) the popular CSV word2vec text format; 3) the DL4J compressed format.

        Please note: if extendedModel is FALSE, or extended data isn't available, only weights will be loaded.

        Parameters:
        path - path to the model file
        extendedModel - if TRUE, we'll try to load HS states & Huffman tree info; if FALSE, only weights will be loaded
        Returns:
        the restored Word2Vec model
      • readWord2VecModel

        public static Word2Vec readWord2VecModel(File file)
        This method restores a Word2Vec model stored in one of the following formats: 1) binary model, either compressed or not, like the well-known Google model; 2) the popular CSV word2vec text format; 3) the DL4J compressed format.

        Please note: only weights will be loaded by this method.

        Parameters:
        file - the model file
        Returns:
        the restored Word2Vec model
      • readWord2VecModel

        public static Word2Vec readWord2VecModel(File file,
                                                 boolean extendedModel)
        This method restores a Word2Vec model stored in one of the following formats: 1) binary model, either compressed or not, like the well-known Google model; 2) the popular CSV word2vec text format; 3) the DL4J compressed format.

        Please note: if extended data isn't available, only weights will be loaded instead.

        Parameters:
        file - model file
        extendedModel - if TRUE, we'll try to load HS states & Huffman tree info, if FALSE, only weights will be loaded
        Returns:
        word2vec model
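Because readWord2VecModel() accepts several formats, a loader has to guess the format from the file contents. One plausible approach, sketched below, inspects the leading bytes: zip archives (the DL4J compressed format is described above as a zip file) start with `PK`, gzip streams start with 0x1F 0x8B, and binary models contain control bytes that rarely appear in text. This heuristic and the ModelFormatSniffer class are assumptions for illustration, not DL4J's detection logic; a gzip stream would need to be decompressed and sniffed again.

```java
// Hypothetical sketch: classifies a model by its first bytes (heuristic only).
class ModelFormatSniffer {
    enum Format { ZIP, GZIP, BINARY, TEXT }

    static Format detect(byte[] head) {
        if (head.length >= 2 && head[0] == 'P' && head[1] == 'K') return Format.ZIP;
        if (head.length >= 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B) return Format.GZIP;
        for (byte b : head) {
            int u = b & 0xFF;
            // Control bytes other than tab/CR/LF almost never occur in text models,
            // but raw float bytes often contain them.
            if (u < 0x20 && u != '\t' && u != '\r' && u != '\n') return Format.BINARY;
        }
        return Format.TEXT;
    }
}
```

A binary model can in principle start with a long run of printable bytes, so a real loader would want a fallback when parsing as text fails.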
      • readAsBinaryNoLineBreaks

        public static Word2Vec readAsBinaryNoLineBreaks(@NonNull File file)
      • readAsBinaryNoLineBreaks

        public static Word2Vec readAsBinaryNoLineBreaks(@NonNull InputStream inputStream)
      • readAsBinary

        public static Word2Vec readAsBinary(@NonNull File file)
      • readAsBinary

        public static Word2Vec readAsBinary(@NonNull InputStream inputStream)
        This method loads a Word2Vec model from a binary input stream.
        Parameters:
        inputStream - binary input stream
        Returns:
        Word2Vec
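The well-known Google binary format starts with a text header `numWords vectorLength`, followed for each word by the word, a space, and vectorLength 4-byte floats (little-endian on typical x86 builds of the original tool), usually followed by a newline. The sketch below reads that layout and tolerates a missing newline between records, which is presumably the case readAsBinaryNoLineBreaks() covers. GoogleBinaryReader is hypothetical and loads everything into a Map, so it only suits small models.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a Google-style binary word2vec reader, not DL4J code.
class GoogleBinaryReader {
    static Map<String, float[]> read(InputStream raw) throws IOException {
        InputStream in = new BufferedInputStream(raw);
        String[] dims = readToken(in, '\n').trim().split("\\s+"); // "numWords vectorLength"
        int numWords = Integer.parseInt(dims[0]);
        int dim = Integer.parseInt(dims[1]);
        Map<String, float[]> vectors = new LinkedHashMap<>();
        byte[] buf = new byte[4 * dim];
        for (int w = 0; w < numWords; w++) {
            // trim() drops the optional '\n' left over from the previous record.
            String word = readToken(in, ' ').trim();
            readFully(in, buf);
            ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
            float[] vec = new float[dim];
            for (int i = 0; i < dim; i++) vec[i] = bb.getFloat();
            vectors.put(word, vec);
        }
        return vectors;
    }

    // Reads bytes up to (not including) the delimiter and decodes them as UTF-8.
    private static String readToken(InputStream in, char delimiter) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != delimiter) {
            bytes.write(b);
        }
        if (b == -1) throw new EOFException("unexpected end of stream");
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Vectors are read by byte count, never by delimiter, since float bytes may equal ' ' or '\n'.
    private static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) throw new EOFException("truncated vector");
            off += n;
        }
    }
}
```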
      • readAsCsv

        public static Word2Vec readAsCsv(@NonNull File file)
      • readAsCsv

        public static Word2Vec readAsCsv(@NonNull InputStream inputStream)
        This method loads a Word2Vec model from a CSV input stream.
        Parameters:
        inputStream - input stream
        Returns:
        Word2Vec model
      • loadStaticModel

        public static WordVectors loadStaticModel(InputStream inputStream)
                                           throws IOException
        This method restores a previously saved w2v model. The input can be in one of the following formats: 1) binary model, either compressed or not, like the well-known Google model; 2) the popular CSV word2vec text format; 3) the DL4J compressed format. In return you get a StaticWord2Vec model, which can only be used as a lookup table, e.g. in a multi-GPU environment.
        Parameters:
        inputStream - InputStream that points to a previously saved w2v model
        Returns:
        the restored model, as a StaticWord2Vec instance
        Throws:
        IOException
      • loadStaticModel

        public static WordVectors loadStaticModel(@NonNull File file)
        This method restores a previously saved w2v model. The file can be in one of the following formats: 1) binary model, either compressed or not, like the well-known Google model; 2) the popular CSV word2vec text format; 3) the DL4J compressed format. In return you get a StaticWord2Vec model, which can only be used as a lookup table, e.g. in a multi-GPU environment.
        Parameters:
        file - the model file
        Returns:
        the restored model, as a StaticWord2Vec instance
      • writeWord2Vec

        public static void writeWord2Vec(@NonNull Word2Vec word2Vec,
                                         @NonNull OutputStream stream)
                                  throws IOException
        This method saves a Word2Vec model to an output stream.
        Parameters:
        word2Vec - Word2Vec
        stream - OutputStream
        Throws:
        IOException
      • readWord2Vec

        public static Word2Vec readWord2Vec(@NonNull String path,
                                            boolean readExtendedTables)
        This method restores a Word2Vec model from a file.
        Parameters:
        path - path to the model file
        readExtendedTables - whether to also read the extended tables
        Returns:
        Word2Vec
      • writeLookupTable

        public static <T extends SequenceElement> void writeLookupTable(WeightLookupTable<T> weightLookupTable,
                                                                        @NonNull File file)
                                                                 throws IOException
        This method saves a table of weights to a file.
        Parameters:
        weightLookupTable - WeightLookupTable
        file - File
        Throws:
        IOException
      • readWord2Vec

        public static Word2Vec readWord2Vec(@NonNull File file,
                                            boolean readExtendedTables)
        This method loads a Word2Vec model from a file.
        Parameters:
        file - File
        readExtendedTables - boolean
        Returns:
        Word2Vec
      • readWord2Vec

        public static Word2Vec readWord2Vec(@NonNull InputStream stream,
                                            boolean readExtendedTable)
                                     throws IOException
        This method loads a Word2Vec model from an input stream.
        Parameters:
        stream - InputStream
        readExtendedTable - boolean
        Returns:
        Word2Vec
        Throws:
        IOException
      • writeWordVectors

        public static void writeWordVectors(@NonNull FastText vectors,
                                            @NonNull File path)
                                     throws IOException
        This method saves a FastText model to a file.
        Parameters:
        vectors - FastText
        path - File
        Throws:
        IOException
      • readWordVectors

        public static FastText readWordVectors(File path)
        This method loads a FastText model from a file.
        Parameters:
        path - File
      • printOutProjectedMemoryUse

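        public static void printOutProjectedMemoryUse(long numWords,
                                                      int vectorLength,
                                                      int numTables)
        This method prints the projected memory use to the log.
        Parameters:
        numWords - number of words in the vocabulary
        vectorLength - length of each word vector
        numTables - number of weight tables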
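As a rough guide, the footprint of dense weight tables is numWords * vectorLength * numTables * (bytes per element). The sketch below assumes 32-bit floats and one numWords x vectorLength matrix per table, and ignores vocab and JVM overhead; MemoryProjection is hypothetical and is not necessarily the formula printOutProjectedMemoryUse() uses. For example, 1,000,000 words with 300-dimensional vectors and 2 tables come to 2,400,000,000 bytes, about 2.2 GiB.

```java
// Hypothetical sketch: assumes 32-bit floats and one numWords x vectorLength matrix per table.
class MemoryProjection {
    static long projectedBytes(long numWords, int vectorLength, int numTables) {
        // long arithmetic throughout, so large vocabularies don't overflow int
        return numWords * vectorLength * numTables * Float.BYTES;
    }

    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }
}
```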