Package org.clulab.embeddings


Type Members

  1. class CompactWordEmbeddingMap extends WordEmbeddingMap

    This class and its companion object have been backported from Eidos, where the class is (or was) an optional replacement for WordEmbeddingMap, used for performance reasons. It loads data from disk faster and stores it more compactly in memory. It does not, however, include all the operations of processors' Word2Vec. For instance, logMultiplicativeTextSimilarity is not included, though it could probably be added. Other methods, such as getWordVector, which in Word2Vec returns an Array[Double], would be inefficient to include because per-word arrays of doubles (or floats) are no longer part of the design. For documentation beyond what follows, both the companion object and the related test case (org.clulab.embeddings.TestCompactWord2Vec) may be helpful.
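
    The remark about per-word arrays can be made concrete. The following is a minimal sketch that assumes a row-major flat Array[Float] in which row r occupies indices r * dim until (r + 1) * dim. Under that assumption, similarity can be computed by reading rows in place, whereas a getWordVector-style method would first have to copy a row into a new array. FlatRowSimilarity, dot, and cosine are hypothetical names and are not part of the processors API.

    ```scala
    // Hypothetical sketch: all vectors share one row-major Array[Float];
    // row r occupies indices r * dim until (r + 1) * dim.
    object FlatRowSimilarity {
      // Dot product of rows a and b, read in place without copying either row.
      def dot(matrix: Array[Float], dim: Int, a: Int, b: Int): Double = {
        val offA = a * dim
        val offB = b * dim
        var sum = 0.0
        var i = 0
        while (i < dim) {
          sum += matrix(offA + i).toDouble * matrix(offB + i)
          i += 1
        }
        sum
      }

      // Cosine similarity of rows a and b; 0.0 if either row is all zeros.
      def cosine(matrix: Array[Float], dim: Int, a: Int, b: Int): Double = {
        val normA = math.sqrt(dot(matrix, dim, a, a))
        val normB = math.sqrt(dot(matrix, dim, b, b))
        if (normA == 0.0 || normB == 0.0) 0.0
        else dot(matrix, dim, a, b) / (normA * normB)
      }
    }
    ```

    For example, with dim = 2 and rows (1, 0), (0, 1), (1, 1) stored as Array(1f, 0f, 0f, 1f, 1f, 1f), rows 0 and 1 have cosine 0.0, and rows 0 and 2 have cosine 1/√2.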

    The class is typically instantiated by the apply method of the companion object, which takes a filename and two booleans. "resource" specifies whether the named file is a resource or is instead stored on the broader filesystem. "cached" specifies whether the data consists of Java-serialized objects (see the save method) or is in the standard vector text format. The apply method reads the file in the appropriate way and converts it into a map. The keys of this map are the words, and the values are row numbers in an implied two-dimensional matrix of all the vector values, which is also passed to the constructor. So, rather than each word being mapped to an independent mini-array as in Word2Vec, each word is mapped to an integer row number in a single, larger matrix/array.
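
    The conversion that apply performs for the text format can be sketched as follows. This illustration is not the library's loader. It assumes GloVe-style lines of the form "word v1 ... vd" with no header line. CompactLoadSketch, Compact, fromTextLines, and vectorOf are hypothetical names.

    ```scala
    import scala.collection.mutable

    object CompactLoadSketch {
      // Word -> row number, plus one flat row-major matrix holding every vector.
      final case class Compact(rowOf: Map[String, Int], dim: Int, matrix: Array[Float])

      // Assumes GloVe-style lines "word v1 v2 ... vd" and no header line.
      // If a word repeats, the later row wins in the map.
      def fromTextLines(lines: Seq[String]): Compact = {
        val rows = lines.map(_.trim).filter(_.nonEmpty).map(_.split(" "))
        require(rows.nonEmpty, "no vectors found")
        val dim = rows.head.length - 1
        val matrix = new Array[Float](rows.length * dim)
        val rowOf = mutable.Map.empty[String, Int]
        rows.zipWithIndex.foreach { case (fields, row) =>
          require(fields.length == dim + 1, s"bad line for ${fields.head}")
          rowOf(fields.head) = row
          var i = 0
          while (i < dim) {
            matrix(row * dim + i) = fields(i + 1).toFloat
            i += 1
          }
        }
        Compact(rowOf.toMap, dim, matrix)
      }

      // Copies a word's row out of the shared matrix, if the word is known.
      def vectorOf(c: Compact, word: String): Option[Array[Float]] =
        c.rowOf.get(word).map(r => c.matrix.slice(r * c.dim, (r + 1) * c.dim))
    }
    ```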

    To take advantage of the faster load times, you need to convert the vector data file from text format into a binary format (Java-serialized objects) for loadBin. The test case includes an example. In a preprocessing phase, call CompactWord2Vec(filename, resource = false, cached = false) on the file containing the vectors in text format, such as glove.840B.300d.txt. "resource" is usually false because the file can be very large, too large to include as a resource. On the resulting return value, call save(compactFilename). After that, for normal, speedy processing, use CompactWord2Vec(compactFilename, resource = false, cached = true).
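
    The save/load round trip for the cached form could look like the following sketch, which uses plain java.io object streams. The stored layout is an assumption for illustration: the word list in row order, then the dimension, then the flat matrix. The real file written by save may differ. CachedFormatSketch, save, and load are hypothetical names.

    ```scala
    import java.io.{BufferedInputStream, BufferedOutputStream, File, FileInputStream,
      FileOutputStream, ObjectInputStream, ObjectOutputStream}

    object CachedFormatSketch {
      // Writes words (in row order), the dimension, and the flat matrix as Java-serialized objects.
      def save(file: File, words: Array[String], dim: Int, matrix: Array[Float]): Unit = {
        require(matrix.length == words.length * dim, "matrix size must be words * dim")
        val out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)))
        try {
          out.writeObject(words)
          out.writeInt(dim)
          out.writeObject(matrix)
        } finally out.close()
      }

      // Reads the three values back in the same order and rebuilds word -> row from word order.
      def load(file: File): (Map[String, Int], Int, Array[Float]) = {
        val in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(file)))
        try {
          val words = in.readObject().asInstanceOf[Array[String]]
          val dim = in.readInt()
          val matrix = in.readObject().asInstanceOf[Array[Float]]
          (words.zipWithIndex.toMap, dim, matrix)
        } finally in.close()
      }
    }
    ```

    Storing the words as a plain Array[String] instead of a serialized Scala Map keeps the file to Java arrays only. The map is cheap to rebuild on load. ObjectInputStream is unsafe on untrusted input, so it should only read files you produced yourself.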

  2. class DefaultWordSanitizer extends WordSanitizing

  3. class ExplicitWordEmbeddingMap extends WordEmbeddingMap

    Implements a word embedding map, where each embedding is stored as a distinct array.

  4. class LemmatizeEmbeddings extends AnyRef

    Generates embeddings for lemmas by averaging the GloVe embeddings of words that share the same lemma. The average is weighted by the frequency of the corresponding words in Gigaword.
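
    The weighting amounts to v(lemma) = Σ f(w)·v(w) / Σ f(w) over the words w with that lemma, where f(w) is the word's Gigaword frequency. The following minimal sketch assumes that the vectors, the lemma assignments, and the frequencies have already been loaded into maps; reading GloVe files and Gigaword counts is out of scope. Skipping words with a zero count is also an assumption. LemmaAverageSketch and lemmaVectors are hypothetical names.

    ```scala
    object LemmaAverageSketch {
      // Inputs are assumed to be preloaded: word -> vector, word -> lemma, word -> frequency.
      def lemmaVectors(
          vectors: Map[String, Array[Double]],
          lemmaOf: Map[String, String],
          freq: Map[String, Long]
      ): Map[String, Array[Double]] = {
        // Only words with a vector, a lemma, and a positive count contribute.
        val usable = vectors.keys.filter(w => lemmaOf.contains(w) && freq.getOrElse(w, 0L) > 0L)
        usable.groupBy(w => lemmaOf(w)).map { case (lemma, words) =>
          val dim = vectors(words.head).length
          val sum = new Array[Double](dim)
          var total = 0.0
          for (w <- words) {
            val weight = freq(w).toDouble
            val v = vectors(w)
            for (i <- 0 until dim) sum(i) += weight * v(i)
            total += weight
          }
          // Frequency-weighted mean of the member words' vectors.
          lemma -> sum.map(_ / total)
        }
      }
    }
    ```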

  5. trait WordEmbeddingMap extends AnyRef

    Basic functionality required by all implementations of word embeddings.

  6. trait WordSanitizing extends Serializable

  7. class SanitizedWordEmbeddingMap extends AnyRef

    Implements similarity metrics using the embedding matrix.

    IMPORTANT: In our implementation, words are lowercased but NOT lemmatized or stemmed (see sanitizeWord).

    Note: matrixConstructor is lazy, meant to save memory if we're caching features.

    User: mihais, dfried, gus
    Date: 11/25/13
    Last Modified: Fix compiler issue: import scala.io.Source.

    Annotations: @deprecated

    Deprecated (since version processors 8.3.0): ExplicitWordEmbeddingMap should replace the functionality in this class.
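
    The sanitization rule stated for SanitizedWordEmbeddingMap (lowercase, but no lemmatization or stemming) and its note on lazy construction can be illustrated with a small sketch. Only lowercasing is modeled here; the real sanitizeWord may do more. SanitizeSketch and LazyEmbeddings are hypothetical names.

    ```scala
    import java.util.Locale

    object SanitizeSketch {
      // Lowercases only: no lemmatization or stemming, so "Dogs" -> "dogs", never "dog".
      def sanitizeWord(word: String): String = word.toLowerCase(Locale.ROOT)

      // The word -> vector table is built on first access, not at construction time.
      final class LazyEmbeddings(load: () => Map[String, Array[Float]]) {
        lazy val matrix: Map[String, Array[Float]] = load()

        // Looks up the sanitized form of the word.
        def get(word: String): Option[Array[Float]] = matrix.get(sanitizeWord(word))
      }
    }
    ```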

Value Members

  1. object CompactWordEmbeddingMap extends Logging

  2. object CompactWordEmbeddingMapApp extends App

  3. object CullVectors extends App

  4. object EmbeddingUtils

  5. object ExplicitWordEmbeddingMap extends Logging

  6. object LemmatizeEmbeddings

  7. object SanitizedWordEmbeddingMap

  8. object WordEmbeddingMap

  9. object WordEmbeddingMapPool

    Manages a pool of word embedding maps, so we do not load any of them more than once.
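
    A load-once pool like the one described can be sketched as a synchronized cache keyed by name. This illustration is not WordEmbeddingMapPool's actual API; LoadOncePool and getOrElseLoad are hypothetical names.

    ```scala
    import scala.collection.mutable

    // Hypothetical sketch of a load-once cache keyed by file or resource name.
    final class LoadOncePool[V] {
      private val cache = mutable.Map.empty[String, V]

      // Returns the cached value for name, evaluating load only on the first request.
      // synchronized keeps two threads from loading the same name concurrently.
      def getOrElseLoad(name: String)(load: => V): V = synchronized {
        cache.getOrElseUpdate(name, load)
      }

      def size: Int = synchronized(cache.size)
    }
    ```

    Holding the lock during the load keeps the code simple and guarantees a single load per name. The cost is that lookups of other names block while a large file is being read.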
