Bucket keys to use for quickly finding other similar items via locality sensitive hashing
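As a rough illustration of how such bucket keys can be derived, the sketch below splits a signature into bands and uses each (band index, band contents) pair as a key. The name bucket_keys, the list-of-ints signature, and the tuple key format are assumptions made for illustration, not the library's actual representation.

```python
from typing import List, Tuple

BucketKey = Tuple[int, Tuple[int, ...]]


def bucket_keys(signature: List[int], num_bands: int, num_rows: int) -> List[BucketKey]:
    """Split a signature into num_bands bands of num_rows hashes each.

    Hypothetical helper: each key pairs the band index with the hashes in
    that band, so equal bands at different positions do not collide.
    """
    if len(signature) != num_bands * num_rows:
        raise ValueError("signature length must equal num_bands * num_rows")
    keys = []
    for band in range(num_bands):
        start = band * num_rows
        keys.append((band, tuple(signature[start:start + num_rows])))
    return keys


# Two signatures that agree on the first and last band but not the middle one.
a_keys = bucket_keys([1, 2, 3, 4, 5, 6], num_bands=3, num_rows=2)
b_keys = bucket_keys([1, 2, 9, 9, 5, 6], num_bands=3, num_rows=2)
shared = set(a_keys) & set(b_keys)
```

Under this scheme two items become candidates for a full comparison when they share at least one bucket key.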
Decode two signatures into hash values, combine them somehow, and produce a new array
Initialize a byte array by generating hash values
useful for understanding the effects of numBands and numRows
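The effect of the two parameters can be explored with the standard banding formulas from the locality sensitive hashing literature. The sketch below assumes that two items with Jaccard similarity s share at least one bucket with probability 1 - (1 - s^numRows)^numBands, and that this curve rises most steeply near (1/numBands)^(1/numRows). The function names are hypothetical.

```python
def probability_of_inclusion(similarity: float, num_bands: int, num_rows: int) -> float:
    """Chance that two items with the given Jaccard similarity share a bucket."""
    band_matches = similarity ** num_rows          # all rows in one band agree
    return 1.0 - (1.0 - band_matches) ** num_bands  # at least one band agrees


def estimated_threshold(num_bands: int, num_rows: int) -> float:
    """Approximate similarity at which inclusion becomes likely."""
    return (1.0 / num_bands) ** (1.0 / num_rows)


# Compare a few similarities for 20 bands of 5 rows each.
curve = {s: probability_of_inclusion(s, 20, 5) for s in (0.3, 0.5, 0.8)}
threshold = estimated_threshold(20, 5)
```

More rows per band push the threshold up (fewer false positives); more bands pull it down (fewer false negatives).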
We always use a 128-bit hash function, so the number of hash functions differs from (and is usually smaller than) the number of hashes in the signature.
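A small arithmetic sketch of that relationship, assuming each 128-bit hash function call yields 16 bytes that are sliced into several signature hashes, and that the count is rounded up to cover the whole signature. The function name and the rounding rule are assumptions, not confirmed library behaviour.

```python
def num_hash_functions(num_hashes: int, hash_size_bytes: int, function_bytes: int = 16) -> int:
    """Number of 128-bit hash calls needed to fill a signature.

    num_hashes * hash_size_bytes is the signature size in bytes; each call
    contributes function_bytes (16 for a 128-bit hash). Integer ceiling
    division avoids floating point rounding.
    """
    total_bytes = num_hashes * hash_size_bytes
    return -(-total_bytes // function_bytes)


# 200 hashes of 4 bytes each need 800 bytes, i.e. 50 calls of 16 bytes.
calls_for_32_bit = num_hash_functions(200, 4)
# The same 200 hashes at 2 bytes each need only 25 calls.
calls_for_16_bit = num_hash_functions(200, 2)
```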
the number of bytes used for each hash in the signature
Create a signature for an arbitrary value
Create a signature for a single String value
Create a signature for a single Long value
Maximum value the hash can take on (not 2*hashSize because of signed types)
For an explanation of "bands" and "rows", see Ullman and Rajaraman, Mining of Massive Datasets
numerically solve the inverse of estimatedThreshold, given numBands*numRows
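One way to do this, sketched under the assumption that the threshold is t = (1/b)^(1/r) with b bands, r rows, and n = b*r total hashes: substituting r = n/b gives b * ln(b) = -n * ln(t), which has no closed form but can be solved by stepping b upward. The name pick_bands and the linear search are illustrative choices.

```python
import math


def pick_bands(threshold: float, num_hashes: int) -> int:
    """Smallest band count b with b * ln(b) >= -num_hashes * ln(threshold)."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    target = -num_hashes * math.log(threshold)
    bands = 1
    while bands * math.log(bands) < target:
        bands += 1
    return bands


# For a target threshold of 0.5 with a budget of 100 hashes.
bands = pick_bands(0.5, 100)
rows = 100 // bands  # rows per band, rounded down to stay within the budget
```

Because rows is rounded down, the threshold actually achieved can differ slightly from the one requested.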
Set union
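A minimal sketch of why union is cheap for MinHash signatures: the minimum hash over a union of sets is the minimum of the per-set minimums, so two signatures combine by taking the elementwise minimum. The hash construction below (salted BLAKE2b, 64-bit values, 32 hashes) is an assumption chosen to keep the example self-contained and is not the library's hashing scheme.

```python
import hashlib
from typing import Iterable, List

NUM_HASHES = 32          # hypothetical signature length
MAX_HASH = 2**64 - 1     # largest value an 8-byte hash can take


def _hash(seed: int, item: str) -> int:
    """Seeded 64-bit hash; the seed is passed as the BLAKE2b salt."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def signature(items: Iterable[str]) -> List[int]:
    """For each seed, keep the smallest hash over all items."""
    items = list(items)
    return [
        min((_hash(seed, x) for x in items), default=MAX_HASH)
        for seed in range(NUM_HASHES)
    ]


def union(left: List[int], right: List[int]) -> List[int]:
    """Signature of the union of the two underlying sets."""
    return [min(a, b) for a, b in zip(left, right)]


# The empty set hashes to all-maximum values, so it is the identity for union.
EMPTY = signature([])

a = signature(["apple", "banana"])
b = signature(["banana", "cherry"])
combined = union(a, b)
```

With this construction, combining signatures gives exactly the signature that would have been computed from the merged set.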
useful for understanding the effects of numBands and numRows
This seed could be anything
Estimate Jaccard similarity (size of intersection / size of union)
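The usual MinHash estimator for this quantity is the fraction of signature positions at which the two signatures hold the same hash. A minimal sketch, assuming signatures are equal-length lists of integers; the function name is hypothetical.

```python
from typing import List


def estimate_similarity(left: List[int], right: List[int]) -> float:
    """Fraction of positions where the two signatures agree."""
    if len(left) != len(right) or not left:
        raise ValueError("signatures must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(left, right) if a == b)
    return matches / len(left)


# Three of the four positions agree.
estimate = estimate_similarity([11, 42, 7, 93], [11, 42, 8, 93])
```

The estimate becomes more accurate as the number of hashes in the signature grows.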
Signature for empty set, needed to be a proper Monoid
Just use Monoid.sum