Discounts vectors. Expects vector(0) to hold the timestamp.
earlier vector
later vector
- base for exponent
- parameter to scale time difference
- time to weight vectors to
result vector
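The source does not spell out the discount formula, so the following is only a minimal sketch under stated assumptions. It assumes each vector is weighted by base^((timestamp - timeToWeight) / scale) before summing, and that slot 0 of the result carries the time the sum is weighted to. The names `DiscountSketch`, `weight`, `discount`, `scale` and `timeToWeight` are hypothetical and are not taken from the original code.

```scala
object DiscountSketch {
  // Hypothetical weight: base^((timestamp - timeToWeight) / scale).
  // For base > 1 and timestamp <= timeToWeight this lies in (0, 1].
  def weight(timestamp: Double, base: Double, scale: Double, timeToWeight: Double): Double =
    math.pow(base, (timestamp - timeToWeight) / scale)

  // Both inputs keep their timestamp in slot 0; the remaining slots are values.
  // Assumes equal sizes and earlier(0) <= later(0).
  def discount(earlier: Array[Double], later: Array[Double],
               base: Double, scale: Double, timeToWeight: Double): Array[Double] = {
    require(earlier.length == later.length, "vectors must have the same size")
    require(earlier(0) <= later(0), "earlier vector must not be newer than later vector")
    val wEarlier = weight(earlier(0), base, scale, timeToWeight)
    val wLater = weight(later(0), base, scale, timeToWeight)
    val result = new Array[Double](later.length)
    result(0) = timeToWeight // assumption: slot 0 holds the time the sum is weighted to
    var i = 1
    while (i < later.length) {
      result(i) = earlier(i) * wEarlier + later(i) * wLater
      i += 1
    }
    result
  }
}
```

With base 2 and scale 10, a value observed 10 time units before timeToWeight counts for half of its original amount, while a value observed exactly at timeToWeight keeps its full amount.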
Shrinks vectors and discounts them with Vector Utils. Contract: vector(0) holds the timestamp; the timestamp in vector2(0) must come after the one in vector1(0); if the vector sizes differ, vector1 is padded or shrunk to the size of vector2.
earlier vector
later vector
- base for exponent
- parameter to scale time difference
- time to weight vectors to
result vector
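The pad-or-shrink part of the contract can be sketched as follows. This is an illustration only: it assumes dense vectors represented as Array[Double] and assumes that padding uses zeros, which the source does not state. `ResizeSketch`, `resize` and `align` are hypothetical names.

```scala
object ResizeSketch {
  // Truncate or zero-pad `vector` so that it has exactly `size` elements.
  // Zero padding is an assumption; the source only says "pad or shrink".
  def resize(vector: Array[Double], size: Int): Array[Double] =
    vector.take(size).padTo(size, 0.0)

  // Enforce the contract: vector1 is the earlier one (slot 0 is the timestamp)
  // and is brought to the size of vector2 before any discounting happens.
  def align(vector1: Array[Double], vector2: Array[Double]): (Array[Double], Array[Double]) = {
    require(vector1(0) <= vector2(0), "vector2 must not be older than vector1")
    (resize(vector1, vector2.length), vector2)
  }
}
```

Only vector1 is resized, so the later vector always determines the size of the result.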
Performs aggregation in the repartition-and-sortWithinPartitions style.
1) Repartition the dataFrame by the repartitionBy columns.
2) Sort the dataFrame by sortColumns + timestamp.asc.
3) All vector data for an identifier is now in one partition and sorted by timestamp; iterate through the partition and aggregate it.
4) Map the RDD back to a dataFrame.
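Step 3 is the least obvious part, so here is a minimal sketch of it in plain Scala, with the partition modelled as an Iterator of rows that are already sorted by key and then by timestamp. In Spark, a function of this shape would typically be passed to mapPartitions after repartition and sortWithinPartitions. The row shape and the names `SortedPartitionSketch`, `Row` and `aggregateSorted` are hypothetical, and `merge` stands in for whatever discounting function is actually used.

```scala
object SortedPartitionSketch {
  // Hypothetical row shape: (key, timestamp, values).
  type Row = (String, Long, Array[Double])

  // Folds consecutive rows that share a key into one row per key.
  // Relies on the iterator being sorted by key, then by timestamp ascending,
  // so only one accumulator row is held in memory at a time.
  def aggregateSorted(rows: Iterator[Row])(merge: (Row, Row) => Row): Iterator[Row] =
    new Iterator[Row] {
      private val buffered = rows.buffered
      def hasNext: Boolean = buffered.hasNext
      def next(): Row = {
        var acc = buffered.next()
        // Keep merging while the next row belongs to the same key.
        while (buffered.hasNext && buffered.head._1 == acc._1) {
          acc = merge(acc, buffered.next())
        }
        acc
      }
    }
}
```

Because the rows for a key are adjacent and ordered, the aggregation needs no per-key map and can stream through a partition of any size.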
Created by eugeny.malyutin on 18.02.18.
Transformer that implements exponentially weighted discounting for vectors. Expects a dataFrame with the structure ( $"groupByColumns", $"timestamp", $"vector")
Returns a dataFrame ( $"groupByColumns", $"timestamp", $"vector")
$"timestamp" is the last seen action timestamp for this $"identificator". $"vector" holds the summed actions; vector(0) is reserved for the "aggregation" timestamp.