org.saddle
==Saddle==
Saddle is a '''S'''cala '''D'''ata '''L'''ibrary.
Saddle provides array-backed, indexed one- and two-dimensional data structures.
These data structures are specialized on JVM primitives. With them one can often avoid the overhead of boxing and unboxing.
Basic operations also aim to be robust to missing values (NAs).
The building blocks are intended to be easily composed.
The foundational building blocks are Vec, Mat, Index, Series, and Frame.
Inspiration for Saddle comes from many sources, including the R programming language, the pandas data analysis library for Python, and the Scala collections library.
Type members
Classlikes
- Companion:
- object
Filling method for NA values. Non-sealed because could add more variants in the future.
Frame
is an immutable container for 2D data which is indexed along both
axes (rows, columns) by associated keys (i.e., indexes).
The primary use case is homogeneous data, but a secondary concern is to support heterogeneous data that is homogeneous only within any given column.
The row index, column index, and constituent value data are all backed ultimately by arrays.
Frame is effectively a doubly-indexed associative map whose row keys and col keys each have an ordering provided by the natural (provided) order of their backing arrays.
Several factory and access methods are provided. In the following examples, assume that:
val f = Frame('a'->Vec(1,2,3), 'b'->Vec(4,5,6))
The apply method takes a row key and a col key and returns a slice of the original Frame:
f(0,'a') == Frame('a'->Vec(1))
apply also accepts an org.saddle.index.Slice:
f(0->1, 'b') == Frame('b'->Vec(4,5))
f(0, *) == Frame('a'->Vec(1), 'b'->Vec(4))
You may slice using the col and row methods respectively, as follows:
f.col('a') == Frame('a'->Vec(1,2,3))
f.row(0) == Frame('a'->Vec(1), 'b'->Vec(4))
f.row(0->1) == Frame('a'->Vec(1,2), 'b'->Vec(4,5))
You can achieve a similar effect with rowSliceBy and colSliceBy.
The colAt and rowAt methods take an integer offset i into the Frame, and return a Series indexed by the opposing axis:
f.rowAt(0) == Series('a'->1, 'b'->4)
If there is a one-to-one relationship between offset i and key (i.e., no duplicate keys in the index), you may achieve the same effect by key as follows:
f.first(0) == Series('a'->1, 'b'->4)
f.firstCol('a') == Series(1,2,3)
The at method returns an instance of org.saddle.scalar.Scalar, which behaves much like an Option; it can be either an instance of org.saddle.scalar.NA or an org.saddle.scalar.Value case class:
f.at(0, 0) == scalar.Scalar(1)
The rowSlice and colSlice methods allow slicing the Frame for locations in [i, j) irrespective of the value of the keys at those locations.
f.rowSlice(0,1) == Frame('a'->Vec(1), 'b'->Vec(4))
Finally, the method raw accesses a value directly, which may reveal the underlying representation of a missing value (so be careful).
f.raw(0,0) == 1
Frame may be used in arithmetic expressions which operate on two Frames or on a Frame and a scalar value. In the former case, the two Frames will automatically align along their indexes:
f + f.shift(1) == Frame('a'->Vec(NA,3,5), 'b'->Vec(NA,9,11))
- Type parameters:
- CX
The type of column keys
- RX
The type of row keys
- T
The type of entries in the frame
- Value parameters:
- colIx
An index for the columns
- rowIx
An index for the rows
- values
A sequence of Vecs which comprise the columns of the Frame
- Companion:
- object
Index provides a constant-time look-up of a value within array-backed storage, as well as operations to support joining and slicing.
- Companion:
- object
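As a rough illustration of how constant-time look-up over array-backed keys can work, the sketch below pairs a key array with a hash map from each key to its offsets. It is a plain-Scala model under that assumption, not Saddle's implementation; `KeyLookup`, `locate`, and `keyAt` are hypothetical names.

```scala
// Hypothetical model of an array-backed index; not Saddle's Index class.
final class KeyLookup[K](keys: Array[K]) {
  // Built once: maps each key to every offset at which it occurs, in order.
  private val offsets: Map[K, Vector[Int]] =
    keys.indices.toVector.groupBy(i => keys(i))

  // Hash-based look-up of all offsets for a key (empty if the key is absent).
  def locate(key: K): Vector[Int] = offsets.getOrElse(key, Vector.empty)

  // Offset-based access goes straight to the backing array.
  def keyAt(offset: Int): K = keys(offset)
}

val lookup = new KeyLookup(Array('a', 'b', 'b', 'c'))
```

Here `lookup.locate('b')` yields both offsets of the duplicated key, which is the kind of information that slicing and joining by key need.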
Mat
is an immutable container for 2D homogeneous data (a "matrix"). It is
backed by a single array. Data is stored in row-major order.
Several element access methods are provided.
The at method returns an instance of org.saddle.scalar.Scalar, which behaves much like an Option in that it can be either an instance of org.saddle.scalar.NA or an org.saddle.scalar.Value case class:
val m = Mat(2,2,Array(1,2,3,4))
m.at(0,0) == Value(1)
The method raw accesses the underlying value directly.
val m = Mat(2,2,Array(1,2,3,4))
m.raw(0,0) == 1d
Mat may be used in arithmetic expressions which operate on two Mats or on a Mat and a primitive value. A few examples:
val m = Mat(2,2,Array(1,2,3,4))
m * m == Mat(2,2,Array(1,4,9,16))
m dot m == Mat(2,2,Array(7d,10,15,22))
m * 3 == Mat(2, 2, Array(3,6,9,12))
Note, Mat is generally compatible with EJML's DenseMatrix. It may be convenient to induce this conversion to do more complex linear algebra, or to work with a mutable data structure.
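To make the row-major layout concrete, the following sketch works on a flat array directly: element (r, c) of a matrix with `cols` columns sits at offset `r * cols + c`. It uses only the Scala standard library; `RowMajor` and its methods are hypothetical helpers for illustration and are not part of Saddle.

```scala
// Hypothetical helper object; a plain-Scala model of row-major storage.
object RowMajor {
  // Element (r, c) of a matrix with `cols` columns lives at offset r * cols + c.
  def at(data: Array[Double], cols: Int, r: Int, c: Int): Double =
    data(r * cols + c)

  // Matrix product of an (n x k) and a (k x m) matrix, both stored row-major.
  def dot(a: Array[Double], b: Array[Double], n: Int, k: Int, m: Int): Array[Double] = {
    val out = new Array[Double](n * m)
    for (i <- 0 until n; j <- 0 until m) {
      var sum = 0.0
      for (p <- 0 until k) sum += a(i * k + p) * b(p * m + j)
      out(i * m + j) = sum
    }
    out
  }
}

val flat = Array(1.0, 2.0, 3.0, 4.0) // rows: (1, 2) and (3, 4)
val product = RowMajor.dot(flat, flat, 2, 2, 2)
```

The resulting flat array holds 7, 10, 15, 22, the same values as the `m dot m` example above.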
- Type parameters:
- A
Type of elements within the Mat
- Companion:
- object
Convenience constructors for a Frame[RX, CX, Any] that accept arbitrarily-typed Vectors and Series as constructor parameters, leaving their internal representations unchanged.
Trait which specifies what percentile method to use
- Companion:
- object
Trait which specifies how to break a rank tie
- Companion:
- object
Augments Seq with a toFrame method that returns a new Frame instance.
For example,
val t = IndexedSeq(("a", "x", 3), ("b", "y", 4))
val f = t.toFrame
res0: org.saddle.Frame[java.lang.String,java.lang.String,Int] =
[2 x 2]
x y
-- --
a -> 3 NA
b -> NA 4
- Type parameters:
- CX
Type of col index elements of Frame
- RX
Type of row index elements of Frame
- T
Type of data elements of Frame
- Value parameters:
- s
A value of type Seq[(RX, CX, T)]
Augments Seq with a toIndex method that returns a new Index instance.
For example,
val i = IndexedSeq(1,2,3)
val s = i.toIndex
- Type parameters:
- X
Type of index elements
- Value parameters:
- ix
A value of type Seq[X]
Augments Seq with a toSeries method that returns a new Series instance.
For example,
val p = IndexedSeq(1,2,3) zip IndexedSeq(4,5,6)
val s = p.toSeries
- Type parameters:
- T
Type of data elements of Series
- X
Type of index elements of Series
- Value parameters:
- s
A value of type Seq[(X, T)]
Augments Seq with a toVec method that returns a new Vec instance.
For example,
val s = IndexedSeq(1,2,3)
val v = s.toVec
- Type parameters:
- T
Type of elements of Vec
- Value parameters:
- s
A value of type Seq[T]
Series is an immutable container for 1D homogeneous data which is indexed by an associated sequence of keys.
Both the index and value data are backed by arrays.
Series is effectively an associative map whose keys have an ordering provided by the natural (provided) order of the backing array.
Several element access methods are provided.
The apply method returns a slice of the original Series:
val s = Series(Vec(1,2,3,4), Index('a','b','b','c'))
s('a') == Series('a'->1)
s('b') == Series('b'->2, 'b'->3)
Other ways to slice a series involve implicitly constructing an org.saddle.index.Slice object and passing it to the Series apply method:
s('a'->'b') == Series('a'->1, 'b'->2, 'b'->3)
s(* -> 'b') == Series('a'->1, 'b'->2, 'b'->3)
s('b' -> *) == Series('b'->2, 'b'->3, 'c'->4)
s(*) == s
The at method returns an instance of org.saddle.scalar.Scalar, which behaves much like an Option in that it can be either an instance of org.saddle.scalar.NA or an org.saddle.scalar.Value case class:
s.at(0) == Scalar(1)
The slice method allows slicing the Series for locations in [i, j) irrespective of the value of the keys at those locations.
s.slice(2,4) == Series('b'->3, 'c'->4)
To slice explicitly by labels, use the sliceBy method, which is inclusive of the key boundaries:
s.sliceBy('b','c') == Series('b'->2, 'b'->3, 'c'->4)
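The difference between slicing by position and slicing by label can be modelled with two parallel arrays. The sketch below is a plain-Scala approximation under the assumption that a label slice runs from the first occurrence of the lower key through the last occurrence of the upper key; `SliceModel` is a hypothetical name, not Saddle's code.

```scala
// Hypothetical helpers over parallel key/value arrays; not Saddle's Series.
object SliceModel {
  val keys   = Array('a', 'b', 'b', 'c')
  val values = Array(1, 2, 3, 4)

  // Positional slice: offsets in [from, until), keys are ignored.
  def slice(from: Int, until: Int): List[(Char, Int)] =
    keys.zip(values).slice(from, until).toList

  // Label slice: from the first occurrence of `lo` through the last
  // occurrence of `hi`, inclusive. Assumes both keys are present.
  def sliceBy(lo: Char, hi: Char): List[(Char, Int)] =
    slice(keys.indexOf(lo), keys.lastIndexOf(hi) + 1)
}
```

In this model `SliceModel.slice(2, 4)` keeps the last two entries regardless of their keys, while `SliceModel.sliceBy('a', 'b')` keeps every entry from 'a' through the final 'b'.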
The method raw accesses the value directly, which may reveal the underlying representation of a missing value (so be careful).
s.raw(0) == 1
Series may be used in arithmetic expressions which operate on two Series or on a Series and a scalar value. In the former case, the two Series will automatically align along their indexes. A few examples:
s * 2 == Series('a'->2, 'b'->4, ... )
s + s.shift(1) == Series('a'->NA, 'b'->3, 'b'->5, ...)
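Index alignment in arithmetic can be pictured with a small stand-alone model. The sketch assumes unique keys, uses `None` in place of NA, and treats a key missing from either operand as NA in the result; `AlignModel` and its members are hypothetical and do not reproduce Saddle's join logic.

```scala
// Hypothetical model: a keyed series as (key, value) pairs with unique keys;
// None stands in for NA. Not Saddle's join logic.
object AlignModel {
  type Keyed = Vector[(Char, Option[Int])]

  // Moves values down by one position, keeping the keys in place.
  def shift(s: Keyed): Keyed =
    s.map(_._1).zip(None +: s.map(_._2).dropRight(1))

  // Adds two series key by key; a key missing from either side, or an NA
  // on either side, yields NA in the result.
  def add(x: Keyed, y: Keyed): Keyed = {
    val left  = x.toMap
    val right = y.toMap
    val keys  = (x.map(_._1) ++ y.map(_._1)).distinct
    keys.map { k =>
      val sum = for {
        a <- left.get(k).flatten
        b <- right.get(k).flatten
      } yield a + b
      k -> sum
    }
  }
}

val base: AlignModel.Keyed = Vector('a' -> Some(1), 'b' -> Some(2), 'c' -> Some(3))
val summed = AlignModel.add(base, AlignModel.shift(base))
```

The first entry of `summed` is NA because the shifted operand has no value there; the remaining entries are ordinary sums.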
- Type parameters:
- T
Type of elements in the values array, for which there must be an implicit ST
- X
Type of elements in the index, for which there must be an implicit Ordering and ST
- Value parameters:
- index
Index backing the keys in the Series
- values
Vec backing the values in the Series
- Companion:
- object
Vec
is an immutable container for 1D homogeneous data (a "vector"). It is
backed by an array and indexed from 0 to length - 1.
Several element access methods are provided.
The apply() method returns a slice of the original vector:
val v = Vec(1,2,3,4)
v(0) == Vec(1)
v(1, 2) == Vec(2,3)
The at method returns an instance of org.saddle.scalar.Scalar, which behaves much like an Option in that it can be either an instance of org.saddle.scalar.NA or an org.saddle.scalar.Value case class:
Vec[Int](1,2,3,na).at(0) == Scalar(1)
Vec[Int](1,2,3,na).at(3) == NA
The method raw accesses the underlying value directly.
Vec(1d,2,3).raw(0) == 1d
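The contrast between at and raw can be sketched without the library. The model below assumes an Int vector that stores NA as `Int.MinValue` and uses the standard `Option` in place of Scalar; `IntVecModel` is a hypothetical class for illustration only.

```scala
// Hypothetical model of an Int vector whose NA is stored as Int.MinValue.
final class IntVecModel(data: Array[Int]) {
  // Returns the stored primitive as is, including the NA sentinel.
  def raw(i: Int): Int = data(i)

  // Wraps the element: None when the sentinel is stored, Some(value) otherwise.
  def at(i: Int): Option[Int] =
    if (data(i) == Int.MinValue) None else Some(data(i))
}

val vecModel = new IntVecModel(Array(1, 2, 3, Int.MinValue))
```

Calling `vecModel.at(3)` reports the missing value explicitly, whereas `vecModel.raw(3)` hands back the sentinel itself, which is why raw access calls for care.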
Vec may be used in arithmetic expressions which operate on two Vecs or on a Vec and a scalar value. A few examples:
Vec(1,2,3,4) + Vec(2,3,4,5) == Vec(3,5,7,9)
Vec(1,2,3,4) * 2 == Vec(2,4,6,8)
Note, Vec is implicitly convertible to an array for convenience; this could be abused to mutate the contents of the Vec. Try to avoid this!
- Type parameters:
- T
Type of elements within the Vec
- Companion:
- object
Specialized methods for Vec[Double]
Methods in this class do not filter out NAs, e.g. Vec(NA,1d).max2 == NA rather than 1d
na provides syntactic sugar for constructing primitives recognized as NA. A use case would be:
Vec[Int](1,2,na,4)
na will implicitly convert to a primitive having the designated missing value bit pattern. That pattern is as follows:
- byte => Byte.MinValue
- char => Char.MinValue
- short => Short.MinValue
- int => Int.MinValue
- long => Long.MinValue
- float => Float.NaN
- double => Double.NaN
The NA bit pattern for integral types is MinValue because it induces a symmetry on the remaining range of values; e.g. the remaining Byte range is -127 to +127.
Note that since a Boolean can only take on two values, it has no na primitive bit pattern.
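The symmetry argument and the floating-point case can be checked with a few lines of standard-library Scala. The names below (`usableMin`, `usableMax`, `isDoubleNA`) are hypothetical and only illustrate the bit patterns listed above.

```scala
// With Byte.MinValue (-128) reserved as NA, usable values run from -127 to 127.
val byteNA: Byte = Byte.MinValue
val usableMin: Int = Byte.MinValue + 1 // -127
val usableMax: Int = Byte.MaxValue     // 127

// NaN never compares equal to itself, so a Double NA check must use isNaN
// rather than ==.
def isDoubleNA(d: Double): Boolean = d.isNaN
```

Because the usable range is symmetric, negating any non-NA Byte value stays within that range.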
Value members
Concrete methods
Syntactic sugar, placeholder for 'slice-all'
val v = Vec(1,2,3, 4)
val u = v(*)
Implicits
Implicits
Augments Seq with a toFrame method that returns a new Frame instance.
For example,
val t = IndexedSeq(("a", "x", 3), ("b", "y", 4))
val f = t.toFrame
res0: org.saddle.Frame[java.lang.String,java.lang.String,Int] =
[2 x 2]
x y
-- --
a -> 3 NA
b -> NA 4
- Type parameters:
- CX
Type of col index elements of Frame
- RX
Type of row index elements of Frame
- T
Type of data elements of Frame
- Value parameters:
- s
A value of type Seq[(RX, CX, T)]
Augments Seq with a toIndex method that returns a new Index instance.
For example,
val i = IndexedSeq(1,2,3)
val s = i.toIndex
- Type parameters:
- X
Type of index elements
- Value parameters:
- ix
A value of type Seq[X]
Augments Seq with a toSeries method that returns a new Series instance.
For example,
val p = IndexedSeq(1,2,3) zip IndexedSeq(4,5,6)
val s = p.toSeries
- Type parameters:
- T
Type of data elements of Series
- X
Type of index elements of Series
- Value parameters:
- s
A value of type Seq[(X, T)]
Augments Seq with a toVec method that returns a new Vec instance.
For example,
val s = IndexedSeq(1,2,3)
val v = s.toVec
- Type parameters:
- T
Type of elements of Vec
- Value parameters:
- s
A value of type Seq[T]
Specialized methods for Vec[Double]
Methods in this class do not filter out NAs, e.g. Vec(NA,1d).max2 == NA rather than 1d
Syntactic sugar, allow '->' to generate an (inclusive) index slice
val v = Vec(1,2,3,4)
val u = v(0 -> 2)
Syntactic sugar, allow ' -> *' to generate an (inclusive) index slice, open on right
val v = Vec(1,2,3,4)
val u = v(1 -> *)