Same as TBaseRecordConverter, with one important (subtle) difference: it passes a repaired schema (StructType) to ThriftRecordConverter's constructor. This matters because older files don't contain all the metadata ThriftSchemaConverter needs to avoid throwing, but we can fill in dummy data there because it is never actually used.
The same as ParquetTBaseScheme, but sets the record converter to Parquet346TBaseRecordConverter.
Takes a ThriftType with potentially missing structOrUnionType metadata and returns a copy with all StructOrUnionType metadata set to UNION.
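A minimal sketch of that repair, using a toy schema model instead of the real parquet-thrift ThriftType classes (the names TypeModel, StructModel, and repair are hypothetical, not the actual API):

```scala
// Toy model of a Thrift schema tree -- the real code works on
// org.apache.parquet.thrift.struct.ThriftType; these names are made up.
sealed trait TypeModel
case object PrimitiveModel extends TypeModel
case class Field(name: String, tpe: TypeModel)
// structOrUnionType is the metadata that may be missing in older files
case class StructModel(fields: List[Field], structOrUnionType: Option[String]) extends TypeModel

// Recursively copy the tree, setting every struct's metadata to UNION.
// The dummy value is safe because downstream code never reads it.
def repair(t: TypeModel): TypeModel = t match {
  case StructModel(fields, _) =>
    StructModel(fields.map(f => f.copy(tpe = repair(f.tpe))), Some("UNION"))
  case primitive => primitive
}
```

Applied to a nested struct with missing metadata, every struct in the copy carries the dummy UNION marker while primitive fields pass through unchanged.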
When using these sources or creating subclasses of them, you can provide a filter predicate and / or a set of fields (columns) to keep (project).
The filter predicate will be pushed down to the input format, potentially making the filter significantly more efficient than a filter applied to a TypedPipe (parquet push-down filters can skip reading entire chunks of data off disk).
For data with a large schema (many fields / columns), providing the set of columns you intend to use can also make your job significantly more efficient (parquet column projection push-down will skip reading unused columns from disk). The columns are specified in the format described here: https://github.com/apache/parquet-mr/blob/master/parquet_cascading.md#21-projection-pushdown-with-thriftscrooge-records
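The glob strings themselves are plain path expressions; a hedged illustration of the shape that document describes (the field names here are made up):

```scala
// Hypothetical column projection globs for a record with a nested `address`
// struct: `/` descends into nested fields and `*` matches every child field.
val columnGlobs: Set[String] = Set(
  "name",         // keep the top-level `name` field
  "address/zip",  // keep only `zip` inside the `address` struct
  "links/*"       // keep every field of the `links` struct
)
println(columnGlobs.mkString(";"))
```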
These settings are defined in the traits com.twitter.scalding.parquet.HasFilterPredicate and com.twitter.scalding.parquet.HasColumnProjection.
Here are two ways you can use these in a parquet source:
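One way is to override the members those traits define when declaring your source. The sketch below uses simplified stand-in traits so it is self-contained; the member names and types are assumptions (the real HasFilterPredicate exposes a parquet FilterPredicate, not a String):

```scala
// Simplified stand-ins for com.twitter.scalding.parquet.HasFilterPredicate
// and HasColumnProjection; names and types here are illustrative only.
trait HasFilterPredicate { def withFilter: Option[String] = None }
trait HasColumnProjection { def withColumnProjections: Set[String] = Set.empty }

// A source fixes its filter and projection by overriding the trait members.
class UserEventsSource extends HasFilterPredicate with HasColumnProjection {
  override def withFilter: Option[String] = Some("user_id == 42")
  override def withColumnProjections: Set[String] = Set("name", "address/zip")
}
```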
The other way is to add these as constructor arguments:
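A sketch of that constructor-argument style, again with simplified types (the real sources also take path / date-range arguments, and the filter is a parquet FilterPredicate rather than a String):

```scala
// Simplified sketch: the predicate and columns arrive as constructor
// arguments instead of overrides; the types are illustrative, not the real API.
class ParquetSourceSketch(
    val filter: Option[String],
    val columns: Set[String]
)

// Callers pick the filter and projection at construction time.
val src = new ParquetSourceSketch(Some("user_id == 42"), Set("name", "address/zip"))
```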