Package

com.salesforce.op.utils

io

package io

Type Members

  1. class DirectMapreduceOutputCommitter extends OutputCommitter

  2. class DirectOutputCommitter extends OutputCommitter

    OutputCommitter suitable for S3 workloads. Unlike the usual FileOutputCommitter, which writes files to a _temporary/ directory before renaming them to their final location, this simply writes directly to the final location.

    The FileOutputCommitter is required for HDFS with speculative execution, because HDFS allows only one writer at a time per file, so two task attempts racing to write the same file would not work. S3, by contrast, supports multiple writers outputting to the same file, and visibility of the written file is guaranteed to be atomic. Duplicate writes are therefore harmless: all writers should be writing the same data, so which one wins is immaterial.

    Adapted from code by Ian Hummel: https://github.com/themodernlife/spark/commit/4359664b1d557d55b0579023df809542386d5b8c
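
    The difference between the two commit strategies described above can be sketched with plain java.nio file operations. This is only an illustration on a local filesystem, not the actual DirectOutputCommitter or any Hadoop API; the object name CommitStrategies and both method names are hypothetical, and a local disk does not reproduce S3's visibility semantics.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical helper, for illustration only (not part of this package).
object CommitStrategies {

  // Rename-based strategy: write under _temporary/ first, then move the
  // file to its final location. Cleanup of the _temporary/ directory
  // itself is omitted in this sketch.
  def writeViaTemporary(outputDir: Path, fileName: String, data: String): Path = {
    val tmpDir = Files.createDirectories(outputDir.resolve("_temporary"))
    val tmpFile = tmpDir.resolve(fileName)
    Files.write(tmpFile, data.getBytes(UTF_8))
    Files.move(tmpFile, outputDir.resolve(fileName), StandardCopyOption.REPLACE_EXISTING)
  }

  // Direct strategy: write straight to the final location, with no
  // intermediate directory and no rename. A second writer overwrites
  // the first.
  def writeDirect(outputDir: Path, fileName: String, data: String): Path = {
    Files.createDirectories(outputDir)
    Files.write(outputDir.resolve(fileName), data.getBytes(UTF_8))
  }
}
```

    Calling writeDirect twice with the same file name and the same data leaves a single file with that data, which mirrors the point above: when every writer produces identical output, it does not matter which one wins.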

Value Members

  1. package avro

  2. package csv
