Package za.co.absa.cobrix.spark.cobol.reader.parameters

package parameters

Type Members

  1. case class MultisegmentParameters(segmentIdField: String, segmentIdFilter: Option[Seq[String]], segmentLevelIds: Seq[String], segmentIdPrefix: String, segmentIdRedefineMap: Map[String, String], fieldParentMap: Map[String, String]) extends Product with Serializable

    This class holds the parameters currently used for parsing variable-length records.

  2. case class ReaderParameters(isEbcdic: Boolean = true, ebcdicCodePage: String = "common", ebcdicCodePageClass: Option[String] = None, asciiCharset: String = "", floatingPointFormat: FloatingPointFormat = FloatingPointFormat.IBM, variableSizeOccurs: Boolean = false, lengthFieldName: Option[String] = None, isRecordSequence: Boolean = false, isRdwBigEndian: Boolean = false, isRdwPartRecLength: Boolean = false, rdwAdjustment: Int = 0, isIndexGenerationNeeded: Boolean = false, inputSplitRecords: Option[Int] = None, inputSplitSizeMB: Option[Int] = None, hdfsDefaultBlockSize: Option[Int] = None, startOffset: Int = 0, endOffset: Int = 0, fileStartOffset: Int = 0, fileEndOffset: Int = 0, generateRecordId: Boolean = false, schemaPolicy: SchemaRetentionPolicy = SchemaRetentionPolicy.KeepOriginal, stringTrimmingPolicy: StringTrimmingPolicy = StringTrimmingPolicy.TrimBoth, multisegment: Option[MultisegmentParameters] = None, commentPolicy: CommentPolicy = CommentPolicy(), dropGroupFillers: Boolean = false, nonTerminals: Seq[String] = Nil, recordHeaderParser: Option[String] = None, rhpAdditionalInfo: Option[String] = None, inputFileNameColumn: String = "") extends Product with Serializable

    These are properties for customizing the mainframe binary data reader (see the construction sketch after the parameter descriptions below).

    isEbcdic

    If true, the input data file encoding is EBCDIC; otherwise it is ASCII

    ebcdicCodePage

    Specifies what code page to use for EBCDIC to ASCII/Unicode conversions

    ebcdicCodePageClass

    An optional custom code page conversion class provided by a user

    asciiCharset

    A charset for ASCII data

    floatingPointFormat

    A format of floating-point numbers

    variableSizeOccurs

    If true, the size of OCCURS DEPENDING ON arrays will depend on the actual number of elements, making record sizes variable

    lengthFieldName

    The name of a field that contains the record length. Optional; if not set, the copybook record length is used.

    isRecordSequence

    Do input files have 4-byte record length headers

    isRdwBigEndian

    Is the RDW big endian? This may depend on the mainframe flavor and/or the mainframe-to-PC transfer method

    isRdwPartRecLength

    Does the RDW count itself as part of the record length

    rdwAdjustment

    An adjustment for a mismatch between the RDW value and the actual record length

    isIndexGenerationNeeded

    Specifies whether an index of the input file should be generated before processing

    inputSplitRecords

    The number of records to include in each partition. Note that mainframe records may have variable size, so inputSplitSizeMB is the recommended option

    inputSplitSizeMB

    The target partition size in megabytes. In certain circumstances the actual size may differ, but the library will make a best effort to hit the target

    hdfsDefaultBlockSize

    The default block size of the HDFS filesystem used. This value is used as the default split size if inputSplitSizeMB is not specified

    startOffset

    An offset to the start of the record in each binary data block.

    endOffset

    An offset from the end of the record to the end of the binary data block.

    fileStartOffset

    A number of bytes to skip at the beginning of each file

    fileEndOffset

    A number of bytes to skip at the end of each file

    generateRecordId

    If true, a record id field will be prepended to each record.

    schemaPolicy

    Specifies a policy to transform the input schema. The default policy is to keep the schema exactly as it is in the copybook.

    stringTrimmingPolicy

    Specifies if and how strings should be trimmed when parsed.

    multisegment

    Parameters specific to reading multisegment files

    commentPolicy

    A comment truncation policy

    dropGroupFillers

    If true, the parser will drop all FILLER fields, even GROUP FILLERs that have non-FILLER nested fields

    nonTerminals

    A list of non-terminals (GROUPS) to combine and parse as primitive fields

    recordHeaderParser

    A parser used to parse record headers of the data file

    rhpAdditionalInfo

    An optional string with additional information passed to a custom record header parser

    inputFileNameColumn

    The name of a column to add to the DataFrame. The column will contain the input file name for each record, similar to the 'input_file_name()' function
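
Below is a minimal construction sketch for the two case classes above, assuming the package path shown at the top of this page. The field names, segment IDs, and redefine mappings are hypothetical and only illustrate how the parameters fit together; any field omitted from ReaderParameters keeps its documented default.

    import za.co.absa.cobrix.spark.cobol.reader.parameters.{MultisegmentParameters, ReaderParameters}

    // Hypothetical multisegment setup: records are keyed on the SEGMENT_ID field,
    // segment ids "C" and "P" map to the COMPANY and PERSON redefines, and PERSON
    // segments are children of COMPANY segments.
    val multiseg = MultisegmentParameters(
      segmentIdField       = "SEGMENT_ID",
      segmentIdFilter      = Some(Seq("C", "P")),
      segmentLevelIds      = Seq("C", "P"),
      segmentIdPrefix      = "ID",
      segmentIdRedefineMap = Map("C" -> "COMPANY", "P" -> "PERSON"),
      fieldParentMap       = Map("PERSON" -> "COMPANY")
    )

    // EBCDIC input with 4-byte RDW headers, a generated record id column,
    // and the multisegment parameters above; every other field keeps its default.
    val readerParams = ReaderParameters(
      isEbcdic         = true,
      ebcdicCodePage   = "common",
      isRecordSequence = true,
      generateRecordId = true,
      multisegment     = Some(multiseg)
    )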
