A field in the schema. For struct fields, the field "attributes" contains all sub-attributes.
: Attribute name as defined in the source dataset
: Is the attribute an array?
: Should this attribute always be present in the source?
: Should a privacy transformation be applied to this attribute at ingestion time?
: Free text describing the attribute
: If present, the attribute is renamed to this name
: If present, the kind of statistic to compute for this field
: List of sub-attributes
: Valid only when the file format is POSITION
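The attribute properties above can be sketched as a small data model. This is a hypothetical illustration: the field names mirror the descriptions, not the project's actual API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of the attribute model described above; field names
# are assumptions mirroring the doc, not the actual API.
@dataclass
class Attribute:
    name: str                            # name as defined in the source dataset
    array: bool = False                  # is it an array?
    required: bool = True                # must it always be present in the source?
    privacy: bool = False                # apply a privacy transformation at ingestion?
    comment: Optional[str] = None        # free-text description
    rename: Optional[str] = None         # if present, the attribute is renamed to this
    metric: Optional[str] = None         # statistic to compute for this field
    attributes: list = field(default_factory=list)  # sub-attributes (struct fields)
    position: Optional[tuple] = None     # (first, last) char, POSITION format only

    def final_name(self) -> str:
        """Effective column name once an optional rename is applied."""
        return self.rename or self.name
```

Downstream options such as merge keys then refer to `final_name()`, which is why the renamed column must be used there.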
Job logical name
List of business tasks to execute
Task executed in the context of a job
: SQL request to execute (do not forget to prefix table names with the database name)
Output domain in Business Area (Will be the Database name in Hive or Dataset in BigQuery)
Dataset Name in Business Area (Will be the Table name in Hive & BigQuery)
Append to or overwrite existing dataset
Target Area where domain / dataset will be stored
Let's say you want to import customers and orders from your Sales system. Sales is therefore the domain, and customer & order are your datasets.
: Domain name
: Folder on the local filesystem where incoming files are stored. This folder is scanned regularly and the datasets found there are moved to the cluster
: Default schema metadata.
: List of schema for each dataset in this domain
: Free text
: Recognized filename extensions; json, csv, dsv and psv are recognized by default
: Ack extension used for each file
Recognized file type format. This will select the correct parser
: SIMPLE_JSON, JSON or DSV. Simple JSON is made of single-level attributes of simple types (no array, map or sub-object)
This attribute property lets us know what statistics should be computed for this field when analyze is active.
: DISCRETE, CONTINUOUS, TEXT or NONE
Recognized file type format. This will select the correct parser
: SIMPLE_JSON, JSON or DSV. Simple JSON is made of single-level attributes of simple types (no array, map or sub-object)
How datasets are merged
: List of attributes used to join the existing and incoming datasets. Use renamed columns here.
: Optional valid SQL condition on the incoming dataset. Use renamed columns here.
: Timestamp column used to identify the last version; if not specified, the currently ingested row is considered the last
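The merge semantics above can be sketched on plain Python dicts instead of DataFrames. The option names are assumptions mirroring the descriptions; the real implementation operates on Spark datasets.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed option names mirroring the merge description above.
@dataclass
class MergeOptions:
    key: list                        # join attributes (use renamed columns)
    timestamp: Optional[str] = None  # column identifying the last version

def merge(existing, incoming, options):
    """Keep, per key, the row with the greatest timestamp; without a
    timestamp column, the currently ingested (incoming) row wins."""
    key_of = lambda row: tuple(row[k] for k in options.key)
    result = {key_of(r): r for r in existing}
    for row in incoming:
        k = key_of(row)
        if options.timestamp and k in result:
            # String comparison is fine for ISO-formatted timestamps.
            if row[options.timestamp] >= result[k][options.timestamp]:
                result[k] = row
        else:
            result[k] = row
    return list(result.values())
```

Note that when no timestamp column is configured, replaying an old file would silently overwrite newer rows, which is why the timestamp option matters for out-of-order ingestion.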
Specify schema properties. These properties may be specified at the schema or domain level. Any property not specified at the schema level is taken from the domain level, or else the default value is used.
: FILE mode by default
: DSV by default
: UTF-8 by default
: Are JSON objects on a single line or multiple lines? Single by default (false); single-line is also faster
: Is the JSON stored as a single object array? false by default
: Does the dataset have a header? true by default
: The column separator, ';' by default
: The string quote char, '"' by default
: The escape char, '\' by default
: Write mode, APPEND by default
: Partition columns, no partitioning by default
: Should the dataset be indexed in Elasticsearch after ingestion?
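The schema-over-domain fallback described above can be sketched as follows. The real metadata carries many more properties; the ones shown, and their names, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults used only when neither the schema nor the domain sets a property.
DEFAULTS = {"with_header": True, "separator": ";", "quote": '"', "escape": "\\"}

@dataclass
class Metadata:
    with_header: Optional[bool] = None
    separator: Optional[str] = None
    quote: Optional[str] = None
    escape: Optional[str] = None

    def merged(self, domain: "Metadata") -> dict:
        """Schema-level values win; unset ones fall back to the domain
        level, then to the defaults."""
        out = {}
        for prop, default in DEFAULTS.items():
            schema_value = getattr(self, prop)
            domain_value = getattr(domain, prop)
            out[prop] = schema_value if schema_value is not None else (
                domain_value if domain_value is not None else default)
        return out
```

Using `Optional` fields (rather than storing defaults directly) is what makes "not specified" distinguishable from "explicitly set to the default".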
This attribute property lets us know what statistics should be computed for this field when analyze is active.
: DISCRETE, CONTINUOUS, TEXT or NONE
Big versus Fast data ingestion. Are we ingesting a file or a message stream?
: FILE or STREAM
: 0.0 means no sampling; > 0 and < 1 means sample the dataset; >= 1 means an absolute number of partitions.
: Attributes used to partition the dataset.
Where the attribute is located in a fixed-position (POSITION) file
: First char position
: Last char position
Spark supported primitive types. These are the only valid raw types. Dataframe columns are converted to these types before the dataset is ingested.
How the attribute should be transformed at ingestion time
: Algorithm to use: NONE, HIDE, MD5, SHA1, SHA256, SHA512, AES
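The hash-based algorithms in this list can be sketched with the standard library's `hashlib`. This is an illustration of the listed options, not the project's implementation: HIDE blanks the value, NONE leaves it untouched, and AES (encryption rather than hashing) is omitted for brevity.

```python
import hashlib

# Sketch of the privacy algorithms listed above; AES is omitted.
def apply_privacy(value: str, algo: str) -> str:
    if algo == "NONE":
        return value
    if algo == "HIDE":
        return ""
    digests = {"MD5": "md5", "SHA1": "sha1", "SHA256": "sha256", "SHA512": "sha512"}
    if algo in digests:
        return hashlib.new(digests[algo], value.encode("utf-8")).hexdigest()
    raise ValueError(f"Unknown privacy algorithm: {algo}")
```

Hashing is one-way but deterministic, so equal source values still produce equal hashed values; joins on a hashed key keep working, unlike with HIDE.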
Dataset Schema
: Schema name, must be unique within the domain. Will become the Hive table name
: Filename pattern to which this schema must be applied
: Dataset columns
: Dataset metadata
: Free text
: SQL code executed before the file is ingested
: SQL code executed right after the file has been ingested
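Routing an incoming file to its schema via the filename pattern can be sketched as below. The `Schema` fields shown are assumptions based on the descriptions above.

```python
import re
from dataclasses import dataclass

# Assumed minimal schema shape, mirroring the description above.
@dataclass
class Schema:
    name: str      # unique within the domain, becomes the Hive table name
    pattern: str   # regex the incoming filename must fully match

def schema_for(filename: str, schemas):
    """Return the first schema whose pattern matches the whole filename."""
    return next((s for s in schemas if re.fullmatch(s.pattern, filename)), None)
```

Files matching no schema would typically be rejected rather than silently dropped, so patterns should be written to be mutually exclusive within a domain.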
Big versus Fast data ingestion. Are we ingesting a file or a message stream?
: FILE or STREAM
Semantic Type
: Type name
: Pattern used to check that the input data matches
: Spark column type of the attribute
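A semantic type, as described above, pairs a validation pattern with a target primitive type. A minimal sketch, with assumed field names:

```python
import re
from dataclasses import dataclass

# Assumed shape of a semantic type, mirroring the description above.
@dataclass
class Type:
    name: str            # type name
    pattern: str         # regex the input data must match
    primitive_type: str  # Spark column type (string, long, date, ...)

def is_valid(value: str, tpe: Type) -> bool:
    """True when the raw input value matches the type's pattern."""
    return re.fullmatch(tpe.pattern, value) is not None
```

Validation against the pattern happens before conversion to the primitive type, so rows carrying malformed values can be routed to the rejected area instead of failing the whole load.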
List of globally defined types
: Type list
During ingestion, should the data be appended to the previous data or replace it? See Spark SaveMode for more options.
: OVERWRITE / APPEND / ERROR_IF_EXISTS / IGNORE.
Contains classes used to describe rejected records. Rejected records are stored as Parquet files in the rejected area. A rejected row contains
Utility to extract duplicates and their number of occurrences
: List of strings
: Error message that should contain placeholders for the value (%s) and the number of occurrences (%d)
List of tuples containing, for each duplicate, the number of occurrences
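The duplicate-extraction utility described above amounts to counting values, keeping those seen more than once, and rendering the %s/%d error template. A sketch (the function name and default template are assumptions):

```python
from collections import Counter

# Sketch of the duplicate-extraction utility described above.
def duplicates(values, error_template="%s is duplicated %d times"):
    """Return one rendered error message per value that occurs more than once."""
    counts = Counter(values)
    return [error_template % (v, n) for v, n in counts.items() if n > 1]
```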