Package com.ebiznext.comet.schema.model

package model


Type Members

  1. case class Attribute(name: String, `type`: String = "string", array: Option[Boolean] = None, required: Boolean = true, privacy: Option[PrivacyLevel] = None, comment: Option[String] = None, rename: Option[String] = None, metricType: Option[MetricType] = None, attributes: Option[List[Attribute]] = None, position: Option[Position] = None, default: Option[String] = None, tags: Option[Set[String]] = None, trim: Option[Trim] = None) extends LazyLogging with Product with Serializable

    A field in the schema. For struct fields, the field "attributes" contains all sub-attributes.

    name

    : Attribute name as defined in the source dataset

    array

    : Is it an array?

    required

    : Should this attribute always be present in the source?

    privacy

    : Should a privacy transformation be applied to this attribute at ingestion time?

    comment

    : free text for attribute description

    rename

    : If present, the attribute is renamed with this name

    metricType

    : If present, what kind of stat should be computed for this field

    attributes

    : List of sub-attributes

    position

    : Valid only where file format is POSITION
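
    As a hedged illustration (the names and values below are invented, not part of the library), an attribute of a fixed-width file might be declared as follows:

      // Sketch only: an attribute of a POSITION (fixed-width) file.
      val zipCode = Attribute(
        name = "zip_code",
        `type` = "string",                              // semantic type name
        required = true,
        privacy = Some(PrivacyLevel("MD5")),            // hash the value at ingestion time
        rename = Some("zipCode"),                       // column name after ingestion
        position = Some(Position(first = 0, last = 4))  // only used for POSITION files
      )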

  2. case class AutoJobDesc(name: String, tasks: List[AutoTaskDesc], area: Option[StorageArea] = None, format: Option[String], coalesce: Option[Boolean], udf: Option[String] = None, views: Option[Map[String, String]] = None) extends Product with Serializable

    name

    : Job logical name

    tasks

    : List of business tasks to execute

  3. case class AutoTaskDesc(sql: String, domain: String, dataset: String, write: WriteMode, partition: Option[List[String]] = None, presql: Option[List[String]] = None, postsql: Option[List[String]] = None, area: Option[StorageArea] = None, index: Option[IndexSink] = None, properties: Option[Map[String, String]] = None) extends Product with Serializable

    Task executed in the context of a job.

    sql

    : SQL request to execute (do not forget to prefix table names with the database name)

    domain

    : Output domain in the Business Area (will be the database name in Hive or the dataset in BigQuery)

    dataset

    : Dataset name in the Business Area (will be the table name in Hive & BigQuery)

    write

    : Append to or overwrite existing dataset

    area

    : Target area where the domain / dataset will be stored
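
    A minimal sketch of a job made of a single task, using the two constructors above (the SQL, domain and dataset names are illustrative):

      // Illustrative only: one business task wrapped in a job description.
      val clientKpi = AutoTaskDesc(
        sql = "SELECT client, SUM(amount) AS total FROM sales.orders GROUP BY client",
        domain = "business",            // database name in Hive, dataset in BigQuery
        dataset = "client_kpi",         // table name in Hive & BigQuery
        write = WriteMode("OVERWRITE")  // replace the previous computation
      )

      val kpiJob = AutoJobDesc(
        name = "kpi",
        tasks = List(clientKpi),
        format = None,                  // these two have no default value
        coalesce = None
      )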

  4. case class CometArrayType(fields: CometStructType) extends CometDataType with Product with Serializable

  5. trait CometDataType extends AnyRef

  6. case class CometSimpleType(simpleType: DataType, attribute: Attribute, tpe: Type) extends CometDataType with Product with Serializable

  7. case class CometStructField(sparkField: StructField, attribute: Attribute, tpe: Type) extends CometDataType with Product with Serializable

  8. case class CometStructType(fields: Array[CometStructField]) extends CometDataType with Product with Serializable

  9. case class Domain(name: String, directory: String, metadata: Option[Metadata] = None, schemas: List[Schema] = Nil, comment: Option[String] = None, extensions: Option[List[String]] = None, ack: Option[String] = None) extends Product with Serializable

    Let's say you are willing to import customers and orders from your Sales system. Sales is therefore the domain, and customer and order are your datasets.

    name

    : Domain name

    directory

    : Folder on the local filesystem where incoming files are stored. This folder will be scanned regularly to move the dataset to the cluster

    metadata

    : Default Schema meta data.

    schemas

    : List of schemas, one per dataset in this domain

    comment

    : Free text

    extensions

    : Recognized filename extensions; json, csv, dsv and psv are recognized by default

    ack

    : Ack extension used for each file
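
    As an illustrative sketch (directory and names invented), the Sales domain described above could be declared as:

      // Sketch only: the "sales" domain with its landing directory.
      val sales = Domain(
        name = "sales",
        directory = "/mnt/incoming/sales",       // scanned regularly for new files
        comment = Some("Customer and order datasets"),
        extensions = Some(List("csv", "json")),  // restrict recognized extensions
        ack = Some("ack")                        // ack file extension
      )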

  10. sealed case class Format(value: String) extends Product with Serializable

    Recognized file type format. This will select the correct parser.

    value

    : SIMPLE_JSON, JSON or DSV. Simple JSON is made of single-level attributes of simple types (no array, map or sub-object)

    Annotations
    @JsonSerialize() @JsonDeserialize()
  11. class FormatDeserializer extends JsonDeserializer[Format]

  12. sealed case class IndexMapping(value: String) extends Product with Serializable

    How an attribute should be mapped when the dataset is indexed in Elasticsearch.

    value

    : Elasticsearch mapping type

    Annotations
    @JsonSerialize() @JsonDeserialize()
  13. class IndexMappingDeserializer extends JsonDeserializer[IndexMapping]

  14. sealed case class IndexSink(value: String) extends Product with Serializable

    Where the dataset should be indexed after ingestion, if anywhere.

    value

    : Target index sink (Elasticsearch for instance)

    Annotations
    @JsonSerialize() @JsonDeserialize()
  15. class IndexSinkDeserializer extends JsonDeserializer[IndexSink]

  16. case class MergeOptions(key: List[String], delete: Option[String] = None, timestamp: Option[String] = None) extends Product with Serializable

    How datasets are merged.

    key

    : List of attributes used to join the existing and incoming datasets. Use renamed columns here.

    delete

    : Optional valid SQL condition on the incoming dataset. Use renamed columns here.

    timestamp

    : Timestamp column used to identify the last version; if not specified, the currently ingested row is considered the last one.
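
    A hedged sketch of a merge configuration (column names are invented): keep the most recent row per key and drop incoming rows flagged as deleted:

      // Sketch only: merge incoming rows into the existing dataset.
      val merge = MergeOptions(
        key = List("client_id"),            // join on the renamed key column
        delete = Some("deleted = 'true'"),  // SQL condition on the incoming dataset
        timestamp = Some("update_time")     // the latest row wins
      )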

  17. case class Metadata(mode: Option[Mode] = None, format: Option[Format] = None, encoding: Option[String] = None, multiline: Option[Boolean] = None, array: Option[Boolean] = None, withHeader: Option[Boolean] = None, separator: Option[String] = None, quote: Option[String] = None, escape: Option[String] = None, write: Option[WriteMode] = None, partition: Option[Partition] = None, index: Option[IndexSink] = None, properties: Option[Map[String, String]] = None) extends Product with Serializable

    Specify Schema properties. These properties may be specified at the schema or domain level. Any property not specified at the schema level is taken from the one specified at the domain level, or else the default value is used.

    mode

    : FILE mode by default

    format

    : DSV by default

    encoding

    : UTF-8 by default

    multiline

    : Are JSON objects on a single line or multiple lines? Single line by default: false means single line, which is also faster

    array

    : Is the JSON stored as a single array of objects? false by default

    withHeader

    : Does the dataset have a header? true by default

    separator

    : the column separator, ';' by default

    quote

    : The String quote char, '"' by default

    escape

    : escaping char '\' by default

    write

    : Write mode, APPEND by default

    partition

    : Partition columns, no partitioning by default

    index

    : Should the dataset be indexed in Elasticsearch after ingestion?

    Annotations
    @JsonDeserialize()
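
    A sketch overriding a few of the defaults listed above for a semicolon-separated file with a header (all values illustrative):

      // Sketch only: metadata for a DSV file; unset fields fall back to the
      // domain-level metadata or to the defaults described above.
      val dsvMeta = Metadata(
        mode = Some(Mode("FILE")),
        format = Some(Format("DSV")),
        withHeader = Some(true),
        separator = Some(";"),
        write = Some(WriteMode("APPEND"))
      )
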
  18. class MetadataDeserializer extends JsonDeserializer[Metadata]

  19. sealed case class MetricType(value: String) extends Product with Serializable

    This attribute property lets us know what statistics should be computed for this field when analyze is active.

    value

    : DISCRETE or CONTINUOUS or TEXT or NONE

    Annotations
    @JsonSerialize() @JsonDeserialize()
  20. class MetricTypeDeserializer extends JsonDeserializer[MetricType]

  21. sealed case class Mode(value: String) extends Product with Serializable

    Big versus Fast data ingestion. Are we ingesting a file or a message stream?

    value

    : FILE or STREAM

    Annotations
    @JsonSerialize() @JsonDeserialize()
  22. class ModeDeserializer extends JsonDeserializer[Mode]

  23. case class Partition(sampling: Option[Double], attributes: Option[List[String]]) extends Product with Serializable

    sampling

    : 0.0 means no sampling; a value > 0 and < 1 samples the dataset with that ratio; a value >= 1 is an absolute number of partitions.

    attributes

    : Attributes used to partition the dataset.

    Annotations
    @JsonDeserialize()
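
    For instance (attribute names invented, and assuming the sampling ratio is used to estimate the partition layout):

      // Sketch only: partition the ingested dataset by year and month,
      // estimating partitioning from a 10% sample.
      val partition = Partition(
        sampling = Some(0.1),
        attributes = Some(List("year", "month"))
      )
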
  24. class PartitionDeserializer extends JsonDeserializer[Partition]

  25. case class Position(first: Int, last: Int) extends Product with Serializable

    Position of the attribute in the source file when the file format is POSITION.

    first

    : First char position

    last

    : Last char position

  26. sealed abstract case class PrimitiveType extends Product with Serializable

    Spark supported primitive types. These are the only valid raw types. Dataframe columns are converted to these types before the dataset is ingested.

    Annotations
    @JsonSerialize() @JsonDeserialize()
  27. class PrimitiveTypeDeserializer extends JsonDeserializer[PrimitiveType]

  28. sealed case class PrivacyLevel(value: String) extends Product with Serializable

    How should the attribute be transformed at ingestion time?

    value

    : Algorithm to use: NONE, HIDE, MD5, SHA1, SHA256, SHA512, AES

    Annotations
    @JsonSerialize() @JsonDeserialize()
  29. class PrivacyLevelDeserializer extends JsonDeserializer[PrivacyLevel]

  30. case class Schema(name: String, pattern: Pattern, attributes: List[Attribute], metadata: Option[Metadata], merge: Option[MergeOptions], comment: Option[String], presql: Option[List[String]], postsql: Option[List[String]], tags: Option[Set[String]] = None) extends Product with Serializable

    Dataset Schema

    name

    : Schema name, must be unique in the domain. Will become the Hive table name

    pattern

    : filename pattern to which this schema must be applied

    attributes

    : Dataset columns

    metadata

    : Dataset metadata

    comment

    : free text

    presql

    : SQL code executed before the file is ingested

    postsql

    : SQL code executed right after the file has been ingested
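
    A sketch of a schema bound to a filename pattern (names and regex are invented; pattern is a java.util.regex.Pattern as in the signature above):

      import java.util.regex.Pattern

      // Sketch only: applied to files named like orders-2019-01.csv.
      val orders = Schema(
        name = "orders",                         // becomes the Hive table name
        pattern = Pattern.compile("orders-.*\\.csv"),
        attributes = List(
          Attribute(name = "order_id"),
          Attribute(name = "amount", `type` = "double")
        ),
        metadata = None,                         // inherit domain-level metadata
        merge = None,
        comment = Some("Daily orders export"),
        presql = None,
        postsql = None
      )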

  31. sealed case class Stage(value: String) extends Product with Serializable

    Annotations
    @JsonSerialize() @JsonDeserialize()
  32. class StageDeserializer extends JsonDeserializer[Stage]

  33. sealed case class Trim(value: String) extends Product with Serializable

    How the attribute value should be trimmed at ingestion time.

    value

    : Trimming strategy to apply

    Annotations
    @JsonSerialize() @JsonDeserialize()
  34. class TrimDeserializer extends JsonDeserializer[Trim]

  35. case class Type(name: String, pattern: String, primitiveType: PrimitiveType = PrimitiveType.string, zone: Option[String] = None, sample: Option[String] = None, comment: Option[String] = None, indexMapping: Option[IndexMapping] = None) extends Product with Serializable

    Semantic Type

    name

    : Type name

    pattern

    : Pattern used to check that the input data matches

    primitiveType

    : Spark Column Type of the attribute

  36. case class Types(types: List[Type]) extends Product with Serializable

    List of globally defined types

    types

    : Type list
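
    A sketch of a semantic type and the registry holding it (the regex is illustrative only):

      // Sketch only: an "email" semantic type backed by the string primitive.
      val email = Type(
        name = "email",
        pattern = "[^@ ]+@[^@ ]+\\.[^@ ]+",  // input data must match this pattern
        primitiveType = PrimitiveType.string,
        comment = Some("Email address")
      )

      val types = Types(List(email))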

  37. class WriteDeserializer extends JsonDeserializer[WriteMode]

  38. sealed case class WriteMode(value: String) extends Product with Serializable

    During ingestion, should the data be appended to the existing data or should it replace it? See Spark SaveMode for more options.

    value

    : OVERWRITE / APPEND / ERROR_IF_EXISTS / IGNORE.

    Annotations
    @JsonSerialize() @JsonDeserialize()
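
    As a hedged sketch (toSaveMode is a hypothetical helper, not part of this package), the values above line up with Spark's SaveMode as follows:

      import org.apache.spark.sql.SaveMode

      // Hypothetical mapping from WriteMode values to Spark's SaveMode.
      def toSaveMode(mode: WriteMode): SaveMode = mode.value match {
        case "OVERWRITE"       => SaveMode.Overwrite
        case "APPEND"          => SaveMode.Append
        case "ERROR_IF_EXISTS" => SaveMode.ErrorIfExists
        case "IGNORE"          => SaveMode.Ignore
        case other             => sys.error(s"Unknown write mode: $other")
      }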

Value Members

  1. object Format extends Serializable

  2. object IndexMapping extends Serializable

  3. object IndexSink extends Serializable

  4. object Metadata extends Serializable

  5. object MetricType extends Serializable

  6. object Mode extends Serializable

  7. object PrimitiveType extends Serializable

  8. object PrivacyLevel extends Serializable

  9. object Rejection

    Contains classes used to describe rejected records. Rejected records are stored in a parquet file in the rejected area. A rejected row contains:

    • the list of columns and, for each column, whether it has been accepted or not; a row is rejected if at least one of its columns is rejected.
  10. object Schema extends Serializable

  11. object Stage extends Serializable

  12. object Trim extends Serializable

  13. object WriteMode extends Serializable

  14. package atlas

  15. def combine(errors1: Either[List[String], Boolean], errors2: Either[List[String], Boolean]*): Either[List[String], Boolean]

  16. def duplicates(values: List[String], errorMessage: String): Either[List[String], Boolean]

    Utility to extract duplicates and their number of occurrences

    values

    : List of strings

    errorMessage

    : Error message that should contain placeholders for the value (%s) and the number of occurrences (%d)

    returns

    : Right(true) when no duplicates are found; otherwise Left with one formatted error message per duplicate and its number of occurrences
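
    A hedged usage sketch of duplicates and combine, assuming the convention suggested by the signatures (Left carries error messages, Right(true) means the check passed):

      // Hypothetical usage of the two validation helpers above.
      val schemaCheck = duplicates(
        List("orders", "customers", "orders"),
        "Schema %s is defined %d times"
      )
      // expected: Left(List("Schema orders is defined 2 times"))

      val dirCheck = duplicates(List("/in/sales"), "Directory %s is used %d times")
      // expected: Right(true), no duplicates

      val allChecks = combine(schemaCheck, dirCheck)
      // expected: Right(true) only if every check passed,
      // otherwise Left with all accumulated error messages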
