Class DatasetProfile

  • All Implemented Interfaces:
    java.io.Serializable

    public class DatasetProfile
    extends java.lang.Object
    implements java.io.Serializable
    Represents a dataset profile that tracks a ColumnProfile for each column, along with the dataset's tags and metadata.
    See Also:
    Serialized Form
    • Constructor Detail

      • DatasetProfile

        public DatasetProfile(@NonNull java.lang.String sessionId,
                              @NonNull java.time.Instant sessionTimestamp,
                              @Nullable java.time.Instant dataTimestamp,
                              @NonNull java.util.Map<java.lang.String,java.lang.String> tags,
                              @NonNull java.util.Map<java.lang.String,ColumnProfile> columns)
        DEVELOPER API. DO NOT USE DIRECTLY
        Parameters:
        sessionId - dataset name
        sessionTimestamp - the timestamp for the current profiling session
        dataTimestamp - the timestamp for the dataset. Used to aggregate across different cadences
        tags - tags of the dataset
        columns - the columns to copy over. The caller should stop using these column objects afterwards, because they back this DatasetProfile from then on
      • DatasetProfile

        public DatasetProfile(@NonNull java.lang.String sessionId,
                              @NonNull java.time.Instant sessionTimestamp,
                              @NonNull java.util.Map<java.lang.String,java.lang.String> tags)
        Creates a new DatasetProfile.
        Parameters:
        sessionId - the name of the dataset profile
        sessionTimestamp - the timestamp for this run
        tags - the tags to track the dataset with
      • DatasetProfile

        public DatasetProfile​(java.lang.String sessionId,
                              java.time.Instant sessionTimestamp)
    • Method Detail

      • getColumns

        public java.util.Map<java.lang.String,​ColumnProfile> getColumns()
      • withTag

        public DatasetProfile withTag​(java.lang.String key,
                                      java.lang.String value)
      • withMetadata

        public DatasetProfile withMetadata​(java.lang.String key,
                                           java.lang.String value)
      • withAllMetadata

        public DatasetProfile withAllMetadata​(java.util.Map<java.lang.String,​java.lang.String> metadata)
      • track

        public void track​(java.lang.String columnName,
                          java.lang.Object data)
      • track

        public void track​(java.util.Map<java.lang.String,​?> columns)
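        The two track overloads are undocumented here. One plausible reading is that the Map form treats each entry as a (column name, value) pair and records it as if track(columnName, data) had been called once per entry. The sketch below illustrates that reading only. CountingProfile is a hypothetical stand-in that merely counts observations per column; it is not the whylogs implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a profile (not a whylogs class).
// It only counts how many values were tracked for each column.
final class CountingProfile {
    private final Map<String, Long> counts = new HashMap<>();

    // Single-value form: record one observation for one column.
    void track(String columnName, Object data) {
        counts.merge(columnName, 1L, Long::sum);
    }

    // Map form (assumed behavior): delegate each entry to the single-value form.
    void track(Map<String, ?> columns) {
        for (Map.Entry<String, ?> entry : columns.entrySet()) {
            track(entry.getKey(), entry.getValue());
        }
    }

    // Number of observations recorded for a column, or 0 if it was never seen.
    long count(String columnName) {
        return counts.getOrDefault(columnName, 0L);
    }
}
```

        Under this reading, tracking a whole record as a map and tracking its fields one at a time leave the profile in the same state.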
      • withClassificationModel

        public DatasetProfile withClassificationModel​(java.lang.String prediction,
                                                      java.lang.String target,
                                                      java.lang.String score,
                                                      java.lang.Iterable<java.lang.String> additionalOutputFields)
        Returns a new DatasetProfile that shares the same backing data structure as this one and additionally holds a ClassificationMetrics object.
        Returns:
        a new DatasetProfile object
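        The phrase "same backing data structure" suggests that the returned object is a second view over the same column data, not a copy. The sketch below shows what that implies for callers. ProfileView, withModel and the count-based column store are hypothetical names used for illustration, and the sharing behavior is an assumption drawn from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in (not a whylogs class) showing two objects
// that share one backing map.
final class ProfileView {
    private final Map<String, Long> columns; // shared between views, never copied
    private final String modelTarget;        // null when no model is attached

    ProfileView() {
        this(new HashMap<>(), null);
    }

    private ProfileView(Map<String, Long> columns, String modelTarget) {
        this.columns = columns;
        this.modelTarget = modelTarget;
    }

    // Returns a new object that reuses the same column map.
    ProfileView withModel(String target) {
        return new ProfileView(columns, target);
    }

    void track(String column) {
        columns.merge(column, 1L, Long::sum);
    }

    long count(String column) {
        return columns.getOrDefault(column, 0L);
    }

    boolean hasModel() {
        return modelTarget != null;
    }
}
```

        If the real class behaves this way, data tracked through either object is visible through the other, so it is safest to keep using only the returned profile.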
      • withClassificationModel

        public DatasetProfile withClassificationModel​(java.lang.String prediction,
                                                      java.lang.String target,
                                                      java.lang.String score)
      • withRegressionModel

        public DatasetProfile withRegressionModel​(java.lang.String prediction,
                                                  java.lang.String target)
      • withRegressionModel

        public DatasetProfile withRegressionModel​(java.lang.String prediction,
                                                  java.lang.String target,
                                                  java.lang.Iterable<java.lang.String> additionalOutputFields)
      • toSummary

        public com.whylogs.core.message.DatasetSummary toSummary()
      • toChunkIterator

        public java.util.Iterator<com.whylogs.core.message.MessageSegment> toChunkIterator()
      • merge

        public DatasetProfile merge(@NonNull DatasetProfile other)
        Merge the data of another DatasetProfile into this one.

        Only the shared tags and shared metadata are retained. The timestamps are copied over from this DatasetProfile. It is the caller's responsibility to ensure that the two profiles match on important grouping information.

        Parameters:
        other - a DatasetProfile
        Returns:
        a merged DatasetProfile with summed up columns
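        The description does not define what "shared" means for tags. The sketch below assumes that a tag is shared only when both profiles hold the same key with an equal value. SharedTags and retainShared are hypothetical names, and this is an illustration of that assumption, not the library's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of whylogs): one reading of
// "retain only the shared tags" when two profiles are merged.
final class SharedTags {
    private SharedTags() {}

    // Keeps an entry only if both maps hold the same key with an equal value.
    static Map<String, String> retainShared(Map<String, String> mine,
                                            Map<String, String> theirs) {
        Map<String, String> shared = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mine.entrySet()) {
            String value = entry.getValue();
            if (value != null && value.equals(theirs.get(entry.getKey()))) {
                shared.put(entry.getKey(), value);
            }
        }
        return shared;
    }
}
```

        Under this assumption, merging a profile tagged env=prod, region=us with one tagged env=prod, region=eu keeps only env=prod. Tags that matter for grouping should therefore agree before the merge.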
      • toProtobuf

        public com.whylogs.core.message.DatasetProfileMessage.Builder toProtobuf()
      • writeTo

        public void writeTo​(java.io.OutputStream out)
                     throws java.io.IOException
        Throws:
        java.io.IOException
      • toBytes

        public byte[] toBytes()
                       throws java.io.IOException
        Throws:
        java.io.IOException
      • fromProtobuf

        @Nullable
        public static DatasetProfile fromProtobuf​(@Nullable
                                                  com.whylogs.core.message.DatasetProfileMessage message)
      • parse

        public static DatasetProfile parse​(java.io.InputStream in)
                                    throws java.io.IOException
        Throws:
        java.io.IOException
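        Taken together, writeTo, toBytes and parse form a serialization round trip. The sketch below shows the shape of that round trip with a hypothetical TinyProfile that stores only a session id and uses DataOutputStream instead of the protobuf format. Implementing toBytes as writeTo into an in-memory buffer is an assumption, not something this page confirms.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in (not a whylogs class) with the same three
// method shapes: writeTo, toBytes and a static parse.
final class TinyProfile {
    final String sessionId;

    TinyProfile(String sessionId) {
        this.sessionId = sessionId;
    }

    // Writes the profile to any stream; the caller owns and closes the stream.
    void writeTo(OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeUTF(sessionId);
        data.flush();
    }

    // Assumed relationship: toBytes is writeTo into an in-memory buffer.
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeTo(buffer);
        return buffer.toByteArray();
    }

    // Reads back what writeTo produced.
    static TinyProfile parse(InputStream in) throws IOException {
        return new TinyProfile(new DataInputStream(in).readUTF());
    }
}
```

        With the real class the same pattern would apply: pass the output of toBytes(), wrapped in a java.io.ByteArrayInputStream, to parse, and handle the declared java.io.IOException on both sides.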