checksum_record Transform
The checksum_record transform creates a checksum based on values in a record and saves it in the factorytx_checksum column. This field is saved outside of a record's data.fieldvalues and is moved outside of the record's data structure when the record is received by an MDP environment. It is recommended to place this transform last in the list of transforms, so that the record has all of its data as well as an explicit timestamp column.
- Example: If we want to create a checksum from the asset and timestamp columns of a record, our configuration will look something like this:

```json
{
    "transform_name": "Checksum",
    "transform_type": "checksum_record",
    "filter_stream": ["*"],
    "record_keys": ["asset", "timestamp"]
}
```
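For intuition, the sketch below shows one way a checksum over the asset and timestamp values could be produced with Python's hashlib. It is an illustrative assumption only: the record contents, the separator, and the MD5 digest are not the transform's documented algorithm or output format.

```python
import hashlib

# Hypothetical record for illustration; the asset name and timestamp value are assumptions.
record = {"asset": "Press_01", "timestamp": "2019-09-30T12:12:12"}

# Join the configured record_keys in order and hash the result. The real
# transform's serialization and digest algorithm may differ.
payload = "|".join(str(record[key]) for key in ["asset", "timestamp"])
checksum = hashlib.md5(payload.encode("utf-8")).hexdigest()
print(checksum)  # value that would land in the factorytx_checksum column
```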
Configuration:

Required and optional properties that can be configured for a checksum_record transform.

- transform_name: Unique name for the transform.
- transform_type: Type of transform to apply. Should be checksum_record.
- filter_stream: List of data streams to transform. Each stream can either be * (all) or asset:stream.
- record_keys: List of keys in a record used to generate the record's checksum. Supported keys are asset, stream_type, timestamp, and fieldvalues (values in a record). Please refer to the Caveats section for more implementation details.
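As a further illustration, a configuration that restricts the transform to a single stream and includes every supported key in the checksum might look like the following; the asset and stream names ("Press_01", "temperature") are hypothetical.

```json
{
    "transform_name": "Checksum full record",
    "transform_type": "checksum_record",
    "filter_stream": ["Press_01:temperature"],
    "record_keys": ["asset", "stream_type", "timestamp", "fieldvalues"]
}
```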
Caveats:

- record_keys order: The order of the keys specified in the record_keys config setting affects the checksum. For example, checksums from ['asset', 'stream_type'] will not equal checksums from ['stream_type', 'asset'].
- timestamp hash: The value in the timestamp column is converted into a string for hashing, so the data type of the value affects the hash. For example, a datetime object (datetime(2019, 9, 30, 12, 12, 12)) will not yield the same hash as its equivalents in ISO 8601 string format ('2019-09-30T12:12:12') or epoch time format (1569845532.0). If there is no timestamp value, or if it is NaT, the timestamp will not be included in the checksum.
- fieldvalues hash: Values in a record will be sorted in alphanumerical order by their keys and then serialized into a JSON string, using this conversion table.
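The short Python sketch below illustrates the timestamp and fieldvalues caveats. The MD5 digest and the exact serialization are assumptions for demonstration, not the transform's internal implementation.

```python
import hashlib
import json
from datetime import datetime

def demo_hash(value: str) -> str:
    # Illustrative digest only; the transform's actual hash function is not documented here.
    return hashlib.md5(value.encode("utf-8")).hexdigest()

# timestamp caveat: the same instant in three representations hashes to three
# different digests, because each converts to a different string.
dt = datetime(2019, 9, 30, 12, 12, 12)
print(demo_hash(str(dt)))                # '2019-09-30 12:12:12'
print(demo_hash("2019-09-30T12:12:12"))  # ISO 8601 string
print(demo_hash(str(1569845532.0)))      # epoch seconds

# fieldvalues caveat: keys are sorted before JSON serialization, so two records
# whose fields were recorded in different orders serialize identically.
fieldvalues_a = {"temperature": 72.5, "pressure": 1.2}
fieldvalues_b = {"pressure": 1.2, "temperature": 72.5}
print(json.dumps(fieldvalues_a, sort_keys=True) == json.dumps(fieldvalues_b, sort_keys=True))  # True
```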