Calculate inter- and intra-cluster record uniqueness to measure the correctness of DME output
Spark 2.3.0 and later provide an API that calculates a clustering prediction score. It would be nice to integrate this into our DME plugin as an out-of-the-box confidence scoring model. More details about the Spark API: https:...
over 2 years ago
in Data Quality
On the Backlog
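To illustrate the kind of score this idea asks for, here is a minimal pure-Python sketch of a silhouette coefficient, which compares intra-cluster and inter-cluster distances. It assumes the Spark API referred to above is a silhouette-style clustering evaluator; the function name `silhouette_score` and the use of plain Euclidean distance are illustrative choices, not the DME plugin's or Spark's implementation.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_score(points, labels):
    """Mean silhouette over all points (hypothetical helper, for illustration)."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so divide by len - 1).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # tight, well-separated clusters
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters that cut across the data
```

A score close to 1 suggests records sit well inside their own cluster, while a score near or below 0 suggests records that may have been grouped incorrectly.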
Ability to parse multi-level complex JSON files in ZDP
The current JSON SerDe is unable to parse multi-level JSON. Can we add the OpenX SerDe, which is available as open source? Details: open source link: https://github.com/rcongiu/Hive-JSON-Serde; JAR files link: http://www.congiu.net/hive-json-serde/1.3.8/cdh5/
Need ability to have automated schema mapping of data being ingested
PS has implemented a schema-mapping mechanism for the data being ingested. Columns need not be in a fixed position within a data file; each is assigned to the right position within the Hive table at run time. It uses column headers to determine t...
almost 3 years ago
in Data Ingestion
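The header-based mapping described in this idea could be sketched roughly as follows. This is not the PS implementation; the function name `map_rows_to_schema`, the case-insensitive header matching, and the choice to fill missing columns with `None` are all assumptions made for illustration.

```python
import csv
import io


def map_rows_to_schema(file_text, table_columns, delimiter=","):
    """Reorder each data row to match the target table's column order.

    Hypothetical helper: headers are matched case-insensitively, and table
    columns that are absent from the file are filled with None.
    """
    reader = csv.reader(io.StringIO(file_text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty file, nothing to map
    # Position of each column within the incoming file.
    position = {name.strip().lower(): i for i, name in enumerate(header)}

    mapped = []
    for row in reader:
        out = []
        for column in table_columns:
            idx = position.get(column.lower())
            # Unknown column or short row: leave the cell empty.
            out.append(row[idx] if idx is not None and idx < len(row) else None)
        mapped.append(out)
    return mapped


sample = "name,id,city\nAda,1,London\nBob,2,Paris\n"
rows = map_rows_to_schema(sample, ["id", "name", "country"])
```

Here the file's column order differs from the table's, and the extra `city` column is dropped while the missing `country` column is left empty.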
Provide ability to comment and crowdsource business information on ZDP entities/fields.
Business users/data stewards would like to collaborate and share their comments and findings on the data sets. They would like to review these comments before the underlying data can be provisioned to downstream systems. Provide ability to capture user fe...
Scenario #1: For AWS GovCloud, it is mandatory to supply the S3 region (or endpoint) to successfully access any S3 buckets. In an S3 connection, we can easily provide this information, and hence file ingestion using a BDCA agent is possible. Howev...
Display ingestion history for db wizard, db import created entities in entity view ingestion history tab when Display is 'Ingested File Size Per Day'
Current behavior: The 'Ingested File Size Per Day' is shown only for entities associated with fil...
Ability to auto-tag entities based on the vocabulary/business glossary
Adding labels to an entity is a tedious and time-consuming effort. Provide the capability to auto-tag based on the business glossary or by referring to the business vocabulary. Some competitors are leveraging a modified Maui - Multi-purpose automatic to...
Enable data profiling on Hive Views from profile of underlying Tables
Enable data profiling on views in Zaloni by linking to the underlying table instead of physically creating a separate profile. From customer: Views should not have their own profiling, but profile information should come from the original table profiled from where ...
almost 3 years ago
On the Backlog
The customer would like to have details on row counts ingested from files. Today, ZDP 5.0.2 displays the File Size per Day and File Count per Day. The user would like to validate that these match in both a visual and cumulative report. Suggestion: Provid...
Incremental profiling of data
When an incremental ingestion (adding data to an existing entity) happens, can we provide a profile of the entire entity by only profiling the new data? For this customer, we will also need to make the incremental pro...
over 3 years ago
On the Roadmap
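One way to picture incremental profiling is to keep only mergeable statistics per column, profile the newly ingested batch on its own, and combine the result with the stored profile. The sketch below assumes a numeric column and a small set of statistics; `ColumnProfile`, `profile`, and `merge` are hypothetical names, not ZDP APIs.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ColumnProfile:
    count: int = 0                   # total values seen, including nulls
    nulls: int = 0                   # number of null values
    total: float = 0.0               # sum of non-null values
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    @property
    def mean(self) -> Optional[float]:
        non_null = self.count - self.nulls
        return self.total / non_null if non_null else None


def profile(values: Iterable[Optional[float]]) -> ColumnProfile:
    """Profile a single batch of values."""
    p = ColumnProfile()
    for v in values:
        p.count += 1
        if v is None:
            p.nulls += 1
            continue
        p.total += v
        p.minimum = v if p.minimum is None else min(p.minimum, v)
        p.maximum = v if p.maximum is None else max(p.maximum, v)
    return p


def merge(a: ColumnProfile, b: ColumnProfile) -> ColumnProfile:
    """Combine two profiles without re-reading the data behind either one."""
    mins = [m for m in (a.minimum, b.minimum) if m is not None]
    maxs = [m for m in (a.maximum, b.maximum) if m is not None]
    return ColumnProfile(
        count=a.count + b.count,
        nulls=a.nulls + b.nulls,
        total=a.total + b.total,
        minimum=min(mins) if mins else None,
        maximum=max(maxs) if maxs else None,
    )


existing = profile([1, 2, None])     # profile stored from earlier ingestions
new_batch = profile([3, None, 5])    # profile of the incremental data only
combined = merge(existing, new_batch)
```

Counts, null counts, sums, minimums, and maximums merge exactly in this way. Statistics such as exact distinct counts or medians do not, so they would need approximate sketches or a full re-profile.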