Zaloni Ideas
Status Future consideration
Categories Data Ingestion
Created by Adi Bandaru
Created on Feb 18, 2020

Support record-level insert, update, and delete on DFS using Apache Hudi

Apache Hudi (Hadoop Upserts Deletes and Incrementals) is an open-source data management framework that simplifies incremental data processing and data pipeline development. It lets you manage data at the record level in DFS storage, which simplifies Change Data Capture (CDC) and streaming data ingestion, and it provides a framework for data privacy use cases that require record-level updates and deletes. Data sets managed by Hudi are stored in DFS using open storage formats, and integrations with Presto, Apache Hive, Apache Spark, and the AWS Glue Data Catalog give near real-time access to updated data through familiar tools. Supporting Hudi would also address GDPR-related use cases.

https://hudi.apache.org/
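To make "record-level insert, update, and delete" concrete, here is a toy in-memory model in Python of the semantics being requested. It is not Hudi's API or storage format, and the class and method names are hypothetical. Loosely following Hudi's write options `hoodie.datasource.write.recordkey.field` and `hoodie.datasource.write.precombine.field`, it assumes each record carries a key plus an ordering field used to pick the latest version when one batch contains duplicates, and that an incremental read returns records written after a given commit time.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class ToyKeyedTable:
    """Hypothetical in-memory stand-in for a record-keyed table (not Hudi's real layout)."""
    key_field: str
    precombine_field: str
    # key -> (commit_time, record); commit times are strings that sort chronologically
    rows: Dict[Any, Tuple[str, Record]] = field(default_factory=dict)

    def upsert(self, batch: List[Record], commit_time: str) -> None:
        # Within one batch, keep only the record with the largest precombine value per key
        # (on ties, the later record in the batch wins).
        latest: Dict[Any, Record] = {}
        for rec in batch:
            k = rec[self.key_field]
            if k not in latest or rec[self.precombine_field] >= latest[k][self.precombine_field]:
                latest[k] = rec
        # Existing keys are overwritten (update); unseen keys are added (insert).
        for k, rec in latest.items():
            self.rows[k] = (commit_time, dict(rec))

    def delete(self, keys: List[Any]) -> None:
        # Record-level delete: remove only the listed keys, e.g. for an erasure request.
        for k in keys:
            self.rows.pop(k, None)

    def incremental(self, begin_commit: str) -> List[Record]:
        # Records whose most recent write happened strictly after begin_commit.
        return [rec for ct, rec in self.rows.values() if ct > begin_commit]

    def snapshot(self) -> List[Record]:
        # Current view of every live record.
        return [rec for _, rec in self.rows.values()]


t = ToyKeyedTable(key_field="id", precombine_field="ts")
t.upsert([{"id": 1, "name": "a", "ts": 1}, {"id": 2, "name": "b", "ts": 1}], commit_time="001")
t.upsert([{"id": 1, "name": "a2", "ts": 2}, {"id": 1, "name": "stale", "ts": 0}], commit_time="002")
t.delete([2])
print(sorted(r["name"] for r in t.snapshot()))   # ['a2']
print([r["name"] for r in t.incremental("001")])  # ['a2']
```

The second batch shows why an ordering field matters: two versions of key 1 arrive together, and only the newer one (`ts=2`) is kept. In real Hudi, how an incoming record merges with the stored one depends on the configured payload class, so treat this as an approximation of the idea rather than of Hudi's exact behavior.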

  • PRODUCT MANAGEMENT RESPONSE
    Feb 19, 2020

    We may reconsider this enhancement request at a future date.
