Zaloni Ideas
Status On the Backlog
Categories Data Quality
Created by Jitul Nath
Created on Apr 12, 2020

Calculate inter-cluster and intra-cluster record uniqueness to measure the correctness of DME output

Spark 2.3.0 and later provide an API that calculates a clustering prediction score (the Silhouette measure). It would be nice to integrate this into our DME plugin as an out-of-the-box confidence scoring model.

More details about the Spark API:

https://spark.apache.org/docs/latest/ml-clustering.html

Algorithm

https://en.wikipedia.org/wiki/Silhouette_(clustering)
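
To make the proposed score concrete, here is a minimal sketch of the Silhouette calculation in plain Python. It is an illustration only, not the Spark implementation and not existing DME code: the function name `silhouette_score`, the sample points, and the choice of plain Euclidean distance are all assumptions made for this example. For each record it compares the mean distance to records in its own cluster (intra-cluster, `a`) with the mean distance to the nearest other cluster (inter-cluster, `b`).

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean Silhouette coefficient over all points (hypothetical helper).

    Assumes plain Euclidean distance; other distance measures give other values.
    """
    # Group point indices by their cluster label.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("Silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the given members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        a = mean_dist(i, own)  # intra-cluster cohesion
        b = min(  # separation from the nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:  # identical points everywhere contribute 0
            total += (b - a) / denom
    return total / len(points)


# Illustrative data: two tight, well-separated groups of records.
sample_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(sample_points, [0, 0, 1, 1])  # close to 1
bad = silhouette_score(sample_points, [0, 1, 0, 1])   # below 0
```

A score near 1 suggests records sit well inside their assigned cluster, a score near 0 suggests overlapping clusters, and a negative score suggests records may have been assigned to the wrong cluster. Values from this sketch may differ from those of the Spark evaluator, depending on the distance measure it is configured with.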

Customer Impact Major inconvenience