We have observed that when FDQ is run over a large number of rows with multiple rules, the generated report is huge.
A client reported that for an input file of about 1.5 GB containing 2.5 crore (25 million) records, the FDQ report came to about 350 GB, which is not expected. That is roughly 14 KB of report per record, compared with roughly 60 bytes of input per record.
After discussing this with the client, we learned that they are only interested in the failed records, i.e. the records that failed at least one rule. Currently, however, the report shows the status of every row.
Hence, I propose that we provide a way to limit this reporting. A flag could select whether the report covers only the good (passing) rows, only the bad (failing) rows, or both. The report should also carry only the essential information, rather than the large JSON it produces today.
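A minimal sketch of what such a flag and a compact report could look like. FDQ's internals are not described in this ticket, so everything below is hypothetical: the `ReportMode` flag values, the `RowResult` type, the `write_report` function and the three-column CSV layout are illustrative assumptions, not FDQ's actual API or output format. The idea is to filter rows by mode while streaming and to record only the row id, a pass/fail status and the names of the failed rules.

```python
import csv
import io
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, TextIO


class ReportMode(Enum):
    """Hypothetical flag: which rows end up in the report."""
    GOOD = "good"   # only rows that passed every rule
    BAD = "bad"     # only rows that failed at least one rule
    BOTH = "both"   # every row (what the report contains today)


@dataclass
class RowResult:
    """Hypothetical per-row outcome of running all rules on one record."""
    row_id: int
    failed_rules: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failed_rules


def write_report(results: Iterable[RowResult], mode: ReportMode, out: TextIO) -> int:
    """Stream a compact CSV report, keeping only the rows selected by `mode`.

    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out)
    writer.writerow(["row_id", "status", "failed_rules"])
    written = 0
    for r in results:
        # Skip rows the caller did not ask for.
        if mode is ReportMode.GOOD and not r.passed:
            continue
        if mode is ReportMode.BAD and r.passed:
            continue
        status = "PASS" if r.passed else "FAIL"
        # Only the failed rule names are stored, not the full row payload.
        writer.writerow([r.row_id, status, ";".join(r.failed_rules)])
        written += 1
    return written


if __name__ == "__main__":
    sample = [
        RowResult(1),
        RowResult(2, ["not_null:email"]),
        RowResult(3),
        RowResult(4, ["range:age", "regex:phone"]),
    ]
    buf = io.StringIO()
    count = write_report(sample, ReportMode.BAD, buf)
    print(count)
    print(buf.getvalue())
```

With `ReportMode.BAD`, only rows 2 and 4 are written, which matches the client's need. Because rows are written as they are produced, a full run never has to hold the whole report in memory.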
For details, please check the attached Jira ticket.
|Customer Impact||Major inconvenience|