Validation Free and Replication Robust Volume-based Data Valuation

Dec 6, 2021



Data valuation is a non-trivial challenge in use cases such as collaborative data sharing and data markets. The value of data is often tied to the learning performance (e.g., validation accuracy) of a model trained on the data. While intuitive, this approach tightly couples data valuation to validation, which may not be desirable in practice. For instance, data providers may disagree on the choice of the validation set, or the validation set may be statistically different from the data seen in the actual application. A separate but practical issue is data replication: if some data points are valuable, a dishonest data provider may offer a dataset containing replications of these points in an attempt to exploit the valuation and obtain a higher reward or payment. Our data valuation method, based on the ordinary least squares framework, does not require validation and still provides a useful connection between the value of data and learning performance. In particular, we use the volume of the data matrix (the determinant of its left Gram matrix), which gives an intuitive interpretation of the value of data in terms of the diversity in the data. Furthermore, we formalize robustness to data replication and propose a robust volume valuation with robustness guarantees. We conduct extensive experiments to demonstrate its consistency and practical advantages over existing baselines.
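
To make the volume idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the data matrix X has one data point per row, takes the left Gram matrix to be X^T X, and defines the volume as the square root of its determinant; the square root and the helper names (gram, det, volume) are assumptions made for illustration. It shows that a diverse dataset scores higher than a nearly collinear one, and that naive replication inflates the plain volume, which is the weakness a replication-robust valuation is meant to address.

```python
import math

def gram(X):
    """Left Gram matrix X^T X of an n x d data matrix given as a list of rows."""
    d = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(d)]
            for i in range(d)]

def det(A):
    """Determinant via Gaussian elimination with partial pivoting."""
    A = [list(r) for r in A]  # work on a copy
    n = len(A)
    result = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(A[r][k]))
        if abs(A[pivot][k]) < 1e-12:
            return 0.0  # (numerically) singular matrix
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]
            result = -result  # a row swap flips the sign
        result *= A[k][k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
    return result

def volume(X):
    """Assumed definition: sqrt(det(X^T X)); clamps tiny negative round-off."""
    return math.sqrt(max(det(gram(X)), 0.0))

# Hypothetical toy datasets with d = 2 features.
diverse = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
redundant = [[1.0, 0.0], [1.0, 0.01], [1.0, -0.01]]
replicated = diverse + diverse  # every point offered twice

vol_diverse = volume(diverse)
vol_redundant = volume(redundant)
vol_replicated = volume(replicated)
```

Under these assumptions, duplicating every row doubles X^T X, so the determinant grows by a factor of 2^d and the volume by 2^(d/2). The plain volume therefore rewards copying, even though no new information is added.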


About NeurIPS 2021

Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.
