Building Legal Datasets

Jerrold Soh

2021 · ArXiv: 2111.02034
arXiv.org · 4 Citations

TLDR

Key legal obligations surrounding ML datasets are reviewed, the practical impact of data laws on ML pipelines is examined, and a framework for building legal datasets is offered.

Abstract

Data-centric AI calls for better, not just bigger, datasets. As data protection laws with extra-territorial reach proliferate worldwide, ensuring datasets are legal is an increasingly crucial yet overlooked component of "better". To help dataset builders become more willing and able to navigate this complex legal space, this paper reviews key legal obligations surrounding ML datasets, examines the practical impact of data laws on ML pipelines, and offers a framework for building legal datasets.
