Data at Risk: The Governance Challenge of Generative AI

Image: Rob Robinson, ComplexDiscovery with AI.

[EDRM Editor’s Note: This article was first published by ComplexDiscovery on June 16, 2025, and EDRM is grateful to Rob Robinson, editor and managing director of Trusted Partner ComplexDiscovery, for permission to republish.]


ComplexDiscovery Editor’s Note: As Generative AI technologies increasingly shape enterprise workflows and public decision-making, the governance of data—its origins, usage, and accountability—faces mounting pressure. This article draws from the European Commission’s comprehensive 2025 report, Generative AI Outlook Report – Exploring the Intersection of Technology, Society and Policy (JRC142598), to examine the legal, ethical, and operational challenges emerging at the crossroads of data sovereignty and artificial intelligence.

This narrative explores how models trained on vast, minimally curated datasets are exposing gaps in frameworks like the GDPR, particularly when it comes to consent, purpose limitation, and accountability. It delves into the increasing difficulty of tracking data provenance and ensuring compliance in systems where transparency often ends at the model boundary.

For professionals in privacy, compliance, and information governance, the article offers a timely and critical lens to reconsider whether current data policies can withstand the scale and opacity of generative AI. It argues for a shift from static regulation to proactive, lifecycle-oriented governance practices that better reflect the realities of AI-driven data use.


In today’s AI-driven digital environment, data has often been likened to oil—valuable, extractable, and central to innovation. Yet in the context of generative artificial intelligence (GenAI), data is something far more unstable—less a commodity and more a catalytic force capable of reshaping legal norms, institutional governance, and the meaning of consent itself. At the heart of this transformation is a question that the European Commission’s Generative AI Outlook Report poses, directly and indirectly: Who controls the data that trains the machines now shaping our society?

For professionals tasked with stewarding sensitive data—chief privacy officers, information governance strategists, and compliance experts—GenAI introduces a tangle of dilemmas not easily solved by traditional policy frameworks. While the General Data Protection Regulation (GDPR) has stood as a pillar of European digital rights, it was crafted before the emergence of models capable of learning from and generating content with massive unstructured datasets scraped from public and semi-public domains. As these models become more embedded in both public services and enterprise software, the limitations of current law become increasingly visible.

The European Union’s regulatory ecosystem now includes the AI Act, designed to promote trustworthy and ethical AI systems, and it is meant to complement the GDPR. But complementarity, in principle, does not always mean clarity in practice. The report underscores a critical disconnect between how data is collected and how it is ultimately used. For example, while consent may have been given for a particular use of personal data—say, for customer service or medical recordkeeping—GenAI models may repurpose that data during training in ways the original subject neither anticipated nor approved.

This disjunction between intent and application reveals the deep structural problem facing modern data governance: the lack of transparency in how training data is selected, labeled, and retained. Unlike traditional databases, where records can be audited and traced, GenAI models are trained on inputs that often lack documented provenance. Once ingested into a model, this data is transformed, abstracted, and distributed across a statistical lattice that defies straightforward tracing. The resulting system is not a ledger of inputs but an emergent capability that can reproduce sensitive information—sometimes without even being prompted to do so.

That capability has already drawn legal and regulatory scrutiny. Cases against companies like OpenAI and Meta are exploring whether scraping publicly accessible data for training purposes constitutes a breach of privacy law. The JRC report cites mounting concerns about whether publicly available data can be assumed to be lawful training material. Just because data can be accessed does not mean it was offered freely or that its reuse was understood or consented to. Legal scholars call this the lawful-unlawful paradox: training that complies with the letter of access law may still violate the spirit or application of data protection norms.

The report further highlights a fundamental tension within modern AI development—between the need for massive datasets and the legal principle of data minimization. GenAI thrives on diversity and scale. The more examples it can digest, the more fluent and flexible it becomes. But this hunger for data runs directly counter to GDPR’s insistence on using only what is necessary for a defined purpose. GenAI’s general-purpose nature breaks the mold, requiring a fresh debate on what constitutes acceptable data use when the boundaries of function are fluid.

Compounding this is the issue of accountability. Traditional data systems typically assign responsibility to a clear data controller. But when GenAI is involved, roles are diffuse. Is the developer responsible for the training data? What about the vendor who fine-tunes the model? Or the enterprise client who integrates it into their services? The JRC report cautions that our current understanding of accountability may be insufficient for AI systems that morph through use and scale without direct human oversight.

Emerging concepts like “data visiting” aim to reduce the exposure of sensitive information by moving algorithms to where the data resides rather than copying data into centralized repositories. Similarly, the report recommends the adoption of FAIR principles—ensuring data is findable, accessible, interoperable, and reusable—as a way to align governance practices with modern data ecosystems. These efforts suggest that governance must evolve from static compliance checklists to dynamic lifecycle strategies that address risks at the point of data collection, during model training, and long after deployment.
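To make the "data visiting" idea concrete, here is a minimal Python sketch. It is an illustration only and is not drawn from the JRC report: the `DataSite` class, the `visit` method, and the sample values are all hypothetical. The sketch assumes each data holder runs a visiting algorithm locally and releases only an aggregate summary, never the underlying records.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DataSite:
    """Hypothetical data holder that never exports its raw records."""
    name: str
    _records: Sequence[float]  # sensitive values stay inside the site

    def visit(self, algorithm: Callable[[Sequence[float]], dict]) -> dict:
        # The algorithm runs where the data lives; only its summary leaves.
        return algorithm(self._records)


def local_summary(records: Sequence[float]) -> dict:
    """Aggregate computed on-site: a count and a sum, no individual rows."""
    return {"count": len(records), "total": sum(records)}


def pooled_mean(sites: Sequence[DataSite]) -> float:
    """Combine per-site summaries into a global mean without centralizing data."""
    summaries = [site.visit(local_summary) for site in sites]
    total = sum(s["total"] for s in summaries)
    count = sum(s["count"] for s in summaries)
    return total / count


# Illustrative, made-up values held at two separate sites.
sites = [
    DataSite("hospital_a", [4.0, 6.0]),
    DataSite("hospital_b", [10.0]),
]
result = pooled_mean(sites)
```

The design point is that the central party only ever sees the dictionaries returned by `visit`. Aggregates can still leak information about individuals when a site holds very few records, so a real deployment would likely need further safeguards that this sketch does not attempt.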

Beyond compliance, there is a broader societal implication. The opacity of GenAI systems exacerbates the existing trust deficit between institutions and the public. If people cannot understand how their data is being used—or if they cannot even discover that it has been used at all—how can they meaningfully participate in digital society? This question is not just regulatory; it is democratic.

The future of information governance lies in shifting from reactive enforcement to proactive design. The systems we build must account for context, consent, and consequence—not just compliance. As GenAI technologies become fixtures in everything from legal contracts to healthcare diagnostics, the frameworks we develop today will determine not only how data is protected but whether the people it represents are truly respected.



About ComplexDiscovery OÜ

ComplexDiscovery OÜ is a highly recognized digital publication providing insights into cybersecurity, information governance, and eDiscovery. Based in Estonia, ComplexDiscovery OÜ delivers nuanced analyses of global trends, technology advancements, and the legal technology sector, connecting intricate issues with the broader narrative of international business and current events. Learn more at ComplexDiscovery.com.


Source: ComplexDiscovery OÜ
Assisted by GAI and LLM Technologies per EDRM GAI and LLM Policy.

Author

  • Rob Robinson

    Rob Robinson is a technology marketer who has held senior leadership positions with multiple top-tier data and legal technology providers. He writes frequently on technology and marketing topics and publishes regularly on ComplexDiscovery.com, of which he is the Managing Director.
