Digital Fairness, AI Self-Regulation and the Limits of Voluntary Safety Frameworks

This post was written by EFA Vice Chair Julian Watchorn and EFA board member Kirsten Frederiksen.

1. The Limits of Corporate Self-Regulation

A longstanding axiom in public policy is that corporate self-regulation is ineffective when vast sums of money are at stake. This has been demonstrated repeatedly across sectors such as finance, environmental protection, tobacco, gambling, and digital platforms. Where commercial incentives strongly favour rapid growth, market dominance, or first-mover advantage, voluntary safeguards tend to weaken, drift over time, or be selectively applied. In technology and AI markets, these risks are intensified.

Products can be deployed at a global scale almost instantly, competitive pressure rewards speed over caution, and many harms emerge indirectly or only after widespread adoption. In this context, internal ethical principles and safety frameworks do not operate as neutral constraints. They are embedded within commercial governance structures and remain vulnerable to reinterpretation when they conflict with strategic or financial objectives.

2. AI Safety Frameworks and the Problem of Moving Goalposts

Recent developments in AI governance provide a concrete illustration of why reliance on voluntary, company-defined safety frameworks is insufficient. In many such frameworks, a Critical Capability Level (CCL) denotes an internally defined threshold at which an AI system is deemed to pose a heightened risk and therefore requires additional safeguards.

Uplift levels, by contrast, are estimates of how much an AI system “uplifts”, or assists, a specific action, and they denote progressively more stringent risk-control categories as capability increases. Prior to the release of Gemini Pro 3, Google updated its internal Frontier Safety Framework (FSF). Under the earlier version of the FSF, released in February 2025 (FSF v2), Gemini Pro 3 may have reached Google’s CCL for cybersecurity risks, specifically Cybersecurity CCL Uplift Level 1.

However, before Gemini Pro 3’s release in November 2025, the FSF itself was revised in September 2025 (FSF v3), and the evaluation benchmarks were also updated:

  • Change in definition of Cybersecurity CCL Uplift Level 1: The definition shifted from a focus on how far an AI system increases attacker capability and reduces attack costs (FSF v2) to a focus on severe downstream outcomes or harms (FSF v3). Compare “significantly assist with high-impact cyber attacks, resulting in overall cost/resource reductions of an order of magnitude or more” (FSF v2) with “sufficient uplift with high-impact cyber attacks for additional expected harm at severe scale” (FSF v3).
  • Change in evaluation benchmark for Cybersecurity CCL Uplift Level 1: The evaluation benchmark used as a proxy for uplift capability was also updated. The “hard v1” set of challenges used to test Gemini Pro 3 appears to be a subset of the “difficult” initial challenges from earlier FSFs, which Google had reported on previously (for example, testing Gemini Pro 2.5 Deep Think in August 2025 under FSF v2). Gemini Pro 3 reportedly solved 11 of 12 of these “hard v1” cybersecurity challenges, while the earlier Gemini Pro 2.5 models reportedly solved only 6 of 12 challenges. A new benchmark, “v2” (FSF v3), was introduced, with Google stating that, “anticipating that these [challenges] would soon become inadequate for measuring models’ growing capabilities, we worked … to develop a new set of harder, more realistic challenges.” Gemini Pro 3 reportedly solved 0 of 13 of these new cybersecurity challenges.

The practical effect of these changes is that an AI system that may have crossed a critical internal risk threshold earlier in the year could later be classified as not meeting that threshold. This is not because the model became safer, but because the definition and measurement of risk changed.
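The reclassification effect can be sketched with the solve counts reported above. This is an illustration only: the 75% alert threshold is a hypothetical value chosen for the example and is not taken from any version of the FSF, and the names BenchmarkResult, ALERT_THRESHOLD, crosses_threshold and classify are invented here. Google's actual CCL determinations involve more than a single solve rate.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class BenchmarkResult:
    model: str
    benchmark: str
    solved: int
    total: int

    @property
    def solve_rate(self) -> Fraction:
        # Exact fraction of challenges solved, avoiding float rounding.
        return Fraction(self.solved, self.total)


# Solve counts as reported in the text above.
RESULTS = [
    BenchmarkResult("Gemini Pro 2.5", "hard v1", 6, 12),
    BenchmarkResult("Gemini Pro 3", "hard v1", 11, 12),
    BenchmarkResult("Gemini Pro 3", "v2", 0, 13),
]

# HYPOTHETICAL alert threshold, for illustration only.
# It does not come from FSF v2 or FSF v3.
ALERT_THRESHOLD = Fraction(3, 4)


def crosses_threshold(result, threshold=ALERT_THRESHOLD):
    """Return True if the solve rate meets or exceeds the threshold."""
    return result.solve_rate >= threshold


def classify(results):
    """Map (model, benchmark) to whether the result is flagged."""
    return {(r.model, r.benchmark): crosses_threshold(r) for r in results}


if __name__ == "__main__":
    for (model, benchmark), flagged in classify(RESULTS).items():
        print(f"{model} on {benchmark}: flagged={flagged}")
```

Under this assumed threshold, the same model is flagged on the older challenge set and not flagged on the newer one, although nothing about the model itself differs between the two rows.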

Even when such changes are made in good faith, the fact that thresholds, definitions, and benchmarks are internally controlled means that external stakeholders cannot distinguish genuine safety improvements from reclassification.

This demonstrates how internal AI risk assessments can evolve in ways that reclassify what would previously have been considered a critical capability as acceptable, without any corresponding reduction in underlying risk.

3. Structural Risks of Self-Regulated AI Safety

This example highlights three core regulatory concerns.

The absence of standardised thresholds.

There is no shared or enforceable definition of what constitutes a “critical AI capability”. Companies can alter how risk is defined, scored, or tested, making meaningful comparison across the industry extremely difficult.

Misaligned incentives.

Internal safety frameworks operate within commercial decision-making structures. Where a model is strategically or financially important, it may be easier to revise the framework than to delay or limit deployment. In these circumstances, safety criteria adapt to the product, rather than the product adapting to safety requirements.

A regulatory blind spot.

Where governments do not require disclosure of internal risk frameworks, explanations for changes to those frameworks, or longitudinal reporting of capability uplift levels, public authorities lack visibility into whether safety standards are being strengthened or diluted over time. A “low-risk” classification may reflect definitional change rather than a genuine reduction in risk.

4. Implications for National AI Policy

Australia has seen similar dynamics play out in other sectors, where voluntary compliance was prioritised over enforceable standards. These dynamics are already generating public debate: Electronic Frontiers Australia has publicly criticised the government’s National AI Plan for sidelining ex ante safeguards and prioritising industry opportunity over safety and fundamental rights.
