Effective AI Assessments, Verification and Assurance: Establishing the Foundations for Responsible Confidence in AI

Abstract

The panel explored how to build a trustworthy ecosystem for AI assessment, governance and risk management. After a brief framing of three core assessment types—governance, conformity and performance—the discussion moved through market‑driven assurance, the AI Verify “Global Assurance Sandbox,” global standard‑setting work by the IPIE, sector‑specific hurdles identified by academia, industry demand for third‑party assurance, the professionalisation agenda of the ACCA, and practical challenges faced in the Global South. The session closed with an open dialogue on road‑maps, standards, tooling, capacity‑building and the need for coordinated, bottom‑up and top‑down approaches.

Detailed Summary

1. Opening Framing – Core Assessment Types
  • Purpose – To set the scene for why reliable assessment and reporting are essential for responsible AI.
  • Key Insight – Assurance must complement regulation; it should be market‑driven as well as policy‑driven.
  • Types of Assessments Introduced
    1. Governance assessments – evaluation of internal AI governance structures.
    2. Conformity assessments – check against laws, voluntary standards, contractual obligations.
    3. Performance assessments – measurement against predefined quality and performance metrics.
  • Assessment Essentials – purpose, subject matter, methodology, criteria, qualifications of assessors, and conflict‑of‑interest safeguards.
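The scoping elements listed above can be captured as a simple structured record. The sketch below is illustrative only (the class name and fields are assumptions, not a standard schema); it shows how an assessment's purpose, subject matter, methodology, criteria, assessor qualifications, and conflict-of-interest safeguard might be recorded together so none is omitted.

```python
from dataclasses import dataclass, field

@dataclass
class AssessmentScope:
    """Hypothetical scoping record for an AI assessment (illustrative sketch)."""
    purpose: str                       # why the assessment is being performed
    subject_matter: str                # the system, model, or process under review
    methodology: str                   # how evidence is gathered and evaluated
    criteria: list[str] = field(default_factory=list)  # standards or metrics applied
    assessor_qualifications: str = ""  # credentials required of the assessor
    conflict_of_interest_checked: bool = False  # independence safeguard confirmed

# Example: scoping a conformity assessment (values are invented for illustration)
scope = AssessmentScope(
    purpose="Conformity assessment against a voluntary AI governance standard",
    subject_matter="Customer-facing chatbot",
    methodology="Document review plus sampled output testing",
    criteria=["ISO/IEC 42001", "internal AI policy"],
    assessor_qualifications="Accredited third-party AI auditor",
    conflict_of_interest_checked=True,
)
```

A record like this makes the scope reportable alongside the results, which is the essence of the "assessment essentials" point.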

2. Market‑Driven Assurance & the Global Assurance Sandbox (Lee Wan Sie)

  • Analogy to Aviation – Safety regulations exist, but technical assessments ensure compliance; AI needs a similar layered approach.
  • Sector‑Specific Testing – Financial‑advice chatbots vs. hospital triage tools require distinct test suites.
  • Sandbox Concept – A global assurance sandbox where:
    • Third‑party testers, deployers, and AI Verify act as a collaborative test‑bed.
    • ~20‑30 live use‑cases are already undergoing pilot testing.
    • Results will feed into policy guidelines and standards.
  • Call to Action – Audience invited to join the sandbox; the AI Verify website hosts details.

3. IPIE Perspective – Global Standards & Auditing (Dr. Philip Howard)

  • Current Work – Panels on AI & peacebuilding, AI & financial inclusion; early work on global AI‑auditing standards (model‑cards, data provenance).
  • Key Finding – Multi‑stakeholder cooperation (industry, academia, regulators) is essential; purely self‑regulatory or regulator‑only models are insufficient.
  • Open Question – Frequency of audits and defining “high‑risk” AI systems remain unresolved.
  • Message – A collective conversation is needed to set shared language, expectations, and trust.

4. Sector‑Specific Challenges (Prof. Balaraman Ravindran)

  • Sector‑Tailored Governance – Each regulator must examine AI‑specific gaps rather than imposing a monolithic framework.
  • Illustrative Case Study – Fetal‑age estimation models trained on Western datasets under‑estimated fetal age for Asian fetuses by 30‑40 %.
    • Technical success in controlled labs did not translate to field conditions, because head‑circumference measurement proved unreliable on the low‑quality scans typical in practice.
    • Solution: Re‑engineer features that are observable in real‑world settings and co‑design with clinicians.
  • Lesson – Assurance must involve end‑users; otherwise, metrics are meaningless in practice.

5. Industry View – EY’s Demand for Assurance (Anne McCormick)

  • Shift from Adoption to Embedding – Companies now focus on operationalizing AI and proving value.
  • Stakeholder Pressures – Boards, investors, insurers, and customers increasingly ask for transparent, trustworthy AI.
  • Governance vs. Compliance vs. Strategic Value
    • Governance – ISO 42001‑type frameworks for AI governance.
    • Compliance – Meeting regulatory obligations.
    • Strategic Advantage – Using third‑party assessments as a brand differentiator and to set internal KPIs.
  • Regional Observation – Europe is moving toward a pragmatic mix of high‑risk regulation and flexibility for lower‑risk AI.

6. Professionalising AI Assurance (Narayanan Vaidyanathan, ACCA)

  • Goal – Turn AI assurance into a recognised professional discipline.
  • Three Pillars
    1. Common definition – What is being tested, how, and the pass/fail outcome.
    2. Terminology harmonisation – Clarify “assessment”, “audit”, “risk review”, etc.
    3. Confidence levels – Different tiers of assurance (quick‑check vs. high‑stakes audit).
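The three pillars above can be made concrete with a small sketch. The tier names and accuracy thresholds below are assumptions for illustration (the panel did not prescribe specific numbers beyond an audience example of 90 % accuracy for financial‑advice bots); the point is that a "quick check" and a "high‑stakes audit" would apply different pass/fail bars to the same measurement.

```python
from enum import Enum

class AssuranceTier(Enum):
    """Hypothetical tiers of assurance, from lightweight to high-stakes."""
    QUICK_CHECK = 1   # lightweight self-assessment
    STANDARD = 2      # independent review against agreed criteria
    HIGH_STAKES = 3   # full audit with an evidence trail

# Illustrative thresholds only -- real criteria would be set per sector and use-case.
THRESHOLDS = {
    AssuranceTier.QUICK_CHECK: 0.80,
    AssuranceTier.STANDARD: 0.90,
    AssuranceTier.HIGH_STAKES: 0.95,
}

def passes(tier: AssuranceTier, accuracy: float) -> bool:
    """Return the pass/fail outcome for a measured accuracy at a given tier."""
    return accuracy >= THRESHOLDS[tier]

passes(AssuranceTier.STANDARD, 0.92)     # meets the 90 % bar
passes(AssuranceTier.HIGH_STAKES, 0.92)  # fails the stricter 95 % bar
```

Tiering like this gives the "common definition" pillar a shape: the same system can pass a quick check yet fail a high‑stakes audit, and the report says which tier was applied.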
  • Ethics as Core – Assurance without ethical credibility loses trust.
  • Skills & Education – Embedding AI‑assurance modules in ACCA curricula; leveraging transferable skills from traditional audit and risk assurance.

7. Global‑South Realities (Dr. Jibu Elias)

  • Context – Many AI deployments occur in public hospitals, schools, fintech SMEs, which lack mature compliance infrastructures.
  • Non‑Technical Barriers
    • Shortage of trained evaluators.
    • Fragmented standards across jurisdictions.
    • Limited incentives for data sharing.
    • Lack of affordable testing tools.
  • Road‑Map Proposal
    1. Common, open‑source tool repository (especially for low‑resource languages & domains).
    2. Cross‑jurisdiction calibration for interoperability.
    3. Embedding evaluation requirements in public procurement and university curricula.

8. Open Discussion & Emerging Road‑Map Ideas (All Panelists)

  • Contributors' core recommendations:
    • Lee Wan Sie – Continue expanding the sandbox; collect empirical data to inform standards.
    • Phil Howard – Bottom‑up data gathering to set risk‑based audit thresholds; keep dialogue with policymakers.
    • Ravindran – Build Global‑South‑focused benchmark datasets (e.g., language‑specific profanity lexicons).
    • Anne McCormick – Leverage assurance as a competitive differentiator; develop sector‑specific KPI frameworks.
    • Narayanan – Align AI assurance with existing audit standards (e.g., IESBA, ISAE) to avoid reinventing the wheel.
    • Jibu – Prioritise capacity‑building in universities and civil society; create open‑source tooling.
    • Audience (summarised) – Need clear pass/fail thresholds for specific use‑cases (e.g., 90 % accuracy for financial‑advice bots).
  • Key Open Questions – How to define “high‑risk” AI across domains? What is the appropriate frequency of audits? How to ensure independence while keeping costs affordable for smaller organisations?

9. Closing Remarks & Announcements

  • QR Code / Report – A joint EY‑ACCA report (citing IMDA and IPIE work) was advertised for download.
  • Invitation – Participants were encouraged to join the AI Verify community, contribute to the sandbox, and stay engaged in the evolving AI‑assessment ecosystem.

Key Takeaways

  • Three assessment pillars—governance, conformity, performance—must be clearly scoped and reported with purpose, methodology, criteria, and assessor qualifications.
  • Market‑driven assurance complements regulation; a global sandbox provides a practical arena for testing, learning, and standard‑setting.
  • Multi‑stakeholder collaboration (industry, academia, regulators, civil society) is essential to create credible, adaptable AI‑audit standards.
  • Sector‑specific validation is critical: models must be tested in the real-world conditions of the end‑user, not only in controlled labs.
  • Industry demand is shifting from compliance‑only to strategic assurance, using third‑party audits for brand trust and competitive advantage.
  • Professionalisation (ACCA) hinges on a shared definition of assurance, harmonised terminology, and tiered confidence levels, underpinned by ethics.
  • Global‑South challenges centre on lack of trained evaluators, fragmented standards, and resource‑scarce tooling; open‑source repositories and capacity‑building are proposed solutions.
  • Road‑map consensus: expand sandbox pilots, develop open‑source toolkits, create cross‑jurisdiction benchmarks, and embed AI‑assessment training in universities and professional curricula.
  • Open questions remain on setting high‑risk thresholds, audit frequency, and ensuring independent yet affordable assurance for smaller actors.

These points capture the collective direction discussed at the panel: building an interoperable, trustworthy AI‑assessment ecosystem that balances regulatory rigor, market incentives, and practical feasibility across diverse global contexts.
