Building High-Quality AI Systems for Education: From Innovation to System-Wide Delivery

Abstract

The panel explored how governments and ed‑tech firms can design, evaluate, and scale AI‑driven tools for education while maintaining rigorous quality assurance. Speakers reviewed existing AI‑ethics standards (UNICEF, UK government), introduced a new QA Fund and benchmark initiatives, and described the ConveGenius “SwiftChat” platform that reaches 150 million children via WhatsApp. Marc Shotland outlined the product‑ and system‑level “enoughs” (good‑enough, big‑enough, simple‑enough, cheap‑enough) and highlighted evidence that AI tutoring can double learning gains. Kalpana Sharma emphasized teacher‑education needs, inclusive AI solutions for sports and physical‑education learners, and the importance of nuanced impact evaluation. The session closed with three memorable take‑away lines about benchmarks, scaling, and the ultimate goal of safer, better learning for every child.

Detailed Summary

1. Opening – AI Quality‑Assurance Standards & QA Fund Launch

  • The moderator opened with a rapid “thank you” sequence, then outlined the growing ecosystem of AI quality‑assurance standards:

    • EdTech Tulna standard (India, Central Square Foundation)
    • UNICEF AI for Good framework
    • UK government AI standards focusing on manipulation and mental‑health impacts.
  • The panel agreed that existing standards are a useful baseline, but gaps remain—particularly around technical evaluation layers that can translate standards into actionable thresholds for developers.

  • Announcement – QA Fund

    • The panel announced the launch of a QA Fund that will finance the creation of new benchmarks, rapid field evaluations, and open‑source evaluation components.
    • Partnerships highlighted: IDinsight (field‑level evaluation expertise), CGD (digital cross‑sectoral playbook), and the Dalberg network (to be launched the next day at the summit).

Key Insight: Creating a shared evaluation infrastructure, rather than isolated pilots, is essential for ecosystem‑wide quality improvement.


2. Jairaj Bhattacharya – Scaling AI‑Enabled EdTech with ConveGenius

2.1 The “SwiftChat” Platform

  • Reach: 150 million children across 800,000 schools – roughly half of India’s schools.
  • Core Idea: Bring AI to the “last mile” using WhatsApp‑style messaging, the most ubiquitous app on Indian smartphones.

2.2 Why Conversational AI?

| Reason | Explanation |
| --- | --- |
| High conversion | Users naturally reply to messages → ~99 % response rate. |
| 24/7 availability | AI tutors can answer questions at any hour, including night‑time homework help. |
| Familiar UI | Leveraging a platform children already use reduces friction. |

2.3 From Messaging to Pedagogical Engineering

  • Quality‑first design – Guardrails, content policies, bias mitigation, and presentation standards were built into the platform from day one.
  • Learning metric – The “North Star” is not just access (users logged in) but measurable learning gains (e.g., standard‑deviation improvements).
  • Pedagogical challenge: Decide what to show and when, given limited classroom time. This requires context‑aware sequencing (state‑level curricula, teacher schedules, prior student interactions).
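The “standard‑deviation improvements” metric above can be made concrete with a small sketch: a standardized learning gain is typically a pooled‑SD effect size (Cohen’s d) between treatment and control test scores. All numbers below are hypothetical, not ConveGenius data.

```python
# Illustrative sketch (hypothetical scores, not ConveGenius' metric):
# standardized learning gain as a pooled-SD effect size (Cohen's d).

import statistics

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    m1, m2 = statistics.fmean(treatment), statistics.fmean(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

treatment = [62, 70, 75, 68, 80, 73]   # hypothetical post-test scores
control   = [55, 60, 58, 64, 57, 61]
print(f"effect size: {cohens_d(treatment, control):.2f} SD")
```

Reporting gains in SD units rather than raw points is what makes results comparable across states, grades, and tests.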

2.4 Technical Scaling Challenges

| Challenge | Solution |
| --- | --- |
| WhatsApp Business API rate limits (30–40 k requests/min) | Developed a native messaging protocol that can handle 150–200 k+ requests/min, supporting synchronous (voice/video) and asynchronous chat. |
| Data compliance (India’s DPDP Act) | Designed a modular architecture separating application, transactional‑data, and knowledge layers, allowing government‑hosted storage of personally identifiable data. |
| State‑wise heterogeneity | Built an unbundled knowledge infrastructure that can be customized per state (e.g., Gujarat vs. Andhra Pradesh). |
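The rate‑limit constraint is the kind of problem a token‑bucket limiter models: stay under a provider’s cap while absorbing bursts and queueing the overflow. A minimal sketch, assuming a hypothetical ~40 k requests/min cap (an illustration of the constraint, not ConveGenius’ actual protocol):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: at most `rate` sends per second,
    with bursts up to `capacity`. A sender capping WhatsApp-bound
    traffic at ~40k requests/min would use rate of roughly 666/s."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should queue the message and retry later

bucket = TokenBucket(rate=666, capacity=1000)  # ~40k requests/min
sent = sum(bucket.allow() for _ in range(5000))
print(f"{sent} of 5000 messages admitted immediately; rest queued")
```

A native protocol sidesteps this bottleneck by raising the ceiling itself; the limiter sketch shows why a fixed third‑party cap forces queueing at national scale.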

Key Insight: Scaling AI in a national education system requires both a robust messaging backbone and a flexible, compliance‑ready architecture.


3. Marc Shotland – Evaluation Frameworks & System‑Level Enablers

3.1 The “Enoughs” Test (adapted from Kevin Starr, Mulago Foundation)

  1. Good‑Enough – Does the product produce a demonstrable impact?
    • Cited a (still‑unpublished) study by Michael Kremer: ConveGenius learners doubled learning outcomes over 17 months versus control.
  2. Big‑Enough – Is the need large enough?
    • India’s school‑age population guarantees massive demand.
  3. Simple‑Enough – Is the solution operable for teachers and administrators?
    • Complexity kills scalability; the solution must fit into existing workflows with minimal friction.
  4. Cheap‑Enough – Can per‑student cost be sustainable at scale?
    • LLM token costs remain a hurdle; Shotland suggested building smaller, locally hosted models to reduce reliance on costly cloud APIs.
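A back‑of‑the‑envelope cost model makes the “cheap‑enough” test tangible. Every figure below is an assumption for illustration, not a quoted price or actual usage number:

```python
# Hypothetical back-of-the-envelope: annual LLM cost per student.
# All figures are illustrative assumptions, not quoted prices.

price_per_1k_tokens = 0.002      # USD, assumed blended input+output price
tokens_per_exchange = 800        # prompt + response, assumed
exchanges_per_week = 20          # assumed usage per student
weeks_per_year = 40              # approximate school year

annual_tokens = tokens_per_exchange * exchanges_per_week * weeks_per_year
annual_cost = annual_tokens / 1000 * price_per_1k_tokens
print(f"~${annual_cost:.2f} per student per year at cloud-API prices")
```

Because the cost scales linearly with token price, halving the price (e.g., via a smaller self‑hosted model) halves the per‑student figure directly, which is why model choice dominates the “cheap‑enough” question at 150 million users.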

3.2 Product‑Side Conditions

  • Evidence base: Robust RCTs or quasi‑experiments needed to claim “good‑enough”.
  • Cost‑efficiency: Exploration of distilled language models or hybrid approaches (rule‑based + LLM) to keep token usage low.
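The hybrid rule‑based + LLM idea can be sketched as a simple router: hand‑written rules absorb common queries at zero token cost, and only unmatched queries reach the (costly) model. All names, rules, and replies here are hypothetical:

```python
def answer(query, faq_rules, llm_fallback):
    """Route cheap, common questions to hand-written rules; escalate
    only unmatched queries to the LLM. `llm_fallback` is a placeholder
    for a real model call."""
    normalized = query.strip().lower()
    for pattern, reply in faq_rules.items():
        if pattern in normalized:
            return reply, "rules"          # zero token cost
    return llm_fallback(query), "llm"      # token cost incurred

rules = {
    "exam date": "The term exam is on 15 March.",
    "syllabus": "See the chapter list on the class channel.",
}
reply, route = answer("When is the exam date?", rules,
                      lambda q: "LLM-generated answer")
print(route)  # → rules
```

If even a modest share of traffic is routine FAQ‑style questions, this kind of routing cuts token usage proportionally without touching answer quality on the hard cases.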

3.3 System‑Side Conditions

| Condition | Description |
| --- | --- |
| Coherence | Alignment of teacher incentives (curriculum coverage) with AI‑driven pacing (learning‑outcome focus). |
| Social norms & peer effects | Teachers’ adoption depends on the perceived stability of interventions; frequent short‑lived pilots breed skepticism. |
| Top‑down & grassroots momentum | Policy endorsement plus local champion networks are essential for sustained uptake. |
| Political & fiscal alignment | Budgetary commitments, data‑privacy regulations, and cross‑sector coordination (education, IT, finance). |

Key Insight: A product can be technically brilliant, but systemic coherence, affordable cost structures, and entrenched social norms dictate real‑world scale.


4. Kalpana Sharma – AI for Teacher Education & Inclusive Physical‑Education

4.1 Role of Teacher‑Education Institutions

  • As Vice‑Chancellor of a government university, Sharma highlighted the need for AI‑augmented teacher‑training curricula, especially for physical‑education, sports, yoga, and sports‑management educators.

4.2 AI‑Driven Solutions for Athletes & Teachers

  • Personalised training programs: AI can track daily performance, analyse progress, and deliver rapid feedback to budding athletes.
  • Impact evaluation: Emphasised that any AI tool must be rigorously evaluated, much like the “teacher‑register” component of school report cards (qualitative narrative alongside grades).

4.3 Real‑World Example – Raj Samand (Rajasthan)

  • Described a remote school for children with severe disabilities (99 % with cerebral palsy or deaf‑blindness).
  • A dedicated teacher travelled by scooter, wearing a ghunghat (veil), to reach the school, and used WhatsApp to field student queries.
  • The children successfully passed their examinations, illustrating how low‑tech, AI‑enabled communication can bridge severe accessibility gaps.

4.4 Inclusive & Sensitive Design

  • Calls for AI products that respect cultural norms, gender sensitivities, and disability requirements.
  • Stressed the importance of human‑teacher connection; AI should augment—not replace—the personal narrative a teacher builds with each child.

Key Insight: AI in physical‑education and special‑needs contexts must be context‑aware, culturally sensitive, and evaluated with both quantitative outcomes and qualitative teacher narratives.


5. Closing Remarks

  • The moderator used a ChatGPT‑generated “memorable line” exercise to end the session. The three lines delivered were:

    1. “Benchmarks are not about winners and losers. They’re about making decisions that protect children and improve teaching and learning.”
    2. “Scaling isn’t just a bigger pilot. It’s about the system, not the product.”
    3. “The goal isn’t AI in classrooms. It’s better learning safely for every child.”
  • The session concluded with thanks and a rapid roll‑call of “thank yous” from the panel.

Key Takeaways

  • Standards are a foundation, but actionable technical evaluation layers are still missing; the new QA Fund aims to fill that gap.
  • ConveGenius’ SwiftChat reaches 150 M children via WhatsApp, proving that a messaging‑first, conversational AI can achieve population‑scale access.
  • Scalable AI requires a robust messaging backbone and modular, compliance‑ready architecture to handle Indian data‑privacy laws and state‑level diversity.
  • Evidence matters: an (unpublished) trial shows learning outcomes doubled in the ConveGenius group, satisfying the “good‑enough” criterion.
  • Product‑side “enoughs” (good, big, simple, cheap) provide a concise checklist for evaluating whether an AI solution can be scaled.
  • System‑level coherence (alignment of teacher incentives, policy, and finance) and social‑norm alignment are essential for lasting adoption.
  • AI can augment teacher education, especially in physical‑education and special‑needs contexts, but must preserve the human narrative captured in teacher registers.
  • Low‑tech, culturally sensitive AI interventions (e.g., WhatsApp support for disabled learners) can deliver real impact in remote, underserved settings.
  • Benchmarks are decision‑making tools, not competitions; scaling is a system challenge, and the ultimate aim is safe, equitable learning for every child.