Metrics, Methods and Platforms for Measurement of Artificial Intelligence for Trustworthy, Reliable and Explainable Applications
Detailed Summary
1. Cross‑Sectoral AI Measurement
Speaker: Prof. Partha Pratim Das
- Key Insight: AI systems now cut across traditional sector boundaries; robust measurement must therefore be cross‑sectoral, not siloed.
- Research Agenda Highlights:
- Development of Sectoral World Models and a Cross‑Sectoral Metric Description Language to enable interoperability of metrics across domains.
- Focus on inter‑ and intra‑sectoral metric flow, recommendation engines for metric selection, and a foundational metrology framework for AI governance.
- Recommendation: Build adaptable AI‑evaluation frameworks that can be translated to domain‑specific pilots (healthcare, governance, manufacturing, logistics).
2. AI Measurement in Healthcare
Speaker: Prof. Richa Singh
- Five Thrust Areas:
- Open Platforms & Benchmarking – Need for country‑specific health data repositories; demographic differences (e.g., average heights) make a universal benchmark infeasible.
- Point‑of‑Care Explainability – Clinicians must understand why an AI recommendation is made (e.g., why a CT suggests avoiding lumbar puncture).
- Personalised Digital Health & Preventive Care – AI should support early‑intervention health monitoring.
- Public‑Health AI – Scaling AI to reach mass populations given low doctor‑to‑patient ratios.
- Metric Innovation – Beyond sensitivity/specificity, develop application‑specific metrics that capture trust, explainability, and data‑diversity gaps.
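The last thrust area can be made concrete with a small sketch. The following is a minimal illustration (not from the talk; the function names are hypothetical) of reporting sensitivity/specificity alongside a per‑subgroup sensitivity gap, one simple way to quantify a data‑diversity gap:

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Standard confusion-matrix metrics for a binary classifier."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn), tn / (tn + fp)

def subgroup_sensitivity_gap(y_true, y_pred, groups):
    """Largest spread in sensitivity across demographic subgroups --
    one illustrative 'data-diversity gap' metric."""
    groups = np.asarray(groups)
    sens = []
    for g in np.unique(groups):
        mask = groups == g
        s, _ = sensitivity_specificity(np.asarray(y_true)[mask],
                                       np.asarray(y_pred)[mask])
        sens.append(s)
    return max(sens) - min(sens)
```

Reporting the gap next to the aggregate numbers makes a model that performs well overall but poorly on an under‑represented group immediately visible.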
- Challenges Identified:
- Fragmented standards across countries.
- Data‑diversity gaps that impede model generalisation.
- Recommendations:
- Create open benchmarking platforms that curate and annotate data per local clinical practice.
- Embed explainability hooks in AI pipelines to surface reasoning to clinicians.
3. AI Measurement in Manufacturing (Industry 5.0)
Speaker: Prof. Amlan Chakraborty
- Context: Transition from Industry 4.0 (automation) to Industry 5.0 (human‑centric AI).
- Key Risks:
- Black‑Box Models → hidden biases and reliability gaps.
- Sensor/Instrument Failures → data‑drift and policy‑drift.
- Security Threats – data‑poisoning, privacy breaches.
- Four Core Metric Families:
- Transparency & Explainability – Why did a model output a particular decision?
- Demographic Equality & Equity – Performance parity across demographic groups and sectors.
- Robustness & Drift Detection – Ability to tolerate sensor drift, policy changes, and operational variance.
- Accountability & Auditing – Frequency of human overrides, traceability from data ingestion to model maintenance.
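Drift detection of the kind listed above is often operationalised with the Population Stability Index (PSI), which compares the binned distribution of a feature at training time against its live distribution; a common rule of thumb flags PSI above 0.2 as significant drift. A minimal sketch of this standard statistic (illustrative only, not a method presented in the talk):

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between a reference (training-time) and current (live) sample
    of one sensor feature. Values above ~0.2 are commonly read as drift."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    # Open-ended outer bins so out-of-range live values are still counted.
    edges[0], edges[-1] = -np.inf, np.inf
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor both distributions to avoid log(0) on empty bins.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
```

Run per sensor channel on an edge device, a statistic like this gives a cheap, continuously computable trigger for the human review and lifecycle governance the recommendations below call for.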
- Actionable Recommendations:
- Develop auditable standards and compliance mechanisms.
- Provide SME‑friendly, “go‑model” toolkits that are explainable yet lightweight.
- Leverage edge computing for real‑time monitoring and lifecycle governance.
4. AI Measurement for Governance & Public Services
Speaker: Prof. Mayank Vatsa
- Premise: Governments worldwide already (or will soon) rely on AI for policy decisions; these decisions must be explainable and auditable to citizens.
- Illustrative Use‑Case:
- Aadhaar‑type national ID – AI‑driven eligibility decisions need transparent justification.
- Measurement Challenges:
- Multilingual & Multicultural Contexts – India’s 22 official languages + thousands of dialects demand language‑aware AI interfaces.
- Summarisation of Legal Texts – Translating dense legislation into layperson‑friendly explanations.
- Proposed Solution:
- Unified Citizen‑Interface Programme – A multilingual voice‑driven interface that can:
- Explain AI‑driven decisions,
- Summarise legal documents, and
- Provide audit trails for civic oversight.
- Risk Classification: AI systems should be categorised into low‑, medium‑, and high‑risk domains, each with a corresponding measurement regime.
5. Beyond Benchmarks – Latent Performance Profiling (LPP)
Speaker: Prof. Tonnoi Chakrawood (presenter on model evaluation)
- Problem Statement: Traditional benchmark‑centric evaluation masks hidden weaknesses (e.g., data contamination, inconsistent performance across similar‑size models).
- Proposed Metric Suite – LPP:
- Entropy of Final Layer – Captures information richness of model representations.
- Participation Ratio – Measures how “compact” or distributed activations are across layers.
- Prescription: Include LPP metrics in model cards so users can select models best suited for a given task (e.g., a healthcare task may prefer a model with higher entropy).
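Both LPP quantities have standard formulations that can be sketched directly: Shannon entropy over the model's final‑layer softmax outputs, and the participation ratio PR = (Σᵢλᵢ)² / Σᵢλᵢ² over the eigenvalues λᵢ of the activation covariance. The code below implements these standard definitions as an illustration; it is not the speaker's implementation:

```python
import numpy as np

def final_layer_entropy(probs):
    """Mean Shannon entropy of final-layer (softmax) outputs, one proxy
    for the information richness of a model's representations."""
    probs = np.clip(np.asarray(probs), 1e-12, 1.0)
    return float(np.mean(-np.sum(probs * np.log(probs), axis=-1)))

def participation_ratio(activations):
    """PR = (sum lambda_i)^2 / sum lambda_i^2 over eigenvalues of the
    activation covariance. Ranges from 1 (a single dominant direction)
    to the layer width (activations spread evenly across directions)."""
    acts = np.asarray(activations)
    acts = acts - acts.mean(axis=0)
    cov = acts.T @ acts / (len(acts) - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(np.sum(eig) ** 2 / np.sum(eig ** 2))
```

Both are cheap to compute from a batch of held‑out activations, which is what makes them plausible model‑card entries.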
6. AI Security & Safety – Measurement‑Centric View
Speakers: Prof. Devdeep Makhabharadha & Prof. Maynath Mandal
- Three Categories of AI‑Related Risks:
- Intrinsic Risk – Knowledge gaps, hallucinations, inability to reason.
- Interaction Risk – Users over‑trusting or misusing AI outputs.
- Societal Risk – Large‑scale misinformation, malicious deployments.
- Current Defence Strategies:
- Guard‑rail (black‑list) approaches – Simple rule‑based filters; effective for privacy/PII but vulnerable to jailbreaks.
- Adaptive, Measurement‑Based Guardrails – Continuously monitor knowledge‑gap metrics, norm compliance, and “judge models” that evaluate safety in real time.
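At its simplest, a black‑list guardrail of the first kind reduces to pattern matching over model output. The sketch below (hypothetical patterns, for illustration only) redacts two common PII types, and also shows why such filters are brittle: anything not matching the fixed rules passes straight through.

```python
import re

# Minimal rule-based (black-list) guardrail: a few regex patterns for PII.
# Real deployments layer many such rules plus judge models for adaptivity.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}

def redact_pii(text):
    """Replace black-listed PII spans with placeholder tags."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```

A jailbreak that spells an address out in words defeats this entirely, which is the motivation for the adaptive, measurement‑based guardrails described above.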
7. Panel Discussion (Moderated by Prof. Lipika Dey)
7.1. Panelists
- Prof. Siddharth Khastgir – Safe Autonomy, University of Warwick.
- Prof. Carsten Maple – Cyber Systems Engineering, The Alan Turing Institute.
- Prof. Wolfgang Nagel – High‑Performance Computing, TU Dresden.
- Ms. Kavita Bhatia – COO, India AI Mission, MeitY.
7.2. Major Themes & Exchanges
| Question | Key Points Raised |
|---|---|
| Robustness & Regulatory Guarantees? (to Prof. Khastgir) | • Different regulatory philosophies (UK/US guidelines vs. EU strict regulation). • Need to bound the problem – ensure training & test sets represent real‑world distributions. • ISO standards (e.g., ISO 34503 for automotive) help formalise the bounding. |
| Post‑deployment Evaluation? (to Prof. Nagel) | • Energy consumption as a measurable post‑deployment metric (operational cost). • No universal answer yet; AI systems evolve, so continuous monitoring (akin to software versioning) is required. |
| Mandatory Measurables for Regulated Sectors? (to Prof. Maple) | • Emphasis on system‑level testing rather than model‑level only. • Digital sandboxes – controlled environments with synthetic data for safe testing. • Collaboration between regulators, industry, and academia is essential. |
| India‑specific Metrics? (to Ms. Bhatia) | • Inclusivity, language, robustness are top priorities. • India AI Mission launched AI Kosh (10,000+ Indian datasets) and Bhashini (language‑aware tools). • Ongoing work with 12 partner organisations and 4 model releases showcased at the summit. |
| What Aspects Remain Hard to Measure? (to Prof. Nagel) | • In healthcare, human‑in‑the‑loop verification remains the most reliable control; scaling this is a challenge. |
| Engineering‑level Bias‑Free Data? (to Prof. Khastgir) | • Proposal of an ontological model (OASIS) to define completeness & representativeness of data. • Need for standardised engineering methods to prove bias‑free datasets. |
| Global Benchmarks vs. Local Tailoring? (to Prof. Maple) | • The MLCommons initiative provides a rigorous, repeatable benchmark methodology. • Global principles are feasible, but metrics must be culturally aware (e.g., gifting a clock in East Asia vs. the UK). |
| Final Views on Global Standards? (to Ms. Bhatia) | • Global AI‑principles are essential, yet local adaptation is mandatory for diversity and inclusivity. • Indian AI Mission is establishing an institute with 13 academic partners to co‑design India‑specific evaluation frameworks. |
7.3. Audience Interaction
- Question about “Bounding the Real World” – Prof. Khastgir illustrated an ontology‑driven taxonomy (static scenery, dynamic elements, environmental conditions) that maps directly to ISO 34503 for autonomous vehicles.
- Question on International Research Collaboration – Ms. Bhatia explained that Indian researchers abroad are already contributing to foundation‑model development under the AI Mission; calls for proposals on safety and ethics are open for global participation.
See Also:
- ai-impact-forum-democratising-ai-resources
- best-practices-from-the-international-network-for-advanced-ai-measurement-evaluation-and-science
- enterprise-adoption-of-responsible-ai-challenges-frameworks-and-solutions
- ai-innovators-exchange-accelerating-innovation-through-startup-and-industry-synergy
- from-buzzword-to-blueprint-engineering-sustainable-ai-at-scale
- scaling-trusted-ai-for-8-billion
- sovereign-ai-for-india-designing-the-nations-future-compute-data-and-innovation-ecosystem
- from-pilots-to-impact-evidence-on-scaling-ai-for-farmers-in-lmics
- beyond-proof-of-concepts-using-4d-ai-to-build-sovereign-sustainable-and-responsible-ai-at-production-scale
- ai-diffusion-from-innovation-to-population-scale-impact