Metrics, Methods and Platforms for Measurement of Artificial Intelligence for Trustworthy, Reliable and Explainable Applications
Detailed Summary
1. Cross‑Sectoral AI Measurement
Speaker: Prof. Partha Pratim Das
- Key Insight: AI systems now cut across traditional sector boundaries; robust measurement must therefore be cross‑sectoral, not siloed.
- Research Agenda Highlights:
- Development of Sectoral World Models and a Cross‑Sectoral Metric Description Language to enable interoperability of metrics across domains.
- Focus on inter‑ and intra‑sectoral metric flow, recommendation engines for metric selection, and a foundational metrology framework for AI governance.
- Recommendation: Build adaptable AI‑evaluation frameworks that can be translated to domain‑specific pilots (healthcare, governance, manufacturing, logistics).
2. AI Measurement in Healthcare
Speaker: Prof. Richa Singh
- Five Thrust Areas:
- Open Platforms & Benchmarking – Need for country‑specific health data repositories; demographic differences (e.g., average heights) make a universal benchmark infeasible.
- Point‑of‑Care Explainability – Clinicians must understand why an AI recommendation is made (e.g., why a CT suggests avoiding lumbar puncture).
- Personalised Digital Health & Preventive Care – AI should support early‑intervention health monitoring.
- Public‑Health AI – Scaling AI to reach mass populations given low doctor‑to‑patient ratios.
- Metric Innovation – Beyond sensitivity/specificity, develop application‑specific metrics that capture trust, explainability, and data‑diversity gaps.
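The last thrust area can be made concrete with a small sketch. The following is a minimal illustration (not from the talk; the function names are hypothetical) of reporting sensitivity/specificity alongside a per‑subgroup sensitivity gap, one simple way to quantify a data‑diversity gap:

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Standard confusion-matrix metrics for a binary classifier."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn), tn / (tn + fp)

def subgroup_sensitivity_gap(y_true, y_pred, groups):
    """Largest spread in sensitivity across demographic subgroups --
    one illustrative 'data-diversity gap' metric."""
    groups = np.asarray(groups)
    sens = []
    for g in np.unique(groups):
        mask = groups == g
        s, _ = sensitivity_specificity(np.asarray(y_true)[mask],
                                       np.asarray(y_pred)[mask])
        sens.append(s)
    return max(sens) - min(sens)
```

Reporting the gap next to the aggregate numbers makes a model that performs well overall but poorly on an under‑represented group immediately visible.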
- Challenges Identified:
- Fragmented standards across countries.
- Data‑diversity gaps that impede model generalisation.
- Recommendations:
- Create open benchmarking platforms that curate and annotate data per local clinical practice.
- Embed explainability hooks in AI pipelines to surface reasoning to clinicians.
3. AI Measurement in Manufacturing (Industry 5.0)
Speaker: Prof. Amlan Chakraborty
- Context: Transition from Industry 4.0 (automation) to Industry 5.0 (human‑centric AI).
- Key Risks:
- Black‑Box Models → hidden biases and reliability gaps.
- Sensor/Instrument Failures → data‑drift and policy‑drift.
- Security Threats – data‑poisoning, privacy breaches.
- Four Core Metric Families:
- Transparency & Explainability – Why did a model output a particular decision?
- Demographic Equality & Equity – Performance parity across demographic groups and sectors.
- Robustness & Drift Detection – Ability to tolerate sensor drift, policy changes, and operational variance.
- Accountability & Auditing – Frequency of human overrides, traceability from data ingestion to model maintenance.
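Drift detection of the kind listed above is often operationalised with the Population Stability Index (PSI), which compares the binned distribution of a feature at training time against its live distribution; a common rule of thumb flags PSI above 0.2 as significant drift. A minimal sketch of this standard statistic (illustrative only, not a method presented in the talk):

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between a reference (training-time) and current (live) sample
    of one sensor feature. Values above ~0.2 are commonly read as drift."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    # Open-ended outer bins so out-of-range live values are still counted.
    edges[0], edges[-1] = -np.inf, np.inf
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor both distributions to avoid log(0) on empty bins.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
```

Run per sensor channel on an edge device, a statistic like this gives a cheap, continuously computable trigger for the human review and lifecycle governance the recommendations below call for.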
- Actionable Recommendations:
- Develop auditable standards and compliance mechanisms.
- Provide SME‑friendly, “go‑model” toolkits that are explainable yet lightweight.
- Leverage edge computing for real‑time monitoring and lifecycle governance.
4. AI Measurement for Governance & Public Services
Speaker: Prof. Mayank Vatsa
- Premise: Governments worldwide already (or will soon) rely on AI for policy decisions; these decisions must be explainable and auditable to citizens.
- Illustrative Use‑Case:
- Aadhaar‑type national ID – AI‑driven eligibility decisions need transparent justification.
- Measurement Challenges:
- Multilingual & Multicultural Contexts – India’s 22 official languages + thousands of dialects demand language‑aware AI interfaces.
- Summarisation of Legal Texts – Translating dense legislation into layperson‑friendly explanations.
- Proposed Solution:
- Unified Citizen‑Interface Programme – A multilingual voice‑driven interface that can:
- Explain AI‑driven decisions,
- Summarise legal documents, and
- Provide audit trails for civic oversight.
- Risk Classification: AI systems should be categorised into low‑, medium‑, and high‑risk domains, each with a corresponding measurement regime.
5. Beyond Benchmarks – Latent Performance Profiling (LPP)
Speaker: Prof. Tonnoi Chakrawood (presenter on model evaluation)
- Problem Statement: Traditional benchmark‑centric evaluation masks hidden weaknesses (e.g., data contamination, inconsistent performance across similar‑size models).
- Proposed Metric Suite – LPP:
- Entropy of Final Layer – Captures information richness of model representations.
- Participation Ratio – Measures how “compact” or distributed activations are across layers.
- Prescription: Include LPP metrics in model cards so users can select models best suited for a given task (e.g., a healthcare task may prefer a model with higher entropy).
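Both LPP quantities have standard formulations that can be sketched directly: Shannon entropy over the model's final‑layer softmax outputs, and the participation ratio PR = (Σᵢλᵢ)² / Σᵢλᵢ² over the eigenvalues λᵢ of the activation covariance. The code below implements these standard definitions as an illustration; it is not the speaker's implementation:

```python
import numpy as np

def final_layer_entropy(probs):
    """Mean Shannon entropy of final-layer (softmax) outputs, one proxy
    for the information richness of a model's representations."""
    probs = np.clip(np.asarray(probs), 1e-12, 1.0)
    return float(np.mean(-np.sum(probs * np.log(probs), axis=-1)))

def participation_ratio(activations):
    """PR = (sum lambda_i)^2 / sum lambda_i^2 over eigenvalues of the
    activation covariance. Ranges from 1 (a single dominant direction)
    to the layer width (activations spread evenly across directions)."""
    acts = np.asarray(activations)
    acts = acts - acts.mean(axis=0)
    cov = acts.T @ acts / (len(acts) - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(np.sum(eig) ** 2 / np.sum(eig ** 2))
```

Both are cheap to compute from a batch of held‑out activations, which is what makes them plausible model‑card entries.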
6. AI Security & Safety – Measurement‑Centric View
Speakers: Prof. Devdeep Makhabharadha & Prof. Maynath Mandal
- Three Categories of AI‑Related Risks:
- Intrinsic Risk – Knowledge gaps, hallucinations, inability to reason.
- Interaction Risk – Users over‑trusting or misusing AI outputs.
- Societal Risk – Large‑scale misinformation, malicious deployments.
- Current Defence Strategies:
- Guard‑rail (black‑list) approaches – Simple rule‑based filters; effective for privacy/PII but vulnerable to jailbreaks.
- Adaptive, Measurement‑Based Guardrails – Continuously monitor knowledge‑gap metrics, norm compliance, and “judge models” that evaluate safety in real time.
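At its simplest, a black‑list guardrail of the first kind reduces to pattern matching over model output. The sketch below (hypothetical patterns, for illustration only) redacts two common PII types, and also shows why such filters are brittle: anything not matching the fixed rules passes straight through.

```python
import re

# Minimal rule-based (black-list) guardrail: a few regex patterns for PII.
# Real deployments layer many such rules plus judge models for adaptivity.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}

def redact_pii(text):
    """Replace black-listed PII spans with placeholder tags."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```

A jailbreak that spells an address out in words defeats this entirely, which is the motivation for the adaptive, measurement‑based guardrails described above.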
7. Panel Discussion (Moderated by Prof. Lipika Dey)
7.1. Panelists
- Prof. Siddharth Khastgir – Safe Autonomy, University of Warwick.
- Prof. Carsten Maple – Cyber Systems Engineering, The Alan Turing Institute.
- Prof. Wolfgang Nagel – High‑Performance Computing, TU Dresden.
- Ms. Kavita Bhatia – COO, India AI Mission, MeitY.
7.2. Major Themes & Exchanges
| Question | Key Points Raised |
|---|---|
| Robustness & Regulatory Guarantees? (to Prof. Khastgir) | • Different regulatory philosophies (UK/US guidelines vs. EU strict regulation). • Need to bound the problem – ensure training & test sets represent real‑world distributions. • ISO standards (e.g., ISO 34503 for automotive) help formalise the bounding. |
| Post‑deployment Evaluation? (to Prof. Nagel) | • Energy consumption as a measurable post‑deployment metric (operational cost). • No universal answer yet; AI systems evolve, so continuous monitoring (akin to software versioning) is required. |
| Mandatory Measurables for Regulated Sectors? (to Prof. Maple) | • Emphasis on system‑level testing rather than model‑level only. • Digital sandboxes – controlled environments with synthetic data for safe testing. • Collaboration between regulators, industry, and academia is essential. |
| India‑specific Metrics? (to Ms. Bhatia) | • Inclusivity, language, robustness are top priorities. • India AI Mission launched AI Kosh (10,000+ Indian datasets) and Bhashini (language‑aware tools). • Ongoing work with 12 partner organisations and 4 model releases showcased at the summit. |
| What Aspects Remain Hard to Measure? (to Prof. Nagel) | • In healthcare, human‑in‑the‑loop verification remains the most reliable control; scaling this is a challenge. |
| Engineering‑level Bias‑Free Data? (to Prof. Khastgir) | • Proposal of an ontological model (OASIS) to define completeness & representativeness of data. • Need for standardised engineering methods to prove bias‑free datasets. |
| Global Benchmarks vs. Local Tailoring? (to Prof. Maple) | • The MLCommons initiative provides a rigorous, repeatable benchmark methodology. • Global principles are feasible, but metrics must be culturally aware (e.g., gifting a clock in East Asia vs. the UK). |
| Final Views on Global Standards? (to Ms. Bhatia) | • Global AI‑principles are essential, yet local adaptation is mandatory for diversity and inclusivity. • Indian AI Mission is establishing an institute with 13 academic partners to co‑design India‑specific evaluation frameworks. |
7.3. Audience Interaction
- Question about “Bounding the Real World” – Prof. Khastgir illustrated an ontology‑driven taxonomy (static scenery, dynamic elements, environmental conditions) that maps directly to ISO 34503 for autonomous vehicles.
- Question on International Research Collaboration – Ms. Bhatia explained that Indian researchers abroad are already contributing to foundation‑model development under the AI Mission; calls for proposals on safety and ethics are open for global participation.
See Also:
- ai-impact-forum-democratising-ai-resources
- best-practices-from-the-international-network-for-advanced-ai-measurement-evaluation-and-science
- enterprise-adoption-of-responsible-ai-challenges-frameworks-and-solutions
- ai-innovators-exchange-accelerating-innovation-through-startup-and-industry-synergy
- from-buzzword-to-blueprint-engineering-sustainable-ai-at-scale
- scaling-trusted-ai-for-8-billion
- sovereign-ai-for-india-designing-the-nations-future-compute-data-and-innovation-ecosystem
- from-pilots-to-impact-evidence-on-scaling-ai-for-farmers-in-lmics
- beyond-proof-of-concepts-using-4d-ai-to-build-sovereign-sustainable-and-responsible-ai-at-production-scale
- ai-diffusion-from-innovation-to-population-scale-impact