Metrics, Methods and Platforms for Measurement of Artificial Intelligence for Trustworthy, Reliable and Explainable Applications

Detailed Summary

1. Cross‑Sectoral AI Measurement

Speaker: Prof. Partha Pratim Das

  • Key Insight: AI systems now cut across traditional sector boundaries; robust measurement must therefore be cross‑sectoral, not siloed.
  • Research Agenda Highlights:
    • Development of Sectoral World Models and a Cross‑Sectoral Metric Description Language to enable interoperability of metrics across domains.
    • Focus on inter‑ and intra‑sectoral metric flow, recommendation engines for metric selection, and a foundational metrology framework for AI governance.
  • Recommendation: Build adaptable AI‑evaluation frameworks that can be translated to domain‑specific pilots (healthcare, governance, manufacturing, logistics).

2. AI Measurement in Healthcare

Speaker: Prof. Richa Singh

  • Five Thrust Areas

    1. Open Platforms & Benchmarking – Need for country‑specific health data repositories; demographic differences (e.g., average heights) make a universal benchmark infeasible.
    2. Point‑of‑Care Explainability – Clinicians must understand why an AI recommendation is made (e.g., why a CT suggests avoiding lumbar puncture).
    3. Personalised Digital Health & Preventive Care – AI should support early‑intervention health monitoring.
    4. Public‑Health AI – Scaling AI to reach mass populations given low doctor‑to‑patient ratios.
    5. Metric Innovation – Beyond sensitivity/specificity, develop application‑specific metrics that capture trust, explainability, and data‑diversity gaps.
  • Challenges Identified:

    • Fragmented standards across countries.
    • Data‑diversity gaps that impede model generalisation.
  • Recommendations:

    • Create open benchmarking platforms that curate and annotate data per local clinical practice.
    • Embed explainability hooks in AI pipelines to surface reasoning to clinicians.

3. AI Measurement in Manufacturing (Industry 5.0)

Speaker: Prof. Amlan Chakraborty

  • Context: Transition from Industry 4.0 (automation) to Industry 5.0 (human‑centric AI).

  • Key Risks:

    • Black‑Box Models → hidden biases and reliability gaps.
    • Sensor/Instrument Failures → data‑drift and policy‑drift.
    • Security Threats – data‑poisoning, privacy breaches.
  • Four Core Metric Families

    1. Transparency & Explainability – Why did a model output a particular decision?
    2. Demographic Equality & Equity – Performance parity across demographic groups and sectors.
    3. Robustness & Drift Detection – Ability to tolerate sensor drift, policy changes, and operational variance.
    4. Accountability & Auditing – Frequency of human overrides, traceability from data ingestion to model maintenance.
  • Actionable Recommendations:

    • Develop auditable standards and compliance mechanisms.
    • Provide SME‑friendly, “go‑model” toolkits that are explainable yet lightweight.
    • Leverage edge computing for real‑time monitoring and lifecycle governance.
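Of the four metric families above, drift detection is the most readily quantified. One common, lightweight drift statistic (not prescribed in the talk; shown here only as an illustrative sketch) is the Population Stability Index (PSI) computed over a sensor channel:

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between a reference window and a current window of sensor readings.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant drift."""
    # Bin edges from reference quantiles, widened to cover out-of-range current values.
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    p = np.histogram(reference, bins=edges)[0] / len(reference)
    q = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid log(0) for empty bins.
    p, q = np.clip(p, 1e-6, None), np.clip(q, 1e-6, None)
    return float(np.sum((q - p) * np.log(q / p)))
```

In an edge‑monitoring loop, such a statistic could be recomputed per sensor on a rolling window and alarmed against the thresholds above; the binning scheme and thresholds here are conventional defaults, not standards cited by the speaker.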

4. AI Measurement for Governance & Public Services

Speaker: Prof. Mayank Vatsa

  • Premise: Governments worldwide already (or will soon) rely on AI for policy decisions; these decisions must be explainable and auditable to citizens.

  • Illustrative Use‑Case:

    • Aadhaar‑type national ID – AI‑driven eligibility decisions need transparent justification.
  • Measurement Challenges:

    • Multilingual & Multicultural Contexts – India’s 22 official languages + thousands of dialects demand language‑aware AI interfaces.
    • Summarisation of Legal Texts – Translating dense legislation into layperson‑friendly explanations.
  • Proposed Solution:

    • Unified Citizen‑Interface Programme – A multilingual voice‑driven interface that can:
      • Explain AI‑driven decisions,
      • Summarise legal documents, and
      • Provide audit trails for civic oversight.
  • Risk Classification: AI systems should be categorised into low‑, medium‑, and high‑risk domains, with measurement regimes scaled to each tier.


5. Beyond Benchmarks – Latent Performance Profiling (LPP)

Speaker: Prof. Tonnoi Chakrawood (presenter on model evaluation)

  • Problem Statement: Traditional benchmark‑centric evaluation masks hidden weaknesses (e.g., data contamination, inconsistent performance across similar‑size models).

  • Proposed Metric Suite – LPP:

    • Entropy of Final Layer – Captures information richness of model representations.
    • Participation Ratio – Measures how “compact” or distributed activations are across layers.
  • Prescription: Include LPP metrics in model cards so users can select models best suited for a given task (e.g., a healthcare task may prefer a model with higher entropy).
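The talk did not give formal definitions, but both LPP quantities have standard formulations; the sketch below is one illustrative reading, taking entropy as the mean Shannon entropy of the softmax over final‑layer logits, and the participation ratio from the eigenvalues of the activation covariance matrix:

```python
import numpy as np

def final_layer_entropy(logits):
    """Mean Shannon entropy (nats) of the softmax distributions over final-layer logits.
    Higher values indicate richer, less peaked output distributions."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

def participation_ratio(activations):
    """PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the activation
    covariance; ranges from 1 (one dominant direction) up to the feature dimension
    (activations spread evenly across all directions)."""
    x = activations - activations.mean(axis=0, keepdims=True)
    cov = x.T @ x / (x.shape[0] - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0, None)  # clip tiny negatives from round-off
    return float(eig.sum() ** 2 / (np.square(eig).sum() + 1e-12))
```

Reported in a model card, these two numbers would let a user compare similarly sized models on representation richness rather than benchmark score alone, which is the selection scenario the speaker describes.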


6. AI Security & Safety – Measurement‑Centric View

Speakers: Prof. Devdeep Makhabharadha & Prof. Maynath Mandal

  • Three Categories of AI‑Related Risks

    1. Intrinsic Risk – Knowledge gaps, hallucinations, inability to reason.
    2. Interaction Risk – Users over‑trusting or misusing AI outputs.
    3. Societal Risk – Large‑scale misinformation, malicious deployments.
  • Current Defence Strategies

    • Guard‑rail (black‑list) approaches – Simple rule‑based filters; effective for privacy/PII but vulnerable to jailbreaks.
    • Adaptive, Measurement‑Based Guardrails – Continuously monitor knowledge‑gap metrics, norm compliance, and “judge models” that evaluate safety in real time.
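A minimal sketch of the black‑list style of guardrail described above shows both its appeal and its brittleness (the pattern names and regexes are illustrative assumptions, not rules from the talk):

```python
import re

# Rule-based (black-list) guardrail: cheap and effective for obvious PII patterns,
# but easily bypassed by paraphrase or obfuscation -- the "jailbreak" weakness.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "national_id_like": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),  # 12-digit ID format
}

def guardrail_check(text):
    """Return the names of the rules that fire on the text; an empty list means it passes."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```

An adaptive, measurement‑based guardrail of the kind proposed by the speakers would instead monitor continuous signals (knowledge‑gap metrics, judge‑model safety scores) rather than fixed patterns; this sketch only illustrates the simpler baseline it aims to replace.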


7. Panel Discussion (Moderated by Prof. Lipika Dey)

7.1. Panelists

  • Prof. Siddharth Khastgir – Safe Autonomy, University of Warwick.
  • Prof. Carsten Maple – Cyber Systems Engineering, The Alan Turing Institute.
  • Prof. Wolfgang Nagel – High‑Performance Computing, TU Dresden.
  • Ms. Kavita Bhatia – COO, India AI Mission, MeitY.

7.2. Major Themes & Exchanges

  • Robustness & Regulatory Guarantees? (to Prof. Khastgir)
    • Different regulatory philosophies (UK/US guidelines vs. EU strict regulation).
    • Need to bound the problem – ensure training & test sets represent real‑world distributions.
    • ISO standards (e.g., ISO 34503 for automotive) help formalise the bounding.

  • Post‑deployment Evaluation? (to Prof. Nagel)
    • Energy consumption is a measurable post‑deployment metric (operational cost).
    • No universal answer yet; AI systems evolve, so continuous monitoring (akin to software versioning) is required.

  • Mandatory Measurables for Regulated Sectors? (to Prof. Maple)
    • Emphasis on system‑level testing rather than model‑level testing alone.
    • Digital sandboxes – controlled environments with synthetic data for safe testing.
    • Collaboration between regulators, industry, and academia is essential.

  • India‑specific Metrics? (to Ms. Bhatia)
    • Inclusivity, language coverage, and robustness are top priorities.
    • The India AI Mission launched AI Kosh (10k+ Indian datasets) and Bhashini (language‑aware tools).
    • Ongoing work with 12 partner organisations and 4 model releases showcased at the summit.

  • What Aspects Remain Hard to Measure? (to Prof. Nagel)
    • In healthcare, human‑in‑the‑loop verification remains the most reliable control; scaling it is the challenge.

  • Engineering‑level Bias‑Free Data? (to Prof. Khastgir)
    • Proposal of an ontological model (OASIS) to define completeness & representativeness of data.
    • Need for standardised engineering methods to demonstrate that datasets are bias‑free.

  • Global Benchmarks vs. Local Tailoring? (to Prof. Maple)
    • The MLCommons initiative provides a rigorous, repeatable benchmark methodology.
    • Global principles are feasible, but metrics must be culturally aware (e.g., the different connotations of gifting a clock in East Asia vs. the UK).

  • Final Views on Global Standards? (to Ms. Bhatia)
    • Global AI principles are essential, yet local adaptation is mandatory for diversity and inclusivity.
    • The India AI Mission is establishing an institute with 13 academic partners to co‑design India‑specific evaluation frameworks.

7.3. Audience Interaction

  • Question about “Bounding the Real World” – Prof. Khastgir illustrated an ontology‑driven taxonomy (static scenery, dynamic elements, environmental conditions) that maps directly to ISO 34503 for autonomous vehicles.

  • Question on International Research Collaboration – Ms. Bhatia explained that Indian researchers abroad are already contributing to foundation‑model development under the AI Mission; calls for proposals on safety and ethics are open for global participation.
