India has emerged as one of the world's largest adopters of AI, driven by scale of data, digital public infrastructure, and enterprise digitisation. However, the next strategic leap lies in transitioning from being predominantly an AI user to becoming a creator of indigenous AI systems. This session will examine what it takes for India to move decisively up the AI value chain: from deploying models to building home-grown AI solutions.
From AI User to Creator: The Next Leap of India’s AI Innovation
Detailed Summary
1. Opening – Manu Chopra (Karya)
Welcome & framing: Manu introduced the session, explaining Karya’s mission – “work that gives dignity” – and the broader summit theme of moving India from AI user to AI creator.
Scale of effort: Over the past three years, Karya has mobilised 250,000+ contributors across every Indian state to build foundational datasets and to make AI models fairer and more inclusive.
Challenge statement: Current AI economics, model‑building pipelines, and product‑centric deployments are not well‑aligned with the needs of the Global South; community‑centric evaluation is essential.
2. Community‑Centric Evaluation – Samiksha (Sunaina & Dr. Kalika Bali)
2.1 Why a new benchmark?
Existing multilingual benchmarks are English‑centric, sparsely cover Indian languages, and often rely on translation of English tasks – leading to poor relevance and low cultural fidelity.
The AI community needs ground‑truth data that reflects real‑world community questions (legal, healthcare, education, finance).
2.2 Design & Construction
Name & meaning: “Samiksha” = Sanskrit for collective analysis.
Data‑collection pipeline:
Partnered with 14 civil‑society organisations (CSOs) to source authentic user questions across four domains (Healthcare, Education, Legal, Finance).
Generated 23,000 data points in 11 Indian languages plus Indian English (≈2% of the world’s linguistic diversity).
Human evaluation: 150,000 evaluations by Karya workers – a record, surpassing the scale of the earlier Pariksha benchmark.
Automated evaluation: over 1 million “LLM‑as‑judge” runs to calibrate automated metrics against human scores.
2.3 Benchmark Scope
| Domain | Languages | Models evaluated | Data points |
|---|---|---|---|
| Healthcare | 11 | 17 | 23,000 |
| Education | — | — | — |
| Legal | — | — | — |
| Finance | — | — | — |
Coverage goal: expand to 22+ languages in the next iteration.
2.4 Key Findings
Overall performance: Most global frontier models score well on content and linguistic quality but fall short on cultural relevance.
Healthcare – lowest scores; models frequently missed clinical nuance and produced overly generic advice.
Finance – mixed; some models produced “translated‑English” phrasing that felt unnatural.
Cultural relevance: The strongest differentiator; models trained on locally curated data (e.g., Indian‑origin LLMs) performed better.
Automated vs. human: Pure LLM‑judge metrics diverge sharply from human preferences; however, fine‑tuning the automated pipeline with human‑collected data reduces the gap significantly.
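The calibration described above – checking how closely automated LLM‑judge scores track human preferences – can be sketched in a few lines. This is a minimal illustration, not the pipeline from the Samiksha report: the scores are hypothetical, and Spearman rank correlation is just one common agreement metric.

```python
# Minimal sketch: agreement between human evaluators and an automated
# "LLM-as-judge". All scores below are hypothetical; the actual Samiksha
# calibration procedure may use different data and statistics.

def ranks(xs):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # group tied values and give each the average of their positions
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman rank correlation between two score lists."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

# Hypothetical per-answer quality scores (1-5): human panel vs. raw LLM judge.
human = [5, 4, 2, 5, 1, 3, 4, 2]
judge = [4, 4, 3, 5, 2, 2, 4, 3]
print(f"Spearman correlation: {spearman(human, judge):.2f}")
```

A correlation well below 1.0 on such pairs is the kind of divergence the speakers described; fine‑tuning the judge on human‑collected scores aims to push this number up.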
2.5 Takeaway Messages (Speaker call‑outs)
Sunaina: “Benchmarks must be built from the ground up, not retro‑fitted from English.”
Dr. Kalika: “Human‑in‑the‑loop evaluation remains the gold standard; automation is only an aide.”
3. Scaling AI Creation – Bhashini & Samadhan (Amitabh Nafsar)
3.1 The Bhashini Vision
Goal: Build a full‑stack multilingual AI suite (ASR, MT, NER, TTS) for every Indian language, tackling the data‑scarcity problem in which even the top 6–7 languages hold only ~3% of internet text.
3.2 Data‑Collection Strategy
Brute‑force field effort: 200+ field workers captured responses to audio‑visual prompts (images of objects, food, places) and transcribed/translated them, creating an initial morpheme‑level corpus sufficient for prototype models.
3.3 Early Deployments (Use‑Case Highlights)
| Use‑case | Community impact | Technical note |
|---|---|---|
| Panchayat meeting minutes | Transparency: 270,000 panchayats now have local‑language‑to‑English transcription, enabling central ministries to review local deliberations. | Speech‑to‑text pipeline + translation. |
| Agriculture advisory (Maharashtra) | Farmers receive voice‑based, local‑language AI advice for field‑level decisions. | Domain‑specific taxonomy (≈70,000 agri terms) collected from tribal students. |
| UIDA localisation | Place‑name translation for over 1.6 M geographical entities (Survey of India data), improving navigation and e‑governance. | Crowd‑sourced audio recordings + ASR + NER. |
| Education / career guidance | AI chatbot helps students explore courses and jobs in their mother tongue. | Dialogue system trained on Samiksha‑derived data. |
3.4 AI Value‑Chain Philosophy
End‑to‑end loop: Data collection → Annotation → Model building → Vetting → Deployment → Community feedback → New data.
Emphasised that communities must own each stage to avoid a “centralised AI economy”.
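The end‑to‑end loop above can be sketched as a toy pipeline. This is purely illustrative: the stage names follow the talk, but the functions are stand‑ins, not Bhashini’s actual implementation.

```python
# Illustrative sketch of the closed AI value-chain loop:
# collect -> annotate -> build/deploy -> feedback -> new data.
from dataclasses import dataclass, field

@dataclass
class Corpus:
    raw: list = field(default_factory=list)        # collected utterances
    annotated: list = field(default_factory=list)  # labelled examples
    feedback: list = field(default_factory=list)   # community corrections

def collect(corpus, new_samples):
    corpus.raw.extend(new_samples)

def annotate(corpus):
    # stand-in annotation: pair each utterance with a placeholder label
    corpus.annotated = [(s, f"label({s})") for s in corpus.raw]

def build_and_deploy(corpus):
    # stand-in "model": just records the size of its training set
    return {"trained_on": len(corpus.annotated)}

def gather_feedback(corpus, corrections):
    corpus.feedback.extend(corrections)
    collect(corpus, corrections)  # feedback becomes new raw data, closing the loop

corpus = Corpus()
collect(corpus, ["utt1", "utt2"])
annotate(corpus)
model = build_and_deploy(corpus)
gather_feedback(corpus, ["utt3"])
annotate(corpus)
model = build_and_deploy(corpus)
print(model)  # training set grows as community feedback flows back in
```

The point of the structure is that feedback is not a terminal stage: it re‑enters the pipeline as new data, which is what the speakers meant by communities owning each stage.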
3.5 Announcement: Samadhan Platform
A community‑driven marketplace (“Samadhan” is Sanskrit for “solution”) that matches AI data‑workers with NGOs, government bodies, and private firms needing language‑specific data and annotation.
Launch date: announced for 3 pm, 19 Feb (demo at booth 14).
4. Global Outlook – Community‑Driven Pipelines (Fezzel, Collective Intelligence Project)
Core thesis: Community‑driven evaluation captures breadth, contextuality, and variability that traditional benchmarks miss.
Scalable platform: WeVal (weval.org) enables teachers, doctors, and local experts worldwide to author and run AI evaluations using criteria drawn from their lived experience.
Future roadmap: Export the Samiksha pipeline to other Global‑South nations (Uganda, Colombia, etc.), creating a network of locally owned benchmarks.
5. AI‑for‑Good Investment Landscape (Shannon Farley, Fast Forward)
Indian ecosystem strengths:
Deep technical talent, robust public‑digital infrastructure, and homegrown philanthropy (Nudge, ACT Grants).
Model for replication: Indian nonprofits demonstrate that community‑centred AI plus targeted funding can accelerate impact; the same pattern can be nurtured in Colombia, across Africa, and beyond.
6. Audience Q&A – Highlights
| Question | Respondent(s) | Key points |
|---|---|---|
| Human evaluation consistency across languages? | Karya team (Ankush Sabharwal, Navrina Singh) | Evaluators are language‑specific; inter‑annotator agreement is measured and reported in the Samiksha report. |
| Reliability of LLM‑as‑judge? | Sunaina, Dr. Kalika | Current LLM judges are not reliable; after fine‑tuning with human scores, correlation improves but remains insufficient for final decisions. |
| Scaling community data‑workers? | Amitabh Nafsar, Pratyush Kumar | Sustainable income models are needed; Samudaya (community platform) will provide paid micro‑tasks and skill‑building pathways. |
| Open‑source data release? | Manu Chopra | Benchmark data will remain closed until all model evaluations are complete, to avoid contamination; a post‑evaluation release is planned. |
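The inter‑annotator agreement mentioned in the Q&A can be quantified in several ways; one standard choice is Cohen’s kappa, sketched below. The labels are hypothetical, and the Samiksha report may use a different statistic.

```python
# Hedged sketch: Cohen's kappa as one way to measure agreement between two
# annotators, correcting for chance agreement. Labels below are made up.
from collections import Counter

def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n      # raw agreement rate
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)   # chance agreement
    return (observed - expected) / (1 - expected)

ann1 = ["good", "good", "bad", "good", "bad", "good"]
ann2 = ["good", "bad", "bad", "good", "bad", "good"]
print(f"kappa = {cohens_kappa(ann1, ann2):.2f}")
```

Kappa of 1.0 means perfect agreement, 0 means no better than chance; reporting it per language, as the panel described, makes cross‑language consistency visible.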
7. Announcements & Closing Remarks
Dataset & benchmark release (via Karya website) – Samiksha leaderboard and 37‑page technical report available.
Launch of Samudaya – community‑platform for AI data‑workers (demo at booth 14).
Booth lineup: Mastercard, Bhashini, India AI Mission – live demos of Samudaya, language‑AI tools, and new datasets.
Upcoming panels: 28 additional panels over the next four days; participants invited to attend.
Final thanks from Manu Chopra and the panelists; applause.
Key Takeaways
Community‑driven benchmarks are essential for building AI that truly serves India’s linguistic and cultural diversity.
Samiksha – the first large‑scale, ground‑up Indian‑language benchmark – shows global models excel on content but lag on cultural relevance, especially in healthcare.
Human evaluation remains the gold standard; automated LLM‑as‑judge methods need substantial calibration with local human data.
The Bhashini‑Samadhan stack demonstrates a viable path from data scarcity to deployable multilingual AI via field‑level data collection and community ownership.
The AI value chain must be a closed loop: community members participate in data collection, annotation, evaluation, and feedback, ensuring sustainable AI creation rather than mere consumption.
Global scaling is feasible: the Collective Intelligence Project’s WeVal platform adapts the Indian community‑benchmark model for other regions, promising broader impact.
India’s AI‑for‑Good ecosystem (non‑profits, philanthropy, government) provides a replicable model for other emerging economies.
Policy implication: To transition from AI user to creator, India must institutionalise community‑centred data pipelines, multilingual evaluation standards, and open‑access yet responsibly managed benchmark releases.
Prepared from the verbatim transcript of the AI Impact Summit session held in Delhi, 24 Feb 2026.