Genomics, AI, and the Future of Health: Data Visitation to Empower the Global South
Abstract
The workshop examined why genomic and health‑AI breakthroughs remain skewed toward the Global North and explored data visitation—the practice of sending algorithms to the data rather than moving the data—to preserve sovereignty while enabling collaborative research. Presentations detailed practical implementations (BioVault, a federated COVID‑19‑screening model) and highlighted policy, ethical, and quality‑control challenges. A panel synthesized technical, societal, and regulatory perspectives, underscoring the need for multidisciplinary champions to translate these innovations into real‑world health benefit for the Global South.
Detailed Summary
- Definition of “Global South” vs. “Global North” – ~85 % of humanity lives in the Global South, which is generally less economically developed and has limited access to advanced AI‑driven health technologies.
- Data‑representation gap – Large biomedical databases (e.g., those used by Google, Amazon, Facebook) are dominated by Caucasian cohorts; the Global South is severely under‑represented.
- Consequences – Higher disease burdens (diabetes, coronary heart disease, chronic kidney disease) in the South are not reflected in training data, producing models that are less accurate for Southern populations.
- Illustrative analogies – The Tata Nano (affordable car) vs. Rolls‑Royce (expensive, ill‑suited) were used to emphasize the need for affordable, accessible solutions rather than luxury exports from the North.
Key Insight: Empowering the Global South requires data‑centric governance that keeps data local while still allowing AI to learn from it.
2. Introducing Data Visitation
Speaker: Binay Panda (continued)
- Concept – “Data visitation” means algorithms travel to the data; the data never leaves its sovereign repository.
- Motivation – Avoids violating data‑sovereignty laws, reduces cost of moving massive genomic datasets, and preserves privacy.
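The concept can be illustrated with a minimal sketch. It is not any platform's actual API: the `DataSite` class, the `visit` method, the `min_cohort` threshold, and the glucose records are all hypothetical. The analysis function travels to the site, runs against records that stay there, and only a summary comes back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, float]
Summary = Dict[str, float]


@dataclass
class DataSite:
    """Hypothetical repository that keeps its raw records in place."""
    name: str
    _records: List[Record] = field(repr=False)
    min_cohort: int = 5  # assumed policy: refuse to report on tiny cohorts

    def visit(self, analysis: Callable[[List[Record]], Summary]) -> Summary:
        """Run the visiting analysis locally and return only its summary."""
        if len(self._records) < self.min_cohort:
            raise PermissionError(f"{self.name}: cohort too small to report")
        return analysis(self._records)


def mean_glucose(records: List[Record]) -> Summary:
    """Example visiting algorithm: an aggregate, never row-level output."""
    values = [r["glucose"] for r in records]
    return {"n": len(values), "mean_glucose": sum(values) / len(values)}


# Illustrative values only
site = DataSite("site_a", [{"glucose": g} for g in (5.0, 6.0, 7.0, 6.0, 6.0)])
summary = site.visit(mean_glucose)
print(summary)
```

Nothing in this sketch stops a malicious analysis function from returning raw rows; a real deployment would presumably add sandboxing, query review, and output checks on top of this pattern.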
3. BioVault – A Privacy‑First Data‑Visitation Platform
Speaker: Madhava J. (OpenMined Foundation)
3.1. Vision & Personal Background
- Rare‑disease patient and privacy‑technology engineer; emphasizes personal stake in safeguarding health data.
3.2. Technical Foundations
| Feature | Description |
|---|---|
| Desktop utility | Windows‑compatible, open‑source GUI that links locally stored data to a “facade” exposing only query‑level access. |
| NTF encryption | End‑to‑end encryption on the data‑owner side; no raw data leaves the host machine. |
| Support for Jupyter & extensions | Enables data scientists to run notebook‑style analyses without uploading data. |
| Federated encrypted computation | Peer‑to‑peer, multi‑party protocols that aggregate model updates while keeping raw records hidden. |
| Low‑cost hardware | Runs on modest PCs; future plans to use inexpensive Raspberry Pi‑style devices for edge deployment. |
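The "federated encrypted computation" row can be made concrete with one common secure-aggregation technique, pairwise additive masking. The talk did not specify which protocol BioVault uses, so this is an assumption for illustration only. Function names, the modulus, and the integer-encoded updates are hypothetical. Each pair of parties shares a random mask that one adds and the other subtracts, so the aggregator sees only masked vectors and the masks cancel in the sum.

```python
import random
from typing import List

MODULUS = 2**31 - 1  # public modulus; all arithmetic is done modulo this value


def pairwise_masks(n_parties: int, length: int, seed: int = 0) -> List[List[int]]:
    """Build one mask vector per party such that all masks sum to zero.

    For each pair (i, j) with i < j a random value is drawn; party i adds it
    and party j subtracts it. Here a single seeded generator stands in for
    the per-pair shared secrets a real protocol would negotiate.
    """
    rng = random.Random(seed)
    masks = [[0] * length for _ in range(n_parties)]
    for i in range(n_parties):
        for j in range(i + 1, n_parties):
            for k in range(length):
                m = rng.randrange(MODULUS)
                masks[i][k] = (masks[i][k] + m) % MODULUS
                masks[j][k] = (masks[j][k] - m) % MODULUS
    return masks


def mask_update(update: List[int], mask: List[int]) -> List[int]:
    """What a party actually sends: its update hidden behind its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """Sum the masked vectors; the pairwise masks cancel out."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]


# Illustrative integer-encoded model updates from three parties
updates = [[3, 10], [5, 20], [7, 30]]
masks = pairwise_masks(len(updates), length=2)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
total = aggregate(masked)
print(total)
```

The aggregator recovers the element-wise sum of the updates without seeing any individual contribution. Production protocols also have to handle dropouts and key agreement, which this sketch omits.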
3.3. Demonstrations & Use‑Cases
- Single‑cell RNA‑seq analysis – shows that high‑dimensional genomic data can be processed remotely while preserving privacy.
- Remote inference on large clinical datasets – a proof‑of‑concept in which a model was trained on a partner’s data without the researchers seeing the raw records.
3.4. Pilot Projects
- Caribbean Allele‑Frequency Study (collaboration with Dr Karika Weldon, Kerogenetics)
  - Samples collected across multiple islands; data remained on‑site.
  - BioVault enabled queries that calculated allele frequencies for variants in APOL1, a gene associated with kidney disease, and compared them to the global gnomAD reference.
  - All analyses were performed without any data upload, which the speaker presented as a world first for privacy‑preserving population genomics.
- (Mentioned) Jordanian Cohort – preprint on bioRxiv (QR code displayed) describing use of BioVault on datasets from the Hashemite University’s Arab‑population research.
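The kind of query used in the Caribbean pilot can be sketched as follows, assuming a biallelic site and diploid samples. The function name, genotype counts, and reference frequency are placeholders, not data from the study or from gnomAD. Only genotype counts are needed to compute the frequency, so only an aggregate has to leave the site.

```python
from typing import Dict


def alt_allele_frequency(genotype_counts: Dict[str, int]) -> float:
    """Alternate-allele frequency at a biallelic site in diploid samples.

    Expects counts under the keys 'hom_ref', 'het', and 'hom_alt'.
    Each heterozygote carries one alternate allele, each alternate
    homozygote carries two, and every sample contributes two alleles.
    """
    n_samples = sum(genotype_counts[k] for k in ("hom_ref", "het", "hom_alt"))
    if n_samples == 0:
        raise ValueError("no genotyped samples")
    alt_alleles = genotype_counts["het"] + 2 * genotype_counts["hom_alt"]
    return alt_alleles / (2 * n_samples)


# Placeholder counts, not real study data
local_counts = {"hom_ref": 70, "het": 25, "hom_alt": 5}
local_af = alt_allele_frequency(local_counts)

# Placeholder reference value, not an actual gnomAD frequency
reference_af = 0.10
difference = local_af - reference_af
print(f"local AF = {local_af:.3f}, difference vs reference = {difference:+.3f}")
```

With these placeholder counts there are 35 alternate alleles among 200, giving a local frequency of 0.175.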
Key Insight: Data visitation can deliver high‑resolution, clinically relevant genomics while respecting national sovereignty and patient privacy.
4. Video Testimony – The Human Dimension of Data Siloing
Speaker: Prof. Rana Dajani (Hashemite University – pre‑recorded video)
- Personal loss – Mother’s recent passing; stresses that scientific work must remain rooted in humanity.
- Research focus – Genetic risk factors for diabetes in Jordanian, Circassian, and Chechen populations; also epigenetic impacts of trauma on refugees.
- Data‑sharing barriers – Regulatory constraints, lack of infrastructure, and high costs prevent Jordanian groups from contributing to global databases.
- Why BioVault matters – Provides a “domain‑agnostic, privacy‑preserving” framework that lets her team share results without violating sovereignty.
Key Insight: When technology respects people and policy, researchers from low‑resource settings can meaningfully contribute to global precision health.
5. Fair, Scalable, Private AI Using Routine Hospital Data
Speaker: Andrew Soltan (University of Oxford)
5.1. Problem Statement – COVID‑19 Diagnostic Bottleneck
- Early in the pandemic, PCR results took up to 72 hours, which delayed patient flow and amplified transmission.
5.2. Data Sources
- Routine vital signs (temperature, heart rate, blood pressure) & standard blood panels collected within the first hour of admission – data that exists in any middle‑income or high‑income hospital.
5.3. Model Development & Validation
- Trained a tabular‑ML model to predict COVID‑19 status from routine data.
- Validated across four NHS trusts (Birmingham, Bedford, Portsmouth, Oxford) and on a second‑wave cohort (≈72 k patients).
5.4. Deployment – Edge‑Device Federated Learning
- Hardware: Low‑cost Raspberry Pi (~£40 / 5 k INR) running Ubuntu from a detachable micro‑SD card, which is securely destroyed after use.
- Workflow: Device sits inside each hospital, receives encrypted model updates, trains locally on site data, and returns only the updated weights. No raw patient data leaves the premises.
- Results: Deployed at John McClure Hospital ED; delivered a COVID‑19 risk score within 45 minutes, outperforming standard PCR turnaround.
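The workflow above (receive the model, train locally, return only weights) matches the general shape of federated averaging. The following is a minimal sketch of that technique, assuming a logistic-regression model and toy data. It is not the speaker's actual pipeline, and the function names, learning rate, and datasets are hypothetical.

```python
import math
from typing import List, Tuple

Dataset = List[Tuple[List[float], int]]  # (feature vector, binary label)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_train(weights: List[float], data: Dataset,
                lr: float = 0.1, epochs: int = 20) -> List[float]:
    """Batch gradient descent for logistic regression on one site's data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for k, xk in enumerate(x):
                grad[k] += err * xk
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def federated_average(site_weights: List[List[float]],
                      site_sizes: List[int]) -> List[float]:
    """Average the site models, weighting each by its number of samples."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [sum(w[k] * n for w, n in zip(site_weights, site_sizes)) / total
            for k in range(n_params)]


def federated_round(global_w: List[float], sites: List[Dataset]) -> List[float]:
    """One round: every site trains locally, only weights are shared."""
    local = [local_train(global_w, data) for data in sites]
    return federated_average(local, [len(d) for d in sites])


# Toy data: features are [bias, x]; the label is 1 when x is positive
site_a: Dataset = [([1.0, 2.0], 1), ([1.0, -1.5], 0), ([1.0, 1.0], 1)]
site_b: Dataset = [([1.0, -2.0], 0), ([1.0, 0.5], 1)]

weights = [0.0, 0.0]
for _ in range(10):
    weights = federated_round(weights, [site_a, site_b])
print(weights)
```

The coordinating server in this sketch only ever handles weight vectors and sample counts. Encrypting the updates in transit, as described in the talk, would sit on top of this loop.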
5.5. Implications for the Global South
- Minimal hardware cost makes the approach viable for resource‑constrained hospitals.
- Federated learning respects data‑sovereignty laws while still enabling model improvement across diverse populations.
Key Insight: A hardware‑light, federated‑learning pipeline can democratize AI‑driven diagnostics even where computational resources are scarce.
6. Policy, Standards, and Quality‑Control for Data Visitation
Speaker: Francis Crawley (CODATA International Data Policy Committee)
6.1. Ethical‑Technical Integration
- Argues that ethics cannot be an afterthought; it must be built into the algorithmic layer (e.g., verifiable provenance, bias audits).
6.2. Emerging Governance Frameworks
| Principle | Aim |
|---|---|
| FAIR‑CARE‑TRUST | Extends the FAIR data principles (Findable, Accessible, Interoperable, Reusable) with Care (patient‑centric) and Trust (auditability) for AI models. |
| Informed‑Consent 2.0 | Dynamic consent mechanisms that allow participants to specify permissible algorithmic queries. |
| Quality‑Control as a Service | Data visitation platforms can run automated QC pipelines (e.g., genotype‑call consistency, missingness checks) before granting query access. |
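The "Quality‑Control as a Service" row can be illustrated with a minimal gating sketch. It covers only the missingness check. The thresholds, function names, and genotype encoding are assumptions rather than anything specified in the talk. Query access is granted only when enough variants pass a per-variant missingness limit.

```python
from typing import Dict, List, Optional

# Alternate-allele count (0, 1, 2), or None when the genotype call is missing
Genotype = Optional[int]


def missingness(calls: List[Genotype]) -> float:
    """Fraction of samples with no genotype call for one variant."""
    if not calls:
        raise ValueError("no calls supplied")
    return sum(c is None for c in calls) / len(calls)


def qc_report(variants: Dict[str, List[Genotype]],
              max_missing: float = 0.05) -> Dict[str, bool]:
    """Map each variant ID to True if it passes the missingness limit."""
    return {vid: missingness(calls) <= max_missing
            for vid, calls in variants.items()}


def grant_query_access(report: Dict[str, bool],
                       min_pass_fraction: float = 0.9) -> bool:
    """Open the dataset to queries only if enough variants pass QC."""
    return sum(report.values()) / len(report) >= min_pass_fraction


# Illustrative calls: var_1 has 1 missing call out of 10, var_2 has none
variants = {
    "var_1": [0, 1, 2, 0, None, 1, 0, 0, 1, 2],
    "var_2": [0] * 20,
}
report = qc_report(variants)
access = grant_query_access(report)
print(report, access)
```

Here `var_1` fails the assumed 5 % limit, so only half the variants pass and access is withheld until the dataset is cleaned. A genotype‑call consistency check could be added as a second entry in the same report.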
6.3. Real‑World Pilot
- Collaboration with the European Open Science Cloud and the Research Data Alliance in a working group that drafted guidance notes for ethics committees evaluating health‑AI proposals.
Key Insight: Standardised, privacy‑by‑design policies coupled with built‑in quality metrics are essential for trustworthy global health AI.
7. Panel Discussion – Translating Technology to Impact
| Panelist (identified) | Main Points |
|---|---|
| Madhava J. | Emphasises the need for “champions” who are passionate about data; technology alone is insufficient. |
| Andrew Soltan | Highlights engineering bottlenecks – “boots on the ground” required to integrate solutions into existing health systems. |
| Binay Panda | Stresses starting from human impact: align technology with the lived realities of families and communities. |
| Francis Crawley | Calls for policy advocacy: translate technical possibilities into regulatory incentives. |
| Audience (unidentified) | Raises questions about scaling, sustainability, and the role of AI in agriculture and water management as parallel use‑cases. |
Consensus: Successful deployment hinges on multidisciplinary collaboration, local ownership, and clear value propositions for end‑users.
8. Concluding Remarks
Speaker: Dawn Chen (moderator) – Summarises the session’s call to action:
- Software engineers & entrepreneurs – Leverage open‑source data‑visitation tools (e.g., BioVault) to build scalable solutions.
- Clinicians & researchers – Partner with technologists early, co‑design workflows that respect clinical constraints.
- Policymakers – Craft future‑proof regulations that enable data visitation while safeguarding sovereignty.
A tech meetup was announced for the next day at the Indian International Center, inviting participants to continue the dialogue.
Key Takeaways
- Data visitation (algorithms traveling to data) is a practical strategy to reconcile privacy, sovereignty, and AI collaboration across the Global South.
- BioVault demonstrates that a low‑cost, open‑source platform can perform privacy‑preserving genomic analyses (e.g., Caribbean allele‑frequency study) without moving raw data.
- Federated learning on inexpensive edge devices (Raspberry Pi) can deliver rapid, high‑utility clinical predictions (COVID‑19 screening) while respecting data‑sovereignty laws.
- Policy frameworks (FAIR‑CARE‑TRUST, dynamic consent, built‑in QC) are essential to embed ethics and trust into AI pipelines.
- Human‑centered framing—ensuring that technologies address actual health burdens of families in the Global South—is crucial for adoption.
- Champion‑driven, multidisciplinary teams (technologists, clinicians, policymakers) are the engine that will move these prototypes into real‑world health impact.
- Scalable, low‑budget solutions are feasible; the main barrier is knowledge transfer and local capacity building.
- Open‑source community and regional meet‑ups (e.g., the upcoming tech meetup) are vital venues for sharing expertise and fostering collaborations.