Beyond Explainability: How We Are Redefining Interpretability in AI
As artificial intelligence becomes more deeply embedded in scientific discovery and healthcare, one challenge continues to stand out: understanding what models actually know.
A new paper takes a major step toward answering that question.
Jonathan Warrell from our Machine Learning team is the lead author of “Interpretability and Implicit Model Semantics in Biomedicine and Deep Learning,” written in collaboration with Michael Gancz, Hussein Mohsen, Prashant Emani, and Mark Gerstein of Yale University. The paper, published in Nature Machine Intelligence, introduces a new framework for thinking about AI systems.
A New Perspective: Interpretability Is Not Enough
Together, the team introduces a broader framework for understanding AI, shifting the conversation beyond explainability toward a more fundamental question: what do models actually represent?
Much of today’s AI discussion centers on interpretability.
Can we explain a model’s decisions? Can we visualize what it has learned?
Warrell says the research challenges a widely held assumption in artificial intelligence: that explaining a model’s outputs is the same as understanding what it represents.
“Interpretability is only one aspect of a model’s semantics. Models do more than generate outputs. They encode relationships and structure about the world, often in ways that are not directly accessible to human understanding. If we focus only on interpretability, we risk overlooking the deeper scientific meaning captured within these systems.”
What Are Model Semantics?
Borrowing from the philosophy of science, the paper defines model semantics as the way a model represents real-world phenomena. This includes:
- What patterns the model captures
- How internal representations relate to real-world variables
- Whether those representations align with scientific reality
Importantly, these semantics can be implicit. Deep learning systems often learn features that are highly predictive but difficult to interpret directly. As the authors explain, the goal is not just to make models explainable, but to understand what they actually represent and whether those representations are meaningful.
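To make the idea concrete, here is a minimal sketch (our illustration, not code from the paper) of one common way to test whether an internal representation encodes a real-world variable: a linear probe. All data and names below are synthetic and hypothetical; the example assumes numpy and scikit-learn.

```python
# Illustrative sketch, not from the paper: a linear probe testing whether a
# hidden representation encodes a known real-world variable.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for biomedical data: a single latent variable
# (e.g., pathway activity) drives all observed features.
n, d = 1000, 20
latent = rng.normal(size=n)
X = np.outer(latent, rng.normal(size=d)) + 0.5 * rng.normal(size=(n, d))

# Stand-in for a trained network's hidden layer: a fixed nonlinear
# projection of the inputs (scaled so tanh does not saturate).
W = rng.normal(size=(d, 8)) / np.sqrt(d)
hidden = np.tanh(X @ W)

# Probe: can a simple linear readout of the hidden units recover the
# latent variable? A high held-out R^2 suggests the representation
# encodes it, even if no single hidden unit is interpretable on its own.
h_train, h_test, z_train, z_test = train_test_split(hidden, latent, random_state=0)
probe = LinearRegression().fit(h_train, z_train)
print(f"held-out R^2 of linear probe: {probe.score(h_test, z_test):.3f}")
```

Probing is only one established technique for this kind of check; the paper’s framework asks the broader question of whether such learned representations, taken together, align with scientific reality.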
Real-World Implications
This shift from interpretability to semantics has major implications, especially in biomedicine.
- Trusting High-Performance Models
In healthcare, accuracy can be life-saving, but interpretability is often required for trust. This research suggests a more nuanced approach: a model may be hard to interpret yet still scientifically valid. If its learned semantics align with real biological processes, it can be trustworthy even without full transparency.
- Advancing Drug Discovery and Genomics
Deep learning models are increasingly used to:
- Predict protein structures
- Model gene expression
- Identify disease mechanisms
These systems often uncover patterns that humans do not already understand. By focusing on semantics, researchers can evaluate whether those patterns correspond to real biological phenomena, opening the door to new discoveries rather than just explanations.
- Bridging AI and Scientific Theory
One of the most powerful ideas in the paper is that AI models can function similarly to scientific theories. Instead of simply fitting data, they can:
- Encode hypotheses about the world
- Capture latent structures in complex systems
- Provide new ways of understanding phenomena
The framework provides a way to formally analyze this connection, helping scientists move from “black box” skepticism to structured validation of AI knowledge, as the sketch below illustrates in miniature.
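A hedged sketch of that idea: treating a fitted model’s learned structure as a set of candidate hypotheses to validate. Everything here is synthetic and hypothetical, not from the paper; it assumes numpy and scikit-learn and uses a sparse linear model purely for illustration.

```python
# Illustrative sketch, not from the paper: reading candidate hypotheses
# out of a fitted model. A sparse linear readout over synthetic "gene"
# features flags which variables the model treats as relevant.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, g = 500, 50
X = rng.normal(size=(n, g))              # stand-in gene-expression matrix
true_drivers = [3, 17, 42]               # ground truth, unknown in practice
y = X[:, true_drivers].sum(axis=1) + 0.1 * rng.normal(size=n)

model = Lasso(alpha=0.05).fit(X, y)

# The nonzero coefficients are the model's implicit "hypothesis" about
# which variables matter; each is a candidate for experimental validation.
candidates = np.flatnonzero(model.coef_)
print("candidate drivers to validate:", candidates)
```

In practice the fitted model would be far more complex, but the workflow is the same: extract the structure the model has committed to, then test it against the world.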
Why This Matters Now
As AI systems become more complex and widely deployed, the limits of traditional interpretability are becoming increasingly clear. Models are growing more powerful yet less transparent, real-world applications demand both high performance and trust, and scientific domains require deeper understanding, not just accurate predictions. Our work with Yale offers a path forward by reframing the problem. Instead of asking whether we can explain a model, the more important questions are what the model actually represents and whether those representations correspond to reality.
Final Thoughts
This publication marks an important shift in how we think about AI. By expanding the conversation beyond interpretability to include implicit model semantics, the authors provide a more complete framework for evaluating modern machine learning systems. For industries like healthcare, where accuracy, trust, and discovery intersect, this perspective could prove transformative. And as Jonathan Warrell and his co-authors make clear, the future of AI understanding will not be defined by how well we can explain models, but by how well we can connect them to the real world.
About Jonathan Warrell
Jonathan Warrell is a Researcher in the Machine Learning Department at NEC Laboratories America. He received his BA in Music from the University of Cambridge, a Master’s and PhD in Music Theory and Analysis from King’s College London, and an MS in Computer Science (Distinction) from University College London, before moving to postdoctoral work in computational genomics and neuroscience in Yale’s Department of Molecular Biophysics and Biochemistry.
His research focuses particularly on computational biology, spatial genomics, and optimization theory. At NEC, Dr. Warrell contributes to projects involving molecular design, large-scale genomic reasoning, compositionality of diffusion models, biomarker discovery, reinforcement learning, and variational methods. His work leverages hybrid approaches that combine symbolic and neural methods, particularly for solving discrete optimization problems in genomics.
He collaborates closely with NEC Bio, NEC OncoImmunity, and NEC’s Biometric Research Laboratories on developing interpretable and efficient methods for solving biological and medical problems. He has published widely in high-impact journals such as Science, Cell, Nature Genetics, and Nature Machine Intelligence.