Why Interoperability Still Fails in 2025

BioIT isn’t starved for data. It’s choking on chaos. No shared formats. No clean metadata. No real interoperability. If we don’t fix that, AI stays blind.
The scene is not particularly dramatic, unless you’re the kind of person who thinks databases can scream.
We’re in the corner of a cluttered translational lab at a well-funded research hospital on the East Coast. A bioinformatician named Alex (real person, real name redacted, real headache ongoing) is trying to perform what should be a relatively pedestrian task in 2025: reconciling two datasets that describe the same cancer patient.
One is a set of short-read whole genome sequences from five years ago, archived as CRAM files from an earlier clinical study. The other is long-read nanopore data collected this year, meant to clarify structural variants missed in the earlier round. Same patient. Same tumor. Different eras of technology. And, just for fun, no shared metadata standards, no harmonized variant annotations, no obvious path to interoperability.
He’s been at it for three days, writing conversion scripts, combing through sample manifest spreadsheets with timestamps as headers, hunting down file provenance from archived pipeline logs that may or may not correspond to the actual study protocol.
This isn’t a corner case. This is the norm.
If biology is the world’s most complex language, a swirling grammar of cellular states, evolutionary pressure, post-translational contortion, and stochastic noise, then modern BioIT is still stuck in Babel.
The tools don’t talk to each other. The data doesn’t travel well. And the institutions, for all their sequencing power and AI enthusiasm, are functionally monolingual.
We are, by all public-facing measures, in the golden age of biological data. Well, we should be.
The volume of omics data has grown by several orders of magnitude since 2010, driven by falling sequencing costs and an almost metaphysical confidence that more data will eventually equal more insight. Petabytes flow through hospital systems, sequencing cores, contract research organizations, and biotech drug discovery platforms. GPUs train ever-larger protein language models on curated UniProt alignments. Startups pitch multi-omics dashboards with the swagger of Formula 1 teams.
But the truth is the data doesn’t add up.
The problem is not that we don’t have enough information. It’s that the information we have lives in different dialects: file formats, schemas, identifiers, metadata conventions, timestamp logic, and context frames. It cannot be trivially unified. One lab’s FASTQ is another lab’s unusable artifact.
One hospital’s variant annotation system classifies a 17-base deletion as benign; another flags it as pathogenic. One vendor’s proteomics output records timestamps as strings; another as floats. What’s a researcher to do with that?
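To make the timestamp example concrete: here is a minimal sketch, in Python, of the coercion code that gets rewritten in lab after lab. The function name and formats are hypothetical, not any particular vendor’s schema.

```python
from datetime import datetime, timezone

def parse_acquisition_time(value):
    """Coerce a vendor timestamp into a timezone-aware UTC datetime.

    Hypothetical: one vendor exports ISO-8601 strings, another exports
    Unix epoch seconds as a float. Neither the function name nor the
    formats come from any specific instrument's schema.
    """
    if isinstance(value, (int, float)):
        # Epoch seconds as a number -> UTC datetime
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if isinstance(value, str):
        # ISO-8601 string, e.g. "2025-03-14T09:26:53+00:00"
        return datetime.fromisoformat(value).astimezone(timezone.utc)
    raise TypeError(f"Unrecognized timestamp representation: {value!r}")

# Two records describing the same run, written in two dialects:
print(parse_acquisition_time(1741944413.0))
print(parse_acquisition_time("2025-03-14T09:26:53+00:00"))
```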
Oh. And that’s assuming they even have access to the full metadata at all. In many datasets, crucial context (like the source of the biological sample, the protocol used, the kit version, the freezer temperature, the exact machine settings) is missing, inconsistent, or so deeply nested in free text fields that it may as well be gone. Worse, some of this slippage happens silently. The downstream analyst, five steps removed from the wet lab, never sees the discrepancy and treats the results as coherent.
The model trains. The paper gets published. And the infrastructure problem metastasizes. Oops.
Let’s name it clearly: interoperability is the foundational weakness of bioinformatics in the post-AI era.
What AI has done, mostly by accident, sometimes by brute force, is expose the full scope of the biological data mess.
You can’t scale large language models, diffusion pipelines, or graph-based embeddings across biology unless the data behaves like a system. And right now, it doesn’t. There are hundreds of file formats, dozens of overlapping standards, competing ontologies, and vanishingly little agreement on what constitutes a “clean” dataset.
That should be surprising, by the way. I mean, we’ve had two decades to figure this out. The Human Genome Project ended in 2003. The Cancer Genome Atlas launched in 2005. We’ve had enough time to sequence millions of individuals, to fund consortia, to build high-powered academic infrastructures like ENCODE, GTEx, and UK Biobank. But instead of convergence, we’ve had proliferation. Every new tool generates its own variant of truth. Every vendor builds its own portal. Every researcher forks their own pipeline.
But here we are. A field defined by exponential growth in hardware and wet lab instrumentation is now blocked by something embarrassingly analog: the inability to agree on how to name things. How to describe things. How to share meaning across space and time.
This is not a problem of engineering talent. Nor is it a question of cloud budget or GPU access. It’s a deeper structural flaw: a lack of shared context, a deficit of epistemological humility, and a persistent culture of reinvention. Everyone is optimizing. No one is aligning.
Alex eventually got his two datasets to talk to each other. Sort of. He used a battery of tools (some from GitHub, some internal, some Frankenstein’ed together) to normalize variant calls, flatten the metadata, and apply a harmonized gene model from an older Ensembl release to match both sides.
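His actual scripts aren’t public, but the flavor of the work is easy to sketch. Below is a simplified, hypothetical version of one step, variant normalization, in the spirit of tools like bcftools norm: it trims padded alleles so two callers’ spellings of the same indel converge, and it deliberately skips the harder step of left-aligning against the reference sequence.

```python
def normalize_variant(pos, ref, alt):
    """Trim shared bases so two spellings of the same indel converge.

    A simplified sketch in the spirit of `bcftools norm`: it drops padded
    trailing/leading bases but does not left-align against the reference.
    """
    # Trim identical trailing bases while both alleles keep at least one base
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim identical leading bases, shifting the position rightward
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt

# The same 2-bp deletion, emitted two ways by two callers:
print(normalize_variant(100, "GCACA", "GCA"))  # -> (100, 'GCA', 'G')
print(normalize_variant(100, "GCA", "G"))      # -> (100, 'GCA', 'G')
```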
He estimates the process introduced a small but non-negligible error rate, and he’ll be forced to disclaim that in the paper. Still, the real cost wasn’t the technical debt. It was the lost time. Three days of effort, three days of brainpower, three days that could have been spent on actual science.
This is the future of medicine we’re building? Atop a foundation that forgets its own definitions?
And we haven’t even really talked about silos yet.
The original sin of bioinformatics was not data overload; it was independence.
In the earliest days of large-scale genomics, back when sequencing centers were still rarefied cathedrals of compute and PCR machines looked like high-end microwaves, every team of scientists built their own little world. A parsing script here. A BLAST variant there. Some Perl, some R, a little Bash to hold it all together. The tools grew up around the problems, not the other way around. And so, when new types of data arrived (transcriptomic, epigenomic, proteomic, metabolomic) each came with its own peculiar system of record.
These were not minor differences. They were tectonic shifts. A lab focused on RNA-seq had no reason to share tooling with a lab focused on protein-ligand docking. A cancer research group collecting DICOM radiology files had little overlap with an immunologist wrangling flow cytometry matrices. Each domain produced data at different scales, with different temporal properties, different error models, and very different underlying assumptions about what counted as signal and what could be safely ignored.
And so, without much fanfare, the biological sciences became a patchwork of computational dialects. Not because anyone wanted it that way, but because no one had the power or foresight to unify things. The result is a kind of institutional memory loss. A data landscape littered with self-contained ecosystems that do not understand one another. Genomics. Proteomics. Imaging. Electronic health records. Lab notebooks. All walled off and speaking in their weird private tongues.
Some of these silos are technical. Different data types demand different storage strategies. A FASTQ file is line-based and compressible. A mass spectrometry run might live as a binary blob wrapped in metadata. A 3D cryo-EM map requires spatial resolution and continuous floating-point precision. These are real constraints. But technical constraints become political ones fast. Vendors lean in. Platforms optimize for format fidelity, not portability. Open standards get lip service but rarely full adoption. And most researchers, trying to publish or hit milestones, do not have time to engineer their way out of this.
That is how you get an NIH-funded database built on technology that can’t speak to the latest open-source single-cell visualization tool.
That is how you get two major clinical trials for the same cancer using different reference genomes, different variant callers, different gene annotation sets.
That is how you get fifteen years of microbiome studies indexed by strain name alone, with no common taxonomic resolution.
What makes this even harder to unwind is that many of these systems work well in isolation. Within their own bounds, they are elegant. They are fast. They are trusted. Galaxy is brilliant at tracking bioinformatics workflows. XCMS does wonders for LC-MS analysis. Cell Ranger handles 10x Genomics pipelines with remarkable efficiency. But none of them were built for interoperation. They assume the world is their domain and their domain alone.
This assumption becomes dangerous as soon as science crosses disciplines. A cancer project that spans bulk RNA-seq, imaging mass cytometry, and clinical outcome data cannot be neatly contained within any single tool. It must leap across systems. And each leap brings loss. Data must be flattened, transformed, reinterpreted. Precision erodes. Metadata is dropped. Biological meaning gets sanded down to fit the shape of the next pipeline.
To call this a software problem is to miss the point. It is a structural issue. A cultural artifact. A result of decades of grant structures, publication incentives, and academic turf wars that encouraged vertical optimization over horizontal synthesis. Every lab was taught to innovate, not integrate. The prize went to the novel insight, not the reusable schema.
And yes, there have been efforts to build bridges.
The Genomic Data Commons tried. The Human Cell Atlas made a real dent. The Global Alliance for Genomics and Health continues to propose standards.
But the gravitational pull of the silo remains strong. It is easier to start fresh than to retrofit. Easier to publish a shiny new tool than to invest in the thankless labor of making existing ones talk to each other.
The private sector made the same mistake, just with better fonts. Cloud-based platforms like DNAnexus, Seven Bridges, and Terra offered the promise of integrated analysis, and to be fair, they delivered it within their walls. But they also created new vendor-specific boundaries. Portability between systems became another cost center.
And the data, once again, stayed in silos, only now virtualized and subscription-gated.
This is the paradox of progress in BioIT. Every new capability, every new sequencing platform, every new analytical method, every new modeling breakthrough, adds another layer to the stack. But that stack is not a tower. It is a pile. And unless someone unifies the base, the whole thing becomes unstable.
There is an irony here. The biological systems we study are themselves masterpieces of integration. Proteins talk to lipids. Genes regulate distant networks. The immune system coordinates across time and tissue. Nature is not siloed. But our tools are.
Until BioIT systems become capable of the same kind of cross-modal fluency that biology itself demonstrates, the field will continue to run into the same wall. It is not enough to generate data. We must also make it comprehensible, translatable, and usable beyond the context in which it was born.
That will require more than standards. It will require humility. And it will require the scientific world to admit that in our pursuit of biological truth, we have built too many castles, and not enough bridges.
The data itself is not the problem. Not entirely. The real trouble sits just beneath it, hidden in the scaffolding that tells us where the data came from, how it was generated, what it supposedly means. This is metadata. And in modern bioinformatics, metadata is the ghost in the machine.
It is everywhere, it is essential, and uh, well, it is often wrong.
Start with the simplest case. A genomic dataset from a hospital biobank arrives labeled as whole blood. Mkay. Seems fine. Until you realize “whole blood” can include leukocytes, platelets, plasma residues, and depending on the centrifugation process, a wildly variable composition of cell-free DNA. That single field hides an entire set of unrecorded variables. What kit was used? What time of day was the draw? Was the patient fasting? Was the sample processed immediately or held on ice?
These details shape the data. But most of them are missing, or worse, encoded in notes nobody standardized. Context becomes contamination.
Across the field, even when metadata is present, it is inconsistent. There is no universal agreement on naming conventions, let alone structure. One lab might describe a condition as “triple-negative breast cancer,” another as “ER-/PR-/HER2-,” and another as “TNBC.” Automated pipelines stumble. Matching across datasets becomes brittle. Worse, assumptions about comparability get baked into downstream models that were never designed to handle semantic drift.
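Much of the day-to-day fix is embarrassingly manual. Here is a minimal sketch of what label harmonization often looks like in practice: a hand-curated synonym map. The labels are illustrative, and a real pipeline would map to ontology identifiers (NCIt, MONDO) rather than to another free-text string.

```python
# Hand-curated synonym map; a real pipeline would map to ontology identifiers
# (e.g. NCIt or MONDO terms) rather than to another free-text string.
TNBC_SYNONYMS = {
    "triple-negative breast cancer",
    "triple negative breast cancer",
    "tnbc",
    "er-/pr-/her2-",
}

def harmonize_condition(label: str) -> str:
    """Collapse free-text condition labels onto one working vocabulary."""
    key = label.strip().lower()
    if key in TNBC_SYNONYMS:
        return "triple-negative breast cancer"
    return key  # unrecognized labels pass through and should be flagged downstream

for raw in ("Triple-Negative Breast Cancer", "ER-/PR-/HER2-", "TNBC"):
    print(raw, "->", harmonize_condition(raw))
```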
A researcher once said, half-joking, that trying to harmonize biological metadata is like trying to merge three family trees where half the people go by nicknames and the other half lied about their birthdays.
And this. This is the environment in which AI is expected to thrive.
The irony is sharp. AI, which promises insight at scale, becomes the first victim of poor structure. Models trained on one cohort fail to generalize to the next. Embeddings get polluted with irrelevant features. Retrieval systems hallucinate connections between things that only look similar because someone used the same vague term in a spreadsheet six years ago.
Even when the data itself is brilliant, the scaffolding can sabotage it. One major pharmaceutical company ran into this last year during a machine learning project aimed at predicting compound efficacy across cancer cell lines. The model worked well internally. Then it collapsed on external validation. The reason was trivial. Internally, they had labeled response in micromolar concentrations. The validation set used nanomolar. No one had documented the units. No one had checked. The model had learned a scale, not a biology.
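The entire failure reduces to a conversion that was never declared. A minimal sketch of the guard that would have caught it, assuming a hypothetical unit registry; the names and units are placeholders, not the company’s actual schema.

```python
import math

# The unit registry below is invented for the sketch, not the company's actual schema.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}

def to_molar(value: float, unit: str) -> float:
    """Convert a declared concentration to molar so cohorts are comparable."""
    if unit not in UNIT_TO_MOLAR:
        raise ValueError(f"Undeclared or unknown unit: {unit!r}")
    return value * UNIT_TO_MOLAR[unit]

# 0.5 uM internally and 500 nM externally are the same measurement,
# but only if the unit was recorded in the first place.
assert math.isclose(to_molar(0.5, "uM"), to_molar(500, "nM"))
```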
When this happens at the level of small molecules and cancer screens, the stakes are high but fixable. When it happens in clinical trials, population genomics, or therapeutic target validation, the stakes are existential. The wrong label can drive the wrong hypothesis, which drives the wrong experiment, which costs millions and delays progress. Multiply that across thousands of studies and you start to see the shape of the crisis.
There are efforts to fix this. FAIR data principles were a start. The idea that data should be Findable, Accessible, Interoperable, and Reusable has become a mantra in institutional decks and grant proposals.
Ah. But implementation lags far behind. Being FAIR-compliant in theory is not the same as being interoperable in practice. Most datasets labeled FAIR still require bespoke parsing. Most metadata schemas still collapse under multi-modal weight.
What makes this even harder is the increasing reliance on proprietary formats, especially from wet lab vendors. A single-cell RNA-seq run from a 10x Genomics instrument will output in a form optimized for their own Cell Ranger software. That same data, exported for use in open-source frameworks like Scanpy or Seurat, often loses resolution or metadata richness along the way. The result is a kind of slow-motion degradation, where each handoff between tools strips context, like photocopying a photocopy until all that remains is the outline.
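The handoff below is the common one. A minimal sketch, assuming a Cell Ranger filtered_feature_bc_matrix.h5 and Scanpy’s read_10x_h5 reader; the paths, reference string, and annotations are placeholders. The point is that the run context does not travel with the matrix, so it has to be re-attached by hand.

```python
import scanpy as sc  # assumes Scanpy is installed; paths and values below are placeholders

# Read the Cell Ranger filtered matrix (HDF5) into an AnnData object.
adata = sc.read_10x_h5("sample1/outs/filtered_feature_bc_matrix.h5")

# The expression matrix survives the handoff; much of the run context does not.
# Chemistry, reference build, pipeline version, and sample condition have to be
# re-attached by hand, e.g. from the run's summary files or your own sample sheet.
adata.uns["cellranger_reference"] = "GRCh38-2020-A"  # example value
adata.obs["sample_id"] = "sample1"
adata.obs["condition"] = "TNBC"  # from the sample sheet, not from the instrument

adata.write_h5ad("sample1_annotated.h5ad")
```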
Now, you might think that AI would solve this. That we would train models to learn across modalities, to infer missing metadata, to build harmonized ontologies from messy input. And yes, some are trying. Helixon’s models, GLIMPSE-1, NVIDIA’s BioNeMo tools, yeah they are remarkable. But they are built atop training data that itself is subject to the same inconsistencies. Without clean underlying structure, AI becomes a probabilistic guesser in a field that requires precision.
In clinical contexts, the problem is magnified.
Here, interoperability collides not just with technical inconsistency, but institutional mistrust. Electronic health records are famously fractured. One hospital may use Epic, another Cerner, a third some stitched-together local tool. Each one encodes diagnosis, drug names, procedural history, and even demographic fields in different formats. Integrating genomic data with EHRs sounds beautiful on a slide. In reality, it means reconciling ten years of code drift, inconsistent CPT usage, and physician notes full of contradictory shorthand.
At the edge of this mess sit the regulatory bodies. They do not care about your pipeline’s elegance. They care whether you can explain what it did. Explainability in AI is no longer optional. It is mandatory. And yet, when the underlying data lacks traceability (when metadata has been transformed, dropped, or retroactively cleaned without audit) explainability becomes performance art.
You can say what your model did. You cannot say why. And that is not acceptable when lives are on the line.
And calm down. This is not a critique of ambition. The biological sciences should be ambitious. They should aim for fusion modeling across genomic, proteomic, metabolomic, and clinical dimensions. They should imagine a world where a patient’s blood, behavior, and biology are all legible to an AI that helps physicians intervene before disease takes root.
But but but…you cannot build that world on top of inconsistent language.
What’s at stake is not just technical. It is epistemological. When every dataset is an island, every insight becomes anecdotal. And without interoperability, you cannot scale discovery. You can only re-discover, over and over again, what someone else already learned in a slightly different format.
There are ways out. One is enforcement. Tying public funding to adherence to open standards. Making publishers reject papers without transparent metadata schemas. Requiring that major repositories not just accept data, but validate its structure. Another is infrastructure.
We need open-source, community-governed tools that validate, align, and reconcile data as part of standard scientific workflows. And we need those tools to be boring. To be invisible. To, you know, just work.
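Here is what “boring” might look like: a minimal sketch that validates a sample record against a declared schema before it enters any pipeline, using the jsonschema library. The schema and field names are invented for illustration, not an existing community standard.

```python
from jsonschema import ValidationError, validate  # pip install jsonschema

# Schema and field names are invented for illustration, not a community standard.
SAMPLE_SCHEMA = {
    "type": "object",
    "required": ["sample_id", "material", "collection_protocol", "kit_version"],
    "properties": {
        "sample_id": {"type": "string"},
        "material": {"enum": ["whole_blood", "plasma", "tumor_tissue"]},
        "collection_protocol": {"type": "string"},
        "kit_version": {"type": "string"},
    },
}

record = {"sample_id": "PT-0042", "material": "whole_blood"}  # incomplete on purpose

try:
    validate(instance=record, schema=SAMPLE_SCHEMA)
except ValidationError as err:
    # Fail loudly at submission time, not three analyses later.
    print(f"Rejected: {err.message}")
```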
There is also a more radical possibility.
That foundation models themselves could become interoperability layers.
Trained not just on sequences and images, but on the metadata itself. That they could learn to resolve inconsistencies, translate between standards, and align disparate vocabularies in real time. It is not a guarantee. But it is a vision worth pursuing.
Because right now, we are asking brilliant scientists to waste their best hours on cleaning spreadsheets, writing conversion scripts, and second-guessing whether “BRCA1_mut” means what they think it means. And that is a failure of infrastructure, not of intellect.
There is no single villain in this story. Just thousands of small decisions made in isolation, each one defensible, each one understandable, and each one slowly eroding our ability to think across systems. It is not malice. It is entropy.
And so the question becomes: can the field of BioIT build a common language before its own tools outpace our capacity to understand them? Or will we keep filling the world with beautiful, unusable data?
The breakthroughs are not hiding in the signal. They are trapped in the noise between systems.
To unlock them, we do not need more models. We need more meaning.
And that will only come when we stop treating interoperability as a luxury and start treating it as the foundation of everything.