Xaira Dropped a Dataset that Might Just Change Biology Forever

Xaira just dropped X-Atlas/Orion, the largest publicly available genome-wide Perturb-seq dataset to date, covering 8 million cells. It's public, powerful, and poised to reshape how AI predicts biology.
Launching a biotech startup is often about making promises: to investors, to science, to medicine. But when you start life armed with over a billion dollars, a Nobel laureate, and a former Stanford president in your corner, the stakes, and the promises, tend to get amplified.
Xaira Therapeutics knows this well. Founded just over a year ago, it quickly joined the small club of biotech companies born already burdened by outsized expectations.
This week Xaira delivered something more tangible than promises. They released the X-Atlas/Orion dataset, now the largest publicly available genome-wide Perturb-seq dataset ever created.
It catalogs around eight million cells, with carefully titrated perturbations spanning every known human protein-coding gene. And each cell's data footprint is deep, averaging over sixteen thousand unique molecular identifiers, or UMIs.
Xaira just handed the scientific community a detailed instruction manual for deciphering human cellular biology at a level of nuance previously inaccessible.
But the significance of X-Atlas/Orion goes well beyond its size or even its remarkable level of detail. It emerges from a philosophical stance: biology needs better data to feed artificial intelligence models that could transform how we understand, predict, and ultimately intervene in disease.
If this dataset delivers as Xaira hopes, it might not just accelerate drug discovery. It could redefine how drug discovery even happens.
The secret to X-Atlas/Orion’s power rests in the methods used to build it. At its heart is a novel technology platform developed by Xaira, somewhat innocuously named FiCS (Fix-Cryopreserve-ScRNAseq). FiCS is an engineering triumph in its own right, streamlining the complex pipeline of single-cell RNA sequencing. It freezes biological states so they can be examined at scale, providing high-fidelity data ripe for computational analysis.
Critically, FiCS enables dose-dependent gene perturbations, meaning it doesn’t merely switch genes on or off. Instead, it dials gene expression up or down incrementally, measuring subtle biological reactions in a controlled fashion.
This “analog” approach is revolutionary because it mirrors reality far better than simpler, binary gene perturbations. Life isn’t about being completely off or completely on. Biology operates through gradients, through subtle shifts and nuances, and FiCS captures that.
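To make the "analog versus binary" distinction concrete, here is a minimal toy sketch in Python. Everything in it is hypothetical: the function `remaining_expression`, the `shape` parameter, and the dose levels are invented for illustration and are not drawn from X-Atlas/Orion or from Xaira's methods. It simply shows how two imagined genes can look identical in an on/off experiment yet behave very differently at intermediate doses.

```python
# Toy illustration only: the curves and numbers below are invented,
# not taken from X-Atlas/Orion or any real experiment.

def remaining_expression(knockdown: float, shape: float) -> float:
    """Hypothetical downstream expression (1.0 = unperturbed) at a
    knockdown level between 0.0 (none) and 1.0 (complete)."""
    return 1.0 - knockdown ** shape

# Five assumed dose levels, from no knockdown to full knockdown.
doses = [0.0, 0.25, 0.5, 0.75, 1.0]

# Two imagined genes: one responds gradually, one only near full knockdown.
gradual = [remaining_expression(d, 1.0) for d in doses]
switchlike = [remaining_expression(d, 4.0) for d in doses]

def binary_view(curve):
    """What an on/off experiment sees: only the two endpoints."""
    return (curve[0], curve[-1])

print("binary view, gradual:   ", binary_view(gradual))
print("binary view, switchlike:", binary_view(switchlike))
print("graded view, gradual:   ", gradual)
print("graded view, switchlike:", switchlike)
```

In the binary view the two genes are indistinguishable, since both go from fully on to fully off. Only the graded view reveals that one of them tolerates a half-strength knockdown almost untouched, which is the kind of nuance a dose-dependent design is meant to capture.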
Leading the scientific charge behind X-Atlas/Orion is Ci Chu, Xaira’s VP of Early Discovery. Chu emphasized precisely this nuance when unveiling the dataset. "Understanding biology deeply requires capturing the subtle gradients and dose-dependent effects that genes produce in cells. That’s exactly what X-Atlas/Orion enables."
Alongside him, Bo Wang, Xaira’s SVP of Biomedical AI, underscored the dataset’s potential to fuel artificial intelligence models capable of accurately simulating cells. These virtual cells, he believes, could revolutionize biology, turning painstaking empirical guesswork into precise computational predictions.
Xaira itself is uniquely positioned for this effort. Founded in April 2024 with headline-grabbing backing from ARCH Venture Partners, Foresite Labs, Sequoia, Lux, and others, the company immediately attracted scientific luminaries. David Baker, Nobel laureate and protein-design pioneer, is a co-founder. Marc Tessier-Lavigne, former president of Stanford and ex-CSO of Genentech, sits as CEO and chair. This heavyweight scientific pedigree raised eyebrows from day one, setting expectations unusually high, even for a venture so generously funded.
But until now, the market waited quietly to see if Xaira would move beyond impressive credentials and venture cash.
With X-Atlas/Orion, the biotech is finally showing its hand. It’s choosing openness first, signaling that true value often emerges from collaboration, from community interrogation, and from models trained on genuinely world-class datasets.
Xaira’s strategic bet is that by releasing this dataset openly (under non-commercial licenses), it will quickly become foundational for computational biology, attracting academic researchers, small startups, and established pharmaceutical giants alike.
Industry reaction has been swift and enthusiastic.
Michael Koeris, a leading biotech entrepreneur and industry observer, publicly called the dataset’s release "a game changer," suggesting its true impact could ripple through biology, artificial intelligence, and pharmaceutical innovation. Yet caution is warranted. The history of AI-driven biology is littered with datasets that didn’t deliver, models that never translated, and promises deferred indefinitely.
Still, there is reason for optimism. Xaira’s approach shows a keen awareness of these historical pitfalls. The company is not promising immediate cures or sudden clinical leaps; instead, it is laying groundwork.
But here's the big question. Will competitors swiftly adopt and leverage this dataset, diluting Xaira’s early advantage, or does the open approach actually bolster Xaira’s credibility, cementing its role as a hub for collaboration?
And when and how will this ambitious data resource translate into tangible clinical breakthroughs, something that, despite advances in AI, remains frustratingly elusive?
For now, though, Xaira has earned a rare moment of acknowledgment. A biotech founded with lofty ambitions and outsized promises has produced something tangible, something valuable, and something openly shared.