Recursion Drew The Blueprint of Future BioIT Infrastructure

Recursion Pharma's stack is the bioIT future spelled out clearly: from automated labs through edge GPU compute, Kafka streaming, and Kubernetes orchestration, to cloud TPUs. The architecture isn't just efficient. It's inevitable.

It’s almost a cliché now to say biology has become a data science problem.

But like most clichés, this holds water because it reflects an undeniable truth.

Yet, beyond its conceptual neatness lies the gritty reality of enormous datasets, complex analytics, and infrastructure challenges vast enough to break traditional paradigms.

Recursion Pharmaceuticals’ infrastructure isn’t merely a tech stack. It’s an instructive blueprint illustrating what the future of bioIT infrastructure will likely resemble, though perhaps with different vendor labels.

The real significance lies in understanding why this model, blending automation-heavy laboratories with hybrid-cloud and data-centric architectures, isn’t just efficient. It’s inevitable.

Recursion’s foundation is its lab, churning out over two million biological experiments weekly. The sheer scale forces technology to move beyond classic manual pipelines toward fully automated robotics platforms. At this scale, automation is no longer optional; it's essential.

In Recursion’s labs, tissue cultures, CRISPR screens, and transcriptomics data flow continuously, feeding a constant torrent of cellular images and genetic information.

Microscopes don’t simply take pictures. They pour data into integrated computational workflows, each snapshot another node in a vast computational biology graph.

Every cell, every perturbation, and every image is captured digitally, turned into structured data at birth.

But the importance of high-throughput automation goes beyond data volume. It shifts focus from reactive experimentation to proactive, hypothesis-free exploration. Biology traditionally waits on theories to test. Recursion’s labs run massively parallelized perturbations, broadly scanning biological response space to generate entirely new hypotheses.
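A toy sketch of what that parallel scanning looks like in code; the gene, compound, and dose lists below are illustrative stand-ins, not Recursion's actual screen design:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Perturbation:
    """One structured experimental record, captured digitally 'at birth'."""
    gene_knockout: str
    compound: str
    dose_um: float

# Illustrative libraries; real screens cover thousands of genes and compounds.
genes = ["TP53", "KRAS", "BRCA1"]
compounds = ["cmpd_A", "cmpd_B"]
doses = [0.1, 1.0, 10.0]

# Hypothesis-free scanning: enumerate the full cross-product of the
# perturbation space instead of testing one hand-picked hypothesis.
screen = [Perturbation(g, c, d) for g, c, d in product(genes, compounds, doses)]
print(len(screen))  # 3 genes x 2 compounds x 3 doses = 18 wells
```

The point of the cross-product is the shift in posture: the screen is designed to cover response space, and hypotheses fall out of the data afterward.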

In this future model, labs become data generators first and experiment stations second.

The flood of laboratory-generated data cannot simply flow unchecked into cloud storage. Instead, powerful on-premises computing infrastructure serves as the first point of contact.

Recursion relies on its internally named BioHive supercomputer, a GPU cluster built on NVIDIA’s DGX SuperPOD architecture, to perform immediate data-processing tasks. These specialized systems are crucial: they handle initial transformations and filtering, reducing raw experimental data into manageable forms before it hits the cloud.

GPUs are uniquely suited to tasks like image processing and training due to their ability to handle massively parallel computations. Recursion’s latest upgrade to NVIDIA’s H100 GPUs offers a glimpse into why bioIT infrastructure must continually evolve. As biological data complexity grows, so too must the computational capability sitting at its edge.
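To see why this shape of work maps to GPUs, here is the same pattern expressed with NumPy on the CPU: one batched, data-parallel operation over a whole plate of images at once. The sizes and the normalization step are illustrative, not Recursion's actual preprocessing:

```python
import numpy as np

# A plate's worth of synthetic microscope tiles: (batch, height, width).
# Sizes are illustrative; real imagery is far larger and multi-channel.
rng = np.random.default_rng(0)
batch = rng.integers(0, 65535, size=(384, 64, 64)).astype(np.float32)

# Per-image illumination normalization expressed as one batched,
# data-parallel operation -- the same shape of computation a GPU
# kernel parallelizes across thousands of cores.
means = batch.mean(axis=(1, 2), keepdims=True)
stds = batch.std(axis=(1, 2), keepdims=True)
normalized = (batch - means) / stds

print(normalized.shape)  # (384, 64, 64); each image now has mean ~0, std ~1
```

Every pixel of every image is touched by the same arithmetic with no data dependencies between them, which is exactly the workload profile GPUs exist to accelerate.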

The future bioIT core infrastructure thus isn’t cloud-only but a powerful hybrid approach, balancing local immediacy with cloud elasticity.

Hardware is fun, but let's talk about data.

Managing enormous data sets is pointless if the data can't flow efficiently. Here, Recursion’s streaming pipelines offer a crucial insight. Confluent Kafka provides a streaming backbone robust enough to handle tens of terabytes of data weekly. Kafka streams images directly from microscopes into containerized processing pipelines, orchestrated by Kubernetes clusters, ensuring seamless continuity.

Why Kafka, and not a processing framework like Spark or Storm, you ask? Because as a streaming backbone, Kafka delivers lower latency, stronger fault tolerance, and greater scalability in environments where continuous, high-throughput data is critical. This architecture underscores a fundamental bioIT principle: data must never sit idle. Constant flow and immediate processing are central to rapidly iterating machine-learning-driven experiments.

BioIT infrastructure of the future must embrace streaming data paradigms, moving away from batch processing models entirely.
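A minimal sketch of that consume-process-forward loop, assuming the confluent-kafka Python client; the broker address, consumer group, topic name, and message schema are all illustrative, not Recursion's actual configuration:

```python
import json

try:
    from confluent_kafka import Consumer  # Confluent's Python client
except ImportError:                       # lets the processing logic run standalone
    Consumer = None

def process_image_event(payload: bytes) -> dict:
    """Turn a raw microscope event into a structured record for the next stage."""
    event = json.loads(payload)
    return {
        "plate": event["plate"],
        "well": event["well"],
        "image_uri": event["image_uri"],
        "ready_for_inference": True,
    }

def run_consumer() -> None:
    # Broker, group id, and topic are illustrative placeholders.
    consumer = Consumer({
        "bootstrap.servers": "kafka.internal:9092",
        "group.id": "image-preprocessing",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["microscope.images.raw"])
    try:
        while True:
            msg = consumer.poll(1.0)  # block up to 1 s for the next event
            if msg is None or msg.error():
                continue
            record = process_image_event(msg.value())
            # Hand off to the next containerized pipeline step (e.g. publish
            # to a downstream topic); data never waits for a nightly batch.
            print(record["image_uri"])
    finally:
        consumer.close()
```

Keeping the processing step separate from the transport means the same function can run inside any containerized pipeline stage that Kubernetes schedules.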

Next, the layer that sits between on-prem infrastructure and the software stack.

Hybrid cloud strategies offer perhaps the clearest picture of why Recursion’s model is destined to become ubiquitous. Kubernetes, especially through Google Kubernetes Engine (GKE), lets Recursion manage workloads across local data centers and the cloud, ensuring uniformity in container management, software deployment, and orchestration.

In this biz, compute demands often spike unpredictably. Training machine-learning models to interpret cellular data may require bursts of computational power exceeding local infrastructure. Cloud elasticity meets this demand seamlessly. By integrating GKE On-Prem, Recursion blends local immediacy with cloud scalability.
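That burst decision reduces to a small scheduling sketch. The capacity model below is a deliberate oversimplification, and the function is hypothetical, not part of GKE:

```python
def placement(pending_gpu_hours: float, local_free_gpu_hours: float) -> str:
    """Decide where a training job runs; the threshold logic is illustrative.

    With GKE managing both environments, 'local' and 'cloud' are just two
    node pools behind the same orchestration layer, so bursting becomes a
    scheduling decision rather than a re-architecture.
    """
    if pending_gpu_hours <= local_free_gpu_hours:
        return "local"  # the on-prem cluster absorbs the load
    return "cloud"      # spike exceeds local capacity: burst to the cloud

print(placement(40.0, 100.0))   # local
print(placement(500.0, 100.0))  # cloud
```

The uniform container layer is what makes this a one-line decision: the same image runs unmodified in either place.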

Recursion’s adoption of Google’s Cloud TPUs represents more than incremental improvement. It’s transformative. TPUs significantly shorten the training times for complex neural networks, sometimes by factors of 20 or more compared to traditional GPU-based training alone.

This isn’t merely a speed advantage. It translates directly into faster iterations of hypothesis generation, pushing discovery timelines.

In the future, bioIT infrastructure will optimize computational resources by workload type, dynamically allocating GPUs or TPUs depending on need. GPU-based local infrastructure handles image-intensive workloads effectively. Cloud TPUs handle heavy lifting for large-scale neural network training.
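A hedged sketch of that workload-aware routing; the workload names and the mapping are illustrative assumptions about how such a scheduler might divide labor, not a real scheduler API:

```python
def accelerator_for(workload: str) -> str:
    """Match compute type to workflow, mirroring the division of labor above."""
    routing = {
        "image_preprocessing": "local_gpu",   # latency-sensitive, near the lab
        "model_training_small": "local_gpu",  # fits on-prem capacity
        "model_training_large": "cloud_tpu",  # large-scale neural net training
    }
    return routing.get(workload, "local_gpu")  # default to local GPUs

print(accelerator_for("model_training_large"))  # cloud_tpu
print(accelerator_for("image_preprocessing"))   # local_gpu
```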

Recursion demonstrates clearly why future infrastructure won’t just be hybrid by geography. It will also be hybrid by computational architecture, matching compute types precisely to scientific workflows.

BioIT infrastructure’s true value lies in its ability to derive actionable insight from data. Recursion’s integration of Google Cloud Storage, BigQuery analytics, and custom analytics tools illustrates a robust, fully integrated data ecosystem. Data from Kafka streams directly into cloud storage, from which analytical pipelines feed sophisticated models.

This architecture ensures smooth transitions from raw data to analytic insights, providing scientists with immediately usable information rather than vast troves of unexplored data.

Future infrastructure will follow this pattern because it aligns precisely with scientific workflows: capturing data, structuring it meaningfully, analyzing it rapidly, and feeding back results into the experimental pipeline.
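That capture-structure-analyze-feed-back loop can be sketched end to end with in-memory stand-ins; in production the stages would sit on Kafka, Cloud Storage, and BigQuery respectively, and the signal values and hit threshold here are arbitrary:

```python
def capture(n: int) -> list[dict]:
    """Stage 1: raw readouts off the instruments (synthetic values here)."""
    return [{"well": i, "signal": (i * 37) % 100} for i in range(n)]

def structure(raw: list[dict]) -> list[dict]:
    """Stage 2: structure the data meaningfully; threshold is arbitrary."""
    return [{**r, "hit": r["signal"] > 80} for r in raw]

def analyze(records: list[dict]) -> list[int]:
    """Stage 3: rapid analysis, here just pulling out the hit wells."""
    return [r["well"] for r in records if r["hit"]]

def next_experiments(hits: list[int]) -> list[dict]:
    """Stage 4: feed results back; hits seed the next round of perturbations."""
    return [{"well": w, "follow_up": True} for w in hits]

hits = analyze(structure(capture(96)))
queue = next_experiments(hits)
print(len(queue))  # number of follow-up wells queued for the next cycle
```

The loop closing on itself is the whole point: each cycle's analytic output becomes the next cycle's experimental input.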

Recursion Pharmaceuticals’ infrastructure, from automated lab benches to hybrid cloud orchestration, illustrates precisely why bioIT’s future looks hybridized, data-centric, and relentlessly optimized for computational speed and efficiency. Vendors may change, but the fundamental principles—high-throughput automation, powerful localized computing, streaming data management, hybrid-cloud flexibility, and purpose-built analytical stacks—will persist.

Recursion’s infrastructure isn’t simply a single company’s choice. It’s the logical consequence of biology’s inexorable evolution into a data-driven science.
