Nvidia-Backed SandboxAQ Generates Synthetic Data to Accelerate Drug Discovery

Artificial intelligence startup SandboxAQ, spun out of Alphabet’s Google and backed by Nvidia, unveiled a large synthetic dataset designed to speed up drug discovery by improving predictions of how drugs bind to proteins. Predicting that binding is a crucial step, because it helps scientists determine whether a drug candidate will effectively target the biological processes involved in a disease.

Although the dataset is rooted in real-world experimental science, SandboxAQ created it computationally using Nvidia’s powerful chips rather than through lab experiments. By combining traditional scientific computing with advanced AI, the startup generated approximately 5.2 million new three-dimensional molecular structures that have not been observed in nature but are scientifically plausible based on existing data.

This synthetic data is being released publicly to train AI models that can rapidly and accurately predict drug-protein interactions, a calculation that would otherwise take far longer, even on the fastest computers. SandboxAQ plans to monetize its own AI models developed using this data, which it offers as a faster, more cost-effective alternative to lab experiments.

Nadia Harhen, SandboxAQ’s general manager of AI simulation, explained the breakthrough: “This is a long-standing problem in biology that the industry has been trying to solve. Our synthetic data is tagged with ground-truth experimental results, enabling models trained on this data to achieve unprecedented accuracy.”

The approach represents a promising intersection of scientific computation and AI, potentially accelerating the development of new medicines and improving outcomes in pharmaceutical research.