
PrexSyn

Introduction

PrexSyn is an efficient, accurate, and programmable framework for synthesizable molecular design. It uses a decoder-only transformer that autoregressively generates the postfix notation of synthesis[1] (a molecular representation built from chemical reactions and purchasable building blocks), conditioned on molecular descriptors.
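
To make the representation concrete, here is a minimal Python sketch of how a postfix notation of synthesis can be read back with a stack, in the same way an arithmetic postfix expression is evaluated. It assumes a simplified token format: building blocks are operands, and each reaction is an operator that consumes a fixed number of reactants (its arity). The classes `BuildingBlock` and `Reaction`, the function `evaluate_postfix`, and the reaction names are hypothetical, and each product is represented as a nested string rather than a real molecule; an actual implementation would apply reaction templates with a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class BuildingBlock:
    name: str  # hypothetical identifier, e.g. a catalog ID or SMILES


@dataclass(frozen=True)
class Reaction:
    name: str   # hypothetical reaction template identifier
    arity: int  # number of reactants the template consumes

    def __post_init__(self) -> None:
        # A reaction must consume at least one reactant.
        if self.arity < 1:
            raise ValueError(f"{self.name}: arity must be >= 1")


Token = Union[BuildingBlock, Reaction]


def evaluate_postfix(tokens: List[Token]) -> str:
    """Evaluate a postfix synthesis sequence with a stack.

    Building blocks are pushed as operands; each reaction pops `arity`
    intermediates and pushes a placeholder string for its product.
    """
    stack: List[str] = []
    for tok in tokens:
        if isinstance(tok, BuildingBlock):
            stack.append(tok.name)
            continue
        if len(stack) < tok.arity:
            raise ValueError(
                f"{tok.name} needs {tok.arity} reactants, stack has {len(stack)}"
            )
        reactants = stack[-tok.arity:]
        del stack[-tok.arity:]
        # Placeholder for running the reaction template on the reactants.
        stack.append(f"{tok.name}({', '.join(reactants)})")
    if len(stack) != 1:
        raise ValueError(f"expected exactly one final product, got {len(stack)}")
    return stack[0]


if __name__ == "__main__":
    seq = [
        BuildingBlock("A"), BuildingBlock("B"), Reaction("amide_coupling", 2),
        BuildingBlock("C"), Reaction("suzuki", 2),
    ]
    print(evaluate_postfix(seq))  # suzuki(amide_coupling(A, B), C)
```

The stack check also shows why the notation suits autoregressive generation: a sequence is well formed exactly when no reaction underflows the stack and a single product remains at the end.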

PrexSyn is trained in two days on a billion-scale data stream of postfix notations paired with molecular descriptors, using only two GPUs and 32 CPU cores. This is made possible by PrexSyn Engine, a real-time, high-throughput data generation pipeline written in C++.
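
PrexSyn Engine itself is written in C++, but the idea of an endless stream of training pairs can be sketched in a few lines of Python. The toy version below is an illustration under stated assumptions, not the engine's algorithm: it samples linear syntheses from hypothetical catalogs (`BUILDING_BLOCKS`, `REACTIONS`), emits each one as a postfix token list, and pairs it with `toy_descriptor`, a hashed bit vector over the tokens that stands in for a real fingerprint of the product molecule. It does not check whether the reactants are chemically compatible with the chosen reaction.

```python
import hashlib
import random
from itertools import islice
from typing import Iterator, List, Tuple

# Hypothetical toy catalogs; a real pipeline would load purchasable
# building blocks and reaction templates (name -> number of reactants).
BUILDING_BLOCKS = ["bb_amine_1", "bb_acid_7", "bb_boronic_3", "bb_halide_9"]
REACTIONS = {"amide_coupling": 2, "suzuki": 2, "boc_deprotection": 1}


def toy_descriptor(tokens: List[str], n_bits: int = 64) -> List[int]:
    """Stand-in for a molecular fingerprint: hash each token into a bit vector."""
    bits = [0] * n_bits
    for tok in tokens:
        idx = int(hashlib.sha256(tok.encode()).hexdigest(), 16) % n_bits
        bits[idx] = 1
    return bits


def random_synthesis(rng: random.Random, max_steps: int = 3) -> List[str]:
    """Sample a valid postfix sequence by extending a linear synthesis."""
    tokens = [rng.choice(BUILDING_BLOCKS)]
    for _ in range(rng.randint(1, max_steps)):
        name, arity = rng.choice(list(REACTIONS.items()))
        # The current intermediate is one reactant; add (arity - 1) new blocks.
        tokens += [rng.choice(BUILDING_BLOCKS) for _ in range(arity - 1)]
        tokens.append(name)
    return tokens


def synthesis_stream(seed: int = 0) -> Iterator[Tuple[List[str], List[int]]]:
    """Endless, reproducible stream of (postfix tokens, descriptor) pairs."""
    rng = random.Random(seed)
    while True:
        tokens = random_synthesis(rng)
        yield tokens, toy_descriptor(tokens)


if __name__ == "__main__":
    for tokens, fp in islice(synthesis_stream(seed=42), 3):
        print(tokens, sum(fp))
```

Because the stream is a generator, pairs are produced on demand instead of being stored, which is the property a real-time data stream relies on to feed training without materializing a billion-scale dataset on disk.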

Capabilities

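Capability | Input | Output
Chemical space projection | Graph / SMILES | Postfix notation of synthesis
Fingerprint/descriptor-based generation | Fingerprint / descriptor | Postfix notation of synthesis
Molecular sampling | Scoring functions | Postfix notation of synthesis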

Performance

Capability | Result
Record-high accuracy and speed in chemical space projection and fingerprint/descriptor-based generation | Performance comparison
Record-high sample efficiency in molecular sampling against scoring functions | Molecular Sampling Performance

Resources

Repositories

Papers and Documentation

Miscellaneous

Citation

@article{luo2025prexsyn,
  title   = {Efficient and Programmable Exploration of Synthesizable Chemical Space},
  author  = {Shitong Luo and Connor W. Coley},
  year    = {2025},
  journal = {arXiv preprint arXiv:2512.00384}
}

[1] Projecting Molecules into Synthesizable Chemical Spaces. https://arxiv.org/abs/2406.04628