OpenBind's main aim is to release data and models into the public domain, adhering to best practices in AI.
We have devised the release strategy illustrated below, which focuses on:
- Preventing data leakage before objective model assessments.
- Frequent releases to accelerate data accumulation and community progress.
- Alignment with funder expectations and early-access benefits for partners.
Core objectives
- Maximise OpenBind infrastructure utilisation.
- Shorten active learning cycles for richer data.
- Use blind challenges for unbiased predictive assessment.
- Maintain robust data curation both prior to and after public data release.
- Respond dynamically to industry needs.
Cadence
The cadence of work is the "release interval"; the end of each interval is marked by:
- Public release of new data.
- Retrained open model.
- Blind challenge readout.
Each release is the product of a two-stage "data cycle" spanning two release intervals:
Stage 1 – Data generation (magenta) in combination with active learning (blue) through iterative model fine-tuning (teal).
Stage 2 – Model retraining (yellow), which involves: retraining the model, starting with data cleanup and outlier annotation; preparing data and protocols for FAIR deposition; and running the blind challenge with the cleaned-up data.
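The overlap between consecutive data cycles can be made concrete with a small sketch. It assumes, as the description suggests but does not state outright, that a new data cycle starts in every release interval, that each stage lasts exactly one interval, and that intervals and cycles are numbered from 1. The function and key names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def stages_in_interval(interval: int) -> Dict[str, Optional[int]]:
    """Return which data cycle occupies each stage during a release interval.

    Assumption: data cycle k runs Stage 1 in interval k and Stage 2 in
    interval k + 1, so two cycles are in flight from interval 2 onwards.
    """
    if interval < 1:
        raise ValueError("intervals are numbered from 1")
    return {
        "stage1_data_generation": interval,
        # No earlier cycle exists to retrain on during the first interval.
        "stage2_model_retraining": interval - 1 if interval > 1 else None,
    }


def release_interval(cycle: int) -> int:
    """Return the interval at whose end data cycle `cycle` is released."""
    if cycle < 1:
        raise ValueError("cycles are numbered from 1")
    return cycle + 1


if __name__ == "__main__":
    for i in range(1, 5):
        print(i, stages_in_interval(i))
```

Under these assumptions the first public release falls at the end of the second interval, and every later interval ends with a release.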
Experiment cycles
Release intervals depend on data generation speed, which will accelerate as OpenBind matures. Our aim is for:
- Multiple short cycles (inset), targeting ~3 weeks each but starting at 6–8 weeks, to enable fine steering of data generation.
- Continuous data generation (small magenta arrows), with analysis (blue) and iterative fine-tuning following initial output (first draft).
- Only pre-prepared targets and chemistries are included to avoid delays.
Participant access
OpenBind participants benefit from:
- Access to data up to 12 months ahead of public release.
- Access to intermediate fine-tuned models (never published) as soon as they are generated.
- Fully trained models and blind challenge results ahead of public release.
- Access to blind challenge readouts ahead of public release.
Data steering
The data strategy feeds in through two cadences:
- Coarse-grained steering of targets and chemistries during preparation of each data cycle, guided by partner models.
- Fine-grained steering of compound selection for experiment cycles, guided by partner models and fine-tuned with real-time OpenBind data.
Access will be through the programmatic interfaces that support active learning, streamlined model training and execution of blind challenges.
Long-term goals
We aim to achieve broad, generalisable data coverage quickly. Unlike CASP’s 2-year cadence, OpenBind targets:
- Phase 1 – 6-month releases.
- Future – 3-month releases.
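As a rough illustration of what these cadences mean in practice, the sketch below counts how many releases fit into a fixed window. The 24-month window is an arbitrary choice made to match one CASP round, and the function name is hypothetical.

```python
def releases_in(window_months: int, cadence_months: int) -> int:
    """Count whole releases completed within a window at a fixed cadence."""
    if cadence_months <= 0:
        raise ValueError("cadence must be a positive number of months")
    return window_months // cadence_months


WINDOW_MONTHS = 24  # illustrative window: one CASP round

casp_rounds = releases_in(WINDOW_MONTHS, 24)      # 2-year cadence
phase1_releases = releases_in(WINDOW_MONTHS, 6)   # Phase 1 cadence
future_releases = releases_in(WINDOW_MONTHS, 3)   # future cadence

if __name__ == "__main__":
    print(casp_rounds, phase1_releases, future_releases)
```

Over that window, a 6-month cadence gives four releases and a 3-month cadence gives eight, against a single CASP round.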






