Release Strategy

OpenBind's main aim is to release data and models into the public domain, adhering to best practices in AI.

We have devised the release strategy illustrated below, which focuses on:

  • Preventing data leakage before objective model assessments.
  • Frequent releases to accelerate data accumulation and community progress.
  • Alignment with funder expectations and early-access benefits for partners.
  • Responding dynamically to industry needs.

Core objectives

  • Maximise OpenBind infrastructure utilisation.
  • Shorten active learning cycles for richer data.
  • Use blind challenges for unbiased predictive assessment.
  • Maintain robust data curation both prior to and after public data release.
  • Respond dynamically to industry needs.

Cadence

The cadence of work is the "release interval"; the end of each interval is marked by:

  • Public release of new data.
  • Retrained open model.
  • Blind challenge readout.

Each release is the product of a two-stage "data cycle" spanning two release intervals:

    Stage 1 – Data generation (magenta) in combination with active learning (blue) through iterative model fine-tuning (teal).

    Stage 2 – Model retraining (yellow), which involves: retraining the model, starting with data cleanup and outlier annotation; preparing data and protocols for FAIR deposition; and running the blind challenge with the cleaned-up data.
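The overlap implied by a data cycle spanning two release intervals can be sketched as follows. This is only an illustration under stated assumptions: that each stage occupies exactly one release interval, that a new data cycle starts in every interval, and that the interval is the 6-month Phase 1 figure. The names DataCycle and plan_cycles are hypothetical and not part of any OpenBind interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCycle:
    index: int            # 1-based data cycle number
    stage1_interval: int  # release interval used for Stage 1 (data generation)
    stage2_interval: int  # release interval used for Stage 2 (model retraining)
    release_month: int    # months from the start of cycle 1 to this cycle's release


def plan_cycles(n_cycles: int, interval_months: int = 6) -> List[DataCycle]:
    """Lay out overlapping two-stage data cycles.

    Assumption (not stated in the source): Stage 1 fills one release interval,
    Stage 2 fills the next, and the release falls at the end of Stage 2.
    """
    cycles = []
    for k in range(1, n_cycles + 1):
        stage2 = k + 1  # Stage 2 runs in the interval after Stage 1
        cycles.append(DataCycle(k, k, stage2, stage2 * interval_months))
    return cycles


if __name__ == "__main__":
    for c in plan_cycles(3):
        print(
            f"cycle {c.index}: stage 1 in interval {c.stage1_interval}, "
            f"stage 2 in interval {c.stage2_interval}, "
            f"release at month {c.release_month}"
        )
```

Under these assumptions, Stage 2 of one data cycle runs in the same release interval as Stage 1 of the next, so every interval after the first ends with a release.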

Experiment cycles

Release intervals depend on data generation speed, which will accelerate as OpenBind matures. Our aim is for:

  • Multiple short cycles (inset) to enable fine steering of data generation (target ~3 weeks; initially 6–8 weeks).
  • Continuous data generation (small magenta arrows), with analysis (blue) and iterative fine-tuning following initial output (first draft).
  • Only pre-prepared targets and chemistries are included to avoid delays.

Participant access

OpenBind participants benefit from:

  • Access to data up to 12 months ahead of public release.
  • Access to intermediate fine-tuned models (never published) as soon as they are generated.
  • Access to fully trained models ahead of public release.
  • Access to blind challenge readouts ahead of public release.

Data steering

The data strategy feeds in through two cadences:

  • Coarse-grained steering of targets and chemistries during preparation of each data cycle, guided by partner models.
  • Fine-grained steering of compound selection for experiment cycles, guided by partner models and fine-tuned with real-time OpenBind data.

Access will be through programmatic interfaces that support active learning, streamlined model training and execution of blind challenges.

Long-term goals

We aim to achieve broad, generalisable data coverage quickly. Unlike CASP’s 2-year cadence, OpenBind targets:

  • Phase 1 – 6-month releases.
  • Future – 3-month releases.

Our partners

  • Diamond
  • University of Oxford
  • Columbia University
  • EBI
  • IPD
  • MedChemica
  • MSKCC
  • OMSF