Protein AI Series Part 4: Co-Folding and the Open-Source Race
The Technical Evolution of Protein AI — A Record of Key Design Decisions
This is Part 4 of a 10-part series tracing the architectural choices behind modern protein structure prediction and design models.
The Core Question
How do we handle proteins, nucleic acids, ligands, ions, and modified residues within a single model?
Parts 1–3 focused on the core prediction pipeline: input representation → Trunk → Structure Generation. But real biology involves heterogeneous molecular complexes — a kinase bound to a small-molecule inhibitor, an antibody recognizing a glycosylated antigen, a ribosome with mRNA and tRNA. This Part traces how models evolved from protein-only prediction to universal co-folding, and examines the open-source ecosystem that emerged in parallel.
Part A: The Technical Evolution of Co-Folding
1. The Representation Problem
Different molecular types have fundamentally different structures:
| Molecular Type | Structure | Natural Unit | Examples |
|---|---|---|---|
| Proteins | Linear polymer | Residue (backbone + side-chain) | Enzymes, antibodies |
| Nucleic acids | Linear polymer | Nucleotide (sugar + base + phosphate) | DNA, RNA |
| Small molecules | Arbitrary graph | Atom | Drug-like ligands, cofactors |
| Ions / water | Single atom | Atom | Zn²⁺, Mg²⁺, Ca²⁺ |
| Modified residues | Residue + modification | Hybrid | Phosphoserine, glycosylation |
The core challenge: polymers are naturally described at the residue level (one token per residue), but ligands and ions require atom-level description. How do you run a unified Trunk (Part 2) and Structure Generation (Part 3) over tokens that represent fundamentally different things?
2. AF2 → AF2-Multimer → AF3: The DeepMind Progression
2.1 AF2: One Token = One Residue, Proteins Only
AlphaFold2 tokenizes exclusively at the residue level. Each token represents one amino acid — its backbone atoms (N, Cα, C, O) and side-chain atoms are grouped under a single token. The pair representation $z_{ij}$ encodes residue-residue relationships.
This design is elegant for proteins but structurally incapable of representing non-polymeric molecules. A drug molecule with 30 atoms doesn’t have residues, backbone, or side-chains.
2.2 AF2-Multimer: Multi-Chain Extension
AF2-Multimer (2021) extended AF2 to protein complexes with multiple chains:
- Paired MSA: Cross-chain evolutionary coupling via paired sequence search
- Chain permutation: Handle symmetric complexes where chain labels are interchangeable
- Relative chain encoding: Add inter-chain positional features to the pair representation
But the fundamental tokenization remained protein-residue-only — no ligands, no nucleic acids.
2.3 AF3: Mixed Tokenization + AtomAttention
AlphaFold3 (2024) introduced mixed tokenization — the key architectural innovation enabling universal co-folding:
```
Standard residues (protein, DNA, RNA):
    1 token = 1 residue (as in AF2)
    → reference atoms define the token's position

Non-standard entities (ligands, ions, modified residues):
    1 token = 1 atom
    → each atom is its own token
```
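The rule above can be sketched as a toy tokenizer. This is pure Python for illustration; the `Token` record and the entity format are assumptions made here, not AF3's actual data structures:

```python
from dataclasses import dataclass

# Hypothetical token record -- field names are illustrative, not AF3's.
@dataclass
class Token:
    kind: str        # "residue" or "atom"
    entity: str      # chain / ligand identifier
    atoms: list      # atom names grouped under this token

def tokenize(entities):
    """Mixed tokenization: 1 token per residue for polymers,
    1 token per atom for ligands, ions, and modified residues."""
    tokens = []
    for name, etype, units in entities:
        if etype in ("protein", "dna", "rna"):
            for residue_atoms in units:           # one token per residue
                tokens.append(Token("residue", name, residue_atoms))
        else:                                     # ligand / ion / modified
            for residue_atoms in units:
                for atom in residue_atoms:        # one token per atom
                    tokens.append(Token("atom", name, [atom]))
    return tokens

# A glycine-alanine dipeptide plus a Zn ion and a 3-atom toy ligand.
entities = [
    ("chainA", "protein", [["N", "CA", "C", "O"], ["N", "CA", "C", "O", "CB"]]),
    ("ZN",     "ion",     [["ZN"]]),
    ("LIG",    "ligand",  [["C1", "C2", "O1"]]),
]
tokens = tokenize(entities)
print(len(tokens))   # 2 residue tokens + 4 atom tokens = 6
```

The point of the sketch: the downstream Trunk sees only a flat token list, never the residue/atom distinction itself.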
This creates a heterogeneous token sequence where some tokens represent entire residues (~14 atoms) and others represent single atoms. To bridge these scales, AF3 introduced the AtomAttention Encoder/Decoder:
```
All atoms (raw coordinates)
           │
           ▼
┌─────────────────────┐
│    Atom Encoder     │
│     (3 layers)      │
│    Atom → Token     │
│  Window attention   │
│   (Q=32, K=128)     │
└──────────┬──────────┘
           │
 Token-level representations
           │
           ▼
┌─────────────────────┐
│  Pairformer Trunk   │
│     (48 blocks)     │
│ + Token Transformer │
│    in diffusion     │
└──────────┬──────────┘
           │
 Token-level representations
           │
           ▼
┌─────────────────────┐
│    Atom Decoder     │
│     (3 layers)      │
│    Token → Atom     │
│  Predict Δ coords   │
└─────────────────────┘
           │
           ▼
All-atom coordinates
```
The Atom Encoder uses windowed attention (query window Q=32, key window K=128) to aggregate atom-level features into token-level representations. Within each window, atoms belonging to the same token (e.g., all atoms of a residue) attend to each other and to nearby atoms from other tokens. The Atom Decoder reverses this mapping, producing per-atom coordinate updates from token-level predictions.
This design means the Trunk (Pairformer) and the Token Transformer in the diffusion module operate at the token level — the same resolution regardless of molecular type. The atom-to-token and token-to-atom conversions happen in thin wrapper layers (3 layers each), keeping the core architecture unchanged.
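A minimal sketch of that sparsity pattern, assuming key windows centred on each query block (the centring convention here is an assumption; the AF3 paper defines the exact offsets):

```python
def window_mask(n_atoms, q_block=32, k_window=128):
    """Sketch of AF3-style windowed attention sparsity: queries are
    grouped into blocks of `q_block`; each block attends to a
    `k_window`-wide span of keys centred on the block."""
    mask = [[False] * n_atoms for _ in range(n_atoms)]
    for start in range(0, n_atoms, q_block):
        centre = start + q_block // 2
        lo = max(0, centre - k_window // 2)
        hi = min(n_atoms, centre + k_window // 2)
        for i in range(start, min(start + q_block, n_atoms)):
            for j in range(lo, hi):
                mask[i][j] = True
    return mask

mask = window_mask(256)
# Each query sees at most 128 keys instead of all 256 atoms.
print(max(sum(row) for row in mask))   # 128
```

Because the window size is fixed, the attention cost grows linearly in atom count rather than quadratically, which is what makes all-atom processing affordable.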
3. RFAA: The Hybrid Alternative
RoseTTAFold All-Atom (RFAA, Baker Lab, 2024) took a different approach to multi-molecular modeling:
- Proteins and nucleic acids: processed through the standard RF2 three-track architecture (1D/2D/3D) at residue resolution
- Small molecules: represented as atom graphs with learned embeddings, processed through a separate graph neural network
- Cross-modal interaction: protein-residue and ligand-atom features interact through cross-attention layers
This hybrid design contrasts with AF3’s unified tokenization. Instead of forcing everything into the same token space, RFAA maintains type-specific representations and lets them interact through explicit cross-modal layers.
Trade-offs:
- RFAA’s hybrid approach requires designing cross-modal interaction mechanisms for each pair of molecular types
- AF3’s unified tokenization handles any combination automatically through the same attention mechanism
- RFAA’s design may be more expressive for individual molecular types but less scalable to new entity types
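The cross-modal idea can be illustrated with single-head cross-attention over toy features. RFAA's real layers use learned projections, multiple heads, and geometric features, so this is a sketch of the mechanism only:

```python
import math

def cross_attention(queries, keys, values):
    """One track (e.g. protein residue features) queries another track
    (e.g. ligand atom features) via scaled dot-product attention."""
    out = []
    scale = 1.0 / math.sqrt(len(keys[0]))
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        m = max(scores)                       # numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

residue_feats = [[1.0, 0.0], [0.0, 1.0]]              # 2 residues, dim 2
ligand_feats  = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # 3 ligand atoms
updated = cross_attention(residue_feats, ligand_feats, ligand_feats)
print(len(updated), len(updated[0]))   # 2 2
```

Each molecular type keeps its own representation; only this attention bridge couples them, which is exactly why new entity types require new bridges.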
RFAA’s significance extends beyond structure prediction — it became the backbone for RFdiffusion-AA, which extended backbone design (RFdiffusion) to include small molecule interactions (e.g., designing proteins that bind specific drug molecules).
4. NP1 → NP2 → NP3: Incremental Expansion
The NeuralPLexer lineage shows a gradual expansion of scope:
| Version | Year | Scope | Architecture | Generation |
|---|---|---|---|---|
| NP1 | 2022 | Protein-ligand only | SE(3)-equivariant GNN | SE(3) diffusion |
| NP2 | 2024 | All biomolecules | Extended GNN | SE(3) diffusion |
| NP3 | 2025 | All biomolecules | Encoder-Decoder (PairFormer + DiT) | Flow matching |
NP3’s transition to the PairFormer-based encoder (Part 2) and flow matching decoder (Part 3) represents a convergence with the AF3 architectural paradigm — while adding its own innovations (polymer prior, encoder-decoder separation, Flash-TriangularAttention).
NP3 introduces a two-level atom handling strategy:
- Anchor atoms (one per residue/nucleotide, or each atom for ligands): processed with dense attention in the encoder
- All heavy atoms: processed with sliding window attention in the decoder, achieving $O(N)$ scaling
This design allows NP3 to handle large complexes efficiently while maintaining atomic resolution where it matters most (ligand binding sites, protein-nucleic acid interfaces).
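A toy sketch of the anchor-selection step, assuming Cα as the polymer anchor (an illustrative assumption; NP3 defines its own anchor choice):

```python
def pick_anchors(chains):
    """Sketch of a two-level atom handling strategy: one anchor atom
    per polymer residue, every atom for ligands. Dense attention runs
    over anchors only; all heavy atoms get sliding-window attention."""
    anchors = []
    total_atoms = 0
    for ctype, residues in chains:
        for atoms in residues:
            total_atoms += len(atoms)
            if ctype == "polymer":
                # Assume the second atom is Ca (N, CA, C, O ordering).
                anchors.append(atoms[1] if len(atoms) > 1 else atoms[0])
            else:
                anchors.extend(atoms)   # ligand: every atom is an anchor
    return anchors, total_atoms

chains = [
    ("polymer", [["N", "CA", "C", "O"]] * 100),   # 100-residue protein
    ("ligand",  [["C1", "C2", "N1", "O1"]]),      # 4-atom ligand
]
anchors, n_atoms = pick_anchors(chains)
# Dense attention cost scales with anchors (104), not atoms (404).
print(len(anchors), n_atoms)   # 104 404
```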
Part B: The Open-Source Reproduction Race
5. The Landscape After AF3
When AlphaFold3 was published (May 2024), its code was not released. The code that followed in November 2024 came under a restricted academic license, which further motivated the community to build open alternatives. The result was an unprecedented race to reproduce — and extend — AF3’s capabilities.
```
AF2 (2021)
├── OpenFold (2022) ─── Apache 2.0, AF2 reproduction
└── UniFold (2022) ──── Apache 2.0, AF2 reproduction (DP Technology)

AF3 paper (2024.05)
│
├── Boltz-1 (2024.11) ─── MIT license, full training code
│     └── Boltz-2 (2025) ─── + Affinity head
│           └── BoltzGen (2025) ─── + Design capability
│
├── Chai-1 (2024.09) ─── Apache 2.0, inference only
│     └── Chai-2 (2025) ─── Antibody design (closed)
│
├── Protenix (2024.10) ─── ByteDance, MSA module changes
│
├── SeedFold (2025) ──── ByteDance, linear TriAtt + wider Pairformer
│
├── AF3 code release (2024.11) ─── Academic license
│
└── OpenFold3 (2025.10) ─── Apache 2.0, full reproduction
```
5.1 AF2 Reproductions: OpenFold and UniFold
OpenFold (Columbia University et al., 2022, Apache 2.0) established the template: a complete, permissively licensed reimplementation of AF2 with full training code and weights. Its contributions went beyond reproduction:
- First application of FlashAttention to protein structure prediction
- Compatible with AF2 weights (identical predictions)
- Became the foundation for AlphaFlow and numerous fine-tuning studies
UniFold (DP Technology, 2022, Apache 2.0) independently reproduced AF2’s Evoformer architecture, providing another fully open training pipeline. UniFold served as the foundation for DP Technology’s subsequent models, including Uni-Fold Symmetry for symmetric complexes. Together with OpenFold, it validated that AF2’s methodology was complete and reproducible — establishing the expectation that major models should be openly reimplementable.
OpenFold3 (October 2025, Apache 2.0) applied the same philosophy to AF3:
| Aspect | Detail |
|---|---|
| License | Apache 2.0 (commercial use permitted) |
| Training data | 300K experimental structures + 13M synthetic structures |
| Consortium | Columbia University, LLNL, Seoul National University, SandboxAQ |
| Key strength | RNA structure prediction on par with AF3 — the only open-source model to achieve this |
| Training code | Fully open |
OpenFold3’s RNA performance is notable because RNA structure prediction is widely considered harder than protein structure prediction (fewer training examples, more conformational flexibility), and most AF3 reproductions underperform significantly on RNA benchmarks.
5.2 Boltz-1/2: The Community Workhorse
Boltz-1 (MIT license, November 2024) was the first fully open AF3 reproduction with complete training code. Boltz-2 (2025) added the most significant architectural extension:
Affinity Dual Head: Boltz-2 predicts binding affinity alongside structure — a binary binder/non-binder classification and a continuous affinity value — using a dedicated head that reads the same Trunk representations.
- Won the CASP16 affinity prediction challenge (outperforming all other methods)
- Achieves FEP+-comparable accuracy at ~1000× lower computational cost (Pearson $r = 0.62$ vs FEP+ benchmarks)
- Complete three-stage training pipeline publicly available: structure → confidence → affinity
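Conceptually, the dual head is two small networks reading one pooled trunk representation. A minimal sketch, with illustrative weights and mean pooling standing in for Boltz-2's deeper head:

```python
import math

def affinity_dual_head(token_feats, w_cls, w_reg):
    """Sketch of a dual affinity head on shared trunk features:
    mean-pool token representations, then one linear head emits a
    binder probability and another a continuous affinity value.
    Weights and pooling are toy stand-ins, not Boltz-2's layers."""
    dim = len(token_feats[0])
    pooled = [sum(t[d] for t in token_feats) / len(token_feats)
              for d in range(dim)]
    logit = sum(w * x for w, x in zip(w_cls, pooled))
    p_binder = 1.0 / (1.0 + math.exp(-logit))             # classification
    affinity = sum(w * x for w, x in zip(w_reg, pooled))  # regression
    return p_binder, affinity

feats = [[0.2, -0.1, 0.4], [0.0, 0.3, 0.1]]   # toy trunk token features
p, a = affinity_dual_head(feats, w_cls=[1.0, 1.0, 1.0], w_reg=[2.0, 0.0, -1.0])
print(0.0 < p < 1.0)   # True
```

The key design point survives the simplification: both outputs read the same trunk representation, so structural and affinity supervision shape shared features.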
Boltz-2’s MIT license and complete training pipeline make it the most adopted starting point for new research — from fine-tuning on specialized protein families to developing new output heads.
5.3 Protenix and SeedFold: ByteDance’s Two Approaches
Protenix (ByteDance, 2024) reproduced AF3 with modifications to the MSA module’s signal flow, achieving modest improvements over the AF3 baseline. Its training code is public, though the license terms are less permissive than Boltz or OpenFold3.
SeedFold (ByteDance, 2025) takes a more ambitious approach — rather than faithfully reproducing AF3, it modifies the Pairformer architecture itself:
- Linear Triangle Attention: Replaces the standard $O(L^3)$ triangle attention with a linearized variant, reducing complexity to sub-cubic scaling (see Part 2 for details)
- Wider Pairformer representations: Expands the pair representation dimension beyond AF3’s default of 128, increasing per-layer capacity
- Result: Outperforms AF3 on most protein-related benchmarks
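SeedFold's exact formulation aside, the generic kernel trick behind linear attention shows how quadratic scoring can be avoided: replace softmax(QKᵀ)V with φ(Q)(φ(K)ᵀV), summarising keys and values once. The sketch below uses a toy positive feature map; everything here is the generic technique, not SeedFold's published method:

```python
def linear_attention(Q, K, V):
    """Kernel linearization of attention: keys/values are summarised
    in one O(n) pass, then each query is O(1) in the number of keys,
    instead of O(n) scoring per query."""
    phi = lambda v: [max(x, 0.0) + 1e-6 for x in v]  # toy feature map
    d, dv = len(K[0]), len(V[0])
    # Accumulate S = sum_k phi(k) v^T and z = sum_k phi(k).
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = phi(q)
        denom = sum(fa * za for fa, za in zip(fq, z))
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0], [0.0], [0.5]]
out = linear_attention(Q, K, V)
print(len(out), len(out[0]))   # 2 1
```

Applied to triangle attention, the same idea drops the per-edge cost that makes the standard operation cubic in sequence length.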
ByteDance’s dual strategy — faithful reproduction (Protenix) alongside architectural innovation (SeedFold) — illustrates that reproduction is a stepping stone, not the end goal.
5.4 Chai-1: Inference-Only but Architecturally Distinct
Chai-1 (Apache 2.0, September 2024) is architecturally the most distinct AF3-family model:
- ESM-2 3B integration as a separate input track (Part 1) — unique among AF3-family models
- Heun sampler (second-order ODE solver) instead of Euler — 2× model evaluations per step but better trajectory accuracy
- Multi-track input: MSA + templates + ESM-2 + experimental restraints
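Heun's method is a predictor-corrector: an Euler step, then a correction using the averaged slope, hence two function evaluations per step. A sketch on a toy ODE (Chai-1 applies this to the diffusion sampling trajectory; dy/dt = -y here is just a probe with a known solution):

```python
import math

def euler_step(f, t, y, dt):
    return y + dt * f(t, y)                  # 1 model evaluation

def heun_step(f, t, y, dt):
    """Second-order Heun step: Euler predictor plus trapezoidal
    corrector -- 2 evaluations of f per step."""
    k1 = f(t, y)
    y_pred = y + dt * k1                     # Euler predictor
    k2 = f(t + dt, y_pred)                   # 2nd model evaluation
    return y + dt * 0.5 * (k1 + k2)          # averaged-slope corrector

# Toy probe ODE dy/dt = -y, exact solution y(t) = exp(-t).
f = lambda t, y: -y
y_e = y_h = 1.0
dt, n = 0.1, 10
for i in range(n):
    y_e = euler_step(f, i * dt, y_e, dt)
    y_h = heun_step(f, i * dt, y_h, dt)
exact = math.exp(-1.0)
print(abs(y_h - exact) < abs(y_e - exact))   # True: Heun is more accurate
```

The trade-off is exactly the one stated above: twice the model calls per step, bought back by larger steps or better trajectories.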
However, Chai-1’s training code is not publicly available — only inference code and weights are released. This limits its utility as a research platform despite its permissive license.
5.5 Comparison Table: Open-Source Ecosystem
AF2 Reproductions:
| | OpenFold | UniFold |
|---|---|---|
| Developer | Columbia University et al. | DP Technology |
| Year | 2022 | 2022 |
| License | Apache 2.0 | Apache 2.0 |
| Key contribution | FlashAttention, AF2 weight compatibility | Independent reproduction, Uni-Fold Symmetry |
AF3-family Models:
| | OpenFold3 | Boltz-2 | Protenix | SeedFold | Chai-1 | AF3 (official) |
|---|---|---|---|---|---|---|
| Developer | Columbia et al. | MIT/Recursion | ByteDance | ByteDance | Chai Discovery | DeepMind |
| License | Apache 2.0 | MIT | Mixed | — | Apache 2.0 | Academic only |
| Training code | Full | Full | Full | — | Inference only | Limited |
| Commercial use | Yes | Yes | Varies | — | Yes | No |
| Unique feature | RNA strength | Affinity head | MSA variant | Linear TriAtt, wider pair | ESM-2 3B | Reference model |
| vs AF3 accuracy | Matches | Matches | Slightly better | Exceeds | Matches | Baseline |
| Community adoption | Growing | Highest | Moderate | — | Moderate | Reference |
6. IsoDDE: Why Reproduction ≠ Surpassing
IsoDDE (Isomorphic Labs, 2026) — the “Drug Design Engine” — demonstrates the gap between open-source reproductions and the frontier:
| Benchmark | IsoDDE | AF3 | Best Open-Source |
|---|---|---|---|
| Antibody-Antigen (DockQ > 0.8) | 39% | ~17% | Similar to AF3 |
| CDR-H3 loop (RMSD ≤ 2 Å) | 70% | 58% | ~60% |
| Pocket prediction (AUPRC) | 1.5× P2Rank | — | — |
IsoDDE’s advantages come not from architectural novelty but from:
- Multi-task learning: Structure prediction, binding affinity, and pocket prediction are trained jointly on a shared representation — each task improves the others
- Scale of training data: Access to proprietary experimental data (Isomorphic Labs’ internal assay data) far exceeding public databases
- Compute budget: Training at a scale that academic groups cannot match
- 1000-seed multi-state inference: Running 1000 independent diffusion trajectories per prediction, then selecting the best by confidence
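Best-of-N selection itself is simple; the cost is in running the trajectories. A sketch with a stand-in predictor (`predict_fn` and `toy_predict` are hypothetical placeholders for a full diffusion rollout plus confidence head):

```python
import random

def multi_seed_predict(predict_fn, n_seeds=1000):
    """Best-of-N inference: run independent sampling trajectories with
    different seeds and keep the structure the model is most confident
    in. N trades compute for coverage of the conformational landscape."""
    best = None
    for seed in range(n_seeds):
        structure, confidence = predict_fn(seed)
        if best is None or confidence > best[1]:
            best = (structure, confidence)
    return best

# Toy stand-in: "structure" is a label, confidence is seeded noise.
def toy_predict(seed):
    rng = random.Random(seed)
    return f"structure_{seed}", rng.random()

structure, conf = multi_seed_predict(toy_predict, n_seeds=100)
print(conf >= toy_predict(0)[1])   # best confidence beats any single seed
```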
This gap highlights a sobering reality: architecture changes alone (the focus of open-source efforts) account for only part of model performance. Training data diversity, compute scale, and multi-task synergies are equally — perhaps more — important.
7. Co-Folding Technical Comparison
| | AF3 | RFAA | NP3 | Boltz-2 |
|---|---|---|---|---|
| Molecular representation | Unified tokens | Hybrid (residue + graph) | Encoder-Decoder | Unified tokens |
| Tokenization | Residue + atom | Sequence + atom graph | Anchor + atom levels | Residue + atom |
| Trunk | Pairformer (48) | 3-track RF2 | PairFormer encoder | Pairformer (64) |
| Structure generation | EDM diffusion (200 steps) | SE(3) IPA | Flow matching (40 steps) | EDM diffusion (200 steps) |
| Affinity prediction | No | No | No | Yes (dual head) |
| PLM integration | No | No | ESM-2 + RiNALMo | No |
| License | Academic | BSD | Proprietary | MIT |
| Training code | Limited | Yes | No | Yes |
8. Convergence and Open Questions
Where the field agrees
- Unified tokenization (AF3-style mixed residue + atom tokens) has become the dominant approach, adopted by Boltz, Chai, and OpenFold3. The alternative hybrid approach (RFAA) has not been widely adopted by other groups.
- The Pairformer Trunk is universal — every co-folding model (AF3, Boltz, Chai, NP3, OpenFold3, SeedFold) uses a Pairformer variant as its core representation engine, though SeedFold and Pairmixer (Part 2) differ on whether triangle attention should be linearized or removed.
- Open-source models match or exceed AF3 on standard benchmarks for protein structure prediction and protein-ligand docking. SeedFold exceeds AF3 on most tasks; the gap has largely closed for well-characterized targets.
What remains unresolved
Affinity prediction accuracy: Boltz-2 and IsoDDE have added affinity prediction, but achieving physics-based accuracy (FEP+ level) for arbitrary protein-ligand pairs remains elusive. Current models excel at relative ranking but struggle with absolute $\Delta G$ prediction.
The data moat: IsoDDE’s performance gap suggests that proprietary training data (especially experimental binding data) provides advantages that architectural innovation alone cannot overcome. Whether the open-source community can close this gap through synthetic data generation, data augmentation, or novel training strategies is an open question.
RNA and non-canonical molecules: While OpenFold3 matches AF3 on RNA, most other open-source models significantly underperform. Nucleic acid structure prediction remains less mature than protein prediction, with fewer training examples and greater conformational complexity.
Next: Part 5 — How to Turn a Prediction Model into a Design Model? Four Strategies Compared
We shift from prediction to design — examining how co-folding models like Boltz-2 can be repurposed for generating novel proteins, and comparing four distinct strategies: SE(3) diffusion, conditional generation, latent flow matching, and discrete multimodal generation.
Part of the series: The Technical Evolution of Protein AI — A Record of Key Design Decisions