The AI3 Antibody was developed as part of a bispecific antibody (bsAb) platform targeting two distinct immune regulators: PD-L1 and CD28. The AI3 arm specifically binds CD28, a costimulatory receptor on T cells, while the complementary S79 arm targets PD-L1, a ligand that suppresses T-cell activation. The AI3 arm was generated using single-chain variable fragment (scFv) phage-display libraries derived from the IGHV3–23 germline gene, with diversification focused on the light chain variable (VL) domain.
Germline Origin: Based on the IGHV3–23 germline gene, a common template for human antibodies.
Selection Strategy: Screened for specific binding to soluble CD28 or CD28-expressing Jurkat cells using ELISA and flow cytometry.
Therapeutic Context: Part of NI-3201, a bsAb designed to enhance T-cell activation by colocalizing PD-L1 and CD28 on the cell surface.
The AI3 fragment antigen-binding (Fab) structure was resolved via crystallography, revealing critical interactions with CD28. The antibody's paratope (binding site) includes residues from the heavy and light chain variable regions. Structural data highlight:
Epitope Mapping: The AI3 Fab binds CD28's extracellular domain, focusing on residues K63, R64, and E65.
Paratope Residues: Key interactions involve the heavy chain's HCDR2 (R50, S51, Y53) and the light chain's LCDR3 (Q91, S92, Y93).
| Epitope Residues (CD28) | Paratope Residues (AI3) | Interaction Type |
|---|---|---|
| K63 | HCDR2 (R50, S51, Y53) | Hydrogen bonding |
| R64 | LCDR3 (Q91, S92, Y93) | Salt bridge |
| E65 | HCDR2 (R50) | Electrostatic |
Affinity: AI3 binds CD28 with a dissociation constant (Kd) of 3.2 nM, as measured by surface plasmon resonance (SPR).
Specificity: No cross-reactivity with non-target proteins was detected by immunoprecipitation or ELISA.
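To put the reported Kd in more familiar terms, the sketch below converts a dissociation constant into a standard binding free energy and an equilibrium occupancy. It assumes a simple 1:1 binding model, a 1 M reference state, and 25 °C; the function names and the 10 nM antibody concentration are illustrative and do not come from the source.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol): dG = R * T * ln(Kd)."""
    return R_KCAL * temp_k * math.log(kd_molar)


def fraction_bound(antibody_molar, kd_molar):
    """Equilibrium target occupancy for 1:1 binding with antibody in excess."""
    return antibody_molar / (antibody_molar + kd_molar)


kd = 3.2e-9  # 3.2 nM, the Kd quoted in the text
print(f"dG = {binding_free_energy(kd):.2f} kcal/mol")
print(f"occupancy at 10 nM (assumed concentration) = {fraction_bound(10e-9, kd):.2f}")
```

At an antibody concentration equal to Kd, half of the target is occupied, which is why Kd is a convenient single-number summary of affinity.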
In vitro assays demonstrated that NI-3201 (containing AI3) enhances T-cell activation by 3.5-fold compared to anti-PD-L1 monotherapy, measured via IFN-γ secretion. In vivo studies in syngeneic tumor models showed 70% tumor regression with NI-3201 treatment, compared to 30% with monotherapy.
| Model | Tumor Type | Efficacy (Tumor Regression) |
|---|---|---|
| CT26 (colon cancer) | Colorectal | 70% |
| MC38 (colon adenocarcinoma) | Colorectal | 65% |
Dual-Specificity: AI3 enables simultaneous binding to CD28 and PD-L1, facilitating T-cell activation while blocking immune suppression.
Structural Stability: The bispecific design maintains a serum half-life of 4.8 days, comparable to monospecific antibodies.
The AI3 Antibody exemplifies the potential of AI-driven antibody engineering in targeting complex immune pathways. Its integration into bispecific platforms highlights advancements in:
KEGG: sce:Q0060
STRING: 4932.Q0060
AI technologies are transforming therapeutic antibody discovery by addressing traditional bottlenecks: inefficiency, high costs, high failure rates, and limited scalability. Recent developments, such as the Vanderbilt University Medical Center (VUMC) project funded by ARPA-H with $30 million, aim to build extensive antibody-antigen atlases and develop AI algorithms that can generate antibody therapies against virtually any antigen target of interest. This represents a significant shift toward democratizing antibody discovery, making it easier for researchers to generate monoclonal antibody therapeutics against specific targets. Modern AI approaches have been shown to reduce experimental screening requirements while maintaining or improving the quality of antibody candidates.
AI models generate antigen-specific antibody sequences through several methodological approaches:
Language model-based generation: Systems like IgLM are trained on antibody sequence repositories to learn the inherent patterns and rules of antibody sequences. These models can generate novel sequences that follow natural antibody structure constraints.
Structure-guided design: AI systems can incorporate structural information about target antigens to generate complementary binding regions, particularly in the complementarity-determining regions (CDRs) that are crucial for antigen recognition.
Template-based approaches: Many systems use germline-based templates as starting points, mimicking the natural processes of antibody generation while focusing computational power on optimizing hypervariable regions such as CDRH3.
In a documented example, researchers employed AI to design antibodies against a specific epitope on the SARS-CoV-2 spike protein using VH3-53 germline gene templates, achieving a success rate of approximately 15% in generating functional antigen-specific antibodies.
Training effective AI models for antibody design requires diverse and high-quality datasets that typically include:
| Data Type | Description | Importance | Common Sources |
|---|---|---|---|
| Antibody sequences | Primary amino acid sequences of antibodies with known properties | Core training data | Public databases, repertoire sequencing |
| Structural data | 3D structural information of antibodies and antibody-antigen complexes | Critical for binding prediction | PDB, AlphaFold databases |
| Binding affinity data | Quantitative measurements of antibody-antigen binding strength | Essential for optimization | Experimental characterization (SPR, BLI) |
| Repertoire data | Natural antibody sequences from immune responses | Provides evolutionary context | Next-generation sequencing of B cell populations |
As demonstrated in recent research, even relatively small datasets of 35 experimentally characterized antigen-specific variants can be sufficient to train machine learning models that predict antibody affinity accurately, with R² values exceeding 0.86 using Gaussian Process models with Matérn kernels. These models can then guide the design of synthetic antibody variants with desired binding properties.
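As a rough illustration of how a Gaussian Process with a Matérn kernel maps sequences to affinities, here is a dependency-free sketch. It is not the cited study's implementation: the one-hot encoding, the Matérn 5/2 smoothness, the length scale, the noise level, and the five-residue toy sequences with their pKD values are all assumptions made for this example, and sequences are assumed to have equal length.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Flatten a sequence into a 20-per-position one-hot vector."""
    return [1.0 if aa == ref else 0.0 for aa in seq for ref in AMINO_ACIDS]


def matern52(x, y, length_scale):
    """Matern kernel with smoothness nu = 5/2 and unit signal variance."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    s = math.sqrt(5.0) * r / length_scale
    return (1.0 + s + s * s / 3.0) * math.exp(-s)


def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cho_solve(low, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


class GaussianProcess:
    def __init__(self, length_scale=2.0, noise=1e-2):
        self.length_scale = length_scale  # assumed, not tuned
        self.noise = noise                # assumed measurement-noise variance

    def fit(self, seqs, y):
        self.x = [one_hot(s) for s in seqs]
        self.mean = sum(y) / len(y)
        n = len(seqs)
        k = [[matern52(self.x[i], self.x[j], self.length_scale)
              + (self.noise if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        self.low = cholesky(k)
        self.alpha = cho_solve(self.low, [v - self.mean for v in y])
        return self

    def predict(self, seq):
        """Return the posterior mean and standard deviation for one sequence."""
        q = one_hot(seq)
        k_star = [matern52(q, xi, self.length_scale) for xi in self.x]
        mu = self.mean + sum(k * a for k, a in zip(k_star, self.alpha))
        v = cho_solve(self.low, k_star)
        var = 1.0 - sum(k * w for k, w in zip(k_star, v))
        return mu, math.sqrt(max(var, 0.0))


# Hypothetical CDR fragments and pKD values, for illustration only.
train_seqs = ["ARDYW", "ARDFW", "ARGYW", "AKDYW", "ARDYF", "TRDYW"]
train_pkd = [8.1, 7.9, 7.2, 8.4, 7.6, 8.0]
gp = GaussianProcess().fit(train_seqs, train_pkd)
print(gp.predict("ARDYW"))  # close to a training point: low uncertainty
print(gp.predict("GGSLV"))  # far from all training data: high uncertainty
```

The predictive standard deviation is what makes this model family attractive for small datasets: it reports how far a query sits from anything the model has seen.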
Supervised machine learning approaches and generative models serve different but complementary functions in antibody design:
Supervised Learning Approaches:
Require labeled training data (e.g., sequence-affinity pairs)
Excel at predicting specific properties of existing sequences
Can achieve high accuracy even with limited datasets (~35 sequences)
Particularly effective for affinity optimization of existing antibodies
Common algorithms include Gaussian Processes, Random Forests, and Kernel Ridge Regression
Generative Models:
Can create entirely novel sequences without explicit property labels
Learn the underlying distribution of antibody sequences
Better suited for exploring new regions of sequence space
May require larger training datasets to capture sequence diversity
Include language models like IgLM that can generate de novo antibody sequences
Research demonstrates that supervised ML models trained on experimentally measured affinities of antibody variants can achieve high prediction accuracy (R² of 0.8625 for Gaussian Process models), enabling in silico design of antibodies with specifically engineered affinities. Generative approaches, meanwhile, have produced entirely novel CDRH3 sequences that bind specific antigen targets.
The most effective computational workflows for selecting antigen-specific variants from antibody repertoire data typically involve multi-stage filtering and clustering approaches. Based on recent research, a highly effective workflow includes:
Initial sequence similarity filtering: Using Levenshtein distance thresholds (typically 80% amino acid similarity) to identify sequences similar to known antigen-specific antibodies.
Complementary clustering approaches: Applying affinity propagation (AP) clustering to identify related sequence groups that may not be captured by strict threshold filtering.
Expanding selection to full variable domains: Including all VH sequences containing the selected CDR sequences to capture the influence of framework regions.
Representative sampling via k-medoids clustering: Ensuring diverse coverage of the potential binding space through cluster-based selection.
This workflow has been validated experimentally, with success rates for identifying functional antigen-specific antibodies from repertoire data reaching 70% or higher in some studies. Importantly, the approach can start with just a single known antigen-specific antibody as a seed and expand to identify diverse functional variants, making it practical for researchers exploring novel targets.
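The similarity-filtering and representative-sampling steps of this workflow can be sketched in a few lines. This is a minimal illustration rather than the published pipeline: similarity is taken as one minus the edit distance divided by the longer sequence length (one plausible reading of "80% similarity"), the k-medoids routine is a plain alternating variant with a deterministic start, and all sequences are made up.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Fraction of positions preserved, relative to the longer sequence."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest


def filter_by_similarity(seed, repertoire, threshold=0.8):
    """Keep repertoire sequences at or above the similarity threshold to the seed."""
    return [s for s in repertoire if similarity(seed, s) >= threshold]


def k_medoids(seqs, k, max_iter=20):
    """Pick k representative sequences by alternating assignment and update."""
    seqs = list(dict.fromkeys(seqs))  # drop duplicates, keep order
    medoids = seqs[:k]                # deterministic start: first k sequences
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for s in seqs:
            nearest = min(medoids, key=lambda m: levenshtein(s, m))
            clusters[nearest].append(s)
        # New medoid = the member with the smallest total distance to its cluster.
        updated = [min(members, key=lambda c: sum(levenshtein(c, o) for o in members))
                   for members in clusters.values()]
        if updated == medoids:
            break
        medoids = updated
    return medoids


# Hypothetical CDRH3 sequences; the first one plays the role of the known binder.
seed = "ARDYYGSSYFDY"
repertoire = [
    "ARDYYGSSYFDY", "ARDYYGSGYFDY", "ARDWYGSSYFDV",
    "ARGGLRWFDP", "ARDYYGSSYFD", "AKEYYGTSWFDY",
]
hits = filter_by_similarity(seed, repertoire)
representatives = k_medoids(hits, k=2)
print(hits)
print(representatives)
```

For a 12-residue seed, the 80% threshold admits sequences up to two edits away, which is why a variant with four substitutions is dropped while a one-residue truncation is kept.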
Researchers can incorporate structural predictions into AI-driven antibody design through several methodological approaches:
Structure-based filtering of sequence-generated candidates: After generating antibody sequences using AI models, structural prediction tools can be employed to model candidate structures and assess their compatibility with target antigens. This approach was successfully used to down-select CDRH3 sequences based on predicted structural similarity to known antigen-specific antibodies.
Direct structure-guided sequence generation: More advanced systems incorporate structural constraints directly into the generation process, using predicted antibody-antigen complex structures to guide sequence optimization in binding regions.
Ensemble methods combining sequence and structure prediction: The most effective pipelines use ensemble approaches that integrate multiple prediction tools:
Sequence-based affinity prediction models
Structural stability assessment tools
Binding interface energy calculations
Molecular dynamics simulations for flexibility assessment
When implementing these methods, researchers should be aware that the quality of structural modeling significantly impacts outcomes. Ongoing optimization of structural prediction tools continues to improve the success rates of AI-designed antibodies, with current systems achieving hit rates of approximately 15% for functional antigen-specific antibodies even when only a relatively small number of candidates is validated experimentally.
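A quick calculation shows what a 15% hit rate means for experimental planning. The sketch below assumes that candidates succeed independently with the same probability, which is a simplification; the function names are illustrative.

```python
import math


def prob_at_least_one_hit(hit_rate, n_candidates):
    """Probability that at least one of n independent candidates is functional."""
    return 1.0 - (1.0 - hit_rate) ** n_candidates


def candidates_needed(hit_rate, confidence):
    """Smallest number of candidates giving at least one hit with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_rate))


n = candidates_needed(hit_rate=0.15, confidence=0.95)
print(n, round(prob_at_least_one_hit(0.15, n), 3))
```

Under these assumptions, testing roughly 19 candidates is enough to expect at least one functional antibody with 95% probability, which is consistent with the claim that small validation panels can suffice.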
Accounting for post-translational modifications (PTMs) represents one of the more challenging aspects of AI-driven antibody design. Current methodological approaches include:
Feature engineering for PTM sites: AI models can be trained to recognize sequence motifs associated with common PTMs like glycosylation, incorporating these as explicit features in prediction algorithms.
Structural modeling of modification sites: Advanced structural prediction tools can model the impact of PTMs on antibody conformation and stability, particularly around the Fc region where glycosylation significantly affects effector functions.
Integrated experimental-computational workflows: The most effective approach involves iterative cycles where:
AI generates candidate sequences
Experimental characterization identifies any problematic PTMs
Results feed back into model refinement to improve future predictions
Current research suggests that while deep learning models can be trained to predict some PTM sites with reasonable accuracy, the complex interplay between modifications and antibody function often requires experimental validation. Researchers should implement controls for PTM variability when validating AI-generated antibody candidates, particularly when transitioning from recombinant expression systems to production cell lines.
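Before any learned model is involved, the simplest form of PTM feature engineering is a motif scan. The sketch below flags three widely used sequence liabilities: the N-glycosylation sequon (N-X-S/T, where X is not proline), Asn-Gly deamidation sites, and Asp-Gly isomerization sites. The motif set is deliberately small and the test sequence is invented; a real developability screen would cover more motifs and consider structural context.

```python
import re

# Lookahead patterns so that overlapping motifs are all reported.
PTM_MOTIFS = {
    "N-glycosylation sequon": re.compile(r"(?=(N[^P][ST]))"),
    "Asn deamidation (NG)": re.compile(r"(?=(NG))"),
    "Asp isomerization (DG)": re.compile(r"(?=(DG))"),
}


def scan_ptm_liabilities(seq):
    """Return (motif name, 1-based position, matched residues) for each hit."""
    hits = []
    for name, pattern in PTM_MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda h: h[1])


# Invented fragment containing one sequon (NGS), one DG site, and an NPS decoy.
for hit in scan_ptm_liabilities("QVQLNGSWDGTNPSA"):
    print(hit)
```

Hits from a scan like this can be used directly as binary input features or as a pre-filter before experimental characterization.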
Dataset bias presents significant challenges for AI models in antibody design. Researchers can implement several methodological approaches to mitigate these biases:
Diversification of training data sources: Combining antibody sequences from multiple species, different immunization protocols, and various discovery platforms (phage display, single B cell, etc.) helps prevent platform-specific biases.
Balanced sampling techniques: Implementing computational strategies to ensure balanced representation of:
Different antibody germline families
Diverse CDRH3 lengths
Various binding epitopes and mechanisms
Transfer learning approaches: Pre-training models on large diverse datasets before fine-tuning on specific targets can help preserve generalizability.
Validation with randomized controls: Testing models with randomized labels to confirm they are capturing meaningful patterns rather than dataset artifacts. Research demonstrates that properly trained models perform significantly better than those trained on randomized data.
Cross-validation strategies: Implementing rigorous leave-one-out cross-validation (LOO-CV) and nested cross-validation protocols to assess model generalizability.
The effectiveness of these approaches has been demonstrated in recent research where models trained on carefully selected, diverse antibody variant datasets achieved R² values exceeding 0.82 even under LOO-CV, indicating robust performance beyond the training data.
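The two validation checks above, leave-one-out cross-validation and a randomized-label control, fit in a short script. To keep the example self-contained, a nearest-neighbour regressor on Hamming distance stands in for the Gaussian Process models discussed in the text, and the data are a synthetic toy set in which the label is simply the number of tryptophans; none of this comes from the cited studies.

```python
import random
from itertools import product


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query):
    """Mean label of all training sequences at the minimum Hamming distance."""
    d_min = min(hamming(s, query) for s, _ in train)
    nearest = [y for s, y in train if hamming(s, query) == d_min]
    return sum(nearest) / len(nearest)


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def loo_r2(seqs, labels):
    """Predict each sequence from all the others, then score the held-out predictions."""
    preds = []
    for i in range(len(seqs)):
        train = [(s, y) for j, (s, y) in enumerate(zip(seqs, labels)) if j != i]
        preds.append(knn_predict(train, seqs[i]))
    return r_squared(labels, preds)


def shuffled_control(seqs, labels, n_shuffles=20, seed=0):
    """Average LOO R^2 after breaking the sequence-label link by shuffling."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_shuffles):
        permuted = labels[:]
        rng.shuffle(permuted)
        scores.append(loo_r2(seqs, permuted))
    return sum(scores) / len(scores)


# Synthetic toy data: all 16 four-residue sequences over {A, W}.
seqs = ["".join(p) for p in product("AW", repeat=4)]
labels = [float(s.count("W")) for s in seqs]

r2_true = loo_r2(seqs, labels)
r2_shuffled = shuffled_control(seqs, labels)
print(f"LOO R^2 (true labels):     {r2_true:.2f}")
print(f"LOO R^2 (shuffled labels): {r2_shuffled:.2f}")
```

A model that has learned a real sequence-to-affinity relationship should score clearly higher on the true labels than on the shuffled ones; similar scores in both cases would point to a dataset artifact or leakage.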
Recent research provides detailed comparisons of machine learning algorithms for predicting antibody-antigen binding affinity:
The practical implication is that researchers should consider dataset size when selecting algorithms, with GP models being particularly valuable for the early stages of antibody discovery when experimental data is limited. These models can effectively guide the design of synthetic antibody variants, with validation studies confirming predicted affinities for 7 out of 8 AI-designed variants.
AI-based antibody design faces several significant limitations that require methodological solutions:
Limited training data availability: While AI models can achieve strong results with relatively small datasets (35 variants), the scarcity of comprehensive binding data remains a challenge. Addressing this requires:
Development of high-throughput characterization methods
Establishment of standardized public repositories for antibody-antigen interaction data
Transfer learning approaches to leverage knowledge from related binding systems
Sequence length constraints: Current approaches often restrict analysis to antibodies of identical length to avoid sequence alignment complexity. Future improvements should:
Develop advanced encoding techniques for variable-length sequences
Implement pre-trained language model (PLM) embeddings to handle sequence diversity
Create architectures specifically designed for the natural variability in antibody CDRs
Complex optimization objectives: Antibodies require simultaneous optimization of multiple properties beyond affinity (stability, expressibility, immunogenicity). Emerging solutions include:
Multi-objective optimization algorithms
Weighted ensemble models that balance different properties
Sequential filtering pipelines that address properties in prioritized order
Integration with experimental workflows: Bridging computational prediction and experimental validation remains challenging. Effective approaches include:
Development of standardized validation protocols for AI-generated antibodies
Implementation of active learning to prioritize experiments that maximize information gain
Creation of integrated computational-experimental platforms that accelerate iteration cycles
As the field advances, these limitations are gradually being addressed through methodological innovations and increasing integration of computational and experimental approaches.
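For the multi-objective problem mentioned above, one common starting point is Pareto filtering: keep every candidate that no other candidate beats on all properties at once. The sketch below assumes three properties where higher is better (pKD, melting temperature, expression yield); the field names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    pkd: float              # -log10(KD); higher means tighter binding
    tm_celsius: float       # thermal stability
    expression_mg_l: float  # expression yield

    def objectives(self):
        return (self.pkd, self.tm_celsius, self.expression_mg_l)


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    pairs = list(zip(a.objectives(), b.objectives()))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def pareto_front(candidates):
    """Candidates that are not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical measurements, for illustration only.
panel = [
    Candidate("v1", 9.0, 65.0, 50.0),
    Candidate("v2", 8.5, 72.0, 40.0),
    Candidate("v3", 8.4, 64.0, 30.0),   # worse than v1 on every property
    Candidate("v4", 8.0, 70.0, 120.0),
]
print([c.name for c in pareto_front(panel)])
```

Pareto filtering avoids choosing weights up front; a weighted ensemble or sequential filter can then be applied to the surviving candidates if a single ranking is needed.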
Rigorous validation of AI model predictions for antibody design requires multi-faceted methodological approaches:
Statistical validation protocols:
Experimental validation strategies:
Expression and characterization of selected variants spanning the predicted property range
Biolayer interferometry (BLI) or surface plasmon resonance (SPR) for affinity measurements
Structural confirmation via crystallography or cryo-EM for selected candidates
Functional assays relevant to the intended antibody application
Prospective validation through design challenges:
Research demonstrates that properly implemented validation strategies can confirm AI models' ability to guide antibody engineering with high accuracy, even with relatively small training datasets. Gaussian Process models in particular provide valuable uncertainty estimates that can help researchers prioritize candidates with higher confidence predictions.
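One simple way to use such uncertainty estimates is to rank candidates by a lower confidence bound: the predicted mean minus a multiple of the predicted standard deviation. The sketch below assumes predictions are already available as (mean, standard deviation) pairs; the `kappa` weight and the example numbers are illustrative choices, not values from the source.

```python
def rank_by_lower_confidence_bound(predictions, kappa=1.0):
    """Order candidate names by mean - kappa * std, best first.

    predictions maps a candidate name to (predicted pKD, predicted std).
    """
    def lower_bound(name):
        mean, std = predictions[name]
        return mean - kappa * std

    return sorted(predictions, key=lower_bound, reverse=True)


# Hypothetical model outputs.
predictions = {
    "v1": (9.0, 1.2),   # highest mean, but very uncertain
    "v2": (8.6, 0.2),
    "v3": (8.8, 0.5),
}
print(rank_by_lower_confidence_bound(predictions))             # penalizes uncertainty
print(rank_by_lower_confidence_bound(predictions, kappa=0.0))  # mean only
```

Using a negative `kappa` turns the same score into an upper confidence bound, which favors uncertain candidates and is a common choice when the goal is to learn rather than to confirm.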
Several emerging AI technologies demonstrate exceptional promise for advancing antibody research:
Foundation models for antibody sequences: Large-scale pre-trained language models developed specifically for antibody sequences are beginning to capture the complex grammar of functional antibodies. These models learn from millions of natural antibody sequences and can generate diverse, functional candidates.
End-to-end differentiable structural modeling: New approaches that integrate sequence generation with differentiable structural prediction enable direct optimization of antibodies based on predicted binding energetics to target antigens.
Multimodal learning frameworks: Systems that simultaneously leverage sequence, structure, and experimental binding data are showing superior performance compared to single-modality approaches.
Reinforcement learning for antibody optimization: Emerging reinforcement learning frameworks can efficiently navigate the vast sequence space by learning from experimental feedback, progressively improving design success rates.
Federated learning approaches: To address data scarcity and privacy concerns, federated learning systems allow training across distributed datasets without centralizing sensitive research data.
The integration of these technologies is leading toward truly de novo antibody design capabilities beyond CDR-only sequence generation, with ongoing improvements in ML models and structural prediction approaches expected to "increase immensely" the efficiency and accuracy of generating antigen-specific antibodies.
Effective integration of AI tools into traditional antibody discovery pipelines requires careful consideration of several methodological aspects:
Strategic insertion points: Identify specific stages where AI can provide maximum value:
Initial candidate generation to expand diversity beyond conventional approaches
Affinity maturation guidance to reduce library sizes and screening efforts
Developability prediction to filter candidates earlier in the process
Complementary application: Use AI as a complement rather than replacement for established methods:
Generate candidates via traditional methods and use AI for prioritization
Apply AI-guided design alongside conventional optimization techniques
Validate AI predictions with established experimental assays
Iterative workflow design: Implement feedback loops between computational and experimental steps:
Use initial experimental data to train project-specific models
Apply models to generate next-generation candidates
Continually refine models with new experimental results
Cross-disciplinary team integration: Ensure effective collaboration between:
Computational scientists who understand model capabilities and limitations
Experimental biologists who can interpret results in biological context
Protein engineers who can translate predictions into testable designs
This integrated approach has demonstrated significant advantages over traditional methods alone, with research showing AI-enhanced pipelines can reduce the time and costs required for antibody design by minimizing failures and increasing experimental success rates.
To maximize AI model effectiveness, researchers should prioritize collecting the following experimental data:
| Data Type | Measurement Method | Value for AI Models | Collection Priority |
|---|---|---|---|
| Binding affinity | BLI, SPR, KinExA | Core metric for optimization | Highest |
| Binding kinetics (kon, koff) | SPR, BLI | Mechanistic insights beyond KD | High |
| Thermal stability | DSC, nanoDSF | Critical developability parameter | High |
| Epitope mapping | HDX-MS, Peptide arrays | Guides targeting specificity | Medium-High |
| Expression yields | Various expression systems | Practical development metric | Medium |
| Cross-reactivity | Antigen panels | Specificity assessment | Medium |
| Structural data | X-ray, Cryo-EM | Validates binding mechanisms | Medium (resource-intensive) |
Research demonstrates that even modest datasets of 35 characterized variants can enable highly accurate ML models when the data capture diverse sequence variations and precise affinity measurements. Critically, data quality is often more important than quantity: carefully controlled experimental conditions and standardized protocols significantly improve model performance.
For researchers with limited resources, prioritizing high-quality affinity measurements across a diverse but manageable set of sequence variants offers the best return on investment for enabling effective AI-guided antibody engineering.
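A consistent record format makes data of this kind easier to reuse for model training. The sketch below shows one possible layout for a characterized variant, with the equilibrium constant derived from the kinetic rates (KD = koff / kon). The field names, units, and example values are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantRecord:
    variant_id: str
    vh_sequence: str
    kon_per_m_s: float                   # association rate, 1/(M*s)
    koff_per_s: float                    # dissociation rate, 1/s
    tm_celsius: Optional[float] = None   # thermal stability, if measured
    expression_mg_per_l: Optional[float] = None

    @property
    def kd_molar(self):
        """Equilibrium dissociation constant from kinetics: KD = koff / kon."""
        return self.koff_per_s / self.kon_per_m_s

    @property
    def pkd(self):
        """-log10(KD), a convenient regression target for affinity models."""
        return -math.log10(self.kd_molar)


def training_pairs(records):
    """(sequence, pKD) pairs for every record with valid kinetic rates."""
    return [(r.vh_sequence, r.pkd) for r in records
            if r.kon_per_m_s > 0 and r.koff_per_s > 0]


# Hypothetical record; the sequence fragment and rates are placeholders.
record = VariantRecord("var-001", "EVQLVESGGGLVQPGG", kon_per_m_s=2e5, koff_per_s=1e-3)
print(f"KD = {record.kd_molar:.1e} M, pKD = {record.pkd:.2f}")
```

Storing kon and koff rather than KD alone keeps the kinetic information available for later analysis, in line with the priority given to binding kinetics in the table above.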