
A framework for linear B-cell epitope prediction and classification. (ECML PKDD 2023)


yuanx749/bcell


BeeTLe

A deep learning framework for linear B-cell epitope prediction and antibody type-specific epitope classification using Transformer and LSTM encoders.

arXiv | ECML PKDD 2023

Usage

Command Line

After installation, run a command like the one below. Predicting 10,000 peptides takes a few seconds.

python cli.py -i input.fasta -o output.csv
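
The command above reads peptides from a FASTA file. As a minimal sketch, assuming each peptide is one record made of a header line starting with `>` followed by its sequence, an input file could be generated as follows. The identifiers, the sequences, and the `to_fasta` helper are made up for illustration and are not part of this repo.

```python
from pathlib import Path

# Hypothetical peptide records as (header, sequence) pairs.
# The sequences are arbitrary strings of amino acid letters, not real epitopes.
peptides = [
    ("pep1", "ACDEFGHIKLMNPQRS"),
    ("pep2", "TVWYACDEFGHIKL"),
]


def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text, two lines per record."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)


fasta_text = to_fasta(peptides)

# Write the file name used in the command above.
Path("input.fasta").write_text(fasta_text)
```

The resulting `input.fasta` can then be passed to `cli.py` with `-i`.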

To show help, run python cli.py -h. The input is a FASTA file of peptides. The output is a table with the following columns:

  • identifier: FASTA header.
  • sequence: FASTA sequence.
  • score: Probability of being epitope.
  • epitope: {0, 1}. 1 for epitope (score > 0.5).
  • Ig: {A, E, M}. Of these three antibody types, the one the peptide most probably binds to.
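
The columns above can be post-processed with the standard library. The sketch below assumes the output is comma-separated with a header row that uses exactly these column names. The rows are made-up stand-ins for a real `output.csv`, and `load_predictions` is a hypothetical helper, not part of this repo.

```python
import csv
import io

# Made-up rows standing in for output.csv; real scores come from the model.
example_csv = """identifier,sequence,score,epitope,Ig
pep1,ACDEFGHIKLMNPQRS,0.91,1,M
pep2,TVWYACDEFGHIKL,0.12,0,A
pep3,GHIKLMNPQRSTVW,0.67,1,E
"""


def load_predictions(handle):
    """Parse the table and convert score and epitope to numeric types."""
    rows = []
    for row in csv.DictReader(handle):
        row["score"] = float(row["score"])
        row["epitope"] = int(row["epitope"])
        rows.append(row)
    return rows


rows = load_predictions(io.StringIO(example_csv))

# Keep the predicted epitopes, highest score first.
epitopes = sorted(
    (r for r in rows if r["epitope"] == 1),
    key=lambda r: r["score"],
    reverse=True,
)

# The epitope flag should agree with thresholding the score at 0.5.
consistent = all((r["score"] > 0.5) == bool(r["epitope"]) for r in rows)

for r in epitopes:
    print(r["identifier"], r["score"], r["Ig"])
```

To read a real result, replace the `io.StringIO` object with `open("output.csv", newline="")`.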

Web App

To use the tool without installing it, open the Streamlit app.

Installation

Linux is preferred. GPU is not required.

  • Clone this repo and navigate to the repo folder.

  • Install with pip, preferably in a virtual environment:

    pip install -r requirements.txt
  • Alternatively, for a more precisely specified environment, use mamba on Linux:

    mamba env create -p ./envs -f environment.yml
    mamba activate ./envs

Data

Follow the notebook data/dataset.py to generate the datasets, in which redundancy and false negatives are reduced. The raw data are on figshare.

Development

The code is designed to be reusable and extensible, and it can be adapted to other peptide classification tasks. Some useful components are:

  • Loss functions: logit-adjusted, focal; sigmoid, softmax.
  • LSTM (packed variable length input), Transformer encoder, attention.
  • Amino acid encoder.
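
To illustrate two of the loss functions listed above, here is a minimal pure-Python sketch of their commonly used formulations: a binary focal loss on a sigmoid output, and a logit-adjusted softmax cross-entropy. It is not the code in this repo. The function names, the defaults `gamma=2.0` and `tau=1.0`, and the per-example scalar interface are assumptions chosen for readability.

```python
import math


def sigmoid(x):
    """Logistic function; adequate for moderate logits in this sketch."""
    return 1.0 / (1.0 + math.exp(-x))


def focal_loss(logit, target, gamma=2.0):
    """Binary focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the true class, so confidently
    correct examples are down-weighted. gamma=0 gives plain cross-entropy.
    """
    p = sigmoid(logit)
    p_t = p if target == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def logit_adjusted_loss(logits, target, priors, tau=1.0):
    """Softmax cross-entropy on logits shifted by tau * log(class prior)."""
    adjusted = [z + tau * math.log(pi) for z, pi in zip(logits, priors)]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[target]


# With equal logits, the rare class (prior 0.1) incurs the larger loss.
rare = logit_adjusted_loss([0.0, 0.0], target=1, priors=[0.9, 0.1])
common = logit_adjusted_loss([0.0, 0.0], target=0, priors=[0.9, 0.1])
```

Adding the log prior during training penalizes errors on rare classes more heavily, which suits imbalanced labels. In a deep learning framework the same formulas would normally be applied to batched tensors.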
