Python toolkit for training and evaluating bias detectors for Web content.

Bias Gym

License: MIT | Version: 0.1 | Status: Stable

Bias Gym is a testing and benchmarking suite for training and evaluating machine-learning models that detect bias in textual content. It provides a structured, customizable framework for testing various forms of bias, including but not limited to gender bias, racial bias, political bias, linguistic bias, and hate speech.

Installation

Prerequisites

Before you begin, ensure you have the following installed on your system:

  • Python 3.8 as a base programming environment.
  • PyTorch for handling the model training and inference.
  • Transformers for state-of-the-art NLP models.

Setup

1. Clone the repository:

git clone https://github.com/ngi-indi/module-bias-gym.git
cd module-bias-gym

2. Set up the virtual environment (optional but recommended):

  • On Windows:
python -m venv venv
.\venv\Scripts\activate
  • On macOS/Linux:
python3 -m venv venv
source venv/bin/activate

3. Install dependencies:

Install the required Python packages by running:

pip install -r requirements.txt
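
Once the dependencies are installed, the prerequisites can be double-checked from Python. The following is a minimal sketch, not part of the repository: the helper name check_environment is hypothetical, Python 3.8 is assumed to be a minimum rather than an exact requirement, and torch and transformers are the import names of PyTorch and Transformers.

```python
import importlib.util
import sys

# Import names of the packages listed under Prerequisites.
REQUIRED_PACKAGES = ("torch", "transformers")
# Assumption: the README's "Python 3.8" is treated as a minimum version.
MIN_PYTHON = (3, 8)


def check_environment(packages=REQUIRED_PACKAGES, min_python=MIN_PYTHON):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks ready." if not issues else "\n".join(issues))
```

Run it inside the activated virtual environment so that it inspects the same interpreter that train.py will use.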

4. Download datasets:

  • Download the pre-processed datasets and place them in the datasets/ directory.
  • Make sure every dataset needed for the tasks you plan to run is present there.

5. Download pre-trained models (optional):

  • Download the pre-trained model weights and place them in the models/ directory.
  • Make sure every model you plan to use is present there.
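
To confirm that the downloads from steps 4 and 5 landed in the right place, a short script can report what each directory contains. This is a hedged sketch: summarize_data_dirs is a hypothetical helper, and it only counts entries in datasets/ and models/ without assuming anything about file names or formats.

```python
from pathlib import Path


def summarize_data_dirs(root=".", names=("datasets", "models")):
    """Map each directory name to its number of entries, or None if it is absent."""
    summary = {}
    for name in names:
        path = Path(root) / name
        summary[name] = len(list(path.iterdir())) if path.is_dir() else None
    return summary


if __name__ == "__main__":
    # Run from the repository root so the relative paths resolve.
    for name, count in summarize_data_dirs().items():
        status = "missing" if count is None else f"{count} entries"
        print(f"{name}/: {status}")
```

A missing models/ directory is not an error, since pre-trained weights are optional.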

Usage

Training

  1. Prepare your data: Ensure datasets are placed in the correct directory or modify the script to point to your data.

  2. Run the training script: Choose the models and tasks to train on with the options listed below:

    # Example: Training a single model on a single task
    python train.py --models roberta --tasks gender-bias --epochs 10

    # Example: Training multiple models on multiple tasks
    python train.py --models roberta electra --tasks gender-bias hate-speech

    Parameters list:

    • --models: List of models to be trained (default: ['robertatwitter', 'electra', ..., 't5'])
    • --tasks: List of tasks for bias detection (default: ['cognitive-bias', ..., 'gender-bias'])
    • --number_of_folds: Number of folds for cross-validation (default: 5)
    • --batch_size: Batch size for training (default: 32)
    • --max_length: Maximum sequence length for tokenization (default: 128)
    • --epochs: Number of epochs for training (default: 10)
  3. Check the results: After training, results and metrics are saved as CSV files in the results/ directory, with detailed reports on model performance.
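
The CSV reports mentioned in step 3 can be inspected with the standard library alone. The sketch below makes no assumption about column names, which this README does not document: it lists each CSV file found directly in results/ with its header row and number of data rows. The helper name summarize_csv_reports is hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv_reports(results_dir="results"):
    """Return {file name: (header columns, number of data rows)} for each CSV file."""
    directory = Path(results_dir)
    if not directory.is_dir():
        return {}
    reports = {}
    for path in sorted(directory.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        header = rows[0] if rows else []
        # Every row after the header counts as a data row.
        reports[path.name] = (header, max(len(rows) - 1, 0))
    return reports


if __name__ == "__main__":
    for name, (header, n_rows) in summarize_csv_reports().items():
        print(f"{name}: {n_rows} rows, columns = {header}")
```

If the reports are written to subdirectories of results/, replacing glob with rglob would pick them up as well.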

Evaluating

To evaluate an already trained model, add the --eval flag:

python train.py --models roberta --tasks gender-bias --eval
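
For orientation, the flags used in the training and evaluation commands could be declared with argparse roughly as follows. This is an illustrative sketch under stated assumptions, not the actual parser in train.py: build_parser is a hypothetical name, --eval is assumed to be a simple on/off switch, and the defaults for --models and --tasks are left unset because this README only lists them in abbreviated form.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags documented in this README (sketch only)."""
    parser = argparse.ArgumentParser(description="Train or evaluate bias detectors.")
    # The README abbreviates the default model and task lists, so none are filled in here.
    parser.add_argument("--models", nargs="+", default=None,
                        help="List of models to be trained")
    parser.add_argument("--tasks", nargs="+", default=None,
                        help="List of tasks for bias detection")
    parser.add_argument("--number_of_folds", type=int, default=5,
                        help="Number of folds for cross-validation")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Batch size for training")
    parser.add_argument("--max_length", type=int, default=128,
                        help="Maximum sequence length for tokenization")
    parser.add_argument("--epochs", type=int, default=10,
                        help="Number of epochs for training")
    # Assumption: --eval takes no value and switches from training to evaluation.
    parser.add_argument("--eval", action="store_true",
                        help="Evaluate an already trained model")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the example self-contained.
args = build_parser().parse_args(
    ["--models", "roberta", "electra", "--tasks", "gender-bias", "hate-speech"]
)
print(args.models, args.tasks, args.epochs, args.eval)
```

Using nargs="+" is what allows several models or tasks to follow a single flag, as in the multi-model training example.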

Contributing

Reporting bugs and requesting features

  • If you find a bug, please open an issue.
  • To request a feature, feel free to open an issue as well.

Developing a new feature

  1. Fork the repository by clicking the "Fork" button at the top right of this page.
  2. Clone your fork locally:
    git clone https://github.com/your-username/module-bias-gym.git
  3. Create a new branch for your feature or bug fix:
    git checkout -b feature-branch
  4. Make your changes. Please follow the existing code style and conventions.
  5. Commit your changes with a descriptive commit message:
    git commit -m "Add new feature: explanation of bias model predictions"
  6. Push to your fork:
    git push origin feature-branch
  7. Open a pull request from your fork’s branch to the main branch of this repository.
    • Describe the changes you’ve made in the pull request description.
    • Ensure that your pull request references any relevant issues.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Contact

For any questions or support, please reach out to: