Behavioral Languages for Online Characterization (BLOC)

This repository contains the code and data required to replicate the results from the paper, Behavioral Languages for Online Characterization (BLOC). To cite, please use:

@article{nwala_flammini_menczer_bloc,
  title={A Language Framework for Modeling Social Media Account Behavior},
  author={Nwala, Alexander C. and Flammini, Alessandro and Menczer, Filippo},
  journal = {EPJ Data Science},
  volume = {12},
  pages={33},
  year = {2023},
  doi = {10.1140/epjds/s13688-023-00410-9},
  url = {https://doi.org/10.1140/epjds/s13688-023-00410-9}
}

See also the GitHub repo for the BLOC Python tool.

BOT DETECTION

We compared the performance of BLOC to three baselines (Botometer, Twitter DNA, and DNA-Influenced) on the bot detection task. Below we provide references to all methods except Botometer, whose code is not publicly available.

Dataset

All tweets and accounts used in the bot detection task can be found in the Bot repository dataset.

Models

BLOC
Twitter DNA
DNA-Influenced

Evaluate BLOC

Install BLOC
Set RAW_TRAINING_DATA_ROOT in general-language-behavior/bot-detect/bloc-eval/workflow/Snakefile to the path of the evaluation dataset containing the tweets.jsons.gz files.
Set TARGET_ROOT in general-language-behavior/bot-detect/bloc-eval/workflow/Snakefile to the output path (e.g., /tmp/bot-detect-res/) for the evaluation results. Then run the following commands.
$ conda activate snakemake
$ cd general-language-behavior/bot-detect/bloc-eval/workflow
$ snakemake --cores=5 run_ml_all
The F1 score, recall, precision, and number of features are written to the ml_results_all.csv file in the output path (e.g., /tmp/bot-detect-res/ml_results_all.csv). To reset the experiment, delete all content from the output path (e.g., $ rm -rf /tmp/bot-detect-res/*), then run $ snakemake --cores=5 run_ml_all again.
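A minimal Python sketch for inspecting ml_results_all.csv once the workflow finishes. It assumes the file has a header row and takes the column names from that header instead of hard-coding them, since they are not documented here; the function name read_ml_results is hypothetical.

```python
import csv
from pathlib import Path


def read_ml_results(output_root: str) -> list[dict[str, str]]:
    """Return each row of ml_results_all.csv as a dict keyed by the CSV header.

    Assumption: the file has a header row. Values are returned as strings,
    exactly as they appear in the file.
    """
    path = Path(output_root) / "ml_results_all.csv"
    with path.open(newline="") as f:
        return list(csv.DictReader(f))
```

For example, `for row in read_ml_results("/tmp/bot-detect-res/"): print(row)` prints one dict per result row.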

Evaluate Twitter DNA and DNA-Influenced

Install Twitter DNA:

$ pip install -r general-language-behavior/bot-detect/eval-dna/requirements.txt
$ pip install general-language-behavior/bot-detect/eval-dna/ddna-toolbox/glcr/
$ pip install general-language-behavior/bot-detect/eval-dna/ddna-toolbox/

Install BLOC
Run the following command, making sure to set --tweets-path to the path of the tweets dataset.

  $ python general-language-behavior/bot-detect/eval-dna/bloc_paper.py --max-users=200 --evaluate-models sf sf-influenced --tweets-path=/path/to/bot_repo_tweets --task evaluate verified kevin_feedback pronbots stock rtbust midterm-2018 zoher-organization botwiki gilani-17 varol-icwsm gregory_purchased astroturf cresci-17 josh_political

The evaluation results are written to ./bot_detection_results.json
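If you prefer to launch the evaluation from Python, the sketch below builds the same argument list as the command above and runs it with subprocess, which avoids shell quoting mistakes in the long dataset list. The helper names are hypothetical, and passing a subset of datasets assumes bloc_paper.py accepts any subset of the names listed above.

```python
import subprocess
import sys
from typing import Sequence

# Dataset names copied verbatim from the command above.
DATASETS = [
    "verified", "kevin_feedback", "pronbots", "stock", "rtbust",
    "midterm-2018", "zoher-organization", "botwiki", "gilani-17",
    "varol-icwsm", "gregory_purchased", "astroturf", "cresci-17",
    "josh_political",
]

SCRIPT = "general-language-behavior/bot-detect/eval-dna/bloc_paper.py"


def build_dna_eval_cmd(tweets_path: str,
                       datasets: Sequence[str] = DATASETS,
                       max_users: int = 200) -> list[str]:
    """Return an argv list equivalent to the shell command above."""
    return [
        sys.executable, SCRIPT,  # use the interpreter of the active environment
        f"--max-users={max_users}",
        "--evaluate-models", "sf", "sf-influenced",
        f"--tweets-path={tweets_path}",
        "--task", "evaluate", *datasets,
    ]


def run_dna_eval(tweets_path: str, **kwargs) -> None:
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_dna_eval_cmd(tweets_path, **kwargs), check=True)
```

Calling `run_dna_eval("/path/to/bot_repo_tweets")` from the repository's parent directory should write ./bot_detection_results.json relative to the current working directory.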

COORDINATION DETECTION

We compared the performance of BLOC with three baselines (Activity, Co-retweet (CoRT), and Hashtag (Hash)) on the coordination detection task. All models are implemented in the Twitter Infoops Toolkit.

Dataset

The drivers and their tweets can be downloaded from the Twitter Information Operations dataset. Next, we describe the steps for creating the control dataset.

Create control dataset

See the Twitter Info Ops toolkit documentation for instructions on creating the tweets of the control users (stored in DriversControl/control_driver_tweets.jsonl.gz).

Evaluate coordination detection methods

The following example evaluates all methods for a single campaign (e.g., armenia_202012). The file containing the driver tweets must be named driver_tweets.csv.gz, and the DriversControl folder, which contains the control dataset, must reside in the same directory as driver_tweets.csv.gz.

$ ls /tmp/armenia_202012
DriversControl  driver_tweets.csv.gz
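Before running the evaluation, a quick check of the campaign directory can catch layout mistakes. This is a minimal sketch assuming the layout described above, with the control tweets stored at DriversControl/control_driver_tweets.jsonl.gz as in the control-dataset step; check_campaign_dir is a hypothetical helper, not part of the toolkit.

```python
from pathlib import Path


def check_campaign_dir(campaign_dir: str) -> list[str]:
    """Return the required paths that are missing under campaign_dir.

    An empty list means the directory matches the expected layout.
    """
    root = Path(campaign_dir)
    required = [
        root / "driver_tweets.csv.gz",
        root / "DriversControl",
        # Assumed location of the control tweets (from the control-dataset step).
        root / "DriversControl" / "control_driver_tweets.jsonl.gz",
    ]
    return [str(p) for p in required if not p.exists()]
```

For example, `check_campaign_dir("/tmp/armenia_202012")` returns `[]` when everything is in place.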

The following command evaluates BLOC and the baseline coordination detection methods on the first weeks of the drivers' life cycle. Use --knn-reverse-dates to run the evaluation on the last weeks of the drivers' life cycle instead. The syntax is strict, so follow the command below closely.

$ ops --task=knn_classify_bloc_drivers_vs_drivers_control --tweets-path=/tmp/armenia_202012/driver_tweets.csv.gz
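Because the syntax is strict, building the arguments as a list rather than a shell string helps avoid stray spaces in --tweets-path. The sketch below reproduces the command above; treating --knn-reverse-dates as a switch that takes no value is an assumption, and build_ops_cmd and run_ops are hypothetical names.

```python
import subprocess
from pathlib import Path


def build_ops_cmd(campaign_dir: str, reverse_dates: bool = False) -> list[str]:
    """Return an argv list equivalent to the ops command above."""
    tweets = Path(campaign_dir) / "driver_tweets.csv.gz"
    cmd = [
        "ops",
        "--task=knn_classify_bloc_drivers_vs_drivers_control",
        f"--tweets-path={tweets}",
    ]
    if reverse_dates:
        # Assumption: --knn-reverse-dates is a flag without a value.
        cmd.append("--knn-reverse-dates")
    return cmd


def run_ops(campaign_dir: str, reverse_dates: bool = False) -> None:
    # check=True raises CalledProcessError if ops exits with an error.
    subprocess.run(build_ops_cmd(campaign_dir, reverse_dates), check=True)
```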

The evaluation result for each model (e.g., BLOC) is written to Twitter_InfoOps_Output. For example, the BLOC result for armenia_202012 is written to Twitter_InfoOps_Output/eval/drivers_v_control/knn/k-first-active-years/bloc_armenia_202012.json
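To compare models for one campaign, the per-model JSON files can be gathered into a single dict. This sketch assumes every model's result follows the same <model>_<campaign>.json naming as the BLOC example above; the JSON contents are loaded as-is because their structure is not documented here. The directory for --knn-reverse-dates runs is also not documented here, so pass it explicitly via results_dir if it differs.

```python
import json
from pathlib import Path

# Directory taken from the BLOC example path above.
RESULTS_DIR = Path("Twitter_InfoOps_Output/eval/drivers_v_control/knn/k-first-active-years")


def load_campaign_results(campaign: str, results_dir: Path = RESULTS_DIR) -> dict:
    """Map model name -> parsed JSON for files named <model>_<campaign>.json."""
    results = {}
    suffix = f"_{campaign}.json"
    for path in sorted(results_dir.glob(f"*{suffix}")):
        model = path.name[: -len(suffix)]  # e.g. "bloc_armenia_202012.json" -> "bloc"
        with path.open() as f:
            results[model] = json.load(f)
    return results
```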
