Monero Dataset Pipeline

A pipeline that automates the creation of Monero wallets and the transactions between them, collecting a dataset suitable for supervised learning applications. The source code and datasets reproduce the results of:

Lord of the Rings: An Empirical Analysis of Monero's Ring Signature Resilience to Artificially Intelligent Attacks

Installation

sudo apt update
sudo apt install vim git jq expect tmux parallel python3 python3-tk bc curl python3-pip -y
pip3 install -r requirements.txt
cd ~ && wget https://downloads.getmonero.org/cli/monero-linux-x64-v0.17.3.0.tar.bz2
tar -xvf monero-linux-x64-v0.17.3.0.tar.bz2 && cd monero-x86_64-linux-gnu-v0.17.3.0 && sudo cp monero* /usr/bin/ && cd ..
git clone git@github.com:ACK-J/Monero-Dataset-Pipeline.git && cd Monero-Dataset-Pipeline
chmod +x ./run.sh && chmod 777 -R Funding_Wallets/
# Make sure run.sh global variables are set
./run.sh

Dataset Files

| File | Stagenet Size | Testnet Size | Description |
| --- | --- | --- | --- |
| dataset.csv | 1.4GB | 13.4GB | The exhaustive dataset, including all metadata for each transaction, in CSV format. |
| dataset.json | 1.5GB | N/A | The exhaustive dataset, including all metadata for each transaction, in JSON format. |
| dataset.pkl | N/A | 71.GB | The exhaustive dataset, including all metadata for each transaction, in pickle format. |
| X.csv | 4.1GB | 32.5GB | A modified version of dataset.csv with all features irrelevant to machine learning removed, in CSV format. |
| X.pkl | 6.5GB | 51.9GB | A modified version of dataset.json with all features irrelevant to machine learning removed, as a pickled pandas DataFrame. |
| y.pkl | 9.5MB | 42.6MB | A pickled list of Python dictionaries containing private information for the corresponding index of X.pkl. |
| X_Undersampled.csv | 1.4GB | 75.5MB | A modified version of X.csv with all data points shuffled and undersampled. |
| X_Undersampled.pkl | 2.3GB | 101MB | A modified version of X.pkl with all data points shuffled and undersampled. |
| y_Undersampled.pkl | 325kB | 312.1kB | A pickled list of labels corresponding to the indices of X_Undersampled.pkl. |
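The full CSV files above run to tens of gigabytes, so they are best read in chunks rather than loaded whole. A minimal sketch with pandas (the `"dataset.csv"` path stands in for any of the CSV files in the table):

```python
# Sketch: stream a multi-GB CSV in chunks instead of loading it whole.
import pandas as pd

def stream_csv(path_or_buffer, chunksize=100_000):
    """Yield DataFrame chunks so large CSVs never sit fully in memory."""
    yield from pd.read_csv(path_or_buffer, chunksize=chunksize)

# for chunk in stream_csv("dataset.csv"):
#     process(chunk)
```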

Dataset Download Links

Stagenet_Dataset_7_2_2022.7z 837 MB

  • Includes all files mentioned above in the dataset table, compressed using 7-zip

  • Subsequent transactions were delayed with times sampled from the gamma distribution proposed by Möser et al.

  • The dataset was collected between April 19, 2022 and July 1, 2022 with 9,342 wallets, totaling 248,723 ring signatures in 184,980 transactions.

  • SHA-256 Hash: bf1b87f83a5c220263071e75c453d3886f9190856c71411be164f3328be38b79

  • Download Link: https://drive.google.com/file/d/1cmkb_7_cVe_waLdVJ9USdK07SPWgdgva/view
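A sketch of the delay sampling described above: Möser et al. model spend times with a gamma distribution over the natural log of the age in seconds. The shape and rate values below are illustrative assumptions, not values read from this repository's run.sh.

```python
# Assumed parameters for the log-seconds gamma model (illustrative only).
import math
import random

SHAPE = 19.28   # assumed gamma shape
RATE = 1.61     # assumed gamma rate; scale = 1 / rate

def sample_delay_seconds(rng=random):
    """Draw one inter-transaction delay, in seconds."""
    log_seconds = rng.gammavariate(SHAPE, 1.0 / RATE)
    return math.exp(log_seconds)
```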

Testnet_Dataset_6_7_2022.7z 4.7 GB

  • Includes all files mentioned above in the dataset table, compressed using 7-zip

  • Subsequent transactions were delayed only by 20 minutes.

  • The dataset was collected between January 20, 2022 and February 23, 2022 with 900 wallets, totaling 1,333,756 ring signatures in 763,314 transactions.

  • SHA-256 Hash: 396c25083a8a08432df58c88cb94137850004bee3236b21cb628a8786fac15d3

  • Download Link: https://drive.google.com/file/d/13Jw3J8yyKiZ9J5WsIRTUX0GDzbqBI-R5/view?usp=sharing
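Before extracting either archive, it is worth checking the download against its published SHA-256 hash. A minimal sketch (the commented-out invocation uses the stagenet archive's name and hash from this README):

```shell
# Verify a downloaded file against an expected SHA-256 hash.
verify_sha256() {
  file="$1"; expected="$2"
  actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "$file: OK"
  else
    echo "$file: hash mismatch" >&2
    return 1
  fi
}

# verify_sha256 Stagenet_Dataset_7_2_2022.7z bf1b87f83a5c220263071e75c453d3886f9190856c71411be164f3328be38b79
```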

Model Weights Download Link

How to load the dataset using Python and Pickle

import pickle
import json

# Full dataset including labels
with open("./Dataset_Files/dataset.json", "r") as fp:
    data = json.load(fp)

# -----------------------------------------------------

# Dataset only with ML features
with open("./Dataset_Files/X.pkl", "rb") as fp:
    X = pickle.load(fp)

# Associated labels
with open("./Dataset_Files/y.pkl", "rb") as fp:
    y = pickle.load(fp)
    
# -----------------------------------------------------

# Undersampled version of X
with open("./Dataset_Files/X_Undersampled.pkl", "rb") as fp:
    X_Undersampled = pickle.load(fp)
    
# Undersampled version of y
with open("./Dataset_Files/y_Undersampled.pkl", "rb") as fp:
    y_Undersampled = pickle.load(fp)
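The `*_Undersampled` files above are shuffled and balanced across classes. A minimal sketch of that idea, illustrative only: the repository's Create_Dataset.py defines the real logic, and the real labels are dictionaries rather than ints.

```python
# Sketch: shuffle (X, y) pairs, then keep an equal number per label class.
import random

def undersample(X, y, seed=0):
    """Return shuffled (X, y) with min-class-count rows per label."""
    rng = random.Random(seed)
    pairs = list(zip(X, y))
    rng.shuffle(pairs)
    by_label = {}
    for x, label in pairs:
        by_label.setdefault(label, []).append((x, label))
    n = min(len(v) for v in by_label.values())
    kept = [p for v in by_label.values() for p in v[:n]]
    rng.shuffle(kept)
    xs, ys = zip(*kept)
    return list(xs), list(ys)
```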

Dataset Features for Machine and Deep Learning

Exhaustive Dataset Fields

Problem Solving and Useful Commands

If Collect.sh throws the error: Failed to create a read transaction for the db: MDB_READERS_FULL: Environment maxreaders limit reached

# Testnet
/home/user/monero/external/db_drivers/liblmdb/mdb_stat -rr ~/.bitmonero/testnet/lmdb/
# Stagenet
/home/user/monero/external/db_drivers/liblmdb/mdb_stat -rr ~/.bitmonero/stagenet/lmdb/

Check the progress of collect.sh while it's running

find ./ -iname "*.csv" | cut -d '/' -f 2 | sort -u

After running collect.sh, gather the ring positions

find . -name "*outgoing*" | xargs cat | cut -f 6 -d ',' | grep -v Ring_no/Ring_size | cut -f 1 -d '/'
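The positions printed by the pipeline above can be tallied to see how often the true spend lands at each ring position. A small sketch (the `sample` list is hypothetical output, not real data):

```python
# Tally 1-based ring positions of the true spend.
from collections import Counter

def ring_position_frequencies(positions):
    """Map each ring position to how often it occurs."""
    return dict(Counter(positions))

# Hypothetical positions, as extracted by the find/cut pipeline above.
sample = [11, 11, 10, 11, 9, 11, 10]
print(ring_position_frequencies(sample))  # → {11: 4, 10: 2, 9: 1}
```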

Data Collection Pipeline Flowcharts

Run.sh

Collect.sh

Create_Dataset.py
