If you don't want to use the helper script scripts/beesd to set up and configure bees, here is how to set up bees manually.
Create a directory for bees state files:
export BEESHOME=/some/path
mkdir -p "$BEESHOME"
Create an empty hash table (your choice of size, but it must be a multiple of 128KB). This example creates a 1GB hash table:
truncate -s 1g "$BEESHOME/beeshash.dat"
chmod 700 "$BEESHOME/beeshash.dat"
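The size rule can be checked before bees is started. The following is a minimal sketch and not part of bees: is_valid_hash_size and check_hash_file are hypothetical helper names, 128KB is read as 131072 bytes, and stat -c %s assumes GNU coreutils.

```shell
# Hypothetical helper: succeed only if a byte count is a non-zero
# multiple of 128 KiB (131072 bytes).
is_valid_hash_size() {
    size_bytes=$1
    block=$((128 * 1024))
    [ "$size_bytes" -gt 0 ] && [ $((size_bytes % block)) -eq 0 ]
}

# Hypothetical helper: apply the same check to an existing file.
# stat -c %s prints the file size in bytes (GNU coreutils).
check_hash_file() {
    is_valid_hash_size "$(stat -c %s "$1")"
}
```

For example, check_hash_file "$BEESHOME/beeshash.dat" || echo "bad hash table size" >&2 would flag a file whose size was mistyped.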
bees can only process the root subvol of a btrfs filesystem, and only when nothing is mounted on top of it. If the path given to bees is not the root subvol directory, bees throws an exception and stops.
Use a separate mount point, and let only bees access it:
UUID=3399e413-695a-4b0b-9384-1b0ef8f6c4cd
mkdir -p /var/lib/bees/$UUID
mount /dev/disk/by-uuid/$UUID /var/lib/bees/$UUID -osubvol=/
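A pre-flight check can catch a wrong mount before bees throws its exception. This is a rough sketch of one possible check, not how bees itself decides: is_root_subvol is a hypothetical name, and it relies on btrfs inspect-internal rootid from btrfs-progs and on the top-level subvolume having ID 5.

```shell
# Hypothetical pre-flight check (not part of bees): succeed only if the
# path given as $1 lives in the top-level subvolume, whose ID is 5.
is_root_subvol() {
    rootid=$(btrfs inspect-internal rootid "$1") || return 1
    [ "$rootid" = 5 ]
}
```

It could be used as is_root_subvol "/var/lib/bees/$UUID" || echo "not the root subvol" >&2. The check only compares the subvolume ID, so it also passes for an ordinary directory inside the root subvol, and it does not detect other filesystems mounted on top.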
If you don't set BEESHOME, the path ".beeshome" will be used relative to the root subvol of the filesystem. For example:
btrfs sub create /var/lib/bees/$UUID/.beeshome
truncate -s 1g /var/lib/bees/$UUID/.beeshome/beeshash.dat
chmod 700 /var/lib/bees/$UUID/.beeshome/beeshash.dat
You can use any relative path in BEESHOME. The path will be taken relative to the root of the deduped filesystem (in other words, it can be the name of a subvol):
export BEESHOME=@my-beeshome
btrfs sub create /var/lib/bees/$UUID/$BEESHOME
truncate -s 1g /var/lib/bees/$UUID/$BEESHOME/beeshash.dat
chmod 700 /var/lib/bees/$UUID/$BEESHOME/beeshash.dat
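The three cases above (BEESHOME unset, relative, or absolute) can be summarized in a small helper. This is only a sketch of the rules as described here: bees_state_dir is a hypothetical name, and treating an absolute BEESHOME as used unchanged is an assumption based on the /some/path example.

```shell
# Hypothetical helper: print the directory where bees state files would
# live for a filesystem mounted at $1, given the current BEESHOME.
bees_state_dir() {
    root=${1%/}                     # drop one trailing slash, if any
    home=${BEESHOME:-.beeshome}     # default when BEESHOME is unset or empty
    case $home in
        /*) printf '%s\n' "$home" ;;        # absolute: assumed used as-is
        *)  printf '%s\n' "$root/$home" ;;  # relative: under the root subvol
    esac
}
```

With BEESHOME=@my-beeshome, bees_state_dir "/var/lib/bees/$UUID" prints the same directory that the truncate and chmod commands above operate on.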
Some options can be set at runtime through environment variables:
- BEESHOME: Directory containing bees state files:
  - beeshash.dat | persistent hash table. Must be a multiple of 128KB, and must be created before bees starts.
  - beescrawl.dat | state of SEARCH_V2 crawlers. ASCII text. bees will create this.
  - beesstats.txt | statistics and performance counters. ASCII text. bees will create this.
- BEESSTATUS: File containing a snapshot of current bees state: performance counters and current status of each thread. The file is meant to be human readable, but understanding it probably requires reading the source. You can watch bees run in realtime with a command like:
watch -n1 cat "$BEESSTATUS"
Other options (e.g. interval between filesystem crawls) can be configured in src/bees.h or on the command line.
Reduce CPU and IO priority to be kinder to other applications sharing this host (or raise them for more aggressive disk space recovery). If you use cgroups, put bees in its own cgroup, then reduce the blkio.weight and cpu.shares parameters. You can also use schedtool and ionice in the shell script that launches bees:
schedtool -D -n20 $$
ionice -c3 -p $$
You can also use the --loadavg-target and --thread-min options to further control the impact of bees on the rest of the system.
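Putting the priority commands and the two options together, a launcher might look roughly like the sketch below. The names run and launch_bees and the DRY_RUN switch are hypothetical, and the option values (5 and 1) are illustrative, not recommendations.

```shell
# Hypothetical launcher sketch. Set DRY_RUN=1 to print each command
# instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

launch_bees() {
    fs=$1
    # Lower the CPU and IO priority of this shell; bees, started as a
    # child process, inherits both.
    run schedtool -D -n20 $$
    run ionice -c3 -p $$
    # Illustrative values: aim for a load average of about 5 while
    # keeping at least one worker thread.
    run bees --loadavg-target 5 --thread-min 1 "$fs"
}
```

Running DRY_RUN=1 with launch_bees "/var/lib/bees/$UUID" shows the three commands without touching the system, which is a cheap way to review the settings first.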
Let the bees fly:
for fs in /var/lib/bees/*-*-*-*-*/; do
    bees "$fs" >> "$fs/.beeshome/bees.log" 2>&1 &
done
You'll probably want to arrange for the log files (bees.log in each .beeshome directory in the loop above) to be rotated periodically. You may also want to set umask to 077 to prevent disclosure of information about the contents of the filesystem through the log file.
There are also some shell wrappers in the scripts/ directory.