
Scientist interface

David Anderson edited this page Jan 14, 2024 · 6 revisions

This is a design proposal for the BOINC interface for people who want to do computing with it (e.g. scientists). The goals are:

  • Provide an interface for first-time users that's simple and requires reading little documentation.
  • If users require more features or more computing power, provide it with minimal complexity and documentation.

We do this by dividing the interface into "tiers" (currently 4 of them). Tier 0 (for beginners) is fast and simple; tiers 1-3 are progressively more complex.

Terminology

(for the purposes of this doc).

  • volunteer: someone who runs the BOINC client

  • user: someone who submits jobs (e.g. a scientist)

  • keywords: a hierarchy of words describing

    • science areas
    • project locations (geographical, institutional)

    see https://github.com/BOINC/boinc/wiki/DesignKeywords

  • vetting (of a user):

    • partial vetting: we believe
      • they're doing the computing that they claim to (based on keywords)
      • they're not commercial (this doesn't mean they're academic or scientific). This is typically based on email communication with them and/or an inspection of their web presence.
    • full vetting: we trust them to deploy apps on volunteer hosts
      • they understand and obey security protocols
      • typically based on an email or phone "interview"

Pieces of technology

Universal VM app

A BOINC app that consists of vboxwrapper and a config file that limits resource usage. Everything else (VM image, possibly Docker layers, input files) is part of the workunit.

status: not done

boinc2docker

A script and a VM image that takes as input a directory containing executables, libraries, Python and shell scripts and outputs a set of files (docker layers etc.) that, when used as inputs to the universal VM app, run that program.

https://github.com/marius311/boinc2docker

status: need to update, move to BOINC repo

BOINC App library

Includes

  • Universal VM app
  • Standard apps (autodock etc.)

Each app version has a digital signature (use something other than MD5).
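As a rough illustration of the "something other than MD5" note, the sketch below computes a SHA-256 digest of an app version file using Python's standard `hashlib`. The choice of SHA-256 and the helper name `file_digest_sha256` are assumptions made for this example, not part of the proposal. A digest alone is not a signature: the step of signing the digest with a private key is left out here.

```python
import hashlib
from pathlib import Path


def file_digest_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Hypothetical helper: the proposal only says "something other than MD5";
    SHA-256 is one possible choice.
    """
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks so large app files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The resulting digest would be what gets signed and later checked against the downloaded file.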

Apps can be marked as "safe"; this means a job using that app can't do bad things (DoS, access outside data) regardless of the input files.

Open question: is the universal VM app safe?

Status: not done

BOINC Docker Server

A set of 3 Docker images: DB, Apache, BOINC. A submitter can run these on their own machines or cloud nodes. When started, you have a BOINC project with an Admin user.

Normally the user never has to log in to any of the containers; they can do everything through web interfaces and APIs:

  • start/stop
  • create accounts, set permissions
  • manage apps and app versions
  • manage project configuration
  • manage job files
  • submit jobs

https://github.com/marius311/boinc-server-docker

Status: out of date; missing web interfaces. Need to move this into the BOINC repo.

Science United

Volunteers can mark themselves as "demo grid". This means they're willing to run jobs using safe apps for unknown submitters.

Projects can be marked as

  • fully vetted: can run any apps
  • partially vetted: can run only safe apps
  • unvetted: can run only safe apps, with demo grid volunteers

status: need to add notions of "demo grid" and safety.
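The three project categories above amount to a small eligibility rule. The following is a minimal sketch of that rule, assuming a hypothetical `Vetting` enum and `may_run` function (neither exists in BOINC or Science United today). Keyword matching is deliberately ignored.

```python
from enum import Enum


class Vetting(Enum):
    """Hypothetical vetting levels for a project."""
    UNVETTED = 0
    PARTIAL = 1
    FULL = 2


def may_run(project_vetting, app_is_safe, volunteer_in_demo_grid):
    """Return True if a job from this project may run on this volunteer's host."""
    # Fully vetted projects can run any app.
    if project_vetting is Vetting.FULL:
        return True
    # Everyone else is limited to safe apps.
    if not app_is_safe:
        return False
    # Partially vetted projects can run safe apps on any volunteer.
    if project_vetting is Vetting.PARTIAL:
        return True
    # Unvetted projects can run safe apps only on demo grid volunteers.
    return volunteer_in_demo_grid
```

For example, `may_run(Vetting.UNVETTED, True, False)` is false, because an unvetted project reaches only demo grid volunteers.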

BOINC Central

This lets submitters run jobs without creating a server. It includes all apps in the BOINC library. Anyone can create an account and submit jobs to safe apps. Vetted submitters can also submit jobs to unsafe apps.

Extending this idea to its limit, we could let fully vetted submitters create their own apps and app versions here.

status: partly done

generic job submission scripts

  • boinc_app: use boinc2docker to make the files for an app
  • boinc_submit
  • boinc_status
  • boinc_fetch

status: not written yet

make this into a package

Tiers

Tier 0 (serverless)

This is the "quick start" option. The goal is that non-technical scientists can run jobs for standard apps almost immediately, and for their own apps in an hour, without having to read much documentation.

The user doesn't create a BOINC server; the BOINC Central (BC) server is used. The user starts by creating a "job submitter" account on BC, and selects location keywords.

There are several job-submission variants:

  • Use Raccoon to run Autodock jobs.
  • Use BC's generic file-management and job-submission web interfaces to submit jobs to Autodock (or other apps on BC).
  • Install the BOINC job-submission and boinc2docker packages. Use the command-line job submission system to submit jobs for arbitrary (Linux/x86 or Python) applications. In this case the user must select science area keywords.

Initially, jobs will run on demo grid hosts. If the user applies for and receives partial vetting on BC, jobs will run on all Science United (SU) hosts (subject to keywords). We could provide an option where the user can attach their own computers to BC, with a config so that these computers run only their jobs.

This is the quick-start option, but for many users this may be all they need.

Tier 1 (basic server)

Use if:

  • you need more capacity (computing or data) than BC can provide
  • or you want to run apps not in the library (e.g. GPU versions of your own apps)
  • and you can operate a public-facing server

To do:

  1. deploy a server using BOINC Docker Server
    • install universal VM app (maybe should pre-install)
    • install BOINC client on your own computers
    • run jobs using job submission scripts and boinc2docker
  2. to expand resources:
    • make your server public
    • register project on Science United
    • will get some demo grid volunteers
    • after vetting: will get more volunteers
  3. add apps
    • install from BOINC App library
    • install your own (e.g. native Win/Mac)
      • some remote way of doing this?

This requires some sysadmin resources to set up the server. But it should require almost no knowledge of BOINC (unless you want to use native apps). In particular, it should not be necessary to log into the BOINC server container.

Tier 2 (advanced server)

Use if:

  • you need more efficient job processing, e.g.:
    • not having to move files between hosts
    • or app-specific result validation
    • or features like homogeneous replication
  • you have C++ or Python programming resources

To do:

  1. Develop work generators, validators, assimilators on the server
  2. Move your server from Docker to bare system (optional)

You need to log in to your server container and develop software there.

A lot of server documentation goes here (all the job processing stuff).

Tier 3 (web site)

Use if:

  • you need more computing than what SU provides
  • you can develop a web site for your project
  • you can do PR to recruit volunteers
  • you can do customer support (e.g. manage volunteer message boards)

To do:

  1. Add a web site to your project
  2. Promote your project

A lot of server documentation goes here (all the web site related stuff).

Scientist intro

The intro for scientists (web page, handouts, etc.) might look like:

Do you do high-throughput computing (HTC)?
    explain what it is

Could you use more HTC power?
    Is it expensive?
    Do you have to wait a long time for jobs to finish?
    Could you do more/different science if you had more power?

Can your jobs run on home computers?
    e.g. within 16 GB of RAM and 100 GB of disk, with a high compute/transfer ratio

If yes to all the above, consider using BOINC-based volunteer computing

1) get started: link to tier 0
2) need more power and can run a public-facing server: link to tier 1
3) need even more power and have programming resources: link to tier 2
4) need still more and can do PR: link to tier 3
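
The four links above form a simple decision ladder. As a sketch only, it could be expressed as below. The function name `recommend_tier` and its parameters are hypothetical. The code also assumes the tiers are cumulative: each tier requires the prerequisites of the one before it.

```python
def recommend_tier(needs_more_power, can_run_server, has_programmers, can_do_pr):
    """Return the suggested tier (0-3) for a prospective BOINC user.

    Assumption: tiers are cumulative, so a user moves up one step at a time
    and stops at the first prerequisite that is not met.
    """
    # Everyone starts at tier 0 (serverless); stay there unless more power is needed.
    if not needs_more_power:
        return 0
    tier = 0
    if can_run_server:            # tier 1: can operate a public-facing server
        tier = 1
        if has_programmers:       # tier 2: has programming resources
            tier = 2
            if can_do_pr:         # tier 3: can do PR to recruit volunteers
                tier = 3
    return tier
```

A user who needs more power and can run a server, but has no programmers, would be pointed at tier 1.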