Skip to content

A simple framework for using the SGE cluster at Weizmann's CS department for parallel work

Notifications You must be signed in to change notification settings

syagev/weiz-grid

Repository files navigation

###############################################################################
## WeizGrid
##  By Stav Yagev, 2013 (Please contact me if you find bugs !)
##
## 
## Simple framework for using the SGE cluster on Weizmann for parallel work.
##
## FEATURES:
##  - Helps split your own code into fractions that can be run in parallel on the cluster.
##  - Allows you to aggregate results in an efficient manner.
##  - Write your code ONCE! Execute the same code on PC for debugging and on the cluster
##      for production.
##  - Recover from errors on the cluster - splitting means if 1 iteration out of a 
##      1000 failed you still have 999 iterations in your hand!
##
## Use case example: 
##  You have a job that processes 1000 images in some way, running the same algorithm 
##  for each image, outputing something for each image, and finally aggregating the 
##  results from all images. Up until now, you ran this in a single job that took 1000 
##  minutes (because lets assume it takes 1 minute to process each image). Using WeizGrid 
##  you can take advantage of the fact that the job can be parallelized - instead of 1
##  job WeizGrid helps you easily split the job to 80 parallel jobs so that you get all 
##  1000 images in 1000/80=12.5 minutes (!!)
##


Auto Installation:
==================
1. Copy all the files to ~/WeizGrid
2. In a UNIX terminal, type:
    chmod +x ~/WeizGrid/wgsetup
3. Then type:
    ~/WeizGrid/wgsetup

Usage:
======
1. Create files in the spirit of 'sample.m' and 'calcPrimes.m' (make sure 
	the WG m-files are in MATLAB's path)
2. Upload your files to UNIX
3. Under UNIX, from the direcotry of YOUR project, run: 
        ~/WeizGrid/qsubWG [name] [your-m-file] [queue]

    Example: ~/WeizGrid/qsubWG ParallelCracker sample all.q

4.  Enjoy!





Manual Installation (if auto doesn't work...):
=============================================
1. Copy all files to ~/WeizGrid.
2. Make sure all bash scripts have execute permission.
2. Create the following directory structure (1 folder for each queue you intend to use):
        ~/.matlab/cluster_jobs/all.q
        ~/.matlab/cluster_jobs/test.q
        ...
    NOTE: The ~/.matlab/ usually already exists but is hidden, so make sure you 
            are viewing hidden files and folders .




Tips & Troubleshooting:
=======================
- If your algorithm uses a random number, make sure you are not setting the RngShuffle
    option to false when invoking WGexec - because this will yield the same random
    series for every "parallel" piece of work.
- Sometimes, during the aggregation part of your script, there will be a bug. Then,
    you might think all your work is lost! This is not the case, look into part #2
    of the documentation of WGgetResults for more information.
- There isn't verbose error checking, so if something doesn't work, check the output
    files on your UNIX home for hints... If still unsuccessful, feel free to contact 
    me for help.

About

A simple framework for using the SGE cluster at Weizmann's CS department for parallel work

Resources

Stars

Watchers

Forks

Packages

No packages published