Mathematica Template for Condor

This page is a work in progress until all of the examples are written and the page is better organized.

Condor

Before we can write any Mathematica code, it is important to understand how jobs are distributed on the UC3 cluster. Any machine on UC3 that has Mathematica installed advertises this capability to the rest of the cluster, so a job can require it in its submission file.

Condor attaches two numbers to each submitted job. The "cluster number" is a unique number associated with each job, and the "process number" is an ID number for each process queued for that job. The process number, written as $(Process) in the Condor submission file, counts up from 0 and can be passed to our Mathematica script as an argument. If we know how many processes we plan to queue, we can use this number to split up the computation. This is exactly what we do in the Mandelbrot example.
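
The splitting described above comes down to a little integer arithmetic. As a minimal sketch (the function name chunk_range and the totals 5600 and 56 are illustrative assumptions, not anything Condor provides), the following bash snippet prints which zero-based range of work items each process number would handle when the total divides evenly among the processes:

```shell
#!/bin/bash

# chunk_range PID TOTAL NPROCS
# Prints "first last": the zero-based, inclusive range of work items
# handled by process PID. Hypothetical helper for illustration only;
# assumes TOTAL is evenly divisible by NPROCS.
chunk_range() {
    local pid=$1 total=$2 nprocs=$3
    local size=$(( total / nprocs ))
    echo "$(( pid * size )) $(( (pid + 1) * size - 1 ))"
}

# Show the first, second, and last of 56 processes sharing 5600 items.
for pid in 0 1 55; do
    echo "process $pid: items $(chunk_range "$pid" 5600 56)"
done
```

With these assumed totals, process 0 gets items 0 to 99, process 1 gets 100 to 199, and process 55 gets 5500 to 5599, so every item is covered exactly once.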

An alternative is shown in Example 2, where we write a job that queues several executions with different arguments. Data analysis might be better suited to this kind of job.

Command-line Mathematica

There are some caveats to using Mathematica on the command line. As far as I know, any sort of plot creation needs the Mathematica GUI, even if you don't plan to display the plot. Instead, you should export the processed data and do the post-processing on a local Mathematica instance.

Something like:
 Export["sin.jpg", Plot[Sin[x], {x, 0, 2*Pi}], "JPG"] 
will not work.

Example 1: Mandelbrot Code

In our first approach, we can use Condor to parallelize by iterating over the process ID. This is really nice for jobs where, say, we're calculating a big table of values, each iteration is independent of every other (i.e., embarrassingly parallel), and each iteration takes some non-trivial amount of CPU time. The Mandelbrot set is a really nice example of this.

 
Run["rm -rf /home/lincoln/mandelbrot/output/*csv*"]

TotalCols = LCM[7/4, 2]*400
TotalRows = Ceiling[(TotalCols - 1)*4/7]
maxIterations = 100;

MandelbrotPixel = Compile[{{ColNum, _Integer}, {RowNum, _Integer}},
   Module[{x = 0., y = 0., xtemp, iterations = 0},
    While[x^2 + y^2 <= 4 && iterations < maxIterations,
     xtemp = x*x - y*y + (ColNum - 1)*3.5/TotalCols - 2.5;
     y = 2*x*y + (RowNum - 1)*2/TotalRows - 1;
     x = xtemp;
     iterations = iterations + 1;];
    iterations]
                         ];

ChunkSize=TotalCols/56;

MandelbrotData = Table[MandelbrotPixel[i, j], {j, 1, TotalRows}, {i, (PID*ChunkSize), ((PID + 1)*ChunkSize) - 1}];

Export["mandelbrot." <> ToString[PID] <> ".csv", MandelbrotData, "CSV"]

Condor Submission File

executable = math.sh
universe = vanilla 
Log = logfile.log 
Output = output.$(Process).dat 
Error = errorfile.$(Process)  
getenv = True

Arguments = /home/lincolnb/mandelbrot/mandelbrot.m $(Process) 
requirements = (HAS_MATHEMATICA =?= True)

initialdir = /home/lincolnb/mandelbrot/output

queue 56 
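
Once all of the queued jobs have finished, the output directory should contain one CSV file per process, each holding a vertical strip of the image. One possible way to check that no chunk is missing and to stitch the strips together column-wise is sketched below in bash. The function name merge_chunks is a hypothetical helper, and the sketch assumes the mandelbrot.N.csv naming used by the Export call above and that every chunk has the same number of rows:

```shell
#!/bin/bash

# merge_chunks DIR NPROCS OUTFILE
# Hypothetical helper: verifies that DIR/mandelbrot.0.csv through
# DIR/mandelbrot.(NPROCS-1).csv all exist and are non-empty, then joins
# them side by side (in process order) into OUTFILE.
merge_chunks() {
    local dir=$1 nprocs=$2 out=$3
    local files=() pid f
    for (( pid = 0; pid < nprocs; pid++ )); do
        f="$dir/mandelbrot.$pid.csv"
        if [ ! -s "$f" ]; then
            echo "missing or empty chunk: $f" >&2
            return 1
        fi
        files+=("$f")
    done
    # paste joins line N of every file, separated by commas, so each
    # output row is the concatenation of that row from every strip.
    paste -d, "${files[@]}" > "$out"
}
```

For the submission file above, this could be called as merge_chunks /home/lincolnb/mandelbrot/output 56 mandelbrot.csv, and the merged file can then be imported into a local Mathematica session for plotting.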

Example 2: TBD

math.sh

We pass two arguments to math.sh: first, the name of the Mathematica batch file, and second, the PID of the Condor process. I have designed my Mathematica code so that the calculation is split up based on the PID.

#!/bin/bash

# Run Mathematica
math -run "PID=$2" < "$1"

Another major reason to use "math.sh" instead of calling the "math" executable directly is that the Condor submit node doesn't have Mathematica installed, so Condor reports an error when a job is submitted with an executable that it cannot find.
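
Because the wrapper is the only thing Condor runs directly, it can also be a convenient place to catch bad arguments before Mathematica starts. The following is a rough sketch of a more defensive variant, not the script used above: the function names validate_args and run_math are made up for illustration, and the only Mathematica invocation is the same math -run "PID=..." call shown earlier.

```shell
#!/bin/bash

# validate_args SCRIPT PID
# Hypothetical check: succeeds only if exactly two arguments are given,
# SCRIPT is a readable file, and PID is a non-negative integer.
validate_args() {
    if [ "$#" -ne 2 ]; then
        echo "usage: math.sh SCRIPT PID" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "cannot read Mathematica script: $1" >&2
        return 1
    fi
    case "$2" in
        ''|*[!0-9]*)
            echo "PID must be a non-negative integer: $2" >&2
            return 1
            ;;
    esac
    return 0
}

# run_math SCRIPT PID
# Validates the arguments, then feeds SCRIPT to Mathematica with PID set.
run_math() {
    validate_args "$@" || return 1
    math -run "PID=$2" < "$1"
}

# Only dispatch when arguments are supplied, so the file can also be
# sourced without side effects (for example, to test validate_args).
if [ "$#" -gt 0 ]; then
    run_math "$@"
fi
```

A failed check makes the script exit with a non-zero status and a message on standard error, which ends up in the job's error file rather than producing a confusing Mathematica session.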

References

http://info.phys.washington.edu/physics/index.php/Mathematica_on_Condor

-- LincolnBryant - 12 Apr 2012
Topic revision: r4 - 18 Apr 2012, LincolnBryant