Mathematica Template for Condor
This page is a work in progress until all of my examples are written and the page is better organized.
Condor
Before we can write any Mathematica code, it is important to understand how the jobs are distributed on the UC3 cluster. Any machine on UC3 that has Mathematica installed will advertise this capability to the rest of the cluster.
Two numbers are attached to each job submitted to Condor. The "cluster number" is a unique number associated with each submitted job, and the "process number" is an ID number for each process queued by that job. The process number, represented in the Condor submission file as $(Process), counts up from 0 and can be passed to our Mathematica script as an argument. If we know how many processes we plan to queue, we can use this number to split up our computation. This is exactly what we do in the Mandelbrot example.
An alternative is shown in example 2, where we write a job that queues several executions with different arguments. Data analysis might be better suited to this kind of job.
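The process-number split described above comes down to a little arithmetic. The following is a minimal sketch in shell, not something Condor itself provides: chunk_range is a hypothetical helper, the 1-based inclusive range is just one possible convention, and the values 5600 and 56 are illustrative. It assumes the number of columns divides evenly by the number of processes.

```shell
#!/bin/sh
# Sketch: map a Condor process number to a contiguous range of columns.
# chunk_range is a hypothetical helper, not part of Condor or Mathematica.
# Usage: chunk_range PROCESS TOTAL_COLS NUM_PROCESSES
chunk_range() {
    process=$1
    total_cols=$2
    num_processes=$3
    # Assumption: total_cols divides evenly by num_processes.
    chunk_size=$((total_cols / num_processes))
    first=$((process * chunk_size + 1))    # 1-based, inclusive
    last=$(((process + 1) * chunk_size))   # inclusive
    echo "$first $last"
}

chunk_range 0 5600 56     # prints "1 100"
chunk_range 55 5600 56    # prints "5501 5600"
```

Because every process derives its own range from nothing but its process number, no coordination between the queued processes is needed.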
Command-line Mathematica
There are some caveats to using Mathematica on the command line. As far as I know, any sort of plot creation needs the Mathematica GUI, even if you don't plan to display the plot. Instead, you should export the processed data and do the post-processing on a local Mathematica instance.
Something like:
Export["sin.jpg", Plot[Sin[x], {x, 0, 2*Pi}], "JPG"]
will not work.
Example 1: Mandelbrot Code
In our first approach, we can use Condor to parallelize by iterating over the process ID. This is really nice for jobs where, say, we're calculating a big table of values, each iteration is independent of every other (i.e., embarrassingly parallel), and each iteration takes some non-trivial amount of CPU time. The Mandelbrot set is a really nice example of this.
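(* Remove only this process's stale output. A blanket rm on the shared output directory would also delete CSVs already written by other processes. *)
Quiet[DeleteFile["mandelbrot." <> ToString[PID] <> ".csv"]];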
TotalCols = LCM[7/4, 2]*400
TotalRows = Ceiling[(TotalCols - 1)*4/7]
maxIterations = 100;
MandelbrotPixel = Compile[{{ColNum, _Integer}, {RowNum, _Integer}},
  Module[{x = 0., y = 0., xtemp, iterations = 0},
    While[x^2 + y^2 <= 4 && iterations < maxIterations,
      xtemp = x*x - y*y + (ColNum - 1)*3.5/TotalCols - 2.5;
      y = 2*x*y + (RowNum - 1)*2/TotalRows - 1;
      x = xtemp;
      iterations = iterations + 1;];
    iterations]
];
ChunkSize=TotalCols/56;
(* Columns are 1-based, matching the (ColNum - 1) offset in MandelbrotPixel. *)
MandelbrotData = Table[MandelbrotPixel[i, j], {j, 1, TotalRows}, {i, PID*ChunkSize + 1, (PID + 1)*ChunkSize}];
Export["mandelbrot." <> ToString[PID] <> ".csv", MandelbrotData, "CSV"]
Condor Submission File
executable = math.sh
universe = vanilla
Log = logfile.log
Output = output.dat
Error = errorfile
getenv = True
Arguments = /home/lincolnb/mandelbrot/mandelbrot.m $(Process)
requirements = (HAS_MATHEMATICA =?= True)
initialdir = /home/lincolnb/mandelbrot/output
queue 56
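Once all of the queued processes have finished, the output directory holds one CSV file per process, each containing one block of columns. As a rough sketch of how those blocks could be stitched side by side before loading the result into a local Mathematica instance: merge_chunks is a hypothetical helper, and it assumes the files are named mandelbrot.N.csv, that every file has the same number of rows, and that it is run from inside the output directory.

```shell
#!/bin/sh
# Sketch: join per-process CSV chunks column-wise into one CSV.
# merge_chunks is a hypothetical helper introduced for illustration.
# Usage: merge_chunks NUM_PROCESSES OUTFILE
merge_chunks() {
    num_processes=$1
    outfile=$2
    files=""
    p=0
    # Build the file list in numeric order: mandelbrot.0.csv, mandelbrot.1.csv, ...
    while [ "$p" -lt "$num_processes" ]; do
        files="$files mandelbrot.$p.csv"
        p=$((p + 1))
    done
    # paste joins corresponding lines of each file, separated by commas.
    # $files is left unquoted on purpose so it splits into separate names.
    paste -d, $files > "$outfile"
}
```

For the submission file above, this would be called as merge_chunks 56 mandelbrot.csv. Building the list with a counter keeps the chunks in numeric order, which a plain shell glob would not (mandelbrot.10.csv sorts before mandelbrot.2.csv).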
Example 2: TBD
math.sh
We pass two arguments to math.sh: first, the name of the Mathematica batch file, and second, the PID of the Condor process. I have designed my Mathematica code so that the calculation is split up based on the PID.
#!/bin/bash
# Run Mathematica
math -run "PID=$2" < "$1"
Another major reason to use "math.sh" instead of calling the "math" executable directly is that the Condor submit node doesn't have Mathematica installed, so Condor reports an error when a job is submitted with an executable that it cannot find.
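Since the wrapper relies on exactly two arguments, a slightly more defensive version might check them before starting the kernel. This is only a sketch under assumptions: run_math and the MATH_BIN environment variable are hypothetical names introduced here, and the kernel is still started with the same "math -run" invocation shown above.

```shell
#!/bin/sh
# Sketch of a more defensive math.sh. run_math and MATH_BIN are
# hypothetical names introduced for illustration only.
# Usage: run_math SCRIPT.m PID
run_math() {
    if [ "$#" -ne 2 ]; then
        echo "usage: math.sh SCRIPT.m PID" >&2
        return 2
    fi
    script=$1
    pid=$2
    if [ ! -r "$script" ]; then
        echo "math.sh: cannot read $script" >&2
        return 1
    fi
    # MATH_BIN defaults to "math"; override it if the kernel lives elsewhere.
    "${MATH_BIN:-math}" -run "PID=$pid" < "$script"
}

# As the last line of math.sh, one would call:
#   run_math "$@"
```

Failing early with a message on stderr means a mistyped path shows up in the Condor error file instead of as an empty output file.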
References
http://info.phys.washington.edu/physics/index.php/Mathematica_on_Condor
--
LincolnBryant - 12 Apr 2012