Linux Follies: cluster computing

2020-08-04

Scripting Bright Cluster Manager 9.0 with Python

It has been more than 6 years since the previous post about using the Python API to script Bright Cluster Manager (CM). Time for an update.

The task is the same as before: change the “category” of a whole bunch of nodes.

NB: the Developer Manual has some typos that make it look as if you can specify categories as strings of their names, e.g. cluster.get_by_type('Node')

2017-10-04

Apache Spark integration with Grid Engine (update for Spark 2.2.0)

Apache Spark is a popular big data engine, largely because it is fast; the speed comes from keeping data in memory. This post updates my older one for Spark 2.2.0 with Java 1.8.0. It is still Spark in standalone mode, using the nodes assigned by Grid Engine (GE) as the worker nodes.

The setup is mostly the same as before, except that only one file needs to be modified: sbin/slaves.sh. The Parallel Environment (PE) startup script changes in just two ways: it adds an environment variable defining where the worker logs go (a Grid Engine job-specific directory under the job directory), and it now specifies Java 1.8.0.

As before, the modifications to sbin/slaves.sh handle using the proper spark-env script based on the user's shell. Since that spark-env script is set up by the PE script to generate job-specific conf and log directories, everything job-specific is separated.
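To make the job-specific separation concrete, here is a minimal Python sketch of what a PE startup step could do. It is not the actual PE script from this post (that one is a shell script). It assumes the usual Grid Engine PE hostfile layout of "hostname slots queue processor_range" per line, and it assumes the log location is set through SPARK_LOG_DIR. The function names, the conf/ and logs/ directory names, and the Java path are hypothetical, and only an sh-style spark-env file is written.

```python
from pathlib import Path


def parse_pe_hostfile(text):
    """Return the unique hostnames from Grid Engine PE hostfile contents.

    Each non-empty line is assumed to look like
    "hostname slots queue processor_range"; only the first field is used.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def write_job_conf(job_dir, hosts, java_home):
    """Create job-specific conf/ and logs/ directories under job_dir.

    Writes conf/slaves (one worker hostname per line) and an sh-style
    conf/spark-env.sh, then returns (conf_dir, log_dir) as Path objects.
    """
    job_dir = Path(job_dir)
    conf_dir = job_dir / "conf"   # hypothetical directory name
    log_dir = job_dir / "logs"    # hypothetical directory name
    conf_dir.mkdir(parents=True, exist_ok=True)
    log_dir.mkdir(parents=True, exist_ok=True)

    # Worker list for Spark standalone mode: one hostname per line.
    (conf_dir / "slaves").write_text("".join(h + "\n" for h in hosts))

    # Assumption: the log location is passed to Spark via SPARK_LOG_DIR.
    env_lines = [
        f'export JAVA_HOME="{java_home}"',
        f'export SPARK_CONF_DIR="{conf_dir}"',
        f'export SPARK_LOG_DIR="{log_dir}"',
    ]
    (conf_dir / "spark-env.sh").write_text("\n".join(env_lines) + "\n")
    return conf_dir, log_dir


if __name__ == "__main__":
    import os
    import tempfile

    # Inside a real job, GE points $PE_HOSTFILE at the hostfile; the
    # fallback text here is made-up sample data for running standalone.
    sample = "node01 4 all.q@node01 UNDEFINED\nnode02 4 all.q@node02 UNDEFINED\n"
    hostfile = os.environ.get("PE_HOSTFILE")
    text = Path(hostfile).read_text() if hostfile else sample

    with tempfile.TemporaryDirectory() as job_dir:
        conf_dir, log_dir = write_job_conf(
            job_dir, parse_pe_hostfile(text), "/usr/lib/jvm/java-1.8.0"
        )
        print((conf_dir / "slaves").read_text(), end="")
```

Because every path is derived from the job directory, two jobs running at the same time never share a worker list or log directory, which is the point of having the PE script generate them per job.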