AUBURN HPC Research Community

Building the Hopper Cluster Part II: Networking

On Tuesday, January 19th, 2016, work started on the networking phase of the Auburn University “Hopper” High Performance Compute Cluster. This phase involved routing hundreds of InfiniBand, Ethernet, and fiber optic cables to enable high-speed communication between the previously installed servers. The InfiniBand network architecture provides the cluster with high-speed, low-latency shared disk access and the...

Building the Hopper Cluster Part I: Nodes

On Monday, January 5, the build began on the new HPC cluster in the AU Data Center. Resembling an old-fashioned ‘barn raising’, OIT and Lenovo personnel unboxed and racked equipment in step one of constructing Auburn’s newest and most powerful research computer. Work continues with the goal of being operational by mid-February.

Excluding a Host

As with all cluster problems you encounter, if you know of a problematic host in the cluster, please send an email to hpcadmin@auburn.edu. It may take some time for the cluster admins to respond, so in the meantime you can avoid that host with the following syntax… bsub -R "select[hname!=node000]" … qsub -l h=!node001 Additionally, you can specify one or...
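As a rough sketch of how the bsub form above might be generated for more than one host, the helper below joins several hname!= terms with && inside a single select[] string. The function name exclude_hosts_select and the node names are hypothetical, and the sketch assumes your LSF version accepts && between terms in a select string; it only prints the string and does not contact the scheduler.

```shell
#!/bin/bash
# Hypothetical helper: print an LSF select[] string that excludes every
# host name passed as an argument. Assumes && is accepted between terms.
exclude_hosts_select() {
    local expr="" host
    for host in "$@"; do
        # Join terms with " && " once the first term has been added.
        if [ -n "$expr" ]; then
            expr="$expr && "
        fi
        expr="${expr}hname!=${host}"
    done
    printf 'select[%s]\n' "$expr"
}

# Usage (node names and job script are placeholders):
#   bsub -R "$(exclude_hosts_select node000 node001)" ./my_job.sh
```

Printing the string first makes it easy to inspect the requirement before passing it to bsub -R.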

OpenMPI in Xcode 6.1

To debug OpenMPI programs in Xcode 6.1… First build a recent version of OpenMPI. From the Terminal… > curl -O http://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.3.tar.gz > tar -xvzf openmpi-1.8.3.tar.gz > cd openmpi-1.8.3 > ./configure --prefix=/usr/local/lib/openmpi-1.8.3/ CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64 > make > make install Start a new project. Select OS X -> Application -> Command Line Application Select Project -> Build Settings -> Add /usr/local/lib/openmpi-1.8.3/include to...

Queue Information

To view the available queues and their basic limits, use the bqueues command. To find more detail on a particular queue, use bqueues -l <queuename>. The most meaningful numbers are JL/U, MAX, and RUNLIMIT. JL/U is “Job Limit per User” or “Job Slot Limit per User.” This is somewhat misleading, as it actually limits the number of cores...

Sequential Job Submission

For situations where you need to run several jobs back-to-back, with each waiting for the prior job’s completion, you can use the -K option with bsub. With -K, bsub does not return until the submitted job has finished, so the next command in your script does not start before then. Example… #!/bin/bash bsub -K -o out.1 sleep 10 & bsub -K -o out.2 sleep 5 & wait From the man...
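Note that the example above puts each bsub in the background with &, so both jobs are submitted at once and wait pauses the script until both have finished. For a strictly one-after-another chain, a minimal sketch without the & might look like the following. The function name run_in_sequence and the BSUB override variable are hypothetical conveniences for illustration, not part of LSF; the sketch assumes bsub -K returns a non-zero exit status when a job fails and stops the chain in that case.

```shell
#!/bin/bash
# Hypothetical wrapper: submit each command with bsub -K, one at a time.
# bsub -K blocks until the job completes, so jobs run back-to-back.
# Set BSUB="echo bsub" for a dry run that only prints the submissions.
BSUB=${BSUB:-bsub}

run_in_sequence() {
    local i=1 cmd
    for cmd in "$@"; do
        # $BSUB and $cmd are intentionally unquoted so they split into words.
        # Stop the chain if a submission reports a non-zero exit status.
        $BSUB -K -o "out.$i" $cmd || return 1
        i=$((i + 1))
    done
}

# Usage (commands are placeholders):
#   run_in_sequence "sleep 10" "sleep 5"
```

Each job writes to its own out.N file, mirroring the out.1 and out.2 names in the example above.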

Mail Notifications

*Please note we are currently investigating an issue that is preventing compute nodes from sending completion e-mail. A workaround is to use the bsub -K option so that bsub returns to your script only after the job has finished, allowing the script to send mail from the login node. You might want to be notified of the status of your jobs without having to log...

Interactive Jobs

In some cases you may want to run tests on the compute nodes to validate your scripts. A good way to do this is to run an LSF job in interactive mode. This way, you can simulate the exact environment in which your code will run. Here are the suggested commands for experimenting at the compute node level with interactive...

LSF Error Codes

Determining why a job ended unexpectedly is an essential skill for running jobs successfully on the cluster and identifying systemic errors. The basic process for locating error codes, and subsequently an English translation, mostly involves the use of the bjobs and bhist commands. A script for locating job exit information is also provided in /tools/scripts. Here is some information on...