Running Jobs on Midway FAQ

Set-up and general questions

How do I submit a job to Midway?

RCC systems use Slurm to manage resources and job queues. For advice on how to run specific types of jobs, consult the Running jobs on midway section of the User Guide.

Can I login directly to a compute node?

You can start an interactive session on a compute node with the sinteractive command. This command takes the same arguments as sbatch. For more information about interactive jobs, see Submitting Interactive Jobs.

How do I run jobs in parallel?

There are many ways to configure parallel jobs. The best approach will depend on your software and resource requirements. For more information on two commonly used approaches, see Parallel batch jobs and Job arrays.

Are there any limits to running jobs on Midway?

Run rcchelp qos on Midway to view the current "Quality of Service" (QOS), a set of parameters and constraints that includes the maximum number of jobs and the maximum wall time.

I am a member of multiple accounts. How do I choose which allocation is charged?

If you belong to multiple accounts, jobs will get charged to your default account unless you specify the --account=<account_name> option when you submit a job with sbatch. You may request a change in your default account by contacting our Help Desk.
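As a minimal sketch, the account can also be set as a directive inside the batch script. The account name pi-example and the job name are hypothetical placeholders, and the fallback value exists only so the script can be dry-run outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=account-example   # hypothetical job name
#SBATCH --account=pi-example         # hypothetical account; replace with one of yours
#SBATCH --time=00:10:00

# Inside a job, Slurm sets SLURM_JOB_ACCOUNT to the account being charged.
# The fallback only applies when the script is run outside Slurm.
charged_account="${SLURM_JOB_ACCOUNT:-pi-example}"
echo "This job is charged to: ${charged_account}"
```

Passing the option on the command line, for example sbatch --account=pi-example job.sbatch (job.sbatch being a hypothetical script name), takes precedence over the directive in the script.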

How can I get emails when my job starts and when it finishes?

For security reasons, sending notification emails directly using the standard Slurm directive #SBATCH --mail-user=<CNetID> is not allowed. As a robust alternative, we suggest using the RCC mail server to send notification emails. Update your script with the following lines:

#SBATCH --mail-type=ALL                        # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<CNetID>  # Where to send email

How do I run jobs that need to run longer than the maximum wall time?

The RCC queuing system is designed to provide fair resource allocation to all RCC users. The maximum wall time is intended to prevent individual users from using more than their fair share of cluster resources.

If you have specific computing tasks that cannot be solved with the current constraints, please submit a special request for resources to our Help Desk.

Can I create a cron job?

The RCC does not support users creating cron jobs. However, it is possible to use Slurm to submit “cron-like” jobs. See Cron-like jobs for more information.

Job submission trouble

Why is my job not starting?

This could be due to a variety of factors. Running squeue --user=<userid> can help you find the answer; see in particular the NODELIST(REASON) column in the squeue output. A job that is waiting in the queue may show one of the following labels in this column:

(Priority): Other jobs currently have higher priority than your job.

(Resources): Your job has enough priority to run, but there aren’t yet enough free resources to run it.

(QOSResourceLimit): Your job exceeds the QOS limits. The QOS limits include wall time, number of jobs a user can have running at once, number of nodes a user can use at once, and so on. For example, if you are at or near the limit of number of jobs that can be run at once, your job will become eligible to run as soon as other jobs finish.

Please contact RCC support if you believe that your job is not being handled correctly by the Slurm queuing system.

Note: If you see a large number of jobs that aren’t running when many resources are idle, it is possible that RCC staff have scheduled an upcoming maintenance window. In this case, any jobs requesting a wall time that overlaps with the maintenance window will remain in the queue until the maintenance period is over. The RCC staff will typically notify users via email before maintenance begins and after it is completed.
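The pending reasons listed above can be checked from the command line. The sketch below is illustrative only: explain_reason is a hypothetical helper, not an RCC tool, and it covers only the three labels described in this FAQ. The squeue call is skipped on machines where Slurm is not installed.

```shell
#!/bin/bash
# Map a NODELIST(REASON) label to the explanation given in this FAQ.
explain_reason() {
    case "$1" in
        "(Priority)")
            echo "Other jobs currently have higher priority." ;;
        "(Resources)")
            echo "Enough priority, but not enough free resources yet." ;;
        "(QOSResourceLimit)")
            echo "The job exceeds a QOS limit such as wall time or number of running jobs." ;;
        *)
            echo "Not covered in this FAQ; contact RCC support if unsure." ;;
    esac
}

# Only query the queue where squeue is available (e.g. on a Midway login node).
if command -v squeue >/dev/null 2>&1; then
    # %i = job ID, %R = reason a pending job is waiting, in parentheses
    squeue --user="$USER" --states=PENDING --noheader --format="%i %R" |
    while read -r job_id reason; do
        echo "${job_id}: $(explain_reason "$reason")"
    done
fi
```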

Why does my job fail after a few seconds?

This is most likely because there is an error in your job submission script, or because the program you are trying to run is producing an error and terminating prematurely.

If you need help troubleshooting the issue, please send your job submission script, along with the error it generated, to our Help Desk.

Why does my job fail with message “exceeded memory limit, being killed”?

This error means your job used more memory than it was allocated. Consider an example: on the main midway2 partition, broadwl, Slurm allocates 2 GB of memory per allocated CPU by default. If your computations require more than the default amount, adjust the memory allocated to your job with the --mem or --mem-per-cpu flags. For example, to request 10 cores and 40 GB of memory on a broadwl node, include these options when running sbatch or sinteractive: --ntasks=1 --cpus-per-task=10 --mem=40G.
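The same request for 10 cores and 40 GB can be written as directives in a batch script. This is a minimal sketch: the job name is hypothetical, and the fallback values exist only so the script can be dry-run outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=mem-example   # hypothetical job name
#SBATCH --partition=broadwl
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=40G                # total memory for the job on the node

# Inside a job, Slurm exports the granted resources. SLURM_MEM_PER_NODE
# mirrors --mem and is reported in megabytes. The fallbacks only apply
# when the script is run outside Slurm.
cpus="${SLURM_CPUS_PER_TASK:-10}"
mem_mb="${SLURM_MEM_PER_NODE:-40960}"
echo "Running with ${cpus} CPUs and ${mem_mb} MB of memory"
```

With the default of 2 GB per CPU, 10 CPUs would receive only 20 GB, which is why the explicit --mem=40G line is needed here.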

Why does my sinteractive job fail with “Connection closed.”?

There are two likely explanations for this error.

One possibility is that your session exceeded the time limit. The default wall time for sinteractive is 2 hours. You can increase it by adding the --time flag to your sinteractive call.

Another possibility is that your job exceeded the memory limit. You can resolve this by requesting additional memory using --mem or --mem-per-cpu.
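Both fixes can be combined in a single call. The sketch below uses illustrative values (a 4-hour wall time and 16 GB of memory) and prints the command instead of running it when sinteractive is not available.

```shell
#!/bin/bash
# Illustrative values; choose ones that fit your work and the QOS limits.
walltime="04:00:00"   # HH:MM:SS, longer than the 2-hour default
memory="16G"          # total memory for the session

if command -v sinteractive >/dev/null 2>&1; then
    # On a Midway login node: start the interactive session.
    sinteractive --time="${walltime}" --mem="${memory}"
else
    # Elsewhere: dry run that only shows the command.
    echo "Would run: sinteractive --time=${walltime} --mem=${memory}"
fi
```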

Why does my sinteractive job fail with ssh: symbol lookup error: ssh: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b?

The error ssh: symbol lookup error: ssh: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b indicates a mismatch between the version of OpenSSL used by sinteractive and the version provided by the python module loaded in your shell environment. There are two options to resolve this issue:

1) Prepend LD_LIBRARY_PATH with the path to the SSH-compatible version of OpenSSL:

and run sinteractive again.

2) Unload the python module:

module unload python
then run sinteractive again, and load the python/anaconda module within the interactive session.

Technical questions

What compilers does the RCC support?

The RCC supports the GNU, Intel, and PGI compilers, as well as NVIDIA’s CUDA compilers.

Which versions of MPI does RCC support?

The RCC maintains the OpenMPI, IntelMPI, and MVAPICH2 implementations of MPI. See Message Passing Interface (MPI) for more information and instructions for using these MPI frameworks.

Can RCC help me parallelize and optimize my code?

The RCC support staff are available to consult with you or your research team to help parallelize and optimize your code for use on RCC systems. Contact our Help Desk to set up a consultation.

Does RCC provide GPU computing resources?

Yes. The RCC high-performance systems provide GPU-equipped compute nodes. For instructions on using the GPU nodes, see GPU jobs.