Running Jobs on Midway FAQ
Set-up and general questions
How do I submit a job to Midway?
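Jobs are submitted to the Slurm scheduler with sbatch. A minimal batch script might look like the following sketch (the job name, program, and resource values are illustrative; broadwl is Midway2's main partition):

```shell
#!/bin/bash
#SBATCH --job-name=example        # Job name shown in squeue output
#SBATCH --partition=broadwl       # Partition (queue) to run in
#SBATCH --ntasks=1                # Number of tasks
#SBATCH --time=01:00:00           # Wall-time limit (HH:MM:SS)

# The commands below run on the allocated compute node
./my_program
```

Save the script as, for example, myjob.sbatch and submit it with sbatch myjob.sbatch.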
Can I login directly to a compute node?
You can start an interactive session on a compute node with the
sinteractive command, which accepts the same arguments as
sbatch. For more information about interactive jobs, see Submitting Interactive Jobs.
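For example, an interactive session with explicit resources might be requested like this (the values are illustrative):

```shell
# Request a 2-hour interactive session with 4 CPUs and 8 GB of memory
sinteractive --time=02:00:00 --ntasks=1 --cpus-per-task=4 --mem=8G
```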
How do I run jobs in parallel?
There are many ways to configure parallel jobs. The best approach will depend on your software and resource requirements. For more information on two commonly used approaches, see Parallel batch jobs and Job arrays.
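As a sketch of the job-array approach (the input file naming and program are hypothetical):

```shell
#!/bin/bash
#SBATCH --job-name=array-example
#SBATCH --array=1-10             # Run 10 independent array tasks
#SBATCH --ntasks=1               # One task per array element

# Each array task receives its own index in SLURM_ARRAY_TASK_ID,
# which can be used to select a distinct input file
./process_input input_${SLURM_ARRAY_TASK_ID}.dat
```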
Are there any limits to running jobs on Midway?
Run rcchelp qos on Midway to view the current "Quality of Service"--a set of parameters and constraints that includes the maximum number of jobs and the maximum wall time.
I am a member of multiple accounts. How do I choose which allocation is charged?
If you belong to multiple accounts, jobs will get charged to your default account unless you specify the
--account=<account_name> option when you submit a job with sbatch. You may request a change in your default account by contacting our Help Desk.
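For example, to charge a single job to a non-default account (the account name pi-example and script name myjob.sbatch are placeholders):

```shell
# Charge this job to the pi-example allocation instead of the default
sbatch --account=pi-example myjob.sbatch
```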
How can I get emails when my job starts and when it finishes?
For security reasons, sending out notification emails directly using the standard slurm command
#SBATCH --mail-user=<CNetID>@uchicago.edu is not allowed. As a robust alternative, we suggest using the RCC mail server to send out notification emails. Update your script with the following lines:
#SBATCH --mail-type=ALL                        # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<CNetID>@rcc.uchicago.edu  # Where to send email
How do I run jobs that need to run longer than the maximum wall time?
The RCC queuing system is designed to provide fair resource allocation to all RCC users. The maximum wall time is intended to prevent individual users from using more than their fair share of cluster resources.
If you have specific computing tasks that cannot be solved with the current constraints, please submit a special request for resources to our Help Desk.
Can I create a cron job?
The RCC does not support users creating cron jobs. However, it is possible to use Slurm to submit “cron-like” jobs. See Cron-like jobs for more information.
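One common cron-like pattern (a sketch, not the RCC-documented recipe; the task name is a placeholder) is a batch script that resubmits itself with a deferred start time via sbatch --begin:

```shell
#!/bin/bash
#SBATCH --job-name=cron-like
#SBATCH --time=00:10:00

# Do the periodic work here
./nightly_task

# Resubmit this same script to run again 24 hours from now
sbatch --begin=now+24hours "$0"
```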
Job submission trouble
Why is my job not starting?
This could be due to a variety of factors. Running
squeue --user=<userid> will help you find the answer; see in particular the NODELIST(REASON) column in the
squeue output. A job that is waiting in the queue may show one of the following labels in this column:
(Priority): Other jobs currently have higher priority than your job.
(Resources): Your job has enough priority to run, but there aren’t yet enough free resources to run it.
(QOSResourceLimit): Your job exceeds the QOS limits. The QOS limits include wall time, number of jobs a user can have running at once, number of nodes a user can use at once, and so on. For example, if you are at or near the limit of number of jobs that can be run at once, your job will become eligible to run as soon as other jobs finish.
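To inspect these reason codes yourself, a typical check looks like the following (the format string is illustrative):

```shell
# List your jobs; the last column shows the node list for running jobs
# or the pending reason (Priority, Resources, QOSResourceLimit, ...)
squeue --user=$USER --format="%.10i %.9P %.20j %.8T %.20R"
```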
Please contact RCC support if you believe that your job is not being handled correctly by the Slurm queuing system.
Note: If you see a large number of jobs that aren't running while many resources are idle, it is possible that RCC staff have scheduled an upcoming maintenance window. In this case, any jobs requesting a wall time that overlaps with the maintenance window will remain in the queue until the maintenance period is over. The RCC staff will typically notify users via email before performing maintenance and again after it is completed.
Why does my job fail after a few seconds?
This is most likely because there is an error in your job submission script, or because the program you are trying to run is producing an error and terminating prematurely.
If you need help troubleshooting the issue, please send your job submission script, as well as the error it generated, to our Help Desk.
Why does my job fail with message “exceeded memory limit, being killed”?
Your job used more memory than it requested. Let's understand this with an example: on the main Midway2 partition, broadwl, Slurm allocates 2 GB of memory per allocated CPU by default. If your computations require more than the default amount, you should adjust the memory allocated to your job with the
--mem or --mem-per-cpu flags. For example, to request 10 CPUs and 40 GB of memory on a broadwl node, include these options when running sbatch:
--ntasks=1 --cpus-per-task=10 --mem=40G.
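These flags can also be passed on the command line; a quick sketch (the script name myjob.sbatch is a placeholder):

```shell
# Request 40 GB total for a 10-CPU job...
sbatch --ntasks=1 --cpus-per-task=10 --mem=40G myjob.sbatch
# ...or, equivalently, 4 GB per allocated CPU
sbatch --ntasks=1 --cpus-per-task=10 --mem-per-cpu=4G myjob.sbatch
```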
Why does my
sinteractive job fail with “Connection closed.”?
There are two likely explanations for this error.
One possibility is that you are over the time limit. The default wall time for sinteractive is 2 hours. This can be increased by passing the
--time flag to your sinteractive command.
Another possibility is that your job exceeded the memory limit. You can resolve this by requesting additional memory with the --mem flag.
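For example, both limits can be raised at once (the values are illustrative):

```shell
# Request a 6-hour interactive session with 16 GB of memory
sinteractive --time=06:00:00 --mem=16G
```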
Why does my
sinteractive job fail with
ssh: symbol lookup error: ssh: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b?
This error indicates a mismatch between the version of OpenSSL used by
sinteractive and the version used by the
python module loaded in your shell environment. There are two options to resolve this issue:
1) Prepend LD_LIBRARY_PATH with the path to the SSH-compatible version of OpenSSL.
2) Unload the python module with
module unload python, run
sinteractive again, and load the
python/anaconda module within the interactive session.
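The second option, sketched as a command sequence:

```shell
module unload python          # remove the conflicting OpenSSL from your environment
sinteractive                  # request the interactive session
module load python/anaconda   # load Python inside the session instead
```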
What compilers does the RCC support?
The RCC supports the GNU, Intel, and PGI compilers, as well as NVIDIA's CUDA compilers.
Which versions of MPI does RCC support?
The RCC maintains the OpenMPI, Intel MPI, and MVAPICH2 implementations. See Message Passing Interface (MPI) for more information and instructions for using these MPI frameworks.
Can RCC help me parallelize and optimize my code?
The RCC support staff are available to consult with you or your research team to help parallelize and optimize your code for use on RCC systems. Contact our Help Desk to set up a consultation.
Does RCC provide GPU computing resources?
Yes. The RCC high-performance systems provide GPU-equipped compute nodes. For instructions on using the GPU nodes, see GPU jobs.