Stata
Stata is a powerful statistical software package that is widely used in scientific computing. RCC users are licensed to use Stata on all RCC resources. Stata can be used interactively or as a submitted script. Please note that if you would like to run it interactively, you must still run it on a compute node, in order to keep the login nodes free for other users. Stata can be run in parallel on up to 16 nodes.
NOTE: Stata examples in this document are adapted from a Princeton tutorial. You may find it useful if you are new to Stata or want a refresher.
Getting Started
If you need to use the Stata GUI, connect to Midway with ThinLinc.
Obtain an interactive session on a compute node. This is necessary so that your computation doesn’t interrupt other users on the login node. Now, load Stata:
sinteractive
module load stata
xstata
This will open up a Stata window. The middle pane has a text box to enter commands at the bottom, and a box for command results on top. On the left there’s a box called “Review” that shows your command history. The right-hand box contains information about variables in the currently-loaded data set.
One way Stata can be used is as a fancy desktop calculator. Type the following code into the command box:
display 2+2
Stata can do much more if data is loaded into it. The following code loads census data that ships with Stata, prints a description of the data, then creates a graph of life expectancy over GNP:
sysuse lifeexp
describe
graph twoway scatter lexp gnppc
Running Stata from the command line
This is very similar to running graphically; the command-line interface is equivalent to the “Results” pane in the graphical interface. Again, please use a compute node if you are running computationally-intensive calculations:
sinteractive
module load stata
stata
Running Stata Jobs with SLURM
You can also submit Stata jobs to SLURM, the scheduler. A Stata script is called a “do-file,” which contains a list of Stata commands that the interpreter will execute. You can write a do-file in any text editor, or in the Stata GUI’s do-file editor: click “Do-File Editor”” in the “Window” menu. If your do-file is named “example.do,” you can run it with either of the following commands:
stata < example.do
stata -b do example.do
Here is a very simple do-file, which computes a regression on the sample data set from above:
version 13 // current version of Stata, this is optional but recommended.
sysuse lifeexp
gen loggnppc = log(gnppc)
regress lexp loggnppc
Here is a submission script that submits the Stata program to the default queue on Midway:
#!/bin/bash
#SBATCH --job-name=stataEx
#SBATCH --output=stata_example.out
#SBATCH --error=stata_example.err
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
module load stata
stata -b stata_example.do
stata_example.do
is our example do-file, and stata_example.sbatch
is the submission script.
To run this example, download both files to a directory on Midway. Enter the following command to submit the program to the scheduler:
sbatch stata_example.sbatch
Output from this example can be found in the file named stata_example.log
, which will be created automatically in your current directory.
Running Parallel Stata Jobs
The parallel version of Stata, Stata/MP, can speed up computations and make effective use of RCC’s resources. When running Stata/MP, you are limited to 16 cores and 5000 variables. Run an interactive Stata/MP session:
sinteractive
module load stata
stata-mp
# or, for the graphical interface:
xstata-mp
Here is a sample do-file that would benefit from parallelization. It runs bootstrap estimation on another data set that ships with Stata.
version 13
sysuse auto
expand 10000
bootstrap: logistic foreign price-gear_ratio
Here is a submission script that will run the above do-file with Stata/MP:
#!/bin/bash
#SBATCH --job-name=stataMP
#SBATCH --output=stata_parallel.out
#SBATCH --error=stata_parallel.err
#SBATCH --nodes=1
#SBATCH --tasks-per-node=16
module load stata
stata-mp -b stata_parallel.do
Download stata_parallel.do
and stata_parallel.sbatch
to Midway, then run the program with:
sbatch stata_parallel.sbatch