Stata

Stata is a powerful statistical software package that is widely used in scientific computing. RCC users are licensed to use Stata on all RCC resources. Stata can be used interactively or as a submitted script. If you run it interactively, you must still run it on a compute node so that the login nodes stay free for other users. The parallel version, Stata/MP, can use up to 16 cores.

NOTE: Stata examples in this document are adapted from a Princeton tutorial. You may find it useful if you are new to Stata or want a refresher.

Getting Started

If you need to use the Stata GUI, connect to Midway with ThinLinc.

Obtain an interactive session on a compute node so that your computation doesn’t slow down other users on the login node. Then load the Stata module and start the graphical interface:

sinteractive
module load stata
xstata

This will open up a Stata window. The middle pane has a text box to enter commands at the bottom, and a box for command results on top. On the left there’s a box called “Review” that shows your command history. The right-hand box contains information about variables in the currently-loaded data set.

One way Stata can be used is as a fancy desktop calculator. Type the following code into the command box:

display 2+2

Stata can do much more once data is loaded into it. The following code loads a life expectancy data set that ships with Stata, prints a description of the data, then creates a scatter plot of life expectancy against GNP per capita:

sysuse lifeexp
describe
graph twoway scatter lexp gnppc

Running Stata from the command line

This is very similar to running graphically; the command-line interface is equivalent to the “Results” pane in the graphical interface. Again, please use a compute node if you are running computationally-intensive calculations:

sinteractive
module load stata
stata

Running Stata Jobs with SLURM

You can also submit Stata jobs to SLURM, the scheduler. A Stata script is called a “do-file”; it contains a list of Stata commands that the interpreter executes in order. You can write a do-file in any text editor, or in the Stata GUI’s do-file editor: click “Do-File Editor” in the “Window” menu. If your do-file is named “example.do”, you can run it with either of the following commands. The first reads the commands from standard input and prints results to the terminal; the second runs in batch mode and writes results to example.log:

stata < example.do
stata -b do example.do
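
As a minimal sketch of the batch form, the following shell snippet writes a one-line do-file and runs it only if a `stata` executable is on the PATH. The do-file contents are illustrative (the calculator command from earlier), and the fallback message is an assumption about how you might want to handle a missing module, not part of RCC's setup.

```shell
#!/bin/bash
# Write a one-line do-file (contents are illustrative only).
cat > example.do <<'EOF'
display 2+2
EOF

# Run in batch mode only when Stata is available,
# i.e. after `module load stata` on a compute node.
if command -v stata >/dev/null 2>&1; then
    stata -b do example.do   # batch mode: results go to example.log
else
    echo "stata not found; run 'module load stata' first" >&2
fi
```
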

Here is a very simple do-file, which computes a regression on the sample data set from above:

version 13 // Stata version this do-file was written for; optional but recommended

sysuse lifeexp
gen loggnppc = log(gnppc)
regress lexp loggnppc

Here is a submission script that submits the Stata program to the default queue on Midway:

#!/bin/bash

#SBATCH --job-name=stataEx
#SBATCH --output=stata_example.out
#SBATCH --error=stata_example.err
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1

module load stata

stata -b do stata_example.do

stata_example.do is our example do-file, and stata_example.sbatch is the submission script.

To run this example, download both files to a directory on Midway. Enter the following command to submit the program to the scheduler:

sbatch stata_example.sbatch

Output from this example can be found in the file named stata_example.log, which will be created automatically in your current directory.
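
Because a batch job has no console, a failed do-file is easy to miss. The sketch below scans a log for Stata's error return codes. It assumes errors show up as a line of the form `r(111);`, and the function name `check_stata_log` is hypothetical.

```shell
#!/bin/bash
# Return 0 if the log contains no Stata error code, 1 if it does,
# and 2 if the log file is missing.
# Assumption: Stata reports errors as a line such as "r(111);".
check_stata_log() {
    local log="$1"
    if [ ! -f "$log" ]; then
        echo "missing log: $log" >&2
        return 2
    fi
    if grep -Eq '^r\([0-9]+\);' "$log"; then
        echo "Stata reported an error in $log:" >&2
        # Show the error code together with the three lines before it.
        grep -E -B 3 '^r\([0-9]+\);' "$log" >&2
        return 1
    fi
    echo "no error codes found in $log"
}
```

After the job finishes, you could call `check_stata_log stata_example.log` from the directory where you submitted it.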

Running Parallel Stata Jobs

The parallel version of Stata, Stata/MP, can speed up computations and make effective use of RCC’s resources. When running Stata/MP, you are limited to 16 cores and 5000 variables. To start an interactive Stata/MP session:

sinteractive
module load stata
stata-mp
# or, for the graphical interface:
xstata-mp

Here is a sample do-file that would benefit from parallelization. It runs bootstrap estimation on another data set that ships with Stata.

version 13

sysuse auto
expand 10000
bootstrap: logistic foreign price-gear_ratio

Here is a submission script that will run the above do-file with Stata/MP:

#!/bin/bash

#SBATCH --job-name=stataMP
#SBATCH --output=stata_parallel.out
#SBATCH --error=stata_parallel.err
#SBATCH --nodes=1
#SBATCH --tasks-per-node=16

module load stata
stata-mp -b do stata_parallel.do

Download stata_parallel.do and stata_parallel.sbatch to Midway, then run the program with:

sbatch stata_parallel.sbatch
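
If you want Stata/MP to use exactly the number of cores that SLURM allocated, one option is to generate a small wrapper do-file that calls Stata's `set processors` command before running the real do-file. This is a sketch only: it assumes SLURM exports `SLURM_NTASKS_PER_NODE` inside the job (it falls back to 1 otherwise), and the wrapper name `run_parallel.do` is hypothetical.

```shell
#!/bin/bash
# Number of cores to give Stata/MP: taken from SLURM when available, else 1.
# Assumption: SLURM_NTASKS_PER_NODE is exported inside the job.
NPROC="${SLURM_NTASKS_PER_NODE:-1}"

# Hypothetical wrapper do-file: cap the core count, then run the real do-file.
cat > run_parallel.do <<EOF
set processors ${NPROC}
do stata_parallel.do
EOF

echo "wrapper requests ${NPROC} processor(s)"
```

To use it, place these lines in the submission script after `module load stata` and change the final command to `stata-mp -b do run_parallel.do`. Keep the requested count at or below the 16-core limit noted above.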