AlphaFold
AlphaFold is an artificial intelligence program developed by Google DeepMind, a subsidiary of Alphabet, which performs predictions of protein structure.
Available modules
AlphaFold 2
AlphaFold 2 is installed on Midway3 as environment modules. To list the available versions, run module avail alphafold:
module avail alphafold
---------------------- /software/modulefiles----------------------------------
alphafold/2.0.0(default) alphafold/2.2.0 alphafold/2.3.2
The AlphaFold source code and run scripts (e.g., run_alphafold.py) are available in the AlphaFold GitHub repository.
The data sets for the different AlphaFold versions (the sequence and structure databases that run_alphafold.py searches) are available under /software/alphafold-data/, /software/alphafold-data-2.2/, and /software/alphafold-data-2.3/.
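The choice of data directory for a given module version can be sketched as a small shell helper. The function name alphafold_data_dir is hypothetical, and the pairing of each module version with a directory is an assumption inferred from the directory names; confirm it on Midway3 before relying on it.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with the alphafold modules): map a module
# version to one of the data directories listed above. The version-to-directory
# pairing is an assumption based on the directory names.
alphafold_data_dir() {
  case "$1" in
    2.0.*) echo "/software/alphafold-data" ;;
    2.2.*) echo "/software/alphafold-data-2.2" ;;
    2.3.*) echo "/software/alphafold-data-2.3" ;;
    *)     echo "unknown AlphaFold version: $1" >&2; return 1 ;;
  esac
}

# Example: pick the directory that goes with alphafold/2.3.2.
DOWNLOAD_DATA_DIR=$(alphafold_data_dir 2.3.2)
echo "Using databases under $DOWNLOAD_DATA_DIR"
```

Keeping the mapping in one place helps avoid pairing a newer module with an older database directory by accident.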
AlphaFold 3
AlphaFold 3 uses a container-based approach and requires different input arguments. See the example job script below.
Example job scripts
Typically, AlphaFold 2 uses OpenMM, a GPU-accelerated molecular simulation package, to relax the predicted protein structure. OpenMM requires the CUDA toolkit to run on a GPU node.
To run on a CPU-only node, where GPU-based relaxation is not available, run the Python script run_alphafold.py with --use_gpu_relax=false so that the relaxation step does not use the GPU.
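A minimal sketch of setting this flag from the job's allocation rather than hard-coding it. The helper name gpu_relax_flag is hypothetical, and the sketch assumes that Slurm sets CUDA_VISIBLE_DEVICES only when GPUs were requested for the job.

```shell
#!/bin/bash
# Hypothetical helper: print "true" when the given device list is non-empty
# (a GPU was allocated) and "false" otherwise.
gpu_relax_flag() {
  if [ -n "$1" ]; then
    echo "true"
  else
    echo "false"
  fi
}

# Assumption: CUDA_VISIBLE_DEVICES is empty or unset on CPU-only nodes.
USE_GPU_RELAX=$(gpu_relax_flag "${CUDA_VISIBLE_DEVICES:-}")
echo "--use_gpu_relax=$USE_GPU_RELAX"
```

The resulting value can then be passed to run_alphafold.py as --use_gpu_relax=$USE_GPU_RELAX, so the same job script works on both GPU and CPU-only partitions.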
The following example job script uses the alphafold/2.3.2 module on a Midway3 GPU node with 2 GPUs and up to 16 CPU cores for multithreading.
#!/bin/bash
#SBATCH --job-name=alphafold2
#SBATCH --account=[your-accountname]
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --time=04:00:00
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:2
#SBATCH --constraint=v100
#SBATCH --mem=64G
module load alphafold/2.3.2 cuda/11.3
cd $SLURM_SUBMIT_DIR
echo "GPUs available: GPU ID $CUDA_VISIBLE_DEVICES"
echo "CPU cores: $SLURM_CPUS_PER_TASK"
DOWNLOAD_DATA_DIR=/software/alphafold-data-2.3
python run_alphafold.py \
--data_dir=$DOWNLOAD_DATA_DIR \
--uniref90_database_path=$DOWNLOAD_DATA_DIR/uniref90/uniref90.fasta \
--mgnify_database_path=$DOWNLOAD_DATA_DIR/mgnify/mgy_clusters_2022_05.fa \
--bfd_database_path=$DOWNLOAD_DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniref30_database_path=$DOWNLOAD_DATA_DIR/uniref30/UniRef30_2021_03 \
--pdb70_database_path=$DOWNLOAD_DATA_DIR/pdb70/pdb70 \
--template_mmcif_dir=$DOWNLOAD_DATA_DIR/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=$DOWNLOAD_DATA_DIR/pdb_mmcif/obsolete.dat \
--model_preset=monomer \
--max_template_date=2022-01-01 \
--db_preset=full_dbs \
--use_gpu_relax=true \
--output_dir=out_alphafold_2.3.2_multi-monomer \
--fasta_paths=T1083.fasta,T1084.fasta
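The --fasta_paths argument in the script above takes a comma-separated list of FASTA files. When there are many inputs, the list can be built in the shell instead of typed by hand. This is a minimal sketch; the helper name join_fasta_paths is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: join all arguments with commas. The function body runs
# in a subshell, so changing IFS here does not affect the calling script.
join_fasta_paths() (
  IFS=,
  echo "$*"
)

# Example with the two targets used in the job script above.
FASTA_PATHS=$(join_fasta_paths T1083.fasta T1084.fasta)
echo "--fasta_paths=$FASTA_PATHS"
```

To pick up every FASTA file in the submission directory, call join_fasta_paths *.fasta instead. Note that if no file matches, the shell passes the literal pattern through, so check that the directory is not empty first.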
For AlphaFold 3, suppose that you have downloaded a JSON file that defines the sequences for the calculation, for instance nipah_zmr.json, and placed it under /home/$USER. The following job script runs AlphaFold 3 on that file:
#!/bin/bash
#SBATCH --job-name=alphafold3
#SBATCH --account=[your-accountname]
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --time=04:00:00
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:2
#SBATCH --constraint=a100
#SBATCH --mem=32G
module load apptainer
cd $SLURM_SUBMIT_DIR
mkdir -p /tmp/$USER
DOWNLOAD_DATA_DIR=/software/alphafold3.0-el8-x86_64/databases
BIND_PATHS="/software/alphafold3.0-el8-x86_64/databases,/software/alphafold3.0-el8-x86_64/params,/software/alphafold3.0-el8-x86_64/singularity,/tmp/$USER,/home/$USER,/scratch/midway3/$USER"
# Run the Singularity container `alphafold3.sif` provided under `/software/alphafold3.0-el8-x86_64` with the .json file:
singularity exec --nv \
-B "$BIND_PATHS" \
--env CUDA_VISIBLE_DEVICES=0,1 \
--env NVIDIA_VISIBLE_DEVICES=0,1 \
/software/alphafold3.0-el8-x86_64/alphafold3.sif \
python /app/alphafold/run_alphafold.py \
--json_path=/home/$USER/nipah_zmr.json \
--db_dir=$DOWNLOAD_DATA_DIR \
--output_dir=/scratch/midway3/$USER/alphafold3_output \
--model_dir=/software/alphafold3.0-el8-x86_64/params \
--flash_attention_implementation=triton \
--run_data_pipeline=True \
--run_inference=True \
--jackhmmer_n_cpu=8 \
--nhmmer_n_cpu=8
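The file passed to --json_path describes the job name, the random seeds, and the sequences to fold. The following sketch writes a minimal single-chain input and checks that it parses. The job name, chain ID, sequence, and file location are placeholders rather than a real target, and the field names reflect the AlphaFold 3 input format as commonly documented, so verify them against the installed version.

```shell
#!/bin/bash
# Placeholder location; in practice point JSON_FILE at a path that is bound
# into the container, e.g. a file under /home/$USER.
JSON_FILE="${JSON_FILE:-${TMPDIR:-/tmp}/af3_example_input.json}"

# Minimal input with one protein chain. The sequence is a placeholder that
# simply lists the 20 standard amino acids; it is not a real protein.
cat > "$JSON_FILE" <<'EOF'
{
  "name": "example_job",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "ACDEFGHIKLMNPQRSTVWY"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF

# Catch JSON syntax errors before the job is queued (skipped if python3 is absent).
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool "$JSON_FILE" >/dev/null && echo "valid JSON: $JSON_FILE"
fi
```

Validating the file on a login node is cheap compared with waiting in the GPU queue only to have the job fail on a malformed input.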