Alphafold2-I

2022-11-21 12:18:51

There are two ways to install AlphaFold2:

  1. The Docker-based installation
  2. A conda-based installation that does not use Docker

This post covers the second option:

AlphaFold Non-Docker setup

Step 1: Install conda:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh

Step 2: Create the conda environment, then update and activate it:

conda create --name alphafold python==3.8
conda update -n base conda
conda activate alphafold

Step 3: Install the dependencies:

conda install -y -c conda-forge openmm==7.5.1 cudnn==8.2.1.32 cudatoolkit==11.0.3 pdbfixer==1.7
conda install -y -c bioconda hmmer==3.3.2 hhsuite==3.3.0 kalign2==2.04
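After the installation finishes, it is worth confirming that the key search and alignment binaries are actually on the PATH of the active environment. A minimal sanity check (tool names come from the hmmer, hhsuite, and kalign2 packages installed above):

```shell
# Sanity check: are the alignment/search tools from the packages above on PATH?
# Run with the alphafold environment activated.
for tool in hmmsearch jackhmmer hhblits hhsearch kalign; do
    if command -v "$tool" > /dev/null; then
        echo "$tool: OK"
    else
        echo "$tool: MISSING"
    fi
done
```

If anything reports MISSING, re-run the corresponding conda install command before continuing.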

Step 4: Download AlphaFold2:

wget https://github.com/deepmind/alphafold/archive/refs/tags/v2.2.0.tar.gz && tar -xzf v2.2.0.tar.gz && export alphafold_path="$(pwd)/alphafold-2.2.0"
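A quick way to confirm the archive unpacked correctly is to check for the top-level entry script (run_alphafold.py ships in the repository root). This sketch assumes alphafold_path was exported as in the command above:

```shell
# Verify the source tree: run_alphafold.py sits in the repository root.
if [ -f "$alphafold_path/run_alphafold.py" ]; then
    echo "AlphaFold source tree OK: $alphafold_path"
else
    echo "Extraction failed or alphafold_path is not set"
fi
```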

Step 5: Download the datasets. This is the most painful step: the datasets are very large, and downloads are sensitive to network instability. Two download methods are provided here:

a. Use a bash script, download_db.sh:

#!/bin/bash
# Description: Downloads and unzips all required data for AlphaFold2 (AF2).
# Author: Sanjay Kumar Srikakulam

# Since parts of this script resemble AlphaFold's download scripts, the copyright and license notice is included.
# Copyright 2021 DeepMind Technologies Limited
# Licensed under the Apache License, Version 2.0 (the "License");
# You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

set -e

# Input processing
usage() {
        echo ""
        echo "Please make sure all required parameters are given"
        echo "Usage: $0 <OPTIONS>"
        echo "Required Parameters:"
        echo "-d <download_dir>     Absolute path to the AF2 download directory (example: /home/johndoe/alphafold_data)"
        echo "Optional Parameters:"
        echo "-m <download_mode>    full_dbs or reduced_dbs mode [default: full_dbs]"
        echo ""
        exit 1
}

while getopts ":d:m:" i; do
        case "${i}" in
        d)
                download_dir=$OPTARG
        ;;
        m)
                download_mode=$OPTARG
        ;;
        esac
done

if [[  $download_dir == "" ]]; then
    usage
fi

if [[  $download_mode == "" ]]; then
    download_mode="full_dbs"
fi

if [[ $download_mode != "full_dbs" && $download_mode != "reduced_dbs" ]]; then
    echo "Download mode '$download_mode' is not recognized"
    usage
fi

# Check if rsync, wget, gunzip and tar command line utilities are available
check_cmd_line_utility(){
    cmd=$1
    if ! command -v "$cmd" &> /dev/null; then
        echo "Command line utility '$cmd' could not be found. Please install."
        exit 1
    fi    
}

check_cmd_line_utility "wget"
check_cmd_line_utility "rsync"
check_cmd_line_utility "gunzip"
check_cmd_line_utility "tar"

# Make AF2 data directory structure
params="$download_dir/params"
mgnify="$download_dir/mgnify"
pdb70="$download_dir/pdb70"
pdb_mmcif="$download_dir/pdb_mmcif"
mmcif_download_dir="$pdb_mmcif/data_dir"
mmcif_files="$pdb_mmcif/mmcif_files"
uniclust30="$download_dir/uniclust30"
uniref90="$download_dir/uniref90"
uniprot="$download_dir/uniprot"
pdb_seqres="$download_dir/pdb_seqres"

download_dir=$(realpath "$download_dir")
mkdir --parents "$download_dir"
mkdir "$params" "$mgnify" "$pdb70" "$pdb_mmcif" "$mmcif_download_dir" "$mmcif_files" "$uniclust30" "$uniref90" "$uniprot" "$pdb_seqres"

# Download AF2 parameters
echo "Downloading AF2 parameters"
params_filename="alphafold_params_2022-03-02.tar"
wget -P "$params" "https://storage.googleapis.com/alphafold/alphafold_params_2022-03-02.tar"
tar --extract --verbose --file="$params/$params_filename" --directory="$params" --preserve-permissions
rm "$params/$params_filename"

# Download BFD/Reduced BFD database
if [[ "$download_mode" = "full_dbs" ]]; then
    echo "Downloading BFD database"
    bfd="$download_dir/bfd"
    mkdir "$bfd"
    bfd_filename="bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz"
    wget -P "$bfd" "https://storage.googleapis.com/alphafold-databases/casp14_versions/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz"
    tar --extract --verbose --file="$bfd/$bfd_filename" --directory="$bfd"
    rm "$bfd/$bfd_filename"
else
    echo "Downloading reduced BFD database"
    small_bfd="$download_dir/small_bfd"
    mkdir "$small_bfd"
    small_bfd_filename="bfd-first_non_consensus_sequences.fasta.gz"
    wget -P "$small_bfd" "https://storage.googleapis.com/alphafold-databases/reduced_dbs/bfd-first_non_consensus_sequences.fasta.gz"
    (cd "$small_bfd" && gunzip "$small_bfd/$small_bfd_filename")
fi

# Download MGnify database
echo "Downloading MGnify database"
mgnify_filename="mgy_clusters_2018_12.fa.gz"
wget -P "$mgnify" "https://storage.googleapis.com/alphafold-databases/casp14_versions/${mgnify_filename}"
(cd "$mgnify" && gunzip "$mgnify/$mgnify_filename")

# Download PDB70 database
echo "Downloading PDB70 database"
pdb70_filename="pdb70_from_mmcif_200401.tar.gz"
wget -P "$pdb70" "http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/old-releases/${pdb70_filename}"
tar --extract --verbose --file="$pdb70/$pdb70_filename" --directory="$pdb70"
rm "$pdb70/$pdb70_filename"

# Download PDB obsolete data
wget -P "$pdb_mmcif" "ftp://ftp.wwpdb.org/pub/pdb/data/status/obsolete.dat"

# Download PDB mmCIF database
echo "Downloading PDB mmCIF database"
rsync --recursive --links --perms --times --compress --info=progress2 --delete --port=33444 rsync.rcsb.org::ftp_data/structures/divided/mmCIF/ "$mmcif_download_dir"
find "$mmcif_download_dir/" -type f -iname "*.gz" -exec gunzip {} +
find "$mmcif_download_dir" -type d -empty -delete

for sub_dir in "$mmcif_download_dir"/*; do
  mv "$sub_dir/"*.cif "$mmcif_files"
done

find "$mmcif_download_dir" -type d -empty -delete

# Download Uniclust30 database
echo "Downloading Uniclust30 database"
uniclust30_filename="uniclust30_2018_08_hhsuite.tar.gz"
wget -P "$uniclust30" "https://storage.googleapis.com/alphafold-databases/casp14_versions/${uniclust30_filename}"
tar --extract --verbose --file="$uniclust30/$uniclust30_filename" --directory="$uniclust30"
rm "$uniclust30/$uniclust30_filename"

# Download Uniref90 database
echo "Downloading Uniref90 database"
uniref90_filename="uniref90.fasta.gz"
wget -P "$uniref90" "ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/${uniref90_filename}"
(cd "$uniref90" && gunzip "$uniref90/$uniref90_filename")

# Download Uniprot database
echo "Downloading Uniprot (TrEMBL and Swiss-Prot) database"
trembl_filename="uniprot_trembl.fasta.gz"
trembl_unzipped_filename="uniprot_trembl.fasta"
wget -P "$uniprot" "ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/${trembl_filename}"
(cd "$uniprot" && gunzip "$uniprot/$trembl_filename")

sprot_filename="uniprot_sprot.fasta.gz"
sprot_unzipped_filename="uniprot_sprot.fasta"
wget -P "$uniprot" "ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/${sprot_filename}"
(cd "$uniprot" && gunzip "$uniprot/$sprot_filename")

# Concatenate TrEMBL and Swiss-Prot, rename to uniprot and clean up.
cat "$uniprot/$sprot_unzipped_filename" >> "$uniprot/$trembl_unzipped_filename"
mv "$uniprot/$trembl_unzipped_filename" "$uniprot/uniprot.fasta"
rm "$uniprot/$sprot_unzipped_filename"

# Download PDB seqres database
wget -P "$pdb_seqres" "ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt"

echo "All AF2 required data is downloaded"

b. Use an alternative script based on aria2c.
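aria2c supports segmented, resumable downloads, which helps a lot on unstable connections. A hedged sketch of fetching a single database archive this way (the URL is the Uniref90 one from the script above; download_dir is assumed to be set to your data directory):

```shell
# Resumable download of one database file with aria2c:
#   -x 8  open up to 8 connections to the server
#   -c    continue a partially downloaded file after an interruption
#   -d    target directory
# Guarded so the snippet is a no-op until aria2c and download_dir are ready.
if command -v aria2c > /dev/null && [ -n "$download_dir" ]; then
    aria2c -x 8 -c -d "$download_dir/uniref90" \
        "ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz"
else
    echo "set download_dir and install aria2c first"
fi
```

The same pattern applies to the other archives; after an interrupted transfer, re-running the identical command resumes where it left off.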

Pick whichever download method works better for your network.
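Whichever method you use, a quick structural check once the downloads finish helps catch partial downloads before you start a multi-hour prediction. A sketch, assuming the data directory you passed to -d:

```shell
# Check that each expected database directory exists and is non-empty.
# data_dir is a hypothetical path; use the one you passed to -d.
# Depending on -m, either bfd (full_dbs) or small_bfd (reduced_dbs)
# should also be present alongside these.
data_dir=/home/johndoe/alphafold_data
for d in params mgnify pdb70 pdb_mmcif uniclust30 uniref90 uniprot pdb_seqres; do
    if [ -d "$data_dir/$d" ] && [ -n "$(ls -A "$data_dir/$d" 2>/dev/null)" ]; then
        echo "$d: OK"
    else
        echo "$d: missing or empty"
    fi
done
```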

Step 6: If the datasets downloaded successfully, congratulations: you can now run AlphaFold2.

A bash script, run_alphafold.sh, is again used to launch it:

Usage: run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir>         Path to directory of supporting data
-o <output_dir>       Path to a directory that will store the results.
-f <fasta_path>       Path to a FASTA file containing sequence. If a FASTA file contains multiple sequences, then it will be folded as a multimer
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-g <use_gpu>          Enable NVIDIA runtime to run with GPUs (default: true)
-r <run_relax>        Whether to run the final relaxation step on the predicted models. Turning relax off might result in predictions with distracting stereochemical violations but might help in case you are having issues with the relaxation stage (default: true)
-e <enable_gpu_relax> Run relax on GPU if GPU is enabled (default: true)
-n <openmm_threads>   OpenMM threads (default: all available cores)
-a <gpu_devices>      Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)
-m <model_preset>     Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multimer model (default: 'monomer')
-c <db_preset>        Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (default: 'full_dbs')
-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed (default: 'false')
-l <num_multimer_predictions_per_model> How many predictions (each with a different random seed) will be generated per model. E.g. if this is 2 and there are 5 models then there will be 10 predictions per input. Note: this FLAG only applies if model_preset=multimer (default: 5)
-b <benchmark>        Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'false')
bash run_alphafold.sh -d /media/tom/extra/new/data -o /media/tom/extra/alphafold/cai -f /media/tom/extra/alphafold/pre.fasta -t 2022-05-14 -c reduced_dbs -l 1 -r false

The command above is exactly the one I use on my server.

When the run finishes, the results are written to the output directory specified with -o.
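For reference, the output directory contains one subfolder per FASTA target, named after the input file (so pre.fasta produces pre/). A sketch of inspecting it, with the path taken from my command above and the file names being the standard AlphaFold outputs:

```shell
# Standard AlphaFold output files per target:
#   ranked_0.pdb ... ranked_4.pdb      predicted structures, highest confidence first
#   ranking_debug.json                 per-model confidence values used for the ranking
#   features.pkl / result_model_*.pkl  pickled input features and raw model outputs
#   timings.json                       wall-clock time of each pipeline stage
ls /media/tom/extra/alphafold/cai/pre 2> /dev/null || echo "output directory not found"
```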
