
Tools

A set of tools that complements OOPS is available in the downloads section. The main purpose of these tools is to create the statistical parameters needed to run OOPS simulations.

PDBget

This tool downloads and updates a local copy of all the structures stored in the Protein Data Bank. From this local copy, it can create a culled subset by running PISCES (included with PDBget). The culled version of the PDB is used, in turn, to calculate a database of sequence properties, consisting of secondary structure assignments, torsional angles, Ramachandran basins and solvent accessible areas for all the structures in the culled subset.

PDBget offers the following commands:

  • get DIR
    Downloads the PDB to the directory DIR (by default, DIR = ./pdb). If data has already been downloaded to DIR, it updates the files, deleting outdated structures and adding new ones.
  • cull -h MAX_HOMOLOGY -res MAX_RESOLUTION -minl MIN_LENGTH -maxl MAX_LENGTH -R MAX_RFACTOR -pisces PISCES_DIR -pdb PDB_DIR -culled CULL_DIR [-NMR -list]
    Culls the local copy of the PDB stored in PDB_DIR, using the stand-alone version of PISCES located in PISCES_DIR. The culled files are saved to CULL_DIR. The default culling parameters are:
    MAX_HOMOLOGY = 25%
    MAX_RESOLUTION = 2.0 angstroms
    MIN_LENGTH = 20 residues
    MAX_LENGTH = 10000 residues
    MAX_RFACTOR = 0.3
    Adding the -NMR switch includes NMR structures; adding the -list switch prevents the culling files from being created.
  • genseq -in IN_DIR -out OUT_DIR -tmp TEMP_DIR -cel COIL_END_POINTS_LENGTH -cml COIL_MIN_LENGTH -connect CONNECT_FILE -rama RAMACHANDRAN_FILE -plg INPUT_PLUGIN
    Calculates the sequence properties of the structure files located in IN_DIR and saves the results to OUT_DIR (a file with extension .seq is created for each input structure). The coil assignment is controlled by the COIL_END_POINTS_LENGTH and COIL_MIN_LENGTH parameters. The Ramachandran basins are defined in RAMACHANDRAN_FILE, so changing this file generates different basin assignments. The default parameters are:
    COIL_END_POINTS_LENGTH = 1
    COIL_MIN_LENGTH = 2
    RAMACHANDRAN_FILE = RamaTypology.par
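
The default thresholds listed under the cull command can be read as a simple per-structure filter. The sketch below only illustrates that reading; it is not PDBget or PISCES code, and it leaves out the homology step, which PISCES performs. The Entry record, its field names, the placeholder identifiers, the inclusive bounds and the handling of NMR entries are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults quoted from the cull command description.
MAX_RESOLUTION = 2.0   # angstroms
MIN_LENGTH = 20        # residues
MAX_LENGTH = 10000     # residues
MAX_RFACTOR = 0.3


@dataclass
class Entry:
    """Hypothetical summary record for one structure (not a PDBget type)."""
    pdb_id: str
    method: str                  # e.g. "XRAY" or "NMR"
    length: int                  # number of residues
    resolution: Optional[float]  # angstroms; None when not applicable
    rfactor: Optional[float]     # None when not reported


def passes_thresholds(e, include_nmr=False):
    """Return True if the entry satisfies the non-homology thresholds.

    Bounds are assumed to be inclusive.
    """
    if not (MIN_LENGTH <= e.length <= MAX_LENGTH):
        return False
    if e.method == "NMR":
        # Assumed reading of the -NMR switch: NMR entries are kept only
        # when explicitly requested.
        return include_nmr
    if e.resolution is None or e.resolution > MAX_RESOLUTION:
        return False
    if e.rfactor is None or e.rfactor > MAX_RFACTOR:
        return False
    return True


# Placeholder entries; the identifiers are made up for illustration.
entries = [
    Entry("AAAA", "XRAY", 150, 1.8, 0.21),
    Entry("BBBB", "XRAY", 150, 2.6, 0.21),   # resolution too coarse
    Entry("CCCC", "XRAY", 12, 1.5, 0.18),    # too short
    Entry("DDDD", "NMR", 80, None, None),    # NMR, excluded by default
]
kept = [e.pdb_id for e in entries if passes_thresholds(e)]
print(kept)
```

Passing include_nmr=True mimics the effect described for the -NMR switch, under the same assumptions.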

TLIBgen

TLIBgen constructs the torsional libraries needed by OOPS to sample conformations in (phi, psi, omega) space. The source for these libraries is the database of sequence properties generated by PDBget.

TLIBgen offers the following commands:

  • generate -seq SEQ_DIR -tlib TLIB_DIR
    Constructs coil and native torsional libraries from the sequence data in SEQ_DIR, and saves them to TLIB_DIR (by default, SEQ_DIR = ../PDBget/seqdata and TLIB_DIR = ./tlib).
  • clean DIR
    Deletes the coil and native libraries stored in DIR.
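
The (phi, psi, omega) angles tabulated in these libraries are backbone dihedral angles. As an illustration of the quantity itself, and not of how PDBget or TLIBgen compute it, the following minimal sketch evaluates a dihedral angle from four atom positions using the standard atan2 formulation. The function names are chosen for the example only.

```python
import math


def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], for atoms p0-p1-p2-p3."""
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = cross(b2, b3)   # normal of the plane through p1, p2, p3
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Backbone angles for residue i use these atom quadruplets:
#   phi   = dihedral(C[i-1], N[i],  CA[i],  C[i])
#   psi   = dihedral(N[i],   CA[i], C[i],   N[i+1])
#   omega = dihedral(CA[i],  C[i],  N[i+1], CA[i+1])

# Toy coordinates: a planar trans arrangement and a cis arrangement.
trans = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, -1.0, 0.0))
cis = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
print(trans, cis)
```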

BLIBgen

The ab-initio mode in OOPS requires a library of Ramachandran basin fragments for the input amino acid sequence. BLIBgen generates these basin libraries using the sequence properties from PDBget together with PSI-BLAST alignments.

BLIBgen offers the following commands:

  • setup SEQ_DIR
    Generates the basin and sequence databases needed to run BLIBgen, using the sequence properties stored in SEQ_DIR (by default, SEQ_DIR = ../PDBget/seqdata).
  • generate FASTA_FILE
    Generates the basin library for the amino acid sequence in FASTA_FILE, using the standard BLIBgen parameters. The resulting library is saved to ./output/NAME, where NAME is the same as FASTA_FILE.
  • blib-blast -if FASTA_FILE -basind BASIN_DIR -blastd BLAST_DIR -t THRESHOLD -ti THRESHOLD_ITERATIONS -tm THRESHOLD_MIN -w WORD_SIZE -i ITERATIONS -min MIN_FRAG_LENGTH -max MAX_FRAG_LENGTH [-PSI]
    Runs BLIBgen on FASTA_FILE using all the specified parameters.

Decoys

Decoys is a tool intended to rank decoy sets according to user-provided energy functions. It calculates the energies of each decoy structure, and also the RMSD with the native conformation. It also generates RMSD-energy scatter plots and calculates the Z-scores of the decoy sets.

Decoys offers the following commands:

  • rank -id IN_DIR -cfg CFG_FILE
    Looks for decoy sets inside IN_DIR. Each decoy set must be contained in a separate subdirectory, where the native can be added as a file whose name ends in "native.pdb". CFG_FILE contains the list of energy functions to be computed on each set.
  • zscore -id IN_DIR -cfg CFG_FILE
    Uses the valid decoy sets it finds inside IN_DIR to construct a linear combination of the energy terms given in CFG_FILE such that the average Z-score computed over all the sets is maximized. Each decoy set must contain the files with the energy values of each desired term (which are generated with the script described above).
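
The quantity that zscore is described as maximizing can be sketched as follows. This is an illustration only, not the Decoys implementation: it assumes the common convention Z = (mean(E_decoys) - E_native) / std(E_decoys), so that a larger Z means the native lies further below the decoy energies, and it only evaluates the average Z-score for a given weight vector. The optimization method used by the tool is not specified here, and the function names and toy energies are invented for the example.

```python
from statistics import mean, pstdev


def z_score(native_energy, decoy_energies):
    """Z-score of the native relative to the decoy energy distribution.

    Assumed convention: (mean of decoys - native) / population std of decoys.
    """
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (mean(decoy_energies) - native_energy) / sigma


def combined(weights, terms):
    """Linear combination of the energy terms of one structure."""
    return sum(w * t for w, t in zip(weights, terms))


def average_z(weights, decoy_sets):
    """Average Z-score over all sets for a given weight vector.

    Each set is a (native_terms, list_of_decoy_terms) pair.
    """
    scores = []
    for native_terms, decoys in decoy_sets:
        e_native = combined(weights, native_terms)
        e_decoys = [combined(weights, d) for d in decoys]
        scores.append(z_score(e_native, e_decoys))
    return mean(scores)


# Toy data: two energy terms per structure, two decoy sets.
sets = [
    ((-10.0, 1.0), [(-4.0, 1.5), (-6.0, 0.5), (-2.0, 1.0)]),
    ((-8.0, 2.0), [(-5.0, 2.5), (-3.0, 1.5), (-1.0, 2.0)]),
]
for w in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
    print(w, round(average_z(w, sets), 3))
```

In this toy data only the first term separates the native from the decoys, so a weight vector that emphasizes it yields the higher average Z-score; a search over weights would use average_z as its objective.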