
WARIO: Weighted fAmilies of contact maps to chaRacterize conformational ensembles of (highly-)flexIble prOteins

WARIO is a tool for the structural characterization of highly flexible proteins. WARIO takes as input a conformational ensemble (e.g. generated from molecular dynamics simulations or other sampling methods) and represents it as a weighted family of contact maps. Contact is redefined by a continuous function, taking values in $[0,1]$, that incorporates the relative orientation of the interacting residues as well as the sequence information. The featurized data are then embedded into a 10-dimensional UMAP space and clustered using the HDBSCAN algorithm. Finally, the average values of the contact function over the conformations of each cluster are represented as cluster-specific contact maps, and each map is assigned a weight given by the cluster occupancy. The following figure illustrates the main steps of the pipeline.

[Figure: main steps of the WARIO pipeline]

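The last step of the pipeline described above (averaging the contact function within each cluster and weighting each map by the cluster occupancy) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not WARIO's implementation: `toy_contact_map` and `weighted_contact_family` are hypothetical helpers, the toy contact function uses inter-residue distances only (WARIO's function also accounts for residue orientation and sequence), the cluster labels are written by hand instead of coming from HDBSCAN on a 10-dimensional UMAP embedding, and counting noise points (label -1) in the occupancy denominator is an assumption.

```python
import numpy as np


def toy_contact_map(coords, d0=8.0, beta=2.0):
    """Hypothetical stand-in for WARIO's contact function.

    Maps each inter-residue distance d to 1 / (1 + exp(beta * (d - d0))),
    a smooth value in [0, 1]: close to 1 for nearby residues, close to 0 for
    distant ones. Unlike WARIO's function, it ignores orientation and sequence.
    coords: (L, 3) array with one representative atom per residue (assumed).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return 1.0 / (1.0 + np.exp(beta * (dist - d0)))


def weighted_contact_family(maps, labels):
    """Average contact maps within each cluster and weight them by occupancy.

    maps:   (n, L, L) array, one contact map per conformation.
    labels: (n,) integer cluster labels; -1 marks noise (HDBSCAN convention)
            and is excluded from every cluster map.
    Returns {label: (mean_map, weight)}, where weight = cluster size / n
    (noise included in n, which is an assumption of this sketch).
    """
    n = len(labels)
    family = {}
    for k in np.unique(labels):
        if k == -1:
            continue
        members = labels == k
        family[int(k)] = (maps[members].mean(axis=0), float(members.sum()) / n)
    return family


# Toy usage: 6 random conformations of a 13-residue chain, with hand-made
# labels standing in for the UMAP + HDBSCAN step (not reproduced here).
rng = np.random.default_rng(0)
coords = rng.normal(scale=6.0, size=(6, 13, 3))
maps = np.stack([toy_contact_map(c) for c in coords])
labels = np.array([0, 0, 1, 1, 1, -1])
family = weighted_contact_family(maps, labels)
for k, (cmap, weight) in family.items():
    print(f"cluster {k}: map shape {cmap.shape}, weight {weight:.3f}")
# cluster 0: map shape (13, 13), weight 0.333
# cluster 1: map shape (13, 13), weight 0.500
```

Because each cluster map is an average of values in $[0,1]$, it stays in $[0,1]$ and can be read as the per-pair contact propensity within that cluster, while the weights say how much of the ensemble each map represents.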
Running WARIO

To characterize an ensemble with WARIO, the user can directly execute the contact_clustering notebook, which contains the detailed pipeline and allows the tool to be run step by step.

The paper

The complete and detailed presentation of the method is available in the WARIO paper: https://doi.org/10.21203/rs.3.rs-4149901/v1.

Installing WARIO

WARIO and its required dependencies can be installed automatically if Python >=3.8 is available. We recommend performing the installation inside a Python virtual environment, which can be created as follows:

python3 -m venv pythonEnv
source pythonEnv/bin/activate
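Before installing, one can confirm that the interpreter of the activated environment meets the Python >=3.8 requirement. A minimal sketch; the helper `python_is_supported` is hypothetical and not part of WARIO:

```python
import sys

# Minimum interpreter version stated in the WARIO installation notes.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info):
    """Return True if (major, minor) of version_info is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


print(sys.version.split()[0], "OK" if python_is_supported() else "too old for WARIO")
```

Run with the `python3` of `pythonEnv`, it prints the interpreter version followed by whether that version is recent enough.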

Then, WARIO is installed with

pip install -U pip
pip install git+https://gitlab.laas.fr/moma/WARIO.git

Once the installation is completed, the command

wario-notebooks

opens the ready-to-use Jupyter notebook.

The installation procedure works correctly with recent versions of Linux (Ubuntu 20.04 and 22.04) and macOS (Sonoma 14.4.1). Typical install time on a standard desktop computer is around 5 minutes. If you encounter any trouble installing WARIO, please file an issue or contact us.

DEMO (data example)

WARIO can be tested on the MD simulation of the P113 protein ensemble included in this repository. This dataset contains $n=100001$ conformations of a sequence of length $L=13$. To run WARIO on the P113 ensemble, the user can uncomment the lines

#ensemble_folder = "/".join([path_to_notebook,'P113']) # Path to the folder containing trajectory files
#ensemble_name = "P113" # Name the ensemble

in the second code cell of the contact_clustering notebook and proceed as usual. The results will then be saved automatically in the P113 folder. The typical running time for this example is around 30 minutes on a standard desktop computer using 5 threads.
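For a rough sense of the data volume in this example, the sketch below estimates the memory needed to hold one contact value per residue pair for every P113 conformation. It assumes 8-byte (float64) values and either full $L \times L$ maps or only the $L(L-1)/2$ pairs with $i<j$; WARIO's actual internal representation may differ, and `contact_map_storage` is a hypothetical helper.

```python
def contact_map_storage(n_conf, length, bytes_per_value=8, upper_triangle=False):
    """Approximate bytes needed for n_conf contact maps of a length-L sequence.

    Assumes float64 values by default and either full L x L maps or only
    the L(L-1)/2 residue pairs with i < j (upper_triangle=True).
    """
    pairs = length * (length - 1) // 2 if upper_triangle else length * length
    return n_conf * pairs * bytes_per_value


# P113 demo: n = 100001 conformations, L = 13 residues
full = contact_map_storage(100001, 13)
upper = contact_map_storage(100001, 13, upper_triangle=True)
print(f"full maps:      {full / 1e6:.1f} MB")   # full maps:      135.2 MB
print(f"upper triangle: {upper / 1e6:.1f} MB")  # upper triangle: 62.4 MB
```

Either way the contact data for this example fits comfortably in the memory of a standard desktop computer.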