@@ -12,24 +12,11 @@ WARIO is a tool for the structural characterization of highly-flexible proteins.
<br/>
<br/>
## The paper
The complete and detailed presentation of the method is available in the WARIO paper: [https://doi.org/10.21203/rs.3.rs-4149901/v1](https://doi.org/10.21203/rs.3.rs-4149901/v1).
## Installing WARIO
WARIO and its required dependencies can be installed automatically if Python >=3.8 is available. We recommend performing the installation inside a [Python virtual environment](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/), which can be created as follows:
...
...
@@ -50,6 +37,21 @@ opens the ready-to-use jupyter notebook.
The installation procedure works correctly on recent versions of Linux (Ubuntu 20.04 and 22.04) and macOS (Sonoma 14.4.1). Typical install time on a standard desktop computer is around 5 minutes. If you have any trouble installing WARIO, please file an [issue](https://gitlab.laas.fr/moma/methods/analysis/WARIO/-/issues) or [contact us](mailto:javier.gonzalezdelgado@mcgill.ca).
## Running WARIO
To characterize an ensemble with WARIO, run the [contact_clustering](https://gitlab.laas.fr/moma/methods/analysis/WARIO/-/blob/main/wario/contact_clustering.ipynb) notebook, which contains the detailed pipeline and walks through the tool step by step.
The notebook takes a conformational ensemble as input and returns the weighted family of contact maps that characterizes the ensemble. Data featurization is performed first, so the user can adjust the resolution of the clustering algorithm afterwards. Clustering and post-processing then follow, and the results are displayed and saved.
### Input data
Ensembles can be provided in several of the most common data formats. WARIO accepts one .xtc file together with a topology file in any format supported by [MDTraj](https://mdtraj.org/), one multiframe .pdb file, or a folder containing one .pdb file per conformation. Users can also choose to characterize sequence segments instead of the entire sequence. Details are provided in [contact_clustering](https://gitlab.laas.fr/moma/methods/analysis/WARIO/-/blob/main/wario/contact_clustering.ipynb). Note that the current implementation of WARIO requires an all-atom representation of the protein backbone.
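
The three accepted input layouts can be told apart from the path alone. The sketch below is only an illustration of that distinction: `detect_input_kind` and its return labels are hypothetical names introduced here, not part of WARIO, and the notebook may handle its inputs differently.

```python
from pathlib import Path
from typing import Optional


def detect_input_kind(path: str, topology: Optional[str] = None) -> str:
    """Classify an ensemble input as 'xtc', 'multiframe_pdb' or 'pdb_folder'.

    Hypothetical helper for illustration only; it is not part of WARIO.
    """
    p = Path(path)
    if p.is_dir():
        # Folder input: expect one .pdb file per conformation.
        if not any(f.suffix.lower() == ".pdb" for f in p.iterdir()):
            raise ValueError(f"no .pdb files found in folder {p}")
        return "pdb_folder"
    suffix = p.suffix.lower()
    if suffix == ".xtc":
        # An .xtc trajectory stores coordinates only, so a topology is required.
        if topology is None:
            raise ValueError("an .xtc trajectory needs a topology file")
        return "xtc"
    if suffix == ".pdb":
        # A single .pdb file is treated as a multiframe ensemble.
        return "multiframe_pdb"
    raise ValueError(f"unsupported input: {p}")
```

For example, `detect_input_kind("traj.xtc", topology="top.pdb")` returns `"xtc"`, while calling it on an .xtc path without a topology raises a `ValueError`. Once the layout is known, the coordinates themselves can be read with MDTraj, for instance with `mdtraj.load(path, top=topology)` for the .xtc case.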
### Output and results
The main output of WARIO is a weighted set of $\omega$-contact maps depicting the interaction patterns that characterize each cluster. Plots with cluster-specific DSSP propensities and the average radius of gyration are also provided. The notebook can export the clusters of conformations in the same format as the input ensemble. These files can be used for further analysis of the retrieved contact patterns and for the calculation of other structural descriptors, according to the practitioner's needs.
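
As an example of a descriptor that can be recomputed from the exported clusters, the radius of gyration of a conformation with $N$ atoms at positions $r_i$ and centroid $\bar{r}$ is $R_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert r_i - \bar{r} \rVert^2}$ when all atoms are given equal mass. The sketch below averages this quantity over a cluster. The equal-mass assumption and the function names are choices made for this illustration; whether WARIO uses mass weighting is not stated here.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord]) -> float:
    """Radius of gyration of one conformation, assuming equal atomic masses."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one atom")
    # Centroid of the atoms (the centre of mass under the equal-mass assumption).
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    # Mean squared distance of the atoms to the centroid.
    msd = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
              for x, y, z in coords) / n
    return math.sqrt(msd)


def average_radius_of_gyration(cluster: Sequence[Sequence[Coord]]) -> float:
    """Average radius of gyration over the conformations of one cluster."""
    if len(cluster) == 0:
        raise ValueError("need at least one conformation")
    return sum(radius_of_gyration(c) for c in cluster) / len(cluster)
```

The result is expressed in the same length unit as the input coordinates.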
## DEMO (data example)
To test WARIO, we provide a dataset corresponding to the conformational ensemble of a small peptide, P-113, obtained from [MD simulations](https://gitlab.laas.fr/moma/methods/analysis/WARIO/-/tree/main/wario/P113). The data were extracted from [(Jephthah _et al._, 2021)](https://doi.org/10.1021/acs.jctc.1c00408), where details on the simulation and the force field can be found. The P-113 ensemble contains $n=100001$ conformations for a sequence of length $L=12$. To run WARIO on these data, the user can uncomment the lines