From 134310b5aeb7e5c9f76026e5a0f4b388ebcb76e2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Javier=20Gonz=C3=A1lez-Delgado?=
 <88046156+gonzalez-delgado@users.noreply.github.com>
Date: Tue, 27 Jun 2023 14:30:28 +0200
Subject: [PATCH] Delete CONCHA/.ipynb_checkpoints directory

---
 .../contact_clustering-checkpoint.ipynb       | 329 -------------
 .../contact_features-checkpoint.ipynb         | 245 ----------
 .../get_contacts-checkpoint.ipynb             | 130 -----
 .../get_coordinates-checkpoint.ipynb          | 192 --------
 .../.ipynb_checkpoints/utils-checkpoint.ipynb | 455 ------------------
 5 files changed, 1351 deletions(-)
 delete mode 100644 CONCHA/.ipynb_checkpoints/contact_clustering-checkpoint.ipynb
 delete mode 100644 CONCHA/.ipynb_checkpoints/contact_features-checkpoint.ipynb
 delete mode 100644 CONCHA/.ipynb_checkpoints/get_contacts-checkpoint.ipynb
 delete mode 100644 CONCHA/.ipynb_checkpoints/get_coordinates-checkpoint.ipynb
 delete mode 100644 CONCHA/.ipynb_checkpoints/utils-checkpoint.ipynb

diff --git a/CONCHA/.ipynb_checkpoints/contact_clustering-checkpoint.ipynb b/CONCHA/.ipynb_checkpoints/contact_clustering-checkpoint.ipynb
deleted file mode 100644
index d185ea3..0000000
--- a/CONCHA/.ipynb_checkpoints/contact_clustering-checkpoint.ipynb
+++ /dev/null
@@ -1,329 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "bb55010f",
-   "metadata": {},
-   "source": [
-    "### CONCHA: Contact-based characterization of conformational ensembles of highly flexible proteins\n",
-    "\n",
-    "This is the Jupyter notebook implementing the pipeline that defines CONCHA. The notebook takes an ensemble as input, featurizes its conformations using contact information and runs a clustering algorithm to retrieve the ensemble characterization. The user can follow the guidelines presented below to implement CONCHA step-by-step. First, the required libraries must be imported together with the experimental parameters that calibrate the contact function."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "09b972d2",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Required imports\n",
-    "import ipynb\n",
-    "from ipynb.fs.full.contact_features import *\n",
-    "from ipynb.fs.full.assortativity_features import *\n",
-    "from ipynb.fs.full.utils import *\n",
-    "\n",
-    "import os\n",
-    "import numpy as np\n",
-    "import hdbscan\n",
-    "path_to_notebook = os.path.abspath(os.getcwd())\n",
-    "#print(path_to_notebook) # Visualize the notebook directory\n",
-    "\n",
-    "th_file = \"/\".join([path_to_notebook,'contact_thresholds_range.txt']) # Load experimentally determined contact parameters"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0d71c087",
-   "metadata": {},
-   "source": [
-    "#### Set the folder containing the ensemble information\n",
-    "\n",
-    "The input ensemble must be given as a directory containing trajectory and topology information. The folder must contain only:\n",
-    "\n",
-    "* One .xtc file together with a topology file in one of the formats admitted by [MDTraj](https://www.mdtraj.org/1.9.8.dev0/api/generated/mdtraj.load.html#mdtraj.load) or,\n",
-    "* One multiframe .pdb file or,\n",
-    "* If the ensemble is given as a list of .pdb files (one file per conformation), one folder containing one .pdb file per conformation. A minimal sketch of how each of these layouts can be loaded is shown right after this list."
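For reference, the three accepted input layouts correspond roughly to the following MDTraj calls — a minimal sketch, with all file and folder names hypothetical:

```python
# Minimal loading sketch for the three accepted layouts (paths are hypothetical).
import glob
import mdtraj as md

# (a) One .xtc trajectory plus a topology file:
traj_a = md.load("my_ensemble/traj.xtc", top="my_ensemble/topology.pdb")

# (b) One multiframe .pdb file (each MODEL becomes one frame):
traj_b = md.load("my_ensemble/ensemble.pdb")

# (c) One folder with one .pdb file per conformation; md.load accepts a list
#     of files sharing a topology and stacks them into a single trajectory:
traj_c = md.load(sorted(glob.glob("my_ensemble/conformations/*.pdb")))

print(traj_a.n_frames, "conformations,", traj_a.topology.n_residues, "residues")
```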
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "402b3b92",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "ensemble_folder = '/directory_with_data/'\n",
-    "ensemble_name = 'my_ensemble' \n",
-    "\n",
-    "# To run an example with real data, uncomment:\n",
-    "#ensemble_folder = \"/\".join([path_to_notebook,'P113']) # Path to the folder containing trajectory files\n",
-    "#ensemble_name = \"P113\" # Name the ensemble"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b8029a85",
-   "metadata": {},
-   "source": [
-    "We ask you not to choose ```ensemble_folder``` as a folder where multiple or redundant ensembles are located, but to create a specific directory per ensemble. The analysis can be launched in an interactive mode, which checks with the user whether the input data has been correctly understood. This mode is activated by setting the parameter ```interactive``` to ```True``` in the function ```contact_features``` described below. If you choose to set ```interactive``` to ```False```, the function will automatically interpret the input and launch the computation without checking with the user.\n",
-    "\n",
-    "All the results produced along the pipeline will be saved in subdirectories of ```ensemble_folder```. If the computation for the same ensemble has to be repeated from the beginning, we ask you to empty ```ensemble_folder``` or to create a new directory containing only the ensemble input data."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c3593eaf",
-   "metadata": {},
-   "source": [
-    "### 1. Data featurization\n",
-    "#### Compute contact weights and embed data into a low-dimensional UMAP space\n",
-    "\n",
-    "The function ```contact_features``` takes the input ensemble data and returns the $n\\times L(L-1)/2$ matrix $W$ containing the contact information, where $n$ is the number of conformations and $L$ the sequence length. The output is automatically saved in ```ensemble_folder``` as a .txt file. The arguments of ```contact_features``` are:\n",
-    "\n",
-    "* ```ensemble_name```: The name of the ensemble, to appear in the results files and plots,\n",
-    "* ```ensemble_path```: The directory containing the ensemble information, as described above,\n",
-    "* ```N_cores```: The number of threads to use in parallel computation,\n",
-    "* ```interactive```: Whether to check with the user that data has been correctly understood,\n",
-    "* ```thresholds```: The file containing the parameters that calibrate the contact function.\n",
-    "\n",
-    "The function computes and saves $W$ and its embeddings into 2-dimensional and 10-dimensional UMAP spaces."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "70f5c839",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "contact_features(ensemble_name = ensemble_name, ensemble_path = ensemble_folder, N_cores = 1, interactive = True, thresholds = th_file)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e98848e5",
-   "metadata": {},
-   "source": [
-    "### 2. Contact-based clustering\n",
-    "\n",
-    "Once data has been featurized with contact information, and embedded into a low-dimensional UMAP space, clustering can be performed to get the ensemble characterization.\n",
-    "\n",
-    "#### Perform HDBSCAN clustering in the low-dimensional space"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "75f335b4",
-   "metadata": {},
-   "source": [
-    "First, load the featurized data frame and its embeddings into the low-dimensional UMAP spaces. These have been automatically saved by ```contact_features``` in the following directories; a sketch of how the columns of $W$ map back to residue pairs is given right below."
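Since $W$ stores one column per residue pair, it can help to see how a column index maps back to its pair. A minimal sketch, assuming the columns follow the `itertools.combinations` ordering used in `get_contacts`:

```python
# Sketch: mapping a column index of W back to a residue pair (1-based positions).
# Assumes columns are ordered as itertools.combinations, consistent with
# get_contacts; L is recovered from the column count n_cols = L*(L-1)/2.
import itertools
import numpy as np

n_cols = 45                                    # e.g. a toy chain with L = 10
L = int(0.5 * (1 + np.sqrt(1 + 8 * n_cols)))   # invert n_cols = L*(L-1)/2
pairs = list(itertools.combinations(range(1, L + 1), 2))

print(L)          # 10
print(pairs[0])   # (1, 2): first column corresponds to the (1,2) pair
print(pairs[-1])  # (9, 10): last column
```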
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "80d44e80",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Directory containing results\n",
-    "results_path = \"/\".join([os.path.abspath(ensemble_folder),\"_\".join(['results',ensemble_name])])\n",
-    "\n",
-    "# Matrix W with contact information\n",
-    "wcont_data = pd.read_csv(\"/\".join([results_path, \"_\".join([ensemble_name,'wcontmatrix.txt'])]), sep = ' ', header = None)\n",
-    "\n",
-    "# Embedding of W into a 2-dimensional UMAP space for visualization\n",
-    "embedding_2d = np.load('/'.join([results_path, \"_\".join([ensemble_name,'embedding_2d_wcont.npy'])]))\n",
-    "\n",
-    "# Embedding of W into a 10-dimensional UMAP space for clustering\n",
-    "clusterable_embedding = np.load('/'.join([results_path, \"_\".join([ensemble_name,'clusterable_embedding_wcont.npy'])]))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a53758cd",
-   "metadata": {},
-   "source": [
-    "Now, we can perform HDBSCAN clustering in the low-dimensional UMAP space.\n",
-    "\n",
-    "**Minimum cluster size**\n",
-    "\n",
-    "The clustering is calibrated through the minimum number of conformations that can define a cluster. This quantity will indirectly define the number of retrieved clusters. The user may choose an appropriate value according to the desired level of precision in the classification and the results obtained. The default minimum cluster size is set to 1% of the total number of conformations. Long sequences might require a smaller minimum cluster size."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "76c8d967",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Choose minimum cluster size\n",
-    "min_cluster_size = int(wcont_data.shape[0]*0.01) # Default = 1% of conformations"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c26a5745",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Perform HDBSCAN clustering\n",
-    "labels_umap = hdbscan.HDBSCAN(\n",
-    "    min_samples = 10,\n",
-    "    min_cluster_size = min_cluster_size, \n",
-    ").fit_predict(clusterable_embedding)\n",
-    "\n",
-    "classified = np.where(labels_umap >= 0)[0]\n",
-    "print(\"\".join([\"\\nThe clustering algorithm found \",str(len(np.unique(labels_umap[labels_umap >= 0]))),\" clusters and classified \",str(np.round(100*len(classified)/len(labels_umap),2)),\"% of the conformations. \\n\"]))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "53f0ce88",
-   "metadata": {},
-   "source": [
-    "#### Results visualization\n",
-    "\n",
-    "The clustering partition is visualized on the 2-dimensional UMAP space. This illustrates the distribution of conformations among clusters and their corresponding occupancies. By looking at the number of connected components in the space, the minimum cluster size might be re-calibrated. Note that unclassified points appear in gray.\n",
-    "\n",
-    "The function ```plot_2umap``` is documented in the notebook [utils](utils.ipynb)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "616b989a",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "plot_2umap(embedding_2d, labels_umap, ensemble_name, results_path)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "dc831da1",
-   "metadata": {},
-   "source": [
-    "### 3. Cluster-specific $\\omega$-contact maps\n",
-    "\n",
-    "Compute the ensemble characterization, defined as a weighted family of cluster-specific $\\omega$-contact maps; a sketch of how a single map is derived from $W$ is given right below."
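Conceptually, each cluster-specific map is the column-wise mean of $W$ over the cluster's conformations, reshaped into an $L \times L$ matrix. A minimal sketch on toy data (not the packaged ```get_wmaps``` routine, which also handles plotting and file output):

```python
# Sketch: one cluster-specific contact map as the column mean of W over the
# cluster, reshaped into an L x L matrix. Toy data stands in for real output.
import itertools
import numpy as np

rng = np.random.default_rng(0)
L, n = 10, 200
W = rng.random((n, L * (L - 1) // 2))        # toy contact-weight matrix
labels = rng.integers(-1, 3, size=n)         # toy HDBSCAN labels (-1 = noise)

cluster = 0
mean_weights = W[labels == cluster].mean(axis=0)  # average weight per pair

wmap = np.zeros((L, L))
rows, cols = zip(*itertools.combinations(range(L), 2))
wmap[rows, cols] = mean_weights              # fill the upper triangle
wmap = wmap + wmap.T                         # symmetrize for plotting

print(wmap.shape, round(wmap[0, 1], 3))
```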
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9e821614",
-   "metadata": {},
-   "source": [
-    "#### 3.1 Plot cluster-specific $\\omega$-contact maps"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "396acb0c",
-   "metadata": {},
-   "source": [
-    "Run the cell below to create and save cluster-specific $\\omega$-contact maps. Plots are saved in a new subdirectory of ```ensemble_folder```. The function ```get_wmaps``` is documented in [utils](utils.ipynb)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "96f6dcf0",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "get_wmaps(wcont_data, labels_umap, ensemble_name, results_path)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f9916b8d",
-   "metadata": {},
-   "source": [
-    "#### 3.2 Create cluster-specific files (for .xtc trajectories) or folders (for .pdb folders)\n",
-    "\n",
-    "The clustering partition can be used to build new cluster-specific files containing the cluster conformations. The files are automatically saved in a new directory inside ```ensemble_folder```. Note that the files will be produced in the same format as the input data: one .xtc file per cluster (if data is given as a .xtc file with a topology file, or as a multiframe .pdb file) or one folder per cluster containing its conformations (if data is given as a folder with one .pdb file per conformation). The function ```get_cluster_files``` is documented in [utils](utils.ipynb)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b5d146a1",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "get_cluster_files(ensemble_path = ensemble_folder, ensemble_name = ensemble_name, clustering_partition = labels_umap)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "290cce9e",
-   "metadata": {},
-   "source": [
-    "#### 3.3 Sample a representative family of conformations\n",
-    "\n",
-    "The ensemble characterization can be used to sample a representative family of conformations of a given size. This is done by sampling conformations from clusters with probabilities given by the cluster occupancies. In other words, if $p_1,\\ldots,p_K$ are the (normalized) occupancies of clusters $\\mathcal{C}_1,\\ldots,\\mathcal{C}_K$ respectively, sample from the distribution\n",
-    "\n",
-    "$$\n",
-    "    p_1\\mathcal{U}(\\mathcal{C}_1)+\\cdots + p_K\\mathcal{U}(\\mathcal{C}_K),\n",
-    "$$\n",
-    "\n",
-    "where $\\mathcal{U}(\\mathcal{S})$ denotes the discrete uniform distribution on the set $\\mathcal{S}\\subset\\lbrace 1,\\ldots,n\\rbrace$. This is performed by the function ```representative_ensemble``` below, which needs to be given the ```size``` of the representative family (in number of conformations) as an argument. Its complete documentation can be found in [utils](utils.ipynb)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2eb7586a",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "representative_ensemble(size = 10, ensemble_path = ensemble_folder, ensemble_name = ensemble_name, clustering_partition = labels_umap)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b89412b2",
-   "metadata": {},
-   "source": [
-    "#### 3.4 Secondary structure propensities and average radius of gyration\n",
-    "\n",
-    "The function below computes the average DSSP secondary structure propensities per cluster, together with the average radius of gyration across cluster conformations. 
The output is given as a plot" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "68305f88", - "metadata": {}, - "outputs": [], - "source": [ - "cluster_descriptors(ensemble_path = ensemble_folder, ensemble_name = ensemble_name, clustering_partition = labels_umap)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.16" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/CONCHA/.ipynb_checkpoints/contact_features-checkpoint.ipynb b/CONCHA/.ipynb_checkpoints/contact_features-checkpoint.ipynb deleted file mode 100644 index 3eb06c2..0000000 --- a/CONCHA/.ipynb_checkpoints/contact_features-checkpoint.ipynb +++ /dev/null @@ -1,245 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "id": "3ff6fe72", - "metadata": {}, - "outputs": [], - "source": [ - "# Load notebooks with required functions\n", - "import ipynb\n", - "from ipynb.fs.full.wcontact_matrix import *\n", - "\n", - "# Load required libraries\n", - "import numpy as np\n", - "import os\n", - "from tqdm import tqdm \n", - "from joblib import Parallel, delayed \n", - "from functools import partial\n", - "import mdtraj as md \n", - "import itertools\n", - "import pandas as pd \n", - "import MDAnalysis\n", - "import time\n", - "import umap\n", - "import numba\n", - "import pynndescent\n", - "import sys\n", - "import warnings #Optional\n", - "warnings.filterwarnings(\"ignore\") #Optional" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d158cc6b", - "metadata": {}, - "outputs": [], - "source": [ - "def contact_features(ensemble_path, ensemble_name, thresholds, interactive = True, N_cores = 1, start = None, end = None, sel_chain = None, umap_dim = 10):\n", - " \n", - " # Initial parameters\n", - " var_dict = {'multiframe' : 'n', 'check_folder' : True, 'do_xtc' : False, 'do_pdb' : False,\n", - " 'N' : 1, 'start' : start, 'end' : end,\n", - " 'ensemble_name' : ensemble_name, 'ensemble_path' : ensemble_path}\n", - " \n", - " var_dict['xtc_files'] = [file for file in os.listdir(ensemble_path) if file.endswith(\".xtc\")] \n", - " var_dict['pdb_files'] = [file for file in os.listdir(ensemble_path) if file.endswith(\".pdb\") or file.endswith(\".prmtop\") or file.endswith(\".gsd\") or file.endswith(\".hdf5\") or file.endswith(\".mol2\") or file.endswith(\".hoomdxml\") or file.endswith(\".prm7\") or file.endswith(\".arc\") or file.endswith(\".parm7\") or file.endswith(\".gro\") or file.endswith(\".pdb.gz\") or file.endswith(\".pdb.gz\") or file.endswith(\".h5\") or file.endswith(\".lh5\") or file.endswith(\".psf\")]\n", - " var_dict['folders'] = [file for file in os.listdir(ensemble_path) if (os.path.isdir(\"/\".join([ensemble_path,file])) and not file.startswith('.'))]\n", - " \n", - " \n", - " \n", - " print(\"\\n----------------------------------------------------------------------------------\\n\")\n", - " print(' \\(·_·) \\(·_·)')\n", - " print(' ) )z This is CONCHA! 
) )z')\n", - " print(\" / \\\\ / \\\\ \\n\")\n", - " if interactive == True:\n", - " print(\"Before launching the computation, let me check I understood everything correctly...\")\n", - " print(\"\\n----------------------------------------------------------------------------------\\n\")\n", - " \n", - " # File processing\n", - " \n", - " print(\"\".join([\"For the ensemble named \",var_dict[\"ensemble_name\"],', I found ',\n", - " str(len(var_dict[\"xtc_files\"])),' .xtc file(s), ',str(len(var_dict[\"pdb_files\"])),' .pdb file(s) and ',\n", - " str(len(var_dict[\"folders\"])),' folder(s).']))\n", - " \n", - " if len(var_dict[\"xtc_files\"]) + len(var_dict[\"folders\"]) + len(var_dict[\"pdb_files\"]) == 0:\n", - " sys.exit(\"\".join(['Folder for ', var_dict[\"ensemble_name\"], ' ensemble is empty...']))\n", - " \n", - " # .xtc file with a .pdb topology file\n", - " \n", - " if len(var_dict[\"xtc_files\"]) >= len(var_dict[\"pdb_files\"]) and len(var_dict[\"pdb_files\"]) == 1:\n", - " \n", - " if interactive == True:\n", - " print('\\nShould I interprete this input as:\\n')\n", - " else:\n", - " print('\\nI will interprete this input as:\\n')\n", - " print(\"\".join([str(var_dict[\"xtc_files\"][0]),' : trajectory of ',var_dict[\"ensemble_name\"],',']))\n", - " print(\"\".join([str(var_dict[\"pdb_files\"][0]),' : topology file of ',var_dict[\"ensemble_name\"],'.']))\n", - " if len(var_dict[\"xtc_files\"]) > 1:\n", - " print(\"\\nMore than one .xtc file were found. Taking the first as the trajectory file.\\n\")\n", - " if interactive == True:\n", - " ens_input = input('...? (y/n)')\n", - " else:\n", - " ens_input = 'y'\n", - " if ens_input == 'n':\n", - " var_dict['multiframe'] = input(\"Should I ignore .xtc files and consider the .pdb file as a multiframe file? (y/n)\")\n", - " else:\n", - " var_dict[\"do_xtc\"] = True\n", - " var_dict[\"xtc_root_path\"] = var_dict[\"ensemble_path\"]\n", - " var_dict['check_folder'] = False\n", - " \n", - " # multiframe .pdb files\n", - " \n", - " if var_dict['multiframe'] == 'y' or (len(var_dict[\"pdb_files\"]) >= 1 and len(var_dict[\"xtc_files\"]) == 0):\n", - " \n", - " if interactive == True:\n", - " print('\\nShould I interprete this input as:\\n')\n", - " else:\n", - " print('\\nI will interprete this input as:\\n') \n", - " print(\"\".join([str(var_dict[\"pdb_files\"][0]),' : trajectory of ',var_dict[\"ensemble_name\"],'.']))\n", - " \n", - " if len(var_dict[\"pdb_files\"]) > 1:\n", - " print(\"\\nMore than one multiframe .pdb file were found. Taking the first as the trajectory file.\\n\")\n", - " if interactive == True:\n", - " ens_input = input('...? 
(y/n)')\n", - " else: \n", - " ens_input = 'y'\n", - " \n", - " if ens_input == 'y':\n", - " print('Trajectory has been given as multiframe .pdb file, which is not supported.')\n", - " print(\"Converting file to .xtc + topology .pdb...\\n \")\n", - " if not os.path.exists(\"/\".join([var_dict[\"ensemble_path\"],'converted_files'])):\n", - " os.mkdir(\"/\".join([var_dict[\"ensemble_path\"],'converted_files']))\n", - " multiframe_pdb_to_xtc(pdb_file = \"/\".join([var_dict[\"ensemble_path\"],var_dict[\"pdb_files\"][0]]), save_path = \"/\".join([var_dict[\"ensemble_path\"],'converted_files']), prot_name = var_dict[\"pdb_files\"][0].split('.pdb')[0])\n", - " print(\"Done.\")\n", - " var_dict[\"do_xtc\"] = True\n", - " var_dict[\"xtc_root_path\"] = \"/\".join([var_dict[\"ensemble_path\"],'converted_files'])\n", - " var_dict[\"xtc_files\"] = [file for file in os.listdir(var_dict[\"xtc_root_path\"]) if file.endswith(\".xtc\")]\n", - " var_dict[\"pdb_files\"] = [file for file in os.listdir(var_dict[\"xtc_root_path\"]) if file.endswith(\".pdb\")]\n", - " var_dict['check_folder'] = False\n", - " \n", - " # folder with .pdb files\n", - " \n", - " if len(var_dict[\"folders\"]) >= 1 and var_dict['check_folder'] == True:\n", - " \n", - " if interactive:\n", - " print('\\nShould I interprete this input as:\\n')\n", - " else:\n", - " print('\\nI will interprete this input as:\\n')\n", - " print(\"\".join([var_dict[\"folders\"][0],' folder contains: trajectory of ',var_dict[\"ensemble_name\"],\".\"]))\n", - " \n", - " if len(var_dict[\"folders\"]) > 1:\n", - " print(\"\\nMore than one .pdb folder were found. Taking the first as the trajectory folder.\\n\")\n", - " \n", - " if interactive:\n", - " ens_input = input('...? (y/n)')\n", - " else:\n", - " ens_input = 'y'\n", - " if ens_input == 'y':\n", - " var_dict[\"do_pdb\"] = True\n", - " \n", - " if not var_dict[\"do_pdb\"] and not var_dict[\"do_xtc\"]:\n", - " sys.exit(\"\".join(['\\n Sorry, I did not understood the input. Please follow the guidelines described in the function documentation to create ',ensemble_name,' folder.\\n'])) \n", - " \n", - " print(\"\\n----------------------------------------------------------------------------------\\n\")\n", - " \n", - " if interactive == True:\n", - " print(\"Everything seems OK!\\n\")\n", - " print(\"\".join(['There are ',str(os.cpu_count()),' threads (cores) available.']))\n", - " n_cores = int(input(\"Please specify the number of threads (cores) you would like to use (positive integer):\"))\n", - " else:\n", - " if N_cores == 'max':\n", - " n_cores = int(os.cpu_count())\n", - " else:\n", - " n_cores = int(N_cores)\n", - " \n", - " if not os.path.exists(\"/\".join([os.path.abspath(ensemble_path),\"_\".join(['results',ensemble_name])])):\n", - " os.mkdir(\"/\".join([os.path.abspath(ensemble_path),\"_\".join(['results',ensemble_name])]))\n", - " results_path = \"/\".join([os.path.abspath(ensemble_path),\"_\".join(['results',ensemble_name])])\n", - " if os.path.exists(\"/\".join([os.path.abspath(ensemble_path),\"_\".join(['results',ensemble_name])])):\n", - " if len(os.listdir(\"/\".join([os.path.abspath(ensemble_path),\"_\".join(['results',ensemble_name])]))) == 0:\n", - " results_path = \"/\".join([os.path.abspath(ensemble_path),\"_\".join(['results',ensemble_name])])\n", - " else:\n", - " sys.exit(\"\".join(['The folder ', \"/\".join([os.path.abspath(ensemble_path),\"_\".join(['results',ensemble_name])]),' already exists and it is not empty. 
Please empty or delete it.'])) \n", - " \n", - " print(\"\\n----------------------------------------------------------------------------------\\n\")\n", - " print(\"3...\"); time.sleep(1); \n", - " print(\"2...\"); time.sleep(1)\n", - " print(\"1...\"); time.sleep(1)\n", - " print(\"Go!\"); time.sleep(0.2)\n", - " print(\"\\n----------------------------------------------------------------------------------\")\n", - " \n", - " # Build frames and save coordinates\n", - " \n", - " print('\\nComputing contact weights for ' + var_dict[\"ensemble_name\"] + '...\\n')\n", - " \n", - " if var_dict[\"do_xtc\"] == True:\n", - " \n", - " wcontact_matrix(thresholds, xtc_file = \"/\".join([var_dict[\"xtc_root_path\"],var_dict[\"xtc_files\"][0]]), top_file = \"/\".join([var_dict[\"xtc_root_path\"],var_dict[\"pdb_files\"][0]]),\n", - " pdb_folder = None, num_cores = n_cores, prot_name = ensemble_name, save_to = results_path,\n", - " start = var_dict[\"start\"], end = var_dict[\"end\"], \n", - " select_chain = sel_chain,\n", - " name_variable = 'ipynb.fs.full.wcontact_matrix')\n", - " \n", - " if var_dict[\"do_pdb\"] == True:\n", - " \n", - " wcontact_matrix(thresholds, xtc_file = None, top_file = None, pdb_folder = \"/\".join([var_dict[\"ensemble_path\"],var_dict[\"folders\"][0]]), num_cores = n_cores,\n", - " prot_name = ensemble_name, save_to = results_path,\n", - " start = var_dict[\"start\"], end = var_dict[\"end\"],\n", - " select_chain = sel_chain,\n", - " name_variable = 'ipynb.fs.full.wcontact_matrix')\n", - " \n", - " print(\"\\n----------------------------------------------------------------------------------\\n\")\n", - " print(\"Contact weights computed.\\n\")\n", - " print(\"Embedding data into a 2-dimensional UMAP space for visualization...\\n\")\n", - " \n", - " wcont_data = pd.read_csv(\"/\".join([results_path, \"_\".join([ensemble_name,'wcontmatrix.txt'])]), sep = ' ', header = None)\n", - "\n", - " embedding_2d = umap.UMAP(random_state = 42,\n", - " n_neighbors = 15,\n", - " min_dist = 0.1).fit_transform(wcont_data)\n", - " np.save('/'.join([results_path, \"_\".join([ensemble_name,'embedding_2d_wcont'])]), embedding_2d)\n", - "\n", - " \n", - " print(\"\".join([\"Done! Embedding data into a \",str(umap_dim),\"-dimensional UMAP space...\\n\"]))\n", - " \n", - " clusterable_embedding = umap.UMAP(\n", - " n_neighbors = 30,\n", - " min_dist = 0.0,\n", - " n_components = umap_dim,\n", - " random_state = 42\n", - " ).fit_transform(wcont_data)\n", - " np.save('/'.join([results_path, \"_\".join([ensemble_name,'clusterable_embedding_wcont'])]), clusterable_embedding)\n", - " \n", - " \n", - " print(\"\\n----------------------------------------------------------------------------------\\n\")\n", - " print(\"Done! 
Clustering can be performed in the low-dimensional space.\")\n", - " \n", - " " - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.16" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/CONCHA/.ipynb_checkpoints/get_contacts-checkpoint.ipynb b/CONCHA/.ipynb_checkpoints/get_contacts-checkpoint.ipynb deleted file mode 100644 index 918003b..0000000 --- a/CONCHA/.ipynb_checkpoints/get_contacts-checkpoint.ipynb +++ /dev/null @@ -1,130 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "id": "5417c4b3", - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import os\n", - "import mdtraj as md \n", - "import itertools\n", - "import pandas as pd \n", - "import warnings #Optional\n", - "warnings.filterwarnings(\"ignore\") #Optional" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "57a23fbe", - "metadata": {}, - "outputs": [], - "source": [ - "def get_contacts(coor_conf, threshold_file, assort = False):\n", - "\n", - " L = int(0.5*(1 + np.sqrt(1 + 8*np.shape(coor_conf)[0])))\n", - " pos_pairs = np.array(list(itertools.combinations(range(L), 2)))\n", - "\n", - " contact_thresholds = pd.read_csv(threshold_file, sep=\" \", header=0)\n", - " contact_thresholds['th11'] = np.deg2rad(contact_thresholds['th11'])\n", - " contact_thresholds['th12'] = np.deg2rad(contact_thresholds['th12'])\n", - " contact_thresholds['th13'] = np.deg2rad(contact_thresholds['th13'])\n", - " contact_thresholds['th21'] = np.deg2rad(contact_thresholds['th21'])\n", - " contact_thresholds['th22'] = np.deg2rad(contact_thresholds['th22'])\n", - " contact_thresholds['th23'] = np.deg2rad(contact_thresholds['th23'])\n", - " \n", - " add = pd.DataFrame(contact_thresholds)\n", - " add.columns = contact_thresholds.columns\n", - " add = add.loc[(add.AA1 - add.AA2 != 0)]\n", - " add[['AA1','AA2']] = add[['AA2','AA1']].values\n", - " contact_thresholds = pd.concat([contact_thresholds,add], ignore_index = True)\n", - " contact_thresholds['AA1'] = contact_thresholds.AA1.astype('int')\n", - " contact_thresholds['AA2'] = contact_thresholds.AA2.astype('int')\n", - " contact_thresholds['range'] = contact_thresholds.range.astype('int')\n", - " contact_thresholds['AA1-AA2-range'] = contact_thresholds.AA1.astype(str) + '-' + contact_thresholds.AA2.astype(str) + '-' + contact_thresholds.range.astype(str)\n", - " contact_thresholds = contact_thresholds[['AA1-AA2-range','delta_min','delta_max','delta', 'th11', 'th12', 'th13', 'th21', 'th22', 'th23', 'delta_se3_min', 'delta_se3_max']]\n", - " mins = np.minimum(contact_thresholds.delta_min.values,contact_thresholds.delta_max.values)\n", - " maxs = np.maximum(contact_thresholds.delta_min.values,contact_thresholds.delta_max.values)\n", - " contact_thresholds.delta_min = mins\n", - " contact_thresholds.delta_max = maxs\n", - "\n", - " coor_conf = pd.DataFrame(np.concatenate([coor_conf, pos_pairs], axis = 1),\n", - " columns = ['coor_x','coor_y','coor_z','or1_x','or1_y','or1_z','or2_x','or2_y','or2_z','AA1','AA2','pos1','pos2'])\n", - " coor_conf.range = np.abs(coor_conf['pos1'] - coor_conf['pos2'])\n", - " coor_conf.range = (coor_conf.range*(coor_conf.range<5) + 5*(coor_conf.range>=5)).astype('int') 
\n",
-    "    coor_conf['AA1'] = coor_conf.AA1.astype('int')\n",
-    "    coor_conf['AA2'] = coor_conf.AA2.astype('int')\n",
-    "    coor_conf['AA1-AA2-range'] = coor_conf.AA1.astype(str) + '-' + coor_conf.AA2.astype(str) + '-' + coor_conf.range.astype(str)\n",
-    "    \n",
-    "    #coor_conf = coor_conf.join(vaex.from_pandas(contact_thresholds), left_on = 'AA1-AA2-range', right_on = 'AA1-AA2-range', how = 'left')\n",
-    "    coor_conf = coor_conf.merge(contact_thresholds, on = 'AA1-AA2-range', how = 'left')\n",
-    "    \n",
-    "    # For range <= 4, we correct distance by admissible orientations\n",
-    "    coor_conf['min_th1'] = np.nanmin([np.abs(np.arccos(coor_conf['or1_x']) - coor_conf['th11']),\n",
-    "                                      np.abs(np.arccos(coor_conf['or1_x']) - coor_conf['th12']), \n",
-    "                                      np.abs(np.arccos(coor_conf['or1_x']) - coor_conf['th13'])], axis = 0)\n",
-    "    coor_conf['min_th2'] = np.nanmin([np.abs(np.arccos(coor_conf['or2_z']) - coor_conf['th21']),\n",
-    "                                      np.abs(np.arccos(coor_conf['or2_z']) - coor_conf['th22']), \n",
-    "                                      np.abs(np.arccos(coor_conf['or2_z']) - coor_conf['th23'])], axis = 0) \n",
-    "    coor_conf['dis_th1'] = 0.5*(np.sin(coor_conf.min_th1)**2*(coor_conf.min_th1 < np.pi/2) + (1 + np.cos(coor_conf.min_th1)**2)*(coor_conf.min_th1 >= np.pi/2))\n",
-    "    coor_conf['dis_th2'] = 0.5*(np.sin(coor_conf.min_th2)**2*(coor_conf.min_th2 < np.pi/2) + (1 + np.cos(coor_conf.min_th2)**2)*(coor_conf.min_th2 >= np.pi/2))\n",
-    "    \n",
-    "    alpha = beta = 0.5\n",
-    "    coor_conf['dis_r3'] = np.sqrt(coor_conf.coor_x**2 + coor_conf.coor_y**2 + coor_conf.coor_z**2)\n",
-    "    coor_conf['dis_or'] = np.sqrt(alpha*coor_conf['dis_th1']**2 + beta*coor_conf['dis_th2']**2) # Weighted combination of the two orientation distances\n",
-    "    \n",
-    "    argtanh = lambda x: 0.5*np.log((1+x)/(1-x))\n",
-    "    coor_conf.loc[coor_conf.delta_min < 2, 'delta_min'] = 2 # Assign via .loc so the filtered update persists\n",
-    "    coor_conf.loc[coor_conf.delta_max <= 2, 'delta_max'] = 3 \n",
-    "    coor_conf['d'] = np.log(argtanh(1/coor_conf.delta_min))/np.log(coor_conf.delta_min/coor_conf.delta_max)\n",
-    "    coor_conf['w_or_pos'] = 1-np.tanh((coor_conf.dis_r3/coor_conf.delta_max)**coor_conf.d)\n",
-    "    coor_conf['a'] = 0.5*np.sqrt(argtanh(1-coor_conf.w_or_pos))\n",
-    "    coor_conf['w_or_or'] = 1-np.tanh((2*(coor_conf.dis_or+coor_conf.a))**2)\n",
-    "    coor_conf.loc[coor_conf.w_or_pos == 0, 'w_or_or'] = 0\n",
-    "    coor_conf['dis_or'] = coor_conf['dis_or'].fillna(0)\n",
-    "    coor_conf['w_or_or'] = coor_conf['w_or_or'].fillna(0)\n",
-    "    coor_conf['dis_se3'] = np.sqrt((1-coor_conf.w_or_or)**2*coor_conf.dis_r3**2 + coor_conf.w_or_or**2*coor_conf.dis_or**2) \n",
-    "    \n",
-    "    coor_conf.loc[coor_conf['delta_se3_min'].isnull(), 'delta_se3_min'] = coor_conf.loc[coor_conf['delta_se3_min'].isnull(), 'delta_min']\n",
-    "    coor_conf.loc[coor_conf['delta_se3_min'] < 2, 'delta_se3_min'] = 2 \n",
-    "    coor_conf.loc[coor_conf['delta_se3_max'].isnull(), 'delta_se3_max'] = coor_conf.loc[coor_conf['delta_se3_max'].isnull(), 'delta_max']\n",
-    "    coor_conf.loc[coor_conf['delta_se3_max'] <= 2, 'delta_se3_max'] = 3 \n",
-    "    coor_conf['d_se3'] = np.log(argtanh(1/coor_conf.delta_se3_min))/np.log(coor_conf.delta_se3_min/coor_conf.delta_se3_max)\n",
-    "    coor_conf['w_dis_se3'] = 1-np.tanh((coor_conf.dis_se3/coor_conf.delta_se3_max)**coor_conf.d_se3)\n",
-    "    \n",
-    "    coor_conf = coor_conf[['pos1','pos2','w_dis_se3','AA1', 'AA2']]\n",
-    "    coor_conf['pos1'] = coor_conf.pos1.astype('int') + 1\n",
-    "    coor_conf['pos2'] = coor_conf.pos2.astype('int') + 1 \n",
-    "    \n",
-    "    if assort:\n",
-    "        return coor_conf\n",
-    "    else:\n",
-    "        return coor_conf.w_dis_se3"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
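The tanh switching function at the core of `get_contacts` can be sanity-checked in isolation: the exponent d is chosen so that w(delta_min) = 1 - 1/delta_min, and w decays smoothly towards 0 beyond delta_max. A minimal sketch, with made-up thresholds standing in for one residue-pair class from the calibration file:

```python
# Sketch: the switching function used in get_contacts, in isolation.
# delta_min / delta_max are made-up thresholds for a single residue-pair class.
import numpy as np

def contact_weight(r, delta_min=5.0, delta_max=9.0):
    """w(r) = 1 - tanh((r/delta_max)^d), with d fixed by delta_min/delta_max."""
    argtanh = lambda x: 0.5 * np.log((1 + x) / (1 - x))
    d = np.log(argtanh(1 / delta_min)) / np.log(delta_min / delta_max)
    return 1 - np.tanh((r / delta_max) ** d)

for r in (3.0, 5.0, 9.0, 15.0):  # distances in the same units as the thresholds
    print(r, round(contact_weight(r), 3))
```

The result is a weight bounded in (0, 1) that decreases smoothly with distance, rather than a hard 0/1 contact call, which is what the downstream featurization averages over.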
- }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.16" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/CONCHA/.ipynb_checkpoints/get_coordinates-checkpoint.ipynb b/CONCHA/.ipynb_checkpoints/get_coordinates-checkpoint.ipynb deleted file mode 100644 index 4c935ad..0000000 --- a/CONCHA/.ipynb_checkpoints/get_coordinates-checkpoint.ipynb +++ /dev/null @@ -1,192 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "id": "a3eaebb3", - "metadata": {}, - "outputs": [], - "source": [ - "# Load required libraries\n", - "import numpy as np\n", - "import os\n", - "import mdtraj as md \n", - "import itertools\n", - "import pandas as pd \n", - "from Bio import PDB \n", - "import warnings #Optional\n", - "warnings.filterwarnings(\"ignore\") #Optional" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "23d430e2", - "metadata": {}, - "outputs": [], - "source": [ - "def get_coordinates(conf_name, pdb = None, traj = None, res_start = None, res_end = None, which_chain = None):\n", - " \n", - " aa_list = list([\"ALA\", \"ARG\", \"ASN\", \"ASP\", \"CYS\", \"GLN\", \"GLU\",\"GLY\",\"HIS\", \"ILE\", \"LEU\", \"LYS\", \"MET\", \"PHE\",\"PRO\", \"SER\", \"THR\", \"TRP\", \"TYR\", \"VAL\"])\n", - " \n", - " parser = PDB.PDBParser()\n", - "\n", - " def get_structure(conf_name, conf_path, chain_id = None): \n", - " \n", - " os.chdir(conf_path)\n", - " struct = parser.get_structure('prot',conf_name)\n", - " \n", - " coor_x=list()\n", - " coor_y=list()\n", - " coor_z=list()\n", - " model_list=list()\n", - " chain_list=list()\n", - " residue_list=list()\n", - " atom_list=list()\n", - " position_list=list()\n", - " \n", - " for model in struct:\n", - " for chain in model: \n", - " for residue in chain:\n", - " for atom in residue:\n", - " x,y,z = atom.get_coord()\n", - " coor_x.append(x)\n", - " coor_y.append(y)\n", - " coor_z.append(z)\n", - " model_list.append(1+model.id)\n", - " chain_list.append(chain.id)\n", - " residue_list.append(residue.get_resname())\n", - " atom_list.append(atom.id)\n", - " position_list.append(residue.get_full_id()[3][1])\n", - " \n", - " data = {'Model': model_list,\n", - " 'Chain': chain_list,\n", - " 'Residue': residue_list,\n", - " 'Atom': atom_list,\n", - " 'Position': position_list,\n", - " 'coor_x': coor_x,\n", - " 'coor_y': coor_y,\n", - " 'coor_z': coor_z\n", - " }\n", - " \n", - " df = pd.DataFrame (data, columns = ['Model','Chain','Residue','Atom','Position','coor_x','coor_y','coor_z'],index=None)\n", - " df = df[df.Model == df.Model[0]] # Keep just one model\n", - " if chain_id is None:\n", - " df = df[df.Chain == df.Chain[0]] # Keep just one chain\n", - " else:\n", - " df = df[df.Chain == chain_id] # Keep just one chain\n", - " \n", - " # Check for HIE\n", - " df = df.replace('HIE', 'HIS')\n", - " \n", - " return df\n", - " \n", - " if traj is None and pdb is not None:\n", - " \n", - " df = get_structure(conf_name, conf_path = pdb, chain_id = which_chain)\n", - " # Remove residues not in aa_list\n", - " df = df.loc[df[\"Residue\"].isin(aa_list),]\n", - " L = len(np.unique(df.Position))\n", - " \n", - " elif pdb is None and traj is not None:\n", - " \n", - " traj = traj[conf_name]\n", - " top_table = traj.top.to_dataframe()[0]\n", - " df = pd.concat([top_table, pd.DataFrame(traj.xyz[0], columns = np.array(['x','y','z']))], axis 
= 1)\n", - " df = df[['segmentID','chainID','resName','name','resSeq','x','y','z']]\n", - " df.columns = ['Model','Chain','Residue','Atom','Position','coor_x','coor_y','coor_z']\n", - " # Remove residues not in aa_list\n", - " df = df.loc[df[\"Residue\"].isin(aa_list),]\n", - " L = len(np.unique(df.Position))\n", - " \n", - " # Correct division by 10\n", - " df.loc[:,['coor_x', 'coor_y', 'coor_z']] = df.loc[:,['coor_x', 'coor_y', 'coor_z']]*10\n", - " \n", - " if res_start is not None:\n", - " \n", - " df = df.loc[df.Position >= res_start]\n", - " L = len(np.unique(df.Position))\n", - " \n", - " if res_end is not None:\n", - " \n", - " df = df.loc[df.Position <= res_end]\n", - " L = len(np.unique(df.Position))\n", - " \n", - " # Build reference systems\n", - " \n", - " basis_angles = np.array([1.917213, 1.921843, 2.493444])\n", - " b = np.array([np.cos(basis_angles[0]), np.cos(basis_angles[1]), np.cos(basis_angles[2])]).T\n", - " \n", - " # 1. Definition of the reference frame on every sequence position\n", - "\n", - " CA_coor = df.loc[ (df.Atom == 'CA') , ['coor_x','coor_y','coor_z']].to_numpy() # CA coordinates \n", - " N_coor = df.loc[ (df.Atom == 'N') , ['coor_x','coor_y','coor_z']].to_numpy() # N coordinates \n", - " C_coor = df.loc[ (df.Atom == 'C') , ['coor_x','coor_y','coor_z']].to_numpy() # C coordinates \n", - " \n", - " N_CA_coor = N_coor - CA_coor; N_CA_coor = N_CA_coor / np.linalg.norm(N_CA_coor, axis = 1)[:, None]\n", - " C_CA_coor = C_coor - CA_coor; C_CA_coor = C_CA_coor / np.linalg.norm(C_CA_coor, axis = 1)[:, None]\n", - " CxN_coor = np.cross(C_CA_coor, N_CA_coor); CxN_coor = CxN_coor / np.linalg.norm(CxN_coor, axis = 1)[:, None]\n", - " \n", - " A_list = np.concatenate([N_CA_coor,C_CA_coor,CxN_coor], axis = 1)\n", - " A_list = np.reshape(A_list, [np.shape(A_list)[0]*3, 3])\n", - " A_list = [A_list[i:(i+3),:] for i in 3*np.arange(np.shape(N_CA_coor)[0])]\n", - "\n", - " CB_coor = np.linalg.solve(A_list, [b for i in np.arange(len(A_list))]) # Virtual CB coordinates\n", - " \n", - " # Reference frames \n", - " \n", - " b1_coor = CB_coor / np.linalg.norm(CB_coor, axis = 1)[:, None] # b1 = CA-CB\n", - " CN_coor = N_CA_coor - C_CA_coor # CN\n", - " b2_coor = np.cross(CN_coor, b1_coor); b2_coor = b2_coor / np.linalg.norm(b2_coor, axis = 1)[:, None] # b2 = b1 x CN\n", - " b3_coor = np.cross(b1_coor, b2_coor); b3_coor = b3_coor / np.linalg.norm(b3_coor, axis = 1)[:, None] # b3 = b1 x b2 = CN for a perfect tetrahedron\n", - " \n", - " P_list = np.concatenate([b1_coor, b2_coor, b3_coor], axis = 1)\n", - " P_list = np.reshape(P_list, [np.shape(P_list)[0]*3, 3]).T\n", - " P_list = [P_list[:,i:(i+3)] for i in 3*np.arange(np.shape(b1_coor)[0])]\n", - " P_list = np.linalg.inv(P_list) # Change-of-basis matrix for each position\n", - " \n", - " positions = df.loc[ ((df.Atom =='CB') & (df.Residue!='GLY')) | ((df.Atom =='CA') & (df.Residue=='GLY')), ['coor_x','coor_y','coor_z']]\n", - " \n", - " pos_pairs = np.array(list(itertools.combinations(range(L), 2)))\n", - " P_list_pairs = [P_list[i] for i in pos_pairs[:,0]]\n", - " positions_pairs = positions.to_numpy()[pos_pairs[:,1],:] - positions.to_numpy()[pos_pairs[:,0],:]\n", - " or1_pairs = b1_coor[pos_pairs[:,1],:]\n", - " or2_pairs = b3_coor[pos_pairs[:,1],:]\n", - " \n", - " relative_pairwise_positions = np.einsum('ij,ikj->ik',positions_pairs, P_list_pairs)\n", - " relative_pairwise_or1 = np.einsum('ij,ikj->ik', or1_pairs, P_list_pairs)\n", - " relative_pairwise_or2 = np.einsum('ij,ikj->ik', or2_pairs, P_list_pairs)\n", - " \n", - " 
aa_seq = df.Residue[df.Atom == 'CA'].to_numpy()\n", - " d = {item: idx for idx, item in enumerate(aa_list)}\n", - " aa_index = np.array([d.get(item) for item in aa_seq])\n", - " aa_pairs = np.concatenate([aa_index[pos_pairs[:,0]][:,None],aa_index[pos_pairs[:,1]][:, None]], axis = 1)\n", - " positions_and_frames = np.concatenate([relative_pairwise_positions, relative_pairwise_or1,\n", - " relative_pairwise_or2, aa_pairs], axis = 1) \n", - " \n", - " return positions_and_frames" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.16" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/CONCHA/.ipynb_checkpoints/utils-checkpoint.ipynb b/CONCHA/.ipynb_checkpoints/utils-checkpoint.ipynb deleted file mode 100644 index 2b9aa68..0000000 --- a/CONCHA/.ipynb_checkpoints/utils-checkpoint.ipynb +++ /dev/null @@ -1,455 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 7, - "id": "f76b47d6", - "metadata": {}, - "outputs": [], - "source": [ - "import mdtraj as md\n", - "import os\n", - "import numpy as np\n", - "from ipywidgets import widgets\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "import pandas as pd\n", - "import itertools\n", - "from tqdm import tqdm " - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "6f669fe3", - "metadata": {}, - "outputs": [], - "source": [ - "def get_cluster_files(ensemble_path, ensemble_name, clustering_partition):\n", - " \n", - " # Initial parameters\n", - " var_dict = {'multiframe' : 'n', 'check_folder' : True, 'do_xtc' : False, 'do_pdb' : False,\n", - " 'ensemble_name' : ensemble_name, 'ensemble_path' : ensemble_path}\n", - " \n", - " var_dict['xtc_files'] = [file for file in os.listdir(ensemble_path) if file.endswith(\".xtc\")] \n", - " var_dict['pdb_files'] = [file for file in os.listdir(ensemble_path) if file.endswith(\".pdb\") or file.endswith(\".prmtop\") or file.endswith(\".parm7\") or file.endswith(\".gro\")]\n", - " var_dict['folders'] = [file for file in os.listdir(ensemble_path) if (os.path.isdir(\"/\".join([ensemble_path,file])) and not file.startswith('.') and not file.startswith(\"results\"))]\n", - " \n", - " # File processing\n", - " \n", - " if len(var_dict[\"xtc_files\"]) + len(var_dict[\"folders\"]) + len(var_dict[\"pdb_files\"]) == 0:\n", - " sys.exit(\"\".join(['Folder for ', var_dict[\"ensemble_name\"], ' ensemble is empty...']))\n", - " \n", - " # .xtc file with a .pdb topology file\n", - " \n", - " if len(var_dict[\"xtc_files\"]) >= len(var_dict[\"pdb_files\"]) and len(var_dict[\"pdb_files\"]) == 1:\n", - "\n", - " print('\\nTaking as input:\\n')\n", - " print(\"\".join([str(var_dict[\"xtc_files\"][0]),' : trajectory of ',var_dict[\"ensemble_name\"],',']))\n", - " print(\"\".join([str(var_dict[\"pdb_files\"][0]),' : topology file of ',var_dict[\"ensemble_name\"],'.']))\n", - " if len(var_dict[\"xtc_files\"]) > 1:\n", - " print(\"\\nMore than one .xtc file were found. 
Taking the first as the trajectory file.\\n\")\n", - " var_dict[\"do_xtc\"] = True\n", - " var_dict[\"xtc_root_path\"] = var_dict[\"ensemble_path\"]\n", - " var_dict['check_folder'] = False\n", - " \n", - " # multiframe .pdb files\n", - " \n", - " if var_dict['multiframe'] == 'y' or (len(var_dict[\"pdb_files\"]) >= 1 and len(var_dict[\"xtc_files\"]) == 0):\n", - " \n", - " print('\\nTaking as input:\\n') \n", - " print(\"\".join([str(var_dict[\"pdb_files\"][0]),' : trajectory of ',var_dict[\"ensemble_name\"],'.']))\n", - " if len(var_dict[\"pdb_files\"]) > 1:\n", - " print(\"\\nMore than one multiframe .pdb file were found. Taking the first as the trajectory file.\\n\")\n", - " print(\"\\nTaking the previously converted files.\\n\")\n", - " var_dict[\"do_xtc\"] = True\n", - " var_dict[\"xtc_root_path\"] = \"/\".join([var_dict[\"ensemble_path\"],'converted_files'])\n", - " var_dict[\"xtc_files\"] = [file for file in os.listdir(var_dict[\"xtc_root_path\"]) if file.endswith(\".xtc\")]\n", - " var_dict[\"pdb_files\"] = [file for file in os.listdir(var_dict[\"xtc_root_path\"]) if file.endswith(\".pdb\")]\n", - " var_dict['check_folder'] = False\n", - " \n", - " # folder with .pdb files\n", - " \n", - " if len(var_dict[\"folders\"]) >= 1 and var_dict['check_folder'] == True:\n", - " \n", - " print('\\nTaking as input:\\n')\n", - " print(\"\".join([var_dict[\"folders\"][0],' folder contains: trajectory of ',var_dict[\"ensemble_name\"],\".\"]))\n", - " if len(var_dict[\"folders\"]) > 1:\n", - " print(\"\\nMore than one .pdb folder were found. Taking the first as the trajectory folder.\\n\")\n", - " var_dict[\"do_pdb\"] = True\n", - " \n", - " if not var_dict[\"do_pdb\"] and not var_dict[\"do_xtc\"]:\n", - " sys.exit(\"\".join(['\\n Sorry, I did not understood the input. 
Please follow the guidelines described in the function documentation to create the ',ensemble_name,' folder.\\n'])) \n",
-    "    \n",
-    "    print(\"\\n----------------------------------------------------------------------------------\\n\")\n",
-    "    print(\"\\nCreating cluster-specific files...\\n\")\n",
-    "    \n",
-    "    results_path = \"/\".join([os.path.abspath(ensemble_path),\"_\".join(['results',ensemble_name])])\n",
-    "    save_files = \"/\".join([results_path, \"cluster_files\"])\n",
-    "    if not os.path.exists(save_files):\n",
-    "        os.mkdir(save_files)\n",
-    "    \n",
-    "    if var_dict[\"do_xtc\"]:\n",
-    "        \n",
-    "        traj_file = md.load_xtc(\"/\".join([var_dict[\"xtc_root_path\"],var_dict[\"xtc_files\"][0]]), top = \"/\".join([var_dict[\"xtc_root_path\"],var_dict[\"pdb_files\"][0]]))\n",
-    "        \n",
-    "        # Save .xtc cluster files\n",
-    "        for k in tqdm(range(len(np.unique(clustering_partition[clustering_partition >= 0])))):\n",
-    "            traj_file[np.where(clustering_partition == k)].save_xtc(\"/\".join([save_files, \"\".join([ensemble_name,'_',str(k),'.xtc'])]))\n",
-    "\n",
-    "    if var_dict[\"do_pdb\"]:\n",
-    "        \n",
-    "        conf_list = os.listdir(\"/\".join([var_dict[\"ensemble_path\"],var_dict[\"folders\"][0]]))\n",
-    "\n",
-    "        for k in tqdm(range(len(np.unique(clustering_partition[clustering_partition >= 0])))):\n",
-    "            clus_k_path = \"/\".join([save_files, \"_\".join(['clus',str(k)])]) # Cluster folders live under save_files\n",
-    "            if not os.path.exists(clus_k_path):\n",
-    "                os.mkdir(clus_k_path)\n",
-    "            \n",
-    "            clus_k = np.where(clustering_partition == k)[0]\n",
-    "            for j in range(len(clus_k)):\n",
-    "                traj = md.load_pdb(\"/\".join([\"/\".join([var_dict[\"ensemble_path\"],var_dict[\"folders\"][0]]),conf_list[clus_k[j]]]))\n",
-    "                traj.save_pdb(\"/\".join([clus_k_path, \"\".join([ensemble_name,'_',str(clus_k[j]),'.pdb'])]))\n",
-    "\n",
-    "    print(\"\\nFiles saved.\\n\") "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "id": "ff9c4941",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def plot_2umap(embedding_2d, clustering_partition, ensemble_name, results_path):\n",
-    "    \n",
-    "    classified = clustering_partition >= 0 # Boolean mask, so ~classified selects the unclassified points\n",
-    "    \n",
-    "    output1 = widgets.Output()\n",
-    "    with output1:\n",
-    "        fig, ax = plt.subplots()\n",
-    "        ax.scatter(embedding_2d[~classified, 0],\n",
-    "                   embedding_2d[~classified, 1],\n",
-    "                   color=(0.5, 0.5, 0.5),\n",
-    "                   s=0.5,\n",
-    "                   alpha=0.5)\n",
-    "        scatter = ax.scatter(embedding_2d[classified, 0],\n",
-    "                             embedding_2d[classified, 1],\n",
-    "                             c=clustering_partition[classified],\n",
-    "                             s=0.5,\n",
-    "                             alpha = 1,\n",
-    "                             cmap='Spectral')\n",
-    "        plt.xlabel('UMAP coordinate 1')\n",
-    "        plt.ylabel('UMAP coordinate 2')\n",
-    "        plt.title(\"\".join(['UMAP 2-dimensional projection after contact clustering for ',ensemble_name,' ensemble']), fontsize = 8)\n",
-    "        plt.savefig(\"/\".join([results_path, \"\".join([\"clusters_2d\", ensemble_name, '.png'])]), dpi = 199)\n",
-    "        plt.show()\n",
-    "\n",
-    "    output2 = widgets.Output()\n",
-    "    with output2:\n",
-    "        repartition = pd.Series(clustering_partition).value_counts()\n",
-    "        repartition.index = [\"Unclassified\" if i == -1 else i for i in repartition.index]\n",
-    "        display(pd.DataFrame({\"Cluster\" : np.array(repartition.index), \"Occupancy (%)\" : 100*np.array(repartition.values)/len(clustering_partition)}))\n",
-    "    two_columns = widgets.HBox([output1, output2])\n",
-    "    display(two_columns)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "id": "00dfbe24",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def get_wmaps(wcont_data, clustering_partition, ensemble_name, results_path):\n",
-    "    \n",
-    "    maps_path = \"/\".join([results_path,\"wcont_maps\"]) # Path to save files\n",
-    "    if not os.path.exists(maps_path): # Create if it doesn't exist\n",
-    "        os.mkdir(maps_path)\n",
-    "    \n",
-    "    L = int(0.5*(1+np.sqrt(1+8*wcont_data.shape[1]))) # Sequence length\n",
-    "    list_pos = np.asarray(list(itertools.combinations(range(1,L+1), 2))) # List of position pairs\n",
-    "    \n",
-    "    repartition = pd.Series(clustering_partition).value_counts() # Clustering partition\n",
-    "    repartition.index = [\"Unclassified\" if i == -1 else i for i in repartition.index]\n",
-    "    \n",
-    "    for cluster in tqdm(repartition.index.drop('Unclassified')): \n",
-    "        \n",
-    "        # Cluster-specific w-contact matrix\n",
-    "        prop_cluster = round(pd.Series(clustering_partition).value_counts().sort_index()[cluster]/np.shape(wcont_data)[0]*100,2)\n",
-    "        cont_matrix = pd.DataFrame(np.concatenate([list_pos,np.asarray([wcont_data.loc[clustering_partition == cluster,].mean()]).T], axis = 1), columns=['pos1','pos2','cp'])\n",
-    "        cont_matrix.pos1 = cont_matrix.pos1.astype(int)\n",
-    "        cont_matrix.pos2 = cont_matrix.pos2.astype(int)\n",
-    "        cont_matrix = cont_matrix.pivot(index='pos1',columns='pos2',values='cp')\n",
-    "\n",
-    "        fig = plt.figure()\n",
-    "        res = sns.heatmap(cont_matrix.T, cmap='Reds',square=True, cbar_kws={\"shrink\": .5,'label':\"Contact weight average\"})\n",
-    "        plt.suptitle(\" \".join([ensemble_name,'contact-based clustering']), fontsize=10)\n",
-    "        plt.title(\"\".join(['Cluster #',str(cluster),' with ',str(prop_cluster),'% occupancy']), fontsize = 8)\n",
-    "\n",
-    "        plt.xlabel('Sequence position')\n",
-    "        plt.ylabel('Sequence position')\n",
-    "        plt.xticks(rotation=0) \n",
-    "        res.set_xticklabels(res.get_xmajorticklabels(), fontsize = 6)\n",
-    "        res.set_yticklabels(res.get_ymajorticklabels(), fontsize = 6)\n",
-    "        plt.savefig(\"/\".join([maps_path,\"\".join([ensemble_name,'_',str(cluster),'.png'])]), dpi=199) # Save figure in maps_path"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "id": "a4c98c97",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def representative_ensemble(size, ensemble_path, ensemble_name, clustering_partition):\n",
-    "    \n",
-    "    # Initial parameters\n",
-    "    var_dict = {'multiframe' : 'n', 'check_folder' : True, 'do_xtc' : False, 'do_pdb' : False,\n",
-    "                'ensemble_name' : ensemble_name, 'ensemble_path' : ensemble_path}\n",
-    "    \n",
-    "    var_dict['xtc_files'] = [file for file in os.listdir(ensemble_path) if file.endswith(\".xtc\")] \n",
-    "    var_dict['pdb_files'] = [file for file in os.listdir(ensemble_path) if file.endswith(\".pdb\") or file.endswith(\".prmtop\") or file.endswith(\".parm7\") or file.endswith(\".gro\")]\n",
-    "    var_dict['folders'] = [file for file in os.listdir(ensemble_path) if (os.path.isdir(\"/\".join([ensemble_path,file])) and not file.startswith('.') and not file.startswith(\"results\"))]\n",
-    "    \n",
-    "    # File processing\n",
-    "    \n",
-    "    if len(var_dict[\"xtc_files\"]) + len(var_dict[\"folders\"]) + len(var_dict[\"pdb_files\"]) == 0:\n",
-    "        sys.exit(\"\".join(['Folder for ', var_dict[\"ensemble_name\"], ' ensemble is empty...']))\n",
-    "    \n",
-    "    # .xtc file with a .pdb topology file\n",
-    "    \n",
-    "    if len(var_dict[\"xtc_files\"]) >= len(var_dict[\"pdb_files\"]) and len(var_dict[\"pdb_files\"]) == 1:\n",
-    "\n",
-    "        print('\\nTaking as input:\\n')\n",
-    "        print(\"\".join([str(var_dict[\"xtc_files\"][0]),' : trajectory of ',var_dict[\"ensemble_name\"],',']))\n",
-    "        print(\"\".join([str(var_dict[\"pdb_files\"][0]),' : topology file of 
',var_dict[\"ensemble_name\"],'.']))\n", - " if len(var_dict[\"xtc_files\"]) > 1:\n", - " print(\"\\nMore than one .xtc file were found. Taking the first as the trajectory file.\\n\")\n", - " var_dict[\"do_xtc\"] = True\n", - " var_dict[\"xtc_root_path\"] = var_dict[\"ensemble_path\"]\n", - " var_dict['check_folder'] = False\n", - " \n", - " # multiframe .pdb files\n", - " \n", - " if var_dict['multiframe'] == 'y' or (len(var_dict[\"pdb_files\"]) >= 1 and len(var_dict[\"xtc_files\"]) == 0):\n", - " \n", - " print('\\nTaking as input:\\n') \n", - " print(\"\".join([str(var_dict[\"pdb_files\"][0]),' : trajectory of ',var_dict[\"ensemble_name\"],'.']))\n", - " if len(var_dict[\"pdb_files\"]) > 1:\n", - " print(\"\\nMore than one multiframe .pdb file were found. Taking the first as the trajectory file.\\n\")\n", - " print(\"\\nTaking the previously converted files.\\n\")\n", - " var_dict[\"do_xtc\"] = True\n", - " var_dict[\"xtc_root_path\"] = \"/\".join([var_dict[\"ensemble_path\"],'converted_files'])\n", - " var_dict[\"xtc_files\"] = [file for file in os.listdir(var_dict[\"xtc_root_path\"]) if file.endswith(\".xtc\")]\n", - " var_dict[\"pdb_files\"] = [file for file in os.listdir(var_dict[\"xtc_root_path\"]) if file.endswith(\".pdb\")]\n", - " var_dict['check_folder'] = False\n", - " \n", - " # folder with .pdb files\n", - " \n", - " if len(var_dict[\"folders\"]) >= 1 and var_dict['check_folder'] == True:\n", - " \n", - " print('\\nTaking as input:\\n')\n", - " print(\"\".join([var_dict[\"folders\"][0],' folder contains: trajectory of ',var_dict[\"ensemble_name\"],\".\"]))\n", - " if len(var_dict[\"folders\"]) > 1:\n", - " print(\"\\nMore than one .pdb folder were found. Taking the first as the trajectory folder.\\n\")\n", - " var_dict[\"do_pdb\"] = True\n", - " \n", - " if not var_dict[\"do_pdb\"] and not var_dict[\"do_xtc\"]:\n", - " sys.exit(\"\".join(['\\n Sorry, I did not understood the input. 
Please follow the guidelines described in the function documentation to create the ',ensemble_name,' folder.\\n'])) \n",
-    "    \n",
-    "    print(\"\\n----------------------------------------------------------------------------------\\n\")\n",
-    "    print(\"\\nSampling representative family...\\n\")\n",
-    "    \n",
-    "    repartition = pd.Series(clustering_partition).value_counts() # Clustering partition\n",
-    "    repartition.index = [\"Unclassified\" if i == -1 else i for i in repartition.index]\n",
-    "    repartition = repartition.drop(\"Unclassified\")\n",
-    "    probas = repartition.values/np.sum(repartition.values)\n",
-    "\n",
-    "    selected_conf = np.zeros(size)\n",
-    "    for i in range(size):\n",
-    "\n",
-    "        choose_cluster = np.random.choice(repartition.index, size = 1, p = probas)[0]\n",
-    "        selected_conf[i] = np.random.choice(np.where(clustering_partition == choose_cluster)[0], size = 1)[0]\n",
-    "    \n",
-    "    selected_conf = np.ndarray.astype(selected_conf, int)\n",
-    "    results_path = \"/\".join([os.path.abspath(ensemble_path),\"_\".join(['results',ensemble_name])])\n",
-    "    save_files = \"/\".join([results_path, \"representative_family\"])\n",
-    "    if not os.path.exists(save_files):\n",
-    "        os.mkdir(save_files)\n",
-    "    \n",
-    "    if var_dict[\"do_xtc\"]:\n",
-    "        \n",
-    "        traj_file = md.load_xtc(\"/\".join([var_dict[\"xtc_root_path\"],var_dict[\"xtc_files\"][0]]), top = \"/\".join([var_dict[\"xtc_root_path\"],var_dict[\"pdb_files\"][0]]))\n",
-    "        \n",
-    "        # Save .xtc file\n",
-    "        traj_file[selected_conf].save_xtc(\"/\".join([save_files, \"\".join([ensemble_name,'_repfam.xtc'])]))\n",
-    "\n",
-    "    if var_dict[\"do_pdb\"]:\n",
-    "        \n",
-    "        # Save pdb folder (sampled index j corresponds to file conf_list[j])\n",
-    "        conf_list = os.listdir(\"/\".join([var_dict[\"ensemble_path\"],var_dict[\"folders\"][0]]))\n",
-    "        for j in selected_conf:\n",
-    "            traj = md.load_pdb(\"/\".join([\"/\".join([var_dict[\"ensemble_path\"],var_dict[\"folders\"][0]]),conf_list[j]]))\n",
-    "            traj.save_pdb(\"/\".join([save_files, \"\".join([ensemble_name,'_',str(j),'.pdb'])]))\n",
-    "\n",
-    "    print(\"\\nFiles saved.\\n\") "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
-   "id": "7069e00f",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def cluster_descriptors(ensemble_path, ensemble_name, clustering_partition):\n",
-    "    \n",
-    "    # Initial parameters\n",
-    "    var_dict = {'multiframe' : 'n', 'check_folder' : True, 'do_xtc' : False, 'do_pdb' : False,\n",
-    "                'ensemble_name' : ensemble_name, 'ensemble_path' : ensemble_path}\n",
-    "    \n",
-    "    var_dict['xtc_files'] = [file for file in os.listdir(ensemble_path) if file.endswith(\".xtc\")] \n",
-    "    var_dict['pdb_files'] = [file for file in os.listdir(ensemble_path) if file.endswith(\".pdb\") or file.endswith(\".prmtop\") or file.endswith(\".parm7\") or file.endswith(\".gro\")]\n",
-    "    var_dict['folders'] = [file for file in os.listdir(ensemble_path) if (os.path.isdir(\"/\".join([ensemble_path,file])) and not file.startswith('.') and not file.startswith(\"results\"))]\n",
-    "    \n",
-    "    # File processing\n",
-    "    \n",
-    "    if len(var_dict[\"xtc_files\"]) + len(var_dict[\"folders\"]) + len(var_dict[\"pdb_files\"]) == 0:\n",
-    "        sys.exit(\"\".join(['Folder for ', var_dict[\"ensemble_name\"], ' ensemble is empty...']))\n",
-    "    \n",
-    "    # .xtc file with a .pdb topology file\n",
-    "    \n",
-    "    if len(var_dict[\"xtc_files\"]) >= len(var_dict[\"pdb_files\"]) and len(var_dict[\"pdb_files\"]) == 1:\n",
-    "\n",
-    "        print('\\nTaking as input:\\n')\n",
-    "        print(\"\".join([str(var_dict[\"xtc_files\"][0]),' : trajectory of ',var_dict[\"ensemble_name\"],',']))\n",
-    "        print(\"\".join([str(var_dict[\"pdb_files\"][0]),' : topology file of 
',var_dict[\"ensemble_name\"],'.']))\n", - " if len(var_dict[\"xtc_files\"]) > 1:\n", - " print(\"\\nMore than one .xtc file were found. Taking the first as the trajectory file.\\n\")\n", - " var_dict[\"do_xtc\"] = True\n", - " var_dict[\"xtc_root_path\"] = var_dict[\"ensemble_path\"]\n", - " var_dict['check_folder'] = False\n", - " \n", - " # multiframe .pdb files\n", - " \n", - " if var_dict['multiframe'] == 'y' or (len(var_dict[\"pdb_files\"]) >= 1 and len(var_dict[\"xtc_files\"]) == 0):\n", - " \n", - " print('\\nTaking as input:\\n') \n", - " print(\"\".join([str(var_dict[\"pdb_files\"][0]),' : trajectory of ',var_dict[\"ensemble_name\"],'.']))\n", - " if len(var_dict[\"pdb_files\"]) > 1:\n", - " print(\"\\nMore than one multiframe .pdb file were found. Taking the first as the trajectory file.\\n\")\n", - " print(\"\\nTaking the previously converted files.\\n\")\n", - " var_dict[\"do_xtc\"] = True\n", - " var_dict[\"xtc_root_path\"] = \"/\".join([var_dict[\"ensemble_path\"],'converted_files'])\n", - " var_dict[\"xtc_files\"] = [file for file in os.listdir(var_dict[\"xtc_root_path\"]) if file.endswith(\".xtc\")]\n", - " var_dict[\"pdb_files\"] = [file for file in os.listdir(var_dict[\"xtc_root_path\"]) if file.endswith(\".pdb\")]\n", - " var_dict['check_folder'] = False\n", - " \n", - " # folder with .pdb files\n", - " \n", - " if len(var_dict[\"folders\"]) >= 1 and var_dict['check_folder'] == True:\n", - " \n", - " print('\\nTaking as input:\\n')\n", - " print(\"\".join([var_dict[\"folders\"][0],' folder contains: trajectory of ',var_dict[\"ensemble_name\"],\".\"]))\n", - " if len(var_dict[\"folders\"]) > 1:\n", - " print(\"\\nMore than one .pdb folder were found. Taking the first as the trajectory folder.\\n\")\n", - " var_dict[\"do_pdb\"] = True\n", - " \n", - " if not var_dict[\"do_pdb\"] and not var_dict[\"do_xtc\"]:\n", - " sys.exit(\"\".join(['\\n Sorry, I did not understood the input. 
Please follow the guidelines described in the function documentation to create the ',ensemble_name,' folder.\\n'])) \n",
-    "    \n",
-    "    print(\"\\n----------------------------------------------------------------------------------\\n\")\n",
-    "    print(\"\\nComputing cluster-specific descriptors...\\n\")\n",
-    "    \n",
-    "    results_path = \"/\".join([os.path.abspath(ensemble_path),\"_\".join(['results',ensemble_name])])\n",
-    "    save_files = \"/\".join([results_path, \"cluster_descriptors\"])\n",
-    "    if not os.path.exists(save_files):\n",
-    "        os.mkdir(save_files)\n",
-    "    \n",
-    "    N_clusters = len(np.unique(clustering_partition[clustering_partition >= 0])) # Number of retrieved clusters\n",
-    "    \n",
-    "    if var_dict[\"do_xtc\"]:\n",
-    "        \n",
-    "        traj_file = md.load_xtc(\"/\".join([var_dict[\"xtc_root_path\"],var_dict[\"xtc_files\"][0]]), top = \"/\".join([var_dict[\"xtc_root_path\"],var_dict[\"pdb_files\"][0]]))\n",
-    "        L = traj_file.n_residues\n",
-    "        Nconf = traj_file.n_frames\n",
-    "        \n",
-    "        dssp_types = ['H','B','E','G','I','T','S',' ']\n",
-    "        prop_dssp = np.zeros([len(dssp_types),L,N_clusters])\n",
-    "        rg = np.zeros([N_clusters])\n",
-    "        \n",
-    "        for k in range(N_clusters):\n",
-    "            \n",
-    "            prop_dssp_k = np.zeros([len(dssp_types),L])\n",
-    "            dssp_k = md.compute_dssp(traj_file[np.where(clustering_partition == k)], simplified = False)\n",
-    "            rg[k] = np.mean(md.compute_rg(traj_file[np.where(clustering_partition == k)]))\n",
-    "            for dt in range(len(dssp_types)):\n",
-    "                prop_dssp_k[dt,:] = (dssp_k == dssp_types[dt]).sum(axis = 0)/len(np.where(clustering_partition == k)[0])\n",
-    "            prop_dssp[:,:,k] = prop_dssp_k\n",
-    "\n",
-    "    if var_dict[\"do_pdb\"]:\n",
-    "        \n",
-    "        pdb_folder = \"/\".join([var_dict[\"ensemble_path\"],var_dict[\"folders\"][0]]) # Full path to the .pdb folder\n",
-    "        conf_list = os.listdir(pdb_folder)\n",
-    "        md_file = md.load_pdb(\"/\".join([pdb_folder,conf_list[0]]))\n",
-    "        L = md_file.topology.n_residues\n",
-    "        Nconf = len(conf_list)\n",
-    "        \n",
-    "        dssp_types = ['H','B','E','G','I','T','S',' ']\n",
-    "        prop_dssp = np.zeros([len(dssp_types),L,N_clusters])\n",
-    "        rg = np.zeros([N_clusters])\n",
-    "        \n",
-    "        for k in range(N_clusters):\n",
-    "            \n",
-    "            prop_dssp_k = np.zeros([len(dssp_types),L])\n",
-    "            clus_k = np.where(clustering_partition == k)[0]\n",
-    "            dssp_k = np.zeros([len(clus_k),L]).astype(str)\n",
-    "            rg_k = np.zeros([len(clus_k)])\n",
-    "\n",
-    "            for l in range(len(clus_k)):\n",
-    "                dssp_k[l,:] = md.compute_dssp(md.load_pdb(\"/\".join([pdb_folder,conf_list[clus_k[l]]])), simplified = False)[0].astype(str)\n",
-    "                rg_k[l] = md.compute_rg(md.load_pdb(\"/\".join([pdb_folder,conf_list[clus_k[l]]])))\n",
-    "            rg[k] = np.mean(rg_k)\n",
-    "            for dt in range(len(dssp_types)):\n",
-    "                prop_dssp_k[dt,:] = (dssp_k == dssp_types[dt]).sum(axis = 0)/len(np.where(clustering_partition == k)[0])\n",
-    "            prop_dssp[:,:,k] = prop_dssp_k\n",
-    "    \n",
-    "    for cluster in tqdm(range(N_clusters)):\n",
-    "        \n",
-    "        prop_cluster = round(100*len(np.where(clustering_partition == cluster)[0])/Nconf,2)\n",
-    "        fig = plt.figure(figsize=(10, 1.7))\n",
-    "        res = sns.heatmap(prop_dssp[:,:,cluster], cmap='Blues', square = True, cbar_kws={\"shrink\": .7,'label':\"Class prop.\"})\n",
-    "        xlabels = [item.get_text() for item in res.get_xmajorticklabels()]\n",
-    "        plt.xlabel('Sequence position')\n",
-    "        plt.ylabel('DSSP class')\n",
-    "        plt.title(\"\".join([ensemble_name, ' - cluster #',str(cluster),' (',str(prop_cluster),'% occupancy). 
Average RG = ', str(round(10*rg[cluster],2)),r'$\\AA$.']), fontsize = 8)\n", - " plt.yticks(rotation=0) \n", - " res.set_xticklabels(np.asarray(xlabels).astype(int) + 1, fontsize = 7)\n", - " res.set_yticklabels(['L' if x==' ' else x for x in dssp_types], fontsize = 7)\n", - " plt.savefig(\"/\".join([save_files,\"\".join([ensemble_name,'_',str(cluster),'_DSSP.png'])]), dpi=199, bbox_inches='tight')\n", - "\n", - " \n", - " print(\"\\nPlots saved.\\n\") " - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.16" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} -- GitLab
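As a closing illustration, the sampling scheme of section 3.3 in contact_clustering.ipynb — drawing from p_1·U(C_1) + ... + p_K·U(C_K), with p_k the normalized cluster occupancies — amounts to two nested `np.random.choice` draws. A minimal self-contained sketch, with toy labels standing in for a real HDBSCAN partition:

```python
# Sketch of the representative-family sampling: pick a cluster with probability
# proportional to its occupancy, then a conformation uniformly within it.
import numpy as np

rng = np.random.default_rng(42)
labels = rng.integers(-1, 4, size=1000)    # toy partition; -1 = unclassified

clusters, counts = np.unique(labels[labels >= 0], return_counts=True)
probas = counts / counts.sum()             # p_1, ..., p_K

size = 10
chosen = rng.choice(clusters, size=size, p=probas)
sample = np.array([rng.choice(np.where(labels == k)[0]) for k in chosen])

print(sample)                              # indices of representative conformations
```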