pysoftk.pol_analysis package

Subpackages

Submodules

pysoftk.pol_analysis.clustering module

class pysoftk.pol_analysis.clustering.SCP(tpr_file, xtc_file)[source]

Bases: MDA_input

A class used to compute a spatial cluster for polymers (SCP)

contact_matrix(u, name_list, cluster_cutoff)[source]

Function to compute the contact matrix.

Parameters:
  • u (MDAnalaysis.Universe) – An user-provided MDAnalysis universe.

  • name_list (class.list) – A list containing the names used for the contact matrix analysis.

  • cluster_cutoff (class.float) – A user-defined number for a radius cutoff.

Returns:

pairs – A list of tuples containing the indexes of the selected molecules.

Return type:

class.list

spatial_clustering(u, atom_sel, cluster_cutoff, frame)[source]

Function that gives you the resids and sized of each polymer aggregate per frame in a trajectory.

Parameters:
  • frame (MDanalysis.Object) – An MDAnalysis object that contains one frame of a trajectory.

  • atom_sel (class.str) – User-provided string containing the ResID/tag of the molecule in the trajectory.

  • cluster_cutoff (class.int) – User defined cutoff to select as neighbour.

  • max_workers (class.int) – Number of threads to be used in the calculation.

Returns:

  • tuple containing clusters and cluster_sizes.

  • clusters (class.list) – A tuple containing the ResIDs.

  • cluster_sizes (class.list) – Length of the computed clusters.

spatial_clustering_run(start, stop, skip, atom_names, cluster_cutoff, name_file)[source]
Function to compute the spatial clustering function over an MDAnalysis

trajectory.

Parameters:
  • start (class.int) – Frame to start calculation.

  • stop (class.int) – Frame to stop calculation.

  • step (class.int) – Number of frames to skip.

  • atom_names (class.str) – Atomic names used to procure the search.

  • cluster_cutoff (class.float) – User-provided value to define the cluster length.

  • name_file (class.str) – Name of the User-supplied file.

Returns:

A pandas.DataFrame.parquet file with the results stored.

Return type:

None

pysoftk.pol_analysis.clustering.timeit(func)[source]

pysoftk.pol_analysis.contact_analysis module

class pysoftk.pol_analysis.contact_analysis.contacts(tpr_file, xtc_file)[source]

Bases: MDA_input

A class used to compute the contacts between the polymers of a micelle.

contacts_calc(u, micelle_selection, micelle_positions, group1, group2, contact_distance)[source]

Function to get atom pairs

Parameters:
  • u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.

  • micelle_selection (MDAnalysis.Object) – MDAnalysis atom group of atoms belonging to the micelle

  • micelle_positions (np.array) – positions of all atoms from micelle selection

  • group1 (class.str) – atom names of the atoms for group1 of the contacts calculation

  • group2 (class.str) – atom names of the atoms for group2 of the contacts calculation

Returns:

contacts_mat – Array with the contacts between polymers

Return type:

np.array

run_contacts_calc(micelle_selection, micelle_positions, group1, group2, contacts_distance)[source]

Function to run contacts_calc function over timme

Parameters:
  • micelle_selection (class.list) – List with resids of the polymers belonging to the micelle

  • micelle_positions (np.array) – Array with the positions of all atoms of the micelle

  • group1 (class.str) – Atom names of the atoms for group1 of the contacts calculation

  • group2 (class.str) – Atom names of the atoms for group2 of the contacts calculation

  • contacts_distance (class.int) – Cut off distance for contacts

Returns:

contacts_over time – Array with the contacts between polymers in each timestep

Return type:

np.array

pysoftk.pol_analysis.contact_analysis.timeit(func)[source]

pysoftk.pol_analysis.ecc_micelle module

class pysoftk.pol_analysis.ecc_micelle.ecc(tpr_file, xtc_file)[source]

Bases: MDA_input

A class used to compute the eccentricity of micelles

ecc_calc(atom_sel_micelle, micelle_positions)[source]

Function to calculate the eccentricity of the chosen atoms

Parameters:

atom_sel_micelle (MDAnalysis.Object) – MDAnalysis.atom_selection function

micelle_positionsnp.array

np.array with the positions of the micelle atoms

Returns:

ecc – Eccentricity of the selected atoms

Return type:

class.float

running_ecc(atom_names, micelle_positions, start, stop, skip)[source]

Function to calculate the eccentricity of the chosen atoms over time

Parameters:
  • atom_sel_type (class.str) – Type of atom selection

  • atom_names (class.list) – List with the names of atoms

  • micelle_positions (np.array) – np.array with the positions of the micelle atoms

  • Start (class.int) – Starting frame of the trajectory

  • Stop (class.int) – Ending rame of the trajectory

  • Skip (class.int) – Skipping every that many framesof the trajectory

Returns:

ecc – Eccentricity of the selected atoms over time

Return type:

np.array

pysoftk.pol_analysis.intrinsic_density_micelle module

class pysoftk.pol_analysis.intrinsic_density_micelle.intrinsic_density(tpr_file, xtc_file)[source]

Bases: MDA_input

A class used to compute the contacts between the polymers of a micelle.

box_volume(u, frame)[source]

Function to calculate the volume of the system box at a specific frame

Parameters:
  • u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.

  • frame (int) – frame selected for volume calculation

Returns:

vol – volume of system box

Return type:

float

intrinsic_density_not_normalized(u, frame, whole_micelle_largest_cluster_resids, micelle_selection, micelle_positions, core_group, shell_group, n_rand_points, n_bins, n_min, n_max, n_step)[source]

Function to run the intrinsic density in a specific frame

Parameters:
  • u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.

  • whole_micelle_largest_cluster_resids (list) – list with resids of polymers forming the desired micelle

  • micelle_selection (MDAnalysis.Object) – MDAnalysis atom group of atoms belonging to the micelle

  • micelle_positions (np.array) –

    positions of all atoms from micelle selection

    core_groupclass.str

    atom names of the atoms of the core for density calculation

  • shell_group (class.str) –

    atom names of the atoms of the shell for density calculation

    n_rand_pointsint

    number of water molecules

    n_binsint

    nnumber of bins for the density calculation

    n_minint

    min bin value for density calculation

    n_maxint

    max bins value for density calculation

    n_stepint

    size of bin

Returns:

  • intrinsic_r_total (np.array) – not normalized intrinsic density

  • intrinsic_r_total_norm (np.array) – normalized intrinsic density

run_intrinsic_density(micelle_selection, micelle_positions, core, shell, water_name, start, stop, step, n_bins=31, n_min=-40.5, n_max=150, n_step=0.1)[source]

Function to run the intrinsic density over time

Parameters:
  • u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.

  • micelle_selection (MDAnalysis.Object) – MDAnalysis atom group of atoms belonging to the micelle

  • micelle_positions (np.array) –

    positions of all atoms from micelle selection

    coreclass.str

    atom names of the atoms of the core for density calculation

  • shell (class.str) –

    atom names of the atoms of the shell for density calculation

    water_nameclass.str

    name of water atoms

    startint

    starting frame to perform calculation

    stopint

    last frame to perform calculation

    stepint

    frames skipped in calculation

    n_binsint

    nnumber of bins for the density calculation

    n_minint

    min bin value for density calculation

    n_maxint

    max bins value for density calculation

    n_stepint

    size of bin

Returns:

final_density – intrinsic density

binned_space : np.array array with the values of the binned radial distance, this allows easier plotting of the density

Return type:

np.array

pysoftk.pol_analysis.intrinsic_density_micelle.timeit(func)[source]

pysoftk.pol_analysis.intrinsic_density_water_micelle module

class pysoftk.pol_analysis.intrinsic_density_water_micelle.intrinsic_density_water(tpr_file, xtc_file)[source]

Bases: MDA_input

A class used to compute the contacts between the polymers of a micelle.

box_volume(u, frame)[source]

Function to calculate the volume of the system box at a specific frame

Parameters:
  • u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.

  • frame (int) – frame selected for volume calculation

Returns:

vol – volume of system box

Return type:

float

intrinsic_density_not_normalized(u, frame, whole_micelle_largest_cluster_resids, micelle_positions, water_selection, core_group, n_rand_points, n_bins, n_min, n_max, n_step)[source]

Function to run the intrinsic density in a specific frame

Parameters:
  • u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.

  • frame (int) – trajectory frame

  • whole_micelle_largest_cluster_resids (list) – list with resids of polymers forming the desired micelle

micelle_positionsnp.array

positions of all atoms from micelle selection

water_selection: list

list of solvent atoms to calculate the density

core_groupclass.str

atom names of the atoms of the core for density calculation

n_rand_pointsint

number of water molecules

n_binsint

nnumber of bins for the density calculation

n_minint

min bin value for density calculation

n_maxint

max bins value for density calculation

n_stepint

size of bin

Returns:

  • intrinsic_r_total (np.array) – not normalized intrinsic density

  • intrinsic_r_total_norm (np.array) – normalized intrinsic density

run_intrinsic_density(micelle_selection, micelle_positions, water_selection, core, heavy_atom_water_name, start, stop, step, n_bins=31, n_min=-40.5, n_max=150, n_step=0.1)[source]

Function to run the intrinsic density over time

Parameters:
  • micelle_selection (list) – list of atoms belonging to the micelle

  • micelle_positions (np.array) –

    positions of all atoms from micelle selection

    water_selectionlist

    solvent atom selection

    coreclass.str

    atom names of the atoms of the core for density calculation

  • shell (class.str) –

    atom names of the atoms of the shell for density calculation

    heavy_atom_water_nameclass.str

    name of heavy water atoms

    startint

    starting frame to perform calculation

    stopint

    last frame to perform calculation

    stepint

    frames skipped in calculation

    n_binsint

    nnumber of bins for the density calculation

    n_minint

    min bin value for density calculation

    n_maxint

    max bins value for density calculation

    n_stepint

    size of bin

Returns:

final_density – intrinsic density

binned_space : np.array array with the values of the binned radial distance, this allows easier plotting of the density

Return type:

np.array

select_water_atoms(u, name)[source]

Function to select the solvent atoms

Parameters:
  • u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory

  • name (list) – List with the atom names of the solvent

pysoftk.pol_analysis.intrinsic_density_water_micelle.timeit(func)[source]

pysoftk.pol_analysis.kinetic_clustering module

class pysoftk.pol_analysis.kinetic_clustering.AutomatedKineticClustering(tpr_file, xtc_file)[source]

Bases: MDA_input

Automated Kinetic Clustering pipeline for single molecules.

This class provides a completely unsupervised workflow to isolate metastable kinetic states from molecular dynamics trajectories. It leverages RDKit for topological backbone identification, spatial contact maps for dimensionality reduction, and tICA combined with HDBSCAN for kinetic state clustering.

Parameters:
  • tpr_file (str) – Path to the topology file (e.g., .pdb, .tpr). Must contain bond information.

  • xtc_file (str) – Path to the trajectory file (e.g., .xyz, .xtc, .trr).

u

The MDAnalysis Universe object containing the loaded topology and trajectory.

Type:

MDAnalysis.Universe

dt

The time step between frames in picoseconds (ps). Defaults to 1.0 if not found.

Type:

float

core_atoms

The subset of atoms identified as the structural backbone.

Type:

MDAnalysis.core.groups.AtomGroup or None

feature_matrix

A 2D array of shape (n_frames, n_features) representing the binary contact maps.

Type:

numpy.ndarray or None

embedding

A 2D array of shape (n_frames, n_components) representing the projected tICA space.

Type:

numpy.ndarray or None

labels

A 1D array of shape (n_frames,) containing the cluster IDs assigned by HDBSCAN. A label of -1 indicates transition states (noise).

Type:

numpy.ndarray or None

define_backbone_by_centrality(percentile=75)[source]

Calculates Closeness Centrality via the RDKit topological distance matrix to automatically define the structural core/backbone of the molecule.

Atoms with a centrality score in the top 100 - percentile percent are kept, effectively filtering out high-frequency flapping side-chains.

Parameters:

percentile (int or float, optional) – The percentile threshold for exclusion. A value of 75 means the top 25% most central atoms are retained (default is 75).

Returns:

An MDAnalysis AtomGroup containing the identified core atoms.

Return type:

MDAnalysis.core.groups.AtomGroup

Raises:

RuntimeError – If the MDAnalysis topology cannot be converted to an RDKit molecule, typically because the topology file lacks explicit bond information.

export_representative_states(output_file='metastable_states.pdb', selection='all')[source]

Extracts the geometric medoid (closest actual frame to the cluster center) for each identified kinetic state and exports them sequentially to a single file.

Parameters:
  • output_file (str, optional) – The output filepath for the resulting trajectory/PDB file (default is “metastable_states.pdb”).

  • selection (str, optional) – The MDAnalysis selection string for atoms to export (default is “all”).

Return type:

None

Raises:

ValueError – If the tICA embedding or labels have not been generated by running run_tica_clustering first.

extract_contact_maps(r_cutoff=8.0, step=1)[source]

Computes pairwise distances for the core atoms and applies a spatial cutoff to generate binary contact matrices for each frame.

Parameters:
  • r_cutoff (float, optional) – The distance threshold in Angstroms to define a contact (default is 8.0).

  • step (int, optional) – The step size for slicing the trajectory (default is 1, which reads every frame).

Returns:

A 2D matrix where each row represents a trajectory frame and each column is a binary feature (1 if contact, 0 if no contact).

Return type:

numpy.ndarray

Raises:

ValueError – If core_atoms have not been defined by running define_backbone_by_centrality first.

run_tica_clustering(lag_time_frames=10, n_components=2, min_cs=15)[source]

Performs Time-lagged Independent Component Analysis (tICA) on the contact maps to find slow collective variables, followed by HDBSCAN clustering to identify distinct metastable states.

Parameters:
  • lag_time_frames (int, optional) – The lag time (tau) for the tICA estimator, defined in number of frames (default is 10).

  • n_components (int, optional) – The number of tICA components (dimensions) to project the data onto (default is 2).

  • min_cs (int, optional) – The min_cluster_size parameter for the HDBSCAN algorithm (default is 15).

Returns:

  • embedding: The (n_frames, n_components) array of coordinates in tICA space.

  • labels: The (n_frames,) array of cluster assignments per frame.

Return type:

tuple of (numpy.ndarray, numpy.ndarray)

Raises:

ValueError – If feature_matrix has not been generated by running extract_contact_maps first.

pysoftk.pol_analysis.kinetic_clustering.timeit(func)[source]

Decorator to measure and print the execution time of a function.

Parameters:

func (callable) – The function to be timed.

Returns:

The wrapped function that includes execution time printing.

Return type:

callable

pysoftk.pol_analysis.make_micelle_whole module

class pysoftk.pol_analysis.make_micelle_whole.micelle_whole(tpr_file, xtc_file)[source]

Bases: MDA_input

A class used to make the largest micelle whole across the pbc for better analysis.

count_dimensions(box, cluster_atoms_positions, dimensions)[source]

Function to get the dimensions of a system, the box size and cluster_atom_positions in the form to use in make_cluster_whole

Parameters:
  • box (np.array) – u.dimensions

  • cluster_atom_positions (np.array) – Array with the atom positions of the polymers belonging to the cluster

  • dimensions (class.int) – Number of dimensions

Returns:

  • box (np.array) – Array with the u.dimensions to feed in into make_cluster_whole

  • cluster_atom_positions (np.array) – Array with the atom positions of the micelle/cluster in the correct shape to feed it into make_cluster_whole

  • dimensions (class.list) – List with the number of dimensions. E.g. 3D=[0, 1, 2]

get_atom_name_list(atom_sel)[source]

Function to obtain the residues based on user-provided atom selection.

Parameters:

atom_sel (class.str) – User Provided name/tag of the molecules inside the box.

Returns:

final – List containing all names of atoms.

Return type:

class.list

get_polymer_positions(u, frame, resname, cluster_resids)[source]

Function to find the polymer positions

Parameters:
  • u (MDAnalysis.Universe) – An user-provided MDAnalysis universe.

  • frame (class.int) – Time frame for calculation to be done.

  • resname (class.list) – List of strings with the resname of the polymers.

  • cluster_resids (class.str or class.int) – List with the polymer resids.

Returns:

  • box (np.array) – Array with the u.dimensions

  • cluster_atom_positions (np.array) – Array with the atom positions of the micelle/cluster.

  • cluster_sel (np.array) – Array with the atoms that belong to the micelle/cluster.

make_cluster_whole(frame, resname, cluster_resids, bins_min, bins_max, bins_step, dimensions=3)[source]

Function to make a polymer cluster whole across the pbc

Parameters:
  • frame (class.int) – Time frame that wants to be made whole

  • resname (class.list) – List with the resname of the molecules of the cluster

  • cluster_resids (class.list) – List with resids of the polymers that want to be made whole

  • bins_min (class.int) – Smallest distance to evaluate if the pbc are broken (smallest x-y distance position)

  • bins_max (class.int) – Biggest distance to evaluate if the pbc are broken (largest x-y distance position)

  • bins_step (class.int) – Step between bins to evaluate broken pbc

Returns:

  • frame (class.int) – Time frame where the micelle has been made whole

  • atom_positions_whole (np.array) – Array with the atom positions of the micelle made whole across the pbc

obtain_largest_micelle_resids(spatial_clustering_results)[source]

Function to obtain the largest micelle resids Bare in mind this may give two different clusters if they have the same size - should check that you only select one when using the results from here

Note-need to check if it actually gives two different clusters if they have the same size.

Parameters:

spatial_clustering_results (pandas.DataFrame) – Results from clustering analysis class. Pandas dataframe with three columns: time (ps), micelle_resids and micelle_size

Returns:

longest_lists – list with the number resids (of MDanalysis) of the largets micelle at the timesteps that want to be evaluated.

Return type:

list

obtain_snapshot(output_name, atom_positions_at_specific_frame, cluster_resids, polymer_resname, frame, water_resname=['SOL'], water_atom_name=['OW'], solvant=False)[source]

Function to obtain an snapshot of the micelle (pdb) file of a specific frame and returns the new universe to analyse.

Parameters:
  • output_name (class.str) – Name of output file

  • atom_positions_at_specific_frame (np.array) – Position of atoms at a specific frame

  • cluster_resids (class.list) – List of polymers resids for the snapshot

  • frame (class.int) – Time frame for snapshot

  • water_resname (class.list) – List with name of solvants

  • water_atom_name (class.list) – List of solvant atom names that want to be outputed in the sanpshot. E.g. with water you may only be interested in outputing the oxygen water atom for further calculations

  • solvant (class.boolean) – Set to True if interested in obtainig an output snapshot with the solvant too.

Returns:

  • u3 (MDAnalysis.Universe) – An MDAnalysis universe containing the resulting frames

  • output (MDAnalysis.Write) – A PDB file containing the selected molecules.

pbc_per_dimension(cluster_sel_positions, box, dimension, bins_min, bins_max, bins_step)[source]

Function to find the broken parts of the micelle across the pbc and make it whole

Parameters:
  • spatial_clustering_results (np.array) – Numpy array with the positions of the resids of interest. It needs to have the shape of n_molec x 3

  • box (np.array) – Numpy array with the MDanalysis strat u.dimensions. Array with the x y and z dimensions follow by the angles of the box.

  • dimension (class.int) – Dimension number that wants to be evaluated

  • bins_min (class.int) – Smallest distance to evaluate if the pbc are broken (smallest x-y distance position)

  • bins_max (class.int) – Biggest distance to evaluate if the pbc are broken (largest x-y distance position)

  • bins_step (class.int) – Step between bins to evaluate broken pbc

Returns:

cluster_sel_positions_return – Array with the atom positions of the micelle made whole across the pbc

Return type:

np.array

running_make_cluster_whole(resname, cluster_resids, start, stop, skip, bins_min=-50, bins_max=50, bins_step=10)[source]

Function to run the make_cluster_whole function over several frames

Parameters:
  • resname (class.list) – List with the resname of the molecules of the cluster in each time step.

  • cluster_resids (class.list) – List with resids of the polymers that want to be made whole in each time step.

  • start (class.int) – Starting frame to perfomr the calculation.

  • stop (class.int) – Last frame to perform the calculation.

  • skip (class.int) – Length of step to perform the calculation every number of frames.

  • bins_min (class.int) – Smallest distance to evaluate if the pbc are broken (smallest x-y distance position).

  • bins_max (class.int) – Biggest distance to evaluate if the pbc are broken (largest x-y distance position).

  • bins_step (class.int) – Step between bins to evaluate broken pbc.

Returns:

atom_positions_over_trajectory – List of np.arrays where each entry of the list is the position of atoms that are made whole across the pbc at a specific time step.

Return type:

class.list

pysoftk.pol_analysis.make_micelle_whole.timeit(func)[source]

pysoftk.pol_analysis.rgyr_micelle module

class pysoftk.pol_analysis.rgyr_micelle.rgyr(tpr_file, xtc_file)[source]

Bases: MDA_input

A class used to compute the radius of gyration of micelles

rgyr_calc(atom_sel_micelle, micelle_positions, subgroup=[])[source]

Funcion to calculate the radius of gyration of the chosen atoms

Parameters:
  • atom_sel_micelle (MDAnalysis.atom_selection) – MDAnalysis atom_selection function.

  • subgroup (MDAnalysis.atom_group) – MDAnalysis atom group to calculate the rgyr

  • micelle_positions (np.array) – np.array with the positions of the micelle atoms

Returns:

radius_of_gyration – Radius of gyration of the selected atoms

Return type:

class.float

running_rgyr(atom_names, micelle_positions, start, stop, skip, subgroup=[])[source]

Function to calculate the radius of gyration of the chosen atoms.

Parameters:
  • u (MDAnalysis.Universe) – An user-provided MDAnalysis universe.

  • atom_sel_type (class.str) – Type of atom selection.

  • atom_names (class.list) – List with the names of atoms.

  • micelle_positions (np.array) – An array with the positions of the micelle atoms.

  • subgroup (class.str) – A list of atom names to calculate the rgyr.

  • Start (class.int) – Starting frame of the trajectory.

  • Stop (class.int) – Ending rame of the trajectory.

  • Skip (class.int) – Skipping every that many frames of the trajectory.

Returns:

rgyr_f – Radius of gyration of the selected atoms over time

Return type:

np.array

pysoftk.pol_analysis.rgyr_micelle.timeit(func)[source]

pysoftk.pol_analysis.ring_ring module

class pysoftk.pol_analysis.ring_ring.RSA(tpr_file, xtc_file)[source]

Bases: MDA_input

Ring Stacking Analysis (RSA) for polymer systems.

Inherits from MDA_input to handle MDAnalysis universe initialization.

find_several_rings_stacked(rings_df_name)[source]

Extract connected components (clusters) of polymers from the stacking results.

Parameters:

rings_df_name (str) – Path to the parquet file generated by stacking_analysis.

Returns:

For each frame, a list of polymer clusters (each cluster is a list of Residue IDs).

Return type:

list of lists

load_or_build_rings(u, cache_file='ring_cache.dill')[source]

Load ring data from a cache file or build it if the cache does not exist.

Parameters:
  • u (MDAnalysis.Universe) – The universe object.

  • cache_file (str, optional) – Path to the dill cache file, by default “ring_cache.dill”.

pol_cutoff(u, cutoff)[source]

Filter polymer pairs that are spatially close using a cKDTree.

Parameters:
  • u (MDAnalysis.Universe) – The universe object at the current frame.

  • cutoff (float) – Distance cutoff for the initial spatial search.

Returns:

  • indices (list) – List of indices (range of valid pairs).

  • pairs (list of tuples) – List of polymer Residue ID pairs that are proximity candidates.

stacking_analysis(cut_off, ang_cut, start, stop, step, output_name='output.parquet', write_pdb=False)[source]

Perform high-throughput ring stacking analysis across trajectory frames.

Parameters:
  • cut_off (float) – Distance threshold for stacking.

  • ang_cut (float) – Angle threshold (acute) for face-to-face orientation.

  • start (int) – Starting frame index.

  • stop (int) – Stopping frame index.

  • step (int) – Frame stride.

  • output_name (str, optional) – Name of the output parquet file, by default “output.parquet”.

  • write_pdb (bool, optional) – Whether to write pdb files (legacy flag), by default False.

Returns:

The results containing polymer stacking pairs per frame.

Return type:

pandas.DataFrame

pysoftk.pol_analysis.ring_ring.evaluate_dimer_pair(ring_comb, positions, box_dims, cut_off, ang_cutoff, pol_idx)[source]

Worker function to check if any pair of rings between two polymers are stacking.

Parameters:
  • ring_comb (list of tuples) – Combinations of ring atom indices between polymer A and polymer B.

  • positions (ndarray of shape (N, 3)) – Global atom positions from the trajectory.

  • box_dims (ndarray of shape (6,)) – Simulation box dimensions for PBC handling.

  • cut_off (float) – Distance threshold for ring-ring proximity.

  • ang_cutoff (float) – Angle threshold (in degrees) for face-to-face stacking.

  • pol_idx (tuple) – The Residue IDs of the two polymers being compared.

Returns:

Returns pol_idx if a stacking event is found, otherwise None.

Return type:

tuple or None

pysoftk.pol_analysis.ring_ring.fast_calc_angle(u_norm, u_norm2)[source]

Calculate the acute angle between two normal vectors.

Parameters:
  • u_norm (ndarray of shape (3,)) – Normal vector of the first ring.

  • u_norm2 (ndarray of shape (3,)) – Normal vector of the second ring.

Returns:

The angle between the planes in degrees.

Return type:

float

pysoftk.pol_analysis.ring_ring.fast_svd(coordinates)[source]

Compute the normal vector of a ring plane using Singular Value Decomposition.

Parameters:

coordinates (ndarray of shape (N, 3)) – The Cartesian coordinates of the atoms in the ring.

Returns:

The normal vector (third principal component) of the ring plane.

Return type:

ndarray of shape (3,)

pysoftk.pol_analysis.ring_ring.timeit(func)[source]

Decorator to measure and print function execution time.

pysoftk.pol_analysis.solvation module

class pysoftk.pol_analysis.solvation.solvation(tpr_file, xtc_file)[source]

Bases: MDA_input

A class used to compute the contacts between the polymers of a micelle

hydration_calc(frame, largest_cluster_resids, micelle_pos, water_name, polymer_oxygen_names, cut_off)[source]

Function to calculate the solvation of selected polymer atoms

Parameters:
  • frame (class.int) – Time frame of step for calculation.

  • largest_cluster_resids (class.list) – List with resids of polymers forming the micelle at a specific time step.

  • micelle_pos (np.array) – 3D np.array with the positions of the atoms that conform the micelle whole (not broken across the pbc boundaries).

  • water_name (class.str) – Water atom names for hydration calculation.

  • polymer_oxygen_names (class.list) – List of atom names of the polymer for hydration calculation.

  • cut_off (class.float) – Number cut off of the distance between the polymer beads and the water for hydration calculation.

Returns:

coord_number – Coordination number of the polymer atoms.

Return type:

list

solvation_calc_run(start_frame, stop_frame, step_frame, largest_cluster_resids, micelle_pos, water_name, polymer_oxygen_names, cut_off)[source]

Function to calculate the solvation calculation over the selected frames

Parameters:
  • start_frame (class.int) – frame to start calculation

  • stop_frame (class.int) – frame to stop calculation

  • step_frame (class.int) – number of frames to skip

  • largest_cluster_resids (class.list) – list with resids of polymers forming the micelle at a specific time step

  • micelle_pos (np.array) – 3D np.array with the positions of the atoms that conform the micelle whole (not broken across the pbc boundaries)

  • water_name (class.str) – warer atom names for hydration calculation

  • polymer_oxygen_names (class.list) – list of atom names of the polymer for hydration calculation

  • cut_off (class.float) – number cut off of the distance between the polymer beads and the water for hydration calculation.

Returns:

coord_number – coordination number of the polymer atoms at a specific frame

Return type:

class.list

pysoftk.pol_analysis.solvation.timeit(func)[source]

pysoftk.pol_analysis.spherical_density module

class pysoftk.pol_analysis.spherical_density.spherical_density(tpr_file, xtc_file)[source]

Bases: MDA_input

A class used to compute the contacts between the polymers of a micelle.

density_calc(binned_space, all_histograms)[source]

Function to calculate the density

Parameters:

binned_space (np.array) – bins

all_histogramsnp.array

distances histograms

Returns:

density – array with density

Return type:

np.array

dist_com(u, atom_sel_micelle, micelle_positions, atom_selection)[source]

Function to calculate the distance between atoms and the center of mass of the micelle

Parameters:
  • u (mda.Universe) – mda.Universe

  • atom_sel_micelle

    atom group selection of the micelle

    micelle_positionsnp.array

    array with the positions of the micelle whole

Returns:

distance_com – array with the distances of the aotms to the COM of the micelle

Return type:

np.array

run_density_calc(micelle_sel_type, whole_micelle_selection, micelle_positions, group_atom_selection, maxbin=200, minbin=0, step=0.1)[source]

Function to get the mean density calculation over time as a funciton wrt the micelle COM

Parameters:
  • micelle_sel_type (str) – string type for the selection of the micelle atoms

  • whole_micelle_selection (np.array) – elements to select the micelle atoms

  • micelle_positions (np.array) – array with the positions of the micelle atoms whole

  • group_atom_selection (str) – list with the names of the polymer species chosen for the density calculation

  • maxbin (int) – largest bin value

  • minbin (int) – smalles bin value

  • step (int) – bin size

Returns:

density – array with the density values as a function of the distance to the COM of the micelle

binned_spacenp.array

array with the values of the binned radial distance, this allows easier plotting of the density

Return type:

np.array

sph_histogram(distance, maxbin, minbin, step)[source]

Function to create a histogram of the distances

Parameters:
  • distance (np.array) – distances to be binned

  • maxbin (int) – largest bin value

  • minbin (int) – smalles bin value

  • step (int) – bin size

Returns:

all_histograms – array with the distances values binned

Return type:

np.array

pysoftk.pol_analysis.spherical_density.timeit(func)[source]

pysoftk.pol_analysis.spherical_density_water module

class pysoftk.pol_analysis.spherical_density_water.spherical_density_water(tpr_file, xtc_file)[source]

Bases: MDA_input

A class used to compute the contacts between the polymers of a micelle.

density_calc(binned_space, all_histograms)[source]

Function to calculate the density

Parameters:

binned_space (np.array) – bins

all_histogramsnp.array

distances histograms

Returns:

density – array with density

Return type:

np.array

dist_com(u, frame, atom_sel_micelle, micelle_positions, water_selection)[source]

Function to calculate the distance between atoms and the center of mass of the micelle

Parameters:
  • u (mda.Universe) – mda.Universe

  • atom_sel_micelle

    atom group selection of the micelle

    micelle_positionsnp.array

    array with the positions of the micelle whole

Returns:

distance_com – array with the distances of the aotms to the COM of the micelle

Return type:

np.array

run_density_calc(micelle_sel_type, whole_micelle_selection, micelle_positions, water_sel_type, water_atom_selection, start, stop, step_frame, maxbin=200, minbin=0, step=0.1)[source]

Function to get the mean density calculation over time as a funciton wrt the micelle COM

Parameters:
  • micelle_sel_type (str) – string type for the selection of the micelle atoms

  • whole_micelle_selection (np.array) – elements to select the micelle atoms

  • micelle_positions (np.array) – array with the positions of the micelle atoms whole

  • water_sel_type (str) – string type for the selection of the solvent atoms

  • water_atom_selection (list) – list with the names of the solvent chosen for the density calculation

  • maxbin (int) – largest bin value

  • minbin (int) – smalles bin value

  • step (int) – bin size

Returns:

density – array with the density values as a function of the distance to the COM of the micelle

binned_spacenp.array

array with the values of the binned radial distance, this allows easier plotting of the density

Return type:

np.array

sph_histogram(distance, maxbin, minbin, step)[source]

Function to create a histogram of the distances

Parameters:
  • distance (np.array) – distances to be binned

  • maxbin (int) – largest bin value

  • minbin (int) – smalles bin value

  • step (int) – bin size

Returns:

all_histograms – array with the distances values binned

Return type:

np.array

pysoftk.pol_analysis.spherical_density_water.timeit(func)[source]

pysoftk.pol_analysis.umap_analysis module

pysoftk.pol_analysis.umap_analysis.timeit(func)[source]
class pysoftk.pol_analysis.umap_analysis.umap_analysis(all_pos)[source]

Bases: object

A class used to compute to do umap cluster identification of polymer conformations within an spatial configuration of moieties.

cluster_order_appearance(X_embedded, y_pred, cwd)[source]
Function to ouput hdbscan with the groups labeled ordered

with their order of appearance in the trajectory.

Parameters:
  • X_embedded (np.array) – umap clustered data

  • y_pred (np.array) – cluster label assignment for each data point in the umap cluster data.

  • cwd (class.str) – path to save output file.

Returns:

A plot with the results of the calculation.

Return type:

None

find_closest_point(array, target_point)[source]

Function to find the closest point from an array to a given target point

Parameters:
  • array (np.array) – array of numbers from which the value will be found

  • target_point (class.float) – target number

Returns:

closest_point – numpy array with the closest number from array to the target point.

Return type:

np.array

get_array()[source]

Function to load the all pos distances.

Parameters:

None

Returns:

all_pos – np.array to load

Return type:

np.array

get_average_rep_cluster(X_embedded, y_pred)[source]
Function to get the X_embedded point that represents the

mean of each umap cluster.

Parameters:
  • X_embedded (np.array) – umap clustered data

  • y_pred (np.array) – cluster label assignment for each data point in the umap cluster data.

  • cwd (class.str) – path to save output file.

Returns:

cluster_points – np.array with the X_embedded values that represent the mean of each cluster.

Return type:

np.array

hdbscan_cluster(X_embedded, min_cs, epsilon, cwd, method='leaf')[source]

Function to hdbscan on umap clustered data.

Parameters:
  • X_embedded (np.array) – umap clustered data

  • min_cs (class.int) – minimum number of points for a group to be considered a cluster.

  • epsilon (class.int) – clusters closer than this distance apart will be merged.

  • cwd (class.str) – path to save output file.

  • method (class.str) – cluster_method_selection from hdbscan to determine how it selects flat clusters from the cluster tree hierarchy.

Returns:

y_pred – cluster label assignment for each data point in the umap cluster data.

Return type:

np.array

umap_run(n_neigh, cwd, n_comp=2, min_d=0.0, verb=True, rs=9)[source]

Function to run umap on all_pos array.

Parameters:
  • n_neigh (class.int) – larger numbers forcus more on global properties (and increase computational effort required)

  • cwd (class.str) – path to save output file

  • n_comp (class.int) – set the number of dimensions to reduce to

  • min_d (class.int) – must be 0.0 if points are to be clustered later with HDBSCAN.

  • verb (class.boolean) – To show the progress bar

  • rs (class.int) – random seed. Must be set to the same number for reproducibility.

Returns:

X_embedded – Clustered data

Return type:

np.array

Module contents