pysoftk.pol_analysis package
Subpackages
Submodules
pysoftk.pol_analysis.clustering module
- class pysoftk.pol_analysis.clustering.SCP(tpr_file, xtc_file)[source]
Bases:
MDA_inputA class used to compute a spatial cluster for polymers (SCP)
- contact_matrix(u, name_list, cluster_cutoff)[source]
Function to compute the contact matrix.
- Parameters:
u (MDAnalaysis.Universe) – An user-provided MDAnalysis universe.
name_list (class.list) – A list containing the names used for the contact matrix analysis.
cluster_cutoff (class.float) – A user-defined number for a radius cutoff.
- Returns:
pairs – A list of tuples containing the indexes of the selected molecules.
- Return type:
class.list
- spatial_clustering(u, atom_sel, cluster_cutoff, frame)[source]
Function that gives you the resids and sized of each polymer aggregate per frame in a trajectory.
- Parameters:
frame (MDanalysis.Object) – An MDAnalysis object that contains one frame of a trajectory.
atom_sel (class.str) – User-provided string containing the ResID/tag of the molecule in the trajectory.
cluster_cutoff (class.int) – User defined cutoff to select as neighbour.
max_workers (class.int) – Number of threads to be used in the calculation.
- Returns:
tuple containing clusters and cluster_sizes.
clusters (class.list) – A tuple containing the ResIDs.
cluster_sizes (class.list) – Length of the computed clusters.
- spatial_clustering_run(start, stop, skip, atom_names, cluster_cutoff, name_file)[source]
- Function to compute the spatial clustering function over an MDAnalysis
trajectory.
- Parameters:
start (class.int) – Frame to start calculation.
stop (class.int) – Frame to stop calculation.
step (class.int) – Number of frames to skip.
atom_names (class.str) – Atomic names used to procure the search.
cluster_cutoff (class.float) – User-provided value to define the cluster length.
name_file (class.str) – Name of the User-supplied file.
- Returns:
A pandas.DataFrame.parquet file with the results stored.
- Return type:
None
pysoftk.pol_analysis.contact_analysis module
- class pysoftk.pol_analysis.contact_analysis.contacts(tpr_file, xtc_file)[source]
Bases:
MDA_inputA class used to compute the contacts between the polymers of a micelle.
- contacts_calc(u, micelle_selection, micelle_positions, group1, group2, contact_distance)[source]
Function to get atom pairs
- Parameters:
u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.
micelle_selection (MDAnalysis.Object) – MDAnalysis atom group of atoms belonging to the micelle
micelle_positions (np.array) – positions of all atoms from micelle selection
group1 (class.str) – atom names of the atoms for group1 of the contacts calculation
group2 (class.str) – atom names of the atoms for group2 of the contacts calculation
- Returns:
contacts_mat – Array with the contacts between polymers
- Return type:
np.array
- run_contacts_calc(micelle_selection, micelle_positions, group1, group2, contacts_distance)[source]
Function to run contacts_calc function over timme
- Parameters:
micelle_selection (class.list) – List with resids of the polymers belonging to the micelle
micelle_positions (np.array) – Array with the positions of all atoms of the micelle
group1 (class.str) – Atom names of the atoms for group1 of the contacts calculation
group2 (class.str) – Atom names of the atoms for group2 of the contacts calculation
contacts_distance (class.int) – Cut off distance for contacts
- Returns:
contacts_over time – Array with the contacts between polymers in each timestep
- Return type:
np.array
pysoftk.pol_analysis.ecc_micelle module
- class pysoftk.pol_analysis.ecc_micelle.ecc(tpr_file, xtc_file)[source]
Bases:
MDA_inputA class used to compute the eccentricity of micelles
- ecc_calc(atom_sel_micelle, micelle_positions)[source]
Function to calculate the eccentricity of the chosen atoms
- Parameters:
atom_sel_micelle (MDAnalysis.Object) – MDAnalysis.atom_selection function
- micelle_positionsnp.array
np.array with the positions of the micelle atoms
- Returns:
ecc – Eccentricity of the selected atoms
- Return type:
class.float
- running_ecc(atom_names, micelle_positions, start, stop, skip)[source]
Function to calculate the eccentricity of the chosen atoms over time
- Parameters:
atom_sel_type (class.str) – Type of atom selection
atom_names (class.list) – List with the names of atoms
micelle_positions (np.array) – np.array with the positions of the micelle atoms
Start (class.int) – Starting frame of the trajectory
Stop (class.int) – Ending rame of the trajectory
Skip (class.int) – Skipping every that many framesof the trajectory
- Returns:
ecc – Eccentricity of the selected atoms over time
- Return type:
np.array
pysoftk.pol_analysis.intrinsic_density_micelle module
- class pysoftk.pol_analysis.intrinsic_density_micelle.intrinsic_density(tpr_file, xtc_file)[source]
Bases:
MDA_inputA class used to compute the contacts between the polymers of a micelle.
- box_volume(u, frame)[source]
Function to calculate the volume of the system box at a specific frame
- Parameters:
u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.
frame (int) – frame selected for volume calculation
- Returns:
vol – volume of system box
- Return type:
float
- intrinsic_density_not_normalized(u, frame, whole_micelle_largest_cluster_resids, micelle_selection, micelle_positions, core_group, shell_group, n_rand_points, n_bins, n_min, n_max, n_step)[source]
Function to run the intrinsic density in a specific frame
- Parameters:
u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.
whole_micelle_largest_cluster_resids (list) – list with resids of polymers forming the desired micelle
micelle_selection (MDAnalysis.Object) – MDAnalysis atom group of atoms belonging to the micelle
micelle_positions (np.array) –
positions of all atoms from micelle selection
- core_groupclass.str
atom names of the atoms of the core for density calculation
shell_group (class.str) –
atom names of the atoms of the shell for density calculation
- n_rand_pointsint
number of water molecules
- n_binsint
nnumber of bins for the density calculation
- n_minint
min bin value for density calculation
- n_maxint
max bins value for density calculation
- n_stepint
size of bin
- Returns:
intrinsic_r_total (np.array) – not normalized intrinsic density
intrinsic_r_total_norm (np.array) – normalized intrinsic density
- run_intrinsic_density(micelle_selection, micelle_positions, core, shell, water_name, start, stop, step, n_bins=31, n_min=-40.5, n_max=150, n_step=0.1)[source]
Function to run the intrinsic density over time
- Parameters:
u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.
micelle_selection (MDAnalysis.Object) – MDAnalysis atom group of atoms belonging to the micelle
micelle_positions (np.array) –
positions of all atoms from micelle selection
- coreclass.str
atom names of the atoms of the core for density calculation
shell (class.str) –
atom names of the atoms of the shell for density calculation
- water_nameclass.str
name of water atoms
- startint
starting frame to perform calculation
- stopint
last frame to perform calculation
- stepint
frames skipped in calculation
- n_binsint
nnumber of bins for the density calculation
- n_minint
min bin value for density calculation
- n_maxint
max bins value for density calculation
- n_stepint
size of bin
- Returns:
final_density – intrinsic density
binned_space : np.array array with the values of the binned radial distance, this allows easier plotting of the density
- Return type:
np.array
pysoftk.pol_analysis.intrinsic_density_water_micelle module
- class pysoftk.pol_analysis.intrinsic_density_water_micelle.intrinsic_density_water(tpr_file, xtc_file)[source]
Bases:
MDA_inputA class used to compute the contacts between the polymers of a micelle.
- box_volume(u, frame)[source]
Function to calculate the volume of the system box at a specific frame
- Parameters:
u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.
frame (int) – frame selected for volume calculation
- Returns:
vol – volume of system box
- Return type:
float
- intrinsic_density_not_normalized(u, frame, whole_micelle_largest_cluster_resids, micelle_positions, water_selection, core_group, n_rand_points, n_bins, n_min, n_max, n_step)[source]
Function to run the intrinsic density in a specific frame
- Parameters:
u (MDAnalysis.Universe) – An user-provided MDanalysis trajectory.
frame (int) – trajectory frame
whole_micelle_largest_cluster_resids (list) – list with resids of polymers forming the desired micelle
- micelle_positionsnp.array
positions of all atoms from micelle selection
- water_selection: list
list of solvent atoms to calculate the density
- core_groupclass.str
atom names of the atoms of the core for density calculation
- n_rand_pointsint
number of water molecules
- n_binsint
nnumber of bins for the density calculation
- n_minint
min bin value for density calculation
- n_maxint
max bins value for density calculation
- n_stepint
size of bin
- Returns:
intrinsic_r_total (np.array) – not normalized intrinsic density
intrinsic_r_total_norm (np.array) – normalized intrinsic density
- run_intrinsic_density(micelle_selection, micelle_positions, water_selection, core, heavy_atom_water_name, start, stop, step, n_bins=31, n_min=-40.5, n_max=150, n_step=0.1)[source]
Function to run the intrinsic density over time
- Parameters:
micelle_selection (list) – list of atoms belonging to the micelle
micelle_positions (np.array) –
positions of all atoms from micelle selection
- water_selectionlist
solvent atom selection
- coreclass.str
atom names of the atoms of the core for density calculation
shell (class.str) –
atom names of the atoms of the shell for density calculation
- heavy_atom_water_nameclass.str
name of heavy water atoms
- startint
starting frame to perform calculation
- stopint
last frame to perform calculation
- stepint
frames skipped in calculation
- n_binsint
nnumber of bins for the density calculation
- n_minint
min bin value for density calculation
- n_maxint
max bins value for density calculation
- n_stepint
size of bin
- Returns:
final_density – intrinsic density
binned_space : np.array array with the values of the binned radial distance, this allows easier plotting of the density
- Return type:
np.array
pysoftk.pol_analysis.kinetic_clustering module
- class pysoftk.pol_analysis.kinetic_clustering.AutomatedKineticClustering(tpr_file, xtc_file)[source]
Bases:
MDA_inputAutomated Kinetic Clustering pipeline for single molecules.
This class provides a completely unsupervised workflow to isolate metastable kinetic states from molecular dynamics trajectories. It leverages RDKit for topological backbone identification, spatial contact maps for dimensionality reduction, and tICA combined with HDBSCAN for kinetic state clustering.
- Parameters:
tpr_file (str) – Path to the topology file (e.g., .pdb, .tpr). Must contain bond information.
xtc_file (str) – Path to the trajectory file (e.g., .xyz, .xtc, .trr).
- u
The MDAnalysis Universe object containing the loaded topology and trajectory.
- Type:
MDAnalysis.Universe
- dt
The time step between frames in picoseconds (ps). Defaults to 1.0 if not found.
- Type:
float
- core_atoms
The subset of atoms identified as the structural backbone.
- Type:
MDAnalysis.core.groups.AtomGroup or None
- feature_matrix
A 2D array of shape (n_frames, n_features) representing the binary contact maps.
- Type:
numpy.ndarray or None
- embedding
A 2D array of shape (n_frames, n_components) representing the projected tICA space.
- Type:
numpy.ndarray or None
- labels
A 1D array of shape (n_frames,) containing the cluster IDs assigned by HDBSCAN. A label of -1 indicates transition states (noise).
- Type:
numpy.ndarray or None
- define_backbone_by_centrality(percentile=75)[source]
Calculates Closeness Centrality via the RDKit topological distance matrix to automatically define the structural core/backbone of the molecule.
Atoms with a centrality score in the top 100 - percentile percent are kept, effectively filtering out high-frequency flapping side-chains.
- Parameters:
percentile (int or float, optional) – The percentile threshold for exclusion. A value of 75 means the top 25% most central atoms are retained (default is 75).
- Returns:
An MDAnalysis AtomGroup containing the identified core atoms.
- Return type:
MDAnalysis.core.groups.AtomGroup
- Raises:
RuntimeError – If the MDAnalysis topology cannot be converted to an RDKit molecule, typically because the topology file lacks explicit bond information.
- export_representative_states(output_file='metastable_states.pdb', selection='all')[source]
Extracts the geometric medoid (closest actual frame to the cluster center) for each identified kinetic state and exports them sequentially to a single file.
- Parameters:
output_file (str, optional) – The output filepath for the resulting trajectory/PDB file (default is “metastable_states.pdb”).
selection (str, optional) – The MDAnalysis selection string for atoms to export (default is “all”).
- Return type:
None
- Raises:
ValueError – If the tICA embedding or labels have not been generated by running run_tica_clustering first.
- extract_contact_maps(r_cutoff=8.0, step=1)[source]
Computes pairwise distances for the core atoms and applies a spatial cutoff to generate binary contact matrices for each frame.
- Parameters:
r_cutoff (float, optional) – The distance threshold in Angstroms to define a contact (default is 8.0).
step (int, optional) – The step size for slicing the trajectory (default is 1, which reads every frame).
- Returns:
A 2D matrix where each row represents a trajectory frame and each column is a binary feature (1 if contact, 0 if no contact).
- Return type:
numpy.ndarray
- Raises:
ValueError – If core_atoms have not been defined by running define_backbone_by_centrality first.
- run_tica_clustering(lag_time_frames=10, n_components=2, min_cs=15)[source]
Performs Time-lagged Independent Component Analysis (tICA) on the contact maps to find slow collective variables, followed by HDBSCAN clustering to identify distinct metastable states.
- Parameters:
lag_time_frames (int, optional) – The lag time (tau) for the tICA estimator, defined in number of frames (default is 10).
n_components (int, optional) – The number of tICA components (dimensions) to project the data onto (default is 2).
min_cs (int, optional) – The min_cluster_size parameter for the HDBSCAN algorithm (default is 15).
- Returns:
embedding: The (n_frames, n_components) array of coordinates in tICA space.
labels: The (n_frames,) array of cluster assignments per frame.
- Return type:
tuple of (numpy.ndarray, numpy.ndarray)
- Raises:
ValueError – If feature_matrix has not been generated by running extract_contact_maps first.
pysoftk.pol_analysis.make_micelle_whole module
- class pysoftk.pol_analysis.make_micelle_whole.micelle_whole(tpr_file, xtc_file)[source]
Bases:
MDA_inputA class used to make the largest micelle whole across the pbc for better analysis.
- count_dimensions(box, cluster_atoms_positions, dimensions)[source]
Function to get the dimensions of a system, the box size and cluster_atom_positions in the form to use in make_cluster_whole
- Parameters:
box (np.array) – u.dimensions
cluster_atom_positions (np.array) – Array with the atom positions of the polymers belonging to the cluster
dimensions (class.int) – Number of dimensions
- Returns:
box (np.array) – Array with the u.dimensions to feed in into make_cluster_whole
cluster_atom_positions (np.array) – Array with the atom positions of the micelle/cluster in the correct shape to feed it into make_cluster_whole
dimensions (class.list) – List with the number of dimensions. E.g. 3D=[0, 1, 2]
- get_atom_name_list(atom_sel)[source]
Function to obtain the residues based on user-provided atom selection.
- Parameters:
atom_sel (class.str) – User Provided name/tag of the molecules inside the box.
- Returns:
final – List containing all names of atoms.
- Return type:
class.list
- get_polymer_positions(u, frame, resname, cluster_resids)[source]
Function to find the polymer positions
- Parameters:
u (MDAnalysis.Universe) – An user-provided MDAnalysis universe.
frame (class.int) – Time frame for calculation to be done.
resname (class.list) – List of strings with the resname of the polymers.
cluster_resids (class.str or class.int) – List with the polymer resids.
- Returns:
box (np.array) – Array with the u.dimensions
cluster_atom_positions (np.array) – Array with the atom positions of the micelle/cluster.
cluster_sel (np.array) – Array with the atoms that belong to the micelle/cluster.
- make_cluster_whole(frame, resname, cluster_resids, bins_min, bins_max, bins_step, dimensions=3)[source]
Function to make a polymer cluster whole across the pbc
- Parameters:
frame (class.int) – Time frame that wants to be made whole
resname (class.list) – List with the resname of the molecules of the cluster
cluster_resids (class.list) – List with resids of the polymers that want to be made whole
bins_min (class.int) – Smallest distance to evaluate if the pbc are broken (smallest x-y distance position)
bins_max (class.int) – Biggest distance to evaluate if the pbc are broken (largest x-y distance position)
bins_step (class.int) – Step between bins to evaluate broken pbc
- Returns:
frame (class.int) – Time frame where the micelle has been made whole
atom_positions_whole (np.array) – Array with the atom positions of the micelle made whole across the pbc
- obtain_largest_micelle_resids(spatial_clustering_results)[source]
Function to obtain the largest micelle resids Bare in mind this may give two different clusters if they have the same size - should check that you only select one when using the results from here
Note-need to check if it actually gives two different clusters if they have the same size.
- Parameters:
spatial_clustering_results (pandas.DataFrame) – Results from clustering analysis class. Pandas dataframe with three columns: time (ps), micelle_resids and micelle_size
- Returns:
longest_lists – list with the number resids (of MDanalysis) of the largets micelle at the timesteps that want to be evaluated.
- Return type:
list
- obtain_snapshot(output_name, atom_positions_at_specific_frame, cluster_resids, polymer_resname, frame, water_resname=['SOL'], water_atom_name=['OW'], solvant=False)[source]
Function to obtain an snapshot of the micelle (pdb) file of a specific frame and returns the new universe to analyse.
- Parameters:
output_name (class.str) – Name of output file
atom_positions_at_specific_frame (np.array) – Position of atoms at a specific frame
cluster_resids (class.list) – List of polymers resids for the snapshot
frame (class.int) – Time frame for snapshot
water_resname (class.list) – List with name of solvants
water_atom_name (class.list) – List of solvant atom names that want to be outputed in the sanpshot. E.g. with water you may only be interested in outputing the oxygen water atom for further calculations
solvant (class.boolean) – Set to True if interested in obtainig an output snapshot with the solvant too.
- Returns:
u3 (MDAnalysis.Universe) – An MDAnalysis universe containing the resulting frames
output (MDAnalysis.Write) – A PDB file containing the selected molecules.
- pbc_per_dimension(cluster_sel_positions, box, dimension, bins_min, bins_max, bins_step)[source]
Function to find the broken parts of the micelle across the pbc and make it whole
- Parameters:
spatial_clustering_results (np.array) – Numpy array with the positions of the resids of interest. It needs to have the shape of n_molec x 3
box (np.array) – Numpy array with the MDanalysis strat u.dimensions. Array with the x y and z dimensions follow by the angles of the box.
dimension (class.int) – Dimension number that wants to be evaluated
bins_min (class.int) – Smallest distance to evaluate if the pbc are broken (smallest x-y distance position)
bins_max (class.int) – Biggest distance to evaluate if the pbc are broken (largest x-y distance position)
bins_step (class.int) – Step between bins to evaluate broken pbc
- Returns:
cluster_sel_positions_return – Array with the atom positions of the micelle made whole across the pbc
- Return type:
np.array
- running_make_cluster_whole(resname, cluster_resids, start, stop, skip, bins_min=-50, bins_max=50, bins_step=10)[source]
Function to run the make_cluster_whole function over several frames
- Parameters:
resname (class.list) – List with the resname of the molecules of the cluster in each time step.
cluster_resids (class.list) – List with resids of the polymers that want to be made whole in each time step.
start (class.int) – Starting frame to perfomr the calculation.
stop (class.int) – Last frame to perform the calculation.
skip (class.int) – Length of step to perform the calculation every number of frames.
bins_min (class.int) – Smallest distance to evaluate if the pbc are broken (smallest x-y distance position).
bins_max (class.int) – Biggest distance to evaluate if the pbc are broken (largest x-y distance position).
bins_step (class.int) – Step between bins to evaluate broken pbc.
- Returns:
atom_positions_over_trajectory – List of np.arrays where each entry of the list is the position of atoms that are made whole across the pbc at a specific time step.
- Return type:
class.list
pysoftk.pol_analysis.rgyr_micelle module
- class pysoftk.pol_analysis.rgyr_micelle.rgyr(tpr_file, xtc_file)[source]
Bases:
MDA_inputA class used to compute the radius of gyration of micelles
- rgyr_calc(atom_sel_micelle, micelle_positions, subgroup=[])[source]
Funcion to calculate the radius of gyration of the chosen atoms
- Parameters:
atom_sel_micelle (MDAnalysis.atom_selection) – MDAnalysis atom_selection function.
subgroup (MDAnalysis.atom_group) – MDAnalysis atom group to calculate the rgyr
micelle_positions (np.array) – np.array with the positions of the micelle atoms
- Returns:
radius_of_gyration – Radius of gyration of the selected atoms
- Return type:
class.float
- running_rgyr(atom_names, micelle_positions, start, stop, skip, subgroup=[])[source]
Function to calculate the radius of gyration of the chosen atoms.
- Parameters:
u (MDAnalysis.Universe) – An user-provided MDAnalysis universe.
atom_sel_type (class.str) – Type of atom selection.
atom_names (class.list) – List with the names of atoms.
micelle_positions (np.array) – An array with the positions of the micelle atoms.
subgroup (class.str) – A list of atom names to calculate the rgyr.
Start (class.int) – Starting frame of the trajectory.
Stop (class.int) – Ending rame of the trajectory.
Skip (class.int) – Skipping every that many frames of the trajectory.
- Returns:
rgyr_f – Radius of gyration of the selected atoms over time
- Return type:
np.array
pysoftk.pol_analysis.ring_ring module
- class pysoftk.pol_analysis.ring_ring.RSA(tpr_file, xtc_file)[source]
Bases:
MDA_inputRing Stacking Analysis (RSA) for polymer systems.
Inherits from MDA_input to handle MDAnalysis universe initialization.
- find_several_rings_stacked(rings_df_name)[source]
Extract connected components (clusters) of polymers from the stacking results.
- Parameters:
rings_df_name (str) – Path to the parquet file generated by stacking_analysis.
- Returns:
For each frame, a list of polymer clusters (each cluster is a list of Residue IDs).
- Return type:
list of lists
- load_or_build_rings(u, cache_file='ring_cache.dill')[source]
Load ring data from a cache file or build it if the cache does not exist.
- Parameters:
u (MDAnalysis.Universe) – The universe object.
cache_file (str, optional) – Path to the dill cache file, by default “ring_cache.dill”.
- pol_cutoff(u, cutoff)[source]
Filter polymer pairs that are spatially close using a cKDTree.
- Parameters:
u (MDAnalysis.Universe) – The universe object at the current frame.
cutoff (float) – Distance cutoff for the initial spatial search.
- Returns:
indices (list) – List of indices (range of valid pairs).
pairs (list of tuples) – List of polymer Residue ID pairs that are proximity candidates.
- stacking_analysis(cut_off, ang_cut, start, stop, step, output_name='output.parquet', write_pdb=False)[source]
Perform high-throughput ring stacking analysis across trajectory frames.
- Parameters:
cut_off (float) – Distance threshold for stacking.
ang_cut (float) – Angle threshold (acute) for face-to-face orientation.
start (int) – Starting frame index.
stop (int) – Stopping frame index.
step (int) – Frame stride.
output_name (str, optional) – Name of the output parquet file, by default “output.parquet”.
write_pdb (bool, optional) – Whether to write pdb files (legacy flag), by default False.
- Returns:
The results containing polymer stacking pairs per frame.
- Return type:
pandas.DataFrame
- pysoftk.pol_analysis.ring_ring.evaluate_dimer_pair(ring_comb, positions, box_dims, cut_off, ang_cutoff, pol_idx)[source]
Worker function to check if any pair of rings between two polymers are stacking.
- Parameters:
ring_comb (list of tuples) – Combinations of ring atom indices between polymer A and polymer B.
positions (ndarray of shape (N, 3)) – Global atom positions from the trajectory.
box_dims (ndarray of shape (6,)) – Simulation box dimensions for PBC handling.
cut_off (float) – Distance threshold for ring-ring proximity.
ang_cutoff (float) – Angle threshold (in degrees) for face-to-face stacking.
pol_idx (tuple) – The Residue IDs of the two polymers being compared.
- Returns:
Returns pol_idx if a stacking event is found, otherwise None.
- Return type:
tuple or None
- pysoftk.pol_analysis.ring_ring.fast_calc_angle(u_norm, u_norm2)[source]
Calculate the acute angle between two normal vectors.
- Parameters:
u_norm (ndarray of shape (3,)) – Normal vector of the first ring.
u_norm2 (ndarray of shape (3,)) – Normal vector of the second ring.
- Returns:
The angle between the planes in degrees.
- Return type:
float
- pysoftk.pol_analysis.ring_ring.fast_svd(coordinates)[source]
Compute the normal vector of a ring plane using Singular Value Decomposition.
- Parameters:
coordinates (ndarray of shape (N, 3)) – The Cartesian coordinates of the atoms in the ring.
- Returns:
The normal vector (third principal component) of the ring plane.
- Return type:
ndarray of shape (3,)
pysoftk.pol_analysis.solvation module
- class pysoftk.pol_analysis.solvation.solvation(tpr_file, xtc_file)[source]
Bases:
MDA_inputA class used to compute the contacts between the polymers of a micelle
- hydration_calc(frame, largest_cluster_resids, micelle_pos, water_name, polymer_oxygen_names, cut_off)[source]
Function to calculate the solvation of selected polymer atoms
- Parameters:
frame (class.int) – Time frame of step for calculation.
largest_cluster_resids (class.list) – List with resids of polymers forming the micelle at a specific time step.
micelle_pos (np.array) – 3D np.array with the positions of the atoms that conform the micelle whole (not broken across the pbc boundaries).
water_name (class.str) – Water atom names for hydration calculation.
polymer_oxygen_names (class.list) – List of atom names of the polymer for hydration calculation.
cut_off (class.float) – Number cut off of the distance between the polymer beads and the water for hydration calculation.
- Returns:
coord_number – Coordination number of the polymer atoms.
- Return type:
list
- solvation_calc_run(start_frame, stop_frame, step_frame, largest_cluster_resids, micelle_pos, water_name, polymer_oxygen_names, cut_off)[source]
Function to calculate the solvation calculation over the selected frames
- Parameters:
start_frame (class.int) – frame to start calculation
stop_frame (class.int) – frame to stop calculation
step_frame (class.int) – number of frames to skip
largest_cluster_resids (class.list) – list with resids of polymers forming the micelle at a specific time step
micelle_pos (np.array) – 3D np.array with the positions of the atoms that conform the micelle whole (not broken across the pbc boundaries)
water_name (class.str) – warer atom names for hydration calculation
polymer_oxygen_names (class.list) – list of atom names of the polymer for hydration calculation
cut_off (class.float) – number cut off of the distance between the polymer beads and the water for hydration calculation.
- Returns:
coord_number – coordination number of the polymer atoms at a specific frame
- Return type:
class.list
pysoftk.pol_analysis.spherical_density module
- class pysoftk.pol_analysis.spherical_density.spherical_density(tpr_file, xtc_file)[source]
Bases:
MDA_inputA class used to compute the contacts between the polymers of a micelle.
- density_calc(binned_space, all_histograms)[source]
Function to calculate the density
- Parameters:
binned_space (np.array) – bins
- all_histogramsnp.array
distances histograms
- Returns:
density – array with density
- Return type:
np.array
- dist_com(u, atom_sel_micelle, micelle_positions, atom_selection)[source]
Function to calculate the distance between atoms and the center of mass of the micelle
- Parameters:
u (mda.Universe) – mda.Universe
atom_sel_micelle –
atom group selection of the micelle
- micelle_positionsnp.array
array with the positions of the micelle whole
- Returns:
distance_com – array with the distances of the aotms to the COM of the micelle
- Return type:
np.array
- run_density_calc(micelle_sel_type, whole_micelle_selection, micelle_positions, group_atom_selection, maxbin=200, minbin=0, step=0.1)[source]
Function to get the mean density calculation over time as a funciton wrt the micelle COM
- Parameters:
micelle_sel_type (str) – string type for the selection of the micelle atoms
whole_micelle_selection (np.array) – elements to select the micelle atoms
micelle_positions (np.array) – array with the positions of the micelle atoms whole
group_atom_selection (str) – list with the names of the polymer species chosen for the density calculation
maxbin (int) – largest bin value
minbin (int) – smalles bin value
step (int) – bin size
- Returns:
density – array with the density values as a function of the distance to the COM of the micelle
- binned_spacenp.array
array with the values of the binned radial distance, this allows easier plotting of the density
- Return type:
np.array
- sph_histogram(distance, maxbin, minbin, step)[source]
Function to create a histogram of the distances
- Parameters:
distance (np.array) – distances to be binned
maxbin (int) – largest bin value
minbin (int) – smalles bin value
step (int) – bin size
- Returns:
all_histograms – array with the distances values binned
- Return type:
np.array
pysoftk.pol_analysis.spherical_density_water module
- class pysoftk.pol_analysis.spherical_density_water.spherical_density_water(tpr_file, xtc_file)[source]
Bases:
MDA_inputA class used to compute the contacts between the polymers of a micelle.
- density_calc(binned_space, all_histograms)[source]
Function to calculate the density
- Parameters:
binned_space (np.array) – bins
- all_histogramsnp.array
distances histograms
- Returns:
density – array with density
- Return type:
np.array
- dist_com(u, frame, atom_sel_micelle, micelle_positions, water_selection)[source]
Function to calculate the distance between atoms and the center of mass of the micelle
- Parameters:
u (mda.Universe) – mda.Universe
atom_sel_micelle –
atom group selection of the micelle
- micelle_positionsnp.array
array with the positions of the micelle whole
- Returns:
distance_com – array with the distances of the aotms to the COM of the micelle
- Return type:
np.array
- run_density_calc(micelle_sel_type, whole_micelle_selection, micelle_positions, water_sel_type, water_atom_selection, start, stop, step_frame, maxbin=200, minbin=0, step=0.1)[source]
Function to get the mean density calculation over time as a funciton wrt the micelle COM
- Parameters:
micelle_sel_type (str) – string type for the selection of the micelle atoms
whole_micelle_selection (np.array) – elements to select the micelle atoms
micelle_positions (np.array) – array with the positions of the micelle atoms whole
water_sel_type (str) – string type for the selection of the solvent atoms
water_atom_selection (list) – list with the names of the solvent chosen for the density calculation
maxbin (int) – largest bin value
minbin (int) – smalles bin value
step (int) – bin size
- Returns:
density – array with the density values as a function of the distance to the COM of the micelle
- binned_spacenp.array
array with the values of the binned radial distance, this allows easier plotting of the density
- Return type:
np.array
- sph_histogram(distance, maxbin, minbin, step)[source]
Function to create a histogram of the distances
- Parameters:
distance (np.array) – distances to be binned
maxbin (int) – largest bin value
minbin (int) – smalles bin value
step (int) – bin size
- Returns:
all_histograms – array with the distances values binned
- Return type:
np.array
pysoftk.pol_analysis.umap_analysis module
- class pysoftk.pol_analysis.umap_analysis.umap_analysis(all_pos)[source]
Bases:
objectA class used to compute to do umap cluster identification of polymer conformations within an spatial configuration of moieties.
- cluster_order_appearance(X_embedded, y_pred, cwd)[source]
- Function to ouput hdbscan with the groups labeled ordered
with their order of appearance in the trajectory.
- Parameters:
X_embedded (np.array) – umap clustered data
y_pred (np.array) – cluster label assignment for each data point in the umap cluster data.
cwd (class.str) – path to save output file.
- Returns:
A plot with the results of the calculation.
- Return type:
None
- find_closest_point(array, target_point)[source]
Function to find the closest point from an array to a given target point
- Parameters:
array (np.array) – array of numbers from which the value will be found
target_point (class.float) – target number
- Returns:
closest_point – numpy array with the closest number from array to the target point.
- Return type:
np.array
- get_array()[source]
Function to load the all pos distances.
- Parameters:
None
- Returns:
all_pos – np.array to load
- Return type:
np.array
- get_average_rep_cluster(X_embedded, y_pred)[source]
- Function to get the X_embedded point that represents the
mean of each umap cluster.
- Parameters:
X_embedded (np.array) – umap clustered data
y_pred (np.array) – cluster label assignment for each data point in the umap cluster data.
cwd (class.str) – path to save output file.
- Returns:
cluster_points – np.array with the X_embedded values that represent the mean of each cluster.
- Return type:
np.array
- hdbscan_cluster(X_embedded, min_cs, epsilon, cwd, method='leaf')[source]
Function to hdbscan on umap clustered data.
- Parameters:
X_embedded (np.array) – umap clustered data
min_cs (class.int) – minimum number of points for a group to be considered a cluster.
epsilon (class.int) – clusters closer than this distance apart will be merged.
cwd (class.str) – path to save output file.
method (class.str) – cluster_method_selection from hdbscan to determine how it selects flat clusters from the cluster tree hierarchy.
- Returns:
y_pred – cluster label assignment for each data point in the umap cluster data.
- Return type:
np.array
- umap_run(n_neigh, cwd, n_comp=2, min_d=0.0, verb=True, rs=9)[source]
Function to run umap on all_pos array.
- Parameters:
n_neigh (class.int) – larger numbers forcus more on global properties (and increase computational effort required)
cwd (class.str) – path to save output file
n_comp (class.int) – set the number of dimensions to reduce to
min_d (class.int) – must be 0.0 if points are to be clustered later with HDBSCAN.
verb (class.boolean) – To show the progress bar
rs (class.int) – random seed. Must be set to the same number for reproducibility.
- Returns:
X_embedded – Clustered data
- Return type:
np.array