pyretis.analysis package

This package defines analysis tools for the PyRETIS program.

The analysis tools are intended to be used for analysis of the simulation output from the PyRETIS program. The typical use of this package is in post-processing of the results from a simulation (or several simulations).

Package structure

Modules

__init__.py
This file, imports from the other modules. The method to analyse results from MD flux simulations is defined here since it will make use of analysis tools from energy_analysis.py and order_analysis.py.
analysis.py (pyretis.analysis.analysis)
General methods for numerical analysis.
energy_analysis.py (pyretis.analysis.energy_analysis)
Defines methods useful for analysing the energy output.
histogram.py (pyretis.analysis.histogram)
Defines methods useful for generating histograms.
order_analysis.py (pyretis.analysis.order_analysis)
Defines methods useful for analysis of order parameters.
path_analysis.py (pyretis.analysis.path_analysis)
Defines methods for analysis of path ensembles.

Important methods defined in this package

analyse_energies (analyse_energies())
Analyse energy data from a simulation. It will calculate a running average, a distribution and do a block error analysis.
analyse_flux (analyse_flux())
Analyse flux data from a MD flux simulation. It will calculate a running average, a distribution and do a block error analysis.
analyse_orderp (analyse_orderp())
Analyse order parameter data. It will calculate a running average, a distribution and do a block error analysis. In addition, it will analyse the mean square displacement (if requested).
analyse_path_ensemble (analyse_path_ensemble())
Analyse the results from a single path ensemble. It will calculate a running average of the probabilities, a crossing probability, perform a block error analysis, analyse lengths of paths, type of Monte Carlo moves and calculate an efficiency.
match_probabilities (match_probabilities())
Method to match probabilities from several path simulations. Useful for obtaining the overall crossing probability.
histogram (histogram())
Generates histogram, basically a wrapper around numpy’s histogram.
match_all_histograms (match_all_histograms())
Method to match histograms from umbrella simulations.
retis_flux (retis_flux())
Method for calculating the initial flux for RETIS simulations.
retis_rate (retis_rate())
Method for calculating the rate constant for RETIS simulations.
pyretis.analysis.analyse_md_flux(crossdata, energydata, orderdata, settings)[source]

Analyse the output from a MD-flux simulation.

The obtained results will be returned as a convenient structure for plotting or reporting.

Parameters:
  • crossdata (numpy.array) – This is the data containing information about crossings.
  • energydata (numpy.array) – This is the raw data for the energies.
  • orderdata (numpy.array) – This is the raw data for the order parameter.
  • settings (dict) – The settings for the analysis (e.g block length for error analysis) and some settings from the simulation (interfaces, time step etc.).
Returns:

results (dict) – This dict contains the results from the different analysis as a dictionary. This dict can be used further for plotting or for generating reports.

pyretis.analysis.analysis module

Module defining functions useful in the analysis of simulation data.

Important methods defined here

running_average (running_average())
Method to calculate a running average.
block_error (block_error())
Perform block error analysis.
block_error_corr (block_error_corr())
Method to run a block error analysis and calculate relative errors and correlation length.
pyretis.analysis.analysis.block_error(data, maxblock=None, blockskip=1, weights=None)[source]

Perform block error analysis.

This function will estimate the standard deviation in the input data by performing a block analysis. The number of blocks to consider can be specified or it will be taken as the half of the length of the input data. Averages and variance are calculated using an on-the-fly algorithm [1].

Parameters:
  • data (numpy.array (or iterable with data points)) – The data to analyse.
  • maxblock (int, optional) – Can be used to set the maximum length of the blocks to consider. Note that the maxblock will never be set longer than half the length in data.
  • blockskip (int, optional) – This can be used to skip certain block lengths, i.e. blockskip = 1 will consider all blocks up to maxblock, while blockskip = n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1 + n, 1 + 2*n, and so on.
Returns:

  • blocklen (numpy.array) – These contain the block lengths considered.
  • block_avg (numpy.array) – The averages as a function of the block length.
  • block_err (numpy.array) – Estimate of errors as a function of the block length.
  • block_err_avg (float) – Average of the error estimate using blocks where length > maxblock//2.

References

[1]Wikipedia, “Algorithms for calculating variance”, http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
pyretis.analysis.analysis.block_error_corr(data, maxblock=None, blockskip=1)[source]

Run block error analysis on the given data.

This will run the block error analysis and return the relative errors and correlation length.

Parameters:
  • data (numpy.array) – Data to analyse.
  • maxblock (int, optional) – The maximum block length to consider.
  • blockskip (int, optional) – This can be used to skip certain block lengths, i.e. blockskip = 1 will consider all blocks up to maxblock, while blockskip = n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1 + n, 1 + 2*n, and so on.
Returns:

  • out[0] (numpy.array) – These contains the block lengths considered (blen).
  • out[1] (numpy.array) – Estimate of errors as a function of the block length (berr).
  • out[2] (float) – Average of the error estimate for blocks (berr_avg) with length > maxblock // 2.
  • out[3] (numpy.array) – Estimate of relative errors normalised by the overall average as a function of block length (rel_err).
  • out[4] (float) – The average relative error (avg_rel_err), for blocks with length > maxblock // 2.
  • out[5] (numpy.array) – The estimated correlation length as a function of the block length (ncor).
  • out[6] (float) – The average (for blocks with length > maxblock // 2) estimated correlation length (avg_ncor).

pyretis.analysis.analysis.running_average(data)[source]

Create a running average of the given data.

The running average will be calculated over the rows.

Parameters:data (numpy.array) – This is the data we will average.
Returns:out (numpy.array) – The running average.

pyretis.analysis.energy_analysis module

Methods for analysing energy data from simulations.

Important methods defined here

analyse_energies (py:func:.analyse_energies)
Run the analysis for energies (kinetic, potential etc.).
pyretis.analysis.energy_analysis.analyse_energies(energies, settings)[source]

Run the energy analysis on several energy types.

The function will run the energy analysis on several energy types and collect the energies into a structure which is convenient for plotting the results.

Parameters:
  • energies (dict) – This dict contains the energies to analyse.
  • settings (dict) – This dictionary contains settings for the analysis.
Returns:

results (dict) – For each energy key results[key] contains the result from the energy analysis.

See also

analyse_data()

pyretis.analysis.flux_analysis module

Methods for analysis of crossings for flux data.

Important methods defined here

analyse_flux (analyse_flux())
Run analysis for simulation flux data. This will calculate the initial flux for a simulation.
pyretis.analysis.flux_analysis._calculate_flux(effective_cross, time_in_state, time_window, time_step)[source]

Calculate the flux in different time windows.

Parameters:
  • effective_cross (list) – The number of effective crossings, obtained from _effective_crossings.
  • time_in_state (int) – Time spent in over-all state A.
  • time_window (int) – This is the time window we consider for calculating the flux.
  • time_step (float) – This is the time-step for the simulation.
Returns:

  • time (np.array) – The times for which we have calculated the flux.
  • ncross (np.array) – The number of crossings within a time window.
  • flux (np.array) – The flux within a time window.

pyretis.analysis.flux_analysis._effective_crossings(fluxdata, nint, end_step)[source]

Analyse flux data and obtain effective crossings.

Parameters:
  • fluxdata (list of tuples of ints) – The list contains the data obtained from a md-flux simulation.
  • nint (int) – The number of interfaces used.
  • end_step (int) – This is the last step done in the simulation.
Returns:

  • eff_cross (list of lists) – eff_cross[i] is the effective crossings times for interface i.
  • ncross (list of ints) – ncross[i] is the number of crossings for interface i.
  • neffcross (list of ints) – neffcross[i] is the number of effective crossings for interface i.
  • time_in_state (dict) – The time spent in the different states which are labeled with the keys ‘A’, ‘B’, ‘OA’, ‘OB’. ‘O’ is taken to mean the ‘overall’ state.

Note

We do here intf - 1. This is just to be compatible with the old FORTRAN code where the interfaces are numbered 1, 2, 3 rather than 0, 1, 2. If this is to be changed in the future the -1 can just be removed. Such a change will also require changes to the formatter for flux data.

pyretis.analysis.flux_analysis.analyse_flux(fluxdata, settings)[source]

Run the analysis on the given flux data.

This will run the flux analysis and collect the results into a structure which is convenient for plotting and reporting the results.

Parameters:
  • fluxdata (list of tuples of integers) – This array contains the data obtained from a MD simulation for the fluxes.
  • settings (dict) – This dict contains the settings for the analysis. Note that this dictionary also needs some settings from the simulation, in particular the number of cycles, the interfaces and information about the time step.
Returns:

results (dict) – This dict contains the results from the flux analysis. The keys are defined in the results variable.

pyretis.analysis.flux_analysis.find_crossings(order, interfaces)[source]

Find crossings with interfaces for given order parameter data.

Parameters:
  • order (numpy.array (1D)) – Order parameters, as a function of time.
  • interfaces (list of floats) – The interfaces for which we will investigate crossings.
Returns:

out (list of tuple) – Each tuple contains the crossings on the following form: (step, interface-number, direction), where direction = ‘+’ if the interface was crossed while moving to the right and ‘-’ if the movement was towards the left.

pyretis.analysis.histogram module

Histogram functions for data analysis.

This module defines some simple functions for histograms.

Important methods defined here

histogram (histogram())
Create a histogram from given data.
match_all_histograms (match_all_histograms())
Function to match histograms, for instance from an umbrella sampling simulation.
histogram_and_avg (histogram_and_avg())
Create a histogram and return bins, midpoints and simple statistics.
pyretis.analysis.histogram._match_histograms(histo1, histo2, bin_x, overlap)[source]

Match two histograms given an overlapping region.

The matching is done so that the integral of the overlapping regions of the two histograms are equal.

Parameters:
  • histo1 (numpy.array) – The first histogram.
  • histo2 (numpy.array) – The second histogram, this is the histogram that will be scaled.
  • bin_x (numpy.array) – This is the bin mid-points of the histograms. Note that we assume here that histo1 and histo2 are obtained using the same number of bins and limits.
  • overlap (object like list, tuple or numpy.array) – This is the overlapping region.
Returns:

  • out[0] (numpy.array) – A scaled version of second input histogram histo2.
  • out[1] (float) – The calculated scale factor.

pyretis.analysis.histogram.histogram(data, bins=10, limits=(-1, 1), density=False, weights=None)[source]

Create a histogram of the given data.

Parameters:
  • data (numpy.array) – The data for making the histogram.
  • bins (int, optional) – The number of bins to divide the data into.
  • limits (tuple/list, optional) – The max/min values to consider.
  • density (boolean, optional) – If True the histogram will be normalized.
  • weights (numpy.array, optional) – Weighting factors for data.
Returns:

  • hist (numpy.array) – The binned counts.
  • bins (numpy.array) – The edges of the bins.
  • bin_mid (numpy.array) – The midpoint of the bins.

pyretis.analysis.histogram.histogram_and_avg(data, bins, density=True, weights=None)[source]

Create histogram an return bins, midpoints and simple statistics.

The simple statistics include the mean value and the standard deviation. The return structure is useful for plotting routines. The midpoints returned are the midpoints of the bins.

Parameters:
  • data (either 1D numpy.array or 2D numpy.array) – This is the data to create the histogram from. The eventual second dimension contains the weights.
  • bins (int) – The number of bins to use for the histogram.
  • density (boolean, optional) – If density is true, the histogram will be normalized.
  • weights (numpy.array, optional) – Weighting factors for data. Not used if data contains them.
Returns:

  • out[0] (numpy.array) – The histogram (frequency) values.
  • out[1] (numpy.array) – The midpoints for the bins.
  • out[2] (tuple of floats) – These are some simple statistics, out[2][0] is the average out[2][1] is the standard deviation.

pyretis.analysis.histogram.match_all_histograms(histograms, umbrellas)[source]

Match several histograms from an umbrella sampling.

Parameters:
  • histograms (list of numpy.arrays) – The histograms to match.
  • umbrellas (list of lists) – The umbrella windows used in the computation.
Returns:

  • histograms_s (list of numpy.arrays) – The scaled histograms.
  • scale_factor (list of floats) – The scale factors.
  • matched_count (numpy.array) – Count for overall matched histogram (an “averaged” histogram).

pyretis.analysis.order_analysis module

Module defining functions for analysis of order parameters.

Important methods defined here

analyse_orderp (analyse_orderp())
Run a simple order parameter analysis.
pyretis.analysis.order_analysis.analyse_orderp(orderdata, settings)[source]

Run the analysis on several order parameters.

The results are collected into a structure which is convenient for plotting.

Parameters:
  • orderdata (numpy.arrays) – The data read from the order parameter file.
  • settings (dict) – This dictionary contains settings for the analysis.
Returns:

results (numpy.array) – For each order parameter key, results[key] contains the result of the analysis.

See also

None

Note

We here (and in the plotting routines) make certain assumptions about the structure, i.e. the positions are assumed to have a specific meaning: column zero is the time, column one the order parameter and so on.

pyretis.analysis.path_analysis module

Methods for analysis of path ensembles.

Important methods defined here

analyse_path_ensemble (analyse_path_ensemble())
Method to analyse a path ensemble, it will calculate crossing probabilities and information about moves etc. This method can be applied to files as well as path ensemble objects.
analyse_path_ensemble_object (analyse_path_ensemble_object())
Method to analyse a path ensemble, it will calculate crossing probabilities and information about moves etc. This method is intended to work directly on path ensemble objects.
match_probabilities (match_probabilities())
Match probabilities from several path ensembles and calculate efficiencies and the error for the matched probability.
retis_flux (retis_flux())
Calculate the initial flux with errors for a RETIS simulation.
retis_rate (retis_rate())
Calculate the rate constant with errors for a RETIS simulation.
pyretis.analysis.path_analysis._calc_tau(op_data, bin_edges=None, timestep=1)[source]

Calculate the value of tau.

This function calculates tau, a measure of time spent within a certain interval (bin) in the [0^-’] ensemble. It does so by counting the number of times the order parameter falls within this interval defined by the bin_edges. It then multiplies this count by the timestep and divides by the width of the interval. If bin_edges is None or not a 2-element list, or if the order parameter data op_data is None, it just returns 0.

Parameters:
  • op_data (list) – The order parameter data from the [0^-’] ensemble.
  • bin_edges (list, optional) – The edges of the bin to calculate tau for. If None, it returns 0.
  • timestep (float, optional) – The simulation timestep. Default is 1.
Returns:

tau (float) – The value of tau.

pyretis.analysis.path_analysis._create_shoot_histograms(shoot_stats, bins)[source]

Create histograms and scale for the shoot analysis.

Parameters:
  • shoot_stats (dict) – This dict contains the results from the analysis of shooting moves. E.g. shoot_stats[key] contain the order parameters for the status key which can be the different statuses defined in pyretis.core.path._STATUS or ‘REJ’ (for rejected).
  • bins (int) – The number of bins to use for the histograms.
Returns:

  • out[0] (dict) – For each possible status (‘ACC, ‘BWI’, etc.) this dict will contain a histogram as returned by the histogram function. It will also contain a ‘REJ’ key which is the concatenation of all rejections and a ‘ALL’ key which is simply all the values.
  • out[1] (dict) – For each possible status (‘ACC, ‘BWI’, etc.) this dict will contain the scale factors for the histograms. The scale factors are obtained by dividing with the ‘ALL’ value.

See also

histogram()

pyretis.analysis.path_analysis._get_path_length(path, ensemble_number)[source]

Return the path length for different moves.

Different moves may have a different way of obtaining the path length. (Example: time-reversal vs. shooting move).

Parameters:
  • path (dict) – This is the dict containing the information about the path. It can typically be obtained by iterating over the path ensemble object, e.g. with a for path in path_ensemble.get_paths():.
  • ensemble_number (int) – This integer identifies the ensemble. This is used for the swapping moves in [0^-] and [0^+].
Returns:

out (int) – The path length

pyretis.analysis.path_analysis._pcross_lambda_cumulative(orderparam, ordermin, ordermax, ngrid, weights=None, ha_weights=None)[source]

Obtain crossing probability as a function of the order parameter.

It will do the actual calculation of the crossing probability as a function of order parameter. It is split off from pcross_lambda since the analysis is intended to be backward compatible with the output/results from the old TISMOL FORTRAN program.

Parameters:
  • orderparam (numpy.array) – Array containing the order parameters.
  • ordermin (float) – Minimum allowed order parameter.
  • ordermax (float) – Maximum allowed order parameter.
  • ngrid (int) – This is the number of grid points.
  • weights (numpy.array, optional) – The weight of each order parameter. This is used in order to count a specific order parameter more than once. If not given, the values in orderparam will be weighted equally.
  • ha_weights (numpy.array, optional) – The weights of the high acceptance scheme (e.g. Stone Skipping)
Returns:

  • lamb (numpy.array) – The grid on which pcross is based.
  • pcross (numpy.array) – Estimated crossing probability distribution.

pyretis.analysis.path_analysis._update_shoot_stats(shoot_stats, path)[source]

Update the shooting statistics with the status of the given path.

Parameters:
  • shoot_stats (dict) – This dict contains the results from the analysis of shooting moves. E.g. shoot_stats[key] contain the order parameters for the status key which can be the different statuses defined in pyretis.core.path._STATUS or ‘REJ’ (for rejected).
  • path (dict) – This is the path information, represented as a dictionary.
Returns:

out (None) – Returns None but will update shoot_stats for shooting moves.

pyretis.analysis.path_analysis.analyse_path_ensemble(path_ensemble, settings)[source]

Analyse a path ensemble.

This function will make use of the different analysis functions and analyse a path ensemble. This function is more general than the analyse_path_ensemble_object function in that it should work on both PathEnsemble and PathEnsembleFile objects. The running average is updated on-the-fly, see Wikipedia for details [wikimov].

Parameters:
  • path_ensemble (object like PathEnsemble) – This is the path ensemble to analyse.
  • settings (dict) – This dictionary contains settings for the analysis. We make use of the following keys:
    • ngrid: The number of grid points for calculating the crossing probability as a function of the order parameter.
    • maxblock: The max length of the blocks for the block error analysis. Note that this will maximum be equal the half of the length of the data, see block_error in .analysis.
    • blockskip: Can be used to skip certain block lengths. A blockskip equal to n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1+n, 1+2n, etc.
    • bins: The number of bins to use for creating histograms.
Returns:

out (dict) – This dictionary contains the main results for the analysis which can be used for plotting or other kinds of output.

See also

_update_shoot_stats(), pcross_lambda_cumulative()

and
py:func:._create_shoot_histograms.

References

[wikimov]Wikipedia, “Moving Average”, https://en.wikipedia.org/wiki/Moving_average
pyretis.analysis.path_analysis.analyse_repptis_ensemble(path_ensemble, settings)[source]

Analyse a repptis path ensemble.

This function is a modification of analyse_path_ensemble(), which make use of the different analysis functions and analyse a repptis path ensemble.

Parameters:
  • path_ensemble (object like PathEnsemble) – This is the path ensemble to analyse.
  • settings (dict) – This dictionary contains settings for the analysis. We make use of the following keys:
    • ngrid: The number of grid points for calculating the crossing probability as a function of the order parameter.
    • maxblock: The max length of the blocks for the block error analysis. Note that this will maximum be equal the half of the length of the data, see block_error in .analysis.
    • blockskip: Can be used to skip certain block lengths. A blockskip equal to n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1+n, 1+2n, etc.
    • bins: The number of bins to use for creating histograms.
Returns:

out (dict) – This dictionary contains the main results for the analysis which can be used for plotting or other kinds of output.

pyretis.analysis.path_analysis.match_probabilities(path_results, detect, settings=None)[source]

Match probabilities from several path ensembles.

It calculates the efficiencies and errors for the matched probability.

Parameters:
  • path_results (list) – These are the results from the path analysis. path_results[i] contains the output from analyse_path_ensemble applied to ensemble no. i. Here we make use of the following keys from path_results[i]: * pcross: The crossing probability. * prun: The running average of the crossing probability. * blockerror: The output from the block error analysis. * efficiency: The output from the efficiency analysis.
  • detect (list of floats) – These are the detect interfaces used in the analysis.
  • settings (dict, optional) – This dictionary contains settings for the analysis. Here we make use of the following keys from the analysis section:
    • ngrid: The number of grid points for calculating the crossing probability as a function of the order parameter.
    • maxblock: The max length of the blocks for the block error analysis. Note that this will maximum be equal the half of the length of the data, see block_error in .analysis.
    • blockskip: Can be used to skip certain block lengths. A blockskip equal to n will consider every n’th block up to maxblock, i.e. it will use block lengths equal to 1, 1+n, 1+2n, etc.
    • bins: The number of bins to use for creating histograms.
Returns:

results (dict) – These are results for the over-all probability and error and also some over-all TIS efficiencies.

pyretis.analysis.path_analysis.perm_calculations(results0, pcross, pcross_err)[source]

Calculate the permeability.

This function collects the relevant permeability properties from the output data. It then calculates the permeability and its error based on the values of xi, tau, and the crossing probability pcross.

results0dict
Results from the analysis of ensemble [0^-]
pcrossfloat
Estimated crossing probability
pcross_errfloat
Relative error in crossing probability.
Returns:
  • xxi (float) – The value of xi
  • tau (float) – The value of tau
  • perm (float) – The value of the permeability
  • xi_err (float) – The error in xi
  • tau_err (float) – The error in tau
  • perm_err (float) – The error in permeability
pyretis.analysis.path_analysis.retis_flux(results0, results1, timestep)[source]

Calculate the initial flux for RETIS.

Parameters:
  • results0 (dict) – Results from the analysis of ensemble [0^-]
  • results1 (dict) – Results from the analysis of ensemble [0^+]
  • timestep (float) – The simulation timestep.
Returns:

  • flux (float) – The initial flux.
  • flux_error (float) – The relative error in the initial flux.

pyretis.analysis.path_analysis.retis_rate(pcross, pcross_relerror, flux, flux_relerror)[source]

Calculate the rate constant for RETIS.

Parameters:
  • pcross (float) – Estimated crossing probability
  • pcross_relerror (float) – Relative error in crossing probability.
  • flux (float) – The initial flux.
  • flux_relerror (float) – Relative error in the initial flux.
Returns:

  • rate (float) – The rate constant
  • rate_error (float) – The relative error in the rate constant.

pyretis.analysis.path_analysis.skip_paths(weights, nacc, nskip)[source]

Skips not counted paths.

Parameters:
  • weights (array of integers) – The weights of the paths.
  • nacc (integer) – Number of accepted paths.
  • nskip (integer) – Number of paths to skip.
Returns:

  • new_weights (array of integers) – The new weights of the paths once skipped the nskip ones.
  • new_acc (integer) – Recomputed number of accepted paths.