nolds
module¶
Nolds consists of just a single module called nolds
which contains all
relevant algorithms and helper functions.
Internally these functions are subdivided into different modules such as
measures
and datasets
, but you should not need to import these modules
directly unless you want access to some internal helper functions.
Algorithms¶
Lyapunov exponent (Rosenstein et al.)¶
- nolds.lyap_r(data, emb_dim=10, lag=None, min_tsep=None, tau=1, min_neighbors=20, trajectory_len=20, fit='RANSAC', debug_plot=False, debug_data=False, plot_file=None, fit_offset=0)[source]¶
Estimates the largest Lyapunov exponent using the algorithm of Rosenstein et al. [lr_1].
- Explanation of Lyapunov exponents:
See lyap_e.
- Explanation of the algorithm:
The algorithm of Rosenstein et al. can only recover the largest Lyapunov exponent, but it is rather robust to parameter choices.
The idea for the algorithm relates closely to the definition of Lyapunov exponents. First, the dynamics of the data are reconstructed using a delay embedding method with a lag, such that each value x_i of the data is mapped to the vector
X_i = [x_i, x_(i+lag), x_(i+2*lag), …, x_(i+(emb_dim-1) * lag)]
For each such vector X_i, we find the closest neighbor X_j using the Euclidean distance. We know that as we follow the trajectories from X_i and X_j in time in a chaotic system, the distances between X_(i+k) and X_(j+k), denoted as d_i(k), will grow exponentially according to d_i(k) = c * e^(lambda * k), where lambda is a good approximation of the largest Lyapunov exponent, because the exponential expansion along the axis associated with this exponent will quickly dominate the expansion or contraction along other axes.
To calculate lambda, we look at the logarithm of the distance trajectory, because log(d_i(k)) = log(c) + lambda * k. This gives a set of lines (one for each index i) whose slope is an approximation of lambda. We therefore extract the mean log trajectory d’(k) by taking the mean of log(d_i(k)) over all orbit vectors X_i. We then fit a straight line to the plot of d’(k) versus k. The slope of the line gives the desired parameter lambda.
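As a minimal usage sketch on data from the logistic map, which nolds can generate itself (parameter values here are only examples, not recommendations):

    import numpy as np
    import nolds

    # estimate the largest Lyapunov exponent of the logistic map at r = 4;
    # the analytic value is ln(2) ~ 0.693
    data = np.fromiter(nolds.logistic_map(0.1, 1000, r=4), dtype=float)
    le = nolds.lyap_r(data, emb_dim=4, lag=1, min_tsep=10)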
- Method for choosing min_tsep:
Usually we want to find neighbors between points that are close in phase space but not too close in time, because we want to avoid spurious correlations between the obtained trajectories that originate from temporal dependencies rather than the dynamic properties of the system. Therefore it is critical to find a good value for min_tsep. One rather plausible estimate for this value is to set min_tsep to the mean period of the signal, which can be obtained by calculating the mean frequency using the fast Fourier transform. This procedure is used by default if the user sets min_tsep = None.
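A minimal sketch of this heuristic (an illustrative reimplementation, not necessarily the exact code used by nolds):

    import numpy as np

    def mean_period(data):
        # power-weighted mean frequency over the FFT spectrum
        # (DC component excluded); the mean period is its reciprocal
        freqs = np.fft.rfftfreq(len(data))[1:]
        power = np.abs(np.fft.rfft(data))[1:] ** 2
        mean_freq = np.sum(freqs * power) / np.sum(power)
        return int(np.ceil(1.0 / mean_freq))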
- Method for choosing lag:
Another parameter that can be hard to choose by instinct alone is the lag between individual values in a vector of the embedded orbit. Here, Rosenstein et al. suggest setting the lag to the distance where the autocorrelation function drops below 1 - 1/e times its original (maximal) value. This procedure is used by default if the user sets lag = None.
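Again as an illustrative sketch (not necessarily nolds' exact code):

    import numpy as np

    def default_lag(data):
        # smallest lag at which the autocorrelation drops below
        # (1 - 1/e) times its value at lag zero; returns 0 if it never does
        x = data - np.mean(data)
        acorr = np.correlate(x, x, mode="full")[len(x) - 1:]
        threshold = (1.0 - 1.0 / np.e) * acorr[0]
        return int(np.argmax(acorr < threshold))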
- References:
- [lr_1]
M. T. Rosenstein, J. J. Collins, and C. J. De Luca, “A practical method for calculating largest Lyapunov exponents from small data sets,” Physica D: Nonlinear Phenomena, vol. 65, no. 1, pp. 117–134, 1993.
- Reference Code:
- [lr_a]
mirwais, “Largest Lyapunov Exponent with Rosenstein’s Algorithm”, url: http://www.mathworks.com/matlabcentral/fileexchange/38424-largest-lyapunov-exponent-with-rosenstein-s-algorithm
- [lr_b]
Shapour Mohammadi, “LYAPROSEN: MATLAB function to calculate Lyapunov exponent”, url: https://ideas.repec.org/c/boc/bocode/t741502.html
- [lr_c]
Rainer Hegger, Holger Kantz, and Thomas Schreiber, “TISEAN 3.0.0 - Nonlinear Time Series Analysis”, url: https://www.pks.mpg.de/tisean/Tisean_3.0.0/docs/docs_c/lyap_r.html
- Args:
- data (iterable of float):
(one-dimensional) time series
- Kwargs:
- emb_dim (int):
embedding dimension for delay embedding
- lag (float):
lag for delay embedding
- min_tsep (float):
minimal temporal separation between two “neighbors” (default: find a suitable value by calculating the mean period of the data)
- tau (float):
step size between data points in the time series in seconds (normalization scaling factor for exponents)
- min_neighbors (int):
if lag=None, the search for a suitable lag will be stopped when the number of potential neighbors for a vector drops below min_neighbors
- trajectory_len (int):
the time (in number of data points) to follow the distance trajectories between two neighboring points
- fit (str):
the fitting method to use for the line fit, either ‘poly’ for normal least squares polynomial fitting or ‘RANSAC’ for RANSAC-fitting which is more robust to outliers
- debug_plot (boolean):
if True, a simple plot of the final line-fitting step will be shown
- debug_data (boolean):
if True, debugging data will be returned alongside the result
- plot_file (str):
if debug_plot is True and plot_file is not None, the plot will be saved under the given file name instead of directly showing it through
plt.show()
- fit_offset (int):
neglect the first fit_offset steps when fitting
- Returns:
- float:
an estimate of the largest Lyapunov exponent (a positive exponent is a strong indicator for chaos)
- (1d-vector, 1d-vector, list):
only present if debug_data is True: debug data of the form (ks, div_traj, poly) where ks are the x-values of the line fit, div_traj are the y-values, and poly are the line coefficients ([slope, intercept]).
Lyapunov exponent (Eckmann et al.)¶
- nolds.lyap_e(data, emb_dim=10, matrix_dim=4, min_nb=None, min_tsep=0, tau=1, debug_plot=False, debug_data=False, plot_file=None)[source]¶
Estimates the Lyapunov exponents for the given data using the algorithm of Eckmann et al. [le_1].
- Recommendations for parameter settings by Eckmann et al.:
long recording time improves accuracy, small tau does not
use large values for emb_dim
matrix_dim should be ‘somewhat larger than the expected number of positive Lyapunov exponents’
min_nb = min(2 * matrix_dim, matrix_dim + 4)
- Explanation of Lyapunov exponents:
The Lyapunov exponent describes the rate of separation of two infinitesimally close trajectories of a dynamical system in phase space. In a chaotic system, these trajectories diverge exponentially following the equation:
|X(t, X_0) - X(t, X_0 + eps)| = e^(lambda * t) * |eps|
In this equation X(t, X_0) is the trajectory of the system X starting at the point X_0 in phase space at time t. eps is the (infinitesimal) difference vector and lambda is called the Lyapunov exponent. If the system has more than one free variable, the phase space is multidimensional and each dimension has its own Lyapunov exponent. The existence of at least one positive Lyapunov exponent is generally seen as a strong indicator for chaos.
- Explanation of the Algorithm:
To calculate the Lyapunov exponents analytically, the Jacobian of the system is required. The algorithm of Eckmann et al. therefore tries to estimate this Jacobian by reconstructing the dynamics of the system from which the time series was obtained. For this, several steps are required:
Embed the time series [x_1, x_2, …, x_N] in an orbit of emb_dim dimensions (map each point x_i of the time series to a vector [x_i, x_(i+1), x_(i+2), … x_(i+emb_dim-1)]).
For each vector X_i in this orbit, find a radius r_i so that at least min_nb other vectors lie within (Chebyshev) distance r_i around X_i. These vectors will be called “neighbors” of X_i.
Find the matrix T_i that sends points from the neighborhood of X_i to the neighborhood of X_(i+1). To avoid undetermined values in T_i, we construct T_i not with size (emb_dim x emb_dim) but with size (matrix_dim x matrix_dim), so that we have a larger “step size” m in the X_i, which are now defined as X’_i = [x_i, x_(i+m), x_(i+2m), … x_(i+(matrix_dim-1)*m)]. This means that emb_dim-1 must be divisible by matrix_dim-1. The T_i are then found by a linear least squares fit, ensuring that T_i (X_j - X_i) ~= X_(j+m) - X_(i+m) for any X_j in the neighborhood of X_i.
Starting with i = 1 and Q_0 = identity, successively decompose the matrix T_i * Q_(i-1) into the matrices Q_i and R_i by a QR-decomposition.
Calculate the Lyapunov exponents from the mean of the logarithm of the diagonal elements of the matrices R_i. To normalize the Lyapunov exponents, they have to be divided by m and by the step size tau of the original time series.
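A minimal usage sketch (parameter values are only examples; note that emb_dim - 1 must be divisible by matrix_dim - 1, as is the case for the defaults 10 and 4):

    import numpy as np
    import nolds

    # estimate matrix_dim = 4 Lyapunov exponents from logistic map data
    data = np.fromiter(nolds.logistic_map(0.1, 5000, r=4), dtype=float)
    exps = nolds.lyap_e(data, emb_dim=10, matrix_dim=4)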
- References:
- [le_1]
J. P. Eckmann, S. O. Kamphorst, D. Ruelle, and S. Ciliberto, “Liapunov exponents from time series,” Physical Review A, vol. 34, no. 6, pp. 4971–4979, 1986.
- Reference code:
- [le_a]
Manfred Füllsack, “Lyapunov exponent”, url: http://systems-sciences.uni-graz.at/etextbook/sw2/lyapunov.html
- [le_b]
Steve SIU, Lyapunov Exponents Toolbox (LET), url: http://www.mathworks.com/matlabcentral/fileexchange/233-let/content/LET/findlyap.m
- [le_c]
Rainer Hegger, Holger Kantz, and Thomas Schreiber, TISEAN, url: http://www.mpipks-dresden.mpg.de/~tisean/Tisean_3.0.1/index.html
- Args:
- data (array-like of float):
(scalar) data points
- Kwargs:
- emb_dim (int):
embedding dimension
- matrix_dim (int):
matrix dimension (emb_dim - 1 must be divisible by matrix_dim - 1)
- min_nb (int):
minimal number of neighbors (default: min(2 * matrix_dim, matrix_dim + 4))
- min_tsep (int):
minimal temporal separation between two “neighbors”
- tau (float):
step size of the data in seconds (normalization scaling factor for exponents)
- debug_plot (boolean):
if True, a histogram matrix of the individual estimates will be shown
- debug_data (boolean):
if True, debugging data will be returned alongside the result
- plot_file (str):
if debug_plot is True and plot_file is not None, the plot will be saved under the given file name instead of directly showing it through
plt.show()
- Returns:
- float array:
array of matrix_dim Lyapunov exponents (positive exponents are indicators for chaos)
- 2d-array of floats:
only present if debug_data is True: all estimates for the matrix_dim Lyapunov exponents from the x iterations of R_i. The shape of this debug data is (x, matrix_dim).
Sample entropy¶
- nolds.sampen(data, emb_dim=2, tolerance=None, lag=1, dist=rowwise_chebyshev, closed=False, debug_plot=False, debug_data=False, plot_file=None)[source]¶
Computes the sample entropy of the given data.
- Explanation of the sample entropy:
The sample entropy of a time series is defined as the negative natural logarithm of the conditional probability that two sequences similar for emb_dim points remain similar at the next point, excluding self-matches.
A lower value for the sample entropy therefore corresponds to a higher probability indicating more self-similarity.
- Explanation of the algorithm:
The algorithm constructs all subsequences of length emb_dim [s_1, s_1+lag, s_1+2*lag, …] and then counts each pair (s_i, s_j) with i != j where dist(s_i, s_j) < tolerance. The same process is repeated for all subsequences of length emb_dim + 1. The sum of similar sequence pairs with length emb_dim + 1 is divided by the sum of similar sequence pairs with length emb_dim. The result of the algorithm is the negative logarithm of this ratio/probability.
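The following condensed sketch illustrates this counting procedure with the Chebyshev distance, the open interval, and lag = 1 (illustrative only; nolds' actual implementation may differ in details):

    import numpy as np

    def sampen_sketch(data, emb_dim=2, tolerance=None):
        x = np.asarray(data, dtype=float)
        if tolerance is None:
            tolerance = 0.2 * np.std(x)

        def count_similar(m):
            # count pairs (i, j) with i < j whose length-m templates are
            # closer than the tolerance in chebyshev distance
            templates = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
            count = 0
            for i in range(len(templates) - 1):
                dists = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
                count += np.sum(dists < tolerance)
            return count

        # negative logarithm of the ratio of similar pairs at length
        # emb_dim + 1 versus length emb_dim
        return -np.log(count_similar(emb_dim + 1) / count_similar(emb_dim))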
- References:
- [se_1]
J. S. Richman and J. R. Moorman, “Physiological time-series analysis using approximate entropy and sample entropy,” American Journal of Physiology-Heart and Circulatory Physiology, vol. 278, no. 6, pp. H2039–H2049, 2000.
- Reference code:
- [se_a]
“sample_entropy” function in R-package “pracma”, url: https://cran.r-project.org/web/packages/pracma/pracma.pdf
- Args:
- data (array-like of float):
input data
- Kwargs:
- emb_dim (int):
the embedding dimension (length of vectors to compare)
- tolerance (float):
distance threshold for two template vectors to be considered equal (default: 0.2 * std(data) at emb_dim = 2, corrected for dimension effect for other values of emb_dim)
- lag (int):
delay for the delay embedding
- dist (function (2d-array, 1d-array) -> 1d-array):
distance function used to calculate the distance between template vectors. Sample entropy is defined using rowwise_chebyshev; you should only use something else if you are sure that you need it.
- closed (boolean):
if True, the algorithm will check for vector pairs whose distance is in the closed interval [0, r] (less than or equal to r); otherwise the open interval [0, r) (less than r) will be used
- debug_plot (boolean):
if True, a histogram of the individual distances for m and m+1 will be shown
- debug_data (boolean):
if True, debugging data will be returned alongside the result
- plot_file (str):
if debug_plot is True and plot_file is not None, the plot will be saved under the given file name instead of directly showing it through
plt.show()
- Returns:
- float:
the sample entropy of the data (negative logarithm of ratio between similar template vectors of length emb_dim + 1 and emb_dim)
- [c_m, c_m1]:
list of two floats: count of similar template vectors of length emb_dim (c_m) and of length emb_dim + 1 (c_m1)
- [float list, float list]:
list of two lists of the form [dists_m, dists_m1] containing the distances between template vectors for m (dists_m) and for m + 1 (dists_m1).
Hurst exponent¶
- nolds.hurst_rs(data, nvals=None, fit='RANSAC', debug_plot=False, debug_data=False, plot_file=None, corrected=True, unbiased=True)[source]¶
Calculates the Hurst exponent by a standard rescaled range (R/S) approach.
- Explanation of Hurst exponent:
The Hurst exponent is a measure for the “long-term memory” of a time series, meaning the long statistical dependencies in the data that do not originate from cycles.
It originates from H. E. Hurst's observations of the problem of long-term storage in water reservoirs. If x_i is the discharge of a river in year i and we observe this discharge for N years, we can calculate the storage capacity that would be required to keep the discharge steady at its mean value.
To do so, we first subtract the mean over all x_i from the individual x_i to obtain the departures x’_i from the mean for each year i. As the excess or deficit in discharge always carries over from year i to year i+1, we need to examine the cumulative sum of x’_i, denoted by y_i. This cumulative sum represents the filling of our hypothetical storage. If the sum is above 0, we are storing excess discharge from the river, if it is below zero we have compensated a deficit in discharge by releasing water from the storage. The range (maximum - minimum) R of y_i therefore represents the total capacity required for the storage.
Hurst showed that this value follows a steady trend for varying N if it is normalized by the standard deviation sigma over the x_i. Namely he obtained the following formula:
R/sigma = (N/2)^K
In this equation, K is called the Hurst exponent. Its value is 0.5 for white noise, but becomes greater for time series that exhibit some positive dependency on previous values. For negative dependencies it becomes less than 0.5.
- Explanation of the algorithm:
The rescaled range (R/S) approach is directly derived from Hurst’s definition. The time series of length N is split into non-overlapping subseries of length n. Then, R and S (S = sigma) are calculated for each subseries and the mean is taken over all subseries yielding (R/S)_n. This process is repeated for several lengths n. Finally, the exponent K is obtained by fitting a straight line to the plot of log((R/S)_n) vs log(n).
There seems to be no consensus on how to choose the subseries lengths n. This function therefore leaves the choice to the user (a minimal sketch of the per-length computation follows the list below). The module provides some utility functions for “typical” values:
binary_n: N/2, N/4, N/8, …
logarithmic_n: min_n, min_n * f, min_n * f^2, …
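A minimal sketch of the per-length computation (illustrative, not nolds' exact code); the exponent is then the slope of a line fitted to log((R/S)_n) versus log(n):

    import numpy as np

    def rs_mean(data, n):
        # mean rescaled range over non-overlapping subseries of length n
        rs_values = []
        for i in range(0, len(data) - n + 1, n):
            window = data[i:i + n]
            y = np.cumsum(window - np.mean(window))  # cumulative departures
            r = np.max(y) - np.min(y)                # range of the cumulative sum
            s = np.std(window, ddof=1)               # unbiased standard deviation
            if s > 0:
                rs_values.append(r / s)
        return np.mean(rs_values)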
- References:
- [h_1]
H. E. Hurst, “The problem of long-term storage in reservoirs,” International Association of Scientific Hydrology. Bulletin, vol. 1, no. 3, pp. 13–27, 1956.
- [h_2]
H. E. Hurst, “A suggested statistical model of some time series which occur in nature,” Nature, vol. 180, p. 494, 1957.
- [h_3]
A. A. Anis and E. H. Lloyd, “The expected value of the adjusted rescaled Hurst range of independent normal summands,” Biometrika, vol. 63, no. 1, pp. 111–116, 1976.
- Reference Code:
- [h_a]
“hurst” function in R-package “pracma”, url: https://cran.r-project.org/web/packages/pracma/pracma.pdf
Note: Pracma yields several estimates of the Hurst exponent, which are listed below. Unless otherwise stated they use the divisors of the length of the sequence as n. The length is reduced by at most 1% to find the value that has the most divisors.
The “Simple R/S” estimate is just log((R/S)_n) / log(n) for n = N.
The “theoretical Hurst exponent” is the value that would be expected of an uncorrected rescaled range approach for random noise of the size of the input data.
The “empirical Hurst exponent” is the uncorrected Hurst exponent obtained by the rescaled range approach.
The “corrected empirical Hurst exponent” is the Anis-Lloyd-Peters corrected Hurst exponent, but with sqrt(1/2 * pi * n) added to the (R/S)_n before the log.
The “corrected R over S Hurst exponent” uses the R function “lm” instead of pracma's own “polyfit” and uses n = N/2, N/4, N/8, … by successively halving the subsequences (which means that some subsequences may be one element longer than others). In contrast to its name, it does not use the Anis-Lloyd-Peters correction factor.
If you want to compare the output of pracma to the output of nolds, the “empirical Hurst exponent” is the only measure that exactly corresponds to the Hurst measure implemented in nolds (by choosing corrected=False, fit=”poly” and employing the same strategy for choosing n as the divisors of the (reduced) sequence length).
- [h_b]
Rafael Weron, “HURST: MATLAB function to compute the Hurst exponent using R/S Analysis”, url: https://ideas.repec.org/c/wuu/hscode/m11003.html
Note: When the same values for nvals are used and fit is set to “poly”, nolds yields exactly the same results as this implementation.
- [h_c]
Bill Davidson, “Hurst exponent”, url: http://www.mathworks.com/matlabcentral/fileexchange/9842-hurst-exponent
- Args:
- data (array-like of float):
time series
- Kwargs:
- nvals (iterable of int):
sizes of subseries to use (default: logmid_n(total_N, ratio=1/4.0, nsteps=15), that is, 15 logarithmically spaced values in the middle 25% of the logarithmic range)
Generally, the choice of n is a trade-off between the length and the number of the subsequences used for the calculation of (R/S)_n. Very low values of n lead to high variance in the individual R and S values, while very high values may leave too few subsequences for the mean over them to still be meaningful. Logarithmic spacing makes sense because it translates to even spacing in the log-log plot.
- fit (str):
the fitting method to use for the line fit, either ‘poly’ for normal least squares polynomial fitting or ‘RANSAC’ for RANSAC-fitting which is more robust to outliers
- debug_plot (boolean):
if True, a simple plot of the final line-fitting step will be shown
- debug_data (boolean):
if True, debugging data will be returned alongside the result
- plot_file (str):
if debug_plot is True and plot_file is not None, the plot will be saved under the given file name instead of directly showing it through
plt.show()
- corrected (boolean):
if True, the Anis-Lloyd-Peters correction factor will be applied to the output according to the expected value for the individual (R/S)_n (see [h_3])
- unbiased (boolean):
if True, the standard deviation based on the unbiased variance (1/(N-1) instead of 1/N) will be used. This should be the default choice, since the true mean of the sequences is not known. This parameter should only be changed to recreate results of other implementations.
- Returns:
- float:
estimated Hurst exponent K using a rescaled range approach (if K = 0.5 there are no long-range correlations in the data, if K < 0.5 there are negative long-range correlations, if K > 0.5 there are positive long-range correlations)
- (1d-vector, 1d-vector, list):
only present if debug_data is True: debug data of the form (nvals, rsvals, poly) where nvals are the values used for log(n), rsvals are the corresponding log((R/S)_n), and poly are the line coefficients ([slope, intercept])
Correlation dimension¶
- nolds.corr_dim(data, emb_dim, lag=1, rvals=None, dist=rowwise_euclidean, fit='RANSAC', debug_plot=False, debug_data=False, plot_file=None)[source]¶
Calculates the correlation dimension with the Grassberger-Procaccia algorithm
- Explanation of correlation dimension:
The correlation dimension is a characteristic measure that can be used to describe the geometry of chaotic attractors. It is defined using the correlation sum C(r) which is the fraction of pairs of points X_i in the phase space whose distance is smaller than r.
If the relation between C(r) and r can be described by the power law
C(r) ~ r^D
then D is called the correlation dimension of the system.
In a d-dimensional system, the maximum value for D is d. This value is obtained for systems that expand uniformly in each dimension with time. The lowest possible value is 0 for a system with constant C(r) (i.e. a system that visits just one point in the phase space). Generally if D is lower than d and the system has an attractor, this attractor is called “strange” and D is a measure of this “strangeness”.
- Explanation of the algorithm:
The Grassberger-Procaccia algorithm calculates C(r) for a range of different r and then fits a straight line into the plot of log(C(r)) versus log(r).
This version of the algorithm is created for one-dimensional (scalar) time series. Therefore, before calculating C(r), a delay embedding of the time series is performed to yield emb_dim dimensional vectors Y_i = [X_i, X_(i+1*lag), X_(i+2*lag), … X_(i+(emb_dim-1)*lag)]. Choosing a higher value for emb_dim allows the reconstruction of higher dimensional dynamics and avoids “systematic errors due to corrections to scaling”. Choosing a higher value for lag helps to avoid overestimating the correlation because X_i ~= X_(i+1), but it should also not be set too high, to avoid underestimating the correlation due to the exponential divergence of trajectories in chaotic systems.
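A minimal sketch of the correlation sum for an already embedded orbit (illustrative only):

    import numpy as np

    def corr_sum(orbit, r):
        # fraction of distinct pairs of orbit vectors (rows) whose
        # euclidean distance is smaller than r
        n = len(orbit)
        count = 0
        for i in range(n - 1):
            dists = np.sqrt(np.sum((orbit[i + 1:] - orbit[i]) ** 2, axis=1))
            count += np.sum(dists < r)
        return 2.0 * count / (n * (n - 1))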
- References:
- [cd_1]
P. Grassberger and I. Procaccia, “Characterization of strange attractors,” Physical review letters, vol. 50, no. 5, p. 346, 1983.
- [cd_2]
P. Grassberger and I. Procaccia, “Measuring the strangeness of strange attractors,” Physica D: Nonlinear Phenomena, vol. 9, no. 1, pp. 189–208, 1983.
- [cd_3]
P. Grassberger, “Grassberger-Procaccia algorithm,” Scholarpedia, vol. 2, no. 5, p. 3043, url: http://www.scholarpedia.org/article/Grassberger-Procaccia_algorithm
- Reference Code:
- [cd_a]
“corrDim” function in R package “fractal”, url: https://cran.r-project.org/web/packages/fractal/fractal.pdf
- [cd_b]
Peng Yuehua, “Correlation dimension”, url: http://de.mathworks.com/matlabcentral/fileexchange/24089-correlation-dimension
- Args:
- data (array-like of float):
time series of data points
- emb_dim (int):
embedding dimension
- Kwargs:
- lag (int):
lag between individual values in the vectors of the delay embedding (see the explanation of the algorithm above)
- rvals (iterable of float):
list of values to use for r (default: logarithmic_r(0.1 * std, 0.5 * std, 1.03))
- dist (function (2d-array, 1d-array) -> 1d-array):
row-wise difference function
- fit (str):
the fitting method to use for the line fit, either ‘poly’ for normal least squares polynomial fitting or ‘RANSAC’ for RANSAC-fitting which is more robust to outliers
- debug_plot (boolean):
if True, a simple plot of the final line-fitting step will be shown
- debug_data (boolean):
if True, debugging data will be returned alongside the result
- plot_file (str):
if debug_plot is True and plot_file is not None, the plot will be saved under the given file name instead of directly showing it through
plt.show()
- Returns:
- float:
correlation dimension as slope of the line fitted to log(r) vs log(C(r))
- (1d-vector, 1d-vector, list):
only present if debug_data is True: debug data of the form (rvals, csums, poly) where rvals are the values used for log(r), csums are the corresponding log(C(r)), and poly are the line coefficients ([slope, intercept])
Detrended fluctuation analysis¶
- nolds.dfa(data, nvals=None, overlap=True, order=1, fit_trend='poly', fit_exp='RANSAC', debug_plot=False, debug_data=False, plot_file=None)[source]¶
Performs a detrended fluctuation analysis (DFA) on the given data
- Recommendations for parameter settings by Hardstone et al.:
nvals should be equally spaced on a logarithmic scale so that each window scale has the same weight
min(nvals) < 4 does not make much sense, as fitting a polynomial (even if it is only of order 1) to 3 or fewer data points is very prone to errors.
max(nvals) > len(data) / 10 does not make much sense, as we will then have fewer than 10 windows to calculate the average fluctuation
use overlap=True to obtain more windows and therefore better statistics (at an increased computational cost)
- Explanation of DFA:
Detrended fluctuation analysis, much like the Hurst exponent, is used to find long-term statistical dependencies in time series. However, while the Hurst exponent will indicate long-term correlations for any non-stationary process (i.e. a stochastic process whose probability distribution changes when shifted in time, such as a random walk whose mean changes over time), DFA was designed to distinguish between correlations that are purely an artifact of non-stationarity and those that show inherent long-term behavior of the studied system.
Mathematically, the long-term correlations that we are interested in can be characterized using the autocorrelation function C(s). For a time series (x_i) with i = 1, …, N it is defined as follows:
C(s) = 1/(N-s) * (y_1 * y_(1+s) + y_2 * y_(2+s) + … + y_(N-s) * y_N)
with y_i = x_i - mean(x). If there are no correlations at all, C(s) would be zero for s > 0. For short-range correlations, C(s) will decline exponentially, but for long-term correlations the decline instead follows a power law of the form C(s) ~ s^(-gamma) with 0 < gamma < 1.
Due to noise and underlying trends, calculating C(s) directly is usually not feasible. The main idea of DFA is therefore to remove trends up to a given order from the input data and analyze the remaining fluctuations. Trends in this sense are smooth signals with monotonic or slowly oscillating behavior that are caused by external effects and not by the dynamical system under study.
To get a hold of these trends, the first step is to calculate the “profile” of our time series as the cumulative sum of deviations from the mean, effectively integrating our data. This both smoothes out measurement noise and makes it easier to distinguish the fractal properties of bounded time series (i.e. time series whose values cannot grow or shrink beyond certain bounds such as most biological or physical signals) by applying random walk theory (see [dfa_3] and [dfa_4]).
y_i = x_1 - mean(x) + x_2 - mean(x) + … + x_i - mean(x).
After that, we split the profile y_i into (usually non-overlapping) windows of length n to calculate local trends at this given scale. The ith window of this size has the form
W_(n,i) = [y_i, y_(i+1), y_(i+2), … y_(i+n-1)]
The local trends are then removed for each window separately by fitting a polynomial p_(n,i) to the window W_(n,i) and then calculating W’_(n,i) = W_(n,i) - p_(n,i) (element-wise subtraction).
This leaves us with the deviations from the trend - the “fluctuations” - that we are interested in. To quantify them, we take the root mean square of these fluctuations. It is important to note that we have to sum up all individual fluctuations across all windows and divide by the total number of fluctuations here before finally taking the root as last step. Some implementations apply another root per window, which skews the result.
The resulting fluctuation F(n) then depends only on the window size n, the scale at which we observe our data. It behaves similarly to the autocorrelation function in that it follows a power law for long-term correlations:
F(n) ~ n^alpha
where alpha is the Hurst parameter, which we can obtain by fitting a line to the plot of log(n) versus log(F(n)) and taking the slope.
The result can be interpreted as follows: For alpha < 1 the underlying process is stationary and can be modeled as fractional Gaussian noise with H = alpha. This means for alpha = 0.5 we have no long-term correlation or “memory”, for 0.5 < alpha < 1 we have positive long-term correlations, and for alpha < 0.5 the long-term correlations are negative.
For alpha > 1 the underlying process is non-stationary and can be modeled as fractional Brownian motion with H = alpha - 1.
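A minimal sketch of the fluctuation calculation for a single window size n (illustrative only, non-overlapping windows); alpha is then the slope of a line fitted to log(F(n)) versus log(n):

    import numpy as np

    def dfa_fluctuation(data, n, order=1):
        # profile: cumulative sum of deviations from the mean
        profile = np.cumsum(data - np.mean(data))
        t = np.arange(n)
        squared = []
        for i in range(0, len(profile) - n + 1, n):
            window = profile[i:i + n]
            trend = np.polyval(np.polyfit(t, window, order), t)
            squared.append((window - trend) ** 2)
        # pool all squared fluctuations before taking the single final root
        return np.sqrt(np.mean(np.concatenate(squared)))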
- References:
- [dfa_1]
C.-K. Peng, S. V. Buldyrev, S. Havlin, M. Simons, H. E. Stanley, and A. L. Goldberger, “Mosaic organization of DNA nucleotides,” Physical Review E, vol. 49, no. 2, 1994.
[dfa_2]J. W. Kantelhardt, E. Koscielny-Bunde, H. H. A. Rego, S. Havlin, and A. Bunde, “Detecting long-range correlations with detrended fluctuation analysis,” Physica A: Statistical Mechanics and its Applications, vol. 295, no. 3–4, pp. 441–454, Jun. 2001, doi: 10.1016/S0378-4371(01)00144-3.
[dfa_3]C. Peng, J. M. Hausdorff, and A. L. Goldberger, “Fractal mechanisms in neuronal control: human heartbeat and gait dynamics in health and disease,” in Self-Organized Biological Dynamics and Nonlinear Control, 1st ed., J. Walleczek, Ed., Cambridge University Press, 2000, pp. 66–96. doi: 10.1017/CBO9780511535338.006.
[dfa_4]A. Bashan, R. Bartsch, J. W. Kantelhardt, and S. Havlin, “Comparison of detrending methods for fluctuation analysis,” Physica A: Statistical Mechanics and its Applications, vol. 387, no. 21, pp. 5080–5090, Sep. 2008, doi: 10.1016/j.physa.2008.04.023.
[dfa_5]R. Hardstone, S.-S. Poil, G. Schiavone, R. Jansen, V. V. Nikulin, H. D. Mansvelder, and K. Linkenkaer-Hansen, “Detrended fluctuation analysis: A scale-free view on neuronal oscillations,” Frontiers in Physiology, vol. 30, 2012.
- Reference code:
- [dfa_a]
Peter Jurica, “Introduction to MDFA in Python”, url: http://bsp.brain.riken.jp/~juricap/mdfa/mdfaintro.html
[dfa_b]JE Mietus, “dfa”, url: https://www.physionet.org/physiotools/dfa/dfa-1.htm
[dfa_c]“DFA” function in R package “fractal”
- Args:
- data (array-like of float):
time series
- Kwargs:
- nvals (iterable of int):
subseries sizes at which to calculate fluctuation (default: logarithmic_n(4, 0.1*len(data), 1.2))
- overlap (boolean):
if True, the windows W_(n,i) will have a 50% overlap, otherwise non-overlapping windows will be used
- order (int):
(polynomial) order of trend to remove
- fit_trend (str):
the fitting method to use for fitting the trends, either ‘poly’ for normal least squares polynomial fitting or ‘RANSAC’ for RANSAC-fitting which is more robust to outliers but also tends to lead to unstable results
- fit_exp (str):
the fitting method to use for the line fit, either ‘poly’ for normal least squares polynomial fitting or ‘RANSAC’ for RANSAC-fitting which is more robust to outliers
- debug_plot (boolean):
if True, a simple plot of the final line-fitting step will be shown
- debug_data (boolean):
if True, debugging data will be returned alongside the result
- plot_file (str):
if debug_plot is True and plot_file is not None, the plot will be saved under the given file name instead of directly showing it through
plt.show()
- Returns:
- float:
the estimate alpha for the Hurst parameter (alpha < 1: stationary process similar to fractional Gaussian noise with H = alpha, alpha > 1: non-stationary process similar to fractional Brownian motion with H = alpha - 1)
- (1d-vector, 1d-vector, list):
only present if debug_data is True: debug data of the form (nvals, fluctuations, poly) where nvals are the values used for log(n), fluctuations are the corresponding log(std(X, n)), and poly are the line coefficients ([slope, intercept])
Generalized Hurst Exponent (Barabási et al.)¶
- nolds.mfhurst_b(data, qvals=None, dists=None, fit='poly', debug_plot=False, debug_data=False, plot_file=None)[source]¶
Calculates the Generalized Hurst Exponent H_q for different q according to A.-L. Barabási and T. Vicsek.
- Explanation of the Generalized Hurst Exponent:
The Generalized Hurst Exponent (GHE, H_q or H(q)) can, as the name implies, be seen as a generalization of the Hurst exponent for data series with multifractal properties. Its origins, however, are not directly related to Hurst's rescaled range approach, but to the definition of self-affine functions.
A single-valued self-affine function h by definition satisfies the relation
h(x) ~= lambda^(-H) h(lambda x)
for any positive real valued lambda and some positive real valued exponent H, which is called the Hurst, Hölder, Hurst-Hölder or roughness exponent in the literature. In other words you can view lambda as a scaling factor or “step size”. With lambda < 1 we decrease the step size and zoom into our function. In this case lambda^(-H) becomes greater than one, meaning that h(lambda x) looks similar to a smaller version of h(x). With lambda > 1 we zoom out and get lambda^(-H) < 1.
To calculate H, you can use the height-height correlation function (also called autocorrelation) c(d) = <(h(x) - h(x + d))^2>_x where <…>_x denotes the expected value over x. Here, the aforementioned self-affine property is equivalent to c(d) ~ d^(2H). You can also think of d as a step size. Increasing or decreasing d from 1 to some y is the same as setting lambda = y: it increases or decreases the scale of the function by a factor of 1/y^(-H) = y^H. Therefore the squared differences will be proportional to y^(2H).
A.-L. Barabási and T. Vicsek extended this notion to an infinite hierarchy of exponents H_q for the qth-order correlation function with
c_q(d) = <(h(x) - h(x + d))^q>_x ~ d^(q H_q)
With q = 1 you get a value H_1 that is closely related to the normal Hurst exponent, but with different q you either get a constant value H_q = H_0 independent of q, which indicates that the function has no multifractal properties, or different H_q, which is a sign for multifractal behavior.
T. Di Matteo, T. Aste and M. M. Dacorogna applied this technique to financial data series and gave it the name “Generalized Hurst Exponent”.
- Explanation of the Algorithm:
Curiously, I could not find any algorithmic description of how to calculate H_q in the literature. Researchers seem to just imply that you can obtain the exponent by a line-fitting algorithm in a log-log plot, but they do not talk about the actual procedure or the required parameters.
Essentially, we can calculate c_q(d) of a discrete, evenly sampled time series Y = [y_0, y_1, y_2, … y_(N-1)] by taking the absolute differences [|y_0 - y_d|, |y_1 - y_(d+1)|, … , |y_(N-d-1) - y_(N-1)|], raising them to the qth power, and taking the mean.
Now we take the logarithm on both sides of our relation c_q(d) ~ d^(q H_q) and get
log(c_q(d)) ~ log(d) * q H_q
So in other words if we plot log(c_q(d)) against log(d) for several d we should get a straight line with slope q H_q. This enables us to use a linear least squares algorithm to obtain H_q.
Note that we consider x as a discrete variable in the range 0 <= x < N. We can do this, because the actual sampling rate of our data series does not alter the result. After taking the logarithm any scaling factor delta_x would only result in an additive term since log(delta_x * x) = log(x) + log(delta_x) and we only care about the slope of the line and not the intercept.
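Putting these steps together in an illustrative sketch (not nolds' exact implementation):

    import numpy as np

    def ghe_sketch(data, q=1, dists=range(1, 20)):
        # H_q from the slope of log(c_q(d)) versus log(d), divided by q
        y = np.asarray(data, dtype=float)
        cq = [np.mean(np.abs(y[d:] - y[:-d]) ** q) for d in dists]
        slope = np.polyfit(np.log(list(dists)), np.log(cq), 1)[0]
        return slope / q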
- References:
- [mh_1]
A.-L. Barabási and T. Vicsek, “Multifractality of self-affine fractals,” Physical Review A, vol. 44, no. 4, pp. 2730–2733, 1991.
- Args:
- data (array-like of float):
time series of data points (should be evenly sampled)
- Kwargs:
- qvals (iterable of float or int):
values of q for which H_q should be calculated (default: [1])
- dists (iterable of int):
distances for which the height-height correlation should be calculated (determines the x-coordinates in the log-log plot; default: logarithmic_n(1, max(20, 0.02 * len(data)), 1.5) to ensure even spacing on the logarithmic axis)
- fit (str):
the fitting method to use for the line fit, either ‘poly’ for normal least squares polynomial fitting or ‘RANSAC’ for RANSAC-fitting which is more robust to outliers
- debug_plot (boolean):
if True, a simple plot of the final line-fitting step will be shown
- debug_data (boolean):
if True, debugging data will be returned alongside the result
- plot_file (str):
if debug_plot is True and plot_file is not None, the plot will be saved under the given file name instead of directly showing it through
plt.show()
- Returns:
- array of float:
list of H_q for every q given in qvals
- (1d-vector, 2d-vector, 2d-vector):
only present if debug_data is True: debug data of the form (xvals, yvals, poly) where xvals is the logarithm of dists, yvals are the logarithms of the corresponding height-height correlations for each distance (first dimension) and each q (second dimension) in the shape len(dists) x len(qvals), and poly are the line coefficients ([slope, intercept]) for each q in the shape len(qvals) x 2.
Generalized Hurst Exponent (Di Matteo et al.)¶
- nolds.mfhurst_dm(data, qvals=None, max_dists=range(5, 20), detrend=True, fit='poly', debug_plot=False, debug_data=False, plot_file=None)[source]¶
Calculates the Generalized Hurst Exponent H_q for different q according to the MATLAB code of Tomaso Aste, one of the authors who introduced this measure.
- Explanation of the General Hurst Exponent:
See mfhurst_b.
Warning: I do not recommend using this function unless you want to reproduce examples from Di Matteo et al. From my experiments and a critical code analysis it seems that mfhurst_b should provide more robust results.
The design choices that make mfhurst_dm different from mfhurst_b are the following:
- By default, a linear trend is removed from the data. This can be sensible in some application areas (such as stock market analysis), but I think this should be an additional preprocessing step and not part of this algorithm.
- In the calculation of the height-height correlations, the differences (h(x) - h(x + d)) are not calculated for every possible x from 0 to N-d-1; instead, d is used as a step size for x. I see no justification for this choice. It makes the algorithm run faster, but it also takes away a lot of statistical robustness, especially for large values of d. This effect can be clearly seen when setting debug_plot to True.
- The algorithm uses a linear scale for the distance values d = 1, 2, 3, …, tau_max. This is counterintuitive, since we later plot log(d) against log(c_q(d)). A linear scale will have a bias towards larger values in the logarithmic scale. A logarithmic scale for d seems to be a more natural fit. If low values of d yield statistically unstable results, they should simply be omitted.
- The algorithm tests multiple values for tau_max, which is the maximum distance that will be calculated. In [mhd_1] the authors state that this is done to test the robustness of the approach. However, taking the mean of several runs with different tau_max will not produce any more information than performing one run with the largest tau_max. Instead it will only introduce a bias towards low values for d.
- References:
- [mhd_1]
T. Di Matteo, T. Aste, and M. M. Dacorogna, “Scaling behaviors in differently developed markets,” Physica A: Statistical Mechanics and its Applications, vol. 324, no. 1–2, pp. 183–188, 2003.
- Reference code:
- [mhd_a]
Tomaso Aste, “Generalized Hurst exponent”, url: http://de.mathworks.com/matlabcentral/fileexchange/30076-generalized-hurst-exponent
- Args:
- data (1d-vector of float):
input data (should be evenly sampled)
- qvals (1d-vector of float):
values of q for which H_q should be calculated (default: [1])
- Kwargs:
- max_dists (1d-vector of int):
different values to test for tau_max, the maximum value for the distance d. The resulting H_q will be a mean of all H_q calculated with tau_max = max_dists[0], max_dists[1], … .
- detrend (boolean):
if True, a linear trend will be removed from the data before H_q is calculated
- fit (str):
the fitting method to use for the line fit, either ‘poly’ for normal least squares polynomial fitting or ‘RANSAC’ for RANSAC-fitting which is more robust to outliers
- debug_plot (boolean):
if True, a simple plot of the final line-fitting step will be shown
- debug_data (boolean):
if True, debugging data will be returned alongside the result
- plot_file (str):
if debug_plot is True and plot_file is not None, the plot will be saved under the given file name instead of directly showing it through
plt.show()
- Returns:
- array of float:
array of mH_q for every q given in qvals, where mH_q is the mean of all H_q calculated for the different maximum distances in max_dists
- array of float:
array of standard deviations sH_q for each mH_q returned
- (1d-vector, 2d-vector, 2d-vector):
only present if debug_data is True: debug data of the form (xvals, yvals, poly) where xvals is the logarithm of dists, yvals are the logarithms of the corresponding height-height correlations for each distance (first dimension) and each q (second dimension) in the shape len(dists) x len(qvals), and poly are the line coefficients ([slope, intercept]) for each q in the shape len(qvals) x 2.
Helper functions¶
- nolds.binary_n(total_N, min_n=50)[source]¶
Creates a list of values by successively halving the total length total_N until the resulting value is less than min_n.
Non-integer results are rounded down.
- Args:
- total_N (int):
total length
- Kwargs:
- min_n (int):
minimal length after division
- Returns:
- list of integers:
total_N/2, total_N/4, total_N/8, … until total_N/2^i < min_n
- nolds.logarithmic_n(min_n, max_n, factor)[source]¶
Creates a list of values by successively multiplying a minimum value min_n by a factor > 1 until a maximum value max_n is reached.
Non-integer results are rounded down.
- Args:
- min_n (float):
minimum value (must be < max_n)
- max_n (float):
maximum value (must be > min_n)
- factor (float):
factor used to increase min_n (must be > 1)
- Returns:
- list of integers:
min_n, min_n * factor, min_n * factor^2, … min_n * factor^i < max_n without duplicates
- nolds.logarithmic_r(min_n, max_n, factor)[source]¶
Creates a list of values by successively multiplying a minimum value min_n by a factor > 1 until a maximum value max_n is reached.
- Args:
- min_n (float):
minimum value (must be < max_n)
- max_n (float):
maximum value (must be > min_n)
- factor (float):
factor used to increase min_n (must be > 1)
- Returns:
- list of floats:
min_n, min_n * factor, min_n * factor^2, … min_n * factor^i < max_n
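An illustrative sketch of how these helper sequences plug into the measures (the specific values are arbitrary):

    import numpy as np
    import nolds

    data = nolds.brown72
    # subsequence lengths for the rescaled range approach
    h = nolds.hurst_rs(data, nvals=nolds.logarithmic_n(16, len(data) // 4, 1.3))
    # radii for the correlation sum, scaled by the standard deviation
    std = np.std(data)
    d = nolds.corr_dim(data, emb_dim=5,
                       rvals=nolds.logarithmic_r(0.1 * std, 0.5 * std, 1.03))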
- nolds.expected_h(nvals, fit='RANSAC')[source]¶
Uses expected_rs to calculate the expected value for the Hurst exponent h based on the values of n used for the calculation.
- Args:
- nvals (iterable of int):
the values of n used to calculate the individual (R/S)_n
- Kwargs:
- fit (str):
the fitting method to use for the line fit, either ‘poly’ for normal least squares polynomial fitting or ‘RANSAC’ for RANSAC-fitting which is more robust to outliers
- Returns:
- float:
expected h for white noise
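For example (illustrative values):

    import nolds

    # expected Hurst exponent of white noise for a given choice of nvals
    nvals = nolds.logmid_n(1000, ratio=0.25, nsteps=15)
    h_white = nolds.expected_h(nvals, fit="poly")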
- nolds.expected_rs(n)[source]¶
Calculates the expected (R/S)_n for white noise for a given n.
This is used as a correction factor in the function hurst_rs. It uses the formula of Anis-Lloyd-Peters (see [h_3]).
- Args:
- n (int):
the value of n for which the expected (R/S)_n should be calculated
- Returns:
- float:
expected (R/S)_n for white noise
- nolds.logmid_n(max_n, ratio=0.25, nsteps=15)[source]¶
Creates an array of integers that lie evenly spaced in the “middle” of the logarithmic scale from 0 to log(max_n).
If max_n is very small and/or nsteps is very large, this may lead to duplicate values which will be removed from the output.
This function has benefits in hurst_rs, because it cuts away both very small and very large n, which both can cause problems, and still produces a logarithmically spaced sequence.
- Args:
- max_n (int):
largest possible output value (should be the sequence length when used in hurst_rs)
- Kwargs:
- ratio (float):
width of the “middle” of the logarithmic interval relative to log(max_n). For example, for ratio=1/2.0 the logarithm of the resulting values will lie between 0.25 * log(max_n) and 0.75 * log(max_n).
- nsteps (float):
(maximum) number of values to take from the specified range
- Returns:
- array of int:
a logarithmically spaced sequence of at most nsteps values (may be less, because only unique values are returned)
- nolds.lyap_r_len(**kwargs)[source]¶
Helper function that calculates the minimum number of data points required to use lyap_r.
Note that none of the required parameters may be set to None.
- Kwargs:
- kwargs (dict):
arguments used for lyap_r (required: emb_dim, lag, trajectory_len and min_tsep)
- Returns:
minimum number of data points required to call lyap_r with the given parameters
- nolds.lyap_e_len(**kwargs)[source]¶
Helper function that calculates the minimum number of data points required to use lyap_e.
Note that none of the required parameters may be set to None.
- Kwargs:
- kwargs (dict):
arguments used for lyap_e (required: emb_dim, matrix_dim, min_nb and min_tsep)
- Returns:
minimum number of data points required to call lyap_e with the given parameters
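An illustrative length check before calling the corresponding measures (parameter values are only examples):

    import nolds

    # minimum number of data points for the given parameter choices
    n_r = nolds.lyap_r_len(emb_dim=10, lag=2, trajectory_len=20, min_tsep=10)
    n_e = nolds.lyap_e_len(emb_dim=10, matrix_dim=4, min_nb=8, min_tsep=0)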
Datasets¶
Benchmark dataset for hurst exponent¶
- nolds.brown72 = array([45.47422, 42.55601, 46.5188 , ..., 42.78297, 44.34307, 40.70655])¶
The brown72 dataset has a prescribed (uncorrected) Hurst exponent of 0.7270. It is a synthetic dataset from the book “Chaos and Order in the Capital Markets” [b7_a].
It is included here because the dataset can be found online [b7_b] and is used by other software packages such as the R package pracma [b7_c]. As such, it can be used to compare different implementations.
However, it should be noted that the idea that the “true” Hurst exponent of this series is indeed 0.7270 is problematic for several reasons:
This value does not take into account the Anis-Lloyd-Peters correction factor.
It was obtained using the biased version of the standard deviation.
It depends (as always for the Hurst exponent) on the choice of the length of the subsequences.
If you want to reproduce the prescribed value, you can use the following code:
    import numpy as np
    import nolds

    nolds.hurst_rs(
        nolds.brown72,
        nvals=2**np.arange(3, 11),
        fit="poly",
        corrected=False,
        unbiased=False,
    )
References:
- [b7_a]
Edgar Peters, “Chaos and Order in the Capital Markets: A New View of Cycles, Prices, and Market Volatility”, Wiley: Hoboken, 2nd Edition, 1996.
- [b7_b]
Ian L. Kaplan, “Estimating the Hurst Exponent”, url: http://bearcave.com/misl/misl_tech/wavelets/hurst/index.html
- [b7_c]
HwB, “Pracma: brown72”, url: https://www.rdocumentation.org/packages/pracma/versions/1.9.9/topics/brown72
Tent map¶
- nolds.tent_map(x, steps, mu=2)[source]¶
Generates a time series of the tent map.
- Characteristics and Background:
The name of the tent map is derived from the fact that the plot of x_i vs x_(i+1) looks like a tent. For mu > 1, one application of the mapping function can be viewed as stretching the surface on which the value is located and then folding the area that is greater than one back towards zero. This corresponds nicely to the definition of chaos as expansion in one dimension that is counteracted by a compression in another dimension.
- Calculating the Lyapunov exponent:
The Lyapunov exponent of the tent map can be calculated easily: due to this stretching behavior, a small difference delta between two neighboring points will indeed grow exponentially by a factor of mu in each iteration. We can thus assume that:
delta_n = delta_0 * mu^n
We now only have to change the base to e to obtain the exact formula that is used for the definition of the Lyapunov exponent:
delta_n = delta_0 * e^(ln(mu) * n)
Therefore the Lyapunov exponent of the tent map is:
lambda = ln(mu)
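This makes the tent map a convenient benchmark for lyap_r. An illustrative sketch (parameter values are only examples; we assume a non-integer mu is accepted here, since with mu = 2 the exact binary doublings make a floating-point orbit collapse to 0 after a few dozen steps):

    import numpy as np
    import nolds

    # the estimate should be close to ln(1.9) ~ 0.642
    data = np.fromiter(nolds.tent_map(0.3, 1000, mu=1.9), dtype=float)
    le = nolds.lyap_r(data, emb_dim=4, lag=1, min_tsep=10)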
- Args:
- x (float):
starting point
- steps (int):
number of steps for which the generator should run
- Kwargs:
- mu (int):
parameter mu that controls the behavior of the map
- Returns:
- generator object:
the generator that creates the time series
Logistic map¶
- nolds.logistic_map(x, steps, r=4)[source]¶
Generates a time series of the logistic map.
- Characteristics and Background:
The logistic map is among the simplest examples of a time series that can exhibit chaotic behavior depending on the parameter r. For r between 2 and 3, the series quickly converges to a static value. At r = 3 the first bifurcation point is reached, after which the series starts to oscillate. Beginning with r = 3.6 it shows chaotic behavior with a few islands of stability, until perfect chaos is achieved at r = 4.
- Calculating the Lyapunov exponent:
To calculate the “true” Lyapunov exponent of the logistic map, we first have to make a few observations for maps in general that are repeated applications of a function to a starting value.
If we have two starting values that differ by some infinitesimal \(\delta_0\), then according to the definition of the Lyapunov exponent we will have an exponential divergence:
\[|\delta_n| = |\delta_0| e^{\lambda n}\]
We can now write that:
\[e^{\lambda n} = \lim_{\delta_0 \to 0} \left|\frac{\delta_n}{\delta_0}\right|\]
This is the definition of the derivative \(\frac{dx_n}{dx_0}\) of a point \(x_n\) in the time series with respect to the starting point \(x_0\) (or rather the absolute value of that derivative). Now we can use the fact that our map is the repeated application of some function f, so that the chain rule gives:
\[(f^n)'(x_0) = f'(x_{n-1}) \cdot f'(x_{n-2}) \cdot \ldots \cdot f'(x_0)\]
With
\[e^{\lambda n} = |(f^n)'(x_0)|\]
we now have
\[\begin{split}e^{\lambda n} &= |f'(x_{n-1}) \cdot f'(x_{n-2}) \cdot \ldots \cdot f'(x_0)| \\ \Leftrightarrow \\ \lambda n &= \ln |f'(x_{n-1}) \cdot f'(x_{n-2}) \cdot \ldots \cdot f'(x_0)| \\ \Leftrightarrow \\ \lambda &= \frac{1}{n} \ln |f'(x_{n-1}) \cdot f'(x_{n-2}) \cdot \ldots \cdot f'(x_0)| \\ &= \frac{1}{n} \sum_{k=0}^{n-1} \ln |f'(x_k)|\end{split}\]
With this sum we can now calculate the Lyapunov exponent for any map. For the logistic map we simply have to calculate \(f'(x)\), and as we have
\[f(x) = r x (1-x) = rx - rx^2\]
we now get
\[f'(x) = r - 2rx\]
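An illustrative numerical check of this result (for r = 4 the analytic value of the exponent is ln(2)):

    import numpy as np
    import nolds

    r = 4
    xs = np.fromiter(nolds.logistic_map(0.1, 10000, r=r), dtype=float)
    # mean of ln|f'(x_k)| along the trajectory; should approach ln(2) ~ 0.693
    lyap = np.mean(np.log(np.abs(r - 2 * r * xs)))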
- Args:
- x (float):
starting point
- steps (int):
number of steps for which the generator should run
- Kwargs:
- r (int):
parameter r that controls the behavior of the map
- Returns:
- generator object:
the generator that creates the time series
Fractional Brownian motion¶
Fractional Gaussian noise¶
Quantum random numbers¶
- nolds.qrandom(n)[source]¶
Creates an array of n true random numbers obtained from the quantum random number generator at qrng.anu.edu.au
This function requires the package quantumrandom and an internet connection.
- Args:
- n (int):
length of the random array
- Returns:
- array of ints:
array of truly random unsigned 16 bit int values
Financial example datasets¶
- nolds.load_financial()[source]¶
Loads the following datasets from CSV files in this package:
jkse: Jakarta Composite Index, downloaded on 2019-02-12 from https://finance.yahoo.com/quote/%5EJKSE/history?period1=631148400&period2=988668000&interval=1d&filter=history&frequency=1d
n225: Nikkei 225, downloaded on 2019-02-12 from https://finance.yahoo.com/quote/%5EN225/history?period1=631148400&period2=988668000&interval=1d&filter=history&frequency=1d
ndx: NASDAQ 100, downloaded on 2019-02-12 from https://finance.yahoo.com/quote/%5ENDX/history?period1=631148400&period2=988668000&interval=1d&filter=history&frequency=1d
All datasets contain daily prices from the period from 1990-01-01 to 2001-05-01. Missing values are set to NaN, except for opening values, which are treated as follows:
- If the first opening value is missing, the first existing opening value is used for the first day.
- All other missing opening values are filled by the close value of the last day where data was available.
- Returns:
- list of tuple(1d-array, 2d-array):
datasets with days as array of date objects and 2d-array with the columns “Open”, “High”, “Low”, “Close”, “Adj Close”, and “Volume”. Note that “Open” values have been padded to ensure that there are no NaNs left.
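An illustrative sketch of accessing one of the datasets (assuming the listed order jkse, n225, ndx and the column order given above):

    import nolds

    datasets = nolds.load_financial()
    days, values = datasets[2]   # ndx = NASDAQ 100
    closes = values[:, 3]        # column 3 = "Close"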
Fractal data used by Barabasi et al. (1991)¶
- nolds.barabasi1991_fractal(size, iterations, b1=0.8, b2=0.5)[source]¶
Generates the simple fractal described in [bf].
The fractal divides a rectangular segment starting at (x0, y0) with width w and height h along the x axis into four line segments of equal size with the boundary points [x0, x1, x2, x3, x4]. It has two parameters, b1 and b2, that allow choosing the values for y(x1) and y(x3), while it always holds that y(x0) = y0, y(x2) = y0 and y(x4) = y0 + h.
The process starts with a single line segment of height 1 spanning the whole data range. In each iteration, the rectangles spanning the line segments from the previous iteration are subdivided according to the same rule.
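Since this fractal comes from the same paper that introduced the generalized Hurst exponent, it pairs naturally with mfhurst_b; an illustrative sketch (parameter values are only examples):

    import nolds

    # generate the fractal and estimate generalized Hurst exponents
    data = nolds.barabasi1991_fractal(1000, 5)
    hq = nolds.mfhurst_b(data, qvals=[1, 2, 3])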
- References:
- [bf]
A.-L. Barabási and T. Vicsek, “Multifractality of self-affine fractals,” Physical Review A, vol. 44, no. 4, pp. 2730–2733, 1991.
- Args:
- size (int):
number of data points in the resulting array
- iterations (int):
number of iterations to perform
- Kwargs:
- b1 (float):
relative height at x1 (between 0 and 1)
- b2 (float):
relative height at x3 (between 0 and 1)
- Returns:
- (1d-array of float):
generated fractal