Generalization and Benchmarking of a Nonparametric Method for Derivative Discontinuity Detection

Generalisierung und Evaluierung nicht-parametrischer Methoden zur Unstetigkeitserkennung

Dokumentinformationen

Schule

Montanuniversität Leoben

Fachrichtung Automation
Veröffentlichungsjahr Wintersemester 2020/21
Dokumenttyp Masterarbeit
Sprache German
Seitenanzahl 177
Format | PDF
Größe 9.61 MB

Zusammenfassung

I.Derivative Discontinuities

This thesis explores the detection of discontinuities in real-time sensor data, particularly in physical systems whose behavior is described by differential equations. Discontinuities violate the assumption of smooth, differentiable functions in these systems. Extending previous work on C n discontinuity detection, this thesis generalizes the approach to define which derivative orders should be examined for discontinuities.

1 Derivative Discontinuities

Definition and Significance:

Discontinuities in real-time sensor data are crucial to detect as physical systems with differential equations should only exhibit continuous functions and derivatives. This work extends the detection of C n discontinuities to a more general case, allowing users to specify which derivative orders to inspect for discontinuities.

Method Development:

The generalized method relies on matrix algebraic formulations and provides numerical solutions. The derivations are presented in matrix notation for computational efficiency. Extensive testing using synthetic and real datasets evaluates the method's performance.

Performance Comparison:

The new method is compared with other discontinuity and change detection algorithms using two metrics: the segmentation covering metric and the F1-Score. The results show that the generalized method is among the best performing algorithms, making it attractive for various applications due to its numerical efficiency and wide applicability.

II.Numerical Implementation and Evaluation

The extended method is numerically implemented and tested on various synthetic and real-world datasets. Numerical solutions are provided using matrix algebraic formulations. The detection method is compared to other discontinuity and change detection algorithms using two performance metrics. The results demonstrate the effectiveness of the new approach.

3.3.1 KUSUM Method

Die KUSUM-Methode (cumulative sum) ist eine weit verbreitete und sehr effektive Erkennungstechnik, die auf Bayes'schen Prinzipien beruht. Eine verwandte Methode stammt von Fearnhead und Liu. Im Gegensatz zu den meisten Bayes'schen Algorithmen können diese beiden Methoden sowohl für die Online- als auch für die Offline-Erkennung von Änderungen verwendet werden. Wie bereits beschrieben, lassen sich diese Methoden jedoch leicht an die Offline-Nutzung anpassen.

4.1 constrained coupled polynomial method

Das constrained coupled polynomial method ist eine Methode zur Erkennung von Unstetigkeiten in Ableitungen. Es basiert auf der Approximierung von Polynomen links und rechts eines Stützpunktes, wobei die Koeffizienten bis zur Ordnung (n-1) am Stützpunkt eingeschränkt werden, um eine C^(n-1)-Stetigkeit zu gewährleisten. Dies reduziert die Freiheitsgrade der Approximation und verbessert die Varianz der Lösung.

4.2.3 Combined Error

Als drittes und letztes Maß für eine mögliche Ableitungsunstetigkeit wird der kombinierte Fehler eingeführt. Dabei wird der Residuenfehler über den gesamten Approximationsbereich zweimal berechnet. Numerisch kann dieser Vorgang in zwei Schritten unterteilt werden. Für jede Seite - links und rechts vom Stützpunkt xi - wird der kombinierte Residuenfehler e_f (oder e_g) als Differenz zwischen der tatsächlichen Beobachtung und der kombinierten Approximation einschließlich der Interpolation berechnet.

III.Change Point Detection

Change point analysis is a key field in time series analysis, focusing on the detection of abrupt changes in data. In 1954, Page introduced the well-known CUSUM (cumulative sum) method combined with a quality criterion for change point detection. This technique has been widely used in various applications, including manufacturing quality control.

1.1 Change Point Detection

We will focus on detecting jump discontinuities of functions and their derivatives, since such irregularities can affect the behaviour of a model describing a physical system by differential equations.

3 Change Point Analysis

Change point analysis deals with detecting abrupt changes in data, which can be of various nature and characteristics. It aims to answer questions about the location, size and amount of discontinuities, and the certainty of their detection.

3.1 Change Point Analysis Methods

Various methods exist for change point detection, including non-parametric methods, Bayesian methods, and the CUSUM method, each with their advantages and disadvantages.

3.2 CUSUM Method

The CUSUM method calculates the cumulative sum of observed data and detects potential change points by finding the point where the sum has its absolute maximum. It can be used in both one-sided and two-sided forms.

3.3 Non parametric Methods

Non-parametric methods for discontinuity analysis include techniques based on kernel estimation, residual analysis, and non-parametric kernel estimation.

3.3.3 Bayesian Methods

Bayesian change point detection methods model the problem in a general form, assuming known or partially known probability distributions. Popular methods include the Bayesian Online Changepoint Detection method by Adams and MacKay and the similar approach by Fearnhead and Liu.

4 Discontinuity Detection in Observational Data

The proposed method for discontinuity detection in observational data is based on polynomial approximation, where polynomials are used to estimate the function values at interstitial points.

4.2 Method of Ninevski and O Leary

The method of Ninevski and O'Leary uses constrained polynomial approximation to detect discontinuities in the n-th derivative of a function. It estimates polynomial coefficients up to order (n-1) and calculates Taylor differences to infer discontinuities.

4.2.3 Measures for Discontinuity

Three measures are defined to quantify potential derivative discontinuities: the absolute value of the Taylor difference, the relative Taylor difference, and the combined error.

5.2 Synthetic Datasets

Synthetic datasets with known statistical parameters, including change point size, location and order, are created to validate the discontinuity detection method.

5.3 Generalisation of the Method

The discontinuity detection method is generalised to allow for specifying which derivative orders to inspect for discontinuities. This is achieved by applying constraints to the polynomial approximation.

5.4 Evaluation on Real World Datasets

The method is evaluated on a collection of real world datasets, and its performance is compared to other change point detection methods using the segmentation covering metric and the F1-Score.

6 Benchmark Testing

Benchmark testing is performed on the entire dataset collection, once with the constrained coupled polynomial approach and once with the generalised approach. The results are compared to other methods.

Conclusion

The generalised discontinuity detection approach is the most generic of all methods considered in the thesis, not requiring application-specific adaptation. It performs well for various test cases and is computationally efficient, making it attractive for detecting discontinuities in observational data.

IV.Non Parametric Methods for Discontinuity Detection

In addition to the CUSUM method, various non-parametric methods have been developed for discontinuity detection. Hall and Titterington (1992) used kernel-based estimation to preserve discontinuities in curves. Jose and Ismail (1997) approached a similar problem through residual analysis. Other researchers employed non-parametric kernel estimation techniques, such as Müller (1992) and Wu and Chu (1993).

Non Parametric Methods for Discontinuity Detection

The method described in [1] is presented in this chapter for derivative discontinuity detection. It is extended to a more general case in [1], where a set of n derivative orders is considered, and the orders to be inspected for discontinuities can be defined. Here, all the derivations for the method are provided in matrix algebraic formulations. A numerical solution for the method is implemented and tested using various datasets. Performance estimates are computed using two different metrics and compared with the results from other discontinuity and change detection methods.

V.Bayesian Methods for Discontinuity Detection

Bayesian methods are another popular group of discontinuity detection algorithms. Chernoff and Zacks (1964), Broemeling (1985), and Smith (1975) described problem formulations where the distribution of the observed dataset before and after a potential change point is known or partially known. Many Bayesian methods utilize the product partitioning model introduced by Barry and Hartigan (1992).

1 Bayesian Methods for Discontinuity Detection

Bayesian methods are a popular alternative to likelihood ratio tests and can provide more flexibility in modeling the underlying probability distributions. One notable Bayesian method for change point detection is the Bayesian Online Changepoint Detection (BOCPD) algorithm proposed by Adams and MacKay (2007), which sequentially updates the posterior probability of a change point at each time step. This algorithm is particularly useful in online applications where the data is continuously streaming in and the change points need to be detected in real-time.

VI.Evaluation of Change Point Methods

For evaluation purposes, datasets with known statistical parameters, such as change point size, location, and order, were created and tested under various conditions, including the presence of noise. The new discontinuity detection method was benchmarked against other existing methods, and the results were compared using the segmentation covering metric and the F1-Score. The new approach demonstrated competitive performance, especially for higher order derivative discontinuities.

1 Evaluation of Change Point Methods

In this section, the performance of the proposed method is evaluated using two metrics: the segmentation covering metric and the F1-Score. The evaluation is conducted on a collection of real-world datasets commonly used in the literature for change point analysis and includes both univariate and multivariate datasets.

1.1 Segmentation Covering Metric

The segmentation covering metric is a typical measure for change point evaluation methods that are separated into clustering and classification metrics. The measure is used to compare the subsets of a dataset due to the detected change points with the ground truth subsets. The highest Jaccard index J(A, A 0 ) for each subset is computed, where every subset in the partition S is compared to every subset of the ground truth partition G k . This comparison then yields the highest Jaccard index J(A, A 0 ) for each subset of the detected partition S, which can be seen as the score of the best match for each subset. Finally, for those detected scores the weighted average over the dataset length T of all subsets yields the segmentation covering measure C(S, G k ) of the partitions S and G k .

1.2 F1 Score

The F1-Score is a weighted average of the precision and recall, where an F1-score reaches its best value at 1 (perfect precision and recall) and worst at 0. It is calculated as: F1 = 2 * (precision * recall) / (precision + recall).

1.3 Benchmark Testing

The benchmark testing is performed on the entire dataset collection, once for the constrained coupled polynomial approach before and once after the generalization. The results are shown in Table 6.1, where the average of each metric for the entire dataset collection is compared to the other methods evaluated in [31]. The parameters for the newly tested algorithms have been optimized manually.

VII.Keywords

  • Discontinuity detection
  • Change point analysis
  • Time series analysis
  • Derivative discontinuities
  • Segmentation covering metric
  • F1-Score

Keywords

  1. Erkennung von Unstetigkeiten
  2. Monitoring
  3. Zeitreihenanalyse
  4. Multivariate Zeitreihen
  5. CUSUM-Verfahren
  6. Bayes-Methoden
  7. Differenzenquotienten
  8. Matrixschreibweise
  9. Kalman-Filter
  10. Nicht-parametrische Methoden
  11. Fehlertoleranz
  12. Restauswertung
  13. Likelihood-Ratio-Test
  14. Exponentielles Glätten
  15. Taylor-Expansion
  16. LSI-Terme