stats

import "github.com/wnjoon/go-yfinance/pkg/stats"

Package stats provides statistical utility functions for price repair operations.

This package includes functions for percentile calculations, z-score computations, median filtering, and outlier detection. These utilities are essential for detecting and correcting data quality issues in financial time series.

Percentile Functions

The package provides percentile calculation using linear interpolation:

p50 := stats.Percentile(data, 50.0)  // Median
q1, q3, iqr := stats.IQR(data)       // Interquartile range

Z-Score Functions

Z-score calculations for standardization and outlier detection:

z := stats.ZScore(value, mean, std)
zScores := stats.ZScoreSlice(data)

Filtering Functions

Median filter and outlier detection for noise reduction:

filtered := stats.MedianFilter(data, windowSize)
mask := stats.OutlierMask(data, multiplier)

These functions are designed to match the behavior of numpy and scipy functions used in the Python yfinance implementation.

Index

func Abs

func Abs(data []float64) []float64

Abs returns absolute values of the data.

func All

func All(mask []bool) bool

All returns true if all values in the mask are true.

func Any

func Any(mask []bool) bool

Any returns true if any value in the mask is true.

func ClipOutliers

func ClipOutliers(data []float64, multiplier float64) []float64

ClipOutliers replaces outliers with boundary values.
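
The clipping step can be sketched as below. The helper clip and its explicit lower/upper arguments are hypothetical; ClipOutliers presumably derives the bounds from the IQR multiplier, which this sketch does not reproduce.

```go
package main

import "fmt"

// clip is a hypothetical helper, not the package's implementation.
// Values below lower become lower, values above upper become upper.
func clip(data []float64, lower, upper float64) []float64 {
	out := make([]float64, len(data))
	for i, v := range data {
		switch {
		case v < lower:
			out[i] = lower
		case v > upper:
			out[i] = upper
		default:
			out[i] = v
		}
	}
	return out
}

func main() {
	// Bounds are assumed to be precomputed for this illustration.
	fmt.Println(clip([]float64{-10, 0, 5, 50}, -5, 35)) // [-5 0 5 35]
}
```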

func CountTrue

func CountTrue(mask []bool) int

CountTrue counts the number of true values in a boolean slice.

func DetectOutliersByZScore

func DetectOutliersByZScore(data []float64, threshold float64) []bool

DetectOutliersByZScore identifies outliers based on z-score threshold. Returns a boolean mask where true indicates an outlier.

Parameters:

  • data: slice of float64 values
  • threshold: z-score threshold (typically 2.0 or 3.0)
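
A minimal sketch of z-score thresholding follows. The function zScoreOutliers is a hypothetical stand-in, and the use of the sample standard deviation (ddof=1) and of the absolute z-score are assumptions rather than confirmed details of DetectOutliersByZScore.

```go
package main

import (
	"fmt"
	"math"
)

// zScoreOutliers is a hypothetical sketch, not the package's implementation.
// It flags values whose absolute z-score exceeds threshold, using the
// sample standard deviation (ddof=1) as an assumption.
func zScoreOutliers(data []float64, threshold float64) []bool {
	mask := make([]bool, len(data))
	if len(data) < 2 {
		return mask // not enough data to estimate a spread
	}
	n := float64(len(data))
	var sum float64
	for _, v := range data {
		sum += v
	}
	mean := sum / n
	var ss float64
	for _, v := range data {
		d := v - mean
		ss += d * d
	}
	std := math.Sqrt(ss / (n - 1))
	if std == 0 {
		return mask // constant data has no outliers
	}
	for i, v := range data {
		mask[i] = math.Abs((v-mean)/std) > threshold
	}
	return mask
}

func main() {
	data := []float64{10, 10, 10, 10, 10, 10, 10, 10, 10, 100}
	fmt.Println(zScoreOutliers(data, 2.0)) // only the last element is flagged
	fmt.Println(zScoreOutliers(data, 3.0)) // nothing is flagged
}
```

In this sample the spike has a z-score of about 2.85, so it is caught at a threshold of 2.0 but not at 3.0. With small samples a single extreme value inflates the standard deviation it is measured against, which is one reason to consider the lower threshold.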

func Diff

func Diff(data []float64) []float64

Diff calculates the difference between consecutive elements. Returns slice of length n-1.
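
As an illustration, the first difference can be written as follows. The function diff is a hypothetical stand-in for Diff.

```go
package main

import "fmt"

// diff is a hypothetical sketch, not the package's implementation.
// out[i] = data[i+1] - data[i], so the result has len(data)-1 elements.
func diff(data []float64) []float64 {
	if len(data) < 2 {
		return []float64{}
	}
	out := make([]float64, len(data)-1)
	for i := 1; i < len(data); i++ {
		out[i-1] = data[i] - data[i-1]
	}
	return out
}

func main() {
	fmt.Println(diff([]float64{1, 4, 9, 16})) // [3 5 7]
}
```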

func FilterByMask

func FilterByMask(data []float64, mask []bool) []float64

FilterByMask returns elements where mask is true.
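
A sketch of mask-based filtering, assuming data and mask have the same length. The function filterByMask is a hypothetical stand-in, and how FilterByMask treats mismatched lengths is not stated here.

```go
package main

import "fmt"

// filterByMask is a hypothetical sketch, not the package's implementation.
// It keeps data[i] wherever mask[i] is true and ignores any data beyond
// the end of the mask.
func filterByMask(data []float64, mask []bool) []float64 {
	out := make([]float64, 0, len(data))
	for i, v := range data {
		if i < len(mask) && mask[i] {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	prices := []float64{101.5, 0.0, 102.0, 9999.0}
	valid := []bool{true, false, true, false}
	fmt.Println(filterByMask(prices, valid)) // [101.5 102]
}
```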

func FindBlocks

func FindBlocks(mask []bool) [][2]int

FindBlocks identifies contiguous blocks of true values in a boolean mask. Returns a slice of half-open [start, end) index pairs.
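
One way to scan for such blocks is sketched below. The function findBlocks is a hypothetical stand-in; it records the index where a run of true values begins and closes the run at the first false value (or at the end of the mask).

```go
package main

import "fmt"

// findBlocks is a hypothetical sketch, not the package's implementation.
// Each returned pair is {start, end} with end exclusive.
func findBlocks(mask []bool) [][2]int {
	var blocks [][2]int
	start := -1 // -1 means "not currently inside a block"
	for i, v := range mask {
		switch {
		case v && start < 0:
			start = i
		case !v && start >= 0:
			blocks = append(blocks, [2]int{start, i})
			start = -1
		}
	}
	if start >= 0 { // the mask ended while still inside a block
		blocks = append(blocks, [2]int{start, len(mask)})
	}
	return blocks
}

func main() {
	mask := []bool{false, true, true, false, true}
	fmt.Println(findBlocks(mask)) // [[1 3] [4 5]]
}
```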

func IQR

func IQR(data []float64) (q1, q3, iqr float64)

IQR calculates the interquartile range (Q3 - Q1). Returns Q1, Q3, and IQR.

The interquartile range is used for outlier detection:

  • Lower bound: Q1 - 1.5 * IQR
  • Upper bound: Q3 + 1.5 * IQR
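
The bounds above can be computed as in the following sketch. The helper iqrBounds is hypothetical and takes the quartiles as inputs; in practice they would come from a call such as IQR.

```go
package main

import "fmt"

// iqrBounds is a hypothetical helper, not part of the package.
// Given the quartiles, it returns the lower and upper fences for a
// chosen multiplier (1.5 in the bounds listed above).
func iqrBounds(q1, q3, multiplier float64) (lower, upper float64) {
	iqr := q3 - q1
	return q1 - multiplier*iqr, q3 + multiplier*iqr
}

func main() {
	// Quartiles are assumed to be precomputed for this illustration.
	lower, upper := iqrBounds(10, 20, 1.5)
	fmt.Println(lower, upper) // -5 35

	for _, v := range []float64{12, 36, -7} {
		fmt.Println(v, v < lower || v > upper) // true marks an outlier
	}
}
```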

func InlierMask

func InlierMask(data []float64, multiplier float64) []bool

InlierMask creates a boolean mask for inliers (non-outliers). Returns true for values that are NOT outliers.

func Mean

func Mean(data []float64) float64

Mean calculates the arithmetic mean of the data. Returns NaN for empty data.

func Median

func Median(data []float64) float64

Median calculates the median (50th percentile) of the data.

func MedianFilter

func MedianFilter(data []float64, windowSize int) []float64

MedianFilter applies a 1D median filter to the data. This is similar to scipy.ndimage.median_filter for 1D arrays.

Parameters:

  • data: input slice
  • windowSize: filter window size (should be odd)

Returns filtered data with same length as input. Edge values use smaller windows.
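
The following sketch shows a 1D median filter that truncates the window at the edges, as described above. The functions median and medianFilter are hypothetical stand-ins, and taking the mean of the two middle values for an even-sized edge window is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle value of a sorted copy of w, or the mean of
// the two middle values when len(w) is even. w must be non-empty.
func median(w []float64) float64 {
	s := append([]float64(nil), w...) // copy so the input is not reordered
	sort.Float64s(s)
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid]
	}
	return (s[mid-1] + s[mid]) / 2
}

// medianFilter is a hypothetical sketch, not the package's implementation.
// The window is centered on each element and truncated at the edges.
func medianFilter(data []float64, windowSize int) []float64 {
	half := windowSize / 2
	out := make([]float64, len(data))
	for i := range data {
		lo, hi := i-half, i+half+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(data) {
			hi = len(data)
		}
		out[i] = median(data[lo:hi])
	}
	return out
}

func main() {
	data := []float64{1, 2, 100, 3, 4}
	fmt.Println(medianFilter(data, 3)) // [1.5 2 3 4 3.5]
}
```

The spike of 100 is replaced by the median of its neighborhood, while the first and last outputs come from two-element windows.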

func MedianFilter2D

func MedianFilter2D(data [][]float64, windowSize int) [][]float64

MedianFilter2D applies a 2D median filter to the data matrix. This is similar to scipy.ndimage.median_filter for 2D arrays.

Parameters:

  • data: 2D slice [rows][cols]
  • windowSize: filter window size for both dimensions

Returns filtered 2D data.

func MedianOfSlice

func MedianOfSlice(data []float64) float64

MedianOfSlice calculates the median without sorting the original slice.

func OHLCMedian

func OHLCMedian(open, high, low, close float64) float64

OHLCMedian calculates the median of the Open, High, Low, and Close values. This provides a robust estimate of the "typical" price.
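
For four values, the median is the mean of the two middle values after sorting. The sketch below uses a hypothetical function ohlcMedian to show why this is robust to a single bad field.

```go
package main

import (
	"fmt"
	"sort"
)

// ohlcMedian is a hypothetical sketch, not the package's implementation.
// It sorts the four prices and averages the two middle ones.
func ohlcMedian(open, high, low, close float64) float64 {
	v := []float64{open, high, low, close}
	sort.Float64s(v)
	return (v[1] + v[2]) / 2
}

func main() {
	fmt.Println(ohlcMedian(10, 12, 9, 11))   // 10.5
	fmt.Println(ohlcMedian(10, 1200, 9, 11)) // 10.5, the bad High is ignored
}
```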

func OutlierBounds

func OutlierBounds(data []float64, multiplier float64) (lower, upper float64)

OutlierBounds calculates the lower and upper bounds for outlier detection using the IQR method with a configurable multiplier.

Parameters:

  • data: slice of float64 values
  • multiplier: IQR multiplier (typically 1.5 for outliers, 3.0 for extreme outliers)

Returns lower bound, upper bound.

func OutlierMask

func OutlierMask(data []float64, multiplier float64) []bool

OutlierMask creates a boolean mask for outliers using the IQR method. Returns true for values that are outliers.

Parameters:

  • data: slice of float64 values
  • multiplier: IQR multiplier (typically 1.5)

func PctChange

func PctChange(data []float64) []float64

PctChange calculates the percentage change between consecutive elements. Returns slice of length n-1.
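
A sketch of the calculation, under the assumption that the change is expressed as a fraction (0.1 for a 10% rise), as pandas does. The function pctChange is a hypothetical stand-in, and whether PctChange scales the result by 100 is not stated here.

```go
package main

import "fmt"

// pctChange is a hypothetical sketch, not the package's implementation.
// out[i] = (data[i+1] - data[i]) / data[i], returned as a fraction.
func pctChange(data []float64) []float64 {
	if len(data) < 2 {
		return []float64{}
	}
	out := make([]float64, len(data)-1)
	for i := 1; i < len(data); i++ {
		out[i-1] = (data[i] - data[i-1]) / data[i-1]
	}
	return out
}

func main() {
	fmt.Println(pctChange([]float64{100, 110, 99})) // [0.1 -0.1]
}
```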

func Percentile

func Percentile(data []float64, p float64) float64

Percentile calculates the p-th percentile of the given data using linear interpolation. This matches numpy.percentile with its default (linear) interpolation method.

Parameters:

  • data: slice of float64 values
  • p: percentile to compute (0-100)

Returns the percentile value. Returns NaN for empty data.
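
Linear interpolation between ranks can be sketched as follows. The function percentile is a hypothetical stand-in that assumes 0 <= p <= 100; it is meant to illustrate the technique, not to reproduce the package's code.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile is a hypothetical sketch, not the package's implementation.
// It maps p onto a fractional index of the sorted data and interpolates
// linearly between the two neighboring elements. Assumes 0 <= p <= 100.
func percentile(data []float64, p float64) float64 {
	if len(data) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), data...) // copy so the input is not reordered
	sort.Float64s(sorted)

	rank := p / 100 * float64(len(sorted)-1) // fractional index
	lo := int(math.Floor(rank))
	hi := int(math.Ceil(rank))
	frac := rank - float64(lo)
	return sorted[lo] + frac*(sorted[hi]-sorted[lo])
}

func main() {
	data := []float64{4, 1, 3, 2}
	fmt.Println(percentile(data, 50)) // 2.5
	fmt.Println(percentile(data, 25)) // 1.75
}
```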

func RemoveNaN

func RemoveNaN(data []float64) []float64

RemoveNaN returns a new slice with NaN values removed.

func RollingMean

func RollingMean(data []float64, windowSize int) []float64

RollingMean calculates a rolling (moving) mean with the specified window size. Uses center alignment. Returns NaN for positions where the window is incomplete.
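
A centered rolling mean can be sketched as below. The function rollingMean is a hypothetical stand-in and assumes an odd windowSize so that the window is symmetric around each element; how RollingMean centers an even window is not stated here.

```go
package main

import (
	"fmt"
	"math"
)

// rollingMean is a hypothetical sketch, not the package's implementation.
// Each output is the mean of the window centered on that position; positions
// whose window would run past either end are set to NaN.
// Assumes windowSize is odd.
func rollingMean(data []float64, windowSize int) []float64 {
	out := make([]float64, len(data))
	half := windowSize / 2
	for i := range data {
		lo, hi := i-half, i+half
		if lo < 0 || hi >= len(data) {
			out[i] = math.NaN() // incomplete window
			continue
		}
		var sum float64
		for _, v := range data[lo : hi+1] {
			sum += v
		}
		out[i] = sum / float64(hi-lo+1)
	}
	return out
}

func main() {
	fmt.Println(rollingMean([]float64{1, 2, 3, 4, 5}, 3)) // [NaN 2 3 4 NaN]
}
```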

func RollingStd

func RollingStd(data []float64, windowSize int) []float64

RollingStd calculates a rolling (moving) standard deviation. Uses center alignment and sample std (ddof=1).

func Std

func Std(data []float64, ddof int) float64

Std calculates the standard deviation of the data. The ddof argument selects the denominator: 1 gives the n-1 (sample) form and 0 gives the n (population) form.

Parameters:

  • data: slice of float64 values
  • ddof: delta degrees of freedom (0 for population, 1 for sample)
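
The effect of ddof can be seen in the sketch below. The function std is a hypothetical stand-in that divides the sum of squared deviations by n - ddof; returning NaN when n - ddof is not positive is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// std is a hypothetical sketch, not the package's implementation.
// It divides the sum of squared deviations by n - ddof.
func std(data []float64, ddof int) float64 {
	n := len(data)
	if n-ddof <= 0 {
		return math.NaN() // assumption: undefined when too few points
	}
	var sum float64
	for _, v := range data {
		sum += v
	}
	mean := sum / float64(n)
	var ss float64
	for _, v := range data {
		d := v - mean
		ss += d * d
	}
	return math.Sqrt(ss / float64(n-ddof))
}

func main() {
	data := []float64{2, 4, 4, 4, 5, 5, 7, 9}
	fmt.Println(std(data, 0))          // 2 (population)
	fmt.Printf("%.4f\n", std(data, 1)) // 2.1381 (sample)
}
```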

func WeightedMean

func WeightedMean(data, weights []float64) float64

WeightedMean calculates the weighted arithmetic mean. Returns NaN if weights sum to zero or if slices have different lengths.
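
A sketch of the calculation with the two NaN cases described above. The function weightedMean is a hypothetical stand-in.

```go
package main

import (
	"fmt"
	"math"
)

// weightedMean is a hypothetical sketch, not the package's implementation.
// It returns sum(data[i]*weights[i]) / sum(weights), or NaN when the
// lengths differ or the weights sum to zero.
func weightedMean(data, weights []float64) float64 {
	if len(data) != len(weights) {
		return math.NaN()
	}
	var sum, wsum float64
	for i, v := range data {
		sum += v * weights[i]
		wsum += weights[i]
	}
	if wsum == 0 {
		return math.NaN()
	}
	return sum / wsum
}

func main() {
	fmt.Println(weightedMean([]float64{1, 2, 3}, []float64{1, 1, 2})) // 2.25
	fmt.Println(weightedMean([]float64{1, 2}, []float64{1}))          // NaN
}
```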

func ZScore

func ZScore(value, mean, std float64) float64

ZScore calculates the z-score (standard score) for a single value.

Z-score = (value - mean) / std

Returns NaN if std is zero or NaN.
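
The formula and its NaN guard can be sketched as follows, using a hypothetical function zScore.

```go
package main

import (
	"fmt"
	"math"
)

// zScore is a hypothetical sketch, not the package's implementation.
// It returns (value - mean) / std, or NaN when std is zero or NaN.
func zScore(value, mean, std float64) float64 {
	if std == 0 || math.IsNaN(std) {
		return math.NaN()
	}
	return (value - mean) / std
}

func main() {
	fmt.Println(zScore(130, 100, 15)) // 2
	fmt.Println(zScore(130, 100, 0))  // NaN
}
```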

func ZScoreSlice

func ZScoreSlice(data []float64) []float64

ZScoreSlice calculates z-scores for all values in the data. Uses sample standard deviation (ddof=1).

func ZScoreWithParams

func ZScoreWithParams(data []float64, mean, std float64) []float64

ZScoreWithParams calculates z-scores using provided mean and std.