AverageShiftedHistograms.jl

An Average Shifted Histogram (ASH) estimator is essentially a kernel density estimator calculated over a fine-partition histogram.
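The idea can be sketched in plain Julia without the package. This is a minimal illustration of the general ASH technique, not the package's implementation: the names `ash_sketch` and `triangular` are hypothetical, and the nearest-grid-point binning rule and the normalization are assumptions made for the example.

```julia
# Minimal ASH sketch: bin the data on a fine grid, then smooth the counts.
# `ash_sketch` and `triangular` are hypothetical names, not the package API.

triangular(u) = max(0.0, 1.0 - abs(u))   # kernel supported on [-1, 1]

function ash_sketch(x, rng::AbstractRange; m::Integer = 5, kernel = triangular)
    nb = length(rng)
    δ = step(rng)
    counts = zeros(Int, nb)
    # Step 1: fine-partition histogram. Each point goes to the nearest grid value;
    # points that fall outside the grid are ignored.
    for xi in x
        k = round(Int, (xi - first(rng)) / δ) + 1
        1 <= k <= nb && (counts[k] += 1)
    end
    # Step 2: kernel-weighted sum of the counts within m - 1 bins on either side.
    density = zeros(Float64, nb)
    for i in 1:nb, k in max(1, i - m + 1):min(nb, i + m - 1)
        density[i] += kernel((k - i) / m) * counts[k]
    end
    # Step 3: normalize so the estimate sums to 1 over the grid (Riemann sum).
    density ./= sum(density) * δ
    return density
end

x = randn(10_000)
rng = -5:0.1:5
dens = ash_sketch(x, rng; m = 5)
```

Once the counts exist, the smoothing step only touches the bins, never the raw data, which is where the speed advantage over a direct kernel density estimate comes from.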

Benefits Over KDE

  • The histogram component can be constructed on-line.
  • Adding new data is an O(nbins) operation, versus O(n) for KDE, where nbins << n.
  • ASH is considerably faster even for small datasets. See the comparison with KernelDensity.jl below.
julia> @btime kde(x) setup=(x=randn(100));
  169.523 μs (106 allocations: 56.05 KiB)

julia> @btime ash(x) setup=(x=randn(100));
  4.173 μs (3 allocations: 8.22 KiB)
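To illustrate why the histogram component suits on-line use, here is a rough sketch of incremental binning in plain Julia. `FineHistogram` and `update!` are hypothetical names for this example only and do not reflect the package's internals. The point is that each batch is folded into a fixed-size vector of counts, so earlier data never has to be stored or revisited.

```julia
# Hypothetical sketch of on-line binning; not the package's internal representation.
mutable struct FineHistogram{R<:AbstractRange}
    rng::R                 # fixed grid of bin centers
    counts::Vector{Int}    # one count per grid value, size independent of n
    nobs::Int              # total observations seen so far
end
FineHistogram(rng::AbstractRange) = FineHistogram(rng, zeros(Int, length(rng)), 0)

function update!(h::FineHistogram, batch)
    for xi in batch
        k = round(Int, (xi - first(h.rng)) / step(h.rng)) + 1
        # Observations outside the grid are counted in nobs but not binned.
        1 <= k <= length(h.counts) && (h.counts[k] += 1)
        h.nobs += 1
    end
    return h
end

h = FineHistogram(-5:0.1:5)
for _ in 1:10
    update!(h, randn(1_000))   # old batches are never revisited
end
```

After any number of batches, refreshing the density estimate only requires a pass over `h.counts`, whose length is nbins.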

Usage

The main function exported by AverageShiftedHistograms is ash.

AverageShiftedHistograms.ash (Function)

Univariate Ash

ash(x; kw...)

Fit an average shifted histogram to data x. Keyword options are:

  • rng : values over which the density will be estimated
  • m : number of adjacent histograms to smooth over
  • kernel : kernel used to smooth the estimate
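A minimal call using these keywords might look like the following. The data and the particular values of `rng` and `m` are arbitrary choices for illustration. The `kernel` keyword is left out, so whatever default the package defines is used.

```julia
using AverageShiftedHistograms

x = randn(1_000)

# `rng` fixes the grid of points where the density is estimated;
# `m` sets how many adjacent histograms are averaged.
# `kernel` is omitted here, so the package default applies.
o = ash(x; rng = -5:0.1:5, m = 5)
```

Choosing `rng` wider than the observed data leaves room for later updates, since the range cannot be changed after the object is created.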

Bivariate Ash

ash(x, y; kw...)

Fit a bivariate average shifted histogram to data vectors x and y. Keyword options are:

  • rngx : x values where density will be estimated
  • rngy : y values where density will be estimated
  • mx : smoothing parameter in x direction
  • my : smoothing parameter in y direction
  • kernelx : kernel in x direction
  • kernely : kernel in y direction
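As a sketch of the bivariate form, the call below passes a separate range and smoothing parameter for each direction. The simulated data and the specific ranges are assumptions chosen for the example, and both kernels are left at the package defaults.

```julia
using AverageShiftedHistograms

x = randn(1_000)
y = x .+ randn(1_000)   # correlated with x, with a wider spread

# Each direction gets its own grid and smoothing parameter.
o2 = ash(x, y; rngx = -5:0.1:5, rngy = -8:0.1:8, mx = 5, my = 5)
```

The wider `rngy` reflects the larger spread of `y`. Each range should comfortably cover the data in its own direction.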

Mutating an Ash object

Ash objects can be updated with new data, smoothing parameter(s), or kernel(s). They cannot, however, change the ranges over which the density is estimated. It is therefore suggested to err on the side of caution and choose endpoints wide enough to cover any data that may arrive later.

# univariate
ash!(obj; kw...)
ash!(obj, newx; kw...)

# bivariate
ash!(obj; kw...)
ash!(obj, newx, newy; kw...)
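
A short univariate example of both kinds of update is sketched below. It assumes that `m` is among the keywords `ash!` accepts, which follows from the statement that the smoothing parameter can be updated. The batch sizes and values are arbitrary.

```julia
using AverageShiftedHistograms

# The range is fixed at construction, so it is chosen generously here.
o = ash(randn(1_000); rng = -5:0.1:5, m = 5)

ash!(o, randn(1_000))   # fold in a new batch of data
ash!(o; m = 10)         # change the smoothing parameter without adding data
```

Both calls modify `o` in place, so the same object can be kept around as data arrives in batches.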

Gotchas

Warning

Beware of oversmoothing caused by setting the m parameter too large. Note that "too large" is relative to the bin width, that is, the step size of the range.

using AverageShiftedHistograms, Plots

y = randn(10 ^ 6)

o = ash(y; rng = -5:0.1:5, m = 20)

plot(o)
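
A rough way to judge whether m is too large is to look at the width of the smoothing window. In the usual ASH formulation the window reaches about m bin widths to each side of a point. The helper below (`halfwidth` is a hypothetical name, and the assumption is that the package follows this standard formulation) computes that quantity for a given range and m.

```julia
# Approximate half-width of the smoothing window: m bins of width step(rng).
# `halfwidth` is an illustrative helper, not part of the package.
halfwidth(rng::AbstractRange, m::Integer) = m * step(rng)

rng = -5:0.1:5
for m in (2, 5, 20)
    println("m = ", m, "  half-width ≈ ", halfwidth(rng, m))
end
```

With a step of 0.1 and m = 20, the window reaches roughly 2 units to each side, which is about two standard deviations for standard normal data. That is why the estimate in the example above is oversmoothed.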