AverageShiftedHistograms.jl

An Average Shifted Histogram (ASH) estimator is essentially a kernel density estimator calculated over a fine-partition histogram.
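The definition above can be made concrete with a short, self-contained sketch. It is an illustration of the general technique under stated assumptions, not the package's source: the names `ash_sketch` and `biweight` are hypothetical, each value of `rng` is treated as a bin center, and the estimate is rescaled so that it integrates to one over `rng`.

```julia
# Illustrative sketch of an average shifted histogram; not the package's implementation.
# `ash_sketch` and `biweight` are hypothetical names used only for this example.

# Biweight kernel, nonzero on (-1, 1).
biweight(u) = abs(u) <= 1 ? (1 - u^2)^2 : 0.0

function ash_sketch(x, rng::AbstractRange; m::Int = 5, kernel = biweight)
    δ = step(rng)
    nb = length(rng)

    # Step 1: fine-partition histogram with one bin centered on each value of rng.
    counts = zeros(Int, nb)
    for xi in x
        k = round(Int, (xi - first(rng)) / δ) + 1
        1 <= k <= nb && (counts[k] += 1)   # points outside rng are dropped
    end

    # Step 2: spread each bin's count over neighbors up to m - 1 bins away,
    # weighted by the kernel evaluated at (bin offset) / m.
    density = zeros(nb)
    for k in 1:nb
        counts[k] == 0 && continue
        for i in max(1, k - m + 1):min(nb, k + m - 1)
            density[i] += counts[k] * kernel((i - k) / m)
        end
    end

    # Step 3: rescale so the estimate integrates to one over rng.
    density ./= sum(density) * δ
    return counts, density
end

x = randn(10_000)
rng = -5:0.1:5
counts, density = ash_sketch(x, rng; m = 5)
```

Only Step 1 touches the raw data. Steps 2 and 3 work on the bin counts alone, which is why the smoothing can be redone cheaply after new data arrive.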

Benefits Over KDE

The histogram component can be constructed on-line.
Adding new data is an O(nbins) operation vs. O(n) for KDE, where typically nbins << n.
ASH is considerably faster even for small datasets. See the comparison with KernelDensity.jl below.

julia> using AverageShiftedHistograms, BenchmarkTools, KernelDensity

julia> @btime kde(x) setup=(x=randn(100));
  169.523 μs (106 allocations: 56.05 KiB)

julia> @btime ash(x) setup=(x=randn(100));
  4.173 μs (3 allocations: 8.22 KiB)
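The on-line property of the histogram component can be illustrated without the package. The sketch below assumes bins centered on the values of `rng`; `add_counts!` is a hypothetical helper written for this example. Streaming two batches into the same count vector gives the same counts as a single pass over all of the data, so old observations never need to be revisited.

```julia
# Hypothetical helper for illustration; not part of the package.
function add_counts!(counts::Vector{Int}, rng::AbstractRange, x)
    for xi in x
        k = round(Int, (xi - first(rng)) / step(rng)) + 1
        1 <= k <= length(counts) && (counts[k] += 1)   # ignore points outside rng
    end
    return counts
end

rng = -5:0.1:5
batch1, batch2 = randn(1_000), randn(1_000)

# Stream the batches in one at a time...
streamed = zeros(Int, length(rng))
add_counts!(streamed, rng, batch1)
add_counts!(streamed, rng, batch2)

# ...and compare with a single pass over the combined data.
onepass = add_counts!(zeros(Int, length(rng)), rng, vcat(batch1, batch2))
streamed == onepass
```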

Usage

The main function exported by AverageShiftedHistograms is ash.

AverageShiftedHistograms.ash — Function

Univariate Ash

ash(x; kw...)

Fit an average shifted histogram to data x. Keyword options are:

rng : values over which the density will be estimated
m : number of adjacent histograms to smooth over
kernel : kernel used to smooth the estimate
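A minimal usage sketch of the three keywords follows. The data and grid are arbitrary choices for illustration. The kernel name is an assumption: it presumes the package ships a `Kernels` submodule containing `triangular`, which the text above does not confirm.

```julia
using AverageShiftedHistograms

x = randn(1_000)

# Estimate the density on a fixed grid with spacing 0.1, smoothing over m = 5 histograms.
o = ash(x; rng = -5:0.1:5, m = 5)

# Same data and grid with a different kernel.
# Assumption: `Kernels.triangular` exists in the package; adjust if your version differs.
o2 = ash(x; rng = -5:0.1:5, m = 5, kernel = AverageShiftedHistograms.Kernels.triangular)
```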

Bivariate Ash

ash(x, y; kw...)

Fit a bivariate average shifted histogram to data vectors x and y. Keyword options are:

rngx : x values where density will be estimated
rngy : y values where density will be estimated
mx : smoothing parameter in x direction
my : smoothing parameter in y direction
kernelx : kernel in x direction
kernely : kernel in y direction
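A minimal bivariate sketch using only the keywords listed above; the data, grids, and smoothing values are arbitrary choices for illustration.

```julia
using AverageShiftedHistograms

x = randn(10_000)
y = x .+ randn(10_000)   # correlated with x, with a wider spread

# Each direction gets its own grid and smoothing parameter.
o = ash(x, y; rngx = -5:0.1:5, rngy = -7:0.1:7, mx = 5, my = 5)
```

Because `y` has a larger spread than `x` here, it is given a wider range, in line with the advice to choose endpoints conservatively.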

Mutating an Ash object

Ash objects can be updated with new data, smoothing parameter(s), or kernel(s). They cannot, however, change the ranges over which the density is estimated. It is therefore best to err on the side of caution and choose endpoints wide enough to cover any data that may arrive later.

# univariate
ash!(obj; kw...)
ash!(obj, newx; kw...)

# bivariate
ash!(obj; kw...)
ash!(obj, newx, newy; kw...)
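The univariate signatures above might be used as follows. This is a sketch with arbitrary data and parameter values, relying only on the `ash!` forms and keywords documented here.

```julia
using AverageShiftedHistograms

# Choose endpoints generously: the range is fixed once the object exists.
o = ash(randn(1_000); rng = -6:0.1:6, m = 5)

# Add a new batch of data to the existing estimate.
ash!(o, randn(1_000))

# Change the smoothing parameter without supplying new data.
ash!(o; m = 10)

# Add data and change the smoothing parameter in one call.
ash!(o, randn(1_000); m = 3)
```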


Gotchas

Warning

Beware of oversmoothing caused by setting the m parameter too large. Note that "too large" is relative to the bin width, i.e. the spacing between the values of rng.

using AverageShiftedHistograms, Plots

y = randn(10 ^ 6)

o = ash(y; rng = -5:.1:5, m = 20)

plot(o)
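A rough way to judge whether m is too large is to convert it into data units. The arithmetic below assumes the smoothing window reaches about m bins to either side of each grid point, which follows the usual ASH construction but is not stated explicitly above.

```julia
rng = -5:0.1:5
binwidth = step(rng)

# Approximate half-width of the smoothing window, in data units, for a few values of m.
for m in (2, 5, 20)
    println("m = ", m, ": half-width ≈ ", m * binwidth)
end
```

With a bin width of 0.1, m = 20 corresponds to a half-width of about 2, which is two standard deviations of the standard normal data in the example, so the peak is visibly flattened.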