AverageShiftedHistograms.jl
An Average Shifted Histogram (ASH) estimator is essentially a kernel density estimator calculated over a fine-partition histogram.
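To make the definition concrete, here is a minimal from-scratch sketch of the idea in plain Julia. It is not the package's implementation: the helper names `fine_counts` and `smooth_counts` are hypothetical, the biweight kernel is just one possible choice, and binning to the nearest grid point is an assumption made for simplicity.

```julia
# Biweight kernel on (-1, 1); zero outside that interval.
biweight(u) = abs(u) < 1 ? (1 - u^2)^2 : 0.0

# Hypothetical helper (not the package's code): count how many observations
# fall nearest to each point of the fine grid `rng`.
function fine_counts(x, rng::AbstractRange)
    counts = zeros(Int, length(rng))
    δ = step(rng)
    for xi in x
        k = round(Int, (xi - first(rng)) / δ) + 1   # index of nearest grid point
        1 <= k <= length(rng) && (counts[k] += 1)   # skip data outside the grid
    end
    return counts
end

# Hypothetical helper: spread each bin's count over neighbours with |i - k| < m,
# weighted by the kernel, then rescale so the estimate integrates to 1 over `rng`.
function smooth_counts(counts, rng::AbstractRange, m::Integer; kernel = biweight)
    nb = length(counts)
    density = zeros(Float64, nb)
    for k in 1:nb
        counts[k] == 0 && continue
        for i in max(1, k - m + 1):min(nb, k + m - 1)
            density[i] += counts[k] * kernel((i - k) / m)
        end
    end
    density ./= sum(density) * step(rng)
    return density
end

x = randn(10_000)
rng = -5:0.1:5
density = smooth_counts(fine_counts(x, rng), rng, 5)
```

Note that the smoothing step works only on the bin counts, never on the raw data, which is what makes the estimator cheap to update.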
Benefits Over KDE
- The histogram component can be constructed on-line.
- Adding new data is an O(nbins) operation vs. O(n) for KDE, where nbins << n.
- ASH is considerably faster even for small datasets. See the comparison with KernelDensity.jl below.
julia> using AverageShiftedHistograms, KernelDensity, BenchmarkTools
julia> @btime kde(x) setup=(x=randn(100));
169.523 μs (106 allocations: 56.05 KiB)
julia> @btime ash(x) setup=(x=randn(100));
4.173 μs (3 allocations: 8.22 KiB)
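The on-line property listed above can be illustrated with a small sketch in plain Julia. It assumes nearest-grid-point binning, which may differ from the package's internals, and `update_counts!` is a hypothetical helper, not part of the package. The point is that the only state kept between batches is a vector of bin counts whose length is fixed by the grid, not by the number of observations.

```julia
# Hypothetical helper: add a batch of observations to existing bin counts in place.
function update_counts!(counts::Vector{Int}, rng::AbstractRange, batch)
    δ = step(rng)
    for xi in batch
        k = round(Int, (xi - first(rng)) / δ) + 1   # index of nearest grid point
        1 <= k <= length(rng) && (counts[k] += 1)   # skip data outside the grid
    end
    return counts
end

rng = -5:0.1:5
counts = zeros(Int, length(rng))

# Stream the data in batches; none of the raw observations are retained.
for _ in 1:10
    update_counts!(counts, rng, randn(1_000))
end
```

Because the counts simply accumulate, feeding the data in several batches gives the same counts as a single pass over all of it.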
Usage
The main function exported by AverageShiftedHistograms is ash.
AverageShiftedHistograms.ash — Function
Univariate Ash
ash(x; kw...)
Fit an average shifted histogram to data x. Keyword options are:
- rng: values over which the density will be estimated
- m: number of adjacent histograms to smooth over
- kernel: kernel used to smooth the estimate
Bivariate Ash
ash(x, y; kw...)
Fit a bivariate averaged shifted histogram to data vectors x and y. Keyword options are:
- rngx: x values where density will be estimated
- rngy: y values where density will be estimated
- mx: smoothing parameter in x direction
- my: smoothing parameter in y direction
- kernelx: kernel in x direction
- kernely: kernel in y direction
Mutating an Ash object
Ash objects can be updated with new data, smoothing parameter(s), or kernel(s). They cannot, however, change the ranges over which the density is estimated. It is therefore best to err on the side of caution when choosing the endpoints of the range.
# univariate
ash!(obj; kw...)
ash!(obj, newx; kw...)
# bivariate
ash!(obj; kw...)
ash!(obj, newx, newy; kw...)
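As a rough illustration of the univariate signatures above, the following sketch fits an estimate, adds new data, and then changes the smoothing parameter. It uses only the calls and keywords documented here; the data and the particular values of rng and m are arbitrary choices for the example.

```julia
using AverageShiftedHistograms

x = randn(1_000)

# Pick a range comfortably wider than the data, since it cannot be changed later.
o = ash(x; rng = -5:0.1:5, m = 5)

# Fold in new observations; the existing range is reused.
ash!(o, randn(1_000))

# Re-smooth the same data with a different smoothing parameter.
ash!(o; m = 10)

# Do both in one call: add data and change the smoothing parameter.
ash!(o, randn(1_000); m = 7)
```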
Gotchas
Beware of oversmoothing caused by setting the m parameter too large. Note that "too large" is relative to the bin width.
using AverageShiftedHistograms, Plots
y = randn(10 ^ 6)
o = ash(y; rng = -5:.1:5, m = 20)
plot(o)
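One way to judge "too large" is to convert m into data units. The sketch below assumes the standard ASH formulation, in which the kernel reaches m bins to either side, so the smoothing half-width is m times the bin width; whether the package uses exactly this convention is an assumption here. Under it, the example above smooths over a half-width of 20 × 0.1 = 2.0, which is two standard deviations of the standard normal data.

```julia
rng = -5:0.1:5
m = 20

δ = step(rng)   # bin width of the fine grid
h = m * δ       # smoothing half-width in data units, under the assumption above

println("bin width = ", δ, ", smoothing half-width = ", h)

# Going the other way: a value of m for a desired half-width on the same grid.
target_h = 0.5
m_for_target = round(Int, target_h / δ)
```

Comparing h with the scale of the data (for example its standard deviation) gives a quick sanity check before plotting.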