# API

``ADADELTA(ρ = .95)``

An extension of `ADAGRAD`.

source
``ADAGRAD()``

A variation of `SGD` with element-wise weights generated by the average of the squared gradients.

source
``ADAM(β1 = .99, β2 = .999)``

A variant of `SGD` with element-wise learning rates generated by exponentially weighted first and second moments of the gradient.

source
``ADAMAX(η, β1 = .9, β2 = .999)``

ADAMAX with momentum parameters `β1`, `β2`. ADAMAX is an extension of `ADAM`.

source
``AutoCov(b, T = Float64; weight=EqualWeight())``

Calculate the auto-covariance/correlation for lags 0 to `b` for a data stream of type `T`.

Example

``````y = cumsum(randn(100))
o = AutoCov(5)
fit!(o, y)
autocov(o)
autocor(o)``````
source
``BiasVec(x)``

Lightweight wrapper of a vector which adds a "bias" term at the end.

Example

``BiasVec(rand(5))``
source
``Bootstrap(o::OnlineStat, nreps = 100, d = [0, 2])``

Calculate an online statistical bootstrap of `nreps` replicates of `o`. For each call to `fit!`, any given replicate will be updated `rand(d)` times (default is double or nothing).

Example

``````o = Bootstrap(Variance())
fit!(o, randn(1000))
confint(o, .95)``````
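
The "double or nothing" scheme described above can be sketched in plain Julia. This is only an illustration under stated assumptions, not the OnlineStats implementation: `RunningMean`, `update!`, and `online_bootstrap` are hypothetical names, and each replicate is a simple running mean rather than an arbitrary `OnlineStat`.

```julia
# Online bootstrap sketch (hypothetical helpers, not library code).
# Each replicate is a running mean that absorbs every observation rand(d) times.
mutable struct RunningMean
    μ::Float64
    n::Int
end
RunningMean() = RunningMean(0.0, 0)

function update!(m::RunningMean, x)
    m.n += 1
    m.μ += (x - m.μ) / m.n
    return m
end

function online_bootstrap(y; nreps = 100, d = [0, 2])
    replicates = [RunningMean() for _ in 1:nreps]
    for x in y
        for r in replicates
            for _ in 1:rand(d)      # 0 or 2 updates under the default d
                update!(r, x)
            end
        end
    end
    return replicates
end

reps = online_bootstrap(randn(1000))
estimates = [r.μ for r in reps]     # one estimate of the mean per replicate
```

The spread of `estimates` approximates the sampling variability of the mean, which is the kind of information a confidence interval is built from.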
source
``CStat(stat)``

Track a univariate OnlineStat for complex numbers. A copy of `stat` is made to separately track the real and imaginary parts.

Example

``````y = randn(100) + randn(100)im
fit!(CStat(Mean()), y)``````
source
``CallFun(o::OnlineStat, f::Function)``

Call `f(o)` every time the OnlineStat `o` gets updated.

Example

``````o = CallFun(Mean(), println)
fit!(o, [0,0,1,1])``````
source
``````CountMap(T::Type)
CountMap(dict::AbstractDict{T, Int})``````

Track a dictionary that maps each unique value to its number of occurrences. Similar to `StatsBase.countmap`.

Example

``````o = fit!(CountMap(Int), rand(1:10, 1000))
value(o)
probs(o)
OnlineStats.pdf(o, 1)
collect(keys(o))``````
source
``````CovMatrix(p=0; weight=EqualWeight())
CovMatrix(::Type{T}, p=0; weight=EqualWeight())``````

Calculate a covariance/correlation matrix of `p` variables. If the number of variables is unknown, leave the default `p=0`.

Example

``````o = fit!(CovMatrix(), randn(100, 4))
cor(o)
cov(o)
mean(o)
var(o)``````
source
``Diff(T::Type = Float64)``

Track the difference and the last value.

Example

``````o = Diff()
fit!(o, [1.0, 2.0])
last(o)
diff(o)``````
source
``Extrema(T::Type = Float64)``

Maximum and minimum.

Example

``````o = fit!(Extrema(), rand(10^5))
extrema(o)
maximum(o)
minimum(o)``````
source
``FTSeries(stats...; filter=x->true, transform=identity)``

Track multiple stats for one data stream that is filtered and transformed before being fitted.

``FTSeries(T, stats...; filter, transform)``

Create an FTSeries and specify the type `T` of the transformed values.

Example

``````o = FTSeries(Mean(), Variance(); transform=abs)
fit!(o, -rand(1000))

# Remove missing values represented as DataValues
using DataValues
y = DataValueArray(randn(100), rand(Bool, 100))
o = FTSeries(DataValue, Mean(); transform=get, filter=!isna)
fit!(o, y)``````
source
``FastForest(p, nkeys=2; stat=FitNormal(), kw...)``

Calculate a random forest where each variable is summarized by `stat`.

Keyword Arguments

• `nt=100`: Number of trees in the forest
• `b=floor(Int, sqrt(p))`: Number of random features for each tree to receive
• `maxsize=1000`: Maximum size for any tree in the forest
• `splitsize=5000`: Number of observations in any given node before splitting
• `λ = .05`: Probability that each tree is updated on a new observation

Example

``````x, y = randn(10^5, 10), rand(1:2, 10^5)

o = fit!(FastForest(10), (x,y))

classify(o, x[1,:])``````
source
``FastTree(p::Int, nclasses=2; stat=FitNormal(), maxsize=5000, splitsize=1000)``

Calculate a decision tree of `p` predictor variables and classes `1, 2, …, nclasses`. Nodes split when they reach `splitsize` observations until `maxsize` nodes are in the tree. Each variable is summarized by `stat`, which can be `FitNormal()` or `Hist(nbins)`.

Example

``````x = randn(10^5, 10)
y = rand([1,2], 10^5)

o = fit!(FastTree(10), (x,y))

xi = randn(10)
classify(o, xi)``````
source
``FitBeta(; weight)``

Online parameter estimate of a Beta distribution (Method of Moments).

Example

``o = fit!(FitBeta(), rand(1000))``
source
``FitCauchy(; alg, rate)``

Approximate parameter estimation of a Cauchy distribution. Estimates are based on quantiles, so that `alg` will be passed to `Quantile`.

Example

``o = fit!(FitCauchy(), randn(1000))``
source
``FitGamma(; weight)``

Online parameter estimate of a Gamma distribution (Method of Moments).

Example

``````using Random
o = fit!(FitGamma(), randexp(10^5))``````
source
``FitLogNormal()``

Online parameter estimate of a LogNormal distribution (MLE).

Example

``o = fit!(FitLogNormal(), exp.(randn(10^5)))``
source
``FitMultinomial(p)``

Online parameter estimate of a Multinomial distribution. The sum of counts does not need to be consistent across observations. Therefore, the `n` parameter of the Multinomial distribution is returned as 1.

Example

``````x = [1 2 3; 4 8 12]
fit!(FitMultinomial(3), x)``````
source
``FitMvNormal(d)``

Online parameter estimate of a `d`-dimensional MvNormal distribution (MLE).

Example

``````y = randn(100, 2)
o = fit!(FitMvNormal(2), y)``````
source
``FitNormal()``

Calculate the parameters of a normal distribution via maximum likelihood.

Example

``o = fit!(FitNormal(), randn(1000))``
source
``````Group(stats::OnlineStat...)
Group(; stats...)
Group(collection)``````

Create a vector-input stat from several scalar-input stats. For a new observation `y`, `y[i]` is sent to `stats[i]`.

Examples

``````x = randn(100, 2)

fit!(Group(Mean(), Mean()), x)
fit!(Group(Mean(), Variance()), x)

o = fit!(Group(m1 = Mean(), m2 = Mean()), x)
o.stats.m1
o.stats.m2``````
source
``GroupBy{T}(stat)``

Update `stat` for each group (of type `T`).

Example

``````x = rand(1:10, 10^5)
y = x .+ randn(10^5)
fit!(GroupBy{Int}(Extrema()), zip(x,y))``````
source
``HeatMap(xedges, yedges; left = true, closed = true)``

Create a two dimensional histogram with the bin partition created by `xedges` and `yedges`. When fitting a new observation, the first value will be associated with X, the second with Y.

• If `left`, the bins will be left-closed.
• If `closed`, the bins on the ends will be closed. See Hist.

Example

``````o = fit!(HeatMap(-5:.1:5, -5:.1:5), eachrow(randn(10^5, 2)))

using Plots
plot(o)``````
source
``Hist(edges; left = true, closed = true)``

Create a histogram with bin partition defined by `edges`.

• If `left`, the bins will be left-closed.
• If `closed`, the bin on the end will be closed.
• E.g. for a two-bin histogram with left-closed bins: $[a, b), [b, c)$ (not closed) vs. $[a, b), [b, c]$ (closed)

Example

``````o = fit!(Hist(-5:.1:5), randn(10^6))

# approximate statistics
using Statistics

mean(o)
var(o)
std(o)
quantile(o)
median(o)
extrema(o)``````
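
To make the `left` and `closed` options concrete, here is a minimal sketch of a bin lookup in plain Julia. `bin_index` is a hypothetical helper, not part of OnlineStats, and it assumes that `left = false` means right-closed bins, which the text above does not state explicitly.

```julia
# Hypothetical helper: index of the bin containing x, or 0 if x falls outside.
function bin_index(edges, x; left = true, closed = true)
    a, b = first(edges), last(edges)
    nb = length(edges) - 1                  # number of bins
    (x < a || x > b) && return 0
    if left
        # bins are [e_i, e_{i+1}); the last bin includes b only if `closed`
        x == b && return closed ? nb : 0
        return searchsortedlast(edges, x)
    else
        # assumed: bins are (e_i, e_{i+1}]; the first bin includes a only if `closed`
        x == a && return closed ? 1 : 0
        return searchsortedfirst(edges, x) - 1
    end
end

edges = [0.0, 1.0, 2.0]
bin_index(edges, 1.0)                  # interior edge goes to the second bin
bin_index(edges, 2.0)                  # right end is kept because closed = true
bin_index(edges, 2.0; closed = false)  # right end is dropped
```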
source
``HyperLogLog(b, T::Type = Number)  # 4 ≤ b ≤ 16``

Approximate count of distinct elements.

Example

``fit!(HyperLogLog(12), rand(1:10,10^5))``
source
``IndexedPartition(T, stat, b=100)``

Summarize data with `stat` over a partition of size `b` where the data is indexed by a variable of type `T`.

Example

``````o = IndexedPartition(Float64, Hist(10))
fit!(o, randn(10^4, 2))

using Plots
plot(o)``````
source
``KHist(k::Int)``

Estimate the probability density of a univariate distribution at `k` approximately equally-spaced points.

Example

``````o = fit!(KHist(25), randn(10^6))

# Approximate statistics
using Statistics
mean(o)
var(o)
std(o)
quantile(o)
median(o)

using Plots
plot(o)``````
source
``KMeans(p, k; rate=LearningRate(.6))``

Approximate K-Means clustering of `k` clusters and `p` variables.

Example

``````clusters = rand(Bool, 10^5)

x = [clusters[i] ? randn() : 5 + randn() for i in 1:10^5, j in 1:2]

o = fit!(KMeans(2, 2), x)``````
source
``KahanMean(; T=Float64, weight=EqualWeight())``

Track a univariate mean. Uses a compensation term for the update.

Note

This should be more accurate than `Mean` in most cases, but the guarantees of `KahanSum` do not apply. `merge!` can have some accuracy issues.

Update

$μ = (1 - γ) * μ + γ * x$

Example

``@time fit!(KahanMean(), randn(10^6))``
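
As a rough illustration of the update rule above, the following plain-Julia sketch applies it with `γ = 1/n`, which is assumed here to correspond to equal weighting. `running_mean` is a hypothetical helper and leaves out the compensation term that `KahanMean` adds.

```julia
# Sketch of μ = (1 - γ) * μ + γ * x with γ = 1/n (assumed equal weighting).
function running_mean(xs)
    μ = 0.0
    for (n, x) in enumerate(xs)
        γ = 1 / n                   # weight of the newest observation
        μ = (1 - γ) * μ + γ * x
    end
    return μ
end

running_mean([1.0, 2.0, 3.0, 4.0])  # 2.5
```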
source
``KahanSum(T::Type = Float64)``

Track the overall sum using Kahan (compensated) summation. The compensation term effectively doubles precision; see the Wikipedia article on the Kahan summation algorithm for details.

Example

``fit!(KahanSum(Float64), fill(1, 100))``
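
For readers unfamiliar with the compensation term, here is a minimal sketch of textbook Kahan summation in plain Julia. `kahan_sum` is a hypothetical helper for illustration and is not taken from the OnlineStats source.

```julia
# Kahan (compensated) summation sketch; hypothetical helper, not library code.
function kahan_sum(xs)
    s = 0.0   # running sum
    c = 0.0   # compensation: low-order bits lost so far
    for x in xs
        y = x - c
        t = s + y
        c = (t - s) - y   # recovers the rounding error of s + y
        s = t
    end
    return s
end

xs = fill(0.1, 10^6)
naive = foldl(+, xs)            # plain left-to-right accumulation
compensated = kahan_sum(xs)
abs(naive - 1e5), abs(compensated - 1e5)
```

`foldl(+, xs)` is used for the naive baseline because Julia's built-in `sum` already uses pairwise summation, which would hide most of the accumulated error.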
source
``KahanVariance(; T=Float64, weight=EqualWeight())``

Track the univariate variance. Uses compensation terms for a higher accuracy.

Note

This should be more accurate than `Variance` in most cases, but the guarantees of `KahanSum` do not apply. `merge!` can have accuracy issues.

Example

``````o = fit!(KahanVariance(), randn(10^6))
mean(o)
var(o)
std(o)``````
source
``Lag{T}(b::Integer)``

Store the last `b` values for a data stream of type `T`. Values are stored as

$v(t), v(t-1), v(t-2), …, v(t-b+1)$

Example

``````o = fit!(Lag{Int}(10), 1:12)
o[1]
o[end]``````
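
The storage order can be mimicked with an ordinary vector. The sketch below is an illustration only; `push_lag!` is a hypothetical helper and says nothing about how `Lag` is stored internally.

```julia
# Keep the newest value first and at most b values in total.
function push_lag!(buf, x, b)
    pushfirst!(buf, x)              # newest value goes to the front
    length(buf) > b && pop!(buf)    # drop the oldest once b values are held
    return buf
end

buf = Int[]
for x in 1:12
    push_lag!(buf, x, 10)
end
buf[1], buf[end]   # (12, 3): the most recent value and the oldest one kept
```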
source
``LinReg()``

Linear regression, optionally with element-wise ridge regularization.

Example

``````x = randn(100, 5)
y = x * (1:5) + randn(100)
o = fit!(LinReg(), (x,y))
coef(o)
coef(o, .1)
coef(o, [0,0,0,0,Inf])``````
source
``LinRegBuilder(p)``

Create an object from which any variable can be regressed on any other set of variables, optionally with element-wise ridge regularization. The main function to use with `LinRegBuilder` is `coef`:

``coef(o::LinRegBuilder, λ = 0; y=1, x=[2,3,...], bias=true, verbose=false)``

Return the coefficients from regressing column `y` on columns `x` with ridge (`L2Penalty`) parameter `λ`. An intercept (`bias`) term is added by default.

Examples

``````x = randn(1000, 10)
o = fit!(LinRegBuilder(), x)

coef(o; y=3, verbose=true)

coef(o; y=7, x=[2,5,4])``````
source
``MSPI()``

Majorized Stochastic Proximal Iteration.

source
``Mean(T = Float64; weight=EqualWeight())``

Track a univariate mean, stored as type `T`.

Example

``@time fit!(Mean(), randn(10^6))``
source
``Moments(; weight=EqualWeight())``

First four non-central moments.

Example

``````o = fit!(Moments(), randn(1000))
mean(o)
var(o)
std(o)
skewness(o)
kurtosis(o)``````
source
``Mosaic(T::Type, S::Type)``

Data structure for generating a mosaic plot, a comparison between two categorical variables.

Example

``````using OnlineStats, Plots
x = [rand() > .8 for i in 1:10^5]
y = rand([1,2,2,3,3,3], 10^5)
o = fit!(Mosaic(Bool, Int), zip(x, y))
plot(o)``````
source
``````MovingTimeWindow{T<:TimeType, S}(window::DatePeriod)
MovingTimeWindow(window::DatePeriod; valtype=Float64, timetype=Date)``````

Fit a moving window of data based on time stamps. Each observation must be a `Tuple`, `NamedTuple`, or `Pair` where the first item is `<: Dates.TimeType`. Only observations with time stamps in the range

`most_recent_datetime - window <= time_stamp <= most_recent_datetime`

are kept track of.

Example

``````using Dates
dts = Date(2010):Day(1):Date(2011)
y = rand(length(dts))

o = MovingTimeWindow(Day(4); timetype=Date, valtype=Float64)
fit!(o, zip(dts, y))``````
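
The retention rule can be written as a one-line predicate using only the `Dates` standard library. `in_window` is a hypothetical helper shown for illustration; it is not a function exported by OnlineStats.

```julia
using Dates

# true when a time stamp is still inside the window ending at `most_recent`
in_window(ts, most_recent, window) = most_recent - window <= ts <= most_recent

latest = Date(2010, 1, 5)
in_window(Date(2010, 1, 1), latest, Day(4))     # true: exactly 4 days back
in_window(Date(2009, 12, 31), latest, Day(4))   # false: older than the window
```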
source
``````MovingWindow(b, T)
MovingWindow(T, b)``````

Track a moving window of `b` items of type `T`.

Example

``````o = MovingWindow(10, Int)
fit!(o, 1:14)``````
source
``NBClassifier(p::Int, T::Type; stat = Hist(15))``

Calculate a naive Bayes classifier for classes of type `T` and `p` predictors. For each class `K`, predictor variables are summarized by the `stat`.

Example

``````x, y = randn(10^4, 10), rand(Bool, 10^4)

o = fit!(NBClassifier(10, Bool), (x,y))
collect(keys(o))
probs(o)

xi = randn(10)
predict(o, xi)
classify(o, xi)``````
source
``OMAP()``

Online MM via Averaged Parameter.

source
``OMAS()``

Online MM via Averaged Surrogate.

source
``OrderStats(b::Int, T::Type = Float64; weight=EqualWeight())``

Average order statistics with batches of size `b`.

Example

``````o = fit!(OrderStats(100), randn(10^5))
quantile(o, [.25, .5, .75])``````
source
``P2Quantile(τ = 0.5)``

Calculate the approximate quantile via the P^2 algorithm. It is more computationally expensive than the algorithms used by `Quantile`, but also more accurate.

Example

``fit!(P2Quantile(.5), rand(10^5))``
source
``Partition(stat, nparts=100)``

Split a data stream into `nparts` where each part is summarized by `stat`.

Example

``````o = Partition(Extrema())
fit!(o, cumsum(randn(10^5)))

using Plots
plot(o)``````
source
``PlotNN(b=300)``

Approximate scatterplot of `b` centers. This implementation is too slow to be useful.

Example

``````x = randn(10^4)
y = x + randn(10^4)
plot(fit!(PlotNN(), zip(x, y)))``````
source
``````ProbMap(T::Type; weight=EqualWeight())
ProbMap(A::AbstractDict{T, Float64}; weight=EqualWeight())``````

Track a dictionary that maps each unique value to its probability. Similar to `CountMap`, but uses a weighting mechanism.

Example

``````o = ProbMap(Int)
fit!(o, rand(1:10, 1000))
probs(o)``````
source
``Quantile(q = [.25, .5, .75]; alg=OMAS(), rate=LearningRate(.6))``

Calculate quantiles via a stochastic approximation algorithm (`OMAS`, `SGD`, `ADAGRAD`, or `MSPI`). For better (although slower) approximations, see `P2Quantile` and `Hist`.

Example

``fit!(Quantile(), randn(10^5))``
source
``RMSPROP(α = .9)``

A variation of `ADAGRAD` that uses element-wise weights generated by an exponentially weighted mean of the squared gradients.
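
A textbook RMSProp step looks roughly like the sketch below. It is an assumption that the library's update has this shape; the step size `η`, the stabilizer `ϵ`, and the function name `rmsprop_step!` are hypothetical and not taken from the OnlineStats source.

```julia
# Textbook RMSProp step (assumed form; η and ϵ are hypothetical names).
function rmsprop_step!(θ, v, g; α = 0.9, η = 0.01, ϵ = 1e-8)
    @. v = α * v + (1 - α) * g^2       # exponentially weighted mean of g²
    @. θ -= η * g / (sqrt(v) + ϵ)      # element-wise scaled gradient step
    return θ
end

θ = [1.0, -2.0]
v = zeros(2)
for _ in 1:2000
    g = 2 .* θ                         # gradient of sum(θ .^ 2)
    rmsprop_step!(θ, v, g)
end
θ                                      # both entries end up close to zero
```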

source
``ReservoirSample(k::Int, T::Type = Float64)``

Create a sample without replacement of size `k`. After running through `n ≥ k` observations, the probability of an observation being in the sample is `k / n`.

Example

``fit!(ReservoirSample(100, Int), 1:1000)``
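
One standard way to draw such a sample in a single pass is reservoir sampling (Algorithm R), sketched here in plain Julia. `reservoir_sample` is a hypothetical helper; whether `ReservoirSample` uses exactly this variant is an assumption.

```julia
using Random

# Algorithm R sketch: one pass, O(k) memory (hypothetical helper).
function reservoir_sample(stream, k; rng = Random.default_rng())
    sample = Vector{eltype(stream)}()
    for (n, x) in enumerate(stream)
        if n <= k
            push!(sample, x)          # fill the reservoir first
        else
            j = rand(rng, 1:n)        # x replaces a slot with probability k / n
            j <= k && (sample[j] = x)
        end
    end
    return sample
end

s = reservoir_sample(1:1000, 100)
```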
source
``SGD()``

Stochastic gradient descent.

source
``````Series(stats)
Series(stats...)
Series(; stats...)``````

Track a collection of stats for one data stream.

Example

``````s = Series(Mean(), Variance())
fit!(s, randn(1000))``````
source
``StatHistory(stat, b)``

Track a moving window (previous `b` copies) of `stat`.

Example

``fit!(StatHistory(Mean(), 10), 1:20)``
source
``StatLearn(p, args...; rate=LearningRate())``

Fit a model that is linear in the parameters.

The (offline) objective function that StatLearn approximately minimizes is

$(1/n) ∑ᵢ f(yᵢ, xᵢ'β) + ∑ⱼ λⱼ g(βⱼ),$

where $f$ is a loss function of a single response and linear predictor, the $λⱼ$ are nonnegative regularization parameters, and $g$ is a penalty function.

Arguments

• `loss = .5 * L2DistLoss()`
• `penalty = NoPenalty()`
• `algorithm = SGD()`
• `rate = LearningRate(.6)` (keyword arg)

Example

``````x = randn(1000, 5)
y = x * range(-1, stop=1, length=5) + randn(1000)

o = fit!(StatLearn(5, MSPI()), (x, y))
coef(o)``````
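
To make the objective concrete, the sketch below evaluates it for a given `β` in plain Julia. `objective` is a hypothetical helper, with half squared error standing in for the loss $f$ and `abs2` (a ridge-style penalty) standing in for $g$; it does not call into OnlineStats or its loss and penalty types.

```julia
using Statistics

# Hypothetical helper: value of (1/n) ∑ᵢ f(yᵢ, xᵢ'β) + ∑ⱼ λⱼ g(βⱼ).
# `loss` and `penalty` are passed as plain functions.
function objective(β, x, y, λ; loss = (yi, zi) -> 0.5 * (yi - zi)^2, penalty = abs2)
    pred = x * β                                   # linear predictor xᵢ'β per row
    return mean(loss.(y, pred)) + sum(λ .* penalty.(β))
end

x = [1.0 0.0; 0.0 1.0]
y = [1.0, 2.0]
objective([1.0, 2.0], x, y, [0.0, 0.0])   # 0.0: perfect fit, no penalty
objective([0.0, 0.0], x, y, [0.1, 0.1])   # 1.25: loss only, penalty is zero
```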
source
``Sum(T::Type = Float64)``

Track the overall sum.

Example

``fit!(Sum(Int), fill(1, 100))``
source
``Variance(T = Float64; weight=EqualWeight())``

Univariate variance, tracked as type `T`.

Example

``````o = fit!(Variance(), randn(10^6))
mean(o)
var(o)
std(o)``````
source
``confint(b::Bootstrap, coverageprob = .95)``

Return a confidence interval for a Bootstrap `b`.

source
``Part(stat, a, b)``

`stat` summarizes a Y variable over an X variable's range `a` to `b`.

source