API

OnlineStats.Bootstrap — Type.

Bootstrap(s::Series, nreps, d, f = value)

Online Statistical Bootstrapping.

Create nreps replicates of the OnlineStat in Series s. When fit! is called, each replicate is updated with the new observation rand(d) times. Standard choices for d are Distributions.Poisson() and [0, 2]. value(b) returns f mapped over the replicates.

Example

b = Bootstrap(Series(Mean()), 100, [0, 2])
fit!(b, randn(1000))
value(b)        # `f` mapped to replicates
mean(value(b))  # mean
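The replicate scheme can be sketched without OnlineStats. This is a minimal pure-Julia (1.x) sketch that assumes each replicate is a running mean. It also assumes d = [0, 2], so for any given replicate an observation is either counted twice or skipped. The helper `bootstrap_means` is hypothetical and not part of OnlineStats.

```julia
using Random, Statistics

# Hypothetical helper: online bootstrap of the mean with `nreps` replicates.
# Each observation is applied to each replicate rand(d) times (d = [0, 2] here).
function bootstrap_means(y, nreps; d = [0, 2], rng = Random.default_rng())
    n = zeros(Int, nreps)   # observation count per replicate
    μ = zeros(nreps)        # running mean per replicate
    for yi in y
        for r in 1:nreps
            for _ in 1:rand(rng, d)
                n[r] += 1
                μ[r] += (yi - μ[r]) / n[r]   # equal-weight mean update
            end
        end
    end
    return μ
end

μs = bootstrap_means(randn(MersenneTwister(1), 1000), 100; rng = MersenneTwister(2))
mean(μs), std(μs)   # bootstrap estimate and standard error of the mean
```

With d = [0, 2], each replicate sees on average as many observations as the original stream. That is why the spread of `μs` approximates the sampling variability of the mean.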

OnlineStats.Diff — Type.

Diff()

Track the difference between the two most recent observations, along with the last value.

Example

s = Series(randn(1000), Diff())
value(s)

OnlineStats.FitCategorical — Type.

FitCategorical(T)

Fit a categorical distribution where the inputs are of type T.

Example

using Distributions
s = Series(rand(1:10, 1000), FitCategorical(Int))
value(s)

vals = ["small", "medium", "large"]
s = Series(rand(vals, 1000), FitCategorical(String))
value(s)
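Conceptually, fitting a categorical distribution online means keeping a running count for each distinct value seen. The following is a minimal pure-Julia sketch of that idea. The function `fit_categorical` is hypothetical, and OnlineStats' internal storage may differ.

```julia
# Hypothetical sketch: a categorical fit is a running count per distinct value.
function fit_categorical(::Type{T}, y) where {T}
    counts = Dict{T, Int}()
    for yi in y
        counts[yi] = get(counts, yi, 0) + 1   # increment this category's count
    end
    n = sum(values(counts))
    probs = Dict(k => v / n for (k, v) in counts)   # estimated probabilities
    return counts, probs
end

counts, probs = fit_categorical(String, ["small", "medium", "large", "small"])
```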

OnlineStats.MV — Type.

MV(p, o)

Track p copies of the univariate OnlineStat o, one for each of the p variables (columns).

Example

y = randn(1000, 5)
o = MV(5, Mean())
s = Series(y, o)

OnlineStats.ReservoirSample — Type.

ReservoirSample(k)
ReservoirSample(k, Float64)

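Maintain a uniform random sample (reservoir sample) of k items from the data stream. The optional second argument sets the element type.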

Example

o = ReservoirSample(100, Int)   # keep 100 items
s = Series(o)
fit!(s, 1:10000)
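One standard way to maintain a reservoir sample is "Algorithm R". After t items have been seen, each of them is in the k-element reservoir with probability k / t. This is a sketch of that technique only. The function `reservoir_sample` is hypothetical, and OnlineStats' implementation may differ.

```julia
using Random

# Hypothetical sketch of reservoir sampling (Algorithm R).
function reservoir_sample(itr, k::Integer; rng = Random.default_rng())
    reservoir = Vector{eltype(itr)}()
    for (t, x) in enumerate(itr)
        if t <= k
            push!(reservoir, x)            # fill the reservoir first
        else
            j = rand(rng, 1:t)             # keep x with probability k / t ...
            j <= k && (reservoir[j] = x)   # ... replacing a uniformly chosen slot
        end
    end
    return reservoir
end

res = reservoir_sample(1:10_000, 100)
```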

OnlineStats.Series — Type.

Series(stats...)
Series(data, stats...)
Series(weight, stats...)
Series(weight, data, stats...)

A Series is a container for a Weight and any number of OnlineStats. Updating the Series with fit!(s, data) will update the OnlineStats it holds according to its Weight.

Examples

Series(randn(100), Mean(), Variance())
Series(ExponentialWeight(.1), Mean())

s = Series(Mean())
fit!(s, randn(100))
s2 = Series(randn(123), Mean())
merge(s, s2)
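Many online estimators share one update rule that shows what the Weight does. With singleton weight γ_t at observation t, a running estimate moves as θ ← (1 - γ_t)θ + γ_t x_t. The sketch below assumes that rule and uses the EqualWeight and ExponentialWeight formulas given in this reference. It shows that γ_t = 1/t reproduces the ordinary mean, while a constant γ emphasizes recent data. The functions `equal_weight`, `exponential_weight`, and `weighted_mean` are hypothetical.

```julia
using Statistics

# Hypothetical weight functions (singleton weight at observation t).
equal_weight(t) = 1 / t
exponential_weight(t; λ = 0.1) = λ

# Running estimate θ ← (1 - γ)θ + γx, with γ = weight(t) at observation t.
function weighted_mean(y, weight)
    θ = 0.0
    for (t, x) in enumerate(y)
        γ = weight(t)
        θ = (1 - γ) * θ + γ * x
    end
    return θ
end

y = randn(1000)
weighted_mean(y, equal_weight) ≈ mean(y)     # equal weights give the ordinary mean
weighted_mean(y, t -> exponential_weight(t))  # emphasizes recent observations
```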

OnlineStats.StatLearn — Type.

StatLearn(p, loss, penalty, λ, updater)

Fit a statistical learning model of p independent variables for a given loss, penalty, and λ. Arguments are:

loss: any Loss from LossFunctions.jl
penalty: any Penalty from PenaltyFunctions.jl.
λ: a Float64 regularization parameter
updater: SPGD(), ADAGRAD(), ADAM(), or ADAMAX()

Example

using LossFunctions, PenaltyFunctions
x = randn(100_000, 10)
y = x * linspace(-1, 1, 10) + randn(100_000)
o = StatLearn(10, L2DistLoss(), L1Penalty(), .1, SPGD())
s = Series(o)
fit!(s, x, y)
coef(o)
predict(o, x)

OnlineStats.StochasticLoss — Type.

StochasticLoss(loss)

Minimize a loss (from LossFunctions.jl) using stochastic gradient descent.

Example

o1 = StochasticLoss(QuantileLoss(.7))  # approx. .7 quantile
o2 = StochasticLoss(L2DistLoss())      # approx. mean
o3 = StochasticLoss(L1DistLoss())      # approx. median
s = Series(randn(10_000), o1, o2, o3)

OnlineStats.Sum — Type.

Sum()

Track the overall sum.

Example

s = Series(randn(1000), Sum())
value(s)

OnlineStats.ADAGRAD — Type.

ADAGRAD(η)

Adaptive (element-wise learning rate) SPGD with step size η
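ADAGRAD can be sketched as follows. Each coordinate keeps the sum of its past squared gradients, and its step is η divided by the square root of that sum. This is a minimal sketch of the standard ADAGRAD step on an assumed online least-squares problem. It is not OnlineStats' code, and `adagrad_step!` is hypothetical.

```julia
using LinearAlgebra, Random

# Hypothetical ADAGRAD step: per-coordinate step size η / sqrt(sum of past g^2).
function adagrad_step!(θ, h, g; η = 0.1, ϵ = 1e-8)
    h .+= g .^ 2
    θ .-= η .* g ./ (sqrt.(h) .+ ϵ)
    return θ
end

# Usage: online least squares, one observation at a time.
rng = MersenneTwister(1)
β = [1.0, -2.0, 0.5]
θ, h = zeros(3), zeros(3)
for _ in 1:20_000
    x = randn(rng, 3)
    y = dot(x, β) + 0.1 * randn(rng)
    g = (dot(x, θ) - y) .* x   # gradient of 0.5 * (x'θ - y)^2
    adagrad_step!(θ, h, g; η = 0.5)
end
θ   # close to β
```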

OnlineStats.ADAM — Type.

ADAM(α1, α2, η)

Adaptive Moment Estimation with step size η and momentum parameters α1, α2
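The following is a minimal sketch of the standard ADAM update, using this page's names: α1 and α2 are the momentum parameters and η is the step size. It assumes the usual bias-corrected moment estimates and a small ϵ. Whether OnlineStats' internals match exactly is not confirmed here, and `AdamState` and `adam_step!` are hypothetical.

```julia
using LinearAlgebra

# Hypothetical state for the ADAM update.
mutable struct AdamState
    m::Vector{Float64}   # running mean of the gradient
    v::Vector{Float64}   # running mean of the squared gradient
    t::Int               # number of steps taken
end
AdamState(p::Integer) = AdamState(zeros(p), zeros(p), 0)

function adam_step!(θ, s::AdamState, g; α1 = 0.9, α2 = 0.999, η = 0.01, ϵ = 1e-8)
    s.t += 1
    s.m .= α1 .* s.m .+ (1 - α1) .* g
    s.v .= α2 .* s.v .+ (1 - α2) .* g .^ 2
    mhat = s.m ./ (1 - α1^s.t)   # bias corrections for the zero initialization
    vhat = s.v ./ (1 - α2^s.t)
    θ .-= η .* mhat ./ (sqrt.(vhat) .+ ϵ)
    return θ
end

# Usage: minimize 0.5 * ||θ - c||^2, whose gradient is θ - c.
c = [1.0, -2.0]
θ, s = zeros(2), AdamState(2)
for _ in 1:5_000
    adam_step!(θ, s, θ - c)
end
```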

OnlineStats.ADAMAX — Type.

ADAMAX(α1, α2, η)

ADAMAX (the infinity-norm variant of ADAM) with step size η and momentum parameters α1, α2

OnlineStats.BoundedEqualWeight — Type.

BoundedEqualWeight(λ::Real = 0.1)
BoundedEqualWeight(lookback::Integer)

Use EqualWeight until the weight falls to the threshold λ, then hold it constant at λ.
Singleton weight at observation t is γ = max(1 / t, λ)

OnlineStats.CovMatrix — Type.

CovMatrix(d)

Covariance Matrix of d variables.

Example

y = randn(100, 5)
Series(y, CovMatrix(5))

OnlineStats.EqualWeight — Type.

EqualWeight()

Equally weighted observations
Singleton weight at observation t is γ = 1 / t

OnlineStats.ExponentialWeight — Type.

ExponentialWeight(λ::Real = 0.1)
ExponentialWeight(lookback::Integer)

Exponentially weighted observations (constant singleton weight λ)
Singleton weight at observation t is γ = λ

OnlineStats.Extrema — Type.

Extrema()

Maximum and minimum.

Example

s = Series(randn(100), Extrema())
value(s)

OnlineStats.FitBeta — Type.

FitBeta()

Online parameter estimate of a Beta distribution (Method of Moments)

Example

using Distributions, OnlineStats
y = rand(Beta(3, 5), 1000)
s = Series(y, FitBeta())
Beta(value(s)...)
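The method of moments maps a mean m and variance v back to Beta parameters. It uses E[y] = α/(α+β) and Var[y] = m(1 - m)/(α + β + 1). In an online setting, m and v would come from running mean and variance updates. This sketch assumes the sample variance (divisor n - 1), and `beta_mom` is hypothetical.

```julia
using Statistics

# Hypothetical helper: method-of-moments estimate of Beta(α, β).
function beta_mom(m, v)
    s = m * (1 - m) / v - 1     # estimate of α + β
    return m * s, (1 - m) * s   # (α, β)
end

y = [0.2, 0.4, 0.3, 0.5, 0.35]
α, β = beta_mom(mean(y), var(y))
```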

OnlineStats.FitCauchy — Type.

FitCauchy()

Online parameter estimate of a Cauchy distribution

Example

using Distributions
y = rand(Cauchy(0, 10), 10_000)
s = Series(y, FitCauchy())
Cauchy(value(s)...)

OnlineStats.FitGamma — Type.

FitGamma()

Online parameter estimate of a Gamma distribution (Method of Moments)

Example

using Distributions
y = rand(Gamma(5, 1), 1000)
s = Series(y, FitGamma())
Gamma(value(s)...)
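For a Gamma distribution with shape k and scale θ, the mean is kθ and the variance is kθ². The method of moments therefore gives k = m²/v and θ = v/m. Distributions.jl's `Gamma(α, θ)` uses this shape-scale parameterization. The following is a minimal sketch, and `gamma_mom` is hypothetical.

```julia
using Statistics

# Hypothetical helper: method-of-moments (shape, scale) from mean m and variance v.
gamma_mom(m, v) = (m^2 / v, v / m)

y = [2.0, 4.0, 3.0, 5.0, 6.0]
k, θ = gamma_mom(mean(y), var(y))
```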

OnlineStats.FitLogNormal — Type.

FitLogNormal()

Online parameter estimate of a LogNormal distribution (MLE)

Example

using Distributions
y = rand(LogNormal(3, 4), 1000)
s = Series(y, FitLogNormal())
LogNormal(value(s)...)
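The LogNormal MLE is the Normal MLE applied to log(y). Online, that amounts to running mean and variance updates on log(yi). The following is a minimal sketch, assuming the MLE variance divisor n. The helper `lognormal_mle` is hypothetical.

```julia
using Statistics

# Hypothetical helper: LogNormal MLE = Normal MLE on log(y).
function lognormal_mle(y)
    ly = log.(y)
    return mean(ly), std(ly; corrected = false)   # MLE uses divisor n
end

μ, σ = lognormal_mle(exp.([0.0, 1.0, 2.0]))
```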

OnlineStats.FitMultinomial — Type.

FitMultinomial(p)

Online parameter estimate of a Multinomial distribution.

Example

using Distributions
y = rand(Multinomial(10, [.2, .2, .6]), 1000)
s = Series(y', FitMultinomial(3))
Multinomial(value(s)...)

OnlineStats.FitMvNormal — Type.

FitMvNormal(d)

Online parameter estimate of a d-dimensional MvNormal distribution (MLE)

Example

using Distributions
y = rand(MvNormal(zeros(3), eye(3)), 1000)
s = Series(y', FitMvNormal(3))

OnlineStats.FitNormal — Type.

FitNormal()

Online parameter estimate of a Normal distribution (MLE)

Example

using Distributions
y = rand(Normal(-3, 4), 1000)
s = Series(y, FitNormal())

OnlineStats.HarmonicWeight — Type.

HarmonicWeight(a = 10.0)

Weights decrease at a slow rate
Singleton weight at observation t is γ = a / (a + t - 1)

OnlineStats.HyperLogLog — Type.

HyperLogLog(b)  # 4 ≤ b ≤ 16

Approximate count of distinct elements.

Example

s = Series(rand(1:10, 1000), HyperLogLog(12))
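The following is a simplified HyperLogLog sketch with m = 2^b registers. It shows the technique only and is not the OnlineStats implementation. Each item is hashed to 64 bits with Julia's `hash`. The first b bits select a register. That register keeps the largest "rank" (position of the first 1-bit) seen in the remaining bits. The estimate is a bias-corrected harmonic mean of 2^rank, with linear counting for small cardinalities. The names `HLL`, `observe!`, and `estimate` are hypothetical.

```julia
# Hypothetical, simplified HyperLogLog.
struct HLL
    b::Int
    M::Vector{UInt8}   # one register per bucket
end
function HLL(b::Integer)
    4 <= b <= 16 || throw(ArgumentError("b must satisfy 4 ≤ b ≤ 16"))
    return HLL(b, zeros(UInt8, 2^b))
end

function observe!(h::HLL, x)
    u = hash(x)                              # 64-bit hash
    j = (u >> (64 - h.b)) + 1                # register index from the top b bits
    w = u << h.b                             # remaining bits, left-aligned
    ρ = min(leading_zeros(w), 64 - h.b) + 1  # rank of the first 1-bit
    h.M[j] = max(h.M[j], ρ)
    return h
end

function estimate(h::HLL)
    m = length(h.M)
    α = m == 16 ? 0.673 : m == 32 ? 0.697 : m == 64 ? 0.709 : 0.7213 / (1 + 1.079 / m)
    E = α * m^2 / sum(r -> 2.0^(-Int(r)), h.M)
    V = count(iszero, h.M)                   # empty registers
    # Small-range correction: fall back to linear counting.
    return (E <= 2.5 * m && V > 0) ? m * log(m / V) : E
end

h = HLL(12)
foreach(x -> observe!(h, x), rand(1:10, 1000))
estimate(h)   # approximately 10 distinct values
```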

OnlineStats.KMeans — Type.

KMeans(p, k)

Approximate K-Means clustering of k clusters of p variables

Example

using OnlineStats, Distributions
d = MixtureModel([Normal(0), Normal(5)])
y = rand(d, 100_000, 1)
s = Series(y, LearningRate(.6), KMeans(1, 2))
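The following sketches online (MacQueen-style) k-means. Each observation moves only its nearest center, by a step γ_t = t^(-r), which mirrors LearningRate(r) with λ = 0. Centers are assumed to start at the first k observations. OnlineStats' KMeans may initialize and update differently, and `online_kmeans` is hypothetical. The mixture is drawn without Distributions to keep the example self-contained.

```julia
using LinearAlgebra, Random

# Hypothetical sketch of online k-means; rows of Y are observations.
function online_kmeans(Y::AbstractMatrix, k::Integer; r = 0.6)
    C = Y[1:k, :]                                   # initial centers: first k rows
    for t in (k + 1):size(Y, 1)
        y = Y[t, :]
        j = argmin([norm(y .- C[i, :]) for i in 1:k])  # nearest center
        γ = 1 / t^r                                 # LearningRate-style weight
        C[j, :] .+= γ .* (y .- C[j, :])             # move only that center
    end
    return C
end

rng = MersenneTwister(1)
y = [rand(rng) < 0.5 ? randn(rng) : 5 + randn(rng) for _ in 1:100_000]
C = online_kmeans(reshape(y, :, 1), 2)
sort(vec(C))   # roughly [0, 5]
```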

OnlineStats.LearningRate — Type.

LearningRate(r = .6, λ = 0.0)

Intended mainly for stochastic approximation types (QuantileSGD, QuantileMM, etc.)
Weights decrease at a "slow" rate until the threshold λ is reached
Singleton weight at observation t is γ = max(1 / t ^ r, λ)

OnlineStats.LearningRate2 — Type.

LearningRate2(c = .5, λ = 0.0)

Intended mainly for stochastic approximation types (QuantileSGD, QuantileMM, etc.)
Weights decrease at a "slow" rate until the threshold λ is reached
Singleton weight at observation t is γ = max(inv(1 + c * (t - 1)), λ)

OnlineStats.LinReg — Type.

LinReg(p)
LinReg(p, λ)

Create a linear regression object with p predictors and optional ridge (L2-regularization) parameter λ.

Example

x = randn(1000, 5)
y = x * linspace(-1, 1, 5) + randn(1000)
o = LinReg(5)
s = Series(o)
fit!(s, x, y)
coef(o)
predict(o, x)
coeftable(o)
vcov(o)
confint(o)
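Linear regression can be fitted online because it needs only the running cross-products X'X and X'y. The following is a minimal sketch of that idea. It assumes a ridge penalty applied to the averaged cross-products, (X'X/n + λI) \ (X'y/n). OnlineStats' scaling of λ may differ, and `OnlineLinReg`, `update!`, and `coefs` are hypothetical.

```julia
using LinearAlgebra, Random

# Hypothetical sketch: running sufficient statistics for (ridge) regression.
mutable struct OnlineLinReg
    xtx::Matrix{Float64}
    xty::Vector{Float64}
    n::Int
end
OnlineLinReg(p::Integer) = OnlineLinReg(zeros(p, p), zeros(p), 0)

function update!(o::OnlineLinReg, x::AbstractVector, y::Real)
    o.xtx .+= x * x'   # accumulate X'X
    o.xty .+= x .* y   # accumulate X'y
    o.n += 1
    return o
end

# λ = 0 gives ordinary least squares.
coefs(o::OnlineLinReg, λ = 0.0) = (o.xtx / o.n + λ * I) \ (o.xty / o.n)

rng = MersenneTwister(1)
X = randn(rng, 1000, 5)
y = X * range(-1, 1, length = 5) + randn(rng, 1000)
o = OnlineLinReg(5)
for i in 1:1000
    update!(o, X[i, :], y[i])
end
coefs(o) ≈ X \ y   # matches the batch least-squares solution
```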

OnlineStats.MAXSPGD — Type.

MAXSPGD(η)

SPGD where only the largest gradient element is used to update the parameter.

OnlineStats.MMXTX — Type.

MMXTX(c)

Online MM algorithm via quadratic approximation. Approximates the Lipschitz constant with x'x * c * I.

OnlineStats.McclainWeight — Type.

McclainWeight(ᾱ = 0.1)

A "smoothed" version of BoundedEqualWeight
Weights asymptotically approach ᾱ
Singleton weight at observation t is γ_t = γ_{t-1} / (1 + γ_{t-1} - ᾱ)
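The McClain recursion decreases monotonically toward its fixed point ᾱ: setting γ = γ / (1 + γ - ᾱ) forces γ = ᾱ. The sketch below assumes γ_1 = 1 and spells ᾱ as `abar`. The function `mcclain_weights` is hypothetical.

```julia
# Hypothetical sketch of the McClain weight sequence, assuming γ_1 = 1.
function mcclain_weights(T; abar = 0.1)
    γ = ones(T)
    for t in 2:T
        γ[t] = γ[t-1] / (1 + γ[t-1] - abar)   # γ_t = γ_{t-1} / (1 + γ_{t-1} - ᾱ)
    end
    return γ
end

γ = mcclain_weights(200)
```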

OnlineStats.Mean — Type.

Mean()

Univariate mean.

Example

s = Series(randn(100), Mean())
value(s)

OnlineStats.Moments — Type.

Moments()

First four non-central moments.

Example

s = Series(randn(1000), Moments())
value(s)

OnlineStats.OrderStats — Type.

OrderStats(b)

Average order statistics with batches of size b.

Example

s = Series(randn(1000), OrderStats(10))
value(s)

OnlineStats.QuantileISGD — Type.

QuantileISGD()

Approximate quantiles via implicit stochastic gradient descent.

Example

s = Series(randn(1000), LearningRate(.7), QuantileISGD())
value(s)

OnlineStats.QuantileMM — Type.

QuantileMM()

Approximate quantiles via an online MM algorithm.

Example

s = Series(randn(1000), LearningRate(.7), QuantileMM())
value(s)

OnlineStats.QuantileSGD — Type.

QuantileSGD()

Approximate quantiles via stochastic gradient descent.

Example

s = Series(randn(1000), LearningRate(.7), QuantileSGD())
value(s)
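Quantile estimation by SGD follows from the check (pinball) loss for quantile τ, whose subgradient in q is 1{y < q} - τ. The following is a minimal sketch of that technique with step sizes γ_t = t^(-r), which mirrors LearningRate(r). It is not OnlineStats' code, and `quantile_sgd` is hypothetical.

```julia
using Random, Statistics

# Hypothetical sketch: SGD on the check loss for the τ-quantile.
function quantile_sgd(y, τ; r = 0.7)
    q = 0.0
    for (t, yi) in enumerate(y)
        γ = 1 / t^r
        q -= γ * ((yi < q) - τ)   # subgradient step
    end
    return q
end

y = randn(MersenneTwister(1), 100_000)
quantile_sgd(y, 0.7), quantile(y, 0.7)   # both near the N(0, 1) 0.7 quantile
```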

OnlineStats.SPGD — Type.

SPGD(η)

Stochastic Proximal Gradient Descent with step size η
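A stochastic proximal gradient step takes a gradient step on the loss, then applies the proximal operator of the penalty. For an L1 penalty, that operator is soft-thresholding. The following is a minimal sketch with an assumed squared-error loss and a constant step γ. It is not OnlineStats' internals, and `soft_threshold` and `spgd_step!` are hypothetical.

```julia
using LinearAlgebra, Random

# Prox of a * |z|: shrink toward zero by a.
soft_threshold(z, a) = sign(z) * max(abs(z) - a, 0.0)

# Hypothetical SPGD step for loss gradient g, step γ, and L1 penalty λ.
function spgd_step!(θ, g, γ, λ)
    θ .= soft_threshold.(θ .- γ .* g, γ * λ)
    return θ
end

# Usage: L1-penalized least squares with a sparse true coefficient vector.
rng = MersenneTwister(1)
β = [2.0, 0.0, 0.0, -1.0]
θ = zeros(4)
for _ in 1:50_000
    x = randn(rng, 4)
    y = dot(x, β) + 0.1 * randn(rng)
    g = (dot(x, θ) - y) .* x   # gradient of 0.5 * (x'θ - y)^2
    spgd_step!(θ, g, 0.01, 0.05)
end
θ   # near soft_threshold.(β, 0.05), i.e. β shrunk toward zero
```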

OnlineStats.Variance — Type.

Variance()

Univariate variance.

Example

s = Series(randn(100), Variance())
value(s)
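A weighted online variance update can be written with δ = x - μ as μ ← μ + γδ and σ² ← (1 - γ)(σ² + γδ²). With γ_t = 1/t, this reproduces the population variance (divisor n). The following is a minimal sketch of that recursion. Whether OnlineStats reports the n or n - 1 version is not assumed here, and `running_variance` is hypothetical.

```julia
using Statistics

# Hypothetical sketch of an equally weighted running mean and variance.
function running_variance(y)
    μ, σ2 = 0.0, 0.0
    for (t, x) in enumerate(y)
        γ = 1 / t
        δ = x - μ
        μ += γ * δ
        σ2 = (1 - γ) * (σ2 + γ * δ^2)
    end
    return μ, σ2
end

y = randn(100)
μ, σ2 = running_variance(y)
σ2 ≈ var(y; corrected = false)   # population variance
```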

LearnBase.value — Method.

Map value over the stats field of a Series, returning the value of each OnlineStat it holds.

OnlineStats.maprows — Method.

maprows(f::Function, b::Integer, data...)

Map rows of data in batches of size b. It is most commonly used with a do block.

Example

s = Series(Mean())
maprows(10, randn(100)) do yi
    fit!(s, yi)
    info("nobs: $(nobs(s))")
end

OnlineStats.replicates — Method.

replicates(b)

Return the vector of replicates from Bootstrap b

OnlineStats.stats — Method.

Return the stats field of a Series.

StatsBase.confint — Function.

confint(b, coverageprob = .95)

Return a confidence interval for a Bootstrap b.

StatsBase.fit! — Method.

fit!(s, y)
fit!(s, y, w)

Update a Series s with more data y and optional weighting w.

Examples

y = randn(100)
w = rand(100)

s = Series(Mean())
fit!(s, y[1])        # one observation: use Series weight
fit!(s, y[1], w[1])  # one observation: override weight
fit!(s, y)           # multiple observations: use Series weight
fit!(s, y, w[1])     # multiple observations: override each weight with w[1]
fit!(s, y, w)        # multiple observations: y[i] uses weight w[i]