API

Bootstrap(s::Series, nreps, d, f = value)

Online Statistical Bootstrapping.

Create nreps replicates of the OnlineStat in Series s. When fit! is called, each replicate is updated rand(d) times. Standard choices for d are Distributions.Poisson() and the "double-or-nothing" set [0, 2]. value(b) returns f mapped over the replicates.

Example

b = Bootstrap(Series(Mean()), 100, [0, 2])
fit!(b, randn(1000))
value(b)        # `f` mapped to replicates
mean(value(b))  # mean
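The replicate mechanism can be sketched in plain Julia without OnlineStats. This is a minimal sketch, not the library's implementation: `RunningMean`, `update!`, and `bootstrap_means` are hypothetical names, and each replicate here is a simple equally weighted mean. Every observation is fed to every replicate `rand(d)` times, with the double-or-nothing choice `d = [0, 2]` from the example above.

```julia
using Random, Statistics

# Hypothetical stand-in for one replicate: an equally weighted running mean.
mutable struct RunningMean
    μ::Float64
    n::Int
end
RunningMean() = RunningMean(0.0, 0)

function update!(m::RunningMean, y::Real)
    m.n += 1
    m.μ += (y - m.μ) / m.n
    return m
end

# Online bootstrap: each observation updates each replicate rand(d) times.
function bootstrap_means(y, nreps; d = [0, 2], rng = Random.default_rng())
    reps = [RunningMean() for _ in 1:nreps]
    for yi in y, r in reps
        for _ in 1:rand(rng, d)   # 0 or 2 updates with d = [0, 2]
            update!(r, yi)
        end
    end
    return [r.μ for r in reps]
end

rng = MersenneTwister(1)
vals = bootstrap_means(randn(rng, 1000), 100; rng = rng)
std(vals)   # spread of the replicates approximates the standard error of the mean
```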
Diff()

Track the difference and the last value.

Example

s = Series(randn(1000), Diff())
value(s)
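What Diff tracks can be shown with a hypothetical `RunningDiff` type in plain Julia. This sketch assumes the first difference is taken relative to 0.0; the library's initial state may differ.

```julia
# Hypothetical stand-in for Diff(): the latest value and its change from the previous one.
mutable struct RunningDiff
    diff::Float64
    lastval::Float64
end
RunningDiff() = RunningDiff(0.0, 0.0)   # assumption: differences start from 0.0

function update!(o::RunningDiff, y::Real)
    o.diff = y - o.lastval   # change relative to the previous observation
    o.lastval = y
    return o
end

o = RunningDiff()
foreach(y -> update!(o, y), [1.0, 4.0, 2.5])
(o.diff, o.lastval)   # (-1.5, 2.5)
```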
FitCategorical(T)

Fit a categorical distribution where the inputs are of type T.

Example

using Distributions
s = Series(rand(1:10, 1000), FitCategorical(Int))
value(s)

vals = ["small", "medium", "large"]
s = Series(rand(vals, 1000), FitCategorical(String))
value(s)
OnlineStats.MV (Type)
MV(p, o)

Track p copies of the univariate OnlineStat o, one per variable.

Example

y = randn(1000, 5)
o = MV(5, Mean())
s = Series(y, o)
ReservoirSample(k)
ReservoirSample(k, Float64)

Reservoir sample of k items.

Example

o = ReservoirSample(100, Int)
s = Series(o)
fit!(s, 1:10000)
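A standard way to keep a uniform sample of k items from a stream is Vitter's Algorithm R, sketched below in plain Julia. Whether ReservoirSample uses exactly this scheme is an assumption; `reservoir_sample` is a hypothetical helper.

```julia
using Random

# Algorithm R: a uniform random sample of k items from a stream of unknown length.
function reservoir_sample(stream, k::Integer; rng = Random.default_rng())
    sample = Vector{eltype(stream)}()
    for (t, x) in enumerate(stream)
        if t <= k
            push!(sample, x)             # fill the reservoir with the first k items
        else
            j = rand(rng, 1:t)           # x survives with probability k / t ...
            j <= k && (sample[j] = x)    # ... replacing a uniformly chosen slot
        end
    end
    return sample
end

s = reservoir_sample(1:10_000, 100; rng = MersenneTwister(42))
```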
Series(stats...)
Series(data, stats...)
Series(weight, stats...)
Series(weight, data, stats...)

A Series is a container for a Weight and any number of OnlineStats. Updating the Series with fit!(s, data) will update the OnlineStats it holds according to its Weight.

Examples

Series(randn(100), Mean(), Variance())
Series(ExponentialWeight(.1), Mean())

s = Series(Mean())
fit!(s, randn(100))
s2 = Series(randn(123), Mean())
merge(s, s2)
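For equally weighted means, a merge like the one above amounts to weighting each mean by its observation count. The plain-Julia sketch below uses a hypothetical `merge_means` helper; how OnlineStats merges other statistics or non-equal weights is not covered by it.

```julia
# Merging two equally weighted means (n1, μ1) and (n2, μ2):
# the result equals the mean of the pooled data.
function merge_means(n1, μ1, n2, μ2)
    n = n1 + n2
    return n, μ1 + (n2 / n) * (μ2 - μ1)
end

a, b = [1.0, 2.0], [3.0, 4.0]
merge_means(length(a), sum(a) / 2, length(b), sum(b) / 2)   # (4, 2.5)
```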
StatLearn(p, loss, penalty, λ, updater)

Fit a statistical learning model of p independent variables for a given loss, penalty, and λ. Arguments are:

  • loss: any Loss from LossFunctions.jl

  • penalty: any Penalty from PenaltyFunctions.jl.

  • λ: a Float64 regularization parameter

  • updater: SPGD(), ADAGRAD(), ADAM(), or ADAMAX()

Example

using LossFunctions, PenaltyFunctions
x = randn(100_000, 10)
y = x * linspace(-1, 1, 10) + randn(100_000)
o = StatLearn(10, L2DistLoss(), L1Penalty(), .1, SPGD())
s = Series(o)
fit!(s, x, y)
coef(o)
predict(o, x)
StochasticLoss(loss)

Minimize a loss (from LossFunctions.jl) using stochastic gradient descent.

Example

o1 = StochasticLoss(QuantileLoss(.7))  # approx. .7 quantile
o2 = StochasticLoss(L2DistLoss())      # approx. mean
o3 = StochasticLoss(L1DistLoss())      # approx. median
s = Series(randn(10_000), o1, o2, o3)
OnlineStats.Sum (Type)
Sum()

Track the overall sum.

Example

s = Series(randn(1000), Sum())
value(s)
ADAGRAD(η)

Adaptive (element-wise learning rate) SPGD with step size η
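The element-wise learning rate can be sketched in plain Julia: each coordinate's step is η divided by the square root of its accumulated squared gradients. `adagrad_step!`, the default η, and the quadratic test objective are illustrative assumptions, not the library's code.

```julia
# ADAGRAD sketch: coordinate i uses step η / sqrt(Gᵢ), where Gᵢ accumulates squared gradients.
function adagrad_step!(θ, g, G; η = 0.5, ϵ = 1e-8)
    @. G += g^2
    @. θ -= η * g / (sqrt(G) + ϵ)
    return θ
end

# Usage: minimize f(θ) = ‖θ - target‖², whose gradient is 2(θ - target).
target = [3.0, -2.0]
θ, G = zeros(2), zeros(2)
for _ in 1:2000
    adagrad_step!(θ, 2 .* (θ .- target), G)
end
θ   # close to target
```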

ADAM(α1, α2, η)

Adaptive Moment Estimation with step size η and momentum parameters α1, α2
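A minimal ADAM update in the usual Kingma and Ba form, assuming α1 and α2 play the roles of the decay rates commonly written β1 and β2. `AdamState`, `adam_step!`, and the defaults are hypothetical and not taken from OnlineStats.

```julia
# ADAM sketch: exponential moving averages of the gradient and squared gradient,
# bias-corrected, then a normalized step.
mutable struct AdamState
    m::Vector{Float64}   # running mean of gradients (first moment)
    v::Vector{Float64}   # running mean of squared gradients (second moment)
    t::Int
end
AdamState(p::Integer) = AdamState(zeros(p), zeros(p), 0)

function adam_step!(θ, g, s::AdamState; α1 = 0.9, α2 = 0.999, η = 0.1, ϵ = 1e-8)
    s.t += 1
    @. s.m = α1 * s.m + (1 - α1) * g
    @. s.v = α2 * s.v + (1 - α2) * g^2
    mhat = s.m ./ (1 - α1^s.t)   # bias-corrected first moment
    vhat = s.v ./ (1 - α2^s.t)   # bias-corrected second moment
    @. θ -= η * mhat / (sqrt(vhat) + ϵ)
    return θ
end

# Usage: minimize f(θ) = ‖θ - target‖².
target = [3.0, -2.0]
θ, s = zeros(2), AdamState(2)
for _ in 1:2000
    adam_step!(θ, 2 .* (θ .- target), s)
end
θ   # close to target
```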

ADAMAX(α1, α2, η)

ADAMAX (the infinity-norm variant of ADAM) with step size η and momentum parameters α1, α2

BoundedEqualWeight(λ::Real = 0.1)
BoundedEqualWeight(lookback::Integer)
  • Use EqualWeight until threshold λ is hit, then hold constant.

  • Singleton weight at observation t is γ = max(1 / t, λ)
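The formulas above can be written as plain functions of the observation index t. Once t ≥ 1/λ, the bounded weight stays at λ, so later observations are weighted as under ExponentialWeight(λ). The function names are hypothetical.

```julia
# Singleton weights as plain functions of the observation index t (t = 1, 2, ...).
equal_weight(t) = 1 / t                             # EqualWeight
bounded_equal_weight(t; λ = 0.1) = max(1 / t, λ)    # BoundedEqualWeight

[bounded_equal_weight(t) for t in (1, 5, 10, 50)]   # [1.0, 0.2, 0.1, 0.1]
```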

CovMatrix(d)

Covariance matrix of d variables.

Example

y = randn(100, 5)
Series(y, CovMatrix(5))
EqualWeight()
  • Equally weighted observations

  • Singleton weight at observation t is γ = 1 / t

ExponentialWeight(λ::Real = 0.1)
ExponentialWeight(lookback::Integer)
  • Exponentially weighted observations (constant)

  • Singleton weight at observation t is γ = λ

Extrema()

Maximum and minimum.

Example

s = Series(randn(100), Extrema())
value(s)
FitBeta()

Online parameter estimate of a Beta distribution (Method of Moments)

Example

using Distributions, OnlineStats
y = rand(Beta(3, 5), 1000)
s = Series(y, FitBeta())
Beta(value(s)...)
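The method-of-moments step maps a mean m and variance v to Beta parameters through m(1 - m)/v = α + β + 1. The sketch below shows only that mapping; `beta_mom` is hypothetical, and in an online setting m and v would come from running mean and variance estimates.

```julia
# Method-of-moments Beta(α, β) from a mean m and variance v (needs 0 < v < m(1 - m)).
function beta_mom(m, v)
    k = m * (1 - m) / v - 1     # equals α + β
    return (m * k, (1 - m) * k)
end

# Beta(3, 5) has mean 3/8 and variance 15/576, which map back to roughly (3, 5).
beta_mom(3 / 8, 15 / 576)
```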
FitCauchy()

Online parameter estimate of a Cauchy distribution

Example

using Distributions
y = rand(Cauchy(0, 10), 10_000)
s = Series(y, FitCauchy())
Cauchy(value(s)...)
FitGamma()

Online parameter estimate of a Gamma distribution (Method of Moments)

Example

using Distributions
y = rand(Gamma(5, 1), 1000)
s = Series(y, FitGamma())
Gamma(value(s)...)
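For a Gamma distribution in shape-scale form (the form Distributions.jl's Gamma(α, θ) uses), the mean is αθ and the variance is αθ², so α = m²/v and θ = v/m. `gamma_mom` is a hypothetical helper showing only that mapping.

```julia
# Method-of-moments Gamma(shape, scale) from a mean m and variance v.
gamma_mom(m, v) = (m^2 / v, v / m)

# Gamma(5, 1) has mean 5 and variance 5.
gamma_mom(5.0, 5.0)   # (5.0, 1.0)
```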
FitLogNormal()

Online parameter estimate of a LogNormal distribution (MLE)

Example

using Distributions
y = rand(LogNormal(3, 4), 1000)
s = Series(y, FitLogNormal())
LogNormal(value(s)...)
FitMultinomial(p)

Online parameter estimate of a Multinomial distribution.

Example

using Distributions
y = rand(Multinomial(10, [.2, .2, .6]), 1000)
s = Series(y', FitMultinomial(3))
Multinomial(value(s)...)
FitMvNormal(d)

Online parameter estimate of a d-dimensional MvNormal distribution (MLE)

Example

using Distributions
y = rand(MvNormal(zeros(3), eye(3)), 1000)
s = Series(y', FitMvNormal(3))
FitNormal()

Online parameter estimate of a Normal distribution (MLE)

Example

using Distributions
y = rand(Normal(-3, 4), 1000)
s = Series(y, FitNormal())
HarmonicWeight(a = 10.0)
  • Decreases at a slow rate

  • Singleton weight at observation t is γ = a / (a + t - 1)

HyperLogLog(b)  # 4 ≤ b ≤ 16

Approximate count of distinct elements.

Example

s = Series(rand(1:10, 1000), HyperLogLog(12))
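The idea behind HyperLogLog can be sketched in plain Julia. The first b bits of a 64-bit hash pick one of m = 2^b registers, and each register keeps the largest position of the first 1-bit in the remaining bits. The estimate is a bias-corrected harmonic mean, with linear counting for small cardinalities. `HLL`, `add!`, and `estimate` are hypothetical names, and the constants follow the original HyperLogLog paper rather than necessarily OnlineStats.

```julia
# Minimal HyperLogLog with m = 2^b registers (4 ≤ b ≤ 16).
struct HLL
    b::Int
    registers::Vector{UInt8}
end

function HLL(b::Integer)
    4 <= b <= 16 || throw(ArgumentError("b must satisfy 4 ≤ b ≤ 16"))
    return HLL(b, zeros(UInt8, 2^b))
end

function add!(h::HLL, x)
    z = hash(x)                               # 64-bit hash (UInt64)
    idx = Int(z >> (64 - h.b)) + 1            # top b bits choose the register
    w = z << h.b                              # remaining 64 - b bits
    ρ = min(leading_zeros(w), 64 - h.b) + 1   # position of the first 1-bit
    h.registers[idx] = max(h.registers[idx], ρ)
    return h
end

function estimate(h::HLL)
    m = length(h.registers)
    α = m == 16 ? 0.673 : m == 32 ? 0.697 : m == 64 ? 0.709 : 0.7213 / (1 + 1.079 / m)
    E = α * m^2 / sum(r -> 2.0^(-Int(r)), h.registers)   # Int(r): avoid UInt8 negation
    V = count(iszero, h.registers)
    # Small-range correction (linear counting) while many registers are still empty.
    return (E <= 2.5 * m && V > 0) ? m * log(m / V) : E
end

h = HLL(12)
foreach(x -> add!(h, x), rand(1:10, 1000))
estimate(h)    # close to 10

h2 = HLL(12)
foreach(x -> add!(h2, x), 1:100_000)
estimate(h2)   # typically within a few percent of 100_000
```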
KMeans(p, k)

Approximate k-means clustering of p variables into k clusters.

Example

using OnlineStats, Distributions
d = MixtureModel([Normal(0), Normal(5)])
y = rand(d, 100_000, 1)
s = Series(y, LearningRate(.6), KMeans(1, 2))
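One common online k-means scheme (MacQueen-style sequential updates) moves the nearest center toward each new row by a decreasing step. The plain-Julia sketch below assumes a per-cluster step γ = 1 / nⱼ^r, in the spirit of LearningRate(.6), and initial centers taken from the first k rows. Both are assumptions; OnlineStats' KMeans may initialize and weight differently. The mixture data are simulated without Distributions.jl.

```julia
using Random

# Online k-means: each row moves its nearest center toward it by γ = 1 / nⱼ^r,
# where nⱼ counts the rows assigned so far to center j.
function online_kmeans(X::AbstractMatrix, k::Integer; r = 0.6)
    centers = [Float64.(X[j, :]) for j in 1:k]   # assumption: start at the first k rows
    counts = zeros(Int, k)
    for i in axes(X, 1)
        x = X[i, :]
        j = argmin([sum(abs2, x .- c) for c in centers])   # nearest center
        counts[j] += 1
        γ = 1 / counts[j]^r
        centers[j] .+= γ .* (x .- centers[j])
    end
    return centers
end

# Equal mixture of Normal(0, 1) and Normal(5, 1).
rng = MersenneTwister(7)
y = [rand(rng, Bool) ? randn(rng) : 5 + randn(rng) for _ in 1:100_000]
centers = online_kmeans(reshape(y, :, 1), 2)
sort(first.(centers))   # roughly [0, 5]
```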
LearningRate(r = .6, λ = 0.0)
  • Mainly for stochastic approximation types (QuantileSGD, QuantileMM etc.)

  • Decreases at a "slow" rate until threshold λ is reached

  • Singleton weight at observation t is γ = max(1 / t ^ r, λ)
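The LearningRate formula above, and for comparison the LearningRate2 formula listed next, written as plain functions of t. The function names are hypothetical. Both start at γ = 1 and decay more slowly than EqualWeight's 1 / t until the floor λ takes over.

```julia
# LearningRate and LearningRate2 singleton weights as plain functions of t.
learning_rate(t; r = 0.6, λ = 0.0) = max(1 / t^r, λ)
learning_rate2(t; c = 0.5, λ = 0.0) = max(inv(1 + c * (t - 1)), λ)

learning_rate(100; r = 0.5)   # ≈ 0.1
learning_rate2(3)             # 0.5
```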

LearningRate2(c = .5, λ = 0.0)
  • Mainly for stochastic approximation types (QuantileSGD, QuantileMM etc.)

  • Decreases at a "slow" rate until threshold λ is reached

  • Singleton weight at observation t is γ = max(inv(1 + c * (t - 1)), λ)

LinReg(p)
LinReg(p, λ)

Create a linear regression object with p predictors and optional ridge (L2-regularization) parameter λ.

Example

x = randn(1000, 5)
y = x * linspace(-1, 1, 5) + randn(1000)
o = LinReg(5)
s = Series(o)
fit!(s, x, y)
coef(o)
predict(o, x)
coeftable(o)
vcov(o)
confint(o)
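Linear regression can be fit online by accumulating the sufficient statistics X'X and X'y one row at a time and solving at the end. The plain-Julia sketch below illustrates this. The ridge form (X'X + λI)β = X'y is the textbook one; OnlineStats may scale λ differently, and `RunningLinReg`, `update!`, and `coefficients` are hypothetical names.

```julia
using LinearAlgebra, Random

# Online least squares via sufficient statistics accumulated row by row.
mutable struct RunningLinReg
    XtX::Matrix{Float64}
    Xty::Vector{Float64}
end
RunningLinReg(p::Integer) = RunningLinReg(zeros(p, p), zeros(p))

function update!(o::RunningLinReg, x::AbstractVector, y::Real)
    o.XtX .+= x * x'    # outer product of the predictor row
    o.Xty .+= x .* y
    return o
end

# Ridge form (X'X + λI) β = X'y; λ = 0 gives ordinary least squares.
coefficients(o::RunningLinReg; λ = 0.0) = (o.XtX + λ * I) \ o.Xty

rng = MersenneTwister(5)
X = randn(rng, 1000, 3)
y = X * [1.0, 2.0, -1.0]          # noise-free, so OLS recovers the coefficients
o = RunningLinReg(3)
for i in axes(X, 1)
    update!(o, X[i, :], y[i])
end
coefficients(o)   # ≈ [1.0, 2.0, -1.0]
```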
MAXSPGD(η)

SPGD where only the largest gradient element is used to update the parameter.

MMXTX(c)

Online MM algorithm via quadratic approximation. Approximates Lipschitz constant with x'x * c * I.

McclainWeight(ᾱ = 0.1)
  • "smoothed" version of BoundedEqualWeight

  • weights asymptotically approach ᾱ

  • Singleton weight at observation t is γ(t) = γ(t-1) / (1 + γ(t-1) - ᾱ)
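The McClain recursion can be iterated directly to see the weights start near 1 / t and level off at ᾱ, which is the recursion's fixed point. This sketch assumes γ(1) = 1 and writes ᾱ as `abar`. `mcclain_weights` is a hypothetical helper.

```julia
# McClain weights: γ(1) = 1 (assumption), then γ(t) = γ(t-1) / (1 + γ(t-1) - ᾱ).
function mcclain_weights(n::Integer; abar = 0.1)
    γ = ones(n)
    for t in 2:n
        γ[t] = γ[t-1] / (1 + γ[t-1] - abar)
    end
    return γ
end

γ = mcclain_weights(1000)
γ[1:3]   # 1.0, then about 0.526 and 0.369: close to 1 / t at first
γ[end]   # ≈ 0.1
```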

Mean()

Univariate mean.

Example

s = Series(randn(100), Mean())
value(s)
Moments()

First four non-central moments.

Example

s = Series(randn(1000), Moments())
value(s)
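The first four non-central moments can be tracked as running means of y, y², y³, and y⁴. The sketch below assumes equal weights γ = 1 / t. `RunningMoments` and `update!` are hypothetical names.

```julia
# Running means of y^k for k = 1:4 with equal weights γ = 1 / t.
mutable struct RunningMoments
    m::Vector{Float64}
    n::Int
end
RunningMoments() = RunningMoments(zeros(4), 0)

function update!(o::RunningMoments, y::Real)
    o.n += 1
    γ = 1 / o.n
    for k in 1:4
        o.m[k] += γ * (y^k - o.m[k])
    end
    return o
end

o = RunningMoments()
foreach(y -> update!(o, y), [1.0, 2.0, 3.0])
o.m   # ≈ [2, 14/3, 12, 98/3]
```

Central quantities follow from these. For example, the population variance is m[2] - m[1]^2, which is 2/3 here.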
OrderStats(b)

Average order statistics with batches of size b.

Example

s = Series(randn(1000), OrderStats(10))
value(s)
QuantileISGD()

Approximate quantiles via implicit stochastic gradient descent.

Example

s = Series(randn(1000), LearningRate(.7), QuantileISGD())
value(s)
QuantileMM()

Approximate quantiles via an online MM algorithm.

Example

s = Series(randn(1000), LearningRate(.7), QuantileMM())
value(s)
QuantileSGD()

Approximate quantiles via stochastic gradient descent.

Example

s = Series(randn(1000), LearningRate(.7), QuantileSGD())
value(s)
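Stochastic gradient descent on the check (pinball) loss gives the update q ← q + γ(t)(τ - 1{y < q}). The plain-Julia sketch below tracks a single τ with γ(t) = 1 / t^r, mirroring LearningRate(.7). The single-τ interface and the name `quantile_sgd` are assumptions; QuantileSGD itself tracks a vector of quantiles.

```julia
using Random

# SGD on the check loss: q ← q + γ(t) * (τ - 1{y < q}), with γ(t) = 1 / t^r.
function quantile_sgd(y, τ; r = 0.7)
    q = 0.0
    for (t, yi) in enumerate(y)
        γ = 1 / t^r
        q += γ * (τ - (yi < q))
    end
    return q
end

y = randn(MersenneTwister(3), 100_000)
quantile_sgd(y, 0.5), quantile_sgd(y, 0.9)   # near 0 and 1.28 for standard normal data
```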
SPGD(η)

Stochastic Proximal Gradient Descent with step size η
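A proximal gradient step takes a gradient step on the loss and then applies the proximal operator of the penalty. For an L1 penalty, that operator is soft-thresholding. The plain-Julia sketch below pairs squared-error loss with an L1 penalty, as in the StatLearn example. The constant step η, the simulated data, and the names `soft_threshold`, `spgd_step!`, and `fit_spgd` are illustrative assumptions.

```julia
using Random

# Soft-thresholding, the proximal operator of t * |x|.
soft_threshold(x, t) = sign(x) * max(abs(x) - t, zero(x))

# One SPGD step for 0.5 * (y - x'β)^2 + λ * ‖β‖₁.
function spgd_step!(β, x, y, η, λ)
    r = y - sum(x .* β)                        # residual
    g = -r .* x                                # gradient of the loss in β
    β .= soft_threshold.(β .- η .* g, η * λ)   # gradient step, then prox
    return β
end

function fit_spgd(n; η = 0.01, λ = 0.01, rng = MersenneTwister(0))
    βtrue = [1.0, 0.0, -1.0]
    β = zeros(3)
    for _ in 1:n
        x = randn(rng, 3)
        y = sum(x .* βtrue) + 0.1 * randn(rng)
        spgd_step!(β, x, y, η, λ)
    end
    return β
end

fit_spgd(20_000)   # roughly [1, 0, -1], shrunk slightly toward 0 by the penalty
```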

Variance()

Univariate variance.

Example

s = Series(randn(100), Variance())
value(s)
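A numerically stable way to compute a variance in one pass is Welford's update, sketched below in plain Julia. The sketch reports the sample variance with the n - 1 denominator, matching Statistics.var. Whether Variance() uses the same denominator is an assumption, and `RunningVar`, `update!`, and `samplevar` are hypothetical names.

```julia
using Statistics

# Welford's online variance: count, running mean, and running sum of squared deviations m2.
mutable struct RunningVar
    n::Int
    μ::Float64
    m2::Float64
end
RunningVar() = RunningVar(0, 0.0, 0.0)

function update!(o::RunningVar, y::Real)
    o.n += 1
    δ = y - o.μ            # deviation from the old mean
    o.μ += δ / o.n
    o.m2 += δ * (y - o.μ)  # times the deviation from the new mean
    return o
end

# Sample variance with the n - 1 denominator (an assumption, see above).
samplevar(o::RunningVar) = o.n > 1 ? o.m2 / (o.n - 1) : 0.0

y = randn(1000)
o = RunningVar()
foreach(yi -> update!(o, yi), y)
samplevar(o) ≈ var(y)   # true
```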
LearnBase.value (Method)

Map value to the stats field of a Series.

maprows(f::Function, b::Integer, data...)

Map f over the rows of data in batches of size b. Most usage is through do-block syntax.

Example

s = Series(Mean())
maprows(10, randn(100)) do yi
    fit!(s, yi)
    info("nobs: $(nobs(s))")
end
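The batching behind maprows can be sketched with Iterators.partition over the row indices, passing a view of each batch to f. `maprows_sketch` is hypothetical and handles a single vector or matrix, unlike maprows, which accepts several data arguments.

```julia
# Hypothetical stand-in for maprows: call f on consecutive batches of at most b rows.
function maprows_sketch(f, b::Integer, data::AbstractVecOrMat)
    for idx in Iterators.partition(axes(data, 1), b)
        f(selectdim(data, 1, idx))   # a view of the rows in idx
    end
    return nothing
end

sizes = Int[]
maprows_sketch(10, randn(95)) do batch
    push!(sizes, length(batch))
end
sizes   # nine batches of 10, then one of 5
```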
replicates(b)

Return the vector of replicates from Bootstrap b

OnlineStats.stats (Method)

Return the stats field of a Series.

StatsBase.confint (Function)
confint(b, coverageprob = .95)

Return a confidence interval for a Bootstrap b.

StatsBase.fit! (Method)
fit!(s, y)
fit!(s, y, w)

Update a Series s with more data y and optional weighting w.

Examples

y = randn(100)
w = rand(100)

s = Series(Mean())
fit!(s, y[1])        # one observation: use Series weight
fit!(s, y[1], w[1])  # one observation: override weight
fit!(s, y)           # multiple observations: use Series weight
fit!(s, y, w[1])     # multiple observations: override each weight with w[1]
fit!(s, y, w)        # multiple observations: y[i] uses weight w[i]
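The difference between using the Series weight and overriding it can be illustrated with a hypothetical `WeightedMean` type in plain Julia, not OnlineStats' internals. It assumes w acts as the singleton weight γ, so each observation moves the estimate to the convex combination (1 - γ)·old + γ·new. By default γ comes from EqualWeight, γ = 1 / t, which reproduces the ordinary mean.

```julia
# Hypothetical online mean driven by singleton weights γ.
mutable struct WeightedMean
    μ::Float64
    nobs::Int
end
WeightedMean() = WeightedMean(0.0, 0)

# Explicit (overriding) weight: convex combination of old estimate and new value.
function fit_weighted!(o::WeightedMean, y::Real, γ::Real)
    o.nobs += 1
    o.μ = (1 - γ) * o.μ + γ * y
    return o
end

# Default weight: EqualWeight, γ = 1 / t for the t-th observation.
fit_weighted!(o::WeightedMean, y::Real) = fit_weighted!(o, y, 1 / (o.nobs + 1))

o = WeightedMean()
foreach(y -> fit_weighted!(o, y), [1.0, 2.0, 3.0, 4.0])
o.μ                        # ≈ 2.5, the ordinary mean
fit_weighted!(o, 10.0, 1.0)
o.μ                        # 10.0: γ = 1 discards the history
```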