API
OnlineStats.ADADELTA
OnlineStats.ADAGRAD
OnlineStats.ADAM
OnlineStats.ADAMAX
OnlineStats.AdaptiveBins
OnlineStats.AutoCov
OnlineStats.BiasVec
OnlineStats.Bootstrap
OnlineStats.CStat
OnlineStats.CallFun
OnlineStats.CountMap
OnlineStats.CovMatrix
OnlineStats.Diff
OnlineStats.Extrema
OnlineStats.FTSeries
OnlineStats.FastForest
OnlineStats.FastTree
OnlineStats.FitBeta
OnlineStats.FitCauchy
OnlineStats.FitGamma
OnlineStats.FitLogNormal
OnlineStats.FitMultinomial
OnlineStats.FitMvNormal
OnlineStats.FitNormal
OnlineStats.Group
OnlineStats.GroupBy
OnlineStats.Hist
OnlineStats.HyperLogLog
OnlineStats.IndexedPartition
OnlineStats.KMeans
OnlineStats.Lag
OnlineStats.LinReg
OnlineStats.LinRegBuilder
OnlineStats.MSPI
OnlineStats.Mean
OnlineStats.Moments
OnlineStats.Mosaic
OnlineStats.MovingTimeWindow
OnlineStats.MovingWindow
OnlineStats.NBClassifier
OnlineStats.OMAP
OnlineStats.OMAS
OnlineStats.OrderStats
OnlineStats.P2Quantile
OnlineStats.Part
OnlineStats.Partition
OnlineStats.PlotNN
OnlineStats.ProbMap
OnlineStats.Quantile
OnlineStats.RMSPROP
OnlineStats.ReservoirSample
OnlineStats.SGD
OnlineStats.Series
OnlineStats.StatHistory
OnlineStats.StatLearn
OnlineStats.Sum
OnlineStats.Variance
OnlineStatsBase.Bounded
OnlineStatsBase.EqualWeight
OnlineStatsBase.ExponentialWeight
OnlineStatsBase.HarmonicWeight
OnlineStatsBase.LearningRate
OnlineStatsBase.McclainWeight
OnlineStatsBase.Scaled
StatsBase.confint
OnlineStats.ADADELTA
— Type.ADADELTA(ρ = .95)
An extension of ADAGRAD.
OnlineStats.ADAGRAD
— Type.ADAGRAD()
A variation of SGD with element-wise weights generated by the average of the squared gradients.
OnlineStats.ADAM
— Type.ADAM(β1 = .99, β2 = .999)
A variant of SGD with element-wise learning rates generated by exponentially weighted first and second moments of the gradient.
OnlineStats.ADAMAX
— Type.ADAMAX(η, β1 = .9, β2 = .999)
ADAMAX with momentum parameters β1 and β2. ADAMAX is an extension of ADAM.
OnlineStats.AutoCov
— Type.AutoCov(b, T = Float64; weight=EqualWeight())
Calculate the auto-covariance/correlation for lags 0 to b for a data stream of type T.
Example
y = cumsum(randn(100))
o = AutoCov(5)
fit!(o, y)
autocov(o)
autocor(o)
OnlineStats.BiasVec
— Type.BiasVec(x)
Lightweight wrapper of a vector which adds a "bias" term at the end.
Example
BiasVec(rand(5))
OnlineStats.Bootstrap
— Type.Bootstrap(o::OnlineStat, nreps = 100, d = [0, 2])
Calculate an online statistical bootstrap of nreps replicates of o. For each call to fit!, any given replicate will be updated rand(d) times (default is double or nothing).
Example
o = Bootstrap(Variance())
fit!(o, randn(1000))
confint(o, .95)
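The double-or-nothing scheme described above can be sketched in plain Julia. This is only an illustration, not the OnlineStats implementation: `online_bootstrap_means` is a hypothetical helper, it tracks a mean rather than a general OnlineStat, and the percentile interval at the end is an assumed way of turning replicates into a confidence interval.

```julia
using Random, Statistics

# Each replicate keeps its own running sum and count; every new observation
# is added rand(d) times to each replicate (0 or 2 times by default).
function online_bootstrap_means(xs; nreps = 100, d = [0, 2], rng = Random.default_rng())
    sums = zeros(nreps)
    counts = zeros(Int, nreps)
    for x in xs, r in 1:nreps
        w = rand(rng, d)          # how many times replicate r sees x
        sums[r] += w * x
        counts[r] += w
    end
    return sums ./ counts         # one mean per replicate
end

reps = online_bootstrap_means(randn(1000))
quantile(reps, [0.025, 0.975])    # percentile interval for the mean
```

Because every replicate sees each observation either zero or two times, no data need to be stored or resampled after the fact.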
OnlineStats.CStat
— Type.CStat(stat)
Track a univariate OnlineStat for complex numbers. A copy of stat is made to separately track the real and imaginary parts.
Example
y = randn(100) + randn(100)im
fit!(CStat(Mean()), y)
OnlineStats.CallFun
— Type.CallFun(o::OnlineStat, f::Function)
Call f(o) every time the OnlineStat o gets updated.
Example
o = CallFun(Mean(), println)
fit!(o, [0,0,1,1])
OnlineStats.CountMap
— Type.CountMap(T::Type)
CountMap(dict::AbstractDict{T, Int})
Track a dictionary that maps unique values to their number of occurrences. Similar to StatsBase.countmap.
Example
o = fit!(CountMap(Int), rand(1:10, 1000))
value(o)
probs(o)
OnlineStats.pdf(o, 1)
collect(keys(o))
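The idea behind a count map can be sketched with a plain Dict. This is a minimal illustration in base Julia, not the OnlineStats code; `count_values` is a hypothetical helper name.

```julia
# Count how often each distinct value occurs in a single pass.
function count_values(xs)
    counts = Dict{eltype(xs), Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

counts = count_values([1, 2, 2, 3, 3, 3])
total = sum(values(counts))
# Relative frequencies derived from the counts.
freqs = Dict(k => v / total for (k, v) in counts)
```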
OnlineStats.CovMatrix
— Type.CovMatrix(p=0; weight=EqualWeight())
CovMatrix(::Type{T}, p=0; weight=EqualWeight())
Calculate a covariance/correlation matrix of p variables. If the number of variables is unknown, leave the default p=0.
Example
o = fit!(CovMatrix(), randn(100, 4))
cor(o)
cov(o)
mean(o)
var(o)
OnlineStats.Diff
— Type.Diff(T::Type = Float64)
Track the difference and the last value.
Example
o = Diff()
fit!(o, [1.0, 2.0])
last(o)
diff(o)
OnlineStats.Extrema
— Type.Extrema(T::Type = Float64)
Maximum and minimum.
Example
o = fit!(Extrema(), rand(10^5))
extrema(o)
maximum(o)
minimum(o)
OnlineStats.FTSeries
— Type.FTSeries(stats...; filter=x->true, transform=identity)
Track multiple stats for one data stream that is filtered and transformed before being fitted.
FTSeries(T, stats...; filter, transform)
Create an FTSeries and specify the type T of the transformed values.
Example
o = FTSeries(Mean(), Variance(); transform=abs)
fit!(o, -rand(1000))
# Remove missing values represented as DataValues
using DataValues
y = DataValueArray(randn(100), rand(Bool, 100))
o = FTSeries(DataValue, Mean(); transform=get, filter=!isna)
fit!(o, y)
OnlineStats.FastForest
— Type.FastForest(p, nkeys=2; stat=FitNormal(), kw...)
Calculate a random forest where each variable is summarized by stat.
Keyword Arguments
nt=100: Number of trees in the forest
b=floor(Int, sqrt(p)): Number of random features for each tree to receive
maxsize=1000: Maximum size for any tree in the forest
splitsize=5000: Number of observations in any given node before splitting
λ = .05: Probability that each tree is updated on a new observation
Example
x, y = randn(10^5, 10), rand(1:2, 10^5)
o = fit!(FastForest(10), (x,y))
classify(o, x[1,:])
OnlineStats.FastTree
— Type.FastTree(p::Int, nclasses=2; stat=FitNormal(), maxsize=5000, splitsize=1000)
Calculate a decision tree of p predictor variables and classes 1, 2, …, nclasses. Nodes split when they reach splitsize observations until maxsize nodes are in the tree. Each variable is summarized by stat, which can be FitNormal() or Hist(nbins).
Example
x = randn(10^5, 10)
y = rand([1,2], 10^5)
o = fit!(FastTree(10), (x,y))
xi = randn(10)
classify(o, xi)
OnlineStats.FitBeta
— Type.FitBeta(; weight)
Online parameter estimate of a Beta distribution (Method of Moments).
Example
o = fit!(FitBeta(), rand(1000))
OnlineStats.FitCauchy
— Type.FitCauchy(; alg, rate)
Approximate parameter estimation of a Cauchy distribution. Estimates are based on quantiles, so alg is passed to Quantile.
Example
o = fit!(FitCauchy(), randn(1000))
OnlineStats.FitGamma
— Type.FitGamma(; weight)
Online parameter estimate of a Gamma distribution (Method of Moments).
Example
using Random
o = fit!(FitGamma(), randexp(10^5))
OnlineStats.FitLogNormal
— Type.FitLogNormal()
Online parameter estimate of a LogNormal distribution (MLE).
Example
o = fit!(FitLogNormal(), exp.(randn(10^5)))
OnlineStats.FitMultinomial
— Type.FitMultinomial(p)
Online parameter estimate of a Multinomial distribution. The sum of counts does not need to be consistent across observations. Therefore, the n parameter of the Multinomial distribution is returned as 1.
Example
x = [1 2 3; 4 8 12]
fit!(FitMultinomial(3), x)
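As a rough illustration of why inconsistent row totals are not a problem, the category probabilities for the example data can be computed by hand. This sketch assumes the estimate is the normalized per-category mean; that is an assumption about the method, not something taken from the OnlineStats source.

```julia
# Estimate category probabilities from count rows whose totals differ.
x = [1 2 3; 4 8 12]                               # two observations, three categories
col_means = vec(sum(x, dims = 1)) ./ size(x, 1)   # mean count per category
p = col_means ./ sum(col_means)                   # normalize so the probabilities sum to 1
```

Both rows have the same proportions (1:2:3), so the estimate matches those proportions even though the rows sum to 6 and 24.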
OnlineStats.FitMvNormal
— Type.FitMvNormal(d)
Online parameter estimate of a d-dimensional MvNormal distribution (MLE).
Example
y = randn(100, 2)
o = fit!(FitMvNormal(2), y)
OnlineStats.FitNormal
— Type.FitNormal()
Calculate the parameters of a normal distribution via maximum likelihood.
Example
o = fit!(FitNormal(), randn(1000))
OnlineStats.Group
— Type.Group(stats::OnlineStat...)
Group(; stats...)
Group(collection)
Create a vector-input stat from several scalar-input stats. For a new observation y, y[i] is sent to stats[i].
Examples
x = randn(100, 2)
fit!(Group(Mean(), Mean()), x)
fit!(Group(Mean(), Variance()), x)
o = fit!(Group(m1 = Mean(), m2 = Mean()), x)
o.stats.m1
o.stats.m2
OnlineStats.GroupBy
— Type.GroupBy{T}(stat)
Update stat for each group (of type T).
Example
x = rand(1:10, 10^5)
y = x .+ randn(10^5)
fit!(GroupBy{Int}(Extrema()), zip(x,y))
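Conceptually, a grouped statistic keeps one copy of the statistic per group label. The following sketch shows this for the minimum and maximum using a plain Dict; `groupby_extrema` is a hypothetical helper written for illustration, not part of OnlineStats.

```julia
# Track (min, max) of y separately for every distinct group label.
function groupby_extrema(pairs)
    out = Dict{Int, Tuple{Float64, Float64}}()
    for (g, y) in pairs
        lo, hi = get(out, g, (Inf, -Inf))    # start a new group with empty extrema
        out[g] = (min(lo, y), max(hi, y))
    end
    return out
end

res = groupby_extrema(zip([1, 2, 1, 2], [0.5, 3.0, -1.0, 4.0]))
```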
OnlineStats.Hist
— Type.Hist(nbins)
Hist(edges)
Calculate a histogram over fixed edges or adaptive nbins.
Example
using OnlineStats, Statistics
y = randn(10^6)
o = fit!(Hist(20), y)
quantile(o)
mean(o)
var(o)
std(o)
extrema(o)
OnlineStats.pdf(o, 0.0)
OnlineStats.cdf(o, 0.0)
OnlineStats.HyperLogLog
— Type.HyperLogLog(b, T::Type = Number) # 4 ≤ b ≤ 16
Approximate count of distinct elements.
Example
fit!(HyperLogLog(12), rand(1:10,10^5))
OnlineStats.IndexedPartition
— Type.IndexedPartition(T, stat, b=100)
Summarize data with stat over a partition of size b where the data is indexed by a variable of type T.
Example
o = IndexedPartition(Float64, Hist(10))
fit!(o, randn(10^4, 2))
using Plots
plot(o)
OnlineStats.KMeans
— Type.KMeans(p, k; rate=LearningRate(.6))
Approximate K-Means clustering of k clusters and p variables.
Example
clusters = rand(Bool, 10^5)
x = [clusters[i] > .5 ? randn() : 5 + randn() for i in 1:10^5, j in 1:2]
o = fit!(KMeans(2, 2), x)
OnlineStats.Lag
— Type.Lag{T}(b::Integer)
Store the last b values for a data stream of type T. Values are stored as
$v(t), v(t-1), v(t-2), …, v(t-b+1)$
Example
o = fit!(Lag{Int}(10), 1:12)
o[1]
o[end]
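The storage order described above (newest value first) can be sketched with an ordinary vector. `push_lag!` is a hypothetical helper used only for illustration; it is not how OnlineStats necessarily stores the values internally.

```julia
# Keep the b most recent values, newest first: v(t), v(t-1), …, v(t-b+1).
function push_lag!(buf::Vector, x, b::Integer)
    pushfirst!(buf, x)               # newest value goes to the front
    length(buf) > b && pop!(buf)     # drop the oldest once the buffer is full
    return buf
end

buf = Int[]
for x in 1:12
    push_lag!(buf, x, 10)
end
buf[1], buf[end]    # newest and oldest stored values
```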
OnlineStats.LinReg
— Type.LinReg()
Linear regression, optionally with element-wise ridge regularization.
Example
x = randn(100, 5)
y = x * (1:5) + randn(100)
o = fit!(LinReg(), (x,y))
coef(o)
coef(o, .1)
coef(o, [0,0,0,0,Inf])
OnlineStats.LinRegBuilder
— Type.LinRegBuilder(p)
Create an object from which any variable can be regressed on any other set of variables, optionally with element-wise ridge regularization. The main function to use with LinRegBuilder is coef:
coef(o::LinRegBuilder, λ = 0; y=1, x=[2,3,...], bias=true, verbose=false)
Return the coefficients from regressing column y on columns x with ridge (L2Penalty) parameter λ. An intercept (bias) term is added by default.
Examples
x = randn(1000, 10)
o = fit!(LinRegBuilder(), x)
coef(o; y=3, verbose=true)
coef(o; y=7, x=[2,5,4])
OnlineStats.MSPI
— Type.MSPI()
Majorized Stochastic Proximal Iteration.
OnlineStats.Mean
— Type.Mean(; weight=EqualWeight())
Track a univariate mean.
Update
$μ = (1 - γ) * μ + γ * x$
Example
@time fit!(Mean(), randn(10^6))
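The update rule above can be sketched in plain Julia. This is a minimal illustration rather than the OnlineStats implementation; it assumes that equal weighting corresponds to γ = 1/n at the n-th observation, and `running_mean` is a hypothetical helper name.

```julia
# Sketch of μ = (1 - γ) * μ + γ * x with γ = 1/n (assumed equal weighting).
function running_mean(xs)
    μ = 0.0
    for (n, x) in enumerate(xs)
        γ = 1 / n                  # weight given to the newest observation
        μ = (1 - γ) * μ + γ * x    # move the old mean toward x
    end
    return μ
end

m = running_mean([1.0, 2.0, 3.0, 4.0])   # equals the ordinary mean of the data
```

With a different weight sequence for γ, the same update gives more or less influence to recent observations.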
OnlineStats.Moments
— Type.Moments(; weight=EqualWeight())
First four non-central moments.
Example
o = fit!(Moments(), randn(1000))
mean(o)
var(o)
std(o)
skewness(o)
kurtosis(o)
OnlineStats.Mosaic
— Type.Mosaic(T::Type, S::Type)
Data structure for generating a mosaic plot, a comparison between two categorical variables.
Example
using OnlineStats, Plots
x = [rand() > .8 for i in 1:10^5]
y = rand([1,2,2,3,3,3], 10^5)
o = fit!(Mosaic(Bool, Int), zip(x, y))
plot(o)
OnlineStats.MovingTimeWindow
— Type.MovingTimeWindow{T<:TimeType, S}(window::DatePeriod)
MovingTimeWindow(window::DatePeriod; valtype=Float64, timetype=Date)
Fit a moving window of data based on time stamps. Each observation must be a Tuple, NamedTuple, or Pair where the first item is <: Dates.TimeType. Only observations with time stamps in the range
$most_recent_datetime - window <= time_stamp <= most_recent_datetime$
are kept track of.
Example
using Dates
dts = Date(2010):Day(1):Date(2011)
y = rand(length(dts))
o = MovingTimeWindow(Day(4); timetype=Date, valtype=Float64)
fit!(o, zip(dts, y))
OnlineStats.MovingWindow
— Type.MovingWindow(b, T)
MovingWindow(T, b)
Track a moving window of b items of type T.
Example
o = MovingWindow(10, Int)
fit!(o, 1:14)
OnlineStats.NBClassifier
— Type.NBClassifier(p::Int, T::Type; stat = Hist(15))
Calculate a naive Bayes classifier for classes of type T and p predictors. For each class K, predictor variables are summarized by the stat.
Example
x, y = randn(10^4, 10), rand(Bool, 10^4)
o = fit!(NBClassifier(10, Bool), (x,y))
collect(keys(o))
probs(o)
xi = randn(10)
predict(o, xi)
classify(o, xi)
OnlineStats.OMAP
— Type.OMAP()
Online MM via Averaged Parameter.
OnlineStats.OMAS
— Type.OMAS()
Online MM via Averaged Surrogate.
OnlineStats.OrderStats
— Type.OrderStats(b::Int, T::Type = Float64; weight=EqualWeight())
Average order statistics with batches of size b.
Example
o = fit!(OrderStats(100), randn(10^5))
quantile(o, [.25, .5, .75])
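One way to read "average order statistics with batches of size b" is sketched below: sort each batch of b values and average the sorted batches position by position. This interpretation, the handling of an incomplete final batch, and the name `average_order_stats` are assumptions made for illustration.

```julia
# Average the sorted values of consecutive batches of size b, position by position.
function average_order_stats(xs::AbstractVector, b::Integer)
    nbatches = length(xs) ÷ b            # an incomplete final batch is ignored
    avg = zeros(b)
    for i in 1:nbatches
        batch = sort(xs[(i - 1) * b + 1 : i * b])
        avg .+= (batch .- avg) ./ i      # running mean over the sorted batches
    end
    return avg
end

avg = average_order_stats([3.0, 1.0, 2.0, 6.0, 4.0, 5.0], 3)
```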
OnlineStats.P2Quantile
— Type.P2Quantile(τ = 0.5)
Calculate the approximate quantile via the P^2 algorithm. It is more computationally expensive than the algorithms used by Quantile, but also more accurate.
Ref: https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
Example
fit!(P2Quantile(.5), rand(10^5))
OnlineStats.Partition
— Type.Partition(stat, nparts=100)
Split a data stream into nparts where each part is summarized by stat.
Example
o = Partition(Extrema())
fit!(o, cumsum(randn(10^5)))
using Plots
plot(o)
OnlineStats.PlotNN
— Type.PlotNN(b=300)
Approximate scatterplot of b centers. This implementation is too slow to be useful.
Example
using Plots
x = randn(10^4)
y = x + randn(10^4)
plot(fit!(PlotNN(), zip(x, y)))
OnlineStats.ProbMap
— Type.ProbMap(T::Type; weight=EqualWeight())
ProbMap(A::AbstractDict{T, Float64}; weight=EqualWeight())
Track a dictionary that maps unique values to their probabilities. Similar to CountMap, but uses a weighting mechanism.
Example
o = ProbMap(Int)
fit!(o, rand(1:10, 1000))
probs(o)
OnlineStats.Quantile
— Type.Quantile(q = [.25, .5, .75]; alg=OMAS(), rate=LearningRate(.6))
Calculate quantiles via a stochastic approximation algorithm (OMAS, SGD, ADAGRAD, or MSPI). For better (although slower) approximations, see P2Quantile and Hist.
Example
fit!(Quantile(), randn(10^5))
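To give a feel for the stochastic approximation approach, here is a sketch of a plain SGD update for a single quantile. It is not the OnlineStats code: `sgd_quantile` is a hypothetical helper, and the step size γ = t^(-r) with r = 0.6 is an assumed reading of LearningRate(.6).

```julia
# Stochastic gradient step on the check (pinball) loss for the τ-quantile.
function sgd_quantile(xs, τ; r = 0.6)
    q = 0.0
    for (t, x) in enumerate(xs)
        γ = t^(-r)                 # assumed decaying step size
        q += γ * (τ - (x < q))     # step down if x < q, up otherwise
    end
    return q
end

q = sgd_quantile(rand(10^5), 0.5)  # approximate median of Uniform(0, 1) data
```

The estimate settles where the fraction of observations below q matches τ, which is why only the current value of q has to be stored.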
OnlineStats.RMSPROP
— Type.RMSPROP(α = .9)
A variation of ADAGRAD that uses element-wise weights generated by an exponentially weighted mean of the squared gradients.
OnlineStats.ReservoirSample
— Type.ReservoirSample(k::Int, T::Type = Float64)
Create a sample without replacement of size k. After running through n observations (n ≥ k), the probability of an observation being in the sample is k / n.
Example
fit!(ReservoirSample(100, Int), 1:1000)
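Reservoir sampling can be sketched in plain Julia as follows. This is the textbook Algorithm R, shown on the assumption that it matches the intent of ReservoirSample; it is not taken from the OnlineStats source, and `reservoir_sample` is a hypothetical name.

```julia
using Random

# Algorithm R: keep the first k items, then let the n-th item replace a
# uniformly chosen slot with probability k / n.
function reservoir_sample(xs, k; rng = Random.default_rng())
    sample = Vector{eltype(xs)}()
    for (n, x) in enumerate(xs)
        if n <= k
            push!(sample, x)             # fill the reservoir first
        else
            j = rand(rng, 1:n)
            j <= k && (sample[j] = x)    # overwrite slot j when it falls in the reservoir
        end
    end
    return sample
end

s = reservoir_sample(1:1000, 100)
```

The sample is built in a single pass and never holds more than k items, so the stream length does not need to be known in advance.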
OnlineStats.SGD
— Type.SGD()
Stochastic Gradient Descent.
OnlineStats.Series
— Type.Series(stats)
Series(stats...)
Series(; stats...)
Track a collection of stats for one data stream.
Example
s = Series(Mean(), Variance())
fit!(s, randn(1000))
OnlineStats.StatHistory
— Type.StatHistory(stat, b)
Track a moving window (previous b copies) of stat.
Example
fit!(StatHistory(Mean(), 10), 1:20)
OnlineStats.StatLearn
— Type.StatLearn(p, args...; rate=LearningRate())
Fit a model that is linear in the parameters.
The (offline) objective function that StatLearn approximately minimizes is
$(1/n) ∑ᵢ f(yᵢ, xᵢ'β) + ∑ⱼ λⱼ g(βⱼ),$
where $f$ is a loss function of a single response and linear predictor, the $λⱼ$ are nonnegative regularization parameters, and $g$ is a penalty function.
Arguments
loss = .5 * L2DistLoss()
penalty = NoPenalty()
algorithm = SGD()
rate = LearningRate(.6) (keyword arg)
Example
x = randn(1000, 5)
y = x * range(-1, stop=1, length=5) + randn(1000)
o = fit!(StatLearn(5, MSPI()), (x, y))
coef(o)
OnlineStats.Sum
— Type.Sum(T::Type = Float64)
Track the overall sum.
Example
fit!(Sum(Int), fill(1, 100))
OnlineStats.Variance
— Type.Variance(; weight=EqualWeight())
Univariate variance.
Example
o = fit!(Variance(), randn(10^6))
mean(o)
var(o)
std(o)
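A single-pass variance can be sketched with Welford's algorithm. Note that this is a textbook technique substituted in for illustration; the weighted update that OnlineStats uses may differ, and `online_variance` is a hypothetical helper name.

```julia
using Statistics

# Welford's single-pass update for the mean and unbiased sample variance.
function online_variance(xs)
    n, μ, m2 = 0, 0.0, 0.0
    for x in xs
        n += 1
        δ = x - μ
        μ += δ / n
        m2 += δ * (x - μ)    # second factor uses the updated mean
    end
    return μ, m2 / (n - 1)
end

y = randn(10^4)
m, v = online_variance(y)    # compare with mean(y) and var(y)
```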
StatsBase.confint
— Function.confint(b::Bootstrap, coverageprob = .95)
Return a confidence interval for a Bootstrap b.
OnlineStats.AdaptiveBins
— Type.Calculate a histogram adaptively.
Ref: http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
OnlineStats.Part
— Type.Part(stat, a, b)
stat summarizes a Y variable over an X variable's range a to b.