OnlineStats is a Julia package that provides online, parallelizable algorithms for statistics. Online algorithms are well suited to streaming data, or to data too large to hold in memory. Observations are processed one at a time, and all algorithms use O(1) memory.
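To make the O(1)-memory claim concrete, here is a minimal from-scratch sketch of an online mean in plain Julia. It keeps only a count and a running mean, so its memory does not grow with the number of observations. RunningMean and update! are hypothetical names chosen for illustration; this is not how OnlineStats implements Mean.

```julia
# Hypothetical sketch: an online mean that stores only two numbers,
# however many observations it has seen.
mutable struct RunningMean
    n::Int       # number of observations so far
    μ::Float64   # mean of those observations
end
RunningMean() = RunningMean(0, 0.0)

# Absorb one observation: μₙ = μₙ₋₁ + (x - μₙ₋₁) / n
function update!(o::RunningMean, x::Real)
    o.n += 1
    o.μ += (x - o.μ) / o.n
    return o
end

rmean = RunningMean()
for x in (2.0, 4.0, 9.0)   # observations arrive one at a time and are not stored
    update!(rmean, x)
end
rmean.μ   # 5.0
```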



Summary of Usage

Every statistic or model is a type, and each is a subtype of OnlineStat (<: OnlineStat)

using OnlineStats
using Statistics  # provides mean and var (used below) on Julia 0.7 and later

m = Mean()
v = Variance()

OnlineStats are grouped together in a Series

s = Series(m, v)

Updating a Series updates the contained OnlineStats

y = randn(1000)

# Fit the whole vector at once; this is equivalent to
# for yi in y
#     fit!(s, yi)
# end
fit!(s, y)
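The design described above (one type per statistic, plus a container that forwards each observation to every statistic it holds) can be sketched without the package. ToyStat, ToyMean, ToyMax, ToySeries and update! are hypothetical names, not part of the OnlineStats API; the sketch only illustrates why updating the container also updates the statistics passed into it.

```julia
# Hypothetical names throughout; this mirrors the design, not the OnlineStats API.
abstract type ToyStat end

# Running mean: O(1) state
mutable struct ToyMean <: ToyStat
    n::Int
    μ::Float64
end
ToyMean() = ToyMean(0, 0.0)

# Running maximum: O(1) state
mutable struct ToyMax <: ToyStat
    m::Float64
end
ToyMax() = ToyMax(-Inf)

# Each statistic knows how to absorb one observation
update!(o::ToyMean, x::Real) = (o.n += 1; o.μ += (x - o.μ) / o.n; o)
update!(o::ToyMax, x::Real) = (o.m = max(o.m, x); o)

# The container stores references to the statistics, not copies
struct ToySeries
    stats::Vector{ToyStat}
end
ToySeries(stats::ToyStat...) = ToySeries(collect(ToyStat, stats))

# Forward a single observation to every contained statistic
function update!(s::ToySeries, x::Real)
    for o in s.stats
        update!(o, x)
    end
    return s
end

# Treat a vector as a stream of single observations
update!(s::ToySeries, xs::AbstractVector{<:Real}) = (foreach(x -> update!(s, x), xs); s)

tmean, tmax = ToyMean(), ToyMax()
ts = ToySeries(tmean, tmax)
update!(ts, [1.0, 5.0, 3.0])
tmean.μ   # 3.0
tmax.m    # 5.0
```

Because the container holds the objects themselves rather than copies, tmean and tmax reflect the update even though only ts was passed to update!.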

OnlineStats have a value

value(m) ≈ mean(y)    
value(v) ≈ var(y)  
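Computing a variance in a single pass needs some care, because the naive sum-of-squares formula can lose precision. A common choice is Welford's algorithm, sketched below in plain Julia. It is a standard textbook method shown for illustration, not necessarily the update OnlineStats uses; Welford and variance are hypothetical names. The result uses the n - 1 denominator, the same convention as Statistics.var.

```julia
using Statistics

# Welford's one-pass algorithm (textbook method; names are hypothetical)
mutable struct Welford
    n::Int
    μ::Float64    # running mean
    M2::Float64   # running sum of squared deviations from the mean
end
Welford() = Welford(0, 0.0, 0.0)

function update!(w::Welford, x::Real)
    w.n += 1
    δ = x - w.μ            # deviation from the old mean
    w.μ += δ / w.n
    w.M2 += δ * (x - w.μ)  # old deviation times new deviation
    return w
end

# Sample variance with the n - 1 denominator, as in Statistics.var
variance(w::Welford) = w.n > 1 ? w.M2 / (w.n - 1) : NaN

xs = randn(1000)
w = Welford()
foreach(x -> update!(w, x), xs)

# Both comparisons should hold up to floating-point rounding
isapprox(w.μ, mean(xs); atol = 1e-10)
variance(w) ≈ var(xs)
```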

Merging a Series merges the contained OnlineStats

See Parallel Computation.

y2 = randn(123)

s2 = Series(y2, Mean(), Variance())  # create a Series and fit it to y2

merge!(s, s2)  # s, and therefore m and v, now summarize vcat(y, y2)

value(m) ≈ mean(vcat(y, y2))    
value(v) ≈ var(vcat(y, y2))  
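Merging works because a mean and a variance can be combined from two partial summaries without revisiting the raw data. The sketch below uses the pairwise combination formula of Chan, Golub and LeVeque on (count, mean, sum of squared deviations) triples. VarSummary, summarize, combine and variance are hypothetical names, and this illustrates the idea rather than reproducing OnlineStats' own merge code.

```julia
using Statistics

# (count, mean, sum of squared deviations) for one chunk of data; hypothetical type
struct VarSummary
    n::Int
    μ::Float64
    M2::Float64
end

# Summarize a chunk directly, as a worker might do for its share of the data
function summarize(x::AbstractVector{<:Real})
    μ = mean(x)
    return VarSummary(length(x), μ, sum(abs2, x .- μ))
end

# Pairwise combination (Chan, Golub & LeVeque): no raw data needed
function combine(a::VarSummary, b::VarSummary)
    n = a.n + b.n
    δ = b.μ - a.μ
    μ = a.μ + δ * b.n / n
    M2 = a.M2 + b.M2 + δ^2 * a.n * b.n / n
    return VarSummary(n, μ, M2)
end

variance(s::VarSummary) = s.M2 / (s.n - 1)

x1, x2 = randn(1000), randn(123)
combined = combine(summarize(x1), summarize(x2))

# Both comparisons should hold up to floating-point rounding
isapprox(combined.μ, mean(vcat(x1, x2)); atol = 1e-10)
variance(combined) ≈ var(vcat(x1, x2))
```

Each summary could be computed on a different worker; only three numbers per chunk need to be sent back and combined.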

Much more than means and variances

OnlineStats provides many more statistics and models; see Statistics and Models.