Details of Updating (fit!)
Core Principles
- Stats are subtypes of OnlineStat{T}, where T is the type of a single observation.
  - E.g. Mean <: OnlineStat{Number}
- fit!(o::OnlineStat{T}, data::T)
  - Update o with the single observation data.
- fit!(o::OnlineStat{T}, data::S), where data is not a T
  - Iterate through data and fit! each item.
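The following minimal sketch, built only on Base Julia, shows how these rules fit together. ToyStat, ToyMean, and _update! are hypothetical names invented for illustration and are not the OnlineStats implementation. The guard against inputs that iterate to themselves is also an assumption, added so that a Char fails with an error instead of recursing forever.

```julia
# Hypothetical toy types mirroring the core principles; not OnlineStats' own code.
abstract type ToyStat{T} end

mutable struct ToyMean <: ToyStat{Number}
    μ::Float64  # running mean
    n::Int      # number of observations seen so far
end
ToyMean() = ToyMean(0.0, 0)

# Incremental mean update for one observation: μ ← μ + (x - μ) / n
function _update!(o::ToyMean, x::Number)
    o.n += 1
    o.μ += (x - o.μ) / o.n
    return o
end

# Rule 2: `data` is a single observation of type T, so update directly.
fit!(o::ToyStat{T}, data::T) where {T} = _update!(o, data)

# Rule 3: any other `data` is treated as a collection of observations.
function fit!(o::ToyStat{T}, data) where {T}
    # Assumed guard: an object whose element type is its own type
    # (e.g. a Char, which iterates to itself) would otherwise recurse forever.
    eltype(typeof(data)) === typeof(data) &&
        error("The input for $(nameof(typeof(o))) is $T. Found $(typeof(data)).")
    for item in data
        fit!(o, item)  # each item goes back through dispatch
    end
    return o
end
```

Julia's dispatch selects the data::T method whenever data is a single observation, because that method is more specific than the untyped fallback. So fit!(ToyMean(), 5.0) updates once, while fit!(ToyMean(), [1, 2, 3, 4]) iterates and ends with a mean of 2.5. Nested collections such as [[1, 2], [3]] are flattened by the same recursion.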
Why is Fitting Based on Iteration?
Reason 1: OnlineStats doesn't make assumptions on the shape of your data
Consider CovMatrix, for which a single observation is an AbstractVector, Tuple, or NamedTuple. If I try to fit! it with a Matrix, it's ambiguous whether I want rows or columns of the matrix to be treated as individual observations.
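The ambiguity is not specific to OnlineStats. As a sketch using only the Statistics standard library, cov makes you choose a reading through its dims keyword, and the two readings of the same 1000×2 matrix produce results of very different sizes.

```julia
using Statistics  # standard library; provides cov

x = randn(1000, 2)

# Reading 1: each ROW is an observation of 2 variables -> 2×2 covariance.
c_rows = cov(x; dims = 1)

# Reading 2: each COLUMN is an observation of 1000 variables -> 1000×1000.
c_cols = cov(x; dims = 2)

# eachrow(x) and eachcol(x') yield the same 1000 length-2 vectors.
rows_match = all(r == c for (r, c) in zip(eachrow(x), eachcol(x')))

println(size(c_rows), " vs ", size(c_cols), "; rows match: ", rows_match)
# prints: (2, 2) vs (1000, 1000); rows match: true
```

With fit!, the iterator you pass (eachrow or eachcol) plays the role that dims plays here.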
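Passing eachrow or eachcol makes the choice explicit. Both calls below treat each row of x as one observation, because the columns of x' are the rows of x:
using OnlineStats
x = randn(1000, 2)
fit!(CovMatrix(), eachrow(x))
fit!(CovMatrix(), eachcol(x'))
CovMatrix: n=1_000 | value=[1.0189 -0.065982; -0.065982 1.07008]
Reason 2: OnlineStats works out of the box with many data structures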
Tabular data structures such as those in JuliaDB iterate over their rows as named tuples, so things like this just work:
using OnlineStats, JuliaDB
t = table(randn(100), randn(100))
fit!(2Mean(), t)  # 2Mean() groups two Means, one per column
A Common Error
Consider the following example:
julia> using OnlineStats

julia> fit!(Mean(), "asdf")
ERROR: The input for Mean is Number. Found Char.
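This causes an error because:
- "asdf" is not a Number, so OnlineStats attempts to iterate through it.
- Iterating through "asdf" begins with the character 'a'.
- 'a' is a Char, not a Number, so Mean cannot accept it as an observation, which produces the error above.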
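The facts this explanation relies on can be checked with Base alone. The sketch below does not call OnlineStats; it confirms that iterating a String yields Chars, that Char is not a Number, and that a Char iterates to itself, so descending further could never reach a Number. Treating that last point as the reason OnlineStats stops with an error rather than recursing is an assumption about its internals.

```julia
s = "asdf"

# Iterating a String yields Chars, one per character.
chars = collect(s)                 # ['a', 's', 'd', 'f']
c = first(s)                       # 'a'

is_char = c isa Char               # true
is_number = c isa Number           # false: Char is not a subtype of Number

# A Char is itself iterable and yields only itself, so "keep iterating
# until a Number appears" would never make progress on it.
iterates_to_itself = first(c) === c && eltype(typeof(c)) === Char  # true
```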