How fit! Works

Why is Fitting Based on Iteration?

Reason 1: OnlineStats doesn't want to make assumptions about the shape of your data

Consider CovMatrix, for which a single observation is an AbstractVector, Tuple, or NamedTuple. If I try to update it with a Matrix, it's ambiguous whether I want rows or columns of the matrix to be treated as individual observations.

By default, OnlineStats treats each row as an observation. To make the choice explicit, use the OnlineStatsBase.eachrow and OnlineStatsBase.eachcol functions, which efficiently iterate over the rows or columns of the matrix, respectively.

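using OnlineStats

fit!(CovMatrix(), eachrow(randn(1000,2)))  # each row of the 1000×2 matrix is one observation

fit!(CovMatrix(), eachcol(randn(2,1000)))  # each column of the 2×1000 matrix is one observation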
CovMatrix: n=1000 | value=[0.954284 0.0104256; 0.0104256 0.95253]

Reason 2: OnlineStats naturally works out-of-the-box with many data structures

Tabular data structures such as those in JuliaDB iterate over named tuples of rows, so things like this just work:

using JuliaDB

t = table(randn(100), randn(100))

fit!(2Mean(), t)

A Common Error

Consider the following example:

julia> fit!(Mean(), "asdf")
ERROR: The input for Mean is a Number.  Found Char.

This causes an error because:

  1. "asdf" is not a Number, so OnlineStats attempts to iterate through it
  2. Iterating through "asdf" begins with the character 'a', which is a Char rather than a Number, so Mean cannot be updated with it
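
The two steps above can be reproduced with a small toy model of the dispatch pattern. This is a minimal sketch, not OnlineStats' actual source: ToyStat, ToyMean, and update! are hypothetical names, and the explicit `isa` check is an assumption used to produce an error message in the same format as the one shown above.

```julia
# Toy stand-in for OnlineStat{T}; T is the type of a single observation.
abstract type ToyStat{T} end

# A running mean whose single observation is a Number (hypothetical type).
mutable struct ToyMean <: ToyStat{Number}
    μ::Float64
    n::Int
    ToyMean() = new(0.0, 0)
end

# Update rule for one observation (hypothetical helper).
function update!(o::ToyMean, y::Number)
    o.n += 1
    o.μ += (y - o.μ) / o.n
    return o
end

# Case 1: the data is a single observation of type T, so update directly.
fit!(o::ToyStat{T}, y::T) where {T} = update!(o, y)

# Case 2: anything else is treated as an iterable of observations.
function fit!(o::ToyStat{T}, ys) where {T}
    for y in ys
        # Reject elements that are not valid observations instead of recursing.
        y isa T || error("The input for $(nameof(typeof(o))) is a $T.  Found $(typeof(y)).")
        fit!(o, y)
    end
    return o
end

fit!(ToyMean(), 4.5)        # single observation: handled by case 1
fit!(ToyMean(), [1, 2, 3])  # a Vector is not a Number, so case 2 iterates; the mean is 2.0
# fit!(ToyMean(), "asdf")   # case 2 iterates, meets the Char 'a', and throws an error
```

In this sketch the more specific method (case 1) wins whenever the data already has the observation type, and every other input falls through to the iterating method, which is why a String ends up being consumed one Char at a time.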