Parallel Computation

Two OnlineStats can be merged together, which facilitates embarrassingly parallel computations. Merging in OnlineStats is used by JuliaDB to run analytics in parallel on large persistent datasets.
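As a rough illustration of this pattern, the sketch below splits a vector into chunks, fits one Series per chunk on its own task, and then merges the partial results. It is not taken from JuliaDB. The helper name fit_chunk and the chunk size are hypothetical choices made for the example, and Threads.@spawn assumes Julia 1.3 or later.

```julia
using OnlineStats
using Statistics  # mean, used only for the comparison at the end

# Hypothetical helper: fit a fresh Series on one chunk and return it.
function fit_chunk(chunk)
    s = Series(Mean(), Variance())
    fit!(s, chunk)
    return s
end

y = randn(100_000)

# Split the data into four chunks (the chunk size is arbitrary).
chunks = collect(Iterators.partition(y, 25_000))

# Each task owns its Series, so no state is shared between tasks.
tasks = [Threads.@spawn fit_chunk(chunk) for chunk in chunks]
parts = fetch.(tasks)

# Merge every partial result into the first one.
total = parts[1]
for p in parts[2:end]
    merge!(total, p)
end

println(value(total))  # (mean, variance) over all of y
println(mean(y))       # offline mean, for comparison
```

Each chunk is fitted with many fit! updates and merged only once, so the number of merge! calls stays small.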

Note

In general, fit! is a cheaper operation than merge!.

ExactStat Merges

Many OnlineStats can compute exactly the same value as the corresponding offline estimator. For these types, the order of fit!-ting and merge!-ing does not matter (up to floating-point rounding).

using OnlineStats

y1 = randn(10_000)
y2 = randn(10_000)
y3 = randn(10_000)

s1 = Series(Mean(), Variance(), Hist(50))
s2 = Series(Mean(), Variance(), Hist(50))
s3 = Series(Mean(), Variance(), Hist(50))

fit!(s1, y1)
fit!(s2, y2)
fit!(s3, y3)

merge!(s1, s2)  # merge information from s2 into s1
merge!(s1, s3)  # merge information from s3 into s1
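One way to check the exactness claim is to compare a merged Series against the offline estimators computed on the concatenated data. The following is a minimal sketch, assuming Julia 1.x (where mean and var live in the Statistics standard library) and that value on a Series returns one value per stat, in order.

```julia
using OnlineStats
using Statistics  # offline mean and var

y1 = randn(10_000)
y2 = randn(10_000)

a = Series(Mean(), Variance())
b = Series(Mean(), Variance())

fit!(a, y1)
fit!(b, y2)
merge!(a, b)  # a now summarizes both y1 and y2

m, v = value(a)   # merged online estimates
y = vcat(y1, y2)  # the same data, seen all at once

println(isapprox(m, mean(y)))  # merged mean vs. offline mean
println(isapprox(v, var(y)))   # merged variance vs. offline variance
```

The comparison uses isapprox rather than == because the online and offline algorithms accumulate floating-point rounding differently.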

Other Merges

For OnlineStats that rely on approximations, merging isn't always a well-defined operation. OnlineStats will either make a sane choice for merging or print a warning that merging did not occur. Please open an issue to discuss a stat you believe should be mergeable.