Data Visualization

Note

Each of the following examples plots one million data points, but the same approach scales to arbitrarily many observations, because only a summary (an OnlineStat) of the data is plotted.

Partitions

The Partition type summarizes sections of a data stream using any OnlineStat. This makes it well suited to visualizing huge datasets, because summaries are plotted instead of every single observation.
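To make the idea concrete, here is a minimal sketch of how a stream can be summarized with a bounded number of parts. It uses only Base Julia and is not the OnlineStats implementation: the names MeanPart, MeanPartition, merge_parts, and add! are hypothetical, and the merging rule (combine neighboring parts pairwise once the limit is exceeded) is an assumption chosen for simplicity.

```julia
# Hypothetical illustration; these types are not part of OnlineStats.
mutable struct MeanPart
    n::Int         # number of observations summarized by this part
    mean::Float64  # running mean of those observations
end

# Combine two adjacent parts, weighting each mean by its count.
function merge_parts(a::MeanPart, b::MeanPart)
    n = a.n + b.n
    return MeanPart(n, (a.n * a.mean + b.n * b.mean) / n)
end

mutable struct MeanPartition
    parts::Vector{MeanPart}
    maxparts::Int  # upper bound on the number of stored parts
    partsize::Int  # observations a part holds before a new part starts
end
MeanPartition(maxparts::Int) = MeanPartition(MeanPart[], maxparts, 1)

function add!(p::MeanPartition, y::Real)
    if isempty(p.parts) || p.parts[end].n >= p.partsize
        # The newest part is full (or there is none): start a new one.
        push!(p.parts, MeanPart(1, float(y)))
    else
        # Update the running mean of the newest part.
        cur = p.parts[end]
        cur.n += 1
        cur.mean += (y - cur.mean) / cur.n
    end
    if length(p.parts) > p.maxparts
        # Too many parts: merge neighbors pairwise and double the part size.
        merged = MeanPart[]
        for i in 1:2:length(p.parts)-1
            push!(merged, merge_parts(p.parts[i], p.parts[i+1]))
        end
        isodd(length(p.parts)) && push!(merged, p.parts[end])
        p.parts = merged
        p.partsize *= 2
    end
    return p
end

# Feed a stream one observation at a time; memory stays bounded by maxparts.
p = MeanPartition(10)
foreach(y -> add!(p, y), 1:1000)
```

However long the stream is, only maxparts summaries are kept, so a plot of the parts has a fixed cost. Any statistic that can be updated one observation at a time and merged with another instance of itself could stand in for the mean in this sketch.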

Continuous Data

using OnlineStats, Plots

# Random walk plus noise
y = cumsum(randn(10^6)) + 100randn(10^6)

o = Partition(KHist(10))

fit!(o, y)

plot(o)

o = Partition(Series(Mean(), Extrema()))

fit!(o, y)

plot(o)

Categorical Data

y = rand(["a", "a", "b", "c"], 10^6)

o = Partition(CountMap(String), 75)

fit!(o, y)

plot(o)

Indexed Partitions

A Partition can only place the observation count (the order in which the data arrived) on the x-axis. To plot one variable against another, use an IndexedPartition.

x = randn(10^6)
y = x + randn(10^6)

o = fit!(IndexedPartition(Float64, KHist(40), 40), zip(x, y))

plot(o)
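
The idea behind indexing by a second variable can be sketched in Base Julia: group each (x, y) pair by the value of x and keep a running summary of y per group. This is a simplified illustration, not the OnlineStats implementation. The BinnedMean type and its add! function are hypothetical, and the fixed range with equal-width bins is an assumption made to keep the sketch short.

```julia
# Hypothetical illustration; not part of OnlineStats.
# Running mean of y within equal-width bins of x over the range [lo, hi).
struct BinnedMean
    lo::Float64
    hi::Float64
    counts::Vector{Int}     # observations seen per bin
    means::Vector{Float64}  # running mean of y per bin
end
BinnedMean(lo, hi, nbins::Int) = BinnedMean(lo, hi, zeros(Int, nbins), zeros(nbins))

function add!(b::BinnedMean, x::Real, y::Real)
    nbins = length(b.counts)
    # Map x to a bin index, clamping out-of-range values to the edge bins.
    i = clamp(floor(Int, (x - b.lo) / (b.hi - b.lo) * nbins) + 1, 1, nbins)
    b.counts[i] += 1
    b.means[i] += (y - b.means[i]) / b.counts[i]
    return b
end

# Summarize y against x using 40 bins on an assumed range of [-4, 4).
x = randn(10^6)
y = x + randn(10^6)
b = BinnedMean(-4, 4, 40)
foreach(xy -> add!(b, xy[1], xy[2]), zip(x, y))
```

Plotting b.means against the bin centers would give a 40-point summary of the relationship between x and y, regardless of how many pairs were observed.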