Statistics and Models
Univariate Statistics
| Statistic | OnlineStat |
|---|---|
| Mean | Mean |
| Variance | Variance |
| Quantiles | Quantile, OrderStats, and P2Quantile |
| Maximum/Minimum | Extrema |
| Skewness and kurtosis | Moments |
| Sum | Sum |
| Geometric Mean | GeometricMean |
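
The stats in the table above share one update interface. The following is a minimal sketch, assuming the package's exported `fit!`, `value`, and `nobs` functions and using simulated data in place of a real stream:

```julia
using OnlineStats

# Simulated observations (illustrative data only, not from the docs)
y = randn(10_000)

# `fit!` updates a stat with each observation and returns the stat itself
m = fit!(Mean(), y)
v = fit!(Variance(), y)
e = fit!(Extrema(), y)

println("mean              = ", value(m))
println("variance          = ", value(v))
println("extrema           = ", value(e))
println("observations seen = ", nobs(m))
```

Each stat keeps only a small fixed-size state, so the same calls work when the data arrive in batches rather than as one vector.
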
Plotting (see Data Visualization)
| Plot | OnlineStat |
|---|---|
| Big Data Viz | Partition, IndexedPartition, KIndexedPartition |
| Mosaic Plot | Mosaic |
| HeatMap | HeatMap |
Time Series
| Statistic | OnlineStat |
|---|---|
| Difference | Diff |
| Lag | Lag |
| Autocorrelation/autocovariance | AutoCov |
| Tracked history | Trace, StatLag |
Multivariate Analysis
| Statistic/Model | OnlineStat |
|---|---|
| Covariance/correlation matrix | CovMatrix |
| Principal components analysis | CovMatrix, CCIPCA |
| K-means clustering | KMeans |
| Multiple univariate statistics | Group |
Nonparametric Density Estimation
| Statistic/Model | OnlineStat |
|---|---|
| Histograms/continuous density | Hist, KHist, and ExpandingHist |
| ASH density (semiparametric, similar to KDE) | Ash |
| Approximate order statistics | OrderStats |
| Count for each unique value | CountMap |
| Approximate CDF | OrderStats |
Parametric Density Estimation
| Distribution | OnlineStat |
|---|---|
| Beta | FitBeta |
| Cauchy | FitCauchy |
| Gamma | FitGamma |
| LogNormal | FitLogNormal |
| Normal | FitNormal |
| Multinomial | FitMultinomial |
| MvNormal | FitMvNormal |
Machine/Statistical Learning
| Model | OnlineStat |
|---|---|
| Linear (also ridge) regression | LinReg, LinRegBuilder |
| Decision Trees | FastTree |
| Random Forest | FastForest |
| Naive Bayes Classifier | NBClassifier |
| ML via Stochastic Approximation | StatLearn |
Other
| Statistic/Model | OnlineStat |
|---|---|
| Handling Missing Data | FilterTransform, CountMissing, SkipMissing |
| Statistical Bootstrap | Bootstrap |
| Approx. count of distinct elements | HyperLogLog |
| Approx. count of occurrences | CountMinSketch |
| Random sample | ReservoirSample |
| Moving Window | MovingWindow, MovingTimeWindow |
Collection of Stats
| Statistic/Model | OnlineStat |
|---|---|
| Univariate data stream | Series |
| Multivariate data streams | Group |
| Group by categorical variable | GroupBy |
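
As a rough illustration of the first row, a `Series` can track several stats over one univariate stream. This sketch assumes that `value` on a `Series` returns one value per wrapped stat, and the two batches are simulated data:

```julia
using OnlineStats

# One collection that updates a mean and a variance together
s = Series(Mean(), Variance())

# Feed the stream in two batches, as if the data arrived over time
for chunk in (randn(500), randn(500))
    fit!(s, chunk)
end

# Assumption: `value(s)` yields the values of the wrapped stats, in order
μ, σ² = value(s)
println("mean = ", μ, ", variance = ", σ²)
println("observations seen = ", nobs(s))
```
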
Stochastic Approximation with StatLearn
Regression and Classification Losses
| Loss | Function |
|---|---|
| $L_{2}$ Loss (squared error) | OnlineStats.l2regloss |
| $L_{1}$ Loss (absolute error) | OnlineStats.l1regloss |
| Logistic Loss | OnlineStats.logisticloss |
| $L_{1}$ Hinge Loss | OnlineStats.l1hingeloss |
| Generalized distance weighted discrimination | OnlineStats.DWDLoss |
Penalty/regularization functions
| Penalty | Function |
|---|---|
| None | zero |
| LASSO ($L_{1}$ penalty) | abs |
| Ridge ($L_{2}$ penalty) | abs2 |
| Elastic Net | OnlineStats.ElasticNet |
Optimization Algorithms
| Algorithm | Constructor |
|---|---|
| Stochastic Gradient Descent | SGD |
| RMSProp | RMSPROP |
| AdaGrad | ADAGRAD |
| AdaDelta | ADADELTA |
| ADAM | ADAM |
| ADAMax | ADAMAX |
| MSPI (Majorized Stochastic Proximal Iteration) | MSPI |
| Online Majorization-Minimization (MM) - Averaged Surrogate | OMAS |
| Online MM - Averaged Parameter | OMAP |
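
The three tables above supply the pieces passed to `StatLearn`: a loss, a penalty, and an algorithm. The sketch below fits a linear model to simulated data. The constructor form `StatLearn(loss, algorithm)` and the `coef` accessor are assumptions based on recent OnlineStats releases, so check them against the installed version before relying on them.

```julia
using OnlineStats

# Simulated regression problem (illustrative data only)
n, p = 10_000, 5
x = randn(n, p)
β = collect(range(-1, stop = 1, length = p))
y = x * β + randn(n)

# Assumed constructor form: a loss from the loss table and an
# algorithm from the algorithm table; penalty left at its default
o = StatLearn(OnlineStats.l2regloss, SGD())

# Each observation is a (predictor vector, response) pair
fit!(o, zip(eachrow(x), y))

# Estimated coefficients, one per predictor
println(coef(o))
```

Swapping `SGD()` for another constructor from the algorithm table, or `OnlineStats.l2regloss` for another loss, should change only the constructor line.
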