API Reference

Batch versions of the algorithm

Kmeans

Gaussian Mixture Model (GMM)

Variational Gaussian Mixture Model (VBGMM)

Dirichlet Process Gaussian Mixture Model (DPGMM)

Online versions of the algorithm

Kmeans

class megamix.online.kmeans.Kmeans(n_components=1, window=1, kappa=1.0)

Kmeans model.

Parameters:
  • n_components (int, defaults to 1) – Number of clusters used.
  • window (int, defaults to 1) – The number of points used at the same time to update the parameters.
  • kappa (double, defaults to 1.0) –

    A coefficient in ]0.0,1.0] that controls how much weight the new points receive compared to the points already used.

    • If kappa is close to zero, the new points have a large weight and the model may take a long time to stabilize.

    • If kappa = 1.0, the new points have little weight and the model may not move far enough from its initialization.
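The effect of kappa can be illustrated with a small sketch. It assumes a decaying step size of the form gamma_t = t^(-kappa), a common choice in online EM-style updates; this page does not state the exact formula used by megamix, and the names step_size and online_update are hypothetical.

```python
def step_size(t, kappa):
    """Assumed step size gamma_t = t ** (-kappa) for update number t >= 1."""
    if not 0.0 < kappa <= 1.0:
        raise ValueError("kappa must lie in ]0.0, 1.0]")
    if t < 1:
        raise ValueError("t must be >= 1")
    return t ** (-kappa)


def online_update(old_stat, new_stat, t, kappa):
    """Blend a running statistic with the statistic of the newest points."""
    gamma = step_size(t, kappa)
    return (1.0 - gamma) * old_stat + gamma * new_stat


# Weight given to the new points after 1, 10 and 100 updates.
for kappa in (0.1, 0.5, 1.0):
    weights = [round(step_size(t, kappa), 3) for t in (1, 10, 100)]
    print(kappa, weights)
```

Under this assumption, a small kappa keeps the weight of new points high even after many updates, while kappa = 1.0 makes it shrink like 1/t.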

name

The name of the method: ‘Kmeans’

Type: str
log_weights

Contains the logarithm of the mixing coefficients of the model.

Type: array of floats (n_components)
means

Contains the computed means of the model.

Type: array of floats (n_components,dim)
N

The sufficient statistic updated during each iteration used to compute log_weights (this corresponds to the mixing coefficients).

Type: array of floats (n_components,)
X

The sufficient statistic updated during each iteration used to compute the means.

Type: array of floats (n_components,dim)
iter

The number of points which have been used to compute the model.

Type: int
_is_initialized

Ensures that the model has been initialized before using other methods such as fit(), distortion() or predict_assignements().

Type: bool
Raises: ValueError – if the parameters are inconsistent, for example if the number of clusters is negative, init_type is not in [‘resp’,‘mcw’]…

References

Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling, C. Dupuy & F. Bach

The remarkable k-means++, https://normaldeviate.wordpress.com/2012/09/30/the-remarkable-k-means/

fit(points, saving=None, file_name='model', saving_iter=2)

The k-means algorithm

Parameters:
  • points (array (n_points,dim)) – A 2D array of points on which the model will be trained.
  • saving_iter (int, defaults to 2) – How often the model is saved (see saving below).
  • file_name (str, defaults to ‘model’) – The name of the file (including the path).
Other Parameters:
  • saving (str | Optional) – A string in [‘log’,‘linear’]. In the following conditions, x is the parameter saving_iter (see above).

    • If ‘log’, the model is saved at every iteration for which log(iter)/log(x) is an integer.
    • If ‘linear’, the model is saved at every iteration for which iter/x is an integer.
Return type: None
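The two saving schedules can be made concrete with a short sketch. It is only an interpretation of the conditions stated for the saving parameter, not megamix's own code; the helper names is_power_of and should_save are hypothetical, and the integer test for the ‘log’ case is done with exact integer arithmetic to avoid floating-point rounding.

```python
def is_power_of(n, base):
    """Return True if n == base ** k for some integer k >= 0."""
    if n < 1 or base < 2:
        return False
    while n % base == 0:
        n //= base
    return n == 1


def should_save(iteration, saving=None, saving_iter=2):
    """Decide whether the model would be saved at this iteration."""
    if saving is None:
        return False
    if saving == 'linear':
        # iter / x is an integer
        return saving_iter >= 1 and iteration % saving_iter == 0
    if saving == 'log':
        # log(iter) / log(x) is an integer, i.e. iter is a power of x
        return is_power_of(iteration, saving_iter)
    raise ValueError("saving must be None, 'log' or 'linear'")


log_saves = [i for i in range(1, 21) if should_save(i, 'log', 2)]
linear_saves = [i for i in range(1, 21) if should_save(i, 'linear', 5)]
print(log_saves)     # [1, 2, 4, 8, 16]
print(linear_saves)  # [5, 10, 15, 20]
```

With ‘log’ the saves become sparser as training proceeds, whereas ‘linear’ saves at a fixed interval.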

get(name)
initialize(points)

This method initializes the model by setting the values of the means and the weights.

Parameters:
  • points_data (an array (n_points,dim)) – Data on which the model is fitted.
  • points_test (an array (n_points,dim) | Optional) – Data used to do early stopping (avoid overfitting)
predict_assignements(points)

This function returns the hard assignments of the points once the model is fitted.
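As an illustration of what a hard assignment means in k-means, the sketch below maps each point to the index of its nearest mean. It is a plain-Python stand-in written for this page, not the library's implementation; the helper names and the choice of returning cluster indices (rather than any other encoding) are assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def hard_assignments(points, means):
    """Return, for each point, the index of the closest mean."""
    return [
        min(range(len(means)), key=lambda k: squared_distance(p, means[k]))
        for p in points
    ]


points = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.1]]
means = [[0.0, 0.0], [5.0, 5.0]]
labels = hard_assignments(points, means)
print(labels)  # [0, 0, 1, 1]
```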

score(points, assignements=None)

This method returns the distortion measured once the k-means algorithm has finished.

Parameters:
  • points (an array (n_points,dim)) –
  • assignements (an array (n_components,dim)) – an array containing the responsibilities of the clusters
Returns: distortion

Return type: float
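A distortion measure can be sketched as follows. The example assumes one common definition, the sum of squared Euclidean distances between each point and the mean of its assigned cluster; this page does not spell out the formula megamix uses, and here the assignments are given as a list of cluster indices, which is also an assumption. The function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def distortion(points, means, assignments=None):
    """Sum of squared distances from each point to its assigned mean.

    If no assignments are given, each point is assigned to its nearest mean.
    """
    if assignments is None:
        assignments = [
            min(range(len(means)), key=lambda k: squared_distance(p, means[k]))
            for p in points
        ]
    return sum(
        squared_distance(p, means[k]) for p, k in zip(points, assignments)
    )


points = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [5.0, 7.0]]
means = [[0.0, 0.0], [5.0, 5.0]]
print(distortion(points, means))  # 5.0
```

A lower value indicates that the points lie closer to their cluster means; passing a deliberately poor assignment yields a larger value.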

megamix.online.kmeans.dist_matrix(points, means)

Gaussian Mixture Model (GMM)