API Reference¶

Online versions of the algorithm¶

class megamix.online.kmeans.Kmeans(n_components=1, window=1, kappa=1.0)¶

Kmeans model.

Parameters:

n_components (int, defaults to 1) – Number of clusters used.
window (int, defaults to 1) – The number of points used at the same time in order to update the parameters.
kappa (double, defaults to 1.0) –
A coefficient in ]0.0,1.0] that controls how much weight the new points receive relative to the points already used.
- If kappa is close to 0, the new points have a large weight and the model may
take a long time to stabilize.
- If kappa = 1.0, the new points have little weight and the model may
not move far enough from its initialization.
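
To make the effect of kappa concrete, here is a minimal sketch. It assumes a step size of the form gamma_t = t ** (-kappa), a common choice in online learning; this section does not state the exact formula the library uses, so step_size and blend are hypothetical helpers for illustration only.

```python
def step_size(t, kappa):
    """Hypothetical step size gamma_t = t ** (-kappa), with t >= 1 and kappa in ]0, 1]."""
    if t < 1:
        raise ValueError("t must be >= 1")
    if not 0.0 < kappa <= 1.0:
        raise ValueError("kappa must lie in ]0.0, 1.0]")
    return t ** (-kappa)


def blend(old, new, gamma):
    """Convex combination of two values: gamma is the weight given to the new one."""
    return (1.0 - gamma) * old + gamma * new


# Weight given to the new points at update number 100, for several kappa values.
# (Assumed formula, see above: smaller kappa keeps the weight of new points high.)
for kappa in (0.1, 0.5, 1.0):
    print(kappa, step_size(100, kappa))
```

Under this assumption, a kappa close to 0 keeps the step size near 1 even after many updates, while kappa = 1.0 makes it shrink as 1/t, which matches the two cases described above.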

name¶

The name of the method: ‘Kmeans’

Type:	str

log_weights¶

Contains the logarithm of the mixing coefficients of the model.

Type:	array of floats (n_components,)

means¶

Contains the computed means of the model.

Type:	array of floats (n_components,dim)

N¶

The sufficient statistic, updated at each iteration, from which log_weights (the logarithm of the mixing coefficients) is computed.

Type:	array of floats (n_components,)

X¶

The sufficient statistic, updated at each iteration, from which the means are computed.

Type:	array of floats (n_components,dim)
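
As a rough illustration of how N and X relate to the model parameters, the sketch below assumes that each mean is X[k] / N[k] and that the mixing coefficients are N normalized to sum to 1. This section does not give the library's exact formulas, so parameters_from_statistics is a hypothetical helper, written with plain lists rather than arrays.

```python
import math


def parameters_from_statistics(N, X):
    """Derive means and log-weights from sufficient statistics.

    N: list of n_components positive floats (accumulated mass per cluster).
    X: list of n_components lists of dim floats (accumulated sums of points).
    Assumption (not taken from the library): means[k] = X[k] / N[k], and the
    mixing coefficients are N / sum(N).
    """
    total = sum(N)
    means = [[x / n for x in row] for row, n in zip(X, N)]
    log_weights = [math.log(n / total) for n in N]
    return means, log_weights


N = [2.0, 6.0]
X = [[2.0, 4.0], [18.0, 0.0]]
means, log_weights = parameters_from_statistics(N, X)
print(means)
print(log_weights)
```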

iter¶

The number of points which have been used to compute the model.

Type:	int

_is_initialized¶

Ensures that the model has been initialized before using other methods such as fit(), distortion() or predict_assignements().

Type:	bool

Raises:	ValueError – if the parameters are inconsistent, for example if the number of clusters is negative or init_type is not in [‘resp’, ‘mcw’]…

References

Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling, C. Dupuy & F. Bach
The remarkable k-means++: https://normaldeviate.wordpress.com/2012/09/30/the-remarkable-k-means/

fit(points, saving=None, file_name='model', saving_iter=2)¶

The k-means algorithm

Parameters:

points (array (n_points,dim)) – A 2D array of points on which the model will be trained.
saving_iter (int, defaults to 2) – How often the model is saved (see saving below).
file_name (str, defaults to ‘model’) – The name of the file (including the path).
saving (str, optional) – A string in [‘log’, ‘linear’]. In the following rules, x is the parameter saving_iter (see above).
- If ‘log’, the model is saved at every iteration for which log(iter)/log(x) is an integer.
- If ‘linear’, the model is saved at every iteration for which iter/x is an integer.

Return type:	None
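
The saving rule can be sketched as a small predicate. This is an illustration of the two conditions described for fit, not the library's code: should_save is a hypothetical name, and iterations are assumed to be counted from 1.

```python
def should_save(iteration, saving, saving_iter=2):
    """Return True if the model would be saved at this iteration (sketch)."""
    if saving is None or iteration < 1:
        return False
    if saving == 'linear':
        # iter / x is an integer exactly when x divides iter.
        return iteration % saving_iter == 0
    if saving == 'log':
        if saving_iter < 2:
            raise ValueError("saving_iter must be >= 2 for 'log' saving")
        # log(iter) / log(x) is an integer exactly when iter is a power of x.
        # Integer arithmetic avoids rounding errors in the logarithms.
        power = 1
        while power < iteration:
            power *= saving_iter
        return power == iteration
    raise ValueError("saving must be None, 'log' or 'linear'")


print([i for i in range(1, 20) if should_save(i, 'log', 2)])     # [1, 2, 4, 8, 16]
print([i for i in range(1, 20) if should_save(i, 'linear', 5)])  # [5, 10, 15]
```

With ‘log’ the saved iterations become sparser as training goes on, which keeps the number of files small on long runs; ‘linear’ saves at a constant rate.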

initialize(points)¶

This method initializes the model by setting the values of the means and weights.

Parameters:	points (an array (n_points,dim)) – Data on which the model is initialized.

predict_assignements(points)¶

This function returns the hard assignments of the points once the model is fitted.
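
A hard assignment attaches each point to exactly one cluster, usually the one with the closest mean. The sketch below shows that idea with plain lists; hard_assignments is a hypothetical helper, and the format actually returned by predict_assignements (cluster indices or a one-hot array) is not stated in this section.

```python
def hard_assignments(points, means):
    """Return, for each point, the index of the closest mean (squared Euclidean distance)."""
    def sq_dist(p, m):
        return sum((pi - mi) ** 2 for pi, mi in zip(p, m))

    return [
        min(range(len(means)), key=lambda k: sq_dist(p, means[k]))
        for p in points
    ]


means = [[0.0, 0.0], [10.0, 10.0]]
points = [[1.0, -1.0], [9.0, 11.0], [4.0, 4.0]]
print(hard_assignments(points, means))  # [0, 1, 0]
```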

score(points, assignements=None)¶

This method returns the distortion measured once the k-means algorithm has finished.

Parameters:

points (an array (n_points,dim)) – The points on which the distortion is computed.
assignements (an array (n_components,dim), optional) – An array containing the responsibilities of the clusters.

Returns:	distortion
Return type:	float
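
Distortion is commonly defined as the sum of squared distances between each point and the mean of the cluster it is assigned to. The sketch below uses that usual definition; whether score applies any normalization is not stated here, so distortion is a hypothetical helper and labels (one cluster index per point) is an assumed input format.

```python
def distortion(points, means, labels):
    """Sum of squared Euclidean distances from each point to its assigned mean."""
    return sum(
        sum((pi - mi) ** 2 for pi, mi in zip(p, means[k]))
        for p, k in zip(points, labels)
    )


means = [[0.0, 0.0], [10.0, 10.0]]
points = [[1.0, -1.0], [9.0, 11.0]]
labels = [0, 1]  # index of the cluster assigned to each point
print(distortion(points, means, labels))  # 4.0
```

A lower value means the points lie closer to their assigned means, so the quantity can be used to compare two fits of the same data with the same number of clusters.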