API Reference
Batch versions of the algorithm
Kmeans
Gaussian Mixture Model (GMM)
Variational Gaussian Mixture Model (VBGMM)
Dirichlet Process Gaussian Mixture Model (DPGMM)
Online versions of the algorithm
Kmeans
- class megamix.online.kmeans.Kmeans(n_components=1, window=1, kappa=1.0)
Kmeans model.
Parameters: - n_components (int, defaults to 1) – Number of clusters used.
- window (int, defaults to 1) – The number of points processed together at each update of the parameters.
- kappa (double, defaults to 1.0) –
A coefficient in ]0.0,1.0] that controls how much weight the new points receive compared to the points already used.
- If kappa is close to 0, the new points have a large weight and the model may
take a long time to stabilize.
- If kappa = 1.0, the new points have little weight and the model may
not move far enough from its initialization.
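One way to read the two kappa cases is through a decaying step size. The sketch below assumes a step of the form 1 / t**kappa, where t counts the updates performed so far. Both this form and the helper names step_size and blend are illustrative assumptions, not taken from the megamix source.

```python
def step_size(t, kappa):
    """Assumed step size 1 / t**kappa for update number t (t >= 1)."""
    if not 0.0 < kappa <= 1.0:
        raise ValueError("kappa must lie in ]0.0, 1.0]")
    if t < 1:
        raise ValueError("t must be at least 1")
    return 1.0 / (t ** kappa)


def blend(old_value, new_value, t, kappa):
    """Move an estimate towards a new value using the assumed step size."""
    gamma = step_size(t, kappa)
    return (1.0 - gamma) * old_value + gamma * new_value
```

Under this assumption, a kappa close to 0 keeps the step close to 1 even after many updates, so each new batch almost replaces the current estimate. With kappa = 1.0 the step shrinks as 1/t, so late batches barely move it.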
- name – The name of the method: ‘Kmeans’
Type: str
- log_weights – Contains the logarithm of the mixing coefficients of the model.
Type: array of floats (n_components)
- means – Contains the computed means of the model.
Type: array of floats (n_components,dim)
- N – The sufficient statistic updated during each iteration used to compute log_weights (this corresponds to the mixing coefficients).
Type: array of floats (n_components,)
- X – The sufficient statistic updated during each iteration used to compute the means.
Type: array of floats (n_components,dim)
- iter – The number of points which have been used to compute the model.
Type: int
- _is_initialized – Ensures that the model has been initialized before using other methods such as fit(), distortion() or predict_assignements().
Type: bool
Raises: ValueError – if the parameters are inconsistent, for example if the cluster number is negative, init_type is not in [‘resp’,’mcw’]…
References
- Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling, C. Dupuy & F. Bach
- The remarkable k-means++, https://normaldeviate.wordpress.com/2012/09/30/the-remarkable-k-means/
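The attributes N and X are described as the sufficient statistics from which log_weights and means are computed. A minimal sketch of that relationship follows. It assumes that each mean is the corresponding row of X divided by the matching entry of N, and that the weights are N normalized to sum to one. Both are assumptions, and parameters_from_statistics is a hypothetical helper that uses plain lists so the block stays self-contained.

```python
import math


def parameters_from_statistics(N, X):
    """Recover (log_weights, means) from per-cluster statistics.

    N: list of n_components positive floats (assumed per-cluster mass).
    X: list of n_components rows of length dim (assumed per-cluster sums).
    """
    if len(N) != len(X):
        raise ValueError("N and X must have one entry per cluster")
    if any(n <= 0.0 for n in N):
        raise ValueError("every entry of N must be positive")
    total = sum(N)
    # Assumed normalization: the mixing coefficients sum to one.
    log_weights = [math.log(n / total) for n in N]
    # Assumed mean: accumulated sum divided by accumulated mass.
    means = [[x / n for x in row] for n, row in zip(N, X)]
    return log_weights, means
```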
- fit(points, saving=None, file_name='model', saving_iter=2)
The k-means algorithm.
Parameters: - points (array (n_points,dim)) – A 2D array of points on which the model will be trained.
- saving_iter (int | defaults 2) – How often the model is saved (see saving below).
- file_name (str | defaults model) – The name of the file (including the path).
Other Parameters: saving (str | Optional) – A string in [‘log’,’linear’]. In the following conditions, x is the parameter saving_iter (see above).
- If ‘log’, the model is saved at every iteration for which:
- log(iter)/log(x) is an int
- If ‘linear’, the model is saved at every iteration for which:
- iter/x is an int
Return type: None
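The two saving conditions can be sketched as a small predicate. should_save and is_power_of are hypothetical helpers, not part of megamix. The ‘log’ case is written as an integer power test, which expresses the same condition as log(iter)/log(x) being an int without relying on floating-point logarithms.

```python
def is_power_of(n, base):
    """Return True when n == base**k for some integer k >= 0."""
    if n < 1 or base < 2:
        return False
    while n % base == 0:
        n //= base
    return n == 1


def should_save(iteration, saving, saving_iter):
    """Decide whether the model would be saved at this iteration."""
    if saving is None:
        return False
    if saving_iter < 1:
        raise ValueError("saving_iter must be a positive integer")
    if saving == 'linear':
        # iter / x is an int
        return iteration % saving_iter == 0
    if saving == 'log':
        # log(iter) / log(x) is an int
        return is_power_of(iteration, saving_iter)
    raise ValueError("saving must be None, 'log' or 'linear'")
```

With saving_iter=2 and iterations 1 to 10, ‘log’ selects iterations 1, 2, 4 and 8, while ‘linear’ selects 2, 4, 6, 8 and 10.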
- get(name)
- initialize(points)
This method initializes the model by setting the values of the means and weights.
Parameters: - points (an array (n_points,dim)) – Data on which the model is initialized.
- predict_assignements(points)
This function returns the hard assignments of the points once the model is fitted.
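A hard assignment gives each point to exactly one cluster. The sketch below illustrates the idea under the usual k-means rule of picking the closest mean in squared Euclidean distance. It returns one cluster index per point, which is an assumption: this page does not state the format returned by predict_assignements, and hard_assignments is a hypothetical helper.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def hard_assignments(points, means):
    """Return, for each point, the index of its closest mean."""
    labels = []
    for p in points:
        distances = [squared_distance(p, m) for m in means]
        # Ties go to the lowest cluster index.
        labels.append(distances.index(min(distances)))
    return labels
```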
- score(points, assignements=None)
This method returns the distortion measurement at the end of the k-means algorithm.
Parameters: - points (an array (n_points,dim)) –
- assignements (an array (n_components,dim)) – An array containing the responsibilities of the clusters.
Returns: distortion
Return type: float
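Distortion is commonly defined as the sum of squared distances between each point and the mean of its assigned cluster. The sketch below uses that definition, which is an assumption here since this page does not say whether score normalizes the value. For brevity it takes cluster indices (labels) instead of the assignements array documented above. distortion is a hypothetical helper.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def distortion(points, means, labels=None):
    """Sum of squared distances from each point to its assigned mean.

    When labels is None, each point is assigned to its closest mean.
    """
    if labels is None:
        labels = []
        for p in points:
            distances = [squared_distance(p, m) for m in means]
            labels.append(distances.index(min(distances)))
    if len(labels) != len(points):
        raise ValueError("labels must contain one index per point")
    return sum(squared_distance(p, means[k]) for p, k in zip(points, labels))
```

A lower value means the points sit closer to their cluster means, so the function can be used to compare two sets of means on the same data.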
- megamix.online.kmeans.dist_matrix(points, means)