API Reference
Batch versions of the algorithms
Kmeans

class megamix.batch.Kmeans(n_components=1, init='plus', n_jobs=1)
Kmeans model.
Parameters:  n_components (int, defaults to 1) – Number of clusters used.
 init (str, defaults to 'plus') – Method used to perform the initialization; must be in [‘random’, ‘plus’, ‘AF_KMC’].

name
str – The name of the method: ‘Kmeans’

means
array of floats (n_components,dim) – Contains the computed means of the model.

log_weights
array of floats (n_components,) – Contains the logarithm of the mixing coefficient of each cluster.

iter
int – The number of iterations computed with the method fit().

_is_initialized
bool – Ensures that the model has been initialized before other methods such as distortion() or predict_assignements() are used.
Raises: ValueError : if the parameters are inconsistent, for example if the number of clusters is negative.
References
‘Fast and Provably Good Seedings for k-Means’, O. Bachem, M. Lucic, S. Hassani, A. Krause
‘Lloyd's algorithm’ (https://en.wikipedia.org/wiki/Lloyd's_algorithm)
‘The remarkable k-means++’ (https://normaldeviate.wordpress.com/2012/09/30/the-remarkable-k-means/)
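The ‘plus’ initialization refers to the k-means++ seeding discussed in the references above. A minimal pure-Python sketch of that seeding, assuming points are given as sequences of floats; `squared_distance` and `kmeans_plus_plus` are hypothetical names, not part of megamix, whose own (vectorized) implementation may differ.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_plus_plus(points, n_components, rng=None):
    """Hypothetical k-means++ seeding (not the megamix API).

    The first center is drawn uniformly; each next center is drawn with
    probability proportional to its squared distance to the closest center
    already chosen.
    """
    rng = rng if rng is not None else random.Random(0)
    centers = [list(rng.choice(points))]
    while len(centers) < n_components:
        # D(x)^2 for every point: squared distance to its nearest chosen center
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point already coincides with a center: pick uniformly
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_plus_plus(points, 2, random.Random(1)))
```

Drawing proportionally to the squared distance makes far-away points likely seeds, which is what distinguishes this scheme from the ‘random’ initialization.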

fit(points_data, points_test=None, n_iter_max=100, n_iter_fix=None, tol=0, saving=None, file_name='model', saving_iter=2)
The k-means algorithm.
Parameters:  points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained.
 tol (float, defaults to 0) – The algorithm stops when the difference in distortion between two consecutive steps is less than or equal to tol.
 n_iter_max (int, defaults to 100) – Maximum number of iterations.
 saving_iter (int, defaults to 2) – Controls how often the model is saved (see saving below).
 file_name (str, defaults to 'model') – The name of the file (including the path).
Other Parameters:  points_test (array (n_points_bis,dim), optional) – A 2D array of points on which the model will be tested.
 n_iter_fix (int, optional) – If not None, the algorithm performs exactly n_iter_fix iterations and then stops.
 saving (str, optional) – A string in [‘log’, ‘linear’]. In what follows, x denotes the parameter saving_iter (see above).
 If ‘log’, the model is saved at every iteration iter such that log(iter)/log(x) is an integer.
 If ‘linear’, the model is saved at every iteration iter such that iter/x is an integer.
Return type: None
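The loop that fit() runs can be sketched as Lloyd's algorithm with the stopping rules described above: at most n_iter_max iterations, or exactly n_iter_fix iterations when it is set, stopping early once the distortion changes by at most tol. This is a hypothetical pure-Python `lloyd` helper, not megamix's code; it assumes the means are already initialized and leaves the mean of an empty cluster unchanged, which the library may handle differently.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, means, n_iter_max=100, n_iter_fix=None, tol=0.0):
    """Hypothetical Lloyd iterations; returns (means, number of iterations)."""
    means = [list(m) for m in means]
    n_iter = n_iter_fix if n_iter_fix is not None else n_iter_max
    previous = None
    n_done = 0
    for _ in range(n_iter):
        n_done += 1
        # Assignment step: index of the closest mean for each point
        labels = [min(range(len(means)), key=lambda k: squared_distance(p, means[k]))
                  for p in points]
        # Update step: each mean moves to the centroid of its points
        for k in range(len(means)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous mean
                means[k] = [sum(coords) / len(members) for coords in zip(*members)]
        distortion = sum(squared_distance(p, means[label])
                         for p, label in zip(points, labels))
        # Early stopping applies only when n_iter_fix is not set
        if n_iter_fix is None and previous is not None and abs(previous - distortion) <= tol:
            break
        previous = distortion
    return means, n_done


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(lloyd(points, [(0.0,), (10.0,)]))  # ([[0.5], [10.5]], 2)
```

With tol=0 the loop stops as soon as the distortion stops changing; here that happens on the second iteration.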

predict_assignements(points)
This function returns the hard assignments of points once the model is fitted.

score(points, assignements=None)
This method returns the distortion measurement at the end of the k-means algorithm.
Parameters:  points (an array (n_points,dim)) – The points on which the distortion is computed.
 assignements (an array (n_points,n_components)) – An array containing the responsibilities of the clusters.
Returns: distortion
Return type: float
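A sketch of what predict_assignements() and score() compute, assuming hard assignments are one-hot rows of shape (n_points, n_components) and that the distortion is the sum of squared Euclidean distances between each point and its assigned mean (any normalization megamix applies is not stated here). The helper names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def predict_assignments(points, means):
    """One-hot hard assignments: one row per point, one column per cluster."""
    rows = []
    for p in points:
        best = min(range(len(means)), key=lambda k: squared_distance(p, means[k]))
        rows.append([1 if k == best else 0 for k in range(len(means))])
    return rows


def distortion(points, means, assignments=None):
    """Sum of squared distances weighted by the (hard or soft) assignments."""
    if assignments is None:
        assignments = predict_assignments(points, means)
    return sum(r * squared_distance(p, m)
               for p, row in zip(points, assignments)
               for r, m in zip(row, means))


points = [(0.0, 0.0), (2.0, 0.0)]
means = [(0.0, 0.0), (3.0, 0.0)]
print(predict_assignments(points, means))  # [[1, 0], [0, 1]]
print(distortion(points, means))           # 1.0
```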
Gaussian Mixture Model (GMM)

class megamix.batch.GaussianMixture(n_components=1, covariance_type='full', init='kmeans', reg_covar=1e-06, type_init='resp', n_jobs=1)
Gaussian Mixture Model
Representation of a Gaussian mixture model probability distribution. This class estimates the parameters of a Gaussian mixture distribution.
Parameters:  n_components (int, defaults to 1) – Number of clusters used.
 init (str, defaults to 'kmeans') – Method used to perform the initialization; must be in [‘random’, ‘plus’, ‘AF_KMC’, ‘kmeans’].
 reg_covar (float, defaults to 1e-6) – Added to the diagonal of the covariance matrices to avoid null covariances.
 type_init (str, defaults to 'resp') – The data used to initialize the algorithm: responsibilities if ‘resp’; means, covariances and weights if ‘mcw’.
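To illustrate reg_covar: adding a small value to the diagonal turns a singular covariance matrix into an invertible one. A minimal pure-Python sketch; `regularize` and `det2` are hypothetical helpers, not megamix functions.

```python
def regularize(cov, reg_covar=1e-6):
    """Return a copy of the square matrix cov with reg_covar added to its diagonal."""
    return [[value + (reg_covar if i == j else 0.0) for j, value in enumerate(row)]
            for i, row in enumerate(cov)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Covariance of points lying on a line: rank 1, determinant 0
singular = [[1.0, 1.0], [1.0, 1.0]]
print(det2(singular))                  # 0.0
print(det2(regularize(singular)) > 0)  # True
```

The off-diagonal terms are untouched, so the correction is negligible whenever the variances are much larger than reg_covar.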

name
str – The name of the method: ‘GMM’

cov
array of floats (n_components,dim,dim) – Contains the computed covariance matrices of the mixture.

means
array of floats (n_components,dim) – Contains the computed means of the mixture.

log_weights
array of floats (n_components,) – Contains the logarithm of the mixing coefficient of each cluster.

iter
int – The number of iterations computed with the method fit().

convergence_criterion_data
array of floats (iter,) – Stores the value of the convergence criterion computed with the data on which the model is fitted.

convergence_criterion_test
array of floats (iter,), only if _early_stopping – Stores the value of the convergence criterion computed with the test data, if it exists.

_is_initialized
bool – Ensures that the method _initialize() has been used before other methods such as score() or predict_log_assignements() are used.
Raises: ValueError : if the parameters are inconsistent, for example if the number of clusters is negative or if type_init is not in [‘resp’, ‘mcw’].
References
‘Pattern Recognition and Machine Learning’, Bishop

fit(points_data, points_test=None, tol=0.001, patience=None, n_iter_max=100, n_iter_fix=None, saving=None, file_name='model', saving_iter=2)
The EM algorithm.
Parameters:  points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained.
 tol (float, defaults to 1e-3) – The EM algorithm stops when the difference in the convergence criterion between two consecutive steps is less than tol.
 n_iter_max (int, defaults to 100) – Maximum number of iterations.
 saving_iter (int, defaults to 2) – Controls how often the model is saved (see saving below).
 file_name (str, defaults to 'model') – The name of the file (including the path).
Other Parameters:  points_test (array (n_points_bis,dim), optional) – A 2D array of points on which the model will be tested.
 patience (int, optional) – The number of iterations performed after the convergence criterion has been satisfied.
 n_iter_fix (int, optional) – If not None, the algorithm performs exactly n_iter_fix iterations and then stops.
 saving (str, optional) – A string in [‘log’, ‘linear’]. In what follows, x denotes the parameter saving_iter (see above).
 If ‘log’, the model is saved at every iteration iter such that log(iter)/log(x) is an integer.
 If ‘linear’, the model is saved at every iteration iter such that iter/x is an integer.
Return type: None
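The saving rule above amounts to the following check. A sketch, assuming iterations are counted from 1 and saving_iter >= 2 for ‘log’; log(iter)/log(x) is an integer exactly when iter is a power of x, which the sketch tests with integer arithmetic to avoid floating-point rounding. `should_save` is a hypothetical name.

```python
def should_save(iteration, saving, saving_iter=2):
    """Return True if the model would be saved at this (1-based) iteration."""
    if saving == 'linear':
        return iteration % saving_iter == 0
    if saving == 'log':
        if saving_iter < 2:
            raise ValueError("saving_iter must be >= 2 when saving='log'")
        # log(iteration)/log(saving_iter) is an integer <=> iteration is a power of saving_iter
        power = 1
        while power < iteration:
            power *= saving_iter
        return power == iteration
    return False  # saving=None: nothing is saved


print([i for i in range(1, 20) if should_save(i, 'log')])        # [1, 2, 4, 8, 16]
print([i for i in range(1, 20) if should_save(i, 'linear', 3)])  # [3, 6, 9, 12, 15, 18]
```

‘log’ saves densely early, when the parameters change fastest, and sparsely later; ‘linear’ saves at a constant rate.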

predict_log_resp(points)
This function returns the logarithm of each point's responsibilities.
Parameters: points (array (n_points_bis,dim)) – A 1D or 2D array of points with the same dimension as the problem.
Returns: log_resp – The logarithm of the responsibilities.
Return type: array (n_points_bis,n_components)
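For a Gaussian mixture, the log-responsibility of component k for a point x is log(pi_k) + log N(x | mu_k, Sigma_k) minus the log-sum-exp of these terms over all components. The sketch below shows this normalization in pure Python for one-dimensional Gaussians; it illustrates the formula rather than megamix's implementation, and `logsumexp`, `log_gauss_1d` and `log_resp` are hypothetical names.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gauss_1d(x, mean, var):
    """Log-density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_resp(points, log_weights, means, variances):
    """Rows of log-responsibilities, shape (n_points, n_components)."""
    rows = []
    for x in points:
        joint = [lw + log_gauss_1d(x, m, v)
                 for lw, m, v in zip(log_weights, means, variances)]
        norm = logsumexp(joint)  # log of the mixture density at x
        rows.append([j - norm for j in joint])
    return rows


log_weights = [math.log(0.5), math.log(0.5)]
resp = log_resp([0.0, 5.0, 10.0], log_weights, means=[0.0, 10.0], variances=[1.0, 1.0])
for row in resp:
    print([round(math.exp(v), 3) for v in row])
```

Working in log space avoids underflow: far from every mean, the raw densities can round to zero while their logarithms stay finite.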

read_and_init(group, points)
A method that reads a group of an HDF5 file to initialize the model.
Parameters: group (HDF5 group) – A group of an HDF5 file in reading mode.

score(points)
This function returns the score of the model: the logarithm of the likelihood for GMM, and the logarithm of the lower bound of the likelihood for VBGMM and DPGMM.
Parameters: points (array (n_points_bis,dim)) – A 1D or 2D array of points with the same dimension as the problem.
Returns: score
Return type: float

simplified_model(points)
A method creating a new model with simplified parameters: unused clusters are removed.
Parameters: points (an array (n_points,dim))
Returns: GM
Return type: an instance of the same type as self: GMM, VBGMM or DPGMM

write(group)
A method creating datasets in a group of an HDF5 file in order to save the model.
Parameters: group (HDF5 group) – A group of an HDF5 file in writing mode.
Variational Gaussian Mixture Model (VBGMM)

class megamix.batch.VariationalGaussianMixture(n_components=1, init='kmeans', alpha_0=None, beta_0=None, nu_0=None, means_prior=None, cov_wishart_prior=None, reg_covar=1e-06, type_init='resp', n_jobs=1, boost=None)
Variational Bayesian Estimation of a Gaussian Mixture
This class infers an approximate posterior distribution over the parameters of a Gaussian mixture distribution.
The weights distribution is a Dirichlet distribution with parameter alpha (see Bishop's book, pp. 474-486).
Parameters:  n_components (int, defaults to 1) – Number of clusters used.
 init (str, defaults to 'kmeans') – Method used to perform the initialization; must be in [‘random’, ‘plus’, ‘AF_KMC’, ‘kmeans’, ‘GMM’].
 reg_covar (float, defaults to 1e-6) – Added to the diagonal of the covariance matrices to avoid null covariances.
 type_init (str, defaults to 'resp') – The data used to initialize the algorithm: responsibilities if ‘resp’; means, covariances and weights if ‘mcw’.
Other Parameters: alpha_0 (float, optional, defaults to None) – The prior parameter on the weight distribution (Dirichlet). A high value of alpha_0 leads to equal weights, while a low value allows some clusters to shrink and disappear. Must be greater than 0.
If None, the value is set to 1/n_components.
beta_0 (float, optional, defaults to None) – The precision prior on the mean distribution (Gaussian). Must be greater than 0.
If None, the value is set to 1.0.
nu_0 (float, optional, defaults to None) – The prior on the number of degrees of freedom of the covariance distributions (Wishart). Must be greater than or equal to dim.
If None, the value is set to dim.
means_prior (array (dim,), optional, defaults to None) – The prior value used to compute the means.
If None, the value is set to the mean of points_data.
cov_wishart_prior (type depends on covariance_type, optional, defaults to None) – If covariance_type is ‘full’, the type must be array (dim,dim); if covariance_type is ‘spherical’, the type must be float. The prior value used to compute the precisions.
If None, the value is set to the covariance of points_data.
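The default values listed above can be summarized as follows. A pure-Python sketch, assuming a ‘full’ covariance prior estimated with the biased (divide by n_points) empirical covariance; the normalization megamix actually uses is not stated here. `resolve_priors` is a hypothetical helper, and its checks mirror the constraints stated for each parameter.

```python
def resolve_priors(points, n_components, alpha_0=None, beta_0=None, nu_0=None,
                   means_prior=None, cov_wishart_prior=None):
    """Fill in the documented defaults for the priors and check their constraints."""
    n_points, dim = len(points), len(points[0])
    data_mean = [sum(col) / n_points for col in zip(*points)]
    if alpha_0 is None:
        alpha_0 = 1.0 / n_components
    if beta_0 is None:
        beta_0 = 1.0
    if nu_0 is None:
        nu_0 = dim
    if means_prior is None:
        means_prior = data_mean
    if cov_wishart_prior is None:
        # Empirical (dim, dim) covariance of points_data around its mean (assumed biased estimate)
        cov_wishart_prior = [
            [sum((p[i] - data_mean[i]) * (p[j] - data_mean[j]) for p in points) / n_points
             for j in range(dim)]
            for i in range(dim)]
    if alpha_0 <= 0 or beta_0 <= 0:
        raise ValueError("alpha_0 and beta_0 must be greater than 0")
    if nu_0 < dim:
        raise ValueError("nu_0 must be greater than or equal to dim")
    return {'alpha_0': alpha_0, 'beta_0': beta_0, 'nu_0': nu_0,
            'means_prior': means_prior, 'cov_wishart_prior': cov_wishart_prior}


points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(resolve_priors(points, n_components=4))
```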

name
str – The name of the method: ‘VBGMM’

alpha
array of floats (n_components,) – Contains the parameters of the weight distribution (Dirichlet).

beta
array of floats (n_components,) – Contains coefficients which are multiplied with the precision matrices to form the precision matrix of the Gaussian distribution of the means.

nu
array of floats (n_components,) – Contains the number of degrees of freedom of the distribution of covariance matrices.

_inv_prec
array of floats (n_components,dim,dim) – Contains the equivalent of the matrix W described in Bishop's book. It is proportional to cov.

_log_det_inv_prec
array of floats (n_components,) – Contains the logarithm of the determinant of the W matrices.

cov
array of floats (n_components,dim,dim) – Contains the computed covariance matrices of the mixture.

means
array of floats (n_components,dim) – Contains the computed means of the mixture.

log_weights
array of floats (n_components,) – Contains the logarithm of the weight of each cluster.

iter
int – The number of iterations computed with the method fit().

convergence_criterion_data
array of floats (iter,) – Stores the value of the convergence criterion computed with the data on which the model is fitted.

convergence_criterion_test
array of floats (iter,), only if _early_stopping – Stores the value of the convergence criterion computed with the test data, if it exists.

_is_initialized
bool – Ensures that the method _initialize() has been used before other methods such as score() or predict_log_assignements() are used.
Raises: ValueError : if the parameters are inconsistent, for example if the number of clusters is negative or if type_init is not in [‘resp’, ‘mcw’].
References
‘Pattern Recognition and Machine Learning’, Bishop

fit(points_data, points_test=None, tol=0.001, patience=None, n_iter_max=100, n_iter_fix=None, saving=None, file_name='model', saving_iter=2)
The EM algorithm.
Parameters:  points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained.
 tol (float, defaults to 1e-3) – The EM algorithm stops when the difference in the convergence criterion between two consecutive steps is less than tol.
 n_iter_max (int, defaults to 100) – Maximum number of iterations.
 saving_iter (int, defaults to 2) – Controls how often the model is saved (see saving below).
 file_name (str, defaults to 'model') – The name of the file (including the path).
Other Parameters:  points_test (array (n_points_bis,dim), optional) – A 2D array of points on which the model will be tested.
 patience (int, optional) – The number of iterations performed after the convergence criterion has been satisfied.
 n_iter_fix (int, optional) – If not None, the algorithm performs exactly n_iter_fix iterations and then stops.
 saving (str, optional) – A string in [‘log’, ‘linear’]. In what follows, x denotes the parameter saving_iter (see above).
 If ‘log’, the model is saved at every iteration iter such that log(iter)/log(x) is an integer.
 If ‘linear’, the model is saved at every iteration iter such that iter/x is an integer.
Return type: None
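The interaction of tol, patience, n_iter_max and n_iter_fix described above can be sketched as a generic loop. This assumes an absolute difference between consecutive values of the convergence criterion and reads patience as the number of extra iterations run once that difference first drops below tol. `run_iterations` and its `step` callable (one iteration that returns the criterion) are hypothetical, not part of megamix.

```python
def run_iterations(step, tol=1e-3, patience=None, n_iter_max=100, n_iter_fix=None):
    """Call step() until the stopping rule fires; return the number of iterations."""
    n_iter = n_iter_fix if n_iter_fix is not None else n_iter_max
    patience = 0 if patience is None else patience
    previous = None
    extra = None  # iterations performed since the criterion was first satisfied
    for iteration in range(1, n_iter + 1):
        criterion = step()
        if n_iter_fix is None:
            if extra is not None:
                extra += 1
            elif previous is not None and abs(criterion - previous) < tol:
                extra = 0
            if extra is not None and extra >= patience:
                return iteration
        previous = criterion
    return n_iter


# Simulated criterion values, one per iteration
values = [10.0, 5.0, 4.9995, 4.9994, 4.9993, 4.9992, 4.9991, 4.999]
print(run_iterations(iter(values).__next__))              # 3
print(run_iterations(iter(values).__next__, patience=2))  # 5
```

A positive patience keeps iterating past a temporary plateau, at the cost of patience extra iterations when the model has truly converged.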

predict_log_resp(points)
This function returns the logarithm of each point's responsibilities.
Parameters: points (array (n_points_bis,dim)) – A 1D or 2D array of points with the same dimension as the problem.
Returns: log_resp – The logarithm of the responsibilities.
Return type: array (n_points_bis,n_components)

read_and_init(group, points)
A method that reads a group of an HDF5 file to initialize the model.
Parameters: group (HDF5 group) – A group of an HDF5 file in reading mode.

score(points)
This function returns the score of the model: the logarithm of the likelihood for GMM, and the logarithm of the lower bound of the likelihood for VBGMM and DPGMM.
Parameters: points (array (n_points_bis,dim)) – A 1D or 2D array of points with the same dimension as the problem.
Returns: score
Return type: float

simplified_model(points)
A method creating a new model with simplified parameters: unused clusters are removed.
Parameters: points (an array (n_points,dim))
Returns: GM
Return type: an instance of the same type as self: GMM, VBGMM or DPGMM

write(group)
A method creating datasets in a group of an HDF5 file in order to save the model.
Parameters: group (HDF5 group) – A group of an HDF5 file in writing mode.
Dirichlet Process Gaussian Mixture Model (DPGMM)

class megamix.batch.DPVariationalGaussianMixture(n_components=1, init='kmeans', alpha_0=None, beta_0=None, nu_0=None, means_prior=None, cov_wishart_prior=None, reg_covar=1e-06, type_init='resp', n_jobs=1, pypcoeff=0, boost=None)
Variational Bayesian Estimation of a Gaussian Mixture with Dirichlet Process
This class infers an approximate posterior distribution over the parameters of a Gaussian mixture distribution.
The weights distribution follows a Dirichlet Process with parameter alpha.
Parameters:  n_components (int, defaults to 1) – Number of clusters used.
 init (str, defaults to 'kmeans') – Method used to perform the initialization; must be in [‘random’, ‘plus’, ‘AF_KMC’, ‘kmeans’, ‘GMM’, ‘VBGMM’].
 reg_covar (float, defaults to 1e-6) – Added to the diagonal of the covariance matrices to avoid null covariances.
 type_init (str, defaults to 'resp') – The data used to initialize the algorithm: responsibilities if ‘resp’; means, covariances and weights if ‘mcw’.
Other Parameters: alpha_0 (float, optional, defaults to None) – The prior parameter on the weight distribution (Beta). A high value of alpha_0 leads to equal weights, while a low value allows some clusters to shrink and disappear. Must be greater than 0.
If None, the value is set to 1/n_components.
beta_0 (float, optional, defaults to None) – The precision prior on the mean distribution (Gaussian). Must be greater than 0.
If None, the value is set to 1.0.
nu_0 (float, optional, defaults to None) – The prior on the number of degrees of freedom of the covariance distributions (Wishart). Must be greater than or equal to dim.
If None, the value is set to dim.
means_prior (array (dim,), optional, defaults to None) – The prior value used to compute the means.
If None, the value is set to the mean of points_data.
cov_wishart_prior (type depends on covariance_type, optional, defaults to None) – If covariance_type is ‘full’, the type must be array (dim,dim); if covariance_type is ‘spherical’, the type must be float. The prior value used to compute the precisions.
pypcoeff (float, defaults to 0) – If 0, the weights are generated according to a Dirichlet Process; if > 0 and <= 1, the weights are generated according to a Pitman-Yor Process.
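In the stick-breaking view of the Dirichlet and Pitman-Yor processes, cluster k receives the fraction v_k of what remains of a unit stick, with v_k ~ Beta(1 - d, alpha_0 + k d), where d is the discount (pypcoeff here) and d = 0 gives the Dirichlet Process. The sketch below builds these prior Beta parameters and the expected weights E[pi_k] = E[v_k] prod_{j<k} (1 - E[v_j]) implied by independent sticks. It assumes 0 <= d < 1 and is a textbook illustration, not megamix's update; the function names are hypothetical.

```python
def stick_breaking_prior(n_components, alpha_0, pypcoeff=0.0):
    """Beta(a_k, b_k) prior parameters for k = 1..n_components.

    pypcoeff = 0 gives a Dirichlet Process, 0 < pypcoeff < 1 a Pitman-Yor process.
    """
    if not 0.0 <= pypcoeff < 1.0:
        raise ValueError("this sketch assumes 0 <= pypcoeff < 1")
    return [(1.0 - pypcoeff, alpha_0 + k * pypcoeff) for k in range(1, n_components + 1)]


def expected_weights(beta_params):
    """E[pi_k] = E[v_k] * prod_{j<k} (1 - E[v_j]) for independent v_j ~ Beta(a_j, b_j)."""
    weights, remaining = [], 1.0
    for a, b in beta_params:
        mean_v = a / (a + b)       # mean of a Beta(a, b) variable
        weights.append(remaining * mean_v)
        remaining *= 1.0 - mean_v  # stick left for the following clusters
    return weights


print(expected_weights(stick_breaking_prior(3, alpha_0=1.0)))  # [0.5, 0.25, 0.125]
print(stick_breaking_prior(2, alpha_0=1.0, pypcoeff=0.5))      # [(0.5, 1.5), (0.5, 2.0)]
```

With a finite n_components the expected weights sum to less than 1; the remainder is the mass of the truncated tail. A larger discount makes later sticks longer, so the weights decay more slowly than under a Dirichlet Process.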

name
str – The name of the method: ‘DPGMM’

alpha
array of floats (n_components,2) – Contains the parameters of the weight distribution (Beta).

beta
array of floats (n_components,) – Contains coefficients which are multiplied with the precision matrices to form the precision matrix of the Gaussian distribution of the means.

nu
array of floats (n_components,) – Contains the number of degrees of freedom of the distribution of covariance matrices.

_inv_prec
array of floats (n_components,dim,dim) – Contains the equivalent of the matrix W described in Bishop's book. It is proportional to cov.

_log_det_inv_prec
array of floats (n_components,) – Contains the logarithm of the determinant of the W matrices.

cov
array of floats (n_components,dim,dim) – Contains the computed covariance matrices of the mixture.

means
array of floats (n_components,dim) – Contains the computed means of the mixture.

log_weights
array of floats (n_components,) – Contains the logarithm of the weight of each cluster.

iter
int – The number of iterations computed with the method fit().

convergence_criterion_data
array of floats (iter,) – Stores the value of the convergence criterion computed with the data on which the model is fitted.

convergence_criterion_test
array of floats (iter,), only if _early_stopping – Stores the value of the convergence criterion computed with the test data, if it exists.

_is_initialized
bool – Ensures that the method _initialize() has been used before other methods such as score() or predict_log_assignements() are used.
Raises: ValueError : if the parameters are inconsistent, for example if the number of clusters is negative or if type_init is not in [‘resp’, ‘mcw’].
References
‘Variational Inference for Dirichlet Process Mixtures’, D. Blei and M. Jordan

fit(points_data, points_test=None, tol=0.001, patience=None, n_iter_max=100, n_iter_fix=None, saving=None, file_name='model', saving_iter=2)
The EM algorithm.
Parameters:  points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained.
 tol (float, defaults to 1e-3) – The EM algorithm stops when the difference in the convergence criterion between two consecutive steps is less than tol.
 n_iter_max (int, defaults to 100) – Maximum number of iterations.
 saving_iter (int, defaults to 2) – Controls how often the model is saved (see saving below).
 file_name (str, defaults to 'model') – The name of the file (including the path).
Other Parameters:  points_test (array (n_points_bis,dim), optional) – A 2D array of points on which the model will be tested.
 patience (int, optional) – The number of iterations performed after the convergence criterion has been satisfied.
 n_iter_fix (int, optional) – If not None, the algorithm performs exactly n_iter_fix iterations and then stops.
 saving (str, optional) – A string in [‘log’, ‘linear’]. In what follows, x denotes the parameter saving_iter (see above).
 If ‘log’, the model is saved at every iteration iter such that log(iter)/log(x) is an integer.
 If ‘linear’, the model is saved at every iteration iter such that iter/x is an integer.
Return type: None

predict_log_resp(points)
This function returns the logarithm of each point's responsibilities.
Parameters: points (array (n_points_bis,dim)) – A 1D or 2D array of points with the same dimension as the problem.
Returns: log_resp – The logarithm of the responsibilities.
Return type: array (n_points_bis,n_components)

read_and_init(group, points)
A method that reads a group of an HDF5 file to initialize the model.
Parameters: group (HDF5 group) – A group of an HDF5 file in reading mode.

score(points)
This function returns the score of the model: the logarithm of the likelihood for GMM, and the logarithm of the lower bound of the likelihood for VBGMM and DPGMM.
Parameters: points (array (n_points_bis,dim)) – A 1D or 2D array of points with the same dimension as the problem.
Returns: score
Return type: float

simplified_model(points)
A method creating a new model with simplified parameters: unused clusters are removed.
Parameters: points (an array (n_points,dim))
Returns: GM
Return type: an instance of the same type as self: GMM, VBGMM or DPGMM

write(group)
A method creating datasets in a group of an HDF5 file in order to save the model.
Parameters: group (HDF5 group) – A group of an HDF5 file in writing mode.
Online versions of the algorithms
Kmeans

class megamix.online.Kmeans(n_components=1, window=1, kappa=1.0)
Kmeans model.
Parameters:  n_components (int, defaults to 1) – Number of clusters used.
 window (int, defaults to 1) – The number of points used simultaneously to update the parameters.
 kappa (double, defaults to 1.0) – A coefficient in ]0.0, 1.0] that controls how much weight the new points receive compared to the ones already used.
 If kappa is close to 0, the new points have a large weight and the model may take a long time to stabilize.
 If kappa = 1.0, the new points have little weight and the model may not move far enough from its initialization.
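A simplified sketch of how window and kappa can act in an online k-means update: the data are consumed window points at a time, and after the t-th window each mean moves toward the centroid of the window points assigned to it with step size rho_t = (t + 1) ** (-kappa). For a single cluster, kappa = 1 gives a running average of the initial mean and the window centroids (new points weigh little), while kappa close to 0 keeps rho_t close to 1, so each window nearly overwrites the means. The step-size formula and the `online_kmeans` name are assumptions made for illustration; they are not taken from megamix or from the Dupuy & Bach paper.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def online_kmeans(stream, init_means, window=1, kappa=1.0):
    """Hypothetical online k-means: one stochastic update per window of points."""
    means = [list(m) for m in init_means]
    for t, start in enumerate(range(0, len(stream), window), start=1):
        batch = stream[start:start + window]
        rho = (t + 1) ** (-kappa)  # assumed step-size schedule
        # Assign every point of the window with the current means
        labels = [min(range(len(means)), key=lambda k: squared_distance(p, means[k]))
                  for p in batch]
        for k in range(len(means)):
            members = [p for p, label in zip(batch, labels) if label == k]
            if members:
                centroid = [sum(coords) / len(members) for coords in zip(*members)]
                means[k] = [(1 - rho) * m + rho * c for m, c in zip(means[k], centroid)]
    return means


stream = [(2.0,), (2.0,), (2.0,)]
print(online_kmeans(stream, [(0.0,)], kappa=1.0))   # close to [[1.5]]
print(online_kmeans(stream, [(0.0,)], kappa=1e-9))  # close to [[2.0]]
```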

name
str – The name of the method : ‘Kmeans’

log_weights
array of floats (n_components,) – Contains the logarithm of the mixing coefficients of the model.

means
array of floats (n_components,dim) – Contains the computed means of the model.

iter
int – The number of points which have been used to compute the model.
Raises: ValueError : if the parameters are inconsistent, for example if the number of clusters is negative.
References
‘Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling’, C. Dupuy & F. Bach
‘The remarkable k-means++’ (https://normaldeviate.wordpress.com/2012/09/30/the-remarkable-k-means/)

fit(points_data, points_test=None, saving=None, file_name='model', check_convergence_iter=None, saving_iter=2)
The k-means algorithm.
Parameters:  points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained.
 saving_iter (int, defaults to 2) – Controls how often the model is saved (see saving below).
 file_name (str, defaults to 'model') – The name of the file (including the path).
Other Parameters:  points_test (an array (n_points2,dim), optional) – Data used for early stopping (to avoid overfitting).
 check_convergence_iter (int, optional) – If points_test is given, the convergence criterion is computed every check_convergence_iter iterations. If points_test is not None and no value is given, an error is raised.
 saving (str, optional) – A string in [‘log’, ‘linear’]. In what follows, x denotes the parameter saving_iter (see above).
 If ‘log’, the model is saved at every iteration iter such that log(iter)/log(x) is an integer.
 If ‘linear’, the model is saved at every iteration iter such that iter/x is an integer.
Return type: None

get(name)
A getter that gives the user access to the attributes in the Cython version.
Parameters: name (str) – The name of the parameter. Must be in [‘_is_initialized’, ‘log_weights’, ‘means’, ‘iter’, ‘window’, ‘kappa’, ‘name’].
Return type: The requested parameter (an array, a boolean, an int or a string)

initialize(points)
This method initializes the model by setting the values of the means and weights.
Parameters: points (an array (n_points,dim)) – Data on which the model is initialized.

predict_assignements(points)
This function returns the hard assignments of points once the model is fitted.

score(points, assignements=None)
This method returns the distortion measurement at the end of the k-means algorithm.
Parameters:  points (an array (n_points,dim)) – The points on which the distortion is computed.
 assignements (an array (n_points,n_components)) – An array containing the responsibilities of the clusters.
Returns: distortion
Return type: float
Gaussian Mixture Model (GMM)

class megamix.online.GaussianMixture(n_components=1, kappa=1.0, reg_covar=1e-06, window=1, update=False)
Gaussian Mixture Model
Representation of a Gaussian mixture model probability distribution. This class estimates the parameters of a Gaussian mixture distribution (with full covariance matrices only).
Parameters:  n_components (int, defaults to 1) – Number of clusters used.
 kappa (double, defaults to 1.0) – A coefficient in ]0.0, 1.0] that controls how much weight the new points receive compared to the ones already used.
 If kappa is close to 0, the new points have a large weight and the model may take a long time to stabilize.
 If kappa = 1.0, the new points have little weight and the model may not move far enough from its initialization.
 window (int, defaults to 1) – The number of points used simultaneously to update the parameters.
 update (bool, defaults to False) – If True, the Cholesky factors of the covariance matrices are updated; otherwise they are recomputed at each iteration. Set it to True if window < dimension of the problem.
 reg_covar (float, defaults to 1e-6) – Added to the diagonal of the covariance matrices to avoid null covariances.

name
str – The name of the method : ‘GMM’

cov
array of floats (n_components,dim,dim) – Contains the computed covariance matrices of the mixture.

means
array of floats (n_components,dim) – Contains the computed means of the mixture.

log_weights
array of floats (n_components,) – Contains the logarithm of weights of each cluster.

iter
int – The number of iterations computed with the method fit()
Raises: ValueError : if the parameters are inconsistent, for example if the number of clusters is negative.
References
‘Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling’, C. Dupuy & F. Bach

fit(points_data, points_test=None, saving=None, file_name='model', check_convergence_iter=None, saving_iter=2)
The EM algorithm.
Parameters:  points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained.
 saving_iter (int, defaults to 2) – Controls how often the model is saved (see saving below).
 file_name (str, defaults to 'model') – The name of the file (including the path).
Other Parameters:  points_test (an array (n_points2,dim), optional) – Data used for early stopping (to avoid overfitting).
 check_convergence_iter (int, optional) – If points_test is given, the convergence criterion is computed every check_convergence_iter iterations. If points_test is not None and no value is given, an error is raised.
 saving (str, optional) – A string in [‘log’, ‘linear’]. In what follows, x denotes the parameter saving_iter (see above).
 If ‘log’, the model is saved at every iteration iter such that log(iter)/log(x) is an integer.
 If ‘linear’, the model is saved at every iteration iter such that iter/x is an integer.
Return type: None

get(name)
A getter that gives the user access to the attributes in the Cython version.
Parameters: name (str) – The name of the parameter. Must be in [‘_is_initialized’, ‘log_weights’, ‘means’, ‘cov’, ‘cov_chol’, ‘iter’, ‘window’, ‘kappa’, ‘name’].
Return type: The requested parameter (an array, a boolean, an int or a string)

initialize(points)
This method initializes the Gaussian mixture by setting the values of the means, covariances and weights.
Parameters: points (an array (n_points,dim)) – Data on which the model is initialized using k-means++ seeding.

predict_log_resp(points)
This function returns the logarithm of each point's responsibilities.
Parameters: points (array (n_points_bis,dim)) – A 1D or 2D array of points with the same dimension as the problem.
Returns: log_resp – The logarithm of the responsibilities.
Return type: array (n_points_bis,n_components)

read_and_init(group, points)
A method that reads a group of an HDF5 file to initialize the model.
Parameters: group (HDF5 group) – A group of an HDF5 file in reading mode.

score(points)
This function returns the score of the model: the logarithm of the likelihood for GMM, and the logarithm of the lower bound of the likelihood for VBGMM and DPGMM.
Parameters: points (array (n_points_bis,dim)) – A 1D or 2D array of points with the same dimension as the problem.
Returns: score
Return type: float

write(group)
A method creating datasets in a group of an HDF5 file in order to save the model.
Parameters: group (HDF5 group) – A group of an HDF5 file in writing mode.