API Reference

Batch versions of the algorithm

Kmeans

class megamix.batch.Kmeans(n_components=1, init='plus', n_jobs=1)

Kmeans model.

Parameters:
  • n_components (int, defaults to 1.) – Number of clusters used.
  • init (str, defaults to 'plus'.) – Method used to perform the initialization, must be in [‘random’, ‘plus’, ‘AF_KMC’].
name

str – The name of the method : ‘Kmeans’

means

array of floats (n_components,dim) – Contains the computed means of the model.

log_weights

array of floats (n_components,) – Contains the logarithm of the mixing coefficient of each cluster.

iter

int – The number of iterations computed with the method fit()

_is_initialized

bool – Ensures that the model has been initialized before using other methods such as distortion() or predict_assignements().

Raises:ValueError : if the parameters are inconsistent, for example if the cluster number is negative, init_type is not in [‘resp’,’mcw’]...

References

‘Fast and Provably Good Seedings for k-Means’, O. Bachem, M. Lucic, S. Hassani, A. Krause

Lloyd’s algorithm: https://en.wikipedia.org/wiki/Lloyd’s_algorithm

The remarkable k-means++: https://normaldeviate.wordpress.com/2012/09/30/the-remarkable-k-means/

fit(points_data, points_test=None, n_iter_max=100, n_iter_fix=None, tol=0, saving=None, file_name='model', saving_iter=2)

The k-means algorithm

Parameters:
  • points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained
  • tol (float, defaults to 0) – The algorithm stops when the change in distortion between two successive iterations is less than or equal to tol.
  • n_iter_max (int, defaults to 100) – Maximum number of iterations that can be performed.
  • saving_iter (int | defaults 2) – An int to know how often the model is saved (see saving below).
  • file_name (str | defaults model) – The name of the file (including the path).
Other Parameters:
 
  • points_test (array (n_points_bis,dim) | Optional) – A 2D array of points on which the model will be tested.
  • n_iter_fix (int | Optional) – If not None, the algorithm runs exactly n_iter_fix iterations and then stops.
  • saving (str | Optional) – A string in [‘log’,’linear’]. In the following equations x is the parameter saving_iter (see above).
    • If ‘log’, the model will be saved for all iterations which verify :
      log(iter)/log(x) is an int
    • If ‘linear’ the model will be saved for all iterations which verify :
      iter/x is an int
Returns:

Return type:

None
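
The two saving rules above can be sketched in plain Python. This is only an illustration of the stated conditions, not the library's code: the helper name is_saved is hypothetical, iterations are assumed to be counted from 1, and integer arithmetic replaces the logarithms to avoid rounding problems.

```python
def is_saved(iteration, saving, saving_iter=2):
    """Return True if a model would be saved at this iteration (assumed rule)."""
    if saving is None:
        return False
    if saving == 'linear':
        # iter / x is an integer
        return iteration % saving_iter == 0
    if saving == 'log':
        # log(iter) / log(x) is an integer, i.e. iter is an exact power of x.
        if saving_iter < 2:
            raise ValueError("saving_iter must be at least 2 for 'log'")
        power = 1
        while power < iteration:
            power *= saving_iter
        return power == iteration
    raise ValueError("saving must be None, 'log' or 'linear'")


# Iterations 1..20 that trigger a save under each rule.
linear_iters = [i for i in range(1, 21) if is_saved(i, 'linear', 5)]
log_iters = [i for i in range(1, 21) if is_saved(i, 'log', 2)]
print(linear_iters)  # [5, 10, 15, 20]
print(log_iters)     # [1, 2, 4, 8, 16]
```

With ‘log’ the saves become sparser as training progresses, while ‘linear’ keeps a constant spacing.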

predict_assignements(points)

This function returns the hard assignments of points once the model is fitted.

score(points, assignements=None)

This method returns the distortion measurement at the end of the k_means.

Parameters:
  • points (an array (n_points,dim)) –
  • assignements (an array (n_components,dim)) – an array containing the responsibilities of the clusters
Returns:

distortion

Return type:

(float)
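
As an illustration of what hard assignments and a distortion measure look like, here is a minimal pure-Python sketch. It assumes the distortion is the sum of squared Euclidean distances between each point and its nearest mean; the library may normalize this quantity differently, and the function names below are hypothetical.

```python
def squared_distance(point, mean):
    """Squared Euclidean distance between two points of the same dimension."""
    return sum((a - b) ** 2 for a, b in zip(point, mean))


def hard_assignments(points, means):
    """Index of the nearest mean for each point."""
    return [
        min(range(len(means)), key=lambda k: squared_distance(p, means[k]))
        for p in points
    ]


def distortion(points, means):
    """Sum of squared distances to the assigned mean (assumed definition)."""
    labels = hard_assignments(points, means)
    return sum(squared_distance(p, means[k]) for p, k in zip(points, labels))


points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
means = [[0.0, 1.0], [10.0, 1.0]]

labels = hard_assignments(points, means)
total = distortion(points, means)
print(labels)  # [0, 0, 1, 1]
print(total)   # 4.0
```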

Gaussian Mixture Model (GMM)

class megamix.batch.GaussianMixture(n_components=1, covariance_type='full', init='kmeans', reg_covar=1e-06, type_init='resp', n_jobs=1)

Gaussian Mixture Model

Representation of a Gaussian mixture model probability distribution. This class estimates the parameters of a Gaussian mixture distribution.

Parameters:
  • n_components (int, defaults to 1.) – Number of clusters used.
  • init (str, defaults to 'kmeans'.) – Method used in order to perform the initialization, must be in [‘random’, ‘plus’, ‘AF_KMC’, ‘kmeans’].
  • reg_covar (float, defaults to 1e-6) – In order to avoid null covariances this float is added to the diagonal of covariance matrices.
  • type_init (str, defaults to 'resp'.) – The algorithm is initialized using this data (responsibilities if ‘resp’ or means, covariances and weights if ‘mcw’).
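
The effect of reg_covar can be illustrated on a single 2x2 matrix. The sketch below is plain Python with hypothetical helper names; it only shows that adding a small constant to the diagonal turns a singular covariance into an invertible one, and it makes no claim about where the library applies the regularization.

```python
def regularize(cov, reg_covar=1e-6):
    """Return a copy of a square matrix with reg_covar added to its diagonal."""
    dim = len(cov)
    return [
        [cov[i][j] + (reg_covar if i == j else 0.0) for j in range(dim)]
        for i in range(dim)
    ]


def det2(matrix):
    """Determinant of a 2x2 matrix."""
    return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]


# Perfectly correlated coordinates give a singular covariance matrix.
singular = [[1.0, 1.0], [1.0, 1.0]]
regularized = regularize(singular)

print(det2(singular))         # 0.0
print(det2(regularized) > 0)  # True
```
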
name

str – The name of the method : ‘GMM’

cov

array of floats (n_components,dim,dim) – Contains the computed covariance matrices of the mixture.

means

array of floats (n_components,dim) – Contains the computed means of the mixture.

log_weights

array of floats (n_components,) – Contains the logarithm of the mixing coefficient of each cluster.

iter

int – The number of iterations computed with the method fit()

convergence_criterion_data

array of floats (iter,) – Stores the value of the convergence criterion computed with data on which the model is fitted.

convergence_criterion_test

array of floats (iter,) | if _early_stopping only – Stores the value of the convergence criterion computed with test data if it exists.

_is_initialized

bool – Ensures that the method _initialize() has been used before using other methods such as score() or predict_log_assignements().

Raises:ValueError : if the parameters are inconsistent, for example if the cluster number is negative, init_type is not in [‘resp’,’mcw’]...

References

‘Pattern Recognition and Machine Learning’, Bishop

fit(points_data, points_test=None, tol=0.001, patience=None, n_iter_max=100, n_iter_fix=None, saving=None, file_name='model', saving_iter=2)

The EM algorithm

Parameters:
  • points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained
  • tol (float, defaults to 1e-3) – The EM algorithm stops when the change in the convergence criterion between two successive iterations is less than tol.
  • n_iter_max (int, defaults to 100) – Maximum number of iterations that can be performed.
  • saving_iter (int | defaults 2) – An int to know how often the model is saved (see saving below).
  • file_name (str | defaults model) – The name of the file (including the path).
Other Parameters:
 
  • points_test (array (n_points_bis,dim) | Optional) – A 2D array of points on which the model will be tested.
  • patience (int | Optional) – The number of iterations performed after having satisfied the convergence criterion
  • n_iter_fix (int | Optional) – If not None, the algorithm runs exactly n_iter_fix iterations and then stops.
  • saving (str | Optional) – A string in [‘log’,’linear’]. In the following equations x is the parameter saving_iter (see above).
    • If ‘log’, the model will be saved for all iterations which verify :
      log(iter)/log(x) is an int
    • If ‘linear’ the model will be saved for all iterations which verify :
      iter/x is an int
Returns:

Return type:

None
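
How tol, patience and n_iter_max could interact is sketched below. This is one possible reading of the parameter descriptions, not the library's implementation: the function iterations_run is hypothetical, it takes a precomputed list of criterion values (one per iteration), and it assumes that patience counts the extra iterations run after the tolerance test first succeeds.

```python
def iterations_run(criterion_values, tol=1e-3, patience=None, n_iter_max=100):
    """Number of iterations performed under the assumed stopping rule."""
    extra = patience or 0      # iterations still run after convergence
    previous = None
    converged_at = None
    for n_done, value in enumerate(criterion_values[:n_iter_max], start=1):
        if (converged_at is None and previous is not None
                and abs(value - previous) < tol):
            converged_at = n_done
        if converged_at is not None and n_done - converged_at >= extra:
            return n_done
        previous = value
    # Tolerance never reached (or patience not exhausted): stop at the cap.
    return min(len(criterion_values), n_iter_max)


values = [-10.0, -5.0, -4.0, -3.9995, -3.9994, -3.9993, -3.9992]

print(iterations_run(values))                # 4
print(iterations_run(values, patience=2))    # 6
print(iterations_run(values, n_iter_max=3))  # 3
```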

predict_log_resp(points)

This function returns the logarithm of each point’s responsibilities

Parameters:points (array (n_points_bis,dim)) – a 1D or 2D array of points with the same dimension as the problem
Returns:log_resp – the logarithm of the responsibilities
Return type:array (n_points_bis,n_components)
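
A short sketch of how an array of log-responsibilities can be consumed. The values below are made up for illustration and stand in for the output of predict_log_resp; the code uses plain Python lists rather than arrays.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


# Hypothetical log-responsibilities for 2 points and 2 components.
log_resp = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.2), math.log(0.8)],
]

# Responsibilities are recovered by exponentiating each entry.
resp = [[math.exp(v) for v in row] for row in log_resp]

# The most likely component of each point is the argmax of its row.
labels = [max(range(len(row)), key=row.__getitem__) for row in log_resp]

# Each row is normalized, so its log-sum-exp is (numerically) zero.
row_log_totals = [log_sum_exp(row) for row in log_resp]

print(labels)  # [0, 1]
```
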
read_and_init(group, points)

A method reading a group of an hdf5 file to initialize the model

Parameters:group (HDF5 group) – A group of a hdf5 file in reading mode
score(points)

This function returns the score of the model, which is the logarithm of the likelihood for GMM and the logarithm of the lower bound of the likelihood for VBGMM and DPGMM

Parameters:points (array (n_points_bis,dim)) – a 1D or 2D array of points with the same dimension as the problem
Returns:score
Return type:float
simplified_model(points)

A method creating a new model with simplified parameters: clusters unused are removed

Parameters:points (an array (n_points,dim)) –
Returns:GM
Return type:an instance of the same type of self: GMM,VBGMM or DPGMM
write(group)

A method creating datasets in a group of an hdf5 file in order to save the model

Parameters: group (HDF5 group) – A group of an hdf5 file opened in writing mode

Variational Gaussian Mixture Model (VBGMM)

class megamix.batch.VariationalGaussianMixture(n_components=1, init='kmeans', alpha_0=None, beta_0=None, nu_0=None, means_prior=None, cov_wishart_prior=None, reg_covar=1e-06, type_init='resp', n_jobs=1, boost=None)

Variational Bayesian Estimation of a Gaussian Mixture

This class infers an approximate posterior distribution over the parameters of a Gaussian mixture distribution.

The weights distribution is a Dirichlet distribution with parameter alpha (see Bishop’s book p474-486)

Parameters:
  • n_components (int, defaults to 1.) – Number of clusters used.
  • init (str, defaults to 'kmeans'.) – Method used in order to perform the initialization, must be in [‘random’, ‘plus’, ‘AF_KMC’, ‘kmeans’, ‘GMM’].
  • reg_covar (float, defaults to 1e-6) – In order to avoid null covariances this float is added to the diagonal of covariance matrices.
  • type_init (str, defaults to 'resp'.) – The algorithm is initialized using this data (responsibilities if ‘resp’ or means, covariances and weights if ‘mcw’).
Other Parameters:
 
  • alpha_0 (float, Optional | defaults to None.) – The prior parameter on the weight distribution (Dirichlet). A high value of alpha_0 will lead to equal weights, while a low value will allow some clusters to shrink and disappear. Must be greater than 0.

    If None, the value is set to 1/n_components

  • beta_0 (float, Optional | defaults to None.) – The precision prior on the mean distribution (Gaussian). Must be greater than 0.

    If None, the value is set to 1.0

  • nu_0 (float, Optional | defaults to None.) – The prior of the number of degrees of freedom on the covariance distributions (Wishart). Must be greater than or equal to dim.

    If None, the value is set to dim

  • means_prior (array (dim,), Optional | defaults to None) – The prior value to compute the value of the means.

    If None, the value is set to the mean of points_data

  • cov_wishart_prior (type depends on covariance_type, Optional | defaults to None) – The prior value used to compute the precisions. If covariance_type is ‘full’, the type must be array (dim,dim); if covariance_type is ‘spherical’, the type must be float.

    If None, the value is set to the covariance of points_data
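
The documented defaults and constraints for the scalar priors can be summarized in a small helper. The function resolve_priors is hypothetical and only restates the rules listed above; it is not part of the library.

```python
def resolve_priors(n_components, dim, alpha_0=None, beta_0=None, nu_0=None):
    """Apply the documented defaults, then check the documented constraints."""
    if alpha_0 is None:
        alpha_0 = 1.0 / n_components
    if beta_0 is None:
        beta_0 = 1.0
    if nu_0 is None:
        nu_0 = dim

    if alpha_0 <= 0:
        raise ValueError("alpha_0 must be greater than 0")
    if beta_0 <= 0:
        raise ValueError("beta_0 must be greater than 0")
    if nu_0 < dim:
        raise ValueError("nu_0 must be greater than or equal to dim")
    return alpha_0, beta_0, nu_0


defaults = resolve_priors(n_components=4, dim=3)
print(defaults)  # (0.25, 1.0, 3)
```
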

name

str – The name of the method : ‘VBGMM’

alpha

array of floats (n_components,) – Contains the parameters of the weight distribution (Dirichlet)

beta

array of floats (n_components,) – Contains coefficients which are multiplied with the precision matrices to form the precision matrix on the Gaussian distribution of the means.

nu

array of floats (n_components,) – Contains the number of degrees of freedom on the distribution of covariance matrices.

_inv_prec

array of floats (n_components,dim,dim) – Contains the equivalent of the matrix W described in Bishop’s book. It is proportional to cov.

_log_det_inv_prec

array of floats (n_components,) – Contains the logarithm of the determinant of W matrices.

cov

array of floats (n_components,dim,dim) – Contains the computed covariance matrices of the mixture.

means

array of floats (n_components,dim) – Contains the computed means of the mixture.

log_weights

array of floats (n_components,) – Contains the logarithm of weights of each cluster.

iter

int – The number of iterations computed with the method fit()

convergence_criterion_data

array of floats (iter,) – Stores the value of the convergence criterion computed with data on which the model is fitted.

convergence_criterion_test

array of floats (iter,) | if _early_stopping only – Stores the value of the convergence criterion computed with test data if it exists.

_is_initialized

bool – Ensures that the method _initialize() has been used before using other methods such as score() or predict_log_assignements().

Raises:ValueError : if the parameters are inconsistent, for example if the cluster number is negative, init_type is not in [‘resp’,’mcw’]...

References

‘Pattern Recognition and Machine Learning’, Bishop

fit(points_data, points_test=None, tol=0.001, patience=None, n_iter_max=100, n_iter_fix=None, saving=None, file_name='model', saving_iter=2)

The EM algorithm

Parameters:
  • points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained
  • tol (float, defaults to 1e-3) – The EM algorithm stops when the change in the convergence criterion between two successive iterations is less than tol.
  • n_iter_max (int, defaults to 100) – Maximum number of iterations that can be performed.
  • saving_iter (int | defaults 2) – An int to know how often the model is saved (see saving below).
  • file_name (str | defaults model) – The name of the file (including the path).
Other Parameters:
 
  • points_test (array (n_points_bis,dim) | Optional) – A 2D array of points on which the model will be tested.
  • patience (int | Optional) – The number of iterations performed after having satisfied the convergence criterion
  • n_iter_fix (int | Optional) – If not None, the algorithm runs exactly n_iter_fix iterations and then stops.
  • saving (str | Optional) – A string in [‘log’,’linear’]. In the following equations x is the parameter saving_iter (see above).
    • If ‘log’, the model will be saved for all iterations which verify :
      log(iter)/log(x) is an int
    • If ‘linear’ the model will be saved for all iterations which verify :
      iter/x is an int
Returns:

Return type:

None

predict_log_resp(points)

This function returns the logarithm of each point’s responsibilities

Parameters:points (array (n_points_bis,dim)) – a 1D or 2D array of points with the same dimension as the problem
Returns:log_resp – the logarithm of the responsibilities
Return type:array (n_points_bis,n_components)
read_and_init(group, points)

A method reading a group of an hdf5 file to initialize the model

Parameters:group (HDF5 group) – A group of a hdf5 file in reading mode
score(points)

This function returns the score of the model, which is the logarithm of the likelihood for GMM and the logarithm of the lower bound of the likelihood for VBGMM and DPGMM

Parameters:points (array (n_points_bis,dim)) – a 1D or 2D array of points with the same dimension as the problem
Returns:score
Return type:float
simplified_model(points)

A method creating a new model with simplified parameters: clusters unused are removed

Parameters:points (an array (n_points,dim)) –
Returns:GM
Return type:an instance of the same type of self: GMM,VBGMM or DPGMM
write(group)

A method creating datasets in a group of an hdf5 file in order to save the model

Parameters: group (HDF5 group) – A group of an hdf5 file opened in writing mode

Dirichlet Process Gaussian Mixture Model (DPGMM)

class megamix.batch.DPVariationalGaussianMixture(n_components=1, init='kmeans', alpha_0=None, beta_0=None, nu_0=None, means_prior=None, cov_wishart_prior=None, reg_covar=1e-06, type_init='resp', n_jobs=1, pypcoeff=0, boost=None)

Variational Bayesian Estimation of a Gaussian Mixture with Dirichlet Process

This class infers an approximate posterior distribution over the parameters of a Gaussian mixture distribution.

The weights distribution follows a Dirichlet Process with attribute alpha.

Parameters:
  • n_components (int, defaults to 1.) – Number of clusters used.
  • init (str, defaults to 'kmeans'.) – Method used in order to perform the initialization, must be in [‘random’, ‘plus’, ‘AF_KMC’, ‘kmeans’, ‘GMM’, ‘VBGMM’].
  • reg_covar (float, defaults to 1e-6) – In order to avoid null covariances this float is added to the diagonal of covariance matrices.
  • type_init (str, defaults to 'resp'.) – The algorithm is initialized using this data (responsibilities if ‘resp’ or means, covariances and weights if ‘mcw’).
Other Parameters:
 
  • alpha_0 (float, Optional | defaults to None.) – The prior parameter on the weight distribution (Beta). A high value of alpha_0 will lead to equal weights, while a low value will allow some clusters to shrink and disappear. Must be greater than 0.

    If None, the value is set to 1/n_components

  • beta_0 (float, Optional | defaults to None.) – The precision prior on the mean distribution (Gaussian). Must be greater than 0.

    If None, the value is set to 1.0

  • nu_0 (float, Optional | defaults to None.) – The prior of the number of degrees of freedom on the covariance distributions (Wishart). Must be greater than or equal to dim.

    If None, the value is set to dim

  • means_prior (array (dim,), Optional | defaults to None) – The prior value to compute the value of the means.

    If None, the value is set to the mean of points_data

  • cov_wishart_prior (type depends on covariance_type, Optional | defaults to None) – The prior value used to compute the precisions. If covariance_type is ‘full’, the type must be array (dim,dim); if covariance_type is ‘spherical’, the type must be float.

  • pypcoeff (float | defaults to 0) – If 0, the weights are generated according to a Dirichlet Process. If >0 and <=1, the weights are generated according to a Pitman-Yor Process.
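
The difference between the two processes can be seen on the prior stick-breaking weights. The sketch below uses the standard stick-breaking construction, where the k-th stick fraction follows Beta(1 - d, alpha + k*d) with discount d, and d = 0 recovers the Dirichlet Process. Using pypcoeff as the discount d is an assumption, the function name is hypothetical, and the sketch is restricted to d < 1; it describes the prior only, not the variational updates of the class.

```python
def expected_stick_weights(alpha, n_components, pypcoeff=0.0):
    """Expected prior weights of the first n_components sticks."""
    if not 0.0 <= pypcoeff < 1.0:
        raise ValueError("this sketch assumes 0 <= pypcoeff < 1")
    weights = []
    remaining = 1.0  # expected length of the stick not yet broken off
    for k in range(1, n_components + 1):
        a = 1.0 - pypcoeff
        b = alpha + k * pypcoeff
        mean_fraction = a / (a + b)  # mean of Beta(a, b)
        weights.append(remaining * mean_fraction)
        remaining *= 1.0 - mean_fraction
    return weights


dp_weights = expected_stick_weights(alpha=1.0, n_components=3)
py_weights = expected_stick_weights(alpha=1.0, n_components=3, pypcoeff=0.5)

print(dp_weights)  # [0.5, 0.25, 0.125]
print(sum(py_weights) < sum(dp_weights))  # True
```

With a positive discount the first components take less of the total mass, leaving more weight for later components.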

name

str – The name of the method : ‘DPGMM’

alpha

array of floats (n_components,2) – Contains the parameters of the weight distribution (Beta)

beta

array of floats (n_components,) – Contains coefficients which are multiplied with the precision matrices to form the precision matrix on the Gaussian distribution of the means.

nu

array of floats (n_components,) – Contains the number of degrees of freedom on the distribution of covariance matrices.

_inv_prec

array of floats (n_components,dim,dim) – Contains the equivalent of the matrix W described in Bishop’s book. It is proportional to cov.

_log_det_inv_prec

array of floats (n_components,) – Contains the logarithm of the determinant of W matrices.

cov

array of floats (n_components,dim,dim) – Contains the computed covariance matrices of the mixture.

means

array of floats (n_components,dim) – Contains the computed means of the mixture.

log_weights

array of floats (n_components,) – Contains the logarithm of weights of each cluster.

iter

int – The number of iterations computed with the method fit()

convergence_criterion_data

array of floats (iter,) – Stores the value of the convergence criterion computed with data on which the model is fitted.

convergence_criterion_test

array of floats (iter,) | if _early_stopping only – Stores the value of the convergence criterion computed with test data if it exists.

_is_initialized

bool – Ensures that the method _initialize() has been used before using other methods such as score() or predict_log_assignements().

Raises:ValueError : if the parameters are inconsistent, for example if the cluster number is negative, init_type is not in [‘resp’,’mcw’]...

References

‘Variational Inference for Dirichlet Process Mixtures’, D. Blei and M. Jordan

fit(points_data, points_test=None, tol=0.001, patience=None, n_iter_max=100, n_iter_fix=None, saving=None, file_name='model', saving_iter=2)

The EM algorithm

Parameters:
  • points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained
  • tol (float, defaults to 1e-3) – The EM algorithm stops when the change in the convergence criterion between two successive iterations is less than tol.
  • n_iter_max (int, defaults to 100) – Maximum number of iterations that can be performed.
  • saving_iter (int | defaults 2) – An int to know how often the model is saved (see saving below).
  • file_name (str | defaults model) – The name of the file (including the path).
Other Parameters:
 
  • points_test (array (n_points_bis,dim) | Optional) – A 2D array of points on which the model will be tested.
  • patience (int | Optional) – The number of iterations performed after having satisfied the convergence criterion
  • n_iter_fix (int | Optional) – If not None, the algorithm runs exactly n_iter_fix iterations and then stops.
  • saving (str | Optional) – A string in [‘log’,’linear’]. In the following equations x is the parameter saving_iter (see above).
    • If ‘log’, the model will be saved for all iterations which verify :
      log(iter)/log(x) is an int
    • If ‘linear’ the model will be saved for all iterations which verify :
      iter/x is an int
Returns:

Return type:

None

predict_log_resp(points)

This function returns the logarithm of each point’s responsibilities

Parameters:points (array (n_points_bis,dim)) – a 1D or 2D array of points with the same dimension as the problem
Returns:log_resp – the logarithm of the responsibilities
Return type:array (n_points_bis,n_components)
read_and_init(group, points)

A method reading a group of an hdf5 file to initialize DPGMM

Parameters:group (HDF5 group) – A group of a hdf5 file in reading mode
score(points)

This function returns the score of the model, which is the logarithm of the likelihood for GMM and the logarithm of the lower bound of the likelihood for VBGMM and DPGMM

Parameters:points (array (n_points_bis,dim)) – a 1D or 2D array of points with the same dimension as the problem
Returns:score
Return type:float
simplified_model(points)

A method creating a new model with simplified parameters: clusters unused are removed

Parameters:points (an array (n_points,dim)) –
Returns:GM
Return type:an instance of the same type of self: GMM,VBGMM or DPGMM
write(group)

A method creating datasets in a group of an hdf5 file in order to save the model

Parameters: group (HDF5 group) – A group of an hdf5 file opened in writing mode

Online versions of the algorithm

Kmeans

class megamix.online.Kmeans(n_components=1, window=1, kappa=1.0)

Kmeans model.

Parameters:
  • n_components (int, defaults to 1.) – Number of clusters used.
  • window (int, defaults to 1) – The number of points used at the same time in order to update the parameters.
  • kappa (double, defaults to 1.0) –

    A coefficient in ]0.0,1.0] that controls how much weight the new points receive compared to the ones already used.

    • If kappa is close to 0, the new points have a large weight and the model may take a long time to stabilize.
    • If kappa = 1.0, the new points have little weight and the model may not move far enough from its initialization.
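
The role of kappa can be illustrated with a decaying step size. The sketch below assumes a step size of the form t^(-kappa), where t counts the updates performed so far, and a convex blend between the old statistic and the one computed on the new points. Both the formula and the function names are assumptions made for illustration; the library's exact schedule may differ.

```python
def step_size(n_updates, kappa=1.0):
    """Assumed step size n_updates ** (-kappa), for n_updates >= 1."""
    if not 0.0 < kappa <= 1.0:
        raise ValueError("kappa must be in ]0.0, 1.0]")
    return float(n_updates) ** (-kappa)


def blend(old_stat, new_stat, gamma):
    """Move the running statistic toward the new one by a fraction gamma."""
    return (1.0 - gamma) * old_stat + gamma * new_stat


# Step sizes after 1, 10 and 100 updates.
slow_decay = [step_size(t, kappa=0.1) for t in (1, 10, 100)]
fast_decay = [step_size(t, kappa=1.0) for t in (1, 10, 100)]

print(slow_decay)  # stays above 0.6 even after 100 updates
print(fast_decay)  # shrinks to about 0.01 after 100 updates
```

With a small kappa the step size stays large, so each new window keeps moving the parameters; with kappa = 1.0 the step size shrinks quickly and later points barely change the model.
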
name

str – The name of the method : ‘Kmeans’

log_weights

array of floats (n_components) – Contains the logarithm of the mixing coefficients of the model.

means

array of floats (n_components,dim) – Contains the computed means of the model.

iter

int – The number of points which have been used to compute the model.

Raises:ValueError : if the parameters are inconsistent, for example if the cluster number is negative, init_type is not in [‘resp’,’mcw’]...

References

‘Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling’, C. Dupuy & F. Bach

The remarkable k-means++: https://normaldeviate.wordpress.com/2012/09/30/the-remarkable-k-means/

fit(points_data, points_test=None, saving=None, file_name='model', check_convergence_iter=None, saving_iter=2)

The k-means algorithm

Parameters:
  • points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained.
  • saving_iter (int | defaults 2) – An int to know how often the model is saved (see saving below).
  • file_name (str | defaults model) – The name of the file (including the path).
Other Parameters:
 
  • points_test (an array (n_points2,dim) | Optional) – Data used to do early stopping (avoid overfitting)
  • check_convergence_iter (int | Optional) – If points_test are given, convergence criterion will be computed every check_convergence_iter iterations. If no value is given and points_test is not None, it will raise an Error.
  • saving (str | Optional) – A string in [‘log’,’linear’]. In the following equations x is the parameter saving_iter (see above).
    • If ‘log’, the model will be saved for all iterations which verify :
      log(iter)/log(x) is an int
    • If ‘linear’ the model will be saved for all iterations which verify :
      iter/x is an int
Returns:

Return type:

None
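
The window parameter means that the online algorithm consumes points_data in consecutive groups of points. A minimal sketch of that chunking is shown below; the helper iter_windows is hypothetical, and dropping an incomplete trailing group is an assumption rather than documented behaviour.

```python
def iter_windows(points, window=1):
    """Yield consecutive groups of `window` points, dropping any remainder."""
    if window < 1:
        raise ValueError("window must be at least 1")
    for start in range(0, len(points) - window + 1, window):
        yield points[start:start + window]


# Seven one-dimensional points processed three at a time.
points = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
windows = list(iter_windows(points, window=3))

print(len(windows))  # 2
print(windows[0])    # [[0.0], [1.0], [2.0]]
```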

get(name)

A getter to allow the user to get the attributes with the cython version.

Parameters:name (str) – The name of the parameter. Must be in [‘_is_initialized’,’log_weights’, ‘means’,’iter’,’window’,’kappa’,’name’]
Returns:
Return type:The wanted parameter (may be an array, a boolean, an int or a string)
initialize(points)

This method initializes the model by setting the values of the means and weights.

Parameters:points (an array (n_points,dim)) – Data on which the model is initialized.
predict_assignements(points)

This function returns the hard assignments of points once the model is fitted.

score(points, assignements=None)

This method returns the distortion measurement at the end of the k_means.

Parameters:
  • points (an array (n_points,dim)) –
  • assignements (an array (n_components,dim)) – an array containing the responsibilities of the clusters
Returns:

distortion

Return type:

(float)

Gaussian Mixture Model (GMM)

class megamix.online.GaussianMixture(n_components=1, kappa=1.0, reg_covar=1e-06, window=1, update=False)

Gaussian Mixture Model

Representation of a Gaussian mixture model probability distribution. This class estimates the parameters of a Gaussian mixture distribution (with full covariance matrices only).

Parameters:
  • n_components (int, defaults to 1) – Number of clusters used.
  • kappa (double, defaults to 1.0) –

    A coefficient in ]0.0,1.0] that controls how much weight the new points receive compared to the ones already used.

    • If kappa is close to 0, the new points have a large weight and the model may take a long time to stabilize.
    • If kappa = 1.0, the new points have little weight and the model may not move far enough from its initialization.
  • window (int, defaults to 1) – The number of points used at the same time in order to update the parameters.
  • update (bool, defaults to False) – If True, the Cholesky factors of the covariance matrices are updated in place; otherwise they are recomputed at each iteration. Set it to True if window < dimension of the problem.
  • reg_covar (float, defaults to 1e-6) – In order to avoid null covariances this float is added to the diagonal of covariance matrices.
name

str – The name of the method : ‘GMM’

cov

array of floats (n_components,dim,dim) – Contains the computed covariance matrices of the mixture.

means

array of floats (n_components,dim) – Contains the computed means of the mixture.

log_weights

array of floats (n_components,) – Contains the logarithm of weights of each cluster.

iter

int – The number of iterations computed with the method fit()

Raises:ValueError : if the parameters are inconsistent, for example if the cluster number is negative, init_type is not in [‘resp’,’mcw’]...

References

Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling, C. Dupuy & F. Bach

fit(points_data, points_test=None, saving=None, file_name='model', check_convergence_iter=None, saving_iter=2)

The EM algorithm

Parameters:
  • points_data (array (n_points,dim)) – A 2D array of points on which the model will be trained.
  • saving_iter (int | defaults 2) – An int to know how often the model is saved (see saving below).
  • file_name (str | defaults model) – The name of the file (including the path).
Other Parameters:
 
  • points_test (an array (n_points2,dim) | Optional) – Data used to do early stopping (avoid overfitting)
  • check_convergence_iter (int | Optional) – If points_test are given, convergence criterion will be computed every check_convergence_iter iterations. If no value is given and points_test is not None, it will raise an Error.
  • saving (str | Optional) – A string in [‘log’,’linear’]. In the following equations x is the parameter saving_iter (see above).
    • If ‘log’, the model will be saved for all iterations which verify :
      log(iter)/log(x) is an int
    • If ‘linear’ the model will be saved for all iterations which verify :
      iter/x is an int
Returns:

Return type:

None

get(name)

A getter to allow the user to get the attributes with the cython version.

Parameters:name (str) – The name of the parameter. Must be in [‘_is_initialized’,’log_weights’, ‘means’,’cov’,’cov_chol’,’iter’,’window’,’kappa’,’name’]
Returns:
Return type:The wanted parameter (may be an array, a boolean, an int or a string)
initialize(points)

This method initializes the Gaussian Mixture by setting the values of the means, covariances and weights.

Parameters: points (an array (n_points,dim)) – Data on which the model is initialized using the seeds of kmeans++.
predict_log_resp(points)

This function returns the logarithm of each point’s responsibilities

Parameters:points (array (n_points_bis,dim)) – a 1D or 2D array of points with the same dimension as the problem
Returns:log_resp – the logarithm of the responsibilities
Return type:array (n_points_bis,n_components)
read_and_init(group, points)

A method reading a group of an hdf5 file to initialize the model

Parameters:group (HDF5 group) – A group of a hdf5 file in reading mode
score(points)

This function returns the score of the model, which is the logarithm of the likelihood for GMM and the logarithm of the lower bound of the likelihood for VBGMM and DPGMM

Parameters:points (array (n_points_bis,dim)) – a 1D or 2D array of points with the same dimension as the problem
Returns:score
Return type:float
write(group)

A method creating datasets in a group of an hdf5 file in order to save the model

Parameters: group (HDF5 group) – A group of an hdf5 file opened in writing mode