Clustering API

Methods for clustering the output of ensemble simulations.

enstools.clustering.cluster(algorithm, data, n_clusters=None, n_clusters_max=None, sort=True, **kwargs)

Parameters:

algorithm : str

Clustering method to use. The following sklearn methods are supported, each selected by its short name:

agglo: sklearn.cluster.AgglomerativeClustering

aprop: sklearn.cluster.AffinityPropagation

birch: sklearn.cluster.Birch

dbscan: sklearn.cluster.DBSCAN

kmeans: sklearn.cluster.KMeans

mshift: sklearn.cluster.MeanShift

spectral: sklearn.cluster.SpectralClustering

data : xarray.DataArray or np.ndarray

The input data for the clustering. This is the output of the prepare() function.

n_clusters : int or None

Number of clusters to create. An integer creates exactly that many clusters. None estimates the number of clusters using the silhouette score, provided the algorithm accepts a prescribed number of clusters.

n_clusters_max : int

Maximum number of clusters to create. If not given, the default is the number of ensemble members divided by four.

sort : bool

If True, the clusters are sorted so that results are more reproducible.

**kwargs

all
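The n_clusters=None behaviour described above can be illustrated with a minimal sketch. This is not the enstools implementation: the helper name estimate_n_clusters, the use of KMeans, the search range starting at 2, and the integer division for the default maximum are all assumptions made for illustration. Only the idea (choose the cluster count with the best silhouette score, capped at the number of members divided by four) comes from the parameter descriptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def estimate_n_clusters(data, n_clusters_max=None):
    """Return the cluster count in 2..n_clusters_max with the best silhouette score.

    Hypothetical helper, not part of enstools.
    """
    n_members = data.shape[0]
    if n_clusters_max is None:
        # Documented default: number of ensemble members divided by four.
        # Integer division and the lower bound of 2 are assumptions of this sketch.
        n_clusters_max = max(2, n_members // 4)

    best_k, best_score = None, -np.inf
    for k in range(2, n_clusters_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
        score = silhouette_score(data, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Synthetic "ensemble": 24 members, 2 features, three well-separated groups.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
data = np.vstack([c + 0.1 * rng.standard_normal((8, 2)) for c in centres])

best_k = estimate_n_clusters(data)
```

With 24 members the search covers 2 to 6 clusters. Algorithms such as DBSCAN or MeanShift determine the number of clusters themselves, which is presumably why the estimate only applies when the algorithm accepts a prescribed number.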

Returns:

np.ndarray: 1d array with cluster labels.
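The sort option addresses the fact that cluster label numbers are arbitrary: two runs can find the same grouping but number the clusters differently. The sorting criterion used by enstools is not stated here, so the sketch below assumes one possible rule, renumbering clusters in order of first appearance. The helper name relabel_by_first_appearance is hypothetical.

```python
import numpy as np


def relabel_by_first_appearance(labels):
    """Renumber cluster labels 0, 1, 2, ... in order of first occurrence.

    Assumed sorting rule for illustration only; enstools may sort differently.
    """
    labels = np.asarray(labels)
    # np.unique returns the sorted unique labels and the index where each first occurs.
    uniques, first_idx = np.unique(labels, return_index=True)
    # Old labels ordered by where they first appear in the array.
    ordered = uniques[np.argsort(first_idx)]
    mapping = {old: new for new, old in enumerate(ordered)}
    return np.array([mapping[label] for label in labels])


# Two runs that found the same grouping but numbered the clusters differently.
run_a = np.array([2, 2, 0, 1, 0, 1])
run_b = np.array([1, 1, 2, 0, 2, 0])

sorted_a = relabel_by_first_appearance(run_a)
sorted_b = relabel_by_first_appearance(run_b)
```

After relabelling, both runs yield the same label array, so results can be compared directly between runs.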

enstools.clustering.prepare(*variables, **kwargs)

Parameters:

variables : xarray.DataArray or np.ndarray

One or more variables to be used for the clustering.

**kwargs

ens_dim (int or str): index or name of the dimension along which the clustering should be performed. If not provided, xarray inputs are scanned for standard ensemble dimension names (ens, ensemble, member, members). If no standard ensemble dimension is found, dimension 0 is used. Dimension 0 is also the default for numpy arrays.

Returns:

np.ndarray: 2d array with dimensions (ensemble, feature), where ensemble is the dimension along which the clustering should be performed and feature is the product of grid points and number of variables.
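The (ensemble, feature) layout described above can be sketched with plain NumPy. This is an illustration of the shape transformation only, not the enstools code: the helper name stack_features and the variable names are hypothetical, ens_dim is limited to an integer index here, and whether the real prepare() also scales or normalises the variables is not stated in this section.

```python
import numpy as np


def stack_features(*variables, ens_dim=0):
    """Flatten each variable to (ensemble, points) and join them along the feature axis.

    Hypothetical helper, not part of enstools.
    """
    flattened = []
    for var in variables:
        arr = np.asarray(var)
        # Bring the ensemble dimension to the front, then collapse all other dimensions.
        arr = np.moveaxis(arr, ens_dim, 0)
        flattened.append(arr.reshape(arr.shape[0], -1))
    return np.concatenate(flattened, axis=1)


# Two hypothetical variables: 20 ensemble members on a 4 x 5 grid.
rng = np.random.default_rng(0)
temperature = rng.standard_normal((20, 4, 5))
pressure = rng.standard_normal((20, 4, 5))

features = stack_features(temperature, pressure)
```

Here each variable contributes 4 x 5 = 20 grid points, so the two variables together give 40 features per member and the result has shape (20, 40).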