Clustering API
Methods for clustering the output of ensemble simulations.
- enstools.clustering.cluster(algorithm, data, n_clusters=None, n_clusters_max=None, sort=True, **kwargs)
- Parameters:
- algorithm : str
clustering method to use. All sklearn methods are supported.
agglo: sklearn.cluster.AgglomerativeClustering
aprop: sklearn.cluster.AffinityPropagation
birch: sklearn.cluster.Birch
dbscan: sklearn.cluster.DBSCAN
kmeans: sklearn.cluster.KMeans
mshift: sklearn.cluster.MeanShift
spectral: sklearn.cluster.SpectralClustering
- data : xarray.DataArray or np.ndarray
the input data for the clustering. This is the output of the prepare() function.
- n_clusters : int or None
number of clusters to create. An integer creates exactly that number of clusters. With None, the number of clusters is estimated using the silhouette score, provided the algorithm supports prescribing the number of clusters.
- n_clusters_max : int
maximum number of clusters to create. The default is the number of ensemble members divided by four.
- sort : bool
If True, the clusters are sorted to make the results more reproducible.
- **kwargs
all
- Returns:
- np.ndarray
1d array with cluster labels.
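The behaviour of n_clusters=None described above can be illustrated with a minimal sketch that uses scikit-learn directly. It is not the enstools implementation: the helper name `estimate_n_clusters`, the choice of KMeans, the lower bound of two clusters, the flooring of the default upper bound, and the synthetic data are all assumptions made for illustration. Only the idea of selecting the cluster count by silhouette score, and the default upper bound of ensemble members divided by four, come from the parameter descriptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def estimate_n_clusters(data, n_clusters_max=None):
    """Return (best_k, labels) for the k with the highest silhouette score.

    Hypothetical helper, not part of enstools.
    data: 2d array with dimensions (ensemble, feature).
    """
    n_members = data.shape[0]
    if n_clusters_max is None:
        # Default from the docs: ensemble members divided by four.
        # Flooring and the lower limit of 2 are assumptions of this sketch.
        n_clusters_max = max(2, n_members // 4)

    best_k, best_score, best_labels = None, -np.inf, None
    for k in range(2, n_clusters_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
        score = silhouette_score(data, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels


# Synthetic "ensemble": 16 members, 50 features, three well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([-10.0, 0.0, 10.0])
group = np.repeat([0, 1, 2], [5, 5, 6])
data = centers[group][:, None] + 0.1 * rng.standard_normal((16, 50))

best_k, labels = estimate_n_clusters(data)
print(best_k, labels.shape)
```

With 16 members the search covers two to four clusters. The returned labels form a 1d array with one entry per ensemble member, matching the return type documented above.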
- enstools.clustering.prepare(*variables, **kwargs)
- Parameters:
- variables : xarray.DataArray or np.ndarray
one or more variables which should be used for the clustering.
- **kwargs
- ens_dim : int or string
index or name of the dimension along which the clustering should be performed. If not provided, xarray objects are scanned for standard ensemble dimension names (ens, ensemble, member, members). If no standard ensemble dimension is found, dimension 0 is used; this is also the default for numpy arrays.
- Returns:
- np.ndarray
2d array with dimensions (ensemble, feature) where ensemble is the dimension along which the clustering should be performed and feature is the product of grid points and number of variables.
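To make the returned layout concrete, the following is a minimal NumPy sketch of the reshaping described above. It is an illustration under stated assumptions, not the enstools implementation: the helper names `find_ens_dim` and `stack_features` are hypothetical, dimension names are passed as plain tuples instead of being read from xarray objects, and no scaling or normalisation of the variables is applied.

```python
import numpy as np

# Standard ensemble dimension names listed in the ens_dim description.
STANDARD_ENS_NAMES = ("ens", "ensemble", "member", "members")


def find_ens_dim(dim_names):
    """Return the index of the first standard ensemble dimension name, else 0.

    Hypothetical helper; dim_names is a tuple such as ("time", "ens", "lat").
    """
    for index, name in enumerate(dim_names):
        if name in STANDARD_ENS_NAMES:
            return index
    return 0


def stack_features(*variables, ens_dim=0):
    """Flatten each variable to (ensemble, grid points) and join the features.

    Hypothetical helper; all variables must share the same ensemble size.
    """
    flattened = []
    for var in variables:
        arr = np.moveaxis(np.asarray(var), ens_dim, 0)  # ensemble axis first
        flattened.append(arr.reshape(arr.shape[0], -1))  # one row per member
    return np.concatenate(flattened, axis=1)


# Two variables on a 4 x 5 grid with 8 ensemble members on axis 1.
dims = ("lat", "ens", "lon")
temperature = np.zeros((4, 8, 5))
pressure = np.ones((4, 8, 5))

ens_dim = find_ens_dim(dims)
features = stack_features(temperature, pressure, ens_dim=ens_dim)
print(features.shape)  # (8, 40): 8 members, 2 variables * 20 grid points
```

An array of this shape is what cluster() expects as its data argument, so with enstools itself the two documented functions would typically be chained, for example cluster("kmeans", prepare(temperature, pressure, ens_dim=1)).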