IoTPy.modules.ML.KMeans package

Submodules

IoTPy.modules.ML.KMeans.KMeansStream module

class IoTPy.modules.ML.KMeans.KMeansStream.KMeansStream(draw, output, k, incremental=True, figsize=(1000, 500))

Helper class for kmeans clustering.

This class provides train and predict functions for using kmeans with Stream_Learn.

Parameters:

draw : boolean

Describes whether the data is to be plotted (data must have two or fewer dimensions).

output : boolean

Describes whether debug info is to be printed. Info includes average error, average number of iterations, current number of iterations, and number of changed points over time.

k : int

Describes the number of clusters to train.

incremental : boolean, optional

Describes whether the kmeans algorithm is run incrementally or not (the default is True). If incremental, then previous clusters are used to initialize new clusters. Otherwise, clusters are reinitialized randomly for each window.

figsize : tuple, optional

A tuple containing the width and height of the plot (the default is (1000, 500)).

Attributes

train : function

The train function, with the signature required by Stream_Learn.

predict : function

The predict function, with the signature required by Stream_Learn.

avg_iterations : float

The average number of iterations per window of data trained.

avg_error : float

The average error per window of data trained.

Methods

reset() Resets the KMeans functions and average values.
reset()

Resets the KMeans functions and average values.

Resets: train, predict, avg_iterations, avg_error
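To make the incremental option and the running averages concrete, here is a minimal windowed k-means sketch. It is not the KMeansStream source: the class name IncrementalKMeansSketch, the helper _lloyd, the seed parameter, random-row reinitialization in the non-incremental case, and the running-mean update of avg_iterations and avg_error are all assumptions for illustration. Its train and predict methods also do not follow the Stream_Learn signature.

```python
import numpy as np


def _lloyd(X, centroids, max_iters=100):
    """Plain Lloyd iterations; returns (centroids, labels, iterations used)."""
    centroids = centroids.copy()
    labels = None
    for it in range(1, max_iters + 1):
        sq = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
        new_labels = np.argmin(sq, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            return centroids, labels, it  # assignments stable: converged
        labels = new_labels
        for i in range(len(centroids)):
            if np.any(labels == i):  # empty clusters keep their old position
                centroids[i] = X[labels == i].mean(axis=0)
    return centroids, labels, max_iters


class IncrementalKMeansSketch:
    """Hypothetical windowed k-means; not the KMeansStream implementation."""

    def __init__(self, k, incremental=True, seed=None):
        self.k = k
        self.incremental = incremental
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Forget learned centroids and the running averages.
        self.centroids = None
        self.windows = 0
        self.avg_iterations = 0.0
        self.avg_error = 0.0

    def train(self, window):
        X = np.asarray(window, dtype=float)
        if self.incremental and self.centroids is not None:
            start = self.centroids  # warm start from the previous window
        else:
            # assumption: random reinitialization picks k distinct rows of the window
            start = X[self.rng.choice(len(X), size=self.k, replace=False)]
        self.centroids, labels, iters = _lloyd(X, start)
        error = float(np.mean(np.sum((X - self.centroids[labels]) ** 2, axis=1)))
        # running means over all windows trained since the last reset()
        self.windows += 1
        self.avg_iterations += (iters - self.avg_iterations) / self.windows
        self.avg_error += (error - self.avg_error) / self.windows

    def predict(self, points):
        P = np.asarray(points, dtype=float)
        sq = np.sum((P[:, None, :] - self.centroids[None, :, :]) ** 2, axis=2)
        return np.argmin(sq, axis=1)


model = IncrementalKMeansSketch(k=2, seed=0)
model.train([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
model.train([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])  # warm-started
print(model.avg_iterations, model.avg_error)
```

When consecutive windows come from a similar distribution, warm-starting tends to need fewer iterations than starting over, which is the motivation for incremental=True.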

class IoTPy.modules.ML.KMeans.KMeansStream.Model(k)

IoTPy.modules.ML.KMeans.kmeans module

IoTPy.modules.ML.KMeans.kmeans.computeCentroids(X, index, k)

Finds the centroids for the data given the index of the closest centroid for each data point.

Parameters:

X : numpy.ndarray

A numpy array with dimensions n * 2 for some integer n.

index : numpy.ndarray

A numpy array with dimensions n * 1 that describes the closest centroid to each point in X.

k : int

Describes the number of centroids. Every value in index lies in [0, k), so k - 1 is the largest value that can appear in index.

Returns:

centroids : numpy.ndarray

A numpy array with dimensions k * 2.

Notes

The centroids are computed by taking the mean of each group of points in X with the same index value. For i in [0, k), centroids[i] is the mean of all data points X[j] where index[j] is i.
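A minimal NumPy sketch of the computation described in the Notes above. The function name is hypothetical, and leaving an empty cluster at the origin is an assumption: the documentation does not say what computeCentroids does when no point is assigned to a centroid.

```python
import numpy as np

def compute_centroids_sketch(X, index, k):
    """centroids[i] = mean of all rows X[j] with index[j] == i."""
    X = np.asarray(X, dtype=float)
    index = np.asarray(index).ravel()  # accept shape (n,) or (n, 1)
    centroids = np.zeros((k, X.shape[1]))
    for i in range(k):
        members = X[index == i]
        if len(members) > 0:  # assumption: an empty cluster stays at the origin
            centroids[i] = members.mean(axis=0)
    return centroids

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
index = np.array([[0], [0], [1], [1]])
print(compute_centroids_sketch(X, index, 2))  # rows [1, 0] and [11, 10]
```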

IoTPy.modules.ML.KMeans.kmeans.evaluate_error(X, centroids, index)

Returns the mean squared error.

Parameters:

X : numpy.ndarray

A numpy array with 2 columns.

centroids : numpy.ndarray

A numpy array with 2 columns.

index : numpy.ndarray

A numpy array with 1 column.

Returns:

float

The mean squared error.

Notes

The mean squared error is calculated as the average squared distance of each point from the closest centroid.
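The formula in the Notes can be written directly with NumPy indexing. This is a sketch with a hypothetical function name; it assumes squared Euclidean distance, averaged over points rather than over coordinates.

```python
import numpy as np

def evaluate_error_sketch(X, centroids, index):
    """Average squared Euclidean distance from each point to its assigned centroid."""
    X = np.asarray(X, dtype=float)
    index = np.asarray(index).ravel()  # accept shape (n,) or (n, 1)
    diffs = X - np.asarray(centroids, dtype=float)[index]  # row j minus centroids[index[j]]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

X = np.array([[0.0, 0.0], [2.0, 0.0]])
centroids = np.array([[1.0, 0.0]])
print(evaluate_error_sketch(X, centroids, np.array([0, 0])))  # 1.0
```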

IoTPy.modules.ML.KMeans.kmeans.findClosestCentroids(X, centroids)

Returns a numpy array containing the index of the closest centroid for each point in X.

Parameters:

X : numpy.ndarray

A numpy array with 2 columns.

centroids : numpy.ndarray

A numpy array with 2 columns.

Returns:

index : numpy.ndarray

A numpy array with dimensions n * 1, where n is the number of rows in X. For each row i in index, index[i] is in [0, k) where k is the number of rows in centroids.
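A minimal vectorized sketch of the nearest-centroid assignment, using broadcasting to build an n by k matrix of squared distances. The function name is hypothetical; squared Euclidean distance is assumed, which yields the same argmin as plain Euclidean distance. Ties go to the lower centroid index because np.argmin returns the first minimum.

```python
import numpy as np

def find_closest_centroids_sketch(X, centroids):
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # (n, 1, d) - (1, k, d) -> (n, k, d); summing over d gives squared distances (n, k)
    sq_dists = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    return np.argmin(sq_dists, axis=1).reshape(-1, 1)  # shape n * 1, values in [0, k)

X = np.array([[0.0, 0.0], [9.0, 9.0], [1.0, 0.5]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
print(find_closest_centroids_sketch(X, centroids).ravel())  # [0 1 0]
```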

IoTPy.modules.ML.KMeans.kmeans.init_plot(figsize=(1000, 500))

Initializes the plot.

Parameters:

figsize : tuple, optional

A tuple containing the width and height of the plot (the default is (1000, 500)).

IoTPy.modules.ML.KMeans.kmeans.initialize(k, low, high)

Returns k random points with x and y coordinates in [low, high).

Parameters:

k : int

The number of points to return.

low : int

The lower bound (inclusive) for a point.

high : int

The upper bound (exclusive) for a point.

Returns:

centroids : numpy.ndarray

A numpy array with dimensions k * 2.
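A sketch of uniform random initialization over the square [low, high) x [low, high). The function name and the seed parameter are hypothetical, and the sketch assumes floating-point coordinates; the documentation does not say whether initialize returns integer or float coordinates.

```python
import numpy as np

def initialize_sketch(k, low, high, seed=None):
    rng = np.random.default_rng(seed)
    # uniform draws in the half-open interval [low, high) for both x and y
    return rng.uniform(low, high, size=(k, 2))

pts = initialize_sketch(3, 0, 10, seed=0)
print(pts.shape)  # (3, 2)
```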

IoTPy.modules.ML.KMeans.kmeans.initializeCentroids(X, k)

Returns k random points from the data X without replacement.

Parameters:

X : numpy.ndarray

A numpy array with dimensions n * 2, where n >= k.

k : int

The number of points to return.

Returns:

numpy.ndarray

A numpy array with dimensions k * 2.
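Sampling k rows without replacement guarantees that no two starting centroids come from the same data row. A minimal sketch using NumPy's Generator.choice; the function name, the seed parameter and the explicit ValueError for n < k are assumptions for illustration.

```python
import numpy as np

def initialize_centroids_sketch(X, k, seed=None):
    """Pick k distinct rows of X to serve as starting centroids."""
    X = np.asarray(X)
    if k > len(X):
        raise ValueError("X must have at least k rows")
    rng = np.random.default_rng(seed)
    rows = rng.choice(len(X), size=k, replace=False)  # k distinct row indices
    return X[rows]

X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]])
print(initialize_centroids_sketch(X, 2, seed=0).shape)  # (2, 2)
```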

IoTPy.modules.ML.KMeans.kmeans.initializeData(n, k, scale, low, high)

Generates n points around each of k random centroids, drawing each point from a normal distribution with the given scale.

Parameters:

n : int

Describes the number of points to make around each centroid.

k : int

Describes the number of centroids.

scale : int

Describes the scale for the distribution.

low : int

The lower bound (inclusive) for a centroid.

high : int

The upper bound (exclusive) for a centroid.

Returns:

X : numpy.ndarray

A numpy array with dimensions (n * k) * 2.

IoTPy.modules.ML.KMeans.kmeans.initializeDataCenter(centroid, scale, n)

Generates n points around a centroid, drawn from a normal distribution with the given scale.

Parameters:

centroid : numpy.ndarray

Numpy array with dimensions 1 * 2.

scale : int

Describes the scale for the distribution.

n : int

Describes the number of points to make.

Returns:

X : numpy.ndarray

A numpy array with dimensions n * 2.
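The two data generators above fit together naturally: initializeData places k centers in [low, high) and then draws n normal samples around each one. A sketch of that composition follows. The function names and the seed/rng parameters are hypothetical, uniform placement of the centers is inferred from the low/high bounds, and treating scale as the standard deviation (NumPy's meaning of scale for normal draws) is an assumption.

```python
import numpy as np

def initialize_data_center_sketch(centroid, scale, n, rng=None):
    """n points drawn from a normal distribution centred on centroid."""
    rng = np.random.default_rng() if rng is None else rng
    centroid = np.asarray(centroid, dtype=float).reshape(2)  # accepts shape (2,) or (1, 2)
    return rng.normal(loc=centroid, scale=scale, size=(n, 2))

def initialize_data_sketch(n, k, scale, low, high, seed=None):
    """n points around each of k uniformly placed centers: shape (n * k, 2)."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(low, high, size=(k, 2))  # centers in [low, high)
    return np.vstack([initialize_data_center_sketch(c, scale, n, rng) for c in centers])

X = initialize_data_sketch(n=50, k=3, scale=1, low=0, high=20, seed=1)
print(X.shape)  # (150, 2)
```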

IoTPy.modules.ML.KMeans.kmeans.kmeans(X, k, initial_centroids=None, draw=False, output=False, source=None)

Runs kmeans until the centroids stop moving.

Parameters:

X : numpy.ndarray

A numpy array with 2 columns.

k : int

Describes the number of centroids.

initial_centroids : numpy.ndarray, optional

A numpy array of initial centroids for the algorithm, with dimensions k * 2. If not provided, the algorithm is initialized with random centroids chosen from the data X.

draw : boolean, optional

Describes whether the data is to be plotted (data must have two or fewer dimensions). The default is False.

output : boolean, optional

Describes whether debug info is to be printed (the default is False). Info includes current number of iterations and number of changed points over time.

Returns:

centroids : numpy.ndarray

Numpy array with learned centroids (dimensions are k * 2).

index : numpy.ndarray

Numpy array with dimensions n * 1, where n is the number of rows in X. Each value describes the closest centroid to each data point in X.

num_iters : int

Describes the number of iterations taken to run kmeans.
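A minimal sketch of the loop described above (Lloyd's algorithm), returning the same three values. It stops when no assignment changes, at which point the centroids can no longer move. The function name, the max_iters safeguard and the seed parameter are hypothetical; the plotting and debug-output options are omitted, and the way iterations are counted (the final, unchanged pass counts as one) is an assumption.

```python
import numpy as np

def kmeans_sketch(X, k, initial_centroids=None, max_iters=100, seed=None):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    if initial_centroids is None:
        # k distinct random rows of X as the starting centroids
        centroids = X[rng.choice(len(X), size=k, replace=False)]
    else:
        centroids = np.array(initial_centroids, dtype=float)  # copy, caller's array untouched
    index = None
    num_iters = 0
    while num_iters < max_iters:
        num_iters += 1
        sq = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
        new_index = np.argmin(sq, axis=1)  # assignment step
        if index is not None and np.array_equal(new_index, index):
            break  # no point changed cluster, so the centroids will not move
        index = new_index
        for i in range(k):  # update step: move each centroid to its cluster mean
            members = X[index == i]
            if len(members) > 0:
                centroids[i] = members.mean(axis=0)
    return centroids, index.reshape(-1, 1), num_iters

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids, index, num_iters = kmeans_sketch(X, 2, initial_centroids=[[0.0, 0.0], [10.0, 10.0]])
print(num_iters)  # 2
```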

IoTPy.modules.ML.KMeans.kmeans.plotKMeans(X, centroids, previous, index, source)

Plots the data and centroids.

This function plots the data with the current centroids and shows the movement of the centroids.

Parameters:

X : numpy.ndarray

A numpy array with 2 columns.

centroids : numpy.ndarray

A numpy array with 2 columns.

previous : numpy.ndarray

A numpy array with 2 columns and the same number of rows as centroids.

index : numpy.ndarray

A numpy array with 1 column.

source : list

A list of ColumnDataSource objects.

Module contents