diff --git a/README.md b/README.md
index 04547d7..0bb4828 100644
--- a/README.md
+++ b/README.md
@@ -9,6 +9,32 @@ number of clusters is unknown. Unlike K-means, however, DP-means is hard to para
 
 ### Installation
 `pip install pdc-dp-means`
+Installation requires `scikit-learn>=1.2,<1.3` and `numpy >= 1.23.0`.
+### Quick Start
+
+    from sklearn.datasets import make_blobs
+    from pdc_dp_means import DPMeans
+
+    # Generate sample data
+    X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
+
+    # Apply DPMeans clustering
+    dpmeans = DPMeans(n_clusters=1,n_init=10, delta=10) # n_init and delta parameters
+    dpmeans.fit(X)
+
+    # Predict the cluster for each data point
+    y_dpmeans = dpmeans.predict(X)
+
+    # Plotting clusters and centroids
+    import matplotlib.pyplot as plt
+
+    plt.scatter(X[:, 0], X[:, 1], c=y_dpmeans, s=50, cmap='viridis')
+    centers = dpmeans.cluster_centers_
+    plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5)
+    plt.show()
+
+One thing to note is that we replace the `\lambda` parameter from the paper with `delta` in the code, as `lambda` is a reserved word in python.
+
 ### Usage
 Please refer to the documentation: https://pdc-dp-means.readthedocs.io/en/latest/
 
@@ -72,4 +98,4 @@ If you use this code for your work, please cite the following:
 }
 ```
 ### License
-Our code is licensed under the BDS-3-Clause license.
\ No newline at end of file
+Our code is licensed under the BDS-3-Clause license.
diff --git a/docs/requirements.txt b/docs/requirements.txt
index 57eacee..3ce931a 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -1,3 +1,3 @@
-numpy<=1.3.0
+numpy>=1.3.0
 scikit-learn==1.2.2
-pdc-dp-means
\ No newline at end of file
+pdc-dp-means
diff --git a/docs/source/index.rst b/docs/source/index.rst
index d554131..85f6a4a 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -1,12 +1,35 @@
 Welcome to PDC-DP-Means documentation!
 ======================================
 
-**PDC-DP-Means** is a Python library for running fast, scalable DP-Means or Mini-Batch DP-Means. It is built on top scikit-learn and numpy.
+**PDC-DP-Means** is a Python library for running fast, scalable DP-Means or Mini-Batch DP-Means. It is built on top of scikit-learn and numpy.
 
 Check out the :doc:`usage` section for further information, including
-how to :ref:`installation` the project.
+how to :ref:`install <installation>` the project.
 
+Quickstart
+----------
+.. code-block:: python
 
+    from sklearn.datasets import make_blobs
+    from pdc_dp_means import DPMeans
+
+    # Generate sample data
+    X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
+
+    # Apply DPMeans clustering
+    dpmeans = DPMeans(n_clusters=1,n_init=10, delta=10) # n_init and delta parameters
+    dpmeans.fit(X)
+
+    # Predict the cluster for each data point
+    y_dpmeans = dpmeans.predict(X)
+
+    # Plotting clusters and centroids
+    import matplotlib.pyplot as plt
+
+    plt.scatter(X[:, 0], X[:, 1], c=y_dpmeans, s=50, cmap='viridis')
+    centers = dpmeans.cluster_centers_
+    plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5)
+    plt.show()
 
 Contents
 --------
@@ -29,4 +52,4 @@ If you use this package for your reseach, please cite the following paper:
       author={Dinari, Or and Freifeld, Oren},
       booktitle={The 38th Conference on Uncertainty in Artificial Intelligence},
       year={2022}
-    }
\ No newline at end of file
+    }