ML.NET 0.2 Release Notes

We would like to thank the community for the engagement so far and helping us shape ML.NET.

Today we are releasing ML.NET 0.2. This release focuses on addressing questions and issues, adding clustering to the list of supported machine learning tasks, enabling training on in-memory data, making model validation easier, and more.

Installation

ML.NET supports Windows, macOS, and Linux. See supported OS versions of .NET Core 2.0 for more details.

You can install the ML.NET NuGet package from the .NET Core CLI using:

dotnet add package Microsoft.ML

Or from the NuGet Package Manager console:

Install-Package Microsoft.ML

Release Notes

Below are some of the highlights from this release.

  • Added clustering to the list of supported machine learning tasks

    • Clustering is an unsupervised learning task that groups items based on their features. It identifies which items are more similar to each other than to the rest. This is useful in scenarios such as organizing news articles into groups based on their topics, segmenting users based on their shopping habits, and grouping viewers based on their taste in movies.

    • ML.NET 0.2 exposes KMeansPlusPlusClusterer which implements K-Means++ clustering with Yinyang K-means acceleration. This test shows how to use it (from #222).

  • Train using in-memory data objects with CollectionDataSource, in addition to loading data from a file. ML.NET 0.1 enabled loading data from a delimited text file. CollectionDataSource in ML.NET 0.2 adds the ability to use a collection of objects as the input to a LearningPipeline. See sample usage here (from #106).

  • Easier model validation with cross-validation and train-test

    • Cross-validation is an approach to estimating how well your model performs on data it has not seen. It does not require a separate test dataset. Instead, it partitions your training data so that one part is used for training and another for testing, and it repeats this several times with different partitions. Here is an example of doing cross-validation (from #212).

    • Train-test is a shortcut to testing your model on a separate dataset. See example usage here.

    • Note that the LearningPipeline is prepared the same way in both cases.

  • Speed improvement for predictions: by not creating a parallel cursor for dataviews that only have one element, we get a significant speed-up for predictions (see #179 for a few measurements).

  • Updated TextLoader API: the TextLoader API is now code-generated and was updated to take explicit declarations for the columns in the data, which is required in some scenarios. See #142.

  • Added daily NuGet builds of the project: daily NuGet builds of ML.NET are now available here.
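To make the clustering highlight above more concrete, here is a minimal, language-neutral sketch of K-Means++ seeding followed by plain Lloyd iterations, written in Python. It is not the ML.NET API and it does not implement the Yinyang acceleration used by KMeansPlusPlusClusterer; the function names, the sample points, and the fixed seed are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus_seeds(points, k, rng):
    """Pick k initial centroids: the first uniformly at random, each later
    one with probability proportional to its squared distance from the
    nearest centroid chosen so far."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd iterations starting from k-means++ seeds (no acceleration)."""
    rng = random.Random(seed)
    centroids = kmeans_plus_plus_seeds(points, k, rng)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    # Recompute labels so they match the final centroids.
    return centroids, assign(points, centroids)


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The distance-weighted seeding is what distinguishes K-Means++ from picking all starting centroids uniformly at random: it tends to spread the initial centroids across the data, which usually means fewer iterations before the assignments stop changing.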

Additional issues closed in this milestone can be found here.
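As a rough illustration of the cross-validation idea described in the highlights, the following Python sketch partitions a dataset into folds, trains on all but one fold, and scores on the held-out fold, repeating once per fold. It is a conceptual sketch rather than the ML.NET API: `k_fold_splits`, `cross_validate`, the mean-value "model", and the sample data are hypothetical names and values chosen for this example.

```python
def k_fold_splits(n_items, n_folds):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Folds are contiguous and differ in size by at most one item."""
    indices = list(range(n_items))
    base, extra = divmod(n_items, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(data, n_folds, train_fn, score_fn):
    """Train on each fold's training part and score on its held-out part."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(data), n_folds):
        model = train_fn([data[i] for i in train_idx])
        scores.append(score_fn(model, [data[i] for i in test_idx]))
    return scores


def train_mean(values):
    """Toy 'model': always predict the mean of the training values."""
    return sum(values) / len(values)


def mean_absolute_error(model, values):
    """Average absolute difference between the constant prediction and each value."""
    return sum(abs(v - model) for v in values) / len(values)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = cross_validate(data, 3, train_mean, mean_absolute_error)
print(scores)  # [3.0, 0.5, 3.0]
```

Train-test corresponds to a single such split where the test part comes from a separate dataset; in both cases the same training routine is reused unchanged, which mirrors the note above that the LearningPipeline is prepared the same way.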

Acknowledgements

Shoutout to tincann, rantri, yamachu, pkulikov, Sorrien, v-tsymbalistyi, Ky7m, forki, jessebenson, mfaticaearnin, and the ML.NET team for their contributions as part of this release!