Madison J Myers edited this page Oct 11, 2017 · 3 revisions

Welcome to the Using-the-Apache-SystemML-API-on-a-Spark-Shell-with-IBM-Analytics-Engine wiki!

Short Name

Learn to use the Apache SystemML API from the Apache Spark Shell on the IBM Analytics Engine to run computations in Scala.

Offering Type

Data Analytics

Introduction

Apache SystemML is an open source machine learning tool that can run locally or in conjunction with big data tools such as Apache Spark. This journey highlights the user experience of the Apache SystemML API when working with the Spark Shell, especially on the IBM Analytics Engine (IAE). It walks through starting a Spark cluster with IAE, SSHing into IAE to start the SystemML API, and some foundational steps in a data scientist's workflow, such as parallelizing data, reading in matrices as RDDs, computing matrix sums, and getting your data into Apache Spark. This journey is built for beginners who are less familiar with Apache Spark or the IBM Analytics Engine and who are brand new to Apache SystemML. It demonstrates the ease of use of the API and IAE, and the efficiency they bring to your data science pipeline!

Author

by Madison J. Myers

Code

https://github.com/MadisonJMyers/Using-the-Apache-SystemML-API-on-a-Spark-Shell-

Demo

N/A

Video

N/A

Overview

Apache SystemML and Apache Spark are invaluable big data tools, but they can be confusing at first and take time to get used to, especially since beginner tutorials are scarce. In this journey I will demonstrate how to set up an IAE Spark cluster and how to use the Apache SystemML API on top of it. This flow is ideal for data exploration and testing in your data science project. After setting up the cluster and the API, I will quickly show you some basic Scala steps to help you get up and running on your project!

When you have completed this journey, you will understand how to:

Spin up a Spark cluster using the IBM Analytics Engine.
SSH into the Spark Cluster.
Set up and start the SystemML API.
Download data into the Spark Shell.
Parallelize the information, read in two matrices as RDDs and get the output.
Execute your script.
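As a preview of the later steps, here is a minimal sketch of running SystemML from the Spark Shell once the cluster is up. It follows the pattern in the SystemML MLContext programming guide; the jar name in the comment and the tiny DML script are illustrative, and `sc` is the SparkContext the Spark Shell provides:

```scala
// Inside spark-shell, started with the SystemML jar on the classpath, e.g.:
//   spark-shell --jars <path-to-systemml-jar>
import org.apache.sysml.api.mlcontext._
import org.apache.sysml.api.mlcontext.ScriptFactory._

// Wrap the shell's SparkContext in an MLContext
val ml = new MLContext(sc)

// Parallelize a small 2x2 matrix as an RDD of CSV rows
val rdd = sc.parallelize(Seq("1.0,2.0", "3.0,4.0"))

// Sum the matrix in DML and read the scalar result back into Scala
val script = dml("s = sum(M);").in("M", rdd, new MatrixMetadata(2, 2)).out("s")
val s = ml.execute(script).getDouble("s")  // 1 + 2 + 3 + 4 = 10.0
```

The same `MLContext`/`Script` pattern scales from this toy matrix to full algorithms: you swap in a larger RDD or DataFrame and a longer DML script, and the execution call stays the same.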

Flow

The user starts a new Spark cluster with the IBM Analytics Engine. 
The user SSHs into the Spark Cluster and starts the SystemML API using Spark Shell.
The user downloads a dataset and loads it into a DataFrame.
The user runs through some commands using Scala.
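Steps 3 and 4 of the flow might look like this in the Spark Shell (a hedged sketch: the file path and the CSV options are placeholders for whatever dataset you download; `spark` is the SparkSession the shell provides):

```scala
// Step 3: load a downloaded CSV file into a DataFrame
val df = spark.read
  .option("header", "true")       // first row contains column names
  .option("inferSchema", "true")  // let Spark guess column types
  .csv("data/mydata.csv")         // hypothetical path to the downloaded file

// Step 4: run through a few exploratory commands in Scala
df.printSchema()  // inspect the inferred schema
df.count()        // how many rows did we load?
df.show(5)        // peek at the first five rows
```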

Included Components

Get familiar with Apache SystemML and its API while also using Apache Spark Shell with the IBM Analytics Engine.

Featured Technologies

Apache SystemML API: an API to access the machine learning platform, optimal for big data.

Apache Spark: a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.

IBM Analytics Engine (IAE): A platform that lets you easily spin up a Spark cluster and SSH into it to complete your data science pipeline.

Links

https://spark.apache.org/
https://www.ibm.com/analytics/us/en/technology/spark/
http://researcher.watson.ibm.com/researcher/view_group.php?id=3174
http://systemml.apache.org/
https://developer.ibm.com/clouddataservices/docs/ibm-analytics-engine/

Blog by Madison J. Myers

0 to Life-Changing App: Using the Apache SystemML API on a Spark Shell with IBM Analytics Engine

SystemML on Spark Shell using the IBM Analytics Engine (IAE)? Yes!

A very simple way of using SystemML for all of your machine learning and big data needs. This tutorial will get you set up and running SystemML on the Spark Shell using IAE like a star.

Not familiar with Apache SystemML?

At a high level, SystemML handles the machine learning and mathematical parts of your data science project. You can log into the Spark Shell, load SystemML, load your data, and express your linear algebra, statistical equations, matrix operations, and so on in far less code than the equivalent Spark Shell syntax would require. It helps not only with mathematical exploration and machine learning algorithms, but also lets you stay in Spark, where you can do all of the above on data far too big for your local computer.

Not familiar with how to set up an Apache Spark cluster?
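To make the "much shorter code" claim concrete, here is a sketch of a small linear algebra computation expressed in DML and run through the API. The script and variable names are illustrative, and `sc` is the Spark Shell's SparkContext; the point is that the DML reads almost like the math itself:

```scala
import org.apache.sysml.api.mlcontext._
import org.apache.sysml.api.mlcontext.ScriptFactory._

val ml = new MLContext(sc)

// A few lines of DML replace what would be many lines of RDD plumbing:
val dmlScript =
  """
    |X = rand(rows=100, cols=10)   # random 100x10 matrix
    |w = rand(rows=10, cols=1)     # random weight vector
    |y = X %*% w                   # matrix-vector multiply
    |norm = sqrt(sum(y ^ 2))       # L2 norm of the result
  """.stripMargin

val script = dml(dmlScript).out("norm")
val norm = ml.execute(script).getDouble("norm")
```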

By using the IBM Analytics Engine you can spin up a Spark cluster in just a few minutes using the web user interface.

With both of these tools, I'll walk you through how to set up your computer for all of SystemML's prerequisites, how to set up IAE and your Spark cluster, how to SSH in to connect to your Spark cluster from your computer and load the Spark Shell with SystemML, and then how to load some data and work through a few examples in Scala. Whew, that's a lot, but I promise we'll go through it all!

Now let's get going on our learning.