Core ML and Vision: Machine Learning in iOS 11 Tutorial

Jun 10, 2017

Note: This tutorial requires Xcode 9 Beta 1 or later, Swift 4 and iOS 11.

Machine learning is all the rage. Many have heard about it, but few know what it is.

This iOS machine learning tutorial will introduce you to Core ML and Vision, two brand-new frameworks introduced in iOS 11.

Specifically, you'll learn how to use these new APIs with the Places205-GoogLeNet model to classify the scene of an image.

Getting Started

Download the starter project. It already contains a user interface to display an image and let the user pick another image from their photo library, so you can focus on implementing the machine learning and vision aspects of the app.

iOS Machine Learning

Machine learning is a type of artificial intelligence where computers "learn" without being explicitly programmed. Instead of coding an algorithm, machine learning tools enable computers to develop and refine algorithms by finding patterns in huge amounts of data.

Deep Learning

Since the 1950s, AI researchers have developed many approaches to machine learning. Apple's Core ML framework supports neural networks, tree ensembles, support vector machines, generalized linear models, feature engineering and pipeline models. However, neural networks have produced many of the most spectacular recent successes, starting with Google's 2012 use of YouTube videos to train its AI to recognize cats and people. Only five years later, Google is sponsoring a contest to identify 5000 species of plants and animals. Apps like Siri and Alexa also owe their existence to neural networks.

A neural network tries to model human brain processes with layers of nodes, linked together in different ways. Each additional layer requires a large increase in computing power: Inception v3, an object-recognition model, has 48 layers and approximately 20 million parameters. But the calculations are basically matrix multiplication, which GPUs handle extremely efficiently. The falling cost of GPUs enables people to create multilayer deep neural networks, hence the term deep learning.

Neural networks need a large amount of training data, ideally representing the full range of possibilities. The explosion in user-generated data has also contributed to the renaissance of machine learning.

Training the model means supplying the neural network with training data, and letting it calculate a formula for combining the input parameters to produce the output(s). Training happens offline, usually on machines with many GPUs.

To use the model, you give it new inputs, and it calculates outputs: this is called inference. Inference still requires a lot of computation, but doing these calculations on handheld devices is now possible because of frameworks like Metal.

As you'll see at the end of this tutorial, deep learning is far from perfect. It's really hard to construct a truly representative set of training data, and it's all too easy to over-train the model so it gives too much weight to quirky characteristics.

What Does Apple Provide?

Apple introduced NSLinguisticTagger in iOS 5 to analyze natural language. Metal came in iOS 8, providing low-level access to the device's GPU.

Last year, Apple added Basic Neural Network Subroutines (BNNS) to its Accelerate framework, enabling developers to construct neural networks for inferencing (not training).

And this year, Apple has given you Core ML and Vision!

Core ML makes it even easier to use trained models in your apps.

Vision gives you easy access to Apple's models for detecting faces, face landmarks, text, rectangles, barcodes, and objects.

You can also wrap any image-analysis Core ML model in a Vision model, which is what you'll do in this tutorial. Because these two frameworks are built on Metal, they run efficiently on the device, so you don't need to send your users' data to a server.
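To make that concrete, here's a minimal sketch of the wrapping step. It assumes the Places205-GoogLeNet model (downloaded in the next section) has already been added to the project, so Xcode's auto-generated GoogLeNetPlaces class is available; the makeClassificationRequest function name is just for illustration:

```swift
import CoreML
import Vision

// Wrap the Core ML model in a Vision model, then build a classification request.
// GoogLeNetPlaces is the class Xcode generates from the .mlmodel file.
func makeClassificationRequest() -> VNCoreMLRequest {
  guard let visionModel = try? VNCoreMLModel(for: GoogLeNetPlaces().model) else {
    fatalError("Could not load Places205-GoogLeNet model")
  }
  let request = VNCoreMLRequest(model: visionModel) { request, error in
    guard let results = request.results as? [VNClassificationObservation],
      let topResult = results.first else {
        print("Unexpected result type from VNCoreMLRequest")
        return
    }
    print("\(Int(topResult.confidence * 100))% it's \(topResult.identifier)")
  }
  return request
}
```

For a classification model, the results arrive as VNClassificationObservation values, each carrying an identifier (the scene label) and a confidence.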

Integrating a Core ML Model Into Your App

This tutorial uses the Places205-GoogLeNet model, which you can download from Apple's Machine Learning page. Scroll down to Working with Models, and download the first one. While you're there, take note of the other three models, which all detect objects - trees, animals, people, etc. - in an image.

Note: If you have a trained model created with a supported machine learning tool such as Caffe, Keras or scikit-learn, Converting Trained Models to Core ML describes how you can convert it to Core ML format.

You can download the complete project for this tutorial here. If the model shows up as missing, replace it with the one you downloaded.
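As a rough sketch of how the wrapped model gets used at runtime, you hand the request to a VNImageRequestHandler built from the user's chosen image. The detectScene name and the UIImage-to-CIImage conversion here are illustrative, not necessarily how the finished project is structured:

```swift
import UIKit
import CoreImage
import Vision

// Run the Vision request on a picked image. Vision handles scaling and cropping
// the image to the 224x224 input size the GoogLeNet model expects.
func detectScene(image: UIImage, with request: VNCoreMLRequest) {
  guard let ciImage = CIImage(image: image) else {
    print("Couldn't convert UIImage to CIImage")
    return
  }
  let handler = VNImageRequestHandler(ciImage: ciImage)
  DispatchQueue.global(qos: .userInitiated).async {
    do {
      try handler.perform([request])
    } catch {
      print("Vision request failed: \(error)")
    }
  }
}
```

You'd typically call something like this from the image picker delegate, then hop back to the main queue inside the request's completion handler before updating the UI.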

You're now well-equipped to integrate an existing model into your app. Here are some resources that cover this in more detail:

Apple's Core ML Framework documentation

WWDC 2017 Session 703 Introducing Core ML

WWDC 2017 Session 710 Core ML in depth

From 2016:

WWDC 2016 Session 605 What's New in Metal, Part 2: demos show how fast the app does the Inception model classification calculations, thanks to Metal.

Apple's Basic Neural Network Subroutines documentation

Thinking about building your own model? I'm afraid that's way beyond the scope of this tutorial (and my expertise). These resources might help you get started:

RWDevCon 2017 Session 3 Machine Learning in iOS: Alexis Gallagher does an absolutely brilliant job, guiding you through the process of collecting training data (videos of you smiling or frowning) for a neural network, training it, then inspecting how well (or not) it works. His conclusion: "You can build useful models without being either a mathematician or a giant corporation."

Quartz article on Apple's AI research paper: Dave Gershgorn's articles on AI are super clear and informative. This article does an excellent job of summarizing Apple's first AI research paper: the researchers used a neural network trained on real images to refine synthetic images, thus efficiently generating tons of high-quality new training data, free of personal data privacy issues.

Last but not least, I really learned a lot from this concise history of AI from Andreessen Horowitz's Frank Chen: AI and Deep Learning a16z podcast.

I hope you found this tutorial useful.

Source: Raywenderlich