Nand Kishor Contributor

Nand Kishor is the Product Manager of House of Bots. After finishing his studies in computer science, he ideated & re-launched Real Estate Business Intelligence Tool, where he created one of the leading Business Intelligence Tool for property price analysis in 2012. He also writes, research and sharing knowledge about Artificial Intelligence (AI), Machine Learning (ML), Data Science, Big Data, Python Language etc... ...

Follow on

Nand Kishor is the Product Manager of House of Bots. After finishing his studies in computer science, he ideated & re-launched Real Estate Business Intelligence Tool, where he created one of the leading Business Intelligence Tool for property price analysis in 2012. He also writes, research and sharing knowledge about Artificial Intelligence (AI), Machine Learning (ML), Data Science, Big Data, Python Language etc...

3 Best Programming Languages For Internet of Things Development In 2018
62 days ago

Data science is the big draw in business schools
235 days ago

7 Effective Methods for Fitting a Liner
245 days ago

3 Thoughts on Why Deep Learning Works So Well
245 days ago

3 million at risk from the rise of robots
245 days ago

Top 10 Hot Artificial Intelligence (AI) Technologies
244218 views

Here's why so many data scientists are leaving their jobs
76860 views

Want to be a millionaire before you turn 25? Study artificial intelligence or machine learning
70404 views

2018 Data Science Interview Questions for Top Tech Companies
62328 views

Google announces scholarship program to train 1.3 lakh Indian developers in emerging technologies
58029 views

Build your own machine-learning-powered robot arm using TensorFlow and Google Cloud

Jun 14, 2017 | 6111 Views

At Google I/O 2017 and Cloud Next 2017, we exhibited a demo called Find Your Candy, a robot arm that listens for a voice request with your preferred flavor of candy, selects and picks up a piece of candy with that particular flavor from a table, and serves it to you:


Specifically, you can tell the robot what flavor you like, such as "chewy candy," "sweet chocolate" or "hard mint." The robot then processes your instructions via voice recognition and natural language processing, recommends a particular kind of candy and uses image recognition to recognize and select that recommendation. The entire demo is powered by deep-learning technology running on Cloud Machine Learning Engine (the fully-managed TensorFlow runtime from Google Cloud) and Cloud machine learning APIs.

This demo is intended to serve as a microcosm of a real-world machine learning (ML) solution. For example, Kewpie, a major food manufacturer in Japan, used the same Google Cloud technology to build a successful Proof of Concept (PoC) for doing anomaly detection for diced potato in a factory. (See this video for more details).

We also designed the demo to be easily reproduced by developers, with the code published on GitHub. So if you buy the hardware (the robot arm, a Linux PC and other parts - costs around $2,500 total), you should be able to build the same demo, learning how to integrate multiple machine-learning technologies along the way (and having a lot of fun doing it)!

In the remainder of this post, we'll describe how the demo works in detail and share some challenges we encountered when using it in the field.

Understanding your voice request

The demo starts with your voice request. The robot arm has a web UI app that uses Web Speech API, which uses the same deep learning-based speech recognition engine as Cloud Speech API only exposed as an API for web apps. The recognized text is sent to the Linux PC that serves as the controller, calling Cloud Natural Language API for extracting words and syntax in the sentence. For example, if you were to say "I like chewy mint candy," Cloud Natural Language API would return syntactic analysis results like the following.

Using "word embeddings" for smarter recommendation

So now the robot arm can learn what kind of flavors you like, such as "chewy," "mint" and "candy." It could use these words directly to pick one of the candies from the table. But instead, even better, the robot runs the word2vec algorithm first to calculate word embeddings for each word in order to make a smarter candy recommendation.

A word embedding is a technique to create a vector (an array with hundreds of numbers) that encapsulates the meaning of each word. You can see how it works by visiting the Embedding Projector demo and clicking on any dots in the cloud.

As you can see in the demo, each vector has many other vectors close to it that have similar meanings. For the word "music," you'd see words like "songs," "artists" and "dances" closer to that word. This is another powerful application of ML for natural-language processing that adds intelligence to the demo. For example, if you say "creamy" or "spicy," the robot arm knows they have similar meanings to "milky" or "hot" respectively. (You can see the word embeddings analysis result on the Web UI of the robot.)

Instead of using the spoken words directly, using the word embeddings allows the robot to pick the best available candy on the table even if there's no direct match.

Teach the robot with transfer learning

Now it's time to have the robot pick up a candy with the words represented as word embeddings. At this point, you might wonder: "How can the robot know the flavor of each candy on the table?" The answer is: you need to run the robot arm in learning mode before operating it.

The learning mode involves the following steps:

The documents camera takes a photo of the candies and labels.

The Linux PC sends it to Cloud Vision API to read the text on the labels.

The Linux PC runs OpenCV to recognize the shape of each candy and cuts it out as a smaller image.

At this point, we have several pairs of labels and images for each candy on the table. If you've learned the basics of deep learning, you may know that we can train image-recognition models such as Inception-v3 with these pairs. However, you would need tens of thousands of those pairs, and take hours or days to finish it, to train the model from scratch. That's the challenge of deep learning: you need tons of training data and lots and lots of time for training the model.

To avoid this problem, we use transfer learning, a popular technique to reduce the size of training data, to shorten training time significantly. With transfer learning, you use pre-trained models to feature vectors of each image. The vector has a bunch of numbers in it encapsulating image features, such as color, shape, pattern and texture of the image. It's similar to the vectors you get with word embeddings for each word. Continue Reading>>

Source: Google Cloud Platform