How I Train Machine Learning Models Online for Free - GPU, TPU Enabled

Oct 18, 2018

The computation power needed to train machine learning and deep learning models on large data sets has always been a huge hindrance for machine learning enthusiasts. But with Jupyter notebooks that run in the cloud, anyone with the passion to learn can train models and come up with great results.

In this post, I will be providing information about the various services that give us the computation power to train models.

  1. Google Colab
  2. Kaggle Kernels
  3. Jupyter Notebook on GCP
  4. Amazon SageMaker
  5. Azure Notebooks

1) Google Colab
Colaboratory is a Google research project created to help disseminate machine learning education and research. Colaboratory (Colab) provides a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. It comes pre-installed with most of the common machine learning libraries, so it is a perfect place to plug and play and try things out when dependencies and compute are not an issue.

The notebooks are connected to your Google Drive, so you can access them any time you want, and you can also upload notebooks to or download them from GitHub.

GPU and TPU enabling
First, you'll need to enable GPU or TPU for the notebook.

Navigate to Edit -> Notebook Settings and select TPU (or GPU) from the Hardware Accelerator drop-down.

Code to check whether the TPU is enabled:

import os
import pprint
import tensorflow as tf

if 'COLAB_TPU_ADDR' not in os.environ:
    print('ERROR: Not connected to a TPU runtime; '
          'please see the first cell in this notebook for instructions!')
else:
    tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    print('TPU address is', tpu_address)

    # List the TPU devices visible to this session (TensorFlow 1.x API).
    with tf.Session(tpu_address) as session:
        devices = session.list_devices()

    print('TPU devices:')
    pprint.pprint(devices)
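
If you selected GPU instead of TPU, a similar sanity check confirms that the runtime actually sees the accelerator. A minimal sketch (tf.test.gpu_device_name returns an empty string when no GPU is attached):

import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name:
    # Typically prints something like '/device:GPU:0'
    print('GPU device found:', device_name)
else:
    print('No GPU found; enable it under Edit -> Notebook Settings.')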

Installing libraries
Colab comes with most machine learning libraries pre-installed, but you can also easily add libraries that are not included.

Colab supports both the pip and apt package managers.

pip command
!pip install torch

apt command
!apt-get install graphviz -y

Both commands work in Colab; don't forget the ! (exclamation mark) before the command.

Uploading Datasets
There are many ways to upload datasets to the notebook:

  • Upload files from the local machine.
  • Upload files from Google Drive.
  • Upload datasets directly from Kaggle.

Code to upload from the local machine:

from google.colab import files
uploaded = files.upload()

You can then browse and select the file from your machine.
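
Once the upload finishes, files.upload() returns a dictionary mapping each filename to its raw bytes, so the data can be read straight into pandas. A minimal sketch, assuming a CSV named train.csv was selected (the filename is just an example):

import io
import pandas as pd

# 'uploaded' is the dict returned by files.upload() above;
# each key is a filename, each value is that file's raw bytes.
df = pd.read_csv(io.BytesIO(uploaded['train.csv']))
print(df.head())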

Upload files from google drive

The PyDrive library is used to upload and download files from Google Drive.

!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# PyDrive reference:
# https://gsuitedevs.github.io/PyDrive/docs/build/html/index.html
# 2. Create & upload a text file.
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
# 3. Load a file by ID and print its contents.
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))
You can get the ID of any file in your Drive and use the above code to work with it.

More resources on uploading files from Google services are available in the official Colab documentation.
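
Besides PyDrive, another common approach is to mount your whole Drive as a filesystem with google.colab's drive module; a short sketch (the path to the CSV is just an example):

from google.colab import drive

# Mount Google Drive; you will be prompted to authorize access.
drive.mount('/content/drive')

# Files then appear under /content/drive/My Drive/ and can be read
# like any local file, e.g. with pandas:
import pandas as pd
df = pd.read_csv('/content/drive/My Drive/data/train.csv')  # example path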

Uploading dataset from kaggle

We need to install the Kaggle API and add the authentication JSON file (API token), which you can download from the Kaggle website.

!pip install kaggle

Upload the JSON file to the notebook by uploading the file from the local machine (as shown earlier), then:

  • create the ~/.kaggle directory
  • copy the JSON file into it
  • change the file permissions

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

Now you can use the following command to download any dataset from Kaggle:

!kaggle datasets download -d lazyjustin/pubgplayerstats

You can use the command below to download a competition dataset from Kaggle, but for that you have to join the competition first.

!kaggle competitions download -c tgs-salt-identification-challenge
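
The Kaggle CLI downloads a zip archive into the current directory; a rough sketch of unpacking it and loading a file (the archive and CSV names below are assumptions, check what was actually downloaded):

!unzip -o pubgplayerstats.zip

import pandas as pd
# Replace the filename with one of the files listed after unzipping.
df = pd.read_csv('PUBG_Player_Statistics.csv')
print(df.shape)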

You can train and run Fashion-MNIST online here without installing any dependencies locally.
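
As a quick end-to-end test, the sketch below trains a small classifier on Fashion-MNIST with tf.keras; it is only a minimal example, not a tuned model:

import tensorflow as tf

# Fashion-MNIST ships with tf.keras, so no extra download step is needed.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
print(model.evaluate(x_test, y_test))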

Colab is a great tool for everyone interested in machine learning; all the educational resources and code snippets needed to use Colab are provided on the official website itself, along with example notebooks.

2) Kaggle Kernels
Kaggle Kernels is a cloud computational environment that enables reproducible and collaborative analysis.

One can run both Python and R code in Kaggle Kernels.

Kaggle Kernels run in a remote computational environment; Kaggle provides the hardware needed.

At the time of writing, each kernel editing session is provided with the following resources:

CPU Specifications
  • 4 CPU cores
  • 17 Gigabytes of RAM
  • 6 hours execution time
  • 5 Gigabytes of auto-saved disk space (/kaggle/working)
  • 16 Gigabytes of temporary, scratchpad disk space (outside /kaggle/working)

GPU Specifications
  • 2 CPU cores
  • 14 Gigabytes of RAM

Kernels in action
Once we create an account at kaggle.com, we can choose a dataset to play with and spin up a new kernel, with just a few clicks.

Click on create new kernel

You will have a Jupyter notebook up and running. At the bottom there is a console you can use, and on the right side there are various options such as:

VERSION
When you Commit & Run a kernel, you execute the kernel from top to bottom in a separate session from your interactive session. Once it finishes, you will have generated a new kernel version. A kernel version is a snapshot of your work, including your compiled code, log files, output files, data sources, and more. The latest version is what is shown to users in the kernel viewer.

Data Environment
When you create a kernel for a dataset, the dataset is preloaded into the notebook's input directory:
../input/
You can also click on "Add Data Source" to add other datasets.
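
A minimal sketch of exploring the preloaded data (the CSV name is just an example; list the directory first to see what is actually there):

import os
import pandas as pd

# All attached data sources are available read-only under ../input/
print(os.listdir('../input'))

df = pd.read_csv('../input/train.csv')  # replace with a file listed above
df.head()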


Settings
Sharing: You can keep your kernel private, or make it public so that others can learn from it.

Adding a GPU: You can add a single NVIDIA Tesla K80 to your kernel. One of the major benefits of using Kernels as opposed to a local machine or your own VM is that the Kernels environment is already pre-configured with GPU-ready software and packages, which can be time-consuming and frustrating to set up yourself. To add a GPU, navigate to the "Settings" pane from the Kernel editor and click the "Enable GPU" option.
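
Once the GPU is enabled, a quick sanity check from inside the kernel confirms it is visible; a minimal sketch using PyTorch (assuming it is available in the kernel image, otherwise add it as a custom package):

import torch

# True once the Tesla K80 is attached to the kernel session.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))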

Custom packages: The kernel comes with the default packages; if you need any other package, you can easily add it in the following ways:

  • Just enter the library name, and Kaggle will download it for you.

  • Enter the GitHub username/repo name.

Both methods work fine for adding custom packages; the sketch below shows roughly what the second option amounts to.
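
A hedged equivalent of the username/repo option (Kaggle's exact mechanism is not shown here; the names below are placeholders):

# Install a package directly from a GitHub repository.
!pip install git+https://github.com/username/repo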

Kaggle acts as a perfect platform, both for providing data and for providing the compute to work with that data. It also hosts various competitions you can experiment with to improve your skill set.

For more resources regarding Kaggle, follow the link here. If you are new to Kaggle, you should definitely try the Titanic dataset; it comes with awesome tutorials.

Since I was not able to cover all the services for training ML models online in this post, there will be a part 2.

All the resources needed to learn and practice machine learning are open source and available online: compute, datasets, algorithms, and various high-quality tutorials, all for free. All you need is an internet connection and the passion to learn.

Thank you for reading until the end. I hope this article is useful, as it addresses a major problem faced by people starting down the path towards machine learning and data science.

Machine learning has the potential to transform the world, and so do you.

Source: HOB