Nand Kishor Contributor

Nand Kishor is the Product Manager of House of Bots. After finishing his studies in computer science, he ideated and re-launched the Real Estate Business Intelligence Tool, where he created one of the leading business intelligence tools for property price analysis in 2012. He also writes, researches, and shares knowledge about Artificial Intelligence (AI), Machine Learning (ML), Data Science, Big Data, the Python language, etc.

Machine Learning Applied to Big Data, Explained

By Nand Kishor | Jul 18, 2017

Machine learning with Big Data is, in many ways, different from "regular" machine learning. The graphic discussed here helps identify the steps in machine learning with Big Data and how they fit together into a process of their own.

Big Data is no longer a buzzword or a cutting-edge concept; it simply is. It resists precise definition, but it is generally easy to identify when you see it.

Successful applications of machine learning cannot rely solely on throwing ever-increasing amounts of data at algorithms and hoping for the best. Even so, the ability to leverage large amounts of data for machine learning tasks is now a must-have skill for practitioners.

While much of machine learning holds true regardless of data volume, some aspects are the exclusive domain of Big Data modeling, or apply more strongly to large data sets than to small ones. Data scientist Rubens Zimbres outlines a process for applying machine learning to Big Data in his original graphic.


Here is a short description of the image from Zimbres himself:

The most important part is the one where the data scientist's needs generate a demand for change in data architecture, because this is the part where Big Data projects fail. The orange square. When algorithms are computationally expensive or when infrastructure is not ready for ML algorithms. For instance, lately big banks in Brazil are hiring mainframe specialists to deal with this issue.
The picture is in fact a mind map I did to understand the whole Data Science process.

Zimbres' process includes paths for descriptive, predictive, and prescriptive analysis, as well as simulation. Importantly, the machine learning process is explicitly noted as recursive, which is perhaps especially true when modeling large quantities of data. Zimbres also breaks down the relative number of records at each successive stage of a machine learning task. Likely of greatest importance to newcomers to data science, the subtasks of the machine learning process are presented alongside task-relevant algorithms.

Zimbres himself states that the process graphic contains a few small mistakes; notably, Support Vector Machines in the "Extraction of Groups" section should be replaced with k-means clustering. All in all, though, it represents a relevant high-level roadmap. In particular, it should be useful for newcomers to data science.
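
To illustrate why k-means is the natural fit for an "Extraction of Groups" step, here is a minimal sketch in plain Python. It is not taken from Zimbres' graphic: the function names (`sq_dist`, `init_centroids`, `kmeans`), the toy data, and the deterministic farthest-first initialization are all assumptions made for this example. The point it shows is that k-means finds groups from unlabeled points alone, whereas a Support Vector Machine is a supervised method that needs labels to train on.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def init_centroids(points, k):
    """Farthest-first initialization (an assumption chosen for determinism).

    Start from the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen centroid.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=50):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


# Toy, made-up data: two well-separated groups, with no labels supplied.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

On real Big Data workloads one would likely reach for a distributed or mini-batch implementation from an established library rather than a hand-rolled loop; the sketch only shows the shape of the technique.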

Source: KDnuggets