Nand Kishor Contributor

Nand Kishor is the Product Manager of House of Bots. After finishing his studies in computer science, he conceived and relaunched the Real Estate Business Intelligence Tool, creating one of the leading business intelligence tools for property price analysis in 2012. He also writes about, researches, and shares knowledge on Artificial Intelligence (AI), Machine Learning (ML), Data Science, Big Data, the Python language, and related topics.


Free eBook: Applied Data Science (Columbia University)

By Nand Kishor | Apr 9, 2018

Published in 2013, this book is still very interesting and differs from most data science books. Authors: Ian Langmore and Daniel Krasner. It focuses more on the statistics end of things, while also getting readers started on basic programming and command-line skills. It does not, however, go into much of what you would expect to see from the machine learning end of things.

You can download the book here.

Contents

I Programming Prerequisites 
1 Unix 
  • History and Culture
  • The Shell
  • Streams
  • Standard streams
  • Pipes
  • Text
  • Philosophy
  • In a nutshell
  • More nuts and bolts
  • End Notes

2 Version Control with Git 
  • Background
  • What is Git
  • Setting Up
  • Online Materials
  • Basic Git Concepts
  • Common Git Workflows
  • Linear Move from Working to Remote
  • Discarding changes in your working copy
  • Erasing changes
  • Remotes
  • Merge conflicts

3 Building a Data Cleaning Pipeline with Python
  • Simple Shell Scripts
  • Template for a Python CLI Utility

II The Classic Regression Models
4 Notation
  • Notation for Structured Data

5 Linear Regression
  • Introduction
  • Coefficient Estimation: Bayesian Formulation
  • Generic setup
  • Ideal Gaussian World
  • Coefficient Estimation: Optimization Formulation
  • The least squares problem and the singular value decomposition
  • Overfitting examples
  • L2 regularization
  • Choosing the regularization parameter
  • Numerical techniques
  • Variable Scaling and Transformations
  • Simple variable scaling
  • Linear transformations of variables
  • Nonlinear transformations and segmentation
  • Error Metrics
  • End Notes

6 Logistic Regression
  • Formulation
  • Presenter's viewpoint
  • Classical viewpoint
  • Data generating viewpoint
  • Determining the regression coefficient w
  • Multinomial logistic regression
  • Logistic regression for classification
  • L1 regularization
  • Numerical solution
  • Gradient descent
  • Newton's method
  • Solving the L1 regularized problem
  • Common numerical issues
  • Model evaluation
  • End Notes

7 Models Behaving Well
  • End Notes

III Text Data
8 Processing Text
  • A Quick Introduction
  • Regular Expressions
  • Basic Concepts
  • Unix Command line and regular expressions
  • Finite State Automata and PCRE
  • Backreference
  • Python RE Module
  • The Python NLTK Library
  • The NLTK Corpus and Some Fun things to do

IV Classification
9 Classification
  • Quick Introduction
  • Naive Bayes
  • Smoothing
  • Measuring Accuracy
  • Error metrics and ROC Curves
  • Other classifiers
  • Decision Trees
  • Random Forest
  • Out-of-bag classification
  • Maximum Entropy

V Extras
10 High(er) performance Python 
  • Memory hierarchy
  • Parallelism
  • Practical performance in Python
  • Profiling
  • Standard Python rules of thumb
  • For loops versus BLAS
  • Multiprocessing Pools
  • Multiprocessing example: Stream processing text files
  • Numba
  • Cython

Source: HOB