How are Conventional Programming and Machine Learning Different?

By Kimberly Cook | Dec 20, 2018

Engineering has allowed us to push the limits of human capability. We use our understanding of nature and put it to work for our purposes, whether in high-performance mechanical machinery or an encoded silicon chip. Computers are by far one of the most intricate harnessings of nature's forces in service of extending human capability: many tasks that computers perform could never be done as quickly or efficiently by a human, or even a group of humans. As Steve Jobs put it, computers are like a bicycle for our minds.

I have been fascinated by computers since childhood. I had my first interaction with them back in 2001, when I wrote a program in BASIC to add two numbers. I was amazed that no matter how difficult the addition was, the computer answered it instantly.

When I heard about machine learning, I couldn't contain my amazement. I could not get my mind around the fact that, unlike the conventional software programs I was accustomed to, I wouldn't have to teach the computer the "how" in detail for every future scenario up front. There was a way for computers to learn how to solve a problem themselves. It felt like a giant leap for mankind.

What is Machine Learning?
 The term 'machine learning' isn't new; it was coined by Arthur Samuel in 1959. The field borrows ideas from computer science, statistics, linear algebra, and probability theory and applies them to solve practical problems. Machine learning enables computers to learn from historical data and formulate a solution that can be used to solve similar problems in the future, without the need to explicitly teach the computer every possible combination of scenarios. Its most practical applications are those where it is not even feasible to articulate a definite mathematical solution to the problem.

A real-world problem is a candidate for the application of machine learning if -

  1. A large amount of historical data exists
  2. A pattern exists in the data
  3. It is extremely hard to pin down a solution mathematically
How is it different from conventional programming?
 The approach of conventional programming is to feed the computer a set of instructions for a defined set of scenarios; the computer then uses its computing power to help humans process data faster and more efficiently. In machine learning, by contrast, a huge amount of data is thrown at the computer, which processes it and comes up with what is called a trained model (the solution). This model is then used to solve unseen, real-world problems.

Example
 Let us take a toy problem to demonstrate the difference, a game called FizzBuzz. Given an input number: if it is divisible by 3, print 'fizz'; if it is divisible by 5, print 'buzz'; if it is divisible by both, print 'fizzbuzz'; and if it is divisible by neither 3 nor 5, print 'other'.

Conventional Programming
 Conventional programming makes this extremely easy, because there are only four scenarios to check before printing the output. The Python code can be written as below (you can skip the code if you are not into coding):

def fizzbuzz(n):
    # If the number is divisible by 3 as well as by 5, return "FizzBuzz"
    if n % 3 == 0 and n % 5 == 0:
        return 'FizzBuzz'
    # If the first condition is not satisfied, check whether it is divisible by 3 and return "Fizz"
    elif n % 3 == 0:
        return 'Fizz'
    # If both of the above tests fail, check whether it is divisible by 5 and return "Buzz"
    elif n % 5 == 0:
        return 'Buzz'
    # If none of the conditions above is satisfied, return "Other"
    else:
        return 'Other'
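A quick sanity check exercises each of the four branches (the definition is repeated here so the snippet runs on its own):

```python
def fizzbuzz(n):
    if n % 3 == 0 and n % 5 == 0:
        return 'FizzBuzz'
    elif n % 3 == 0:
        return 'Fizz'
    elif n % 5 == 0:
        return 'Buzz'
    return 'Other'

# One input per branch
print(fizzbuzz(15))  # FizzBuzz
print(fizzbuzz(9))   # Fizz
print(fizzbuzz(10))  # Buzz
print(fizzbuzz(7))   # Other
```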

Machine Learning
 Suppose we already have a large set of numbers whose output is already known, i.e., whether each one is 'fizz', 'buzz', 'fizzbuzz', or 'other'. All we need to do now is write machine learning code and feed it (train it on) the available data. We then verify whether we have successfully created a model by testing it on unseen data. If the model provides the correct output without actually calculating the result, we have achieved our purpose.

We will be using Google's TensorFlow library for this purpose. Here are some code snippets from the implementation.

Again, you can skip the code if you aren't into coding. The whole working code can be found here in my GitHub account.
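One detail the snippets below take for granted is how the numbers and labels get turned into tensors. The exact preprocessing isn't reproduced here, but a plausible sketch consistent with the `[None, 10]` input and `[None, 4]` output placeholders (the helper names `encode_input` and `encode_label` are my own, not from the original repo) would be a 10-bit binary encoding of the number and a one-hot encoding of the four classes:

```python
import numpy as np

def encode_input(n):
    # Represent the number as its 10-bit binary expansion (covers 0-1023),
    # matching the [None, 10] input placeholder
    return np.array([(n >> i) & 1 for i in range(10)], dtype=np.float32)

def encode_label(n):
    # One-hot encode the four classes, matching the [None, 4] output placeholder:
    # index 0 -> 'FizzBuzz', 1 -> 'Fizz', 2 -> 'Buzz', 3 -> 'Other'
    if n % 15 == 0:
        return np.array([1, 0, 0, 0], dtype=np.float32)
    elif n % 3 == 0:
        return np.array([0, 1, 0, 0], dtype=np.float32)
    elif n % 5 == 0:
        return np.array([0, 0, 1, 0], dtype=np.float32)
    return np.array([0, 0, 0, 1], dtype=np.float32)
```

Stacking these vectors for a range of numbers yields the training matrices that the placeholders are fed with.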

 
a. Creating the model

# Placeholders are nodes where data can be fed in from outside
# when we actually run the model

# Placeholder for input data
inputTensor  = tf.placeholder(tf.float32, [None, 10])
# Placeholder for output data
outputTensor = tf.placeholder(tf.float32, [None, 4])

# The number of neurons in the 1st hidden layer, i.e. 1000
NUM_HIDDEN_NEURONS_LAYER_1 = 1000

# Learning rate, used later by the optimizer function
# The learning rate defines how far the optimizer moves towards the minimum per iteration.
# A lower-than-optimum learning rate slows down the process, whereas a higher one risks
# skipping the minimum altogether, so the optimizer never converges and keeps going back and forth.
LEARNING_RATE = 0.05

# Initializing the weights from a normal distribution
# The weights keep adjusting towards optimum values in each iteration. Conceptually, a weight
# determines the contribution of a particular input in the neural network: the larger the
# weight, the larger its contribution to the solution.
def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

# Initializing the input-to-hidden-layer weights
# We need a total of 10 (input neurons) * 1000 (hidden neurons) weights
input_hidden_weights  = init_weights([10, NUM_HIDDEN_NEURONS_LAYER_1])

# Initializing the hidden-to-output-layer weights
# In this case we need 1000 (hidden neurons) * 4 (output neurons; we have only 4 categories) weights
hidden_output_weights = init_weights([NUM_HIDDEN_NEURONS_LAYER_1, 4])

# Computing values at the hidden layer
# Matrix multiplication is done, and then the rectifier (ReLU) activation function is applied
# to introduce non-linearity into the network
hidden_layer = tf.nn.relu(tf.matmul(inputTensor, input_hidden_weights))

# Computing values at the output layer
# Matrix multiplication of the hidden layer and the output weights is done.
output_layer = tf.matmul(hidden_layer, hidden_output_weights)

# Defining the error function
# The error function computes the difference between the actual output and the model output.
# Here we are calculating the error in the output as compared to the output label.
error_function = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output_layer, labels=outputTensor))

# Defining the learning algorithm and training parameters
# We are using gradient descent to minimize the error, i.e. to reach the minimum.
training = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(error_function)

# Prediction function
prediction = tf.argmax(output_layer, 1)
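To make the error function concrete: for a single example, softmax cross-entropy squashes the logits into a probability distribution and then takes the cross-entropy against the one-hot label. A small NumPy sketch of that per-example computation (my own illustration of what the TensorFlow op does, not code from the original repo):

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_label):
    # Numerically stable softmax: shift by the max before exponentiating
    shifted = logits - np.max(logits)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    # Cross-entropy against the one-hot label: -log of the probability
    # assigned to the true class
    return -np.sum(one_hot_label * np.log(probs))
```

With four equal logits the model is maximally uncertain, and the loss is log(4); as the logit of the true class grows relative to the others, the loss falls towards zero, which is exactly what gradient descent pushes for.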

b. Train the model

NUM_OF_EPOCHS = 5000
BATCH_SIZE = 128

training_accuracy = []

with tf.Session() as sess:

    # Set global variables
    # We have only defined the model so far. To run it, all the variables need to be
    # initialized first; actual computation can only start after the initialization.
    tf.global_variables_initializer().run()

    for epoch in tqdm_notebook(range(NUM_OF_EPOCHS)):

        # Shuffle the training dataset at each epoch
        # Shuffling randomizes the data further, which adds to the generalization of the model.
        p = np.random.permutation(range(len(processedTrainingData)))
        processedTrainingData  = processedTrainingData[p]
        processedTrainingLabel = processedTrainingLabel[p]

        # Start batch training
        # With a batch size of 128, there will be 900/128 runs in each epoch, where 900 is
        # the total amount of training data.
        for start in range(0, len(processedTrainingData), BATCH_SIZE):
            end = start + BATCH_SIZE
            sess.run(training, feed_dict={inputTensor: processedTrainingData[start:end],
                                          outputTensor: processedTrainingLabel[start:end]})

        # Training accuracy for an epoch
        # Here we check the accuracy of the model after each epoch
        training_accuracy.append(np.mean(np.argmax(processedTrainingLabel, axis=1) ==
                             sess.run(prediction, feed_dict={inputTensor: processedTrainingData,
                                                             outputTensor: processedTrainingLabel})))

    # Testing
    predictedTestLabel = sess.run(prediction, feed_dict={inputTensor: processedTestingData})

c. Test the model

wrong   = 0
right   = 0

predictedTestLabelList = []

# Comparing the predicted value with the actual label in the testing data
for i,j in zip(processedTestingLabel,predictedTestLabel):
    predictedTestLabelList.append(decodeLabel(j))
    
    if np.argmax(i) == j:
        right = right + 1
    else:
        wrong = wrong + 1

print("Errors: " + str(wrong), " Correct :" + str(right))

print("Testing Accuracy: " + str(right/(right+wrong)*100))

d. Result:
Errors: 2  Correct :98
Testing Accuracy: 98.0 %

Conclusion
 We repeatedly fed the same data 5000 times, re-shuffling it every time, and after each run we measured how accurate our model had become by measuring the error rate on the test data. As can be seen, the model reaches 98 percent accuracy. I have also plotted the training accuracy after each iteration; the closer it gets to 1, the better our model has become.
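The accuracy plot itself did not survive in this version of the article, but it can be reproduced from the `training_accuracy` list collected during training. A minimal matplotlib sketch (the function name and file name are my own choices):

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

def plot_training_accuracy(training_accuracy, path="training_accuracy.png"):
    # One point per epoch; the curve should climb towards 1.0 as the model improves
    plt.figure()
    plt.plot(training_accuracy)
    plt.xlabel("Epoch")
    plt.ylabel("Training accuracy")
    plt.savefig(path)
    plt.close()
```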

Final Words
 
As seen above, in this particular toy case the conventional program will always give the correct output, whereas the machine learning model might not reach a 100% accuracy level.

Now imagine a scenario where you are running a company like Netflix, which has millions of customers, each with a unique set of preferences. For example, one person might love documentaries and comedy movies but watch documentaries on Tuesdays and comedies on Sundays, while another person binge-watches on Saturday nights and loves horror on Sundays, and so on. You can imagine how complex it would be to document the habits of each individual when so many variables determine their exact preferences. Yet if you want to suggest to each customer a movie that he or she is actually going to watch, you must know that person's exact preferences.

For conventional programming, this problem becomes increasingly difficult because:
  1. You don't know all the factors that determine a person's watching habits.
  2. Even if you did, the solution would not scale to millions of users, because you would have to write a separate solution for each person based on his or her habits.
This is where machine learning comes in. Generally, machine learning would be trained on two premises in this case:

  1. Based on your past data, which movies are you most likely to watch?
  2. What are people like you watching these days?
Do let me know your thoughts by commenting below.

PS: Full code used in the article can be found here.


The article was originally published here

Source: HOB