Making a Successful Chatbot: Deep Learning

By ridhigrg | Feb 5, 2019

Let's build a chatbot that can answer questions about any text you give it, an article or even a book, using Keras. Just imagine the boost in productivity all of us will have once we have access to expert systems for any given topic. Instead of sifting through all the jargon in a scientific paper, you just give it the paper and then ask it the relevant questions. Entire textbooks, libraries, videos, images, whatever: you just feed it some data and it becomes an expert at it.
All seven billion people on Earth would have the capability of learning anything much faster. The web democratized information, and this next evolution will democratize something just as important: guidance. The ideal chatbot can talk intelligently about any domain; that's the holy grail, but domain-specific chatbots are definitely possible. The technical term for this is a question answering system, and surprisingly we've been able to build these since way back in the 70s.
Early systems allowed programmers to encode patterns into their bots with Artificial Intelligence Markup Language, or AIML, which meant less code for the same results. But don't use AIML; it's old. Now, with deep learning, we can do this without hard-coded responses and get much better results. The generic case is that you give the model some facts as input and then ask it a question, and it gives you the right answer after logically reasoning about it. The input could also be "everybody is happy", and the question could be "what's the sentiment?" The answer would be "positive". Other possible questions are "what's the entity?", "what are the part-of-speech tags?", "what's the translation to French?" We need a common model for all of these questions.
This is what the AI community is trying to figure out how to do. Facebook research made some great progress on this just two years ago when they released a paper introducing a really cool idea called a Memory Network. LSTM networks proved to be a useful tool in tasks like text summarization, but their memory, encoded in hidden states and weights, is too small for very long sequences of data, be that a book or a movie. A way around this for language translation, for example, was to store multiple LSTM states and use an attention mechanism to choose between them. But they developed another strategy that outperforms LSTMs for QA systems.

The idea was to allow a neural network to use an external data structure as memory storage. It learns where to retrieve the required memory from the memory bank in a supervised way. When it came to answering questions from the synthetically generated bAbI data, that supervision was pretty easy to come by, but in real-world data it is not that easy. Most recently, there was a four-month-long Kaggle contest in which a startup called MetaMind placed near the top with its dynamic memory network (DMN) architecture.

That's the architecture we'll focus on, so let's build it programmatically using Keras. The dataset is pretty well organized: it was created by Facebook AI Research for the specific goal of improving textual reasoning, and it's grouped into 20 different tasks. Each task tests a different aspect of reasoning, so overall the dataset provides a good overview of all the different capabilities of a learning model. There are a thousand questions for training and a thousand for testing per task. Each question is paired with a statement or series of statements, as well as an answer; the goal is to have one model that can succeed in all tasks. We'll use pre-trained GloVe vectors to help create a sequence of word vectors from our input sentences, and these vectors will act as inputs to the model.
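The article doesn't show the loading code, but as a rough sketch, here is how one might parse a bAbI task file in Python, assuming the standard layout of numbered statement lines and tab-separated question, answer, and supporting-fact fields (the filename is just an example):

```python
# Rough sketch of parsing one bAbI task file, assuming the standard layout:
# numbered statement lines, and question lines with tab-separated
# question, answer, and supporting-fact ids.
def parse_babi(lines):
    stories = []   # list of (story_sentences, question, answer) triples
    story = []
    for line in lines:
        nid, text = line.split(' ', 1)
        if int(nid) == 1:      # numbering resets when a new story begins
            story = []
        if '\t' in text:       # question lines contain tab-separated fields
            question, answer, _supporting = text.split('\t')
            stories.append((list(story), question.strip(), answer.strip()))
        else:
            story.append(text.strip())
    return stories

with open('qa1_single-supporting-fact_train.txt') as f:
    train_data = parse_babi(f.readlines())
```

From there, each word would be mapped to its GloVe vector (or to an index into an embedding layer) before being fed to the model.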

The DMN architecture defines two types of memory: semantic and episodic. The input vectors are considered the semantic memory, whereas episodic memory might contain other knowledge as well. The first module, the input module, is a GRU, or gated recurrent unit, that runs over the sequence of word vectors. A GRU cell is kind of like an LSTM cell, but it's more computationally efficient since it only has two gates and doesn't use a memory unit. The GRU outputs a hidden state after every sentence, and these outputs are called facts in the paper because they represent the essence of what was fed in.
Given a word vector and the previous time step's hidden state, the GRU computes the current hidden state. The update gate is a single-layer neural network: we sum up the matrix multiplications, add a bias term, and then the sigmoid squashes the result to a list of values between 0 and 1.
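To make that gate computation concrete, here is a minimal NumPy sketch of one GRU step; the weight names (Wz, Uz, and so on) are illustrative rather than any particular library's API, and the update follows the standard GRU equations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU time step; p holds illustrative weight matrices and biases."""
    # Update gate: a single-layer net whose output the sigmoid squashes
    # to values between 0 and 1, exactly as described above.
    z = sigmoid(p['Wz'] @ x_t + p['Uz'] @ h_prev + p['bz'])
    # Reset gate: decides how much of the previous state to expose.
    r = sigmoid(p['Wr'] @ x_t + p['Ur'] @ h_prev + p['br'])
    # Candidate state from the input and the reset-scaled history.
    h_tilde = np.tanh(p['Wh'] @ x_t + p['Uh'] @ (r * h_prev) + p['bh'])
    # Interpolate between the candidate and the old state.
    return z * h_tilde + (1.0 - z) * h_prev

# Tiny smoke test with random parameters.
dim = 4
rng = np.random.default_rng(0)
p = {k: rng.normal(size=(dim, dim)) for k in ('Wz', 'Uz', 'Wr', 'Ur', 'Wh', 'Uh')}
p.update({k: np.zeros(dim) for k in ('bz', 'br', 'bh')})
h1 = gru_step(rng.normal(size=dim), np.zeros(dim), p)
```

Note there are only the two gates, update and reset, and no separate memory cell; that is exactly where the GRU's efficiency advantage over the LSTM comes from.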

Then there's the question module. It processes the question word by word and outputs a vector, using the same GRU as the input module and the same weights; we can encode both of them by creating embedding layers for both. Then we'll create an episodic memory representation from both. The episodic memory module is composed of two nested GRUs: the inner GRU generates what are called episodes.
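A minimal Keras sketch of the shared encoding might look like the following; the vocabulary and sequence-length values are assumptions, and it deliberately simplifies away the full nested-GRU episodic memory loop, combining the two encodings directly instead:

```python
from keras.layers import Input, Embedding, GRU, Dense, concatenate
from keras.models import Model

vocab_size, embed_dim = 50, 50   # assumed sizes; bAbI vocabularies are small
story_len, question_len = 68, 4  # assumed maximum padded sequence lengths

story_in = Input(shape=(story_len,), dtype='int32')
question_in = Input(shape=(question_len,), dtype='int32')

# Shared embedding and GRU weights, as the article describes: the question
# module reuses the same GRU and the same weights as the input module.
embed = Embedding(vocab_size, embed_dim)
encoder = GRU(64)

story_vec = encoder(embed(story_in))
question_vec = encoder(embed(question_in))

# Simplification: instead of the episodic memory module's attention loop,
# just combine the two encodings and predict the answer word.
merged = concatenate([story_vec, question_vec])
answer = Dense(vocab_size, activation='softmax')(merged)

model = Model([story_in, question_in], answer)
```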
We can initialize our model and set its loss function to categorical cross-entropy, with RMSprop, a variant of stochastic gradient descent, as the optimizer. Then we train it on the given data using the fit function, as sketched below. We can test this code in the browser without waiting for it to train because, luckily for us, a researcher uploaded a web app with a fully trained model of this code. We can generate a story, which is a collection of sentences, each describing an event in sequential order.
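Continuing the model sketch above, the compile and fit steps might look like this; the random arrays are dummies standing in for the real preprocessed bAbI index arrays:

```python
import numpy as np

# Dummy padded index arrays standing in for the preprocessed bAbI data;
# shapes reuse the assumed constants from the model sketch above.
stories = np.random.randint(1, vocab_size, size=(1000, story_len))
questions = np.random.randint(1, vocab_size, size=(1000, question_len))
answers = np.eye(vocab_size)[np.random.randint(0, vocab_size, size=1000)]

model.compile(loss='categorical_crossentropy',  # categorical cross-entropy
              optimizer='rmsprop',              # RMSprop, an SGD variant
              metrics=['accuracy'])

model.fit([stories, questions], answers,
          batch_size=32, epochs=10, validation_split=0.1)
```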

Then we'll ask it a question and get a pretty high-accuracy response; let's generate another story and ask it another question. Let's go over the three key facts we've learned: GRUs control the flow of data like LSTM cells but are more computationally efficient, using just two gates, update and reset; dynamic memory networks offer state-of-the-art performance in question answering systems; and they do this by using both semantic and episodic memory.

Source: HOB