Chatbots suck

Friends, here is an abridged translation of an interesting talk on the problems of building chatbots: what makes this task special, what difficulties developers face, and how those difficulties can be solved. We also asked an expert from the Jet Infosystems Machine Learning Center to comment on it. You will find his opinion at the end of the article.

Not social networks, not mobile applications, but messaging is the new and very important trend. Today messaging is growing exponentially, and in terms of data volume it overtook social networks back in early 2015. Facebook, for example, is focusing on messaging and plans to push everything else connected with your page, such as the news feed, far into the background.

Today, thanks to smartphones, touch interfaces have conquered the world. The next step looks like a conversational user interface, in which you talk to a voice or text chatbot.

Let's talk about the difficulties that developers of full-fledged chatbots have to overcome now and will have to overcome in the future.

What do analysts tell us?

Gartner claims that by 2022, 70% of all customer interactions will go through some form of AI. Even the number of voice calls to companies is expected to drop by 10%. In at least 60% of the calls users make to a company, no human involvement will be required. In the remaining cases, people will take part only at some stage of the process, probably communicating with users directly.

According to other studies, the chatbot market will grow exponentially, and not only in customer service but also in sales and marketing. Researchers believe that chatbots will be able to replace 6% of the global workforce, which means almost 200 million people will have to look for new, more interesting jobs.

Finally, analysts believe that chatbots can save $7 trillion a year worldwide.

Intentions and Expressions

Two years ago I hated chatbots; today I like them, because I build them. But many people still don't like them. Why are chatbots so bad?

It seems to me that the reasons for their shortcomings fall into two groups: one relates to AI, and the other relates to people. But since I'm an engineer and not very strong in psychology and other topics related to the users themselves, I will only talk about AI.

First of all, we need to ask ourselves: how do chatbots recognize what users are telling them? Recognition rests on two key concepts. The first is the intention: what is the user's goal? For example, a person receives a huge bill for services and wants to know why it is so high. He contacts customer support, and his intention is to get an answer to that question. The second is the expression: such a user can phrase the same intention in many ways, asking “Why is my bill so big?”, “What happened?”, “What's wrong with my bill?” or “Why should I pay so much?” These two concepts (usually called intents and utterances) are used very widely in chatbot development.

How do you recognize intentions and expressions?

Chatbots used to rely on keyword spotting. Say, if the word “bill” appeared in the user's text, the request probably concerned the billing service. But the user might say: “I saw in the bill that I probably have the wrong subscription.” This person is not interested in the bill; he wants a different subscription. So you have to invent all kinds of hacky rules, or crutches.

But the user can rephrase the sentence: “I think I have the wrong subscription, because I noticed something is wrong on my bill.” Now your crutches no longer work, and you extend the rules by piling on new layers of crutches.
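A hypothetical sketch of this keyword approach, with one crutch rule that depends on word order, shows how a simple rephrasing breaks it. The function name and intention labels are made up for illustration:

```python
def route(text: str) -> str:
    """Hypothetical keyword router with one hand-written crutch rule."""
    t = text.lower()
    if "bill" in t:
        # Crutch added after the first misroute: if "subscription" comes
        # after "bill", treat the bill as mere evidence and route the
        # request to a subscription change instead.
        if "subscription" in t and t.index("subscription") > t.index("bill"):
            return "change_subscription"
        return "billing_question"
    return "unknown"


for sentence in [
    "Why is my bill so big?",
    "I saw in the bill that I probably have the wrong subscription.",
    "I think I have the wrong subscription, because I noticed something is wrong on my bill.",
]:
    print(route(sentence), "<-", sentence)
```

The third sentence means the same as the second, but because the words appear in a different order, the crutch no longer fires and the request is routed to billing.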

These rules never work perfectly, and developers spent far too much time refining them. Real progress came only with machine learning. A separate field, Natural Language Understanding (NLU), works on exactly this task.

NLU, in turn, is part of Natural Language Processing (NLP), the term for everything related to understanding and generating language. Understanding what someone is telling you is essentially classifying their expressions by intention.
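To make “classifying expressions by intention” concrete, here is a minimal sketch that swaps in a very simple technique for the machine-learning models discussed in the talk: each expression becomes a bag-of-words vector, and a new expression gets the intention of its most similar training example by cosine similarity. The intention names and training sentences are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Optional, Tuple

# Hypothetical training data: a few example expressions per intention.
TRAINING = {
    "billing_question": [
        "Why is my bill so big?",
        "Why is my bill so expensive?",
        "Why is the amount of my bill so big?",
    ],
    "change_subscription": [
        "I want a different subscription",
        "Change my subscription please",
        "I have the wrong subscription",
    ],
}


def bag(text: str) -> Counter:
    """Bag-of-words: lowercase word counts, word order ignored."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text: str) -> Tuple[Optional[str], float]:
    """Return the intention whose closest example is most similar to text."""
    query = bag(text)
    best_intent, best_score = None, 0.0
    for intent, examples in TRAINING.items():
        score = max(cosine(query, bag(example)) for example in examples)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent, best_score


print(classify("Why is my bill so huge?"))
print(classify("Please change my subscription"))
```

Real systems replace the bag of words with learned representations, but the task keeps the same shape: map an expression to one of a fixed set of intentions.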

Difficulties in determining intentions

One of the main problems is “bad” expressions. To train the algorithms, you have to write your own example expressions for each intention. For example, you might have these:

Why is my bill so big?
Why is my bill so expensive?
Why is the amount of my bill so big?

They can be used as training input for NLU and NLP systems. But these sentences are very similar: they all begin with “Why...”. They all contain the words “my bill”, so the system may learn a simple rule, for example, that the mere presence of the phrase “my bill” decides the intention. Each of them also contains the word “so”, and the system may assume that this word is what signals a question about the bill. If you haven’t shown the algorithm what actually matters for this intention, your NLP system will not work well.

Take the sentence “On what grounds should I pay much more than usual?” It contains none of the words highlighted above, so the NLP system will not be able to tell that it belongs to this group, unless you have only a single intention. With several intentions, the situation becomes much more complicated.
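A quick way to spot this problem is to compare vocabularies. The sketch below (plain Python, hypothetical helper name `words`) lists the words shared by every training sentence and the overlap between the training vocabulary and the new sentence:

```python
import re


def words(text: str) -> set[str]:
    """Set of lowercase words in a sentence."""
    return set(re.findall(r"[a-z']+", text.lower()))


examples = [
    "Why is my bill so big?",
    "Why is my bill so expensive?",
    "Why is the amount of my bill so big?",
]
new_sentence = "On what grounds should I pay much more than usual?"

shared = set.intersection(*(words(e) for e in examples))
vocabulary = set.union(*(words(e) for e in examples))

print(sorted(shared))                            # ['bill', 'is', 'my', 'so', 'why']
print(sorted(words(new_sentence) & vocabulary))  # []
```

Every training sentence shares the same five words, while the new sentence shares none of them, so a model trained only on these examples has nothing to go on.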

Variety of expressions

So we need a very diverse set of expressions. Generating one is not easy, so I developed my own diversification strategy.

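One part of an NLP system handles the variety of synonyms, and another part handles the variety of word order. Most NLP systems consist of two main blocks: word vectors such as Word2Vec, which take care of synonyms, and a neural network that learns word order.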

Word2Vec learns features that capture certain properties of words, so words with similar meanings end up close to each other. For example, vehicles can be placed along a “size” axis.

And if everything is done right, the same vehicles can also be spread along another axis: the environment that kind of transport is designed for.

To train the system, you need to know which kind of variety is relevant to your task and which is not. Say you need to transport something really big, and the mode of transport does not matter. Then variety along the environment axis is acceptable, as long as the vehicles are large.

In another case, size may not matter as long as delivery is by land.
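Real Word2Vec vectors have many unnamed dimensions learned from text. As a purely hypothetical illustration of the two axes described above, here is a hand-made 2-D table of vehicles with the two “relevant variety” filters and a nearest-neighbour lookup:

```python
import math

# Hand-picked, hypothetical 2-d vectors: (size, environment).
# size: 0 = tiny .. 1 = huge; environment: 0 = land, 0.5 = water, 1 = air.
VEHICLES = {
    "bicycle": (0.1, 0.0),
    "car": (0.3, 0.0),
    "truck": (0.8, 0.0),
    "boat": (0.3, 0.5),
    "cargo ship": (0.95, 0.5),
    "drone": (0.05, 1.0),
    "cargo plane": (0.9, 1.0),
}


def nearest(word: str) -> str:
    """Closest other vehicle by Euclidean distance."""
    x = VEHICLES[word]
    return min((w for w in VEHICLES if w != word),
               key=lambda w: math.dist(x, VEHICLES[w]))


# Task 1: something really big; the environment does not matter.
big_any = [w for w, (size, _env) in VEHICLES.items() if size >= 0.75]
# Task 2: any size, but land only.
land_any = [w for w, (_size, env) in VEHICLES.items() if env == 0.0]

print(nearest("truck"))  # car
print(big_any)           # ['truck', 'cargo ship', 'cargo plane']
print(land_any)          # ['bicycle', 'car', 'truck']
```

The same vocabulary yields different “acceptable” synonym sets depending on which axis the task cares about.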

Word2Vec converts words into vector representations, which are then fed into a neural network that learns word order.
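A sketch of this two-stage pipeline with untrained, random stand-ins for both parts (real systems use trained Word2Vec vectors and a trained network). It shows why the second stage is needed: averaging word vectors ignores word order, while even a simple Elman-style recurrent network does not.

```python
import math
import random

DIM = 4
rng = random.Random(0)

VOCAB = ["my", "bill", "is", "so", "big"]
# Hypothetical stand-in for Word2Vec: random, untrained word vectors.
EMB = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}
# Untrained recurrent weights, for illustration only.
W_X = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]
W_H = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mean_encode(words):
    """Average of the word vectors: word order is lost."""
    vectors = [EMB[w] for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rnn_encode(words):
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1}); word order matters."""
    h = [0.0] * DIM
    for w in words:
        h = [math.tanh(a + b) for a, b in zip(matvec(W_X, EMB[w]), matvec(W_H, h))]
    return h


statement = "my bill is so big".split()
question = "is my bill so big".split()
print(max(abs(a - b) for a, b in zip(mean_encode(statement), mean_encode(question))))
print(max(abs(a - b) for a, b in zip(rnn_encode(statement), rnn_encode(question))))
```

The first difference is zero up to floating-point rounding; the second is not, because the recurrent state depends on the order in which words arrive.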

My strategy is pretty simple. First, we write a lot of sentences. For the bill example, we do not care about synonyms yet, only about word order:

Why is my bill so big?
My bill is so big!
There is an outrageous amount on my bill!

The word “bill” appears in a different place in each sentence. We try to generate as many examples as possible with different word orders, without worrying about expressing the same thought in different words.

Then we add synonyms for diversity. In the first sentence, for example, instead of “why” you can say “how come...”, “explain to me why...” or “tell me how it happened that...”, and then finish the sentence with any of the other options, for example: “Explain to me why my bill is so huge.”

We make a list of the synonyms allowed in this context, write a script that generates expressions, and get a good training dataset. This way we explicitly tell the system which synonyms are allowed in this context.
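A minimal sketch of such a generation script, assuming word orders are written as templates with `{slot}` placeholders and synonyms as per-slot lists; the templates, slot names and function names are hypothetical, and `itertools.product` does the combining:

```python
import re
from itertools import product

# Hypothetical word orders (templates) with {slots}, written by hand.
TEMPLATES = [
    "{why} my bill is so {big}?",
    "Why is my bill so {big}?",
    "My bill is so {big}!",
    "There is an outrageous amount on my bill!",
]
# Synonyms allowed in this context for each slot.
SYNONYMS = {
    "why": ["How come", "Explain to me why", "Tell me how it happened that"],
    "big": ["big", "high", "huge"],
}


def expand(template: str, synonyms: dict[str, list[str]]):
    """Yield every sentence obtained by filling the template's slots."""
    slots = list(dict.fromkeys(re.findall(r"\{(\w+)\}", template)))
    for choice in product(*(synonyms[s] for s in slots)):
        yield template.format(**dict(zip(slots, choice)))


def generate(templates, synonyms) -> list[str]:
    """All expansions of all templates, duplicates removed, order kept."""
    return list(dict.fromkeys(s for t in templates for s in expand(t, synonyms)))


dataset = generate(TEMPLATES, SYNONYMS)
print(len(dataset))  # 16
print(dataset[0])    # How come my bill is so big?
```

Keeping word orders and synonyms separate means each new template or synonym multiplies the dataset, while the synonym lists stay under your explicit control.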


It seems to me that many companies forget that a chatbot needs to keep improving after launch.

Take the sentence “How did it happen that I need to pay a lot more than usual?” These words are not in my synonym lists, and the word order differs from the earlier ones. Don’t just add this sentence as one more expression. First check whether this word order already exists; if it doesn’t, add the sentence as a new word order, and then add synonyms for it. This way you get a very diverse set of expressions.
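The check described above can be sketched as follows, assuming word orders are stored as templates with `{slot}` placeholders (the names `to_template` and `add_expression` are hypothetical): known synonyms are replaced by their slot names, and the sentence becomes a new word order only if the resulting template is not already known.

```python
import re

SYNONYMS = {
    "why": ["How come", "Explain to me why", "Tell me how it happened that"],
    "big": ["big", "high", "huge"],
}
TEMPLATES = ["{why} my bill is so {big}?", "My bill is so {big}!"]


def to_template(sentence: str, synonyms: dict[str, list[str]]) -> str:
    """Replace every known synonym with its {slot}, longest phrase first."""
    lookup = {phrase: slot for slot, phrases in synonyms.items() for phrase in phrases}
    alternation = "|".join(map(re.escape, sorted(lookup, key=len, reverse=True)))
    pattern = re.compile(r"\b(" + alternation + r")\b")
    # A single pass, so already inserted {slots} are never rewritten again.
    return pattern.sub(lambda m: "{" + lookup[m.group(1)] + "}", sentence)


def add_expression(sentence: str, templates: list[str], synonyms) -> bool:
    """Add the sentence as a new word order unless that order is already known."""
    template = to_template(sentence, synonyms)
    if template in templates:
        return False
    templates.append(template)
    return True


print(to_template("How come my bill is so huge?", SYNONYMS))
# {why} my bill is so {big}?
```

A sentence like the one above that contains no known synonyms passes through unchanged and is stored as a new template; its synonyms are then added by hand.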

Assessing Expression Variety

How do you know how well everything works? You can group the vector representations of the expressions into clusters of similar expressions and evaluate how diverse they are.

The metric is very simple: divide the number of clusters by the number of expressions. A well-diversified set of expressions typically scores about 50%.

If the variety is below 20%, your set is clearly not diverse enough. On the other hand, if it is above 80%, the expressions are probably unrelated to each other, and it is very hard for the algorithm to find what they have in common.
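A minimal sketch of the metric. The talk clusters vector representations; here, as a stand-in, each expression is a bag-of-words vector, and a greedy “leader” clustering with an assumed cosine threshold of 0.6 groups them. The 20% and 80% bounds come from the text above.

```python
import math
import re
from collections import Counter


def bag(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    dot = sum(n * b[w] for w, n in a.items())
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def count_clusters(expressions, threshold=0.6):
    """Greedy leader clustering: join the first cluster whose leader is
    similar enough, otherwise start a new cluster."""
    leaders = []
    for e in expressions:
        v = bag(e)
        if not any(cosine(v, leader) >= threshold for leader in leaders):
            leaders.append(v)
    return len(leaders)


def diversity(expressions, threshold=0.6):
    """Number of clusters divided by number of expressions."""
    return count_clusters(expressions, threshold) / len(expressions)


def verdict(score):
    if score < 0.2:
        return "not diverse enough"
    if score > 0.8:
        return "probably unrelated"
    return "ok"


expressions = [
    "Why is my bill so big?",
    "Why is my bill so high?",
    "Why is my bill so expensive?",
    "On what grounds should I pay much more than usual?",
]
score = diversity(expressions)
print(score, verdict(score))  # 0.5 ok
```

The result depends heavily on the representation and the threshold, so treat the absolute numbers as a rough guide for comparing datasets rather than a fixed standard.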

Recognition accuracy

The train score shows how accurately the system works on the training dataset. It is not very informative: it only evaluates the data you trained on and does not tell you what to do next.

Instead, you should look at which expressions are classified incorrectly. Where did things go wrong?

The sentence “I saw in the bill that I probably have the wrong subscription” is quite likely to be misclassified, even with the training examples shown above.

And if you see that a sentence is misclassified, you need to figure out how to fix it.
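Listing the misclassified expressions on examples the model was not trained on takes only a few lines. In this sketch, `keyword_classifier` is a hypothetical stand-in for a trained model; any function from text to intention name can be passed in.

```python
from typing import Callable, Iterable, List, Tuple


def error_report(classify: Callable[[str], str],
                 labeled: Iterable[Tuple[str, str]]) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Accuracy plus the list of (text, expected, predicted) mistakes."""
    errors = []
    total = 0
    for text, expected in labeled:
        total += 1
        predicted = classify(text)
        if predicted != expected:
            errors.append((text, expected, predicted))
    return (total - len(errors)) / total, errors


def keyword_classifier(text: str) -> str:
    """Hypothetical stand-in for a trained model."""
    return "billing_question" if "bill" in text.lower() else "change_subscription"


held_out = [
    ("Why is my bill so big?", "billing_question"),
    ("I saw in the bill that I probably have the wrong subscription", "change_subscription"),
    ("Change my subscription please", "change_subscription"),
]
accuracy, mistakes = error_report(keyword_classifier, held_out)
print(f"accuracy: {accuracy:.2f}")
for text, expected, predicted in mistakes:
    print(f"{text!r}: expected {expected}, got {predicted}")
```

The list of mistakes, not the accuracy figure, is what tells you where to look next.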

Word2Vec and recurrent neural networks are often considered black boxes that are very hard to interpret. But there is one fairly simple technique: add noise to each word separately. As you perturb a word, you can see how sensitive the classification is to that particular word.

We do this for every word in the sentence: add noise to its vector and run the sentence through the algorithm. If the result stays the same, the algorithm does not care about this word. If the result changes completely, the word is probably highly relevant.
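A sketch of this noise-sensitivity analysis. The model is a hypothetical stand-in (hand-made 2-D word vectors, max-pooling over words, then a softmax) rather than Word2Vec with an RNN; it needs some nonlinearity, because a model that simply averages the word vectors would react identically to noise on any word. Sigma, trial count and seed are assumptions.

```python
import math
import random

# Hypothetical 2-d word vectors: dim 0 ~ "billing", dim 1 ~ "subscription".
EMB = {"bill": [1.0, -1.0], "subscription": [-1.0, 0.8]}
FILLER = [-1.0, -1.0]  # every other word gets this neutral vector
INTENTS = ["billing_question", "change_subscription"]


def predict(vectors):
    """Toy black-box model: max-pool each dimension, then a softmax."""
    pooled = [max(v[d] for v in vectors) for d in range(2)]
    logits = [3.0 * pooled[0], 3.0 * pooled[1]]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sensitivity(words, sigma=0.3, trials=200, seed=0):
    """Mean change in the predicted intention's probability when Gaussian
    noise is added to one word vector at a time."""
    vectors = [EMB.get(w, FILLER) for w in words]
    base = predict(vectors)
    top = max(range(len(base)), key=base.__getitem__)
    rng = random.Random(seed)
    scores = []
    for i, word in enumerate(words):
        change = 0.0
        for _ in range(trials):
            noisy = list(vectors)
            noisy[i] = [x + rng.gauss(0.0, sigma) for x in vectors[i]]
            change += abs(predict(noisy)[top] - base[top])
        scores.append((word, change / trials))
    return INTENTS[top], scores


sentence = "i saw in the bill that i probably have the wrong subscription".split()
intent, scores = sensitivity(sentence)
print(intent)  # billing_question
for word, score in scores:
    print(f"{word:>12}  {score:.3f}")
```

The toy model misclassifies the sentence as a billing question, and the scores show that only “bill” and “subscription” move the output; every other word scores zero.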

By analyzing sensitivity this way, you may find that the intention was misclassified because of, say, the word “bill”. Or, if the sentence contains little more than “why...”, those words may have led to this classification. If it really all comes down to the word “bill”, you have several options.

Conversation Flows

The next thing that too few people pay attention to is conversation flows. How do people use your bot? How do they interact with it? What happens in the conversation? What do most users do?

A diagram of conversation flows shows how conversations develop: which path is used most often and which parts of the bot people like. But it is more important to see the points in a conversation where people stop using the bot: they ask something and then suddenly leave.
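If each conversation is logged as the sequence of flow nodes it passed through, both questions reduce to counting. A minimal sketch, where the logs, node names and the set of “goal” nodes are all hypothetical:

```python
from collections import Counter

# Hypothetical logs: the flow nodes each conversation passed through, in order.
conversations = [
    ["greeting", "order_burger", "offer_fries", "payment"],
    ["greeting", "order_burger", "offer_fries"],
    ["greeting", "order_burger", "offer_fries"],
    ["greeting", "opening_hours"],
    ["greeting", "order_burger", "offer_fries", "payment"],
]
# Hypothetical: nodes where ending the conversation counts as success.
GOAL_NODES = {"payment", "opening_hours"}


def visits(convs):
    """How many conversations passed through each node (popular paths)."""
    return Counter(node for conv in convs for node in set(conv))


def drop_off(convs, goal_nodes):
    """Share of all conversations abandoned at each non-goal node."""
    ends = Counter(conv[-1] for conv in convs if conv[-1] not in goal_nodes)
    return {node: n / len(convs) for node, n in ends.most_common()}


print(visits(conversations))
print(drop_off(conversations, GOAL_NODES))  # {'offer_fries': 0.4}
```

Here 40% of all conversations end right after the fries offer, which is exactly the kind of point in the flow worth investigating.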

Emotion Analysis

It’s important to understand how people feel when they talk to your chatbot, and for this we use emotion analysis. If you see that 60% or more of users are dissatisfied after talking to the chatbot, something needs to be done. I don't think any dialogue platform takes this kind of information into account.
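Assuming some emotion classifier has already labeled each conversation (the labels and the function name below are hypothetical), the 60% check from the text reduces to a share computation:

```python
from collections import Counter


def dissatisfaction_alert(labels, threshold=0.6):
    """labels: one emotion label per conversation, produced elsewhere.
    Returns the share of negative conversations and whether it reaches
    the alert threshold."""
    counts = Counter(labels)
    share = counts["negative"] / len(labels) if labels else 0.0
    return share, share >= threshold


labels = ["negative"] * 6 + ["neutral"] * 3 + ["positive"]
print(dissatisfaction_alert(labels))  # (0.6, True)
```

The hard part is the classifier itself, as the next paragraph explains; the aggregation only makes its output actionable.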

Quite often, sentences that do not look negative at all get tagged with negative emotion, simply because the model learned emotions from emoji. A sarcastic sentence in the dataset can mean anything, so the bot should not assume it signals a bad emotion: the expression may well be neutral.

Use Cases

When should you use chatbots? When your customers need a quick response, or don't want to wait in a phone queue until an operator is free to help them.

If there is not enough staff, why not use the chat bot to quickly answer basic or frequent questions?

It is also very important to find out which area of your business accounts for most of the costs, because moving that area to chatbots can save a lot of money.

Robots are not the best conversation partners when emotions are involved. If you call emergency services, you want a direct answer. But if you have car trouble or a minor accident, you call a car service and don't want to wait long on the line; then you can simply say “My tire burst” or something like that. Of course, I don't mean serious accidents with many injured people: chatbots cannot be used in such situations.

Another aspect I want to touch on is assessing whether a chatbot suits a specific task. Many customers come to us and say: “We have this form on our site, why not turn it into a chatbot?” But we have been using forms for many years, and we believe they are very effective.

Forms are very simple, and people know how to use them. Why turn a form into a chatbot? Some user interactions suit a chatbot, but sometimes simply filling out a form is much easier.

But there are other situations.

For example, if the graphical interface is not very convenient for finding information, a conversational user interface may be the way out. Of course, if you have “bad” expressions, the chatbot will not work, and such expressions are usually written by people.

The most important thing is conversation design. A whole conference, VoiceCon, is dedicated to this topic. Conversation design must be very rigorous, and if the design is good, you can get away with AI that is not of very high quality. It is a pity that design is so often neglected, because that is what makes chatbots fail.

Useful Tips

Probably many of you have seen pop-ups on websites: “Hello. Can I help you with something?” or “I see that you are trying to write a letter. Let me help you!”

Do not put chat bots on the main page. DO NOT DO THIS.

Such bots are acceptable only in a milder form, when they do not take over the whole page. Then people use the bot because it can help them find answers. In addition, since you limit the bot's scope, it becomes easier to develop, and you do not bother people with a bot they do not need at the moment.

People tend to treat a chatbot as a person. They don't like bots that are deliberately robotic.

Give your chatbot a personality! Even if you remind your customers every time that it is a robot, still give it some individual traits.

If you are a hotel manager, you may need a very formal chatbot that speaks like a butler: “Yes, sir,” “Thank you, sir.” And if you are a telecom company developing plans for young people, something like “Hey dude, how are you?” works better.

Always remember who your audience is.

If you are careful and consistent, your customers will like the chatbot more. Keep the style consistent throughout the conversation flow: often everything goes well at the beginning, and then after a while the conversation becomes very robotic.

It is very important that your chatbot communicates cooperatively. How it introduces itself at the very beginning also matters a great deal.

A person has opened your chatbot for a reason. Explain right away what the chatbot can do. Also manage the user's expectations: say that you are a bot, so that users will be more forgiving of some errors.

Do not use overly general phrases. Do not ask: “How can I help you?”

I usually answer such bots with “Help me win the lottery” or something like that. So let the bot communicate clearly and simply.

It is really important - and difficult - to handle the transfer of initiative in a conversation.

When we talk face to face, we use visual cues that tell us: “Now it's my turn to speak.” In a chat, this is hard to reproduce. You need to make it clear to the user that the bot is now waiting for them to write something. For example, when a user orders a burger, the bot can ask: “Would you like fries with that?”

It is very important to use “human” language in bots.

A dry conversation is very annoying. Use the bot's personality to make the conversation more believable.

It is very tempting to keep asking the customer for confirmation throughout the conversation.

But it is annoying if the bot keeps asking “Is everything all right?”, “Right?” or “You ordered a burger, right?” Don't do this in your chatbot!

This is acceptable only where it is really relevant, for example at the end of the conversation: “Good. Now you need to pay for a burger with fries and a drink. Is that right?” An elegant alternative is implicit confirmation. If someone orders a burger, you can ask: “Good. Do you want fries with your burger?”

At some point, the bot may misinterpret something, and people make mistakes too. Sometimes you order something and think: “Hmm, I'd better take the triple bacon instead.” The developer must make sure the chatbot can handle this: you need a conversation flow that recognizes that, at this point in the conversation, the user wants to change the order.

Design flows that are robust to user errors, because people make mistakes all the time!
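One way to combine implicit confirmation, a single explicit confirmation at the end, and a mid-conversation change of order, sketched as a hypothetical `Order` class. The class, method names and reply texts are illustrative; a real bot would detect the “change order” intention with its NLU.

```python
class Order:
    """Hypothetical minimal order state for a burger bot."""

    def __init__(self):
        self.items = []

    def add(self, item: str) -> str:
        self.items.append(item)
        # Implicit confirmation: repeat the item inside the next question.
        if item.endswith("burger"):
            return f"Good. Do you want fries with your {item}?"
        return f"Good, {item}. Anything else?"

    def replace_last(self, item: str) -> str:
        # The user changed their mind ("... triple bacon instead").
        if not self.items:
            return self.add(item)
        old = self.items[-1]
        self.items[-1] = item
        return f"No problem, {item} instead of {old}."

    def summary(self) -> str:
        # Explicit confirmation, asked only once, at the end.
        return f"Good. Now you have to pay for: {', '.join(self.items)}. Is that right?"


order = Order()
print(order.add("burger"))
print(order.replace_last("triple bacon burger"))
print(order.add("fries"))
print(order.summary())
```

Keeping the order as explicit state is what makes the change cheap: the correction edits the state instead of restarting the flow.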

The point is to nudge the person a little, but talking to someone who constantly apologizes is very annoying. So don't let your chatbot apologize too often: instead of apologizing, suggest solutions. Say: “I did not understand. Can you rephrase?” If misunderstandings keep happening, you can tell the user: “You are asking for something that I do not understand. You can order a burger, fries and drinks from me.”
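This escalation can be sketched as a small counter: the first misunderstandings get a short request to rephrase, and after that the bot states what it can do instead of apologizing. The class name and the limit of two rephrase requests are assumptions.

```python
CAPABILITIES = "You can order a burger, fries and drinks from me."


class Fallback:
    """Hypothetical fallback handler: ask to rephrase, then state the bot's scope."""

    def __init__(self, max_rephrase: int = 2):
        self.misses = 0
        self.max_rephrase = max_rephrase

    def understood(self) -> None:
        self.misses = 0  # reset after a successful turn

    def not_understood(self) -> str:
        self.misses += 1
        if self.misses <= self.max_rephrase:
            return "I did not understand. Can you rephrase?"
        return "You are asking for something that I do not understand. " + CAPABILITIES


fallback = Fallback()
for _ in range(3):
    print(fallback.not_understood())
```

Resetting the counter after every understood turn keeps the bot from lecturing a user who made one unrelated slip later in the conversation.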


Make it easy for the customer: build a chatbot with good conversation design. If you make a mistake, fix it. Honestly, chatbots still suck. But you can build chatbots that really work and do some of the work for us; you just have to approach it the right way.

Expert Opinion

And now we want to share with you the opinion of an expert of our company about the ideas and tips described here.

Nikolay Knyazev, leading machine learning specialist at Jet Infosystems:

The author’s talk covers what a developer needs to know to build successful chatbots. The topic is interesting, and many of the points are not obvious. However, keep in mind that, from the algorithmic point of view, the survey of methods is quite general and says nothing about how to implement them.

For example, Word2vec is mentioned, but other word vectorization methods, such as GloVe and fastText, are not. There are various ways to classify text (from bag-of-words to RNNs and CNNs), but there is no comparison of them and no mention of the libraries that implement them (for Python, for example, Keras, scikit-learn, etc.). Two important points are missing from the overview. The first is language specifics: different methods work well for different languages, and an implementation for English is definitely not suitable for Russian. That said, Russia has its own resources, for example BigARTM or the Taiga corpus, on which the Word2vec mentioned above can be trained.

The other aspect is how the bot is used: a one-shot bot (choosing an apartment, solving a problem) or constant use (as part of a work process). In the first case, the emotions from the conversation and everything the author talked about really are what people remember. In the second, people expect a “Kalashnikov rifle”: a tool that does its job without extra questions, while they gradually learn the shortest ways to use it. Here it is not emotional coloring but user experience that comes to the fore, because so far the neural networks in our heads adapt to reality much faster than their silicon counterparts.

