CHATBOT FOR KNOWLEDGE SHARING
Client: Large EU-based corporation
Length: 5 months
Goal: Create a chatbot that can connect a user with the colleague best able to help with a current topic or problem.
Tech: Python, Docker
The client approached us with the idea of a chatbot that finds, among a user’s colleagues, the person best placed to help with a current topic or problem. The chatbot was to interact with employees, learn what they are working on and derive their areas of expertise. Once the chatbot has this knowledge about employees, it is ready to assist: when a user (employee) needs help with a topic, the chatbot finds the best expert (colleague) with relevant experience. NLP and machine learning techniques enabled us to implement the matching. If the chatbot is in doubt when matching or needs more information, it can ask additional questions to get more context from the user.
The approach we followed for creating the chatbot app included defining the architecture, the conversation flow and the matching algorithm. Once we had an initial dataset of employee descriptions in raw-text format, we implemented data extraction and matching. To extract important information from text, we relied on a number of NLP methods and algorithms. Matching was implemented using a classification method, while clustering was used to choose the right clarifying questions.
The “complete my profile” part of the chatbot (collecting user data) was implemented as a state machine. If information is missing from a user’s profile, this part is activated at every second login. The user is asked only those questions whose answers are missing, and can cancel the “complete my profile” flow at any point. In that case, the same question is asked again at later logins. The user’s information is stored for later use - training the model and matching. In addition, the user can enter daily status reports about their current work at any point and thus provide additional information for our machine learning model.
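The profile-completion flow described above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the project's actual code: the field names, the prompts, the state names and the `should_start` helper are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    ASKING = auto()      # a question is waiting for an answer
    DONE = auto()        # the profile has no gaps left
    CANCELLED = auto()   # the user cancelled; gaps remain for a later login


# Hypothetical profile fields and prompts (not taken from the project).
QUESTIONS = {
    "role": "What is your role?",
    "skills": "Which skills do you use most?",
    "projects": "Which projects are you working on?",
}


class ProfileCompletion:
    """Asks only the questions whose answers are missing from the profile."""

    def __init__(self, profile):
        self.profile = profile
        self.pending = [f for f in QUESTIONS if not profile.get(f)]
        self.state = State.ASKING if self.pending else State.DONE

    def current_question(self):
        if self.state is State.ASKING:
            return QUESTIONS[self.pending[0]]
        return None

    def answer(self, text):
        if self.state is not State.ASKING:
            raise RuntimeError("no question is pending")
        # Store the answer and move on to the next missing field, if any.
        self.profile[self.pending.pop(0)] = text
        if not self.pending:
            self.state = State.DONE

    def cancel(self):
        # Unanswered fields stay empty, so they are asked again next time.
        if self.state is State.ASKING:
            self.state = State.CANCELLED


def should_start(profile, login_count):
    """Assumed trigger rule: start on every second login while gaps exist."""
    has_gaps = any(not profile.get(f) for f in QUESTIONS)
    return has_gaps and login_count % 2 == 0
```

Because cancelling leaves the unanswered fields empty, building a new `ProfileCompletion` from the same profile at a later login resumes with exactly the questions that are still open.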
As for asking questions and the NLP methods, the first step was processing the questions and user texts (user descriptions). The NLP steps we used were: autocorrection, lowercasing, text cleanup, abbreviation replacement, phrase detection and replacement, removal of unimportant and noise words, stopword removal, lemmatization and stemming. We aimed to improve our model by introducing synonyms and word relations. We used Google’s word2vec model to calculate similarities between words, and extracted the similarities between words that appear in the dataset provided by the client. Each user was described by a number of attributes (dimensions). To calculate the probability that a user is a good match for a given question, the text of the question is compared with the texts of all of that user’s dimensions, and matching features are created. When creating features we considered word relations/similarities, word importance and phrases. The features are used to train a classification model that outputs the probability that a user is a good match.
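A few of the preprocessing steps and the per-dimension matching features can be sketched as follows. This is a simplified illustration, not the project's pipeline: the lookup tables are tiny hypothetical stand-ins, the `SIMILARITY` table replaces real word2vec similarities, and autocorrection, phrase detection, lemmatization, stemming and word importance are left out. The resulting feature values are the kind of input a classifier could be trained on.

```python
import re

# Hypothetical lookup tables; a real system would use much larger ones.
ABBREVIATIONS = {"ml": "machine learning", "db": "database"}
STOPWORDS = {"a", "an", "the", "with", "for", "i", "need", "help", "and", "in", "of"}
# Toy word similarities standing in for word2vec cosine similarities.
SIMILARITY = {frozenset(("database", "sql")): 0.8}


def preprocess(text):
    """Lowercase, clean up, expand abbreviations and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    expanded = []
    for tok in tokens:
        expanded.extend(ABBREVIATIONS.get(tok, tok).split())
    return [t for t in expanded if t not in STOPWORDS]


def word_similarity(a, b):
    """1.0 for identical words, otherwise the table value (0.0 if unknown)."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset((a, b)), 0.0)


def dimension_feature(question_tokens, dimension_tokens):
    """Average, over question words, of the best similarity to any dimension word."""
    if not question_tokens or not dimension_tokens:
        return 0.0
    best = [
        max(word_similarity(q, d) for d in dimension_tokens)
        for q in question_tokens
    ]
    return sum(best) / len(best)


def matching_features(question, user_dimensions):
    """One feature per user dimension, comparing it with the question text."""
    q_tokens = preprocess(question)
    return {
        name: dimension_feature(q_tokens, preprocess(text))
        for name, text in user_dimensions.items()
    }


# Example with made-up data: one user described by two dimensions.
features = matching_features(
    "I need help with a DB migration",
    {"skills": "SQL and Python", "projects": "Database migration for the billing system"},
)
```

In this example the `projects` dimension scores higher than `skills`, because both remaining question words appear in it verbatim, while `skills` only matches through the assumed similarity between "database" and "sql".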
To generate clarifying questions we used clustering: all keywords that describe users’ expertise were grouped into clusters, using word2vec to map similar (co-occurring) words to similar vectors and the K-means algorithm to group those vectors. The clusters were used in the clarifying-question phase: after selecting the top experts for a given question, if the chatbot is in doubt about which expert to recommend, it can check which clusters they belong to and, where possible, filter and separate them by the cluster keywords that describe their expertise. Using these keywords, the chatbot can generate a clarifying question to get more context from the user.
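The clustering step and the way clusters could drive a clarifying question can be sketched as below. This is a toy illustration under several assumptions: the 2-D vectors stand in for real word2vec embeddings, the K-means implementation is a plain textbook version with a deterministic farthest-point initialization chosen for reproducibility, and the question template and the `clarifying_question` helper are hypothetical.

```python
import math

# Toy 2-D vectors standing in for word2vec embeddings of expertise keywords.
KEYWORD_VECTORS = {
    "python": (0.9, 0.1),
    "docker": (0.8, 0.2),
    "kubernetes": (0.85, 0.15),
    "accounting": (0.1, 0.9),
    "invoicing": (0.2, 0.8),
}


def kmeans(vectors, k, iterations=10):
    """Plain K-means; returns a mapping keyword -> cluster index."""
    names = list(vectors)
    # Deterministic init: start from the first vector, then repeatedly add
    # the vector that is farthest from the centroids chosen so far.
    centroids = [vectors[names[0]]]
    while len(centroids) < k:
        centroids.append(
            max(vectors.values(), key=lambda v: min(math.dist(v, c) for c in centroids))
        )
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each keyword goes to its nearest centroid.
        for n in names:
            assignment[n] = min(range(k), key=lambda c: math.dist(vectors[n], centroids[c]))
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [vectors[n] for n in names if assignment[n] == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment


def clarifying_question(experts, assignment):
    """experts maps a name to expertise keywords; returns a question or None."""
    cluster_keywords = {}
    for keywords in experts.values():
        for kw in keywords:
            if kw in assignment:
                cluster_keywords.setdefault(assignment[kw], set()).add(kw)
    if len(cluster_keywords) < 2:
        return None  # the candidates are not separated by cluster
    options = [", ".join(sorted(kws)) for _, kws in sorted(cluster_keywords.items())]
    return "Is your question more about " + " or ".join(options) + "?"


clusters = kmeans(KEYWORD_VECTORS, k=2)
question = clarifying_question(
    {"Ana": ["python", "docker"], "Marko": ["invoicing"]}, clusters
)
```

When all top candidates fall into the same cluster, the helper returns `None`, which corresponds to the case where cluster keywords cannot separate the experts and some other tie-breaker would be needed.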
We created a chatbot that is able to collect user information and recommend a relevant expert. Many text-processing techniques were applied to extract information from raw text, which is later used for training our machine learning model. Each conversation with the chatbot is saved and available for future use - retraining and further improvement.