What it takes to build a production ready AI solution
11/30/17
When we decided to start SmartCat, we had already gained a range of experience, mostly working for outsourcing web development companies. At first, we tried to apply the same processes and steps to build data science solutions. We realized soon enough that this would be a learning journey and that we needed to adjust our way of thinking in order to bring an AI project to production successfully. What makes this process even harder is the team itself, consisting of engineers and scientists from different backgrounds speaking completely different languages. This blog describes our experience so far; the story is not complete yet, since this field is changing rapidly and we are adjusting to changes on a daily basis.
Initial idea vs reality
We wanted to build a Data Science company and we wanted to help various businesses solve their problems by using smart solutions. Data Science and Big Data were gaining popularity, and everybody was reading about machine learning and AI solutions and how those solutions had made other companies successful. Most of the clients that contacted us back then came with the same problem: “We are collecting data and we want to do something smart but we do not have a clear vision, can you help us?”. This is how we came up with our business insights as a service offering: we request the data and the data specification, then do exploratory analysis in order to gain insights and produce some key conclusions, visualizations and a couple of ideas for the future. After such an engagement, clients mostly parked the idea because they realized it was not a priority for them at that moment. That was how the majority of our data science engagements ended back then.
On the other front, we were mostly successful, since data engineering and data ops services were something that our clients requested often. They recognized the power of data and wanted to equip themselves with the right infrastructure and selection of tools, so they came to us for help with building pipelines and storage solutions. At the beginning of SmartCat, our two teams, data science and data engineering/ops, were separated, working on different projects and not collaborating much.
Our vision from the start was to make a company which could deliver data science solutions end to end: from working on ideas with the client, through workshops where we choose the right idea and create a detailed plan, to the prototyping phase followed by the integration phase. But those projects were nowhere to be found. We were patient, since we firmly believed there was a gap in the market and that companies not big enough to have an internal data science team could profit from our expertise. After a year or so, an end to end project appeared - a recommender system which should work on user analytics and real time offers to produce personalized recommendations. Finally, we had all the aspects we wanted in a single solution: building personalized recommendations was the perfect task for the data science team, real time offers are something our data engineering team solves each day, and all of this had to run on a reliable infrastructure, which was perfect for our data ops team. But things did not go as smoothly as we had hoped.
The process or lack of it
So, we started. The first phase was easy. A single data scientist sat down with the data, started to write Python Jupyter notebooks and started reasoning about the data. We were so confident we understood the problem that we did not involve the client much. We gained some key insights, plotted several graphs and composed a document with insights, features and ideas about implementation. We organized a call with the client, eager to present our findings. During the call we realized we had not involved the client enough in this phase: some things that we found were not relevant for the domain, some were not relevant for this particular company, and we had missed important details. Our report was half finished, and it was back to the drawing board. However, we realized that this call was really important for us. Lesson 1: we should involve the client more often in this phase.
We prepared a good report on the second attempt and the client was happy with the results. We chose one of the approaches and decided to start with prototyping. Again, a lone data scientist started creating a web demo to showcase the idea. We tried a couple of different algorithms and settled on the one which made sense to us and gave the best results. We put together a small web demo and the client started testing. They were satisfied with the results, apart from a few small remarks which we addressed, and we got the approval to proceed to the production phase.
The production phase should have been easy in our minds - we had the web demo, so we would just make the API and integrate it with the final solution. We hugely underestimated this phase. The prototyping code was not ready for production at all. It was written by a data scientist and, I mean no offense, data scientists are focused on their own problem: clustering users and showing the best possible recommendations to each user. They are all about precision; if a test set shows above 90% accuracy, that is considered a good job. They do not think about the number of users, ways to integrate, dataset size, performance, or the availability of the systems around the data science solution. One example is using Flask as the Python web framework. It is good for this phase; you can put a demo together easily and quickly, but problems occur when you have more than one user. If we had done all this in Django we could have avoided double work. Also, working with a database dump is not the same as working with the database itself. You are not the only user of the system and you must take that into account. Lesson 2: involve data engineers with data scientists earlier, already in the prototyping phase, to avoid double work.
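One way to make the gap between accuracy and production readiness visible during prototyping is to time the recommender as well as score it. The snippet below is a minimal sketch using only the Python standard library; `toy_recommend` and `measure_latency` are hypothetical names made up for illustration and are not taken from our actual project.

```python
import statistics
import time


def toy_recommend(user_id):
    """Hypothetical stand-in for the prototype's recommendation function."""
    return [f"item-{(user_id + k) % 10}" for k in range(3)]


def measure_latency(recommend, user_ids, repeats=3):
    """Call `recommend` for every user and report simple latency figures."""
    timings = []
    for _ in range(repeats):
        for user_id in user_ids:
            start = time.perf_counter()
            recommend(user_id)
            timings.append(time.perf_counter() - start)
    return {
        "calls": len(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


report = measure_latency(toy_recommend, range(5), repeats=2)
print(report)
```

Running the same measurement against a realistic number of users and the real database, rather than a dump, is the kind of check a data engineer would likely ask for before the production phase.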
You should have heard how we communicated about problems. It took us a while to explain to each other how the system works. The data science team could not grasp how data was ingested into the system, what should be cached, and what hurt our performance, while the data engineers asked a bunch of times how the whole pipeline for calculating recommendations worked. We spoke different languages. Where the data science team used the word “bin”, the data engineering team used the word “bucket”, and they were describing basically the same thing. We paid the price of not working closely together previously and not sharing knowledge enough. Lesson 3: do more internal knowledge sharing sessions and align the vocabulary and processes.
The other thing we realized in this phase was that we were building a complex system and had taken that complexity for granted. You must do a good job on automation and monitoring so that you can maintain the system successfully. When working on data science systems, you need to monitor your tools and infrastructure, but you also need to monitor how your algorithm works in production (click rate, goals achieved, revenue generated). These are the things we now know and tell clients upfront, but which we did not know back then. Lesson 4: do not underestimate the complexity of the system, and speak upfront with the client about all the details which are needed for a production ready system.
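To make the algorithm-level monitoring concrete, here is a minimal sketch of how click rate, goals achieved and revenue generated could be computed from a stream of tracking events. The event format (dictionaries with a `type` and an optional `revenue` field) and the function name `summarize_events` are assumptions for illustration, not the format used in our system.

```python
def summarize_events(events):
    """Aggregate recommendation tracking events into business metrics.

    Each event is assumed to be a dict with a "type" key that is one of
    "impression", "click" or "goal"; "goal" events may carry "revenue".
    """
    impressions = sum(1 for e in events if e["type"] == "impression")
    clicks = sum(1 for e in events if e["type"] == "click")
    goals = [e for e in events if e["type"] == "goal"]
    revenue = sum(e.get("revenue", 0.0) for e in goals)
    return {
        "impressions": impressions,
        "clicks": clicks,
        # Guard against division by zero when nothing has been shown yet.
        "click_rate": clicks / impressions if impressions else 0.0,
        "goals_achieved": len(goals),
        "revenue_generated": revenue,
    }


events = [
    {"type": "impression"},
    {"type": "impression"},
    {"type": "impression"},
    {"type": "impression"},
    {"type": "click"},
    {"type": "click"},
    {"type": "goal", "revenue": 30.0},
]
metrics = summarize_events(events)
print(metrics)
```

In a production setup these numbers would typically be pushed to whatever monitoring stack already watches the tools and infrastructure, so that a drop in click rate is noticed as quickly as a failing server.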
The first phase is really important; make sure to do the best job here to understand the client's needs and their business. In this phase, communication is the key ingredient. Our face to face workshop helps us build a strong relationship with the customer, understand the problem and define an implementation plan. All upcoming phases depend on the success of this phase.
Today we know much better what types of engagements there are, what the steps for each phase are and which team members you need in each phase to finish the project successfully.
We have divided our projects into three groups: business insights as a service, prototyping and production. For each of them, it is important to have people from different teams working together to achieve a successful delivery. Compared to the initial idea, one more role has appeared in our technical team - a data wrangler: someone who does not have to be a data scientist with rich knowledge about algorithms and approaches, but someone with strong business knowledge, a glorified business analyst armed with Jupyter and Python knowledge. This person plays an important role in the initial phase, works with clients in the business domain and extracts key findings from data. Their sole task in the initial phase is to come up with the best possible idea for improving the business of our client.
We have improved our internal communication. We now know that a bucket for engineers is a bin for scientists, we are learning the vocabulary of each team, and we are more efficient. What makes us better is the fact that we need to explain our solution to team members who do not speak our language, and in the end this means we will do a better job explaining what we do to our client. It is easy to explain an approach you took to a fellow data scientist, but not so much when speaking with a business stakeholder interested in the profit that solution will bring. You need to adjust your language, so we started practicing that internally. Our diagrams also look much better now: since people from different teams need to understand what is going on, the visualizations we make have improved a lot.
As I said at the beginning, this is a field which is changing a lot nowadays, so I expect this is not the end of our learning journey. What is certain is that we are a much better company now than we were two years ago thanks to projects done with mixed teams.