SmartCat at ODSC Boston next week!
Abstract: This tutorial showcases a joint data engineering and data science effort to build a real-time anomaly detection system at scale. On the data engineering side, it addresses challenges such as the setup and configuration of Kafka, Spark Streaming and a Spark cluster, as well as downstream data storage, visualization and alerting. On the data science side, the tutorial covers the difficulties of unsupervised data modeling in both batch and online fashion and the implementation challenges of scaling, and touches on ensemble techniques for improving accuracy and selecting detectors.
Anomaly detection is predominantly done in an unsupervised fashion, since labeled data is rarely available or classes are highly imbalanced. To make the problem harder, important anomalies turn out to be contextual or collective rather than just point anomalies in a univariate time series. Session attendees will hear about the use of robust PCA, LSTMs, autoencoders and other methods implemented to serve as anomaly detectors. The methods are implemented with the Python library Keras, but in order to scale and process data coming from Kafka in real time, they are adapted to run on a Spark cluster using Spark Streaming. At the end of the processing pipeline is a Bayesian ensemble learning model. Attendees will be able to see it in action and understand how it helps the system select the best detectors dynamically.
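To give a feel for how a Bayesian layer might pick detectors dynamically, here is a minimal sketch. It is not the model presented in the tutorial, whose details the abstract does not give. It assumes a simple Beta-Bernoulli scheme in which each detector carries a belief about how often its alerts are confirmed, and that belief is updated from feedback. The detector names, the feedback stream, and the helpers `DetectorBelief`, `ensemble_score` and `best_detector` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectorBelief:
    """Beta(alpha, beta) belief about how often a detector's alerts are correct."""
    alpha: float = 1.0  # prior pseudo-count of confirmed alerts (uniform prior)
    beta: float = 1.0   # prior pseudo-count of rejected alerts

    def update(self, correct: bool) -> None:
        # Conjugate Beta-Bernoulli update: add one count to the matching side.
        if correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean of the detector's hit rate.
        return self.alpha / (self.alpha + self.beta)


def ensemble_score(beliefs, scores):
    """Average the detectors' anomaly scores, weighted by posterior mean."""
    total = sum(beliefs[name].mean for name in scores)
    return sum(beliefs[name].mean * scores[name] for name in scores) / total


def best_detector(beliefs):
    """Return the name of the detector with the highest posterior mean."""
    return max(beliefs, key=lambda name: beliefs[name].mean)


# Hypothetical detector names and feedback, for illustration only.
beliefs = {name: DetectorBelief() for name in ("robust_pca", "lstm", "autoencoder")}
feedback = [
    ("robust_pca", True), ("lstm", False), ("autoencoder", True),
    ("autoencoder", True), ("lstm", True), ("robust_pca", False),
]
for name, correct in feedback:
    beliefs[name].update(correct)

selected = best_detector(beliefs)
combined = ensemble_score(
    beliefs, {"robust_pca": 0.2, "lstm": 0.9, "autoencoder": 0.7}
)
print(selected, round(combined, 3))
```

In this toy run the autoencoder ends up with the highest posterior mean because both of its alerts were confirmed, so it is selected and its score carries the most weight in the combined result. A production system would also need a source of feedback, which is the hard part when labels are rare.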
Let us know if you want to meet up at the conference or after!