SmartCat at Berlin Buzzwords 2018
One of our favorite conferences, Berlin Buzzwords, is coming up, and we are excited to be there again this year!
Our co-founder and CTO Matija Gobec will talk about optimizing Apache Spark applications.
When DataFrames fail, resort to mapPartitions
June 11, 2018, 17:20 to 18:00
Session abstract: DataFrame is an awesome interface for data manipulation in Spark, but when the complexity grows beyond the capabilities of Spark itself, you need to resort to "violence". In this talk I will explain one of the projects that became too complex to be executed using the DataFrame API and had to be rewritten as custom code applied using the mapPartitions function. We will cover some tips and tricks for reducing lineage complexity, share our process for analyzing pain points, and get into the details of mapPartitions functionality to leverage Spark's distributed processing capabilities and reliability while executing custom code.