SmartCat at Berlin Buzzwords 2018


One of our favorite conferences, Berlin Buzzwords, is coming up, and we are excited to be there again this year!

Our co-founder and CTO Matija Gobec will talk about optimizing Apache Spark applications.

When DataFrames fail, resort to mapPartitions

June 11, 2018 - 17:20 to 18:00

Moon Lounge

Session abstract: DataFrame is an awesome interface for data manipulation in Spark, but when the complexity grows beyond the capabilities of Spark itself, you need to resort to "violence". In this talk I will explain one of our projects that became too complex to execute with the DataFrame API and had to be rewritten as custom code applied with the mapPartitions function. We will cover some tips and tricks for reducing lineage complexity, share our process for analyzing pain points, and go into the details of how mapPartitions lets you keep Spark's distributed processing capabilities and reliability while executing custom code.




Sanja Hajdukovic

Finances & Business Administration

Organized, calculated, and a great communicator with a decade of experience in the banking industry. Extensive experience in customer care, with a personal interest in human psychology and relations.