Welcome to Percona Live Online 2021
Online Open Source Database Conference
Thursday, May 13 • 13:30 - 14:30
How Adobe Does Millions of Records Per Second Using Apache Spark

Adobe's Unified Profile System is the heart of its Experience Platform. It ingests TBs of data a day and is PBs in size. As part of this massive growth, we have faced multiple challenges in our Apache Spark deployment, which is used everywhere from ingestion to processing. We want to share some of the hard-earned lessons we learned as we reached this scale.

Repeated Queries Optimization - or the Art of How I Learned to Cache My Physical Plans. SQL interfaces expose prepared statements; how do we apply the same analogy to batch processing?
Know thy Join - Joins and group-bys are unavoidable when you don't have much control over the data model, but one must know exactly what happens underneath, given the deadly shuffle one might encounter.
Structured Streaming - Know thy Lag - While consuming from a Kafka topic that sees sporadic loads, it's very important to monitor the consumer lag. It also makes you respect what a beast backpressure is.
Skew! Phew! - Skewed data causes many uncertainties, especially at runtime. Configs that applied on day zero no longer apply on day 100. The code must be made resilient to skewed datasets.
Sample Sample Sample - Sometimes the best way to approach a large problem is to eat a small part of it first.
Redis - Sometimes the best tool for the job is actually outside your JVM. Pipelining + Redis is a powerful combination to supercharge your data pipeline.
We will present our war stories and lessons for each of the above, and we hope they will benefit the broader community.
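The abstract does not say how Adobe made its jobs resilient to skew. One widely used technique is key salting: append a random "salt" to each key so that a single hot key is split across several shuffle groups, aggregate on (key, salt), then strip the salt and merge the much smaller partial results. In Spark, this corresponds to grouping by (key, salt) and then grouping the partial results by key. The sketch below uses plain Python instead of Spark so it runs without a cluster; the `salted_count` helper, the input data, and the choice of 4 salt buckets are all hypothetical.

```python
import random
from collections import Counter


def salted_count(records, salt_buckets=4, seed=0):
    """Count records per key in two phases, using a random salt to split hot keys.

    Returns (totals, partial): `partial` holds the phase-1 counts keyed by
    (key, salt), i.e. the groups a salted job would shuffle on; `totals`
    holds the final per-key counts after the salt is stripped.
    """
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    # Phase 1: group by (key, salt). A hot key is now spread over up to
    # `salt_buckets` separate groups instead of landing in a single one.
    partial = Counter((key, rng.randrange(salt_buckets)) for key in records)
    # Phase 2: drop the salt and merge; each key has at most
    # `salt_buckets` partial counts left to combine.
    totals = Counter()
    for (key, _salt), n in partial.items():
        totals[key] += n
    return totals, partial


# Hypothetical skewed input: one "hot" profile key dominates the data.
records = ["hot"] * 9000 + [f"user{i}" for i in range(1000)]

totals, partial = salted_count(records)
unsalted_max = max(Counter(records).values())  # largest group without salting
salted_max = max(partial.values())             # largest group after salting

print(totals["hot"], unsalted_max, salted_max)
```

The trade-off is an extra aggregation stage in exchange for bounding the largest group, which is what decides how long the slowest task runs. For a skewed join, the same idea applies by salting the skewed side and replicating each row of the other side once per salt value.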

Yeshwanth Vijayakumar

Sr. Engineering Manager/Architect, Adobe Systems Inc
I am a Sr. Engineering Manager/Architect on the Unified Profile Team in the Adobe Experience Platform; it’s a PB-scale store with a strong focus on millisecond latencies and analytical abilities, and easily one of Adobe’s most challenging SaaS projects in terms of scale. I am actively...

Thursday May 13, 2021 13:30 - 14:30 EDT
Room #3