Natarajan Chidhambharam

MySQL DBA, Edmodo

Natarajan Chidhambharam is an infrastructure engineer at Edmodo with a focus on database scalability and reliability. Relational databases and db infrastructure solutions for large-scale websites are his main working interests. Edmodo successfully handled 25x db traffic growth during the... Read More →

Miklos Szel

Senior MySQL Architect, Edmodo

Miklos Mukka Szel is a Senior DB Architect at Edmodo. With more than 20 years’ experience in system and network administration, he has also worked for Walt Disney International as its main International MySQL DBA. Miklos specializes in MySQL-based high availability solutions, performance... Read More →

Wednesday May 12, 2021 07:00 - 07:30 EDT
Room #3

MySQL, All Databases

07:00 EDT

Unified Point in Time Recovery in the Cloud [60min]

Meet WAL-G, a disaster recovery tool for PostgreSQL, MySQL, MS SQL, MongoDB and other databases. WAL-G was designed for cloud deployments of PostgreSQL HA clusters, but its approach scaled well not only to petabytes of data and thousands of PG instances, but to various database engines as well.
In this talk, we will present the architecture of point-in-time recovery with WAL-G, along with the common points and differences of many OLTP databases with respect to online backup and change data capture.

WAL-G is free and open source, led by a community of developers.

Speakers

Andrey Borodin

Team lead of opensource RDBMS development, Yandex.Cloud

Software engineer, computer scientist, developer at Yandex, Ph.D., associate professor at Ural Federal University, co-founder of Octonica company. Researching data indexing since 2008. Teaching at Yandex School for Data Analysis and UrFU. Interested in backup technologies and data... Read More →

Senior Software Engineer, Huawei Technologies

8+ years of experience in open source development and management, having contributed to projects like OpenStack, Libvirt, Hadoop, etc. Has worked on openEuler since 2019 and currently works as community manager for the openEuler community.

Xinyong Xiang

openGauss Community Manager, Huawei Technologies

openGauss Community Manager, openGauss Maintainer. Has been engaged in open source community related work, including OpenStack, Kubernetes, SODA, KubeEdge, openEuler, openGauss, etc., since 2015.

Bo Zhao

Senior Software Engineer, Huawei Technologies

Bo Zhao has been actively working in the open source community for over 6 years. Currently, he is actively introducing and expanding the general Arm ecosystem in the upstream communities in the DB area.

Wednesday May 12, 2021 08:00 - 08:30 EDT
Room #4

Other SQL, All Databases

08:00 EDT

Dr. XtraBackup or: How I Learned to Stop Worrying and Love Backups I

This is the first session of two.
In this session we will discuss the fundamentals of backups and how to perform basic operations with Percona XtraBackup 8.

Intro
- Why your backup strategy is (probably) wrong?
- The Schrödinger Backup.

Percona XtraBackup 8
- The Swiss Army Knife of MySQL Backups.
- The Inconsistent Backup Made Consistent.
- Backup, Prepare and Restore.
- Incremental Backups.
- Compression.

Speakers

Pep Pla

Consultant, Percona

Pep has been working with databases all his life. Born in a small village by the Mediterranean, he currently lives in Barcelona. He loves tech, traveling, good food, music, and all things NASA. He hates talking about himself in the third person and has a particular sense of humor... Read More →

Wednesday May 12, 2021 08:00 - 09:00 EDT
Room #5

Management & Backup, Development, Tools, or Utilities

08:00 EDT

Optimizing and Troubleshooting MySQL with PMM

Two of the most critical and challenging tasks for MySQL DBAs are optimizing MySQL performance and troubleshooting MySQL problems. The databases powering your applications need to be able to handle heavy loads while remaining responsive and stable, so that you can deliver an excellent user experience. DBAs are expected to have plans in place to solve these issues. In this presentation, we will briefly talk about best practices for troubleshooting and optimization, and spend time showing how Percona Monitoring and Management (PMM) approaches typical performance optimization and troubleshooting tasks. We will look into how to spot a bad query that needs an index and inefficient queries that may need to be reworked, as well as how to spot when a system (not sized correctly to manage its current load) can become saturated.

Speakers

Peter Zaitsev

CEO & Co-founder, Percona

Peter Zaitsev is CEO and co-founder of Percona. As one of the foremost experts on Open Source strategy and database optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies... Read More →

Wednesday May 12, 2021 08:00 - 09:00 EDT
Room #6

Management & Backup, Tools

08:00 EDT

What is OpenSearch?

You may have heard of OpenSearch, but what is it exactly and how did it come about? In this session, you’ll learn about the components of OpenSearch, what they do, and what problems they can solve. No prerequisites for this session, but it prepares you for other OpenSearch sessions.

Speakers

Kyle Davis

Developer Advocate, Valkey

Kyle is the Senior Developer Advocate on the Valkey project. He has a long history with open source software development; he was a founding contributor to the OpenSearch project and most recently worked to build a community around Bottlerocket OS. When not working, Kyle enjoys 3D... Read More →

Wednesday May 12, 2021 08:00 - 09:00 EDT
Room #8

OpenSearch Community Track

08:30 EDT

Practical Database Automation with Ansible

Automation has been a marketing buzzword in the industry for a long time.

But that doesn't make it irrelevant.

One area that needs attention is how automation affects the database.

Should we automate databases? And if so, should we automate all the things? Where can I start? What are the dangers of automation?

This presentation will explore those questions and provide a good starting place for anyone who wants to get going with automation.

Specifically, we will look at Ansible, a popular automation tool. We will get to know Ansible concepts by using database-centric examples.

The examples provided will be based on MySQL, but the concepts can be used in any database environment.

By the end of this presentation, everyone should be ready to automate their databases responsibly.

Speakers

Derek Downey

Founder/Trainer, DistributedDBA

As a technologist excited about open source database systems and the businesses that they power, Derek is enthusiastic about efficiency through automation, Operational Visibility, and the adoption of Cloud Technologies. Derek has helped many customers implement automation for... Read More →

Wednesday May 12, 2021 08:30 - 09:00 EDT
Room #4

Management & Backup, Development, Tools, or Utilities

08:30 EDT

MongoDB surviving after unclean shutdowns

Abstract: How MongoDB recovers from an unclean shutdown, explaining how the recovery process works internally with the WiredTiger journal and how data is protected during the whole process. It could be internal stuff, but I think it's wonderful how MongoDB and WiredTiger implement the Write Ahead Log, and I love to teach it in a simple way.

Speakers

Alexandre Araujo

Senior Database Engineer, DBAcorp Brazil

Specialist Database Administrator with 20 years of experience in Brazil, working on projects for the main Brazilian FinTech and stockbroker companies in the Brazilian financial market.

Wednesday May 12, 2021 08:30 - 09:00 EDT
Room #2

MongoDB, All Databases

08:30 EDT

Creating MySQL User-Defined Functions in C++ Has Never Been Easier

In this session I will show how to use **C++ UDF wrappers** from Percona Server 8.0.22+ to add new custom functionality to MySQL.
Forget about *func_init()* / *func()* / *func_deinit()* functions, individually defined context structures for passing data between them, manual memory allocations and ugly casts from *void* * to extract function parameter values - now you have nice **C++14** wrappers to do all the dirty work behind the scenes with minimal overhead.

Speakers

Yura Sorokin

Principal Software Engineer, Percona

Yura is a Principal Software Engineer at Percona, mostly working on Percona Server Core. He is the primary developer who implemented "Compressed Columns with Dictionaries" and "SEQUENCE_TABLE()". Before joining in July 2015 he was leading a cloud file service backend dev team which... Read More →

Wednesday May 12, 2021 08:30 - 09:00 EDT
Room #1

MySQL, All Databases

08:30 EDT

ClickHouse 2021: New Features and Roadmap

In 2021 the ClickHouse community is shipping the features that you probably always dreamed of. We are eliminating previously known limitations of ClickHouse. I will tell you and show demos about: replication without ZooKeeper; semistructured data; processing of frequent small INSERTs; support for transactions; window functions and projections and more and more... We're really excited about these features and hope they will make ClickHouse even better for your analytic applications.

Speakers

Alexey Milovidov

Lead ClickHouse Engineer, Yandex

Alexey Milovidov was the original designer of ClickHouse, starting from its inception in 2008. He is an expert on high-performance C++, analytic applications, and SQL databases. Alexey is the lead committer of the ClickHouse open source project on GitHub. He leads the ClickHouse development... Read More →

Wednesday May 12, 2021 08:30 - 09:30 EDT
Room #7

Altinity Community Track

09:00 EDT

The 10 Open Source Database Trends That Are Transforming Your Database Infrastructure Forever

Open source software is the de facto standard for many new applications; this is especially true in the database industry. Currently, MySQL, PostgreSQL, MariaDB, MongoDB, Elastic, and others have shown up in every industry and organization in the world in some form or another. People are no longer choosing a single database for the company; they are letting developers and architects choose the best database for the job.

This has led to an increase in the number of technologies operations teams have to support. Couple that increase in technologies with a growing microservice (or cloud-native) development paradigm where every service has its own database and where all the data is valuable.

Companies are now faced with dozens of technologies, hundreds or even thousands of individual database instances, and petabytes of data. The management of the complexity of such an environment is changing the way we look at systems and operations.

Let’s talk about the trends and tell you what you need to know about how to manage the new multi-verse of data.

Speakers

Matt Yonkovit

Head of Open Source Strategy, Percona

Matt is currently working as the Head of Open Source Strategy (HOSS) for Percona, a leader in open source database software and services. He has over 15 years of experience in the open source industry including over 10 years of executive-level experience leading open source teams... Read More →

Wednesday May 12, 2021 09:00 - 09:30 EDT
Room #1

Hybrid or Mixed Deployments, All Databases

09:00 EDT

A Large Scale MongoDB Migration From MMAPv1 to WiredTiger

The MMAPv1 storage engine was deprecated in MongoDB 4.0 and removed in MongoDB 4.2, which makes WiredTiger the only available storage engine for MongoDB from version 4.2 onwards. Even though WiredTiger became available in MongoDB version 3.0, most users delayed the migration between the storage engines for various reasons, like bugs or the uncertainty of being an early adopter. As a matter of fact, some users delayed the unavoidable storage engine migration until MongoDB 4.0.

In this presentation, we are going to describe a large-scale migration of an MMAPv1 environment to WiredTiger. For the scope of this presentation, we define a large-scale environment as an installation with more than 100 terabytes of data, more than 1000 shards involved, and more than 100 different workloads. We are going to focus mainly on the preparation steps, as we think it's the most important piece of this type of migration. At the same time, we are going to analyze the actual migration steps, the rollback procedure, and most importantly the lessons learned.

Speakers

Antonios Giannopoulos

Senior Database Administrator, Rackspace Technology

I am working as Senior NoSQL Database Administrator at Rackspace supporting thousands of MongoDB installations over the past 7 years. I have 18 years experience in databases and system engineering. I really enjoy challenges in sharding and schema design and love migrations from Relational... Read More →

Wednesday May 12, 2021 09:00 - 09:30 EDT
Room #2

MongoDB, All Databases

09:00 EDT

Scaling MySQL @LinkedIn with Vitess

MySQL serves as the datastore for many of the important internal tools at LinkedIn. A typical MySQL cluster at LinkedIn has 1 primary and 2 replicas for read-scaling and High Availability. To scale the reads for all these tools, more replicas are added to the cluster. But what about write-scaling, where writes still go to a single primary?

We started looking for an answer to this question about a year and a half ago when one of the tools ramped up quickly and was struggling due to writes. When we decided on sharding as the solution to this, there was a choice between writing the sharding logic in the application itself or choosing Vitess.

Vitess stood out for us. Although we had to tweak the schema design, there were minimal changes to the application code as it supports almost all the SQL queries and connection pooling.

This talk will focus on our journey with Vitess. We will take the attendees through the ‘Why’ of our journey: why we chose Vitess and not any other sharding method, how we migrated the platform in real time, and what tremendous metrics we achieved after the successful migration.

The talk will consist of:
- Introduction
- Why Vitess?
- Brief Introduction to Vitess
- Challenges while moving to Vitess
- What it looks like at the Infra-level
- Key Achievements

Along with talking about our journey with Vitess, we will touch upon what is next for Vitess@LinkedIn.

Key Takeaways
- Vitess as a sharding solution to MySQL
- Insights from our learnings

Speakers

Apoorv Purohit

SRE, LinkedIn

I started my career as an application developer in 2016 handling C++ telephony module and report generation from MySQL. With my growing interest in databases, I switched to become a full time database engineer in 2017 at OneDirect. I was a part of the team that handled the entire... Read More →

Karthik Appigatla

Staff SRE, LinkedIn

Karthik Appigatla has been working on various large-scale data stores for a decade, primarily focused on MySQL. He has been working for LinkedIn for the last 5 years. Prior to LinkedIn, he worked for Yahoo, Pythian and Percona where he was responsible for helping clients... Read More →

Wednesday May 12, 2021 09:00 - 09:30 EDT
Room #6

MySQL, All Databases

09:00 EDT

Open Source Database Architectures: Shifting From Capture-First to Query-First

It’s like picking a flavor of ice cream. You have your go-to databases, your favorites, the ones you know and love. But are they always the right choice for every task? And the reality is you’re going to be working across five or six anyway. So how to choose? Instead of fitting the workload to the database, you fit the database to the workload. Is your workload mostly reads? Writes? Or a mixture? And what kinds of reads: linear scans or random reads? Do you need a database for transactions or for durability? We’ll walk through the many open source options, project by project, and map workloads to the right database, a choice that can make or break the success of your next project.

Speakers

Rob Dickinson

CTO and Co-Founder, Resurface Labs

Charly Batista

PostgreSQL Tech Lead, Percona

A Brazilian living in China... Charly is passionate about new cultures, their languages and traditions. Charly has been working with database and development for more than 15 years and has participated in small and large projects in Brazil, the US, China, and other countries. Currently... Read More →

Wednesday May 12, 2021 09:00 - 10:00 EDT
Room #4

PostgreSQL, All Databases

09:30 EDT

MongoDB : To Shard or Not To Shard?

Sharding is one of the most difficult aspects of MongoDB to get right. Shard too early and your costs go up with little gain, shard too late and you and your customers will be feeling the pain. This talk aims to help inform you on considerations about when to shard, when sharding may not be the right thing for your database, and things to consider when you are going to move forward with sharding your MongoDB database.

Speakers

Mike Grayson

MongoDB Database Engineer, Percona

Mike Grayson is a MongoDB Database Engineer at Percona, the unbiased open source database experts. Mike has been involved in many aspects of the MongoDB community since he started using the database in 2014. Heading the Western NY MongoDB User Group, blogging and being involved in... Read More →

Wednesday May 12, 2021 09:30 - 10:00 EDT
Room #2

MongoDB, All Databases

09:30 EDT

The Lost Art of Database Design

The scalability of your application and your database is only as good as the database design you put behind it. Designing your schema, the database structures, and planning for the future need to happen early and need to evolve over time. In today's rapid-paced development cycles, the database design is often overlooked or even dismissed entirely. Databases and marketing teams tout "Schemaless Designs", database as a service, and new tech that makes caring about databases a thing of the past. I will explain why database design is as important as ever and I will give you the 8 things you need to design on every application regardless of which database or service you use.

Speakers

Matt Yonkovit

Head of Open Source Strategy, Percona

Wednesday May 12, 2021 09:30 - 10:00 EDT
Room #6

Other, Development, Tools, or Utilities

09:30 EDT

ClickHouse Developer Tutorial, Part 1 - Intro to ClickHouse

Heard about ClickHouse and been itching to try it out? This talk is for you! It's a tutorial to get new ClickHouse developers up and running quickly. We'll begin by summarizing practical differences between ClickHouse and row stores like MySQL or PostgreSQL. Next, we'll show how to install ClickHouse and connect to it with popular client tools. We'll then teach basics of ClickHouse SQL, focusing on commands to build reports and dashboards. Continue from this talk to the Tutorial Lab for some real ClickHouse exercises and Q&A with our experts.

Speakers

Robert Hodges

CEO, Altinity

Robert Hodges is CEO of Altinity, an enterprise provider for ClickHouse data warehouse. He's also a database geek with experience on at least 20 DBMS types. Robert caught the Kubernetes bug at VMware in 2018.

Alexander Zaitsev

CTO, Altinity

Alexander is CTO and a founder of Altinity, which operates and supports ClickHouse for enterprises. After a career building large analytic apps on Vertica and ClickHouse, he turned to making ClickHouse itself work better. Alexander has helped over one hundred companies... Read More →

Wednesday May 12, 2021 09:30 - 10:30 EDT
Room #7

Altinity Community Track

09:30 EDT

MySQL Performance for DevOps

MySQL performance can be improved by tuning queries, server options, and hardware. Traditionally, these were the responsibility of three different roles: Development, DBA, and System Administrators. Now DevOps handles them all. But there is a gap: knowledge gained by MySQL DBAs after years of focus on a single product is hard to gain when you focus on more than one. This is why I am doing this session. I will show a minimal, but the most effective, set of options that will improve MySQL performance. For illustrations, I will use real user stories from my Support experience, and the Percona Kubernetes operator for PXC.

Speakers

Sveta Smirnova

Principal Support Engineering Coordinator, Percona

Sveta Smirnova is a MySQL Support Engineer with over 10 years of experience. She currently works at Percona. Her main professional interests are problem-solving, working with tricky issues, bugs, finding patterns that can solve typical issues quicker, teaching others how to deal with... Read More →

Wednesday May 12, 2021 09:30 - 10:30 EDT
Room #1

MySQL, All Databases

10:00 EDT

Backup, DR, and Migration of Data-Rich Applications in Kubernetes

Managing application state in Kubernetes requires handling not only persistent data requirements of the application, but also associated Kubernetes objects and declarative configuration that is specified by application developers. This expands the boundaries of data protection in the traditional application sense. In this session, we will discuss and demo how NetApp Astra simplifies and provides a consistent end-to-end application data lifecycle management for modern applications running on Kubernetes clusters.

Speakers

Securing OpenSearch

OpenSearch has a strong set of security features but not everyone takes advantage of them. This session will provide an overview of the features as well as show ways to apply these features for real world problems. The session will include demos of data masking, authentication, and authorization, as well as index, document, and field level security.

Speakers

Kyle Davis

Developer Advocate, Valkey

Wednesday May 12, 2021 10:00 - 11:00 EDT
Room #8

OpenSearch Community Track

10:00 EDT

Power Use of Indexes in PostgreSQL - A User Perspective.

There have been many presentations about the different indexes in PostgreSQL (B-Tree, HASH, GIN, GiST, etc.), especially from the PostgreSQL architecture perspective.

But these talks always lacked details from the user perspective on the selection of indexes.

It is common to see architects and developers fail to select the right type of index and the right way to use it. Just an overview of all types of indexes also won't help much in decision-making. In this talk, I will also cover the following points.

1. Index when partitioning is not an option.
2. Inverted Indexes and their usefulness in the real world.
3. Tips and techniques for efficient index usage.
4. How important is Index usage monitoring and how to do that.

This talk is geared more towards proper examples and demonstrations. This presentation with demonstrations is expected to drive users to the right selection of indexes and better usage.

Speakers

Sergey Kuzmichev

Support Engineer, Percona

Sergey is a support engineer in Percona. Interested in all things databases, he's currently working mainly with MySQL and PostgreSQL. He started his career working as an Oracle DBA, later moving to a DevOps engineer role supporting a Java-based trading platform running on PostgreSQL... Read More →

Jobin Augustine

PostgreSQL Escalation Specialist, Percona

Jobin Augustine is a PostgreSQL expert and Open Source advocate and has more than 19 years of working experience as consultant, architect, administrator, writer, and trainer in PostgreSQL, Oracle and other database technologies. He has always been an active participant in the Open... Read More →

Wednesday May 12, 2021 10:00 - 11:00 EDT
Room #3

PostgreSQL, All Databases

10:30 EDT

ClickHouse Developer Tutorial, Part 2 - Lab Exercises

This talk consists of live lab exercises for the ClickHouse Developer Tutorial, Part 1. Please attend that talk and then join us for fun using ClickHouse. We have some puzzles for you to try that will test your ClickHouse knowledge. You can run all queries straight from your web browser, so there's no preparation required. Join us for the fun!

Speakers

Robert Hodges

CEO, Altinity

Author of BangDB, Founder of IQLECT

Sachin has over 20 years of experience in building software products in database, ecommerce and distributed computing area. He has previously worked with Microsoft in the SQL org, developing key value store for devices. In Amazon he led the engineering team for sponsored link platform... Read More →

Wednesday May 12, 2021 11:00 - 11:30 EDT
Room #6

Other NoSQL, All Databases

11:00 EDT

A Tale of Two Communities: How Open Source, ClickHouse and Superset Help Visualize Your Data

Databases have data. Business intelligence tools visualize it. This talk walks through how we are building a polished integration between the ClickHouse database and Superset BI using 100% open source techniques. We'll introduce ClickHouse and Superset, then describe the connectivity problems facing us at the beginning. Next, we'll show how we worked at the community level using GitHub and Slack workspaces to solve them quickly. We'll end with a demo of the working result. This case study shows the power of open source communities to serve our shared developers.

Getting Your Mind Around OpenSearch Geospatial Data

Flat earthers need not attend! This session will go through the basics of geospatial data in general, then go deeper on how it works in (and inside of) OpenSearch. The session will explain how to get geospatial information into OpenSearch, then how to query and visualize the data.

Speakers

Staff Software Engineer @ Nutanix Era, Nutanix Inc.

Manish leads the open-source database team in the Nutanix Era product group, managing the implementations of Postgres, MySQL and MariaDB. Previous to joining Nutanix, Manish was a Senior MTS at Oracle, working on the Parallel Query framework in the Oracle database engine.

Mehboob Alam

Sr. Solutions Architect, Nutanix, Inc.

Mehboob is a long-time open-source advocate and evangelist in the Postgres community, co-organizer of various community meetups and the annual global Postgres US conference. At Nutanix, he guides the development and support of Postgres in the Era DBaaS platform and helps customers... Read More →

Wednesday May 12, 2021 11:30 - 12:00 EDT
Room #6

Other Cloud, Cloud Technologies

12:00 EDT

The Changing Face of Open Source Database Software Adoption. How the Market Changed in the Last 12 Months.

In this wide-ranging state of the market keynote, Peter will be discussing a range of recent themes and developments. These include: 1) The overall growth of open source and how this might have been impacted by Covid-19. 2) The role of the public cloud - is this good or bad for open source? 3) The top reasons companies choose open source software. 4) Whether licensing changes could change the direction of open source software. 5) Why your company should be actively contributing to open source.

Speakers

Director of Engineering, Kasten by Veeam

Tom graduated with an M.S.E from the University of Michigan in 2013. His first job was on the server team at Maginatics, cloud based file system company which was acquired by EMC late in 2014. After the acquisition, he joined Dropbox where he was focused on improving the efficiency... Read More →

Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #4

Kubernetes, Cloud Technologies

13:00 EDT

MariaDB 10.6 - What's New?

MariaDB 10.6 is coming with a lot of improvements, specifically centered around performance. Although the contents of this talk will depend on exactly what makes it into MariaDB 10.6 GA, the following topics will be covered.

Atomic DDL in MariaDB
Optimizer changes
InnoDB changes
Oracle compatibility changes and parser changes

Speakers

Vicențiu Ciorbaru

Team Lead, Senior Developer, MariaDB Foundation

Vicențiu works at the MariaDB Foundation as a Software Engineer and Team Lead. He focuses on optimizer development, but has also worked on other parts of the MariaDB Server. Vicențiu has been part of the MariaDB ecosystem since 2013, where he first contributed Roles to MariaDB. Over... Read More →

Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #9

MariaDB Community Track

13:00 EDT

Using Percona Audit Plugin in Daily Operation

Log audits have for some become very important for various reasons. Perhaps you need to provide documentation for who had access to your database, perhaps you need to investigate a data breach. Many other scenarios may apply, and for this purpose Percona provides a plugin for MySQL for providing just that!
In this session we'll have a look at how to enable the audit plugin, how to configure it, and - most importantly - collect the logs and import them to a ClickHouse database for storage and analysis. And we'll have a look at some of the possibilities this provides.

Speakers

Lenny Andersen

DBA, Norlys

I have been using MySQL since 2003 and have been a MySQL DBA since 2013 with a constant focus on security and performance.

Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #1

MySQL, All Databases

13:00 EDT

Something Went Wrong: Understanding Alerting and Anomaly Detection

Being informed of something going wrong after it has happened is never ideal. Setting up a system to monitor logs and metrics allows you to be proactive rather than reactive. This session will cover the differences between alerting and anomaly detection, how each works, and where they are best employed.

Speakers

Senior Support Engineer, Percona

Fernando joined Percona in early 2013 after 8 years working for a Canadian company specialized in Linux and Open Source technologies. As a member of Percona's Support team, Fernando works closely with customers helping them troubleshoot issues with MySQL, PostgreSQL, and MongoDB servers... Read More →

Jobin Augustine

PostgreSQL Escalation Specialist, Percona

Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #3

PostgreSQL, All Databases

13:00 EDT

Push-button deploy MongoDB with Ansible

Installing a large number of MongoDB servers can be quite challenging, especially for the newcomer.
In this talk, we will show you the details about the project of creating a push-button approach to deploy sharded MongoDB clusters and replica sets using Ansible.
We will start with a quick introduction to Ansible, and show you the most interesting parts of the actual playbook code, as well as the benefits of this approach.
You will leave this session with some ideas you can reuse in your particular environment.

Speakers

Ivan Groenewold

Architect, Percona

Ivan has spent 15+ years architecting and supporting mission-critical environments for top-of-the-line companies. He has vast experience running on-prem or in the cloud, using diverse database technologies like Oracle, MySQL and MongoDB. He is also a regular speaker & contributor to... Read More →

Kim Thomas

AppOps - OpenSource Database Team, FISERV

I am a DataBase Architect at FISERV, primarily responsible for Delivering Open Source Database solutions across the Enterprise. Knowledgeable in DBaaS, Relational, Columnar, Big Data, OLAP, OLTP, NoSQL, and DB Operator Technologies. Have been working with Various Database Technologies... Read More →

Wednesday May 12, 2021 13:00 - 14:00 EDT
Room #2

MongoDB

13:30 EDT

DuckDB: Embedded Analytics with Parallel/Vector/Columnar Performance

DuckDB is a project coming out of CWI in the Netherlands that combines vector, columnar, and parallel capabilities.
Highlights:

- Great performance for analytic queries
- Fast batch load
- Near-linear scaling
- Near-zero administration

DuckDB is a SQLite replacement for analytics:

                    | TRANSACTIONS            | ANALYTICS
--------------------+-------------------------+------------
EMBEDDED/SERVERLESS | SQLite                  | DuckDB
--------------------+-------------------------+------------
SERVER PROCESS      | MySQL/MariaDB, Postgres | Columnstore

In this session you will learn about:

- What is different about Embedded/Serverless data engines?
- Current performance and features of DuckDB
- Best practices using DuckDB

Speakers

Fabrízio de Royes Mello

PostgreSQL Developer, OnGres Inc

Currently help people and teams to take the full potential of relational databases, especially PostgreSQL, helping them to design the structure of the database (modeling), build physical architecture (database schema), programming (procedural languages), SQL (usage, tuning, best practices... Read More →

Alvaro Hernandez

Founder, OnGres

Álvaro is a passionate database and software developer. Founder of OnGres ("ON postGRES"), he has been dedicated to Postgres and R&D in databases for more than two decades. Álvaro is at heart an open source advocate and developer. He has created software like StackGres, a Platform... Read More →

Wednesday May 12, 2021 13:30 - 14:00 EDT
Room #6

PostgreSQL, All Databases

13:30 EDT

Collaboration in Open Source: A Q&A on Github, Jira, Zulip and Knowledgebase

This is an extended, interactive panel / Q&A version of the 20 min talk on Collaboration in Open Source. Whereas the 20 min talk sets expectations, this session in the MariaDB Community Room goes in depth and solicits input from the audience. How can MariaDB Foundation improve its level of interactivity and collaboration with its developer community?

Speakers

Kaj Arnö

CEO, MariaDB Foundation

Kaj Arnö is CEO of the MariaDB Foundation. He is a software industry generalist, having served as VP Professional Services, VP Engineering, CIO and VP Community Relations of MySQL AB prior to the acquisition by Sun Microsystems. At Sun, Kaj served as MySQL Ambassador to Sun and Sun... Read More →

Ian Gilfillan

Principal technical writer: documentation, MariaDB Foundation

Ian first came across MySQL in the 90s, upgrading from mSQL while developing South Africa’s first online grocery store, and teaching and developing internet programming courses. He was lead developer for South Africa’s largest media company from 2000, and wrote the book Mastering... Read More →

Robert Bindar

Server Developer, MariaDB Foundation

Robert started working for the MariaDB Foundation in 2018 as a server developer. His main focus is divided between server development and helping the community contribute faster and more efficiently to the MariaDB codebase. Robert is based mostly in Brasov, Romania.

Vicențiu Ciorbaru

Team Lead, Senior Developer, MariaDB Foundation

Anna Widenius

Chief of Staff, MariaDB Foundation

Anna Widenius is Chief of Staff at the MariaDB Foundation.

Wednesday May 12, 2021 13:30 - 14:30 EDT
Room #9

MariaDB Community Track

13:30 EDT

Tricks Of The Trade

TRICKS OF THE TRADE: A COLLECTION OF TECHNIQUES ADDRESSING COMMON ADMINISTRATION MISSTEPS AND MISTAKES

PostgreSQL is not only the most sophisticated open source database management system in the world, it's also among the most reliable and easiest to set up. But even under the best of circumstances, things can just plain go wrong because of a wrong assumption. The purpose of this talk is to review the most common missteps and mistakes in administering a Postgres data cluster and how to prevent them from escalating into production-level issues.

For the purposes of this presentation, we will not cover query tuning per se.

We'll first start with the most common issues and gradually review some of the more esoteric challenges a DBA can encounter.

Here's a breakdown of the topics that will be covered:

- Host Based Authentication Rules
  - rules that are never reached
  - METHOD mangling
    - appreciating "peer"
    - too much "trust"
    - the much maligned "reject"
    - about password hashing: password vs md5 vs scram-sha-256
  - SSL laxity i.e. host vs hostssl
- Over using the superuser
- SSL
  - Certificates
    - CA signed vs self-signed
    - life span: too long vs too short
  - About Ciphers: weak (performance) vs strong (security)
- Replication
- postgres logging
  - where to put it
  - too much vs too little
  - log rotation
- Over Allocation Of System Resources
  - swap vs noswap
  - Linux's OOM Process Killer
  - some runtime parameters of interest
    - max_connections
    - effective_cache_size
    - work_mem
    - maintenance_work_mem
- Good autovacuuming hygiene
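
The "rules that are never reached" item follows from how pg_hba.conf is evaluated: PostgreSQL uses the first record that matches a connection and does not consult later ones. Below is a minimal Python sketch of that first-match behavior. It assumes a deliberately simplified rule model: the `Rule` class and the `covers` and `find_shadowed` helpers are hypothetical names for illustration, not part of PostgreSQL. Matching is reduced to exact names plus `all`, and address matching is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Simplified pg_hba.conf record (hypothetical model, no address field)."""
    conn_type: str   # e.g. "local", "host", "hostssl"
    database: str    # a database name or "all"
    user: str        # a role name or "all"
    method: str      # e.g. "peer", "trust", "reject", "scram-sha-256"


def covers(earlier: Rule, later: Rule) -> bool:
    """True if every connection matching `later` also matches `earlier`."""
    return (
        earlier.conn_type == later.conn_type
        and earlier.database in ("all", later.database)
        and earlier.user in ("all", later.user)
    )


def find_shadowed(rules: list[Rule]) -> list[Rule]:
    """Return rules that can never be reached under first-match evaluation."""
    shadowed = []
    for i, rule in enumerate(rules):
        # A rule is unreachable if any rule listed before it already covers it.
        if any(covers(prev, rule) for prev in rules[:i]):
            shadowed.append(rule)
    return shadowed


rules = [
    Rule("host", "all", "all", "scram-sha-256"),
    Rule("host", "sales", "app", "reject"),   # covered by the broader rule above
    Rule("local", "all", "postgres", "peer"),
]

for rule in find_shadowed(rules):
    print("unreachable:", rule)
```

In this sketch the `reject` rule is reported because the broader `host all all` record precedes it; moving the narrower rule to the top makes both reachable. A real checker would also need to compare address ranges and account for `host` records matching SSL connections.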

Speakers

Robert Bernier

PostgreSQL Consultant, Percona

Robert's experience extends several decades. His first experience was playing hangman on a DECwriter shortly after man first landed on the moon. His foray into commercial applications was programming Fortran, via punchcards, on an IBM 360 which in those days had 4MB RAM. Over the... Read More →

Wednesday May 12, 2021 13:30 - 14:30 EDT
Room #4

PostgreSQL, All Databases

14:00 EDT

Change Data Capture (CDC) on Top of Statement-Based Replication (SBR)

Yes, that's right, here at Box, we were able to successfully create a Change Data Capture stream on top of Statement-based Replication. Although normally impossible, some quirks in our data access layer coupled with some unique usages of MySQL query comments, PGTID and Kafka have enabled us to successfully provide an at-least-once delivery event stream of changes in our sharded MySQL infrastructure. The sharded MySQL infrastructure lies at the heart of Box.com, made up of 100s of shards, 1000s of servers and billions of records. In this talk, I will take you through the implementation details as well as the challenges involved in building out our change stream.

Speakers

Venkat Morampudi

Sr. Software Engineer, Box, Inc.

Venkat Morampudi is a Sr. Software Engineer on the Database & Cache Infrastructure team at Box.

Wednesday May 12, 2021 14:00 - 14:30 EDT
Room #1

MySQL, All Databases

14:00 EDT

I’ve Got a Fever and the Only Prescription is Apache Druid

Digital transformation initiatives have unlocked large and fast-moving data sets including clickstreams, network telemetry, application monitoring and IoT devices. Analytics architectures have not kept pace, with most data still being run through existing “cold analytics” systems and tools designed for smaller and less time-sensitive workloads. “Hot analytics” denotes workloads where the responsiveness of the system is instantaneous and can support self-service data exploration, and where the data is extremely fresh, allowing for more informed decision-making.

The breadth of analytical systems in the world today demands a clear approach to selecting the right one for a given workload. In this talk, we’ll discuss a temperature-based way of thinking, where workloads get “hotter” as they become more interactive, more concurrent, and more likely to need up-to-the-second data.
Apache Druid is a modern cloud-native, stream-native, analytics database designed for workflows where fast queries and instant ingest are important. Druid excels at instant data visibility, ad-hoc queries, operational analytics, and handling high concurrency. It is a strong candidate for being the workhorse system for hot analytics.
In this session Rachel will discuss:
How to categorize your analytics workloads based on temperature.
The distinctive attributes of Apache Druid that recommend it for hot analytics where query speed and data freshness are paramount.

Speakers

Rachel Pedreschi

Vice President, Community, Imply

A “Big Data Geek-ette” and no stranger to the world of high-performance databases and data warehouses, Rachel has more than 20 years of business intelligence and data engineering experience, and is a Cassandra, Vertica, Informix and Redbrick certified DBA on top of her work with... Read More →

Wednesday May 12, 2021 14:00 - 15:00 EDT
Room #7

Altinity Community Track

14:00 EDT

Venmo's Aurora Upgrades With Open Source Tools

Venmo's Aurora database clusters are the centerpiece of its success, but they are not without their own operational challenges, such as upgrading from one major version to another. To unlock performance efficiency and lower operational costs, we had to rely on a number of Open Source tools to make a successful non-event upgrade possible.

- Percona Monitoring and Management to measure query performance.
- pt-upgrade to validate queries between versions.
- ProxySQL for a virtually no downtime switchover/rollback.
- In-house tools to bridge some gaps in testing.

During this talk, we will piece together these tools and the process we followed to not let Venmo users down.

Speakers

Ashwin Nellore

Manager, Software Developer 3, Venmo (Paypal Inc)

Kushal Shah

MTS 1, Database Engineer, Paypal/Venmo

Kushal is an MTS 1, Database Engineer at Venmo (Paypal) with a focus on database scalability and reliability. He has 7 years of experience working on relational as well as NoSQL data stores like MySQL, MongoDB, DocumentDB, DynamoDB. Prior to Venmo, he worked as a Database Engineer at... Read More →

Wednesday May 12, 2021 14:00 - 15:00 EDT
Room #2

Amazon, Cloud Technologies

14:00 EDT

The Top 5 Things You Should Know About Databases on Kubernetes vs VMs

Kubernetes is becoming the default infrastructure for deploying a variety of stateless and stateful services. So what are the pros and cons of bringing databases into a DevOps-style management strategy based on Kubernetes? Data systems are very performance sensitive and moving to new virtualization strategies is potentially hazardous. Moreover, Kubernetes doesn’t supply all the capabilities needed to run stateful data services -- you need to understand how to partition responsibilities between Kubernetes and the database itself. This means you can’t just stuff a database into a container and let it fly – YOLO! Instead, Kubernetes requires an entirely different approach to the curation of databases. VMware has deep expertise with virtual machines and Kubernetes. Our team also has committers and contributors to the PostgreSQL project. Our talk will focus on the different techniques of running databases in long-running VMs versus short-running containers. In particular, we’ll cover all of the lifecycle steps, not only the initial deployment experience, including:

How quickly can a database be provisioned?
What are the real differences in performance?
What is it like to scale up and scale down for each?
How does recovery compare when there's a failure?

This is an intermediate-level talk, so we'll skip the mechanics of deploying databases (helm charts, operators, etc). Bonus: lots of data and comparison analysis!

Speakers

Marco Nicosia

Product Line Manager - Tanzu Edge, VMware Tanzu

Marco Nicosia is a Product Manager for Tanzu Edge at VMware. He has been working with large-scale deployments since the 90s, starting at Inktomi, a Web search engine. He was an early technical leader in the PaaS industry while at Engine Yard, serving clients such as Groupon, New Relic... Read More →

Rachel Heaton

Software Engineer, Tanzu SQL, VMware

Rachel has worked at consultancies, startups, and larger companies in a variety of roles, like head of engineering, manager, and tech lead. She’s currently working at VMware on our Postgres products.

Adam Berlin

Software Engineer, Tanzu SQL, VMware

Adam Berlin is a jack-of-all-trades software engineer with experience coaching and consulting. He is currently a member of the VMware Tanzu SQL with Postgres for Kubernetes engineering team.

Wednesday May 12, 2021 14:00 - 15:00 EDT
Room #6

Kubernetes

14:00 EDT

Scaling Large Tables in PostgreSQL With Declarative Table Partitioning

When a table gets too large, performance and maintenance are heavily affected. Splitting the table into multiple partitions can help achieve the desired performance.

Table partitioning has been supported in PostgreSQL for many years as a design pattern built on table inheritance, which is complex to use correctly and didn't benefit from any parallelism. Since PostgreSQL version 10, there has been support for declarative table partitioning, with new features and improvements in later versions. Table partitioning is now much easier to use, and more use cases are covered. In this talk, we will review with concrete examples how you would benefit from table partitioning, how to use declarative partitioning, and what the implications of certain schema design decisions are.

Speakers

Boriss Mejias

Solution Architect, EDB

I'm a holistic system software engineer (aka Solution Architect), PostgreSQL consultant and trainer, free software user, and headbanger. I have been working with PostgreSQL since version 9.1. First, as part of my job related to other projects, and with full dedication since 2017... Read More →

Wednesday May 12, 2021 14:00 - 15:00 EDT
Room #3

PostgreSQL, All Databases

14:30 EDT

SQL Without the Database? Stream Instead of Storing

Why store when you can stream? Modern open source streaming platforms like Apache Kafka provide means of storing and processing event-driven data. When combined with streaming analytics projects like Apache Flink, many business applications and event-driven microservices may not even need a database storage layer. With the rise of streaming SQL engines, in particular FlinkSQL, developers with SQL and database skills can use these to build complex event processing and advanced analytics applications. There are performance advantages to being able to run queries before data lands in a database. However, full solutions often need long-term storage as well for regulatory purposes, forensic purposes or to train machine learning to work better in the stream. This talk will cover new approaches to data architecture and how they fit in with existing event and traditional database applications, and will provide practical examples of and approaches to operating systems built on streaming in the cloud, using open source.

Speakers

Simon Elliston Ball

Senior Product Manager for MSK, AWS

Simon has been working on streaming data analytics for many years, at Hortonworks, then Cloudera, and now with Amazon Web Services, where he is the Product Manager for Amazon Managed Streaming for Apache Kafka. He also managed contributions to Flink and Kafka for Cloudera, and is an... Read More →

Wednesday May 12, 2021 14:30 - 15:00 EDT
Room #5

Hybrid or Mixed Deployments, All Databases

14:30 EDT

Database Backup and Import With Kanister

Backup and migration of database systems running in Kubernetes can be complicated. Kanister, an open source tool from Kasten, simplifies the process. Kanister offers tools for exporting databases to and importing them from cloud storage, greatly simplifying backup and migration of databases running in clusters.

This talk will introduce Kanister and discuss basic use. Examples will be given for the migration of a Postgres database between clusters. The talk will also introduce use for disaster recovery and setup of development clusters.

Speakers

Aaron H Alpar

Member Technical Staff, Kasten

Aaron Alpar is a Member of Technical Staff at Kasten by Veeam. He has extensive background in reliable production systems. He has been working with Kubernetes since 2017, has been a presenter at Kubecon, and is an active contributor on Github. Talks: Presentation: KubeCon North America... Read More →

Wednesday May 12, 2021 14:30 - 15:00 EDT
Room #4

Kubernetes, Cloud Technologies

14:30 EDT

JSON Additions in MariaDB - Featuring JSON_TABLE

MariaDB has had JSON support for a while now. Since the initial release of that support in 10.2, MariaDB has tried to follow the SQL standard as closely as possible. One of the new additions coming to MariaDB 10.6 is support for JSON_TABLE. In this talk we will go through the details of this new feature, as well as use cases and interactions with JSON path. We will also compare MariaDB's implementation to other databases, so that you are aware of pitfalls if a migration is due. On the topic of migration, MariaDB has also introduced a data type plugin that understands MySQL's binary JSON format and converts it to MariaDB's text-based representation, without needing to do a full dump and restore.

Speakers

Vicențiu Ciorbaru

Team Lead, Senior Developer, MariaDB Foundation

Wednesday May 12, 2021 14:30 - 15:00 EDT
Room #9

MariaDB Community Track

14:30 EDT

Data Protection for Rapid Recovery at Scale

Some things can’t scale in the cloud. When you are trying to get all the performance out of your systems for SaaS and IaaS instances, RAID 0 seems like a good option. What do you do when you have 60 servers go down due to an SSD failure? In this session you will learn about new breakthrough data protection technology for SSDs that gives you better performance during a drive rebuild than you can get from RAID 0 and from a smaller footprint.

Speakers

Steve Fingerhut

President & CBO, Pliops

Steve has built multiple new technology businesses to over a billion in annual revenue. His experience includes SVP/GM Toshiba Memory America’s SSD Business Unit, VP Marketing SanDisk’s Enterprise SSD division, Co-founder and VP Marketing LSI’s Accelerated Solutions Division... Read More →

Wednesday May 12, 2021 14:30 - 15:00 EDT
Room #1

MySQL, All Databases

15:00 EDT

Fun and Games: Why We Picked ClickHouse To Drive Gaming Analytics at GiG

With such a saturated market for the iGaming industry, many technology providers are reviewing their architecture to provide a leaner, more sustainable platform. Complex architectures that involve heavy license and high maintenance costs were two of the rules to avoid when GiG kicked off R&D on what technology to choose at the heart of GiG Data. Tied with the necessity of good governance whilst empowering stakeholders needing real-time data, the list of databases became shorter and shorter.

With no vendor lock-in or costly licenses, ClickHouse came out the winner as the best candidate for any business looking to host an on-premises real-time database for analytics. Stephen and Matthew will elaborate on how ClickHouse became the database of choice.

Speakers

Stephen Borg

Director of Data, Gaming Innovation Group

Stephen has had a career in technology for online and retail gambling, having worked close to business for a number of B2C and B2B providers. Sound background in technology, delivering affiliate and gambling platforms using both .NET and Java frameworks. For the past 8 years Stephen... Read More →

Matthew Formosa

Enterprise Data Architect, Gaming Innovation Group

Matthew is an experienced Data Engineer having worked on several Big Data applications within various domains, including Gaming and FinTech. Being an AWS certified solutions architect, he is naturally highly comfortable working with AWS solutions, as well as Apache Spark, Apache Kafka... Read More →

Wednesday May 12, 2021 15:00 - 15:30 EDT
Room #7

Altinity Community Track

15:00 EDT

Running and Scaling MongoDB on Kubernetes

Percona is committed to delivering its software on various platforms and operating systems, including Kubernetes. The Percona Kubernetes Operator for Percona Server for MongoDB lets you deploy, manage, and scale MongoDB clusters on Kubernetes with ease. We are going to demonstrate how to do it and share some tips and tricks about managing MongoDB on Kubernetes.

Speakers

Sergey Pronin

Group Product Manager, Percona

Wednesday May 12, 2021 15:00 - 15:30 EDT
Room #3

Kubernetes, Cloud Technologies

15:00 EDT

Virtual Work and Leadership in the Time of Pandemics

MariaDB Foundation was a virtual organization long before the pandemic struck. That gave us a head start in finding ways to adapt team leadership and management to the new times. Team leadership and management are different in a virtual organization, and in this session, we will talk about best practices within development teams. Meeting practices, chat tools, emails, and Zoom meetings are all affected by developers now spending their private lives, not just their business lives, in isolation. How do we create humane working conditions for developers and DBAs?

Speakers

Anna Widenius

Chief of Staff, MariaDB Foundation

Anna Widenius is Chief of Staff at the MariaDB Foundation.

Wednesday May 12, 2021 15:00 - 15:30 EDT
Room #9

MariaDB Community Track

15:00 EDT

Why We Chose Trino. Choosing, Using, and Extending Trino (fka PrestoSQL) For a Primary Datastore

There are a lot of capture-first pipelines out there, which are very good at squirreling data away, but are relatively slow or cumbersome for queries. For example, Kafka and Pulsar are great for write performance, but horribly slow at scanning all the data in the queue.

For a query-first architecture, a different mindset and approach is required. You can’t build the whole data pipeline and then hope to tune the queries after the fact. Instead, you model the query behaviors you hope to achieve first, and then work backward to define ingestion and indexing requirements. Build it fast, keep it fast. Enter Trino, a distributed query engine that is an ideal starting point for a query-first data architecture like ours. To use it, we built a custom memory connector to use Trino as a primary memory store. An unusual, but fun, use for Trino.

We’ve developed Trino connectors that are optimized to work with local data, so that there is no network hop between the query engine and the data being computed. This gives a 5-20X improvement for our workloads compared with running against even the fastest remote datastores. I’ll walk through the discovery process to get to Trino, and how we built a new Trino connector.

Speakers

Rob Dickinson

CTO and Co-Founder, Resurface Labs

Wednesday May 12, 2021 15:00 - 15:30 EDT
Room #6

Other SQL, All Databases

15:00 EDT

Deploying a Sharded Vitess Sandbox Cluster in Public Cloud Kubernetes in 10 Minutes

Learning about Vitess ("A database clustering system for horizontal scaling of MySQL" - https://vitess.io) is straightforward enough, as is running the Get Started demo on your computer. But once you want to start scaling out a sandbox cluster, or want to run realistic benchmarks against your schema design (both of which are hard to do on a personal computer), setting up a full cluster in a pinch seems daunting... or is it?

I'm here to show you, with a live demo/tutorial, that deploying and evaluating a Vitess sandbox cluster, into a public cloud environment, can be done super easily. In fact, my aim is to bootstrap a fully functioning cluster within 10 minutes of starting the demo.

With the remaining demo time, I will demonstrate other Vitess operations, such as:
* Scaling up and down the cluster
* Increasing and decreasing the number of shards without losing data
* Configuring zonal SSDs for MySQL
* Backup and restore (so you can shut down the cluster to save money or discard an experiment, then bring it back up again with the original data)
* Deploying the experimental Vitess orchestrator component
* Planned and unplanned failovers
* Automatic rolling upgrades, and controlled rolling upgrades
* Metrics, dashboards

Even with the best possible documentation (and the Vitess documentation is quite good!), getting a fully working cluster, experimenting with it, and getting everything configured the way you want can involve a bunch of trial and error. I hope that my demo can help you bypass some of the more boring trial-and-error, and get running more quickly with your Vitess evaluation.

For this demo, I will be using the excellent open-source Vitess-operator for Kubernetes, provided by PlanetScale. Even if you aren't considering deploying Vitess on Kubernetes in production, I still highly recommend it for sandbox use. Deploying an arbitrary number of components is super trivial with the operator, and everything auto-wires automatically. No need to delay your evaluation by needing to manually bootstrap a cluster one node at a time, or write your own deployment tools.

Speakers

Jordan Moldow

Staff Software Engineer, Box, Inc.

Jordan Moldow is a Staff Software Engineer on Box’s Database Tools and Automations team. After earning MIT BS degrees in CSE and mathematics in 2014, Jordan moved to California to join Box. Jordan and his teammates focus on backend database infrastructure, providing the tools, intermediate... Read More →

Wednesday May 12, 2021 15:00 - 16:00 EDT
Room #4

Kubernetes, Cloud Technologies

15:00 EDT

Should You Run Databases Natively in Kubernetes?

Kubernetes has hit a home run for stateless workloads, but can it do the same for stateful services such as distributed databases? Before we can answer that question, we need to understand the challenges of running stateful workloads on, well anything. In this talk, we will first look at which stateful workloads, specifically databases, are ideal for running inside Kubernetes. Secondly, we will explore the various concerns around running databases in Kubernetes for production environments, such as:

The production-readiness of Kubernetes for stateful workloads in general
The pros and cons of the various deployment architectures
How much performance may be lost when performing IO inside containers
The failure characteristics of a distributed database inside containers

In this session we will demonstrate what Kubernetes brings to the table for stateful workloads and what database servers must provide to fit the Kubernetes model. This talk will also highlight some of the modern databases that take full advantage of Kubernetes and offer a peek into what's possible if stateful services can meet Kubernetes halfway. We will go into the details of deployment choices and how the different cloud-vendor managed container offerings differ in what they offer, and will compare the performance and failure characteristics of a Kubernetes-based deployment with an equivalent VM-based deployment.

Speakers

Karthik Ranganathan

CTO and Co-founder, Yugabyte

Karthik was one of the original database engineers at Facebook responsible for building distributed databases including Cassandra and HBase. He is an Apache HBase committer, and also an early contributor to Cassandra, before it was open-sourced by Facebook. He is currently the co-founder... Read More →

Wednesday May 12, 2021 15:00 - 16:00 EDT
Room #5

Kubernetes, Cloud Technologies

15:00 EDT

ARM Power! Comparing MySQL x86 vs ARM Performance

With the recent launch of Apple M1 chips and the Amazon Graviton processor, the discussion about ARM performance compared to x86 has gained a lot of traction. This is not only because ARM shows promising performance results, but also because its costs are generally lower than those of comparable x86 instances.

In this talk we are going to discuss some scenarios where we compare the performance between instances that have the same cost and instances that have similar hardware capacities.

Lastly, we will check whether the ARM MySQL ecosystem (backup tools, monitoring and others) is mature enough to support production workloads.

By the end of the session, DBAs, sysadmins and managers should have a clearer idea of ARM capabilities compared to x86.

Speakers

Vinicius Grippa

Senior Support Engineer, Percona

Vinicius Grippa is a Percona Senior Support Engineer, Oracle Ace, and author of the book Learning MySQL. Vinicius has a Bachelor's degree in Computer Science and has been working with databases for 16 years. He has experience in designing databases for mission-critical applications... Read More →

Wednesday May 12, 2021 15:00 - 16:00 EDT
Room #1

MySQL, All Databases

15:30 EDT

MariaDB ColumnStore – A Columnar Storage Engine, First Class Citizen in MariaDB

MariaDB has had ColumnStore (a columnar storage engine) available for a while now. The problem was that ColumnStore (formerly known as InfiniDB) was coded in such a way that it required a custom version of MariaDB to function. The installation was also non-trivial, with quite a set of dependencies needed.
After a significant amount of work, both within MariaDB's codebase and ColumnStore's codebase, it is now possible with MariaDB 10.5 to simply load the ColumnStore plugin and run CREATE TABLE ... ENGINE=ColumnStore.
In this talk we will do an overview of the state of ColumnStore in MariaDB, discuss use cases as well as cover some implementation details to better understand performance implications when using ColumnStore.

Speakers

Vicențiu Ciorbaru

Team Lead, Senior Developer, MariaDB Foundation

Wednesday May 12, 2021 15:30 - 16:00 EDT
Room #9

MariaDB Community Track

15:30 EDT

Optimizing and Troubleshooting MongoDB with PMM

In this presentation, we will show how you can use PMM (Percona Monitoring and Management) to monitor MongoDB and diagnose various issues that you can face running MongoDB, whether standalone or in a sharded cluster.
We will look at:

Identifying slow queries
Troubleshooting performance degradation
Looking for possible optimizations

After attending this presentation, you should be comfortable with how PMM can be used with MongoDB and how it can help DBAs and developers in their daily work with MongoDB.

Speakers

Senior Database Engineer, Palantir Technologies

A Red Hat Certified Architect that is also enjoying Windows and network administration and has an itch for Databases. Loves The Cloud. Currently working as an SRE for a team that is supporting a variety of datastores, mostly NoSQL. All round geek with a genuine passion for anything... Read More →

Antonios Giannopoulos

Senior Database Administrator, Rackspace Technology

Pedro Albuquerque

Staff Database Engineer, Wise (former TransferWise)

I have many years of experience working with various database technologies, including relational and NoSQL platforms. I am currently focused on MariaDB, MongoDB and PostgreSQL datastores at Wise. Prior to Wise, I was focused on MongoDB at ObjectRocket by Rackspace, supporting customers... Read More →

Wednesday May 12, 2021 16:00 - 16:30 EDT
Room #2

Hybrid or Mixed Deployments, All Databases

16:00 EDT

Debug a Kubernetes Operator

The goal of this live debugging session is to better understand how to work with a failing Kubernetes Operator and get used to some helpful Kubernetes commands.

Each of the three examples follows the same structure:

* Apply an invalid YAML manifest.
* Figure out what is wrong and how to fix it.
* Hints that may help solve the problem.
* A detailed walkthrough to understand and solve the problem.

Speakers

Philipp Krenn

Developer Advocate, Elastic

Philipp lives to demo interesting technology. Having worked as a web, infrastructure, and database engineer for more than ten years, Philipp is now working as a developer advocate at Elastic, the company behind the open source Elastic Stack consisting of

Wednesday May 12, 2021 16:00 - 16:30 EDT
Room #3

Management & Backup, Development, Tools, or Utilities

16:00 EDT

Going the distance

In every DBA's life, there is a point where data needs to be copied over great distances. This can be the other coast, or it can be another continent. It can be because of implementing a disaster recovery system, it can be because distant read replicas are needed for the application.
In this talk, we use examples with MySQL and xtrabackup to discuss the issues of long-distance copies and the potential performance tuning opportunities.
In this talk, we will:

Examine the characteristics of transferring a large amount of data over WAN links.
Discuss compression and encryption options.
Check out options for copying an already existing backup.
Check out options for streaming backups on the fly.

Speakers

Peter Boros

Principal Architect, Percona

Percona Backup for MongoDB - Developer and User Joint Use Case Session

This session will take a look at what both sides of Percona’s backup solution for MongoDB have to offer, providing developer-side tool information as well as specific end-user information, to help you understand the workings of PBM and learn the tips and tricks, common solutions, and troubleshooting around this popular open-source tool for backing up MongoDB.

Percona Backup for MongoDB is an open-source, distributed, low-impact solution for achieving consistent backups of MongoDB sharded clusters and replica sets. In part of this talk, we will take a look under the hood of PBM from the development side. We will also cover the architectural decisions and techniques that lie behind distributed backups and PITR. From the user side of PBM, there are still some general use questions ranging from the simple to the complex that we will cover relating to both the backup and the restore sides of the PBM tool. The user side goal is to explain how to set up the backup solutions and the restore situations for both replica sets and sharded clusters. We will discuss some of the main problems our support customers encounter in their production environments and provide tips to help you navigate and avoid them proactively.

Speakers

Kimberly Wilkins

NoSQL Tech lead - MongoDB, Percona

Kimberly Wilkins is the MongoDB Technical Lead at Percona with 20+ years of experience managing and architecting databases. Kimberly has been a Principal Engineer, a solutions architect, a manager, and a DBA. During those roles she has built out and managed expert database teams across... Read More →

Andrew Pogrebnoi

Principal Software Engineer, Percona

Andrew Pogrebnoi is the primary developer on the Percona Engineering Team behind the current updates to Percona Backup for MongoDB.

Rafael Galinari

MongoDB Support Engineer, Percona

Rafa is a Support Engineer at Percona who has spent quite a bit of time learning about MongoDB as well as working with a wide variety of customer issues. Whenever MongoDB is not consuming his time, he likes to go to the beach and practice CrossFit. He also enjoys mountain biking... Read More →

Wednesday May 12, 2021 16:30 - 17:30 EDT
Room #5

MongoDB

16:30 EDT

Securing PostgreSQL From External Attack

This talk explores the ways attackers with no authorized database access can steal Postgres passwords, see database queries and results, and even intercept database sessions and return false data. Postgres supports features to eliminate all of these threats, but administrators must understand the attack vulnerabilities to protect against them. This talk covers all known Postgres external attack methods.

Speakers

Bruce Momjian

Vice President, Postgres Evangelist, EDB

Bruce Momjian is co-founder and core team member of the PostgreSQL Global Development Group, and has worked on PostgreSQL since 1996. He has been employed by EDB since 2006. He has spoken at many international open-source conferences and is the author of PostgreSQL: Introduction and... Read More →

Wednesday May 12, 2021 16:30 - 17:30 EDT
Room #2

PostgreSQL, All Databases

17:00 EDT

Flame Graphs for MySQL DBAs

Flame graphs are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately. They can be generated using Brendan Gregg's open source programs at github.com/brendangregg/FlameGraph, which create interactive SVG files that can be viewed in a browser.

Different types of Flame Graphs (CPU, Off-CPU, Memory, Differential, etc.) are presented. Various tools and approaches for collecting profile information on different aspects of the internal workings of MySQL or MariaDB server are presented. Several real-life use cases where Flame Graphs helped to understand and solve the problem are discussed.
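
As a rough illustration of what feeds a flame graph: the FlameGraph scripts take "folded" stacks, one line per unique stack, with frames joined by semicolons and a trailing sample count. The sketch below collapses a handful of made-up stack samples into that format. The frame names in the samples are hypothetical, not real profiler output, and `fold_stacks` is a name chosen for this example.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse stack samples (root-first lists of frames) into folded lines."""
    counts = Counter(";".join(frames) for frames in samples)
    # Sorted only to make the output stable for reading.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples, each listed from the root frame to the leaf frame.
samples = [
    ["mysqld", "handle_connection", "execute_query", "read_row"],
    ["mysqld", "handle_connection", "execute_query", "read_row"],
    ["mysqld", "handle_connection", "execute_query", "sort_rows"],
    ["mysqld", "handle_connection", "parse_query"],
]

folded = fold_stacks(samples)
for line in folded:
    print(line)
```

Saved to a file, lines in this shape are the kind of input that flamegraph.pl turns into an SVG; the width of each box then reflects how many samples contained that frame.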

Speakers

Information Security Architect, Percona LLC

David has been a Linux systems admin for more than 20 years, in a variety of roles: development, network admin, support, DBA, and more. He is a contributor to the EPEL packages for OpenStack, a C.I.S.S.P, and the textbook "tin foil hat" / "paranoid security guy".

Thursday May 13, 2021 07:00 - 07:30 EDT
Room #4

Other OSDB Topics

07:00 EDT

Top 10 Tips For MongoDB Performance

MongoDB is highly tuneable with many options for optimizing performance. However, the sheer quantity of tuning options can be overwhelming, and you can waste precious time unless you know which tuning activities are most likely to provide a return on your time investment. In this presentation, we’ll review ten of the fundamental MongoDB performance tuning practices and see how to use these in a systematic way to improve MongoDB performance.

Topics will include document design, workload and query optimization, use and misuse of transactions, configuring memory to avoid physical IO, disk IO optimization, and MongoDB cluster optimization.

The following subjects will be covered:

• Adopting a methodical tuning methodology
• MongoDB schema design
• MongoDB indexing
• Tuning tools included in the MongoDB core
• Tips for optimizing find() and aggregate() statements
• Tuning update, inserts and deletes
• Transaction performance management
• Memory Tuning
• Disk tuning
• Replica set tuning

Speakers

Guy Harrison

CTO, Southbank Software

Thursday May 13, 2021 07:00 - 08:00 EDT
Room #5

MongoDB, All Databases

07:00 EDT

Building Cost-Based Query Optimizers With Apache Calcite

Query optimization is one of the most challenging problems in database systems. For many years, creating a query optimizer was considered black art, available only to a limited number of companies and products.
Not any more. Apache Calcite is an open-source framework that allows you to build query engines, and query optimizers in particular, at a significantly lower engineering cost. In this talk, I will present query optimization capabilities of Apache Calcite, including cost-based and heuristic optimization drivers and an extensive library of optimization rules. I will also present several examples of production-grade optimizers based on Apache Calcite.

Speakers

Vladimir Ozerov

Co-founder, Querify Labs

Vladimir Ozerov is a co-founder of Querify Labs, where he manages the research and development of query engines for technology companies. Before that, Vladimir worked on distributed systems Apache Ignite and Hazelcast for more than eight years, focusing on distributed data processing... Read More →

Thursday May 13, 2021 07:00 - 08:00 EDT
Room #2

Other, Development, Tools, or Utilities

07:00 EDT

How Machine Learning Inside Databases Solves Significant Data-Science Challenges

Machine Learning inside databases is becoming a hot trend. Last time at Percona Live 2020, our team presented AI Tables - an open-source solution that enables automated machine learning capabilities inside databases. The main idea of AI Tables is to allow anyone who works with databases to implement ML projects in a matter of hours without requiring data science skills.

It is as simple as using SQL queries!

In the journey of bringing AI Tables to the community, we have discovered and solved Machine Learning problems that are hard even for ML engineers but are common for data inside databases.

For example:
Forecasting inventory for all products in all stores (**GROUP BY store, product_id**), given a table that contains all inventory updates over time (**ORDER BY time**).

This problem is complex even for experienced ML engineering teams. In a traditional ML approach, you would need to train one model for each product at each store, which can mean thousands or hundreds of thousands of models, not to mention the logistical nightmare of bringing so many models to production.
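
To make the "one model per product per store" point concrete, here is a minimal sketch. It is not how AI Tables works: the rows are invented, and the "model" is a deliberately naive stand-in (the mean of the last two observations). It only shows how a single table fans out into one time-ordered series, and therefore one model, per (store, product_id) pair.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inventory updates: (time, store, product_id, quantity).
rows = [
    (1, "A", 10, 5), (2, "A", 10, 7), (3, "A", 10, 9),
    (1, "A", 11, 2), (2, "A", 11, 2),
    (1, "B", 10, 4), (3, "B", 10, 8), (2, "B", 10, 6),
]


def split_series(rows):
    """GROUP BY (store, product_id), then ORDER BY time within each group."""
    groups = defaultdict(list)
    for time, store, product_id, quantity in rows:
        groups[(store, product_id)].append((time, quantity))
    return {key: [q for _, q in sorted(points)] for key, points in groups.items()}


def naive_forecast(series, window=2):
    """Placeholder 'model': mean of the last `window` observations."""
    return mean(series[-window:])


series = split_series(rows)
forecasts = {key: naive_forecast(values) for key, values in series.items()}

print(f"{len(series)} separate series, so {len(series)} separate models")
for key, value in sorted(forecasts.items()):
    print(key, value)
```

With real data the number of keys grows with stores times products, which is where the count of models to train and deploy gets out of hand.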

Another example of a challenge solved is creating views that do **joins between data tables and ML models**. It significantly streamlines using machine learning inside BI tools to forecast data trends. Also, it opens broader possibilities for anomaly detection and much more!

We have made significant progress in solving those problems automatically through AI-Tables, and we would like to share with you our approach and discuss some interesting insights that we have made in the process.

**Agenda:**
- 5 min | Advantages of ML inside a database over the traditional approach
- 15 min | Machine learning workflows inside databases
- 15 min | Automated multivariate time-series forecasting
- 15 min | Joining tables with ML models
- 10 min | Q&A

Speakers

Jorge Torres

CEO, MindsDB

Jorge Torres is the Co-founder & CEO of MindsDB. He is also a visiting scholar at UC Berkeley researching machine learning automation and explainability. Before founding MindsDB, he worked for a number of data-intensive start-ups, most recently working with Aneesh Chopra (the first... Read More →

Patricio Cerda-Mardini

Machine Learning Research Engineer, MindsDB

Patricio Cerda-Mardini is a Machine Learning Research Engineer. As a masters student at PUC Chile, he focused on machine learning methods for human-robot interaction and recommendation systems, areas in which he holds a couple of academic publications. Prior to joining MindsDB, he... Read More →

Thursday May 13, 2021 07:00 - 08:00 EDT
Room #1

Other OSDB Topics

07:30 EDT

Open Source Databases and ARM

ARM is gaining a lot of traction, especially with high-performance computing software.

Open source databases are no exception, and most of the leading ones are now available on ARM (MySQL, MariaDB, PostgreSQL, MongoDB, ClickHouse, etc.).

Let's explore the state of different open-source databases and their supporting ecosystems/tools, understanding the performance, functionality, active community, etc...

Whatever your use-case it is quite likely that it could be ported to ARM and this comes with a lot of advantages.

So let's unwind this completely new VERTICAL of running Opensource DBs on ARM.

Speakers

Krunal Bauskar

Engineer, Huawei

Krunal Bauskar has been actively working in the MySQL space for over a decade. He is currently driving the adoption of the ARM ecosystem for MySQL/MariaDB/Percona through his #mysqlonarm initiative working at Huawei. In the past he has worked on multiple MySQL projects viz. undo log... Read More →

Thursday May 13, 2021 07:30 - 08:00 EDT
Room #3

Hybrid or Mixed Deployments, All Databases

07:30 EDT

Setup and manage alerts for databases with Integrated alerting in Percona Monitoring and Management

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance and improve the security of your business-critical database environments, no matter where they are located or deployed.
In this talk, we will show how to set up integrated alerting (sending alerts to external channels) in PMM. In this session we will:

Show how the alerts are sent to the external channels.
Examine the architecture of the alerting system in PMM.
Define a custom alert, and examine it showing up on the external channels as well.

Speakers

Peter Boros

Principal Architect, Percona

Zoriana Stefanyshyn

QA Analyst, Percona

Zoriana joined Percona 1 year ago as QA Analyst on the Percona Platform team. Her previous QA experience was in different domains: document management, automotive, car navigation systems, and SDK. She is new to open source, but she is really inspired by the products of Percona. The... Read More →

Thursday May 13, 2021 07:30 - 08:00 EDT
Room #6

Monitoring

07:30 EDT

Build a Scale-Out Real-Time Data Warehouse for Analytics Within Seconds by Combining Apache Flink + TiDB

There is a growing demand for real-time data warehouses by data-driven companies to implement real-time Online Analytical Processing analytics, real-time data panels, and real-time application monitoring. However, the architecture of real-time data warehouses has long been thought complex and difficult to operate and maintain.

As an open source, distributed Hybrid Transactional/Analytical Processing (HTAP) database, TiDB can be used as backbone storage for a real-time data warehouse in multiple roles: business data source, dimension table data source, and the analytical database for summarized data. The combination of stream processing systems (e.g. Apache Flink) and TiDB could become an efficient, easy-to-use, real-time data warehouse that features horizontal scalability and high availability.

In this talk, Qi Zhi will deep dive into what a real-time data warehouse is, how TiDB powers real world real-time data warehouses and the patterns on combining streaming processing systems and TiDB.

Speakers

Zhi Qi

Realtime Analytics R & D Engineer, PingCAP

Zhi Qi is a software engineer at PingCAP, working on Real-time Analytics and BigData Ecosystem of TiDB. He gave a speech about Flink TiDB real-time data warehouse at Flink Forward Asia 2020.

Thursday May 13, 2021 07:30 - 08:00 EDT
Room #4

Other OSDB Topics

08:00 EDT

Introducing ProxyWeb - The Open Source Web Interface For ProxySQL

Introducing ProxyWeb, the first open source ProxySQL web user interface. It proved extremely useful during Edmodo's 25x traffic growth last March, and it is now available under GPLv3. It can be installed as a Docker container or as a system service in 10 seconds.

It has a responsive design, supports administering multiple ProxySQL servers, generating adhoc traffic reports, hiding unnecessary tables on a per-server basis and it comes with detailed documentation.

To make evaluation easier, it comes with an extensive docker-compose based test environment that gives the user a fully working 'infrastructure' consisting of a MySQL cluster, ProxySQL, ProxyWeb, Orchestrator, Health Check, and Sysbench.

The environment can be fully operated through a web browser after the initial start, which takes less than 45 seconds.

In the presentation the audience will be walked through the installation and the configuration of the ProxyWeb and the docker-compose based test environment will be used to set up a ProxySQL cluster from scratch. Once the setup is completed we will generate traffic with Sysbench and perform a failover with Orchestrator.

The codebase and the documentation can be accessed at http://proxyweb.org

Speakers

Miklos Szel

Senior MySQL Architect, Edmodo

Thursday May 13, 2021 08:00 - 08:30 EDT
Room #5

MySQL, All Databases

08:00 EDT

MySQL Server Component Manifest Files

MySQL configuration has traditionally been done via system variables with values coming from either command line, config files or SET commands. This can be a security issue since it doesn't support a trust model rooted in some well known trusted state that cannot be modified by less trusted actors.

This is what the manifest file security model is aiming at solving.
It roots server security into a well known and trusted source (the server's OS file permissions) and builds on top of it to allow secure configuration of components.
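
The idea of rooting trust in OS file permissions can be illustrated with an ordinary permission check. This is a generic sketch, not MySQL's implementation: the function name, the throwaway file, and the rule "no write access for group or others" are all assumptions chosen for illustration, and POSIX permissions are assumed.

```python
import os
import stat
import tempfile


def writable_only_by_owner(path: str) -> bool:
    """True if neither group nor others have write permission on path."""
    mode = os.stat(path).st_mode
    return not mode & (stat.S_IWGRP | stat.S_IWOTH)


# Demonstrate on a throwaway file (hypothetical stand-in for a config file).
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

os.chmod(demo_path, 0o644)
locked_down = writable_only_by_owner(demo_path)

os.chmod(demo_path, 0o666)
world_writable = writable_only_by_owner(demo_path)

os.remove(demo_path)
print(locked_down, world_writable)
```

A file that only a trusted OS account can modify is a state that less trusted database users cannot change from inside the server, which is the property the paragraph above relies on.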

In this talk we will review how manifest files work and also check some of the early adopter components of the new secure configuration model.

Speakers

Georgi Kodinov

MySQL SrvGen team lead, Oracle MySQL

Georgi "Joro" Kodinov has been working on MySQL for more than 10 years. He's leading the Server General team that deals with security, performance monitoring and the mysql client server protocol. Before working on databases Joro was serving as an IT manager for a Bulgarian bank... Read More →

Thursday May 13, 2021 08:00 - 08:30 EDT
Room #7

MySQL Community Track

08:00 EDT

Overview of MySQL Server plugins and what is new in MySQL 8

Plugins are pieces of software that provide additional services. MySQL has supported plugins for a long time, and they matured a lot in MySQL 8. I would like to talk about the features of MySQL server plugins: how to install and uninstall them, and how to retrieve plugin information.

**My Agenda:**

1. What is the scope of plugin in MySQL?

- Will explain the role of plugins in MySQL

2. How to install/Uninstall and obtain the plugin information?

- Will explain about the plugin installation
- Will explain about the plugin uninstallation
- Will explain how to obtain the plugin information (plugin directory, whether a plugin is active or not, the information_schema.plugins table, the SHOW PLUGINS command)

3. Different types of MySQL plugins

- Query rewriter
- DDL rewriter ( MySQL 8 )
- Version token
- Clone plugin ( MySQL 8 )
- MySQL enterprise threadpool

4. Plugin services:

- Locking services
- Keyring services

5. Q/A

Speakers

Sri Sakthivel M.D.

Thursday May 13, 2021 08:00 - 08:30 EDT
Room #4

Other, Development, Tools, or Utilities

08:00 EDT

Default to Open: Steps and Traps

Sharing is caring. In an ideal world, everyone has bought into transparency and there is no problem with communication. Now you pick the right tools that let you collaborate and share information easily and go - you're open.

Unfortunately, the biggest challenge in technology is that people have differing opinions. Lenz and Sanja talk about their experiences from SUSE, Red Hat, and Percona.

From licensing to open sourcing newly acquired company products, from motivating engineering teams to both accept contributions and contribute back to other open source projects, from documentation to branding.

"Default to open" is a work style that is worth striving for, even if your product code is not open source. Let's talk about it.

Speakers

Sanja Bonic

Head of Open Source Programs Office, Percona

Lenz Grimmer

Sr. Director, Server Engineering, Percona

Lenz Grimmer supports and leads the engineering teams at Percona that work on server products like Percona Server for MongoDB, MySQL, PostgreSQL and related components. He's been involved in Linux and Open Source technologies in various roles and capacities since the mid-90s and has... Read More →

Thursday May 13, 2021 08:00 - 08:30 EDT
Room #2

Other OSDB Topics

08:00 EDT

Native Chaos Engineering in Databases

Chaos Engineering is revolutionizing how we test, and doing it the cloud-native way is the best approach in today's rapidly changing world, where Kubernetes has shifted the resiliency paradigm. Karthik S, one of the maintainers of LitmusChaos, will introduce how to carry out Chaos Engineering the cloud-native way. He will then touch upon how Chaos Engineering is carried out in cloud-native databases with LitmusChaos, as well as observability considerations for chaos engineering and the hooks Litmus provides for them.

Speakers

Karthik Satchitanand

Co-Founder & Software Architect, ChaosNative

Karthik Satchitanand is one of the maintainers of the CNCF sandbox project LitmusChaos. He is passionate about all things Kubernetes, and is generally interested in DevOps, storage performance/benchmarking & chaos engineering.

Thursday May 13, 2021 08:00 - 09:00 EDT
Room #9

Data on Kubernetes Community Track

08:00 EDT

Performance Optimization - How to Get the Best Out of Your Indexes on Postgres and MySQL

During this talk we will discuss how indexes work on Postgres and MySQL. What are the differences between the implementations, and what are the more appropriate choices for different scenarios? We will discuss the general B+-tree indexes as well as GIN and GiST, see where each is best suited with examples, and understand why some database migrations fail due to differences in implementation or the lack of a specific index type.

Speakers

Charly Batista

PostgreSQL Tech Lead, Percona

Thursday May 13, 2021 08:00 - 09:00 EDT
Room #3

Hybrid or Mixed Deployments, All Databases

08:00 EDT

MySQL Backup Solutions in 2021

Backups are important! Everyone makes mistakes, bugs are easily overlooked, and hardware will fail eventually. If you don't want to lose data when disaster strikes, your backups will be your savior. In this talk I will guide you through some of the most common backup techniques for MySQL that we use in 2021. I will explain the strengths and weaknesses of each solution, and we'll go into detail about the impact each solution has on your recovery time objectives (RTO) and recovery point objectives (RPO), how to achieve these objectives, and how to understand their impact on your environment.
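
The relationship between a backup strategy and its RPO and RTO can be sketched with simple arithmetic. The example below is a rough model under stated assumptions, not a result from the talk: the plan names, dataset size, restore rates, and replay time are all hypothetical. It assumes that without archived binary logs the worst-case data loss equals the backup interval, and that recovery time is restore time plus any binary log replay.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackupPlan:
    name: str
    backup_interval_h: float        # hours between full backups
    restore_rate_gb_per_h: float    # restore throughput
    binlog_lag_h: Optional[float]   # None: binary logs are not archived

    def worst_case_rpo_h(self) -> float:
        """Hours of writes that could be lost in the worst case."""
        if self.binlog_lag_h is None:
            return self.backup_interval_h
        return self.binlog_lag_h

    def estimated_rto_h(self, data_gb: float, replay_h: float) -> float:
        """Restore time, plus binary log replay when logs are archived."""
        restore_h = data_gb / self.restore_rate_gb_per_h
        if self.binlog_lag_h is None:
            return restore_h
        return restore_h + replay_h


# Hypothetical plans for a 1000 GB dataset.
plans = [
    BackupPlan("nightly logical dump", 24, 50, None),
    BackupPlan("nightly physical backup + binlogs", 24, 500, 0.1),
]

for plan in plans:
    rpo = plan.worst_case_rpo_h()
    rto = plan.estimated_rto_h(data_gb=1000, replay_h=1)
    print(f"{plan.name}: RPO <= {rpo} h, RTO ~ {rto} h")
```

Plugging in measured restore rates from a real restore test, rather than guesses, is what makes numbers like these meaningful.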

Speakers

Matthias Crauwels

Principal Consultant, Pythian Services Inc

Since the age of 10 I’ve always been passionate about computers. I’ve been working with them ever since. In 2005 I got my degree in computer science. I used to work at a major Belgian university where I was developing e-learning applications. In that position, I was the one who... Read More →

Thursday May 13, 2021 08:00 - 09:00 EDT
Room #1

Management & Backup, Development, Tools, or Utilities

08:30 EDT

MariaDB High Availability in a Cocktail Mix with Envoy and Orchestrator

For a considerable set of critical applications at Wise (former TransferWise), database high availability is a must to ensure that we don't let our customers down.

At Wise, we offload most of our relational database operational toil to AWS RDS managed services. However, for some specific use cases with different availability requirements, we run some clusters on EC2.

In this presentation, I will show how we implemented high availability for our MariaDB clusters running on EC2 by integrating Envoy and Orchestrator, in order to reduce failover and maintenance time from a few minutes to just a few seconds.

Speakers

Pedro Albuquerque

Staff Database Engineer, Wise (former TransferWise)

Thursday May 13, 2021 08:30 - 09:00 EDT
Room #4

HA/Cluster, All Databases

08:30 EDT

Zoned Namespaces for the Next Era in Application Performance

Learn how ZNS SSDs (Zoned Namespaces) may be leveraged to help you achieve scalable MySQL™ performance for your next wave in growth, whether that's measured in end-user count, supported IoT devices, or data volumes.

ZNS SSDs give applications more direct control over physical data placement, bypassing the internal architecture of conventional SSDs, to achieve the next progression of application performance and scale for the digital transformation era.

User experience depends on application responsiveness which directly ties to MySQL performance and the latency of underlying storage resources. ZNS SSDs eliminate some bottlenecks of conventional SSD architecture and may deliver more predictable data response times and higher MySQL transaction rates for workloads that involve concurrent read and write operations.

Please join Percona CEO Peter Zaitsev and Wim De Wispelaere, Western Digital VP Corporate Strategic Initiatives, for a 30-minute overview of Zoned Namespaces technology and how we solve your next wave in application growth by using Percona Server® for MySQL with Ultrastar® ZNS SSDs. We’ll explain how the ZNS zone block interface, the MyRocks pluggable ZenFS file system, and Linux® support will help you push the limits of database performance at scale.

Speakers

Peter Zaitsev

CEO & Co-founder, Percona

CTO, TerminusDB

Dr Gavin Mendel-Gleason is CTO of TerminusDB. He is a former research fellow at Trinity College Dublin in the School of Statistics and Computer Science. His research focuses on databases, logic and verification in software engineering. His work includes contributing to the Seshat... Read More →

Thursday May 13, 2021 09:00 - 09:30 EDT
Room #9

Data on Kubernetes Community Track

09:00 EDT

MariaDB Notebooks in JupyterHub

The MariaDB Jupyter kernel project helps you use MariaDB from within the Jupyter notebook ecosystem.
You can display the results of your favourite queries in a notebook, plot result sets using %magic commands or export data from MariaDB to Python notebooks unleashing the full power of these technologies for data analytics.

This talk covers the current state of the MariaDB kernel, the existing features, and how to install and use it, and demonstrates the simplest way to deploy JupyterHub in your organization so that people can use MariaDB in individual notebook workspaces using shared MariaDB Server deployments.

No background knowledge is expected to understand the content of this talk. If you've ever used a Jupyter notebook, MariaDB, or both, or maybe you'd just love to hear about these technologies, you're more than welcome to attend.

Speakers

Robert Bindar

Server Developer, MariaDB Foundation

Thursday May 13, 2021 09:00 - 09:30 EDT
Room #2

IDE, Development, Tools, or Utilities

09:00 EDT

The Many Ways to Copy Your Database

Everyone needs to copy the data in their database for backups and to clone more database instances. This talk will describe and compare many ways to do this - everything from logical data dump and file copying through native cloning and backup tools to advanced scale-out techniques for large-scale copying. Whether you are using a cloud or servers in your data centres, this talk will tell you how to choose the best way to copy your database in every circumstance with performance comparisons and lessons from real-life experience.

Speakers

Nicolai Plum

Database Engineer, Booking.com

Nicolai Plum works in the Database Engineering team of Booking.com managing database product features and service design. His previous roles at Booking.com have ranged widely from Linux systems administration team lead through storage and systems architecture to regulatory compliance... Read More →

Thursday May 13, 2021 09:00 - 09:30 EDT
Room #3

MySQL, All Databases

09:00 EDT

Oracle MySQL Database Service with HeatWave for Real-Time Analytics

MySQL HeatWave - Extreme Performance, Cloud Scale, Significant Cost Savings.

Since 2020 the MySQL development team has been offering a fully managed database service of the MySQL Enterprise Edition. Traditionally, MySQL InnoDB is designed for online transaction processing (OLTP) load. MySQL can cover online analytics processing (OLAP) load, but it is often rather slow and tricky.
At the beginning of December 2020, the MySQL team started a second MySQL Cloud offering called MySQL HeatWave for Real-time Analytics, which is based on a new in-memory analytic accelerator designed for extreme performance and cloud scale. This service provides a single, unified platform for both OLTP and OLAP workloads. It can scale to several hundreds of cores, provides around 400x speedup over MySQL for analytic workloads, and enables scalable analysis over tens of terabytes of MySQL data. Customers can now run all their OLTP and analytics workloads with MySQL without the need to move their data out of MySQL and without requiring any change to their application.
In this presentation we provide an overview about what is going on under the hood and will support our slides with a demo of the technology.

Speakers

Carsten Thalheimer

Senior Principal Cloud Solution Engineer, Oracle MySQL GBU

Thursday May 13, 2021 09:00 - 09:30 EDT
Room #7

MySQL Community Track

09:00 EDT

Production Grade ProxySQL in 2021

Widespread adoption of ProxySQL, the high performance, high availability, protocol-aware proxy for MySQL, has led to a plethora of different and highly innovative solutions for the inevitable scalability issues that MySQL DBAs run into with highly demanding workloads and the fast paced data growth of this modern era.

This talk aims to provide insights into real world MySQL scalability solutions implemented using ProxySQL by deep diving into the key areas to focus on when rolling out ProxySQL in production, examples on how to bulletproof the failover process and many other pertinent tuning recommendations.

- Where should I deploy ProxySQL?
- What hardware does ProxySQL need to run on?
- What does a typical production grade deployment look like?
- How should I design my ProxySQL query rules to ensure an efficient yet granularly controlled ruleset?
- What are the steps in planning a failover process?
- What should I do to achieve a transparent failover?
- What are the key variables I must tune on my production deployment?

Speakers

Rene Cannao

CEO, ProxySQL LLC

René has over 2 decades of experience as System, Network and Database Administrator, focusing mainly on MySQL. In this period he built an analytic and problem solving mindset and he is always eager to take on new challenges, especially if they are related to high performance. From... Read More →

Nick Vyzas

CTO, ProxySQL

Nick focuses on maximizing the scalability, availability, and performance of MySQL environments of all shapes and sizes with ProxySQL. Over the last 15 years his focus has been on MySQL database administration and open source software projects at various companies around the world... Read More →

Thursday May 13, 2021 09:00 - 10:00 EDT
Room #1

HA/Cluster, All Databases

09:00 EDT

Everything a DBA Should Know About Kubernetes

What is this Kubernetes thing? Why should you care? What happens to a database deployed on Kubernetes? Is it even possible?

Looking from the outside Kubernetes can be frightening, especially when stateful applications are concerned, but fear not! This session will take you through the steps of deploying an application on Kubernetes.

Speakers

Janos Pasztor

Senior Software Engineer, Red Hat

Founder and CTO at VictoriaMetrics, VictoriaMetrics

VictoriaMetrics founder and core developer. Go contributor and author of popular libraries fasthttp, fastcache, quicktemplate

Roma Novikov

Technical Director, Percona Monitoring and Management at Percona, Percona

Roma Novikov joined Percona at the beginning of 2017 as Director of Platform Engineering. He started programming in 6th grade and has more than 15 years commercial experience in web development. He previously worked as CTO of one of the biggest web development/web design e-commerce... Read More →

Thursday May 13, 2021 10:00 - 10:30 EDT
Room #5

Monitoring, Development, Tools, or Utilities

10:00 EDT

Igor Lukanin

Developer Advocate, Cube Dev

Igor is a developer advocate from the Cube.js team that provides developers with tools to build modern analytical applications. He's obsessed with data visualization and storytelling and feels equally comfortable writing SQL and ECMAScript.

Thursday May 13, 2021 10:00 - 11:00 EDT
Room #2

Other, Development, Tools, or Utilities

10:00 EDT

Evolution of Partitioning Features in PostgreSQL - A Super-Charged Elephant

The Partitioning feature in PostgreSQL is not something new, but it has matured over several years, release after release, especially in the last 3 releases. The evolutionary nature (small changes in each version) is often overlooked by users. These small changes have built up into powerful partitioning in recent PostgreSQL versions. It is now considered capable of replacing the world's biggest database software in terms of market share, even in data warehouses. Gradual evolution leads to no big announcements, so, surprisingly, very few users have moved to native partitioning.

This talk gives a detailed look at
1. The birth of Native Partitioning and features.
2. How PostgreSQL 11 addressed some of the missing pieces.
3. How PostgreSQL 12 made a dramatic improvement in usability.
4. PostgreSQL 13, ready to take on giants.

This presentation is expected to create enthusiasm in the audience to drive towards one of the most powerful features in PostgreSQL.

Speakers

Jobin Augustine

PostgreSQL Escalation Specialist, Percona

Thursday May 13, 2021 10:00 - 11:00 EDT
Room #1

PostgreSQL, All Databases

10:30 EDT

Migration From 5.6 to 8.xx.xx

Intro

I started working at a Belgian company that uses MySQL 5.6.
We run 6 production servers, and we decided to migrate from this old version to MySQL 8.0.19.
All the servers run CentOS 7 (so RPM and YUM are available).

The first test was done by the DevOps team. They used mysqldump for the job, but as each server holds about 4T, the downtime was over 30 hours, which is not acceptable for the customer.
That is when I took over this task and started reading the docs. I saw that there is an in-place upgrade that just changes the binaries and the catalog, but doesn't move any byte of the DB data.

I In TEST

First I connected to the MySQL Slack channel and started talking with Fred. He gave me very nice documentation and advice for achieving this migration with a high probability of success.

I created a copy of the main DB in the test ENV and applied the procedure with yum. The migration completed after 2 hours. This downtime is acceptable.

II Issues found

The migration process is simple, but depending on what you have in your DB, you can face some strange situations. For example, we use Puppet on our servers, and Puppet manages the /etc/my.cnf file, so in TEST the first migration crashed because of Puppet. We just commented out Puppet in the crontab to avoid this issue.

Another issue was a set of warnings reported by the checkForServerUpgrade util:

'NO_ZERO_DATE', 'NO_ZERO_IN_DATE'
The syntax 'expire-logs-days' is deprecated
character-set-server: 'utf8'

These warnings were easy to resolve.
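
As a hedged illustration of how such warnings might be spotted ahead of time, the sketch below scans my.cnf-style text for the three settings named above. It is not the output or logic of checkForServerUpgrade: the function name, the sample configuration, and the wording of the findings are assumptions, and the suggested replacements (binlog_expire_logs_seconds, utf8mb4, strict mode) are based on the MySQL 8.0 documentation.

```python
def check_config(text):
    """Return human-readable findings for a my.cnf-style text."""
    findings = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue  # section headers and bare flags are ignored here
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.replace("-", "_").lower()
        if key == "expire_logs_days":
            seconds = int(value) * 86400
            findings.append(
                f"expire_logs_days is deprecated; "
                f"consider binlog_expire_logs_seconds = {seconds}")
        elif key == "character_set_server" and value.lower() == "utf8":
            findings.append("character_set_server = utf8 means utf8mb3; consider utf8mb4")
        elif key == "sql_mode":
            modes = {m.strip().upper() for m in value.strip("'\"").split(",")}
            for mode in sorted(modes & {"NO_ZERO_DATE", "NO_ZERO_IN_DATE"}):
                findings.append(f"sql_mode lists {mode}; use it together with strict mode")
    return findings


# Hypothetical configuration reproducing the three warnings.
sample = """
[mysqld]
sql_mode = 'NO_ZERO_DATE,NO_ZERO_IN_DATE'
expire-logs-days = 7
character-set-server = utf8
"""

findings = check_config(sample)
for finding in findings:
    print(finding)
```

A real check should rely on the upgrade checker itself; a scan like this only helps to find where in the configuration a reported setting comes from.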

III Timings

For 4T of data, the migration took about 2 hours for the whole process.
The catalog upgrade is the step that takes the most time.
Use screen or tmux for this step :)

IV Why mysql 8

We stay with Mysql DB because for us it's a very good product, the performences are fine and the new features created by the version 8.xx are amazing, this are the most important for us:

- The shell dump
I ran some comparative tests: the same server instance can be dumped in 2 hours, whereas mysqldump took over 9 hours to complete. I also checked that the shell dump doesn't create locks.
The fact that the shell dump doesn't create locks is important because the CLONE procedure uses the shell dump to create the replicas, and this performance and the absence of locks make it possible to create replicas during working hours.

- Replicaset
I started working 15 years ago as an Oracle DBA, and I used the Oracle Data Guard broker to manage switchovers, for example. With ReplicaSet I found a very nice set of commands to manage the source and replicas very easily. I used it to move servers from the old CentOS 6 to CentOS 7 without downtime, just by switching the master role.

- Plugins
I was looking for easy ways to catch locks, bad queries, or metadata stats from the MySQL catalog, and the plugins are the answer. There are a lot of them already created, and people can create their own plugins.

Speakers

Luis Dias

DBA, Oracle

Oracle DBA for the last 15 years, MySQL DBA since... the start of the pandemic

Thursday May 13, 2021 10:30 - 11:00 EDT
Room #7

MySQL Community Track

10:30 EDT

Build your team, build your product, pay technical debt... REPEAT

Technology is moving faster than it used to, and teams and products need to adapt and react quickly to keep pace. Technical debt is part of the journey.
I've spent 7 years building the engineering team at MedTrainer using talent in Mexico. Our team is our strength, and we are now paying down the technical debt by leveraging knowledge from experts like Percona, and learning in the process. The talk will include some of the decisions made and lessons learned along the way.

Speakers

Mariano Rentería

Director of Engineering, MedTrainer

Johannes Schlüter

Software Engineering Manager, Oracle MySQL

Johannes Schlüter is a Software Engineering Manager in Oracle's MySQL Team. After development and management for different MySQL Connectors, he is now leading a new team, working on improving the MySQL experience in Cloud environments. Johannes is a long term Open Source contributor... Read More →

Kenny Gryp

MySQL Product Manager, Oracle MySQL

MySQL Product Manager focussing on InnoDB, Replication and all things High Availability.

Thursday May 13, 2021 11:00 - 12:00 EDT
Room #7

MySQL Community Track

11:00 EDT

Scaling Out Distributed Storage Fabric with RocksDB

Engineers at Nutanix have been working on the challenge of building a next-generation architecture for its distributed storage fabric. Scaling this architecture to the needs of the future required three primary objectives: significant improvements in sustained random write performance, support for large-capacity deep storage nodes for multi-petabyte scale and reducing storage latency by a significant magnitude.

These goals required re-imagining the core approach to how metadata is stored in the fabric management system and moving the metadata closer to where the data is stored.

After extensive research and testing, RocksDB was chosen as the core component for this project, based on its open-source pedigree and proven reliability and industry adoption. Within a few months, the engineering team was able to ramp up expertise, build confidence with the open-source technology and eventually grow its adoption into several core products at Nutanix.

In this technical talk, we will share the new architecture, deployment mode and some of the early lessons learned in adopting RocksDB and discuss some innovative enhancements we were able to make to fit our performance goals and objectives.

One of the significant improvements has been the addition of async read/write support to RocksDB. Currently, the open source RocksDB exposes blocking I/O APIs which can limit overall system throughput under resource constraints. We developed a Fibers/Co-routine based non-blocking I/O solution for RocksDB.

In addition to this, we plan to talk about topics and projects that have been built on this enhanced RocksDB implementation.
These projects will become the foundation for future Nutanix products.

Speakers

Yasaswi Kishore

Senior member of Technical Staff, Nutanix

Yasaswi is a senior member of technical staff in the metadata subsystem for Nutanix distributed filesystem. Prior to Nutanix, Yasaswi completed his undergraduate program in Computer Science at PES University, Bangalore, India.

Sandeep Madanala

Nutanix

Sandeep is a Senior technical manager in the metadata subsystem for Nutanix distributed filesystem. He leads and manages the ChakrDB team, a scale out KV Store built on top of RocksDB. Prior to Nutanix, Sandeep worked at VMWare and graduated from Indian Institute of Technology, M... Read More →

Raghav Tulshibagwale

Staff Engineer, Core Data Path, Nutanix Inc.

Raghav is a Staff engineer and technical lead in the metadata subsystem for Nutanix distributed filesystem. Prior to Nutanix, Raghav worked on Database Kernels and filesystems. Raghav completed his Masters in Computer Science from University of Southern California, Los Angeles.

Pulkit Kapoor

MTS, Core Data Path, Nutanix, Inc.

Alexander Rubin

Sr. Database Engineer, AWS

Hemant Bhanawat

Engineering, Yugabyte

Eno Compton

Developer Relations Engineer, Google

Eno is a Developer Relations Engineer at Google working on Cloud SQL. He is one of the maintainers of the Cloud SQL Auth proxy. He is also a total language nerd with a Ph.D. in Classical Chinese and Japanese and a decade of experience in nearly a dozen computer languages. Nowadays... Read More →

Thursday May 13, 2021 13:00 - 13:30 EDT
Room #8

Google Community Track

13:00 EDT

Optimizing and Troubleshooting PostgreSQL with PMM

In this presentation, we will show how you can use PMM (Percona Monitoring and Management) to efficiently monitor PostgreSQL systems and diagnose various issues that you can face when running PostgreSQL.
We will look at, among other things:

Identifying pending queries
Troubleshooting performance degradation
Looking for possible optimizations

After attending this presentation, you should be comfortable understanding how PMM can be used to work with PostgreSQL and help in the daily lives of DBAs or anyone dealing with that database.

Speakers

Agustin Gallego

Senior Support Engineer, Percona

Sergey Kuzmichev

Support Engineer, Percona

Thursday May 13, 2021 13:00 - 13:30 EDT
Room #1

Monitoring

13:00 EDT

Successfully Run Your MySQL NDB Cluster in Kubernetes

Fortunately, MySQL NDB Cluster already has auto-healing, data distribution, instant scaling and many other features built-in - making it a perfect fit for Cloud Native. This session walks through the few steps necessary to deploy a distributed NDB setup in a Kubernetes cluster manually or with an operator.

NDB runs in Kubernetes serving mission-critical microservices at the heart of Cloud Native production systems. The experience from these adventures is mixed with knowledge gained from building an NDB operator from scratch and boiled down to a few tips and tricks that will hopefully help guide you around the usual traps of running NDB, or any database, in Kubernetes.

Speakers

Tiago Alves

Oracle MySQL

Tiago is a software engineering manager for MySQL NDB Cluster. Tiago is passionate about software engineering with focus on software quality.

Thursday May 13, 2021 13:00 - 13:30 EDT
Room #7

MySQL Community Track

13:00 EDT

GraphJin - The Automagical GraphQL to SQL Compiler

In 2015, Facebook introduced GraphQL, a front-end query framework designed to shield users from the intricacies of the various backends one would find in modern data stacks. As with ORMs before it, most of those backends ended up being databases, and the added layers of abstraction proved inefficient and required immense investments to scale out for performance.

Additionally, most app developers are not very familiar with SQL and go to great lengths to avoid learning it. This has created several problems such as n+1 queries, inefficient queries, and minimal use of database features. While Postgres is growing in popularity with this audience, the large majority of advanced features like JSON support, recursive CTEs, and window functions are never used. Often developers are unfamiliar with even simple features like the various types of JOINs and choose inefficient solutions like multiple queries instead.
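To make the n+1 problem mentioned above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `posts` tables and both function names are hypothetical, invented only for this illustration; they are not taken from GraphJin or from the talk.

```python
import sqlite3

# Hypothetical schema and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ana'), (2, 'bo');
    INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
""")


def titles_n_plus_one(conn):
    """Fetch the users, then issue one extra query per user (1 + n queries)."""
    queries = 1
    users = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    result = {}
    for user_id, name in users:
        rows = conn.execute(
            "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (user_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """Fetch the same data with a single JOIN query."""
    rows = conn.execute("""
        SELECT u.name, p.title
        FROM users AS u
        LEFT JOIN posts AS p ON p.user_id = u.id
        ORDER BY u.id, p.id
    """).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # users without posts still get an empty list
            result[name].append(title)
    return result, 1
```

Both functions return the same mapping of user name to post titles, but the first issues one query per user on top of the initial one, while the second always issues exactly one.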

GraphJin, an open-source project, was developed to solve this disconnect by putting all the power in the hands of the UI/UX developer and freeing up the backend developer to focus on the truly hard problems and optimize the queries to take full advantage of the advanced features of Postgres.

GraphJin is a compiler written in Go that can convert the GraphQL describing the data needed into a single efficient SQL query optimized for Postgres or MySQL. It discovers the schema and relationship graph of the database to help it build efficient queries and provide the frontend developer an auto-complete enabled GraphQL query builder to quickly fetch the data needed.

https://github.com/dosco/graphjin

Speakers

Vikram Rangnekar

Founder 42papers.com, 42papers.com

Vikram Rangnekar grew up in Bombay, studied computer science at the University of Delaware. He founded Socialwok a Techcrunch50 startup that was early in the enterprise collaboration space. This led him to Linkedin early in 2010 where he worked on various things from the API platform... Read More →

Thursday May 13, 2021 13:00 - 13:30 EDT
Room #4

Other, Development, Tools, or Utilities

13:00 EDT

Introducing Transit Nodes: A Sparse Data Structure for Recording (Sharding) Denormalizations

At Box, we have a fairly uncommon combination of business requirements that, when taken together, means that our relational data access layer must implement cross-shard move operations and orchestration. These moves can be large, and often need to be split across multiple asynchronous transactions. In the middle of this asynchronous orchestration, objects that would ordinarily live on the same shard, may be split across two shards. Our mapping database must faithfully record where each object currently resides, as well as the intended destination.

Viewed more generally, we have a system described by the following:
* A sharded data store;
* With a tree of relationships between object types that can be traversed upwards and downwards;
* With denormalized data that is propagated through the graph (in our case, the target shard id);
* Where the denormalized data is mutable, and might need to be updated in response to a move operation higher up in the tree;
* Where the application needs to control when and how the denormalized data is updated;
* And the application does not need to use the denormalized data in a relational fashion (it doesn't need to be indexed, used in a WHERE clause, etc.)

We recently finished developing and deploying an enhancement to our mapping system that stores the denormalized data in a sparse data structure with high read performance. When moves are not in progress, no additional data storage is needed besides the graph itself, and reads on the denormalized data are made efficient via caching. When moves are in progress, "transit node" rows are inserted into the mapping database in order to precisely record the new state of objects that have already moved, while retaining the state of the objects that haven't moved yet. After the moves, the transit node rows can be garbage collected.

The transit node concept was carefully designed with a number of invariants, which make it very safe to cache values without worrying about cache corruption or cache invalidation. We designed the concept for ourselves to store shard IDs, but it can theoretically be used for other kinds of denormalizations that match the above generalization.
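The following is a rough sketch of one possible reading of the idea described above, not Box's actual design. It assumes that the shard ID is stored only at the root of each tree, that an object's shard is resolved by walking up to the nearest transit node or to the root, and that transit rows are dropped once the root is repointed. The class `MappingStore` and all of its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MappingStore:
    """Toy in-memory stand-in for a mapping database (hypothetical)."""
    parent: Dict[str, Optional[str]] = field(default_factory=dict)
    root_shard: Dict[str, int] = field(default_factory=dict)
    transit: Dict[str, int] = field(default_factory=dict)  # sparse overrides

    def add(self, obj: str, parent: Optional[str] = None,
            shard: Optional[int] = None) -> None:
        self.parent[obj] = parent
        if parent is None:
            self.root_shard[obj] = shard  # only roots carry a shard ID

    def _root_of(self, obj: str) -> str:
        while self.parent[obj] is not None:
            obj = self.parent[obj]
        return obj

    def shard_of(self, obj: str) -> int:
        """Walk upwards; the nearest transit node wins, else the root's shard."""
        node = obj
        while True:
            if node in self.transit:
                return self.transit[node]
            if self.parent[node] is None:
                return self.root_shard[node]
            node = self.parent[node]

    def mark_moved(self, obj: str, new_shard: int) -> None:
        """Record that obj (and everything below it) already lives on new_shard."""
        self.transit[obj] = new_shard

    def finish_move(self, root: str, new_shard: int) -> None:
        """Repoint the root, then garbage-collect the now redundant transit rows."""
        self.root_shard[root] = new_shard
        stale = [o for o, s in self.transit.items()
                 if s == new_shard and self._root_of(o) == root]
        for obj in stale:
            del self.transit[obj]


store = MappingStore()
store.add("user", shard=1)
store.add("folderA", parent="user")
store.add("folderB", parent="user")
store.add("file1", parent="folderA")
```

In this sketch, calling `store.mark_moved("folderA", 2)` makes `file1` resolve to shard 2 while `folderB` still resolves to shard 1, and `store.finish_move("user", 2)` leaves no transit rows behind. When no move is in progress, `transit` is empty and nothing beyond the tree and the root's shard ID is stored.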

We will briefly cover the context of sharding at Box, to provide the motivation for the transit node concept. The rest of the talk will present the semantics, invariants, and behaviors of transit nodes, and some results from our deployment. My hope is that the concept can be more broadly useful beyond what we originally designed it for.

Speakers

Jordan Moldow

Staff Software Engineer, Box, Inc.

Thursday May 13, 2021 13:00 - 13:30 EDT
Room #3

Other, Development, Tools, or Utilities

13:00 EDT

Introduction to Presto: The SQL Engine for Data Platform Teams

Presto is an open source, high-performance, distributed SQL query engine. Born at Facebook in 2012, Presto was built to run interactive queries on large Hadoop-based clusters. Today it has grown to support many users and use cases including ad hoc query, data lake analytics, and federated querying. In this session, we will give an overview of Presto, including its architecture and how it works, the problems it solves, and the most common use cases. We'll also share the latest innovations in the project as well as what's on the roadmap.

Speakers

Tim Meehan

Chair of Presto Technical Steering Committee & Software Engineer, Meta

Tim is a Software Engineer at Meta working on the core Presto engine. He is also the Chairperson of the Technical Steering Committee of Presto Foundation that hosts Presto under the Linux Foundation. As the chair and a Presto committer, he works with other foundation members to drive... Read More →

Dipti Borkar

Cofounder & Chief Product Officer, Ahana

Dipti Borkar is the Cofounder, Chief Product Officer & Chief Evangelist at Ahana, the Presto company. She is responsible for all things strategy, product and community. She is also the Chairperson of the Presto Foundation, Outreach team. She has over 15 years of experience in data... Read More →

Thursday May 13, 2021 13:00 - 13:30 EDT
Room #10

Presto Community Track

13:30 EDT

A QLDB Cheatsheet for MySQL Users

Amazon's new ledger database (QLDB) is an auditor's best friend and lives up to the stated description of "Amazon QLDB can be used to track each and every application data change and maintains a complete and verifiable history of changes over time."

This presentation will go over what was done to take a MySQL application that provided auditing activity changes for key data, and how it is being migrated to QLDB.

QLDB does use a SQL format for DML, and you can perform the traditional INSERT/UPDATE/DELETE/SELECT. The ability to extend these statements to manipulate Amazon Ion data (a superset of JSON) gives you improved data manipulation, for example with the FROM SQL statement.

Get a blow-by-blow comparison of MySQL structures (multiple tables and lots of columns) and SQL converted into a single QLDB table, with an immutable and cryptographically verifiable transaction log. No more triggers, duplicated tables, or extra auditing for abuse of binary log activity.

We also cover the simplicity of using X Protocol and JSON output for data migration, and the complexity of AWS RDS not supporting X Protocol.

Speakers

Ronald Bradford

Lead Database Engineer/Architect, Lifion by ADP

Thursday May 13, 2021 13:30 - 14:00 EDT
Room #6

Amazon, Cloud Technologies

13:30 EDT

Demystifying Database Performance Issues With SQL Commenter and Cloud SQL Insights

Have you ever tried to troubleshoot a database performance issue in an application that was built using an ORM? ORMs can simplify development of applications that communicate with databases, but since the ORMs are generating the SQL statements, it can be difficult to determine which application code is resulting in slow queries.

SQL Commenter is an open source library that enables ORMs to augment SQL statements with comments about the code that caused their execution, making it easier to correlate your application code with the SQL statements that were generated by the ORM.
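As a rough illustration of the idea, the sketch below appends a comment of sorted, URL-encoded key='value' pairs to a statement. It is loosely modeled on the sqlcommenter comment format and does not use the actual library; the helper name `add_sql_comment` and the tag names are assumptions made for this example.

```python
from urllib.parse import quote


def add_sql_comment(sql: str, **tags: str) -> str:
    """Append a trailing SQL comment that carries application context.

    Hypothetical helper: keys are sorted so the output is deterministic,
    and keys and values are URL-encoded so they cannot close the comment.
    """
    if not tags:
        return sql
    pairs = ",".join(
        f"{quote(key, safe='')}='{quote(value, safe='')}'"
        for key, value in sorted(tags.items())
    )
    return f"{sql.rstrip()} /*{pairs}*/"


# Example: tag a query with the (made-up) controller and route that issued it.
tagged = add_sql_comment(
    "SELECT * FROM users", controller="users", route="/users/:id"
)
```

A statement tagged this way shows up in the database's slow query log with its comment attached, so a slow query can be traced back to the code path that produced it.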

In this session, we will demonstrate how to set up and use SQL Commenter with an application that uses Sequelize.js to diagnose query performance. We'll also touch on the other frameworks and ORMs that sqlcommenter supports as well as how you can view this data in db logs and observability tools, including Cloud SQL Insights.

Speakers

Jan Kleinert

Developer Advocate, Google

Jan Kleinert leads a team of Developer Advocates as part of Google Cloud, focusing on Compute and Databases. Prior to joining Google, she worked in a variety of roles ranging from developer relations to web analytics and conversion optimization.

Thursday May 13, 2021 13:30 - 14:00 EDT
Room #8

Google Community Track

13:30 EDT

Percona Monitoring and Management Customization for Greater Visibility

PMM (Percona Monitoring and Management) delivers out-of-the-box on a rich set of MySQL, PostgreSQL, MongoDB, ProxySQL, and HAProxy service metrics, along with OS resource metrics, providing deep visibility for the thousands of PMM installations worldwide. Did you know that PMM can be enhanced to also integrate metrics about your Data? Once you teach PMM about your Data, you'll be able to bring Alerting and other customizations to your environment - come to this talk in order to learn how to:

Leverage custom queries in MySQL in order to generate new graphs of your data
Compose and interact with PMM's Integrated Alerting
Import and design custom dashboards - build rich visualizations that map your data alongside your system performance
Develop alerts based on query performance variations
Contribute to PMM

Speakers

Michael Coburn

Principal Architect, Percona

Michael joined Percona as a Consultant in 2012 after having worked with high volume stock photography websites and email service provider platforms. With a foundation in Systems Administration, Michael acted as Product Manager responsible for Percona Monitoring and Management (PMM... Read More →

Thursday May 13, 2021 13:30 - 14:00 EDT
Room #1

Monitoring

13:30 EDT

Crave for Speed? Accelerating Open-Source Project Builds

Joining and contributing to an open-source project often involves a significant effort and learning curve and can often end up as a challenging experience.

Helping newcomers who may be junior or experienced developers get up to speed quickly with minimal changes to their local development environment is a desirable path and can help bring in a new generation of developers and hobbyists to sustain and grow the open source communities.

Crave Cloud is a free service that allows developers to submit pull requests to their favorite open-source projects, and receive a private build for testing and download in a fraction of the time it usually takes to build the entire project. All with minimal setup and changes to your local development environment and without taking up the precious cycles of your local machine. Using the elastic capacity of the cloud, Crave can automatically submit your changes, accelerate the build by 6-10x and return a private binary for your testing.

With Crave OSS, we hope to encourage developers to join open source projects, learn by tinkering with code and effortlessly submit changes without making changes to the core code. Join us for a fast-paced introduction to Crave Developer Cloud and get access to a free cloud sponsored by Equinix Metal and Nutanix to support CNCF projects.

Speakers

Yuvraaj Kelkar

CEO/Co-founder, Crave.io

Yuvraaj is a systems engineer who wrote code in his previous jobs that took hours to build and test. He then cofounded Crave.io to improve developer productivity using a remote task execution platform called Crave that reduces the time required to clone, build and test code.

Mehboob Alam

Sr. Solutions Architect, Nutanix, Inc.

Thursday May 13, 2021 13:30 - 14:00 EDT
Room #2

Other, Development, Tools, or Utilities

13:30 EDT

PrestoDB Administration Fundamentals – Why, What and How

The session will discuss how to set up, run, and scale Presto at your organization. You will learn about configuring data sources, memory requirements and monitoring, in addition to how to access the metadata information and get run time information by queries as well as through the admin UI. In case you need to see the live plan, there will be methods shown and explained. We'll also cover some of the tuning mechanisms for Presto. By the end of this session you'll be able to handle PrestoDB in large deployments.

Speakers

Ravi Shankar

CEO, PassionBytes

Ravishankar Nair is currently the CEO of PassionBytes, a Florida-based information technology company providing consultancy to many Fortune 500 companies around the globe in the field of distributed computing, big data and cloud services. Ravi works heavily on big data eco-system... Read More →

Thursday May 13, 2021 13:30 - 14:00 EDT
Room #10

Presto Community Track

13:30 EDT

HammerDB: A Better Way to Benchmark Your Open Source Database

HammerDB is the leading open source database benchmarking software for commercial and open source databases. Hosted by the industry-standard benchmarking body the TPC, HammerDB supports workloads derived from the transactional TPC-C and Analytic TPC-H benchmark specifications.
In this session the lead developer of HammerDB will explain what it does, how it works and how it has been designed to avoid the pitfalls so common to other database benchmarking software to deliver high performance and scalability.
Using PostgreSQL and MySQL we will walk through practical transactional benchmarking scenarios looking at the operating system and database configuration tuning and analysis giving insights into benchmarking skills that can be deployed in your own environment.
Finally, we will look at where HammerDB is going with future development and features planned for 2021 and beyond and how you can get involved in the HammerDB community to help make comparing and contrasting database performance open to all.

Speakers

Steve Shaw

open source database lead, Intel

Shachar Guz

Product Manager, Google

Shachar is a product manager at Google Cloud, where he works on the Cloud Database Migration Service. Shachar has worked in various product and engineering roles and has a true passion for data and for helping customers get the most out of their data. Shachar was formerly a product manager... Read More →

Thursday May 13, 2021 14:00 - 14:30 EDT
Room #8

Google Community Track

14:00 EDT

Validating JSON

JSON, or JavaScript Object Notation, has become the data interchange format of choice. Most relational databases have added a JSON data type (Oracle, PostgreSQL, MySQL) or some accommodation for JSON data (SQL Server, MariaDB). But the free-form nature of JSON is problematic for relational databases, resulting in compromises in speed, in the handling of key-value pairs, and in a general lack of ability to validate data. RDBMSs have long been able to check for missing values, data types, and ranges, but that is lacking in the JSON sphere. However, JSON-Schema.org has developed a vocabulary to annotate and validate JSON documents that describes your data formats, documents your implementation, and provides a way to validate data, allowing both automated testing and data quality assurance.

The work of JSON-Schema.org is heading towards RFC status and could very well remove many of the objections to JSON data use in a relational system. We will look at who is starting to use their methods and at the progress in standardization.
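The kinds of checks described above (missing values, data types, ranges) can be sketched with a hand-written validator that understands a tiny subset of JSON Schema keywords: `type`, `required`, `properties`, `minimum` and `maximum`. This is an illustrative sketch only, not a conforming implementation and not the API of any existing library; the function name `validate` and the example schema are assumptions.

```python
import json

# Mapping from JSON Schema type names to Python types (subset only).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the document passed."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, _TYPES[expected])
        # In Python, bool is a subclass of int; JSON treats them as distinct.
        if expected in ("integer", "number") and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]

    errors = []
    if isinstance(instance, (int, float)) and not isinstance(instance, bool):
        if "minimum" in schema and instance < schema["minimum"]:
            errors.append(f"{path}: below minimum {schema['minimum']}")
        if "maximum" in schema and instance > schema["maximum"]:
            errors.append(f"{path}: above maximum {schema['maximum']}")

    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


# Example schema, invented for this illustration.
person_schema = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
    },
}

errors = validate(json.loads('{"name": "Ana", "age": -1}'), person_schema)
```

Here `errors` holds a single range violation for `$.age`. A real deployment would more likely rely on a maintained JSON Schema implementation or on the database's own validation support than on a hand-rolled checker like this one.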

Speakers

Dave Stokes

MySQL Community Manager, Oracle

Dave Stokes is a MySQL Community Manager for Oracle Corporation and travels extensively to promote MySQL, speaking over thirty times each year for the past several years. He is also the author of MySQL & JSON - A Practical Programming Guide which is a guide for those wishing to take... Read More →

Thursday May 13, 2021 14:00 - 14:30 EDT
Room #7

MySQL Community Track

14:00 EDT

Databases: The Anchor in Your CI/CD Process

DevOps is about improving processes to develop and deliver quality software with both speed and stability. At face value, it's a simple concept. However, fear of instability and the desire to control databases are preventing many organizations from updating their process.

The fear of changing processes and automating is understandable. Databases have never been more important because data has never been more important. Every disaster nightmare a DBA, compliance officer, or PR team can imagine is wrapped around ensuring the database is safe.

In this talk, Kristyl Gomes and Robert Reeves will demonstrate why implementing standardization and automation is necessary to achieve what every team wants—speed and stability, with control.

Speakers

Kristyl Gomes

Director of Quality Engineering, Liquibase

Kristyl has over 15 years of experience in software quality assurance that spans mainframe, desktop, mobile & web applications. At Liquibase, Kristyl is responsible for ensuring the technical quality of all Liquibase products. Kristyl holds a BE degree in Electronics Engineering from... Read More →

Robert Reeves

CDF TOC Member & CTO, Liquibase, CDF & Liquibase

Robert is as passionate about open source and helping developers as he is about punk rock and comic books. His experience includes application delivery from all the stacks to all the platforms and making certain your job never tramples a family event. He has turned his focus on the... Read More →

Thursday May 13, 2021 14:00 - 14:30 EDT
Room #1

Other, Development, Tools, or Utilities

14:00 EDT

GraphQL as Analytical Language for Data Warehouses

GraphQL is a perfect language for querying OLAP databases and building BI analytics on top of data warehouses (DWH). We at Bitquery built an API based on GraphQL, allowing users to easily query the DWH without knowledge of underlying low-level details like servers, databases, cubes and metrics.

We will share our approach, our experience, the tools that we used, and the pros and cons of this approach. Our experience will be useful for developers and users of OLAP, DWH and BI solutions.

Speakers

Aleksey Studnev

CTO, Bitquery LLC

Aleksey is CTO and founder of Bitquery LLC. Before that, he was chief architect and founder of successful start-ups in the AdTech industry, focused on data analytics and optimisation. Aleksey is passionate about applying mathematical approaches in software development.

Thursday May 13, 2021 14:00 - 14:30 EDT
Room #6

Other, Development, Tools, or Utilities

14:00 EDT

5 Ways Facebook’s Ludicrous Usage Drives Presto Innovation

Presto at Facebook has evolved significantly since its inception in 2012. In this session, Ariel will discuss this evolution including fundamental architectural improvements on scale and efficiency, the business cases that drive improvements like these, how the Presto use cases at Facebook have grown (and what they are), and how this all is balanced with features that the Presto community looks for. You'll learn how a company like Facebook thinks about Presto, how it can be used, and where it's going.

Speakers

Ariel Weisberg

Software Engineer, Facebook

Shashank Agarwal

Database Migrations Engineer, Google LLC

Christoph Bussler

Solutions Architect, Google

Chris was always fascinated by systems and data integration between on-premises systems, clouds, and their combination. As a Solutions Architect at Google Cloud (Google, Inc.) he is focusing on databases, data migration, multi-cloud database deployments, and data integration in enterprise... Read More →

Thursday May 13, 2021 14:30 - 15:00 EDT
Room #8

Google Community Track

14:30 EDT

Building A Customer Journey Using Domain Driven Design and GraphQL

Customers expect to have nuanced journeys in their interaction with several aspects of sales, including ordering, shipping and payments. For example, a customer may want to order using a voice channel, send a pinned location on a map as a delivery location and rely on self-service for returns and payments. These ordering journeys are characterized by a reliance on a mesh of API-driven apps. With Dgraph, you get out-of-the-box support for GraphQL APIs.

Another unique aspect of these journeys is the iterative style of development involved. Developers rely directly on feedback from active users, and constantly update the coding artifacts involved. A major impediment to rapid iterations is the time taken by developers to implement changes made to the data model. Developers tend to make changes to the database, and then refactor the API to accommodate the changes across the Create, Read, Update and Delete (CRUD) actions. In this talk, you will learn modeling techniques using GraphQL and Dgraph that support these rapid iteration needs.

Speakers

Anand Chandrashekar

Principal Engineer, Dgraph Labs

Running Presto on AWS With Ahana Cloud

Presto, the fast-growing open source SQL query engine, disaggregates storage and compute and leverages all data within an organization for data-driven decision making. It is driving the rise of Amazon S3-based data lakes and on-demand cloud computing. In this session you'll learn how Ahana Cloud, the only managed service for Presto, simplifies Presto deployment & management on AWS using Kubernetes so data platform teams of any size can use it.

Speakers

Gary Stafford

Solutions Architect, AWS

Gary is a solutions architect at AWS where he works with some of the world's largest Enterprise customers to understand their business drivers, assess application portfolios, and design reliable and cost-effective cloud native architectures. Previously he was an enterprise architect... Read More →

Dipti Borkar

Cofounder & Chief Product Officer, Ahana

Thursday May 13, 2021 14:30 - 15:30 EDT
Room #10

Presto Community Track

15:00 EDT

Docstore - Uber’s Highly Scalable Distributed SQL Database

Uber had 93 million monthly active platform consumers in Q4 2020 and there were more than 5 billion trips on our platform in 2020 alone. No wonder we have to deal with a massive volume of data. The real-time nature of the Uber platform also imposes certain restrictions related to availability and consistency.

This is exactly why we built Docstore. Docstore is a general-purpose multi-model database that provides a strict serializability consistency model on a partition level and can scale horizontally to serve high volume workloads. It is currently in production and is serving business-critical use cases.

In this session we will be doing an in-depth study of the architecture of Docstore.

Speakers

Ovais Tariq

Senior Manager, Uber Technologies

Ovais is a Sr. Manager in the Core Storage team at Uber. He leads the Operational Storage Platform group with a focus on providing a world-class platform that powers all the critical business functions and lines of business at Uber. The platform serves tens of millions of QPS with... Read More →

Himank Chaudhary

Staff Software Engineer, Uber Technologies

Himank is the Tech Lead of Docstore at Uber. His primary focus area is building distributed databases that scale along with Uber's hyper-growth. Prior to Uber, he worked at Yahoo in the mail backend team to build a metadata store. Himank holds a master's degree in Computer Science... Read More →

Thursday May 13, 2021 15:00 - 15:30 EDT
Room #5

MySQL, All Databases

15:00 EDT

Monitoring Hundreds of RDS PostgreSQL Instances with PMM: The Rappi Case

The popularity of DBaaS cannot be denied. These services are incredibly helpful for growth, not only because they assist with operational tasks but also because they provide monitoring and visibility of database internals. In the case of RDS, Amazon provides pretty cool features beyond the well-known CloudWatch: things like Enhanced Monitoring or Performance Insights are fantastic... but they come with an (unusually high) cost.

Enter PMM: the Percona Monitoring and Management tool. Highly customizable and based on well-known open source tools, PMM is a great alternative, especially for DBA teams that require a deep understanding of what is going on inside their databases.

However, PMM also required a considerable amount of time and effort to get it working the way we wanted, especially for PostgreSQL.

Our journey involves work on several aspects like:
- PMM server capacity
- Limitations due to running on a DBaaS
- Several dashboard customizations
- Additional data sources via custom queries and textfile collectors
- Grafana tuning
- Prometheus magic
- And some hacking...

Speakers

Daniel Guzman Burgos

Performance & Scalability DBA, Rappi Inc.

Daniel studied Electronic Engineering, but quickly became interested in all things data. He has worked as a DBA since 2007 for several companies, including a seven-year journey at Percona as the MySQL Tech Lead for the Managed Services department. He is currently a member of the Performance... Read More →

Rodrigo Cadaval

Database Engineer Lead, Rappi Inc.

Rodrigo studies Information Systems Engineering. He started working in 2014 as a Full Stack Developer (PHP - Laravel), and data was always his main focus and interest, to the point of replacing multiple backend processes with stored procedures. In 2016 he became a PostgreSQL DBA, incorporating... Read More →

Thursday May 13, 2021 15:00 - 16:00 EDT
Room #4

Monitoring, Development, Tools, or Utilities

15:00 EDT

OtterTune: Using Machine Learning to Automatically Optimize Database Configurations

Database management systems (DBMS) expose dozens of configurable knobs that control their runtime behavior. Setting these knobs correctly for an application's workload can improve the performance and efficiency of the DBMS. But such tuning requires considerable effort from experienced administrators, which is not scalable for large DBMS fleets. This problem has led to research on using machine learning (ML) to devise strategies to optimize DBMS knobs for any application automatically. The OtterTune database tuning service from Carnegie Mellon uses ML to generate and install optimized DBMS configurations. OtterTune observes the DBMS's workload through its metrics and then trains recommendation models that select better knob values. It then reuses these models to tune other DBMSs more quickly.

In this talk, I will present an overview of OtterTune and discuss the challenges one must overcome to deploy an ML-based service for DBMSs. I will also highlight the insights we learned from real-world installations of OtterTune to tune MySQL, PostgreSQL, and Oracle.

Speakers

Andy Pavlo

Associate Professor (CMU), Co-Founder (OtterTune), Carnegie Mellon University

[Andy Pavlo](http://www.cs.cmu.edu/~pavlo/) is an Associate Professor of Databaseology in the Computer Science Department at Carnegie Mellon University. He is also the co-founder of [OtterTune](https://ottertune.com).

Thursday May 13, 2021 15:00 - 16:00 EDT
Room #1

Other, Development, Tools, or Utilities

15:00 EDT

Prepping Kubernetes for Stateful Workloads Pt.1

Data and Kubernetes have historically had an oil and water relationship. Keeping data alive in an ecosystem where everything is ephemeral is a messy proposition at best. Many solutions opt to simply use external services for anything that needs to survive past the typical life cycle of a pod.

In this 2-part hands-on session, I will cover an open-source solution and some best practices to get whatever stateful workloads you have up and running and keep them around long after the pods of today are a distant memory.

Speakers

Eric Zietlow

Director of Developer Relations, MayaData

Eric has been everything from a full-stack developer to a distributed systems solutions architect. He takes his varied experience into his current role in developer relations at MayaData and as an ambassador for the Data on Kubernetes Community.

Thursday May 13, 2021 15:00 - 16:00 EDT
Room #2

Other OSDB Topics

15:30 EDT

Efficiently Deploying PostgreSQL Instances

In this talk we will review how to deploy PostgreSQL environments, to be able to have any version of PostgreSQL running within minutes... or even seconds! We will show you what cool tools we use in the Percona Support team to efficiently deploy from standalone servers to more complex replication and HA topologies. After attending, you will have all the knowledge you need to start testing your applications against fully functional PostgreSQL instances... fast!

Speakers

Agustin Gallego

Senior Support Engineer, Percona

Thursday May 13, 2021 15:30 - 16:00 EDT
Room #6

Other, Development, Tools, or Utilities

15:30 EDT

Presto and Apache Iceberg

Apache Iceberg is an open table format for huge analytic datasets. At Twitter, engineers are working on the Presto-Iceberg connector, aiming to bring high-performance data analytics on Iceberg to the Presto ecosystem. In this session, Chunxu will share what they have learned during the development and the future work of interactive queries.

Speakers

Chunxu Tang

Senior Software Engineer, Twitter

Chunxu is a senior software engineer in Twitter's Interactive Query team where he works on developing and maintaining Presto, Druid, and graph analytics services. He received his doctoral degree from Syracuse University, where he did research on machine learning and distributed collaboration... Read More →

Thursday May 13, 2021 15:30 - 16:00 EDT
Room #10

Presto Community Track

15:30 EDT

MySQL Architectures in a Nutshell

Following MySQL InnoDB Cluster as our first, fully integrated MySQL High Availability solution based on Group Replication, MySQL Shell 8.0.19 includes MySQL InnoDB ReplicaSet which delivers another complete solution, this time based on MySQL Replication.

The basic idea for InnoDB ReplicaSet is to do the same for classic MySQL Replication as InnoDB Cluster did for Group Replication. We take a strong technology that is very powerful but can be complex, and provide an easy-to-use AdminAPI for it in the MySQL Shell.

With just a few easy-to-use Shell commands, a MySQL Replication database architecture can be configured from scratch, including:

- Data provisioning using MySQL CLONE
- Setting up replication
- Performing manual switchover/failover

And we keep improving: join the session to discover the latest developments related to MySQL Database Architectures.

Speakers

Kenny Gryp

MySQL Product Manager, Oracle MySQL

MySQL Product Manager focussing on InnoDB, Replication and all things High Availability.

Thursday May 13, 2021 15:30 - 16:30 EDT
Room #7

MySQL Community Track

16:00 EDT

How Twitter Runs Presto at Scale in the Cloud

Presto is a widely adopted SQL engine for federated querying across multiple data sources. With Presto, you can perform ad hoc querying of data in place.

In this session, Twitter engineer Beinan Wang will share how they run Presto at scale, with over 3K Presto workers, 10 million queries, and a highly scalable query predictor service. At Twitter, this service helps improve the performance of Presto clusters and provides expected execution statistics on Business Intelligence dashboards.

Speakers

Beinan Wang, Ph.D.

Senior Staff Software Engineer, Alluxio

Beinan builds large-scale distributed SQL systems (Presto & Hive) for Twitter's data platform team.

Thursday May 13, 2021 16:00 - 16:30 EDT
Room #10

Presto Community Track

16:00 EDT

MySQL & PostgreSQL Migration to AWS at Groupon

THE MANY FLAVORS OF REPLICATION

This talk presents an overview of the many forms of replication currently supported in Postgres.

Here's a breakdown of the topics that will be covered:
- We start with the replication configurations, which include:
  - Multi-node PRIMARY-STANDBY replication cluster
  - Cascading replication
  - Other:
    - Replicating to another READ-WRITE host
    - active-active
    - analytics
    - Replicating to a READ-ONLY host using a time delay
    - Working with detached READ-ONLY systems

- There are three (3) forms of replication technology that have been developed for database systems:
  - statement replication
  - trigger replication
  - binary (the most commonly used solution)

- There are two types, or classifications, of replication used by Postgres:
  - asynchronous replication
  - synchronous replication

Of course, this is by no means a complete list, as many methods and variations are possible in Postgres.

Speakers

Robert Bernier

PostgreSQL Consultant, Percona

Thursday May 13, 2021 16:00 - 17:00 EDT
Room #5

PostgreSQL, All Databases