Oracle takes a new twist on MySQL: Adding data warehousing to the cloud service

MySQL, the open source relational database that came to Oracle through the Sun Microsystems acquisition, originated as a relatively simple relational database that was known for one task: transaction processing. In an announcement today, Oracle is unveiling an extended version of MySQL that takes it into data warehousing territory. It is releasing a new managed MySQL database service on Oracle Cloud Infrastructure (OCI) that will support both transaction and analytic processing workloads.

That marks a key change for MySQL users. With few if any analytic options available, MySQL users who needed a data warehouse typically resorted to ETL to move data into a separate database. In the new Oracle cloud service, analytics is part of the same offering, which, thanks to liberal use of in-memory technology, eliminates the need to run ETL.

Until now, MySQL has been primarily restricted to transaction processing because it lacks features that are critical to analytics, such as support for materialized views for queries relying on aggregated or derived data. Until now, too, Oracle has not been top of mind for MySQL users; by coming out with a highly differentiated service, it aims to change that.
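To make the materialized-view gap concrete, here is a minimal sketch of the usual workaround: a summary table that is rebuilt by hand. It uses Python's built-in SQLite module as a stand-in rather than MySQL, and the table names (`orders`, `sales_by_region`) and helper functions are hypothetical, chosen only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("east", 10.0), ("east", 15.0), ("west", 7.5)],
    )
    return conn


def refresh_sales_summary(conn):
    """Rebuild the summary table from the base table (a manual 'refresh')."""
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
    )


def read_summary(conn):
    """Read the precomputed totals instead of re-aggregating the base table."""
    cursor = conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    )
    return dict(cursor.fetchall())
```

The catch is that the summary goes stale as soon as new rows arrive and stays stale until someone calls `refresh_sales_summary` again. A database with materialized views takes on at least part of that bookkeeping itself.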

The approach that Oracle is taking to extend MySQL is not all that unusual in the open source database world; it is adding extensions rather than modifying the core engine to deliver new functionality. That practice is common in the PostgreSQL community. For instance, the Greenplum database, now part of Pivotal, adapted PostgreSQL to support analytics. Citus Data, now part of Microsoft, extended PostgreSQL to support sharded transaction processing. And the list goes on.

Oracle is not the first to innovate atop MySQL. AWS extended MySQL in Aurora for large, multi-terabyte OLTP deployments where parallel processing can support high concurrency. But Aurora uses a different storage engine and maintains compatibility at the API level. By contrast, Oracle kept the original storage engine but added a new one alongside it for real-time analytics, making it easier for developers and data scientists to do their work without having to perform ETL and go to another database for analytics.

MySQL was traditionally not thought of as an analytic target because it lacked some of PostgreSQL’s richer functionality. Beyond lacking support for materialized views, MySQL had more limited capabilities for dropping or truncating tables, running joins and triggers, and supporting non-SQL languages for stored procedures.

But MySQL has an important ace in the hole. Unlike PostgreSQL, it supports pluggable storage engines. That set the stage for Oracle to pull out all the stops for extending MySQL: turn it into a combined transaction and analytic platform that keeps data in the same place and then leverage its cloud infrastructure to aggressively price it.

It starts with what Oracle terms a “hybrid columnar in-memory” data store. At first glance, that sounds like applying the technology Oracle already offers as an option with its flagship database: Oracle Database In-Memory, which runs alongside the row store. By default, customers specify what data goes into the hybrid data store, but they can also flip a switch that automatically puts all data there (although for most installations, that is probably not the most economical option). It also sounds a lot like what MariaDB offers as an option as part of its platform with its column store.

A key difference, of course, is that while MariaDB resembles MySQL because of its heritage, Oracle MySQL is, literally, MySQL. But there are several more important differences. First, Oracle MySQL cloud customers don’t need to specify which queries should be executed by the in-memory engine. This is done automatically by the MySQL optimizer. Then there’s another key difference: In the in-memory column store, Oracle MySQL Analytics Engine also does vector processing on the rows, where multiple repetitive instructions are pipelined into a single operation. This is akin to what Actian does with Actian Vector. Again, the optimizer, not the user, chooses the processing path.
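For readers less familiar with column stores, the toy sketch below contrasts a row-at-a-time scan with a batched scan over a single column. It illustrates the general idea only and is not Oracle's implementation: real engines rely on CPU-level SIMD instructions, and the data, function names, and batch size here are all hypothetical.

```python
from array import array

# Row layout: one record per order, as a transactional engine would hold it.
ROWS = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "east", "amount": 15.0},
    {"id": 3, "region": "west", "amount": 7.5},
]


def to_columns(rows):
    """Pivot row records into one contiguous array per numeric column."""
    return {
        "id": array("q", (r["id"] for r in rows)),
        "amount": array("d", (r["amount"] for r in rows)),
    }


def sum_row_at_a_time(rows):
    """Touch every full record just to read one field from each."""
    total = 0.0
    for record in rows:
        total += record["amount"]
    return total


def sum_column_batches(columns, batch_size=2):
    """Scan only the 'amount' column, handling a batch of values per step."""
    amounts = columns["amount"]
    total = 0.0
    for start in range(0, len(amounts), batch_size):
        total += sum(amounts[start:start + batch_size])
    return total
```

Both functions return the same total. The difference is that the columnar version never reads the `id` or `region` values, and it applies the same operation to a block of values at a time, which is the pattern that vector hardware accelerates.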

There are other optimizations that accelerate performance and support terabyte-scale data volumes. For instance, there is dynamic workload partitioning that scales out parallel processing and distributed query processing algorithms for distributed joins. That is enabled by hash tables cached in the processor for directing joins.
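A rough sketch of how a partitioned hash join works in general may help here. This is the textbook technique, not Oracle's actual algorithm, and every name in it (`partition_by_key`, `hash_join`, `partitioned_hash_join`, the sample column names) is hypothetical. Each partition could be handed to a separate worker or node; here they are simply processed in a loop.

```python
from collections import defaultdict


def partition_by_key(rows, key, num_partitions):
    """Route each row to a partition based on the hash of its join key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def hash_join(left, right, key):
    """Build a hash table on the left input, then probe it with the right."""
    table = defaultdict(list)
    for left_row in left:
        table[left_row[key]].append(left_row)
    joined = []
    for right_row in right:
        for left_row in table.get(right_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


def partitioned_hash_join(left, right, key, num_partitions=4):
    """Join matching partitions independently, then combine the results."""
    left_parts = partition_by_key(left, key, num_partitions)
    right_parts = partition_by_key(right, key, num_partitions)
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(hash_join(left_part, right_part, key))
    return result
```

Because rows with the same key always land in the same partition, no partition needs to see another's data. That independence is what lets the work spread across cores or machines, and it keeps each hash table small enough that it has a chance of staying in cache.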

Oracle claims that, with optimizations that are baked into its native cloud infrastructure, it can underprice competing cloud data warehousing services. As to benchmarks, Oracle cites its own TPC-H runs, in which its new service without an index (data warehouses rarely run with indexes) outperformed a conventional MySQL implementation with an index by 400x on a 400 GB data set running on a 64-core machine. Not surprisingly, Oracle also cites superior price/performance from its own benchmarks against its favorite target, Amazon Redshift.

Clearly, Oracle is looking outside its core client base. While it owns MySQL, Oracle is better known for high-end enterprise databases with its eponymous platform; not surprisingly, MySQL has been known as Oracle’s poorer stepsister. Nonetheless, as an open source platform, MySQL continues to be popular, and in the November 2020 DB-Engines ranking, it comes in second only to Oracle Database itself.

As MySQL forms the pillar for open source database services from each of the usual suspects, Oracle’s strategy was not to make its MySQL service a carbon copy of Amazon RDS for MySQL, Azure Database for MySQL, or Google Cloud SQL. Instead, Oracle set out to differentiate its offering by adding analytics and then pricing the service aggressively. Putting its money where its mouth is, Oracle is open sourcing its benchmark suite on GitHub so customers can run A/B tests themselves comparing Oracle MySQL Cloud to any of the other MySQL cloud services, rather than relying on Oracle’s own numbers.

Oracle has not been known for its ability to draw new database customers; most cloud providers heavily publicize the customers that they have won from Oracle. But the early track record with Oracle’s Autonomous Database has shown a surprisingly strong proportion of new customer wins. Given that Oracle’s prime target for Autonomous Database has been the bulk of its existing base, greenfield customers are icing on the cake. For MySQL, it’s a different story: Oracle is a challenger in this market, so the prime target will be new customers. Oracle will need to find new ways to get its message out and build mindshare with an audience that hasn’t thought of Oracle Cloud as the likely place for running MySQL.