A few weeks after AWS released Amazon Keyspaces for Apache Cassandra, it’s now DataStax’s turn. After a false start last year, DataStax is going live with its long-awaited Astra Database-as-a-Service (DBaaS). Contrary to what we reported a few weeks back, Astra will be based on DataStax Enterprise (DSE), not the bare-bones DataStax Distribution of Apache Cassandra (DDC). It is the same offering that was previously branded as Apollo. And most importantly, the new service, built on a cloud-native Kubernetes (K8s) infrastructure, is designed to be cloud vendor independent.
The announcement of Astra comes in a week that, with inspiration from Eric David Benari, we’re informally calling “Database Week.” With DataStax, Redis, and Hitachi Vantara all holding digital online events (instead of conferences) this week, expect to get a lot of database messages from them and others in the next few days.
Arguably, Cassandra is the last popular open source database to get a managed cloud service. Excluding SQLite, it is the last of the top two dozen databases ranked in popularity by DB-Engines to get there. Until a month ago there were none, and now there is a real choice: DataStax’s offering, which stays close to the Apache Cassandra open source engine, and AWS’s Keyspaces service, which runs on a different storage engine but is API-compatible with Cassandra. As we noted a few weeks back, Keyspaces largely follows the pattern that AWS established with Aurora and DocumentDB.
At launch, Astra will be available on AWS and Google Cloud, but it is with the latter that DataStax has the closer relationship, which so far includes joint marketing and integration with the Google Cloud console. Initially it will be a single-tenant implementation, but that will change later, along with support for other public clouds such as Microsoft Azure.
Simplification of Cassandra
The arrival of managed cloud services for Cassandra is key to making this high-performance, highly scalable distributed database available to a wider audience. Cassandra has long been known for its performance and scale, but never for its ease of use. Given these obstacles, it seems more than a minor miracle that Cassandra ranks as high as the 12th most popular database tracked by DB-Engines. But as the popularity of AWS’s DynamoDB service shows, there is great demand for distributed databases.
Of course, managed cloud services eliminate most, if not all, of the housekeeping, especially when it comes to patches, maintenance and upgrades. But particularly critical were changes in management and deployment, many of which relate to modernization with the new K8s operator and the associated Management API (which runs as a K8s sidecar). The Management API wraps an abstraction layer around JMX (Java Management Extensions), the interface Cassandra uses to expose monitoring; JMX was used because Cassandra is written in Java. Without the API, JMX integration would be far more brittle, because JMX is a low-level interface that would otherwise need to be adapted for each platform it runs on. The new API is modular and works not only with K8s, but also with other orchestration tools such as Puppet.
DataStax has also open sourced its new Metrics Collector for Cassandra, which was designed to integrate with Prometheus, the open source monitoring and alerting tool, and with Grafana for visualization. The association with Prometheus and Grafana means that DataStax no longer needs to reinvent the wheel when it comes to monitoring and alerting, and with Astra it has developed a template with prebuilt dashboards and best practices to help customers decide what to instrument and monitor, which has been an important stumbling block with traditional Cassandra implementations.
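For readers unfamiliar with how Prometheus-style monitoring works, here is a minimal sketch in Python, not DataStax’s code. It assumes metrics are published in the Prometheus text exposition format and shows how a tool might read them and flag nodes over a threshold. The metric names, label names, and values are hypothetical and are not taken from the Metrics Collector; the parser is simplified and ignores optional timestamps.

```python
import re

# Hypothetical sample in the Prometheus text exposition format.
# Metric names, labels, and values are invented for illustration only.
SAMPLE = """\
# HELP example_read_latency_ms Illustrative read latency in milliseconds.
# TYPE example_read_latency_ms gauge
example_read_latency_ms{node="node1"} 2.5
example_read_latency_ms{node="node2"} 4.0
# TYPE example_pending_compactions gauge
example_pending_compactions{node="node1"} 3
"""

# Simplified grammar: metric name, optional {labels}, then a numeric value.
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def nodes_over_threshold(samples, metric, threshold):
    """List the nodes whose value for `metric` exceeds `threshold`."""
    return sorted(
        labels.get("node", "?")
        for name, labels, value in samples
        if name == metric and value > threshold
    )
```

In practice, Prometheus itself scrapes and stores these samples and Grafana charts them; the point of the sketch is only that the format is plain text and easy for standard tooling to consume, which is why building on it spares a vendor from writing its own monitoring stack.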
The cloud-native journey
As noted, Astra will be based on DSE, which is DataStax’s commercial implementation of Apache Cassandra, with added features such as enhanced security, an administration console, tiered storage, in-memory support and search, plus options for analytics and graph.
The new K8s operator was a 180-degree shift from DataStax’s original strategy for its planned cloud service. In the first iteration, the platform would have been extended to work with each cloud individually, but the Google Cloud partnership announced a year ago brought about the change that resulted in Astra. This is where the plans for the K8s operator came in, and along with it the new Management API to simplify integration with JMX.
Along the way, DataStax will refactor the platform into microservices that allow separation of compute from storage, support multitenancy, enable serverless operation, and provide far more flexibility in scaling. For example, once DSE on Astra is refactored into microservices, the customer could specify whether a compute node should be scaled up or scaled out across multiple nodes, depending on the required level of service and budget. In the future, DataStax wants to make these optimizations easy and automatic for Astra users.
Open source restructuring
After a few years of emphasizing differentiation from Apache Cassandra, DataStax is now looking to align its platform with the Apache project, and in the long run is likely to follow a path similar to Cloudera’s. The underlying database will be open source, but the binaries implementing features such as the management console will be specific to the commercial offering.
That’s the approach that DataStax took to the big design change that resulted in Astra: transitioning to a cloud-native architecture based on microservices, containers and K8s. It was a 180-degree shift from the original strategy, which took a more monolithic approach to adapting the platform to specific clouds. While the ultimate decision rests with the Apache community, DataStax plans to contribute its cloud-native extensions to the open source project.
A first step
DataStax is targeting those who want a purer implementation of Apache Cassandra. Like AWS, it promises a familiar developer experience that supports the Cassandra tools and APIs that developers are used to. But it will stay closer to Apache Cassandra in its CQL support, table space and key management, along with some under-the-hood differences in features such as load balancing. In addition to the paid tier, DataStax will also offer a free community tier, capped at 10 GBytes, for developers who want to learn Cassandra.
While Astra will initially be available on AWS and Google Cloud, it’s on the latter where the options become interesting, because DataStax is one of the databases in Google Cloud’s open source database partner program. In the short term, this means joint marketing and integration with the Google Cloud console, but in the longer term we would like to see integration with some of Google Cloud’s offerings for data flow, analytics and machine learning.
As mentioned above, this is just the first step in developing DSE and Cassandra into a cloud-native database. For the moment, AWS’s approach, leveraging its existing storage engines and experience with DynamoDB, has given it an edge by supporting serverless operation at launch. Although DataStax’s first launch achieves much of the goal of simplification, it is the planned multitenancy and serverless operation that will ultimately make the service much more cost-competitive. We expect the initial, single-tenant Astra launch to appeal mostly to DataStax’s existing customer base, with multitenancy being the key to reaching a wider audience. But as we noted in our piece on Keyspaces, there is one more important step that we want to see: better tools for application developers to model schemas and create apps running against Cassandra.