Making Apache Cassandra great again: DataStax goes cloud, Kubernetes, open source and multi-model

Leading with code to make Cassandra ubiquitous: this is the key message DataStax is putting forward as it releases its open source Kubernetes operator for Cassandra, along with improved data flow and graph querying.

In these stormy times, both for open source software (OSS) as a whole and for DataStax as a database vendor built around an open source core, Apache Cassandra, this is worth exploring. ZDNet caught up with Patrick McFadin, DataStax VP of Developer Relations, to discuss the ins and outs.

Towards cloud and open source via Kubernetes

As we have highlighted on Big on Data, data is moving to the cloud, and that move is built on OSS and on Kubernetes. So it should come as no surprise that DataStax chose Kubernetes to showcase its contribution to the Apache Cassandra community.

With the Cassandra Kubernetes operator, DataStax claims that businesses and users get a consistent scale-out stack for compute and data. The question is: where exactly does it come from? Was it developed by DataStax and then donated to the community as a concrete sign of a new approach?

DataStax recently recruited a number of executives to renew its leadership. Chet Kapoor and Sam Ramji, the new CEO and CSO, respectively, are both ex-Googlers. In a recent interview, Ramji highlighted some key areas: reconnecting with the OSS community, and emphasizing services and support that make life easier for developers.

“We’re embracing open source again,” McFadin confirmed. McFadin was instrumental in making the Kubernetes operator happen, on both the technical and the social level. Kubernetes is seeing rapid uptake: according to a 2019 Cloud Native Computing Foundation survey, 78% of respondents use Kubernetes in production, up from 58% the previous year.


DataStax goes cloud and open source via Kubernetes. (Image: DataStax)

This means that various organizations have been working on getting Cassandra, which is among the 10 most popular databases in the world according to DB-Engines, to run on Kubernetes. This was the backdrop McFadin was working against.

On the one hand, as he noted, having many implementations of the same thing means that people can be on the same page in terms of what’s important to work on. On the other hand, integration is a balancing act, both technically and socially.

DataStax has partnered with Sky, Orange, Netflix, Target and many other teams in the Cassandra community to improve and promote the operator. McFadin, who has a longstanding commitment to OSS, pointed out the obvious: Each of these teams is focused on solving the issues that matter most to them.

DataStax's approach, per McFadin, is not to dump code on GitHub and expect the community to adopt it as the one and only way to work with Kubernetes. DataStax has developed more than just an operator: there is also a Kubernetes sidecar and a management API. DataStax uses these to build its own cloud offering, and they are now available for everyone to use.

Actions and words, code and advocacy

DataStax’s managed cloud offering, formerly called Constellation, is now being rebranded as Astra. It is expected to be generally available soon. McFadin acknowledged that Cassandra has a reputation for being robust but difficult to operate. He also referred to the upcoming version 4.0 of Cassandra, to which DataStax has pledged to contribute. He said it will be the best release yet, not because of sexy new features, but because of how stable it will be.

Speaking of cloud, open source and community, the discussion turned to a broader topic. McFadin described reconnecting with the Cassandra community and the Apache Software Foundation (ASF) as a humbling experience. He said people were willing to listen, but to gain their trust, DataStax would have to let actions speak louder than words. In other words, DataStax is backing its intent with what counts most in OSS: code. Or does it?


There are many pieces in the open source software puzzle. (Image: Photo by Hans-Peter Gauster on Unsplash)

Valuing contributions in terms of code alone is not the only way to think about OSS. The ASF, for one, favors community over code. Measuring contributions in terms of code is not trivial, but it is sufficiently well understood and can be done. But what about contributions to the community, for example in terms of advocacy?

McFadin referred to his own experience with the DataStax advocacy team. Against this background, I mentioned a few metrics that can be used to measure community engagement and contributions: the number of workshops and topics covered, participation in them, questions answered in public forums, and so on.

We have previously considered the question of whether measuring contributions and rewarding contributors could be a fairer way to grow and sustain OSS. McFadin had no answer to that. However, he pointed out that healthy OSS communities attract contributions from many actors and in many ways.

In any case, DataStax is not considering a change of license to discourage cloud providers from offering Cassandra as a service. An Apache license and a commercial license are all that is needed, McFadin said, and if Amazon wants to offer Cassandra as a service, so be it.

Towards a multi-model future via graph

Reconnecting with the community sounds like a good thing. So does more functionality for open source Cassandra. However, for DataStax this creates a familiar and inevitable tension. Which features stay in DataStax Enterprise (DSE), and which make it into open source Cassandra?

McFadin responded by saying that DataStax does not expect its customers to be 100% DataStax shops. He went on to add that customers value not only features but also a partner they can trust, and that is what DataStax wants to be. The recent acquisition of The Last Pickle (TLP) should also be seen in the light of a new emphasis on a service-based model.

However, with these important issues in focus, we risk overlooking something else that is also important: DataStax’s move towards becoming a multi-model database. DataStax added graph features to DSE a long time ago. Until now, however, it was not really possible to mix and match native Cassandra data and graph data.


Adding graph query capabilities on top of native DataStax data via Gremlin may be the first step towards a multi-model future. (Image: Apache TinkerPop)

As of the recently released DSE 6.8, graph queries can work directly on native Cassandra data models. Data inserted into DSE becomes available for querying with Gremlin. This allows developers to build multi-model applications with joins, pattern matching and traversals over large, distributed datasets.

In addition to empowering graph users, this is also a big win for “traditional” DSE users. As McFadin noted, few developers are religiously devoted to one data model or another. Most of them will just use the right tool for the job.

By letting DSE users add graph queries to their arsenal, DSE gains a number of things. First, the ability to do joins: graph excels at this, and DSE users will benefit. Perhaps more importantly, DataStax is taking the first step towards a multi-model future. For making Cassandra ubiquitous, this may work well.
