Data analytics contender Databricks offers a platform that, along with the open source Apache Spark technology on which its core is based, has long been a favorite for tackling streaming data, data engineering and machine learning workloads. It has also “moonlighted” as a SQL analytics platform, capable of accommodating popular business intelligence (BI) tools and their queries in a pinch. Today, Databricks is announcing SQL Analytics, a set of interface and infrastructure capabilities that transform SQL analytics on the platform from a mere sideline into a first-class use case.
The big-ticket items are the addition of a SQL Analytics Workspace user interface to the Databricks platform, as well as the ability to create dedicated SQL Analytics Endpoints. The former uses technology derived from Databricks’ acquisition of Redash, announced in June. The latter are clusters dedicated to ad hoc analysis / BI workloads, and they allow customers to take full advantage of the Delta Engine capabilities added to the core Databricks platform, also in June.
Beyond notebooks
Databricks platform users – including those on both Azure Databricks and the Unified Data Analytics Platform service hosted on Amazon Web Services – already had the ability to create SQL-based notebooks. Cells in these notebooks can hold SQL queries and present the results in tabular form or as relatively simple visualizations, which in turn can be combined into a special dashboard view of the notebook. These features accommodate rudimentary analysis and BI workloads, but in reality they function more as a convenience in service of the data engineering and machine learning workloads at which Databricks has excelled.
The new SQL Analytics Workspace is available in a completely separate view from the standard Databricks workspace, via a sort of switcher menu that is accessible by clicking a button at the bottom left of the Databricks UI. It provides a full-screen query view with robust syntax completion, a major productivity boost over free-form text entry in a notebook cell. Also present is a list of existing databases, tables, and columns at the bottom left of the screen – something that in the standard workspace requires shifting focus away from the notebook. All of this is depicted in the figure at the top of this post.
Data visualization is an important strength of the Analytics Workspace, with support for multiple visualizations per query. Each visualization can be added to an externally defined dashboard that exists independently of the saved query. All of this compares favorably with SQL notebooks, which allow either a tabular view or a single visualization of the data returned by each query in the notebook. In addition, more visualization types are available in an Analytics Workspace query than in a notebook. Here is an example of such a visualization:
The Analytics Workspace also supports rule-based alerting, driven by specific query result conditions that are monitored at a configurable frequency. Queries can also be reviewed in full through a dedicated history view in the Analytics Workspace.
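To make the idea of condition-driven alerting concrete, here is a minimal Python sketch. It is not the Databricks or Redash API: the `AlertRule` class, its fields and the sample rows are all hypothetical, chosen only to show a rule that inspects a query result and would be re-checked at a configurable interval.

```python
from dataclasses import dataclass
import operator

# Hypothetical set of comparisons an alert rule may use.
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass
class AlertRule:
    name: str
    column: str            # column of the query result to inspect
    op: str                # one of the keys in _OPS
    threshold: float
    interval_seconds: int  # how often a scheduler would re-run the query

    def triggered(self, rows):
        """Return True if any result row satisfies the condition."""
        compare = _OPS[self.op]
        return any(compare(row[self.column], self.threshold) for row in rows)


# Stand-in for the rows a saved query would return on one scheduled run.
rows = [
    {"region": "east", "failed_orders": 3},
    {"region": "west", "failed_orders": 12},
]
rule = AlertRule("too many failures", "failed_orders", ">", 10, interval_seconds=300)
fired = rule.triggered(rows)
print(fired)
```

In a real deployment the scheduling and notification would be handled by the platform; the sketch covers only the evaluation step.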
Means to an endpoint
The capabilities of SQL Analytics go beyond the user interface, alerting features and history view. New SQL Analytics Endpoints are special Databricks clusters dedicated to serving BI / ad hoc analytics queries. While these clusters offer access to the same data that is visible to conventional clusters, they isolate their workloads, providing greater concurrency. These clusters are provisioned based on “T-shirt” sizes (i.e., small, medium, large, etc.), which avoids the need to specify the number and types of master and worker nodes.
Auto-scaling capabilities are available so that additional clusters can be provisioned and de-provisioned based on workload demand. Auto-scaling is governed by a customer-specified minimum and maximum cluster count. BI tools and other query clients simply connect to a single endpoint and can remain blissfully unaware of the existence of multiple clusters. The Databricks platform routes each endpoint request to a specific cluster, load balancing as appropriate.
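The interplay of a single endpoint, minimum and maximum cluster counts, and load balancing can be sketched as a toy model in Python. Everything here is an assumption made for illustration, including the `Endpoint` class, the `slots_per_cluster` capacity and the least-loaded routing policy; the article does not describe how Databricks actually routes or scales.

```python
class Endpoint:
    """Toy model: one connection point in front of several clusters."""

    def __init__(self, min_clusters, max_clusters, slots_per_cluster):
        if not 1 <= min_clusters <= max_clusters:
            raise ValueError("need 1 <= min_clusters <= max_clusters")
        self.min_clusters = min_clusters
        self.max_clusters = max_clusters
        self.slots = slots_per_cluster
        # Each entry is the number of queries running on that cluster.
        self.clusters = [0] * min_clusters

    def submit(self):
        """Route one query; return the index of the cluster that took it."""
        # Pick the least-loaded cluster (first one wins on ties).
        idx = min(range(len(self.clusters)), key=self.clusters.__getitem__)
        if self.clusters[idx] >= self.slots:
            if len(self.clusters) >= self.max_clusters:
                raise RuntimeError("all clusters busy; query would have to queue")
            self.clusters.append(0)  # scale up by one cluster
            idx = len(self.clusters) - 1
        self.clusters[idx] += 1
        return idx

    def finish(self, idx):
        """Mark one query on cluster idx as done, then scale down if possible."""
        self.clusters[idx] -= 1
        # Drop idle clusters from the end, never going below the minimum.
        while len(self.clusters) > self.min_clusters and self.clusters[-1] == 0:
            self.clusters.pop()


endpoint = Endpoint(min_clusters=1, max_clusters=3, slots_per_cluster=2)
placements = [endpoint.submit() for _ in range(5)]
print(placements)  # [0, 0, 1, 1, 2]
```

The client only ever calls `submit()` on the one endpoint object, which mirrors the point that BI tools need not know how many clusters sit behind it.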
Start your engine
SQL Analytics Endpoints use Delta Engine and Photon technology, which were added to Databricks in June. One way of thinking about Delta Engine is as an optimized, C++-based rewrite of the Spark SQL engine. But it really goes beyond that, as Photon delivers a vectorized query engine that, according to Databricks, offers fast parallel processing; up to 5 times faster scan performance; a cost-based query optimizer; adaptive query execution that dynamically re-plans queries at runtime; and dynamic runtime filters that improve data skipping with greater granularity for even faster queries.
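As a rough illustration of what data skipping means in general, the Python sketch below keeps per-file minimum and maximum values for a column and scans only the files whose range can overlap a query's filter. This is a generic technique shown under assumed names (`DataFile`, `files_to_scan`, the `part-N` paths); it is not a description of how Delta Engine or Photon implement it.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    min_value: int  # smallest value of the filter column in this file
    max_value: int  # largest value of the filter column in this file


def files_to_scan(files, low, high):
    """Keep only files whose [min, max] range overlaps [low, high]."""
    return [f.path for f in files if f.max_value >= low and f.min_value <= high]


files = [
    DataFile("part-0", 1, 100),
    DataFile("part-1", 101, 200),
    DataFile("part-2", 201, 300),
]
# A filter such as "value BETWEEN 150 AND 220" never needs to open part-0.
selected = files_to_scan(files, low=150, high=220)
print(selected)  # ['part-1', 'part-2']
```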
Prior to Delta Engine, Databricks added Delta Lake capabilities to its platform (and subsequently open sourced them for use with Apache Spark). Delta Lake added the ability to update and delete data efficiently, and to do so within the framework of ACID (atomicity, consistency, isolation, durability) transactions. Since most data lake technologies, including the underlying file formats, are geared toward reading rather than writing, this was a significant addition, and it brought support for data versioning and “time travel” queries as a consequence.
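Why versioning and “time travel” follow from transactional updates can be shown with a toy in-memory table in Python. This is only a conceptual sketch: the `VersionedTable` class and its methods are hypothetical, and Delta Lake itself records commits in a transaction log over data files rather than keeping whole snapshots in memory.

```python
class VersionedTable:
    """Toy table that keeps every committed snapshot."""

    def __init__(self, rows):
        self._versions = [[dict(r) for r in rows]]  # version 0

    def _commit(self, rows):
        # A new snapshot becomes visible all at once, or not at all.
        self._versions.append(rows)
        return len(self._versions) - 1

    def update(self, predicate, changes):
        """Apply changes to matching rows; return the new version number."""
        current = self._versions[-1]
        return self._commit(
            [{**r, **changes} if predicate(r) else r for r in current]
        )

    def delete(self, predicate):
        """Remove matching rows; return the new version number."""
        current = self._versions[-1]
        return self._commit([r for r in current if not predicate(r)])

    def read(self, version=None):
        """Read the latest snapshot, or an older one ("time travel")."""
        rows = self._versions[-1 if version is None else version]
        return [dict(r) for r in rows]


table = VersionedTable([{"id": 1, "status": "new"}, {"id": 2, "status": "new"}])
v1 = table.update(lambda r: r["id"] == 1, {"status": "shipped"})
v2 = table.delete(lambda r: r["id"] == 2)

print(table.read())           # [{'id': 1, 'status': 'shipped'}]
print(table.read(version=0))  # both original rows, unchanged
```

Because each write produces a new version instead of overwriting the old one, reading an earlier version is just a matter of asking for it by number.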
Data warehouse platforms have supported ACID transactions and efficient updates and deletes all along, which in some respects has made them more versatile than data lakes. The combination of SQL Analytics Endpoints, Delta Engine, and Delta Lake adds up to Databricks’ “data lakehouse” paradigm, making the data lake a viable alternative to a data warehouse in most use cases.
Because this enablement is implemented at the infrastructure layer, the query engine optimizations become available not only from the SQL Analytics Workspace, but also from third-party BI and data integration platforms. That is probably why BI juggernaut Tableau is participating in today’s Databricks announcement, as is Fivetran, a provider of ELT (extract, load, transform) platforms.
Executives from both companies see the lakehouse model and its unifying effect as significant. Francois Ajenstat, Tableau’s Chief Product Officer, said “As organizations rapidly move their data to the cloud, we see a growing interest in conducting analytics on the data lake.” Fivetran CEO George Fraser called Databricks’ SQL Analytics “a critical step in … combining traditional SQL analysis with machine learning and data science”, adding that companies should be able to implement multiple analytics paradigms and that an overall Lakehouse architecture supports this.
While the likes of Teradata and Snowflake may take issue with the notion of the data lake as a primary analytics repository, it is clear that Databricks and its partners see the model as legitimate. It’s also clear that Databricks is ready to invest what is needed to make the lakehouse model credible, and to take its platform well past its inception as a commercially enhanced, cloud-based Spark service. With the addition of SQL Analytics, Databricks gets the chance to see that credibility proven or disproven. The market will be more competitive either way, and customers will benefit from it.