Today, at an online event dedicated to Azure Data Explorer (ADX), Microsoft is announcing several improvements to the service, including a next-generation release of the underlying engine and a number of integration points designed to make it more accessible, more enticing and more useful. ADX, most commonly used as a service for telemetry data workloads and analytics solutions, now runs even faster overall, gains several optimizations, and will connect with a number of other data services, streaming data sources and data visualization solutions. This should help a service that has been very successful, but not very well known, even among Azure analytics experts, gain more mainstream appeal.
Performance gains galore
What new features are coming to ADX? To begin with, Microsoft is introducing a new version of the core engine (in preview, with GA expected in February) that takes a completely different approach to querying data. The Kusto v3 engine generates multiple versions of the desired query, selects the fastest, and compiles it down to native code before executing it, so that it runs at maximum speed. The indexing layer in the v3 engine has also been rewritten. As a result of these changes, Microsoft says queries run between 2x and 30x faster.
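As a loose analogy (this is a toy Python sketch, not ADX's actual implementation), the idea of generating several candidate strategies for the same query and keeping the fastest one looks something like this:

```python
import timeit

def sum_squares_loop(values):
    # Candidate strategy 1: straightforward interpreted loop
    total = 0
    for v in values:
        total += v * v
    return total

def sum_squares_builtin(values):
    # Candidate strategy 2: same result via built-ins, usually faster in CPython
    return sum(v * v for v in values)

def pick_fastest(candidates, sample):
    # Time each candidate on a sample workload and keep the winner --
    # loosely analogous to an engine choosing among generated query plans.
    timings = {
        fn: timeit.timeit(lambda fn=fn: fn(sample), number=50)
        for fn in candidates
    }
    return min(timings, key=timings.get)

best = pick_fastest([sum_squares_loop, sum_squares_builtin], list(range(1000)))
print(best(range(10)))  # every candidate computes the same answer: 285
```

The real engine goes further, compiling the winning plan rather than merely dispatching to it, but the select-then-run shape of the optimization is the same.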
In addition to this raw performance gain, ADX will now offer self-refreshing materialized views, query result set caching, and configurable sharding/partitioning. Near-real-time scoring with machine learning models – including those hosted on Azure Machine Learning, as well as models from other platforms packaged in ONNX format – is also being added. Fast Fourier transforms, geospatial joins, and polynomial regression are on board as well. ADX also gains row-level security features that make it more appealing to customers who want to support a wide range of users, some of whom should not have unrestricted access to all the data.
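For readers unfamiliar with those last analytic functions, the following NumPy sketch shows what a fast Fourier transform and a polynomial regression each compute. Python is used purely as illustration here; ADX exposes these as native query functions with its own syntax:

```python
import numpy as np

# Fast Fourier transform: recover the dominant frequency of a sampled signal
t = np.arange(256) / 256.0                  # one second, sampled 256 times
signal = np.sin(2 * np.pi * 5 * t)          # a pure 5 Hz sine wave
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(256, d=1 / 256.0)
dominant_hz = freqs[np.argmax(spectrum)]    # peak lands on the 5 Hz bin

# Polynomial regression: fit a quadratic trend to a telemetry-like series
x = np.linspace(0, 10, 50)
y = 3 * x**2 - 2 * x + 1
coeffs = np.polyfit(x, y, deg=2)            # recovers roughly [3, -2, 1]
print(dominant_hz, np.round(coeffs, 2))
```

Running these inside the database, next to the data, is the point: no export step, no separate analytics cluster.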
On the data integration side, ADX now has a connector for Apache Kafka that is Gold certified by Confluent, the company founded by Kafka's creators. There is also now integration between Fluent Bit and ADX via Azure Blob Storage, to which Fluent Bit can deliver data and from which ADX can automatically ingest it. ADX's one-click ingestion and streaming ingestion features, which Microsoft had already released in preview, are now generally available (GA).
For data visualization, ADX will now offer a built-in dashboard facility where visualizations returned from data exploration queries can be pinned as tiles. This feature was previously released in preview in June of this year; such a dashboard, taken from the feature's documentation, is shown in the image at the top of this post. In addition to these native dashboards, ADX also integrates with Grafana, through a plugin that will now offer a graphical query builder.
Specifically within the Microsoft ecosystem, ADX can now be queried from Azure Data Studio (which so far has mostly been a tool for working with SQL Server); will integrate with Azure Data Share; will support VNet connectivity and parameterized DirectQuery integration with Power BI; and, via a data connector, will act as a linked service to Azure Synapse Analytics. In addition, Microsoft's roadmap includes further integration of ADX as a full resident Synapse service, much the way Azure Data Factory, and even Apache Spark, are today.
It was already cool
It is important to keep in mind that all this new power and versatility is being layered on top of a service that was already massively powerful. ADX is the commercialization of Microsoft's internal "Kusto" technology that powers Microsoft services such as Azure Monitor, Microsoft Intune, Azure Time Series Insights and Product Insights in Dynamics 365. While it was a well-kept secret, it has been a groundbreaking and innovative cloud service from the start.
Microsoft has said that the present version (v2) of the Kusto engine can run queries over a billion rows in less than a second. That performance is so good the claim almost sounds like hyperbole, which may explain why not all customers could appreciate the power of ADX and put it to use immediately. Nevertheless, ADX runs on a total of over 1 million CPU cores in the Azure cloud, ingesting new data at a rate of 35 petabytes a day, and now stores a cumulative 2+ exabytes.
Growth begets growth
For the record, an exabyte is a thousand petabytes, or the equivalent of a million terabytes. And one exabyte is where ADX's cumulative data volume stood in January of this year, according to Microsoft. In other words, ADX's total data under management has fully doubled in the last nine months, no doubt aided by the COVID-19 pandemic and its acceleration of digital transformation.
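Those units compose by factors of a thousand, which makes the scale easy to check back-of-the-envelope. The figures below are the ones quoted in this article, not independently verified:

```python
# Decimal storage units: 1 EB = 1,000 PB = 1,000,000 TB
TB_PER_PB = 1_000
PB_PER_EB = 1_000

eb_in_tb = PB_PER_EB * TB_PER_PB   # 1,000,000 TB per exabyte

# The article's figures: ~1 EB in January, 2+ EB nine months later
jan_eb, now_eb = 1, 2
growth_factor = now_eb / jan_eb    # fully doubled

# Daily ingest of 35 PB, expressed as a fraction of an exabyte
daily_ingest_eb = 35 / PB_PER_EB   # 0.035 EB ingested per day
print(eb_in_tb, growth_factor, daily_ingest_eb)
```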
Clearly, ADX did not exactly need a "shot in the arm", but the improved performance, capabilities and integration should also give it improved visibility. A stronger Azure Data Explorer should make for a stronger Synapse Analytics, a stronger Azure Machine Learning, a stronger HDInsight and a more valuable Azure Data Lake Storage layer. Stronger synergies between services can make each service more valuable on its own. Synapse Analytics has already done that for Azure; hopefully ADX can further enhance the synergistic effect.