Cloudera took a big step forward with its Cloudera Machine Learning (CML) platform today. The company is introducing new operational management features for machine learning models, along with management features for the data science pipelines that produce them. See ZDNet Editor in Chief Lawrence Dignan’s post for coverage of the news itself, and for some really useful analysis of how it positions Cloudera in the analytics market. To complement Dignan’s analysis, I will cover the details of the Machine Learning Operations (MLOps) features Cloudera is announcing today. But before I do, I want to explain why customers need them to begin with.
To understand why MLOps is needed, consider that machine learning models are actually software. Usually, the models are deployed as REST-based web services, and they go through a development process that involves writing code. Beyond the parallels with software development, machine learning also involves using and processing data sets, just as BI and other descriptive analytics work does.
For precisely these reasons, machine learning work must be supported by the same kind of source code management, testing, versioning and automated deployment that other software has. Similarly, data science environments need data governance support, including cataloging and lineage tracking of machine learning models and their underlying data sets. Cloudera’s MLOps offering addresses both: model deployment and management features surface inside CML, while governance features surface in Cloudera’s Shared Data Experience (SDX) fabric.
The governance features come to SDX as enhancements, announced by Cloudera in December, to the open source Apache Atlas project. Although Atlas is an industry standard, Cloudera is its main backer; the project was founded by Hortonworks, which merged with Cloudera in a deal announced in October 2018. Cloudera’s data catalog also has its basis in Apache Atlas.
The SDX features for governing machine learning include the aforementioned model cataloging and lineage capabilities. SDX also provides a security infrastructure over the REST web service interfaces created around deployed models.
Management and administration
CML management features include automated deployment support as well as a model monitoring service that tracks the performance, accuracy and general operation of a model. CML can also track individual predictions made by the model and how well they correspond to “ground truth”, ensuring consistency and providing detailed context for assessing the model’s overall accuracy. In addition, CML offers built-in functionality for ensuring the interpretability of machine learning models by generating SHAP- and LIME-based model and prediction explanations.
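To make the idea of reconciling predictions with later-arriving outcomes concrete, here is a minimal Python sketch. It is not CML's API: the `PredictionTracker` class, its method names and the sample values are all hypothetical, and a real monitoring service would persist records rather than hold them in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
import uuid


@dataclass
class PredictionRecord:
    """One logged prediction and, once known, its observed outcome."""
    predicted: int
    actual: Optional[int] = None


@dataclass
class PredictionTracker:
    """Hypothetical in-memory tracker; an illustration, not a Cloudera API."""
    records: Dict[str, PredictionRecord] = field(default_factory=dict)

    def log_prediction(self, predicted: int) -> str:
        """Store a prediction and return an ID used to reconcile it later."""
        prediction_id = str(uuid.uuid4())
        self.records[prediction_id] = PredictionRecord(predicted)
        return prediction_id

    def log_ground_truth(self, prediction_id: str, actual: int) -> None:
        """Attach the observed outcome to an earlier prediction."""
        self.records[prediction_id].actual = actual

    def accuracy(self) -> Optional[float]:
        """Accuracy over predictions whose outcome has arrived, else None."""
        resolved = [r for r in self.records.values() if r.actual is not None]
        if not resolved:
            return None
        correct = sum(1 for r in resolved if r.predicted == r.actual)
        return correct / len(resolved)


tracker = PredictionTracker()
first = tracker.log_prediction(1)
second = tracker.log_prediction(0)
third = tracker.log_prediction(1)   # outcome never arrives in this example
tracker.log_ground_truth(first, 1)  # prediction was right
tracker.log_ground_truth(second, 1)  # prediction was wrong
print(tracker.accuracy())  # 0.5
```

The point of keying each prediction by ID is that outcomes often arrive long after the prediction was served, so accuracy has to be computed only over the subset that has been reconciled.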
On the development side, CML is organized around template-based projects consisting of associated source code files, development sessions (configurable Kubernetes containers), experiments, models and jobs. As these projects progress, developers can embed API calls to CML in their source code to log experiments along with their associated metadata and metrics.
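As a rough illustration of what logging experiments with metadata and metrics from source code can look like, the following Python sketch uses a stand-in logger. The `ExperimentLog` class, its methods, and the experiment name, parameters and metric values are hypothetical and only show the general pattern; CML's actual calls are not reproduced here.

```python
import json
import time
from typing import Any, Dict, List


class ExperimentLog:
    """Hypothetical experiment logger standing in for a platform tracking API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.runs: List[Dict[str, Any]] = []

    def log_run(self, params: Dict[str, Any], metrics: Dict[str, float]) -> None:
        """Record one training run with its parameters and resulting metrics."""
        self.runs.append({
            "timestamp": time.time(),
            "params": dict(params),
            "metrics": dict(metrics),
        })

    def best_run(self, metric: str) -> Dict[str, Any]:
        """Return the run with the highest value for the given metric."""
        return max(self.runs, key=lambda run: run["metrics"][metric])

    def to_json(self) -> str:
        """Serialize the experiment so it can be stored or compared later."""
        return json.dumps({"experiment": self.name, "runs": self.runs}, indent=2)


log = ExperimentLog("churn-model")  # illustrative experiment name
log.log_run({"max_depth": 3}, {"accuracy": 0.81})
log.log_run({"max_depth": 6}, {"accuracy": 0.86})
best = log.best_run("accuracy")
print(best["params"])  # {'max_depth': 6}
```

Recording parameters next to metrics for every run is what later makes it possible to say which configuration produced a given model.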
Open platform, hybrid/multi-cloud
In an advance briefing with ZDNet, Cloudera explained that, because the governance capabilities are built on Apache Atlas and CML is a component of Cloudera Data Platform (CDP), Cloudera’s MLOps capabilities are effectively open standards, and the company hopes other industry players will adopt them. Because CDP supports, and SDX governs, deployments across private and (potentially multiple) public clouds, the CML environment is also portable across target platforms.
Cloudera explained to ZDNet that its customers include organizations that have progressed well beyond the evaluation phase of machine learning and have tens, hundreds or even thousands of models in production. Managing these models on an ad hoc basis, without structured development tools to produce them, is simply unsustainable. Necessity is the mother of invention, and Cloudera MLOps is the company’s concrete answer to these customers’ needs.
Cloudera is a customer of Brust’s consulting company, Blue Badge Insights.