Informatica brings serverless computing to the Data Integration Cloud



A few years after unveiling its microservices-based, second-generation Intelligent Cloud Services, Informatica's latest quarterly release has finally caught the serverless bug. It is among a set of new features that add capabilities for managing data pipelines and integrating streaming data.

Serverless computing is a natural fit for data ingestion and integration processes, as they often run in batch and, depending on the mix of sources, can have highly variable resource consumption profiles. The guiding notion of serverless is to eliminate the need to provision "just-in-case" capacity for spikes, as the system automatically adjusts provisioning based on traffic. The new serverless option auto-scales and has built-in high availability and recovery. Customers can still use server-based options for more predictable, long-running workloads.

While serverless simplifies users' lives by having the system provision resources automatically, the downside is that costs can be unpredictable. As part of the new serverless option, Informatica offers a calculator that uses machine learning to profile new workloads and provide a cost estimate based on whether customers prioritize performance (with parallel processing) or cost (running through a single node).
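The performance-versus-cost trade-off the calculator exposes can be illustrated with a toy model: spreading a job across more nodes cuts wall-clock time, but per-node startup overhead means the parallel run costs somewhat more. All of the numbers and the `estimate` function below are hypothetical illustrations, not Informatica's actual calculator or pricing.

```python
# Toy model of the serverless cost/performance trade-off.
# Parallelism shortens wall-clock time; per-node startup overhead
# makes the parallel run slightly more expensive in total.
# Rates and overheads are made-up figures for illustration only.

def estimate(job_seconds, nodes, rate_per_node_second=0.0001, startup_overhead=30):
    """Return (wall_clock_seconds, total_cost) for a job spread across `nodes`."""
    wall_clock = job_seconds / nodes + startup_overhead
    cost = nodes * wall_clock * rate_per_node_second
    return wall_clock, cost

single = estimate(job_seconds=3600, nodes=1)    # cost-optimized: one node
parallel = estimate(job_seconds=3600, nodes=8)  # performance-optimized: eight nodes
print(f"single node: {single[0]:.0f}s, ${single[1]:.2f}")
print(f"8 nodes    : {parallel[0]:.0f}s, ${parallel[1]:.2f}")
```

Under these made-up numbers, the eight-node run finishes roughly 7.5x faster for only a few percent more money; a real profiler would also account for skew, I/O limits, and source-specific throughput.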

With serverless, Informatica takes a page from cloud-native services that have already made serverless a staple of ETL and data-pipeline integration offerings. Among them are AWS Glue, Azure Data Factory, Google Cloud Data Fusion, and Databricks, which added a serverless option.

A related feature is the use of machine learning to help organizations rationalize their data pipelines. Since low-code/no-code cloud-based tools make it almost too easy to build pipelines, customers can easily accumulate a confusing array of one-off pipelines. Informatica's new tool monitors the pipelines, scans data sources, operations, and targets to identify which pipelines use similar transformation patterns, and guides users toward consolidating them into configurable templates that reduce sprawl and make them easier to maintain.
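The core idea of grouping pipelines by shared transformation patterns can be sketched simply: compute a signature from each pipeline's sequence of operations and flag any signature shared by multiple pipelines as a template candidate. The pipeline definitions and signature scheme below are hypothetical illustrations, not Informatica's internal model.

```python
# Sketch: group pipelines by their transformation-pattern signature so
# near-duplicates can be consolidated into one parameterized template.
# Pipeline names and operation lists are invented for illustration.
from collections import defaultdict

pipelines = {
    "orders_daily":  ["filter", "join", "aggregate"],
    "orders_weekly": ["filter", "join", "aggregate"],
    "customer_load": ["dedupe", "mask", "load"],
    "sales_rollup":  ["filter", "join", "aggregate"],
}

def signature(transforms):
    # Two pipelines are template candidates if they apply the same
    # sequence of transformation operations.
    return tuple(transforms)

groups = defaultdict(list)
for name, transforms in pipelines.items():
    groups[signature(transforms)].append(name)

# Any group with more than one member could become a shared template,
# with the differing sources/targets turned into parameters.
for sig, members in groups.items():
    if len(members) > 1:
        print(f"template candidate {sig}: {members}")
```

A production system would match on fuzzier features than an exact operation sequence (column mappings, expression similarity), but the consolidation step is the same: many concrete pipelines collapse into one configurable template.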

When consuming streams, Informatica has added a new capability that scans the Kafka log to track data conditions, just as it already does for database and file sources. And when doing data prep, Informatica's cloud service can recommend joins. The visual integration designer for Informatica's cloud ETL service, in turn, has taken a page from data prep by recommending transformation operations based on scanning sources and targets.
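Join recommendation of this kind typically starts by scanning schemas for columns that appear in both datasets with compatible types. The schemas and `recommend_joins` helper below are hypothetical illustrations of that first pass, not Informatica's actual algorithm, which would also weigh value overlap and profiling statistics.

```python
# Minimal sketch of join-key recommendation: suggest columns that exist
# in both schemas with the same type. Table schemas are invented
# examples, not from any real catalog.

orders    = {"order_id": "int", "customer_id": "int", "amount": "float"}
customers = {"customer_id": "int", "name": "str", "region": "str"}

def recommend_joins(left, right):
    """Suggest join keys: columns present in both schemas with matching types."""
    return sorted(col for col in left
                  if col in right and left[col] == right[col])

print(recommend_joins(orders, customers))  # suggests joining on customer_id
```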

Among the incremental updates is the addition of de-duplication capabilities to the data quality services introduced last year. While de-duplication is hardly new to Informatica, it was previously available only on-premises or as part of a bring-your-own-license (BYOL) option to run Informatica data quality on Amazon EC2 or other cloud infrastructure services. The catalog has been enhanced with a selection of views for data engineers, business analysts, and data scientists, through menus that let users choose logical or physical views of metadata. The catalog has also been expanded beyond the usual list of database sources to harvest metadata from cloud services such as Microsoft Power BI, Qlik Sense, AWS Glue, Google Cloud, Snowflake, and other sources.

Rounding out the spring release is the exposure of customer master data through the underlying graph database, which provides a more intuitive way to represent and explore customer relationships. The new release is now available on AWS and Azure, and in beta for Google Cloud.
