When it comes to being data-driven, organizations run the gamut in maturity. Most people believe that data and analysis provide insight. But only one-third of respondents in a TDWI survey [1] said their organizations were truly data-driven, meaning they analyze data to drive decisions and actions.
Successful data-driven companies promote a collaborative, goal-oriented culture. Leaders believe in data and are governance-oriented. The company’s technology side ensures sound data quality and puts analytics into operation. The data management strategy spans the full analytical life cycle. Data is accessible and usable by many kinds of people – data engineers and data scientists, business analysts and less technical business users.
TDWI analyst Fern Halper conducted research among analytics and data professionals across industries and identified the following five best practices for becoming a data-driven organization.
1. Build relationships to support collaboration
If IT and business teams do not cooperate, the organization cannot function in a data-driven way, so removing barriers between groups is important. Achieving this can improve market performance and innovation, but collaboration is challenging. Business decision makers often do not believe that IT understands the importance of fast results, and conversely, IT does not believe that the business understands data management priorities. Office politics comes into play.
But having clearly defined roles and responsibilities with common goals across departments encourages teamwork. These roles should include IT/architecture, business and others who manage various tasks on the business and IT sides (from business sponsors to DevOps).
2. Make data accessible and reliable
Making data accessible – and ensuring its quality – is key to breaking down barriers and being data-driven. Whether it is a data engineer who collects and transforms data for analysis or a data scientist who builds a model, everyone benefits from trustworthy data that is unified and built around a common vocabulary.
When organizations analyze new forms of data – text, sensor, image and streaming – they must do so across multiple platforms such as data warehouses, Hadoop, streaming platforms and data lakes. Such systems may reside on site or in the cloud. TDWI recommends several best practices to help:
- Create a data integration and pipeline environment with tools that provide federated access and merge data across sources. It helps to have point-and-click interfaces to build workflows and tools that support ETL, ELT and advanced specifications such as conditional logic or parallel jobs.
- Manage, reuse and govern metadata – that is, data about your data. This includes size, author, database column structure, security and more.
- Provide reusable data quality tools with built-in analytics capabilities that can profile data for accuracy, completeness and ambiguity.
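As a rough illustration of what profiling data for completeness might look like, the sketch below computes a few simple quality metrics for one column. It is a minimal example using only the Python standard library, not a description of any particular TDWI-recommended tool; the function name `profile_column` and the sample records are hypothetical and invented for illustration.

```python
from collections import Counter


def profile_column(rows, column):
    """Return simple quality metrics for one column of a list-of-dicts dataset."""
    values = [row.get(column) for row in rows]
    total = len(values)
    # Treat None and empty strings as missing values.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        # Share of rows that have a non-missing value.
        "completeness": len(present) / total if total else 0.0,
        # Number of different non-missing values.
        "distinct": len(counts),
        # Number of repeated occurrences beyond the first of each value.
        "duplicates": sum(c - 1 for c in counts.values() if c > 1),
    }


# Hypothetical sample records, invented for illustration only.
customers = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "ana@example.com"},
    {"id": 4, "email": None},
]

email_profile = profile_column(customers, "email")
print(email_profile)
```

A reusable function like this could be applied to every column of every incoming dataset, so that different teams measure quality the same way.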
3. Provide tools to help your business work with data
From marketing and finance to operations and HR, business teams need self-service tools to speed up and simplify data preparation and analytics tasks. Such tools can include built-in, advanced techniques such as machine learning, and many work across the analytical life cycle – from data collection and profiling to monitoring analytical models in production. These “smart” tools share three attributes:
- Automation assists during model building and model management processes. Data preparation tools often use machine learning and natural language processing to understand semantics and speed up data customization.
- Reusability draws on what has already been created for data management and analysis. For example, a source-to-target data pipeline workflow can be stored and integrated into an analytic workflow to create a predictive model.
- Explainability helps business users understand output when, for example, building a predictive model using an automated tool. Tools that explain what they have done are ideal for a data-driven business.
4. Consider a coherent platform that supports collaboration and analysis
As organizations mature analytically, it is important for their platform to support multiple roles in a common interface with a unified data infrastructure. This strengthens collaboration and makes it easier for people to do their jobs. For example, a business analyst may use a chat space to collaborate with a data scientist while building and testing a predictive model. The data scientist can use a notebook environment to test and validate the model as it is versioned and metadata is captured. The data scientist can then notify the DevOps team when the model is ready for production – and they can use the platform’s tools to continuously monitor the model.
5. Use modern governance technologies and practices
Governance – that is, rules and policies that prescribe how organizations protect and manage their data and analytics – is critical to learning how to trust data and be data-driven. But TDWI studies indicate that a third of organizations do not govern their data at all. Instead, many focus on security and privacy rules. TDWI research also indicates that fewer than 20 percent of organizations perform any form of analytics governance, which includes controlling and monitoring models in production.
Decisions based on bad data – or degraded models – can have a negative impact on business. As more people across an organization gain access to data and build models, and as new types of data and technologies emerge (big data, cloud, stream mining), data governance practices have to evolve. TDWI recommends three features of governance software that can strengthen your data and analytics governance:
- Data catalogs, glossaries and dictionaries. These tools often include sophisticated tagging and automated procedures for building and updating catalogs – as well as discovering metadata from existing datasets.
- Data lineage. Data lineage, combined with metadata, helps organizations understand where data came from and track how it was changed and transformed.
- Model management. Ongoing model tracking is crucial to analytics governance. Many tools automate model monitoring, schedule updates to keep models current, and send alerts when a model is degrading.
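To make the idea of alerting on a degrading model concrete, here is a minimal sketch that compares a model's recent accuracy with its baseline. It is an assumption-laden illustration, not how any specific governance tool works: the function name `check_model_health`, the 0.05 tolerance and the accuracy figures are all hypothetical.

```python
from statistics import mean


def check_model_health(baseline_accuracy, recent_accuracies, tolerance=0.05):
    """Flag a model as degraded when mean recent accuracy drops
    more than `tolerance` below the baseline accuracy."""
    if not recent_accuracies:
        raise ValueError("recent_accuracies must not be empty")
    recent = mean(recent_accuracies)
    degraded = recent < baseline_accuracy - tolerance
    return {"recent_accuracy": recent, "degraded": degraded}


# Hypothetical numbers: accuracy at deployment vs. the last three scoring runs.
status = check_model_health(0.90, [0.84, 0.82, 0.80])
if status["degraded"]:
    print("ALERT: model accuracy has degraded; consider retraining")
```

In practice a check like this would run on a schedule, and the threshold would be agreed on by both the business and IT stakeholders who own the model.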
In the future, organizations may move beyond traditional governance council models to new approaches such as agile governance, embedded governance or crowdsourced governance. But involving both IT and business stakeholders in the decision-making process – including data owners, data stewards and others – will always be key to robust governance at data-driven organizations.
1. What It Takes to Be Data-Driven. A TDWI Best Practices Report by Fern Halper and David Stodder. 2017.