Executive Insights

Data Is the New Business Currency: Unlock Value by Building a Modern Data Platform

August 30, 2021

Key takeaways

Many organizations do not realize that success will increasingly be driven by how well they can manage and optimize the preparation and mastering of data.

Trends driving the need for a modern data infrastructure include an increasing need for data creation and collection, an increasing need for real-time decision-making, and the increasing importance of data as a driver of business change.

To build a modern data infrastructure, organizations should consider data digestion and storage, data organization, and data visualization and output.

Enterprise AI platforms can simplify data digestion and mapping, and help to consolidate data exploration and visualization under a single AI tool.

The success or failure of companies over the next 10 years will be increasingly driven by how well they can take advantage of their data. Despite the growing importance of data for supporting better decision-making and achieving competitive differentiation, many organizations have failed to realize its full transformative potential. This failure often occurs in organizations that are too focused on the data-transformation end game — for example, analytics and visualization, machine learning (ML) tools, and business intelligence software — rather than on the foundational inputs such as data integration, and the preparation and mastering of data. That’s like investing in the cart before buying the horse.

If companies do not have quality data to analyze, they will not see the return on investment they expect from data analytics tools. However, quality data comes from many different sources and in many different formats. Companies must be able to access and aggregate data from multiple silos in order to properly analyze it.

Several trends are driving the need for modern data infrastructure. They include:

  • An increasing volume of data creation and collection

  • The transition of workloads to the cloud, including both hybrid cloud/on-premises environments and multicloud environments

  • The fragmentation of applications, data environments and data sources — including enterprise resource planning (ERP), customer relationship management (CRM) and customer data platform (CDP) software

  • An increasing need for real-time decision-making

  • An increasing need for “self-serve” insights across business units

  • The increasing need of stakeholders outside the organization to access data and insights

  • The increasing importance of data as a driver of business change

Data is being generated everywhere — e.g., during ecommerce transactions, by the online tracking of customers and from Internet of Things (IoT) devices — and the impact is being felt across all industries. According to a survey by Dresner, 52% of businesses are using big data today. Even companies in lagging industries such as manufacturing, healthcare and retail — more than 40% of them, in fact — are currently using big data.

All industries gather data, but much of that data is never used. Unused data, also known as “dark data,” accounts for more than 50% of all data held by corporations. That means companies are going to great lengths to gather data but never actually analyze it.

The principal reason for this phenomenon is the existence of legacy IT infrastructures that have data residing in siloed databases and in different formats. These complex data structures make the data unusable without significant integration efforts.

In addition, finding the right talent to make sense of this data can be challenging. According to a QuantHub survey, there is a shortage of data scientists, and talent is expensive.

Before data can be made accessible to business and research analysts, companies must first integrate, clean, organize and store their data. A significant investment in a modern data infrastructure is required to accomplish this.

Cloud investment rose in 2020: overall IT spending fell by 8%, while cloud spending grew by 18%. As more companies move more data to the cloud, investment levels are projected to increase. A KPMG study showed that 88% of surveyed companies are currently using cloud IT infrastructure for at least some of their data, and that 50% intend to move all their data to the cloud within the next two years.

Moving data to the cloud is an investment that fuels future investment — especially in working with the data. Once the data is hosted, companies can seek out third-party vendors to do much of the heavy lifting. Companies can sign up for software solutions to integrate, clean, host, analyze and visualize the data. The cloud storage industry is growing at a CAGR of 22%, and the analytics and data visualization industries are also expected to grow rapidly (CAGRs of 13% and 10%, respectively).
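To make the growth rates above concrete, the short sketch below compounds a market size at the quoted CAGRs. The starting index of 100 and the five-year horizon are illustrative assumptions, not figures from the sources cited; only the three rates come from the text.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound `value` at a constant annual growth rate for `years` years."""
    return value * (1 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Recover the constant annual growth rate that links `start` to `end`."""
    return (end / start) ** (1 / years) - 1


# Growth rates quoted in the text; the market labels are shorthand.
RATES = {"cloud storage": 0.22, "analytics": 0.13, "data visualization": 0.10}

if __name__ == "__main__":
    # Hypothetical starting index of 100 and a five-year horizon.
    for market, rate in RATES.items():
        print(f"{market}: 100.0 -> {project(100.0, rate, 5):.1f}")
```

At 22% a year, a market roughly 2.7x its size in five years, whereas at 10% it grows by about 1.6x over the same period.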

Historically, companies gathered data and stored it on their own servers, in the format it arrived in, in whatever database it went into. The different databases did not talk to each other, but because they mainly operated independently of one another, this was not an issue.

As companies have increasingly realized the value of analyzing their data, they have begun to build out data analytics teams that can measure data across the organization and that can better inform decision-making.

However, simply gathering and combining the data that lives in many different locations is an extremely onerous task and can significantly reduce the efficacy of an organization’s data scientists or analysts. By switching to a modern cloud architecture, organizations can more efficiently and effectively integrate, store and clean data, thus making it easier to extract and analyze (see Figure 1).

Figure 1

Modern data infrastructure

When building a modern infrastructure to harness the power of data, organizations must consider a number of factors.

Data digestion and storage. Factors related to data digestion and storage include:

Data integration. Organizations have many choices for integrating/replicating data across databases and applications — for example, extract, load, transform (ELT)/extract, transform, load (ETL); change data capture (CDC); data virtualization; integration platform as a service (iPaaS); and application programming interface (API). Organizations will typically utilize multiple technologies in parallel depending on the data integration need. For example, CDC may be used to create real-time, harmonized repositories for high-volume, structured data across cloud/hybrid environments (e.g., to support real-time reporting), whereas APIs are more typically used for low-volume, high-frequency application-to-application linkages (e.g., to support daily operations).
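As a rough illustration of the CDC idea mentioned above, the sketch below replays a stream of change events against a target copy of a table so that the copy stays in step with the source. The ChangeEvent type, the apply_changes function and the sample customer rows are hypothetical names invented for this example. Production CDC tools typically read changes from database logs and deal with ordering, schema changes and failures, all of which are ignored here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change from a source table (hypothetical shape)."""
    op: str                                # "insert", "update" or "delete"
    key: int                               # primary key of the affected row
    row: Optional[dict[str, Any]] = None   # changed columns, if any


def apply_changes(
    replica: dict[int, dict[str, Any]], events: list[ChangeEvent]
) -> dict[int, dict[str, Any]]:
    """Apply events in order so the replica mirrors the source table."""
    for event in events:
        if event.op in ("insert", "update"):
            # Merge changed columns into the existing row (or create it).
            replica[event.key] = {**replica.get(event.key, {}), **(event.row or {})}
        elif event.op == "delete":
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return replica


if __name__ == "__main__":
    target = {1: {"name": "Acme", "tier": "gold"}}
    stream = [
        ChangeEvent("update", 1, {"tier": "platinum"}),
        ChangeEvent("insert", 2, {"name": "Globex", "tier": "silver"}),
    ]
    print(apply_changes(target, stream))
```

Only the rows that changed are shipped and applied, which is why this pattern suits high-volume structured data better than repeatedly copying whole tables.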

Data storage. Two main examples of cloud data storage are data lakes and data warehouses. Data lakes store all data, regardless of format. Data warehouses are more organized and have all data in a common format, making it easier to analyze. Companies such as Snowflake specialize in making data lakes accessible to analysts by automating the process of cleaning the data lake and moving it to data warehouses.
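The difference between the two storage styles can be sketched in a few lines. In the example below, a "lake" holds raw payloads in whatever format they arrived in, and a load step converts them into rows that share one schema, as a warehouse table would. The RAW_LAKE sample, the normalize and load_warehouse functions and the customer/amount_usd columns are all hypothetical; this is not a description of how any particular vendor's product works.

```python
import csv
import io
import json

# Raw payloads kept as they arrived, tagged with their format (hypothetical data).
RAW_LAKE = [
    ("json", '{"customer": "Acme", "amount_usd": "120.50"}'),
    ("csv", "customer,amount_usd\nGlobex,75\n"),
]


def normalize(fmt: str, payload: str) -> list[dict]:
    """Parse one raw payload and return rows in the common schema."""
    if fmt == "json":
        records = [json.loads(payload)]
    elif fmt == "csv":
        records = list(csv.DictReader(io.StringIO(payload)))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # Common schema: customer as text, amount_usd as a number.
    return [
        {"customer": r["customer"], "amount_usd": float(r["amount_usd"])}
        for r in records
    ]


def load_warehouse(lake: list[tuple[str, str]]) -> list[dict]:
    """Build a single uniformly typed table from mixed-format lake entries."""
    table: list[dict] = []
    for fmt, payload in lake:
        table.extend(normalize(fmt, payload))
    return table


if __name__ == "__main__":
    for row in load_warehouse(RAW_LAKE):
        print(row)
```

Once every row has the same columns and types, analysts can query the table directly instead of writing format-specific parsing for each source.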

Data exploration. Once the data is in a usable format, analysis can be performed. There are many open-source libraries for analyzing data (e.g., Pandas) and building ML models (e.g., TensorFlow) that have robust functionality and large communities supporting them, although they do require coding abilities. An increasing number of paid, enterprise, and low- and no-code solutions make the data more accessible to business analysts. This data democratization empowers citizen data scientists, defined as people who “create or generate models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.”
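A typical first exploration step is a grouped summary. The sketch below uses only the Python standard library so that it runs anywhere; the ORDERS sample and the revenue_by_region function are hypothetical. With Pandas, the same result would usually be obtained with a groupby on the region column followed by an aggregation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cleaned rows, as they might come out of a warehouse table.
ORDERS = [
    {"region": "EMEA", "revenue": 120.0},
    {"region": "EMEA", "revenue": 80.0},
    {"region": "APAC", "revenue": 200.0},
]


def revenue_by_region(rows: list[dict]) -> dict[str, dict[str, float]]:
    """Group rows by region and report total and average revenue."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        grouped[row["region"]].append(row["revenue"])
    return {
        region: {"total": sum(values), "average": mean(values)}
        for region, values in grouped.items()
    }


if __name__ == "__main__":
    for region, stats in revenue_by_region(ORDERS).items():
        print(region, stats)
```

Low- and no-code tools wrap this kind of aggregation behind a visual interface, which is what makes it accessible to analysts who do not write code.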

Data visualization and output. Once the data is formatted and analyzed, software services/business intelligence solutions can provide tools and dashboards to help communicate data findings to the broader company and make the insights actionable. Some examples are Looker, Tableau and Power BI. Additionally, the data can then be used in internal apps or through third-party software (such as customer personalization through a digital experience platform) to provide value to customers.

Historically, deploying artificial intelligence (AI) and ML models and applications into production has entailed numerous time-intensive manual steps and processes with various stakeholders contributing along the way, resulting in elongated lead times for deployment. However, enterprise AI platforms simplify data digestion and mapping and consolidate data exploration and visualization under a single AI tool, allowing various stakeholders to better focus on analysis, programming and development (see Figure 2). For example, these platforms support tasks across the analytics pipeline, from data ingestion, integration and visualization to advanced modeling, testing and application deployment including the use of AI and ML technologies.

Enterprise AI vendors aim to fill this need with an end-to-end solution for organizations that are playing catch-up with their peers or that prefer not to invest resources in technical staff or cumbersome tools.

Figure 2

Historical deployment of AI/ML models vs. consolidation of workflows into a single platform

Momentum continues to build for organizations across industries to better harness their data in support of growth objectives and internal optimization initiatives. Companies are increasingly investing in solutions that can integrate with other systems, unlock predictive insights and empower a broader set of users beyond data scientists to participate in the process.

There is tremendous opportunity in this market. We are still in the early innings.
