
How Data Mesh Transforms Enterprise Data Management

In the first part of this blog series, we explained what a Data Mesh architecture is, the challenges it addresses, and how Google Cloud supports Data Mesh. Today, we’ll dive deeper into how to implement this architecture by exploring two practical use cases.


1) Building a Scalable Data Mesh for Retail Analytics

Imagine a retail company aiming to create a scalable, accessible data infrastructure that supports business analytics, inventory management, and customer behavior insights. The company’s objective is to organize and manage data in a decentralized way, with each team—such as marketing, data science, and business intelligence—accessing the data products they need. Let’s explore how this architecture works in practice using the components outlined in the diagram below.


Overview of the key architectural components in a data mesh implemented on Google Cloud

1) Central Services for Management and Discovery

  • Management: The company’s central team uses services like IAM (Identity and Access Management) to control access to sensitive data. Only authorized team members can access certain data products or resources. For instance, marketing teams can access datasets on customer behavior, while inventory management teams only see stock-related data.
  • Access Policies: With Dataplex, the company establishes consistent access control policies across different data products, ensuring security is enforced even as data is shared across domains. BigQuery Reservations help the team efficiently manage and allocate computational resources, optimizing costs for analytics queries.
  • Discovery: The central service layer includes the Dataplex Catalog, which maintains metadata about data resources. Cloud Run powers a user-friendly search interface, making it easy for teams to discover the data products they need. For example, the data science team can quickly locate the datasets they need for training predictive models, while business analysts can search for the latest sales data. Documentation about each data product is stored in Google Drive, offering detailed guidance on how to consume and use each product effectively.
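The access-control pattern above can be sketched in a few lines of plain Python. This is only an illustration of the policy logic, with hypothetical team and dataset names; in practice, enforcement happens in IAM and Dataplex, not in application code.

```python
# Hypothetical sketch of domain-scoped access rules like those a central team
# would enforce with IAM and Dataplex. Team and dataset names are invented.
ACCESS_POLICY = {
    "marketing": {"customer_behavior", "campaign_results"},
    "inventory": {"stock_levels", "daily_inventory_reports"},
    "data_science": {"customer_behavior", "daily_inventory_reports"},
}

def can_access(team: str, data_product: str) -> bool:
    """Return True if the team's policy grants access to the data product."""
    return data_product in ACCESS_POLICY.get(team, set())

print(can_access("marketing", "customer_behavior"))  # True
print(can_access("inventory", "customer_behavior"))  # False
```

The key property mirrored here is that each domain sees only its own data products unless the central policy grants more.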

2) Data Domains Exposing Data Products

  • Data Domain 1 – Inventory Management: In this domain, structured transactional data such as inventory levels is managed in Cloud Spanner. The domain also exposes structured inventory data through BigQuery, both as native storage and as an authorized dataset. Sales and stock data are grouped into data products, such as a BigQuery table containing daily inventory reports.
  • Data Domain 2 – Customer Behavior and Marketing: Another domain manages customer interaction data, stored in Cloud Storage as raw data, then curated into structured datasets using Dataplex. This domain also exposes a BigQuery dataset for querying customer data while enabling a read API for teams outside the domain to consume this data without direct access to the underlying raw data.
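To make the "data product" idea concrete, here is a small sketch of the kind of metadata a catalog entry might carry for the two domains above. All field values are illustrative; the real metadata lives in the Dataplex Catalog.

```python
# Illustrative data-product catalog entries, mirroring the metadata the
# Dataplex Catalog would hold. Names and values are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    name: str
    domain: str
    storage: str      # e.g. "BigQuery", "Cloud Storage"
    interface: str    # e.g. "authorized dataset", "read API"

CATALOG = [
    DataProduct("daily_inventory_reports", "inventory", "BigQuery", "authorized dataset"),
    DataProduct("customer_interactions", "marketing", "Cloud Storage", "read API"),
]

def find_by_domain(domain: str) -> list:
    """Discovery helper: list all products a domain exposes."""
    return [p for p in CATALOG if p.domain == domain]

print([p.name for p in find_by_domain("marketing")])  # ['customer_interactions']
```

Note that the interface (authorized dataset vs. read API) is part of the product's contract: consumers learn *how* to consume, not just *what* exists.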

3) Consumers of Data Products

  • Reports Consumer – Looker to BigQuery: The marketing team uses Looker to create dynamic reports based on datasets available in BigQuery. Marketing analysts can generate real-time insights on customer trends and product performance by accessing curated data products. Looker connects to BigQuery to consume and visualize data in a user-friendly way.
  • Data Science Consumer – Vertex AI to BigQuery: The data science team uses Vertex AI to train machine learning models, consuming data products from BigQuery. For example, a model predicting customer lifetime value (CLV) might use both transactional data from the inventory domain and customer behavior data from the marketing domain.
  • Data Management Consumer – Dataproc to BigQuery: The data engineering team uses Dataproc for large-scale data processing with Spark. It pulls data from BigQuery for more advanced analytics and batch processing, such as generating nightly data pipelines that aggregate sales performance or inventory forecasts.
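As a toy stand-in for the nightly Dataproc batch job described above, the aggregation step can be sketched in pure Python. In production this would be a Spark job reading from BigQuery; the records below are made up.

```python
# Pure-Python stand-in for a nightly rollup that aggregates sales per product.
from collections import defaultdict

def aggregate_sales(rows):
    """Sum sales amounts per product_id, as a nightly batch job might."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["product_id"]] += row["amount"]
    return dict(totals)

sales = [
    {"product_id": "sku-1", "amount": 19.99},
    {"product_id": "sku-2", "amount": 5.00},
    {"product_id": "sku-1", "amount": 19.99},
]
print(aggregate_sales(sales))  # {'sku-1': 39.98, 'sku-2': 5.0}
```

The value of running this in a dedicated consumer (Dataproc) rather than in each domain is that heavy batch workloads don't compete with the interactive queries served by the authorized datasets.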

2) Building a Data Mesh for Financial Fraud Detection and Risk Management

In the financial services industry, managing and securing sensitive data is crucial. Banks and financial institutions must not only store vast amounts of transactional data but also ensure this data is analyzed in real time to detect fraudulent activity, assess risks, and provide accurate financial forecasts. A scalable, efficient data architecture is essential to support these critical functions.


By implementing a data mesh, these organizations can decentralize data management across different departments while ensuring that data products are secure, discoverable, and easily accessible to the right teams. Let’s break down how the data mesh components can help build a real-time fraud detection and risk management system.


An example scenario in which consumers use data products through a range of interfaces, including authorized datasets and APIs

1) Operational/Source Data

  • Cloud Spanner (Transactional Data): Transaction data is stored in Cloud Spanner, ideal for managing structured transactional data like accounts, transactions, and financial history. This is the foundational source for risk assessments and fraud detection.
  • Pub/Sub (Raw Event Streaming): For real-time fraud detection, the system relies on Pub/Sub to stream raw event data. This could include transaction details, login attempts, or unusual activities detected by security systems. This streaming data is vital for detecting fraud as it happens.
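The kind of rule a Pub/Sub consumer might apply to this event stream can be sketched as follows. The thresholds and event fields here are invented for illustration; a real pipeline would subscribe via the Pub/Sub client library and apply far richer logic.

```python
# Hypothetical real-time fraud rule applied to a stream of raw events.
HIGH_VALUE_THRESHOLD = 10_000  # flag unusually large transfers

def flag_suspicious(events):
    """Return IDs of events that trip a simple real-time fraud rule."""
    flagged = []
    for event in events:
        too_large = event["amount"] > HIGH_VALUE_THRESHOLD
        odd_hour = event["hour"] < 5  # activity at unusual hours
        if too_large or (odd_hour and event["new_device"]):
            flagged.append(event["id"])
    return flagged

stream = [
    {"id": "t1", "amount": 120, "hour": 14, "new_device": False},
    {"id": "t2", "amount": 25_000, "hour": 11, "new_device": False},
    {"id": "t3", "amount": 60, "hour": 3, "new_device": True},
]
print(flag_suspicious(stream))  # ['t2', 't3']
```

The point of streaming through Pub/Sub is that checks like these run as each event arrives, rather than hours later in a batch job.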

2) Data Ingestion Pipeline

Once the raw data is gathered, it’s processed and ingested into the data mesh.

  • Dataflow: Dataflow processes and transforms the data through a pipeline, ensuring it is structured, cleansed, and enriched before being used for analysis. The data is then ready for deeper analytics and machine-learning tasks.
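A pure-Python stand-in for the cleansing and enrichment step can make this concrete. Field names are illustrative; a real pipeline would express the same logic as Apache Beam transforms running on Dataflow.

```python
# Toy version of the Dataflow step: drop malformed records and add a
# derived field before the data enters the mesh. Field names are invented.
def cleanse_and_enrich(raw_records):
    cleaned = []
    for rec in raw_records:
        if "account_id" not in rec or rec.get("amount") is None:
            continue  # drop malformed events instead of propagating them
        enriched = dict(rec)
        enriched["amount_cents"] = int(round(rec["amount"] * 100))
        cleaned.append(enriched)
    return cleaned

raw = [
    {"account_id": "a1", "amount": 12.5},
    {"amount": 3.0},                       # malformed: no account_id
    {"account_id": "a2", "amount": None},  # malformed: missing amount
]
print(cleanse_and_enrich(raw))
```

Dropping bad records at ingestion is a deliberate choice: downstream fraud models and reports then only ever see data that satisfies the product's schema contract.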

3) All Domain Data

The processed data is stored in a central location for easy access and consumption.

  • BigQuery: All domain data, including transaction records, account histories, and risk assessments, is stored in BigQuery. BigQuery’s ability to scale and run complex queries efficiently makes it ideal for analyzing large datasets and supporting fraud detection algorithms.

4) Product Interfaces

  • BigQuery Authorized Datasets (v1 and v2): Authorized datasets in BigQuery allow specific teams, such as the fraud detection and risk management teams, to query the data securely. For example, the fraud detection team might query transaction data to spot irregularities and potential fraud in near-real time.
  • Data Access API (GKE): For more flexible and customizable access, Google Kubernetes Engine (GKE) hosts Data Access APIs. Data scientists or analysts can use these APIs to access specific subsets of data, such as recent transactions or customer profiles, directly from within their applications.
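The contract of such a Data Access API can be sketched as a single read function: it returns only the rows and fields the caller is entitled to, never the raw store. Routes, fields, and records below are all hypothetical; a real service on GKE would wrap this in an HTTP layer with authentication.

```python
# Minimal sketch of a Data Access API read path: serve a filtered, projected
# subset of a domain's data. All records and field names are invented.
TRANSACTIONS = [
    {"txn_id": "t1", "account": "a1", "amount": 120.0, "day": "2025-01-20"},
    {"txn_id": "t2", "account": "a2", "amount": 75.5, "day": "2025-01-21"},
    {"txn_id": "t3", "account": "a1", "amount": 19.9, "day": "2025-01-21"},
]

def get_recent_transactions(account: str, since_day: str):
    """API-style read: only the fields and rows the caller may see."""
    return [
        {"txn_id": t["txn_id"], "amount": t["amount"]}
        for t in TRANSACTIONS
        if t["account"] == account and t["day"] >= since_day
    ]

print(get_recent_transactions("a1", "2025-01-21"))  # [{'txn_id': 't3', 'amount': 19.9}]
```

Compared with an authorized dataset, an API like this trades raw query power for a tightly controlled, versionable interface, which is why the diagram shows both options side by side.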

5) Data Consumers

  • BI Reporting – Looker and SQL Engine: The Business Intelligence (BI) team relies on Looker to generate dashboards and reports based on data in BigQuery. They might create visualizations of transactional trends or identify emerging risk patterns by querying both historical transaction data and real-time event data. A SQL engine is also available for more customized reporting needs.
  • Machine Learning (ML) Models – Vertex AI and Dataproc: Data scientists use Vertex AI to train machine learning models that predict fraudulent transactions or assess financial risk. These models are powered by BigQuery datasets and trained on historical transaction data to learn patterns that may indicate fraud. Additionally, Dataproc handles data analytics processing, enriching the data used for machine learning through complex data transformations and feature-engineering tasks.
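The feature-engineering step feeding these models can be illustrated with a toy example: turn an account's transaction history into features, then score the current transaction. The features, thresholds, and weights below are invented stand-ins, not a real trained model.

```python
# Toy feature engineering + scoring, standing in for the Dataproc feature
# pipeline and a Vertex AI model's prediction. All numbers are illustrative.
from statistics import mean

def extract_features(history, current_amount):
    """Derive simple features from an account's past transaction amounts."""
    avg = mean(history) if history else 0.0
    return {
        "amount": current_amount,
        "ratio_to_avg": current_amount / avg if avg else 0.0,
    }

def fraud_score(features):
    """A hand-set linear rule standing in for a trained model's output."""
    score = 0.0
    if features["ratio_to_avg"] > 5:   # far above the account's usual spend
        score += 0.5
    if features["amount"] > 10_000:
        score += 0.5
    return score

feats = extract_features([40.0, 55.0, 45.0, 60.0], current_amount=12_000.0)
print(fraud_score(feats))  # 1.0
```

The division of labor mirrors the architecture: Dataproc-style code computes the features at scale, while the model consuming them is trained and served on Vertex AI.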

⭐⭐⭐


These scenarios illustrate how the architectural components of a data mesh come together to provide a scalable, secure, and flexible solution for managing and consuming data. Whether for a retail company seeking to break down data silos or a financial institution aiming to modernize fraud detection and risk management, the data mesh offers a solution that decentralizes data ownership while ensuring controlled access and seamless collaboration. Central services manage security, discovery, and platform-wide policies, enabling teams across domains to access and leverage data according to their specific needs.



Author: Umniyah Abbood

Date Published: Jan 21, 2025


