website logo

Retail Fashion Boutique — Sales & Collection Analytics

Published

  • retail
  • databricks
  • power-bi
  • astro
  • medallion

Note: To protect client confidentiality, this case study uses a public Kaggle dataset. It mirrors the approach we’ve successfully delivered for multiple retail clients.

Use Case

Fashion boutique chains often struggle with limited visibility into their sales and operations. Seasonal trends are difficult to track, making it unclear which collections perform well across Spring, Summer, Fall, and Winter. At the same time, boutique managers lack reliable insights into how individual stores or regions compare, leaving performance differences unnoticed until it is too late.

The absence of consolidated reporting also creates challenges in understanding product and category performance. Retailers find it hard to identify which styles or collections drive the most revenue and which ones are underperforming. Similarly, customer behavior is poorly understood, with no consistent view of purchasing patterns, average order value, or unit volumes across time.

These gaps in visibility lead to inefficiencies across the business. Inventory planning often suffers from overstock in some categories and stockouts in others, while marketing campaigns are not always aligned with actual customer preferences. The lack of a single, data-driven reporting framework forces decisions to rely on fragmented spreadsheets and manual reports, slowing down the ability to act quickly in a fast-changing fashion market.

Our Approach

Architecture overview

Data Ingestion Pipeline

Source System

  • PostgreSQL with tables: sales, products, customers, boutiques.

Ingestion Mechanism

  • CDC via DebeziumAzure Event Hubs (Kafka-compatible) → Azure Blob Raw Zone.

Raw Zone Pattern

  • Immutable JSON; partitioned as /<table>/ingest_date=YYYY-MM-DD/hour=HH/.
  • Metadata: ingest_ts, source_table, op_type, transaction_id.

Reference Data

  • High-volume (sales) via CDC; low-volume (collections, categories, boutiques) via periodic snapshots.

For the demo, the Kaggle files are landed in the Raw Zone with the same structure as our CDC pipeline.

Data Processing Pipeline

Bronze (Raw, Entity-Oriented)

  • bronze.sales, bronze.products, bronze.boutiques, bronze.customers (derived from the single source file using select + distinct; no cleansing).
  • Audit columns: ingest_ts, source_file; partitioned by ingest_date.

Silver (Cleansed, Conformed, Modeled)

  • Typing & formats; standardize discount to 0–1; text normalization.
  • Derived pricing: NetPrice = UnitPrice * (1 - Discount), Revenue = Qty * NetPrice.
  • Surrogate keys + Dimensional model:
    • silver.dim_product, silver.dim_boutique, silver.dim_customer, silver.dim_date.
    • silver.fact_sales_clean joining SKs and validated metrics.
  • Data Quality: not-null checks, domain checks, referential integrity.
  • Change Handling: Type-1 standardization; optional SCD-2 for boutique/product if history exists.

Gold (Analytics Marts)

  • gold.monthly_sales (Month, TotalRevenue, UnitsSold, Orders, AOV, MoM, YoY)
  • gold.region_category_sales (Region, Category, Revenue, Units, Share%)
  • gold.top_products (ranked by revenue with contribution%)
  • gold.boutique_performance (Boutique, Revenue, Units, Growth_MoM, Growth_YoY)
  • Optimized with Delta OPTIMIZE & Z-ORDER on OrderDate, Region; CSV/Parquet exports under /gold/exports/.

Business Intelligence & Visualization

With Gold data available, we designed an interactive Power BI solution that serves leadership and operations. The model follows a star schema for performance, supports self-service filtering, drill-throughs, and is optimized for mobile consumption. Row-level security can be applied to restrict data by region or role. The dashboard includes an executive overview, collection/season analysis, and boutique/customer insights.

Conclusion

This solution shows how modern data engineering turns raw operational data into decision-ready insight. By combining Databricks for processing and Power BI for visualization, we deliver a robust, scalable, and business-centric analytics foundation. While the demo uses public data, the architecture and implementation mirror production systems and are adaptable to any retail context.