Data Transformation: Importance of Structural Transformation

It is crucial to transform the structure of data when moving it from operational source systems to Business Intelligence (BI) systems because operational systems and analytical systems have fundamentally different requirements for data organization [1–7].

Operational Systems (OLTP)

Designed for transactional efficiency, focusing on capturing, processing, and storing individual business events. Data is often highly normalized (e.g., header/detail patterns) to optimize inserts, updates, and consistency, even if that means complex joins for queries [1, 5, 8–13].

BI & Analytical Systems (OLAP)

Optimized for analysis, reporting, and decision support. Structures like the star schema provide quick access to aggregated historical data [6, 14–16].

Why Structural Transformation Is Important

Data Cleansing and Quality Management

Data Cleansing

Data cleansing is the process of detecting and correcting or removing corrupt, incorrect, or irrelevant data from a dataset [8, 29, 34]. Even if operational data quality is high, small inconsistencies can derail analytical results [8, 29, 34, 35].

How to Identify Data Quality Issues

  1. Check Against Business Rules: Validate duplicates, missing or impossible values, inconsistent formats [8, 37, 38, 40].
  2. Data Profiling & Cluster Analysis: Identify outliers or odd distributions during exploratory analysis [39, 41].
  3. Verify Against Real Values: Confirm with actual sources (e.g., calling customers) for highest accuracy [39, 41, 42].

Data Quality Feedback Loop

Feed BI-discovered quality issues back to operational systems to prevent recurrence [37, 40, 42–44].

Stale Data Errors

Occur when data correctly entered at the time loses context as business definitions change, leading to misinterpretation in historical analysis [46–49].

Example

The code “GM” meant “General Motors” in 1995 but “General Mills” after 2017. Without handling, BI queries aggregate both under “GM,” corrupting trends [46–48, 51–53].

Impact

Solution

Use Slowly Changing Dimensions (SCDs) to track and manage dimension changes over time, preserving historical accuracy [55, 56].

Roche’s Maxim of Data Transformation

“Data should be transformed as far upstream as possible, and as far downstream as necessary” [19, 57, 58].

As Far Upstream as Possible

As Far Downstream as Necessary

Multidimensional Analytics: Measures, Dimensions, and Facts

Star Schema

Structure

A central fact table links to descriptive dimension tables. Fact tables hold measures and foreign keys; dimensions hold context attributes [17–20, 69, 71, 83].

Benefits

Levels and Hierarchies

Hierarchies organize dimension members into levels for drill-down and roll-up [75, 85, 86].

Slowly Changing Dimensions (SCD)

Nonadditive and Semiadditive Measures

BI tools default to summation—explicit aggregation functions (AVERAGE, LAST_VALUE) are needed for nonadditive measures [111]. Missing values treated as zero can also skew results [112, 113].

Calculation Precedencies and Custom Rollups

When custom rollups are required, the order of operations matters for non-commutative formulas [113–116]. For example, summing individual Actual/Plan ratios differs from ratio of summed Actual to summed Plan [115, 116].

OLAP Operations: Drill-Down, Roll-Up, Slice, and Dice

Other Related Operations