Data Allocation Strategies

Data allocation is how one logical group of data gets spread across a destination data set, e.g., a group of applications that uses multiple servers (Apptio, 2015). According to ETL-Tools (n.d.), the chosen allocation determines the level of granularity the resulting data can support. Choosing an allocation is often a judgment call, and understanding your allocation strategy is vital for developing and understanding your data models (Apptio, 2015; ETL-Tools, n.d.).

The robustness and accuracy of the model depend on the allocation strategy between data sets, especially because the wrong allocation can create data fallout (Apptio, 2015). Data fallout occurs when data from one data set is never assigned to the other, much as a SQL join (LEFT JOIN, RIGHT JOIN, etc.) can leave rows unmatched between two tables; a small illustration follows below.
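As a minimal illustration of fallout, the sketch below assumes the pandas library and two made-up tables keyed on a hypothetical app_id column. It performs a left join and flags the rows that found no match on the other side; those unmatched rows are the fallout.

```python
import pandas as pd

# Hypothetical source data set: costs keyed by application ID.
costs = pd.DataFrame({
    "app_id": ["A1", "A2", "A3", "A4"],
    "cost":   [1000, 250, 400, 75],
})

# Hypothetical destination data set: servers keyed by application ID.
# Note that A4 has no matching server record.
servers = pd.DataFrame({
    "app_id": ["A1", "A2", "A3"],
    "server": ["srv-01", "srv-02", "srv-03"],
})

# A left join keeps every cost row; indicator=True adds a _merge column
# that flags rows with no match on the right-hand side.
joined = costs.merge(servers, on="app_id", how="left", indicator=True)
fallout = joined[joined["_merge"] == "left_only"]

# A4's $75 was never assigned to a server -- that unassigned remainder is the fallout.
print(fallout[["app_id", "cost"]])
```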

ETL-Tools (n.d.) states that granularity comes in fixed and dynamic levels, whereas Apptio (2015) describes many different levels of granularity. The following are some of the different data allocation strategies (Apptio, 2015; Dhamdhere, 2014; ETL-Tools, n.d.):

  1. Even spread allocation: every data point receives the same allocation regardless of its actual share (e.g., every budget in the household gets the total dollars divided by the number of budgets, even though the mortgage costs more than the utilities). It is the easiest to implement but overly simplistic (see the sketch after this list).
  2. Fixed allocation: allocation based on values that stay constant (e.g., credit card limits). Easy to implement, but the logic becomes risky for data sets that do change over time.
  3. Assumption-based allocation (manually assigned percentages or weights): allocation based on an educated approximation (e.g., a household budget without an itemized breakdown). It draws on subject matter experts, but it is only as good as the expertise behind the estimates.
  4. Relationship-based allocation: allocation based on the association between items (e.g., hurricane maximum wind speed and minimum central pressure). It is easily understood, but some nuance can be lost; in this example there can be a lag between maximum wind speed and minimum central pressure, so the two correlate highly yet still disagree at times.
  5. Dynamic allocation: allocation driven by a calculated field that changes with the data (e.g., mapping tornado wind speed to the Enhanced Fujita scale). Easily understood, but it remains an approximation, albeit at a higher fidelity than the lower-level strategies.
  6. Attribute-based allocation: allocation weighted by a static attribute of an item (e.g., corporate cell phone costs weighted by data usage across service providers such as AT&T, Verizon, and T-Mobile; direct-spend weighting of shared expenses). It reflects real-life usage but lacks granularity when you want to drill down to a root cause.
  7. Consumption-based allocation: allocation by measured consumption (e.g., checkbook line items, general ledgers, activity-based costing). It offers greater fidelity, but it requires large data sets and frequent updates.
  8. Multi-dimensional allocation: allocation based on multiple factors at once. It can be the most accurate strategy for complex systems, but it is harder to grasp intuitively and therefore less transparent than consumption-based allocation.
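To make the contrast concrete, here is a minimal sketch in Python comparing even spread (strategy 1) with attribute-based weighting (strategy 6). The department names, usage figures, and shared cost are assumptions invented purely for illustration.

```python
shared_cost = 1200.00  # hypothetical shared expense to allocate, in dollars

# Hypothetical static attribute: monthly data usage (GB) per department.
data_usage = {"sales": 120.0, "engineering": 600.0, "support": 80.0}

# Strategy 1 -- even spread: every department gets an identical share,
# regardless of how much it actually consumes.
even_share = shared_cost / len(data_usage)
even_spread = {dept: even_share for dept in data_usage}

# Strategy 6 -- attribute-based: each share is weighted by the department's
# attribute (data usage), so the allocation mirrors real-life consumption.
total_usage = sum(data_usage.values())
attribute_based = {
    dept: shared_cost * usage / total_usage
    for dept, usage in data_usage.items()
}

print(even_spread)      # {'sales': 400.0, 'engineering': 400.0, 'support': 400.0}
print(attribute_based)  # {'sales': 180.0, 'engineering': 900.0, 'support': 120.0}
```

Note how even spread charges each department identically, while the attribute-based shares sum to the same total yet track each department's actual usage.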

The higher the number, the more mature the strategy and the finer the granularity of the data. Sometimes it is best to start at level 1 maturity and work up toward level 8. Dhamdhere (2014) suggests consumption-based allocation (e.g., activity-based costing) as a best practice given its focus on accuracy. However, some maturity levels may not be acceptable in certain cases (ETL-Tools, n.d.). Consider which allocation strategy best fits you, the task before you, and the expectations of the stakeholders.

Resources: