Adv DBs: Data warehouses

Data warehouses let decision makers quickly find the right data in one well-integrated location that spans multiple functional departments, so they can produce reports and in-depth analyses that support effective decisions (MUSE, 2015). The Corporate Information Factory (CIF) and the Business Dimensional Lifecycle (BDL) aim at the same goal but suit different situations, and each has its own pros and cons (Connolly & Begg, 2015).

Corporate Information Factory:

The CIF approach builds consistent and comprehensive business data in a data warehouse to meet the needs of the business and its decision makers. It typically uses traditional databases to create a data model of all the data in the entire company before that model is implemented in a data warehouse. From the data warehouse, departments can create data marts (subsets of the data warehouse data) to meet their own needs. Once the system is set up, this approach is favored when data is needed for decision making today rather than a few weeks to a year from now: you can see all the data you wish and work with it in this environment. However, that same benefit is the source of CIF's main disadvantage: having all the data on hand, with no wait of weeks, months, or years, requires a large, complex data warehouse. A warehouse that houses all the data you would ever need, and more, is expensive and time-consuming to set up. Infrastructure costs are high in the beginning, with only variable costs in the years that follow (maintenance, growing data structures, adding new data streams, etc.) (Connolly & Begg, 2015).

This seems like the approach a newer company, like Twitter, would take. Knowing that it could later do really powerful business intelligence analysis on its data, such a company may have made an upfront investment in architecture and development team resources to build a more robust system.
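
The relationship between the central warehouse and a departmental data mart can be illustrated with a small sketch. This is only a minimal example using Python's built-in sqlite3 module; the table name `warehouse_sales`, the view name `marketing_mart`, and all rows are hypothetical and do not come from the cited sources. A real CIF deployment would involve a far larger enterprise data model.

```python
import sqlite3

# Hypothetical central warehouse table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales ("
    "department TEXT, product TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("marketing", "widget", "2015-01-05", 120.0),
        ("marketing", "gadget", "2015-01-06", 80.0),
        ("finance", "widget", "2015-01-05", 300.0),
    ],
)

# The data mart is modeled here as a department-specific subset of the warehouse.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT product, sale_date, amount FROM warehouse_sales "
    "WHERE department = 'marketing'"
)

# The marketing department queries only its own mart.
mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
print(mart_rows)
```

Because the mart is derived from the warehouse, every department works from the same underlying data, which is the consistency benefit described above.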

Business Dimensional Lifecycle:

In this view, all data needs are evaluated first, which produces the data warehouse bus matrix (a listing of how all key business processes should be analyzed). This matrix helps build the databases/data marts one by one. The approach best serves a group of users who need a specific set of data now and don't want to wait for the time it would take to create a full centralized data warehouse. It provides the perk of smaller, scoped projects, which are easier to price and can provide value on a smaller/tighter budget. It has its drawbacks: because it satisfies the needs and wants of today, small data marts (as opposed to one big data warehouse) are set up, and corralling all these data marts into a future warehouse that provides a consistent and comprehensive view of the data can be an uphill battle. With these almost ad hoc solutions, fixed costs may be spread out over a few years, with variable costs added on top of them (Connolly & Begg, 2015).

This seems like the approach a huge, cost-avoiding company would go for: big companies like GE, GM, or Ford, whose main focus is their value stream rather than IT.
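
A bus matrix can be pictured as a grid of business processes against the dimensions used to analyze them. The sketch below is a rough illustration under assumed data: the process names, the dimension names, and the helper `shared_dimensions` are all hypothetical. It shows one thing the matrix makes visible: which dimensions are shared across data marts and would therefore need to agree if the marts are later combined into one warehouse.

```python
# Hypothetical bus matrix: each business process (row) maps to the
# dimensions (columns) needed to analyze it. All names are illustrative.
bus_matrix = {
    "orders": {"date", "customer", "product"},
    "inventory": {"date", "product", "warehouse"},
    "billing": {"date", "customer"},
}


def shared_dimensions(matrix):
    """Return, in sorted order, the dimensions used by more than one process."""
    counts = {}
    for dimensions in matrix.values():
        for name in dimensions:
            counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, n in counts.items() if n > 1)


# Each process would become its own data mart, built one at a time.
for process in bus_matrix:
    print(process, sorted(bus_matrix[process]))

print(shared_dimensions(bus_matrix))
```

If two marts define a shared dimension such as `customer` differently, merging them later becomes the uphill battle mentioned above.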

The ETL:

How data is extracted, transformed, and loaded (ETL) from its sources (via software) will vary based on the data structures, schemas, processing rules, data integrity, mandatory fields, data models, etc. ETL can be done quite easily in a CIF context because all the data is present and can be easily transformed and loaded for decision makers, who can then make appropriate data-driven decisions. With the BDL, not all the data is available at the beginning, until all of the matrices are developed; in addition, each data mart may hold a different design schema (star, snowflake, starflake), which adds complexity to how fast the data can be extracted and transformed, slowing down the ETL (MUSE, 2015). In CIF, by contrast, all the data is in typical databases and thus in a single format.
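
To make the ETL steps concrete, here is a minimal sketch using only Python's standard library. Everything in it is assumed for illustration: the two source layouts (`crm_rows`, `erp_rows`), the target table `sales`, and the mandatory-field rule in `is_valid`. It shows why differing source formats add work: each source needs its own transform before the data can be loaded into a single schema.

```python
import sqlite3

# Extract: two hypothetical sources with different field names and formats.
crm_rows = [{"cust": "Acme", "total": "120.50", "day": "2015-01-05"}]
erp_rows = [
    {"customer_name": "Beta", "amount_cents": 8000, "date": "05/01/2015"},
    {"customer_name": "", "amount_cents": 100, "date": "06/01/2015"},
]


def transform_crm(row):
    """Map a CRM row to (customer, amount, ISO date)."""
    return (row["cust"], float(row["total"]), row["day"])


def transform_erp(row):
    """Map an ERP row to the same shape; dates arrive as DD/MM/YYYY."""
    day, month, year = row["date"].split("/")
    return (row["customer_name"], row["amount_cents"] / 100, f"{year}-{month}-{day}")


def is_valid(record):
    """Assumed rule: customer is a mandatory field and amount is non-negative."""
    customer, amount, _sale_date = record
    return bool(customer) and amount >= 0


def run_etl(conn):
    """Transform both sources, load valid records, return the rejected count."""
    conn.execute("CREATE TABLE sales (customer TEXT NOT NULL, amount REAL, sale_date TEXT)")
    records = [transform_crm(r) for r in crm_rows] + [transform_erp(r) for r in erp_rows]
    clean = [r for r in records if is_valid(r)]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)
    return len(records) - len(clean)


conn = sqlite3.connect(":memory:")
rejected = run_etl(conn)
loaded = conn.execute(
    "SELECT customer, amount, sale_date FROM sales ORDER BY customer"
).fetchall()
print(loaded, rejected)
```

With a single source format only one transform would be needed; every additional schema adds another transform and another set of integrity rules to maintain.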