Adv DBs: Data warehouses and OLAP

Data warehouses allow for people with decision power to locate the adequate data quickly from one location that spans across multiple functional departments and is very well integrated to produce reports and in-depth analysis to make effective decisions (MUSE, 2015a). Data could be stored in n-dimensional data cubes that can be dissected, filtered through, rolled up into a dynamic application called Online analytical processing (OLAP). OLAP can be its own system or part of a data warehouse, and if it’s combined with data mining tools it creates a decision support system (DSS) to uncover/discover hidden relationships within the data (MUSE, 2015b). DSS needs both a place to store data and a way to store data.  The data warehouse doesn’t solve they “Why?” questions, but the “How?, What?, When?, Where?” and that is where OLAP helps.  We want to extract as much knowledge as possible for decision making from these systems, hence this explains why we need both in DSS to solve all questions not just a subset.  But, as aforementioned that data mining tools are also needed for a DSS.

Data Warehouses

Discovering knowledge through archives of data from multiple sources in a consolidated and integrated way is what this warehouse does best.  They are subject-oriented (organized by customers, products, sales, and not in the invoice, product sales), integrated (data from different sources in the enterprise perhaps in different formats), time-variant (varies with respect to time), and nonvolatile (new data is appended not replacing old).  Suitable applications to feed data can be mainframes, proprietary file systems, servers, internal workstations, external website data, etc., which can be used for analysis and discovering knowledge for effective data-based decision making.  Detailed data can be stored online if it can help support/supplement summarized data, so data warehouses can be technically light due to summarized data.  Summarized data, which is updated automatically as new data enters the warehouse, mainly help improve query speeds. So, where is the detailed data: offline (Connolly & Begg, 2015).  Looking into the architecture of this system:

The ODS, Operational data store, holds the data for staging into the data warehouse.  From staging the load manager performs all Extraction and Loading functions to the data into the warehouse, meanwhile, the warehouse manager performs all the Transformation functions to the data into the warehouse.  The query manager performs all the queries into the data warehouse. Metadata (definition of the data stored and its units) are used by all these processes and exist in the warehouse as well (Connolly & Begg, 2015).

 OLAP

Using analytics to answer the “Why?” questions from data that is placed in an n-dimensional aggregate view of the data that is a dynamical system, sets this apart from other query systems.  OLAP is more complex than statistical analysis on aggregated data, it’s more of a slice and dice with time series and data modeling.  OLAP servers come in four main flavors: Multidimensional OLAP (MOLAP: uses multidimensional databases where data is stored per usage), Relational OLAP (ROLAP: supports relational DBMS products with a metadata layer), Hybrid OLAP (HOLAP: certain data is ROLAP and other is in MOLAP), and Desktop OLAP (DOLAP: usually for small file extracts, data is stored in client files/systems like a laptop/desktop).

DSS, OLAP, and Data Warehouse Application

Car insurance claims DSS.  Insurance companies can use this system to analyze patterns of driving from people, what damage can or cannot occur due to an accident, and why someone might claim false damages to fix their cars or cash out.  Thus, their systems can define who, what, when, where, why and how per accident against all other accidents (they can slice and dice by state, type of accident, vehicle types involved, etc) they have processed to help them resolve if a claim is legitimate.

References:

Business Intelligence: Compelling Topics

Departments are currently organized in a silo. Thus, their information is in silo systems, which makes it difficult to leverage that information across the company.  When we employ a data warehouse, which is a central database that contains a collection of decision-related internal and external sources of data, it can aid in the data analysis for the entire company (Ahlemeyer-Stubbe & Coleman, 2014). When we build a multi-level Business Intelligence (BI) system on top of a centralized data warehouse, we no longer have silo data systems, and thus, can make a data-driven decision.  Thus, to support data-driven decision while moving away from a silo department kept data to a centralized data warehouse, Curry,  Hasan, and O’Riain (2012) created a system that shows results from the hospital centralized data warehouse at different levels of the company, as the organization level (stakeholders are executive members, shareholders, regulators, suppliers, consumers), the functional level (stakeholders are functional managers, organization manager), and the individual level (stakeholders are the employees).  Data may be centralized, but specialized permissions on data reports can exist on a multi-level system.

The types of data that exist and can be stored in a centralized data warehouse are: Real-time data: data that reveals events that are happening immediately, Lag information: information that explains events that have recently just happened; and Lead information: information that helps predict events into the future based off of lag data, like regression data, forecasting model output (based off of Laursen & Thorlund, 2010).  All with the goal of helping decision makers if certain Target Measures are met.  Target measures are used to improve marketing efforts through tracking measures like ROI, NVP, Revenue, lead generation, lag generations, growth rates, etc. (Liu, Laguna, Wright, & He, 2014).

Decision Support Systems (DSS) were created before BI strategies.  A DSS helps execute the project, expand the strategy, improve processes, and improves quality controls in a quickly and timely fashion.  Data warehouses’ main role is to support the DSS (Carter, Farmer, & Siegel, 2014).  Unfortunately, the talks above about data types and ways to store data to enable data-driven decisions it doesn’t explain the “how,” “what,” “when,” “where,” “who”, and “why.”  However, a strong BI strategy is imperative to making this all work.  A BI strategies can include, but is not limited to data extraction, data processing, data mining, data analysis, reporting, dashboards, performance management, actionable decisions, etc. (Fayyad, Piatetsky-Shapiro, & Smyth, 1996; Padhy, Mishra, & Panigrahi, 2012; McNurlin, Sprague,& Bui, 2008).  This definition along with the fact the DSS is 1/5 principles to BI suggest that DSS was created before BI and that BI is a more new and holistic view of data-driven decision making.

But, what can we do with a strong BI strategy? Well with a strong BI strategy we can increase a company’s revenue through Online profiling.  Online profiling is using a person’s online identity to collect information about them, their behaviors, their interactions, their tastes, etc. to drive a targeted advertising (McNurlin et al., 2008).  Unfortunately, the fear comes when the end-users don’t know what the data is currently being used for, what data do these companies or government have, etc.  Richards and King (2014) and McEwen, Boyer, and Sun (2013), expressed that it is the flow of information, and the lack of transparency is what feeds the fear of the public. McEwen et al. (2013) did express many possible solutions, one which could gain traction in this case is having the consumers (end-users) know what variables is being collected and have an opt-out feature, where a subset of those variables stay with them and does not get transmitted.

 

Reference:

  • Ahlemeyer-Stubbe, Andrea, Shirley Coleman. (2014). A Practical Guide to Data Mining for Business and Industry, 1st Edition. [VitalSource Bookshelf Online]. Retrieved from https://bookshelf.vitalsource.com/#/books/9781118981863/
  • Carter, K. B., Farmer, D., & Siegel, C. (2014-08-25). Actionable Intelligence: A Guide to Delivering Business Results with Big Data Fast!, 1st Edition. [VitalSource Bookshelf Online]. Retrieved from https://bookshelf.vitalsource.com/#/books/9781118920657/
  • Curry, E., Hasan, S., & O’Riain, S. (2012, October). Enterprise energy management using a linked dataspace for energy intelligence. In Sustainable Internet and ICT for Sustainability (SustainIT), 2012 (pp. 1-6). IEEE.
  • Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI magazine, 17(3), 37. Retrieved from: http://www.aaai.org/ojs/index.php/aimagazine/article/download/1230/1131/
  • Laursen, G. H. N., & Thorlund, J. (2010) Business Analytics for Mangers: Taking Business Intelligence Beyond Reporting. Wiley & SAS Business Institute.
  • Liu, Y., Laguna, J., Wright, M., & He, H. (2014). Media mix modeling–A Monte Carlo simulation study. Journal of Marketing Analytics, 2(3), 173-186.
  • McEwen, J. E., Boyer, J. T., & Sun, K. Y. (2013). Evolving approaches to the ethical management of genomic data. Trends in Genetics, 29(6), 375-382.
  • McNurlin, B., Sprague, R., & Bui, T. (09/2008). Information Systems Management, 8th Edition. [VitalSource Bookshelf Online]. Retrieved from https://bookshelf.vitalsource.com/#/books/9781323134702/
  • Padhy, N., Mishra, D., & Panigrahi, R. (2012). The survey of data mining applications and feature scope. arXiv preprint arXiv:1211.5723.  Retrieved from: https://arxiv.org/ftp/arxiv/papers/1211/1211.5723.pdf
  • Richards, N. M., & King, J. H. (2014). Big data ethics. Wake Forest L. Rev., 49, 393

Business Intelligence: Decision Support Systems

Many years ago a measure of Business Intelligence (BI) systems was on how big the data warehouse was (McNurlin, Sprague,& Bui, 2008).   This measure made no sense, as it’s not all about the quantity of the data but the quality of the data.  A lot of bad data in the warehouse means that it will provide a lot of bad data-driven decisions. Both BI and Decision Support Systems (DSS) help provide data to support data-driven decisions.  However, McNurlin et al. (2008) state that a DSS is one of five principles of BI, along with data mining, executive information systems, expert systems, and agent-based modeling.

  • A BI strategies can include, but is not limited to data extraction, data processing, data mining, data analysis, reporting, dashboards, performance management, actionable decisions, etc. (Fayyad, Piatetsky-Shapiro, & Smyth, 1996; Padhy, Mishra, & Panigrahi, 2012; and McNurlin et al., 2008). This definition along with the fact the DSS is 1/5 principles to BI suggest that DSS was created before BI and that BI is a more new and holistic view of data-driven decision making.
  • A DSS helps execute the project, expand the strategy, improve processes, and improves quality controls in a quickly and timely fashion. Data warehouses’ main role is to support the DSS (Carter, Farmer, & Siegel, 2014).  The three components of a DSS are Data Component (comprising of databases, or data warehouse), Model Component (comprising of a Model base) and a dialog component (Software System, which a user can interact with the DSS) (McNurlin et al., 2008).

McNurlin et al (2008) state a case study, where Ore-Ida Foods, Inc. had a marketing DSS to support its data-driven decisions by looking at the: data retrieved (internal data and external market data), market analysis (was 70% of the use of their DSS, where data was combined, and relationships were discovered), and modeling (which is frequently updated).  The modeling offered great insight for the marketing management.  McNurlin et al. (2008), emphasizes that DSS tend to be defined, but heavily rely on internal data with little or some external data and that vibrational testing on the model/data is rarely done.

The incorporation of internal and external data into the data warehouse helps both BI strategies and DSS.  However, the one thing that BI strategies provide that DSS doesn’t is “What is the right data that should be collected and presented?” DSS are more of the how component, whereas BI systems generate the why, what, and how, because of their constant feedback loop back into the business and the decision makers.  This was seen in a hospital case study and was one of the main key reasons why it succeeded (Topaloglou & Barone, 2015).  As illustrated in the hospital case study, all the data types were consolidated to a unifying definition and type and had a defined roles and responsibilities assigned to it.  Each data entered into the data warehouse had a particular reason, and that was defined through interviews will all different levels of the hospital, which ranged from the business level to the process level, etc.

BI strategies can affect supply chain management in the manufacturing setting.  The 787-8, 787-9, and 787-10 Boeing Dreamliners have outsourced ~30% of its parts and components or more, this approach to outsourcing this much of a product mix is new since the current Boeing 747 is only ~5% outsourced (Yeoh, & Popovič, 2016).  As more and more companies increase their outsourcing percentages for their product mix, the more crucial it is to capture data on fault tolerances on each of those outsourced parts.  Other things that BI data could be used is to make decisions on which supplier to keep or not keep.  Companies as huge as Boeing can have multiple suppliers for the same part, if in their inventory analysis they find an unusually larger than average variance in the performance of an item: (1) they can either negotiate a lower price to overcompensate a larger than average variance, or (2) they could all together give the company a notice that if they don’t lower that variance for that part they will terminate their contract.  Same things can apply with the auto manufacturing plants or steel mills, etc.

Resources: