Adv DBs: Data warehouses

Data warehouses allow people with decision-making power to quickly locate adequate data in one well-integrated location that spans multiple functional departments, so they can produce reports and in-depth analyses and make effective decisions (MUSE, 2015). The Corporate Information Factory (CIF) and the Business Dimensional Lifecycle (BDL) aim at the same goal but apply to different situations, each with its own pros and cons (Connolly & Begg, 2015).

Corporate Information Factory:

The CIF approach builds consistent and comprehensive business data in a data warehouse to meet the needs of the business and its decision-makers.  This view typically uses traditional databases to create a data model of all of the data in the entire company before it is implemented in a data warehouse.  From the data warehouse, departments can create data marts (subsets of the data warehouse data) to meet their own needs.  Once the system is set up, this approach is favored when data is needed for decision making today rather than a few weeks to a year from now: you can see all the data you wish and work with it in this environment.  However, that same advantage is also the source of CIF’s main disadvantage.  Having all the data at hand, with no need to wait weeks, months, or years for it, requires a large, complex data warehouse.  A warehouse that houses all the data you would ever need, and more, is expensive and time-consuming to set up.  Infrastructure costs are high in the beginning, with only variable costs in the years that follow (maintenance, growing data structures, adding new data streams, etc.) (Connolly & Begg, 2015).

This seems like an approach that a newer company, like Twitter, would take.  Knowing that in the future it could do really powerful business intelligence analysis on its data, it may have made an upfront investment in its architecture and development team resources to build a more robust system.

Business Dimensional Lifecycles:

In this view, all data needs are evaluated first, which produces the data warehouse bus matrix (a listing of how all key processes should be analyzed).  This matrix helps build the databases/data marts one by one.  This approach best serves a group of users who need a specific set of data now and don’t want to wait for the time it would take to create a full centralized data warehouse.  It provides the perk of scaled projects, which are easier to price and can provide value on a smaller/tighter budget.  It has its drawbacks: because we satisfy the needs and wants of today, small data marts (as opposed to the big data warehouse) are set up, and corralling all these data marts into a future warehouse that provides a consistent and comprehensive view of the data can be an uphill battle.  These almost ad-hoc solutions may have fixed costs spread out over a few years, with variable costs added on top of the fixed costs (Connolly & Begg, 2015).

This seems like an approach that a huge, cost-avoiding company would go for: big companies like GE, GM, or Ford, whose main product is not IT but their value stream.

The ETL:

How data is extracted, transformed, and loaded (ETL) from its sources (via software) will vary based on the data structures, schemas, processing rules, data integrity, mandatory fields, data models, etc.  ETL can be done quite easily in a CIF context, because all the data is present and can be easily used, transformed, and loaded for a decision-maker to make appropriate data-driven decisions.  In CIF, all the data is in typical databases and thus in a single format.  With the BDL, not all the data will be available until all of the matrices are developed, and each data mart may hold a different design schema (star, snowflake, star-flake), which adds complexity to how fast the data can be extracted and transformed, slowing down the ETL (MUSE, 2015).
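To make the extract, transform, and load steps concrete, here is a minimal sketch in Python. It assumes two hypothetical sources with different field names and units (the names `SALES_MART`, `WEB_MART`, and `fact_sales` are illustrative, not taken from any cited source) and loads them into one shared table in an in-memory SQLite database. It is only an illustration of why differing source schemas add transformation work, not a production ETL design.

```python
import sqlite3

# Hypothetical source records; field names and values are illustrative assumptions.
SALES_MART = [{"cust": "A17", "amt_usd": "1200.50", "day": "2015-03-01"}]
WEB_MART = [{"customer_id": "a17", "total_cents": 45000, "date": "2015-03-02"}]


def extract():
    """Yield (source_name, raw_row) pairs from each source."""
    for row in SALES_MART:
        yield "sales", row
    for row in WEB_MART:
        yield "web", row


def transform(source, row):
    """Map a source-specific row onto one shared (customer, amount, date) schema."""
    if source == "sales":
        return (row["cust"].upper(), float(row["amt_usd"]), row["day"])
    if source == "web":
        # This source stores cents, so convert to dollars.
        return (row["customer_id"].upper(), row["total_cents"] / 100, row["date"])
    raise ValueError(f"unknown source: {source}")


def load(conn, rows):
    """Create the target table if needed and insert the transformed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales ("
        "customer_id TEXT NOT NULL, amount_usd REAL NOT NULL, sale_date TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    load(conn, [transform(source, row) for source, row in extract()])


conn = sqlite3.connect(":memory:")
run_etl(conn)
```

Each additional source with its own schema needs its own branch in `transform`, which is one way to picture the extra complexity the BDL data marts can introduce.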


Adv DB: Data Warehouse & Data Mining

Data warehouses allow people with decision-making power to quickly locate adequate data in one well-integrated location that spans multiple functional departments, so they can produce reports and in-depth analyses and make effective decisions (MUSE, 2015a). The data warehouse by itself doesn’t answer the Who, What, Where, When, Why, and How; that is where data mining can help.  A data warehouse, when combined with data mining tools, can create a decision support system (DSS), which can be used to uncover hidden relationships within the data (MUSE, 2015b). A DSS needs both a place to store data and a way to sort through it in order to make sense of the data and provide meaningful insights to the decision-maker.  Data used for meaningful insights must be prepared/transformed (and checked for quality) while in the data warehouse, and this must be completed before the data is used in a data mining tool.  Also, results from the data mining tool can be placed back into the data warehouse so that they can be seen by all end-users and reused by others.

Data Warehouse & Data Mining

A data warehouse is a centralized collection of consistent, subject-oriented, integrated, spatially and/or temporally variant, nonvolatile data that enables decision-makers to make desirable business decisions based on the insights and near-future predictions they gather from the data (Tryfona et al., 1999). Ballou & Tayi (1999) stated that a key feature of a data warehouse is its use for decision making, not for operational purposes.  Nevertheless, data warehouses don’t answer the questions Who, What, When, Where, Why, and How; a data warehouse is just a data repository (MUSE, 2015b). In that sense it supports what Tryfona et al. (1999) stated: there is little distinction between how data is modeled in a data warehouse and how it is modeled in a database. Databases, though, can be and are used in operational situations, which challenges the argument of Tryfona et al. (1999), because, as Ballou & Tayi (1999) pointed out, operational data usually focuses heavily on current data, whereas decision-makers look at historical data across time intervals to make temporal comparisons.

Databases and data warehouses cannot make a decision on their own, but they are the platform on which data is stored centrally so that the right decision analysis techniques can be applied to the data in order to derive meaning from it. The right decision analysis techniques come from data mining, which helps find meaningful, once-hidden patterns in the data (in this case, stored in the data warehouse).  Data mining can look into past and current data to make predictions about the future (Silltow, 2006).  This is nothing new; statisticians have been using these techniques manually for years to help discover knowledge from data. Discovering knowledge from centrally stored data, which can come from multiple sources in a business or other data-creating systems that can be linked together, is what a warehouse does best (Connolly & Begg, 2015). Data warehouses also enable reuse: using the same data in new ways to discover insights about a subject beyond the original purpose for collecting that data (Ballou & Tayi, 1999).  Data warehouses can support several low-level organizational decisions as well as high-level (enterprise-wide) decisions.  Suitable sources that feed data into a data warehouse to aid decision making include mainframes, proprietary file systems, servers, internal workstations, external website data, etc.  Choosing which data to store online or offline mainly helps to improve query speeds. Summarized data, which is updated automatically as new data enters the warehouse, can help improve query speeds, while detailed data can be stored online if it helps support/supplement the summarized data (Connolly & Begg, 2015).

Failure in the implementation of a data warehouse can stem from poor data quality. Data quality should be built into the planning, implementation, and maintenance phases of the data warehouse.  Ballou & Tayi (1999) warned that even though the data stored in a data warehouse is a key driver for companies to adopt one, the quality of that data must be preserved.  Data quality encompasses the following attributes: accuracy, completeness, consistency, timeliness, interpretability, believability, value-added, and accessibility.  Most people who generate data are familiar with its error rates, margins of error, deficiencies, and idiosyncrasies, but when the data is rolled up into a data warehouse (and this is not communicated properly), people outside of the data-generating organization will not know this, and their final decisions could be prone to errors.  One must also consider the different levels of data quality needed within a data warehouse for relevant decision making, project design, future needs, etc.  We must ask our data providers what is unsatisfactory about the data they currently provide to the data warehouse, and to what quantifiable level (Ballou & Tayi, 1999).  As the old adage goes, “Garbage In – Garbage Out”.

So, what can cause data quality issues?  Let’s take a real estate company, REMAX, which has a data warehouse in which the sales data isn’t consistent, because different stakeholders define a sale/price differently.  The mortgage company may say that a sale is the closing price of the house, whereas REMAX may say it is the negotiated list price of the house, the broker may say it is the final settlement price of the house after the home inspection, and the insurance company may say it is the price of the building materials in the house plus 65-70 thousand dollars for internal possessions.  REMAX may want all of this data in order to provide the best service to its customers and a realistic monetary view of what goes into purchasing a house, but REMAX must know about these differing definitions ahead of time, as it inputs the data into its data warehouse.  This could be valuable information for home buyers when they are deciding which one of two or three properties they would like to own.  There could also be syntactic inconsistencies between all these sources of data, like $60K, $60,000, $60,000.00, 60K, $60000, etc.
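The syntactic inconsistencies listed above can be cleaned up before loading. The following is a small sketch, assuming prices are written in US dollars with an optional "$", optional thousands separators, and an optional K or M suffix; the function name `normalize_price` and the accepted formats are assumptions made for illustration. It only handles formatting; the semantic problem of which price definition a value represents still has to be settled with the stakeholders.

```python
import re
from decimal import Decimal

# Optional "$", digits with optional commas, optional decimals, optional K/M suffix.
_PRICE = re.compile(r"^\$?\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*([KkMm]?)$")
_MULTIPLIER = {"": Decimal(1), "k": Decimal(1000), "m": Decimal(1000000)}


def normalize_price(text):
    """Convert a price string such as '$60K' or '$60,000.00' to a Decimal in dollars."""
    match = _PRICE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized price format: {text!r}")
    number, suffix = match.groups()
    return Decimal(number.replace(",", "")) * _MULTIPLIER[suffix.lower()]


# All five spellings from the text normalize to the same value.
samples = ["$60K", "$60,000", "$60,000.00", "60K", "$60000"]
normalized = [normalize_price(s) for s in samples]
```

Rejecting unrecognized formats with an error, rather than guessing, keeps bad values out of the warehouse until someone has looked at them.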

Another way the implementation of a data warehouse could fail, according to Ballou & Tayi (1999), is by not including appropriate data (in other words, data availability).  Critical data can exist as soft data (uncertain data), text-based data, and external sources of data, yet this set of data could be ignored altogether.  They add that this type of data, so long as it can support the organization in any “meaningful way,” should be added to the centralized data warehouse.  One must still weigh the high cost of acquiring data that may turn out to be useless, even though it is relatively easy (inexpensive) to delete rarely used data once it is in the system.  There is also an opportunity cost to adding irrelevant data: we could have used our resources to improve the timeliness of the current data (or provide real-time data) or to eliminate null values in a different data set that is already in the system.

To solve the issue of data quality, decision-makers and data warehouse managers must think systematically about what data is required, why it is required, and how it should be collected and used (Ballou & Tayi, 1999).  This can be done when a data warehouse manager asks the end-users what decisions the data warehouse will support.  From that information one can decipher what these stakeholders require through MoSCoW: What is a “Must have”? What is a “Should have”? What is a “Could have”? What is a “Wish to have”? In the REMAX case, they should have the final asking price before the inspection listed (as they do) as a “Must have”, typical closing costs for a house in that price range, as provided by the mortgage company, as a “Should have”, the average house insurance costs as a “Could have”, etc. Ballou & Tayi (1999) said that other factors can affect data quality enhancement projects, such as the current quality, required quality, anticipated quality, priority of the organizational activity (as aforementioned with MoSCoW), cost of data quality enhancements (and their aforementioned tradeoffs/opportunity costs), and their value added to the data warehouse.  Data quality is needed in order to use data mining tools, and many papers using data mining or text mining describe a preprocessing step that must occur before full analysis can begin: Nassirtoussi et al. (2015), Kim et al. (2014), Barak & Modarres (2015), etc.

According to Silltow (2006), data mining tools can be grouped into three types: traditional (complex algorithms and techniques that find hidden patterns in the data and highlight trends), dashboard (data changes are shown on a screen, mostly to monitor information), and text-mining (complex algorithms and techniques that find hidden patterns in text data, even to the point of figuring out the sentiment in a string of words, and that can include video and audio data).  Data mining techniques range from artificial neural networks (prediction models that use training data to learn and then make forecasts), as in Nassirtoussi et al. (2015) and Kim et al. (2014); to decision trees (a set of defined if-then statements, also known as rules, which make the results easier to understand), as in Barak & Modarres (2015); to nearest neighbor (which uses similar past data to make predictions about the future), etc.
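Of the techniques named above, nearest neighbor is the easiest to show in a few lines. The sketch below is a plain k-nearest-neighbor classifier using Euclidean distance and a majority vote; the training points and the "low"/"high" labels are made up for illustration and do not come from any of the cited papers.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Return the majority label among the k training points closest to query.

    training is a list of (features, label) pairs, where features is a tuple
    of numbers with the same length as query.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical past observations: (feature_1, feature_2) -> outcome label.
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 1.2), "low"),
    ((0.8, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.5, 4.8), "high"),
    ((4.9, 5.2), "high"),
]

prediction = knn_predict(training, (1.2, 0.9))
```

The prediction for a new point is simply borrowed from the most similar past records, which is the sense in which the technique "uses similar past data."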

Finally, another aspect of data quality is the output of data mining tools, especially since we can plug that output back into the data warehouse for future reuse.  Data mining tools are just that: automatic algorithms used to discover knowledge.  These tools lack the human intuition needed to distinguish between a relevant and an irrelevant correlation.  For instance, a hospital data warehouse may contain summer data showing a large increase in ice cream consumption (which could lead to obesity) alongside an increase in the number of pool/beach drownings.  A tool may conclude that ice cream consumption leads to drownings, rather than recognizing that both occur in the summer and that neither necessarily causes the other.  This is why Silltow (2006) suggests that all results provided by these tools be quality checked after use, so that they do not give out false, irrelevant insights that are preposterous when analyzed by a human.
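One way to quality check a suspicious correlation is to control for a likely common cause. The sketch below computes the ordinary Pearson correlation between ice cream consumption and drownings, and then the first-order partial correlation after controlling for temperature. All numbers are made up for illustration and were deliberately constructed so that temperature explains the whole relationship; real data would not be this clean, and a partial correlation is only one of several possible checks.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Fabricated weekly figures, for illustration only.
temperature = [10, 15, 20, 25, 30, 35]   # degrees Celsius
ice_cream = [21, 29, 40, 50, 59, 71]     # units sold
drownings = [11, 13, 21, 24, 32, 34]     # incidents

raw = pearson(ice_cream, drownings)
controlled = partial_correlation(ice_cream, drownings, temperature)
```

With these constructed numbers, `raw` is strongly positive while `controlled` is essentially zero, which is the pattern a human reviewer would expect when a shared seasonal driver, not ice cream, explains the drownings.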


Data warehouses allow people with decision-making power to locate adequate data quickly to make effective decisions. The data that is planned, entered, and maintained should be of acceptable quality.  Poor quality data may drive poor quality decisions.  The best way to improve data quality is to consider the eight aforementioned attributes of data quality when asking stakeholders what data, from a systemic point of view, would be useful in a data warehouse.  Deciding what data should be included can be very hard for decision-makers in the moment, though they may have a general idea of what decisions they need to make soon.  Data collection and quality must be weighed against their costs and their significance.


  • Ballou, D. P., & Tayi, G. K. (1999). Enhancing data quality in data warehouse environments. Communications of the ACM, 42(1), 73-78.
  • Barak, S., & Modarres, M. (2015). Developing an approach to evaluate stocks by forecasting effective features with data mining methods. Expert Systems with Applications, 42(3), 1325–1339.
  • Connolly, T. & Begg, C. (2015).  Database Systems:  A Practical Approach to Design, Implementation, and Management, Sixth Edition.  Boston:  Pearson.
  • Kim, Y., Jeong, S. R., & Ghani, I. (2014). Text opinion mining to analyze news for stock market prediction. Int. J. Advance. Soft Comput. Appl, 6(1).
  • My Unique Student Experience (2015a). Data Warehousing Concepts and Design. Retrieved from: Asset.aspx?MID=1819502&aid=1819506
  • My Unique Student Experience (2015b). Online Analytical Processing. Retrieved from:
  • Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2015). Text mining of news-headlines for FOREX market prediction: A Multi-layer Dimension Reduction Algorithm with semantics and sentiment. Expert Systems with Applications, 42(1), 306-324.
  • Silltow, J. (2006). Data mining 101: Tools and techniques.  Retrieved from:
  • Tryfona, N., Busborg, F., & Borch Christiansen, J. G. (1999, November). starER: a conceptual model for data warehouse design. In Proceedings of the 2nd ACM international workshop on Data warehousing and OLAP (pp. 3-8). ACM.

Business Intelligence: Compelling Topics

Departments are currently organized in silos. Thus, their information sits in siloed systems, which makes it difficult to leverage that information across the company.  A data warehouse, which is a central database that contains a collection of decision-related internal and external sources of data, can aid data analysis for the entire company (Ahlemeyer-Stubbe & Coleman, 2014). When we build a multi-level Business Intelligence (BI) system on top of a centralized data warehouse, we no longer have siloed data systems and thus can make data-driven decisions.  To support data-driven decisions while moving away from data kept in departmental silos to a centralized data warehouse, Curry, Hasan, and O’Riain (2012) created a system that shows results from the hospital’s centralized data warehouse at different levels of the company: the organization level (stakeholders are executive members, shareholders, regulators, suppliers, consumers), the functional level (stakeholders are functional managers, organization manager), and the individual level (stakeholders are the employees).  Data may be centralized, but specialized permissions on data reports can exist in a multi-level system.

The types of data that can be stored in a centralized data warehouse are: real-time data, which reveals events that are happening right now; lag information, which explains events that have just happened; and lead information, which helps predict future events based on lag data, such as regression data or forecasting model output (based on Laursen & Thorlund, 2010).  All of these share the goal of helping decision-makers see whether certain target measures are met.  Target measures are used to improve marketing efforts by tracking measures like ROI, NPV, revenue, lead generation, lag generation, growth rates, etc. (Liu, Laguna, Wright, & He, 2014).
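As a small illustration of turning lag information into lead information, the sketch below fits an ordinary least-squares line to past monthly revenue and extrapolates one month ahead. The revenue figures are hypothetical and perfectly linear to keep the arithmetic easy to follow; a real forecast would need more data and a check that a straight line is a reasonable model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Lag information: hypothetical revenue (in thousands) for months 1 through 4.
months = [1, 2, 3, 4]
revenue = [100.0, 110.0, 120.0, 130.0]

slope, intercept = fit_line(months, revenue)

# Lead information: projected revenue for month 5.
forecast_month_5 = slope * 5 + intercept
```

Here the fitted slope is 10 per month, so the month 5 projection comes out at 140; the forecast can then be compared against a target measure.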

Decision Support Systems (DSS) were created before BI strategies.  A DSS helps execute the project, expand the strategy, improve processes, and improve quality controls in a quick and timely fashion.  The main role of a data warehouse is to support the DSS (Carter, Farmer, & Siegel, 2014).  Unfortunately, the discussion above about data types and ways to store data to enable data-driven decisions doesn’t explain the “how,” “what,” “when,” “where,” “who,” and “why.”  A strong BI strategy is imperative to making this all work.  A BI strategy can include, but is not limited to, data extraction, data processing, data mining, data analysis, reporting, dashboards, performance management, actionable decisions, etc. (Fayyad, Piatetsky-Shapiro, & Smyth, 1996; Padhy, Mishra, & Panigrahi, 2012; McNurlin, Sprague, & Bui, 2008).  This definition, along with the fact that DSS is one of five principles of BI, suggests that DSS was created before BI and that BI is a newer and more holistic view of data-driven decision making.

But what can we do with a strong BI strategy? With a strong BI strategy, we can increase a company’s revenue through online profiling.  Online profiling uses a person’s online identity to collect information about them, their behaviors, their interactions, their tastes, etc. to drive targeted advertising (McNurlin et al., 2008).  Unfortunately, fear arises when end-users don’t know what their data is being used for, what data these companies or governments have, etc.  Richards and King (2014) and McEwen, Boyer, and Sun (2013) expressed that it is the flow of information and the lack of transparency that feed the public’s fear. McEwen et al. (2013) offered many possible solutions; one that could gain traction in this case is letting consumers (end-users) know which variables are being collected and giving them an opt-out feature, where a subset of those variables stays with them and is not transmitted.



  • Ahlemeyer-Stubbe, A., & Coleman, S. (2014). A Practical Guide to Data Mining for Business and Industry, 1st Edition. [VitalSource Bookshelf Online]. Retrieved from
  • Carter, K. B., Farmer, D., & Siegel, C. (2014). Actionable Intelligence: A Guide to Delivering Business Results with Big Data Fast!, 1st Edition. [VitalSource Bookshelf Online]. Retrieved from
  • Curry, E., Hasan, S., & O’Riain, S. (2012, October). Enterprise energy management using a linked dataspace for energy intelligence. In Sustainable Internet and ICT for Sustainability (SustainIT), 2012 (pp. 1-6). IEEE.
  • Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI magazine, 17(3), 37. Retrieved from:
  • Laursen, G. H. N., & Thorlund, J. (2010). Business Analytics for Managers: Taking Business Intelligence Beyond Reporting. Wiley & SAS Business Institute.
  • Liu, Y., Laguna, J., Wright, M., & He, H. (2014). Media mix modeling–A Monte Carlo simulation study. Journal of Marketing Analytics, 2(3), 173-186.
  • McEwen, J. E., Boyer, J. T., & Sun, K. Y. (2013). Evolving approaches to the ethical management of genomic data. Trends in Genetics, 29(6), 375-382.
  • McNurlin, B., Sprague, R., & Bui, T. (2008). Information Systems Management, 8th Edition. [VitalSource Bookshelf Online]. Retrieved from
  • Padhy, N., Mishra, D., & Panigrahi, R. (2012). The survey of data mining applications and feature scope. arXiv preprint arXiv:1211.5723.  Retrieved from:
  • Richards, N. M., & King, J. H. (2014). Big data ethics. Wake Forest L. Rev., 49, 393.

Business Intelligence: Predictions Followup

  • Potential Opportunities:

o    Health monitoring.  Currently, smart watches track our heart rate, steps, standing time, stairs climbed, sitting time, workouts, biking, sleep, etc.  But what if we had a device that measured the chemicals in our blood daily, without the pain of pricking a finger as people with diabetes must do?  This technology could not only measure your blood chemical makeup but also send alerts to EMTs and doctors if there is a dangerous imbalance of chemicals in your blood (Carter et al., 2014).  This would require a strong BI program across emergency responders, individuals, and doctors.

o    As Moore’s law continues to increase computational speed over time, companies have more chances to interpret real-time data and produce lead information that can drive actionable, data-driven decisions. Companies can finally get answers to strategic business questions in minutes as well (Carter et al., 2014).

o    Both internal data (corporate data) and external data (competitor analysis, customer analysis, social media, affinity and sentiment analysis) will be reported on a frequent basis to senior leaders and executives who have the authority to make decisions on behalf of the company.  This information may show up in a dashboard with x number of indicators/metrics, as was successfully implemented in a case study of a hospital (Topaloglou & Barone, 2015).

  • Potential Pitfalls:

o    Tools for threat detection, like those being piloted in New York City, could lead to an increased level of discrimination (Carter, Farmer, & Siegel, 2014). As big data analytics is used to perform facial recognition on photographs and live video to identify threats, it can lead to more racial profiling if the a priori knowledge fed into the system has elements of racial profiling.  This could lead to biased reporting and disproportionate tracking of a particular demographic, and it ignores the fact that past performance doesn’t indicate the future.

o    Data must be validated before it is published to a data warehouse.  Due to the low volatility of data in a data warehouse, we need to ensure that the data we receive is correct; thus, expected value thresholds must be set to capture errors before they are entered.  Wrong data in means wrong data analysis and wrong data-driven decisions.  An example of an expected value threshold could be that the earth’s temperature cannot exceed 500K at the surface.

o    Amplified customer experience.  As BI incorporates social media to gauge what is going on in the minds of customers, something going viral that hurts the company can be devastating.  Essentially, we are giving the customer an amplified voice.  This can take the form of rumors and software or hardware leaks, as happens with every Apple iPhone generation/release, which can put current proprietary information into the hands of competitors.  It can also range from a nasty comment or post that gets out of control on a social media platform to celebrity boycotts.  The opportunity here lies in receiving key information on how to improve products, identify leakers of information, and settle nasty rumors, issues, or comments.
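The expected value thresholds mentioned in the data validation pitfall above can be sketched as a simple range check that runs before a record is loaded. The field name `surface_temp_kelvin` and the structure of `THRESHOLDS` are assumptions for illustration; the 500 K upper bound is the example from the text, and the 0 K lower bound is added here as the physical minimum.

```python
# Hypothetical plausibility limits per field: (lower bound, upper bound).
THRESHOLDS = {"surface_temp_kelvin": (0.0, 500.0)}


def validate(record):
    """Return a list of problems; an empty list means the record may be loaded."""
    problems = []
    for field, (low, high) in THRESHOLDS.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside expected range [{low}, {high}]")
    return problems


good = validate({"surface_temp_kelvin": 288.0})
bad = validate({"surface_temp_kelvin": 5000.0})
```

Records with a non-empty problem list would be held back for review instead of being written to the warehouse, where low volatility makes later correction harder.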

  • Potential Threats:

o    Loss of data to hackers who aim to steal someone’s identity.  Firewalls must be tighter than ever, and networks must be more secure than ever, as a company moves to a centralized data warehouse.  Data warehouses are vital for BI initiatives, but if HR data is located in the warehouse (for example, to help HR identify likelihood measures of disgruntled employees to aid in their retention efforts) and a hacker were to get hold of that data, thousands of people’s information could be compromised.  This is nothing new, but it is a potential threat that must be mitigated as we proceed into BI systems.  This applies not only to people data but also to company proprietary data.

o    Consumer advertisement blitz.  Companies may use BI and item affinity analysis to blast their customers with ads and coupons in the hope of marketing better to people, attracting more sales, and earning higher revenues.  There is a personal example here for me: XYZ is a clothing store, and when I moved into my first house, the old owner had never updated their information in the store’s database.  Since the old owner was a frequent buyer and those magazines, coupons, flyers, and sales were working on them, they kept getting blasted with marketing ads.  When I moved in, I got a magazine every two days.  It was a waste of paper and made me less likely to shop there.  Eventually, I had enough and called customer service.  They resolved the issue, but it took six weeks after that call for my address to be removed from their marketing and customer database.  I haven’t shopped there since.

o    Information overload.  As companies go forward with implementing BI systems, they must meet with the entire multi-level organization to find out its data needs.  Just because we have the data doesn’t mean we should display it.  The goal is to find the right number of key success factors, key performance indicators, and metrics to help decision-makers at all different levels.  Getting this part wrong can compromise the adoption of BI in the organization, and BI will be seen as a waste of money rather than a tool that could help in today’s competitive market.  This is a hard line to walk, and it is one of the biggest threats.  It was recognized in the hospital case study (Topaloglou & Barone, 2015) and therefore mitigated through extensive planning, buy-in, and documentation.



Business Intelligence: Decision Support Systems

Many years ago, a measure of Business Intelligence (BI) systems was how big the data warehouse was (McNurlin, Sprague, & Bui, 2008).  This measure made no sense, as it’s not about the quantity of the data but the quality of the data.  A lot of bad data in the warehouse means a lot of bad data-driven decisions. Both BI and Decision Support Systems (DSS) help provide data to support data-driven decisions.  However, McNurlin et al. (2008) state that a DSS is one of five principles of BI, along with data mining, executive information systems, expert systems, and agent-based modeling.

  • A BI strategy can include, but is not limited to, data extraction, data processing, data mining, data analysis, reporting, dashboards, performance management, actionable decisions, etc. (Fayyad, Piatetsky-Shapiro, & Smyth, 1996; Padhy, Mishra, & Panigrahi, 2012; and McNurlin et al., 2008). This definition, along with the fact that DSS is one of five principles of BI, suggests that DSS was created before BI and that BI is a newer and more holistic view of data-driven decision making.
  • A DSS helps execute the project, expand the strategy, improve processes, and improve quality controls in a quick and timely fashion. The main role of a data warehouse is to support the DSS (Carter, Farmer, & Siegel, 2014).  The three components of a DSS are the data component (comprising databases or a data warehouse), the model component (comprising a model base), and the dialog component (the software system through which a user can interact with the DSS) (McNurlin et al., 2008).

McNurlin et al. (2008) describe a case study in which Ore-Ida Foods, Inc. had a marketing DSS to support its data-driven decisions through data retrieval (internal data and external market data), market analysis (70% of the use of their DSS, where data was combined and relationships were discovered), and modeling (which is frequently updated).  The modeling offered great insight for marketing management.  McNurlin et al. (2008) emphasize that DSS tend to be defined, but rely heavily on internal data with little or some external data, and that vibrational testing on the model/data is rarely done.

The incorporation of internal and external data into the data warehouse helps both BI strategies and DSS.  However, the one thing that BI strategies answer that DSS doesn’t is “What is the right data that should be collected and presented?” DSS are more of the how component, whereas BI systems generate the why, what, and how, because of their constant feedback loop back into the business and the decision-makers.  This was seen in a hospital case study and was one of the main reasons why it succeeded (Topaloglou & Barone, 2015).  As illustrated in the hospital case study, all the data types were consolidated to a unifying definition and type and had defined roles and responsibilities assigned to them.  Each piece of data entered into the data warehouse had a particular reason, which was defined through interviews with all different levels of the hospital, ranging from the business level to the process level, etc.

BI strategies can affect supply chain management in a manufacturing setting.  The Boeing 787-8, 787-9, and 787-10 Dreamliners have ~30% or more of their parts and components outsourced; outsourcing this much of a product mix is a new approach, since the current Boeing 747 is only ~5% outsourced (Yeoh & Popovič, 2016).  As more companies increase the outsourced percentage of their product mix, it becomes more crucial to capture data on the fault tolerances of each of those outsourced parts.  BI data could also be used to decide which suppliers to keep.  Companies as huge as Boeing can have multiple suppliers for the same part. If their inventory analysis finds an unusually large variance in the performance of an item, (1) they can negotiate a lower price to compensate for the larger-than-average variance, or (2) they can give the supplier notice that the contract will be terminated if the variance for that part is not lowered.  The same can apply to auto manufacturing plants, steel mills, etc.
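The supplier comparison described above can be sketched as a per-supplier variance calculation followed by a flagging rule. The supplier names, the measurements, and the rule "flag anything above twice the average variance" are all assumptions chosen for illustration; an actual inventory analysis would set its own tolerance criteria.

```python
from statistics import mean, variance


def flag_high_variance(measurements, factor=2.0):
    """Return the suppliers whose sample variance exceeds factor times the average variance.

    measurements maps a supplier name to a list of measured values for the same part.
    """
    variances = {name: variance(values) for name, values in measurements.items()}
    baseline = mean(list(variances.values()))
    return sorted(name for name, var in variances.items() if var > factor * baseline)


# Hypothetical measurements of the same part dimension from three suppliers.
part_measurements = {
    "supplier_a": [10.0, 10.1, 9.9, 10.0],
    "supplier_b": [10.0, 10.05, 9.95, 10.0],
    "supplier_c": [10.0, 11.0, 9.0, 10.2],
}

flagged = flag_high_variance(part_measurements)
```

With these made-up values only `supplier_c` is flagged, which would be the supplier to approach about a lower price or a corrective notice.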



Business Intelligence: Data Warehouse

A data warehouse is a central database that contains a collection of decision-related internal and external data sources for analysis and is used by the entire company (Ahlemeyer-Stubbe & Coleman, 2014). The authors state that data warehouse content has four main features:

  • Topic Orientation – data which affects the decisions of a company (e.g., customers, products, payments, ads, etc.)
  • Logical Integration – the integration of the company’s common data structures and relevant unstructured big data (e.g., social media data, social networks, log files, etc.)
  • Presence of Reference Period – time is an important structural component of the data, because historical data is needed and should be maintained for a long time
  • Low Volatility – data shouldn’t change once it is stored. Amendments are still possible, but they should not overwrite existing data, because the history of changes gives us additional information about our data.
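
The last two features, reference period and low volatility, can be illustrated together with a small append-only store.  The sketch below is an assumption-laden example, not the authors' design: the `CustomerRecord` and `CustomerHistory` names and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a stored version cannot be modified
class CustomerRecord:
    customer_id: int
    city: str
    valid_from: date  # reference period: when this version became true


class CustomerHistory:
    """Append-only store: an amendment adds a new version, never overwrites."""

    def __init__(self):
        self._versions = []

    def amend(self, record):
        self._versions.append(record)

    def as_of(self, customer_id, when):
        """Return the version that was valid on `when`, or None if there is none."""
        candidates = [
            r for r in self._versions
            if r.customer_id == customer_id and r.valid_from <= when
        ]
        return max(candidates, key=lambda r: r.valid_from, default=None)


history = CustomerHistory()
history.amend(CustomerRecord(1, "Tulsa", date(2014, 1, 1)))
history.amend(CustomerRecord(1, "Denver", date(2015, 6, 1)))

print(history.as_of(1, date(2014, 12, 31)).city)
print(history.as_of(1, date(2016, 1, 1)).city)
```

Because the earlier version is kept, an analyst can still ask what the warehouse knew about a customer at any past date, which is the additional information that overwriting would destroy.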

Given the type of data stored in a data warehouse, it is designed to help support data-driven decisions.  Making decisions from just a gut feeling can cost millions of dollars and degrade your service.  For continuous service improvements, decisions must be driven by data.  Your non-profit can use this data warehouse to drive priorities and to improve services in ways that yield short-term as well as long-term wins.  The question you need to be asking is “How should we be liberating key data from the esoteric systems and allowing them to help us?”

To do that, you need to build a BI program: one where key stakeholders at each business level agree on the logical integration of data and on common data structures, are transparent about the metrics they would like to see, decide who will support the data, etc.  We are looking for key stakeholders at the business level, process level, and data level (Topaloglou & Barone, 2015).  The reason is that we need to truly understand the business and its needs; from there we can understand the data you currently have and the data you will need to start collecting.  Once the data is collected, we will prepare it before it enters the data warehouse to ensure low volatility, so that data modeling can be conducted reliably and can support your evaluation and data-driven decisions on how best to move forward (Padhy, Mishra, & Panigrahi, 2012).
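
As a rough idea of what preparing data against agreed common definitions might look like, here is a minimal sketch.  Everything in it is assumed for illustration: the field names, the donor example, and the `prepare` helper are hypothetical and are not taken from the cited studies.

```python
# Hypothetical mapping from department-specific field names to the common
# definitions the stakeholders agreed on.
COMMON_FIELDS = {
    "donor_id": "donor_id",
    "DonorID": "donor_id",
    "gift_amt": "donation_usd",
    "Amount($)": "donation_usd",
}


def prepare(row):
    """Rename fields to the common definitions and coerce them to common types.

    An unmapped field raises KeyError, so undefined data never loads silently.
    """
    cleaned = {COMMON_FIELDS[name]: value for name, value in row.items()}
    cleaned["donor_id"] = int(cleaned["donor_id"])
    cleaned["donation_usd"] = round(float(cleaned["donation_usd"]), 2)
    return cleaned


# The same gift as recorded by two made-up departments.
fundraising_row = {"DonorID": "42", "Amount($)": "19.50"}
finance_row = {"donor_id": 42, "gift_amt": 19.5}

print(prepare(fundraising_row))
print(prepare(finance_row))
```

Both rows come out with the same field names and types, which is the point of agreeing on common data structures before anything is loaded.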

Topaloglou and Barone (2015) describe another non-profit service organization, a hospital, that implemented a successful BI program through the creation of a data warehouse.  The hospital experienced positive effects from implementing its BI program: end users can make strategic, data-based decisions and act on them; attitudes shifted towards the use and usefulness of information; data scientists came to be perceived as problem solvers rather than developers; data leads to immediate action; continuous improvement is a byproduct of the BI system; real-time views with drill-down into data details enable more data-driven decisions and actions; and meaningful dashboards were developed to support business queries, etc. (Topaloglou & Barone, 2015).

However, Topaloglou and Barone (2015) stressed multiple times in the study that the key to realizing these benefits is establishing a common data structure and definition, with defined stakeholders and accountable people, to support the company’s goal based on how the current processes are performing.  A data warehouse, your centralized location for external and internal data, provides that foundation and will give you the insights to make data-driven decisions that support your company’s goal.