Business Intelligence: Multilevel BI

Annotated Bibliography

Citation:

Curry, E., Hasan, S., & O’Riain, S. (2012, October). Enterprise energy management using a linked dataspace for energy intelligence. In Sustainable Internet and ICT for Sustainability (SustainIT), 2012 (pp. 1-6). IEEE.

Author’s Abstract:

“Energy Intelligence platforms can help organizations manage power consumption more efficiently by providing a functional view of the entire organization so that the energy consumption of business activities can be understood, changed, and reinvented to better support sustainable practices. Significant technical challenges exist in terms of information management, cross-domain data integration, leveraging real-time data, and assisting users to interpret the information to optimize energy usage. This paper presents an architectural approach to overcome these challenges using a Dataspace, Linked Data, and Complex Event Processing. The paper describes the fundamentals of the approach and demonstrates it within an Enterprise Energy Observatory.”

 

My Personal Summary:

Using BI as a foundation, the authors implemented a linked dataspace for energy intelligence at the Digital Enterprise Research Institute (DERI), which has roughly 130 staff located in one building. Here, "linked" means that key data items are connected to one another to provide information and knowledge, and a "dataspace" is a large data repository whose contents are related and integrated only as needed. The program aimed to measure both the direct energy usage of the enterprise (electricity for data centers, lights, monitors, etc.) and the indirect usage (fuel burned, gas used by commuting staff) so the organization could become more sustainable, climate change being a major topic these days. The paper argues that a multi-level and holistic view of business intelligence on energy usage is needed, and it describes the types of information conveyed at each level.

My Personal Assessment:

However, this paper didn't go into how effective the implementation of this system was. What would have improved the paper is a statement about the decrease in CO2 emissions DERI achieved over the past year, or a time-series chart showing power consumption before and after implementation of the multi-level BI system. The paper was objective, but it offered no argument for why others should implement a similar system. The authors state that their future work is to provide more granularity in their levels, yet say nothing about the business value the system has delivered. With no figures stating the value of the system, the paper reads more like a conceptual how-to manual.

My Personal Reflection:

This paper doesn't fit well into my research topic, but it was helpful in defining a dataspace and a multi-level, holistic BI system. I may use the conceptual methodology of a dataspace in my own methodology, collecting secondary data from the National Hurricane Center into a big data warehouse and linking the data as it becomes relevant. This should save me time and reduce the labor-intensive cost of data integration by postponing integration until it is actually required. It has also changed my appreciation of data science: there is an alternative to bringing in one data set at a time, making all of its connections in the data warehouse, and only then moving on to the next data set.

A multilevel business intelligence setup and how it affects the framework of an organization’s decision-making processes. 

In Curry et al. (2012), the authors applied a linked dataspace BI system to an organization viewed both holistically and at multiple levels. The holistic aspects of their BI system spanned enterprise resource planning, finance, facility management, human resources, asset management, and code compliance. From that holistic standpoint, most of these groups held siloed information that was difficult to leverage across domains. The multi-level setup is different. As defined in Table II of Curry et al. (2012), data is presented at the organizational level (stakeholders: executives, shareholders, regulators, suppliers, consumers), the functional level (stakeholders: functional managers and the organization manager), and the individual level (stakeholders: employees). Each group of stakeholders has different information requirements and different levels of access to certain types of data, and the multi-level BI system must account for both. Different requirements and access levels mean different energy metrics: organizational-level metrics might be total energy consumption or the percentage of energy from renewable sources, while individual-level metrics might be business travel, individual IT consumption, or laptop electricity consumption. It would not make sense for an executive or shareholder to examine the laptop electricity consumption of each of the 130 staff members when a company-wide figure is available; however, the authors note that organizational-level data can be drilled down into to find the cause of a particular event in question. Access also matters: certain data visible to executives should not be accessible to every employee, and employee A should not be able to view employee B's energy consumption, because lateral views of the BI data may not be permissible. A multi-level BI system addresses these constraints as well.
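To make the level distinction concrete, here is a minimal Python sketch (not from Curry et al., 2012; the readings and field names are hypothetical) of how the same underlying consumption data could be rolled up differently for the organizational, functional, and individual levels while restricting what each stakeholder sees.

```python
# Minimal sketch (hypothetical data): deriving level-specific energy metrics
# from one shared set of readings, per the multi-level idea described above.
readings = [  # hypothetical per-employee laptop consumption in kWh
    {"employee": "A", "team": "Research", "laptop_kwh": 1.2},
    {"employee": "B", "team": "Research", "laptop_kwh": 0.9},
    {"employee": "C", "team": "Admin", "laptop_kwh": 1.5},
]

def organizational_view(data):
    """Executives see only the company-wide total, not individual rows."""
    return {"total_laptop_kwh": sum(r["laptop_kwh"] for r in data)}

def functional_view(data, team):
    """A functional manager sees the aggregate for their own team."""
    team_rows = [r for r in data if r["team"] == team]
    return {"team": team, "team_laptop_kwh": sum(r["laptop_kwh"] for r in team_rows)}

def individual_view(data, employee):
    """An employee sees only their own consumption, not a colleague's."""
    return [r for r in data if r["employee"] == employee]

print(organizational_view(readings))
print(functional_view(readings, "Research"))
print(individual_view(readings, "A"))
```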

The metrics reported at each level of this multi-level BI system allow stakeholders at that level to make data-driven decisions to reduce their carbon footprint. An executive can look at organizational-level metrics and institute a company-wide initiative to power down monitors at night. An individual could choose to leave for work earlier, spend less time in traffic, and burn less gas, reducing their indirect contribution to the company's carbon footprint. Managers, based on the metrics they can view, can decide to request funding for energy-efficient monitors and laptops for their teams, or even a single power strip per person, to reduce their teams' energy consumption costs.

 

What is Business Intelligence?

Business Intelligence (BI) is the gathering, managing, and analyzing of data that is vital for the survival of a business in today's hyper-competitive environment (Thomas, 2001; Vuori, 2006). A BI practitioner's role is to keep decision makers from being overwhelmed by a huge wealth of data; practitioners act as a filter, because decision makers will ignore any information that is not useful or meaningful (Vuori, 2006).

The BI cycle is a continuous cycle. It can be reduced to planning what information to collect, ethically collecting reliable information, analyzing that information to form intelligence, and disseminating the intelligence in an understandable way (Thomas, 2001). It can also be expanded into six steps, per Vuori (2006):

  1. Defining information needs
  2. Information gathering
  3. Information processing
  4. Analysis
  5. Dissemination
  6. Utilization and Feedback

A good BI system makes use of a knowledge database and a communication system (Thomas, 2001). With this system, this cycle, and the correct data, we can gather information on competitors, new technology, public policy, customer sentiment, market forces, supply chain conditions, and more. Having this information at the decision maker's disposal allows for data-driven decisions that increase the company's competitive advantage.
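As a rough illustration only, the six-step cycle above can be pictured as a feedback loop; the Python sketch below uses placeholder step functions and made-up outputs, not a real BI implementation.

```python
# Minimal sketch of Vuori's (2006) six-step BI cycle as a feedback loop;
# every step function here is a placeholder with made-up data.
def define_information_needs():
    return ["competitor pricing", "customer sentiment"]

def gather_information(needs):
    return {need: ["raw source data"] for need in needs}          # step 2

def process_information(raw):
    return {k: [s.strip() for s in v] for k, v in raw.items()}    # step 3

def analyze(processed):
    return {k: f"insight from {len(v)} source(s)" for k, v in processed.items()}  # step 4

def disseminate(intelligence):
    print("Briefing decision makers:", intelligence)              # step 5

def collect_feedback():
    return ["supply chain risk"]                                  # step 6 feeds the next cycle

needs = define_information_needs()
for _ in range(2):  # the cycle is continuous; two passes shown here
    intelligence = analyze(process_information(gather_information(needs)))
    disseminate(intelligence)
    needs = list(dict.fromkeys(needs + collect_feedback()))       # fold feedback into new needs
```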

Three BI cycle characteristics that drive productivity

  1. Identifying needs versus wants in the first phase: Are we defining "information that is wanted but that is not really needed," "information that is lacking and that is recognized to be needed," or "information that is needed but not known to be needed, wanted, nor asked for" (Vuori, 2006)? The last two are extremely important: the second satisfies the end user of the BI, while the third can surface huge revelations. For example, if a company only watches its most active or biggest competitor, it may lose sight of a smaller competitor gaining traction. Getting the information that is actually needed is key to avoiding wasted time and increasing productivity.
    • Influences the BI practitioner organization
    • Influences the Decision Makers (from first line managers to executive level)
    • Departments in which the data/information is collected from
  2. Protecting intellectual capital: When companies have high turnover rates, when sensitive or proprietary information travels on drives or in the minds of employees, or when senior personnel accidentally give out information at conferences and symposiums (Thomas, 2001), the company becomes vulnerable. The supply chain is another example: if a company and its competitor rely on the same key supplier to produce a similar product mix, what guarantees are in place to ensure information is not passing between the two companies through that supplier? Information leaks can lead to the loss of a competitive advantage. Protecting intellectual capital lets companies focus on improving their current product mix rather than constantly having to create new products.
    • All employees
    • Supply chain (horizontally and vertically)
    • Production lines
    • Human Resources/Legal
    • Management
  3. Dissemination of the correct analysis: This allows managers to make data-driven decisions that help protect the business, enter a new market space, and so on. If BI practitioners give decision makers exactly the information their analysis shows is needed, and nothing more, they save time, reduce decision fatigue, and avoid wasted effort producing unused analytics. Constant communication between practitioners and decision makers is therefore needed to avoid non-value-added work, and feedback cycles help make future work more productive over time.
    • Influences the BI practitioner organization
    • Influences the Decision Makers (from first line managers to executive level)
    • Communications departments

An example of an innovative use of BI is DeKalb County, GA. Its CIO has leveraged BI and analytics to set up smart-policing initiatives (using police more effectively and efficiently to prevent crime and lower crime rates), enhance public safety (developing and maintaining green neighborhoods), and promote jobs and economic development (Matelski, 2015). The CIO took data from multiple systems and followed the cycle above: asking the right questions to identify the need for particular data, then handling its collection, processing, and analysis through to its dissemination to key decision makers via intuitive dashboards and key performance indicators.


Innovation: Decision making tools

Decision making tools:

To provide opportunities for creative and innovative thinking one must (Hashim et al., 2016):

  • Keep asking and looking for answers
  • Making associations and observing correlation
  • Anticipating on future events and happenings
  • Making speculation on possibilities
  • Exploring ideas, actions, and results

Nominal Group Technique

A tool for decision making is the Nominal Group Technique (NGT), which can be used to identify the elements of a problem, identify and rank goals by priority, identify experts, and involve people from all levels to promote buy-in of the results (Deip, Thesen, Motiwalla, & Seshardi, 1977; Hashim et al., 2016; Pulat, 2014).  Pulat (2014) describes the process as listing and prioritizing a set of options generated through a normal brainstorming session, where ideas are produced without criticism or evaluation.  Deip et al. (1977) describe it as a process that taps the experience of all participants by asking each to state an idea for the list, with no discussion permitted until all ideas are listed; after a discussion of each item, ranking of the ideas can begin. Finally, Hashim et al. (2016) state that the method is best used to help a small team reach consensus by gathering ideas from everyone and inviting buy-in to those ideas.

Deip et al. (1977) and Hashim et al. (2016) list the following advantages and disadvantages of the process:

+     Dominance by high-status, aggressive, or highly verbal people is reduced, so everyone can participate equally

+     Gains group consensus because everyone is involved

+     Keeps the focus on the problem and avoids premature evaluation of ideas

+     Minimal interruption of creative ideas during the silent phase

+     Discussion serves only to clarify items and eliminate misunderstanding

–      Cross-fertilization of ideas is diminished

–      May reduce flexibility

–      Bringing everyone to the table may be costly

Delphi method

Dalkey and Helmer (1963) described the Delphi project as a way to use expert opinion, with the hope of obtaining the strongest possible consensus from a group of experts.  Pulat (2014) states that ideas are listed and prioritized via a weighted point system to narrow down the possible solutions, with no communication between the experts, or of intermediate results, until the very end of the process.  Dalkey and Helmer (1963) describe the process as repeatedly interviewing or questioning individual experts while avoiding confrontation between them.  Questions center on a common problem, and between rounds of questioning the facilitator circulates any data requested by one expert to all experts, along with any new information an expert considers potentially relevant (Dalkey & Helmer, 1963; Pulat, 2014).  The solution produced by this technique improves when the experts solicited have a range of experiences (Okoli & Pawlowski, 2004; Pulat, 2014).
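As an illustration of the weighted point system Pulat (2014) mentions, the following Python sketch tallies independently submitted rankings into a group priority; the ballots and the points-per-rank rule are assumptions for the example, not prescribed by the cited authors.

```python
# Minimal sketch of a weighted-point tally for prioritizing ideas; ballots and
# the scoring rule (points = number_of_ideas - rank + 1) are made up.
from collections import Counter

# Each expert ranks ideas independently (Delphi-style, no group discussion);
# rank 1 is the expert's top choice.
ballots = {
    "expert_1": ["idea A", "idea C", "idea B"],
    "expert_2": ["idea C", "idea A", "idea B"],
    "expert_3": ["idea A", "idea B", "idea C"],
}

scores = Counter()
for ranking in ballots.values():
    n = len(ranking)
    for rank, idea in enumerate(ranking, start=1):
        scores[idea] += n - rank + 1          # higher rank earns more points

# The highest total becomes the group's (anonymously reached) priority.
for idea, points in scores.most_common():
    print(idea, points)
```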

Benefits and limitations (Dalkey & Helmer, 1963; Okoli & Pawlowski, 2004):

+     Encourages independent thought

+     Decreases groupthink bias (the predisposition to be swayed by another person or an entire group)

+     Minimizes confrontation of opposing views

+     Makes it easy to correct misconceptions a person harbored about certain facts or theoretical assumptions

+     Ensures that relevant data gets fed to all the experts

+     Allows experts to change their minds, producing results that are freer from bias

+     Provides a more penetrating analysis of the problem with each round

–      Very costly in time and resources due to the multiple rounds and meeting each expert one-on-one

–       Vague questions invite critical comments while providing little value to solving the problem

The main difference between the Delphi technique and nominal grouping is the avoidance of conflict: decision-making is conducted one-on-one rather than in a group setting.  Given that ideas can be triggered by words (or a particular word order), the nominal approach could, in theory, generate more solutions than the Delphi technique (Hashim et al., 2016; Deip et al., 1977).  Hashim et al. (2016) note that other triggers for imagination, creativity, and ideas can include images, events, possible events, conflicts, emotions, environment, culture, games, music, and so on. With independent meetings rather than a group meeting, however, solutions are well thought out and groupthink bias is avoided (Dalkey & Helmer, 1963).  When selecting between these two techniques, the type of problem and the desired outcome of the process should drive the choice of methodology.  There are also many other decision-making techniques, such as multi-voting and basic brainstorming (Pulat, 2014).

Resources:

  • Dalkey, N., & Helmer, O. (1963). An experimental application of the Delphi method to the use of experts. Management Science, 9(3), 458-467.
  • Deip, P., Thesen, A., Motiwalla, J., & Seshardi, N. (1977). Nominal group technique.
  • Hashim, A. T., Ariffin, A., Razalli, A. R., Shukor, A. A., NizamNasrifan, M., Ariffin, A. K., … & Yusof, N. A. A. (2016). Nominal group technique: A brainstorming tool for identifying learning activities using musical instruments to enhance creativity and imagination of young children. International Advisory Board, 23, 80.
  • Okoli, C., & Pawlowski, S. D. (2004). The Delphi method as a research tool: An example, design considerations and applications. Information & Management, 42(1), 15-29.
  • Pulat, B. (2014). Lean/six sigma black belt certification workshop: Body of knowledge. Creative Insights, LLC.

Innovation: Technology and Trends in museums

Definition of Museum: term applied to zoos, historical sites, botanical gardens, aquariums, planetariums, children’s museums, and science and technology centers (US DoJ, 2009)

Museum Edition: Key trend: Short-term trend: Driving Ed Tech adoption in museums for the next one to two years

With mobile technology increasing in processing speed every year and the introduction of augmented reality (AR) through Pokémon Go, there is a huge opportunity to create discoverable museum displays served through mobile devices (CNET, 2016; New Horizons, 2016; Bonnington, 2015). The AR technology uses the mobile device's camera and, through some creative coding, interlaces pocket monsters called Pokémon onto the real world in real time, so the Pokémon are visible through the device even though they do not physically exist (CNET, 2016). Mobile devices are not just for gaming; they have become the primary computing device for most people across the globe, as well as a primary way to access information (Bonnington, 2015; New Horizons, 2016).  Pokémon Go adds the benefit of encouraging users to walk to designated locations, either PokéStops (for key game-play items) or Pokémon Gyms (to build up a team's gym or take one down), which enhances the experience (CNET, 2016).  It is projected that within the next five years mobile devices will have enough processing power to handle 4K streaming, immersive virtual reality gaming, and seamless multitasking (Bonnington, 2015).

Therefore, creating a new museum experience using an AR system similar to Pokémon Go, with interactive museum displays acting like PokéStops or Pokémon Gyms, could become a reality and enhance exploration, interpretation, and sharing.  This would essentially be a more interactive, self-guided virtual tour, similar to what has been implemented at the Broad Museum in Los Angeles and what is a prioritized strategy for San Francisco's Museum of Modern Art (New Horizons, 2016).  If all museums could be centralized into one interface, similar to what Israel is doing with its museums (60 museums represented so far), adoption rates could grow (Museums in Israel, n.d.). According to New Horizons (2016), hyper-zoom features on particular displays, gamification, location-based services, AR, and social networking integration can all increase patrons' experiences; these are all aspects that Pokémon Go promotes through its mobile game.

Forces that impact the trend

  • Technological: Museum WiFi infrastructure needs to be updated to handle the increase in demand, which is a key force negatively impacting this technology (New Horizons, 2016; Government of Canada, n.d.). On the other hand, code and infrastructure designs are becoming more open source, which is a positive force.
  • Safety: There is an added need to improve the design and flow of a museum to accommodate distracted patrons using this new AR system.
  • Cultural: Museums at one point used to ban cameras, but with so many mobile devices and the proposed AR system above, such a ban would now be hard to enforce (New Horizons, 2016), especially given that museums want to increase participation.

Museum Edition: Technology: Improving Accessibility for Disabled populations

One in 10 people lives with a disability, approximately 0.65 billion people (Disabled World, n.d.).  It is both imperative and ethical that museums create exhibits for all of their patrons. In the past, deviations from societal norms caused people with disabilities to be seen as signs of divine disapproval, with the conclusion that they needed to be "fixed" (Grandin, 2016), when there is nothing wrong with them to begin with.  A few of the many areas where technology can improve accessibility are:

  • Websites and online programming: making them more accessible and eliminating barriers through the incorporation of universally good design (New Horizons, 2016; Grandin, 2016).
  • Addressing Article 30 of the UN Disability Convention: implementing technology that allows people with disabilities to enjoy access to performances, exhibits, or services (UN, 2006). This would allow, encourage, and promote all people to participate to the fullest extent possible (New Horizons, 2016; UN, 2006).
  • Use of software to create alternative formats for printed brochures: Braille, CDs, large print (US DoJ, 2009). Also, using that same software to create Braille exhibit guides (New Horizons, 2016).
  • Using closed captions for video displays (New Horizons, 2016).

An excellent way to test universally good design is for museums to partner with disabled students to test a design's usability and provide meaningful feedback (New Horizons, 2016). Essentially, one way to approach universally good design is to ask three questions (Wyman, Timpson, Gillam, & Bahram, 2016):

  1. “Where am I?”
  2. “Where can I go from here?”
  3. “How can I get there?” or “How can I make that happen?”

 

Forces that impact the technology

  • Educational: There is a lack of disability-responsiveness training for museum staff, which leads to a lack of knowledge of best practices and of how best to serve the disabled population (New Horizons, 2016).
  • Financial: A lack of resources to design or even implement new programs for people with disabilities is a key force negatively impacting this technology (New Horizons, 2016; Grandin, 2016). However, the best designs are simple, intuitive, flexible, and equitable, which makes accessible design universally good design (Grandin, 2016; Wyman et al., 2016). How do museums learn about universally good design? By working with the disabled community and advocacy organizations (New Horizons, 2016). So, as museums update their exhibits or their buildings, they should take accessible design into account. For people with disabilities, a universally good design is one that requires no additional modifications for them (Grandin, 2016).


Big Data Analytics: Compelling Topics

Big Data and Hadoop:

According to Gray et al. (2005), traditional data management relies on arrays and tables to analyze objects, which can range from financial data, galaxies, proteins, events, and spectra to 2D weather data; but when it comes to N-dimensional arrays there is an "impedance mismatch" between the data and the database.  Big data can be N-dimensional and can also vary across time, as text data does (Gray et al., 2005). Big data, as the name implies, is voluminous. Given the massive amounts of data that need to be processed, manipulated, and calculated upon, parallel processing and programming use the benefits of distributed systems to get the job done (Minelli, Chambers, & Dhiraj, 2013).  Parallel processing makes quick work of a big data set because, rather than having one processor do all the work, the task is split among many processors.
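A minimal Python sketch of that idea, assuming a simple sum-of-squares job split across four worker processes (the chunking scheme and worker count are arbitrary choices for illustration):

```python
# Minimal sketch of parallel processing: split a large task into chunks and
# let several worker processes handle them at once, then combine the results.
from multiprocessing import Pool

def partial_sum(chunk):
    """Work done independently by each worker on its slice of the data."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]          # split the job four ways
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, chunks)     # run the slices in parallel
    print(sum(partials))                             # combine the partial results
```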

The Hadoop Distributed File System (HDFS) breaks big data up into smaller blocks (IBM, n.d.), which can be assembled like a set of Legos across a distributed database system, with the data blocks spread over multiple servers. Hadoop is Java-based; it pulls the data stored on those distributed servers, maps key items/objects, and reduces the data down to the query at hand (the MapReduce function). Hadoop is built to deal with big data stored in the cloud.
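Hadoop itself is Java-based, so the following is only a Python sketch of the map/reduce pattern on a toy word-count problem, not Hadoop's actual API:

```python
# Minimal sketch of the map/reduce pattern Hadoop applies at much larger scale:
# map each record to key/value pairs, then reduce all values that share a key.
from collections import defaultdict

documents = ["big data needs parallel processing",
             "hadoop splits big data into blocks"]

def map_phase(doc):
    return [(word, 1) for word in doc.split()]        # emit (key, value) pairs

def reduce_phase(pairs):
    totals = defaultdict(int)
    for key, value in pairs:                          # group by key, then combine
        totals[key] += value
    return dict(totals)

mapped = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(mapped))                           # e.g. {'big': 2, 'data': 2, ...}
```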

Cloud Computing:

Clouds come in three privacy flavors: public (all customers and companies share the same resources), private (only one group of clients or one company can use particular cloud resources), and hybrid (some aspects of the cloud are public while others are private, depending on data sensitivity).  Cloud technology encompasses Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).  These service models differ in what the company manages versus what the cloud provider manages (Lau, 2011).  The cloud differs from the conventional data center, where the company manages it all: application, data, O/S, virtualization, servers, storage, and networking.  The cloud is replacing the conventional data center because infrastructure costs are high; spending that much money on a conventional data center that will be outdated in 18 months (Moore's law) is a constant money sink.  Thus, outsourcing data center infrastructure is the first step of a company's movement into the cloud.
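A minimal sketch, assuming a simplified responsibility matrix, of how the layers listed above shift from the company to the provider across IaaS, PaaS, and SaaS (the exact split can vary by provider and contract):

```python
# Minimal sketch of the company-vs-provider responsibility split across service
# models; the assignments below are a simplified illustration, not a standard.
LAYERS = ["application", "data", "O/S", "virtualization",
          "servers", "storage", "networking"]

MANAGED_BY_PROVIDER = {
    "on-premises": [],                                    # company manages it all
    "IaaS": ["virtualization", "servers", "storage", "networking"],
    "PaaS": ["O/S", "virtualization", "servers", "storage", "networking"],
    "SaaS": LAYERS,                                        # provider manages it all
}

for model, provider_layers in MANAGED_BY_PROVIDER.items():
    company_layers = [layer for layer in LAYERS if layer not in provider_layers]
    print(f"{model:12} company manages: {', '.join(company_layers) or 'nothing'}")
```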

Key Components to Success:

You need buy-in from leaders and employees when using big data analytics for predictive, prescriptive, or descriptive purposes.  When it came to buy-in, Lt. Palmer had to nurture top-down support as well as buy-in from the bottom up, through the ranks.  It was much harder to get buy-in from experienced detectives, who felt that the introduction of tools like analytics was a way of telling them to give up their long-standing practices, or even of replacing them.  So Lt. Palmer sold Blue PALMS this way: "What's worked best for us is proving [the value of Blue PALMS] one case at a time, and stressing that it's a tool, that it's a complement to their skills and experience, not a substitute."  Lt. Palmer won over a senior and well-respected officer by helping him solve a case.  The senior officer had a suspect in mind, and after the data was fed in, the tool produced the 20 most likely people, in order.  The officer's suspect was in the top five and, when apprehended, confessed.  Proving the tool case by case built trust among veteran officers and eventually earned their buy-in.

Applications of Big Data Analytics:

A result of big data analytics is online profiling.  Online profiling uses a person's online identity to collect information about them, their behaviors, their interactions, their tastes, and so on, to drive targeted advertising (McNurlin et al., 2008).  Profiling has its roots in third-party cookies and has now evolved to include 40 different variables collected from the consumer (Pophal, 2014).  Online profiling allows marketers to send personalized, "perfect" advertisements to the consumer, instantly.

Moving from online profiling to studying social media, He, Zha, and Li (2013) theorized that with higher positive customer engagement, customers can become brand advocates, which increases brand loyalty and pushes referrals to their friends; approximately one in three people followed a friend's referral when it came through social media. This insight came from analyzing social media data from Pizza Hut, Domino's, and Papa John's as they aim to control more market share and increase revenue.  But does analyzing people's social media content when they interact with a company do anything to protect their privacy?

HIPAA describes how to de-identify 18 identifiers/variables, which helps protect people from ethical issues that could arise from big data.  HIPAA is not a standard for all big data applications and cases, but following it is good practice. It is mostly concerned with the health care industry, and it lists the 18 identifiers that have to be de-identified: names, geographic data, dates, telephone numbers, VINs, fax numbers, device IDs and serial numbers, email addresses, URLs, SSNs, IP addresses, medical record numbers, biometric IDs (fingerprints, iris scans, voice prints, etc.), full-face photos, health plan beneficiary numbers, account numbers, any other unique ID numbers (characteristics, codes, etc.), and certification/license numbers (HHS, n.d.).  We must also be aware that HIPAA compliance is more a responsibility of the data collector and data owner than of the cloud provider.
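As a toy illustration of de-identification, the Python sketch below drops identifier fields from a record; the field names are hypothetical stand-ins for the 18 HIPAA identifiers, not an actual HIPAA-compliant schema or procedure.

```python
# Minimal sketch of de-identification: drop identifier fields from a record.
# Field names are hypothetical stand-ins, not a HIPAA schema.
IDENTIFIER_FIELDS = {"name", "geographic_data", "dates", "telephone", "vin",
                     "fax", "device_id", "email", "url", "ssn", "ip_address",
                     "medical_record_number", "biometric_id", "photo",
                     "health_plan_number", "account_number", "other_unique_id",
                     "license_number"}

def de_identify(record):
    """Return a copy of the record with identifier fields removed."""
    return {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}

patient = {"name": "Jane Doe", "ssn": "000-00-0000",
           "diagnosis": "type 2 diabetes", "age_group": "40-49"}
print(de_identify(patient))   # {'diagnosis': 'type 2 diabetes', 'age_group': '40-49'}
```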

The ethical questions around genomic data go back to the Human Genome Project roughly 25 years ago, which set out to sequence the first 3 billion base pairs of the human genome over a 13-year period (Green, Watson, & Collins, 2015).  Those 3 billion base pairs are about 100 GB uncompressed, and by 2011, 13 quadrillion bases had been sequenced (O'Driscoll et al., 2013). Studying genomic data comes with a whole host of ethical issues; some are addressed by HIPAA legislation, while others remain unresolved today.

One of the ethical issues mentioned in McEwen et al. (2013) is whether genomic data submitted by participants 25 years ago can be used today in other studies, and whether it should be used to help those participants take preventative measures against adverse health conditions.  The issues extend beyond privacy and compliance: McEwen et al. (2013) ask what happens if data collected 20 years ago now shows that a participant is at risk of an adverse but preventable health condition. What duty do today's researchers have to that participant?


FUTURING & INNOVATION: WHAT IS INNOVATION?

One could define innovation as an idea, value, service, technology, method, or thing that is new to an individual, a family, a firm, a field, an industry, or a country (Jeyaraj & Sabherwal, 2014; Rogers, 1962; Rogers, 2010; Sáenz-Royo, Gracia-Lázaro, & Moreno, 2015). By this definition an invention can be seen as an innovation, but not all innovations are inventions (Robertson, 1967).  Also, even if something is not considered an innovation by one entity, it can still be considered innovative if adopted by a completely different entity (Newby, Nguyen, & Waring, 2014).

Innovation moving from one entity to another can be described as diffusion of innovation.  Diffusion of innovation is a theory concerned with the why, what, how, and rate of innovation dissemination and adoption between entities, carried out through different communication channels over time (Ahmed, Lakhani, Rafi, Rajkumar, & Ahmed, 2014; Bass, 1969; Robertson, 1967; Rohani & Hussin, 2015; Rogers, 1962; Rogers, 2010).  However, various forces can act on an innovation and influence its likelihood of success, for example financial, technological, cultural, economic, legal, ethical, temporal, social, global, national, and local forces.  Therefore, when viewing a new technology or innovation for the future, one must think critically about it and evaluate it through these different lenses.
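Since Bass (1969) is cited above, here is a minimal sketch of his diffusion model's adoption curve; the coefficient values for innovation (p), imitation (q), and market size (m) are assumed for illustration, not estimates from any cited study.

```python
# Minimal sketch of the Bass (1969) diffusion model in discrete time:
# new adopters at t = (p + q * cumulative/m) * (m - cumulative).
# p, q, and m below are assumed example values.
def bass_adoption(p=0.03, q=0.38, m=1000, periods=15):
    cumulative = 0.0
    history = []
    for t in range(1, periods + 1):
        new_adopters = (p + q * cumulative / m) * (m - cumulative)
        cumulative += new_adopters
        history.append((t, round(new_adopters, 1), round(cumulative, 1)))
    return history

for t, new, total in bass_adoption():
    print(f"period {t:2}: {new:6} new adopters, {total:7} cumulative")
```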

Resources:

  • Ahmed, S., Lakhani, N. A., Rafi, S. K., Rajkumar, & Ahmed, S. (2014). Diffusion of innovation model of new services offerings in universities of Karachi. International Journal of Technology and Research, 2(2), 75-80.
  • Bass, F. M. (1969). A new product growth for model consumer durables. Management Science, 15(5), 215-227.
  • Jeyaraj, A., & Sabherwal, R. (2014). The Bass model of diffusion: Recommendations for use in information systems research and practice. JITTA: Journal of Information Technology Theory and Application, 15(1), 5-30.
  • Newby, M., Nguyen, T. H., & Waring, T. S. (2014). Understanding customer relationship management technology adoption in small and medium-sized enterprises. Journal of Enterprise Information Management, 27(5), 541.
  • Robertson, T. S. (1967). The process of innovation and the diffusion of innovation. The Journal of Marketing, 14-19.
  • Rogers, E. M. (1962). Diffusion of innovations (1st ed.). New York: Simon and Schuster.
  • Rogers, E. M. (2010). Diffusion of innovations (4th ed.). New York: Simon and Schuster.
  • Rohani, M. B., & Hussin, A. R. C. (2015). An integrated theoretical framework for cloud computing adoption by universities technology transfer offices (TTOs). Journal of Theoretical and Applied Information Technology, 79(3), 415-430.
  • Sáenz-Royo, C., Gracia-Lázaro, C., & Moreno, Y. (2015). The role of the organization structure in the diffusion of innovations. PLoS One, 10(5). doi: http://dx.doi.org/10.1371/journal.pone.0126078

Big Data Analytics: Future Predictions?

Big data analytics and stifling future innovation?

One way to make a prediction about the future is to understand the current challenges faced in certain parts of a particular field.  In the case of big data analytics, machine learning analyzes data from the past to make a prediction about, or build an understanding of, the future (Ahlemeyer-Stubbe & Coleman, 2014).  Ahlemeyer-Stubbe and Coleman (2014) argued that learning from the past can hinder innovation.  Yet Basole, Seuss, and Rouse (2013) studied past popular IT journal articles to see how the field of IT is evolving, and Yang, Klose, Lippy, Barcelon-Yang, and Zhang (2014) conclude that analyzing current patent information can reveal trends and provide companies with actionable items to guide and build future business strategies around a patent trend.  The danger of stifling innovation, per Ahlemeyer-Stubbe and Coleman (2014), comes from relying only on past data and experiences and never trying anything new; their example is that optimizing the horse-drawn carriage would never have produced the automobile.  That analogy is weak, though: we should not focus on collecting data about only one item, but also about items tangential to it.  Collecting a wide range of data from different fields and different sources allows new patterns to form and new connections to be made, which can promote innovation (Basole et al., 2013).

Future of Health Analytics:

Another way to analyze the future is to dream big, or to borrow from a movie (Carter, Farmer, & Siegel, 2014). What if we could analyze our blood daily to track our overall health, beyond the daily blood sugar readings most diabetics are accustomed to?  The information generated could help build a healthier lifestyle.  Currently, doctors help patients manage their diet and monitor their overall health, but when the patient goes home, that care disappears.  Constant monitoring could support outpatient care and daily living: alerts could be sent to a doctor or to family members if certain biomarkers reach a critical threshold.  This could enable better care, allow people's social network to hold them accountable for healthy life and lifestyle choices, and possibly shorten the time between symptom detection and emergency care in severe cases (Carter, Farmer, & Siegel, 2014).
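A minimal sketch of the threshold-alert idea, with hypothetical biomarker names, thresholds, and contacts (not clinical guidance):

```python
# Minimal sketch of a biomarker threshold alert; the markers, thresholds, and
# contacts below are hypothetical examples, not medical advice.
CRITICAL_THRESHOLDS = {"blood_glucose_mg_dl": 250, "systolic_bp_mm_hg": 180}

def check_readings(readings, contacts):
    """Flag any reading at or above its critical threshold and name who to notify."""
    alerts = []
    for marker, value in readings.items():
        limit = CRITICAL_THRESHOLDS.get(marker)
        if limit is not None and value >= limit:
            alerts.append(f"{marker} = {value} (critical >= {limit}); notify {', '.join(contacts)}")
    return alerts

daily = {"blood_glucose_mg_dl": 262, "systolic_bp_mm_hg": 128}
for alert in check_readings(daily, ["physician", "family member"]):
    print(alert)
```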

Generating revenue from analyzing consumers:

Soon it will not be enough to conduct item-affinity analysis (market basket analysis).  Item affinity uses rules-based analytics to understand which items frequently co-occur in transactions (Snowplow Analytics, 2016); a minimal co-occurrence sketch is shown after the list below. Item affinity is similar to Amazon.com's current method of driving more sales by getting customers to consume more.  But what if we started to look at what a consumer intends to buy (Minelli, Chambers, & Dhiraj, 2013)? Analyzing data on consumer product awareness, brand awareness, opinion (sentiment analysis), consideration, preferences, and purchases from a consumer's multiple social media accounts in real time could allow marketers to create the perfect advertisement (Minelli et al., 2013).  The perfect advertisement would let companies gain a bigger market share or lure customers to their products and services away from their competitors.  Minelli et al. (2013) predicted that companies in the future will move toward:

  • Data that can be refreshed every second
  • Data validation exists in real time and alerts sent if something is wrong before data is published in aiding data driven decisions
  • Executives will receive daily data briefs from their internal processes and from their competitors to allow them to make data-driven decisions to increase revenue
  • Questions that were raised in staff meetings or other organizational meetings can be answered in minutes to hours, not weeks
  • A cultural change in companies where data is easily available and the phrase “let me show you the facts” can be easily heard amongst colleagues
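As referenced above, here is a minimal sketch of item-affinity (market basket) analysis: counting how often item pairs co-occur within transactions. The transactions are made up, and real systems would also compute measures such as support, confidence, and lift.

```python
# Minimal sketch of item-affinity (market basket) analysis: count how often
# pairs of items appear in the same transaction. Example transactions are made up.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):   # every unordered item pair
        pair_counts[pair] += 1

# Pairs that co-occur most often are candidates for "frequently bought together".
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```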

Big data analytics can affect many other areas as well; a whole new world is opening up.  More and more companies and government agencies are hiring data scientists, not just for the current value these scientists bring but for their potential value 10-15 years from now.  Thus, the field is expected to change as more talent is recruited into big data analytics.

References:

Ahlemeyer-Stubbe, A., & Coleman, S.  (2014). A Practical Guide to Data Mining for Business and Industry. Wiley-Blackwell. VitalBook file.

Basole, R. C., Seuss, D. C., & Rouse, W. B. (2013). IT innovation adoption by enterprises: Knowledge discovery through text analytics. Decision Support Systems, 54, 1044-1054.

Carter, K.  B., Farmer, D., Siegel, C. (2014). Actionable Intelligence: A Guide to Delivering Business Results with Big Data Fast!. John Wiley & Sons P&T. VitalBook file.

Minelli, M., Chambers, M., Dhiraj, A. (2013). Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses. John Wiley & Sons P&T. VitalBook file.

Snowplow Analytics (2016). Market basket analysis: identifying products and content that go well together. Retrieved from http://snowplowanalytics.com/analytics/recipes/catalog-analytics/market-basket-analysis-identifying-products-that-sell-well-together.html

Yang, Y. Y., Klose, T., Lippy, J., Barcelon-Yang, C. S., & Zhang, L. (2014). Leveraging text analytics in patent analysis to empower business decisions – a competitive differentiation of kinase assay technology platforms by I2E text mining software. World Patent Information, 39, 24-34.

Big Data Analytics: Career Prospects

Master's and doctoral graduates have some advantages over undergraduates: they have done research or capstones involving big data sets, they can explain the motivation and reasoning behind the work (chapters 1 and 2 of a dissertation), they can learn and adapt quickly (chapter 3 reflects what you have learned and how you will apply it), and they can think critically about problems (chapters 4 and 5).  Doctoral students work on a problem for months or years to reach a solution, filling a gap in the knowledge they might once have thought could not be filled.  But to prepare best for a data science or big data position, the doctoral work should not be purely theoretical; it should include analysis of huge data sets.  Based on my personal analysis, I have noticed that when applying for a senior-level or team-lead position in data science, a doctorate counts as roughly three additional years of experience on top of what you already have, whereas without a doctorate you need a master's degree plus three years of experience to be considered for the same position.

Master's-level courses in big data help build strong mathematical, statistical, computational, and programming skills. Doctorate-level courses push the limits of knowledge in all of these fields and also help you become a domain expert in a particular area of data science.  Commanding that domain expertise, which is what a doctoral program provides, can make you more valuable in the job market (Lo, n.d.), and being more valuable in the job market allows you to demand more in compensation.  Different sources quote different salary ranges, largely because this field has yet to be standardized (Lo, n.d.), so I will provide only two sources for salary ranges.

According to Columbus (2014), jobs that involve big data, including Big Data Solution Architect, Linux Systems and Big Data Engineer, Big Data Platform Engineer, and Lead Software Engineer, Big Data (Java, Hadoop, SQL), have the following salary statistics:

  • Q1: $84,650
  • Median: $103,000
  • Q3: $121,300

Columbus (2014) also stated that it is very difficult to find the right people for an open requisition, and that most requisitions remain open for 47 days.  According to Columbus (2014), the most-wanted skills in analytics job postings are Python (96.90% growth in demand in the past year) and Linux and Hadoop (76% growth in demand each).

Lo (n.d.) states that individuals with just a BS or MS degree and no full-time work experience should expect $50-75K, whereas data scientists with experience can command $65-110K.

  • Data scientist can earn $85-170K
  • Data science/analytics managers can earn $90-140K for 1-3 direct reports
  • Data science/analytics managers can earn $130-175K for 4-9 direct reports
  • Data science/analytics managers can earn $160-240K for 10+ direct reports
  • Database Administrators can earn $50-120K, varying upward with experience
  • Junior Big data engineers can earn $79-115K
  • Domain Expert Big data engineers can earn $100-165K

One way to look for opportunities currently available in the field is the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms (Parenteau et al., 2016). If you want to help push a tool toward greater ability to execute and completeness of vision as a data scientist, consider employment at: Pyramid Analytics, Yellowfin, Platfora, Datawatch, Information Builders, Sisense, Board International, Salesforce, GoodData, Domo, Birst, SAS, Alteryx, SAP, MicroStrategy, Logi Analytics, IBM, ClearStory Data, Pentaho, TIBCO Software, BeyondCore, Qlik, Microsoft, and Tableau.  That is one way to look at this data.  Another is to see which tools lead the field (Tableau, Qlik, and Microsoft, with SAS, Birst, Alteryx, and SAP following behind) and learn those tools to become more marketable.
