Adv DBs: A possible future project?

Below is a possible future research paper on a database-related subject.

Title: Using MapReduce to analyze clinical test utilization patterns in medicine

The motivation:

Efficient processing and analysis of clinical data could lead to better clinical testing of patients. MapReduce also offers an integrated solution for the medical field, which saves resources that would otherwise be spent moving data in and out of storage.

The problem statement (symptom and root cause)

The rates of sexually transmitted infections (STIs) are increasing at an alarming pace. Could loading the test utilization patterns of the Roper Saint Francis Clinical Network in the South into Hadoop with MapReduce reveal patterns in the current STI population and predict areas where an outbreak may be imminent?

The hypothesis statement (propose a solution and address the root cause)

H0: Data mining in Hadoop with MapReduce will not be able to identify any meaningful pattern that could be used to predict the next location for an STI outbreak using clinical test utilization patterns.

H1: Data mining in Hadoop with MapReduce can identify a meaningful pattern that could be used to predict the next location for an STI outbreak using clinical test utilization patterns.
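
To make the proposed analysis concrete, here is a minimal sketch of the map, shuffle, and reduce steps that such a study might start from. It runs in plain Python rather than on a Hadoop cluster, and the record layout (region, week, test name, result) and every value in RECORDS are made-up assumptions for illustration, not data from any clinical network.

```python
from collections import defaultdict

# Hypothetical, made-up test records: (region, week, test name, result).
# The field layout is an assumption, not a real clinical schema.
RECORDS = [
    ("north", 1, "chlamydia", "positive"),
    ("north", 1, "chlamydia", "negative"),
    ("north", 2, "gonorrhea", "positive"),
    ("north", 2, "chlamydia", "positive"),
    ("south", 1, "chlamydia", "negative"),
    ("south", 2, "syphilis", "positive"),
]


def map_record(record):
    """Map step: emit ((region, week), (tests, positives)) for one record."""
    region, week, _test, result = record
    yield (region, week), (1, 1 if result == "positive" else 0)


def shuffle(pairs):
    """Shuffle step: group mapped values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: total tests ordered and total positives for one key."""
    tests = sum(v[0] for v in values)
    positives = sum(v[1] for v in values)
    return key, (tests, positives)


def run_job(records):
    """Run map, shuffle, and reduce in memory over all records."""
    mapped = (pair for record in records for pair in map_record(record))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    for (region, week), (tests, positives) in sorted(run_job(RECORDS).items()):
        print(region, week, tests, positives)
```

A week-over-week rise in tests ordered or in positives for one region is the kind of utilization pattern the hypotheses refer to; whether such a signal actually predicts an outbreak is exactly what the study would have to test.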

The research questions

Could this study of STI outbreak rates be generalized to other disease outbreak rates?

Is this application of data-mining in Hadoop with MapReduce the correct way to analyze the data?

The professional significance statement (new contribution to the body of knowledge)

According to Mohammed et al. (2014), identifying where an outbreak of any disease (including STIs) may occur via clinical test utilization patterns has yet to be done. They also state that Hadoop with MapReduce is a great tool for clinical work because it has already been adopted in related fields such as bioinformatics.


  • Mohammed, E. A., Far, B. H., & Naugler, C. (2014). Applications of the MapReduce programming framework to clinical big data analysis: Current landscape and future trends. BioData Mining, 7.
  • Pokorny, J. (2011). NoSQL databases: A step to database scalability in web environment. In iiWAS ’11 Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services (pp. 278-283).

Data Tools: Artificial Intelligence and Decision Making

“Machines can excel at frequent high-volume tasks. Humans can tackle novel situations.” – Anthony Goldbloom

Jobs will look drastically different 30 years from now (Goldbloom, 2016; McAfee, 2013).  Artificial intelligence (AI) works on Sundays, does not take holidays, and performs well at high-frequency, high-volume tasks, so it could replace many of the jobs that exist in 2016 (Goldbloom, 2016; Meetoo, 2016).  AI has been doing things that machines have not done before: understanding, speaking, hearing, seeing, answering, writing, and analyzing (McAfee, 2013). AI can also make use of data hidden in “dark wells” and silos, data that the end-user had no idea even existed (Power, 2015). Eventually, AI and machine learning will be commonly used as tools to augment or replace decision makers.  Goldbloom (2016) gave the example that a teacher may read 10,000 essays, or an ophthalmologist may see 50,000 eyes, over a 40-year period, whereas a machine can read millions of essays and see millions of eyes in minutes.

Machine learning is one of the most powerful branches of AI, where machines learn from data, much as humans do, to make predictions about the future (Cringely, 2013; Cyranoski, 2015; Goldbloom, 2016; Power, 2015). It would take many scientists to analyze a big dataset in its entirety, without any loss of memory, to gain insights and to fully understand how the connections were made in the AI system (Cringely, 2013; Goldbloom, 2016). This is no easy task, because the eerily accurate rules that AI creates out of thousands of variables can lack substantive human meaning, making it hard to interpret the results and make an informed, data-driven decision (Power, 2015).

AI has already been used to solve problems in industry and academia, which has given data scientists knowledge of the current limitations of AI and of whether it can augment or replace key decision makers (Cyranoski, 2015; Goldbloom, 2016). Machine learning and AI do well at analyzing patterns in frequent, high-volume data at faster speeds than humans, but they fail to recognize patterns in infrequent and small amounts of data (Goldbloom, 2016).  Therefore, for small datasets AI will not be able to replace decision makers, but for big datasets it would.

Thus, the fundamental question that decision makers need to ask is how much of the decision reduces to frequent, high-volume tasks and how much of it reduces to novel situations (Goldbloom, 2016).  If the ratio is skewed toward high-volume tasks, then AI could be a candidate to replace decision makers; if the ratio is evenly split, then AI could augment and assist decision makers; and if the ratio is skewed toward novel situations, then AI would not help decision makers.  These novel situations are equivalent to our tough challenges today (McAfee, 2013).

Finally, Meetoo (2016) warned that no matter how intelligent or strategic a job may be, it can be automated if there is enough data on that job to create accurate rules, because machine learning can run millions of simulations against itself to generate huge volumes of data to learn from.  This is no different from humans doing self-study and continuous practice to become subject matter experts in their field. But people in STEAM (Science, Technology, Engineering, Arts, and Math) will be best equipped for the future world with AI, because it is from knowing how to combine these fields that novel, infrequent, and unique challenges will arise that humans can solve and machine learning cannot (Goldbloom, 2016; McAfee, 2013; Meetoo, 2016).


Big Data Analytics: Future Predictions?

Big data analytics and stifling future innovation?

One way to make a prediction about the future is to understand the current challenges faced in certain parts of a particular field.  In the case of big data analytics, machine learning analyzes data from the past to predict or understand the future (Ahlemeyer-Stubbe & Coleman, 2014).  Ahlemeyer-Stubbe and Coleman (2014) argued that learning from the past can hinder innovation.  In contrast, Basole, Seuss, and Rouse (2013) studied past popular IT journal articles to see how the field of IT is evolving, and Yang, Klose, Lippy, Barcelon-Yang, and Zhang (2014) concluded that analyzing current patent information can reveal trends and give companies actionable items around which to guide and build future business strategies.  The danger of stifling innovation, per Ahlemeyer-Stubbe and Coleman (2014), arises when we rely only on past data and experiences and do not allow for experiencing or trying anything new.  Their example is that if we only tried to optimize the horse-drawn carriage, the automobile would never have been invented (Ahlemeyer-Stubbe & Coleman, 2014).   This example is a very bad analogy.  We should not collect data on only one item, but on its tangential items as well.  We should focus on collecting a wide range of data from different fields and different sources, so that new patterns can form and new connections can be made, which can promote innovation (Basole et al., 2013).

Future of Health Analytics:

Another way to analyze the future is to dream big or to draw inspiration from a movie (Carter, Farmer, and Siegel, 2014). What if we could analyze our blood daily to track our overall health, beyond the daily blood sugar data that most diabetics are accustomed to?  The information generated this way could support a healthier lifestyle.  Currently, doctors help patients with their diet and monitor their overall health while they are in care.  When you go home, this care disappears.  Constant monitoring, however, may help outpatient care and daily living.  Alerts could be sent to your doctor or to family members if certain biomarkers reach a critical threshold.  This could lead to better care, allow people’s social networks to help hold them accountable for making healthy life and lifestyle choices, and possibly shorten the time between symptom detection and emergency care in severe cases (Carter, Farmer, and Siegel, 2014).

Generating revenue from analyzing consumers:

Soon, it will not be enough to conduct item affinity analysis (market basket analysis).  Item affinity analysis uses rules-based analytics to understand which items frequently co-occur in transactions (Snowplow Analytics, 2016). It fits the current approach to driving more sales, which is to get customers to consume more.  However, what if we started to look at what a consumer intends to buy (Minelli, Chambers, and Dhiraj, 2013)? Analyzing data on product awareness, brand awareness, opinion (sentiment analysis), consideration, preferences, and purchases from a consumer’s accounts on multiple social media platforms in real time can allow marketers to create the perfect advertisement (Minelli et al., 2013).  Establishing the perfect advertisement will allow companies to gain a bigger market share, or to lure customers to their products and services from their competitors.  Minelli et al. (2013) predicted that companies in the future should be moving towards:

  • Data that can be refreshed every second
  • Data validation that happens in real time, with alerts sent if something is wrong before the data is published, to aid data-driven decisions
  • Executives will receive daily data briefs from their internal processes and from their competitors to allow them to make data-driven decisions to increase revenue
  • Questions that were raised in staff meetings or other organizational meetings can be answered in minutes to hours, not weeks
  • A cultural change in companies where data is easily available and the phrase “let me show you the facts” can be easily heard amongst colleagues
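
The item affinity (market basket) analysis mentioned at the start of this section can be sketched in a few lines. This is a minimal illustration, not the method of any cited author: the baskets in TRANSACTIONS are invented, and the two measures shown (support, the share of baskets containing a pair of items, and confidence, the share of baskets with one item that also contain another) are the standard building blocks of co-occurrence rules.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def pair_support(transactions):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Estimate how often baskets with `antecedent` also hold `consequent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


if __name__ == "__main__":
    support = pair_support(TRANSACTIONS)
    print(support[("bread", "butter")])                 # share of baskets with both
    print(confidence(TRANSACTIONS, "bread", "butter"))  # butter given bread
```

In this toy data, bread and butter appear together in three of five baskets, and three of the four baskets with bread also contain butter. Intent-based analysis, as described above, would need signals from before the purchase rather than from the transaction log alone.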

Big data analytics can affect many other areas as well, and a whole new world is opening up as a result.  More and more companies and government agencies are hiring data scientists, because they see not just the current value that these scientists bring but also their potential value 10-15 years from now.  Thus, the field is expected to change as more and more talent is recruited into big data analytics.


Ahlemeyer-Stubbe, A., & Coleman, S.  (2014). A Practical Guide to Data Mining for Business and Industry. Wiley-Blackwell. VitalBook file.

Basole, R. C., Seuss, D. C., & Rouse, W. B. (2013). IT innovation adoption by enterprises: knowledge discovery through text analytics. Decision Support Systems V(54). 1044-1054.

Carter, K. B., Farmer, D., & Siegel, C. (2014). Actionable Intelligence: A Guide to Delivering Business Results with Big Data Fast!. John Wiley & Sons P&T. VitalBook file.

Minelli, M., Chambers, M., & Dhiraj, A. (2013). Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses. John Wiley & Sons P&T. VitalBook file.

Snowplow Analytics (2016). Market basket analysis: identifying products and content that go well together. Retrieved from

Yang, Y. Y., Klose, T., Lippy, J., Barcelon-Yang, C. S. & Zhang, L. (2014). Leveraging text analytics in patent analysis to empower business decisions – a competitive differentiation of kinase assay technology platforms by I2E text mining software. World Patent Information V(39). 24-34.