Data mining is only one part of the knowledge discovery process (or the concept flow of Business Intelligence): it provides the algorithms and mathematics that help produce actionable, data-driven results (Fayyad, Piatetsky-Shapiro, & Smyth, 1996). Success depends as much on the events that lead up to the main event as on the main event itself. To incorporate data mining processes into Business Intelligence, one must understand the business task or question behind the problem, properly process all the required data, analyze the data, evaluate and validate the data during the analysis, apply the results, and finally learn from the experience (Ahlemeyer-Stubbe & Coleman, 2014). Connolly and Begg (2014) stated that there are four operations of data mining: predictive modeling, database segmentation, link analysis, and deviation detection. Fayyad et al. (1996) classify data mining operations by their outcomes: prediction and description.
It is crucial to understand the business task or question behind the problem you are trying to solve, because some types of business applications are associated with particular operations; for example, marketing strategies use database segmentation (Connolly & Begg, 2014). However, any of the data mining operations can be applied to any business application, and many business applications combine multiple operations: customer profiling, for instance, can use database segmentation first and predictive modeling next (Connolly & Begg, 2014). Thinking outside the box about which combination of operations and algorithms to use, rather than defaulting to the ones used previously, could generate even better results toward the business objectives (Minelli, Chambers, & Dhiraj, 2013).
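The customer-profiling combination described above (segment first, then predict) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the procedure from Connolly and Begg (2014): segmentation uses plain k-means, and the "predictive model" is deliberately simple, scoring a new customer by the campaign response rate of the nearest segment. The function names, the two customer attributes, and the data are all hypothetical.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means segmentation; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen records
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        updated = []
        for c in range(k):  # update step: move each centroid to its members' mean
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty segment's centroid in place
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def segment_rates(labels, outcomes, k):
    """Share of positive outcomes (e.g. campaign responses) within each segment."""
    rates = []
    for c in range(k):
        seg = [y for lab, y in zip(labels, outcomes) if lab == c]
        rates.append(sum(seg) / len(seg) if seg else 0.0)
    return rates


# Hypothetical customers: (store visits per month, average basket in dollars).
customers = [(1, 20), (2, 22), (1, 18), (9, 80), (10, 85), (8, 78)]
responded = [0, 0, 1, 1, 1, 1]  # hypothetical: 1 = responded to a past campaign

centroids, labels = kmeans(customers, k=2)     # step 1: database segmentation
rates = segment_rates(labels, responded, k=2)  # step 2: per-segment predictive score
print(rates[nearest((9, 82), centroids)])      # prints 1.0
```

In a real project the attributes would normally be standardized before clustering, since raw Euclidean distance here is dominated by the dollar column.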
A consolidated list (Ahlemeyer-Stubbe & Coleman, 2014; Berson, Smith, & Thearling, 1999; Connolly & Begg, 2014; Fayyad et al., 1996) of the different types of data mining operations, their algorithms, and their purposes is given below.
- Prediction – “What could happen?”
    - Classification – data is classified into different predefined classes
        - C4.5
        - Chi-Square Automatic Interaction Detection (CHAID)
        - Support Vector Machines
        - Decision Trees
        - Neural Networks (also called Neural Nets)
        - Naïve Bayes
        - Classification and Regression Trees (CART)
        - Bayesian Network
        - Rough Set Theory
        - AdaBoost
    - Regression (Value Prediction) – data is mapped to a prediction formula
        - Linear Regression
        - Logistic Regression
        - Nonlinear Regression
        - Multiple Linear Regression
        - Discriminant Analysis
        - Log-Linear Regression
        - Poisson Regression
    - Anomaly Detection (Deviation Detection) – identifies significant changes in the data
        - Statistics (outliers)
- Descriptive – “What has happened?”
    - Clustering (database segmentation) – identifies a set of categories to describe the data
        - Nearest Neighbor
        - K-Nearest Neighbor
        - Expectation-Maximization (EM)
        - K-means
        - Principal Component Analysis
        - Kolmogorov-Smirnov Test
        - Kohonen Networks
        - Self-Organizing Maps
        - Quartile Range Test
        - Polar Ordination
        - Hierarchical Analysis
    - Association Rule Learning (Link Analysis) – builds a model that describes the data dependencies
        - Apriori
        - Sequential Pattern Analysis
        - Similar Time Sequence
        - PageRank
    - Summarization – produces a compact description of the data
        - Basic Probability
        - Histograms
        - Summary Statistics (max, min, mean, median, mode, variance, ANOVA)
- Prescriptive – “What should we do?” (an extension of predictive analytics)
    - Optimization
    - Decision Analysis
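For the classification branch, a minimal categorical Naïve Bayes classifier is sketched below as one hedged illustration of the "Naïve Bayes" entry. It assumes every attribute is categorical and uses Laplace (add-one) smoothing so that an attribute value never seen with a class does not zero out that class; the attributes (income band, home ownership), class labels, and training rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    distinct = defaultdict(set)          # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct


def classify(row, model):
    """Pick the class with the highest log posterior, using add-one smoothing."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(class)
        for i, value in enumerate(row):
            seen = value_counts[(i, label)][value]
            # Smoothed log P(value | class); attributes assumed independent given the class.
            score += math.log((seen + 1) / (count + len(distinct[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (income band, owns home) -> campaign outcome.
rows = [("high", "yes"), ("high", "no"), ("low", "no"),
        ("low", "no"), ("high", "yes"), ("low", "yes")]
labels = ["buy", "buy", "skip", "skip", "buy", "skip"]

model = train_naive_bayes(rows, labels)
print(classify(("high", "yes"), model))  # prints buy
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow when there are many attributes.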
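For the regression branch, the simplest entry, linear regression with a single predictor, can be fitted by ordinary least squares: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below implements exactly those two formulas; the function name and the ad-spend and sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return intercept, slope


# Hypothetical data: monthly ad spend (thousands) vs. sales (thousands of units).
spend = [1, 2, 3, 4, 5]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_line(spend, sales)
forecast = intercept + slope * 6  # value prediction for a spend of 6
print(round(intercept, 2), round(slope, 2), round(forecast, 2))  # prints 1.09 1.97 12.91
```

On Python 3.10 or later, the standard library's `statistics.linear_regression(x, y)` returns the same slope and intercept; multiple, logistic, and Poisson regression from the list need more general fitting procedures.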
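The "Statistics (outliers)" entry under anomaly detection covers many possible tests. One common, simple choice, swapped in here as an example rather than taken from the cited sources, is Tukey's interquartile-range fences: flag any value more than 1.5 times the IQR below the first quartile or above the third. The sketch uses the standard library's `statistics.quantiles` (Python 3.8+); the function name and the transaction amounts are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical daily transaction totals for one account; the last one is unusual.
daily_totals = [52, 48, 50, 51, 49, 47, 53, 250]
print(iqr_outliers(daily_totals))  # prints [250]
```

`statistics.quantiles` uses its "exclusive" interpolation method by default; other quartile conventions shift the fences slightly, which matters mainly for small samples.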
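For association rule learning, the Apriori entry can be sketched as a level-wise search: find the frequent single items, join frequent (k-1)-itemsets into k-item candidates, prune any candidate that has an infrequent subset (the Apriori property: every subset of a frequent itemset is also frequent), and repeat until no candidates survive. Rules are then read off the frequent itemsets by confidence. This is a minimal, unoptimized sketch; the function names, thresholds, and shopping baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    """
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / len(baskets)

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    level = {frozenset([i]): support(frozenset([i])) for i in items}
    level = {s: v for s, v in level.items() if v >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        # Join step: union pairs of frequent (k-1)-itemsets that yield k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: support(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
        frequent.update(level)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules meeting min_confidence."""
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[antecedent]  # P(consequent | antecedent)
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence


# Hypothetical market baskets.
shopping = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(shopping, min_support=0.6)
for antecedent, consequent, conf in rules(frequent, min_confidence=0.8):
    print(set(antecedent), "->", set(consequent), round(conf, 2))  # prints {'beer'} -> {'diapers'} 1.0
```

The pruning step is what keeps Apriori tractable: a candidate is never counted against the data if any of its subsets already failed the support threshold.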
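The summarization entries map fairly directly onto Python's standard `statistics` module. The sketch below computes the summary statistics named in the list (except ANOVA, which compares several groups and needs more than one sample) plus a fixed-width histogram; the function names, order sizes, and bin width are hypothetical.

```python
import statistics
from collections import Counter


def summarize(values):
    """Compact description of a numeric sample."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
    }


def histogram(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical number of items per order.
order_sizes = [2, 3, 3, 5, 7, 8, 8, 8, 12]
print(summarize(order_sizes))
print(histogram(order_sizes, bin_width=5))  # prints {0: 3, 5: 5, 10: 1}
```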
Finally, Ahlemeyer-Stubbe and Coleman (2014) stated that even though many versatile data mining software packages can perform any of the operations and algorithms listed above, good data mining software should be deployable across different environments and include tools for data preparation and transformation.
References
- Ahlemeyer-Stubbe, A., & Coleman, S. (2014). A Practical Guide to Data Mining for Business and Industry, 1st Edition. [VitalSource Bookshelf Online]. Retrieved from https://bookshelf.vitalsource.com/#/books/9781118981863/
- Berson, A., Smith, S., & Thearling, K. (1999). Building Data Mining Applications for CRM. McGraw-Hill. Retrieved from http://www.thearling.com/text/dmtechniques/dmtechniques.htm
- Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition. [VitalSource Bookshelf Online]. Retrieved from https://bookshelf.vitalsource.com/#/books/9781323135761/
- Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37. Retrieved from http://www.aaai.org/ojs/index.php/aimagazine/article/download/1230/1131/
- Minelli, M., Chambers, M., & Dhiraj, A. (2013). Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses. John Wiley & Sons P&T. VitalBook file.