Cybersecurity attacks are limited by their physical path, the network connectivity and reachability limits, and the attack structure, which is by exploiting a vulnerability that enables an attack (Xie et al., 2010). Previously, automated systems and tools were implemented to deal with moderately skilled cyber-attackers, plus white hat hackers are used to identify security vulnerabilities, but it is not enough to keep up with today’s threats (Peterson, 2012). Preventative measure only deals with newly discoverable items, not the ones that have yet to be discoverable (Fink, Sharifi, Carbonell, 2011). These two methods are preventative measures, with the goal of protecting the big data and cyberinfrastructure used to store and process big data from malicious intent. Setting up these preventative measures are no longer good enough to protect big data and its infrastructure. Thus there has been a migration towards using real-time analysis on monitored data (Glick, 2013). Real-time analysis is concerned with “What is really happening?” (Xie et al., 2010).
If algorithms used to process big data can be pointed towards cyber security, Security Information and Event Management (SIEM), it can add another solution towards identifying cyber security threat (Peterson, 2012). All that big data cyber security analysis will do is make security teams faster to react if they have the right context to the analysis, but it won’t make the security teams act in a more proactive way (Glick, 2013). SIEM has gone above and beyond current cyber security prevention measures, usually by collecting the log data in real time that is generated and processing the log data in real time using algorithms like correlation, pattern recognition, behavioral analysis, and anomaly analysis (Glick, 2013; Peterson, 2012). Glick (2013), reported that data from a variety of sources help build a cyber security risk and threat profile in real-time that can be taken to cyber security teams to react to each threat in real time, but it works on small data sets.
SIEM couldn’t handle the vast amounts of big data and therefore analyzing the next cyber threats came from using tools like Splunk to identify anomalies amongst the data (Glick, 2013). SIEM was proposed for use in the Olympics games, but Splunk was being used for investment banking purposes (Glick, 2013; Peterson, 2012). FireEye is another big data analytics security tool that was used for identifying network threats (Glick, 2013).
- Xie et al. (2010), proposed the use of Bayesian networks for cyber security analysis. This solution considers that modeling cyber security profiles are difficult to construct and uncertain, plus they built the tool for near real-time systems. That is because Bayesian models try to model cause-and-effect relationships. Using deterministic security models are unrealistic and do not capture the full breadth of a cyber attack and cannot capture all the scenarios for real-time analysis. If the Bayesian models are built to reflect reality, then it could be used for near real-time analysis. In real-time cyber security analysis, analysts must consider an attacker’s choices are unknown or if they will be successful in their targets and goals. Building a modular graphical attack model can help calculate uncertainties, which can be done by decomposing the problem into finite small parts, where realistic data can be used to pre-populate all the parameters. These modular graphical attack models should consider the physical paths in the explicit and abstract form. Thus, the near real-time Bayesian network considers the three important uncertainties introduced in a real-time attack (italicized). Using this method is robust as determined by a holistic sensitivity analysis.
- Fink et al. (2011), proposed a mashup of crowdsourcing, machine learning, and natural language processing to dealing both vulnerabilities and careless end user actions, for automated threat detection. In their study, they focused on scam websites and cross-site request forgeries. For scam website identification, the concept of using crowdsourced end users to flag certain websites as a scam is key to this process. The goal is that when a new end user approaches the scam website, a popup appears stating “This website is a scam! Do not provide personal information.” The authors’ solution ties data from heterogeneously common web scam blacklist databases. This solution has high precisions (98%), and high recall (98.1%) on their test of 837 manually labeled sites that was cross-validated using a ten-fold cross -validation analysis between the blacklisted database. The current system’s limitation does not address new threats and different sets of threats.
These studies and articles illustrate that the benefit of using big data analytics for cybersecurity analysis provides the following benefits (Fink et al., 2011; Glick, 2013; IBM Software, 2013; Peterson, 2012; Xie et al., 2010):
(a) moving away from preventative cybersecurity and moving towards real-time analysis to become reactive faster to a current threat;
(b) creating security models that more accurately reflect the reality and uncertainty that exists between the physical paths, successful attacks, and unpredictability of humans for near real-time analysis;
(c) provide a robust identification technique; and
(d) reduction of identifying false positives, which eat up the security team’s time.
Thus, helping security teams to solve difficult issues in real-time. However, this is a new and evolving field that is applying big data analytics. Thus it is expected that many tools will be developed, and the most successful tool would be able to provide real-time cybersecurity data analysis with the huge set of algorithms each aimed at studying different types of attacks. It is even possible for one day to see artificial intelligence to become the next new phase of providing real-time cyber security analysis and resolutions.
- Fink, E., Sharifi, M., & Carbonell, J. G. (2011). Application of machine learning and crowdsourcing to detection of cybersecurity threats. Retrieved from http://repository.cmu.edu/cgi/viewcontent.cgi?article=1049&context=lti
- Glick, B. (2013). Information security is a big data issue. ComupterWeekly.com. Retrieved from http://www.computerweekly.com/feature/Information-security-is-a-big-data-issue
- IBM Software. (2013). Extending security intelligence with big data solutions: Leverage big data technologies to uncover actionable insights into modern, advanced data threats. Retrieved from http://www.ndm.net/siem/pdf/Extending%20security%20intelligence%20with%20big%20data%20solutions.PDF
- Peterson, C. (2012). Big data and the London Olympics cybersecurity challenge. Tech News World. Retrieved from http://www.technewsworld.com/story/75754.html
- Xie, P., Li, J. H., Ou, X., Liu, P., & Levy, R. (2010). Using Bayesian networks for cyber security analysis. In Dependable Systems and Networks (DSN), 2010 IEEE/IFIP International Conference on (pp. 211-220). IEEE.