Adv Topics: Security Issues with Cloud Technology

Big data requires huge amounts of resources to analyze it for data driven decisions, thus there has been a gravitation towards cloud computing to work in this era of big data (Sakr, 2014). Cloud technology is different than personal systems that place different demands on cyber security, where personal systems could have single authority systems and cloud computing systems, have no individual owners, have multiple users, groups rights, and shared responsibility (Brookshear & Brylow, 2014; Prakash & Darbari, 2012). Cloud security can be just as good or better than personal systems because cloud providers could have the economies of scales that can support a budget to have an information security team that many organizations may not be able to afford (Connolly & Begg, 2014). Cloud security can be designed to be independently modular, which is great for heterogenous distributed systems (Prakash & Darbari, 2012).

For cloud computing eavesdropping, masquerading, message tampering, replaying the message, and denial of services are security issues that should be addressed (Prakash & Darbari, 2012). Sakr (2014) stated that exploitation of co-tenancy, a secure architecture for the cloud, accountability for outsourced data, confidentiality of data and computation, privacy, verifying outsourced computation, verifying capability, cloud forensics, misuse detection, and resource accounting and economic attacks are big issues for cloud security. This post will discuss the exploitation of co-tendency and confidentiality of data and computation.

Exploitation of Co-Tenancy: An issue with cloud security is within one of its properties, that it is a shared environment (Prakash & Darbari, 2012; Sakr, 2014). Given that it is a shared environment, people with malicious intent could pretend to be someone they are not to gain access, in other words masquerading (Prakash & Darbari, 2012). Once inside, these people with malicious intent tend to gather information about the cloud system and the data contained within it (Sakr, 2014). Another way these services could be used by malicious people is to use the computational resources of the cloud to carry out denial of service attacks on other people.   Prakash and Darbari (2012) stated that two-factor authentications were used on personal devices and for shared distributed systems, there has been proposed a use of a three-factor authentication. The first two factors are the use passwords and smart cards. The last one could be either biometrics or digital certificates. Digital certificates can be used automatically to reduce end-user fatigue on using multiple authentications (Connolly & Begg, 2014). The third level of authentication helps to create a trusted system. Subsequently, a three-factor authentication could primarily mitigate masquerading. Sakr (2014), proposed using a tool that hides the IP addresses the infrastructure components that make up the cloud, to prevent the cloud for being used if the entry is granted to a malicious person.

Confidentiality of data and computation: If data in the cloud is accessed malicious people can gain information, and change the content of that information. Data stored on the distributed systems are sensitive to the owners of the data, like health care data which is heavily regulated for privacy (Sakr, 2014). Prakash and Darbari (2012) suggested the use of public key cryptography, software agents, XML binding technology, public key infrastructure, and role-based access control are used to deal with eavesdropping and message tampering. This essentially hides the data in such a way that it is hard to read without key items that are stored elsewhere in the cloud system. Sakr (2014) suggested homomorphic encryption may be needed, but warns that the use of encryption techniques increases the cost and time of performance. Finally, Lublinsky, Smith, and Yakubovich (2013), stated that encrypting the network to protect data-in-motion is needed.

Overall, a combination of data encryption, hiding IP addresses of computational components, and three-factor authentication may mitigate some of the cloud computing security concerns, like eavesdropping, masquerading, message tampering, and denial of services. However, using these techniques will increase the time it takes to process big data. Thus a cost-benefit analysis must be conducted to compare and contrast these methods while balancing data risk profiles and current risk models.

Resources:

  • Brookshear, G., & Brylow, D. (2014). Computer Science: An Overview, (12th ed.). Pearson Learning Solutions. VitalBook file.
  • Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition. Pearson Learning Solutions. VitalBook file.
  • Lublinsky, B., Smith, K., & Yakubovich, A. (2013). Professional Hadoop Solutions. Wrox. VitalBook file.
  • Prakash, V., & Darbari, M. (2012). A review on security issues in distributed systems. International Journal of Scientific & Engineering Research, 3(9), 300–304.
  • Sakr, S. (2014). Large scale and big data: Processing and management. Boca Raton, FL: CRC Press.

Big Data Analytics: Privacy & HIPAA

Since its inception 25 years ago, the human genome project has been sequenced many 3B base pair of the human genomes (Green, Watson, & Collins, 2015).  This project has given rise of a new program, the Ethical, Legal and Social Implication (ELSI) project.  ELSI got 5% of the National Institute of Health Budget, to study ethical implications of this data, opening up a new field of study (Green et al., 2015 & O’Driscoll, Daugelaite, & Sleator, 2013).  Data sharing must occur, to leverage the benefits of the genome projects and others like it.  Poldrak and Gorgolewski (2014) stated that the goals of sharing data help out with the advancement of the field in a few ways: maximizing the contribution of research subjects, enabling responses to new questions, enabling the generation of new questions, enhance research results reproducibility (especially when the data and software used are combined), test bed for new big data analysis methods, improving research practices (development of a standard of ethics), reducing the cost of doing the science (what is feasible for one scientist to do), and protecting valuable scientific resources (via indirectly creating a redundant backup for disaster recovery).  Allowing for data sharing of genomic data can present ethical challenges, yet allow for multiple countries and disciplines to come together and analyze data sets to come up with new insights (Green et al., 2015).

Richards and King (2014), state that concerning privacy, we must think of it regarding the flow of personal information.  Privacy cannot be thought of as a binary, as data is private and public, but within a spectrum.  Richards and Kings (2014) argue that the data as exchanged between two people has a certain level of expectation of privacy and that data can remain confidential, but there is never a case were data is in absolute private or public.  Not everyone in the world would know or care about every single data point, nor will any data point be kept permanently secret if it is uttered out loud from the source.  Thus, Richards and Kings (2014) stated that transparency can help prevent abuse of the data flow.  That is why McEwen, Boyer, and Sun (2013) discussed that there could exist options for open-consent (your data can be used for any other future research project), broad-consent (describe various ways the data could be used, but it is not universal), or an opt-out-consent (where participants can say what their data shouldn’t be used for).

Attempts are being made through the enactment of Genetic Information Nondiscrimination Act (GINA) to protect identifying data for fears that it can be used to discriminate against a person with a certain type of genomic indicator (McEwen et al., 2013).  Internal Review Boards and Common Rules, with the Office of Human Research Protection (OHRP), have guidance on information flow that is de-identified.  De-identified information can be shared and is valid under current Health Insurance Portability and Accounting Act of 1996 (HIPAA) rules (McEwen et al, 2013).  However, fear of loss of data flow control comes from increase advances in technological decryption and de-anonymisation techniques (O’Driscoll et al., 2013 and McEwen et al., 2013).

Data must be seen and recognized as a person’s identity, which can be defined as the “ability of individuals to define who they are” (Richards & Kings, 2014). Thus, the assertion made in O’Driscoll et al. (2013) about how the ability to protect medical data, with respects to bid data and changing concept, definitional and legal landscape of privacy is valid.  Thanks to HIPAA, cloud computing, is currently on a watch list. Cloud computing can provide a lot of opportunity for cost savings. However, Amazon cloud computing is not HIPAA compliant, hybrid clouds could become HIPAA, and commercial cloud options like GenomeQuest and DNANexus are HIPAA compliant (O’Driscoll et al., 2013).

However, ethical issues extend beyond privacy and compliance.  McEwen et al. (2013) warn that data has been collected for 25 years, and what if data from 20 years ago provides data that a participant can suffer an adverse health condition that could be preventable.  What is the duty of the researchers today to that participant?  How far back in years should that go through?

Other ethical issues to consider: When it comes to data sharing, how should the researchers who collected the data, but didn’t analyze it should be positively incentivized?  One way is to make them co-author of any publication revolving their data, but then that makes it incompatible with standards of authorships (Poldrack & Gorgolewski, 2013).

 

Resources:

  • Green, E. D., Watson, J. D., & Collins, F. S. (2015). Twenty-five years of big biology. Nature, 526.
  • McEwen, J. E., Boyer, J. T., & Sun, K. Y. (2013). Evolving approaches to the ethical management of genomic data. Trends in Genetics, 29(6), 375-382.
  • Poldrack, R. A., & Gorgolewski, K. J. (2014). Making big data open: data sharing in neuroimaging. Nature Neuroscience, 17(11), 1510-1517
  • O’Driscoll, A., Daugelaite, J., & Sleator, R. D. (2013). ‘Big data,’ Hadoop and cloud computing in genomics. Journal of biomedical informatics, 46(5), 774-781.
  • Richards, N. M., & King, J. H. (2014). Big data ethics. Wake Forest L. Rev., 49, 393.