Adv DB: Data Services in the Cloud Service Platform

Rimal et al. (2009) state that cloud computing systems are a disruptive service that has gained momentum. What makes them disruptive is that they have properties similar to prior technologies while adding new features and capabilities (such as big data processing) in a very cost-effective way. Cloud computing has become part of the XaaS ("everything as a Service") family, where X can stand for infrastructure, hardware, software, and so on. According to Connolly & Begg (2014), Data as a Service (DaaS) and Database as a Service (DBaaS) are both considered cloud-based solutions. DaaS does not use SQL interfaces, but it does enable corporations to access data to analyze value streams they own or can easily expand into. A DBaaS must be continuously monitored and improved because it usually serves multiple organizations, with the added benefit of providing charge-back functions per organization (Connolly & Begg, 2014) or a pay-for-use model (Rimal et al., 2009). However, one must pick the solution that best serves the organization's needs.

Benefits of the service

Connolly & Begg (2014) state that cloud computing offers several benefits: cost reduction due to lower capital expenditure (CapEx); the ability to scale up or down based on data demands; heightened security requirements that can make data stored in the cloud more secure than in-house solutions; 24/7 reliability; faster development time, because time is not taken away building a DBMS from scratch; and, finally, sensitivity/load testing that can be made readily available and cheap because of the lower CapEx. Rimal et al. (2009) state that the benefits come from lower costs, improved flexibility and agility, and scalability: systems can be set up as quickly as needed under a pay-as-you-go model, which is also great for disaster recovery efforts (as long as the disaster affected your systems, not the supplier's). Other benefits can be seen in an article examining Data-as-a-Service in the health field: low cost of implementing and maintaining databases, defragmentation of data, exchange of patient data across the health provider organization, and a mode for standardizing the types, forms, and frequency of data capture. From a healthcare perspective, this could support research and the strategic planning of medical decisions, improve data quality, reduce cost, reduce resource scarcity issues from an IT perspective, and, finally, provide better patient care (AbuKhousa et al., 2012).

Problems that can arise because of the service

Unfortunately, there are two sides to every coin. A cloud service introduces network dependency: if the supplier has a power outage, the consumer will not have access to their data. Other network dependencies occur at peak service hours, when service tends to be degraded compared with using the supplier during off-peak hours. Quite a few organizations use Amazon EC2 as their cloud DBMS platform (Rimal et al., 2009), so if that system is hacked or its security is breached, the problem is far bigger than an attack carried out against only one company. There are also system dependencies, as in the case of disaster recovery (AbuKhousa et al., 2012): organizations are only as strong as their weakest link, and if the point of failure is in the service and there are no other mitigation plans, the organization may have a hard time recouping its losses. In addition, by placing data into these services, you lose control over the data, including its availability, reliability, maintainability, integrity, confidentiality, intervenability, and isolation (Connolly & Begg, 2014). Rimal et al. (2009) cite clear examples of outages at services such as Microsoft (down 22 hours in 2008) and Google Gmail (down 2.5 hours in 2009). This lack of control is perhaps one of the main reasons certain government agencies have had a hard time adopting commercial cloud service providers; instead, they are attempting to create internal clouds.

Architectural view of the system

The overall cloud architecture is a layered system that serves as a single point of contact and delivers software applications over the web, running on an infrastructure that draws on the underlying hardware resources needed to complete a task (Rimal et al., 2009). The figure below, adapted from Rimal et al. (2009), shows what these authors describe as the layered cloud architecture.

  • Software-as-a-Service (SaaS)
  • Platform-as-a-Service (PaaS): developers implementing cloud applications
  • Infrastructure-as-a-Service (IaaS): (virtualization, storage, networks) as a service
  • Hardware-as-a-Service (HaaS)

Figure 1. A layered cloud architecture, from top (SaaS) to bottom (HaaS), adapted from Rimal et al. (2009).

As defined in Rimal et al. (2009), a cloud DBMS can be private (an internal provider holds the data within the organization), public (an external provider dynamically manages resources across its system for multiple organizations supplying it data), or hybrid (a mix of internal and external providers).

A problem with DBaaS is that databases belonging to multiple organizations are stored by the same service provider. Since data can be what separates one organization from its competitors, organizations must consider one of the following architectures: separate servers; a shared server but separate database processes; a shared database server but separate databases; a shared database but separate schemas; or a shared database and shared schema (Connolly & Begg, 2014).

Let's take two competitors: Walmart and Target, two giant organizations with enormous amounts of money and inventory flowing in and out of their systems monthly. Let's also assume three database tables with automatically generated primary keys and foreign keys that connect the data together: Product, Purchasing, and Inventory. Another set of assumptions: (1) their data can be stored by the same DBaaS provider, and (2) that DBaaS provider is Amazon.

While Target and Walmart may use the same supplier for their DBaaS, they can select one of those five architectural solutions to prevent their valuable data from being seen by the other. If Walmart and Target purchased separate servers, their data would be safe; they could also go this route if they want to store huge volumes of data and/or have many users of the data. If we narrow the assumptions to the children's toy section (breaking the datasets into manageable chunks), both Target and Walmart could store their data on a shared server but in separate database processes; their databases would not share resources such as memory or disk, only the virtualized environment. If Walmart and Target used a shared database server but separate databases, that would allow for better resource management between the organizations. If they decided to use the same database (which is unlikely) but hold separate schemas, strong access management would have to be in place. That option might be elected between CVS and Walgreens pharmacy databases, where patient data is vital and moving data from one schema to another is not impossible; however, such an arrangement is highly unlikely. The final structure, highly unlikely for most corporations, is sharing both the database and the schema. This final architecture is best suited to hospitals sharing patient data (AbuKhousa et al., 2012), but HIPAA must still be observed under this mode. Though this is the desired state for some hospitals, it may take years to reach a full system. Security is a big issue here and will take considerable development hours, but in the long run it is the cheapest solution available (Connolly & Begg, 2014); a minimal sketch of this shared-database, shared-schema option follows.
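
To make that last option concrete, the sketch below is a minimal, hypothetical illustration (using Python's built-in sqlite3 module; the table, column, and tenant names are invented for this example) of how a shared database with a shared schema can still keep tenants apart: every row carries a tenant identifier, and every query filters on it.

    import sqlite3

    # In-memory database standing in for the shared DBaaS instance.
    conn = sqlite3.connect(":memory:")

    # One shared schema: every table carries a tenant_id column.
    conn.execute("""
        CREATE TABLE product (
            product_id INTEGER PRIMARY KEY,
            tenant_id  TEXT NOT NULL,      -- e.g., 'walmart' or 'target'
            name       TEXT NOT NULL
        )
    """)

    conn.executemany(
        "INSERT INTO product (tenant_id, name) VALUES (?, ?)",
        [("walmart", "Toy Truck"), ("target", "Toy Robot")],
    )

    # Every query must filter on tenant_id so one tenant never sees the other.
    def products_for(tenant):
        rows = conn.execute(
            "SELECT product_id, name FROM product WHERE tenant_id = ?",
            (tenant,),
        )
        return rows.fetchall()

    print(products_for("walmart"))   # [(1, 'Toy Truck')]
    print(products_for("target"))    # [(2, 'Toy Robot')]

In a real DBaaS deployment, this tenant filter is what the strong-access-management requirement refers to: row-level security, database views, or application middleware would enforce it rather than trusting each individual query.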

Requirements on services that must be met

Looking at Amazon's cloud database offerings, they should be easy to set up, easy to operate, and easy to scale. The system should provide better availability and reliability for its live databases than in-house solutions. There should be software to back up the database so it can be recovered at any time. The burden of security patches and maintenance of the underlying hardware and software of the cloud DBMS should be reduced significantly, since it should not fall on the organization. The goal of the cloud DBMS should be to move development effort away from managing the DBMS so the organization can focus on its applications and the data stored in the system. Providers should also offer infrastructure and hardware as a service to reduce the overhead costs of managing those systems. Amazon's Relational Database Service (RDS) can run MySQL, Oracle, SQL Server, or PostgreSQL databases; Amazon's DynamoDB is a NoSQL database service; and Amazon Redshift costs less than $1,000 per year to store a terabyte of data (Varia & Mathew, 2013).
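
As a rough illustration of the "easy to set up" claim, the sketch below uses the boto3 AWS SDK for Python to request an RDS MySQL instance and a DynamoDB table. The identifiers, sizes, and credentials are placeholders, and a real deployment would also need AWS credentials, networking, and security-group configuration before these calls succeed.

    import boto3

    # Provision a small managed MySQL instance on Amazon RDS.
    rds = boto3.client("rds", region_name="us-east-1")
    rds.create_db_instance(
        DBInstanceIdentifier="example-retail-db",   # placeholder name
        Engine="mysql",
        DBInstanceClass="db.t3.micro",
        AllocatedStorage=20,                        # GiB
        MasterUsername="admin",
        MasterUserPassword="change-me-please",      # placeholder credential
    )

    # Create a NoSQL table on DynamoDB with pay-per-request billing.
    dynamodb = boto3.client("dynamodb", region_name="us-east-1")
    dynamodb.create_table(
        TableName="example-inventory",
        AttributeDefinitions=[{"AttributeName": "sku", "AttributeType": "S"}],
        KeySchema=[{"AttributeName": "sku", "KeyType": "HASH"}],
        BillingMode="PAY_PER_REQUEST",
    )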

Issues to be resolved during implementation

Rimal et al. (2010) raise several points to consider before implementing a cloud DBMS:

  • What are the Service-Level Agreements (SLAs) of the cloud DBMS supplier? The service may be up 24x7, but if the supplier experiences a power outage, what does its SLA promise so that the organization is not impacted?
  • What is their backup/replication scheme?
  • What is their discovery scheme (which assists in reusability)?
  • What is their load balancing approach (to avoid bottlenecks), especially since most suppliers cater to more than one organization?
  • What does their resource management plan look like?
  • Most cloud DBMSs keep several copies of the data spread across several servers, so how does the vendor ensure no data loss?
  • What types of security are provided? What is the encryption and decryption strength for the data held on their servers? How private will the organization's data be if it is hosted on the same server, or in the same database but a separate schema? What are their authorization and authentication safeguards?

Varia & Mathew (2013) describe all of the cloud DBMS services that Amazon provides, and these questions should be addressed for each of those solutions. Thus, when analyzing a supplier for a cloud DBMS, having technical requirements that map to the Business Goals & Objectives (BG&Os) helps guide the organization toward the right supplier and solution, given the issues that need to be resolved; one simple way of operationalizing this is sketched below. Other issues are identified by Chaudhuri (2012): data privacy through access control, auditing, and statistical privacy; support for data exploration to enable deeper analytics; data enrichment with Web 2.0 and social media; query optimization; scalable data platforms; manageability; and performance isolation for multiple organizations occupying the same server.
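
A lightweight way to act on these questions (a hypothetical sketch, not taken from the cited sources; the requirement names, weights, and scores are invented) is to encode them as weighted technical requirements tied to the BG&Os and score each candidate supplier against them:

    # Hypothetical supplier-evaluation checklist; weights (importance, 1-5)
    # and scores (how well the supplier answers, 0-5) are invented.
    requirements = {
        "SLA uptime and outage compensation":           5,
        "Backup and replication scheme":                4,
        "Load balancing across tenants":                3,
        "Encryption strength at rest and in transit":   5,
        "Authentication and authorization safeguards":  4,
    }

    supplier_scores = {
        "SLA uptime and outage compensation":           4,
        "Backup and replication scheme":                5,
        "Load balancing across tenants":                3,
        "Encryption strength at rest and in transit":   4,
        "Authentication and authorization safeguards":  5,
    }

    weighted_total = sum(weight * supplier_scores[req]
                         for req, weight in requirements.items())
    max_total = sum(5 * weight for weight in requirements.values())
    print(f"Supplier fit: {weighted_total}/{max_total}")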

Data migration strategy & Security Enforcement

When migrating data from organizations into a cloud DBMS, the taxonomy of the data must be preserved. Along with taxonomy, one must ensure that no data is lost in the transfer, that data remains available to the end user before, during, and after the migration, and that the transfer is done cost-efficiently (Rimal et al., 2010); a minimal verification sketch appears below. Furthermore, migration should be seamless and efficient even when moving between suppliers, so that no supplier becomes so entangled with the organization that it is the only solution the organization can see itself using. Finally, what type of data should be migrated? Mission-critical data may be too precious for certain types of cloud DBMS, yet may be a great fit for a virtualized disaster recovery system. The type of data to migrate depends on what the organization needs from a cloud system in the first place and which services it wants to pay for as it goes, now and in the near future. The type of data to migrate may also depend on the security features provided by the supplier.
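
One simple check on the "no data is lost in the transfer" requirement is to compare row counts and content checksums before and after the move. The sketch below is a minimal example assuming both the source and migrated copies can be opened with Python's built-in sqlite3 driver and reusing the hypothetical Product, Purchasing, and Inventory tables from the Walmart/Target example; a real migration would use the drivers of the actual source and target DBMSs.

    import hashlib
    import sqlite3

    def table_fingerprint(conn, table):
        """Return (row_count, sha256 of the sorted rows) for one table."""
        rows = conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()
        digest = hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()
        return len(rows), digest

    source = sqlite3.connect("source.db")      # placeholder file paths
    target = sqlite3.connect("migrated.db")

    for table in ("product", "purchasing", "inventory"):
        if table_fingerprint(source, table) != table_fingerprint(target, table):
            raise RuntimeError(f"Migration mismatch detected in table '{table}'")
    print("Row counts and checksums match; no data lost in transfer.")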

Organizational information is a vital resource, and controlling access to it and keeping it proprietary are key. If security is not enforced, data such as employee Social Security numbers or past customers' credit card numbers can be compromised. Rimal et al. (2009) compared the security measures of the cloud systems current at the time:

  • Google: uses 128-bit or higher server authentication
  • GigaSpaces: SSH tunneling
  • Microsoft Azure: Token services
  • OpenNebula: Firewall and virtual private network tunnel

Rimal et al. (2010) further expanded these security considerations, suggesting that organizations also look into authentication and authorization protocols (access management), privacy and federated data, encryption and decryption schemes, and so on; a small client-side encryption sketch follows.
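
To show what an encryption and decryption scheme can mean in practice before data ever reaches the supplier, the sketch below uses the third-party Python cryptography package (symmetric Fernet encryption) to encrypt a record client-side. Key management, which is the hard part, is out of scope here, and the field names and values are invented.

    # pip install cryptography
    from cryptography.fernet import Fernet

    # In practice the key would live in a key-management service, never in code.
    key = Fernet.generate_key()
    cipher = Fernet(key)

    record = b"ssn=123-45-6789;card=4111111111111111"   # fake sensitive values
    token = cipher.encrypt(record)                      # what the cloud DBMS stores

    # Only holders of the key can recover the plaintext.
    assert cipher.decrypt(token) == record
    print("Stored ciphertext:", token[:32], "...")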

References

  • AbuKhousa, E., Mohamed, N., & Al-Jaroodi, J. (2012). e-Health cloud: opportunities and challenges. Future Internet, 4(3), 621-645.
  • Chaudhuri, S. (2012, May). What next?: a half-dozen data management research goals for big data and the cloud. In Proceedings of the 31st symposium on Principles of Database Systems (pp. 1-4). ACM.
  • Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition. [VitalSource Bookshelf version]. Retrieved from http://online.vitalsource.com/books/9781323135761/epubcfi/6/210
  • Rimal, B. P., Choi, E., & Lumb, I. (2009, August). A taxonomy and survey of cloud computing systems. In INC, IMS and IDC, 2009. NCM'09. Fifth International Joint Conference on (pp. 44-51). IEEE.
  • Rimal, B. P., Choi, E., & Lumb, I. (2010). A taxonomy, survey, and issues of cloud computing ecosystems. In Cloud Computing (pp. 21-46). Springer London.
  • Varia, J., & Mathew, S. (2013). Overview of Amazon Web Services. January 2014.

Healthcare as a Service (HaaS): a cloud solution

Health Cloud: Healthcare as a Service (HaaS) case study (John & Shenoy, 2014):

The goal of this study is to provide a framework for building Health Cloud, a healthcare system that helps solve some of the issues currently faced in the healthcare data analytics field, especially when paper images and data are limited to a single healthcare provider's facility until they are faxed, scanned, or mailed. Health Cloud is able to store and index medical data, process images, generate reports, chart and analyze trends, and secure the data with identification and access control. The image processing capabilities of Health Cloud enable better diagnosis of a patient's medical condition. The image processing structure was built in C++; requests for data and reports are exchanged in Binary JSON (BSON) or text formats, as illustrated in the sketch below. Finally, the system allows an image to be framed, visualized, panned, zoomed, and annotated.
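
To illustrate that request/report exchange, the sketch below shows a hypothetical payload. John & Shenoy (2014) specify only that requests and reports travel as BSON or text, so the field names here are invented and Python's standard-library json module stands in for a BSON encoder.

    import json

    # Hypothetical request asking Health Cloud to process and annotate an image.
    request = {
        "patient_id": "P-0001",
        "study": "chest-ct",
        "operations": ["frame", "zoom", "annotate"],
        "report_format": "text",
    }
    wire_payload = json.dumps(request).encode("utf-8")  # BSON in the real system

    # Hypothetical report returned by the image-processing service.
    report = json.loads(
        '{"patient_id": "P-0001", "finding": "no abnormality detected", '
        '"annotated_image_uri": "healthcloud://images/P-0001/chest-ct/1"}'
    )
    print(report["finding"])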

Issues related to health care data on the cloud (John & Shenoy, 2014):

  1. The volume of MRI data has doubled in a decade, and CT data has increased by 50%, increasing the number of images primary providers request for their patients in order to deliver informed patient care. Thus, there is a need for hyper-scale cloud features.
  2. The Health Insurance Portability and Accountability Act (HIPAA) requires data to be stored for six years after a patient has been discharged, thereby increasing the volume of data. Consequently, there is another need for hyper-scale cloud features.
  3. Healthcare providers should be able to share medical data from anywhere and at any time per the Health Information Technology for Economic and Clinical Health Act (HITECH) and the American Recovery and Reinvestment Act (ARRA), which aim to reduce duplication of data and improve data quality and access. HIPAA imposes security regulations on data backup, recovery, and access. Hence, there is a need for a community cloud provider familiar with HIPAA and other regulations.
  4. Each hospital system is developed in silos or purchased from different suppliers, so shared data may not be in a format easily received by the other party. Thus, a common architecture and data model must be developed; this can be resolved under a community cloud.
  5. Seamless access to data stored in the cloud must be provided across various mobile platforms. Thus, a cloud option such as Software as a Service may best fit this requirement.
  6. Healthcare workflows are better managed in cloud-based solutions than in paper-based ones.
  7. Cloud capabilities can be used for processing data, depending on what is purchased from which supplier.

Pros and Cons of healthcare data on the public or private cloud:

On-site private clouds can have limited storage space, and the data may not be in a format that is easily transferable to other on-site private clouds (Bhokare et al., 2016). Upgrade, maintenance, and infrastructure costs fall 100% on the healthcare provider. Although these clouds are expensive, they offer the most control over the data and more control over the specialization of reports.

Public clouds distribute the cost of upgrades, maintenance, and infrastructure across all organizations requesting the servers (Connolly & Begg, 2014). However, the servers may not be fully specialized to all regulatory and legal specifications, or they could carry additional regulatory and legal constraints that are not advantageous to the healthcare cloud system. Also, public cloud servers are shared with other companies, which can leave healthcare providers feeling vulnerable about their data's security within the public cloud (Sumana & Biswal, 2016).

The solution should be a private or public community cloud. A community cloud is an environment shared exclusively by a set of companies with similar characteristics, compliance needs, security requirements, jurisdiction, etc. (Connolly & Begg, 2014). Thus, the infrastructure of all these servers and grids meets industry standards and best practices, with the cost of the infrastructure shared and maintained by the community. Certain community services would be optimized for HIPAA, HITECH, ARRA, etc., with little overhead for the individual IT teams that make up the overall community (John & Shenoy, 2014).

References

  • Bhokare, P., Bhagwat, P., Bhise, P., Lalwani, V., & Mahajan, M. R. (2016). Private Cloud using GlusterFS and Docker. International Journal of Engineering Science, 5016.
  • Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.). Pearson Learning Solutions. [Bookshelf Online].
  • John, N., & Shenoy, S. (2014). Health cloud: Healthcare as a service (HaaS). In Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on (pp. 1963-1966). IEEE.
  • Sumana, P., & Biswal, B. K. (2016). Secure Privacy Protected Data Sharing Between Groups in Public Cloud. International Journal of Engineering Science, 3285.