Data-in-motion, data-at-rest, and data-in-use

Data-in-motion is the real-time streaming of data from a broad spectrum of technologies, and it also encompasses the transmission of data between systems (Katal, Wazid, & Goudar, 2013; Kishore & Sharma, 2016; Ovum, 2016; Ramachandran & Chang, 2016). Data that is stored on a database system or cloud system is considered data-at-rest, and data that is being processed and analyzed is considered data-in-use (Ramachandran & Chang, 2016). The timely analysis of real-time streaming data is also known as stream reasoning, and solutions for stream reasoning revolve around high-throughput systems and storage with low latency (Della Valle et al., 2016). Cisco (2017) stated that the value of data-in-motion decreases with time, unlike that of data-at-rest.

Data-in-motion focuses on the velocity and variety portions of Gartner’s 3Vs definition of Big Data (Della Valle, Dell’Aglio, & Margara, 2016). This is becoming an important issue in data analytics due to the emergence of the Internet of Things (IoT), which can be deployed in the cloud and can constitute the variety portion of data-in-motion (Ovum, 2016). Della Valle et al. (2016) stated that knowledge has been represented in various ways and that analyzing this data would allow for understanding the implicit information hidden in these different forms of explicit knowledge.


Figure 1, adapted from Della Valle et al. (2016), is a conceptual model for real-time streaming that can provide a scalable solution for large volumes of data or a large variety of data sources. In this diagram, a wrapper hides the individuality of each data source by transforming it so that all sources look like one data source, while the mapping ties all the data together (Della Valle et al., 2016).
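The wrapper-and-mapping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Della Valle et al.'s (2016) implementation: the two sources, their field names, and the helper functions are all hypothetical. Each wrapper converts one source's records into a common shape, and the mapping step merges the wrapped streams into one time-ordered stream.

```python
import heapq
from typing import Callable, Iterable, Iterator, List, Tuple

Record = dict


def thermostat_source() -> Iterator[Record]:
    """Hypothetical source: Fahrenheit readings keyed by 'ts'."""
    yield {"ts": 1, "temp_f": 68.0}
    yield {"ts": 3, "temp_f": 71.6}


def weather_source() -> Iterator[Record]:
    """Hypothetical source: Celsius readings keyed by 'time'."""
    yield {"time": 2, "celsius": 21.0}


def wrap_thermostat(raw: Record) -> Record:
    # Wrapper: hide this source's field names and units behind the common shape.
    celsius = round((raw["temp_f"] - 32) * 5 / 9, 2)
    return {"timestamp": raw["ts"], "temperature_c": celsius, "source": "thermostat"}


def wrap_weather(raw: Record) -> Record:
    # Wrapper: only a renaming is needed for this source.
    return {"timestamp": raw["time"], "temperature_c": raw["celsius"], "source": "weather"}


SourceSpec = Tuple[Callable[[], Iterable[Record]], Callable[[Record], Record]]


def unified_stream(sources: List[SourceSpec]) -> Iterator[Record]:
    """Mapping step: lazily merge the wrapped streams by timestamp.

    Assumes each individual source already yields records in time order.
    """
    wrapped = [map(wrapper, source()) for source, wrapper in sources]
    return heapq.merge(*wrapped, key=lambda record: record["timestamp"])


stream = list(unified_stream([
    (thermostat_source, wrap_thermostat),
    (weather_source, wrap_weather),
]))
```

Because `heapq.merge` is lazy, the consumer sees what looks like a single data source without any source being read fully into memory first.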

Kishore and Sharma (2016) and Ramachandran and Chang (2016) describe data-in-motion in their conceptual models as data in transit between two systems of data-at-rest. Kishore and Sharma (2016) stated that data is most vulnerable while it is in motion. Given these vulnerabilities, Kishore and Sharma (2016) discussed protecting data-in-motion through either encryption or Virtual Private Network (VPN) connections throughout the entire process. Ramachandran and Chang (2016) stated that encryption is the only security technique for data-in-motion. However, security is not addressed in Della Valle et al.’s (2016) system, and this is just one of many reasons why Kishore and Sharma (2016) suggested security for data-in-motion as an area for future research.
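As one illustration of encrypting data while it is in transit, the sketch below uses Python's standard `ssl` module to wrap a TCP connection in TLS. It is a minimal example under stated assumptions, not the approach of any of the cited authors: the function names are hypothetical, and the choice of TLS 1.2 as the minimum version is an assumption made for this example.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that verifies the server certificate."""
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def send_encrypted(host: str, port: int, payload: bytes) -> None:
    """Send payload over an encrypted channel (hypothetical helper).

    The data is encrypted only while in motion; the systems at either
    end still hold it as data-at-rest.
    """
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(payload)


context = make_client_context()
```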

Cisco (2017) illustrates the need for further knowledge and development of data-in-motion research, noting that retailers have the most to gain from it. For retail environments, data collection and processing are key to thriving and increasing profit margins because retailers are trying to build brand recognition, brand affinity, and a relationship with their customers. All of this is done to enhance the customer experience. For example, using the data coming from a web camera to create a virtual mirror, where customers can try on accessories and see how they fit their personal style, creates a customer experience from data-in-motion (Cisco, 2017). This virtual mirror must use facial recognition technology similar to that used in Snapchat filters. Retailers could also use data-in-motion by collecting phone location data and demographic data to create real-time promotions for nearby travelers or in-store customers (Cisco, 2017). Finally, Cisco (2017) also discussed how data-in-motion could help in providing proactive and cost-effective health care, enhancing manufacturing supply chains, and providing scalable and secure energy production.


International health care data laws

The International Health Regulations (IHR) have governed the way that health is dealt with internationally since 1969 and were updated in 2005 (Georgetown Law, n.d.; World Health Organization [WHO], 2005). Article 45 of the IHR deals with the treatment of personal data (WHO, 2005):

  • Personally identifiable data and information that has been collected or received shall be kept confidential and processed anonymously.
  • Data can be disclosed for purposes that are vital for public health. However, the data that is transferred must be adequate, accurate, relevant, up-to-date, and not excessive, and it has to be processed fairly and lawfully.
  • Bad or incompatible data is either corrected or deleted.
  • Personal data is not kept any longer than necessary.
  • WHO will provide a patient’s data to that patient upon request in a timely fashion and allow the patient to correct the data.

The European Union has the Directive on Data Protection of 1998 (DDP), and Canada has the Personal Information Protection and Electronic Documents Act of 2000 (PIPEDA), which are similar to the U.S. HIPAA regulations set forth by the U.S. Department of Health and Human Services (Guiliano, 2014). Eventually, in 2012, the EU proposed adding the Data Protection Regulation (DPR) of 2016 (Hordern, 2015; Justice, n.d.).

The EU’s DDP provides the following (Guiliano, 2014):

  • It is outlawed to transfer data to any non-EU entity that does not meet EU data protection standards.
  • The government must give consent before gathering sensitive data, and only for certain situations.
  • Only data that is needed at the time and that has an explicit and reasonable purpose may be gathered.
  • Patients should be allowed to correct errors in their personal data, and if the data is outdated or useless, it must be discarded.
  • People with access to this data must have been properly trained.

The EU’s DPR provides the following (Hordern, 2015; Justice, n.d.):

  • People can allow for data to be used for future scientific research where the purpose is still unknown as long as the research is conducted by “recognized ethical ”
  • Processing data for scientific studies based on data that has already been collected is legal without the need to get additional consent.
  • Health data may be used without the consent of the individual for public health purposes.
  • Health data cannot be used by employers, insurance companies, or banking companies.
  • If data is being used or will be used for future research, it can be retained longer than current regulations allow.

Canada’s PIPEDA provides the following (Guiliano, 2014):

  • Patients should know the business justification for using their personal and medical data.
  • Patients can review their data and have errors corrected.
  • Organizations must request from their patients the right to use their data for each situation, except in criminal cases or emergencies.
  • Organizations cannot collect patient and medical data that is not needed for the current situation unless they ask their patients for permission and tell them how the data will be used and who will use it.

Other international laws and regulations regarding big data from Australia, Brazil, China, France, Germany, India, Israel, Japan, South Africa, and the United Kingdom are summarized in the International and Comparative Study on Big Data (der Sloot & van Schendel, 2016). When it comes to transferring U.S. collected and processed data internationally, the U.S. holds all U.S. regulated entities liable under all U.S. data regulations (Jolly, 2016). Some states in the U.S. further restrict the export of personal data to international entities (Jolly, 2016). Thus, any data exported to or imported from other countries must comply with the regulations of the country (or state) of origin and those of the country (or state) to which it is exported.

In the United Kingdom, a legal case on health care data was presented and ruled upon. The case dealt with whether the use of de-identified data on primary care physicians’ prescription habits breached confidentiality laws because of the lack of consent (Knoppers, 2000). The consent had to cover both commercial and public purposes, and the lack of both types of consent meant that the data had been misused. In a case before the Supreme Court of Canada, consent was not collected properly, which violated the expectation of privacy between the patients and the private healthcare provider (Knoppers, 2000). All of these laws and regulations, across international and domestic views of data usage, consent, and the expectation of privacy, are trying to protect people from the misuse of healthcare data.


Data auditing for health care

Data auditing is the assessment of the quality and fitness for purpose of data via key metrics and properties of the data (Techopedia, n.d.). Data auditing processes and procedures are a business’s way of assessing and controlling its data quality (Eichhorn, 2014). Doing data audits allows a business to fully realize the value of its data and provides higher fidelity in its data analytics results (Jones, Ross, Ruusalepp, & Dobreva, 2009). Data auditing is needed because the data could contain human error or could be subject to IT data compliance regulations such as HIPAA and SOX (Eichhorn, 2014). Health care data audits can help detect unauthorized access to confidential patient data, reduce the risk of unauthorized access to data, detect defects, and detect threats and intrusion attempts (Walsh & Miaolis, 2014).

Data auditors can perform a data audit by considering the following aspects of a dataset (Jones et al., 2009):

  • Data by origin: observation, computed, experiments
  • Data by data type: text, images, audio, video, databases, etc.
  • Data by characteristics: value, condition, location

A condensed data audit process for research is proposed by Shamoo (1989):

  1. Select published, claimed, or random data from a figure, table, or data source.
  2. Evaluate whether all the formulas and equations are correct and used correctly.
  3. Convert all the data into numerical values.
  4. Re-derive the original data using the formulas and equations.
  5. Segregate the various parameters and values to identify the sources of the original data.
  6. If the re-derived data is the same as the data selected in step 1, then the audit turned up no quality issues. If not, a cause analysis needs to be conducted to understand where the data quality failed.
  7. Formulate a report based on the results of the audit.
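The re-derive-and-compare core of this process can be sketched as follows. This is a minimal illustration, not Shamoo's (1989) procedure itself: the body mass index formula, the patient labels, the numbers, and the tolerance are all hypothetical values chosen for the example.

```python
import math
from typing import Callable, Dict, List, Tuple


def audit_values(
    raw_inputs: Dict[str, Tuple[float, ...]],
    published: Dict[str, float],
    formula: Callable[..., float],
    rel_tol: float = 1e-3,
) -> List[dict]:
    """Re-derive each published value from its raw inputs and flag mismatches."""
    findings = []
    for item, reported in published.items():
        rederived = formula(*raw_inputs[item])
        if not math.isclose(rederived, reported, rel_tol=rel_tol):
            # A mismatch is a candidate for cause analysis, not proof of error.
            findings.append({"item": item, "reported": reported, "rederived": rederived})
    return findings


def bmi(weight_kg: float, height_m: float) -> float:
    """Hypothetical formula under audit: body mass index."""
    return weight_kg / height_m ** 2


# Hypothetical raw measurements and the values claimed in a published table.
raw = {"patient_a": (70.0, 1.75), "patient_b": (82.0, 1.80)}
claimed = {"patient_a": 22.857, "patient_b": 27.3}

findings = audit_values(raw, claimed, bmi)
report = "no quality issues found" if not findings else f"{len(findings)} value(s) need cause analysis"
```

An empty findings list corresponds to an audit that turned up no quality issues; each entry otherwise records the reported and re-derived values so that the report can point to where the cause analysis should start.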

Jones et al. (2009) provided a four-stage process with a detailed swim lane diagram.


For some organizations, creating a log file of all data transactions can aid in improving data integrity (Eichhorn, 2014). The creation of the log file must be scalable and separated from the system under audit (Eichhorn, 2015). Log files can be created for one system or many. Meanwhile, all the log files should be centralized in one location, and the log data must be abstracted into a common and universal format for easy searching (Eichhorn, 2015). Regardless of the technique, HIPAA sections 164.308 through 164.312 address information and audits in the health care system (Walsh & Miaolis, 2014).
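Abstracting logs from several systems into one common, searchable format could look like the sketch below. Everything here is an assumption made for illustration: the two source formats (a pipe-delimited record system log and a JSON billing log), the field names, and the helper functions are hypothetical and are not taken from Eichhorn.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List


def parse_ehr_line(line: str) -> Dict[str, str]:
    """Hypothetical pipe-delimited format: timestamp|user|ACTION|resource."""
    timestamp, user, action, resource = line.strip().split("|")
    return {"timestamp": timestamp, "system": "ehr", "user": user,
            "action": action.lower(), "resource": resource}


def parse_billing_line(line: str) -> Dict[str, str]:
    """Hypothetical JSON format that stores time as epoch seconds."""
    raw = json.loads(line)
    timestamp = datetime.fromtimestamp(raw["epoch"], tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ")
    return {"timestamp": timestamp, "system": "billing", "user": raw["who"],
            "action": raw["op"].lower(), "resource": raw["target"]}


def search(central_log: List[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Return the entries whose fields match every given criterion."""
    return [entry for entry in central_log
            if all(entry.get(field) == value for field, value in criteria.items())]


# The central log lives apart from the audited systems and uses one schema.
central_log = [
    parse_ehr_line("2017-03-01T12:00:00Z|alice|READ|patient/42"),
    parse_billing_line('{"epoch": 1488369600, "who": "bob", "op": "UPDATE", "target": "invoice/7"}'),
]
central_log.sort(key=lambda entry: entry["timestamp"])

bob_activity = search(central_log, user="bob")
```

Once every entry shares the same fields and timestamp format, a single query such as `search(central_log, user="bob")` works across all source systems.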

HIPAA identifies key activities that a healthcare system needs for a data auditing protocol (Walsh & Miaolis, 2014):

  • Determine the activities that will be tracked or audited: create a process flow or swim lane diagram like the one above, involve key data stakeholders, and evaluate which audit tools will be used.
  • Select the tools that will be deployed for auditing and system activity reviews: choose tools that can detect unauthorized access to data, drill down into the data, collect audit logs, and present the findings in a report or dashboard.
  • Develop and employ the information system activity review/audit policy: determine the frequency of the audits and what events would trigger other audits.
  • Develop appropriate standard operating procedures: cover presenting the results, dealing with the fallout of what the audit reveals, and following up on the audit efficiently.
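Two of these activities, detecting unauthorized access in audit logs and defining events that trigger another audit, can be sketched together. The permission table, the event format, and the threshold of two unauthorized events are hypothetical choices for this example and are not specified by HIPAA or by Walsh and Miaolis (2014).

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical permission table: which actions each user may perform.
PERMISSIONS: Dict[str, Set[str]] = {
    "dr_lee": {"read", "write"},
    "clerk_kim": {"read"},
}


def review_activity(
    events: List[Dict[str, str]],
    permissions: Dict[str, Set[str]],
    trigger_threshold: int = 2,
) -> dict:
    """Flag events outside a user's permissions and decide who needs a follow-up audit."""
    unauthorized = [
        event for event in events
        # Users missing from the table have no permitted actions at all.
        if event["action"] not in permissions.get(event["user"], set())
    ]
    counts = Counter(event["user"] for event in unauthorized)
    follow_up = sorted(user for user, n in counts.items() if n >= trigger_threshold)
    return {"unauthorized": unauthorized, "follow_up_audit": follow_up}


events = [
    {"user": "dr_lee", "action": "read", "resource": "patient/42"},
    {"user": "clerk_kim", "action": "write", "resource": "patient/42"},
    {"user": "clerk_kim", "action": "write", "resource": "patient/43"},
    {"user": "unknown_user", "action": "read", "resource": "patient/42"},
]

result = review_activity(events, PERMISSIONS)
```

The returned dictionary is the kind of finding that a report or dashboard would present; the threshold is one simple way to encode an event that triggers another audit.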


Sample HIPAA compliance memorandum

Memorandum Title: Healthcare industry: Data privacy requirements per the Health Insurance Portability and Accountability Act (HIPAA)

Date: March 1, 2017

Introduction and Problem Definition

Health care data can be used for providing preventative and emergent health care to health care consumers. The use of this data in aggregate can provide huge datasets, which will allow big data analytics to find hidden patterns that could be used to improve healthcare. However, the Health Insurance Portability and Accountability Act (HIPAA) is a health care consumer data protection act, which must be followed. This Act protects health care consumers’ data from being improperly disclosed or used, and any data exchanged between health care providers, health plans, and healthcare clearinghouses should be limited to the minimum necessary for both parties to accomplish their tasks (Health and Human Services [HHS], n.d.a.; HHS, n.d.b.). Though the use of big health care data is promising, we must follow our Hippocratic Oath, and HIPAA is the way of keeping our oath while providing new services to our consumers.


All health care data, either physical or mental, from a person’s past, present, and future is protected under HIPAA (HHS, n.d.a.). According to the HHS (n.d.b.), groups with health care consumers’ data should always place limits on who has read, write, and edit access to the data. Identifiable data can include name, address, birth date, social security number, other demographic data, mental and physical health data or condition, and health care payments (HHS, n.d.a.). Permission for any disclosure of health data must be obtained from the individual via a consent form that states specifically who will get what data and for what purposes (HHS, n.d.a.; HHS, n.d.b.).

Consequences of data breaches

A violation is the obtaining or disclosing of individually identifiable health information (Indest, 2014). Those subject to the HIPAA regulations are health plans, healthcare providers, and health care clearinghouses (HHS, n.d.a.; HHS, n.d.b.). Any detected violation by any of these parties must be corrected within 30 days of discovery to avoid the civil or criminal penalties (up to one year of imprisonment) for a HIPAA violation (Indest, 2014).

Table 1: List of tiered civil penalties for HIPAA violations (HHS, n.d.a.; Indest, 2014).

HIPAA Violation | Minimum Penalty | Maximum Penalty
Unknowingly causing a violation | $100 per violation until $25K is reached per year | $50K per violation until $1.5M is reached per year
Reasonable violation not done by willful neglect | $1K per violation until $100K is reached per year | $50K per violation until $1.5M is reached per year
Willful neglect with a corrective action plan but requiring time to enact | $10K per violation until $250K is reached per year | $50K per violation until $1.5M is reached per year
Willful neglect with no corrective action plan | $50K per violation until $1.5M is reached per year | $50K per violation until $1.5M is reached per year
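The arithmetic behind Table 1 can be worked through with a short sketch: multiply the per-violation amount by the number of violations in a year, then cap the result at the annual limit. The tier numbers 1 to 4 (in table order) and the function name are labels invented for this example, and the sketch reads the table literally rather than reflecting how regulators actually set a penalty.

```python
from typing import Dict, Tuple

# Minimum per-violation amount and annual cap for each row of Table 1,
# numbered 1-4 in table order (hypothetical labels for this sketch).
TIERS: Dict[int, Dict[str, int]] = {
    1: {"min_per_violation": 100, "min_annual_cap": 25_000},
    2: {"min_per_violation": 1_000, "min_annual_cap": 100_000},
    3: {"min_per_violation": 10_000, "min_annual_cap": 250_000},
    4: {"min_per_violation": 50_000, "min_annual_cap": 1_500_000},
}
MAX_PER_VIOLATION = 50_000   # the same maximum applies to every tier
MAX_ANNUAL_CAP = 1_500_000


def penalty_range(tier: int, violations: int) -> Tuple[int, int]:
    """Return (minimum, maximum) civil penalty in dollars for one year."""
    if violations < 0:
        raise ValueError("violations must be non-negative")
    limits = TIERS[tier]
    low = min(limits["min_per_violation"] * violations, limits["min_annual_cap"])
    high = min(MAX_PER_VIOLATION * violations, MAX_ANNUAL_CAP)
    return low, high


few_unknowing = penalty_range(tier=1, violations=10)
many_unknowing = penalty_range(tier=1, violations=300)
```

With 10 unknowing violations, neither cap is reached; with 300, both the minimum and maximum totals stop at their annual caps.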


Data privacy and governance in health care

Lawyers define privacy as (Richard & King, 2014):

  1. Invasions into protected spaces, relationships, or decisions
  2. Collection of information
  3. Use of information
  4. Disclosure of information

Given the body of knowledge of technology and data analytics, data collection and analysis may give off the appearance of a “Big Brother” state (Li, 2010). The Privacy Act of 1974 prevents the U.S. government from collecting its citizens’ data and storing it in databases, but it does not extend to companies (Brookshear & Brylow, 2014). Confidentiality does exist for health records via the Health Insurance Portability and Accountability Act (HIPAA) of 1996, and for financial records through the Fair Credit Reporting Act, which also allows people to correct erroneous information in their credit records (Richard & King, 2014). The Electronic Communications Privacy Act of 1986 limits wiretapping of communications by the government, but it does not extend to companies (Brookshear & Brylow, 2014). The Video Privacy Protection Act of 1988 protects the privacy of people’s videotape rental records (Richard & King, 2014). Finally, in 2009 the HITECH Act strengthened the enforcement of HIPAA (Pallardy, 2015). Some people see the risk of the loss of privacy via technology and data analytics, while others embrace it due to the benefits they perceive they would gain from disclosing this information (Wade, 2012). All of these privacy protection laws are outdated and do not extend to the rampant use, collection, and mining of data based on the technology of the 21st century.

However, Richard and King (2014) argue that a binary notion of data privacy does not exist. Data is never completely private or confidential, nor completely divulged; it lies in between these two extremes. Privacy laws should focus on the flow of personal information, with an emphasis on a type of privacy called confidentiality, where data is agreed to flow to a certain individual or group of individuals (Richard & King, 2014). Thus, from a future legal perspective, data privacy should focus on creating rules for how data should flow and be used, and on the concept of confidentiality between people and groups. Right now, the only thing preventing companies from abusing personal privacy is the negative public outcry that would affect their bottom line (Brookshear & Brylow, 2014).

Healthcare Industry

In the healthcare industry, patients and healthcare providers are concerned about data breaches, where personal confidential information could be accessed. If a breach did occur, 54% of patients would be willing to switch from their current provider (Pallardy, 2015).

In healthcare, if data gets migrated into a public cloud rather than a community cloud specific to healthcare, data privacy enters legal limbo. According to Brookshear and Brylow (2014), cloud computing data privacy and security become an issue because, in a public cloud, the healthcare organization will not own the infrastructure that houses the data. HIPAA regulations provide patient privacy standards that the healthcare industry must follow. HIPAA covers a patient’s right to privacy by requiring permission for how their personally identifiable information is used in medical records, personal health, health plans, healthcare clearinghouses, and healthcare transactions (HHS, n.d.b.). The Department of Health & Human Services collects complaints that deal directly with violations of the HIPAA regulations (HHS, n.d.a.). Brown (2014) outlines the cost of each violation, which is based on the type of violation, whether it involved willful neglect, and how many identical violations have occurred, with penalties ranging from $10K to $50K per incident. Industry best practices for avoiding HIPAA violations include the following (Pallardy, 2015):

  • De-identify personal data: names, birth dates, death dates, treatment dates, admission dates, discharge dates, telephone numbers, contact information, addresses, social security numbers, medical record numbers, photographs, finger and voice prints, etc.
  • Install technical controls: anti-malware, data loss prevention, two-factor authentication, patch management, disk encryption, and logging and monitoring software.
  • Install certain security controls: a security and compliance oversight committee, a formal security assessment process, a security incident response plan, ongoing user awareness and training, an information classification system, and security policies.
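The first practice, de-identifying personal data, can be sketched as dropping identifier fields and redacting identifier-like patterns from free text. This is a minimal illustration only: the field names, the record layout, and the two regular expressions (U.S.-style social security and telephone numbers) are assumptions for this example, and a real de-identification process would need to cover every identifier category listed above.

```python
import re
from typing import Any, Dict

# Hypothetical field names for direct identifiers in a patient record.
IDENTIFIER_FIELDS = {
    "name", "birth_date", "admission_date", "discharge_date",
    "phone", "address", "ssn", "medical_record_number",
}

# Simple patterns for identifiers that may leak into free-text notes.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def deidentify(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without identifier fields or patterns."""
    clean: Dict[str, Any] = {}
    for field, value in record.items():
        if field in IDENTIFIER_FIELDS:
            continue  # drop the direct identifier entirely
        if isinstance(value, str):
            value = SSN_PATTERN.sub("[REDACTED]", value)
            value = PHONE_PATTERN.sub("[REDACTED]", value)
        clean[field] = value
    return clean


patient = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "diagnosis": "hypertension",
    "notes": "Call 555-123-4567 re: SSN 123-45-6789.",
}
safe_record = deidentify(patient)
```

The original record is left unchanged, so the de-identified copy can be shared for analytics while the identifiable version stays under stricter access controls.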