Data brokers for health care

Data brokers are tasked with collecting data from people, building a particular type of profile on each person, and selling it to companies (Angwin, 2014; Beckett, 2014; Tsesis, 2014). A data broker’s main mission is to collect data and break down barriers created by geographic location, cognitive or cultural gaps, different professions, or parties that don’t trust each other (Long, Cunningham, & Braithwaite, 2013). The danger is that collecting this data can increase incidents of discrimination based on race or income, directly or indirectly (Beckett, 2014). Some prominent data brokers are Acxiom, Google, and LexisNexis (Tsesis, 2014).

It is unknown exactly which data companies collect from people and sell to other companies (Beckett, 2014). According to Tsesis (2014), the Fourth Amendment of the U.S. Constitution does not apply here because of how the data is collected and correlated through data mining by third-party entities. So what kind of data do they have? Current data brokers hold data obtained from the companies people shop at or apply to for credit or loans, such as names, addresses, contact information, demographics, occupation, education level, parents’ names, children’s names, the gender of the person’s children, hobbies, purchases, salary, and other data that is unknown (Beckett, 2014).

Sensitive data, such as medical records and doctor-patient conversations, is protected from these commercial data brokers (Beckett, 2014). This is due to HIPAA (the Health Insurance Portability and Accountability Act of 1996), whose privacy rules include standards for de-identifying patient data. However, there are ways around this. Companies and health insurers are buying online search data, allergy data, and dieting data and correlating it with other data to build a health profile of a person (Beckett, 2014). Even so, there are benefits to having in-house data brokers in hospitals where data is stored in silos (Long et al., 2013):

  • Brokers can bring specialized subject matter expertise to connect distributed data, improving patient care and healthcare service efficiency.
  • Brokers can help reduce redundant data held in silos.
  • Brokers can increase access to heterogeneous knowledge by gathering and increasing tacit knowledge. This type of knowledge is derived from different groups or networks, so it is a distinct source of new information.
  • Brokers’ efforts can help generate innovations.

However, data collected and correlated by data brokers could still be completely wrong, as shown by the credit score information from the three big credit agencies (Angwin, 2014; Beckett, 2014). If the data is wholly or partially wrong, it can lead to the wrong conclusions about a person, and using it for discrimination only compounds the problem. Long et al. (2013) concluded that brokers, even in the field of healthcare, are an expensive endeavor and that, as the primary gatekeepers of data, they could be overwhelmed. Angwin (2014) compiled a list of 212 commercial data brokers, about 92 of which allow opting out. Tsesis (2014) stated that many privacy scholars advocate for a person’s right to opt out of the collection of data they did not consent to share. This suggests that, under current laws and regulations, data collection and correlation into a person’s profile is unavoidable, yet there is a chance that some of that data is wrong to begin with.

Fraud detection in the health care industry using analytics

Fraud is deception. Fraud detection is badly needed because, even as fraud detection algorithms improve, the rate of fraud is increasing (Minelli, Chambers, & Dhiraj, 2013). Hadoop and the HFlame distribution have been used to help identify fraudulent data in other industries, such as banking, in near-real time (Lublinsky, Smith, & Yakubovich, 2013).

Data mining has allowed for fraud detection via multi-attribute monitoring, which tries to find hidden anomalies by identifying hidden patterns through class description and class discrimination (Brookshear & Brylow, 2014; Minelli et al., 2013). Class descriptions identify patterns that define a group of data, and class discrimination identifies patterns that divide groups of data (Brookshear & Brylow, 2014). As data flows in, each data point is run through validity checks and detection rules and given a score; if that score surpasses a threshold, the data point is flagged as potentially suspicious (Minelli et al., 2013).
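
The scoring-and-threshold idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Claim fields, the four rules, their weights, and the threshold of 6 are all hypothetical and are not taken from Minelli et al. (2013).

```python
from dataclasses import dataclass
from typing import Callable, List, NamedTuple, Tuple


@dataclass
class Claim:
    # Hypothetical claim record; the field names are illustrative only.
    claim_id: str
    amount: float
    procedures_per_visit: int
    provider_state: str
    patient_state: str


class Rule(NamedTuple):
    name: str
    fires: Callable[[Claim], bool]  # returns True when the rule is violated
    weight: int                     # score added when the rule fires


# Assumed validity checks and detection rules with made-up weights.
RULES = [
    Rule("non_positive_amount", lambda c: c.amount <= 0, 10),
    Rule("unusually_high_amount", lambda c: c.amount > 10_000, 5),
    Rule("many_procedures_in_one_visit", lambda c: c.procedures_per_visit > 5, 4),
    Rule("provider_outside_patient_state",
         lambda c: c.provider_state != c.patient_state, 2),
]

THRESHOLD = 6  # assumed cut-off; a real system would tune this


def score_claim(claim: Claim) -> Tuple[int, List[str]]:
    """Return the total score and the names of the rules that fired."""
    fired = [rule for rule in RULES if rule.fires(claim)]
    return sum(rule.weight for rule in fired), [rule.name for rule in fired]


def is_suspicious(claim: Claim) -> bool:
    """Flag the claim when its score surpasses the threshold."""
    score, _ = score_claim(claim)
    return score > THRESHOLD
```

Returning the names of the rules that fired alongside the score keeps each flag explainable, which matters when a human reviewer has to decide whether a flagged claim is actually fraudulent.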

This is a form of outlier analysis in data mining, where data that doesn’t fit any of the groups that have been described and discriminated can be used to identify fraudulent data (Brookshear & Brylow, 2014; Connolly & Begg, 2014). Minelli et al. (2013) stated that using historical data to build up the validity checks and detection rules, and then applying them to real-time data, can help identify outliers in near-real time. However, what about predicting fraud? In the future, companies will be using Hadoop’s machine learning capability paired with its fraud detection algorithms to provide predictive modeling of fraud events (Lublinsky et al., 2013).
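
One simple way to picture "historical data plus real-time data" is a z-score test: learn a baseline from past values, then check each incoming value against it. The z-score is a stand-in technique chosen here for brevity, not the method described in the cited sources, and the function names and the cut-off of 3 standard deviations are assumptions.

```python
import statistics


def fit_baseline(history):
    """Summarize historical values as (mean, sample standard deviation)."""
    return statistics.mean(history), statistics.stdev(history)


def is_outlier(value, baseline, z_cutoff=3.0):
    """Return True when value lies more than z_cutoff deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        # All historical values were identical, so any different value stands out.
        return value != mean
    return abs(value - mean) / stdev > z_cutoff


# Hypothetical historical claim amounts for one procedure type.
history = [100, 110, 95, 105, 90, 100, 102, 98]
baseline = fit_baseline(history)

# Incoming values are checked one at a time as they arrive.
incoming = [104, 400]
flags = [is_outlier(value, baseline) for value in incoming]
```

Because the baseline is computed once and each check is a constant-time comparison, this kind of test is cheap enough to run on every incoming record.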

A process mining framework for the detection of healthcare fraud and abuse case study (Yang & Hwang, 2006)

Fraud exists in health insurance claims processing because the many channels of communication among service providers, insurance agencies, and patients create more opportunities to commit it. Any one of these three parties can commit fraud, and the highest chance of fraud occurs where service providers can perform unnecessary procedures, putting patients at risk. This case study therefore provided a framework for conducting automated fraud detection. The study collected data on 2543 gynecology patients at a hospital from 2001 to 2002, filtered out noisy data, identified activities based on medical expertise, and identified fraud in about 906 cases.

Before data mining and machine learning, fraud detection relied heavily on medical professionals with subject matter expertise, which was costly in terms of resources. Machine learning is also not subject to the manual errors that are common in human review. Machine learning algorithms for fraud detection rely on clinical pathways, which are defined as the right people giving the right care services in the right order, with the aim of reducing waste and implementing best practices. Any abnormal deviation from a pathway can be flagged by the machine learning algorithm.
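
To make "deviation from a clinical pathway" concrete, here is a minimal rule-based sketch that compares an observed sequence of activities against an expected order. It is a simplification and not the process-mining method of Yang and Hwang (2006); the pathway itself and every activity name are hypothetical.

```python
# Hypothetical clinical pathway: the expected activities in the expected order.
EXPECTED_PATHWAY = ["admission", "examination", "diagnosis", "treatment", "discharge"]


def pathway_deviations(observed, expected=EXPECTED_PATHWAY):
    """List the ways an observed activity sequence departs from the pathway."""
    position = {activity: index for index, activity in enumerate(expected)}
    deviations = []

    # Activities that are not part of the pathway at all.
    for activity in observed:
        if activity not in position:
            deviations.append(f"unexpected activity: {activity}")

    # Pathway activities that never happened.
    seen = set(observed)
    for activity in expected:
        if activity not in seen:
            deviations.append(f"missing activity: {activity}")

    # Known activities that occur in the wrong order.
    known = [activity for activity in observed if activity in position]
    for earlier, later in zip(known, known[1:]):
        if position[later] < position[earlier]:
            deviations.append(f"out of order: {later} after {earlier}")

    return deviations
```

For example, a visit recorded as admission, treatment, examination, diagnosis, discharge would be reported as "out of order: examination after treatment", which a reviewer could then inspect as a possible unnecessary or misbilled procedure.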

References

  • Brookshear, G., & Brylow, D. (2014). Computer Science: An Overview (12th ed.). Pearson Learning Solutions. VitalBook file.
  • Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.). Pearson Learning Solutions. VitalBook file.
  • Lublinsky, B., Smith, K., & Yakubovich, A. (2013). Professional Hadoop Solutions. Wrox. VitalBook file.
  • Minelli, M., Chambers, M., & Dhiraj, A. (2013). Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses. John Wiley & Sons P&T. VitalBook file.
  • Yang, W. S., & Hwang, S. Y. (2006). A process-mining framework for the detection of healthcare fraud and abuse. Expert Systems with Applications, 31(1), 56-68.

Health care as a Service (HaaS) – cloud solution

Health cloud-healthcare as a service (HaaS) Case Study (John & Shenoy, 2014):

The goal of this study is to provide a framework for building Health Cloud, a healthcare system that helps solve some of the issues currently faced in the healthcare data analytics field, especially the fact that paper images and data are confined to a single healthcare provider’s facility until they are faxed, scanned, or mailed. Health Cloud will be able to store and index medical data, process images, generate reports, chart data, analyze trends, and secure everything with identification and access control. Its image processing capabilities enable better diagnosis of a patient’s medical condition. The image processing structure was built using C++ code for processing, while data requests and reporting use Binary JSON (BSON) or text formats. Finally, the system allows images to be framed, visualized, panned, zoomed, and annotated.

Issues related to health care data on the cloud (John & Shenoy, 2014):

  1. The volume of MRI data has doubled in a decade, and CT data has increased by 50%, as primary providers request more images of their patients to create better-informed patient care. Thus, there is a need for hyper-scale cloud features.
  2. The Health Insurance Portability and Accountability Act (HIPAA) requires data to be stored for six years after a patient has been discharged, which increases the volume of data. Consequently, there is another need for hyper-scale cloud features.
  3. Medical data should be shareable from anywhere and at any time per the Health Information Technology for Economic and Clinical Health Act (HITECH) and the American Recovery and Reinvestment Act (ARRA), which aim to reduce duplication of data and improve data quality and access. HIPAA has created security regulations on data backup, recovery, and access. Hence, there is a need for a community cloud provider familiar with HIPAA and other regulations.
  4. Each hospital system is developed in a silo or purchased from a different supplier. Thus, if data is shared, it may not be in a format that is easily received by the other system, so a common architecture and data model must be developed. This can be resolved under a community cloud.
  5. Seamless access to the data stored in the cloud must be created across various mobile platforms. Thus, a cloud-provided option such as Software as a Service may best fit this requirement.
  6. Healthcare workflows are better managed in cloud-based solutions than in paper-based ones.
  7. Cloud capabilities can be used for processing data, depending on what is purchased from which supplier.

Pros and Cons of healthcare data on the public or private cloud:

On-site private clouds can have limited storage space, and the data may not be in a format that is easily transferable to other on-site private clouds (Bhokare et al., 2016). Upgrade, maintenance, and infrastructure costs fall entirely on the healthcare providers. Although these clouds are expensive, they give providers the most control over their data and over the specialization of reports.

Public clouds distribute the cost of upgrades, maintenance, and infrastructure across everyone using the servers (Connolly & Begg, 2014). However, the servers may not be fully tailored to all regulatory and legal specifications, or they could carry additional regulatory and legal specifications that are not advantageous to the healthcare cloud system. Also, public clouds store data on infrastructure shared with other companies, which can leave healthcare providers feeling that their data’s security is vulnerable (Sumana & Biswal, 2016).

The solution should be a private or public community cloud. A community cloud environment is a cloud that is shared exclusively by a set of companies that share similar characteristics, compliance requirements, security needs, jurisdiction, etc. (Connolly & Begg, 2014). Thus, the infrastructure of all of these servers and grids meets industry standards and best practices, and the cost of maintaining that infrastructure is shared by the community. Certain community services would be optimized for HIPAA, HITECH, ARRA, etc., with little overhead to the individual IT teams that make up the overall community (John & Shenoy, 2014).

References

  • Bhokare, P., Bhagwat, P., Bhise, P., Lalwani, V., & Mahajan, M. R. (2016). Private Cloud using GlusterFS and Docker. International Journal of Engineering Science, 5016.
  • Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.). Pearson Learning Solutions. [Bookshelf Online].
  • John, N., & Shenoy, S. (2014). Health cloud-Healthcare as a service (HaaS). In Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on (pp. 1963-1966). IEEE.
  • Sumana, P., & Biswal, B. K. (2016). Secure Privacy Protected Data Sharing Between Groups in Public Cloud. International Journal of Engineering Science, 3285.

Building block system of health care data analytics

The building block system of healthcare big data analytics involves a few steps (Burkle et al., 2001):

  • Define the purpose that the new data will and should serve
    • How many functions it should support
    • Which parts of that new data are needed for each function
  • Identify the tools needed to support the purpose of that new data
  • Create a top-level architecture plan view
  • Build based on the plan, but leave room to pivot when needed
    • Some modifications allow the final vision to be achieved given the conditions at the time the architecture is built
    • Other modifications come from closer inspection of certain components in the architecture

With big data analytics in healthcare, parallel programming and distributed programming are part of the solution to consider when building a network cluster (Mirtaheri, Khaneghah, Sharifi, & Azgomi, 2008; Services, 2015). Distributed programming allows for connecting multiple computer resources distributed across different locations (Mirtaheri et al., 2008). Programming that maximizes the connections for processing or accessing data that is distributed across the computational resources is considered parallel programming (Mirtaheri et al., 2008; Sandén, 2011). Burkle et al. (2001) explained how they used the building block design for a DNA network cluster to build a system that helps classify or predict which pathogen a genome under analysis belongs to and helps them understand which parts of the genome, found in many pathogens, may be immune to certain treatments:

  • Tracing data from sequencer (*.esd file)
    • Base caller
  • “Raw” sequence (*.scf file)
    • Edit and assemble and export genome assembly for research
  • “Clean” sequence
    • External references are called in from reference databases
  • Genus and species are identified
  • Completed results are output into an attributed local database

The process flow above for sequencing genomic data is part of the top-level plan that was modified as time went by, which illustrates step four of the building block process.

Now, let’s apply the building block system to a healthcare problem: monitoring patient vital signs, similar to Chen et al. (2010).

  • The purpose that the new data will serve: Most hospitals measure the following vitals for triaging patients: blood pressure and flow, core temperature, ECG, and carbon dioxide concentration (Chen et al., 2010).
    1. Functions it should serve: gathering, storing, preprocessing, and processing the data. Chen et al. (2010) suggested that the system should also perform a consistency check and aggregate and integrate the data.
    2. Which parts of the data are needed to serve these functions: all
  • Tools needed: distributed database system, wireless network, parallel processing, graphical user interface for healthcare providers to understand the data, servers, subject matter experts to set upper and lower limits, and classification algorithms that use machine learning
  • Top-level plan: The data will be collected from the vital sign sensors, streaming at various time intervals into a central hub that sends the data in packets over a wireless network to a server room. The server can divide the data among various distributed systems accordingly. A parallel processing program will be able to access the data per patient per window of time to conduct the needed functions and classifications and to provide triage warnings if the vitals hit any of the predetermined key performance indicators that require intervention by the subject matter experts. If a key performance indicator is triggered, the data is sent to the healthcare provider’s device and shown via a graphical user interface.
  • Pivoting is bound to happen; the following can occur:
    1. The graphical user interface is not healthcare provider friendly
    2. Some of the sensors need to be able to throw a warning if they are going bad
    3. Subject matter experts may need to readjust the classification algorithm for better triaging
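
The "per patient per window of time" part of the top-level plan can be sketched as follows. This is only an illustration: the vital names, the LIMITS values (which are placeholders and not clinical guidance), and the use of a thread pool as a stand-in for a real distributed cluster are all assumptions, and simple upper and lower limits replace the machine learning classifier mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical upper and lower limits. In the plan above these would be set by
# subject matter experts; the numbers here are placeholders, not clinical guidance.
LIMITS = {
    "core_temp_c": (35.0, 38.5),
    "systolic_bp_mmhg": (90.0, 180.0),
}


def check_window(patient_id, window):
    """Compare the average of each vital in one time window against its limits.

    window maps a vital name to the list of readings collected in that window.
    """
    warnings = []
    for vital, readings in window.items():
        if vital not in LIMITS or not readings:
            continue  # skip vitals without limits and empty windows
        low, high = LIMITS[vital]
        average = sum(readings) / len(readings)
        if average < low:
            warnings.append((patient_id, vital, "below lower limit"))
        elif average > high:
            warnings.append((patient_id, vital, "above upper limit"))
    return warnings


def triage(windows_by_patient, max_workers=4):
    """Check every patient's window concurrently and collect all warnings."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda item: check_window(*item), windows_by_patient.items()
        )
        # map() yields results in input order, so warnings stay grouped by patient.
        return [warning for patient in results for warning in patient]


# One window of readings for two hypothetical patients.
windows = {
    "p1": {"core_temp_c": [36.8, 37.0, 36.9]},
    "p2": {"core_temp_c": [39.5, 39.7], "systolic_bp_mmhg": [120, 118]},
}
alerts = triage(windows)
```

Keeping check_window free of shared state is what makes it safe to run for many patients at once; the same function could be handed to a process pool or a cluster scheduler without changes to the logic.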

Thus, the problem above, as discussed by Chen et al. (2010), can be broken down into its building block components as addressed in Burkle et al. (2001). These components help create a system for analyzing this set of big healthcare data through analytics, via distributed systems and parallel processing as addressed by Services (2015) and Mirtaheri et al. (2008).

References

  • Burkle, T., Hain, T., Hossain, H., Dudeck, J., & Domann, E. (2001). Bioinformatics in medical practice: what is necessary for a hospital? Studies in Health Technology and Informatics, (2), 951-955.
  • Chen, B., Varkey, J. P., Pompili, D., Li, J. K., & Marsic, I. (2010). Patient vital signs monitoring using wireless body area networks. In Bioengineering Conference, Proceedings of the 2010 IEEE 36th Annual Northeast (pp. 1-2). IEEE.
  • Mirtaheri, S. L., Khaneghah, E. M., Sharifi, M., & Azgomi, M. A. (2008). The influence of efficient message passing mechanisms on high performance distributed scientific computing. In Parallel and Distributed Processing with Applications, 2008. ISPA’08. International Symposium on (pp. 663-668). IEEE.
  • Sandén, B. I. (2011). The Design of Multithreaded Software: The Entity-Life Modeling Approach, 1st Edition. [Bookshelf Online].
  • Services, E. E. (2015). Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, 1st Edition. [Bookshelf Online].