There is a lot of complicated connections between patients, their provider, their diagnoses, etc., and graphically representing this relationship data is one of the main highlights of using a NoSQL graph database (Park, Shankar, Park, & Ghosh, 2014). NoSQL (Not only Structured Query Language) databases are databases that are used to store data in non-relational databases i.e. graphical, document store, column-oriented, key-value, and object-oriented databases (Sadalage & Fowler, 2012; Services, 2015). Graph NoSQL databases are used drawing networks by showing the relationship between items in a graphical format that has been optimized for easy searching and editing (Services, 2015). Each item is considered a node and adding more nodes or relationships while traversing through them is made simpler through a graph database rather than a traditional database (Sadalage & Fowler, 2012). Some sample graph databases consist of Neo4j Pregel, etc. (Park et al., 2014).
Case Study: Graph Databases for large-scale healthcare systems: A framework for efficient data management and data services (Park et al., 2014)
Driver for data analytics needs: Finding areas for cost savings through anomaly detection algorithms, because currently there are a bunch of individual tables and non-normalized data that are replicated multiple times which is causing bottlenecks.
Problem: Understanding and establishing relationships between self-referrals and shared providers, which allows for the use of a collaborative filter.
System Needs: Data management needs error-tolerant and non-redundant database system, while data services need data retrieval, analytics queries, statistical data extraction and mining algorithms.
NoSQL Database used: Neo4J graph NoSQL Database using Cypher query to keep the data normalized and reduce the number of individual tables of data due to the advanced yet simple query capabilities
Methodology: Using the 3EG: 3NF Equivalent Graph Transformation algorithm to convert traditional relational database data into graph database data on realistic synthetic healthcare data. The synthetic healthcare data consists of zip-codes, diagnosis of disease, available procedures, beneficiary, claim, and providers. The data when flattened can showcase 1 M beneficiaries to 100 K providers, but in a graphical format, that same data will have 51 M nodes and 257 M relationships.
Queries Ran on the NoSQL Database:
- Shared providers between two beneficiaries
- Shared providers between two beneficiaries through either actual visits or by referrals
- List of shared diseases between two beneficiaries through their claim records
- Any link between two beneficiaries à helps to direct further investigations/queries
- Shared beneficiaries between two providers
- Self-referred beneficiaries for a given provider
- Similar claims based on diagnoses codes
- Patient wants to switch to a new provider based on a referral by another provider
Using 50 random queries for each of the 8 cases above: the time it took to run the first three cases was faster in a MySQL query, but by less than 0.0X seconds, whereas the last 5 cases the NoSQL was faster ranging from 0.5-40 seconds. As data size grew so did the processing time for the last five cases on MySQL grew.
Conclusions: The authors were able to show that with more highly advanced cases, MySQL takes more time than NoSQL. Thus, for big data analytics, NoSQL graph databases can help store dynamic relationship data as well as process more complex queries using fewer lines of code and faster than MySQL queries. This style of storing data allows the end-user in the healthcare field to ask more complex questions and get those answers promptly.
References
- Park, Y., Shankar, M., Park, B. H., & Ghosh, J. (2014). Graph databases for large-scale healthcare systems: A framework for efficient data management and data services. In Data Engineering Workshops (ICDEW), 2014 IEEE 30th International Conference on (pp. 12-19). IEEE.
- Sadalage, P. J., Fowler, M. (2012). NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, 1st Edition. [Bookshelf Online].
- Services, E. E. (2015). Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, 1st Edition. [Bookshelf Online].