Definition of terms
Data-in-motion: an aspect of data velocity, which deals with the speed of data coming in from multiple sources as well as the speed of data traveling between systems (Katal, Wazid, & Goudar, 2013). Essentially, data-in-motion encompasses data streaming, data transfer, and real-time data. However, there are challenges and issues that have to be addressed when conducting real-time analysis on data streams (Katal et al., 2013; Tsinoremas et al., n.d.).
Data complexity: the joining, cleaning, and transformation of data from multiple systems in order to find highly correlated relationships (Katal et al., 2013). Complexity increases as the velocity of incoming or transferred data increases (Katal et al., 2013; Tsinoremas et al., n.d.).
Data-in-motion analytics performed in case study (Blount et al., 2010)
Artemis was designed, built, and deployed in 2009 through a coalition of the University of Ontario Institute of Technology, SickKids, the Department of Pediatrics, and the University of Toronto to read in data from the many sensors attached to patients in neonatal intensive care units (NICUs). The goal is for Artemis to ingest data from multiple physiological instruments, such as electrocardiogram (ECG), heart rate, blood oxygen saturation, and respiratory state monitors, and to find key patterns and relationships in those data streams (data-in-motion) in order to provide the best care for infants in the NICU. To make Artemis a success, the coalition had to analyze huge amounts of data from a large group of patients. Artemis had to interface with multiple medical devices, be scalable so that more devices could be added, and store the raw physiological data while at the same time de-identifying it in accordance with U.S. and Canadian health privacy laws. From these devices, new rules could be created through unsupervised machine learning techniques as well as through supervised machine learning techniques seeded with medically/clinically derived rules. The Artemis system has to read in the data in real time; sort, join, clean, and transform it; evaluate it against the rules; and decide whether to alert medical staff about one of the NICU patients, while simultaneously de-identifying the data and storing it in a database for future analysis and tests.
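The pipeline described above can be pictured with a minimal, hypothetical sketch: ingest a reading, de-identify it, check it against a simple rule, alert if needed, and keep the raw record. The sensor names, thresholds, and hashing-based de-identification here are illustrative assumptions, not the actual rules or implementation used by Artemis.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reading:
    patient_id: str   # identifying value, removed before storage
    sensor: str       # e.g. "heart_rate", "spo2"
    value: float
    timestamp: float  # seconds since epoch

def de_identify(patient_id: str) -> str:
    """Replace the patient identifier with a one-way hash (illustrative only)."""
    return hashlib.sha256(patient_id.encode()).hexdigest()[:12]

def evaluate_rules(reading: Reading) -> Optional[str]:
    """Return an alert message if a (hypothetical) threshold rule fires."""
    if reading.sensor == "heart_rate" and not (100 <= reading.value <= 180):
        return f"heart rate out of range: {reading.value:.0f} bpm"
    if reading.sensor == "spo2" and reading.value < 90:
        return f"low blood oxygen saturation: {reading.value:.0f}%"
    return None

def process_stream(readings, store, notify):
    """Evaluate, alert, and persist each reading as it arrives."""
    for r in readings:
        alert = evaluate_rules(r)
        if alert is not None:
            notify(f"patient {de_identify(r.patient_id)}: {alert}")
        # store the raw value with the identifier stripped out
        store.append({"patient": de_identify(r.patient_id),
                      "sensor": r.sensor,
                      "value": r.value,
                      "timestamp": r.timestamp})

# Example usage with a tiny synthetic stream
if __name__ == "__main__":
    stream = [Reading("NICU-001", "heart_rate", 192.0, 0.0),
              Reading("NICU-001", "spo2", 95.0, 1.0)]
    archive = []
    process_stream(stream, archive, notify=print)
```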
In the test phase, 5 infants were enrolled, and in the deployed state, 19 infants were enrolled in the study. The study also has to take into account that the cables from the sensors and the equipment used to collect the streaming data must not get in the way of the medical/clinical staff when they need to attend to an infant. In some cases after Artemis was deployed, some of the sensors were not attached, so the information management teams had to work with medical/clinical staff to train the models on fewer data when not all of the ideal sensors needed to trigger alerts for certain situations were available. Therefore, this system gives medical/clinical staff constant, real-time data on NICU patients from multiple sensors and allows the machine to alert them when certain markers and key performance indicators are met.
Importance of applying data analytics to data-in-motion
It is easy to see that analyzing infant NICU data is important, and especially that analytics should be applied to the data streams of the key medical sensors each infant in the NICU needs. What is not always as easy to see is how important all of the data really is: the real-life deployment showed that not all of the medical sensors were in use, which limits the information available to make the models useful to medical/clinical staff (Blount et al., 2010).
In addition, the way data streams are handled in a university setting offers a different perspective that could be applied to the NICU case study above. At the University of Miami, data is triaged into a four-tiered system (Tsinoremas et al., n.d.); a brief cost sketch follows the list:
- High-speed storage – for data that is currently being processed, where data-in-motion is at its highest (about 300 TB of space at roughly $2,000/TB)
- Mid-range-speed storage – for data that is currently being looked at (costs $600-$700/TB)
- Deep storage – long-term storage for data that is looked at occasionally but not regularly, usually older data (costs $300/TB)
- Archived – data stored offline, well suited to data at rest
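The per-TB prices above make the trade-off easy to quantify. The sketch below uses the published tier prices (Tsinoremas et al., n.d.); the 10 TB dataset size, the $650/TB mid-range point, and treating the archive tier as $0/TB are illustrative assumptions.

```python
# Cost of holding a dataset in each storage tier, in dollars.
TIERS = {
    "high_speed": 2000,  # $/TB, data currently being processed
    "mid_range":   650,  # $/TB, data currently being looked at ($600-$700 range)
    "deep":        300,  # $/TB, long-term, infrequently accessed data
    "archive":       0,  # offline; priced separately, treated as $0/TB here
}

def storage_cost(size_tb: float, tier: str) -> float:
    """Cost in dollars of keeping size_tb terabytes in the given tier."""
    return size_tb * TIERS[tier]

if __name__ == "__main__":
    dataset_tb = 10
    for tier in TIERS:
        print(f"{dataset_tb} TB in {tier}: ${storage_cost(dataset_tb, tier):,.0f}")
    # Moving the dataset from high-speed to deep storage saves:
    savings = storage_cost(dataset_tb, "high_speed") - storage_cost(dataset_tb, "deep")
    print(f"savings: ${savings:,.0f}")
```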
This tiered system could be applied to Artemis so that, when resources are limited, the data from the most important medical devices is processed first. It could also be applied differently, by defining a window of data that is currently available, e.g., a one-hour record of NICU vitals kept locally, with longer records still accessible but not held in the high-speed processing tier; a minimal sketch of such a window follows. Data windows were discussed in the case study, and depending on the situation the windows could be adjusted to provide the best care for the infants (Blount et al., 2010).
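The one-hour local window idea can be sketched as a simple eviction buffer: keep only the most recent readings in fast, in-memory storage and hand older readings off to a slower tier. The window length and the flush target below are illustrative assumptions, not parameters of the actual Artemis deployment.

```python
from collections import deque

class SlidingWindow:
    def __init__(self, window_seconds: float = 3600.0):
        self.window_seconds = window_seconds
        self.buffer = deque()  # (timestamp, reading) pairs, oldest first

    def add(self, timestamp: float, reading: dict, flush):
        """Add a reading and evict anything older than the window."""
        self.buffer.append((timestamp, reading))
        cutoff = timestamp - self.window_seconds
        while self.buffer and self.buffer[0][0] < cutoff:
            old_ts, old_reading = self.buffer.popleft()
            flush(old_ts, old_reading)  # e.g. write to mid-range or deep storage

    def recent(self):
        """Readings currently inside the window, for real-time rule evaluation."""
        return [reading for _, reading in self.buffer]

# Example usage: a 10-second window so the eviction is easy to see
if __name__ == "__main__":
    window = SlidingWindow(window_seconds=10.0)
    archived = []
    for t in range(0, 30, 5):
        window.add(float(t), {"heart_rate": 150 + t},
                   flush=lambda ts, r: archived.append((ts, r)))
    print("in window:", window.recent())
    print("archived:", archived)
```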
The quality of the sensor data must also be taken into account. If more data is needed or preferred to make informed decisions about infant patients in the NICU (Blount et al., 2010), then there should be a focus on collecting and analyzing high-quality data and the right types of data. This would lead the designers of Artemis and the medical/clinical staff to think deeply about which data are relevant and how much data is enough to make the decisions needed to care for the infants (Katal et al., 2013).
Resources
- Blount, M., Ebling, M. R., Eklund, J. M., James, A. G., McGregor, C., Percival, N., … & Sow, D. (2010). Real-time analysis for intensive care: Development and deployment of the Artemis analytic system. IEEE Engineering in Medicine and Biology Magazine, 29(2), 110-118.
- Katal, A., Wazid, M., & Goudar, R. H. (2013, August). Big data: Issues, challenges, tools and good practices. In Contemporary Computing (IC3), 2013 Sixth International Conference on (pp. 404-409). IEEE.
- Tsinoremas, N. F., Zysman, J., Mader, C., Kirtma, B., & Blaire, J. (n.d.). Data in motion: A new paradigm in research data lifecycle management. Center for Computational Science, University of Miami.