Building block system of health care data analytics

The building block system of healthcare big data analytics involves a few steps (Burkle et al., 2001):

  • Determine the purpose that the new data will and should serve
    • How many functions it should support
    • Which parts of the new data are needed for each function
  • Identify the tools needed to support the purpose of the new data
  • Create a top-level architecture plan view
  • Build based on the plan, but leave room to pivot when needed
    • Some modifications allow the final vision to be achieved given the conditions at the time the architecture is built
    • Other modifications come from a closer inspection of certain components in the architecture

With big data analytics in healthcare, parallel programming and distributed programming are part of the solution to consider when building a network cluster (Mirtaheri, Khaneghah, Sharifi, & Azgomi, 2008; Services, 2015). Distributed programming connects multiple computing resources spread across different locations (Mirtaheri et al., 2008). Parallel programming maximizes those connections to process or access data that is distributed across the computational resources (Mirtaheri et al., 2008; Sandén, 2011).  Burkle et al. (2001) explained how they used the building block design for a DNA network cluster to build a system that helps classify or predict which pathogen an analyzed genome belongs to, and helps to understand which part of the genome found in many pathogens may be resistant to certain treatments:

  • Trace data from the sequencer (*.esd file)
    • Base caller
  • “Raw” sequence (*.scf file)
    • Edit, assemble, and export the genome assembly for research
  • “Clean” sequence
    • External references are retrieved from reference databases
  • Genus and species are identified
  • Completed results are written to an attributed local database

The process flow above for sequencing genomic data is part of the top-level plan, which was modified as time went by; those modifications reflect step four of the building block process.
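The process flow above can be pictured as a chain of interchangeable blocks. The following is only a minimal sketch of that idea, not the system Burkle et al. (2001) built: every function name, the `REFERENCE_DB` table, and the toy sequence are hypothetical placeholders, and each block stands in for far more complex software.

```python
# Hypothetical reference table; a real system would query external
# reference databases instead of an in-memory dictionary.
REFERENCE_DB = {"ACGTACGT": ("ExampleGenus", "example-species")}


def base_caller(trace):
    # Placeholder: pretend the trace data already carries the called bases.
    return {"raw": trace["bases"]}


def edit_and_assemble(data):
    # Placeholder cleanup: drop ambiguous 'N' calls from the raw sequence.
    return {**data, "clean": data["raw"].replace("N", "")}


def identify(data):
    # Look up genus and species for the clean sequence (None if unknown).
    genus, species = REFERENCE_DB.get(data["clean"], (None, None))
    return {**data, "genus": genus, "species": species}


def make_store(local_db):
    # Build a block that appends completed results to a local "database".
    def store(data):
        local_db.append(data)
        return data
    return store


def run_pipeline(blocks, data):
    # Apply each building block in order, feeding its output to the next.
    for block in blocks:
        data = block(data)
    return data


local_db = []
blocks = [base_caller, edit_and_assemble, identify, make_store(local_db)]
result = run_pipeline(blocks, {"bases": "ACGNTACGT"})
```

Because the pipeline is just an ordered list, pivoting (step four) amounts to inserting, replacing, or removing a block without touching the others.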

Now, let’s apply the building block system to a healthcare problem: monitoring patient vital signs, similar to Chen et al. (2010).

  • The purpose that the new data will serve: Most hospitals measure the following vitals for triaging patients: blood pressure and flow, core temperature, ECG, and carbon dioxide concentration (Chen et al., 2010).
    1. Functions it should serve: gathering, storing, preprocessing, and processing the data. Chen et al. (2010) suggested that the system should also perform a consistency check and aggregate and integrate the data.
    2. Which parts of the data are needed to serve these functions: all
  • Tools needed: distributed database system, wireless network, parallel processing, graphical user interface for healthcare providers to understand the data, servers, subject matter experts to set upper and lower limits, and classification algorithms that use machine learning
  • Top-level plan: The data will be collected from the vital sign sensors, streaming at various time intervals into a central hub that sends the data in packets over a wireless network to a server room. The servers can then divide the data among the distributed database systems. A parallel processing program will access the data per patient, per window of time, to run the needed functions and classifications and provide triage warnings when the vitals hit any of the predetermined key performance indicators that require intervention, as set by the subject matter experts.  If a key performance indicator is triggered, the data is sent to the healthcare provider’s device and displayed through a graphical user interface.
  • Pivoting is bound to happen; the following are possible reasons:
    1. The graphical user interface is not friendly to healthcare providers
    2. Some of the sensors need to be able to raise a warning when they are going bad
    3. Subject matter experts may need to readjust the classification algorithm for better triaging
Thus, the problem above, as discussed by Chen et al. (2010), can be broken apart into its building block components as addressed in Burkle et al. (2001).  These components help to create a system that analyzes this set of big healthcare data through analytics, via distributed systems and parallel processing as addressed by Services (2015) and Mirtaheri et al. (2008).

References

  • Burkle, T., Hain, T., Hossain, H., Dudeck, J., & Domann, E. (2001). Bioinformatics in medical practice: what is necessary for a hospital? Studies in health technology and informatics, (2), 951-955.
  • Chen, B., Varkey, J. P., Pompili, D., Li, J. K., & Marsic, I. (2010). Patient vital signs monitoring using wireless body area networks. In Bioengineering Conference, Proceedings of the 2010 IEEE 36th Annual Northeast (pp. 1-2). IEEE.
  • Mirtaheri, S. L., Khaneghah, E. M., Sharifi, M., & Azgomi, M. A. (2008). The influence of efficient message passing mechanisms on high performance distributed scientific computing. In Parallel and Distributed Processing with Applications, 2008. ISPA’08. International Symposium on (pp. 663-668). IEEE.
  • Sandén, B. I. (2011). The Design of Multithreaded Software: The Entity-Life Modeling Approach, 1st Edition. [Bookshelf Online].
  • Services, E. E. (2015). Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, 1st Edition. [Bookshelf Online].

Big Data Analytics: Career Prospects

Master’s and doctoral graduates have some advantages over undergraduates: because they have done research or capstones involving big datasets, they can explain the motivations and reasoning behind the work (chapters 1 and 2 of the dissertation), they can learn and adapt quickly (chapter 3 reflects what you have learned and how you will apply it), and they can think critically about problems (chapters 4 and 5 of the dissertation).  Doctoral students work on a problem for months or years to reach a solution (filling a gap in the knowledge) and cannot imagine leaving it incomplete (or unfillable).  But to prepare best for a data science or big data position, the doctoral work shouldn’t be purely theoretical; it should include an analysis of huge datasets.  Based on my personal analysis, when applying for a senior-level or team lead position in data science, a doctorate counts as roughly three additional years of experience on top of what you already have, whereas if you lack a doctorate, you need a Master’s degree and three years of experience to be considered for the same position.

Master’s-level courses in big data help build strong mathematical, statistical, computational, and programming skills. Doctorate-level courses help you learn and push the limits of knowledge in all of these fields, and also help you become a domain expert in a particular area of data science.  Commanding that domain expertise, which is what you gain by going through a doctoral program, can make you more valuable in the job market (Lo, n.d.).  Being more valuable in the job market can allow you to demand more in compensation.  Different sources quote different salary ranges, mostly because this field has yet to be standardized (Lo, n.d.).  Thus, I will provide only two sources for salary ranges.

According to Columbus (2014), jobs that involve big data, such as Big Data Solution Architect, Linux Systems and Big Data Engineer, Big Data Platform Engineer, and Lead Software Engineer, Big Data (Java, Hadoop, SQL), have the following salary statistics:

  • Q1: $84,650
  • Median: $103,000
  • Q3: $121,300

Columbus (2014) also stated that it is very difficult to find the right people for an open requisition and that most requisitions remain open for 47 days.  According to Columbus (2014), the most wanted skills for analytics data jobs, based on requisition postings in the field, are Python (96.90% growth in demand in the past year) and Linux and Hadoop (76% growth in demand each).

Lo (n.d.) states that individuals with just a BS or MS degree and no full-time work experience should expect $50-75K, whereas data scientists with experience can command $65-110K.

  • Data scientists can earn $85-170K
  • Data science/analytics managers can earn $90-140K for 1-3 direct reports
  • Data science/analytics managers can earn $130-175K for 4-9 direct reports
  • Data science/analytics managers can earn $160-240K for 10+ direct reports
  • Database administrators can earn $50-120K, increasing with experience
  • Junior big data engineers can earn $79-115K
  • Domain expert big data engineers can earn $100-165K

One way to look for opportunities that are currently available in the field is to consult Gartner’s Magic Quadrant for Business Intelligence and Analytics Platforms (Parenteau et al., 2016). If, as a data scientist, you want to help push a tool toward a higher ability to execute and completeness of vision, consider employment at: Pyramid Analytics, Yellowfin, Platfora, Datawatch, Information Builders, Sisense, Board International, Salesforce, GoodData, Domo, Birst, SAS, Alteryx, SAP, MicroStrategy, Logi Analytics, IBM, ClearStory Data, Pentaho, TIBCO Software, BeyondCore, Qlik, Microsoft, and Tableau.  That is one way to look at this data.  Another way is to see which tools are the best in the field (Tableau, Qlik, and Microsoft, with SAS, Birst, Alteryx, and SAP following behind) and learn those tools to become more marketable.

Resources