Adv DBs: A possible future project?

Below is a possible future research paper on a database related subject.

Title: Using MapReduce to analyze clinical test utilization patterns in medicine

The motivation:

Efficient processing and analysis of clinical data could lead to better clinical testing of patients. MapReduce offers an integrated solution for the medical field, which saves resources when moving data in and out of storage.

The problem statement (symptom and root cause)

The rates of Sexually Transmitted Infections (STIs) are increasing at an alarming pace. Could loading test utilization patterns from the Roper Saint Francis Clinical Network in the South into Hadoop with MapReduce reveal patterns in the current STI population and predict areas where an outbreak may be imminent?

The hypothesis statement (propose a solution and address the root cause)

H0: Data mining in Hadoop with MapReduce will not be able to identify any meaningful pattern that could be used to predict the next location for an STI outbreak using clinical test utilization patterns.

H1: Data mining in Hadoop with MapReduce can identify a meaningful pattern that could be used to predict the next location for an STI outbreak using clinical test utilization patterns.
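To make the idea behind these hypotheses concrete, here is a minimal, in-memory sketch of the MapReduce pattern in plain Python. It is not Hadoop code and not the study's method; the record layout (ZIP code, month, test name, result) and all values are hypothetical, made up only to show how positive tests could be counted per location and month before looking for patterns.

```python
from collections import defaultdict

# Hypothetical test-utilization records: (zip_code, month, test_name, result).
# These values are invented for illustration only.
records = [
    ("29401", "2016-01", "chlamydia", "positive"),
    ("29401", "2016-01", "gonorrhea", "positive"),
    ("29401", "2016-02", "chlamydia", "negative"),
    ("29407", "2016-01", "syphilis", "positive"),
    ("29407", "2016-02", "chlamydia", "positive"),
    ("29407", "2016-02", "chlamydia", "positive"),
    ("29407", "2016-02", "gonorrhea", "positive"),
]


def map_phase(record):
    """Emit ((zip_code, month), 1) for every positive test."""
    zip_code, month, _test_name, result = record
    if result == "positive":
        yield (zip_code, month), 1


def shuffle(pairs):
    """Group mapped values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts for one (zip_code, month) key."""
    return key, sum(values)


def run_job(all_records):
    """Run map, shuffle, and reduce over the records in memory."""
    pairs = (pair for record in all_records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = run_job(records)
for (zip_code, month), total in sorted(counts.items()):
    print(zip_code, month, total)
```

In a real Hadoop job the map and reduce functions would run on separate nodes over data already in distributed storage; the counts per location and month would then be the input to whatever pattern detection the study chooses.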

The research questions

Could this study of STI outbreak rates be generalized to other disease outbreak rates?

Is this application of data-mining in Hadoop with MapReduce the correct way to analyze the data?

The professional significance statement (new contribution to the body of knowledge)

According to Mohammed et al. (2014), identifying where an outbreak of any disease (including STIs) may occur via clinical test utilization patterns has yet to be done. They also state that Hadoop with MapReduce is a great tool for clinical work because it has already been adopted in similar fields such as bioinformatics.

Resources

  • Mohammed, E. A., Far, B. H., & Naugler, C. (2014). Applications of the MapReduce programming framework to clinical big data analysis: Current landscape and future trends. Biodata Mining, 7. doi:http://dx.doi.org/10.1186/1756-0381-7-22 – Doctoral Library Advanced Technologies & Aerospace Collection
  • Pokorny, J. (2011). NoSQL databases: A step to database scalability in web environment. In iiWAS ’11 Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services (pp. 278-283). – Doctoral Library ACM Digital Library

Internal and External Validity

In quantitative research, a study is valid if one can draw meaningful inferences from the results based on the methodology employed. There are three ways to look at validity: (1) content (do we measure what we wanted to measure?), (2) predictive (do the results match similar results, and can we predict something?), and (3) construct (are these hypothetical or real concepts?). Validity is not to be confused with reliability and consistency. Creswell (2013) warns that if we modify an instrument or combine it with others, its validity and reliability could change, and in order to use it we must reestablish both. There are several threats to validity, either internal (history, maturation, regression, selection, mortality, diffusion of treatment, compensatory/resentful demoralization, compensatory rivalry, testing, and instrumentation) or external (interaction of selection and treatment, interaction of setting and treatment, and interaction of history and treatment).

Sample Validity Considerations: The validity issues and their mitigation plans

Internal Validity Issues:

Hurricane intensities and tracks can vary annually or even decadally. Because this study covers the 2016 and 2017 Atlantic Ocean Basin seasons, it may run into regression issues as time passes. These regression issues threaten the validity of the study in that certain types of weather components may not be the only factors that increase or decrease hurricane forecasting skill relative to the average. To mitigate regression issues, the study could eliminate storms with an extreme departure from the average forecast skill, reducing their effect on the final results. Naturally, these extreme departures will, with time, slightly impact the mean, but their results are still too valuable to dismiss. Finding out which weather components drive these extreme departures from the average forecast skill is what drives this project. Thus, removing them does not fit this study and would defeat the purpose of knowledge discovery.
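One way to picture the trade-off above is to flag storms with an extreme departure from the average forecast skill rather than dropping them, and then compare the mean with and without them. The sketch below is only an illustration: the storm names, skill scores, and the z-score cutoff of 1.5 are all assumptions, not values from the study.

```python
from statistics import mean, stdev

# Hypothetical forecast skill scores per storm (invented for illustration).
skill = {
    "Storm A": 0.52,
    "Storm B": 0.48,
    "Storm C": 0.50,
    "Storm D": 0.51,
    "Storm E": 0.49,
    "Storm F": 0.10,
}


def flag_extremes(scores, z_cutoff=1.5):
    """Return the z-score of each storm whose skill departs strongly from the mean.

    The cutoff is an assumed, arbitrary choice for this sketch.
    """
    mu = mean(scores.values())
    sigma = stdev(scores.values())
    return {
        name: (value - mu) / sigma
        for name, value in scores.items()
        if abs(value - mu) / sigma > z_cutoff
    }


flagged = flag_extremes(skill)
mean_all = mean(skill.values())
mean_without = mean(v for name, v in skill.items() if name not in flagged)

print("Flagged storms:", sorted(flagged))
print("Mean skill with flagged storms:", round(mean_all, 3))
print("Mean skill without flagged storms:", round(mean_without, 3))
```

Keeping the flagged storms in the dataset, but labeled, lets the analysis report how much they shift the mean while still studying what makes them different.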

External Validity Issues: 

The Eastern Pacific, Central Pacific, and Atlantic Ocean Basins share the same underlying dynamics that can create, intensify, and influence the path of tropical cyclones. However, these three basins still behave differently, so an interaction of setting and treatment threatens the validity of this study’s results. Results garnered in this study will not allow me to generalize beyond the Atlantic Ocean Basin. The only way to mitigate this threat to validity is to suggest that future research be conducted on each basin separately.


Exploring Mixed Methods

Explanatory Sequential (QUAN -> qual)

According to Creswell (2013), this mixed-methods style uses qualitative methods to do a deep dive into quantitative results that have been previously gathered (often to understand the data with respect to the culture). The key defining feature is that quantitative data is collected before the qualitative data and that the quantitative data drives the qualitative phase. Thus, the emphasis is given to the quantitative results, and the qualitative results are used to explore and make sense of them. It is used to probe quantitative results by explaining them via qualitative results. Essentially, you use qualitative results to enhance your quantitative results.

Exploratory Sequential (QUAL -> quan)

According to Creswell (2013), this mixed-methods style uses quantitative methods to confirm qualitative results that have been previously gathered (often to understand the culture behind the data). The key defining feature is that qualitative data is collected before the quantitative data and that the qualitative data drives the quantitative phase. Thus, the emphasis is given to the qualitative results, and the quantitative results are used to confirm and make sense of them. It is used to probe qualitative results by explaining them via quantitative results. Essentially, you use quantitative results to enhance your qualitative results.

Which method would you most likely use? If your methodological fit suggests that you use a mixed-methods research project, does your worldview color your choice?


Methodological fit

Do you know what methodology you should use for your research project?

If there is extensive literature on a topic, then, according to Edmondson and McManus (2007), one could make a contribution to a mature theory, and a quantitative methodology would be the best methodological fit. If one strays and uses a qualitative methodology in this case, they could end up reinventing the wheel and may fail to fill a gap in the body of knowledge.

If there is only a little literature on a topic, then one could make a contribution to a nascent theory via qualitative methodologies, which in turn would be the best methodological fit (Edmondson & McManus, 2007). If you do a quantitative research project here, you may be jumping the gun, may reach false conclusions caused by confounding variables, and may still fail to fill the gap in the body of knowledge.

Finally, one can stray from both pure qualitative and pure quantitative methodologies and conduct a mixed-methods study. This can occur when there is enough research that the body of knowledge isn’t considered nascent, but not enough for it to be considered mature (Edmondson & McManus, 2007). Going only one route here would do an injustice to filling the gap in the body of knowledge, because you may miss key insights that each part of the mixed methodology (both qualitative and quantitative) can bring to the field.

So, prior to deciding which methodology you should choose, you should do an in-depth literature review.  You cannot pick an appropriate methodology without knowing the body of knowledge.

Hint: The more quantitative research articles you find in a body of knowledge, the more likely your project will be dealing with either a mixed-methods (low number of articles) or a quantitative method (high number of articles) project. If you see none, you may be working on a qualitative methodology.

Reference

  • Edmondson, A., & McManus, S. (2007). Methodological fit in management field research. Academy of Management Review, 32(4), 1155–1179. CYBRARY.

Worldviews and Approaches to Inquiry

The four worldviews according to Creswell (2013) are postpositivism (akin to quantitative methods), constructivism (akin to qualitative methods), advocacy (akin to advocating action), and pragmatism (akin to mixed methods). There are positives and negatives to each worldview. Pragmatists use whatever truths and methods work at the time they need them to get the results they need, though the pragmatist research style takes time to conduct. The advocacy worldview places importance on creating an action item for social change to diminish inequity gaps in asymmetric power relationships, like those that exist with class structure and minorities. Though this research is noble, the moral arc of history bends towards justice very slowly: it took centuries for race equality to be where it is today, over 60 years for gender equality, and 40 years for LGBT equality. Yet there are still inequalities between these groups and the majority that have yet to be resolved, for instance: Equal Pay for Equal Work for All, Employment/Housing Non-Discrimination for LGBT, Racial Profiling, etc. Constructivist researchers seek to understand the world around them through subjective means. They use their own understanding and interpretation of the historical and cultural settings of participants to shape their interpretation of the open-ended data they collect. This can lead to an interpretation that is shaped by the researcher’s background and is not representative of the whole situation at hand. Finally, postpositivists look at the world in numbers. Knowing the limitation that not everything can be described in numbers, they propose an alternative hypothesis that they can either accept or reject. Numbers are imperfect and fallible.

My personal worldview is akin to a pragmatist worldview. My background in math, science, technology, and management helps me synthesize ideas from multiple fields to drive innovation. It has allowed me to learn rapidly because I can see how one field ties to another, and it makes me more adaptable. However, I also lean a bit more strongly toward the math and science side of myself, which is a postpositivist view.


The Role of Theory

Theory is intertwined with the research process; thus, a thorough understanding of theory must involve understanding the relationship between theory and research (Bryman & Bell, 2007). When looking at research from a deductive role (developing and testing a problem and hypothesis), the theory is presented at the beginning. The theory here is being tested, as it helps define the problem, its parameters (boundaries), and a hypothesis to test. An inductive role, by contrast, uses data and research to build a theory. Theories can be grand (too hard to pinpoint and test) or mid-range (easier to test, but still too big to test under all assumptions) (Bryman & Bell, 2007).

Where you write your theory depends on the type of worldview you have (positivism at the beginning of the paper, or constructivism at the beginning or end of the paper) (Creswell, 2013). My particular focus will be on the postpositivist view (quantitative methods), so I will dissect the placement of the theory primarily in a quantitative research study (which is mostly deductive in nature). Placing the theory in the introduction, in the literature review, or after the hypothesis makes it harder for the reader to isolate and separate the theory from those sections (Creswell, 2013). Creswell (2013) notes another disadvantage of placing it after the hypothesis: you may forget to discuss the origins and rationale for the theory. As a research tip, Creswell (2013) suggests separating the theory into a brand new section so that it is easily identified and its origin and rationale can be elaborated on.

However, separating the theory section from the rest of the paper can still get the paper rejected by a journal if the theory remains fuzzy to your peers and the editor. Feldman’s (2004) editorial states that a question and theory that are succinct, grammatically correct, non-trivial, and make a difference will help you get your results published. He also states (as many of our professors do) that we need to find the key articles and references from the past 5 years, be exhaustive yet exclusive with our dataset, and establish clear boundary conditions so that we can adequately define the independent and dependent variables (Feldman, 2004). The latter set of conditions helps build your theory, whereas the first set speaks to the readability of the theory. If your theory is hard to read because it is so convoluted, then why should anyone care to read it?

Resources:

  • Bryman, A. & Bell, E. (2007) Business Research Methods. (2nd ed.). Location: Oxford University Press.
  • Creswell, J. W. (2013). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th Edition. [VitalSource Bookshelf version]. Retrieved from http://online.vitalsource.com/books/9781483321479/epubcfi/6/24
  • Feldman, D. C. (2004). What are we talking about when we talk about theory? Journal of Management, 30(5), 565–567.

Literature reviews

Side Note: This particular post was on my to-do list for a long time.

A literature review is a process involving a deep consideration of the current literature, to aid in identifying the current gaps in the existing knowledge as well as building up the context for your research project (Gall, Gall, & Borg, 2006). The literature review helps the researcher build upon the works of other researchers, for the purpose of contributing to the collective knowledge. Our goal in the literature review will be undermined if we commit any of the following common flaws (Gall et al., 2006):

  1. Writing a literature review that becomes a standalone piece in the final document
  2. Analyzing results from studies that are not sound in their methodology
  3. Failing to include the search procedures used to create the literature review
  4. Having only one study on particular ideas in the review, which may suggest the idea is not mature enough

For a literature review, one should be learning their field by reviewing the collective knowledge in the field by studying:

  • The beginning of {your topic}
  • The essence of {your topic}
  • Historical overview {your topic}
  • Politics of {your topic}
  • The Technology of {your topic}
  • Leaders in {your topic}
  • Current literature findings of {your topic}
  • Overview of research techniques {your topic}
  • The 21st century {your topic} Strategy

Creswell (2014) proposed that a literature map (similar to a mind map) of the research is a useful way to organize the literature, identify ideas with a small number of sources, determine the current issues in the existing knowledge, and determine the reviewer’s current gaps in their understanding of the existing knowledge. Finally, Creswell (2014) listed what a good outline for a quantitative literature review should have:

  1. Introduction paragraph
  2. Review of topic one, which contains the independent variable(s).
  3. Review of topic two, which contains the dependent variable(s).
  4. Review of topic three, which provides how the independent variable(s) relate to the dependent variable(s).
  5. Summarize with highlights of key studies/major themes, to state why more research is needed.

Creswell’s outline is generally a good method, but not the only one. You can use a chronological literature review, where you build your story from the beginning to the present. In my dissertation, my literature review had to tie multiple topics into one: Big Data, financial forecasting, and hurricane forecasts. I had to use the diffusion of innovation theory to transition between financial and hurricane forecasting, to make the leap and justify the methodologies I would use later on. In the end, you are the one who will be writing your literature review, and the more of them you read, the easier it will be to define how you should write yours.

Here is a little gem I found during the second year of my dissertation: Dr. Guy White (2014), in the following YouTube video, describes a way to effectively and practically build your literature review. I use this technique all the time. All of my friends who have seen this video have loved this method of putting together their literature reviews.


Internal validity in qualitative studies

Internal validity is determining the accuracy of the findings in qualitative research from the viewpoints of the researcher, participants, or reader (Creswell, 2013). There are many validity strategies, such as triangulation of different data sources, member checking, rich thick description of the findings, clarifying any bias, presenting negative or discrepant information, prolonging the time in the field, peer debriefing, and having an external auditor review the project.

Triangulation of different data sources for observational work means that I would examine evidence from multiple sources of data to justify the themes that I create through coding. Converging themes from multiple sources of data and/or perspectives from participants would add to the validity of the study. Thus, one way to increase the validity of the thematic codes is to present the thematic codes from the analysis of multiple sources like:

  • Interviews from N number of participants (until data saturation is reached)
  • Observations of the participants
    • Repeated observations will be taken, during multiple different types of shifts, with or without the same participants and during different random days of the week over a one-month period.
    • Observational Goals: Tracking what information is used (type and time stamps, instrumentations, etc.)
    • Observational Goals 2: Through videotaping, I hope to track conversations between participants sharing the same shift. Field notes would contain: “Why the conversation was initiated?”, “What was discussed?”, “Were there decisions made regarding the area of study”, “What is the bodily-based behavior portrayed by the specialists in the discussion?”, and “What was the outcome of that discussion?”
  • Document Analysis

The aforementioned, in particular, will help ensure internal validity in quite a few studies.
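As a rough illustration of what converging themes could look like once coding is done, the sketch below counts how many data sources support each theme and keeps those supported by at least two. The theme names, the source names, and the two-source threshold are hypothetical assumptions for this example, not part of any cited method.

```python
# Hypothetical thematic codes found in each data source (invented for illustration).
themes_by_source = {
    "interviews": {"workload", "tool trust", "handoff gaps"},
    "observations": {"workload", "handoff gaps", "noise"},
    "documents": {"handoff gaps", "tool trust"},
}


def triangulate(themes, min_sources=2):
    """Return themes supported by at least `min_sources` sources.

    The threshold is an assumed choice for this sketch.
    """
    support = {}
    for source, source_themes in themes.items():
        for theme in source_themes:
            support.setdefault(theme, set()).add(source)
    return {
        theme: sorted(sources)
        for theme, sources in support.items()
        if len(sources) >= min_sources
    }


converging = triangulate(themes_by_source)
for theme in sorted(converging):
    print(theme, "<-", ", ".join(converging[theme]))
```

A theme that appears in only one source (here, "noise") is not discarded from the study; it simply is not counted as triangulated and may deserve a second look.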


Ethical issues involving human subjects

Creswell (2013) states that ethical issues can occur at all phases of a study (prior to the study, at the beginning, and during data collection, analysis, and reporting). Since we deal with data from people about people, we as researchers need to protect our participants and promote the integrity of research by guarding against misconduct and against improperly reflecting the data. Because we deal with people, it is our obligation to ensure that interviewees are not harmed as a result of our research (Rubin, 2012). The following anticipated risks are from Creswell (2013) and Rubin (2012):

  • Prior to conducting the study
    • We must seek an Institutional Review Board (IRB) approval before we conduct a study.
    • I must gain local permission to conduct this study from the agency, organization, or corporation where the study will take place and from the participants.
  • Beginning the study
    • We will not pressure participants to sign consent forms. To achieve high participation rates, make sure the purpose of the study is compelling enough that participants see it as a value-added experience for themselves as well as for the field of study, so that they do not want to say no.
      • We should also conduct an informal needs assessment to ensure that the participant’s needs are addressed in the study, to ensure a high participation rate.
      • But, we will tell the participants that they have the right not to sign the consent form.
  • Collecting data
    • Respect the site and keep disruption to a minimum, especially if I am conducting observations. The goal of the observation in this study is not to be an active participant, but to take field notes on key interactions that occur while the participants are doing what they need to do.
    • Make sure that all the participants in the study receive the same treatment, to avoid data quality issues during collection.
    • We should be respectful and straightforward with the participants.
    • Discussing the purpose of this study and how the data will be used with the participants is key to establishing trust, and it allows them to start thinking about the topic of the study. This can be accomplished by sending them an email prior to the interview explaining the purpose of the study and the time we are requesting of them.
    • As we ask our interview questions, we should avoid leading questions. That is why questions may be asked in a particular order. In some cases, questions can build on one another.
    • We should avoid sharing personal impressions. Given that we know what the final questions in the interview are, we should ask our questions without giving any indication of what we are looking for, so that we don’t end up contaminating our data.
    • Avoid disclosing sensitive or proprietary information.
  • Analyzing data
    • Avoid disclosing only one set of results; we must report on multiple perspectives and report contrary findings.
    • Protect the privacy of the participants by ensuring that names and any other identifying indicators have been removed from the results.
    • Honor promises: if I offer the participant a chance to read and correct their interview, I should do so as soon as possible after the interview.
  • Reporting, sharing and storing data
    • Avoid situations where there is a temptation to falsify evidence, data, findings, or conclusions. This can be accomplished by using unbiased language appropriate for the audience.
    • Avoid disclosing harmful information about the specialist.
    • Keep data in a shareable format, with the privacy of the specialist as the main priority, while keeping the raw data and other materials for 5 years in a secure location. Part of this data should consist of the complete proof of compliance, IRB approval, and lack of conflict of interest, for if and when that is requested.


Observational protocol and qualitative documentations

As a researcher, you could be anything from a non-participant to a full-on participant when observing your subjects in a study. The observed behaviors and activities of individuals in the study are jotted down in field notes (Creswell, 2013). Most researchers use an observational protocol to jot down these notes as they observe their subjects. According to Creswell (2013), this protocol could consist of: “separate descriptive notes (portraits of the participants, a reconstruction of dialogue, a description of the physical setting, accounts of particular events, or activities) [to] reflective notes (the researcher’s personal thoughts, such as “speculation, feelings, problems, ideas, hunches, impressions, and prejudices), … this form might [have] demographic information about the time, place, and date of the field setting where the observation takes place.”

Observational work can be combined with in-depth interviewing, and sometimes the observational work (which can be an everyday activity) can help prepare the researcher for the interviews (Rubin, 2012). Doing so can increase the quality of the interviews because the interviewees know what the researcher has seen or read and can provide more information on those materials. This can also allow the researcher to master the terminology before entering the interview. Finally, Rubin (2012) also states that cultural norms become more visible through observation than through in-depth interviews alone.

In Creswell (2013), qualitative documents are described as information contained within documents that could help a researcher in their study; these documents can be public (newspapers, meeting minutes, official reports) and/or private (personal journals/diaries, letters, emails, internal manuals, written procedures, etc.). They can also include pictures, videos, educational materials, books, and files. Artifact analysis, by contrast, is the analysis of written text, usually charts, flow sheets, intake forms, reports, etc.

The main analysis approach for this document would be to read it to gain an understanding of the subject matter. Document analysis would aid in quickly grouping, sorting, and re-sorting the data obtained for a study. This manual will not be included in the coded dataset, but it will help provide appropriate codes/categories for the interview analysis; in other words, it will give me suggestions about what might be related to what. Finally, one way to interpret this document would be through triangulation of data (data from multiple sources that are highly correlated) between the observations, the interviews, and this document.
