Ethical issues involving human subjects

Creswell (2013) states that ethical issues can arise at every phase of a study: before it begins, at its start, and during data collection, analysis, and reporting. Because we collect data from people about people, we as researchers must protect our participants and promote the integrity of research by guarding against misconduct and against misrepresenting the data. It is also our obligation to ensure that interviewees are not harmed as a result of our research (Rubin, 2012). The following anticipated risks are drawn from Creswell (2013) and Rubin (2012):

  • Prior to conducting the study
    • We must obtain Institutional Review Board (IRB) approval before conducting the study.
    • We must gain local permission from the agency, organization, or corporation where the study will take place, as well as from the participants.
  • Beginning the study
    • We will not pressure participants to sign consent forms. Instead, to achieve high participation rates, we should make the purpose of the study compelling enough that participants see it as a value-added experience, both for themselves and for the field of study, and want to take part.
      • We should also conduct an informal needs assessment so that the study addresses participants’ needs, which further supports a high participation rate.
      • Nevertheless, we will tell participants that they have the right not to sign the consent form.
  • Collecting data
    • Respect the site and keep disruption to a minimum, especially when conducting observations. The goal of observation in this study is not to be an active participant but to take field notes on key interactions that occur while participants go about their work.
    • Make sure that all participants in the study receive the same treatment, to avoid data quality issues during collection.
    • Be respectful and straightforward with participants.
    • Discussing the purpose of the study and how the data will be used is key to establishing trust with participants, and it lets them start thinking about the topic of the study. This can be done by emailing them before the interview with the purpose of the study and the amount of time we are requesting of them.
    • When asking interview questions, we should avoid leading questions. For this reason, questions may be asked in a particular order, and in some cases they build on one another.
    • We should avoid sharing personal impressions. Even though we know what the final interview questions are, we should ask each question without indicating what we are looking for, so that our expectations do not contaminate the data.
    • Avoid disclosing sensitive or proprietary information.
  • Analyzing data
    • Avoid disclosing only one set of results; instead, report multiple perspectives, including contrary findings.
    • Protect participants’ privacy by removing names and any other identifying indicators from the results.
    • Honor promises: if we offer participants a chance to read and correct their interview transcripts, we should do so as soon as possible after the interview.
  • Reporting, sharing and storing data
    • Avoid situations where there is a temptation to falsify evidence, data, findings, or conclusions. Reporting in unbiased language appropriate for the audience also helps.
    • Avoid disclosing information that could harm the participating specialists.
    • Keep data in a shareable format while making the specialists’ privacy the main priority, and store the raw data and other materials in a secure location for 5 years. These materials should include complete proof of compliance (IRB approval and absence of conflicts of interest) in case it is requested.
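The privacy step under analyzing data (removing names and other identifying indicators from the results) can be partly automated. The following is a minimal sketch, assuming transcripts are plain text and the researcher already has a list of participant names. The function name `deidentify`, the "Participant N" pseudonym format, and the email-masking rule are hypothetical choices for illustration, not a procedure prescribed by Creswell (2013) or Rubin (2012).

```python
import re


def deidentify(transcript: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each known participant name with a stable pseudonym.

    Returns the redacted text and the name-to-pseudonym key. The key links
    pseudonyms back to real people, so it should be stored separately from
    the transcripts, in a secure location.
    """
    key: dict[str, str] = {}
    redacted = transcript
    for i, name in enumerate(names, start=1):
        pseudonym = f"Participant {i}"  # hypothetical pseudonym format
        key[name] = pseudonym
        # \b keeps "Ann" from matching inside "Annual"; re.escape guards
        # against names that contain regex metacharacters.
        redacted = re.sub(rf"\b{re.escape(name)}\b", pseudonym, redacted)
    # Hypothetical extra rule: mask email addresses, another common identifier.
    redacted = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[email removed]", redacted)
    return redacted, key


if __name__ == "__main__":
    text, mapping = deidentify(
        "Ann said Bob emails ann@example.org weekly.", ["Ann", "Bob"]
    )
    print(text)  # Participant 1 said Participant 2 emails [email removed] weekly.
    print(mapping)  # {'Ann': 'Participant 1', 'Bob': 'Participant 2'}
```

Exact string matching misses nicknames, misspellings, and indirect identifiers such as job titles or locations, so a manual review of the redacted text is still needed before results are shared.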

References:

Data analytics lifecycle

What is the data analytics lifecycle?

The scientific method provides a framework for the data analytics lifecycle (Dietrich, 2013). According to Dietrich (2013), the lifecycle is cyclical, with iteration possible within each of its six steps:

  • Discovery
  • Pre-processing data
  • Model planning
  • Model building
  • Communicate results
  • Operationalize

However, Erl, Buhler, and Khattak (2016) divide it into nine steps:

  • Business case evaluation
  • Data identification
  • Data acquisition & filtering
  • Data extraction
  • Data validation & cleansing
  • Data aggregation & representation
  • Data analysis
  • Data visualization
  • Utilization of analysis results

Prajapati (2013) describes five steps:

  • Identifying the problem
  • Designing data requirements
  • Pre-processing data
  • Data analysis
  • Data visualizing

A general pattern emerges across these three versions of the lifecycle, but their differences also suggest that the field of data analytics is still too nascent to settle on a single, exact lifecycle. For the purpose of this discussion, the lifecycle used is the one from Services (2015), which adopts the Dietrich (2013) lifecycle. Note that the Services (2015) and Dietrich (2013) models are both iterative rather than a fixed sequence of steps. This lifecycle model allows all key team members to conduct planning work up front and toward the end of the data analytics project to drive success (Dietrich, 2013).
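The iterative character of the Dietrich (2013) lifecycle can be illustrated with the "before proceeding" questions that Dietrich (2013) and Services (2015) attach to the first four steps: a step is repeated until its question can be answered yes. This is a minimal sketch under that simplifying assumption; `PHASES`, `run_lifecycle`, and the `gate` callback (which stands in for the team's judgment) are hypothetical names, and the last two steps have no gate question in the text, so they use `None`. A fuller model could also allow returning to an earlier step, which this sketch leaves out.

```python
from typing import Callable, Optional

# Steps and gate questions as given by Dietrich (2013) and Services (2015).
PHASES: list[tuple[str, Optional[str]]] = [
    ("Discovery", "Do I have enough information to draft an analytic plan and share for peer review?"),
    ("Pre-processing data", "Do I have enough good quality data to start building the model?"),
    ("Model planning", "Do I have a good idea about the type of model to try? Can I refine the analytic plan?"),
    ("Model building", "Is the model robust enough? Have we failed for sure?"),
    ("Communicate results", None),
    ("Operationalize", None),
]


def run_lifecycle(gate: Callable[[str, str, int], bool], max_passes: int = 5) -> list[str]:
    """Walk the steps in order, repeating a step until its gate question is answered yes.

    gate(step, question, attempt) is a hypothetical callback for the team's
    decision. Returns every step execution, so repeated steps show up as
    iterations in the history.
    """
    history: list[str] = []
    for phase, question in PHASES:
        for attempt in range(1, max_passes + 1):
            history.append(phase)
            if question is None or gate(phase, question, attempt):
                break  # gate passed (or no gate): move on to the next step
        else:
            raise RuntimeError(f"{phase} did not pass its gate after {max_passes} passes")
    return history


def team_gate(phase: str, question: str, attempt: int) -> bool:
    # Pretend the first pass of data preparation turns up too little clean data.
    return not (phase == "Pre-processing data" and attempt == 1)


if __name__ == "__main__":
    print(run_lifecycle(team_gate))
```

Running it prints `['Discovery', 'Pre-processing data', 'Pre-processing data', 'Model planning', 'Model building', 'Communicate results', 'Operationalize']`: the repeated data-preparation entry is the iteration that the gate question forced.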

When is it beneficial for stakeholders to be involved?

If an agile development process is followed, the key stakeholders should be involved in every step of the lifecycle. According to Services (2015), the key stakeholders are the business user, project sponsor, project manager, business intelligence analyst, database administrator, data engineer, and data scientist. Applying an agile development process to this lifecycle brings benefits such as iterative feedback for speed to market, improved first-time quality, visibility, risk management, flexibility to pivot when needed, cost control, and improved satisfaction through engagement (Waters, 2007). Allowing the stakeholders to participate in most of these steps helps ensure that the work in each step is done to their specifications.

For the first step, discovery, the team learns the business domain and its relevant history, including lessons learned from previous projects (Services, 2015). Before proceeding, ask: “Do I have enough information to draft an analytic plan and share for peer review?” (Dietrich, 2013; Services, 2015).

Pre-processing data, also known as data preparation, is where a copy of the data (not the original) is placed in a sandbox, where the data scientists and team can extract, load, and transform (ELT) it (Services, 2015). In this step, the data can also be cleaned, aggregated, augmented, and formatted (Prajapati, 2013). Before proceeding, ask: “Do I have enough good quality data to start building the model?” (Dietrich, 2013; Services, 2015).

In model planning, the data scientist and team determine the appropriate models, algorithms, and data workflow, which helps identify hidden insights among the variables (Services, 2015). Before proceeding, ask: “Do I have a good idea about the type of model to try? Can I refine the analytic plan?” (Dietrich, 2013; Services, 2015).

In model building, roughly 2/3 of the data is set aside for training the model and 1/3 for testing it, both to prepare the model for production and to discover hidden insights (Prajapati, 2013; Services, 2015). Before proceeding, ask: “Is the model robust enough? Have we failed for sure?” (Dietrich, 2013; Services, 2015).

Results can be communicated by visualizing the data for the major stakeholders, who judge whether the project is a success or a failure (Services, 2015). Visualization in this step is meant to be interactive for all parties involved in the project (Prajapati, 2013).

Finally, in the operationalize step, the results are ready to be delivered as reports and documents at a predefined interval, so that key decision makers receive the vital data they need (Services, 2015).
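The roughly 2/3 training and 1/3 testing split used in model building can be sketched with only the Python standard library. This is a minimal sketch, not the procedure from Prajapati (2013) or Services (2015): the function name `train_test_split`, the random shuffle, and the fixed seed are assumptions for illustration. In practice, libraries such as scikit-learn provide `sklearn.model_selection.train_test_split` for the same purpose.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def train_test_split(
    rows: Sequence[T], train_fraction: float = 2 / 3, seed: int = 42
) -> tuple[list[T], list[T]]:
    """Shuffle a copy of the rows and split it into training and test sets."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)  # work on a copy, leaving the original data untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)  # round avoids float truncation
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = train_test_split(list(range(30)))
    print(len(train), len(test))  # 20 10
```

Shuffling before splitting matters because data is often stored in a meaningful order (by date or by customer, for example), and taking the first 2/3 of unshuffled rows would give the test set a different distribution from the training set.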

References:

What is Business Intelligence?

Business Intelligence (BI) is the gathering, managing, and analyzing of data that is vital to a business’s survival in today’s hyper-competitive environment (Thomas, 2001; Vuori, 2006). A BI practitioner’s role is to keep decision makers from being overwhelmed by the sheer volume of data. BI practitioners therefore act as a filter, because decision makers will ignore any information that is not useful or meaningful (Vuori, 2006).

The BI cycle is continuous and can be summarized as planning what information to collect, ethically collecting reliable information, analyzing the data to form intelligence, and disseminating that intelligence in an understandable way (Thomas, 2001). Vuori (2006) expands it into six steps:

  1. Defining information needs
  2. Information gathering
  3. Information processing
  4. Analysis
  5. Dissemination
  6. Utilization and Feedback
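What makes the BI cycle continuous is that the last step feeds back into the first. As a minimal sketch, the six steps from Vuori (2006) can be stored in order and advanced with modular arithmetic, so that "Utilization and Feedback" wraps around to "Defining information needs". The names `BI_CYCLE` and `next_step` are hypothetical.

```python
# The six BI cycle steps from Vuori (2006), in order.
BI_CYCLE = [
    "Defining information needs",
    "Information gathering",
    "Information processing",
    "Analysis",
    "Dissemination",
    "Utilization and Feedback",
]


def next_step(step: str) -> str:
    """Return the step after `step`; the last step wraps around to the first."""
    i = BI_CYCLE.index(step)  # raises ValueError for an unknown step name
    return BI_CYCLE[(i + 1) % len(BI_CYCLE)]


if __name__ == "__main__":
    print(next_step("Utilization and Feedback"))  # Defining information needs
```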

A good BI system makes use of a knowledge database and a communication system (Thomas, 2001). With such a system, this cycle, and the correct data, we can obtain information on competitors, new technology, public policy, customer sentiment, market forces, the supply chain, and more. Putting this information at the decision maker’s disposal enables data-driven decisions that increase the company’s competitive advantage.

Three BI cycle characteristics that drive productivity

  1. Identifying Needs versus Wants in the first phase: Are we defining “Information that is wanted but that is not really needed”, “Information that lacks and that is recognized to be needed” or “Information that is needed but not known to be needed, wanted nor asked for” (Vuori, 2006)? The last two are extremely important: the second satisfies the end users of BI, while the third can lead to major revelations. For example, if a company watches only its most active or largest competitor, it may lose sight of a smaller competitor that is gaining traction. Getting the information that is actually needed is key to avoiding wasted time and increasing productivity.
    • Influences the BI practitioner organization
    • Influences the Decision Makers (from first line managers to executive level)
    • Departments from which the data/information is collected
  2. Protecting their Intellectual Capital: A company becomes vulnerable when it has high turnover, when sensitive or proprietary information leaves on drives or in employees’ minds, or when senior personnel accidentally give out information at conferences or symposiums (Thomas, 2001). The supply chain is another example: if a company and its competitor rely on the same key supplier to produce a similar product mix, what guarantees ensure that information is not passed between the two companies through that supplier? Information leaks can lead to the loss of a competitive advantage. Protecting intellectual capital frees companies from having to constantly create new products and lets them focus on improving their current product mix.
    • All employees
    • Supply chain (horizontally and vertically)
    • Production lines
    • Human Resources/Legal
    • Management
  3. Dissemination of the correct analysis: This allows managers to make data-driven decisions that help protect the business, enter a new market space, and so on. If BI practitioners give decision makers exactly the information they need based on the analysis and nothing more, we save time, reduce decision fatigue, and avoid wasting effort producing unneeded analytics. Thus, constant communication must occur between practitioners and decision makers to avoid non-value-added work. Feedback cycles help future work become more productive over time.
    • Influences the BI practitioner organization
    • Influences the Decision Makers (from first line managers to executive level)
    • Communications departments

DeKalb County, GA, offers an example of an innovative use of BI. Its CIO has leveraged BI and analytics to set up smart policing initiatives (using police more effectively and efficiently to prevent crime and lower crime rates), to enhance public safety (developing and maintaining green neighborhoods), and to promote jobs and economic development (Matelski, 2015). The CIO has taken data from multiple systems and followed the cycle above: asking the right questions, identifying the need for particular data, and carrying that data through collection, processing, and analysis to dissemination to key decision makers via intuitive dashboards and key performance indicators.

References: