Data Allocation Strategies

Data allocation is how one logical group of data gets spread across a destination data set, e.g. a group of applications that uses multiple servers (Apptio, 2015). According to ETL-Tools (n.d.), the allocation you choose determines the level of granularity you can achieve. Choosing one can be a judgment call, and understanding your allocation strategy is vital for developing and understanding your data models (Apptio, 2015; ETL-Tools, n.d.).

The robustness and accuracy of the model depend on the allocation strategy between data sets, especially because the wrong allocation can create data fallout (Apptio, 2015). Data fallout is data that never gets assigned between data sets, much as most SQL joins (left join, right join, etc.) fail to match every row between two data sets.

ETL-Tools (n.d.) states that granularity comes in dynamic and fixed levels, whereas Apptio (2015) states that there can be many different levels of granularity. The following are some of the different data allocation strategies (Apptio, 2015; Dhamdhere, 2014; ETL-Tools, n.d.):

  1. Even spread allocation: every data point is assigned the same allocation no matter what (i.e. every budget in the household gets the total sum of dollars divided by the number of budgets, even though the mortgage costs more than the utilities). It is the easiest to implement, but it is overly simplistic.
  2. Fixed allocation: data allocation based on values that stay constant over time (i.e. credit card limits). Easy to implement, but the logic is risky for data sets that can change over time.
  3. Assumption-based allocation (manually assigned percentages or weights): data allocation based on arbitrary means or an educated approximation (i.e. budgets, but not a breakdown). It draws on subject matter experts, but it is only as good as the expertise behind the estimates.
  4. Relationship-based allocation: data allocation based on the association between items (i.e. hurricane maximum wind speeds and hurricane minimum central pressure). It is easily understood; however, some nuance can be lost. In the given example there can be a lag between maximum wind speed and minimum central pressure, so the two are highly correlated yet still carry errors.
  5. Dynamic allocation: data allocation based on a calculated field that can change (i.e. tornado wind speed mapped to the Enhanced Fujita scale). Easily understood; unfortunately, it is still an approximation, just at a higher fidelity than the lower-level allocations.
  6. Attribute-based allocation: data allocation weighted by a static attribute of an item (i.e. corporate cell phone costs by data usage per service provider like AT&T, Verizon, or T-Mobile; direct-spend weighting of shared expenses). It reflects real-life usage, but lacks granularity when you want to drill down to find a root cause.
  7. Consumption-based allocation: data allocation by measured consumption (i.e. checkbook line items, general ledgers, activity-based costing). It offers greater fidelity, but it needs huge data sets and must be updated frequently.
  8. Multi-dimensional allocation: data allocation based on multiple factors. It can be the most accurate allocation for complex systems, but it is hard to understand intuitively and therefore not as transparent as consumption-based allocation.
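
To make the contrast concrete, below is a minimal Java sketch (with made-up departments and numbers) of strategy 1 versus strategy 6: allocating a shared $100 phone bill evenly, and then weighted by a data-usage attribute.

import java.util.LinkedHashMap;
import java.util.Map;

// A minimal sketch contrasting even spread with attribute-based
// allocation; the departments and usage numbers are hypothetical.
public class AllocationSketch {
    public static void main(String[] args) {
        double sharedCost = 100.0;                        // total cost to allocate
        Map<String, Double> usageGb = new LinkedHashMap<>();
        usageGb.put("Sales", 60.0);                       // assumed usage attribute
        usageGb.put("Engineering", 30.0);
        usageGb.put("HR", 10.0);

        // Strategy 1, even spread: everyone gets cost / n, no matter what.
        double evenShare = sharedCost / usageGb.size();
        usageGb.forEach((dept, gb) ->
            System.out.printf("Even spread   %-12s $%.2f%n", dept, evenShare));

        // Strategy 6, attribute-based: weight by each item's share of the attribute.
        double totalGb = usageGb.values().stream().mapToDouble(Double::doubleValue).sum();
        usageGb.forEach((dept, gb) ->
            System.out.printf("By attribute  %-12s $%.2f%n", dept, sharedCost * gb / totalGb));
    }
}

The even spread bills every department $33.33, while the attribute-based weighting bills Sales $60, Engineering $30, and HR $10, which is why the later strategies reflect real-life usage more closely.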

The higher the number, the more mature the strategy and the finer the granularity of the data. Sometimes it is best to start at level 1 maturity and work your way up to level 8. Dhamdhere (2014) suggests that consumption-based allocation (i.e. activity-based costing) is a best practice among allocation strategies, given its focus on accuracy. However, some levels of maturity may not be acceptable in certain cases (ETL-Tools, n.d.). Consider which allocation strategy is best for you, for the task before you, and for the expectations of the stakeholders.


Foul on the X-axis and more

There are multiple ways to use data to justify any story or agenda one has. My p-hacking post shows how statistics can be tortured into statistically significant results so that work gets published, and with journals and editors not glorifying replication studies, it can be hard to fund the work that would check those results. There are also ways to manipulate graphs to fit any narrative you want. Take the figure below, which was published on the Georgia Department of Public Health website on May 10, 2020. Notice something funny going on in the x-axis: it looks like Doctor Who’s voyage across time trying to solve the coronavirus crisis. The dates on the x-axis are not in chronological order (Bump, 2020; Fowler, 2020; Mariano & Trubey, 2020; McFall-Johnsen, 2020; Wallace, 2020). The dates are in exactly the order needed to make it appear that the number of coronavirus cases in Georgia’s top five impacted counties is decreasing over time.

Figure 1: May 10 top five impacted counties bar chart from the Georgia Department of Public Health website.

If the dates in the figure above were lined up appropriately, it would tell a different story. Once this chart was made public, it garnered tons of media coverage and was later fixed. But this happens all the time when people have an agenda: they mess with an axis to get the result they want. It is rare, though, to see a real-life example of it on the x-axis.

But wait, there’s more! Notice the grouping order of the top five impacted counties. Pick a color; it looks like the Covid-19 counts per county are playing musical chairs. Within each day, the counties were re-sorted in descending count order, which makes the chart even harder to understand and interpret, again sowing a narrative that may not be accurate (Bump, 2020; Fowler, 2020; Mariano & Trubey, 2020; McFall-Johnsen, 2020; Wallace, 2020).

According to Fowler (2020), there are also issues in how the number of Covid-19 cases gets counted, which adds to the misinformation and sows further distrust. Carving out a convenient definition of what is counted in and what is counted out can artificially skew the data, again favoring a narrative or producing false results that could be accidentally generalized. Here Fowler explains:

“When a new positive case is reported, Georgia assigns that date retroactively to the first sign of symptoms a patient had – or when the test was performed, or when the results were completed. “

Understanding that the virus had many asymptomatic carriers that never got reported is also part of the issue. Since you can be asymptomatic for days and still have Covid-19 in your system, the definition above is inherently inaccurate. Fowler also explains that the backlog of Covid-19 tests was so large that a positive case could take days to report, so under this definition a chart of the last 14 days will see its numbers shift wildly with each iteration of the graph. Even when Figure 1 was fixed, the last 14 days would inherently show a decrease in cases, due to the backlog, the definition, and our understanding of the virus; see Figure 2.

Figure 2: May 19 top five impacted counties bar chart from the Georgia Department of Public Health website.

They did fix the ordering of the counties and the x-axis, but only after it was reported by Fox News, the Washington Post, and Business Insider, to name a few. However, the definition of what counts as a Covid-19 case still distorts the numbers and tells the wrong story. The effect is easy to see when you compare the May 4-9 data between Figure 1 and Figure 2: Figure 2 records a higher incidence of Covid-19 over that same period. That is why definitions and criteria matter just as much as how graphs can be manipulated.

Mariano & Trubey (2020) do have a point: some errors are expected during a time of chaos, but basic craftsmanship should still be observed. Be careful about how data is collected and how it is represented on graphs, and look not only at the commonly manipulated y-axis but also at the x-axis. This is why the methodology sections in peer-reviewed work are so important.


Parallel Programming: Compelling Topics

(0) A thread is a unit (or sequence of code) that can be executed by a scheduler, essentially a task (Sandén, 2011). A single thread (task) has one program counter and one sequence of code. Multi-threading occurs when multiple program counters share a common code base; the sequences of code can then be assigned to different processors to run in parallel (simultaneously) to speed up a task. Another form of multi-threading is executing the same code on different processors with different inputs. If data is shared between the threads, a “safe” object with synchronization is needed, so that only one thread at a time can access the data stored in the “safe” object. It is through these “safe” objects that one thread can communicate with another.

(1) Sandén (2011) shows how to use synchronized objects (concurrency in Java), which are “safe” objects protected by locks in critical synchronized methods.  In Java we can create threads by: (1) extending the class Thread or (2) implementing the interface Runnable.  The latter defines the code of a thread in a method, void run ( ), and the thread completes its execution when it reaches the end of that method (essentially like a subroutine in FORTRAN).  Using the former, you need the constructors public Thread ( ) and public Thread (Runnable runObject) along with methods like public start ( ).
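
As a quick sketch of those two approaches (my own example, not Sandén’s code), both creation styles look like this:

// A minimal sketch of the two thread-creation styles described above.
public class ThreadCreation {

    // (1) Extend the class Thread and override run().
    static class Extended extends Thread {
        @Override public void run() {
            System.out.println("Thread via subclassing: " + getName());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Extended();

        // (2) Implement Runnable; the thread completes when run() returns.
        Runnable runObject = () ->
            System.out.println("Thread via Runnable: " + Thread.currentThread().getName());
        Thread t2 = new Thread(runObject);

        t1.start();          // public start() makes the thread eligible to run
        t2.start();
        t1.join();           // wait for both threads to reach the end of run()
        t2.join();
    }
}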

(2) Shared objects that force mutual exclusion on the threads that call them are “safe objects”.  The mutual exclusion on threads/operations can be relaxed when threads don’t change any data, for example a pure read of the data in the “safe object” (Sandén, 2011).
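
A minimal sketch of such a safe object in Java (my own example): the synchronized methods enforce mutual exclusion, so only one thread at a time can touch the shared count.

// A minimal "safe" object: synchronized methods give mutual exclusion.
public class SafeCounter {
    private long count = 0;

    public synchronized void increment() { count++; }  // write: must be exclusive
    public synchronized long get() { return count; }   // read of the shared data
}

Note that even the read is synchronized here; relaxing mutual exclusion for pure reads, as described above, has to be done carefully in Java (e.g. with a volatile field or a read-write lock) so that readers still see up-to-date values.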

(3) Deadlock occurs when a thread acquires an additional resource while holding one or more other resources, especially when this creates a circularity.  To prevent deadlocks, resources need to be controlled.  One should draw a wait-chain diagram to make sure the design helps prevent a deadlock, especially when there is a mix of transactions occurring.  A good example of a deadlock is a stalemate in chess or, as Stacy said, a circular firing squad.
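
The circularity is easy to reproduce in code. Below is a small, intentionally broken Java sketch (locks a and b stand in for the resources): each thread holds one lock while requesting the other, so each waits on the other forever. Taking the locks in one agreed order in both threads breaks the wait chain.

// An intentionally broken sketch: two threads acquire two locks in
// opposite order, creating the circular wait described above.
public class DeadlockSketch {
    private static final Object a = new Object();
    private static final Object b = new Object();

    public static void main(String[] args) {
        new Thread(() -> {
            synchronized (a) {            // holds a ...
                pause();                  // give the other thread time to grab b
                synchronized (b) {        // ... and now waits for b
                    System.out.println("thread 1 got both");
                }
            }
        }).start();
        new Thread(() -> {
            synchronized (b) {            // holds b ...
                pause();
                synchronized (a) {        // ... and now waits for a: deadlock
                    System.out.println("thread 2 got both");
                }
            }
        }).start();
    }

    private static void pause() {
        try { Thread.sleep(100); } catch (InterruptedException ignored) { }
    }
}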

(4) In a distributed system, nodes can talk to (cooperate with) each other and coordinate their systems.  However, the different nodes execute concurrently, there is no global clock on which all nodes function, and some of these nodes can fail independently.  Since nodes talk to each other, we must study them as they interact with each other.  Hence the need for logical clocks (since we don’t have global clocks), in which distances in time are lost. In a logical clock, all nodes agree on a partial order of events (where one event can be known to happen before another).  Logical clocks only describe the order of events, not their relation to real time.  If nodes are completely disjoint in a logical clock, then a node can fail independently. (This was my favorite subject because I can now visualize more of what I was reading and the complex nature of nodes.)
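
A Lamport-style logical clock captures this ordering in a few lines. The sketch below is the generic textbook rule, not code from Sandén (2011):

// A minimal logical clock: it orders events, but says nothing about real time.
public class LamportClock {
    private long time = 0;

    // A local event or a send: advance the clock and use the value as a timestamp.
    public synchronized long tick() { return ++time; }

    // On receive: jump past the sender's timestamp, then count the receive event.
    public synchronized long onReceive(long senderTime) {
        time = Math.max(time, senderTime) + 1;
        return time;
    }
}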

(5) An event thread is a totally ordered sequence of event occurrences, where a control thread processes each occurrence in turn.  For any two occurrences x and y in an event thread, exactly one of the following relations holds:

  • x → y (x happens before y)
  • y → x (y happens before x)
  • x || y (x and y are concurrent)

Events in a thread must be essential to the situation they are being used for and independent of any software design.  Essential threads can be shared, for example by time, by domain, or by software, while others are not shared, as they occur inside the software.


Parallel Programming: State Diagram Example

[State diagram image]

  • From state S0 we enter the superstate S1 through event e1, which yields action a1.
  • When entering the superstate S1, we must go through state S12, which has action a7 on entry and action a3 on exit.
    • If action a3 yields an event e9, which yields action a9, we enter state S13, with action a6 on entry and action a12 on exit.
      • If action a12 yields an event e5, we get action a5, hit the superstate S1, and begin again at state S2.
      • If action a12 yields an event e9, we use action a1 and enter state S112 (under the S11 superstate) with entry action a11.
        • Event e2 acts on S112 to yield action a2, which enters the superstate S11.
          • Entering the superstate through state S112, we get an exit criterion of action a14 and we end.
          • If we exit state S112 via event e1 and action a1, we are sent back to state S12 to start again.
          • If we exit state S112 via event e3 and action a3, we enter state S1; follow 1.a.
    • If action a3 in state S12 yields event e4 and action a4, we enter the superstate S11. Entering superstate S11 this way, we enter state S111 with entry action a8.
      • We then carry out event e9 and action a1 to get to state S112. If this happens, follow 1.a.i.2.

Parallel Processing: Ada Tasking and State Modeling

Sample Code

1  protected Queue is
2          procedure Enqueue (elt : in Q_Element);     
3                                            -- Insert elt @ the tail end of the queue
4          function First_in_line return Q_Element;
5                                            -- Return the element at the head of the
6                                            -- queue, or no_elt.
7          procedure Dequeue;                  
8                                            -- If the queue is not empty, remove the
9                                            -- element at its head
10 end Queue;
11
12 task type Worker;
13
14 task body Worker is
15   elt : Q_Element;                        -- Element retrieved from queue
16   begin
17      while true loop
18           elt := Queue.First_in_line;     -- Get element at head of queue
19           if elt = no_elt then            -- Let the task loop until there
20              delay 0.1;                   -- is something in the queue
21           else
22               Process (elt);              -- Process elt. This takes some time
23               Queue.Dequeue;              -- Remove element from queue
24           end if;
25     end loop;
26 end Worker;

Comparing and Contrasting Ada’s Protected Function to Java’s (non)synchronized

Java’s safe objects are synchronized objects, whose methods are either “synchronized” or “non-synchronized”.  In the non-synchronized methods, the shared data is mostly considered to be read-only.  In synchronized methods, the shared data can be written and read, but the methods usually begin with a wait loop on a certain condition (i.e. while (condition) {wait( );}).  This wait loop forces the thread to wait until the condition becomes true, to prevent multiple threads from editing the same safe object at the same time.  Wait loops located elsewhere in a Java synchronized method are uncommon.
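
Putting that pattern together, here is a minimal sketch of a guarded, one-slot buffer (the class and method names are my own):

// A one-slot buffer whose synchronized methods start with a wait loop.
public class OneSlotBuffer<T> {
    private T slot;                                  // the shared, guarded data

    public synchronized void put(T value) throws InterruptedException {
        while (slot != null) { wait(); }             // wait loop at the start
        slot = value;
        notifyAll();                                 // wake threads waiting in take()
    }

    public synchronized T take() throws InterruptedException {
        while (slot == null) { wait(); }             // wait loop at the start
        T value = slot;
        slot = null;
        notifyAll();                                 // wake threads waiting in put()
        return value;
    }
}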

Safe objects in Ada are protected objects, whose operations are “protected procedures” or “protected functions”.  Unlike the non-synchronized method in Java (which should be read-only by convention), the protected function in Ada is read-only by rule.  Where Java’s synchronized version has a wait loop that stalls a thread, Ada’s protected entry (i.e. entry x ( ) when condition is …) is entered only when its condition is true; thus you can have multiple entries whose bodies manipulate the data in different ways, similar to an if-else construct.  For example, one entry could be guarded by when condition and another right after by when not condition, and this can be expanded to n different conditions.  With these entries, the barrier is always tested first, in contrast to wait.  We can also requeue on another entry (this is not a subprogram call, so the thread does not return to the point after the requeue call), but it is uncommon to have these located elsewhere in the program.

Describe what worker does

A Worker gets the element (elt) at the head of the queue and keeps looping until there is an element in the queue that it can process.  The element is stored in the local variable elt.  If there is no element, the worker delays for 0.1 seconds and keeps looping.  Once the worker has obtained an element, it processes the element and then removes it from the queue.

Adapt Queue and Worker so that multiple instances of Worker can process data elements

So that each element is processed by only one worker, the first thing we must do is change “task body worker is” to “protected body worker is” on line 14, and change “elt: Q_Element;” to “procedure get (elt: Q_Element) is” in order to get the element from the queue on line 15.

Once there is an element at the head of the queue, the worker must dequeue it before processing it; this protects the data and allows another worker to work on the next element at the head of the line.  Thus, I propose swapping lines 22 and 23.  If this isn’t preferred, we can create a new array called work_in_progress, with get, put, and remove procedures, used before line 22 and then following my proposed sequence.  This lets a worker say: I got this element and I will work on it; if all is successful, we don’t need to re-add the element to the queue, we just delete it from the work_in_progress array, and we never hold up other workers from working on other elements.  However, if the worker fails to process the element, it is returned to the queue and added to the elt array again for another worker to process.  To avoid an endless loop, if an element cannot be processed by three different workers, we can create another array to store non-compliant elements and call Dequeue on the main elt array.  We can simply swap lines 22 and 23 and do nothing else if and only if processing an element can never fail.

In Queue, line 2 must become an entry, for instance “entry Enqueue (elt : in Q_Element) when count >= 0 is … end Enqueue”, to allow waiting until an element can actually be added to the array elt.  Using entries in Queue eliminates the need for the while-true loop on lines 19, 20, and 21 that polls for an elt at the head of the line; our conditions are checked first rather than later in the code. Similarly, we can change line 7 to “entry Dequeue (elt : in Q_Element) when count > 0 is … end Dequeue”, to allow waiting until there is actually an element to delete from the array elt.  This is more of an efficiency issue, and it allows the worker to say: I got this element, and it is OK to delete it from the queue.  With all these changes, we must make sure that on line 18 we are pulling an element from the elt array.

The loop where instances of workers wait is crude

Lines 18 and 23 can be combined into a Queue.Pop (elt) above the if-statement on line 18, to avoid the crude loop where worker threads wait for something to appear in the queue.  The pop allows for no “busy waiting”.  But we must create a procedure in Queue, procedure Pop, which returns the first item in the array elt and removes it.
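
For comparison with the Java discussion earlier, here is a rough Java analogue of that design (a sketch with my own names, not the Ada program itself): take() plays the role of Pop guarded by a when count > 0 barrier, removing the element as it hands it out, so there is no busy waiting.

import java.util.ArrayDeque;
import java.util.Deque;

// A rough Java analogue of the entry-guarded queue: take() blocks until
// an element exists, then returns and removes it in one step.
public class GuardedQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();

    public synchronized void enqueue(T elt) {
        items.addLast(elt);
        notifyAll();                          // open the "barrier" for waiting takers
    }

    public synchronized T take() throws InterruptedException {
        while (items.isEmpty()) { wait(); }   // plays the role of "when count > 0"
        return items.removeFirst();           // First_in_line and Dequeue combined
    }
}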


The context with an image

Sometimes as data scientists, or scientists in general, we produce beautiful charts that are chock-full of meaning and data; however, to the outside world, they can be misconstrued.  To avoid your graphs being misread on a dashboard, in a paper, or even in a blog post, sometimes context is needed.  The amount of context depends on the complexity of the material and the severity of misinterpretation. The higher the complexity, the more contextual text is needed to help the reader digest the information you are presenting. The higher the severity of misinterpretation (i.e. life-threatening if misread, or a loss of millions of dollars), the more contextual text you should include.

Contextual text can help a reader understand your tables, graphs, or dashboards, but not every instance requires the same level of detail.  The following are mere examples of what light, medium, and heavy context could include:

Light Context (bullet points)

  • Source system or source details
  • Details on allocations impacting objects in a model
  • Details on data joins or algorithms used
  • Data nuances (excludes region x)

Medium Context (Calling out use cases)

  • A succinct explanation of what the user should be able to get out of the report/reporting area, graph, table, or dashboard

Heavy Context (Paragraph Explanations)

  • The best example is the results section of a scientific peer-reviewed journal, which not only has a figure description, but they go into detail about areas to pay attention to, outliers, etc.

Below is an example from the National Hurricane Center’s (NHC) 5-day forecast cone for Tropical Storm Sebastian.  Notice the note:

“Note: The cone contains the probable path of the storm center but does not show the size of the storm.  Hazardous conditions can occur outside of the cone.” (NHC, 2019).

This line alone falls under light context until you add the key below, which is a succinct explanation of how to read the graphic, making the whole graphic fall under medium context.

[NHC 5-day forecast cone graphic for Tropical Storm Sebastian]

A second image, originally produced by the NHC for Hurricane Beryl in 2016, shows an example of heavy context below the NHC image, where text is added by an app. In the application this image is pulled from, the following block of text appears (Appadvice.com, n.d.):

“This graphic shows an approximate representation of coastal areas under a hurricane warning (red), hurricane watch (pink), tropical storm warning (blue) and tropical storm watch (yellow). The orange circle indicates the current position of the center of the tropical cyclone. The black line, when selected, and dots show the National Hurricane Center (NHC) forecast track of the center at the times indicated. The dot indicating the forecast center location will be black if the cyclone forecast to be tropical and will be white with a black outline if the cycle is …”

[NHC Hurricane Beryl graphic with the app’s added context text]


Strategic Key Elements Case

To understand a corporate strategy, you need to understand the key elements of a corporate strategy.  Most corporate strategies are showcased in the investor’s annual report, usually in the first ten or so pages.  The key elements of a corporate strategy include: mission, scope, competitive advantage, action programs, and decision-making guidance. Below is a case study from Boeing’s 2017 Annual Report, prior to the 737 Max situation.

Strategy key elements:

  • Mission: “Our purpose and mission is to connect, protect, explore and inspire the world through aerospace innovation. We aspire to be the best in aerospace and an enduring global industrial Champion.” (Page 1)
  • Scope: “…manufacturer of commercial airplanes and defense, space and security systems and a major provider of government and commercial aerospace services…. Our products and tailored services include commercial and military aircraft, satellites, weapons, electronic  and defense system, launch systems, digital aviation services, engineering modifications and maintenance, supply chain services and training” (Table of Contents)
  • Competitive Advantage:
    • Supply Chain integration: “leverages the talents of skilled people working for Boeing suppliers worldwide, including 1.3 million people at 13,600 U.S. companies” (Table of Contents)
    • Intellectual property: “We own numerous patents and have licenses for the use of patents owned by others, which relate to our products and their manufacture. In addition to owning a large portfolio of intellectual property, we also license intellectual property to and from third parties.” (Page 2)
  • Action Program:  “The importance of our purpose and mission demands that we work with the utmost integrity and excellence and embrace the enduring values that define who we are today and the company we aspire to be tomorrow. These core values—integrity, quality, safety, diversity and inclusion, trust and respect, corporate citizenship and stakeholder success—remind us all that how we do our work is every bit as important as the work itself. Living these values also means being best-in-class in community and environmental stewardship.” (Page 8)
  • Decision-Making Guidance:
    • “Our ongoing pursuit of improved affordability and productivity, as well as strong performance across our core business areas…have us well positioned to fund our future.” (Page 7)
    • “Succeeding in rapidly changing global markets requires that we think and do things differently. It demands change, a willingness to embrace it and the agility to both drive and respond to external forces.” (Page 7)
    • “… we continue to make meaningful progress toward building a zero-injury workplace, achieving world-class quality and creating the kind of culture that breaks down organizational barriers, eliminates bureaucracy and unleashes the full capabilities of our people.” (Page 8)



Parallel Programming: Vector Clocks

Nodes can act together in groups: a node can send messages (multicast) to a group, and the messages are received by all the nodes in the group (Sandén, 2011).  If there is a broadcast, all nodes in the system get the same message.  In a multicast, the messages can reach the nodes in a group in different orders: First In First Out order, causal order, or total order (an atomic multicast if it is reliable).

Per Sandén (2011), a multicast can occur if the source is a member of the group, but in causal order it cannot span across groups. A two-phase, total-order multicast system can look like a vector clock, but it is not one; each message send or receive increments the counter by one as the systems talk to each other.

Below is an example of a vector clock:

To G1

  • m1 suggested time at 6 and  m2 suggested time at 11
  • m1 commit time at 7 and m2 commit time at 12

To G2

  • m1 suggested time at 7 and  m2 suggested time at 12
  • m1 commit time at 8 and m2 commit time at 13

[Vector clock diagram image]
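
The merge rule that produces numbers like these can be sketched generically (a textbook vector clock, not Sandén’s code):

import java.util.Arrays;

// A minimal vector clock for a fixed set of n nodes: one counter per node,
// merged on receive, which preserves the partial (happened-before) order.
public class VectorClock {
    private final long[] clock;
    private final int me;                     // this node's index

    public VectorClock(int numNodes, int me) {
        this.clock = new long[numNodes];
        this.me = me;
    }

    // A local event or a send: increment our own entry.
    public synchronized long[] tick() {
        clock[me]++;
        return clock.clone();                 // timestamp to attach to a message
    }

    // On receive: take the entry-wise max, then count the receive event.
    public synchronized void onReceive(long[] senderClock) {
        for (int i = 0; i < clock.length; i++) {
            clock[i] = Math.max(clock[i], senderClock[i]);
        }
        clock[me]++;
    }

    @Override public synchronized String toString() {
        return Arrays.toString(clock);
    }
}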


Parallel Programming: State diagram of Maekawa’s voting algorithm

Sandén (2011) defines state diagrams as a way to show the possible states an object can be in.  He also notes that events are action verbs that occur on an arrow between two states (if an action doesn’t change the state, it can be listed inside the state), and that actions can have conditions on them.

Thus, a state diagram shows the transitions from state to state as events occur.  An event usually has many occurrences, and occurrences are instantaneous.  A superstate can encompass multiple states (Sandén, 2011).  An activity is an operation that takes time, and it is marked with the keyword “do /”.

The goal of this post is to make a state diagram of Maekawa’s voting algorithm, from the “Maekawa’s algorithm” exercise within the “Distributed mutual exclusion” set. This can be done in various ways. One option is the following five states:

  • Released and not voted
  • Released and voted
  • Wanted and not voted
  • Wanted and voted
  • Held and voted (for self)

Events are:

  • request_received, etc., for messages arriving from other nodes
  • acquire for when the local node wants the lock
  • release for when the local node gives up the lock.

A possible solution is shown below:

[State diagram image]
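
As a rough illustration (my own reading of the algorithm, with two extra message events, all_replies_received and release_received, assumed beyond the three listed above), the five states and the events can also be written as a transition table:

// A sketch of the five Maekawa states; the transitions are one plausible
// reading of the diagram, not a verified implementation.
public class MaekawaNode {
    enum State { RELEASED_NOT_VOTED, RELEASED_VOTED, WANTED_NOT_VOTED, WANTED_VOTED, HELD_VOTED }
    enum Event { ACQUIRE, REQUEST_RECEIVED, ALL_REPLIES_RECEIVED, RELEASE, RELEASE_RECEIVED }

    private State state = State.RELEASED_NOT_VOTED;

    public synchronized void on(Event e) {
        switch (state) {
            case RELEASED_NOT_VOTED:
                if (e == Event.ACQUIRE) state = State.WANTED_NOT_VOTED;             // ask our quorum
                else if (e == Event.REQUEST_RECEIVED) state = State.RELEASED_VOTED; // vote for requester
                break;
            case RELEASED_VOTED:
                if (e == Event.RELEASE_RECEIVED) state = State.RELEASED_NOT_VOTED;  // our vote comes back
                else if (e == Event.ACQUIRE) state = State.WANTED_VOTED;
                break;
            case WANTED_NOT_VOTED:
                if (e == Event.REQUEST_RECEIVED) state = State.WANTED_VOTED;        // vote for another node
                else if (e == Event.ALL_REPLIES_RECEIVED) state = State.HELD_VOTED; // lock held, voted for self
                break;
            case WANTED_VOTED:
                if (e == Event.RELEASE_RECEIVED) state = State.WANTED_NOT_VOTED;
                break;
            case HELD_VOTED:
                if (e == Event.RELEASE) state = State.RELEASED_NOT_VOTED;           // free our own vote
                break;
        }
    }
}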


Qualitative Analysis: Coding Project Report of a Virtual Interview Question

The virtual interview question: Explain what being a doctoral student means to you. How has your life changed since starting your doctoral journey?

Description of your coding process

The steps I followed in this coding process were to read the responses once, at least one week before this individual project assignment was due.  This allowed me to think about generic themes and codes at a very high level throughout the week.  After the week was over, I went to wordle.net to create a word cloud of the top 50 most used words in this virtual interview and got the results below.


Figure 1: Screenshot of wordle.net results, which were used to help develop codes and sub-codes; words that are bigger appear more often in the virtual interview than words that are smaller.

The most telling themes from Figure 1 are: time, family, life, work, student, learning, first, opportunity, research, people, etc.  This helped create some of the codes and sub-codes, like prioritization, for family, etc.  Figure 1 also helped confirm the ideas for codes that had been forming in my head over the past week, so I felt ready to begin coding.  After deciding on the initial set of codes, I did some manual coding while asking the questions: What is the person saying? How are they saying it? And could there be a double meaning in the sentences?  The last question helped me identify whether a sentence in this virtual interview carried multiple codes.  I used QDA Miner Lite as my software of choice for coding; it is a free product, and there are plenty of end-user tutorials on YouTube, made by researchers from many fields, on how to use this software effectively.  After the initial manual coding, I revisited the initial codebook.  Some of the sub-codes that fell under betterment were moved into the future code, as they better fit that theme than pure betterment; this reanalysis went on for all codes.  As I re-read the responses for the third time, some new sub-codes were added as well.  The reason for re-reading this virtual interview a third time was to make sure no other codes were missing or waiting to be created.
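
For what it is worth, the wordle.net step can also be approximated offline. Below is a small Java sketch (the input file name virtual_interview.txt is hypothetical) that prints the top 50 words:

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Count word frequencies in the interview text and print the top 50,
// roughly what the wordle.net word cloud visualizes.
public class TopWords {
    public static void main(String[] args) throws Exception {
        String text = Files.readString(Path.of("virtual_interview.txt")); // assumed input
        Map<String, Long> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^a-z']+")) {
            if (word.length() > 3) {                   // crude stop-word filter
                counts.merge(word, 1L, Long::sum);
            }
        }
        counts.entrySet().stream()
              .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
              .limit(50)                               // the top 50, as in Figure 1
              .forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
    }
}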

Topical Coding Scheme (Code Book)

The codebook that was derived is as follows:

  • Family
    • For Family
    • Started by Family
    • First in the Family
  • Perseverance
    • Exhausted
    • Pushing through
    • Life Challenges
    • Drive/Motivation
    • Goals
  • Betterment
    • Upgrade skills
    • Personal Growth
    • Maturity
    • Understanding
    • Priority Reanalysis
  • Future
    • More rewarding
    • Better Life
    • Foresight
  • Proving something
    • To others


Diagram of findings

Below are images generated through the analytical/automated part of QDA Miner Lite:


Figure 2: Distribution of codes in percentages throughout the virtual interview.

Figure 3: Distribution of codes in frequency throughout the virtual interview.


Figure 4: Distribution of codes in frequency throughout the virtual interview in terms of a word cloud where more frequent codes appear bigger than less frequent codes.

Brief narrative summary of findings referring to your graphic diagram

Given Figures 2-4, one could say that the biggest theme for entering the doctoral program is the prospect of a better life and the hope of changing the world, as these showed up most frequently in the interview.  One student states that the degree would open many doors: “Pursuing and obtaining this level of degree would help to open doors that I may not be able to walk through otherwise.” Another student hopes their research will change the future lives of many: “The research that I am going to do will hopefully allow people to truly pursue after their dreams in this ever-changing age, and let the imagination of what is possible within the business world be the limit.” Other students are a bit more practical, stating things like “…move up in my organization and make contributions to the existing knowledge” and “More opportunities open for you as well as more responsibility for being credible and usefulness as a cog in the system”.

Another concept that kept repeating is that this is done for family, and that because of family, work, and school, the life of a doctoral student in this class has to be reprioritized (hence the code priority reanalysis).  All forms of graphical output show that these are the two most significant drivers toward the degree.  One student went to an extreme: “Excluding family and school members, I am void of the three ‘Ps’ (NO – people, pets, or plants). I quit my full-time job and will be having the TV signal turned off after the Super Bowl to force additional focus.”  Another student said that time was the most important thing they had and that it has changed significantly: “The most tangible thing that has changed in my life since I became a doctoral student has been my schedule.  Since this term began I have put myself on a strict schedule designating specific time for studies, my wife, and time for myself.”  Finally, another student says balance is key for them: “Having to balance family time, work, school, and other social responsibilities, has been another adjusted change while on this educational journey. The support of my family has been very instrumental in helping me to succeed and the journey has been a great experience thus far.”  These two codes overlap or include each other in 7 instances, about 80% of the time.

Thus, from this virtual interview I am able to conclude that family is mentioned alongside priority reanalysis in order to meet the goal of the doctoral degree, and that time management, a component of priority reanalysis, is key.  Some students take this reanalysis to the extreme, as mentioned above, but if they feel that is the only way they can accomplish this degree in a timely manner, who am I to judge; after all, it is the researcher’s job to remain unbiased when coding.  However, while family can drive people to complete the degree, the prospect of a better life and of changing the world for the better is what was mentioned most.

Appendix A

An output file from qualitative software can be generated by using QDA Miner Lite.