Parallel Programming: Compelling Topics

(0) A thread is a unit (or sequence of code) that can be executed by a scheduler, essentially a task (Sanden, 2011). A single thread (task) has one program counter and one sequence of code. Multi-threading occurs when several threads, each with its own program counter, share common code. Thus, a multi-threaded program has many sequences of execution that can be assigned to different processors to run in parallel (simultaneously) and speed up a task. Another way to multi-thread is to have the threads execute the same code on different processors with different inputs. If data is shared between the threads, it must be kept in a “safe” object that uses synchronization, so that only one thread at a time can access the data stored in it. It is through these “safe” objects that one thread can communicate with another.
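To make the same-code, different-inputs style concrete, here is a minimal Java sketch (Java because Sanden's examples use it): several threads run the same code on different slices of an input array, and the only data they share lives in a “safe” object with synchronized methods. The summing task and the class names `SafeTotal` and `SameCodeDifferentInputs` are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// A "safe" object: synchronized methods let only one thread at a time touch the total.
class SafeTotal {
    private long total = 0;

    synchronized void add(long amount) {
        total += amount;
    }

    synchronized long get() {
        return total;
    }
}

class SameCodeDifferentInputs {
    // Every thread runs the same code (sum one slice) on a different slice of the input.
    static long parallelSum(int[] data, int threadCount) {
        SafeTotal total = new SafeTotal();
        List<Thread> threads = new ArrayList<>();
        int chunk = (data.length + threadCount - 1) / threadCount;
        for (int t = 0; t < threadCount; t++) {
            final int from = Math.min(t * chunk, data.length);
            final int to = Math.min(from + chunk, data.length);
            Thread worker = new Thread(() -> {
                long partial = 0;
                for (int i = from; i < to; i++) {
                    partial += data[i];
                }
                total.add(partial); // the only point where the threads share data
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            try {
                worker.join(); // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4)); // prints 500500
    }
}
```

Each thread keeps its partial sum private and touches the safe object only once, which keeps contention on the lock low.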

(1) Sanden (2011) shows how to use synchronized objects (concurrency in Java), which are “safe” objects whose critical methods are synchronized and therefore protected by locks. In Java we can create threads by (1) extending class Thread or (2) implementing the interface Runnable. In both cases the code of the thread is defined in a method, public void run(), and the thread completes its execution when it reaches the end of that method (which is essentially like a subroutine in FORTRAN). Class Thread provides the constructors public Thread(), used when extending Thread, and public Thread(Runnable runObject), used to wrap a Runnable, along with methods such as public void start(), which launches the new thread.
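A minimal sketch of both ways of creating a thread, assuming a trivial task that only appends a message to a shared list. `Greeter`, `GreeterTask`, and `CreatingThreads` are hypothetical names; `Thread`, `Runnable`, `start()`, `join()`, and `Collections.synchronizedList` are standard Java.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Way 1: extend class Thread and override run().
class Greeter extends Thread {
    private final List<String> log;

    Greeter(List<String> log) {
        super(); // the public Thread() constructor
        this.log = log;
    }

    @Override
    public void run() {
        log.add("from Thread subclass"); // the thread ends when run() returns
    }
}

// Way 2: implement Runnable and pass it to the public Thread(Runnable) constructor.
class GreeterTask implements Runnable {
    private final List<String> log;

    GreeterTask(List<String> log) {
        this.log = log;
    }

    @Override
    public void run() {
        log.add("from Runnable");
    }
}

class CreatingThreads {
    static List<String> runBoth() {
        // A synchronized list acts as the "safe" object both threads write to.
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Thread first = new Greeter(log);
        Thread second = new Thread(new GreeterTask(log));
        first.start(); // start() launches the thread, which then executes run()
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runBoth()); // the order of the two entries can vary
    }
}
```

Implementing Runnable is often preferred because the task class stays free to extend some other class.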

(2) Shared objects that force mutual exclusion on the threads that call them are “safe objects”. The mutual exclusion can be relaxed for threads or operations that do not change any data, for example a read of the data in the “safe object” (Sanden, 2011).
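Plain `synchronized` methods exclude readers from each other as well. One standard-library way to relax exclusion for reads, swapped in here rather than taken from Sanden (2011), is `java.util.concurrent.locks.ReentrantReadWriteLock`: many threads may hold the read lock at once, while a writer holds the write lock alone. A sketch with a hypothetical `SharedBalance` class:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// A "safe" balance: reads may overlap, writes are mutually exclusive.
class SharedBalance {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private long balance = 0;

    long read() {
        lock.readLock().lock(); // several readers can hold the read lock at once
        try {
            return balance;
        } finally {
            lock.readLock().unlock();
        }
    }

    void deposit(long amount) {
        lock.writeLock().lock(); // a writer excludes all readers and other writers
        try {
            balance += amount;
        } finally {
            lock.writeLock().unlock();
        }
    }
}

class ReadWriteDemo {
    static long depositConcurrently(int threadCount, int depositsPerThread) {
        SharedBalance account = new SharedBalance();
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < depositsPerThread; i++) {
                    account.deposit(1);
                    account.read(); // reads do not block one another
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return account.read();
    }

    public static void main(String[] args) {
        System.out.println(depositConcurrently(4, 1000)); // prints 4000
    }
}
```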

(3) Deadlock can occur when a thread acquires an additional resource while holding one or more other resources, especially when this creates a circular wait. To prevent deadlocks, resources need to be controlled. One should draw a wait-chain diagram to make sure the design prevents deadlock, especially when there is a mix of transactions occurring. A good example of a deadlock is a stalemate in chess or, as Stacy said, a circular firing squad.
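One common way to remove the circularity is to make every thread acquire resources in the same global order. The sketch below assumes a hypothetical `Account` type and always locks the account with the lower `id` first, so two opposing transfers cannot each hold one account while waiting for the other.

```java
// Hypothetical account type; its balance is guarded by the account's own monitor.
class Account {
    final int id;
    long balance;

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }
}

class OrderedTransfer {
    // Every thread locks the account with the lower id first. Because all threads
    // acquire resources in one global order, no circular wait can form.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Two threads transfer in opposite directions: without the ordering rule,
    // each could hold its "from" account while waiting for the other's.
    static long[] runOpposingTransfers(int rounds) {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {a.balance, b.balance};
    }

    public static void main(String[] args) {
        long[] balances = runOpposingTransfers(100_000);
        System.out.println(balances[0] + " " + balances[1]); // prints 1000 1000
    }
}
```

In wait-chain terms, every edge points from a lower id to a higher one, so the chain can never close into a cycle.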

(4) In a distributed system, nodes can talk (cooperate) with each other and coordinate their work. However, the nodes execute concurrently, there is no global clock that all nodes run on, and nodes can fail independently. Since nodes talk to each other, we must study them as they interact. Because there is no global clock, we use logical clocks, which preserve the order of events but lose the distances in time between them. With logical clocks, all nodes agree on a partial order of events (one event can happen before another). Logical clocks describe only the order of events, not when they happened. If nodes are completely disjoint (they never exchange messages), the logical clock does not order their events relative to each other, and a node can fail independently. (This was my favorite subject because I can now better visualize what I was reading and the complex nature of nodes.)
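The text does not name a specific logical clock; a well-known one is Lamport's, sketched here under that assumption. Each node keeps a counter, ticks it on every local event or send, and on receipt jumps past the sender's timestamp. If event a happens before event b, a's stamp is smaller than b's; the converse does not hold, so a smaller stamp alone does not prove ordering.

```java
// A Lamport logical clock: one counter per node that orders events without wall-clock time.
class LamportClock {
    private long time = 0;

    // Local event or message send: advance the counter and stamp the event.
    synchronized long tick() {
        return ++time;
    }

    // Message receipt: move past the sender's stamp so the send is ordered before the receive.
    synchronized long receive(long senderTime) {
        time = Math.max(time, senderTime) + 1;
        return time;
    }
}

class LogicalClockDemo {
    public static void main(String[] args) {
        LamportClock nodeA = new LamportClock();
        LamportClock nodeB = new LamportClock();
        long aLocal = nodeA.tick();           // 1
        long aSend = nodeA.tick();            // 2, carried in the message to B
        long bLocal = nodeB.tick();           // 1, concurrent with A's events
        long bReceive = nodeB.receive(aSend); // 3, after the send it depends on
        System.out.println(aLocal + " " + aSend + " " + bLocal + " " + bReceive); // prints 1 2 1 3
    }
}
```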

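(5) An event thread is a totally ordered sequence of event occurrences, where a control thread processes each occurrence in turn. Two occurrences x and y relate in one of three ways:

  • x –> y (x happens before y)
  • y –> x (y happens before x)
  • x || y (x and y are concurrent)

Because an event thread is totally ordered, two occurrences in the same event thread always relate in one of the first two ways; concurrent occurrences belong to different event threads.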

Events in an event thread must be essential to the situation being modeled and independent of any software design. Essential threads can be shared (for example, by time, by the domain, or by the software), while others are not shared because they occur inside the software.
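A minimal sketch of a control thread working through an event thread, assuming the occurrences arrive on a `LinkedBlockingQueue` (the queue plays the role of the safe object). Because only one control thread takes from the queue, occurrences are handled one at a time in arrival order. `EventThreadDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class EventThreadDemo {
    // A single control thread takes each occurrence in turn, so occurrences are handled
    // in the total order of the event thread: if x -> y, x is handled before y.
    static List<String> processInOrder(List<String> occurrences) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // safe object between threads
        List<String> handled = new ArrayList<>(); // written only by the control thread
        int count = occurrences.size();
        Thread control = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    handled.add("handled " + queue.take()); // blocks until the next occurrence
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        control.start();
        for (String occurrence : occurrences) {
            queue.add(occurrence); // occurrences arrive in order
        }
        try {
            control.join(); // join also makes the control thread's results visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(processInOrder(List.of("x", "y"))); // prints [handled x, handled y]
    }
}
```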

References

Parallel Programming: Practical examples of a thread

Here is a simple problem: A boy and a girl toss a ball back and forth to each other. Assume that the boy is one thread (node) and the girl is another thread, and b is data.

Boy = m

Girl = f

Ball = b

  • m has b
    1. m throws b –> f catches b
  • f has b
    1. f throws b –> m catches b
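The two-player toss can be sketched in Java with each player as a thread and a hypothetical `Court` class as the safe object recording who holds b. Under this assumption, a player waits until it holds b, throws it, and wakes the other player; `wait()` and `notifyAll()` are the standard Java monitor methods.

```java
import java.util.ArrayList;
import java.util.List;

// The court is the "safe" object that records which player holds ball b.
class Court {
    private String holder;
    private final List<String> log = new ArrayList<>();

    Court(String firstHolder) {
        holder = firstHolder;
    }

    // Block until `thrower` holds b, then pass b to `catcher`.
    synchronized void throwTo(String thrower, String catcher) throws InterruptedException {
        while (!holder.equals(thrower)) {
            wait(); // not holding b: release the lock and wait for a throw
        }
        log.add(thrower + " throws b -> " + catcher + " catches b");
        holder = catcher;
        notifyAll(); // wake the player who now holds b
    }

    synchronized List<String> log() {
        return new ArrayList<>(log);
    }
}

class BallToss {
    static List<String> play(int throwsEach) {
        Court court = new Court("m"); // m starts with b
        Thread boy = new Thread(() -> player(court, "m", "f", throwsEach));
        Thread girl = new Thread(() -> player(court, "f", "m", throwsEach));
        boy.start();
        girl.start();
        try {
            boy.join();
            girl.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return court.log();
    }

    private static void player(Court court, String me, String other, int times) {
        try {
            for (int i = 0; i < times; i++) {
                court.throwTo(me, other);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        play(2).forEach(System.out::println);
    }
}
```

With only two players there is no choice to make, so the log always alternates m, f, m, f.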

Now assume the ball can be dropped, holding everything else constant.

  • m has b
    1. m throws b –> f catches b
    2. m throws b –> f drops b
      1. f picks up the dropped b
  • f has b
    1. f throws b –> m catches b
    2. f throws b –> m drops b
      1. m picks up the dropped b

 

Suppose you add a third player.

Boy = m

Girl = f

Ball = b

3rd player = x

  • m has b
    1. m throws b –> f catches b
    2. m throws b –> x catches b
  • f has b
    1. f throws b –> m catches b
    2. f throws b –> x catches b
  • x has b
    1. x throws b –> m catches b
    2. x throws b –> f catches b

Now assume the ball can be dropped, holding everything else constant.

  • m has b
    1. m throws b –> f catches b
    2. m throws b –> f drops b
      1. f picks up the dropped b
    3. m throws b –> x catches b
    4. m throws b –> x drops b
      1. x picks up the dropped b
  • f has b
    1. f throws b –> m catches b
    2. f throws b –> m drops b
      1. m picks up the dropped b
    3. f throws b –> x catches b
    4. f throws b –> x drops b
      1. x picks up the dropped b
  • x has b
    1. x throws b –> m catches b
    2. x throws b –> m drops b
      1. m picks up the dropped b
    3. x throws b –> f catches b
    4. x throws b –> f drops b
      1. f picks up the dropped b

Will that change the thread models? What if the throwing pattern is not static; that is, the boy can throw to the girl or to the third player, and so forth? 

In this example, yes: an additional thread gets added, because each player is a thread that can catch or drop the ball. Each player is a thread of its own, transferring data ‘b’ among them; throwing ‘b’ is like locking the data before transferring it, and catching ‘b’ is like unlocking it. After the ball is dropped (perhaps decided at random), the player who now has the ball has to pick it up, which is analogous to analyzing the data when a certain condition is met, such as an account balance below 500. The model changes with the additional player because each player now has to choose who receives the ball next, a choice that did not exist in the first model with two threads. If there is a static toss like

  • f –> m –> x –> f

then the model does not change, because no player has a choice.
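For the dynamic case, here is a sketch under assumed rules: the holder picks any other player at random, and a hypothetical one-in-four chance decides whether the receiver drops b and picks it up. A fixed random seed makes a run repeatable, because only the holder of b ever acts. `DynamicCourt` and `ThreePlayerToss` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Safe object for a game where the thrower picks the next receiver at random.
class DynamicCourt {
    private final List<String> players;
    private final int maxThrows;
    private final Random random;
    private final List<String> log = new ArrayList<>();
    private String holder;
    private int throwsMade = 0;

    DynamicCourt(List<String> players, int maxThrows, long seed) {
        this.players = players;
        this.maxThrows = maxThrows;
        this.random = new Random(seed);
        this.holder = players.get(0);
    }

    // Returns false once the game is over, so the calling player thread can stop.
    synchronized boolean takeTurn(String me) throws InterruptedException {
        while (throwsMade < maxThrows && !holder.equals(me)) {
            wait();
        }
        if (throwsMade >= maxThrows) {
            return false;
        }
        String target; // the choice that the two-player model did not have
        do {
            target = players.get(random.nextInt(players.size()));
        } while (target.equals(me));
        boolean dropped = random.nextInt(4) == 0; // hypothetical 1-in-4 chance of a drop
        log.add(me + " throws b -> " + target + (dropped ? " drops b, picks it up" : " catches b"));
        holder = target; // whether caught or picked up, the target now holds b
        throwsMade++;
        notifyAll();
        return true;
    }

    synchronized List<String> log() {
        return new ArrayList<>(log);
    }
}

class ThreePlayerToss {
    static List<String> play(int maxThrows, long seed) {
        DynamicCourt court = new DynamicCourt(List.of("m", "f", "x"), maxThrows, seed);
        List<Thread> threads = new ArrayList<>();
        for (String name : List.of("m", "f", "x")) {
            Thread player = new Thread(() -> {
                try {
                    while (court.takeTurn(name)) {
                        // each call is one throw; false means the game is over
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(player);
            player.start();
        }
        for (Thread player : threads) {
            try {
                player.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return court.log();
    }

    public static void main(String[] args) {
        play(6, 42L).forEach(System.out::println);
    }
}
```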

Parallel Programming: Deadlocks

Deadlock can occur when a thread acquires an additional resource while holding one or more other resources, especially when this creates a circular wait (Sandén, 2011).

Sandén (2011) stated that to prevent deadlocks, resources need to be controlled. One should draw a wait-chain diagram to make sure the design prevents deadlock, especially when there is a mix of transactions occurring. It also helps to know how many threads must request resources simultaneously before a deadlock can occur, which matters most when multiple threads call on shared resources.

Thus, we should manage the resources so that no circularity arises, limit the number of entities to just below the threshold that causes a deadlock, or eliminate waiting while holding a resource.
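The second strategy, limiting the number of entities to just below the deadlock threshold, can be illustrated with the classic dining-philosophers setup (a swapped-in example, not one from the text). With n diners and n forks, a `java.util.concurrent.Semaphore` with n - 1 permits keeps at most n - 1 diners reaching for forks, so at least one of them can always get both. `DiningTable` is a hypothetical name.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class DiningTable {
    // n diners sit in a circle with one fork between neighbors, and each diner needs both
    // adjacent forks. If all n grabbed their left fork at once, the wait chain would be circular.
    static int dine(int diners, int mealsEach) {
        if (diners < 2) {
            throw new IllegalArgumentException("need at least two diners");
        }
        ReentrantLock[] forks = new ReentrantLock[diners];
        for (int i = 0; i < diners; i++) {
            forks[i] = new ReentrantLock();
        }
        // Admit at most n - 1 diners at a time, one below the deadlock threshold:
        // with n forks and n - 1 diners, some diner can always get both forks.
        Semaphore seats = new Semaphore(diners - 1);
        AtomicInteger meals = new AtomicInteger();
        Thread[] threads = new Thread[diners];
        for (int p = 0; p < diners; p++) {
            ReentrantLock left = forks[p];
            ReentrantLock right = forks[(p + 1) % diners];
            threads[p] = new Thread(() -> {
                for (int m = 0; m < mealsEach; m++) {
                    seats.acquireUninterruptibly();
                    left.lock();
                    right.lock();
                    try {
                        meals.incrementAndGet(); // "eat" while holding both forks
                    } finally {
                        right.unlock();
                        left.unlock();
                        seats.release();
                    }
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return meals.get();
    }

    public static void main(String[] args) {
        System.out.println(dine(5, 1000)); // prints 5000
    }
}
```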

There are many real-life examples, such as the one in Sandén (2011) where each of four cars is halfway into an intersection. The following is a suggested real-life deadlock scenario:

There is one measuring cup (1/2 cup), and there is no other way to measure this amount. Jack and Jill are baking a cake at the same time. They have all the ingredients they need: eggs, cake mix, oil, and milk. However, they both need the only measuring cup to measure the oil and milk, and they reach for it at the same time. This is a deadlock.

To resolve this scenario, Jack can pour the eggs and cake mix while Jill measures and pours the oil and milk. When Jill is done, Jack measures and pours the oil and milk while Jill pours the cake mix and eggs. The same could be done with up to four people, where each person is a thread and the measuring cup is the resource.
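Because each baker needs only one resource at a time, a single lock on the cup is enough to serialize its use. A minimal sketch, assuming the cup is modeled as a `ReentrantLock` and each baker as a thread; `Kitchen` and the step strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class Kitchen {
    private final ReentrantLock measuringCup = new ReentrantLock(); // the one shared resource
    private final List<String> steps = Collections.synchronizedList(new ArrayList<>());

    // A baker (thread) either uses the cup first or does the cup-free steps first.
    void bake(String baker, boolean cupFirst) {
        if (cupFirst) {
            measure(baker);
            steps.add(baker + " pours cake mix and eggs");
        } else {
            steps.add(baker + " pours cake mix and eggs");
            measure(baker);
        }
    }

    private void measure(String baker) {
        measuringCup.lock(); // waits here if the other baker is holding the cup
        try {
            steps.add(baker + " measures oil and milk");
        } finally {
            measuringCup.unlock(); // hand the cup back so the other baker can proceed
        }
    }

    static List<String> bakeTogether() {
        Kitchen kitchen = new Kitchen();
        Thread jill = new Thread(() -> kitchen.bake("Jill", true));
        Thread jack = new Thread(() -> kitchen.bake("Jack", false));
        jill.start();
        jack.start();
        try {
            jill.join();
            jack.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(kitchen.steps);
    }

    public static void main(String[] args) {
        bakeTogether().forEach(System.out::println); // interleaving can vary between runs
    }
}
```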

Once we introduce a fifth person or more, the wait chain produces unnecessarily long waits before a thread can begin to use the resource.

Reference

Parallel Programming: Resource Guard

A quick note:

“In the resource-guard-thread pattern, resource-guard threads represent resources. Such threads are arranged in a virtual assembly line and connected by queues implemented as safe objects” (Sandén, 2011).

By the definition above, the search and insertion threads have exclusive access to the data they need to perform subdivision and legalization through an insertion point rather than a queue; thus, this is a resource-user-thread pattern.

“As long as each resource user has exclusive access to no more than one resource at a time, the designer can usually choose between a solution with resource-guard threads and one with resource threads. In this sense, the two patterns are dual.” (Sandén, 2011)

A dual solution would look like this: the search and insertion threads would return an index to a safe object, which would house all the data. The data can then be queued from there to proceed with step two, which is subdivision and legalization.
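A minimal sketch of the resource-guard-thread side of this duality, assuming integer points stand in for the real insertion data: one guard thread owns the data and receives requests through a `LinkedBlockingQueue`, which serves as the safe queue. Step two (subdivision and legalization) is only a placeholder here, and `InsertionGuard` and `ResourceGuardDemo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A resource-guard thread owns the data; user threads reach it only through a queue.
class InsertionGuard {
    private final BlockingQueue<Integer> requests = new LinkedBlockingQueue<>(); // safe object
    private final List<Integer> points = new ArrayList<>(); // touched only by the guard thread
    private final Thread guard;

    InsertionGuard(int expectedRequests) {
        guard = new Thread(() -> {
            try {
                for (int i = 0; i < expectedRequests; i++) {
                    int point = requests.take();
                    points.add(point); // placeholder for step two: subdivision and legalization
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    void start() {
        guard.start();
    }

    // Called by the search and insertion threads instead of touching the data directly.
    void submit(int point) {
        requests.add(point);
    }

    List<Integer> awaitResult() {
        try {
            guard.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(points);
    }
}

class ResourceGuardDemo {
    static List<Integer> run(int userThreads, int pointsPerUser) {
        InsertionGuard guard = new InsertionGuard(userThreads * pointsPerUser);
        guard.start();
        for (int u = 0; u < userThreads; u++) {
            int base = u * pointsPerUser;
            new Thread(() -> {
                for (int i = 0; i < pointsPerUser; i++) {
                    guard.submit(base + i);
                }
            }).start();
        }
        return guard.awaitResult(); // returns once the guard has handled every request
    }

    public static void main(String[] args) {
        System.out.println(run(3, 4).size()); // prints 12
    }
}
```

Since only the guard thread touches `points`, the data itself needs no lock; all synchronization lives in the queue.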

Reference

Parallel Programming: Synchronized Objects

Sanden (2011) shows how to use synchronized objects (concurrency in Java), which are “safe” objects whose critical methods are synchronized and therefore protected by locks. In Java we can create threads by (1) extending class Thread or (2) implementing the interface Runnable. In both cases the code of the thread is defined in a method, public void run(), and the thread completes its execution when it reaches the end of that method (which is essentially a subroutine in FORTRAN). Class Thread provides the constructors public Thread(), used when extending Thread, and public Thread(Runnable runObject), used to wrap a Runnable, along with methods such as public void start(), which launches the new thread.

Additional Examples:

MapReduce

According to Hortonworks (2013), at a high level MapReduce’s process is: Input -> Map -> Shuffle and Sort -> Reduce -> Output.

Tasks: Mappers process the records of a data set stored in a distributed system and emit the wanted data as key/value pairs, each under a certain key. Reducers know what the keys are, and they take all the values that mappers on different nodes of the cluster (per the distributed system) stored under the same key, reducing the data to what is relevant (Hortonworks, 2013). Different reducers can work on different keys.

Example: A great example of a MapReduce request is to look at all CTU graduate students and sum their current outstanding school loans per degree level. Thus, the final output from our example would be:

  • Doctoral Students Current Outstanding School Loan Amount
  • Masters Students Current Outstanding School Loan Amount

Now let’s assume that this ran in Hadoop, which can do MapReduce. Also, let’s assume that I could use 50 nodes (threads) to process this request. The data thrown out in the mapping phase would be the undergraduate students, because they do not match the search criteria. The data kept will be the records associated with Doctoral and Masters students. So, during the mapping phase, the threads will assign Doctoral students to one key and Masters students to another key. Every node (thread) uses the same keys for the respective students, so the keys are the same on all nodes (threads). The reducer uses these keys, and the values gathered under them, to sum all of the current outstanding school loan amounts within the correct group. Thus, once all nodes (threads) finish the reduce phase, we will have our two amounts:
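The same request can be sketched on a single machine with Java parallel streams, which split the work across threads much as Hadoop splits it across nodes. This is a stand-in for Hadoop, not its API; the `Student` record, the level strings, and the sample amounts are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical student record: degree level and current outstanding loan in dollars.
class Student {
    final String level; // "Doctoral", "Masters", or "Undergraduate"
    final long loan;

    Student(String level, long loan) {
        this.level = level;
        this.loan = loan;
    }
}

class LoanMapReduce {
    static Map<String, Long> outstandingLoansByLevel(List<Student> students) {
        return students.parallelStream() // work is split across threads, like nodes
                // Map phase: throw out undergraduates; each remaining record is keyed by level
                .filter((Student s) -> !s.level.equals("Undergraduate"))
                // Shuffle and reduce: gather the values with the same key and sum them
                .collect(Collectors.groupingBy((Student s) -> s.level,
                        Collectors.summingLong((Student s) -> s.loan)));
    }

    public static void main(String[] args) {
        List<Student> students = List.of(
                new Student("Doctoral", 40_000),
                new Student("Masters", 25_000),
                new Student("Undergraduate", 10_000),
                new Student("Doctoral", 15_000));
        System.out.println(outstandingLoansByLevel(students).get("Doctoral")); // prints 55000
    }
}
```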

  • Doctoral Students Current Outstanding School Loan
  • Masters Students Current Outstanding School Loan

Complexity could be added if we only wanted to look at graduate students who are active or non-active service members. The query could also be broken down by gender, profession, or diversity signifiers, or even mapped to the students’ current industry.

Resources

Parallel Programming: Threads

A thread is a unit (or sequence of code) that can be executed by a scheduler, essentially a task (Sanden, 2011). A single thread (task) has one program counter and one sequence of code. Multi-threading occurs when several threads, each with its own program counter, share common code. Thus, a multi-threaded program has many sequences of execution that can be assigned to different processors to run in parallel (simultaneously) and speed up a task. Another way to multi-thread is to have the threads execute the same code on different processors with different inputs. If data is shared between the threads, it must be kept in a “safe” object that uses synchronization, so that only one thread at a time can access the data stored in it. It is through these “safe” objects that one thread can communicate with another.

An additional example that may help illustrate the material: 

Suppose we would like to know the average of the summed credits and the average of the summed debits made in personal checking accounts at SunTrust Bank in December. Using MapReduce techniques with multiple threads, we can go through the bank’s entire database to find the accounts and their timestamped transactions, map out the relevant data, and reduce it to the two numbers our query returns.
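A sketch of this query with parallel streams, reading “the average of the sum” as: sum each account’s December credits (and debits), then average those sums across accounts. That reading, the `Txn` record, and amounts in cents are assumptions; the filter keeps December of any year.

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical transaction record for a personal checking account.
class Txn {
    final String account;
    final LocalDate date;
    final String type; // "credit" or "debit"
    final long amount; // in cents

    Txn(String account, LocalDate date, String type, long amount) {
        this.account = account;
        this.date = date;
        this.type = type;
        this.amount = amount;
    }
}

class DecemberAverages {
    // For each type, sum every account's December transactions, then average those sums.
    static Map<String, Double> averageDecemberTotals(List<Txn> txns) {
        Map<String, Map<String, Long>> totals = txns.parallelStream()
                .filter((Txn t) -> t.date.getMonth() == Month.DECEMBER) // map: keep December only
                .collect(Collectors.groupingBy((Txn t) -> t.type,       // key 1: credit or debit
                        Collectors.groupingBy((Txn t) -> t.account,     // key 2: account
                                Collectors.summingLong((Txn t) -> t.amount)))); // reduce: sum
        Map<String, Double> averages = new TreeMap<>();
        totals.forEach((type, perAccount) -> averages.put(type,
                perAccount.values().stream().mapToLong(Long::longValue).average().orElse(0.0)));
        return averages;
    }

    public static void main(String[] args) {
        List<Txn> txns = List.of(
                new Txn("A", LocalDate.of(2023, 12, 3), "credit", 10_000),
                new Txn("A", LocalDate.of(2023, 12, 20), "credit", 5_000),
                new Txn("B", LocalDate.of(2023, 12, 5), "credit", 3_000),
                new Txn("A", LocalDate.of(2023, 12, 7), "debit", 2_000),
                new Txn("B", LocalDate.of(2023, 11, 30), "credit", 99_999)); // November: ignored
        System.out.println(averageDecemberTotals(txns)); // prints {credit=9000.0, debit=2000.0}
    }
}
```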

Resources: