Parallel Programming: Compelling Topics

(0) A thread is a unit (or sequence of code) that can be executed by a scheduler, essentially a task (Sandén, 2011). A single thread (task) has one program counter and one sequence of code. Multi-threading occurs when multiple threads share common code: each thread has its own program counter, so several sequences of execution can be assigned to different processors and run in parallel (simultaneously) to speed up a task. Alternatively, the threads can execute the same code on different processors with different inputs. If data is shared between threads, there is a need for a “safe” object with synchronization, where only one thread at a time can access the data stored in the “safe” object. It is through these “safe” objects that one thread communicates with another.

(1) Sandén (2011) shows how to use synchronized objects (concurrency in Java), which are “safe” objects whose critical synchronized methods are protected by locks.  In Java we can create threads in two ways: (1) extend the class Thread or (2) implement the interface Runnable.  The latter defines the code of a thread in a method void run ( ), and the thread completes its execution when it reaches the end of that method (much like a subroutine in FORTRAN).  The former requires the constructors public Thread ( ) and public Thread (Runnable runObject), along with methods like public void start ( ).
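As an illustration, here is a minimal sketch of both approaches; the class names CounterThread, CounterTask, and ThreadDemo are mine, not from Sandén (2011):

// (1) Extending class Thread
class CounterThread extends Thread {
    public void run() {                                 // the thread's code
        for (int i = 0; i < 3; i++) {
            System.out.println(getName() + ": " + i);
        }
    }                                                   // thread ends when run() returns
}

// (2) Implementing the interface Runnable
class CounterTask implements Runnable {
    public void run() {
        for (int i = 0; i < 3; i++) {
            System.out.println(Thread.currentThread().getName() + ": " + i);
        }
    }
}

public class ThreadDemo {
    public static void main(String[] args) {
        new CounterThread().start();                    // uses constructor Thread()
        new Thread(new CounterTask()).start();          // uses constructor Thread(Runnable)
    }
}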

(2) Shared objects that force mutual exclusion on the threads that call them are “safe objects”.  The mutual exclusion on threads/operations can be relaxed when threads don’t change any data, for example a pure read of the data in the “safe object” (Sandén, 2011).

(3) Deadlock occurs when a thread acquires an additional resource while holding one or more other resources, especially when this creates a circularity. To prevent deadlocks, resources need to be controlled.  One should draw a wait chain diagram to make sure the design helps prevent a deadlock, especially when there is a mix of transactions occurring.  A good example of a deadlock is a stalemate in Chess or, as Stacy said, a circular firing squad.
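To make the circularity concrete, here is a minimal Java sketch of a deadlock, assuming two threads that take the same two locks in opposite order (all names are illustrative; Java 8+ for the lambdas):

public class DeadlockDemo {
    private static final Object resourceA = new Object();
    private static final Object resourceB = new Object();

    public static void main(String[] args) {
        new Thread(() -> {
            synchronized (resourceA) {              // holds A...
                pause();                            // give the other thread time to take B
                synchronized (resourceB) {          // ...and waits for B
                    System.out.println("thread 1 got both");
                }
            }
        }).start();

        new Thread(() -> {
            synchronized (resourceB) {              // holds B...
                pause();
                synchronized (resourceA) {          // ...and waits for A: circular wait
                    System.out.println("thread 2 got both");
                }
            }
        }).start();
    }

    private static void pause() {
        try { Thread.sleep(100); } catch (InterruptedException e) { }
    }
}

With this timing, each thread ends up holding one resource while waiting for the other, so neither can proceed; acquiring the locks in a consistent order removes the circularity.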

(4) In a distributed system nodes can talk (cooperate) to each other and coordinate their systems.  However, the different nodes execute concurrently, there is no global clock on which all nodes run, and some of these nodes can fail independently.  Since nodes talk to each other, we must study them as they interact with each other.  Hence the need for logical clocks (because we don’t have global clocks), in which distances in time are lost. In logical clocks, all nodes agree on a partial order of events (where one event happens before another).  They describe only the order of events, not their timing.  If nodes are completely disjoint in a logical clock, then a node can fail independently. (This was my favorite subject because I can now visualize more of what I was reading and the complex nature of nodes.)

(5) An event thread is a totally ordered sequence of event occurrences, where a control thread processes each occurrence in turn.  For two occurrences x and y in event threads, one of three relations holds:

  • x –> y (x happens before y)
  • y –> x (y happens before x)
  • x || y (x and y are unordered, i.e. concurrent)

Events in a thread must be essential to the situation they are being used for and independent of any software design.  Essential threads can be shared, for instance by time, by domain, or by software, while others are not shared, as they occur inside the software.

References

Sandén, B. I. (2011). Design of multithreaded software: The entity-life modeling approach. Hoboken, NJ: Wiley.

Parallel Programming: State Diagram Example

(Figure: state diagram; image not reproduced.)

  • From state S0, enter the superstate S1 through event e1, which yields action a1.
  • When entering superstate S1, we must go through state S12, with action a7 on entry and action a3 on exit.
    • If action a3 yields an event e9, which yields action a9, we enter state S13, with action a6 on entry and action a12 on exit.
      • If action a12 yields an event e5, we get action a5, hit the superstate S1, and begin again at state S2.
      • If action a12 yields an event e9, we take action a1 and enter state S112 (under the S11 superstate) with entry action a11.
        • Event e2 acts on S112 to give action a2, which enters the superstate S11.
          • Entering the superstate through state S112, we get an exit criterion of action a14 and we end.
          • If, exiting state S112, we take event e1 and action a1, we are sent back to state S12 to start again.
          • If, exiting state S112, we take event e3 and action a3, we enter state S1; follow 1.a.
    • If action a3 in state S12 yields event e4 and action a4, we enter the superstate S11. Entering superstate S11 this way, we enter state S111 with entry action a8.
      • We then take event e9 and action a1 to get to state S112. If this happens, follow 1.a.i.2.

Parallel Processing: Ada Tasking and State Modeling

Sample Code

1  protected Queue is
2          procedure Enqueue (elt : in Q_Element);     
3                                            -- Insert elt @ the tail end of the queue
4          function First_in_line return Q_Element;
5                                            -- Return the element at the head of the
6                                            -- queue, or no_elt.
7          procedure Dequeue;                  
8                                            -- If the queue is not empty, remove the
9                                            -- element at its head
10 end Queue;
11
12 task type Worker;
13
14 task body Worker is
15   elt : Q_Element;                        -- Element retrieved from queue
16   begin
17      while true loop
18           elt := Queue.First_in_line;     -- Get element at head of queue
19           if elt = no_elt then            -- Let the task loop until there
20              delay 0.1;                   -- is something in the queue
21           else
22               Process (elt);              -- Process elt. This takes some time
23               Queue.Dequeue;              -- Remove element from queue
24           end if;
25     end loop;
26 end Worker;

Comparing and Contrasting Ada’s Protected Function to Java’s (non)synchronized Methods

Java’s safe objects are synchronized objects, whose methods are either “synchronized” or “non-synchronized”.  The non-synchronized methods of a safe object should, as a rule, be read-only.  In synchronized methods the shared data can be written as well as read, but such methods usually begin with a wait loop on a certain condition (i.e. while (condition) {wait ( );}).  This wait loop forces the thread to wait until the condition becomes true, in order to prevent multiple threads from editing the same safe object at the same time.  Wait loops located elsewhere in a Java synchronized method are uncommon.
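A minimal sketch of such a safe object, with the wait loop at the start of each synchronized method (the class and method names are mine):

class SafeSlot {
    private Object value;                      // the shared data
    private boolean full = false;

    synchronized void put(Object v) throws InterruptedException {
        while (full) { wait(); }               // wait loop guarding the write
        value = v;
        full = true;
        notifyAll();                           // wake threads waiting in get()
    }

    synchronized Object get() throws InterruptedException {
        while (!full) { wait(); }              // wait loop guarding the read
        full = false;
        notifyAll();                           // wake threads waiting in put()
        return value;
    }
}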

Safe objects in Ada are protected objects, whose operations are “protected procedures”, “protected functions”, or entries.  Unlike the non-synchronized method in Java (which is read-only only by convention), the protected function in Ada is read-only by language rule.  Where Java’s synchronized version has a wait call that stalls a thread, Ada’s protected entry (i.e. entry x ( ) when condition is …) is entered only when its barrier condition is true; thus you can have multiple entries that manipulate the data in multiple ways, similar to an if-else construct.  For example, one entry x ( ) when condition is true could be followed by another entry x ( ) when condition is false, and this can be expanded to n different conditions.  With these entries, the barrier is always tested first, in contrast to wait.  We can also requeue on another entry (a requeue is not a subprogram call, so the thread doesn’t return to the point after the requeue call), but requeues located elsewhere in the program are uncommon.

Describe what worker does

A Worker gets the element (elt) at the head of the queue, looping until there is an element in the queue that it can process.  The element is stored in the local variable elt.  If there is no element, the task delays for 0.1 seconds and keeps looping.  Once the worker has obtained an element, it processes the element and then removes it from the queue.

Adapting Queue and Worker so that multiple instances of Worker can process data elements

In order for one worker to process each element, the first thing we must do is change “task body Worker is” to “protected body Worker is” on line 14.  Then change “elt : Q_Element;” on line 15 to procedure get (elt : Q_Element), in order to get the element from the queue.

Once there is an element at the head of the queue, the worker must first dequeue it in order to process it; this protects the data and lets another worker take the next element at the head.  Thus, I propose swapping lines 22 and 23.  If this isn’t preferred, we can create a new array called work_in_progress with get, put, and remove procedures; the put would go before line 22, followed by my proposed sequence.  This lets a worker say: I got this element and I will work on it; if all is successful we don’t re-add the element to the queue, we simply delete it from the work_in_progress array, and we don’t hold up other workers from working on other elements.  However, if the worker fails to process the element, it is returned to the queue and added to work_in_progress again for another worker to process.  To avoid an endless loop, if an element cannot be processed by three different workers, we can create another array to store non-compliant elements in and call Dequeue on the main queue.  We can simply swap lines 22 and 23, and do nothing more, if and only if processing an element can never fail.

In Queue, line 2 must become an entry, for instance “entry Enqueue (elt : in Q_Element) when count >= 0 is … end Enqueue”, so that callers wait on the barrier before adding an element.  Using entries in Queue eliminates the need for the loop that polls for an element at the head of the queue in lines 19, 20, & 21; our conditions are checked first rather than later in the code. Similarly, we can change line 7 to “entry Dequeue (elt : in Q_Element) when count > 0 is … end Dequeue”, so that callers wait until there is actually an element to remove.  This is more of an efficiency issue, and it allows the worker to say: I got this element and it’s OK to delete it from the queue.  With all these changes, we must make sure that on line 18 we are pulling an element from the queue.

The loop where instances of workers wait is crude

Lines 18 and 23 can be combined into a Queue.Pop (elt) above the if-statement on line 18, to avoid the crude loop in which worker threads wait for something to appear in the queue.  Pop involves no “busy waiting”.  But we must create in Queue a Pop operation, best written as an entry with a barrier, that returns the element at the head of the queue and removes it.
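To show the effect (sketched in Java rather than Ada, since the guarded-object pattern is the same), here is a minimal queue whose pop blocks until an element is available, replacing the delay-and-retry loop; the names GuardedQueue, enqueue, and pop are mine:

import java.util.ArrayDeque;
import java.util.Deque;

class GuardedQueue<E> {
    private final Deque<E> items = new ArrayDeque<>();

    synchronized void enqueue(E e) {
        items.addLast(e);
        notifyAll();                            // wake workers blocked in pop()
    }

    synchronized E pop() throws InterruptedException {
        while (items.isEmpty()) { wait(); }     // acts like the barrier "when count > 0"
        return items.removeFirst();             // return the head element and remove it
    }
}

A worker then simply calls pop() and processes the result; there is no busy waiting because the thread sleeps inside wait() until enqueue() signals it.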

 

Parallel Programming: Msynch 

Pseudo Code

class Msynch {
    int replies;
    int currentState = 1;

    synchronized void acquire ( ) {
        // Called by thread wanting access to a critical section
        while (currentState != 1) {wait ( );}
        replies = 0; currentState = 2;
        //
        // (Here, 5 messages are sent)
        //
        while (replies < 5) {wait ( );} // Await 5 replies
        currentState = 3;
    }

    synchronized void replyReceived ( ) {
        // Called by communication thread when reply is received
        replies++;
        notifyAll ( );
    }

    synchronized void release ( ) {
        // Called by a thread releasing the critical section
        currentState = 1;
        notifyAll ( );
    }
}

 

class Msynch1 {
    int replies;
    int currentState = 1;

    synchronized void acquire ( ) {
        // Called by thread wanting access to a critical section
        while (currentState != 1) {yield ( );}
        replies = 0; currentState = 2;
        //
        // (Here, 5 messages are sent)
        //
        if (replies < 5) {wait ( );} // Await 5 replies
        currentState = 3;
    }

    synchronized void replyReceived ( ) {
        // Called by communication thread when reply is received
        replies++;
    }

    synchronized void release ( ) {
        // Called by a thread releasing the critical section
        currentState = 1;
        notifyAll ( );
    }
}

From the two sets of code above, a line-by-line comparison identifies three differences, which are three synchronization-related errors in Msynch1.  The reasons why:

  1. notifyAll ( ); is missing from replyReceived ( ). Without this line of code we cannot unblock any other thread, so the thread waiting in acquire ( ) is never woken to recheck its condition. The missing call should activate all threads in the wait set so they can compete with each other for the lock based on priorities.
  2. {yield ( );} won’t work because, unlike {wait ( );}, it does not block the thread or release the object’s lock, so no other thread can enter a synchronized method to change currentState. The wait ( ) call, which releases the thread’s lock, is needed: when a thread calls wait ( ) it unlocks the object, and after returning from the wait ( ) call it re-locks the object.
  3. if won’t work because the wait ( ) call should sit in a “wait loop”: while (condition) {wait ( );}, as shown in Msynch. Without it, the thread cannot retest the condition after it returns from the wait ( ) call. With the if-statement, the condition is only tested once, unlike with the while-statement.

An additional fourth error was identified after reviewing the class notes in Java threads shortened_1.

  1. Best practices are not followed in either Msynch or Msynch1: the wait loop should actually reside in a try block, as follows: while (condition) {try {wait ( );} catch (InterruptedException e) {…}}.
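Putting this fourth point together with the first three, a corrected core of Msynch1’s wait logic might look like this sketch (only the wait-related parts are shown; the class and method names are mine):

class MsynchFixed {
    private int replies;

    synchronized void awaitReplies() {
        while (replies < 5) {                       // retest the condition after each wakeup
            try {
                wait();                             // releases the lock while waiting
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // minimal, illustrative handling
            }
        }
    }

    synchronized void replyReceived() {
        replies++;
        notifyAll();                                // wake the thread waiting above
    }
}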

When the first thread (appThread) in Msynch calls acquire ( ) the first time, currentState = 1, so it skips the wait loop.  replies is reset to zero and currentState is set to 2.  This thread then sends out 5 messages to other threads (which will call replyReceived ( )).  As long as replies is less than five, it waits and currentState remains equal to two.  Once it has received 5 replies from any five threads, it leaves the wait loop and sets currentState to three.

While the initial thread (appThread) running in acquire ( ) waits for at least 5 messages, the other threads (commThreads) deliver them by calling replyReceived ( ).  This method increments the number of replies by one each time a thread calls it; because it is synchronized, each commThread locks the object, increments replies, and unlocks it so that another thread calling replyReceived ( ) can increment it in turn, and notifyAll ( ) wakes the appThread to retest its condition.  Thus, once any five threads have successfully run replyReceived ( ), the appThread sets currentState = 3 and releases the lock.

Words to define

Semaphores: In a program, the code between a lock and an unlock instruction is known as a critical section.  That critical section can be accessed by only one thread at a time.  If a thread sees that a semaphore is open, the thread closes it in one uninterruptible, atomic operation, and if that succeeds the thread can confidently proceed into the critical section.  When the thread completes its task in the critical section, it reopens the semaphore. These changes of state in the semaphore are important, because if a thread sees that the semaphore is closed, that thread stalls.  This ensures that only one thread at a time can work in the critical section of the code (Oracle, 2015).
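A minimal sketch of this idea using java.util.concurrent.Semaphore (the class CriticalSection and method enter are mine):

import java.util.concurrent.Semaphore;

public class CriticalSection {
    private static final Semaphore gate = new Semaphore(1);   // one permit = open

    static void enter() throws InterruptedException {
        gate.acquire();                  // close the semaphore atomically, or stall
        try {
            // ... critical section: only one thread at a time is here ...
        } finally {
            gate.release();              // reopen the semaphore on the way out
        }
    }
}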

Test-and-set (TSET/TAND/TSL): A special hardware instruction on which semaphores are built: it reads a memory location and conditionally writes it in a single step (Sandén, 2011), which eventually allows a thread to enter the critical section of the code.   Essentially, a semaphore is open if the bit is 1 and closed if the bit is 0 (Oracle, 2015).  If the bit is 1, the test-and-set instruction attempts to close the semaphore by setting the bit to 0.  TSET is executed atomically and cannot be interrupted.
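A sketch of the same mechanism using Java’s AtomicBoolean, whose compareAndSet plays the role of the atomic test-and-set (here a closed flag of true means the semaphore is closed, inverting the bit convention above; the class SpinLock is mine):

import java.util.concurrent.atomic.AtomicBoolean;

class SpinLock {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    void lock() {
        // Atomically test "open?" and close it in one uninterruptible step.
        while (!closed.compareAndSet(false, true)) {
            // semaphore was closed; spin and retry
        }
    }

    void unlock() {
        closed.set(false);               // reopen the semaphore
    }
}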

Preemption: Multi-threading can run with or without assigned thread priorities.  If priorities are set among the threads, a thread can be suspended (preempted) by a higher-priority thread at any time.  This becomes important when the ratio of threads to processor cores is high (like 10 threads on a uniprocessor).  Preemption becomes unnecessary when the number of threads is no greater than the number of cores available to run them (Sandén, 2011).

References

Sandén, B. I. (2011). Design of multithreaded software: The entity-life modeling approach. Hoboken, NJ: Wiley.

Parallel Programming: Practical examples of a thread

Here is a simple problem: A boy and a girl toss a ball back and forth to each other. Assume that the boy is one thread (node) and the girl is another thread, and b is data.

Boy = m

Girl = f

Ball = b

  • m has b
    1. m throws b –> f catches b
  • f has b
    1. f throws b –> m catches b

Assuming we could drop the ball, and holding everything else constant.

  • m has b
    1. m throws b –> f catches b
    2. m throws b –> f drops b
      1. f picks up the dropped b
  • f has b
    1. f throws b –> m catches b
    2. f throws b –> m drops b
      1. m picks up the dropped b

 

Suppose you add a third player.

Boy = m

Girl = f

Ball = b

3rd player = x

  • m has b
    1. m throws b –> f catches b
    2. m throws b –> x catches b
  • f has b
    1. f throws b –> m catches b
    2. f throws b –> x catches b
  • x has b
    1. x throws b –> m catches b
    2. x throws b –> f catches b

Assuming we could drop the ball, and holding everything else constant.

  • m has b
    1. m throws b –> f catches b
    2. m throws b –> f drops b
      1. f picks up the dropped b
    3. m throws b –> x catches b
    4. m throws b –> x drops b
      1. x picks up the dropped b
  • f has b
    1. f throws b –> m catches b
    2. f throws b –> m drops b
      1. m picks up the dropped b
    3. f throws b –> x catches b
    4. f throws b –> x drops b
      1. x picks up the dropped b
  • x has b
    1. x throws b –> m catches b
    2. x throws b –> m drops b
      1. m picks up the dropped b
    3. x throws b –> f catches b
    4. x throws b –> f drops b
      1. f picks up the dropped b

Will that change the thread models? What if the throwing pattern is not static; that is, the boy can throw to the girl or to the third player, and so forth? 

In this example: Yes, an additional thread gets added, because each player is a thread that can catch or drop a ball.  Each player is a thread of its own; the players transfer the data ‘b’ among themselves, where throwing ‘b’ is locking the data before transferring it and catching ‘b’ is unlocking it.  After the ball is dropped (perhaps decided randomly), the player with the ball has to pick it up, which is equivalent to analyzing the data when a certain condition is met, like account balance < 500 or else.  The model changes with the additional player because each person now has a choice to make about which person should receive the ball next, a choice not present in the first model with only two threads.  If there exists a static toss like

  • f –> m –> x –> f

Then the model doesn’t change, because there is no choice now.
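A minimal Java sketch of the two-player version, where the ball is the safe object and each player is a thread (all names are mine; Java 8+ for the lambdas):

class Ball {
    private String holder;

    Ball(String first) { holder = first; }

    synchronized void throwTo(String from, String to) throws InterruptedException {
        while (!holder.equals(from)) { wait(); }   // wait until we hold the ball
        holder = to;                               // the throw hands the data over
        System.out.println(from + " throws b -> " + to + " catches b");
        notifyAll();                               // wake the catching thread
    }
}

public class CatchDemo {
    public static void main(String[] args) {
        Ball b = new Ball("m");
        new Thread(() -> toss(b, "m", "f")).start();   // boy
        new Thread(() -> toss(b, "f", "m")).start();   // girl
    }

    static void toss(Ball b, String me, String other) {
        try {
            for (int i = 0; i < 3; i++) { b.throwTo(me, other); }
        } catch (InterruptedException e) { }
    }
}

Adding the third player x means a third thread and, if the pattern is not static, a choice in each player’s code about who receives the ball next.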

Parallel Programming: Logical Clocks

In a distributed system nodes can talk (cooperate) to each other and coordinate their systems.  However, the different nodes execute concurrently, there is no global clock on which all nodes run, and some of these nodes can fail independently (Sandén, 2011).  Since nodes talk to each other, we must study them as they interact with each other.  Hence the need for logical clocks (because we don’t have global clocks), in which distances in time are lost (Sandén, 2011). In logical clocks, all nodes agree on a partial order of events (where one event happens before another).  They describe only the order of events, not their timing.  If nodes are completely disjoint in a logical clock, then a node can fail independently. This is one way to visualize the complex nature of nodes.

The following is an example of a logical clock:

(Figure: logical clock example; image not reproduced.)
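As a complement, a minimal Java sketch of a Lamport logical clock shows the mechanics (the class and method names are mine): each node keeps a counter that is merged on message receipt, so only the order of events is preserved, not distances in time.

class LamportClock {
    private long time = 0;

    synchronized long tick() {                   // a local event
        return ++time;
    }

    synchronized long send() {                   // timestamp to attach to an outgoing message
        return ++time;
    }

    synchronized long receive(long msgTime) {
        time = Math.max(time, msgTime) + 1;      // merge the sender's clock with ours
        return time;
    }
}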

Reference

Sandén, B. I. (2011). Design of multithreaded software: The entity-life modeling approach. Hoboken, NJ: Wiley.

Parallel Programming: Locks

A lock for a node could be, for instance, for the latest service request. Nodes in a group have to agree on which one of them holds the lock at any one moment in time, which can be seen on a vector graph if we note which one holds the lock. A node can release and request a lock.

Mutual exclusion algorithms can have a centralized coordinator node that handles the requests for the lock, which means that if that node fails, so does the program (Sandén, 2011). Mutual exclusion algorithms can also allow contention-based exclusion, where nodes compete for the lock equally and a queue is created for pending requests.  Finally, in controlled exclusion a logical token visits each node at a regulated interval of time, lending it the lock.

Lamport’s clock can help order the contention-based scenario, where every node tries to get the lock and it can only be had through a queue (Sandén, 2011). The queue orders the requests by timestamp. A node earns the lock when it has all the reply messages it needs to run its task and its request is at the top of the queue.

Sandén (2011) states that a multicast goes out to all nodes when the lock has been released, and that abolishing this message can optimize the process: the node holding the lock instead takes the request from the next node in the queue and postpones its reply until it is done with the lock.

Reference

Sandén, B. I. (2011). Design of multithreaded software: The entity-life modeling approach. Hoboken, NJ: Wiley.


Parallel Programming: Deadlocks

Deadlock occurs when a thread acquires an additional resource while holding one or more other resources, especially when this creates a circularity (Sandén, 2011).

Sandén (2011) states that to prevent deadlocks, resources need to be controlled.  One should draw a wait chain diagram to make sure the design helps prevent a deadlock, especially when there is a mix of transactions occurring.  It is also best to know how many threads/entities must be called on simultaneously before a deadlock can occur, which is especially true when multiple threads call on shared resources.

Thus, we should manage the resources to ensure no circularity, limit the number of entities to just below the threshold that causes a deadlock, and eliminate waits where possible.

There are many real-life examples, like the one shown in Sandén (2011) with each of 4 cars halfway into an intersection. The following is a suggested real-life deadlock scenario:

There is one measuring cup (1/2 a cup), and there is no other way to measure this amount.  Jack and Jill are baking a cake at the same time.  They have all the ingredients they need: eggs, cake mix, oil, and milk.  However, they each need the only measuring cup to measure oil and milk, and they reach for it at the same time.  This is a deadlock.

To un-deadlock this scenario, Jack can pour the eggs and cake mix while Jill measures and pours the oil and milk.  When Jill is done, Jack measures and pours the oil and milk while Jill pours the cake mix and eggs.  The same could be done with up to four people, where each person is a thread and the measuring cup is the resource.

Once we introduce a fifth or more person, the wait chain has unnecessarily long periods of waiting before a thread can begin to use the resource.

Reference

Sandén, B. I. (2011). Design of multithreaded software: The entity-life modeling approach. Hoboken, NJ: Wiley.

Parallel Programming: Resource Guard

A quick note:

“In the resource-guard-thread pattern, resource-guard threads represent resources.  Such threads are arranged in a virtual assembly line and connected by queues implemented as safe objects” (Sandén, 2011).

By the definition above, the search and insertion threads hold exclusive data with which to perform subdivision and legalization through an insertion point, not a queue; thus this is a resource-user thread pattern.

“As long as each resource user has exclusive access to no more than one resource at a time, the designer can usually choose between a solution with resource-guard threads and one with resource threads.  In this sense, the two patterns are dual” (Sandén, 2011).

A dual solution would look like this: the search and insertion threads would return an index to a safe object, which would house all the data.  The data could then be dequeued from that safe object in order to proceed with step two, which is subdivision and legalization.
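A minimal Java sketch of the assembly-line arrangement, with one guard thread per resource and stages connected by blocking queues as the safe objects (the stage names, queue sizes, and String payload are mine, chosen to echo the subdivision and legalization steps above):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AssemblyLine {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> toSubdivide = new ArrayBlockingQueue<>(10);
        BlockingQueue<String> toLegalize  = new ArrayBlockingQueue<>(10);

        // Guard thread for the subdivision resource.
        new Thread(() -> {
            try {
                while (true) {
                    String item = toSubdivide.take();        // wait for work
                    toLegalize.put(item + ":subdivided");    // pass it downstream
                }
            } catch (InterruptedException e) { }
        }).start();

        // Guard thread for the legalization resource.
        new Thread(() -> {
            try {
                while (true) {
                    System.out.println(toLegalize.take() + ":legalized");
                }
            } catch (InterruptedException e) { }
        }).start();

        toSubdivide.put("point-1");                          // feed the line
    }
}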

Reference

Sandén, B. I. (2011). Design of multithreaded software: The entity-life modeling approach. Hoboken, NJ: Wiley.