Parallel Processing: Ada Tasking and State Modeling

Sample Code

1  protected Queue is
2          procedure Enqueue (elt : in Q_Element);     
3                                            -- Insert elt @ the tail end of the queue
4          function First_in_line return Q_Element;
5                                            -- Return the element at the head of the
6                                            -- queue, or no_elt.
7          procedure Dequeue;                  
8                                            -- If the queue is not empty, remove the
9                                            -- element at its head
10 end Queue;
11
12 task type Worker;
13
14 task body Worker is
15   elt : Q_Element;                        -- Element retrieved from queue
16   begin
17      while true loop
18           elt := Queue.First_in_line;     -- Get element at head of queue
19           if elt = no_elt then            -- Let the task loop until there
20              delay 0.1;                   -- is something in the queue
21           else
22               Process (elt);              -- Process elt. This takes some time
23               Queue.Dequeue;              -- Remove element from queue
24           end if;
25     end loop;
26 end Worker;

Comparing and Contrasting Ada’s Protected Function to Java’s (non)synchronized

Java’s safe objects are synchronized objects, which are usually contained in methods that are “synchronized” or “non-synchronized”.  For the non-synchronized methods, the safe objects within these methods are mostly considered to be read-only. Whereas in synchronized methods, safe objects can be written and read, but usually have wait loops at the beginning with a certain wait condition (i.e. while (condition) {wait( );}).  This wait loop forces the thread to wait until when the condition becomes true, in order to prevent multiple threads from editing the same safe object at the same time.   Usually, wait loops located elsewhere in the Java synchronized methods are (uncommon).

Safe objects in Ada are protected objects are in the “protected procedure” or a “protected function”.  Unlike the non-synchronized method in Java (where it should be read-only), the protected function in Ada is read-only.  Java’s synchronized version has a wait function that stalls a thread, in the Ada’s protected entry (i.e. entry x ( ) when condition is …) is only entered when the condition is true, thus you can have multiple entries where data could manipulate in multiple ways similar to an if-else function.  For example, entry x ( ) when condition is true and another one right after could be entry x ( ) when condition is false.  Though, this can be expanded to n different conditions, where.  With these entries, the barrier is always tested first compared to wait.  However, we could requeue (not a subprogram call-thus the tread doesn’t return to the point after the requeue call) on another entry, but it’s uncommon to have them located elsewhere in the program.

Describe what worker does

Workers must get the element (elt) from the first line item in the queue and then loop through the task until there is an element in the queue for which the worker can process the element.  The element is stored in the elt array.  If there is no element delay the process by 0.1 units of time and keep looping.  Once the worker has obtained an element, they can begin processing the element, then we can remove the element from the current queue.

Adapt Queue and work for multiple instances of worker can process data elements

In order for one worker to process each element, we must first do is change “task body worker is” to “protected body worker is” on line 14.  Change “elt: Q_Element;” to procedure get (elt: Q_Element) is in order to get the element from the queue on line 15.

Once there is an element in the first inline of the queue, the worker must first dequeue it in order to process it, this should protect the data and allow for another worker to work on the next first inline element.  Thus, I would be proposing to switch lines 22 to 23 and 23 to 22.  If this isn’t preferred, we can create a new array called work_in_progress where we create a get, put, and remove procedure for this array, which should go before line 22 and then follow my proposed sequence.  This will allow the worker to say I got this element, I will work on it, and if all is successful we don’t need to re-add the element back into the queue and delete it from the work_in_progress array, but I don’t want to hold up other workers from working on other elements.  However, if the worker says I failed to process this array, please return it back into the queue and add it into the elt array again for another worker to process it. To avoid an endless loop, if an element cannot be processed by three different workers we can create another array to store non-compliant elements in and call Dequeue on the main elt array.   However, we can simply and only switch lines 22 and 23 with each other if and only if this change shows that processing the element could never fail.

In Queue line 2 must have entries, for instance “entry Enqueue (elt : in Q_Element) when count >= 0 is … end Enqueue”, to allow for waiting until there are actually element to be added from the array elt.  Doing entries in Queue, would be eliminating the need for the while true loop to search if there is an elt in the first of the line in lines 19, 20, & 21.   Thus, we are making our conditions check first rather than later on in the code. Similarly, we can do a change to line 7 to “entry Dequeue (elt : in Q_Element) when count > 0 is … end Dequeue”, to allow for waiting until there is actually element for deletion from the array elt.  Though this is more for an efficiency issue and allows for the worker to say I got this element and it’s ok to delete from the queue. With all these changes we must make sure that on line 18 we must make sure we are pulling an element from an elt array.

The loop where instances of workers wait is crude

Line 18 and 23 can be combined with Queue.Pop (elt) above the if-statement in on line 18, to avoid the crude loop where threads of workers wait for something in the queue.  The pop allows for no “busy waiting”.  But, we must create a procedure in Queue called procedure pop which returns a query on the first line item on the array elt and removes it.

 

Parallel Programming: State diagram of Maekawa’s voting algorithm

Sandén (2011) defines state diagrams as a way to show the possible states an object could be on.  He also defines, that events are action verbs that occur on an arrow between two events (if an action doesn’t change the state it can be listed in the state).  Whereas an action can have conditions on them.

Thus, a state diagram shows the transition from state to state as events occur.  An event usually has many occurrences, and they are instantaneous.  Finally, a super-state can encompass multiple states (Sandén, 2011).  An activity is an operation that takes time, and it has the keyword “do /”

The goal of this post was to make a state diagram of Maekawa’s voting algorithm on the “Maekawa’s algorithm” within the “Distributed mutual exclusion” set. This can be done in various ways. One option is the following 5 states:

  • Released and not voted
  • Released and voted
  • Wanted and not voted
  • Wanted and voted
  • Held and voted (for self)

Events are:

  • request_received, etc., for messages arriving from other nodes
  • acquire for when the local node wants the lock
  • release for when the local node gives up the lock.

A possible solution is shown below:

statediagram

Reference

Parallel Programming: Msynch 

pseudo code

class Msynch  {
     int replies;
     int currentState = 1;
        synchronized void acquire ( ) {
 // Called by thread wanting access to a critical section
              while (currentState != 1) {wait ( );}
              replies = 0; currentState = 2;
              //         
              // (Here, 5 messages are sent)         
              //         
        while (replies < 5) {wait ( );} // Await 5 replies         
              currentState = 3;    
        }   
        synchronized void replyReceived ( ) {
        // Called by communication thread when reply is received         
              replies++;         
              notifyAll ( );
        }    
        synchronized void release ( ) {
        // Called by a thread releasing the critical section    
              currentState = 1;        
              notifyAll ( );   
        }
}

 

class Msynch1  {    
        int replies;    
        int currentState = 1;      
        synchronized void acquire ( ) {
       // Called by thread wanting access to a critical section         
              while (currentState != 1) {yield ( );}         
              replies = 0; currentState = 2;         
              //         
              // (Here, 5 messages are sent)         
              //         
              if (replies < 5) {wait ( );} // Await 5 replies         
              currentState = 3;    
        }     
        synchronized void replyReceived ( ) {
        // Called by communication thread when reply is received
              replies++;
        }       
        synchronized void release ( ) {
        // Called by a thread releasing the critical section    
              currentState = 1;        
              notifyAll ( );   
        } 
}

From the two sets of code above, I have highlighted three differences (three synchronization related errors in Msynch1) in three different colors, as identified by a line-by-line code comparison.  The reason why

  1. notifyAll ( ); is missing, therefore this code won’t work because without this line of code we will not be able to unblock any other thread. Thus, this error will not allow for replyReceived ( ) to be synchronized. This missing part of the code should activate all threads in the wait set to allow them to compete with each other for the lock based on priorities.
  2. {yield ( );} won’t work is because it won’t block until queue not full like in {wait ( );}. Thus, the wait( ) function, which aids in releasing the threads lock is needed. When a thread calls wait( ) it will unlock the object.  After returning from wait( ) call, it will re-lock the object.
  3. if won’t work is because the wait( ) call should be in a “wait loop”: while (condition {wait( ); } as shown in Msynch. Without it, the thread cannot retest the condition after it returns from the wait ( ) call. With the if-statement, the condition is only tested once, unlike with the while-statement.

An additional fourth error was identified after reviewing the notes from class in Java threads shortened_1.

  1. Best practices are not followed in either Msynch and Msynch1 where the wait loop must actually reside in a try block, as follows: while condition) try {wait( );)}.

When the first thread (appThread) in Msynch calls acquire( ) the first time, it currentState = 1, so it enters into the wait loop.  Thus, its replies are initialized at zero and currentState =1.  This thread sends out 5 messages to other threads (calling on replyReceived( )).  As long as replies are less than five it stays locked and the currentState remains equal to two.  Once it has received 5 replies from any five threads, the code will unlock and it increments the currentState by one, so now it is equaled to three.

As the initial thread  (appThread) running in aquire( ) calls out other threads (commThreads) for at least 5 messages, these other threads do so by calling replyRecieved( ).  This code increments the number of replies by one each time a thread calls it and unlocks replies so it can increment it by one and then locks it so that another thread calling replyReceived( ) can increment it by one.  Thus, once five threads, any five threads can successfully run replyReceived( ), then we can increment currentState =3 as the lock on currentState is removed.

Words to define

Semaphores: In a coded programmed, where there exists a code between a lock and unlock instruction, it becomes known as a critical section.  That critical section can only be accessed by one thread at a time.  If a thread sees a semaphore that is open, the thread will close it in one uninterruptible and automatic operation, and if that was successful the thread can confidently proceed into the critical section.  When the thread completes its task in the critical section, it will reopen the semaphore. This, changes of state in the semaphore are important, because if a thread sees that the semaphore is closed that thread stalls.  Thus, this ensures that only one thread at a time can work on the critical section of the code (Oracle, 2015).

Test-and-set (TSET/TAND/TSL): It is a special set of hardware instructions, which semaphores operate in order to have both read access and conditional write access by testing a bit of the code (Sanden, 2011).  This will eventually allow a thread to eventually work on the critical section of the code.   Essentially, a semaphore is open if the bit is 1 and closed if the bit is 0 (Oracle, 2015).  If the bit is 1, the test set of instructions will attempt to close the semaphore by changing and setting the bit to 0.  This TSET is conducted automatically and cannot be interrupted.

Preemption: Multi-threading can be set with or without each thread having an assigned priority preemptively.  If a priority is set amongst the threads, a thread can be suspended (forestalled) by a higher-priority thread at any one time.  This becomes important when the ratio of thread to computer cores is high (like 10 threads on a uniprocessor).  Preemption becomes obsolete when the number of threads is less than the number of cores provided for the code to run (Sanden, 2011).

References

Parallel Programming: Practical examples of a thread

Here is a simple problem: A boy and a girl toss a ball back and forth to each other. Assume that the boy is one thread (node) and the girl is another thread, and b is data.

Boy = m

Girl = f

Ball = b

  • m has b
    1. m throws b –> f catches b
  • f has b
    1. f throws b –> m catches b

Assuming we could drop the ball, and holding everything else constant.

  • m has b
    1. m throws b –> f catches b
    2. m throws b –> f drops b
      1. f picks up the dropped b
  • f has b
    1. f throws b –> m catches b
    2. f throws b –> m drops b
      1. m picks up the dropped b

 

Suppose you add a third player.

Boy = m

Girl = f

Ball = b

3rd player = x

  • m has b
    1. m throws b –> f catches b
    2. m throws b –> x catches b
  • f has b
    1. f throws b –> m catches b
    2. f throws b –> x catches b
  • x has b
    1. x throws b –> m catches b
    2. x throws b –> f catches b

Assuming we could drop the ball, and holding everything else constant.

  • m has b
    1. m throws b –> f catches b
    2. m throws b –> f drops b
      1. f picks up the dropped b
    3. m throws b –> x catches b
    4. m throws b –> x drops b
      1. x picks up the drooped b
  • f has b
    1. f throws b –> m catches b
    2. f throws b –> m drops b
      1. m picks up the dropped b
    3. f throws b –> x catches b
    4. f throws b –> x drops b
      1. x picks up the dropped b
  • x has b
    1. x throws b –> m catches b
    2. x throws b –> m drops b
      1. m picks up the dropped b
    3. x throws b –> f catches b
    4. x throws b –> f drops b
      1. f picks up the dropped b

Will that change the thread models? What if the throwing pattern is not static; that is, the boy can throw to the girl or to the third player, and so forth? 

In this example: Yes, there is an additional thread that gets added, because each player is a tread that can catch or drop a ball.  Each player is a thread on its own, transferring data ‘b’ amongst them and throwing the ‘b’ is locking the data before transferring and catching ‘b’ is unlocking the data.  After the ball is dropped (maybe calculated randomly), the player with the ball now has to pick it up, which can be equivalent to analyze the data based on a certain condition that is met like account balance is < 500 or else.  The model changes with the additional player because each person has a choice to make now on which person should receive the ball next, which is not present in the first model when there were two threads.  If there exists a static toss like

  • f –> m –> x –> f

Then the model doesn’t change, because there is no choice now.

Parallel Programming: Logical Clocks

In a distributed system nodes can talk (cooperate) to each other and coordinate their systems.  However, the different nodes can execute concurrently, there is no global clock in which all nodes function on, and some of these nodes can fail independently (Sandén, 2011).  Since nodes talk to each other, we must study them as they interact with each other.  Thus, a need to use logical clocks (because we don’t have global clocks) which show that distances in time are lost (Sandén, 2011). In logical clocks: all nodes agree on an order of events, partially (where something can happen before another event).  They only describe the order of events, not with respect to time.  If nodes are completely disjoint in a logical clock, then a node can fail independently. This is one way to visualize the complex nature of nodes.

The following is an example of a logical clock:

Capture

Reference

Parallel Programming: Logical Clocks

Per Sandén (2011), in a distributed system nodes can talk (cooperate) to each other and coordinate their systems.  However, the different nodes can execute concurrently, there is no global clock in which all nodes function on, and some of these nodes can fail independently.  Since nodes talk to each other, we must study them as they interact with each other.  Thus, there is a need to use logical clocks (because we don’t have global clocks) which show that distances in time that is lost.

In logical clocks: all nodes agree on an order of events, partially (where something can happen before another event).  They only describe the order of events, not with respect to time.  If nodes are completely disjoint in a logical clock, then a node can fail independently.

Reference

Parallel Programming: Deadlocks

Deadlock occurs while you are getting an additional resource while holding another or more resource, especially when it creates a circularity  (Sandén, 2011).

Sandén (2011), stated that to prevent deadlocks, resources need to be controlled.  One should do a wait chain diagram to make sure your design can help prevent a deadlock.  Especially when there is a mix of transactions occurring.  It is also best to know how many threads/entities are needed to be called on simultaneously before a deadlock can occur, especially true when you have multiple threads calling on shared resources.

Thus, we should manage the resources to ensure no circularity, limit the number of entities to just below the threshold to cause a deadlock, eliminate wait.

There are many in real life like the one shown in Sandén (2011) with each of 4 cars halfway into an intersection. The following is a real-life suggested a deadlock scenario:  

There is one set of measuring cup (1/2 a cup).  There are no other ways to measure this amount.  Jack and Jill are backing a cake at the same time.  They have all the objects need, eggs, cake mix, oil, and milk.  However, they need the only measuring cup to measure oil and milk and they reach for it at the same time.  This is a deadlock.

To un-deadlock this scenario, Jack can pour the eggs, and cake mix, while Jill measures and pours the oil and milk.  When Jill is done, Jack measures and pours the oil and milk and Jill pours the cake mix and eggs.  The same could be done with up to four people.  Where each person is a thread and the measuring cup is the resource.

Once we introduce a fifth or more person, the wait chain has unnecessarily long periods of wait for one thread to be able to begin to use a resource.

Reference

Parallel Programming: Safe objects and shared objects

Shared objects that force mutual exclusion on threads that try to call it are “safe objects”.  The mutual exclusion on threads/operations can be relaxed when threads don’t change any data, this may be a read of the data in the “safe object” (Sanden, 2011).  In the examples for this course, we have dealt with such Java “safe objects” which are called synchronized.

  1. A safe object in a jukebox represents the CD player. Customer threads call an operation to queue up to play a song.
    • Input into Song Queue: data can be added by multiple people on multiple devices that only have one set of CDs, and can only play one song from a CD.  Data is stored in an array.
    • Change Song order in the Queue: The Song Queue can be prioritized based on predefined parameters, like the DJ, can have ultimate priority to adjust the order and make their own request, but customers have a less priority.  If there is a tiered pay structure then we can see a higher priority placed on a Song on the Song Queue for those willing to pay more. This means that the data stored in the array can be rearranged depending on the thread’s priority.
    • Remove Song from Queue: after the song is done playing, the song’s name is removed from the Song Queue position number one. This will force the array values to shift up by one.
    • Read Song Queue: though not needed to be mutually exclusive, it is still an operation that is needed in order to find the next song to play.  This shouldn’t change any data in the array, it is only reading the song in position 0 of the array.
  1. In a different design, the safe object in a jukebox represents a queue of song requests. Customer threads call an operation to add a song request to the queue. A CD thread calls a different operation to retrieve the next request.
    • All of those that are required for a song queue in the previous example could be applied to this example or a subset.  An example of a sufficient subset would be {Input into Song Queue, Remove Song from Queue, Read Song Queue}
    • Locate the next CD request: Based on the data in Input into Song Queue, pull, locate the CD containing the next Song to be played.
    • Play Song on CD: One song from one CD can be played at any time.
    • Transition Song on CD: As one song ends, fade out the noise exponentially in the last 10 seconds and begin the next song on the Song Queue by increasing the song volume exponentially in the first 5 seconds to normal volume.
    • Put away the CD from the last song played: places the cd back into its predetermined location for future use. Once completed it will call on the Locate next CD Request Safe Operation.

References:

Parallel Programming: Synchronized Objects

Sanden (2011) shows how to use synchronized objects (concurrency in Java), which is a “safe” object, that are protected by locks in critical synchronized methods.  Through Java we can create threads by: (1) extend class Thread or (2) implement the interface Runnable.  The latter defines the code of a thread under a method: void run ( ), and the thread completes its execution when it reaches the end of the method (which is essentially a subroutine in FORTRAN).  Using the former you need the contractors public Thread ( ) and public Thread (Runnable runObject) along with methods like public start ( ).

Additional Examples:

MapReduce

According to Hortonworks (2013), MapReduce’s Process in a high level is: Input -> Map -> Shuffle and Sort -> Reduce -> Output.

Tasks:  Mappers, create and process transactions on a data set filed away in a distributed system and places the wanted data on a map/aggregate with a certain key.  Reducers will know what the key values are, and will take all the values stored in a similar map but in different nodes on a cluster (per the distributed system) from the mapper to reduce the amount of data that is relevant (Hortonworks, 2013). Reducers can work on different keys.

Example: A great example of this a MapReduce: Request, is to look at all CTU graduate students and sum up their current outstanding school loans per degree level.  Thus, the final output from our example would be:

  • Doctoral Students Current Outstanding School Loan Amount
  • Master Students Current Outstanding School Loan Amount.

Now let’s assume that this ran in Hadoop, which can do MapReduce.   Also, let’s assume that I could use 50 nodes (threads) to process this transaction request.  The bad data that gets thrown out in the mapper phase would be the Undergraduate Students, given that it does not match the initial search criteria.  The safe data will be those that are associated with Doctoral and Masters Students.  So, during the mapping phase, the threads will assign Doctoral Students to one key, and Master students would get another key.  Each node (thread) will use the same keys for their respective students, thus the keys are similar in all nodes (threads).  The reducer uses these keys and the safe objects in them, to sum up, all of the current outstanding school loan amounts get processed under the correct group.  Thus, once all nodes (threads) use the reducer part, we will have our two amounts:

  • Doctoral Students Current Outstanding School Loan
  • Masters Students Current Outstanding School Loan

Complexity could be added if we only wanted to look into graduate students that are currently active and non-active service members.  Or they could be complicated by gender, profession, diversity signifiers, we can even map to the current industry.

Resources

Data Tools: Hadoop Vs Spark

 

Apache Spark

Apache Spark started from a working group inside and outside of UC Berkley, in search of an open-sourced, multi-pass algorithm batch processing model of MapReduce (Zaharia et al., 2012). Spark can have applications written in Java, Scala, Python, R, and interfaces with SQL, which increases ease of use (Spark, n.d.; Zaharia et al., 2012).

Essentially, Spark is a high-performance computing cluster framework, but it doesn’t have its distributed file system and thus uses Hadoop Distributed File System (HDFS, HBase) as in input and output (Gu & Li, 2013).  Not only can it access data from HDFS, HBase, it can also access data from Cassandra, Hive, Tachyon, and any other Hadoop data source (Spark, n.d.).  However, Spark uses its data structure called Resilient Distribution Datasets (RDD) which cache’s data and is a read-only operation to improve its processing time as long as there is enough memory for it in all the nodes of a cluster (Gu & Li, 2013; Zaharia et al., 2012). Spark tries to avoid data reloading from the disk that is why it stores its data in the node’s cache system, for initial and intermediate results (Gu & Li, 2013).

Machines in the cluster can be rebuilt if lost, thus making the RDDs are fault-tolerant without requiring replication (Gu &LI, 2013; Zaharia et al., 2012).  Each RDD is tracked in a lineage graph, and reruns the operations if data becomes lost, therefore reconstructing data, even if all the nodes running spark were to fail (Zaharia et al., 2012).

Hadoop

Hadoop is Java-based system that allows for manipulation and calculations to be done by calling on MapReduce function on its HDFS system (Hortonworks, 2013; IBM, n.d.).

HFDS big data is broken up into smaller blocks across different locations, no matter the type or amount of data, each of these blocs can be still located, which can be aggregated like a set of Legos throughout a distributed database system (IBM, n.d.; Minelli, Chambers, & Dhiraj, 2013). Data blocks are distributed across multiple servers.  This block system provides an easy way to scale up or down the data needs of the company and allows for MapReduce to do it tasks on the smaller sets of the data for faster processing (IBM, n.d). IBM (n.d.) boasts that the data blocks in the HFDS are small enough that they can be easily duplicated (for disaster recovery purposes) in two different servers (or more, depending on your data needs), offering fault tolerance as well. Therefore, IBM’s (n.d.) MapReduce functions use the HFDS to run its procedures on the server in which the data is stored, where data is stored in a memory, not in cache and allow for continuous service.

MapReduce contains two job types that work in parallel on distributed systems: (1) Mappers which creates & processes transactions on the system by mapping/aggregating data by key values, and (2) Reducers which know what that key value is, will take all those values stored in a map and reduce the data to what is relevant (Hortonworks, 2013; Sathupadi, 2010). Reducers can work on different keys, and when huge amounts of data are entered into MapReduce, then the Mapper maps the data, where the data is then shuffled and sorted before it is reduced (Hortonworks, 2013).  Once the data is reduced, the researcher gets the output that they sought.

Significant Differences between Hadoop and Apache Spark              

Spark is faster than Hadoop in iterative operations by 25x-40x for really small datasets, 3x-5x for relatively large datasets, but Spark is more memory intensive, and speed advantage disappears when available memory goes down to zero with really large datasets (Gu & Li, 2013).  Apache Spark, on their website, boasts that they can run programs 100X faster than Hadoop’s MapReduce in Memory (Spark, n.d.). Spark outperforms Hadoop by 10x on iterative machine learning jobs (Gu & Li, 2013). Also, Spark runs 10x faster than Hadoop on disk memory (Spark, n.d.).

Gu and Li (2013), recommend that if speed to the solution is not an issue, but memory is, then Spark shouldn’t be prioritized over Hadoop; however, if speed to the solution is critical and the job is iterative Spark should be prioritized.

References

  • Gu, L., & Li, H. (2013). Memory or time: Performance evaluation for iterative operation on hadoop and spark. InHigh Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC), 2013 IEEE 10th International Conference on (pp. 721-727). IEEE.