Sandén (2011) shows how to use synchronized objects for concurrency in Java. These are “safe” objects whose state is protected by locks and reached only through synchronized methods, which act as critical sections. In Java we can create a thread in two ways: (1) extend the class Thread or (2) implement the interface Runnable. The latter defines the code of a thread in a single method, void run(), and the thread completes its execution when it reaches the end of that method (much as a FORTRAN subroutine finishes when it returns). The Thread class supplies the constructors public Thread() and public Thread(Runnable runObject), the second of which attaches a Runnable to a thread, along with methods like public void start(), which begins executing run() in the new thread.
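To make the two ways of creating a thread concrete, here is a minimal sketch in which both kinds of thread update one shared “safe” object. The class names SafeCounter, CountingThread, CountingTask, and SafeCounterDemo are hypothetical names chosen for illustration; they do not come from Sandén (2011).

```java
// A "safe" object: its state is only touched inside synchronized methods,
// so at most one thread at a time can run them on a given instance.
class SafeCounter {
    private int count = 0;

    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }
}

// Way 1: extend Thread and override run().
class CountingThread extends Thread {
    private final SafeCounter counter;
    private final int times;

    CountingThread(SafeCounter counter, int times) {
        this.counter = counter;
        this.times = times;
    }

    @Override
    public void run() {
        for (int i = 0; i < times; i++) {
            counter.increment();
        }
        // The thread completes when run() returns.
    }
}

// Way 2: implement Runnable and pass it to the Thread(Runnable) constructor.
class CountingTask implements Runnable {
    private final SafeCounter counter;
    private final int times;

    CountingTask(SafeCounter counter, int times) {
        this.counter = counter;
        this.times = times;
    }

    @Override
    public void run() {
        for (int i = 0; i < times; i++) {
            counter.increment();
        }
    }
}

class SafeCounterDemo {
    // Starts one thread of each kind, waits for both, and returns the final count.
    static int runBoth(int times) {
        SafeCounter counter = new SafeCounter();
        Thread first = new CountingThread(counter, times);
        Thread second = new Thread(new CountingTask(counter, times));
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runBoth(10_000)); // prints 20000
    }
}
```

Because increment() is synchronized, no update is lost and the total is always twice the per-thread count. Without the synchronized keyword the two threads could interleave inside count++ and the result could come out lower.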
According to Hortonworks (2013), the MapReduce process at a high level is: Input -> Map -> Shuffle and Sort -> Reduce -> Output.
Tasks: Mappers process a data set stored in a distributed system and place the wanted data in a map/aggregate under a certain key. Reducers know what the keys are, and they take all the values that the mappers stored under the same key on different nodes of the cluster to reduce the data to the amount that is relevant (Hortonworks, 2013). Different reducers can work on different keys.
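The phases can be sketched in plain Java on in-memory data. This is only an illustration of the flow Input -> Map -> Shuffle and Sort -> Reduce -> Output, not the Hadoop API; the class MiniMapReduce and the "key:value" record format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MiniMapReduce {
    // Map: turn each input record into at most one (key, value) pair.
    // Records are hypothetical "key:value" strings; anything else is dropped.
    static List<Map.Entry<String, Integer>> map(List<String> records) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String record : records) {
            String[] parts = record.split(":");
            if (parts.length != 2) {
                continue; // unwanted data is filtered out in the map phase
            }
            pairs.add(Map.entry(parts[0], Integer.parseInt(parts[1])));
        }
        return pairs;
    }

    // Shuffle and sort: gather all values under their key, keys in sorted order.
    static Map<String, List<Integer>> shuffleAndSort(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: collapse each key's list of values into one value (here, a sum).
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            output.put(entry.getKey(), sum);
        }
        return output;
    }

    // Input -> Map -> Shuffle and Sort -> Reduce -> Output
    static Map<String, Integer> run(List<String> records) {
        return reduce(shuffleAndSort(map(records)));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a:1", "b:2", "a:3", "garbage")));
        // prints {a=4, b=2}
    }
}
```

In a real cluster the map calls run on many nodes at once and the shuffle moves data across the network; here everything runs in one process so that each phase stays visible.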
Example: A good example of a MapReduce request is to look at all CTU graduate students and sum up their current outstanding school loans per degree level. Thus, the final output from our example would be:
- Doctoral Students Current Outstanding School Loan Amount
- Master's Students Current Outstanding School Loan Amount
Now let’s assume that this ran in Hadoop, which can do MapReduce. Also, let’s assume that I could use 50 nodes to process this request, with each node behaving much like a thread that works on its own share of the data. The “bad” data that gets thrown out in the mapper phase would be the undergraduate students, given that they do not match the initial search criteria. The “safe” data will be the records associated with doctoral and master's students. So, during the mapping phase, each node will assign doctoral students to one key and master's students to another key. Every node uses the same two keys for its respective students, so the keys are identical across all nodes. The reducer then uses these keys, and the values stored under them, to sum up the current outstanding school loan amounts for the correct group. Thus, once the reducers have processed the output of all nodes, we will have our two amounts:
- Doctoral Students Current Outstanding School Loan Amount
- Master's Students Current Outstanding School Loan Amount
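The loan example can be sketched the same way, with a thread pool standing in for the cluster nodes. Everything here is an assumption made for illustration: the class LoanTotals, the Student record layout, the key strings "Doctoral" and "Master", the two small partitions (instead of 50 nodes), and the loan figures, which are made-up sample values. It is plain Java, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LoanTotals {
    // Hypothetical record: degree level and outstanding loan in whole dollars.
    static final class Student {
        final String level;
        final long loan;

        Student(String level, long loan) {
            this.level = level;
            this.loan = loan;
        }
    }

    // Mapper for one node: drop undergraduates, file each loan under its degree-level key.
    static Map<String, List<Long>> mapPartition(List<Student> partition) {
        Map<String, List<Long>> keyed = new HashMap<>();
        for (Student s : partition) {
            if (s.level.equals("Doctoral") || s.level.equals("Master")) {
                keyed.computeIfAbsent(s.level, k -> new ArrayList<>()).add(s.loan);
            }
        }
        return keyed;
    }

    // Reducer: for each key, sum the values that every node stored under it.
    static Map<String, Long> reduce(List<Map<String, List<Long>>> mapperOutputs) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, List<Long>> output : mapperOutputs) {
            for (Map.Entry<String, List<Long>> entry : output.entrySet()) {
                for (long loan : entry.getValue()) {
                    totals.merge(entry.getKey(), loan, Long::sum);
                }
            }
        }
        return totals;
    }

    // Runs one mapper per partition on its own thread, then reduces the results.
    static Map<String, Long> run(List<List<Student>> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Map<String, List<Long>>>> futures = new ArrayList<>();
            for (List<Student> partition : partitions) {
                futures.add(pool.submit(() -> mapPartition(partition)));
            }
            List<Map<String, List<Long>>> outputs = new ArrayList<>();
            for (Future<Map<String, List<Long>>> future : futures) {
                outputs.add(future.get());
            }
            return reduce(outputs);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("mapper failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Made-up sample data split across two "nodes".
    static Map<String, Long> demo() {
        List<Student> nodeOne = List.of(
                new Student("Doctoral", 50_000),
                new Student("Master", 20_000),
                new Student("Undergraduate", 10_000));
        List<Student> nodeTwo = List.of(
                new Student("Master", 15_000),
                new Student("Doctoral", 30_000));
        return run(List.of(nodeOne, nodeTwo));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {Doctoral=80000, Master=35000}
    }
}
```

Each mapper writes only to its own map, so no locking is needed during the map phase. The per-key sums are computed afterwards on a single thread, once every mapper has finished.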
Complexity could be added if we only wanted to look at graduate students who are currently active or non-active service members. The request could also be broken down further by gender, profession, or diversity signifiers, or we could even map to the students' current industry.
- Hortonworks (2013). Introduction to MapReduce. Retrieved from https://www.youtube.com/watch?v=ht3dNvdNDzI
- Sandén, B. I. (2011). Design of Multithreaded Software: The Entity-Life Modeling Approach, 1st Edition. [VitalSource Bookshelf Online]. Retrieved from https://bookshelf.vitalsource.com/#/books/9781119143086/