Case Study: Open Source Cloud Computing Tools with a Weather Application
Focus on: Hadoop V0.20, a Platform as a Service (PaaS) cloud solution with parallel processing capabilities
Cluster size: 6 nodes, with the Hadoop, Eucalyptus, and Django-Python cloud interfaces installed
Variables: Managing historical average temperature, rainfall, humidity, and weather-condition data per latitude and longitude over time, and mapping it onto a Google Maps user interface
Data Source: Yahoo! Weather Page
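The aggregation described above (averaging a weather variable per latitude/longitude cell) can be sketched as a minimal map/reduce pass. This is an illustrative sketch only, not the authors' code: the `lat,lon,temperature` input format and the function names are assumptions.

```python
# Sketch of a Hadoop-Streaming-style map/reduce pass that averages
# historical temperature per latitude/longitude cell.
# Assumed (hypothetical) input format: one "lat,lon,temperature" record per line.

def map_record(line):
    # Map step: emit a (cell key, temperature) pair for each record.
    lat, lon, temp = line.strip().split(",")
    return (f"{lat},{lon}", float(temp))

def reduce_records(pairs):
    # Reduce step: accumulate sums and counts per cell, then emit the mean.
    sums = {}
    for key, temp in pairs:
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + temp, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}

if __name__ == "__main__":
    records = ["18.2,-66.5,30.0", "18.2,-66.5,28.0", "40.7,-74.0,20.0"]
    print(reduce_records(map_record(r) for r in records))
```

In an actual Hadoop Streaming deployment, the map and reduce steps would run as separate scripts reading stdin and writing stdout, with Hadoop handling the grouping by key between them.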
Results/Benefits to the Industry: The Hadoop platform was evaluated against ten criteria and compared to Eucalyptus and Django-Python on a scale of 0-3, where 0 "indicates [a] lack of adequate feature support" and 3 "indicates that the particular tool provides [an] adequate feature to fulfill the criterion."
Table 1: The criterion matrix and numerical scores are adopted from the results of Greer, Rodriguez-Martinez, and Seguel (2010).
| Criterion | Description | Hadoop Score |
|---|---|---|
| Management Tools | Tools to deploy, configure, and maintain the system | 0 |
| Development Tools | Tools to build new applications or features | 3 |
| Node Extensibility | Ability to add new nodes without re-initialization | 3 |
| Use of Standards | Use of TCP/IP, SSH, etc. | 3 |
| Security | Built-in security as opposed to use of 3rd-party patches | 3 |
| Reliability | Resilience to failures | 3 |
| Learning Curve | Time to learn the technology | 2 |
| Scalability | Capacity to grow without degrading performance | – |
| Cost of Ownership | Investments needed for usage | 2 |
| Support | Availability of 3rd-party support | 3 |
Hadoop scored 22 (with Scalability left unassessed), Eucalyptus scored 18, and Django-Python scored 20, making Hadoop the better solution for this case study. The study noted that:
- Management tools: configuration was done by hand with XML and text files rather than through a graphical user interface
- Development tools: Eclipse plug-in aids in debugging Hadoop applications
- Node Extensibility: Hadoop can accept new nodes with no interruption in service
- Use of standards: uses TCP/IP, SSH, SQL, JDK 1.6 (Java Standard), Python V2.6, and Apache tools
- Security: password-protected user accounts and encryption
- Reliability: fault tolerance is present, and the user is shielded from the effects of failures
- Learning curve: it is not intuitive and required some experimentation beyond the online tutorials
- Scalability: not assessed due to the limits of the study (a 6-node cluster is not enough)
- Cost of Ownership: to be effective, Hadoop needs a cluster, even if it consists of cheap machines
- Support: third-party support is available for Hadoop
The authors note that Hadoop fails to provide a real-time response, and suggest that the batch code should send email notifications at the start of a job, at key points of the iteration, or when the output is ready at the end. Hadoop is slower than the other two solutions evaluated, but its fault-tolerance features make up for it. For set-up and configuration, Hadoop is simple to use.
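The email-notification idea above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the job name, stage labels, and recipient address are placeholders.

```python
# Sketch of the suggested workaround for Hadoop's lack of real-time response:
# compose a status email at key points of a long-running batch job.
from email.message import EmailMessage

def build_status_email(job_name, stage, recipient):
    """Compose a status message for a job stage (e.g. start, checkpoint, done)."""
    msg = EmailMessage()
    msg["Subject"] = f"[{job_name}] status: {stage}"
    msg["To"] = recipient
    msg.set_content(f"Batch job '{job_name}' reached stage: {stage}.")
    return msg

if __name__ == "__main__":
    # In a real deployment the message would be handed to
    # smtplib.SMTP(...).send_message(msg); here we only compose it.
    msg = build_status_email("weather-aggregation", "done", "analyst@example.com")
    print(msg["Subject"])
```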
Was Hadoop used in the most ample manner?
In my opinion, and in the authors' own assessment, Hadoop was not fully used: they stated that they could not scale their research because the study was limited to a 6-node cluster. Hadoop is built to ingest and process big data sets from various sources and formats in order to deliver data-driven insights, and the scalability features that address this point were not adequately assessed in this study.
- Greer, M., Rodriguez-Martinez, M., & Seguel, J. (2010). Open source cloud computing tools: A case study with a weather application. Florida: IEEE Open Source Cloud Computing.