Adv DBs: Data Abstractions

Data Abstraction

Text can be abstracted into information and knowledge through either hard clustering, where a word has only one connection, or soft clustering, where a word can have multiple connections to other words (Kulkarni & Kinariwala, 2013). Clustering, in general, means grouping together things that share similar characteristics. Hard clustering is difficult to apply to the sentences of a paragraph, or to prose in general, because each sentence is interconnected with the sentences above and below it. Clusters within prose can also overlap with each other. Thus, the paper proposes soft clustering for analyzing sentences within prose. The method Kulkarni & Kinariwala propose is PageRank, which scores the importance of each sentence within a document and thus helps summarize it.

The weakness of this paper is that the authors propose an idea without testing it. They did not develop any code or analyze any data set to show whether their hypothesis about PageRank was correct, so the paper remains a thought experiment, albeit a wonderful one. Its strength, if other studies prove the idea correct, is that it maps out the strengths and limitations of hard and soft clustering for data mining within a piece of prose and across prose of a similar nature.
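The idea of ranking sentences with PageRank can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from Kulkarni & Kinariwala (the paper includes none): it assumes sentences are compared by cosine similarity over raw word counts and uses the common PageRank damping factor of 0.85. The helper names `tokenize`, `cosine`, and `sentence_pagerank` and the example sentences are hypothetical. The setup resembles TextRank-style extractive summarization.

```python
# Minimal sketch: rank sentences by weighted PageRank over a similarity graph.
# Assumptions (not from the paper): cosine similarity of raw word counts,
# damping factor 0.85, and the hypothetical helper names used below.
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase word tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z']+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors stored as Counters.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def sentence_pagerank(sentences, damping=0.85, iterations=100, tol=1e-9):
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Weighted edges between every pair of distinct sentences: a sentence can
    # link to many others at once rather than belonging to a single group.
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences with no similar neighbors spread their score evenly.
        dangling = sum(scores[j] for j in range(n) if out_sums[j] == 0)
        new_scores = []
        for i in range(n):
            incoming = sum(weights[j][i] / out_sums[j] * scores[j]
                           for j in range(n) if out_sums[j] > 0)
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        delta = sum(abs(x - y) for x, y in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


sentences = [
    "Hurricanes bring high winds and heavy rain.",
    "Forecasts of hurricane winds help cities prepare.",
    "Cities prepare shelters before hurricanes arrive.",
    "The museum opened a new exhibit.",
]
scores = sentence_pagerank(sentences)
for score, text in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {text}")
```

The highest-scoring sentences would be candidates for a summary, while the unrelated museum sentence ends up last. Because every sentence keeps weighted links to all the sentences it resembles instead of being forced into one group, the graph reflects the soft view of prose described above.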

Management issues in systems development

Information is seen as highly valuable to humanitarian efforts in accomplishing their missions. Van de Walle & Comes (2015) state that the United Nations has defined methods for humanitarian Information Management built around checking, sharing, and using data. Checking data concerns reliability and verifiability; sharing data concerns interoperability (data formats), accessibility, and sustainability; and using data concerns timeliness and relevance. After interviewing humanitarians from two different disasters, Syria and Typhoon Haiyan, in interviews of about 1 to 1.5 hours, the authors concluded that standard processes can be followed for natural disasters such as a landfalling hurricane. However, standard processes also led to inflexibility and left some intricate needs unmet. In a more complicated relief effort such as Syria, confidentiality concerns and unreliable data sources (information sometimes arrived as in an old spy movie, passed under the table) affected the entire process. Finally, the small sample, drawn from only two events and the humanitarians interviewed about them, means that further research is needed before generalizations can be made about developing Information Management systems for natural versus geopolitical disasters.

The main strength of this paper is its breakdown of disaster information management against the standards set by the UN. It also illustrates that information management is end-to-end. My research aims to improve pre-disaster conditions, whereas their research covers post-disaster aid. For the same disaster, a landfalling hurricane, the key information needed to carry out each task changes. In other words, hurricane wind speeds are no longer needed once the storm has passed over a city and left a wake of destruction, and the death toll does not matter before the hurricane makes landfall. Still, we need wind speeds to improve forecasts and mitigate death tolls, and we need the current death toll to keep it from rising after the disaster has struck.

References

  • Kulkarni, B. M., & Kinariwala, S. A. (2013). Review on Fuzzy Approach to Sentence Level Text Clustering. International Journal of Scientific Research and Education, pp. 3845-3850.
  • Van de Walle, B., & Comes, T. (2015). On the Nature of Information Management in Complex and Natural Disasters. Procedia Engineering, pp. 403-411.