Adv Topics: The Internet of Things and Web 4.0

The IoT is the explosion of device and sensor data, which is growing the amount of structured data exponentially and creating tremendous opportunities (Jaffe, 2014; Power, 2015). Both Atzori (2010) and Patel (2013) classified Web 4.0 as the symbiotic web, where data interactions occur between humans and smart devices, the Internet of Things (IoT). These smart devices can be wired to the internet or connected via wireless sensors through enhanced communication protocols (Atzori, 2010). Thus, these smart devices would have read and write concurrency with humans, and the largest potential of Web 4.0 lies in having these smart devices analyze data online and begin to migrate the online world into the real world (Patel, 2013). Besides interacting with the internet and the real world, IoT smart devices would be able to interact with each other (Atzori, 2010). Sakr (2014) stated that this web ecosystem is built on four key items:

  • Data devices, the multiple sources that generate the data
  • Data collectors, the devices or people that collect the data
  • Data aggregators, which combine data from the IoT, people, Radio Frequency Identification (RFID) tags, etc.
  • Data users and data buyers, the people who derive value from the data

Some of the potential benefits of IoT lie in assisted living, e-health, enhanced learning, government, retail, finance, automation, industrial manufacturing, logistics, business/process management, and intelligent transport (Sakr, 2014; Atzori, 2010). Atzori (2010) suggests that there are three different definitions, or visions, of the use of IoT, based on the device's orientation:

  • Things-oriented devices, which are designed for status and traceability of objects via RFID or similar technology
  • Internet-oriented devices, which are designed around a lightweight internet protocol so that the device is addressable and reachable via the internet
  • Semantic-oriented devices, which aid in reasoning over the data that the devices generate by exploiting models

An IoT device can fall under one, two, or all three of these definitions or visions of IoT use.

Performance Bottlenecks for IoT

As of 2016, IoT has two main issues if it is left on its own and not tied to anything else (Jaffe, 2014; Newman, 2016):

  • The devices cannot deal with the massive amounts of data generated and collected
  • The devices cannot learn from the data they generate and receive

Thus, artificial intelligence (AI) should be able to store and mine all the data that is gathered from a wide range of sensors to give it meaning and value (Canton, 2016; Jaffe, 2014). AI would bring out the potential of IoT by quickly and naturally collecting, analyzing, organizing, and feeding valuable data to key stakeholders, transforming the field from the standard IoT into the Internet of Learning-Things (IoLT) (Jaffe, 2014; Newman, 2016). However, this would mean a change in the infrastructure of the web to handle IoLT or IoT. Accordingly, Atzori (2010) listed some of the potential performance bottlenecks for IoT at the network level:

  • The vast number of internet-oriented devices will take up the last few IPv4 addresses, so there is a need to move to IPv6 to support all the devices that will come online soon. This is just one version of the indexing problem.
  • Things-oriented and internet-oriented devices could spend much of their time in sleep mode, which is not typical for current devices using the existing IP networks.
  • IoT devices, when connected to the internet, produce smaller packets of data at a higher frequency than current devices.
  • Each device would have to use the same common interface and standard protocols as other devices, which can quickly flood the network and increase the complexity of middleware software layer design.
  • IoT consists of vastly heterogeneous objects, where each device has its own function and its own way of communicating. There is a need to create a level of abstraction that homogenizes data transfer and data access through a standard process.
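
The address-space argument in the first bullet can be made concrete with a little arithmetic. The following is a minimal sketch using Python's standard ipaddress module; the device count is a hypothetical figure chosen only for illustration and does not come from the cited sources.

```python
import ipaddress

# Total number of addresses in each protocol's full address space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2**128

# Hypothetical number of connected devices (an assumption for illustration).
assumed_devices = 50_000_000_000

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:,}")
print(f"Devices per IPv4 address: {assumed_devices / ipv4_total:.1f}")
print(f"IPv6 addresses per device: {ipv6_total // assumed_devices:,}")
```

Under this assumed device count, IPv4 cannot give every device its own address, while IPv6 leaves an enormous surplus per device.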

One proposed solution is to use NoSQL (Not only Structured Query Language) databases to help with the collection, storage, and analysis of IoT data, which is heterogeneous, lacks a common interface with standard protocols, and comes in various sizes. This can solve one aspect of the indexing problem of IoT. NoSQL databases store data in non-relational structures, i.e., graph, document store, column-oriented, key-value, and object-oriented databases (Sadalage & Fowler, 2012; Services, 2015).

  • Document stores use key/value pairs whose values are documents stored in JSON, BSON, or XML
  • Graph databases use network structures to represent the relationships between items
  • Column-oriented databases are well suited to sparse datasets, where data is grouped together in columns rather than rows
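
To illustrate why these models suit heterogeneous IoT data, here is a minimal in-memory sketch of the document-store and column-oriented ideas. It uses plain Python dictionaries, not a real NoSQL product, and the device names and fields are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical readings from three device types; the field names are
# illustrative and not taken from any real device API.
readings = [
    {"device_id": "rfid-001", "type": "rfid", "tag": "A1B2", "zone": "dock-3"},
    {"device_id": "thermo-17", "type": "thermostat", "temp_c": 21.5},
    {"device_id": "cam-04", "type": "camera", "dwell_seconds": 42, "aisle": 7},
]

# Document-store view: key -> JSON document, with no shared schema required.
document_store = {r["device_id"]: json.dumps(r) for r in readings}

# Column-oriented view: one sparse column per field, keyed by device_id.
columns = defaultdict(dict)
for r in readings:
    for field, value in r.items():
        if field != "device_id":
            columns[field][r["device_id"]] = value


def get_document(device_id):
    """Return the stored JSON document for a device as a dict."""
    return json.loads(document_store[device_id])


print(get_document("thermo-17"))
print(dict(columns["temp_c"]))  # only devices that report a temperature
```

Each device keeps its own set of fields in the document view, while the column view stores a value only where a device actually reports that field, which is the sparse-data case mentioned above.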

Retail is currently using things-oriented RFID for inventory tracking and, when tags are installed on shopping carts, for tracking in-store foot traffic to understand customer wants (Mitchell, n.d.). Mitchell (n.d.) also suggested that video cameras and mobile device Wi-Fi traffic could help identify whether a customer wanted an item or a group of items by finding hotspots of dwelling time, so that store managers can optimize store layouts to improve flow and increase revenue. However, these retailers must account for the added data sources and have the supporting infrastructure in place to avoid performance bottlenecks if they are to reap the rewards of using IoT to make data-driven decisions.

Resources:

  • Atzori, L., Iera, A., & Morabito, G. (2010). The Internet of Things: A survey. Computer Networks, 54(15), 2787–2805.

Adv Topics: The architecture of the Internet

Introduction

Kelly (2007) stated that the internet sees 100 billion clicks per day and holds 55 trillion links. The internet is pervasive in human life: in 2012, 2.27 billion people used the internet, although only 1.7 billion people globally were actively engaged with it (Li, 2010; Sakr, 2014). An actively engaged user of the internet uses social media to watch videos and to share status updates and curated content, with the goal of engaging other users (Li, 2010). Cloud-based services rely on the internet to provide services like distributed processing, large-scale data storage, distributed transactions, data manipulation, etc. (Brookshear & Brylow, 2014; Sakr, 2014). Thus, the internet plays a vital role in enabling big data analysis and social engagement, despite its humble beginning as a way to share research projects between multiple research institutions in the 1960s (Brookshear & Brylow, 2014).

Internet Architecture

The internet has evolved into a socio-technical system. This evolution has come about in five distinct stages:

  • Web 1.0: Created by Tim Berners-Lee in the late 1980s, it was originally defined as a way of connecting static, read-only information hosted across multiple computational components, primarily for companies (Patel, 2013). The internet is a network of computational networks, and communication across it is governed by the TCP/IP protocol suite (Brookshear & Brylow, 2014). The web's architecture relies on three components: the Uniform Resource Identifier (URI), the Hypertext Transfer Protocol (HTTP), and the HyperText Markup Language (HTML) (Jacobs & Walsh, 2004).
    • A URI is a unique address, built from characters and numerical values under a convention agreed upon by the internet community, that allows an end-user to locate and retrieve information from any corner of the web (Brookshear & Brylow, 2014; Jacobs & Walsh, 2004; Patel, 2013). This unique addressing convention allows the internet to store massive amounts of information without URI collision, which occurs when the same URI is used to identify two different resources and which can confound search engines (Jacobs & Walsh, 2004). An example of a URI could be www.skyhernandez.com.
    • For a computer to locate this URI's information, which is hosted on a web server, a web browser like Google Chrome, Microsoft Edge, or Firefox uses HTTP to retrieve the information to be displayed (Brookshear & Brylow, 2014). The web browser sends an HTTP GET request for www.skyhernandez.com, by default over TCP port 80, and as long as the browser is allowed access to the information identified by the URI, the web server returns an HTTP response (for example, 200 OK) whose body contains the sought-after information (Jacobs & Walsh, 2004). In this sense, HTTP plays an important role in information access management.
    • Once the web server has sent the HTTP response containing the information stored at www.skyhernandez.com, the web browser must convert the information and display it on the computer screen. HTML is a <tag>-based notational code that helps a browser read the information sent by the web server and display it in a simple or rich data format (Brookshear & Brylow, 2014; Patel, 2013). The Content-Type header of the HTTP response identifies the content type and character encoding, and the <html> tag encloses the document itself (Jacobs & Walsh, 2004). The <head> and <title> tags carry metadata about the document, whereas the <body> tag contains the information of the URI, and <a> tags link to other relevant URIs.
  • Web 2.0: Changed the state of the internet from read-only to read/write and grew communities that hold a common interest (Patel, 2013). This version of the web led to more social interaction, giving people and content importance on the web, through the introduction of social media tools built as web applications (Li, 2010; Patel, 2013; Sakr, 2014). Web applications can include event-driven and object-oriented programming designed to handle concurrent activities for multiple users, and they have a graphical user interface (Connolly & Begg, 2014; Sandén, 2011). Key technologies include:
    • Weblogs (blogs), video logs (vlogs), and audio logs (podcasts) are all content in various styles that is published in descending time order, can be tagged with keywords for categorization, and is available when people need it (Li, 2010; Patel, 2013). These logs can be used for fact-finding when content is stored chronologically (Connolly & Begg, 2014).
    • Really Simple Syndication (RSS) is a web and data feed format that summarizes data in an XML file, an open standard format (Patel, 2013; Services, 2015). It is regularly used for data collection (Services, 2015).
    • Wikis are editable and expandable by those who have access to the data, and information can be restored or rolled back (Patel, 2013). Wiki editors ensure data quality and encourage participation from the community in providing meaningful content to the community (Li, 2010).
  • Web 3.0: This is the state of the web as of 2017. It involves the semantic web, which is driven by data integration through the use of metadata (Patel, 2013). This version of the web supports a worldwide database with static HTML documents, dynamically rendered data, the next HTML standard (HTML5), and links between documents, with hopes of creating interconnected, interrelated, and openly accessible world data such that tagged micro-content can be easily discovered through search engines (Connolly & Begg, 2014; Patel, 2013). HTML5 can handle multimedia and graphical content and introduces new tags like <section>, <article>, <nav>, and <header>, which are well suited to semantic content (Connolly & Begg, 2014). Also, end-users are beginning to build dynamic web applications for others to interact with (Patel, 2013). Key technologies include:
    • Extensible Markup Language (XML) is a tag-based metalanguage (Patel, 2013). Its tags are not limited to those defined by other people and can be created at the pace of the author rather than waiting for a standards body to approve a tag structure (UK Web Design Company, n.d.).
    • Resource Description Framework (RDF) is based on URI triples <subject, predicate, object>, which help describe data properties and classes (Connolly & Begg, 2014; Patel, 2013). An RDF data set typically declares its namespace at the top with a @prefix (Connolly & Begg, 2014). For instance,

@prefix s: <https://skyhernandez.com> .
<https://skyhernandez.com> s:Author <https://skyhernandez.wordpress.com/about/> .
<https://skyhernandez.wordpress.com/about/> s:Name "Dr. Skylar Hernandez" .
<https://skyhernandez.wordpress.com/about/> s:e-mail "dr.sky.hernandez@gmail.com" .
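
The <subject, predicate, object> structure of the example above can also be sketched in a few lines of Python. This is only an illustration with plain tuples, not an RDF library; the match helper is hypothetical, and treating the site URI as the subject of the first triple is an assumption.

```python
SITE = "https://skyhernandez.com"
ABOUT = "https://skyhernandez.wordpress.com/about/"

# Each triple is (subject, predicate, object). The subject of the first
# triple is an assumption made for this sketch.
triples = [
    (SITE, "s:Author", ABOUT),
    (ABOUT, "s:Name", "Dr. Skylar Hernandez"),
    (ABOUT, "s:e-mail", "dr.sky.hernandez@gmail.com"),
]


def match(pattern, data):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in data
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# Follow the links: site -> author resource -> author's name.
author = match((SITE, "s:Author", None), triples)[0][2]
name = match((author, "s:Name", None), triples)[0][2]
print(name)
```

Because every statement has the same three-part shape, following a link from one resource to another is just a second pattern match, which is what makes tagged micro-content easy to connect.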

  • Web 4.0: It is considered the symbiotic web, where data interactions occur between humans and smart devices, the Internet of Things (IoT) (Atzori, 2010; Patel, 2013). These smart devices can be wired to the internet or connected via wireless sensors through enhanced communication protocols (Atzori, 2010). Thus, these smart devices would have read and write concurrency with humans, and the largest potential of Web 4.0 lies in having these smart devices analyze data online and begin to migrate the online world into the real world (Patel, 2013). Besides interacting with the internet and the real world, IoT smart devices would be able to interact with each other (Atzori, 2010). Sakr (2014) stated that this web ecosystem is built on four key items:
    • Data devices, the multiple sources that generate the data
    • Data collectors, the devices or people that collect the data
    • Data aggregators, which combine data from the IoT, people, RFID tags, etc.
    • Data users and data buyers, the people who derive value from the data
  • Web 5.0: Previous iterations of the web do not perceive people's emotions, but one day the web could be able to understand a person's emotional state (Patel, 2013). Kelly (2007) predicted that in 5,000 days the internet would become one machine and all other devices would be a window into this machine. In 2007, Kelly stated that this one machine, "the internet," had the processing capability of one human brain, but that in 5,000 days it would have the processing capability of all of humanity.
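
The Web 1.0 cycle of URI, HTTP, and HTML described in the list above can be sketched end to end with Python's standard library. This is a minimal illustration, not how any particular browser works: the page content, the handler class, and the parser class are all hypothetical, and a throwaway local server stands in for a real web server so that no network access is needed.

```python
import http.client
import threading
from html.parser import HTMLParser
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page served by the demo server.
PAGE = (b"<html><head><title>Demo page</title></head>"
        b"<body><p>Hello</p><a href='/about'>About</a></body></html>")


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The server answers a GET request with a response, not a POST or PUT.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


class TitleAndLinks(HTMLParser):
    """Collect the <title> text and every <a href> value."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def fetch_and_parse():
    """Serve PAGE locally, GET it over HTTP, and parse the returned HTML."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        status = response.status
        content_type = response.getheader("Content-Type")
        body = response.read().decode("utf-8")
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    parser = TitleAndLinks()
    parser.feed(body)
    return status, content_type, parser.title, parser.links


if __name__ == "__main__":
    print(fetch_and_parse())
```

The status code and Content-Type header travel in the HTTP response, while the title and links are recovered from the HTML body, mirroring the division of labor between the three components.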

Performance Bottlenecks & Root Causes

As Web technology evolves, a trend emerges in the performance bottlenecks that affect access to large-scale Web data. A bottleneck occurs when the flow of information is slowed down or stopped entirely, which can cause a bad end-user experience (TechTarget, 2007; Thomas, 2012). Performance bottlenecks can cause an application to perform below expectations (Thomas, 2012). As the internet evolved, new performance bottlenecks began to appear:

  • Web 1.0: When the same URI identifies two different resources, it can confound search engines (Jacobs & Walsh, 2004); a related addressing pressure, the shrinking pool of IPv4 addresses, motivates the move from IPv4 to IPv6. Transfer protocols rely on network devices and settings such as network interface cards, firewalls, cables, tight security, load balancers, and routers, all of which affect the flow of information (bandwidth) (Jacobs & Walsh, 2004; Thomas, 2012). Finally, because HTTP information is pulled from the web server, low-capacity computers, broken links, tight security, and poor configurations can result in HTTP errors (4xx, 5xx), many open connections, memory leaks, lengthy queues, extensive data scans, database deadlock, etc. (Thomas, 2012).
  • Web 2.0: Web applications place heavier demands on the database because there are more read and write transactions (Sakr, 2014), in addition to all the performance bottlenecks from Web 1.0.
  • Web 3.0: Searching for information in the data is tough and time-consuming without a computer processing application (UK Web Design Company, n.d.). Data is tied to the logic and language, as with HTML, and there is no ready-made browser for simply exploring the data, so HTML or other software may be required to process it (Brewton, Yuan, & Akowuah, 2012; UK Web Design Company, n.d.). Syntax and tags are redundant, which can consume huge amounts of bytes and slow down processing speeds (Hiroshi, 2007).
  • Web 4.0: IoT creates performance bottlenecks, but it also has its own issues if it is left on its own and not tied to anything else (Jaffe, 2014; Newman, 2016): (a) the devices cannot deal with the massive amounts of data generated and collected; and (b) the devices cannot learn from the data they generate and collect. Finally, there are also potential infrastructure performance bottlenecks for IoT (Atzori, 2010): (a) the huge number of internet-oriented devices will take up the last few IPv4 addresses; (b) things-oriented and internet-oriented devices could spend much of their time in sleep mode, which is not typical for current devices using the current IP networks; (c) IoT devices, when connected to the internet, produce smaller packets of data at a higher frequency than current devices; (d) each device would have to use the same common interface and standard protocols as other devices, which can easily flood the network and increase the complexity of middleware software layer design; and (e) IoT consists of vastly heterogeneous objects, where each device has its own function and its own way of communicating.
  • Web 5.0: Given the assumptions about what this version of the web would become, a possible performance bottleneck would be the amount of resources consumed to keep the web operating.
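
The Web 3.0 point about redundant syntax and tags can be illustrated by serializing the same record two ways and comparing byte counts. This is a small sketch with a hypothetical sensor reading; real overhead depends on the schema and the data.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sensor reading; the field names are illustrative only.
reading = {"device_id": "thermo-17", "temp_c": "21.5", "humidity": "40"}

# XML: every field name appears twice, in the opening and closing tags.
root = ET.Element("reading")
for field, value in reading.items():
    ET.SubElement(root, field).text = value
xml_bytes = ET.tostring(root, encoding="unicode").encode("utf-8")

# Compact JSON: every field name appears once.
json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# Bytes taken by the values alone, ignoring all markup.
payload_bytes = sum(len(v.encode("utf-8")) for v in reading.values())

print("XML bytes:    ", len(xml_bytes))
print("JSON bytes:   ", len(json_bytes))
print("Payload bytes:", payload_bytes)
```

For this record the markup outweighs the values themselves in both formats, and the repeated closing tags make the XML form the larger of the two.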

Overall, Kelly (2007) stated that there are 100 billion clicks per day, 55 trillion links, 2 million emails per second, 1 million IM messages, etc. Big data will therefore impact the performance of the web, primarily at the web server: the increasing size of information, combined with a potential lack of bandwidth, can slow down throughput (Thomas, 2012).

High-level strategies to mitigate

The web should evolve with time to keep up with the demands and needs of society, and it is predicted to change significantly. What the web evolves into is yet to be seen, but it will be quite unique. However, with multiple heterogeneous types of devices (IoT) trying to connect to the web, a standardized protocol and interface to the web should be adopted (Atzori, 2010). A move from IPv4 to IPv6 must also happen to accommodate the massive number of IoT devices that are expected to generate data and connect to the web, and with them the opportunity to gain more data.

Given Kelly's (2007) centralized view of the web system, the information and data are nevertheless stored in a distributed fashion, and better algorithms are needed to connect relevant data and information more efficiently. Data storage is cheap, but at the rate of data creation by IoT and other sources, processing speeds must increase through parallel and distributed processing to take advantage of this explosion of big data on the web. This is the gap created by data collection outpacing the speed of data processing and analysis. Given this gap, a data scientist should prioritize which subset of the data is valuable enough to analyze to solve certain problems. This is a useful workaround; however, it at times ignores the full data set that was generated by a system. It is not enough to analyze only the parts of the data deemed valuable, because another part of the data may reveal more insight (the "whole is greater than the sum of its parts" argument).
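
The idea of closing this gap with parallel processing can be sketched as a split, summarize, and combine pattern. The example below is a minimal illustration with hypothetical sensor values; a thread pool stands in for distributed workers, so it shows the structure of the computation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Map step: summarize one chunk as (sum, count)."""
    return sum(chunk), len(chunk)


def parallel_mean(values, chunk_size=1000, workers=4):
    """Split values into chunks, summarize them concurrently, then combine."""
    if not values:
        raise ValueError("values must not be empty")
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)   # reduce step
    count = sum(n for _, n in partials)
    return total / count


readings = list(range(10_000))  # hypothetical sensor values
print(parallel_mean(readings))
```

Because each chunk is summarized independently, the full data set can be covered rather than a hand-picked subset, and the same pattern carries over to workers on separate machines.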

NoSQL database types and extensible markup languages have enhanced how information and data are related to each other, but standardization, and perhaps an automated 1:1 mapping of the same data represented in different NoSQL databases, may be needed to gain further insights faster. Web application developers should also be more resource-conscious, trying to get more computational results from fewer resources.

Resources:

  • Atzori, L., Iera, A., & Morabito, G. (2010). The Internet of Things: A survey. Computer Networks, 54(15), 2787–2805.
  • Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.). Pearson Learning Solutions. VitalBook file.
  • Sandén, B. I. (2011). Design of Multithreaded Software: The Entity-Life Modeling Approach. Wiley-Blackwell. VitalBook file.
  • Sakr, S. (2014). Large Scale and Big Data (1st ed.). VitalBook file.
  • Services, EMC E. (2015). Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. John Wiley & Sons P&T. VitalBook file.