Introduction to Emerging Technologies PDF

Summary

This document provides an introduction to emerging technologies covered in a course offered at Addis Ababa University's School of Information Science. It includes sections dedicated to data science, artificial intelligence, internet of things, augmented reality, and other emerging fields, like nanotechnology and biotechnology. The document is a textbook-style overview.

Full Transcript


Addis Ababa University
College of Natural Science
School of Information Science
Introduction to Emerging Technology
October 2019

Table of Contents

1. Chapter One: Introduction to Emerging Technologies
   1.1 Revolution of Technologies
   1.2 Role of data for emerging Technologies
   1.3 Enabling device and networks for emerging technologies
   1.4 Human to Machine Interface
   1.5 Future Trend in Emerging Technologies
2. Chapter Two: Overview for Data Science
   2.1 An Overview of Data Science
   2.2 What is data and information
   2.3 Data types and its representation
   Data value Chain
   2.4 Basic concepts of big data
3. Chapter Three: Introduction to Artificial Intelligence (AI)
   3.1 An overview of AI
   3.3 What is AI
   3.4 History of AI
   3.5 Levels of AI
   3.6 Types of AI
   3.7 Applications of AI
       Agriculture
       Health
       Business (Emerging market)
       Education
   3.8 AI tools and platforms (e.g., Scratch / object tracking)
   3.9 Sample application with hands-on activity (simulation based)
4. INTERNET OF THINGS (IoT)
   4.1 Overview of IoT
       4.1.1 What is IoT?
       4.1.2 History of IoT
       4.1.3 Advantages of IoT
       4.1.4 Challenges of IoT
   4.2 How IoT works
       4.2.1 Architecture of IoT
       4.2.2 Devices and network
       4.2.3 IoT trends
       4.2.4 IoT networks
       4.2.5 IoT device examples and applications
       4.2.6 A Networks and Devices Management Platform for the Internet of Things
   4.3 Applications of IoT
       4.3.1 Smart home
       4.3.2 Smart grid
       4.3.3 Smart city
       4.3.4 Wearable devices
       4.3.5 Smart farming
   4.4 IoT platforms and development tools
       4.4.1 IoT platforms
       4.4.2 IoT development tools
5. AUGMENTED REALITY
   5.1 Introduction to AR
   5.2 Virtual Reality (VR), Augmented Reality (AR) vs Mixed Reality (MR)
       5.2.1 WHAT IS VR, AR, AND MR?
       5.2.2 WHAT ARE THE DIFFERENCES BETWEEN VR, AR, AND MR?
   5.3 The architecture of AR systems
       5.3.1 AR Components
       5.3.2 AR Architecture
       5.3.3 AR System client/server communication
       5.3.4 AR System clients
       5.3.5 AR System database server
       5.3.6 AR Devices
   5.4 Application of AR systems (education, medical, assistance, entertainment); workshop-oriented hands-on demo
6. ETHICS AND PROFESSIONALISM OF EMERGING TECHNOLOGIES
   6.1 Technology and ethics
       6.1.1 GENERAL ETHICAL PRINCIPLES
       6.1.2 PROFESSIONAL RESPONSIBILITIES
       6.1.3 PROFESSIONAL LEADERSHIP PRINCIPLES
   6.2 Digital privacy
   6.3 Accountability and trust
   6.4 Threats and challenges
       6.4.1 Ethical and regulatory challenges
       6.4.2 Threats
7. Other emerging technologies
   7.1 Nanotechnology
       7.1.1 How it started
       7.1.2 Fundamental concepts in nanoscience and nanotechnology
       7.1.3 Applications of nanotechnology
   7.2 Biotechnology
       7.2.1 History
       7.2.2 Application of biotechnology
   7.3 Blockchain technology
   7.4 Cloud and quantum computing
   7.5 Autonomic computing (AC)
       7.5.1 Characteristics of Autonomic Systems
   7.6 Computer vision
       7.6.1 Definition
       7.6.2 How computer vision works
   7.7 Embedded systems
   7.8 Cybersecurity
       7.8.1 Definition
       7.8.2 Cybersecurity checklist
       7.8.3 Types of cybersecurity threats
       7.8.4 Benefits of cybersecurity
       7.8.5 Cybersecurity vendors
   7.9 Additive manufacturing (3D Printing)
       7.9.1 3D Printing: It's All About the Printer
       7.9.2 Additive Manufacturing: A Bytes-to-Parts Supply Chain

1. Chapter One: Introduction to Emerging Technologies

1.1 Revolution of Technologies

The Britannica dictionary defines revolution as "in social and political science, a major, sudden, and hence typically violent alteration in government and in related associations and structures". The term is used by analogy in expressions such as the Industrial Revolution, where it refers to a radical and profound change in economic relationships and technological conditions. From this definition, we can see that a revolution is a radical and profound change in economic relationships and technological conditions. Accordingly, we can divide the technological revolutions that have profoundly changed human life into four stages: the agricultural revolution (first revolution), the industrial revolution (second revolution), the information revolution (third revolution), and the knowledge revolution (fourth revolution), which is still to come. The first three revolutions have already passed.
However, the fourth revolution is arriving now, and its impact will be bigger than that of any revolution we have experienced in the past. The younger generations will inherit this fourth revolution and have to live within it, which means we have to prepare for it from primary school through university. The fourth revolution can also be called the smart revolution, because the society of the future will be a smart society built on new knowledge, ICT, AI, and related technologies.

Figure 1: The history of the industrial revolutions and their representative technologies

This material deals with what we have to prepare and how we have to teach and learn for the future. Figure 1 shows the pattern of each revolution, its ICT paradigm, and its representative technologies. The characteristics of each revolution are described in the sections that follow.

Agriculture revolution

The agricultural revolution began roughly 16,000 years ago. Before it, human beings made a living by hunting and by gathering fruit, vegetables, and other food. People had to keep moving to find something to eat, and in cold weather, heavy rain, or other difficult conditions they sometimes could not obtain food at all. Even after settling in one place, food was always in short supply because of poor seed, the lack of agricultural technology, or limited land. This way of life continued until the first agricultural revolution, which ended it by allowing people to plant and raise crops in one place.

Timeline of the agricultural revolutions
- First agricultural revolution: the period of transition from a hunting-and-gathering society to one based on stationary farming. Generally known as the Neolithic Revolution, it took place about 12,000 years ago; exact dates are difficult to give because these were separate revolutions that happened before written records. The Neolithic Revolution allowed for a steady food supply and more permanent homes; it was, in effect, the invention of agriculture. The technology developed in this period includes simple metal tools to cultivate the land.
- Second agricultural revolution: this revolution went hand in hand with the Industrial Revolution in the 18th and early 19th centuries. New technology was introduced to agriculture for mass crop production. Farmers were no longer limited to small farms, and commercial farming became an idea worth exploring. A growing population and the Industrial Revolution sparked the need for this change. The technology developed in this revolution includes the seed drill, which enabled farmers to plant in rows easily, as well as new fertilizers and artificial feed.
- Third agricultural revolution: the Green Revolution was the introduction of advanced technology and agricultural practices to make farms more efficient. It was sparked by the growing awareness that the Earth's land and resources are not limitless and that farms could not keep expanding outward, so existing land had to be used more efficiently.
- Fourth agricultural revolution: current agriculture is changing because of AI and ICT, which are being introduced to analyze and adjust humidity and temperature regardless of weather conditions. Today's agriculture is also not competitive without recognizing customers' tastes.
Figure 2: The agricultural revolutions. This area, too, is being changed by AI and ICT in the fourth revolution.

Industrial revolution

The industrial revolution took place during the late 1700s and early 1800s. It began in Great Britain and quickly spread throughout the world. Agricultural societies became more industrialized and urban. The railroad, the cotton gin, electricity, and other inventions permanently changed society (Figure 3). This revolution affected social, cultural, and economic conditions. It left a profound impact on how people lived and how businesses operated, and it gave rise to capitalism and the modern cities of today.

By the mid-18th century, population growth and increasing foreign trade had created greater demand for manufactured goods. Mass production was achieved by replacing water and animal power with steam power, and industrialization was accelerated by the invention of new machinery and technology. James Watt's improvements to the steam engine and Matthew Boulton's work on the rotative engine were crucial for industrial production: their machinery could run much faster, with rotary motion and without human power. Coal became a key factor in the success of industrialization, since it was used to produce the steam power on which industry depended. Improvements in mining technology ensured that more coal could be extracted to power the factories and to run railway trains and steamships. Britain's cotton and metalworking industries became internationally important.

Figure 3: The industrial revolution. This area, too, is being changed by AI and ICT in the fourth revolution.

Information revolution

The information revolution began with the invention of broadcast technology in 1922 (Figure 4). However, the revolution proper was wrought by computer technology, storage devices, and easier access to information from the mid-1980s onward. Large amounts of information could be stored on devices and manipulated over computer networks, and these technologies allowed storage and instant retrieval from anywhere in the world at high speed. During this revolution, individuals could easily communicate with each other worldwide and share information over the same computer networks.

The information revolution was driven by three factors. First, information-based occupations grew throughout the 20th century; almost all office work dealt with information, which produced a latent demand for more efficient storage and processing systems. Second, that demand was met by the advent of cheap personal computers in the 1980s and 1990s, which followed the development of the microprocessor in the 1970s. Previously, computer technology had been so expensive that it could only be used by large organizations for special purposes; now it was cheap enough that cost was no longer a significant issue. Cheap personal computers with user-friendly operating systems spread information widely and enabled vastly more people to make direct and convenient use of computerized information. The third factor was the Internet, which made a crucial contribution from the early 1990s: a global computer network that could connect information providers and information consumers anywhere in the world.

The information revolution has already had major effects on both business and personal life. It allows many personal and business networks to be connected quickly over the Internet, and people can communicate worldwide via e-mail and other Internet-based social networks.
Jobs have declined in areas such as banking, real estate, and interviewing, while new professions and businesses have been created, such as web design, IT services, and content production. At the same time, this has led to concerns about shifting economic and social patterns, as well as concerns over privacy and security that still need to be addressed. The information revolution is still in its early stages. It will have an even broader impact, and more of its opportunities will emerge in the fourth industrial revolution together with AI, biotechnology, and related fields.

Figure 4: The information revolution directly influences the fourth industrial revolution.

The 4th Industrial revolution

The Fourth Industrial Revolution (March 2017, ISBN-10: 9781524758868) was written by Professor Klaus Schwab, founder and executive chairman of the World Economic Forum. In it, he describes the enormous potential of the technologies of the Fourth Industrial Revolution as well as the possible risks. He writes, "The changes are so profound that, from the perspective of human history, there has never been a time of greater promise or potential peril. My concern, however, is that decision-makers are too often caught in traditional, linear (and non-disruptive) thinking or too absorbed by immediate concerns to think strategically about the forces of disruption and innovation shaping our future."

The Fourth Industrial Revolution describes the exponential changes to the way we live, work, and relate to one another due to the adoption of cyber-physical systems, the Internet of Things (IoT), big data, artificial intelligence (AI), and their combined technologies. As we implement smart technologies in our factories and workplaces, connected machines will interact, visualize the entire production chain, and make decisions autonomously, from the home to the car. This revolution is expected to impact all disciplines, industries, social patterns, and economies. While in some ways it is an extension of the computerization of the third industrial revolution, the velocity, scope, and systemic impact of its changes set it apart: the Fourth Industrial Revolution is disrupting almost every industry in every country and creating massive change in a non-linear way at unprecedented speed. Its impact will be bigger than that of the previous revolutions, and it will hit developing and advanced countries at the same time. Countries that do not prepare will be left behind socially and economically.

Figure 5: Expected areas of the fourth industrial revolution (EU committee report, 2016)

Figure 6: Related topics for preparation for the fourth industrial revolution

1.2 Role of data for emerging Technologies

We are living in the age of big data. Data is regarded as the new oil and a strategic asset; it drives, or even determines, the future of science, technology, the economy, and possibly everything in our world today and tomorrow. Data has not only triggered tremendous hype and buzz but, more importantly, presents enormous challenges that in turn bring incredible innovation and economic opportunities. This reshaping and paradigm shift is driven not just by data itself but by all the other things that can be created, transformed, or adjusted by understanding, exploring, and utilizing data.
The preceding trend and its potential have triggered new debate about data- intensive scientific discovery as a emerging technologies, the so-called “fourth industrial revolution,” There is no doubt, nevertheless, that the potential of data science and analytics to enable data-driven theory, economy, and professional development is increasingly being recognized. This involves not only core disciplines such as computing, informatics, and statistics, but also the broad-based fields of business, social science, and health/medical science. 12 Figure 7 Data Science Domains and Disciplines Data Science Disciplinary The art of data science has attracted increasing interest from a wide range of domains and disciplines. Accordingly, communities or proposers from diverse backgrounds, with contrasting aspirations, have presented very different views or foci. Some examples are that data science is the new generation of statistics, is a consolidation of several interdisciplinary fields, or is a new body of knowledge. Data science also has implications for providing capabilities and practices for the data profession, or for generating business strategies. Statisticians have had much to say about data science, since it is they who actually created the term “data science” and promoted the upgrading of statistics to data science a a broader discipline. Intensive discussions have taken place within the research and academic community about creating data science as an academic discipline. This involves not only statistics, but also a multidisciplinary body of 13 knowledge that includes computing, communication, management, and decision. The concept of data science is correspondingly defined from the perspective of disciplinary and course development: for example, treating data science as a mixture of statistics, mathematics, computer science, graphic design, data mining, human-computer interaction, and information visualization. In contrast to big data that has been driven by data-oriented business and private enterprise, researchers and scientists also play a driving role in the data science agenda. Migrating from the original push in the statistics communities, various disciplines have been involved in promoting the disciplinary development of data science. The aim is to manage a growing gap between our awareness of that information and our understanding of it. In addition to the promotion activities in core analytics disciplines such as statistics, mathematics, computing, and artificial intelligence, the extended recognition and undertaking of domain- specific data science seems to repeat the evolutionary history of the computer and computer-based applications. 1.3 Enabling device and networks for emerging technologies In the world of digital electronic systems, there are four basic kinds of devices: memory, microprocessors, logic and networks. Memory devices store random information such as the contents of a spreadsheet or database. Microprocessors execute software instructions to perform a wide variety of tasks such as running a word processing program or video game. Logic devices provide specific functions, including device-to-device interfacing, data communication, signal processing, data display, timing and control operations, and almost every other function a system must perform. Programmable logic refers to a general class of devices which can be configured to perform a variety of logic functions. 
The devices range from simple PROM, programmable read-only memory, devices which can implement simple combinatorial logic, to PAL's, programmable array logic, to FPGA's, field programmable gate arrays. All these devices share the feature that they are programmed to perform specific functions. Programmable logic as we know it today started with devices known as Programmable Array Logic. These devices get their name from the programmable AND array which is part of the device. These devices have pins which can be programmed to be logic inputs. Each pin will be one logic variable. Each output can be programmed to be active for a particular sum of products of the input terms. In technology, 'networking' is connecting a system of computers to share information. Computer networks are very essential to todays globalization as the world evolves to an advanced planet in Information Technology. Internet is just that type of service, and new ideas and better systems are being developed 14 everyday. One of the key contributing factors of the Information Technology rise in the world is network and data communication because technology’s advancement is not only on the gadgets but the system as well. Programmable Logic Device Logic devices can be classified into two broad categories - fixed and programmable.  Fixed Logic Devices As the name suggests, the circuits in a fixed logic device are permanent, they perform one function or set of functions - once manufactured, they cannot be changed. With fixed logic devices, the time required to go from design, to prototypes, to a final manufacturing run can take from several months to more than a year, depending on the complexity of the device. And, if the device does not work properly, or if the requirements change, a new design must be developed  Programmable Logic Devices A programmable logic device(PLD) is an electronic component used to build reconfigurable digital circuits. Unlike a logic gate, which has a fixed function, a PLD has an undefined function at the time of manufacture. Before the PLD can be used in a circuit it must be programmed, that is, reconfigured by using a specialized program. Simple programmable logic devices (SPLD) are the simplest, smallest and least-expensive forms of programmable logic devices. SPLDs can be used in boards to replace standard logic components (AND, OR, and NOT gates), such as 7400-series TTL. Figure 8 PLDs family tree with process technology 15 They typically comprise 4 to 22 fully connected macrocells. These macrocells typically consist of some combinatorial logic (such as AND OR gates) and a flip-flop. In other words, a small Boolean logic equation can be built within each macrocell. This equation will combine the state of some number of binary inputs into a binary output and, if necessary, store that output in the flip-flop until the next clock edge. Of course, the particulars of the available logic gates and flip- flops are specific to each manufacturer and product family. But the general idea is always the same. Most SPLDs use either fuses or non-volatile memory cells (EPROM, EEPROM, FLASH, and others) to define the functionality. On the other hand PLDs are standard, off-the-shelf parts that offer customers a wide range of logic capacity, features, speed, and voltage characteristics - and these devices can be changed at any time to perform any number of functions. With programmable logic devices, designers use inexpensive software tools to quickly develop, simulate, and test their designs. 
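As an illustration of that kind of software-based design and simulation, the following is a minimal Python sketch of a single PAL-style sum-of-products macrocell as described above. The particular product terms and signal names are illustrative assumptions, not the configuration of any real device.

```python
# Minimal software model of a PAL-style sum-of-products macrocell.
# The product terms below are illustrative; a real device is configured
# by programming its AND array and output flip-flop.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Macrocell:
    # Each product term is a function of the inputs returning True/False.
    product_terms: List[Callable[[dict], bool]]
    registered: bool = False          # if True, the output is stored in a flip-flop
    _ff_state: bool = field(default=False, init=False)

    def combinational_output(self, inputs: dict) -> bool:
        # OR together the programmed product (AND) terms.
        return any(term(inputs) for term in self.product_terms)

    def clock(self, inputs: dict) -> bool:
        # On a clock edge, latch the combinational result into the flip-flop.
        value = self.combinational_output(inputs)
        if self.registered:
            self._ff_state = value
            return self._ff_state
        return value

# Example configuration: output = (a AND b) OR (NOT a AND c)
cell = Macrocell(
    product_terms=[
        lambda x: x["a"] and x["b"],
        lambda x: (not x["a"]) and x["c"],
    ],
    registered=True,
)

print(cell.clock({"a": True, "b": True, "c": False}))   # True
print(cell.clock({"a": False, "b": True, "c": False}))  # False
```

Changing the design amounts to swapping in a different list of product terms, which mirrors how a PLD is simply reprogrammed rather than rewired.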
Then, a design can be quickly programmed into a device, and immediately tested in a live circuit. The PLD that is used for this prototyping is exactly the same PLD that will be used in the final production of a piece of end equipment, such as a network router, a DSL modem, a DVD player, or an automotive navigation system. There are no NRE costs and the final design is completed much faster than that of a custom, fixed logic device. Another key benefit of using PLDs is that during the design phase customers can change the circuitry as often as they want until the design operates to their satisfaction. That's because PLDs are based on re-writeable memory technology - to change the design, simply reprogram the device. Once the design is final, customers can go into immediate production by simply programming as many PLDs as they need with the final software design file. These devices are also known as:  Programmable array logic (PAL)  Generic array logic (GAL)  Programmable logic arrays (PLA)  Field-programmable logic arrays (FPLA)  Programmable logic devices (PLD) PLDs are often used for address decoding, where they have several clear advantages over the 7400-series TTL parts that they replaced: One chip requires less board area, power, and wiring than several do. The design inside the chip is flexible, so a change in the logic does not require any rewiring of the board. Rather, simply replacing one PLD with another part that has been programmed with the new design can alter the decoding logic. Programmable Logic Devices (PLDs) are digital devices with configurable logic and flip-flops linked together with programmable interconnect. Logic devices provide specific functions, including: 16 Device-to-device interfacing  Data communication  Signal processing  Data display  Timing  Control operations  Almost every other function a system must perform Figure 9 Programmable Logic Devices Networking Computer networks are very essential to todays globalization as the world evolves to everything connecting in Information Technology. One of the key contributing factors of the Information Technology rise in the world is network and data communication because technology’s advancement is not only on the gadgets but the system as well. History of Computer Networks Networking started long ago by ARPANET. When Russia launched their SPUTNIK Satellite in Space in 1957.The American started an agency named Advance Research Project Agency (ARPA) and launched their 1st satellite within 18 month after establishment. Then sharing of the information in another computer they use ARPANET. In the 1960s, computer networking was essentially synonymous with mainframe computing and telephony services and the distinction between local and wide area networks did not yet exist. Mainframes were typically “networked” to a series of dumb terminals with serial connections running on RS-232 or some other electrical interface. Then in 1969, ARPANET comes in INDIA and INDIAN switched this name to NETWORK. Development of the network began in 1969, based on designs developed during the 1960s. The ARPANET evolved into the modern Internet. If a terminal in one city needed to connect with a mainframe in another city, a 300-baud long-haul modem would use the existing analog Public Switched 17 Telephone Network (PSTN) to form the connection. The technology was primitive indeed, but it was an exciting time nevertheless. 
The quality and reliability of the PSTN increased significantly in 1962 with the introduction of pulse code modulation (PCM), which converted analog voice signals into digital sequences of bits. DS0 (Digital Signal Zero) became the basic 64-Kbps channel, and the entire hierarchy of the digital telephone system was soon built on this foundation. When the backbone of the Bell system became digital, transmission characteristics improved due to higher quality and less noise. This was eventually extended all the way to local loop subscribers using ISDN. The first commercial touch-tone phone was also introduced in 1962. In the 1980s, the growth of client/server LAN architectures continued while that of mainframe computing environments declined. However, the biggest development in the area of LAN networking in the 1980s was the evolution and standardization of Ethernet. While the DIX consortium worked on standard Ethernet in the late 1970s, the IEEE began its Project 802 initiative, which aimed to develop a single, unified standard for all LANs. Figure 10 Evolution of Networking Technology The development of the Network File System (NFS) by Sun Microsystems in 1985 resulted in a proliferation of diskless UNIX workstations with built-in Ethernet interfaces that also drove the demand for Ethernet and accelerated the deployment of bridging technologies for segmenting LANs. Also around 1985, increasing numbers of UNIX machines and LANs were connected to ARPANET, which until that time had been mainly a network of mainframe and minicomputer systems. The first UNIX implementation of TCP/IP came in v4.2 of Berkeley’s BSD UNIX, from which other vendors such as Sun Microsystems quickly ported their versions of TCP/IP. The network processor trend goes back to the days of the Internet boom in the late 1990s. It was launched with all the hype surrounding anything related to the Internet as “the new technology on the block”. As 18 usual with a new technology, marketing people promised a new revolution in sight and it resulted in tens of startup companies dedicated to this area. Several applications were envisioned for it at different layers of the network architecture. As time went by, not all the high expectations were realized and the bubble burst out as the Internet bubble itself. The high demand for increased processing speed (as a result of communication speed surpassing processing speed) and the demand for adaptability (as a result of convergence of voice and data networks) and the prospect of whole new set of emerging services added to the need for a new paradigm in network devices. High level of programmability was sought to support new services and protocols but at very high performance. Besides, because of faster change of pace, short time-to-market and longer product life-time were other important factors driving the concept of network processor. In the 1980s, general or normal processors were used for networking process which was quite slow and took longer period to load. But later on the processor changed and now the networking processors are different and are made in such a way to boost the networking in any way possible. Back in the late 80s, they used specialized softwares to configure networks. It was then they started using Microsoft’s Windows Server application. The software was then upgraded to 2003 Server, 2008 Server and the latest is 2010 Server by Microsoft. 
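Returning to the PCM figures mentioned earlier in this section, the 64-Kbps DS0 rate follows directly from the standard sampling parameters for telephone voice; the sketch below walks through that arithmetic and, as one example of the digital hierarchy built on DS0, the T1 line rate. The parameters are the commonly cited standard values, stated here as assumptions.

```python
# Worked arithmetic behind the DS0 rate discussed above.
# Assumes the standard PCM parameters: voice sampled 8,000 times per
# second, with 8 bits per sample.

sample_rate_hz = 8_000      # samples per second for telephone-quality voice
bits_per_sample = 8         # 8-bit PCM

ds0_bps = sample_rate_hz * bits_per_sample
print(f"DS0 channel rate: {ds0_bps} bit/s")     # 64000 bit/s = 64 Kbps

# A T1 line multiplexes 24 DS0 channels plus 8 Kbps of framing overhead.
t1_bps = 24 * ds0_bps + 8_000
print(f"T1 line rate: {t1_bps} bit/s")          # 1544000 bit/s = 1.544 Mbps
```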
The development environment includes the Internet Exchange Architecture Software Development Kit (IXA SDK) which provides easy-touse graphical simulation environment for developing, debugging, and optimizing a network application. The other advantage of Intel SDK is that it has preserved the programming environment so the developers can easily migrate from the older products to the new one and only be concerned about the new features and tools provided by the new product. Software Defined Networking, or SDN, is cloud-based software that allows for management of the network from one central point. The key is virtualization, which makes it so software can run separately from hardware. It would be automatically responsive, and information technology personnel could view all problems from one location and have a much easier time troubleshooting. The Security is a very important thing in networking world. Security purpose should be undertaken because without security, through networking one can hack through any information and alter the information for their own purpose. During the 1980s, the security that was used was not much but it was more than enough because during that time networking was a closed design. They used the network in the organization itself and used firewalls to prevent hackers from hacking. Nowadays there are a lot of security measures that can be chosen. The networking industries have evolved enormously from the late 1980s till today. The hardware have been upgraded and the software has changed a lot. 19 Future Trend of Networks As corporate bandwidth requirements continue to surge exponentially with every passing year, it becomes clear that bandwidth demands as well as the business requirements of the modern digital workspace are setting the stage for the implementation of new, advanced technologies. These technologies give rise to fresh possibilities and further fuel the demand for adding intelligent systems to our daily lives and greater reliance on tech support, both in the home and work fronts. With software trends emerging regularly in the IT scene, digital services and people are becoming further intertwined to characterize everything that’s new the world of network technology this year. These recent advancements are more than likely to disrupt existing operations and foster an era of digitization and intelligence throughout the business sector. Topics of future trends in networking technology include the following:  5G technology 5G technology serves to enhance not just the mobile device experience but the entirety of the communications tech environment. 5G will provide that by annexing radio spectrum that are 1 millisecond. This will allow further development in such arenas as driverless cars. Imagine movies downloaded in a matter of seconds. The greater bandwidth of 5G alongside support for extremely low latency will help fuel groundbreaking applications in the virtual and augmented reality and health-care industries. Look for the fields of vehicle- to-vehicle communication and tactile feedback remote surgery, both of which require strong cellular support to unlock their full potential. Moreover, IoT gadgets are growing fast in almost all verticals, such as business, retail, homes, industries, and others. 20 Figure 11 Bandwitdth Expansion and 5 G Technology 5G would be made possible by SDN and support a variety of devices and applications. One such application is virtual reality, or computer simulated, 3D images. 
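To put the "movies downloaded in a matter of seconds" claim above on a rough quantitative footing, here is a back-of-the-envelope sketch; both the movie size and the sustained 5G throughput figure are illustrative assumptions, not measurements.

```python
# Rough check of the download-time claim for 5G made above.
movie_size_gb = 5          # assumed size of an HD movie, in gigabytes
throughput_gbps = 1.0      # assumed sustained 5G throughput, in gigabits per second

movie_size_gbit = movie_size_gb * 8          # gigabytes -> gigabits
download_seconds = movie_size_gbit / throughput_gbps
print(f"Approximate download time: {download_seconds:.0f} s")   # about 40 s
```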
5G could also provide extended bandwidths to make gaming or accessing social networks more exciting using wearable, devices that communicate with the network, are mobile, and have the ability to be easily transported by wearing them. Another promising application is no-touch computing, in which we can speak or direct our computers to perform without use of a mouse or keyboard. Expect to see a huge surge in the number of systems connecting to always-on 5G networks in the next few years.  Network developments in edge computing Although cloud systems have made a big splash earlier, right now it’s all about cloud-to-the-edge technology. Edge computing networks are the solution for several computing hurdles, including low connectivity and overcrowded bandwidth. Edge computing promises solutions that decrease latency by keeping all computations close to system endpoints. Edge computing is one technological area that’s expanded greatly in the past few years. Combined with IoT and AI, it’s led to innovative methods, like using AI to secure IoT systems. And that’s not all: deep learning is being leveraged by datacenters to improve network speed and reduce the mass of transferred data at the edge of networks..  Rise of decentralization Regular architectures directed traffic onto the datacenter for centralizing Internet access and security. However, greater collaboration between suppliers and partners and extensive cloud usage has disputed 21 this model. The growth of direct cloud interconnection and cloud security services are encouraging many companies to adopt a decentralized approach for optimizing their connectivity to cloud platforms.  Changing perspectives on ML and AI  Machine learning(ML) and AI will continue to remain hot favorites among vendor marketing teams. Unfortunately, the majority of those teams will fail to properly understand what either of these technologies is as well as the right way to harness their potential. Nearly all network devices are currently instrumented, transmitting telemetry to large data lakes. However, our capacity to find true insights is still lacking. Like IoT, gathering data is easy, but it’s more challenging to convert that data into usable insights.  More attention to network security Network security is one of the key motivators for the rise of new IT services. With hackers and cybercriminals becoming more sophisticated, IT infrastructure is extending gradually into cloud-based, virtual platforms, leaving most client and company data exposed to security risks. Aside from implementing standard firewalls and monitoring user access, companies must implement stronger cyber security strategies that allow developers to consider innovative defense approaches. Because one of the potential drawbacks of new networking technology is the security gaps exposed to hackers. That’s why insightful cyber security measures are necessary.  Going wireless Advanced wireless technology as well as related security and management has led numerous companies to go wireless-first. Doing so eliminates the charges related to moves, additions and modifications to the fixed and wired LAN infrastructure. Moreover, it promises greater reliability and resilience. Cloud- specific “as-a-service” deployments and advanced monitoring tools and features are being deployed as well to provide increased performance insight and visibility.  Cloud repatriation Cloud repatriation is when apps move from the cloud back to on-premises. 
This indicates that datacenters have not lost their relevance yet. The majority of repatriation activity centers on businesses attempting to discover balance or equilibrium. This does not signify that the cloud is losing relevance; merely that is has been somewhat over-hyped.  Smart automation expands Companies often spend huge amounts on network automation so they don’t fall behind. Pointed solutions and manual scripting cannot scale to complement the considerable rise in network demands. Thus, expect to see a surge in smart and innovative network automation solutions that manage devices, ensure compliance across hybrid and on-premises deployments, and automate services. The upcoming generation of networks will feature machine learning and AI to ward off security challenges and network complexity.  Networking technology: Keep up with the changes 22 Communications and information technology are advancing rapidly and it is necessary for companies to consider which of the emerging technologies is ideal for their business. Implementing the right networking technology allows the organization to get the most benefits. 1.4 Human to Machine Interface The Association for Computing Machinery (ACM) defines human–computer interaction as "a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them". An important facet of HCI is user satisfaction (or simply End User Computing Satisfaction). "Because human– computer interaction studies a human and a machine in communication, it draws from supporting knowledge on both the machine and the human side. On the machine side, techniques in computer graphics, operating systems, programming languages, and development environments are relevant. On the human side, communication theory, graphic and industrial design disciplines, linguistics, social sciences, cognitive psychology, social psychology, and human factors such as computer user satisfaction are relevant. And, of course, engineering and design methods are relevant. Due to the multidisciplinary nature of HCI, people with different backgrounds contribute to its success. HCI is also sometimes termed human–machine interaction (HMI), man-machine interaction (MMI) or computer-human interaction (CHI). Humans interact with computers in many ways; the interface between humans and computers is crucial to facilitate this interaction. Desktop applications, internet browsers, handheld computers, and computer kiosks make use of the prevalent graphical user interfaces (GUI) of today. HMI is all about how people and automated systems interact and communicate with each other. That has long ceased to be confined to just traditional machines in industry and now also relates to computers, digital systems or devices for the IoT. More and more devices are connected and automatically carry out tasks. Operating all of these machines, systems and devices needs to be intuitive and must not place excessive demands on users. Smooth communication between people and machines requires interfaces: The place where or action by which a user engages with the machine. Simple examples are light switches or the pedals and steering wheel in a car: An action is triggered when you flick a switch, turn the steering wheel or step on a pedal. However, a system can also be controlled by text being keyed in, a mouse, touch screens, voice or gestures. 
Voice user interfaces (VUI) are used for speech recognition and synthesizing systems, and the emerging multi-modal and GUI allow humans to engage with embodied character agents in a way that cannot be achieved with other interface paradigms. The growth in human–computer interaction field has been in quality of interaction, and in different branching in its history. 23 Instead of designing regular interfaces, the different research branches have had a different focus on the concepts of multimodality rather than unimodality, intelligent adaptive interfaces rather than command/action based ones, and finally active rather than passive interfaces. [ Poorly designed human-machine interfaces can lead to many unexpected problems. A classic example is the Three Mile Island accident in USA, a nuclear meltdown accident, where investigations concluded that the design of the human-machine interface was at least partly responsible for the disaster. Similarly, accidents in aviation have resulted from manufacturers' decisions to use non-standard flight instrument or throttle quadrant layouts: even though the new designs were proposed to be superior in basic human-machine interaction, pilots had already ingrained the "standard" layout and thus the conceptually good idea actually had undesirable results. Figure 12 Technology Trends of Human Machine Interface The devices are either controlled directly: Users touch the smartphone’s screen or issue a verbal command. Or the systems automatically identify what people want: Traffic lights change color on their own when a vehicle drives over the inductive loop in the road’s surface. Other technologies are not so much there to control devices, but rather to complement our sensory organs. One example of that is virtual reality glasses. There are also digital assistants: Chatbots, for instance, reply automatically to requests from customers and keep on learning. Eliza, the first chatbot, was invented in the 1960s, but soon ran up against its limitations: It couldn’t answer follow-up questions. That’s different now. Today’s chatbots “work” in customer service and give written or spoken information on departure times or services, for example. To do that, they respond to keywords, examine the user’s input and reply on the basis of 24 preprogramed rules and routines. Modern chatbots work with artificial intelligence. Digital assistants like Google Home and Google Assistant are also chatbots. They all learn from the requests and thus expand their repertoire on their own, without direct intervention by a human. They can remember earlier conversations, make connections and expand their vocabulary. Google’s voice assistant can deduce queries from their context with the aid of artificial intelligence, for example. The more chatbots understand and the better they respond, the closer we come to communication that resembles a conversation between two people. Big data also plays a role here: If more information is available to the bots, they can respond in a more specific way and give more appropriate replies. Yet voice recognition is still not perfect. The assistants do not understand every request because of disturbance from background noise. In addition, they’re often not able to distinguish between a human voice and a TV, for example. The voice recognition error rate in 2013 was 23 percent, according to the U.S. Consumer Technology Association (CTA). In 2016, Microsoft’s researchers brought that down to below six percent for the first time. But that’s still not enough. 
Infineon intends to significantly improve voice control together with the British semiconductor manufacturer XMOS. The company supplies voice processing modules for devices in the Internet of Things. A new solution presented by Infineon and XMOS at the beginning of 2017 uses smart microphones. It enables assistants to pinpoint the human voice in the midst of other noises: A combination of radar and silicon microphone sensors from Infineon identifies the position and the distance of the speaker from the microphones, with far field voice processing technology from XMOS being used to capture speech. Gesture control has a number of advantages over touch screens: Users don’t have to touch the device, for example, and can thus issue commands from a distance. Gesture control is an alternative to voice control, not least in the public sphere. After all, speaking with your smart wearable on the subway might be unpleasant for some and provoke unwanted attention. Gesture control also opens up the third dimension, away from two-dimensional user interfaces. Google and Infineon have developed a new type of gesture control. They use radar technology for this: Infineon’s radar chip can receive waves reflected from the user’s finger. That means if someone moves their hand, it’s registered by the chip. Google algorithms then process these signals. That even works in the dark, remotely or with dirty fingers. The same uniform hand movements apply to all gesture control devices. The gesture control chip can be used in all possible devices, such as loudspeakers or smart watches. Modern human-machine interaction has long been more than just moving a lever or pressing a button. Technologies that augment reality can also be an interface between human and machine. Topics in human-computer interaction include the following: User customization End-user development studies how ordinary users could routinely tailor applications to their own needs and to invent new applications based on their understanding of their own domains. With 25 their deeper knowledge, users could increasingly be important sources of new applications at the expense of generic programmers with systems expertise but low domain expertise. Embedded computation Computation is passing beyond computers into every object for which uses can be found. Embedded systems make the environment alive with little computations and automated processes, from computerized cooking appliances to lighting and plumbing fixtures to window blinds to automobile braking systems to greeting cards. The expected difference in the future is the addition of networked communications that will allow many of these embedded computations to coordinate with each other and with the user. Human interfaces to these embedded devices will in many cases be disparate from those appropriate to workstations. Augmented reality Augmented reality refers to the notion of layering relevant information into our vision of the world. Existing projects show real-time statistics to users performing difficult tasks, such as manufacturing. Future work might include augmenting our social interactions by providing additional information about those we converse with. Social computing In recent years, there has been an explosion of social science research focusing on interactions as the unit of analysis. Much of this research draws from psychology, social psychology, and sociology. 
For example, one study found out that people expected a computer with a man's name to cost more than a machine with a woman's name. Other research finds that individuals perceive their interactions with computers more positively than humans, despite behaving the same way towards these machines. Knowledge-driven human–computer interaction In human and computer interactions, a semantic gap usually exists between human and computer's understandings towards mutual behaviors. Ontology, as a formal representation of domain-specific knowledge, can be used to address this problem, through solving the semantic ambiguities between the two parties. Emotions and human-computer interaction In the interaction of humans and computers, research has studied how computers can detect, process and react to human emotions to develop emotionally intelligent information systems. Researchers have suggested several 'affect-detection channels'. The potential of telling human emotions in an automated and digital fashion lies in improvements to the effectiveness of human- computer interaction. The influence of emotions in human-computer interaction has been studied in fields such as financial decision making using ECG and organizational knowledge sharing using eye tracking and face readers as affect-detection channels. In these fields it has been shown that affect-detection channels have the potential to detect human emotions and that information 26 systems can incorporate the data obtained from affect-detection channels to improve decision models. Brain–computer interfaces A brain–computer interface (BCI), is a direct communication pathway between an enhanced or wired brain and an external device. BCI differs from neuromodulation in that it allows for bidirectional information flow. BCIs are often directed at researching, mapping, assisting, augmenting, or repairing human cognitive or sensory-motor functions. Figure 13 Brief concept Drawing of Brain Computer Interface 27 1.5 Future Trend in Emerging Technologies 28 2 Chapter Two: Overview for Data Science 2.1 An Overview of Data Science Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured, semi structured and unstructured data. Data science continues to evolve as one of the most promising and in-demand career paths for skilled professionals. Today, successful data professionals understand that they must advance past the traditional skills of analyzing large amounts of data, data mining, and programming skills. In order to uncover useful intelligence for their organizations, data scientists must master the full spectrum of the data science life cycle and possess a level of flexibility and understanding to maximize returns at each phase of the process. Data scientists need to be curious and result-oriented, with exceptional industry-specific knowledge and communication skills that allow them to explain highly technical results to their non-technical counterparts. They possess a strong quantitative background in statistics and linear algebra as well as programming knowledge with focuses in data warehousing, mining, and modeling to build and analyze algorithms. In this chapter, we will talk about basic definitions of data and information, data types and representation, data value change and basic concepts of big data. 2.2 What is data and information What is data? 
Data can be defined as a representation of facts, concepts, or instructions in a formalized manner, which should be suitable for communication, interpretation, or processing by human or electronic machine. Data is represented with the help of characters such as alphabets (A-Z, a-z), digits (0-9) or special characters (+,-,/,*,,= etc.) What is Information? Information is organized or classified data, which has some meaningful values for the receiver. Information is the processed data on which decisions and actions are based. Information is a data that has been processed into a form that is meaningful to recipient and is of real or perceived value in the current or the prospective action or decision of recipient. For the decision to be meaningful, the processed data must qualify for the following characteristics −  Timely − Information should be available when required.  Accuracy − Information should be accurate.  Completeness − Information should be complete. 29 Data Vs Information Data can be described as unprocessed facts and figures. Plain collected data as raw facts cannot help in decision-making. However, data is the raw material that is organized, structured, and interpreted to create useful information systems. Data is defined as 'groups of non-random symbols in the form of text, images, and voice representing quantities, action and objects'. Information is interpreted data; created from organized, structured, and processed data in a particular context. Data Processing Cycle Data processing is the re-structuring or re-ordering of data by people or machine to increase their usefulness and add values for a particular purpose. Data processing consists of the following basic steps - input, processing, and output. These three steps constitute the data processing cycle. Figure 14 Data Processing Cycle  Input − In this step, the input data is prepared in some convenient form for processing. The form will depend on the processing machine. For example, when electronic computers are used, the input data can be recorded on any one of the several types of input medium, such as magnetic disks, tapes, and so on.  Processing − In this step, the input data is changed to produce data in a more useful form. For example, pay-checks can be calculated from the time cards, or a summary of sales for the month can be calculated from the sales orders.  Output − At this stage, the result of the proceeding processing step is collected. The particular form of the output data depends on the use of the data. For example, output data may be pay-checks for employees. 2.3 Data types and its representation In computer science and computer programming, a data type or simply type is an attribute of data which tells the compiler or interpreter how the programmer intends to use the data. Almost 30 all programming languages explicitly include the notion of data type, though different languages may use different terminology. Common data types include: O Integers O Booleans O Characters O floating-point numbers O alphanumeric strings A data type constrains the values that an expression, such as a variable or a function, might take. This data type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored. 
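As a small illustration, the Python snippet below (Python being one of the analytics languages mentioned later in this chapter) shows values of the common data types listed above and how a value's type constrains the operations that can be applied to it. The variable names and values are purely illustrative.

```python
# Illustrative values of the common data types listed above
count = 42                    # integer
is_valid = True               # Boolean
grade = 'A'                   # character (in Python, a one-character string)
temperature = 36.6            # floating-point number
student_id = "AAU-2019-0042"  # alphanumeric string

# The type of a value constrains the operations that make sense on it:
total = count + 10            # arithmetic is defined for integers
label = student_id.upper()    # string operations are defined for strings
# count.upper() would raise an AttributeError: integers have no string methods

print(type(count), type(is_valid), type(grade), type(temperature), type(student_id))
print(total, label)
```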
On the other hand, for the analysis of data, it is important to understand that there are three common data types or structures:

Figure 15 Data Types for Analysis

Structured Data
Structured data is data that adheres to a pre-defined data model and is therefore straightforward to analyze. Structured data conforms to a tabular format with relationships between the different rows and columns. Common examples of structured data are Excel files or SQL databases. Each of these has structured rows and columns that can be sorted. Structured data depends on the existence of a data model – a model of how data can be stored, processed and accessed. Because of a data model, each field is discrete and can be accessed separately or jointly along with data from other fields. This makes structured data extremely powerful: it is possible to quickly aggregate data from various locations in the database. Structured data is considered the most 'traditional' form of data storage, since the earliest versions of database management systems (DBMS) were able to store, process and access structured data.

Unstructured Data
Unstructured data is information that either does not have a predefined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs, as compared to data stored in structured databases. Common examples of unstructured data include audio files, video files, or NoSQL databases. The ability to store and process unstructured data has grown greatly in recent years, with many new technologies and tools coming to the market that are able to store specialized types of unstructured data. MongoDB, for example, is optimized to store documents; Apache Giraph, as an opposite example, is optimized for storing relationships between nodes. The ability to analyze unstructured data is especially relevant in the context of Big Data, since a large part of the data in organizations is unstructured. Think about pictures, videos or PDF documents. The ability to extract value from unstructured data is one of the main drivers behind the quick growth of Big Data.

Semi-structured Data
Semi-structured data is a form of structured data that does not conform to the formal structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. It is therefore also known as a self-describing structure. JSON and XML are common forms of semi-structured data. The reason that this third category exists (between structured and unstructured data) is that semi-structured data is considerably easier to analyze than unstructured data. Many Big Data solutions and tools have the ability to 'read' and process either JSON or XML, which reduces the complexity of analysis compared to unstructured data.

Metadata – Data about Data
A last category of data type is metadata. From a technical point of view, this is not a separate data structure, but it is one of the most important elements for Big Data analysis and big data solutions. Metadata is data about data: it provides additional information about a specific set of data. In a set of photographs, for example, metadata could describe when and where the photos were taken, as in the short sketch below.
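To make the distinction concrete, here is a small, hypothetical photo record written as JSON and parsed with Python's standard json module. The record is semi-structured (tags name each field, but no rigid table schema is imposed), and the invented metadata fields describe the data itself.

```python
import json

# A hypothetical photo record: semi-structured data with self-describing tags.
photo_record = """
{
  "file": "IMG_0421.jpg",
  "caption": "Graduation day",
  "metadata": {
    "taken_on": "2019-07-06",
    "location": {"city": "Addis Ababa", "lat": 9.03, "lon": 38.74},
    "camera": "Phone"
  }
}
"""

photo = json.loads(photo_record)   # parse the JSON text into a Python dictionary
meta = photo["metadata"]           # the metadata describes the photo, not the pixels
print(meta["taken_on"], meta["location"]["city"])
```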
The metadata then provides fields for dates and locations which, by themselves, can be considered structured data. Because of this reason, metadata is frequently used by Big Data solutions for initial analysis. Data value Chain 32 The Data Value Chain is introduced to describe the information flow within a big data system as a series of steps needed to generate value and useful insights from data. The Big Data Value Chain identifies the following key high-level activities: Figure 16 Data Value Chain Data Acquisition It is the process of gathering, filtering, and cleaning data before it is put in a data warehouse or any other storage solution on which data analysis can be carried out. Data acquisition is one of the major big data challenges in terms of infrastructure requirements. The infrastructure required to support the acquisition of big data must deliver low, predictable latency in both capturing data and in executing queries; be able to handle very high transaction volumes, often in a distributed environment; and support flexible and dynamic data structures. Data Analysis It is concerned with making the raw data acquired amenable to use in decision-making as well as domain-specific usage. Data analysis involves exploring, transforming, and modelling data with the goal of highlighting relevant data, synthesising and extracting useful hidden information with high potential from a business point of view. Related areas include data mining, business intelligence, and machine learning. Chapter 4 covers data analysis. Data Curation It is the active management of data over its life cycle to ensure it meets the necessary data quality requirements for its effective usage. Data curation processes can be categorized into different activities such as content creation, selection, classification, transformation, validation, and preservation. Data curation is performed by expert curators that are responsible for improving the accessibility and quality of data. Data curators (also known as scientific curators, or data 33 annotators) hold the responsibility of ensuring that data are trustworthy, discoverable, accessible, reusable, and fit their purpose. A key trend for the curation of big data utilizes community and crowd sourcing approaches. Data Storage It is the persistence and management of data in a scalable way that satisfies the needs of applications that require fast access to the data. Relational Database Management Systems (RDBMS) have been the main, and almost unique, solution to the storage paradigm for nearly 40 years. However, the ACID (Atomicity, Consistency, Isolation, and Durability) properties that guarantee database transactions lack flexibility with regard to schema changes and the performance and fault tolerance when data volumes and complexity grow, making them unsuitable for big data scenarios. NoSQL technologies have been designed with the scalability goal in mind and present a wide range of solutions based on alternative data models. Data Usage It covers the data-driven business activities that need access to data, its analysis, and the tools needed to integrate the data analysis within the business activity. 
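Taken together, the chain can be pictured as a small pipeline. The Python sketch below strings toy stand-ins for acquisition, analysis, curation, storage and usage together; every function body and data value is invented purely for illustration, and in a real big data system each stage would be backed by dedicated infrastructure.

```python
# A minimal, purely illustrative sketch of the value-chain stages as a pipeline.

def acquire():
    # gather and filter raw records (invented sample data)
    raw = [{"sale": 120}, {"sale": None}, {"sale": 95}, {"sale": 210}]
    return [r for r in raw if r["sale"] is not None]        # basic cleaning

def analyse(records):
    total = sum(r["sale"] for r in records)
    return {"total_sales": total, "average_sale": total / len(records)}

def curate(result):
    # annotate the result so it stays trustworthy and reusable later
    return {**result, "source": "sample data", "validated": True}

store = []                                   # stand-in for a scalable storage layer
store.append(curate(analyse(acquire())))     # acquisition -> analysis -> curation -> storage

# usage: the stored insight feeds a (hypothetical) business decision
print(store[-1])
```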
Data usage in business decision-making can enhance competitiveness through reduction of costs, increased added value, or any other parameter that can be measured against existing performance criteria 2.4 Basic concepts of big data Big data is a blanket term for the non-traditional strategies and technologies needed to gather, organize, process, and gather insights from large datasets. While the problem of working with data that exceeds the computing power or storage of a single computer is not new, the pervasiveness, scale, and value of this type of computing has greatly expanded in recent years. In this section, we will talk about big data on a fundamental level and define common concepts you might come across. We will also take a high-level look at some of the processes and technologies currently being used in this space. What Is Big Data? An exact definition of “big data” is difficult to nail down because projects, vendors, practitioners, and business professionals use it quite differently. With that in mind, generally speaking, big data is:  large datasets  the category of computing strategies and technologies that are used to handle large datasets 34 In this context, “large dataset” means a dataset too large to reasonably process or store with traditional tooling or on a single computer. This means that the common scale of big datasets is constantly shifting and may vary significantly from organization to organization. Why Are Big Data Systems Different? The basic requirements for working with big data are the same as the requirements for working with datasets of any size. However, the massive scale, the speed of ingesting and processing, and the characteristics of the data that must be dealt with at each stage of the process present significant new challenges when designing solutions. The goal of most big data systems is to surface insights and connections from large volumes of heterogeneous data that would not be possible using conventional methods. In 2001, Gartner’s Doug Laney first presented what became known as the “three Vs of big data” to describe some of the characteristics that make big data different from other data processing: Volume The sheer scale of the information processed helps define big data systems. These datasets can be orders of magnitude larger than traditional datasets, which demands more thought at each stage of the processing and storage life cycle. Often, because the work requirements exceed the capabilities of a single computer, this becomes a challenge of pooling, allocating, and coordinating resources from groups of computers. Cluster management and algorithms capable of breaking tasks into smaller pieces become increasingly important. Velocity Another way in which big data differs significantly from other data systems is the speed that information moves through the system. Data is frequently flowing into the system from multiple sources and is often expected to be processed in real time to gain insights and update the current understanding of the system. This focus on near instant feedback has driven many big data practitioners away from a batch- oriented approach and closer to a real-time streaming system. Data is constantly being added, massaged, processed, and analyzed in order to keep up with the influx of new information and to surface valuable information early when it is most relevant. These ideas require robust systems with highly available components to guard against failures along the data pipeline. 
Variety Big data problems are often unique because of the wide range of both the sources being processed and their relative quality. 35 Data can be ingested from internal systems like application and server logs, from social media feeds and other external APIs, from physical device sensors, and from other providers. Big data seeks to handle potentially useful data regardless of where it’s coming from by consolidating all information into a single system. The formats and types of media can vary significantly as well. Rich media like images, video files, and audio recordings are ingested alongside text files, structured logs, etc. While more traditional data processing systems might expect data to enter the pipeline already labeled, formatted, and organized, big data systems usually accept and store data closer to its raw state. Ideally, any transformations or changes to the raw data will happen in memory at the time of processing. Other Characteristics Various individuals and organizations have suggested expanding the original three Vs, though these proposals have tended to describe challenges rather than qualities of big data. Some common additions are:  Veracity: The variety of sources and the complexity of the processing can lead to challenges in evaluating the quality of the data (and consequently, the quality of the resulting analysis)  Variability: Variation in the data leads to wide variation in quality. Additional resources may be needed to identify, process, or filter low quality data to make it more useful.  Value: The ultimate challenge of big data is delivering value. Sometimes, the systems and processes in place are complex enough that using the data and extracting actual value can become difficult. What Does a Big Data Life Cycle Look Like? So how is data actually processed when dealing with a big data system? While approaches to implementation differ, there are some commonalities in the strategies and software that we can talk about generally. While the steps presented below might not be true in all cases, they are widely used. The general categories of activities involved with big data processing are:  Ingesting data into the system  Persisting the data in storage  Computing and Analyzing data  Visualizing the results Before we look at these four workflow categories in detail, we will take a moment to talk about clustered computing, an important strategy employed by most big data solutions. Setting up a computing cluster is often the foundation for technology used in each of the life cycle stages. Clustered Computing 36 Because of the qualities of big data, individual computers are often inadequate for handling the data at most stages. To better address the high storage and computational needs of big data, computer clusters are a better fit. Big data clustering software combines the resources of many smaller machines, seeking to provide a number of benefits:  Resource Pooling: Combining the available storage space to hold data is a clear benefit, but CPU and memory pooling is also extremely important. Processing large datasets requires large amounts of all three of these resources.  High Availability: Clusters can provide varying levels of fault tolerance and availability guarantees to prevent hardware or software failures from affecting access to data and processing. This becomes increasingly important as we continue to emphasize the importance of real-time analytics. 
 Easy Scalability: Clusters make it easy to scale horizontally by adding additional machines to the group. This means the system can react to changes in resource requirements without expanding the physical resources on a machine. Using clusters requires a solution for managing cluster membership, coordinating resource sharing, and scheduling actual work on individual nodes. Cluster membership and resource allocation can be handled by software like Hadoop’s YARN (which stands for Yet Another Resource Negotiator) or Apache Mesos. The assembled computing cluster often acts as a foundation which other software interfaces with to process the data. The machines involved in the computing cluster are also typically involved with the management of a distributed storage system, which we will talk about when we discuss data persistence. Ingesting Data into the System Data ingestion is the process of taking raw data and adding it to the system. The complexity of this operation depends heavily on the format and quality of the data sources and how far the data is from the desired state prior to processing. One way that data can be added to a big data system are dedicated ingestion tools. Technologies like Apache Sqoop can take existing data from relational databases and add it to a big data system. Similarly, Apache Flume and Apache Chukwa are projects designed to aggregate and import application and server logs. Queuing systems like Apache Kafka can also be used as an interface between various data generators and a big data system. Ingestion frameworks like Gobblin can help to aggregate and normalize the output of these tools at the end of the ingestion pipeline. 37 During the ingestion process, some level of analysis, sorting, and labelling usually takes place. This process is sometimes called ETL, which stands for extract, transform, and load. While this term conventionally refers to legacy data warehousing processes, some of the same concepts apply to data entering the big data system. Typical operations might include modifying the incoming data to format it, categorizing and labelling data, filtering out unneeded or bad data, or potentially validating that it adheres to certain requirements. With those capabilities in mind, ideally, the captured data should be kept as raw as possible for greater flexibility further on down the pipeline. Persisting the Data in Storage The ingestion processes typically hand the data off to the components that manage storage, so that it can be reliably persisted to disk. While this seems like it would be a simple operation, the volume of incoming data, the requirements for availability, and the distributed computing layer make more complex storage systems necessary. This usually means leveraging a distributed file system for raw data storage. Solutions like Apache Hadoop’s HDFS filesystem allow large quantities of data to be written across multiple nodes in the cluster. This ensures that the data can be accessed by compute resources, can be loaded into the cluster’s RAM for in-memory operations, and can gracefully handle component failures. Other distributed filesystems can be used in place of HDFS including Ceph and GlusterFS. Data can also be imported into other distributed systems for more structured access. Distributed databases, especially NoSQL databases, are well-suited for this role because they are often designed with the same fault tolerant considerations and can handle heterogeneous data. 
There are many different types of distributed databases to choose from depending on how you want to organize and present the data. Computing and Analyzing Data Once the data is available, the system can begin processing the data to surface actual information. The computation layer is perhaps the most diverse part of the system as the requirements and best approach can vary significantly depending on what type of insights desired. Data is often processed repeatedly, either iteratively by a single tool or by using a number of tools to surface different types of insights. Batch processing is one method of computing over a large dataset. The process involves breaking work up into smaller pieces, scheduling each piece on an individual machine, reshuffling the data based on the intermediate results, and then calculating and assembling the final result. These steps are often referred to individually as splitting, mapping, shuffling, reducing, and assembling, or collectively as a distributed map reduce algorithm. This is the strategy used by Apache Hadoop’s MapReduce. Batch processing is most useful when dealing with very large datasets that require quite a bit of computation. 38 While batch processing is a good fit for certain types of data and computation, other workloads require more real-time processing. Real-time processing demands that information be processed and made ready immediately and requires the system to react as new information becomes available. One way of achieving this is stream processing, which operates on a continuous stream of data composed of individual items. Another common characteristic of real-time processors is in-memory computing, which works with representations of the data in the cluster’s memory to avoid having to write back to disk. Apache Storm, Apache Flink, and Apache Spark provide different ways of achieving real- time or near real-time processing. There are trade-offs with each of these technologies, which can affect which approach is best for any individual problem. In general, real-time processing is best suited for analyzing smaller chunks of data that are changing or being added to the system rapidly. The above examples represent computational frameworks. However, there are many other ways of computing over or analyzing data within a big data system. These tools frequently plug into the above frameworks and provide additional interfaces for interacting with the underlying layers. For instance, Apache Hive provides a data warehouse interface for Hadoop, Apache Pig provides a high level querying interface, while SQL-like interactions with data can be achieved with projects like Apache Drill, Apache Impala, Apache Spark SQL, and Presto. For machine learning, projects like Apache SystemML, Apache Mahout, and Apache Spark’s MLlib can be useful. For straight analytics programming that has wide support in the big data ecosystem, both R and Python are popular choices. Visualizing the Results Due to the type of information being processed in big data systems, recognizing trends or changes in data over time is often more important than the values themselves. Visualizing data is one of the most useful ways to spot trends and make sense of a large number of data points. Real-time processing is frequently used to visualize application and server metrics. The data changes frequently and large deltas in the metrics typically indicate significant impacts on the health of the systems or organization. 
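As a toy sketch of this idea — not how Prometheus or any other specific tool works internally — the Python generator below consumes a stream of metric samples one at a time, keeps only the previous value in memory, and flags large deltas as they arrive. The threshold and the sample values are invented for illustration.

```python
def flag_large_deltas(samples, threshold=0.30):
    """Yield (value, delta, alert) for each sample in a metric stream.

    A sample is flagged when it changes by more than `threshold`
    (30% by default) relative to the previous sample.
    """
    previous = None
    for value in samples:
        delta = 0.0 if previous in (None, 0) else (value - previous) / previous
        yield value, delta, abs(delta) > threshold
        previous = value

# Hypothetical stream of CPU-load readings arriving in real time.
cpu_load = [0.41, 0.43, 0.44, 0.71, 0.69, 0.35]
for value, delta, alert in flag_large_deltas(cpu_load):
    print(f"load={value:.2f} delta={delta:+.0%} {'ALERT' if alert else ''}")
```

Purpose-built time-series tools apply the same idea at far larger scale and with persistent storage.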
In these cases, projects like Prometheus can be useful for processing the data streams as a time-series database and visualizing that information.
One popular way of visualizing data is with the Elastic Stack, formerly known as the ELK stack. Composed of Logstash for data collection, Elasticsearch for indexing data, and Kibana for visualization, the Elastic Stack can be used with big data systems to visually interface with the results of calculations or raw metrics. A similar stack can be achieved using Apache Solr for indexing and a Kibana fork called Banana for visualization. The stack created by these is called Silk.
Another visualization technology typically used for interactive data science work is a data "notebook". These projects allow for interactive exploration and visualization of the data in a format conducive to sharing, presenting, or collaborating. Popular examples of this type of visualization interface are Jupyter Notebook and Apache Zeppelin.

Bibliography
Longbing Cao, "Data Science: A Comprehensive Overview", University of Technology Sydney, Australia, 2017.
Smith, F.J., "Data Science as an Academic Discipline", Data Science Journal, 5, 2006, pp. 163–164.
Mike Loukides, "What is Data Science?", O'Reilly Media, Inc., 2011, pp. 10–22.
Thomas L. Floyd, "Digital Fundamentals with PLD Programming", Pearson Prentice Hall, 2006.
Prakash G. Gupta, "Data Communications and Computer Networking", Prentice Hall, 2006.
Martin L. Shoemaker, "Human-Machine Interface", Independently Published, 2019.
"An Introduction to Big Data Concepts and Terminology" [Online]. Available: https://www.digitalocean.com/community/tutorials/an-introduction-to-big-data-concepts-and-terminology [Accessed: September 7, 2019].
"Data Types: Structured vs. Unstructured Data" [Online]. Available: https://www.bigdataframework.org/data-types-structured-vs-unstructured-data/ [Accessed: September 7, 2019].
"What is Data Science?" [Online]. Available: https://datascience.berkeley.edu/about/what-is-data-science/ [Accessed: September 7, 2019].
"The Data Value Chain" [Online]. Available: https://opendatawatch.com/reference/the-data-value-chain-executive-summary/ [Accessed: September 7, 2019].
"Big Data Analytics – Data Scientist" [Online]. Available: https://www.tutorialspoint.com/big_data_analytics/data_scientist.htm [Accessed: September 7, 2019].

3. Chapter Three: Introduction to Artificial Intelligence (AI)

3.1. An overview of AI
In recent years, accelerated urbanization, globalization and the abundance of products, services and information have begun to fundamentally transform our society. As individuals, we are experiencing an increasingly complex and demanding environment. In response, mobile applications and automated services are being developed, allowing us to navigate this complex new world more effectively. All this is made possible by powerful algorithms that are slowly acquiring fundamental human-like capabilities, such as vision, speech and navigation. Collectively, these computer algorithms are called artificial intelligence (AI). Beyond emulating these ordinary human capabilities, AI is quickly moving forward to master more specialized tasks performed routinely by human experts. In today's world, technology is also growing very fast, and we come into contact with new technologies every day.
In the modern world, computers and the algorithms that govern them are seen everywhere: from the smartphones in our pockets, to the transportation systems we ride to work, to the computers that control our economy and banks. Many of these algorithms fall under the general umbrella of the field of artificial intelligence (AI). Artificial Intelligence is a field that was originally founded by computer scientists in the 1950s but has since become a multidisciplinary field with applications in nearly every aspect of human life.

Figure 1: Areas which contribute to Artificial Intelligence (AI)

The field of Artificial Intelligence was initially founded to answer the question: is it possible to build a machine that has intelligence, specifically a human level of intelligence? A necessary step in the pursuit of creating a machine intelligence was understanding the very nature of knowledge representation, reasoning, learning, perception, and problem solving. Through an understanding of these areas, AI researchers discovered much narrower applications that a machine can perform, and the field of artificial intelligence expanded. Artificial Intelligence (AI) is now one of the most fascinating and universal fields of computer science, with great scope for the future, aiming to make a machine work like a human.
Artificial intelligence (AI) is transforming many aspects of our personal and professional lives, from logistics systems that select the fastest shipping routes to digital assistants that unlock doors, turn on lights, and get to know our shopping preferences. The most advanced AI systems use machine learning technology to analyze current conditions and learn from experience. Within the workplace, these self-directed agents are giving rise to the intelligent enterprise: organizations where people make decisions with the help of intelligent machines.

3.3. What is AI
According to the father of Artificial Intelligence, John McCarthy, it is "the science and engineering of making intelligent machines, especially intelligent computer programs". Artificial Intelligence is a way of making a computer, a computer-controlled robot, or software think intelligently, in a manner similar to the way intelligent humans think. AI is accomplished by studying how the human brain thinks, and how humans learn, decide, and work while trying to solve a problem, and then using the outcomes of this study as a basis for developing intelligent software and systems.
Artificial Intelligence (AI) is about algorithms enabled by constraints, exposed by representations that support models targeted at thinking, perception and action. Here, an algorithm is an unambiguous specification of how to solve a particular problem, and a model is a representation of entities and the relationships between them. For example, a computer model can be a simulation used to reproduce the behavior of a system, which in turn can be used to make predictions.

Figure 2: Overview of modern AI

Artificial intelligence (AI): a broad discipline with the goal of creating intelligent machines, as opposed to the natural intelligence that is demonstrated by humans and animals. AI is the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. AI is the creation of a computer program that can learn to think and function on its own, much like a robot that does not need to be told what to do at every step.
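As a deliberately tiny illustration of "learning from data rather than following hand-written rules" — a preview of the machine learning described in the list that follows — the snippet below fits a straight-line model to a few invented example points and then uses the learned model to make a prediction.

```python
# Learn a simple model y = a*x + b from example data points (least squares),
# instead of being told the rule explicitly. All numbers are made up.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]           # e.g. years of experience
ys = [15.0, 18.0, 22.0, 25.0, 29.0]      # e.g. salary in thousands

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)   # slope learned from the data
b = mean_y - a * mean_x                  # intercept learned from the data

print(f"learned model: y = {a:.2f}*x + {b:.2f}")
print("prediction for x = 6:", round(a * 6 + b, 1))
```

Replace the hand-derived least-squares formulas with a library and larger data, and this is essentially the "training" process described under machine learning below.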
In the modern age, AI is an enabling technology. The following are technologies that use AI:
i. Machine Learning – A subset of AI that often uses statistical techniques to give machines the ability to "learn" from data without being explicitly given the instructions for how to do so. This process is known as "training" a "model" using a learning "algorithm" that progressively improves model performance on a specific task.
ii. Robotics – Robotics deals with the design, construction, operation, and use of robots, as well as computer systems for their control, sensory feedback, and information processing. These technologies are used to develop machines that can substitute for humans and replicate human actions.
iii. Machine Automation – Machine automation is any information technology (IT) that is designed to control the work of machines.
iv. Virtual Reality – Virtual reality is the computer-generated simulation of a three-dimensional image or environment that can be interacted with in a seemingly real or physical way by a person using special electronic equipment, such as a helmet with a screen inside or gloves fitted with sensors.
v. Cloud Computing – Cloud computing is the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer. Cloud computing involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS).
vi. Augmented Reality – Augmented reality refers to a technology that superimposes a computer-generated image with sound, text and effects on a user's view of the real world, thus enhancing the user's real-world experience.
vii. Neural Networks – A neural network is a type of machine learning which models itself after the human brain. This creates an artificial neural network that, via an algorithm, allows the computer to learn by incorporating new data.
viii. Big Data / Internet of Things (IoT) – Big data refers to extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions. The Internet of Things (IoT) refers to the set of devices and systems that interconnect real-world sensors and actuators to the Internet.
ix. Computer Vision – Enabling machines to analyse, understand and manipulate images and video.

Modern AI is based on 'machine learning', which enables software to perform difficult tasks more effectively by learning through training instead of following sets of rules. Deep learning, a subset of machine learning, is also delivering breakthrough results in fields including computer vision and language processing.
Knowledge engineering is a core part of AI research. Machines can often act and react like humans only if they have abundant information relating to the world. Artificial intelligence must have access to objects, categories, properties and the relations between all of them to implement knowledge engineering. Instilling common sense, reasoning and problem-solving power in machines is a difficult and tedious task.
Machine perception deals with the capability to use sensory inputs to deduce different aspects of the world, while computer vision is the power to analyze visual inputs, with a few sub-problems such as facial, object and gesture recognition. Robotics is also a major field related to AI.
Robots require intelligence to handle tasks such as object manipulation and navigation, along with sub-problems of localization, motion planning and mapping.
Components of an AI system include the following:
i. Applications: image recognition, speech recognition, chatbots, natural language generation, and sentiment analysis.
ii. Types of models: deep learning, machine learning, and neural networks.
iii. Software/hardware for training and running models: Graphics Processing Units (GPUs), parallel processing tools (like Spark), cloud data storage and computing platforms.
iv. Programming languages for building models: Python, TensorFlow, Java, and C/C++, etc.

3.4. History of AI
It all started with Augusta Ada Lovelace (1842), the world's first programmer, who wrote programs about 100 years before there were computers to run them. She said, "The Analytical Engine has no pretensions to originate anything. It can do whatever we know how to order it to perform." Then nothing much happened until about 1950, when Alan Turing wrote his famous paper, which introduced the Turing Test. The modern era really began with a paper written by Marvin Minsky in 1960, titled "Steps Toward Artificial Intelligence." The following section summarizes the short history of AI.
1956 – The term "artificial intelligence" is coined by John McCarthy at the Dartmouth Conference.
