
Software tells hardware what to do. Software takes the form of **code**, which is a **written list of step-by-step instructions to hardware** -- basically a digital "To Do" list. Code is written with a specific purpose. Here are some examples you may be familiar with:

- "Carry out mathematical operations based on the keyboard digits a user presses";
- "Convert the keys that a user presses into text on a computer screen";
- "Show the content of a website address that a user has typed in on the computer screen".

Once code fulfills the purpose it was created for, the final product is referred to as "software". Code is written by **programmers** in many different languages that digital devices understand. We'll give you an overview of some of these languages later on.

To start with, though, let's answer a central question: "Why does a computer do anything at all?" After all, your table doesn't do anything just because you give it a "To Do" list. So why does a computer? Let's have a look at this in the next video.

Software is split into two categories: **(a)** system software; and **(b)** application software.

- **System software** includes **all the code that ensures your hardware runs properly**.
- **Application software** (i.e. "**apps**") includes **all the code that allows users to accomplish specific tasks** (e.g. write texts, create slides, solve math problems, play games, play videos).

Assuming that the system software on your organization's digital devices is installed and running correctly, your organization can choose from a variety of application software types to build its Enterprise Information System. We will look more closely at many of these in Chapter 3. But as a teaser, let's play a game and see how many types of organizational apps you're already familiar with.

Data comes in two forms: it can be "discrete" or "continuous". What's the difference? **Discrete data** can be counted and separated into distinct categories.
For example, if you count the number of apples in a basket, that is discrete data. You can count each apple and the result will always be a whole number, like 3, 4, or 5.

In contrast, **continuous data** has no inherent cut-off point. Imagine you measure the length of a piece of string. Depending on how precise your measuring equipment is, the length could be 2.5 cm. Or it could be 2.51 cm. Or 2.514 cm. Or even 2.514283491 cm. In fact, there's never any final cut-off point at which you could say, "This is the exact length of the string." The "cut-off point" you end up using is determined by the precision of your measuring equipment, not by the length of the piece of string itself.

Even though they're not entirely the same thing, we often refer to discrete data as "**digital data**" and to continuous data as "**analog data**". Continuous data is often referred to as "analog data" because it occurs most frequently in nature (i.e. "non-computer environments"). Temperatures, distances, speed, sound waves, the intensity of sunlight: these are all examples of natural phenomena that are sent out as continuous signals.

In contrast -- as you will see later in this section -- whenever a computer does anything, its activities ultimately come down to 1s and 0s -- that is, to two discrete numbers. That's why discrete signals are often referred to as "digital data".

Now we know that "FALSE" is a Boolean, "12" is an integer, "12.35" is a float, and "Whaddup Bro?" is a string. But as you heard in an earlier video, computers have to convert each of the values above into a sequence of 1s and 0s before they can do anything useful with them.

There's a good reason for this. You learned earlier in this lesson that the Central Processing Unit (CPU) is a chip that is the "brain" of the computer. The CPUs of modern PCs contain anywhere from several hundred million to over 20 billion **transistors**. You can see a magnified image of a CPU on the left.
All those tiny metal pieces are transistors. And by the way: other computer components like Random Access Memory (RAM) and the Graphics Processing Unit (GPU) have transistors, too.

Transistors are tiny electronic switches that can be turned "on" or "off". Think about that for a moment. "On" or "Off"... "1" or "0"... Do any similarities come to mind?

It becomes simple to represent any data on a computer if you convert it to a sequence of 1s and 0s. Based on the sequence of 1s and 0s you've converted the data into, the computer selects a matching number of transistors. A transistor is turned "**on**" if the computer wants to represent a "1". And it's left "**off**" if the computer wants to represent a "0".

Later, the computer "remembers" the value simply because it regularly sends electricity through its CPU. If a transistor is turned "on", then the electric current can flow through it. If the transistor is "off", it can't. Due to this match of "on"/"off" with "1"/"0", most computers are based on a system called the "binary system".

As far back as Ancient Egypt, humans have represented numbers by using the **decimal system**. The decimal system is based on the digit values "0", "1", "2", "3", "4", "5", "6", "7", "8", and "9" -- so ten symbols.

Since we only have ten symbols in the decimal system, you can't represent this number of bitcoins with just one digit. You have to use two digits, i.e. "12".

That's how the decimal system works. When we write down a value, every digit position in the number holds one of ten possible symbols. If your number is HIGHER than the largest symbol ("9"), you need to include an extra digit and start over. For example, to represent the number of kilometers between Vienna and Bregenz, you'd even need THREE digits: "570".

**Bottom line:** under the decimal system, any time a value is more than "9", you represent it by including a further digit.
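
To make the "add a further digit" idea concrete, here is a minimal Python sketch that splits a whole number into its decimal digits and rebuilds it again. The function names `to_digits` and `from_digits` are assumptions made for illustration; they are not part of the course material.

```python
def to_digits(value: int, base: int = 10) -> list[int]:
    """Split a non-negative whole number into digits, most significant first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return [0]
    digits = []
    while value > 0:
        # divmod peels off the last digit and keeps the rest for the next round
        value, remainder = divmod(value, base)
        digits.append(remainder)
    return digits[::-1]


def from_digits(digits: list[int], base: int = 10) -> int:
    """Rebuild the number: shift one place to the left, then add the next digit."""
    total = 0
    for digit in digits:
        total = total * base + digit
    return total


print(to_digits(7))    # [7]        one digit is enough
print(to_digits(12))   # [1, 2]     1 ten + 2 ones
print(to_digits(570))  # [5, 7, 0]  5 hundreds + 7 tens + 0 ones
```

The `base` parameter defaults to 10 only to make the "ten symbols" assumption explicit; everything in the example above uses the decimal system.
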
As the decimal system has been an omnipresent component of human civilization for millennia, we tend to think that it's the ONLY way values can be represented. But it's not. There are plenty of other number systems. One is the binary system...

A **binary system** can represent exactly the same values as a decimal system. But it only uses two symbols instead of ten. These two symbols are "0" and "1". As soon as the value you want to represent exceeds "1", you need to add more digits.

We've seen that with the binary system, we can represent any number. These binary numbers mimic the "on"/"off" states of the transistors in computer chips. And that is how computers process them. But what about letters? Can we convert them to binary numbers, too?

The answer is: absolutely. We do this based on a code called the **ASCII Code** (*American Standard Code for Information Interchange*). The ASCII Code was first published in 1963 and represents alphabet letters (and numbers whenever they're used as text) by assigning specific binary numbers to them. The standard itself defines seven-digit binary numbers; in practice, each letter is usually stored as a binary number with eight digits (one byte), with a leading "0".

Apples, oranges, tables, college textbooks, and self-driving buses are all very different objects. But when you put them on a scale, each of them weighs something. The same is true for any type of data we use on a computer. Booleans, integers, floats, strings, images, and audio are all very different in their purposes. But they each consist of binary numbers. Some of these binary numbers can be represented by a single transistor (e.g. a Boolean value), while some of them need millions of transistors (e.g. all the binary numbers needed to represent a single image).

When we weigh ourselves on a scale in Central Europe, we could start from a very small unit such as the milligram. Since most of us weigh many millions of milligrams, the Metric System provides us with further units of measurement -- just to keep the values small: The gram. The kilogram. The ton. The kiloton. The megaton.
The gigaton. I think you get the picture.

Data is also "weighed" in pre-defined units of measurement. The smallest unit is the **bit**. It represents one binary digit (or **one transistor**).

Right now, the data you've typed into or created with your computer is being represented by millions of little transistors in the CPU, RAM, and GPU that are each either "on" or "off". But what happens if you turn the computer off? Or if a blackout knocks out the electricity supply in your neighborhood for an hour? Is that data still on your computer? I think anyone who doesn't have "autosave" activated in Word or Pages knows the answer to that.

The transistors in your PC are needed to do things each time you give your computer new instructions. For a single session, a few million transistors can "temporarily" be used to represent your data. But in the long run, those transistors are needed to represent all the OTHER data your computer will be processing in the future.

Bottom line: if you want to use your data after your computer shuts down, you need to store it. You've done this countless times, by clicking on "Save as..." and choosing a file name. As a result, your data is stored on your PC's hard drive. Or on a USB stick. Or in the cloud.

Those are great solutions for individuals. But are they ideal for organizations, too?

For argument's sake, let's assume that you love dairy products: milk, cheese, yoghurt, and butter. There's a problem with such products. They perish quickly unless they're stored somewhere cool. If you have your own office area in your organization, you might set up a tiny private refrigerator. Problem solved -- so long as you're the only one who needs access to the dairy products we just named.

But what if your entire OFFICE loves dairy products? And what if you all pool your money to buy a big batch of dairy products every week? Then the "private refrigerator" isn't such a great solution anymore.
Either **(a)** people will constantly be phoning you to bring them their dairy products; or **(b)** people will have to walk all the way to your office each time they want gorgonzola.

If all the data in your database is stored in a structured, centralized manner (like food on the shelves of a refrigerator), we call it a **relational database**.

You've worked with spreadsheet software like Excel. When you open a new Excel file, the first thing you see is an empty table. That table consists of a seemingly endless number of columns and rows. These columns and rows are split up into individual cells, in which you enter data.

For simple tasks, one table is usually enough. But sometimes -- maybe because you're organized -- you want to split up your data over multiple tables. That's very easy in Excel. By clicking on a tab at the bottom of the window, you can access another worksheet -- or table. And another. And another.

Relational databases are not exactly the same thing as spreadsheet software. But they DO share many commonalities.

**1. Database Tables:** The structure of a relational database is based on tables. Just like with Excel, you can limit yourself to one single table. But most organizations have far more than just one table in their database. They typically split their data into categories (e.g. "customer data", "order data", "supplier data", "inventory data", "employee data", "product data", etc.). Then they create a separate table for each data category. For example, the database outline on the left shows that this company's data will be clustered into seven tables.

A table in a relational database is split into rows and columns.

**2. Rows in a Database Table:** Each **row** in a database table represents a single **record** (or "instance") of interconnected data.

**3. Columns in a Database Table:** Each **column** in a database table represents a specific type of information to be stored.

Let's make this clearer with an example.
- An organization wants to store various pieces of customer data. In its database, it creates a new table called "customer\_data".
- It creates the following columns in this table: **(1)** "first\_name", **(2)** "last\_name", **(3)** "address", and **(4)** "phone".
- Now it enters a new record (i.e. row) for a customer. The data in this record is "John", "Smith", "Währinger Gürtel 97", "0664123456789".

As you can imagine, the organization enters records (i.e. rows) for each of its customers. Can you think of a potential problem?

Exactly. What if TWO customers are named John Smith, and both live at Währinger Gürtel 97 and both have the same phone number? Maybe because they're twins with -- let's face it -- very, very uncreative parents. And they live together and share a phone. In such a case, how can the organization access the "correct" John Smith?

There's a solution for this problem, and it's a part of any table in any relational database. The organization creates a column in the table that is a **primary key**. Basically, whenever a record is entered, a further piece of data (usually a number that increases by 1 for each new row) is included. This way, each row can always be uniquely identified. Even with two John Smiths.

**4. Primary Keys in a Database Table:** One column in every table is reserved for a primary key (i.e. data that allows users to access every row individually, even if all the other data in two rows is identical).

Let's have a look at the final customer\_data table on the next slide...

Once a database table has been set up, entering records into that table becomes as simple as entering values into the cells of a spreadsheet. Accessing, modifying and deleting records is just as easy. Users can do all of these things through a type of software that is called a **database management system** (**DBMS**).
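
The customer\_data example above can be sketched with Python's built-in `sqlite3` module. SQLite is just one DBMS, picked here because it needs no installation. The column name `customer_id` and the in-memory database are assumptions made for illustration; the course's own table may look different.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained (nothing is saved to disk).
conn = sqlite3.connect(":memory:")

# customer_id is a hypothetical name for the primary-key column.
# In SQLite, an INTEGER PRIMARY KEY column is filled in automatically on insert.
conn.execute(
    """
    CREATE TABLE customer_data (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT,
        last_name   TEXT,
        address     TEXT,
        phone       TEXT
    )
    """
)

# Two records whose visible data is identical (the twins from the text).
twin = ("John", "Smith", "Währinger Gürtel 97", "0664123456789")
conn.executemany(
    "INSERT INTO customer_data (first_name, last_name, address, phone) "
    "VALUES (?, ?, ?, ?)",
    [twin, twin],
)

# The primary key is the only column that tells the two rows apart.
rows = conn.execute(
    "SELECT customer_id, first_name, last_name FROM customer_data "
    "ORDER BY customer_id"
).fetchall()
for row in rows:
    print(row)
```

Both rows carry the same name, address and phone number, yet each can still be fetched on its own through its `customer_id` value.
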
Most DBMS are based on a simple programming language that was created specifically to work with relational databases: **SQL**. SQL stands for "**structured query language**". As the name suggests, SQL only "works" because the data in relational databases is stored in a highly structured manner (i.e. in tables, columns, and rows). For this reason, we refer to the data in a relational database as "**structured data**".

Relational databases are excellent tools through which to support your organization's operating activities (e.g. customer relationship management, inventory management, transaction processing, supply chain management, employee data management).

However, a key component of modern-day strategy development lies in **business intelligence** (**BI**) activities. Such activities involve the use of software to analyze a very large volume of raw historical data. The software identifies patterns and trends in this data. It shares its insights -- and sometimes even recommendations -- with the managers, who can then develop strategy that is highly data-based.

Traditional relational databases are not ideally suited for handling the large amounts of data needed for good business intelligence. Because of this, organizations often set up a **data warehouse**. A data warehouse is ALSO a database. But it is developed with the main purpose of analyzing large amounts of (historical) data. Its advantages compared to a "regular" relational database are:

- **Scalability:** Data warehouses can easily be "expanded" to process large amounts of data and user requests;
- **Data Integration:** Data warehouses are designed to integrate data from multiple (internal and external) sources;
- **Data Quality:** Data warehouses use data cleansing and transformation processes to ensure data quality and consistency.

We've mentioned the mega-trend "Big Data" a few times already. Big Data impacts modern organizations in countless ways.
One of those ways is the type of database an organization uses.

The "relational databases" that we've discussed so far are excellently suited for handling smaller, well-structured quantities of data. If an organization only wants to store data that is related to e.g. sales transactions, employees and customers, inventory levels, and so forth -- then a relational database is the ideal option.

Increasingly, though, organizations want -- or need -- to store far more data than the above. We've already mentioned Business Intelligence. In order for BI analytics to deliver useful results, the BI software needs to draw on very large quantities of data -- as does any type of artificial intelligence an organization uses. Further, there's the rise of the Internet of Things (IoT). Any organization that manufactures digital devices -- or that uses data created by those devices -- will have to store that data somewhere: for strategic, operational and legal reasons.

**Bottom line:** Organizations are going to have to store all that Big Data someplace.

Relational databases are excellent at processing smaller quantities of data that are well-structured. The problem is that Big Data is neither "small" nor "well-structured". And as soon as there is too much data, relational databases suffer from many issues: performance, speed, cost, and data compatibility.

For this reason, a new type of database "for Big Data" has emerged: the "NoSQL database".

**NoSQL databases** -- or "**non-relational databases**" -- have been designed to handle very large amounts of **unstructured data** (i.e. data that has not been split into tables, columns and rows) that was created by many different sources. Perfect for Big Data.

NoSQL databases are **NOT based on any fixed type of structure**. This means that you do not have to "pre-define" the type of data that will be stored by creating tables. Instead, new data is added to the database entirely "as is".
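
As a rough illustration of storing data "as is", the sketch below uses a plain Python list of dictionaries as a stand-in for a document-style NoSQL database. No real NoSQL product is involved, and all records and field names are made up for illustration.

```python
import json

# Stand-in for a document store: a list that accepts records of any shape.
collection = []

# Hypothetical records from different sources; none share a fixed set of fields.
collection.append(
    {"type": "order", "customer": "John Smith", "items": ["milk", "butter"]}
)
collection.append(
    {"type": "sensor_reading", "device": "fridge-01", "temperature_c": 4.5}
)
collection.append(
    {"type": "review", "text": "Great gorgonzola!", "stars": 5, "tags": ["cheese"]}
)

# No tables or columns had to be defined before adding the data.
# The flip side: a query has to cope with fields that may be missing.
with_temperature = [doc for doc in collection if "temperature_c" in doc]

for doc in collection:
    print(json.dumps(doc))
```

Compare this with a relational table, where all three records would first have to be forced into the same set of columns.
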
Let's look at an example: A company wants to store the following data in a database:

An important data trend is the emergence of the **data lake**. A data lake has many similarities to a data warehouse. In both cases, they are used by an organization to collect a large quantity of data. And in both cases, the underlying technology is often a NoSQL database.

However, the purpose of a data lake is very different from that of a data warehouse. While data warehouses are specifically created so that managers can carry out business intelligence analytics (i.e. analysis of large data samples for the purpose of making strategic decisions), data lakes are far more "experimental" in their purpose:

- A data lake draws (or "ingests") its data from **many different sources** (or "streams"), such as social media platforms, websites, mobile apps, IoT devices and sensors, healthcare records, public government datasets and company databases.
- These streams provide the data lake with massive amounts of **raw, unstructured data**.
- This data is then cleaned and filtered with the help of AI-based software.
- Finally, the data is analyzed by artificial intelligence -- often for machine learning purposes -- to identify and learn new things.

This **learning process is far more open-ended** than is the case with business intelligence analytics (i.e. data warehouses). There is no automatic rule in a data lake that the AI may only learn things that directly affect business strategy.
