Data Collection, Warehousing, and Mining

Questions and Answers

What does DBMS primarily focus on?

  • Developing web applications.
  • Designing computer hardware.
  • Creating and managing databases. (correct)
  • Managing computer networks.

Data warehousing involves storing data normalized in the most efficient manner to reduce redundancy.

False (B)

What is data mining used for in marketing?

Data mining is used to improve market segmentation by extracting and analyzing customer data to tailor marketing campaigns.

A ______ is a field in a database table that uniquely identifies each record.

primary key

Match the following data quality characteristics with their descriptions:

  • Accuracy = Data is correct and precise.
  • Consistency = Data does not contradict other data in the database.
  • Currency = Data is up-to-date.
  • Completeness = Data has no significant missing pieces.

What does data validation primarily ensure?

Data is accurate and in the correct format. (D)

Logging changes in a database directly prevents unauthorized access.

False (B)

How do parallel data sets protect against data loss and corruption?

They are checked against the original data at intervals, and differences indicate data corruption or deletion.

______ control limits the number of people who can change a database and what changes they can make.

Access

Match each type of SQL key with its description:

  • Primary Key = Uniquely identifies each record in a table.
  • Foreign Key = Links to the primary key of another table to establish relationships.
  • Composite Key = Combines multiple fields to uniquely identify a record.
  • Alternative Key = A field containing unique values that could be used as the primary key but is not currently set as the primary key.
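
The four key types can be illustrated with a small sketch. It assumes Python's built-in `sqlite3` module and a hypothetical three-table schema (`customer`, `product`, `order_line`); none of these names come from the lesson. SQLite expresses an alternative key as a `UNIQUE` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,      -- primary key
    email       TEXT NOT NULL UNIQUE      -- alternative key: unique, but not the primary key
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE order_line (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
    product_id  INTEGER NOT NULL REFERENCES product(product_id),    -- foreign key
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (customer_id, product_id)  -- composite key
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'thandi@example.com')")
conn.execute("INSERT INTO product VALUES (10, 'Kettle')")
conn.execute("INSERT INTO order_line VALUES (1, 10, 2)")

def try_insert(sql):
    """Return True if the insert is accepted, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_primary = try_insert("INSERT INTO customer VALUES (1, 'other@example.com')")
duplicate_alternative = try_insert("INSERT INTO customer VALUES (2, 'thandi@example.com')")
missing_parent = try_insert("INSERT INTO order_line VALUES (99, 10, 1)")
duplicate_composite = try_insert("INSERT INTO order_line VALUES (1, 10, 5)")
```

Each of the four attempted inserts violates a different key, so the database rejects all of them.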

What is the main goal of normalisation in databases?

To efficiently organize data and avoid redundancy. (A)

In first normal form (1NF), a table can have multiple values in a single column.

False (B)

What is transitive dependency in database design, and which normal form addresses it?

Transitive dependency occurs when a non-key attribute depends on another non-key attribute rather than the primary key, and it is addressed in the Third Normal Form (3NF).
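
A transitive dependency, and how 3NF removes it, can be sketched as follows. The sketch assumes Python's `sqlite3` module and hypothetical `employee` and `department` tables. In the flat table, `dept_name` depends on `dept_id` rather than on the key `emp_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Not in 3NF: dept_name depends on dept_id, which is not the key.
CREATE TABLE employee_flat (
    emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, dept_name TEXT
);
INSERT INTO employee_flat VALUES (1, 'Ayesha', 7, 'Sales'), (2, 'Ben', 7, 'Sales');

-- 3NF: dept_name moves to a table keyed by dept_id.
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY, name TEXT,
    dept_id INTEGER REFERENCES department(dept_id)
);
INSERT INTO department VALUES (7, 'Sales');
INSERT INTO employee VALUES (1, 'Ayesha', 7), (2, 'Ben', 7);
""")

# Renaming the department for only one employee leaves the flat table contradictory.
conn.execute("UPDATE employee_flat SET dept_name = 'Retail' WHERE emp_id = 1")
flat_names = {row[0] for row in conn.execute(
    "SELECT dept_name FROM employee_flat WHERE dept_id = 7")}

# In the 3NF design the name is stored once, so a single update stays consistent.
conn.execute("UPDATE department SET dept_name = 'Retail' WHERE dept_id = 7")
normalised_names = {row[0] for row in conn.execute(
    "SELECT d.dept_name FROM employee e JOIN department d USING (dept_id)")}
```

After the updates, `flat_names` holds two different names for the same department, while `normalised_names` holds one.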

A database with missing information and inaccuracies suffers from low data ______.

integrity

Match the types of anomalies with their effects in databases:

  • Insertion Anomaly = Required data cannot be added unless another piece of unavailable data is also added.
  • Deletion Anomaly = Deleting a record causes the loss of other related data.
  • Modification Anomaly = Changing data requires changes in multiple places.

What is a key aspect of data independence in databases?

Data and applications are separate. (C)

Data redundancy always improves the efficiency and reduces the size of a database.

False (B)

Name one advantage that data warehousing offers over regular databases in terms of data visibility.

Data warehouses make incorrect data entries or data corruption more visible by allowing data analysis.

An ______ trail records exactly who made changes, what the user changed, and when the changes were made in a database.

audit

Match the person involved with a database with their primary responsibility:

  • Developer = Designs and develops the database.
  • Administrator = Checks the database usage and maintains access.
  • End User = Uses the database to retrieve information.

Which of the following correctly describes 'Data Mining'?

Identifying patterns and trends in large datasets to improve decision making. (D)

RFID (Radio Frequency Identification) technology can only be used for tracking products in warehouses and retail stores.

False (B)

List three GUI components commonly found in web forms that help users input data correctly and limit errors.

Checkboxes, combo boxes, and text boxes.

Unlike most databases, data ______ takes the data from these databases and stores it in a non-normalised way.

warehousing

Match each term related to digital data with its function:

  • Digital Sensor = Converts data and transmits it digitally.
  • Cookie = Message given to a web browser by a web server.
  • Transaction Tracking = Records transaction data (type, store location, employee).
  • Location Based Data = Data that provides information about different things that can be shown on maps.

Flashcards

What is Data?

Unprocessed facts, numbers, or signals.

What are Databases?

A collection of organised data, often used to store a wide range of information by programmers and web developers.

What is DBMS?

Software responsible for creating and managing databases, including managing data security.

What are Forms?

An online interactive page that enables user input, typically containing GUI components.

What are Tags?

Electronic tags that transmit a radio frequency, often used for tracking or identifying items.

What is RFID?

Radio Frequency Identification: tiny chips storing kilobytes of information, scanned for data retrieval.

What is a Digital Sensor?

An electronic or electrochemical sensor that senses physical properties and converts and transmits the data digitally.

What role do databases play on the internet?

Databases store information shown on many websites, especially those where users create content.

What is a Cookie?

A message given to a web browser by a web server, stored in a text file to identify users and customize web pages.

Give examples of information stored in databases.

Credit card payments, automatic toll gate records, cookies, and cell phone calls.

What is Location Based Data?

Data providing information about different things displayed on maps, including static and non-static information.

What are Location Based Services?

Services using software applications and location-based databases, such as finding the best route or tracking a stolen vehicle.

What is Data Warehousing?

A technique for storing data from multiple databases in a non-normalized way for analysis.

What is the difference between data warehousing and databases?

Data warehouses store large quantities of historical data for analysis whereas databases store current transactions.

What is Data Mining?

A process to identify trends and patterns between different data sets in large databases to improve decision making.

What is the Data mining process?

Extracting relevant data, looking for patterns, and discovering knowledge from the patterns.

What is SQL?

A programming language created to manipulate and extract data from a database, specifying fields, tables, and conditions.

What is the purpose of Data Mining?

Mined data supports decision making and helps to develop strategies.

What is Data Integrity?

How accurate and consistent the data within a database is; high integrity means no missing or incorrect information.

What does data independence refer to?

The separation between data and the applications that use it. This allows the data to be updated without recompiling the applications.

What is Data Redundancy?

Having the exact same data at different locations in a database.

What is Quality Data?

Data not being repeated in multiple tables along with protection from crashes, hacking, and accidental deletion.

What does data mining allow you to do with patterns?

Data mining uncovers patterns that allow you to understand what happened.

How is data protected?

A database is protected from several different threats through data validation, data verification, and access control.

What is Logging Changes?

Recording any changes made by users to a database to check database consistency and accuracy, and discourage sabotage.

Study Notes

Chapter Overview

  • This chapter covers data collection, data warehousing, data mining, and caring for and managing data

Learning Outcomes

  • Provide an overview of data collection
  • Provide examples of data collection
  • Describe data warehousing
  • Compare data warehousing with databases
  • Describe data mining and provide examples
  • Learn how data should be cared for and managed

Databases in a Nutshell

  • Computers store program instructions, application data held in RAM, and users' application files
  • Files and databases are the common structures used to store data
  • While a user operates an application, the data is saved in the computer's memory
  • Data intended for later use persists in a database or file on more permanent storage

Files and Databases

  • Data is unprocessed information
  • To be usable, data requires processing and organization into meaningful information

Databases

  • Databases consist of organized data
  • Databases serve as the most important tool for programmers and web developers for storing data
  • May store application settings, website text, graphics, status updates, messages, and social network comments

Database Management Software (DBMS)

  • DBMS is the software responsible for managing databases, including creation, table construction, and security
  • Popular database management software examples: Microsoft SQL Server, Microsoft Access, MySQL, and SQLite

Data Collection

  • Manually adding data to a database is inefficient and only suitable for small databases
  • Most databases use automatic techniques to capture data

Forms

  • Web forms are interactive online pages for user input
  • Web forms contain GUI components like checkboxes, combo boxes, spinners, drop-down lists, and text boxes
  • Web forms streamline business by replacing paperwork with online documentation

Tags

  • Electronic tags transmit radio frequency data to a tag reader and vice versa
  • Tags track or identify items and are common in merchandising warehouses, vehicle tracking, and pet tracking

RFID (Radio Frequency Identification)

  • RFID uses tiny chips that store kilobytes of information; scanning a chip lets the data be displayed and added to a database
  • Thousands of businesses use RFID to tag products in warehouses

RFID Examples

  • Products in a warehouse are automatically scanned and removed from the database when removed from the warehouse
  • Tools are tracked to see who is using them and when
  • Tickets at events open gates automatically and add data to the database
  • Public transport cards record trips on a database to deduct costs
  • Products sold in shops are scanned and their details are added to the bill, updating inventory

E-Tolls and RFID

  • In December 2013, SANRAL launched an e-toll system in Gauteng to fund a R20 billion highway project
  • Motorists purchased e-tags read by toll gantries
  • Cameras with RFID readers recorded vehicle data to generate monthly invoices

Digital Sensors

  • Digital sensors are electronic or electrochemical devices where data conversion and transmission are done digitally
  • Examples of data sensed include temperature, distance, humidity, and light
  • Wireless sensor tags connect events in the physical world, such as motion, door or window status, and temperature changes, to smartphones

Invisible Online Data Collection

  • Databases are critical for storing website information, especially on user-generated content platforms (YouTube, Facebook, Wikipedia)
  • These sites automatically store user-entered data in databases, including status updates, likes, tweets, and uploaded media
  • Personal information such as email addresses, usernames, and passwords is also stored

Cookies

  • A cookie is a message from a web server to a web browser, stored in a text file
  • The browser returns the message to the server each time the browser requests a page
  • Cookies identify users and customize web pages for them, often using personal information entered in a form
  • Online advertising companies use big databases to track users and activity across web pages
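
The cookie round trip described above can be sketched with Python's standard `http.cookies` module. The cookie name `visitor_id`, its value, and the `preferences` dictionary are hypothetical, and no real web server is involved; the sketch only shows the headers each side would produce and parse.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header that accompanies the first response.
outgoing = SimpleCookie()
outgoing["visitor_id"] = "abc123"          # hypothetical identifier
outgoing["visitor_id"]["path"] = "/"
set_cookie_header = outgoing.output(header="Set-Cookie:")

# Browser side: on later requests the stored pair is sent back in a Cookie header.
cookie_header = "visitor_id=abc123"

# Server side again: parse the returned cookie to recognise the visitor.
incoming = SimpleCookie()
incoming.load(cookie_header)
visitor_id = incoming["visitor_id"].value

# Hypothetical stored settings used to customise the page for this visitor.
preferences = {"abc123": {"language": "en", "theme": "dark"}}
page_settings = preferences.get(visitor_id, {})
```

The server never sees who the visitor is, only the identifier the browser returns, which is why the same mechanism lets advertisers track activity across pages.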

Database Usage

  • Databases are used for credit card payments, automatic toll gates, cookies, and cell phone calls
  • Software is made to read the information and record it in a database automatically
  • Automatic reports can be generated, such as credit card statements

Transaction Tracking

  • Transaction data, such as transaction type, store location, employee, customer information, and payment details, is sent to the corporate database
  • Data is stored on credit cards, store cards and store loyalty cards
  • Transaction tracking offers benefits like consumer safety, improved user experience, fraud detection, tracking browsing history and demographic profiles
  • A downside of data tracking is the possible misuse of personal information

Location Based Data

  • Location-based data provides mappable data, including static data like roads and buildings, and dynamic data like vehicles or traffic
  • Data comes from GPS and other geographic positioning systems

Location Based Services (LBS)

  • Location-based services use software applications and location-based databases to provide services such as finding the best route, tracking stolen vehicles, or locating nearby services
  • Smartphones and tablets are well suited to location-based computing, for example weather applications, food ordering applications, and car sharing services
  • Companies mine databases to improve their decision making

Data Warehousing

  • Data warehousing stores data from multiple databases in a non-normalized way, using more storage space

Data Warehousing Details

  • Data warehousing helps in reporting, analytics, and data mining
  • A data warehouse does not contain copies of the original databases; it is a new database
  • A data warehouse makes data available and ready for analysis by users in different departments, who can create graphs and reports

Data Warehousing vs Database

  • A data warehouse stores large amounts of historical data, but a database stores current transactions
  • Databases are normalized, which refines their structure to minimize redundancy and improve integrity; a data warehouse stores its data in a non-normalized way
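
The contrast can be sketched as follows, assuming Python's `sqlite3` module and hypothetical `store` and `sale` tables. The warehouse table is a new, wider table built from the operational tables, with the city repeated on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised operational tables: each fact is stored once.
CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sale (
    sale_id INTEGER PRIMARY KEY, store_id INTEGER REFERENCES store(store_id),
    sale_date TEXT, amount REAL
);
INSERT INTO store VALUES (1, 'Pretoria'), (2, 'Durban');
INSERT INTO sale VALUES (1, 1, '2023-01-05', 120.0),
                        (2, 1, '2023-02-11', 80.0),
                        (3, 2, '2023-02-12', 200.0);

-- Warehouse table: a new, non-normalised table with the city repeated on every row.
CREATE TABLE warehouse_sales AS
    SELECT s.sale_id, s.sale_date, s.amount, st.city
    FROM sale s JOIN store st ON st.store_id = s.store_id;
""")

# Analysts can now report from one wide table without joins.
totals_by_city = dict(conn.execute(
    "SELECT city, SUM(amount) FROM warehouse_sales GROUP BY city"))
```

The repetition costs storage space but makes reporting queries simpler, which is the trade-off the notes describe.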

Data Mining

  • Data mining identifies trends and patterns between different sets of data in large databases
  • Selecting the right data reveals trends and patterns that can dramatically improve decision making

Data Mining Examples

  • Data mining helps improve market segmentation using customer data to direct personalized loyalty campaigns
  • Data mining in marketing predicts which users are likely to unsubscribe from a service, what they will search for, and what will achieve a successful response rate

Data Mining Process

  • To mine a database, you will extract relevant data, look for patterns in the data, and discover knowledge from the patterns

Extracting the Relevant Data

  • Select only the useful data from a large database
  • The data can be extracted using SQL by specifying the fields to extract, the table to use, and the conditions
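
A minimal sketch of this extraction step, assuming Python's `sqlite3` module and a hypothetical `purchases` table. The query names the fields (`customer`, `amount`), the table (`purchases`), and the conditions (`category` and `year`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (customer TEXT, category TEXT, amount REAL, year INTEGER);
INSERT INTO purchases VALUES
    ('Lerato', 'Groceries', 450.0, 2023),
    ('Lerato', 'Clothing',  900.0, 2023),
    ('Sipho',  'Groceries', 300.0, 2022),
    ('Sipho',  'Groceries', 520.0, 2023);
""")

# Fields to extract: customer, amount. Table: purchases. Conditions: category and year.
relevant = conn.execute(
    "SELECT customer, amount FROM purchases "
    "WHERE category = ? AND year = ? ORDER BY customer",
    ("Groceries", 2023),
).fetchall()
```

Passing the condition values as parameters, rather than pasting them into the SQL string, keeps the query safe if the values come from user input.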

Look For Patterns in the Data

  • Working with large amounts of data requires looking for patterns to understand the dataset
  • These patterns can result in knowledge, used to make better decisions and develop strategies
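
One simple way to look for a pattern is to count how often items occur together. The sketch below assumes a hypothetical list of shopping baskets and uses only Python's standard library. It illustrates the idea and is not a full data mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical extracted data: the items bought together in four transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "jam"},
]

# Count every pair of items that appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, occurrences = pair_counts.most_common(1)[0]
```

Here bread and milk appear together in three of the four baskets, the kind of pattern a shop could use when deciding which products to place or promote together.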

Discover Knowledge

  • Once you have identified patterns, you have turned an overwhelming amount of disorganised data into a few useful facts
  • With the situation confirmed, informed decisions can be made or strategies developed

Example Data Mining

  • Data mining is used for government social grants, which are administered by the South African Social Security Agency (SASSA)
  • SASSA is mandated to provide comprehensive social security services against vulnerability and poverty within the constitutional legislative framework
  • Most social grants are means tested to assess the value of assets and income
  • The Government conducts an annual General Household Survey (GHS) to measure the living circumstances of South African households, which yields big data
  • Using data mining techniques, the relevant data is extracted to obtain information and knowledge

Data Mining in Facebook

  • Facebook accumulates personal data over time; data is collected in more dimensions than most users ever realise
  • Using data integration, it is then mixed with other data sources that end users will never be aware of
  • Apps that use data analytics analyse comments by friends of friends, text, online behaviour, and so on, to compile data about users
  • The resulting knowledge is then used to infer a user's current emotional state, estimate how sad or depressed someone might be, suggest possible friends, and so on

Value of Data

  • An online shopping website can charge sellers a fee to advertise their products on the site once its database already contains many other products
  • To gather the data needed to sell products, the website's creator can ask sellers to enter the important data for their products on the website, from where it is added to the database

Database Usefulness

  • For a database to be useful, it needs to record and store valuable and useful data
  • To judge whether data is valuable enough to record and store in your database, ask:
    • Will I ever use the data in this field?
    • Will anyone else use the data in this field?
    • What fields do I need specifically for my application?
    • What fields would I need for my application in the future?

Characteristics of Quality Data

  • Accurate: The data needs to be both correct and precise
  • Consistent: The data in one part of your database should not contradict or differ from the data in another part of your database
  • Current: For information to be of high quality, it must be up to date
  • Complete: In a database, incomplete data is almost as bad as inaccurate data
  • Relevant: Good-quality data is relevant to the people who are using it

Data Protection

  • Databases need to be protected from several different threats, including incorrect data entry, data corruption, data loss, accidental data deletion, purposeful data deletion and unauthorised access
  • Multiple tools and techniques protect large databases

Data Validation

  • Data validation is the process in which you check whether the data is accurate, in the correct format or of the correct type before allowing your database to record it
  • It ensures that the data in your database is consistent and accurate
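
A minimal sketch of validation before storage, assuming a hypothetical membership record with `name`, `age`, `email`, and `joined` fields. The rules shown (an age range, a deliberately simple email pattern, an ISO date) are illustrative assumptions, not rules from the lesson.

```python
import re
from datetime import date

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple format check

def validate_member(record):
    """Return a list of validation errors; an empty list means the record may be stored."""
    errors = []
    # Type and presence check.
    if not isinstance(record.get("name"), str) or not record["name"].strip():
        errors.append("name must be a non-empty string")
    # Type and range check.
    if not isinstance(record.get("age"), int) or not 0 <= record["age"] <= 120:
        errors.append("age must be a whole number between 0 and 120")
    # Format checks.
    if not EMAIL_PATTERN.match(str(record.get("email", ""))):
        errors.append("email is not in the correct format")
    try:
        date.fromisoformat(str(record.get("joined", "")))
    except ValueError:
        errors.append("joined must be a date in YYYY-MM-DD format")
    return errors

good = validate_member(
    {"name": "Zanele", "age": 34, "email": "zanele@example.com", "joined": "2024-03-01"})
bad = validate_member(
    {"name": "", "age": 340, "email": "zanele.example.com", "joined": "01/03/2024"})
```

Validation catches data of the wrong type or format, but it cannot tell whether a well-formed value is actually true; that is what verification is for.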

Data Verification

  • Data verification is a manual technique that can be used to make sure that the data on a database is correct and accurate

Data Verification Types

  • Full verification requires that each piece of data entered into a database is read and checked by someone, which can be very time consuming
  • Sample verification checks a randomly selected sample of data for systematic errors, but it can miss small mistakes
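
Sample verification can be supported by a small helper that picks the records for a person to check. The function names below are hypothetical and the checking itself remains manual; the sketch only draws the random sample and summarises the checker's findings.

```python
import random

def sample_for_verification(records, sample_size, seed=None):
    """Pick a random subset of records for a person to check by hand."""
    rng = random.Random(seed)
    return rng.sample(records, min(sample_size, len(records)))

def error_rate(checked):
    """checked is a list of (record, is_correct) pairs produced by the human checker."""
    if not checked:
        return 0.0
    wrong = sum(1 for _, is_correct in checked if not is_correct)
    return wrong / len(checked)

record_ids = list(range(1000))             # hypothetical record identifiers
to_check = sample_for_verification(record_ids, 50, seed=1)
```

A high error rate in the sample suggests a systematic problem worth a full check, while a clean sample still leaves room for isolated mistakes elsewhere.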

Data Integrity

  • Data integrity refers to the reliability, accuracy, and trustworthiness of data over its entire lifecycle
  • Uncorrupted data (data with integrity) is considered 'clean data' that stays unchanged throughout its lifecycle
  • Many DBMSs have built-in integrity controls that help to maintain the data integrity

Logging changes

  • Logging is the process of recording any changes made by users to a database
  • This is called creating an audit trail; the audit trail records exactly:
    • who made the changes
    • what the user changed
    • when they made the changes
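
An audit trail can be sketched with a database trigger, assuming Python's `sqlite3` module and hypothetical `account` and `audit_trail` tables. SQLite has no built-in user accounts, so the `changed_by` column supplied by the application is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL, changed_by TEXT);
CREATE TABLE audit_trail (
    changed_by  TEXT,                            -- who made the change
    account_id  INTEGER,                         -- what was changed
    old_balance REAL,
    new_balance REAL,
    changed_at  TEXT DEFAULT CURRENT_TIMESTAMP   -- when it was changed
);
CREATE TRIGGER log_balance_change AFTER UPDATE OF balance ON account
BEGIN
    INSERT INTO audit_trail (changed_by, account_id, old_balance, new_balance)
    VALUES (NEW.changed_by, OLD.account_id, OLD.balance, NEW.balance);
END;
INSERT INTO account VALUES (1, 100.0, 'setup');
""")

# The application records the acting user alongside the change it makes.
conn.execute("UPDATE account SET balance = 250.0, changed_by = 'mpho' WHERE account_id = 1")
audit_rows = conn.execute(
    "SELECT changed_by, account_id, old_balance, new_balance, changed_at FROM audit_trail"
).fetchall()
```

Because the trigger runs inside the database, the log entry is written even if the application forgets to log the change itself.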

Data Warehousing

  • Data warehousing is a technique for storing data from more than one database; the data is usually stored in a way that is secure, reliable, and easy to retrieve

Data Warehouses Importance

  • Help improve data integrity by making incorrect data entries or data corruption more visible through data analysis
  • Help improve data integrity by making data loss more visible, allowing the problem to be fixed
  • Can be used to recover critical data if it is deleted or corrupted

Access Control

  • Access control refers to managing and controlling the parts of a database that users have access to
  • By limiting the number of people who can change a database, and by limiting what changes each user can make, you reduce the damage that any single user can do to a database
  • Three important ways to control access to your data:
    • Passwords: ensure that only the owner of a username can log in with that username
    • User rights: determine which tables and fields each username can access, and what changes (if any) the user can make to these tables
    • Good database security: ensures that the data is secure and that outsiders cannot find other ways to access the database
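
Passwords and user rights can be sketched in a few lines of standard-library Python. The user `clerk`, the password, and the `RIGHTS` table are hypothetical; a real DBMS provides its own accounts and permissions, so this only illustrates the idea.

```python
import hashlib
import hmac
import os

def hash_password(password, salt):
    """Derive a password hash so the plain password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

_salt = os.urandom(16)
USERS = {"clerk": {"salt": _salt, "hash": hash_password("s3cret!", _salt)}}  # hypothetical user

# User rights: which actions each username may perform on each table.
RIGHTS = {"clerk": {"orders": {"read", "insert"}, "customers": {"read"}}}

def log_in(username, password):
    """Passwords: only someone who knows the password can use the username."""
    user = USERS.get(username)
    if user is None:
        return False
    return hmac.compare_digest(user["hash"], hash_password(password, user["salt"]))

def is_allowed(username, table, action):
    """User rights: anything not explicitly granted is denied."""
    return action in RIGHTS.get(username, {}).get(table, set())
```

Denying by default means a missing entry in `RIGHTS` limits what a user can do rather than accidentally granting access.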

Parallel Data Sets

  • Backups are the most important tool to protect databases from data loss and data corruption
  • To ensure that data has not been corrupted or deleted, the database is checked at intervals against a perfect copy of it, called a parallel data set
  • If the database and its parallel data set differ, data was either corrupted or deleted
  • Database backups should be protected as securely as the database itself
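
The interval check against a parallel data set can be sketched as a comparison of two copies. The sketch assumes each copy is held as a dictionary mapping a primary key to a tuple of field values; the function names and sample rows are hypothetical.

```python
import hashlib

def row_fingerprint(row):
    """Hash a row's values so two copies can be compared cheaply."""
    return hashlib.sha256(repr(row).encode()).hexdigest()

def compare_with_parallel(live, parallel):
    """Both arguments map a primary key to a tuple of field values."""
    deleted = sorted(key for key in parallel if key not in live)
    corrupted = sorted(
        key for key in parallel
        if key in live and row_fingerprint(live[key]) != row_fingerprint(parallel[key])
    )
    return {"deleted": deleted, "corrupted": corrupted}

parallel_copy = {1: ("Ann", 500.0), 2: ("Bob", 75.0), 3: ("Cara", 20.0)}
live_copy = {1: ("Ann", 500.0), 2: ("Bob", 7500.0)}
report = compare_with_parallel(live_copy, parallel_copy)
```

In this sample, record 3 is missing from the live copy and record 2 no longer matches, so both are flagged for investigation.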
