Information Resources & Metadata

**1. What are three categories that describe the nature of information resources? Give an example of each. How do you characterize the relationships within each category of information?**

**Structured Information**: This type of information is *reasonably ordered and can be broken down into component parts* and organized into hierarchies (facts and data). An example is a sales transaction with clearly defined fields for date, customer number, item number, and amount. Relationships within structured information are *straightforward*, such as a customer's order being related to the customer record and the items purchased being related to the order itself.

**Unstructured Information**: This type of information has *no inherent structure or order*, and its parts can't be easily linked together. An example is a manila folder containing assorted items about a lawsuit, such as photos, handwritten notes, newspaper articles, or affidavits. Relationships within unstructured information are *difficult to identify*, and the information is often stored in a disorganized manner.

**Semi-Structured Information**: This type of information *falls between structured and unstructured* information. It shows at least *some structure, such as web pages* that have dates, titles, and authors. An example is a web page with a title, subtitle, content, and a few images. Semi-structured information is *easier to query* and combine than unstructured information, but its relationships *lack the strong structure* of fully structured data.

**2. What is metadata? What does metadata describe for structured information? For unstructured information? Give an example of each type of metadata.**

Metadata is data about data, and it clarifies the nature of the information.

For structured information, metadata describes the definitions of the fields and tables and their relationships.
For example, in a database, metadata might describe that a field called "BirthDate" is of type "Date/Time" and has a specific format like "MM/DD/YYYY."

For unstructured information, metadata describes properties of a document or other resource, such as the title, author, or creation date. For example, metadata for a photo might include the date taken, the photographer's name, and keywords like "beach" or "vacation."

**3. What are the characteristics of information that affect quality? What are examples of each?**

The characteristics of information that affect quality include:

**Accuracy**: Mistakes in birth dates, spelling, or price reduce the quality of the information. For example, a customer's address with an incorrect zip code is inaccurate.

**Precision**: Rounding to the nearest mile might not reduce quality much when estimating the drive to the mall, but for property surveys, "about 2 miles" is unacceptable.

**Completeness**: Omitting the zip code on a customer's address record might not be a problem because the zip can be determined from the address, but leaving off the house number would delay the order.

**Consistency**: Reports that show "total sales by region" may conflict because the people generating the reports are using slightly different definitions.

**Timeliness**: Outdated information has less value than up-to-date information. For example, in stock trading, timeliness is measured in fractions of a second.

**Bias**: Biased information lacks objectivity, and that reduces its value and quality. For example, a manager might include canceled orders to make sales seem higher.

**Duplication**: Information can be redundant, resulting in misleading and exaggerated summaries. For example, a customer might appear more than once in a database if their address changes.

**4.
What were the early design approaches to managing information resources?**

**File Processing Systems**: Before databases, information was stored in computer files, and each department maintained its own records. For example, the payroll office maintained personnel records and had its own computer programs to manage its set of files. These systems led to problems such as data redundancy, lack of integration, inconsistent data definitions, and data dependence.

**Hierarchical Databases**: These databases organized information in a tree-like structure, resembling an organizational chart. They worked well for one-to-many relationships but struggled with many-to-many relationships.

**Network Databases**: These databases allowed more flexible linking of entities that didn't fall along a neat hierarchy and could handle many-to-many relationships. However, they were complex to manage.

**Relational Databases**: Invented by E. F. Codd, relational databases organize information into tables of records that are related to one another by linking a field in one table to a field in another table with matching data. This approach became the standard due to its flexibility and efficiency.

**5. What are the major disadvantages of file processing systems? What are four specific problems associated with file processing systems?**

**Data Redundancy and Inconsistency**: Each department maintained its own records, leading to redundant and inconsistent data. For example, the payroll office might list an employee's name as "ANNAMARIE," while the personnel office shows it as "ANNMARIE."

**Lack of Data Integration**: Integrating data from separate systems was difficult. For example, payroll might maintain information about name and pay history, while personnel records contain gender and ethnicity. Combining this data required extra programming effort.

**Inconsistent Data Definitions**: Different systems might format data differently.
For example, phone numbers might include dashes in one system but be treated as numbers in another.

**Data Dependence**: Programs and files were highly interconnected, making maintenance difficult. Even minor changes required significant effort, and IT staff often fell behind.

**6. Following the file processing model of data management, what three architectures emerged for integrated databases? What are the advantages of each? Are there disadvantages?**

**Hierarchical Database**: Resembles an organizational chart or an upside-down tree. It works well for one-to-many relationships but struggles with many-to-many relationships. For example, a hospital might use a hierarchical database to organize departments and doctors. Advantages: Simple and efficient for one-to-many relationships. Disadvantages: Inflexible for complex relationships.

**Network Database**: Resembles a lattice or web, allowing records to be linked in multiple ways. It supports many-to-many relationships. Advantages: More flexible than hierarchical databases. Disadvantages: Complex to design and manage.

**Relational Database**: Organizes information into tables of records that are related to one another by linking a field in one table to a field in another table with matching data. This is the most widely used architecture. Advantages: Highly flexible, easy to query, and supports complex relationships. Disadvantages: Requires careful planning and normalization to avoid redundancy.

**7. What are the steps in planning a relational data model? Are there benefits to the planning stage?**

1\. **Identify Entities and Attributes**: Determine which entities (e.g., employees, clients, projects) and which of their attributes (e.g., employee ID, last name, birth date) need to be included.

2\. **Define Primary Keys**: Ensure each record in a table has a unique primary key, such as an employee ID.

3\.
**Normalize the Data Model**: Refine entities and their relationships to minimize duplication and ensure data integrity.

4\. **Establish Relationships**: Use foreign keys to link tables together, such as linking an employee table to a department table using a department ID.

Benefits of the planning stage: Proper planning ensures the database is well structured, reduces redundancy, and makes the database easier to maintain and query. It also minimizes the need for costly changes later.

**8. What are primary keys and foreign keys? How are they used to create links between tables in a relational database?**

1\. **Primary Key**: A field (or group of fields) that makes each record unique in a table. For example, an employee ID uniquely identifies each employee in an employee table.

2\. **Foreign Key**: A field in one table that holds values of the primary key of another table. It is used to link records between tables. For example, a department ID in the employee table links to the department table, where department ID is the primary key.

How they create links: Foreign keys establish relationships between tables. Because each employee record stores a department ID that matches exactly one record in the department table, you can retrieve information about the department an employee belongs to.

**9. What is the typical strategy to access a database? How do users access an Access database? Are there other strategies to access database systems?**

**Typical Strategy**: Most people access a database through an application interface with user-friendly web-based forms. These forms allow users to securely enter, edit, delete, and retrieve data.

**Accessing an Access Database**: Users can access an Access database through forms and reports generated by the software. Access provides tools for creating forms and reports, making it easy for users to interact with the database.
**Other Strategies**: Databases can also be accessed through query languages like SQL, interactive voice response (IVR) systems, mobile apps, and natural language interfaces.

**10. What is the role of the database administrator in managing the database? What is the career outlook for this job?**

**Role of the Database Administrator (DBA)**: The DBA is responsible for the efficient operation of the company's databases. Tasks include monitoring and optimizing performance, troubleshooting bottlenecks, setting up new databases, enhancing security, planning capacity requirements, designing backup and disaster recovery plans, and working with department heads and the IT team to resolve problems and build innovative applications.

**Career Outlook**: The job outlook for database administrators is strong, with a projected 10-year job growth of 31% and a median salary of \$87,200. The role is critical as organizations increasingly rely on data for decision-making.

**11. What is SQL? How is it used to query a database?**

**SQL (Structured Query Language)**: SQL is the standard query language used to manipulate information in relational databases.

**How it is used**: SQL is used to create, read, update, and delete records in a database. A simple SQL query to retrieve the last name and first name of every employee whose last name is "Park" would look like this:

SELECT LastName, FirstName FROM Employees WHERE LastName = 'Park';

**12. What is IVR? How is it used to query a database?**

**IVR (Interactive Voice Response)**: IVR is a technology that facilitates access to databases from signals transmitted by telephone. It allows users to retrieve information and enter data using voice or keypad inputs.

**How it is used**: IVR systems are often used in customer service to access account information, retrieve data, and enter information into a database.
For example, a customer calling a bank might use IVR to check their account balance or transfer funds by following voice prompts and entering numbers on their phone.

**13. What is a shadow system? Why are shadow systems sometimes used in organizations? How are they managed? What are the advantages of shadow systems? What are the disadvantages?**

**Shadow System**: A smaller database or information system developed by individuals or departments outside of the IT department. These systems focus on specific information requirements and are not managed by central IT staff.

**Why they are used**: Shadow systems are often created because employees want to get their jobs done more efficiently and quickly, especially when the central IT systems are slow to adapt or lack specific functionality.

**How they are managed**: Shadow systems are typically managed by the individuals or departments that create them. They are often built using tools like Microsoft Access or Excel.

**Advantages**: They provide quick solutions to specific problems. They allow departments to customize systems to their needs.

**Disadvantages**: They may not be consistent with the organization's central database. They can lead to data redundancy and inconsistency. They may be abandoned or become unusable if the creator leaves the organization.

**14. What is master data management? What is a data steward? What is the role of master data management in an organization's integration strategy?**

**Master Data Management (MDM)**: An approach that addresses inconsistencies in how employees use data by achieving uniform definitions for entities and their attributes across all business units. It ensures that everyone in the organization uses the same definitions for terms like "employee," "sale," or "student."

**Data Steward**: A person responsible for ensuring that people adhere to the definitions for master data in their organizational units.
Data stewards act as watchdogs and bridge builders to maintain data consistency.

**Role in Integration Strategy**: MDM is critical for integrating data from multiple sources, especially during mergers or when combining systems from different departments. It ensures that reports and summaries are consistent and accurate.

**15. What is a data warehouse? What are the three steps in building a data warehouse?**

**Data Warehouse**: A central data repository containing information drawn from multiple sources, used for analysis, intelligence gathering, and strategic planning.

**Three Steps in Building a Data Warehouse**:

1\. **Extract**: Data is extracted from its home database.

2\. **Transform**: Data is transformed and cleansed to adhere to common data definitions.

3\. **Load**: Data is loaded into the data warehouse, which is then refreshed at regular intervals.

**16. What are examples of internal sources of data for a data warehouse? What are examples of external sources of data for a data warehouse?**

**Internal Sources**: Operational data from the company's own systems, such as customer records, transactions, inventory, and human resources information.

**External Sources**: Data from government agencies (e.g., U.S. Census Bureau), market research firms, or social media platforms. For example, a jewelry company might add U.S. Census data on median household income by zip code to analyze customer preferences.

**17. What are four examples of data warehouse architectures? Which approach is suitable to meet today's growing demand for real-time information?**

Four Examples of Data Warehouse Architectures:

**Relational Database**: Uses the same relational DBMS for the data warehouse as for the operational database, but optimized for fast retrieval and reporting.

**Data Cubes**: Creates multidimensional cubes for complex, grouped data arranged in hierarchies.
**Virtual Federated Warehouse**: Relies on a collection of existing databases, with software extracting and transforming data in real time.

**Data Warehouse Appliance**: A prepackaged solution that includes hardware, software, maintenance, and support.

Suitable Approach for Real-Time Information: The virtual federated warehouse is suitable for real-time information because it extracts and transforms data in real time rather than taking periodic snapshots.

**18. What is big data? What are the defining features of big data?**

**Big Data**: Collections of data that are enormous in size, varied in content, and accumulate very quickly. They are difficult to store and analyze using traditional approaches.

Defining Features (the "Three Vs"):

**Volume**: Data collections can take up petabytes of storage and are continually growing.

**Velocity**: Data sources change and grow at very fast speeds.

**Variety**: Data includes structured, semi-structured, and unstructured information.

**19. What is data mining? What is the difference between data mining and data dredging? What is the goal of data mining?**

**Data Mining**: A type of intelligence gathering that uses statistical techniques to explore large data sets, hunting for hidden patterns and relationships that are undetectable in routine reports.

**Difference from Data Dredging**: Data dredging refers to finding relationships that might occur by accident and have little value, whereas data mining focuses on discovering meaningful and actionable insights.

**Goal of Data Mining**: To uncover patterns and relationships that can help organizations make better decisions, predict future trends, and gain a competitive advantage.

**20. What are examples of databases without boundaries?**

**Databases Without Boundaries**: Databases where most of the records are entered and managed by people outside the enterprise. Examples include:

**Craigslist**: A database of classified ads where users post and manage their own listings.
**Instagram**: Users upload and manage their own photos, which are stored in Instagram's database.

**Google Person Finder**: A database used during disasters to help people find missing family members.

**21. How do ownership issues affect information management? How do information management needs differ among stakeholder groups?**

**Ownership Issues**: These arise because people often view information resources protectively, even when the organization claims ownership. For example, salespeople may want to protect access to their sales leads, or departments may want to control who can modify their records. These issues can lead to conflicts and delays in making changes to the database.

**Differences Among Stakeholder Groups**:

**Top-Level Management**: Needs strategic information and insights from big data.

**Operating Units**: Need transaction-level reports and systems that support fast-moving business requirements.

**Customers**: Want simple, reliable user interfaces and quick access to information.

**Government Agencies**: Require compliance reports using their own definitions.
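
The primary-key and foreign-key links from question 8 and the SQL query from question 11 can be tried out in a few lines. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than Access. The table layout, the sample rows, and the helper names `build_demo_db`, `find_by_last_name`, and `department_of` are assumptions made up for this illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two linked tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE Departments (
            DepartmentID   INTEGER PRIMARY KEY,  -- primary key: unique per department
            DepartmentName TEXT NOT NULL
        );
        CREATE TABLE Employees (
            EmployeeID   INTEGER PRIMARY KEY,    -- primary key: unique per employee
            LastName     TEXT NOT NULL,
            FirstName    TEXT NOT NULL,
            -- foreign key: must match a DepartmentID in Departments
            DepartmentID INTEGER NOT NULL REFERENCES Departments(DepartmentID)
        );
        """
    )
    conn.executemany(
        "INSERT INTO Departments VALUES (?, ?)",
        [(10, "Payroll"), (20, "Personnel")],
    )
    conn.executemany(
        "INSERT INTO Employees VALUES (?, ?, ?, ?)",
        [(1, "Park", "Jin", 10), (2, "Lopez", "Ana", 20)],
    )
    conn.commit()
    return conn


def find_by_last_name(conn, last_name):
    """Run the question 11 query, passing the name as a parameter."""
    return conn.execute(
        "SELECT LastName, FirstName FROM Employees WHERE LastName = ?",
        (last_name,),
    ).fetchall()


def department_of(conn, employee_id):
    """Follow the foreign key from an employee to the department name."""
    row = conn.execute(
        """
        SELECT d.DepartmentName
        FROM Employees AS e
        JOIN Departments AS d ON d.DepartmentID = e.DepartmentID
        WHERE e.EmployeeID = ?
        """,
        (employee_id,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = build_demo_db()
    print(find_by_last_name(db, "Park"))  # [('Park', 'Jin')]
    print(department_of(db, 2))           # Personnel
```

With foreign keys enforced, inserting an employee whose DepartmentID has no match in Departments raises `sqlite3.IntegrityError`, which is one way the database protects the link between the two tables.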

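The extract, transform, and load steps from question 15 can also be sketched in Python. This is only an illustration under stated assumptions: the two source systems, their field names, and the sample rows are hypothetical, and the "warehouse" is a plain dictionary rather than a real database. The phone-number cleanup echoes the inconsistent-definition problem from question 5.

```python
import re

# Hypothetical rows from two home systems that format the same facts differently.
PAYROLL_ROWS = [{"emp_id": "001", "phone": "555-123-4567", "region": "east"}]
PERSONNEL_ROWS = [{"EmployeeID": 2, "Phone": 5559876543, "Region": "WEST"}]


def extract():
    """Extract: pull raw rows from each home system, tagged with their source."""
    tagged = [("payroll", row) for row in PAYROLL_ROWS]
    tagged += [("personnel", row) for row in PERSONNEL_ROWS]
    return tagged


def transform(source, row):
    """Transform: map one raw row onto a single shared set of definitions."""
    if source == "payroll":
        emp_id, phone, region = row["emp_id"], row["phone"], row["region"]
    else:
        emp_id, phone, region = row["EmployeeID"], row["Phone"], row["Region"]
    return {
        "employee_id": int(emp_id),
        "phone": re.sub(r"\D", "", str(phone)),  # keep digits only
        "region": str(region).strip().title(),   # one spelling per region
    }


def load(warehouse, clean_rows):
    """Load: store rows keyed by employee_id so a refresh overwrites, not duplicates."""
    for row in clean_rows:
        warehouse[row["employee_id"]] = row
    return warehouse


def run_etl(warehouse):
    """Run one extract-transform-load pass into the given warehouse dictionary."""
    return load(warehouse, [transform(src, row) for src, row in extract()])


if __name__ == "__main__":
    print(run_etl({}))
```

Keying the load step on employee_id is a design choice for this sketch: it lets `run_etl` be called again at each refresh interval without inflating the warehouse with duplicate records.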