CSC122 compiled note.pdf

THIS NOTE WAS COMPILED BY GAJI FROM WHAT WE WERE TAUGHT PHYSICALLY AND ONLINE. DO NOT DEPEND ON THIS NOTE ALONE; ENSURE YOU READ FAR AND WIDE FROM TEMPLE’S DRIVE. GOOD LUCK.

CSC 122

### 1. Information System (IS)

Definition: An Information System (IS) is a structured system designed to collect, process, store, and distribute information to support decision-making and control within an organization. It integrates technological, human, and procedural elements to manage and use data effectively.

Components:
- Hardware: Physical devices like computers, servers, and storage systems that run software and store data.
- Software: Applications and operating systems that process data and perform tasks (e.g., databases, enterprise applications).
- Data: Raw facts and figures that are processed to generate meaningful information (e.g., sales data, customer records).
- Procedures: Established methods and policies for managing and operating the system (e.g., data entry protocols, backup routines).
- People: Users and IT staff involved in operating and interacting with the system, including managers, end-users, and technical support.

### 2. Management Information System (MIS)

Definition: A Management Information System (MIS) is a type of information system specifically designed to support management activities by providing relevant information for decision-making, planning, and control.

Features:
- Systematic Data Processing: Ensures data is organized, processed, and managed in a structured manner.
- Integrated Approach: Combines various subsystems (e.g., financial, human resources) to provide a comprehensive view.
- Decision Support: Offers tools and information to aid in managerial decision-making.
- Automated Reporting: Generates reports and visualizations automatically to streamline information dissemination.

Components:
- Hardware: Servers, computers, and networking infrastructure.
- Software: MIS applications, decision support systems (DSS), and reporting tools.
- Data: Information used for analysis and reporting.
- Procedures: Methods for data collection, processing, and dissemination.
- Users: Managers and decision-makers who use the system for strategic and operational purposes.

Scope:
- Operational Management: Handles routine operations and processes.
- Tactical Management: Supports mid-level management with performance analysis and resource allocation.
- Strategic Management: Assists top management in long-term planning and strategic decision-making.

### 3. Information Retrieval (IR)

Definition: Information Retrieval (IR) is the process of finding relevant information from a large collection of data or documents based on a user’s query or information need. It involves techniques for searching, indexing, and retrieving information efficiently.

Techniques:
1. Boolean Retrieval:
   - Definition: Uses Boolean operators (AND, OR, NOT) to retrieve documents that match specific query criteria. Documents either match or don’t match the query conditions.
   - Usage: Simple and effective for queries with clear inclusion or exclusion criteria.
2. Query Processing:
   - Definition: Involves transforming a user's query into a format that can be effectively used to search the document collection.
   - Steps: Includes tokenization (breaking text into words), stemming (reducing words to their base form), removing stop words (common words that add little value), and query expansion (adding synonyms or related terms).
3. Matching:
   - Definition: The process of comparing documents with the query to determine their relevance. Various matching algorithms are used to assess how well documents meet the query criteria.
4. Ranking:
   - Definition: Ordering search results based on relevance to the query. The most relevant documents appear at the top of the search results.
   - Techniques: May involve scoring methods like term frequency-inverse document frequency (TF-IDF) or more complex algorithms.

Models:
1. Vector Space Model:
   - Definition: Represents documents and queries as vectors in a multi-dimensional space. Documents are ranked based on their cosine similarity to the query vector.
   - Usage: Widely used in search engines and document retrieval systems.
2. Probabilistic Model:
   - Definition: Uses probability theory to estimate the likelihood of a document being relevant to a query. The model ranks documents based on their probability scores.
   - Usage: Provides a probabilistic approach to relevance and ranking.
3. Latent Semantic Indexing (LSI):
   - Definition: Uses singular value decomposition (SVD) to identify patterns and relationships between terms and documents. It captures the underlying semantic structure.
   - Usage: Helps in addressing synonymy and polysemy in text retrieval.
4. Natural Language Processing (NLP):
   - Definition: Involves techniques for understanding and processing human language, including text analysis, sentiment analysis, and language generation.
   - Usage: Enhances search and retrieval by understanding context and meaning in natural language.
5. Relevance Feedback:
   - Definition: A technique where the system refines search results based on user feedback about the relevance of previously retrieved documents.
   - Usage: Improves retrieval accuracy by incorporating user preferences and feedback.
6. Clustering Expansion:
   - Definition: Involves grouping similar documents together to expand and improve search results based on cluster characteristics.
   - Usage: Enhances retrieval by organizing documents into meaningful clusters.
7. Machine Learning Techniques:
   - Definition: Uses algorithms and models to learn from data and improve retrieval performance over time. Includes supervised and unsupervised learning methods.
   - Usage: Applies techniques such as classification, regression, and clustering to enhance search and ranking.
8. Deep Learning:
   - Definition: Uses neural networks with multiple layers (deep learning models) to automatically learn and extract features from data.
   - Usage: Advanced approach for tasks like image recognition, language modeling, and complex pattern recognition.

### 4. Database

Definition: A Database is an organized collection of structured information or data, typically stored electronically, which allows for efficient data management, retrieval, and manipulation.

Components:
- Tables: Structured data organized into rows and columns.
- Queries: Requests made to retrieve or manipulate data (e.g., SQL queries).
- Forms: User interfaces for entering and modifying data.
- Reports: Summarized and analyzed data presented in a readable format.

Purpose: To facilitate efficient storage, retrieval, and management of data for various applications, including transaction processing, data analysis, and reporting.

### 5. File Organization

Definition: File Organization refers to the methods used to store and manage files in a file system or database to optimize data access and retrieval.

Techniques:
- Sequential File Organization: Data is stored in a sequential order based on a key field. Suitable for processing records in a specific sequence.
- Indexed File Organization: Uses an index to quickly locate and access records based on key fields. Improves search efficiency.
- Direct File Organization: Data is stored at an address or location computed from the key (e.g., by hashing). Allows direct access to records.
- Heap File Organization: Data is stored in no particular order, wherever space is available. Suitable for applications where data is frequently added or deleted.

1. Data Warehousing:
   - Definition: The process of collecting, storing, and managing large volumes of data from various sources in a central repository for analysis and reporting.
   - Components: Data integration, ETL (Extract, Transform, Load) processes, and data marts.
2. Data Mining:
   - Definition: The process of discovering patterns, correlations, and useful information from large datasets using techniques such as clustering, association rules, and classification.
   - Applications: Market basket analysis, fraud detection, and customer segmentation.
3. Big Data Technologies:
   - Definition: Technologies designed to handle and analyze large and complex datasets that traditional data processing systems cannot manage efficiently.
   - Examples: Hadoop, Spark, and NoSQL databases.
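
Worked example (not part of the original lecture material): the association-rule idea behind market basket analysis, mentioned under Data Mining above, can be sketched in a few lines of Python. The transactions are made-up toy data, and the function name `pair_rules` and the `min_support` threshold are assumptions chosen for illustration. The sketch uses the two standard measures for a rule A -> B: support (the fraction of all baskets containing both A and B) and confidence (the fraction of baskets containing A that also contain B). It only looks at pairs of items, so treat it as a minimal illustration rather than a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (invented for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (lhs, rhs, support, confidence) for every frequent item pair."""
    n = len(transactions)
    item_counts = Counter()   # how many baskets contain each item
    pair_counts = Counter()   # how many baskets contain each pair of items
    for basket in transactions:
        item_counts.update(basket)
        # sorted() makes (a, b) and (b, a) count as the same pair
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every basket that contains butter also contains bread, so the rule butter -> bread has confidence 1.0, while bread -> butter has only 0.5. The direction of a rule matters.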
