Business Intelligence Chapter 2

Questions and Answers

What defines data integrity in the context of analytics?

  • Accuracy, completeness, consistency, and validity of data (correct)
  • Visualization of data trends
  • The quantity of data collected
  • Speed of data retrieval

What is the primary characteristic of structured data?

  • It is difficult to access and interpret
  • It primarily includes images and voice data
  • It is highly organized and easily understood by machines (correct)
  • It consists of non-standardized formats

Which of the following is NOT considered a characteristic of analytics-ready data?

  • Data richness
  • Data currency
  • Data obsolescence (correct)
  • Data accessibility

What type of data includes a combination of textual, imagery, and voice content?

  • Unstructured data (correct)

Which characteristic of data ensures its relevance to a specific study?

  • Data relevancy (correct)

What is the primary goal of dimensionality reduction?

  • To decrease the dataset's complexity by reducing features while retaining important properties (correct)

What is the process of discretization primarily used for?

  • Transforming continuous data into categories for simplification (correct)

Which of the following best describes the purpose of data normalization?

  • To standardize data formats across a system (correct)

What technique is primarily applied to very large datasets to simplify analysis?

  • Discretization (correct)

Which method focuses specifically on maintaining class representation in a sample?

  • Balancing (correct)

Flashcards

Data

Facts collected from experiences, observations, or experiments.

Structured Data

Data that is organized in a predefined format and easily understood by computers. Examples include names, dates, and addresses.

Unstructured Data

Data that doesn't have a predefined format, making it more difficult for computers to interpret. Examples include text documents, images, and videos.

Data Integrity

Accuracy, completeness, consistency, and validity of an organization's data. It's essential for reliable analytics.

Data Granularity

The level of detail in data. High granularity means a lot of detail, while low granularity provides broader categories.

Dimensionality Reduction

The process of reducing the number of features (variables) in a dataset while preserving the most important information, making data analysis more efficient.

Variable Selection

Techniques used to select the most relevant variables from a dataset, improving model accuracy and reducing noise.

Sampling

The process of choosing a subset of cases or samples from a larger population to be representative of the whole.

Balancing/Stratification

Adjusting the distribution of cases in a dataset to ensure balance across different categories, preventing biases.
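
One way to keep every category represented is stratified sampling: draw the same fraction of cases from each category. The sketch below is only an illustration; the `stratified_sample` helper, the `churn` field, and the 90/10 split are hypothetical and do not come from the lesson.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction of cases from each category (stratum)."""
    rng = random.Random(seed)          # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for rec in records:                # group the cases by category
        strata[rec[key]].append(rec)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # keep at least one case per category
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical data: 10 "yes" cases and 90 "no" cases.
records = [{"id": i, "churn": "yes" if i < 10 else "no"} for i in range(100)]
sample = stratified_sample(records, key="churn", fraction=0.2)
counts = {label: sum(r["churn"] == label for r in sample) for label in ("yes", "no")}
print(counts)  # {'yes': 2, 'no': 18}
```

The sample keeps the original 10/90 class proportions. A plain random draw of 20 cases could contain few or no "yes" cases by chance.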

Discretization

The process of transforming continuous data into discrete categories or intervals, simplifying analysis and making data more manageable.
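
As a minimal sketch of this idea, the snippet below bins a continuous variable (age) into three labeled intervals. The cut points, the labels, and the `discretize` helper are assumptions chosen for illustration.

```python
def discretize(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds value."""
    for upper, label in zip(edges, labels):
        if value < upper:
            return label
    return labels[-1]  # values at or above the last edge fall in the final bin

ages = [5, 17, 18, 42, 64, 65, 90]
edges = [18, 65]                        # hypothetical cut points
labels = ["minor", "adult", "senior"]   # one more label than edges

binned = [discretize(a, edges, labels) for a in ages]
print(binned)
# ['minor', 'minor', 'adult', 'adult', 'adult', 'senior', 'senior']
```

Each interval is closed on the left and open on the right, so 18 counts as "adult" and 65 as "senior".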

Study Notes

Business Intelligence, Analytics, and Data Science: A Managerial Perspective

  • Chapter 2 focuses on descriptive analytics, covering data nature, statistical modeling, and visualization.
  • Data is a collection of facts, often obtained from experiences, observations, or experiments.
  • Data types include numbers, words, images, and more.
  • Data is the foundational element for deriving information and knowledge.
  • Data quality and integrity are crucial for analytics. Data integrity encompasses accuracy, completeness, consistency, and validity.
  • Metrics for analytics-ready data include source reliability, content accuracy, accessibility, security/privacy, richness, consistency, currency/timeliness, validity, and granularity.
  • Data is categorized into structured, unstructured, and semi-structured forms.
  • Structured data is standardized, follows a predefined format, and is easily accessed (e.g., names, dates, addresses).
  • Unstructured data includes any combination of text, images, voice, and web content.
  • Semi-structured data falls between structured and unstructured, with some organizational structure (e.g., XML, JSON, log files).

Data Categorization

  • Categorical variables represent types or groups (e.g., race, sex, age group).
    • Nominal data are used for labeling without quantitative value (e.g., gender, color).
    • Ordinal data have an inherent order (e.g., Likert scale, educational level).
  • Numerical variables represent measured values that can be logically ordered.
    • Interval data have order and equal, meaningful differences between values, but no true zero point (e.g., temperature in Celsius).
    • Ratio data have order, difference, and a meaningful zero point (e.g., height, income).
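
The four measurement scales differ in which summaries are meaningful. The following sketch pairs each scale with one summary that suits it; the sample values and variable names are made up for illustration.

```python
from statistics import mean, median, mode

colors = ["red", "blue", "red", "green"]   # nominal: labels only
likert = [1, 2, 2, 4, 5]                   # ordinal: ordered codes
temps_c = [10.0, 20.0, 30.0]               # interval: differences are meaningful
incomes = [20000, 40000]                   # ratio: true zero, so ratios are meaningful

most_common_color = mode(colors)           # counting is the only valid operation
median_rating = median(likert)             # order supports a median
mean_temp = mean(temps_c)                  # equal intervals support a mean
income_ratio = incomes[1] / incomes[0]     # "twice as much" only makes sense here

print(most_common_color, median_rating, mean_temp, income_ratio)
# red 2 20.0 2.0
```

By contrast, saying that 20 °C is "twice as warm" as 10 °C would be misleading, because zero on the Celsius scale is arbitrary.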

Data Preprocessing

  • Real-world data is often dirty, requiring preprocessing for analytics.
  • Data preprocessing involves data consolidation, cleaning, transforming, and reduction.
    • Data reduction techniques include dimensionality reduction and variable selection (to reduce the number of variables) and sampling and stratification (to reduce the number of cases).

Statistical Modeling

  • Statistics is a set of mathematical techniques to characterize and interpret data.
    • Descriptive statistics describe data, while inferential statistics draw inferences about a population from sample data.
  • Measures of central tendency include the arithmetic mean, median, and mode.
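
The three measures of central tendency can be computed with Python's standard `statistics` module. The sales figures below are hypothetical and include one large value to show how the mean and median can diverge.

```python
from statistics import mean, median, mode

daily_sales = [12, 15, 15, 18, 40]   # hypothetical data with one outlier (40)

avg = mean(daily_sales)      # sum / count = 100 / 5
mid = median(daily_sales)    # middle value of the sorted list
top = mode(daily_sales)      # most frequent value

print(avg, mid, top)  # 20 15 15
```

The outlier pulls the mean up to 20, while the median and mode stay at 15. This is why the median is often preferred for skewed data.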

Dispersion

  • Dispersion, or variability, measures the spread or variation in a given variable.
  • Measures of dispersion include range (max - min), variance, standard deviation, and mean absolute deviation (MAD).
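
A short sketch of these four measures follows. It assumes the data is the whole population, so it uses `pvariance` and `pstdev`; `statistics.variance` and `statistics.stdev` would be the sample versions. The values are illustrative.

```python
from statistics import mean, pstdev, pvariance

values = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data

value_range = max(values) - min(values)          # range = max - min
var = pvariance(values)                          # mean squared deviation from the mean
std = pstdev(values)                             # square root of the variance
center = mean(values)
mad = sum(abs(v - center) for v in values) / len(values)  # mean absolute deviation

print(value_range, var, std, mad)  # 7 4 2.0 1.5
```

The standard deviation and MAD are expressed in the same units as the data, which makes them easier to interpret than the variance.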

Data Visualization Techniques

  • Data visualization uses visual representations to explore, make sense of, and communicate data.
  • Visualizations range from histograms to graphs, charts, and illustrations.
  • Visual analytics combines information visualization with predictive analytics techniques.
  • Performance dashboards combine and visualize key information from multiple sources.

Regression Modeling

  • Regression is a technique in statistics used to understand or model the relationship between variables.
  • Regression can be used to build models that allow for prediction and analysis of data.
  • It typically involves determining the explanatory (input) and response (output) variables.
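
As an illustration, the sketch below fits a simple linear regression (one explanatory variable, one response variable) by ordinary least squares. The `fit_line` helper and the advertising and sales numbers are hypothetical; the lesson does not prescribe a specific method.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1, 2, 3, 4, 5]     # explanatory (input) variable
sales = [3, 5, 7, 9, 11]       # response (output) variable, exactly 2x + 1

slope, intercept = fit_line(ad_spend, sales)
predicted = intercept + slope * 6   # predict the response for a new input

print(slope, intercept, predicted)  # 2.0 1.0 13.0
```

Real data rarely falls on a perfect line, so a fitted model would normally be checked against its residuals before being used for prediction.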

Business Reporting

  • Reports translate information into actionable decisions.
  • Reports involve various functions, such as communication, maintaining departmental efficiency, providing analysis results, persuasion, and knowledge management for the organization.
  • Business reports may vary in format, distribution, and source of information.
  • Reports can incorporate key performance indicators (KPIs) and use various types of visualizations to support the presentation.
