Brief History of Data Analytics
10 Questions

Created by @EagerZinc

Questions and Answers

What were the common uses of computers by businesses in India during the 1980s?

Businesses primarily used computers for automating accounting systems, maintaining employee information, and processing payroll.

How did the introduction of MS Excel in the 1990s change data handling in businesses?

MS Excel enabled departments beyond accounting, such as Marketing and Operations, to create and store data locally and to use visual charts and pivot tables.

What was the significance of SPSS, introduced in 1991-92?

SPSS was significant as it was the first software capable of performing serious statistical analyses based on established statistical theories.

During the 2001-2010 period, what advancements did businesses incorporate into data analytics?

Businesses began using advanced machine learning algorithms, web search solutions, and interactive data visualizations.

Why is data considered a valuable asset for businesses today?

Data is seen as a valuable asset because it provides insights that can aid in decision-making and strategic planning.

What are some examples of unstructured data, and why is it unsuitable for serious analysis?

Examples of unstructured data include letters, emails, social media posts, and audio recordings. It is unsuitable for serious analysis due to its random and unorganized nature.

How does semi-structured data differ from unstructured data, and what formats can it take?

Semi-structured data differs from unstructured data by being organized into a predefined format, often in tabular or matrix form. It can include formats like spreadsheets, logs, and business documents.

What types of basic analysis can be performed on semi-structured data?

Basic analysis tasks that can be performed on semi-structured data include filtering, sorting, searching, grouping, and aggregating statistics.
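
A minimal sketch of these operations in Python with pandas, using a small hypothetical sales table (the column names and values are invented for illustration):

    import pandas as pd

    # Hypothetical semi-structured (tabular) sales data
    sales = pd.DataFrame({
        "region": ["North", "South", "North", "East"],
        "product": ["A", "B", "A", "C"],
        "amount": [1200, 800, 950, 400],
    })

    # Filtering: keep rows above a threshold
    large_orders = sales[sales["amount"] > 500]

    # Sorting: order rows by amount, largest first
    ranked = sales.sort_values("amount", ascending=False)

    # Searching: locate rows for a specific product
    product_a = sales[sales["product"] == "A"]

    # Grouping and aggregating: total and average amount per region
    summary = sales.groupby("region")["amount"].agg(["sum", "mean"])
    print(summary)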

Why is semi-structured data not suitable for detailed analytics using advanced computer programs?

Semi-structured data is not suitable for detailed analytics due to issues such as duplicates, redundant information, and missing or incomplete data.

What visualizations can be created from semi-structured data, and why are they important?

Visualizations such as bar charts, pie charts, and scatter plots can be created from semi-structured data. They are important for quickly conveying information and insights derived from the data.
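
A minimal sketch of such a chart in Python with matplotlib, using invented category labels and counts:

    import matplotlib.pyplot as plt

    # Hypothetical category counts summarized from semi-structured data
    regions = ["North", "South", "East", "West"]
    orders = [42, 31, 18, 9]

    # Bar chart of orders per region
    plt.bar(regions, orders)
    plt.title("Orders by region")
    plt.xlabel("Region")
    plt.ylabel("Number of orders")
    plt.show()

    # A pie chart of the same data could use plt.pie(orders, labels=regions)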

Study Notes

Historical Overview of Data Analytics

  • Data analytics began as a manual process for comparing statistics and extracting business insights, which was time-consuming and inefficient.
  • In the 1980s in India, organizations primarily used computers to automate accounting tasks such as maintaining employee records, payroll, and leave records, using the COBOL programming language.
  • Popular spreadsheet applications of that era included Lotus 1-2-3 and VisiCalc, although their usefulness for data analysis was limited.

Evolution in the 1990s

  • Introduction of Windows-based spreadsheet applications, notably Microsoft Excel, allowed non-accounting departments (Marketing, Production, etc.) to utilize spreadsheets for data management.
  • Enhanced capabilities included visual charts, graphs, and pivot tables, which facilitated data summarization.
  • Emergence of powerful database software such as Oracle and Microsoft SQL Server enabled robust database creation based on Relational Database Management System (RDBMS) principles.
  • SPSS (Statistical Package for the Social Sciences), introduced in 1991-92, allowed for serious statistical analyses using established theories and methodologies.

Advances from 2001 to 2010

  • Businesses started employing advanced machine learning (ML) algorithms and interactive data visualizations to improve decision-making and gain competitive advantages.
  • Data was being recognized as a vital asset despite its often random and unorganized nature.
  • Types of data recognized include unstructured, semi-structured, and structured data, where unstructured data, such as emails or social media posts, is unsuitable for detailed analysis.

Understanding Data Types

  • Unstructured Data: Random formats like letters, memos, and chat transcripts; unsuitable for serious analysis.
  • Semi-Structured Data: More organized (e.g., registers, financial reports); can be analyzed for basic statistics and visualizations like charts.
  • Structured Data: Organized in advanced storage formats such as RDBMS and OLAP; efficient for input and retrieval, requiring skilled personnel for manipulation.
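
A minimal sketch of structured storage and retrieval, using Python's built-in sqlite3 module as a stand-in for a full RDBMS such as Oracle or SQL Server; the table and values are hypothetical:

    import sqlite3

    # In-memory database standing in for a production RDBMS
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Structured data: a fixed schema with typed columns
    cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
    cur.executemany(
        "INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)",
        [(1, "Asha", 52000.0), (2, "Ravi", 61000.0), (3, "Meera", 58000.0)],
    )
    conn.commit()

    # Efficient retrieval through a declarative SQL query
    cur.execute("SELECT name, salary FROM employees WHERE salary > ? ORDER BY salary DESC", (55000,))
    print(cur.fetchall())
    conn.close()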

Data Engineering

  • Data engineering focuses on creating and managing the infrastructure for data collection, storage, and processing.
  • Key components include data modeling, integration, transformation, and ensuring data security and governance.
  • Data engineers work with big data platforms like Hadoop and Spark to develop data pipelines for efficient data processing.
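
A hedged sketch of one such pipeline step in PySpark; the file paths, column names, and transformations are assumptions for illustration, not a prescribed design:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Start (or reuse) a local Spark session
    spark = SparkSession.builder.appName("orders-pipeline").getOrCreate()

    # Extract: read raw CSV files (hypothetical path, schema inferred)
    raw = spark.read.csv("/data/raw/orders/*.csv", header=True, inferSchema=True)

    # Transform: drop incomplete rows and aggregate revenue per day
    daily = (
        raw.dropna(subset=["order_date", "amount"])
           .groupBy("order_date")
           .agg(F.sum("amount").alias("revenue"))
    )

    # Load: write the result as Parquet for downstream analytics
    daily.write.mode("overwrite").parquet("/data/curated/daily_revenue")

    spark.stop()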

Concept of Big Data

  • Big data encompasses large collections of diverse data types growing exponentially, challenging traditional data management systems.
  • Typically stored in data warehouses or lakes and analyzed with software designed for large data sets, such as MongoDB and Tableau (a small query sketch follows this list).
  • Applications in machine learning and predictive modeling aid in solving complex business challenges.
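
As one hedged example of querying a large document store, the sketch below uses pymongo against a hypothetical orders collection; the connection string, database, and field names are assumptions:

    from pymongo import MongoClient

    # Connect to a hypothetical local MongoDB instance
    client = MongoClient("mongodb://localhost:27017")
    orders = client["analytics"]["orders"]

    # Aggregation pipeline: total order value per customer, top five customers
    pipeline = [
        {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
        {"$sort": {"total": -1}},
        {"$limit": 5},
    ]
    for doc in orders.aggregate(pipeline):
        print(doc["_id"], doc["total"])

    client.close()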

Statistical Inference and Sampling

  • When populations are large and collecting data from every member is impractical, statistical analysis relies on sampling rather than a full census.
  • Statistical inference allows conclusions about a population to be drawn from sample analyses, which makes it important that the sample is representative across relevant demographic factors.
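
A minimal sketch of the idea in Python: draw a simple random sample from a large synthetic population and use the sample mean with a normal-approximation 95% confidence interval to reason about the population mean (all numbers are made up):

    import random
    import statistics

    random.seed(42)

    # Synthetic "population" of one million transaction amounts
    population = [random.gauss(500, 120) for _ in range(1_000_000)]

    # Simple random sample instead of measuring every member
    sample = random.sample(population, 1000)

    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    margin = 1.96 * sd / (len(sample) ** 0.5)   # normal-approximation 95% CI

    print(f"Sample mean: {mean:.1f}")
    print(f"95% CI for the population mean: ({mean - margin:.1f}, {mean + margin:.1f})")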

Understanding Deciles and Quartiles

  • Deciles: Divide a dataset into 10 equal parts, providing insight into the data's distribution.
  • Quartiles: Divide a dataset into four equal parts, showing how data points concentrate in different segments (see the sketch after this list).
  • Visualization using number lines aids in understanding data distribution across quartiles.
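
A short NumPy sketch of computing quartiles and deciles for an arbitrary dataset:

    import numpy as np

    data = np.array([12, 7, 3, 15, 9, 21, 5, 18, 11, 14, 8, 20])

    # Quartiles: the 25th, 50th, and 75th percentiles split the data into 4 parts
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    print("Quartiles:", q1, q2, q3)

    # Deciles: the 10th, 20th, ..., 90th percentiles split the data into 10 parts
    deciles = np.percentile(data, range(10, 100, 10))
    print("Deciles:", deciles)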

Impact of Outliers

  • Outliers can significantly affect the mean and median of data, with the mean being more sensitive to outliers.
  • Outliers are commonly identified using the 1.5 x IQR rule, which sets a lower bound of Q1 - 1.5 x IQR and an upper bound of Q3 + 1.5 x IQR; data points outside these bounds are flagged as outliers.
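
A minimal sketch of the 1.5 x IQR rule in Python; the dataset is invented and includes one obvious outlier:

    import numpy as np

    data = np.array([10, 12, 11, 13, 12, 14, 11, 95])   # 95 is an obvious outlier

    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1

    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr

    outliers = data[(data < lower) | (data > upper)]
    print("Bounds:", lower, upper)        # 7.625 and 16.625 for this data
    print("Outliers:", outliers)          # [95]

    # The mean is pulled toward the outlier far more than the median
    print("Mean:", data.mean(), "Median:", np.median(data))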

Measures of Spread

  • Range: Represents the difference between the largest and smallest observation in a dataset, indicating variability.
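
A tiny illustration with arbitrary numbers:

    data = [10, 12, 11, 13, 12, 14, 11, 95]

    # Range: largest observation minus smallest observation
    data_range = max(data) - min(data)
    print("Range:", data_range)   # 95 - 10 = 85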


Description

Explore the evolution of data analytics from manual processes to modern techniques. This quiz covers key developments in the field, highlighting the use of computers in the 1980s for automating business functions in India. Test your understanding of how data analysis has transformed over the years.
