Data Modeling and Database Normalization

Definition: Data modeling is the process of creating a data model that visually represents data structures, the relationships between them, and their constraints.
Types:
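- Conceptual: High-level view of the main entities and relationships, focused on business and user needs.
- Logical: Detailed structure (entities, attributes, keys, relationships) that is independent of any specific database system.
- Physical: Implementation on a specific database system (tables, column data types, indexes, constraints).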
Entities and Attributes:
- Entities: Objects or concepts about which data is stored (e.g., Customer).
- Attributes: Details describing entities (e.g., Customer Name, Address).
Relationships: Define how entities are associated (e.g., One-to-One, One-to-Many, Many-to-Many).
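A minimal sketch of how such a model might become a physical schema, assuming SQLite through Python's built-in sqlite3 module. The tables customers, orders, products, and order_items are hypothetical: a foreign key expresses one-to-many, and a junction table expresses many-to-many.

```python
import sqlite3

# Hypothetical schema: customers (1) -> orders (many), and
# orders (many) <-> products (many) through the order_items junction table.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,   -- attribute: customer name
    address     TEXT             -- attribute: address
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- one-to-many
    order_date  TEXT NOT NULL
);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    title       TEXT NOT NULL
);
CREATE TABLE order_items (       -- many-to-many between orders and products
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
"""

def build_model() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn

conn = build_model()
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, "2024-01-05"), (11, 1, "2024-02-09")])

# The foreign key constraint rejects an order for a customer that does not exist.
rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (12, 99, '2024-03-01')")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

One customer now owns two orders, and the database itself refuses an order that points at a missing customer.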

Purpose: Normalization organizes fields (columns) and tables to minimize redundancy and undesirable dependencies, which prevents insert, update, and delete anomalies.
Normal Forms:
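- 1NF (First Normal Form): Eliminate repeating groups; every column holds atomic (indivisible) values.
- 2NF (Second Normal Form): Requires 1NF, and every non-key attribute depends on the whole primary key rather than part of it (no partial dependencies; relevant when the key is composite).
- 3NF (Third Normal Form): Requires 2NF, and non-key attributes depend only on the key, not on other non-key attributes (no transitive dependencies).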
- BCNF (Boyce-Codd Normal Form): A stronger version of 3NF; every determinant must be a candidate key.
Denormalization: Intentionally reintroducing redundancy (e.g., duplicated columns or precomputed totals) to speed up reads, at the cost of extra storage and more complex updates.
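To illustrate the move from a flat table to 3NF, here is a Python sketch with hypothetical data: customer_city depends on customer_id rather than on the key order_id, which is a transitive dependency. Checking a dependency against sample rows only shows that the data is consistent with it; whether it is a real rule comes from the business domain.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]

# Hypothetical flat table: order_id -> customer_id -> customer_city (violates 3NF).
orders_flat: List[Row] = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "Seoul", "amount": 30},
    {"order_id": 2, "customer_id": "C1", "customer_city": "Seoul", "amount": 45},
    {"order_id": 3, "customer_id": "C2", "customer_city": "Busan", "amount": 12},
]

def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check whether the functional dependency lhs -> rhs holds in the given rows."""
    seen: Dict[Tuple, Tuple] = {}
    for r in rows:
        key = tuple(r[c] for c in lhs)
        val = tuple(r[c] for c in rhs)
        if seen.setdefault(key, val) != val:  # same lhs, different rhs
            return False
    return True

def project(rows: List[Row], cols: Sequence[str]) -> List[Row]:
    """Relational projection: keep only cols and drop duplicate rows."""
    out, seen = [], set()
    for r in rows:
        t = tuple(r[c] for c in cols)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(cols, t)))
    return out

# Decompose along the dependency customer_id -> customer_city.
orders = project(orders_flat, ["order_id", "customer_id", "amount"])
customers = project(orders_flat, ["customer_id", "customer_city"])

# Natural join on customer_id to confirm that the decomposition loses nothing.
city_by_customer = {c["customer_id"]: c["customer_city"] for c in customers}
rejoined = [{**o, "customer_city": city_by_customer[o["customer_id"]]} for o in orders]

print(fd_holds(orders_flat, ["customer_id"], ["customer_city"]), len(customers))
```

Joining the two tables back on customer_id reproduces the original rows, so the decomposition is lossless while each customer's city is stored only once.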

Basic SQL Components:
- SELECT: Specify the columns or expressions to retrieve.
- FROM: Specify the table(s) to retrieve data from.
- WHERE: Filter records based on conditions.
- JOIN: Combine rows from two or more tables based on related columns.
- GROUP BY: Group rows that share values in one or more columns so aggregate functions (COUNT, SUM, AVG) are computed per group.
- ORDER BY: Sort results by one or more columns.
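These components combine in a single statement. A sketch using SQLite through Python's sqlite3 module, with hypothetical customers and orders tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO orders VALUES (10, 1, 30), (11, 1, 45), (12, 2, 12), (13, 3, 5), (14, 2, 80);
""")

# Order count and total spend per customer for orders of at least 10, largest first.
query = """
SELECT c.name, COUNT(*) AS n_orders, SUM(o.amount) AS total  -- columns to return
FROM customers AS c                                          -- source table
JOIN orders AS o ON o.customer_id = c.customer_id            -- combine related rows
WHERE o.amount >= 10                                         -- filter rows before grouping
GROUP BY c.customer_id, c.name                               -- one output row per customer
ORDER BY total DESC                                          -- sort the result
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Grace', 2, 92.0), ('Ada', 2, 75.0)]
```

Linus does not appear because his only order is filtered out by WHERE before any grouping happens.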
SQL Command Categories:
- DML (Data Manipulation Language): INSERT, UPDATE, DELETE to change the data.
- DDL (Data Definition Language): CREATE, ALTER, DROP to define and change the database schema.
- DCL (Data Control Language): GRANT, REVOKE to manage permissions.
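A sketch of DDL and DML statements run against SQLite from Python (table and column names are hypothetical). SQLite has no user accounts, so the DCL statements appear only as comments in PostgreSQL syntax, with a hypothetical role name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the schema.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE customers ADD COLUMN city TEXT")
conn.execute("CREATE TABLE scratch (x INTEGER)")
conn.execute("DROP TABLE scratch")

# DML: change the data (parameters are passed separately, never spliced into SQL).
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ada", "London"))
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Grace", "Arlington"))
conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Cambridge", "Ada"))
conn.execute("DELETE FROM customers WHERE name = ?", ("Grace",))
conn.commit()

# DCL: not available in SQLite. On PostgreSQL it would look like this
# (the role "analyst" is hypothetical):
#   GRANT SELECT ON customers TO analyst;
#   REVOKE SELECT ON customers FROM analyst;

print(conn.execute("SELECT name, city FROM customers").fetchall())  # [('Ada', 'Cambridge')]
```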

Indexing: Create indexes on columns frequently used in WHERE, JOIN, and ORDER BY clauses to speed up retrieval; each index costs storage and slows writes, so index selectively.
Query Optimization:
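- Analyze slow queries and restructure them for efficiency (e.g., select only the columns needed, filter as early as possible).
- Use EXPLAIN to inspect the execution plan (e.g., full table scan vs. index lookup).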
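Indexing and EXPLAIN can be checked together. This sketch assumes SQLite, whose variant is EXPLAIN QUERY PLAN; other databases use EXPLAIN or EXPLAIN ANALYZE with different output. The table and index names are hypothetical, and the exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(10_000)],
)

def plan(sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN descriptions joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(r[-1]) for r in rows)  # the last column holds the description

query = "SELECT SUM(amount) FROM orders WHERE customer_id = 42"
before = plan(query)  # without an index: a full table scan, e.g. 'SCAN orders'
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan(query)   # with the index: a SEARCH ... USING INDEX idx_orders_customer
print(before)
print(after)
```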
Partitioning: Split large tables into smaller partitions (e.g., by date range) so queries read only the partitions that can contain matching rows.
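Server databases such as PostgreSQL support range partitioning natively (PARTITION BY RANGE). SQLite does not, so this Python sketch emulates it with one hypothetical table per year and shows partition pruning: a query limited to 2024 reads only sales_2024.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
YEARS = (2023, 2024)
for y in YEARS:
    # Table names come from the fixed YEARS tuple, never from user input.
    conn.execute(f"CREATE TABLE sales_{y} (sale_date TEXT, amount REAL)")

def insert_sale(d: date, amount: float) -> None:
    """Route each row to the partition for its year."""
    if d.year not in YEARS:
        raise ValueError(f"no partition for year {d.year}")
    conn.execute(f"INSERT INTO sales_{d.year} VALUES (?, ?)", (d.isoformat(), amount))

def total_sales(start: date, end: date) -> tuple[float, list[str]]:
    """Sum sales in [start, end], reading only partitions that overlap the range."""
    touched = [f"sales_{y}" for y in YEARS if start.year <= y <= end.year]  # pruning
    total = 0.0
    for table in touched:
        (part_sum,) = conn.execute(
            f"SELECT COALESCE(SUM(amount), 0) FROM {table} WHERE sale_date BETWEEN ? AND ?",
            (start.isoformat(), end.isoformat()),
        ).fetchone()
        total += part_sum
    return total, touched

insert_sale(date(2023, 6, 1), 100.0)
insert_sale(date(2024, 2, 10), 40.0)
insert_sale(date(2024, 3, 15), 60.0)
print(total_sales(date(2024, 1, 1), date(2024, 12, 31)))  # (100.0, ['sales_2024'])
```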
Caching: Store the results of frequently run queries (in the database or in an application-level cache) and invalidate them when the underlying data changes.
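A minimal sketch of query-result caching in the application layer, assuming SQLite; the CachedDB class is hypothetical. Clearing the whole cache on every write is the simplest invalidation policy that never serves stale results.

```python
import sqlite3
from typing import Any, Dict, Tuple

class CachedDB:
    """Read-through cache keyed by (sql, params); any write clears the cache."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.cache: Dict[Tuple[str, Tuple[Any, ...]], list] = {}
        self.hits = 0

    def query(self, sql: str, params: Tuple[Any, ...] = ()) -> list:
        key = (sql, params)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        rows = self.conn.execute(sql, params).fetchall()
        self.cache[key] = rows
        return rows

    def write(self, sql: str, params: Tuple[Any, ...] = ()) -> None:
        self.conn.execute(sql, params)
        self.conn.commit()
        self.cache.clear()  # invalidate: cached results may now be stale

db = CachedDB(sqlite3.connect(":memory:"))
db.write("CREATE TABLE t (x INTEGER)")
db.write("INSERT INTO t VALUES (?)", (1,))
first = db.query("SELECT SUM(x) FROM t")   # runs SQL and fills the cache
second = db.query("SELECT SUM(x) FROM t")  # served from the cache
db.write("INSERT INTO t VALUES (?)", (2,))
third = db.query("SELECT SUM(x) FROM t")   # cache was cleared, so SQL runs again
print(first, second, third, db.hits)       # [(1,)] [(1,)] [(3,)] 1
```

Production caches usually invalidate more selectively and bound their size; this sketch trades hit rate for simplicity.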

Definition: ETL (Extract, Transform, Load) is the process of moving data between systems and transforming it along the way.
Phases:
- Extract: Retrieve data from various sources (databases, APIs, flat files).
- Transform: Cleanse, format, and structure the data to meet reporting and analysis requirements (e.g., aggregations, data type conversions).
- Load: Transfer transformed data into the target database/storage system.
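The three phases can be sketched as three Python functions. The CSV content, the region_sales table, and the cleansing rules are hypothetical, and SQLite stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source: a CSV export with inconsistent formatting and a missing value.
RAW_CSV = """order_id,region,amount
1, north ,12.50
2,South,7.5
3,north,
4,SOUTH,20
"""

def extract(text: str) -> list[dict[str, str]]:
    """Extract: read raw records from a flat file (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records: list[dict[str, str]]) -> list[tuple[str, float]]:
    """Transform: cleanse, convert types, and aggregate the amount per region."""
    totals: dict[str, float] = {}
    for r in records:
        if not r["amount"].strip():           # cleanse: skip rows without an amount
            continue
        region = r["region"].strip().lower()  # standardize text
        totals[region] = totals.get(region, 0.0) + float(r["amount"])  # convert + aggregate
    return sorted(totals.items())

def load(rows: list[tuple[str, float]], conn: sqlite3.Connection) -> None:
    """Load: write the transformed rows into the target table in one transaction."""
    conn.execute("CREATE TABLE IF NOT EXISTS region_sales (region TEXT PRIMARY KEY, total REAL)")
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO region_sales VALUES (?, ?)", rows)

target = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), target)
print(target.execute("SELECT * FROM region_sales ORDER BY region").fetchall())
# [('north', 12.5), ('south', 27.5)]
```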
Tools: Use ETL tools such as Talend, Apache NiFi, and Informatica to automate pipelines and improve efficiency.
Best Practices:
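- Validate data integrity and consistency at each phase (e.g., row counts, data types, constraints).
- Monitor ETL job performance and failures, and retry failed jobs as necessary; make loads idempotent so that a retry does not duplicate data.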
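Retries are only safe when the load is idempotent. A sketch assuming SQLite 3.24 or later (for INSERT ... ON CONFLICT upserts); with_retries, idempotent_load, and the facts table are hypothetical, and the job simulates a failure reported after the data was already written.

```python
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(job: Callable[[], T], attempts: int = 3, base_delay: float = 0.1) -> T:
    """Run job, retrying on failure with exponential backoff (0.1 s, 0.2 s, ...)."""
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == attempts:
                raise  # give up and surface the error to monitoring
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")

def idempotent_load(conn: sqlite3.Connection, rows: list[tuple[int, float]]) -> None:
    """Upsert by primary key so that re-running the load does not duplicate rows."""
    with conn:
        conn.executemany(
            "INSERT INTO facts (id, value) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET value = excluded.value",
            rows,
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, value REAL NOT NULL)")
batch = [(1, 10.0), (2, 20.0)]

calls = {"n": 0}
def flaky_job() -> int:
    calls["n"] += 1
    idempotent_load(conn, batch)  # the load succeeds...
    if calls["n"] == 1:
        raise ConnectionError("simulated failure after load")  # ...but the job fails once
    return len(batch)

loaded = with_retries(flaky_job, base_delay=0.01)
# Integrity check: the load ran twice, yet each source row appears exactly once.
count = conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
print(loaded, count, calls["n"])  # 2 2 2
```

With a plain INSERT, the second attempt would instead fail on the primary key (or, without a key, silently double the data).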
