Chapter 1. What Is Apache Spark?

Apache Spark is a unified computing engine and a set of libraries for parallel data processing on computer clusters. As of this writing, Spark is the most actively developed open source engine for this task, making it a standard tool for any developer or data scientist interested in big data. Spark supports multiple widely used programming languages (Python, Java, Scala, and R), includes libraries for diverse tasks ranging from SQL to streaming and machine learning, and runs anywhere from a laptop to a cluster of thousands of servers. This makes it an easy system to start with and to scale up to big data processing at incredibly large scale. Figure 1-1 illustrates all the components and libraries Spark offers to end users.

[Figure 1-1. Labels recoverable from the figure: Structured Streaming, Advanced Analytics, Libraries & Ecosystem, Structured API]