Chapter 1. What Is Apache Spark?

Apache Spark is a unified computing engine and a set of libraries for parallel data processing on computer clusters. As of this writing, Spark is the most actively developed open source engine for this task, making it a standard tool for any developer or data scientist interested in big data. Spark supports multiple widely used programming languages (Python, Java, Scala, and R), includes libraries for diverse tasks ranging from SQL to streaming and machine learning, and runs anywhere from a laptop to a cluster of thousands of servers. This makes it an easy system to start with and to scale up to big data processing at incredibly large scale. Figure 1-1 illustrates all the components and libraries Spark offers to end users.

[Figure 1-1: Spark's components and libraries, including Structured Streaming, Advanced Analytics, Libraries & Ecosystem, and Structured APIs]
