书名：Frank Kane's Taming Big Data with Apache Spark and Python
作者名：Frank Kane
本章字数：91字
更新时间：2025-02-20 14:21:24

What is Spark?

According to Apache, Spark is a fast and general engine for large-scale data processing. This is actually a really good summary of what it's all about. If you have a really massive dataset that can represent anything - weblogs, genomics data, you name it - Spark can slice and dice that data up. It can distribute the processing among a huge cluster of computers, taking a data analysis problem that's just too big to run on one machine and divide and conquer it by splitting it up among multiple machines.