Spark Benefits by Andrew FigPope

Published on Nov 18, 2015

An overview of why Spark is the next big thing in data science

PRESENTATION OUTLINE

Spark

Igniting a Big Data Revolution

Photo by xavi talleda

The Future of MapReduce

Photo by Zach Dischner

Software Advantages

10x-100x faster than MapReduce
More contributors than any other engine
Single codebase for ML, Streaming, and Batch
Built on a solid foundation (Scalding, Cascalog)
Highly composable and compact

Spark has more contributors than MapReduce itself, and it allows a regression to be coded in 20 lines (compared to 15,000 in pure MapReduce).
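To give a feel for why the line count can be so small, here is a minimal sketch of a linear regression written as a map step and a reduce step. It is plain Python, not Spark code: the toy data, the function names (gradient, add, fit), the learning rate, and the step count are all assumptions made up for illustration. In Spark the same two steps would be applied to a distributed dataset rather than a local list.

```python
from functools import reduce

# Hypothetical toy data: ten points on the line y = 2x + 1.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]

def gradient(point, w, b):
    """Map step: one point's contribution to the squared-error gradient."""
    x, y = point
    err = (w * x + b) - y
    return (err * x, err)

def add(g1, g2):
    """Reduce step: sum two partial gradients."""
    return (g1[0] + g2[0], g1[1] + g2[1])

def fit(data, steps=5000, lr=0.02):
    """Gradient descent where each step is a map followed by a reduce."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        gw, gb = reduce(add, map(lambda p: gradient(p, w, b), data))
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

w, b = fit(points)
print(w, b)  # should land close to the slope 2 and intercept 1 used above
```

The whole fit is expressed through two small functions that compose, which is the property the "highly composable and compact" bullet is pointing at.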

Photo by Joel Abroad

What People Are Saying

"Leading candidate for a successor to MapReduce"
"Spark-powered applications are operating on more real-time data"
"Spark has surpassed MapReduce as an execution framework"
"Spark is becoming the most powerful platform for data scientists"
"More general and powerful alternative to Hadoop's MapReduce."

"Spark is quickly establishing itself as a leading environment for doing fast, iterative in-memory and streaming analysis." -- InformationWeek

"Leading candidate for a successor to MapReduce" -- Cloudera

"Spark-powered applications are operating on more real-time data, which ultimately enables faster fraud detection, better personalization of media, higher quality from manufacturing processes and other operational analytic use cases." -- MapR

"We use HDFS as the underlying cheap storage, and will continue to do so, and some of our legacy customers still use MapReduce and Hive – both of which are still available within xPatterns. However, for new customers & deployments we consider MapReduce a legacy technology and recommend all new code to be written in Spark as the lowest-level execution framework, given the substantial speed advantages and simpler programming model." -- Atigeo

"Spark is becoming the most powerful platform for data scientists because it unifies everything into a single platform whose foundation is Spark" -- InfoQ

"More general and powerful alternative to Hadoop's MapReduce." -- Databricks

Photo by deep_schismic