Spark is a new technology that runs on top of the Hadoop Distributed File System (HDFS) and is characterized as “a fast and general engine for large-scale data processing.” Spark has a few key features that make it the most interesting upcoming technology in the big data world since Apache Hadoop in 2005.
Advantages of Spark are as follows:
Speed: Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. It has an advanced DAG execution engine that supports acyclic data flow and in-memory computing.
Ease of Use: Applications can be written quickly in Java, Scala, or Python. Spark offers over 80 high-level operators that make it easy to build parallel apps, and it can be used interactively from the Scala and Python shells.
Generality: Spark combines SQL, streaming, and complex analytics. It powers a stack of high-level tools including Shark for SQL, MLlib for machine learning, GraphX, and Spark Streaming. These frameworks can be combined seamlessly in the same application.
Integrated with Hadoop: Spark can run on Hadoop 2's YARN cluster manager and can read any existing Hadoop data. An existing Hadoop 2 cluster can run Spark without any further installation. Spark is also easy to run standalone, on EC2, or on Mesos. It can read from HDFS, HBase, Cassandra, and any Hadoop data source.
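To give a feel for the high-level operators mentioned above, here is a minimal word-count sketch in Python. The function names (`tokenize`, `word_counts`, `demo`) and the sample sentences are made up for illustration. The RDD calls used (`parallelize`, `flatMap`, `map`, `reduceByKey`, `collect`) are standard Spark operators, and the sketch assumes a SparkContext is passed in, as is the case with the `sc` variable that the Python shell provides.

```python
def tokenize(line):
    """Split one line of text into lowercase words."""
    return line.lower().split()


def word_counts(lines):
    """Chain three operators on an RDD of strings.

    flatMap turns each line into words, map pairs each word with 1,
    and reduceByKey sums the 1s per word across the cluster.
    Returns an RDD of (word, count) pairs.
    """
    return (lines.flatMap(tokenize)
                 .map(lambda word: (word, 1))
                 .reduceByKey(lambda a, b: a + b))


def demo(sc):
    """Run the word count on a tiny in-memory dataset.

    `sc` is assumed to be an existing SparkContext; the two sample
    sentences are hypothetical data for illustration only.
    """
    lines = sc.parallelize(["Spark is fast", "Spark is general"])
    return dict(word_counts(lines).collect())
```

In the Python shell (`pyspark`), where `sc` is already defined, calling `demo(sc)` should return a dictionary in which "spark" and "is" each have a count of 2. To read real data instead, the `parallelize` call could be swapped for `sc.textFile(...)` pointing at a file in HDFS.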