神刀安全网

Getting Started with Heron on Apache Mesos and Apache Kafka

Getting Started with Heron on Apache Mesos and Apache Kafka

Heron has been  Open Sourced , woo! Heron is Twitter’s  distributed stream computation system for running Storm compatible  topologies in production.

A Heron topology is a directed acyclic graph used to process streams of data. Heron topologies consist of three basic components: spouts and bolts , which are connected via streams of tuples . Below is a visual illustration of a simple topology:

Getting Started with Heron on Apache Mesos and Apache Kafka

Spouts are responsible for emitting tuples into the topology, while bolts are responsible for processing those tuples. In the diagram above, spout S1 feeds tuples to bolts B1 and B2 for processing; in turn, bolt B1 feeds processed tuples to bolts B3 and B4 , while bolt B2 feeds processed tuples to bolt B4 .

This is just a simple example; you can use bolts and spouts to form arbitrarily complex topologies.

Topology Lifecycle

Once you’ve set up a Heron cluster , you can use Heron’s CLI tool to manage the entire lifecycle of a topology, which typically goes through the following stages:

  1. submit the topology to the cluster. The topology is not yet processing streams but is ready to be activated.
  2. activate the topology. The topology will begin processing streams in accordance with the topology architecture that you’ve created.
  3. restart an active topology if, for example, you need update the topology configuration.
  4. deactivate the topology. Once deactivated, the topology will stop processing but remain running in the cluster.
  5. kill a topology to completely remove it from the cluster. It is no longer known to the Heron cluster and can no longer be activated. Once killed, the only way to run that topology is to re-submit it.

Spouts

A Heron spout is a source of streams, responsible for emitting tuples into the topology. A spout may, for example, read data from a Kestrel queue or read tweets from the Twitter API and emit tuples to one or more bolts. Information on building spouts can be found in Building Spouts .

Bolts

A Heron bolt consumes streams of tuples emitted by spouts and performs some set of user-defined processing operations on those tuples, which may include performing complex stream transformations, performing storage operations, aggregating multiple streams into one, emitting tuples to other bolts within the topology, and much more. Information on building bolts can be found in Building Bolts .

Data Model

Heron has a fundamentally tuple-driven data model. You can find more information in Heron’s Data Model .

Logical Plan

A topology’s logical plan is analagous to a database query plan .

Physical Plan

A topology’s physical plan is related to its logical plan but with the crucial difference that a physical plan maps the actual execution logic of a topology, including the machines running each spout or bolt and more. Here’s a rough visual representation of a physical plan:

Getting Started with Heron on Apache Mesos and Apache Kafka If you want to get your toes wet first give the local cluster a shot  http://twitter.github.io/heron/docs/getting-started/ .

If your ready for waste deep then lets get Heron running on Mesos with Kafka!!!

git checkout -b v2 origin/v2

./build-ubuntu.sh #requires Docker

./setup-cli-ubuntu.sh #setup cli and re-pack core package

# start vagrant image, requires Vagrant & Virtualbox

cd contrib/kafka9/vagrant

vagrant up

#ssh to vagrant and start a 0.9 broker, note broker endpoint address

vagrant ssh master

mkdir -p /vagrant/dist/topologies

wget -P /vagrant/dist/topologies/ https://s3.amazonaws.com/repo.elodina/kafka-09-mirror_deploy.jar

./setup-brokers.sh 0 1 #uses Mesos Kafka Scheduler

# start a topology using the script which uses heron cli under the hood:./submit-09-topology-mesos.sh topologyname slave0:5000 foo bar

At this point you can navigate to the Mesos UI http://192.168.3.5:5050/#/ and see things working. Every topology is a Mesos Framework.

Getting Started with Heron on Apache Mesos and Apache Kafka Tasks on the cluster

Getting Started with Heron on Apache Mesos and Apache Kafka

Start Heron Tracker

The Heron Tracker is a web service that continuously gathers information about your Heron cluster. You can launch the tracker by running the command (which is already installed if you started with your getting your toes wet, else ). You need to update the statemanager for your heron_tracker.yaml like so.

statemgrs:

type: “zookeeper”

name: “zk”

host: “192.168.3.5”

port: 2181

rootpath: “/heron”

tunnelhost: “localhost”

You can reach Heron Tracker in your browser at http://localhost:8888

Start Heron UI

Heron UI is a user interface that uses Heron Tracker to provide detailed visual representations of your Heron topologies. To launch Heron UI:

You can open Heron UI in your browser at http://localhost:8889 .

Getting Started with Heron on Apache Mesos and Apache Kafka

click “topologyname” or what you called it when running submit above.

Getting Started with Heron on Apache Mesos and Apache Kafka

The Heron UI provides a lot of information about a topology or a part of it quickly, thus reducing debugging time considerably. Some of these features are listed below. A complete set of features can be found in following sections.

  1. See logical plan of a topology
  2. See physical plan of a topology
  3. Configs of a topology
  4. See some basic metrics for each of the instances and components
  5. Links to get logs, memory histogram, jstack, heapdump and exceptions of a particular instance

Now onto the Apache Kafka Spout & Bolt & Example topology which mirrors data from one topic to another (foo to bar or whatever set as the params above). Here is how to build and work on the topology yourself.

#back in github.com/elodina/heron directory

git checkout -b kafka_spout origin/kafka_spout

wget -q ‘ https://github.com/bazelbuild/bazel/releases/download/0.1.2/bazel-0.1.2-installer-linux-x86_64.sh&#8217 ;

chmod +x bazel-0.1.2-installer-linux-x86_64.sh

./bazel-0.1.2-installer-linux-x86_64.sh –user

./bazel_configure.py

~/bin/bazel build –config=ubuntu //contrib/kafka/examples/src/java:all-targets

cp ./bazel-bin/contrib/kafka/examples/src/java/kafka-09-mirror_deploy.jar /vagrant/dist/topologies/

How is Heron better than Storm?

  1. The provisioning of resources is abstracted from the duties of the cluster manager so that Heron can play nice with the rest of the shared infrastructure.
  2. Each Heron Instance executes only a single task so is easy to debug.
  3. The design makes transparent which component of the topology is failing or slowing down as the metrics collection is granular and can easily map an issue to a specific process in the system.
  4. Heron allows a topology writer to specify exactly the resources for each component, avoiding over-provisioning.
  5. Having a Topology Manager per topology enables them to be managed independently and failure of one topology does not affect the others.
  6. The backpressure mechanism gives a consistent rate of delivering results and a way to reason about the system. It is also a key mechanism that supports migration of topologies from one set of containers to another.
  7. It does not have any single point of failure.

Some benchmarks from @manuzhang

Getting Started with Heron on Apache Mesos and Apache Kafka Getting Started with Heron on Apache Mesos and Apache Kafka Getting Started with Heron on Apache Mesos and Apache Kafka

More good stuff like this from the Heron paper if you haven’t already read it.

There are also more ways to run Heron besides Native Mesos Framework. Heron supports deployment on Apache Aurora out of the box as well as for Slurm too. Apache REEF and others are in the works and looking forward to the future of topology driven as a service solutions for the enterprise.

~ Joestein

转载本站任何文章请注明:转载至神刀安全网,谢谢神刀安全网 » Getting Started with Heron on Apache Mesos and Apache Kafka

分享到:更多 ()

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址