Saturday, November 5, 2011

JMX metrics in Reconnoiter

We started using Hadoop, which is written in Java and exposes its metrics through JMX, a Java standard for monitoring and publishing metrics. Since we use Reconnoiter for all other metrics collection and graphing, I wanted to get this JMX data into it.

Reconnoiter has a Java component called Jezebel that functions as a bridge between the Java world and Reconnoiter. It listens on port 8083 for check requests. When a request comes from noitd, it performs an on-demand check and formats the output as a Resmon-style XML document. Inside noitd is a simple Lua module that composes the check request, submits it to Jezebel and parses the result into metrics. It sounds a little complicated, but it is in fact quite flexible. Unfortunately, the documentation on Jezebel check configuration is sparse (to be diplomatic), so I decided to record what I did to get it running.

Getting Jezebel to run
First, we have to make sure Jezebel is running. It does not require any configuration as such. If you installed Reconnoiter properly, it can be started by executing:

jezebel -l /var/log/jezebel.log -p /var/run/jezebel.pid

To test whether Jezebel is running correctly, we can craft a test request and submit it by hand. Create an XML file, named check.xml for example, with the following content, which is the same as what noitd would send:

<?xml version="1.0" encoding="utf8"?>
<check target="127.0.0.1" target_ip="127.0.0.1" 
  module="com.omniti.jezebel.SampleCheck" name="jezebelcheck" 
  period="10000" timeout="5000">
  <config/>
</check>

The check attributes get passed to the check class, together with the config parameters (of which there are none in this request, because it is a request to a dummy check class). Now, let's post it to Jezebel and see what happens:

wget -qO - --post-file=check.xml http://localhost:8083/dispatch/com.omniti.jezebel.SampleCheck
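If you would rather send the test request from a script than from wget, here is a minimal Python sketch of the same POST. The function names build_request and run_check are my own, not part of Reconnoiter, and the URL layout is simply copied from the wget command above.

```python
import urllib.request

# Same base URL as in the wget example above.
JEZEBEL_URL = "http://localhost:8083/dispatch"


def build_request(check_xml: bytes, module: str,
                  base_url: str = JEZEBEL_URL) -> urllib.request.Request:
    """Prepare a POST of the check document to <base_url>/<module>.

    Passing data= makes urllib issue a POST, like wget --post-file.
    """
    return urllib.request.Request(f"{base_url}/{module}", data=check_xml)


def run_check(check_xml: bytes, module: str, timeout: float = 5.0) -> str:
    """Send the request to Jezebel and return the raw response body."""
    request = build_request(check_xml, module)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With Jezebel up, calling run_check(open("check.xml", "rb").read(), "com.omniti.jezebel.SampleCheck") should return the same document that wget prints.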

You should get output like this:

<!DOCTYPE ResmonResults SYSTEM "http://labs.omniti.com/resmon/trunk/resources/resmon.dtd">
<ResmonResults>
<ResmonResult module="com.omniti.jezebel.SampleCheck" service="jezebelcheck">
<last_update>1320526541</last_update>
<metric name="x" type="i">1234</metric>
<metric name="y" type="n">12.342354</metric>
<metric name="mood" type="s">happy</metric>
</ResmonResult>
</ResmonResults>
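
To give an idea of what the Lua module does with this document, here is a rough Python sketch that turns it into a dictionary of metrics. It is not the actual noitd code. The mapping of type codes (i to integer, n to float, s to string) is an assumption based on the sample output above, and any other code is kept as a plain string.

```python
import xml.etree.ElementTree as ET

# Assumed mapping, based only on the type codes in the sample output;
# unknown codes fall back to plain strings.
CONVERTERS = {"i": int, "n": float, "s": str}


def parse_resmon(xml_text: str) -> dict:
    """Return {service: {metric_name: value}} from a Resmon-style document."""
    root = ET.fromstring(xml_text)
    results = {}
    for result in root.iter("ResmonResult"):
        metrics = {}
        for metric in result.findall("metric"):
            convert = CONVERTERS.get(metric.get("type"), str)
            text = metric.text
            # A metric with no text content is recorded as None.
            metrics[metric.get("name")] = (
                convert(text.strip()) if text is not None else None
            )
        results[result.get("service")] = metrics
    return results
```

Fed the output above, parse_resmon returns one entry for the jezebelcheck service holding the metrics x, y and mood.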

Configuring the Lua module
With Jezebel running, the next step is to configure it as a noitd module so you can use it in checks. Jezebel is in fact composed of several modules itself, each represented by a Java class implementing the JezebelCheck interface.

The available classes are:
  • com.omniti.jezebel.SampleCheck - an example module, useful for checking that Jezebel is running, as we did above. It always returns the same three metrics.
  • com.omniti.jezebel.check.jmx - this is the one we are interested in: JMX metrics collection.
  • com.omniti.jezebel.check.mysql - this and the other three classes are built on top of JDBC and allow running arbitrary queries to pull metrics from various databases.
  • com.omniti.jezebel.check.oracle
  • com.omniti.jezebel.check.postgres
  • com.omniti.jezebel.check.sqlserver
We'll configure just two of these, SampleCheck and jmx, like this:

<jezebel>
  <config>
    <url>http://127.0.0.1:8083/dispatch</url>
  </config>
  <module loader="lua" name="com.omniti.jezebel.SampleCheck" 
    object="noit.module.jezebel"/>
  <module loader="lua" name="com.omniti.jezebel.check.jmx" 
    object="noit.module.jezebel"/>
</jezebel>

After restarting noitd, the modules will be available to be used in checks.

Configuring the JMX check
Finally, we are getting to the interesting part, where we add a new check and start pulling some metrics. To add a new check you can either use the interactive console, which I love, or edit the noit.conf file. The JMX module has a few configuration values:
  • target - obviously, this is the IP address of the host running the JMX provider
  • port - the port the provider is exposed at
  • username - I'm sure you can guess this one
  • password - ditto
  • mbean_domains - in JMX, each metric is part of a domain, and this parameter limits collection to the domains you want. Specify them as a space-separated list. If it is left empty, all domains are pulled (which can sometimes generate an awful lot of not-so-useful metrics)

The final check configuration will look like this:
<hadoop module="com.omniti.jezebel.check.jmx" period="60000" >
  <check uuid="5daa2e54-b1fb-4a78-ae53-0f16adc307cf" 
    target="10.1.16.123" name="hadoopNameNode">
    <config>
      <port>8004</port>
      <mbean_domains>hadoop</mbean_domains>
    </config>
  </check>
  <check uuid="d5f14417-63d6-4210-9794-92274b553417" 
    target="10.1.16.123" name="hadoopJobTracker">
    <config>
      <port>8008</port>
      <mbean_domains>hadoop</mbean_domains>
    </config>
  </check>
</hadoop>

This is the configuration we use to collect our Hadoop NameNode and JobTracker metrics.
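
Each check needs its own uuid, so if you have more than a couple of JMX endpoints it can be handy to generate the stanzas. Below is a small Python sketch that produces the same layout as above with fresh uuids. The helper names jmx_check and hadoop_group are made up for this example, and the output is meant to be pasted into noit.conf by hand.

```python
import uuid
import xml.etree.ElementTree as ET


def jmx_check(name: str, target: str, port: int,
              domains=("hadoop",)) -> ET.Element:
    """Build one <check> element in the layout shown above, with a new uuid."""
    check = ET.Element(
        "check", {"uuid": str(uuid.uuid4()), "target": target, "name": name}
    )
    config = ET.SubElement(check, "config")
    ET.SubElement(config, "port").text = str(port)
    # mbean_domains is a space-separated list of JMX domains.
    ET.SubElement(config, "mbean_domains").text = " ".join(domains)
    return check


def hadoop_group(checks, period_ms: int = 60000) -> str:
    """Wrap the checks in a <hadoop> group bound to the JMX module."""
    group = ET.Element(
        "hadoop",
        {"module": "com.omniti.jezebel.check.jmx", "period": str(period_ms)},
    )
    group.extend(checks)
    ET.indent(group)  # pretty-printing; requires Python 3.9 or newer
    return ET.tostring(group, encoding="unicode")
```

For example, hadoop_group([jmx_check("hadoopNameNode", "10.1.16.123", 8004), jmx_check("hadoopJobTracker", "10.1.16.123", 8008)]) yields a group equivalent to the one above, apart from the uuids.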
