Tuesday, March 2, 2010

Hidden errors in PHP affect performance

I've come across an interesting thing recently, that I didn't think mattered, but, as I'm about to demonstrate, it does.

So we all know about different levels of error reporting in PHP, right? We all know, that we must pay close attention to E_WARNING and above, and sometimes the occasional E_NOTICE carries some wisdom. (Almost) nobody cares about those annoying E_STRICT messages. In fact, even if you set the E_ALL reporting level, these messages are not displayed or logged, so maybe you even do not know they are there. I sure ignored them myself mostly.

Well, not anymore. Gather around kids, 'cause I'm about to tell you one thing that just might convince you to turn the E_STRICT reporting on. Even these seemingly insignificant messages negatively affect performance. What surprised me most was, that even if you have these messages suppressed, the penalty is still there.

But don't take my word for it, let's have some examples.


Statically calling non-static method


PHP5 is pretty open minded about the usage of static methods and properties. You can call any method from a class, without instantiating said class. The manual rightly states, that "Calling non-static methods statically generates an E_STRICT level warning." Yeah, who cares, right? Well, you should.

Try this code:


$cnt=100000;

class One {
public function test1a() { return 1; }
public static function test1b() { return 1; }
}

$t1 = microtime(true);
for ($i=1; $i<=$cnt; $i++) One::test1a();
$t2 = microtime(true);
for ($i=1; $i<=$cnt; $i++) One::test1b();
$t3 = microtime(true);

$time1=$t2-$t1;
$time2=$t3-$t2;

printf("Test run 1 (bad): %1.3f ms\n", $time1*1000);
printf("Test run 2 (good) : %1.3f ms\n", $time2*1000);
printf("Avg. diff.: %1.3f us\n", ($time1-$time2)*1000000/$cnt);


This simple code runs 100000 times a static call first to a plain public method and then a static call to a statically defined method of the same class. If I run this snippet on my average machine, I get this result:


Test run 1 (bad): 244.614 ms
Test run 2 (good) : 88.351 ms
Avg. diff.: 1.563 us


There you have it. Each static call of the non-static method costs extra 1.5 microsecond. Now, that might not be much, but if you stick it in a heavy loop, it adds up.

Accessing an undefined array element

One of the more annoying notices is the one where PHP tells you, that the array element you are trying to access is not defined. I bet, that is one of the big reasons why many people have these turned off. I mean, what is wrong with just testing like this?


if ($myarray[1]) {
do something...
} else {
do the other thing
}


What is wrong is, this statement will generate the notice. And--I'm sure you know where I am headed with this--it incurs a performance hit. Again, here's the proof.


<?php

$cnt=100000;

function test2a() {
$a = array();
$a[0] = 1;
if ($a[1]) {
return true;
} else {
return false;
}
}

function test2b() {
$a = array();
$a[0] = 1;
if ($a[0]) {
return true;
} else {
return false;
}
}

function test2c() {
$a[1] = 1;
if (isset($a[0]) && $a[0]) {
return true;
} else {
return false;
}
}

$t1 = microtime(true);
for ($i=1; $i<=$cnt; $i++) test2a();
$t2 = microtime(true);
for ($i=1; $i<=$cnt; $i++) test2b();
$t3 = microtime(true);
for ($i=1; $i<=$cnt; $i++) test2c();
$t4 = microtime(true);
$time1=$t2-$t1;
$time2=$t3-$t2;
$time3=$t4-$t3;
printf("Test run 1 : %1.3f ms\n", $time1*1000);
printf("Test run 2 : %1.3f ms\n", $time2*1000);
printf("Test run 3 : %1.3f ms\n", $time3*1000);
printf("Avg. diff. 1-2: %1.3f us\n", ($time1-$time2)*1000000/$cnt);
printf("Avg. diff. 2-3: %1.3f us\n", ($time2-$time3)*1000000/$cnt);


The result on the same machine reads:


Test run 1 : 259.571 ms
Test run 2 : 160.604 ms
Test run 3 : 156.587 ms
Avg. diff. 1-2: 0.990 us
Avg. diff. 2-3: 0.040 us


This sample is a bit longer but still simple. The test2a() function reads an element of the array that is not defined. This would generate a notice, if we had them turned on.

Very similar function test2b() performs the same code but in this case the element is defined (this is a correct scenario and no warning or notice is generated). Hence the shorter runtime.

The third version of the same function if very careful and checks the existence of the array element before trying to use it. This might seem like an extra work, sometime unnecessary you might even say. The test however proves, that this extra test comes at virtually no cost performance wise and I dare say it's worth it ten times over.

Conclusion

I have demonstrated what you might have suspected all along: sloppy coding harms performance. In this case, the ignored warnings and notices and strict messages that you think you sucessfully dealt with when you turned them off, will come back and haunt you. So, my advice is simple. Be brave! Put that error_reporting = E_ALL | E_STRICT in your php.ini and kill all the wee bugs once and for all.

PS: Obviously, you want to do that on you development box first, but I'm sure I don't have to tell you THAT, right?

Thursday, July 30, 2009

Index usage monitoring in Oracle

Indices (or indexes, if you will) are always good, right? Well, Tom Kyte would disagree with you and so would I. In fact, indices will always impede data manipulation performance, so they should be used only where they are useful in queries. But determining if an index is useful can be sometimes pretty difficult, mainly in a complex application with thousands of different queries, some even dynamically generated.

In this case, one of the approaches is to create indices on all columns where it possibly may make sense and monitor, if they are actually ever used by the CBO planner. I will show you how to do that. This method was tested in 11g, but should work in 10g as well.

The magic is done by an ALTER INDEX indexname MONITORING USAGE.

Starting the monitoring

You will need an ALTER INDEX privilege on te indices you want to monitor, so preferrably you do it as a SYSDBA. Fire up your SQL*Plus and connect to the database. First, we will create a script to enable the monitoring. Replace the MYUSER with whatever user you want to monitor the indices for.

set heading off
set echo off
set feedback off
set pages 10000
spool startmonitor.sql
select 'alter index '||owner||'.'||index_name
||' monitoring usage;'
from dba_indexes
where owner ='MYUSER';
spool off


Then we simply run the script from within the console.

SQL> @startmonitor.sql

You might see some error messages when the script tries to alter a LOB index, which is not possible, just ignore that.

Now you want to keep your application running for some time. At least for it to perform all kinds of work and jobs it normally does. So for example if you have jobs planned for every hour, leave it running like that for at least an hour. If you have daily jobs, leave it on for a whole day.

Stop the monitoring

In the meantime you can prepare a script to turn monitoring off.

set heading off
set echo off
set feedback off
set pages 10000
spool stopmonitor.sql
select 'alter index '||owner||'.'||index_name
||' nomonitoring usage;'
from dba_indexes
where owner ='MYUSER';
spool off


After you reach your desired monitoring period, again, simply run the off-script:

SQL> @stopmonitor.sql

Results


Now, for the results. The index usage overview is stored in the v$object_usage system view. The tricky part here is, that you only see the indices for a currently logged in user. So, even if you were doing the monitoring under a SYS or some other user, now you must login as the user whose indices you were monitoring, in our example MYUSER.

When you do that, identifying those harmful indices is as simple as this:
SELECT INDEX_NAME, TABLE_NAME
FROM V$OBJECT_USAGE
WHERE USED='NO';


Some of those indices might be primary keys, so you don't want to be dropping those, but all other unused indices should go. Unless, of course, you know you will need them in a very near future.

Wednesday, July 15, 2009

The hunt for the UNKNOWN SQL in Oracle

If you are looking at your AWR report or browsing thru performance views in your Oracle Enterprise Manager console, chances are, that some of those SQLs, eating at your CPU or I/O bandwidth are marked as UNKNOWN. This is frustrating, to say the least, but in cases when these mysterious queries are killing your database, it may become a matter of life and death to find out just what does the Oracle do.

Most commonly, these queries are LOB operations. That is, they represent reading and writing the actual CLOB or BLOB data by the appropriate oci functions. The question then stands: what table? And this is where this little post comes to the rescue.

We will start with the only thing we are able to get from the report--the SQL ID, which is a hashed fingerprint of our culprit. It looks something like "2v82uk48x55vt". Regular SQL queries can be viewed by querying the system view V$SQL (or GV$SQL in RAC environments).

SELECT SQL_TEXT
FROM GV$SQL
WHERE SQL_ID='2v82uk48x55vt';

This, however, will not work for those unknown SQLs. The query will come up empty. All hope is not lost yet, though. There are other places to look. One of them is the X$KGLOB table, which is part of the library cache.


SELECT KGLNAOBJ
FROM X$KGLOB
WHERE KGLOBT03 = '2v82uk48x55vt';


This query will yield a value, which on the first look, looks like gibberish. It will read something like "table_4_9_15f8c_0_0_". On the second look, this string containts one interesting part and that is the fourth underscore separated segment: 15f8c. This, in fact, is the hexadecimal representation of the object_id of the object involved in the operation. Now remains only to query the DBA_OBJECTS view to find out, which table is being operated on.


SELECT OWNER, OBJECT_NAME, OBJECT_TYPE
FROM DBA_OBJECTS
WHERE OBJECT_ID=TO_NUMBER('15f8c','XXXXXXXX');


And there you have it, the schema and the name of the table, which was causing that unknown SQL in your stats.

Friday, July 10, 2009

Performance analysis reports in Oracle

Sometimes you wonder why your database is slow or you just want to know what those oompa-loompas inside the Oracle black box are up to. One of the tools in your investigation can be the so-called Automatic Workload Repository, which contains an overload of information about what is going on in your database.

Some information from AWR can be seen inside the Enterprise manager console, but for serious analysis you will want to generate an AWR report in plaintext or more probabaly HTML format.

The report can be generated by running a SQL*Plus script. Logon to your database server as usual and run a following command:


sqlplus / as sysdba @$ORACLE_HOME/rdbms/admin/awrrpti.sql


The script will prompt you for your choices about how you want your report.

  1. First, you must choose an output format. You probably will want a html version, but sometimes Oracle support team will ask you for a plaintext version.
  2. You must enter a database ID. You most likely have only one database and the available choices are listed, so just copy the ten digit DB ID here.
  3. If you run more than one instance, enter the number of the instance you want your report from. Unfortunately, you cannot generate a cumulative report for all your instances in RAC environment, so just pick one.
  4. The script asks you, how far back in days you want to look for AWR snapshots. If you want just the recent activity, enter 1.
  5. The script lists all available snapshots for the number of days specified in the previous step. If you entered 1, it will list just today's snapshots. Most commonly, the snapshot is done every hour. First, you must select a beginning snapshot and in the second step you select the ending snapshot. If you want the report for previous whole hour, enter the last two snapshot numbers, for example 2873 and 2874.
  6. At last, you get prompted for a file name. The default is probably OK for most people, so just hit enter and the system gets to work on your report.
The generated report HTML file is located in the directory you ran the script from. Transfer the file to your computer or somewhere, where you can access it and open it in your favorite browser.

Your developers will most likely be interested in the section reporting the most frequently and most costly SQL statements. Your DBA will find interesting information about I/O activity, memory advisory tables etc. etc.

In a future article, I will try to point out some interesting sections in this report. Until then, self-study time.

Tuesday, June 2, 2009

How to build Oracle Instant client under Linux

The Oracle Intant Client is a minimalistic version of necessary runtime libraries needed to run an application which uses an OCI interface to connect to the Oracle database. In this article I will describe how to build your very own Instant client.

If you ever installed a full Oracle client, you know how painfull process it is and how much diskspace you need. Especially if you are running a large farm of webservers and need to maintain an identical version of the client libraries, the Instant client comes handy. Oracle provides the client free to download on their pages, but there are times, when the exact version of the client you need is not available. In our case, we needed to apply a special patch to the client library and all patches are always issued against a full client.

Pre-requisites


The bad news is, you will still have to install the full client on one of your servers, prefferably with the same configuration and system version as is your target system. I will describe the build for a RedHat ES or CentOS system, though probably any RPM-base distribution will do. For building the instant client RPMs you will need the rpm-build package. If you have a yum setup, it's as simple as:

yum -y install rpm-build


After you have the full client installed, you may want to install the necessary patches to your client install.

Create the Instant client


The actual process can be found in the manual, but it's well hidden. Thanks to Chris Jones for pointing me in the right direction.

To build the client, run the following commands:
mkdir -p $ORACLE_HOME/rdbms/install/instantclient/light
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ilibociei


If all goes well, the ZIP archives and mainly the RPMs will be created for you, which you may transfer to your other servers and install as usual. If the install fails with the following error message, you must install the rpm-build package (see above).

-bb: unknown option
make: *** [/u01/app/oracle/product/11.1.0/client_1/rdbms/install/instantclient/libociei.so] Error 1

The RPM files with the client, as well as other libraries like JDBC, ODBC or the SQL*Plus application, will be located in the following directory (assuming you are building 32-bit versions):
cd $ORACLE_HOME/rdbms/install/instantclient/rpm/RPMS/i386

Enjoy your DIY Instant Client!

Thursday, May 28, 2009

Calling SAP RFC from PHP thru message server

If you have the need to call your SAP R/3 system from PHP (which we do), you have several possibilities to do so. One of them is using the proprietary RFC protocol. Fortunately, there exists a PHP SAPRFC library for this purpose.

Because we want high availability and scalability in our SAP, we use multiple application servers and spread the load between them with help from a so called message server. Connection thru this message server has some special requirements, though, so I hope this short post will help somebody in the same position that we were.

Normally, you'd use something like this to connect:


<?php
$rfc = saprfc_open (array (
"ASHOST"=>"server",
"SYSNR"=>"01",
"CLIENT"=>"001",
"USER"=>"test",
"PASSWD" =>"test"
) );

?>


Where ASHOST is your application server hostname, SYSNR is the system number etc. Even if you are using a message server, you can still connect to any of your application servers like this directly.

With the message server however, you have to use different connection variables:


<?php
$rfc = saprfc_open (array (
"CLIENT"=>"001",
"USER"=>"test",
"PASSWD" =>"test",
"MSHOST"=>"server1",
"R3NAME"=>"ABC",
"GROUP"=>"PUBLIC"
) );

?>


Where MSHOST is the hostname of your message server, R3NAME is the three-letter system name and the GROUP parameter specifies a logon group to use. If you do not use logon groups, leave the parameter empty or use a " " string (blank space). Logon group PUBLIC is almost always defined as well.

But, here is the catch: this will not work by itself. You may have noticed, that in the connect parameters, we never specified which port to use to connect. This must be defined in you /etc/services file on the server, where you run the PHP code.

You must add a line to /etc/services specifying on what port does the message server listen on:


sapmsABC 3601/tcp


The service name is constructed as sapms<SID>, where the <SID> is the three-letter system name used in the R3NAME connection attribute. The port obviously must be the one you use. If you do not know the port number, it is most usually 3600 + system number. So if our system has system number 01, the port is 3601. If the system number is 40, the port is 3640.

After adding this line to your services definition, the connection should start working.

Monday, May 25, 2009

Building a load balancer with open source software

So your website has reached a level of traffic, where your old server is getting slower and slower and your users just keep coming and coming. You can always throw more hardware at it, but it gets more and more costly, when you want high performance from a single machine.

And then there is the other problem: when your web server dies, your website dies with it. Of course you keep backups of everything, but still it takes a lot of time to get a new hardware, install it and restore the backups. Even if you keep a hot stand-by server, you have to be around to redirect the website traffic to the new server.

What you need is to spread your website load to more servers and for that, you need a load balancer in front of your web server farm. You can buy a ready-made box for this purpose, but they tend to be costly. In this article, I'll explain how to build a simple load balancer using a great piece of open source software called Pound.

How does it work?


Pound keeps a pool of your web servers. If a request arrives at pound, it chooses a web server randomly and forwards the request to this server. You can define weights of servers, enabling you to use servers with different performance. Pound waits for the response from the web server and forwards the response back to the requesting client.

If a web server dies or is unresponsive, it is removed from the pool of active servers and pound stops forwarding requests to it. It re-tries the server repeatedly and when it becomes available, it returns it back to the active pool.

Installing pound


Pound runs under any UNIX-type system. I have personal experience with FreeBSD and Linux. Like I always say, the best Linux/Unix-type system is the one you know, so pick any distribution you are familiar with. Chances are, that packages exist for your distribution. I will not go into much detail, I leave it to the kind reader to install pound as such.

My favorite Linux distribution is CentOS, where installation is as simple as downloading the pound packages from the famous dag repository and installing with:


yum -y localinstall pound-2.4-1.el5.rf.i386.rpm


After that, you end up with a vanilla load balancer, that is not just ready to serve your visitors. The important file we'll inspect and edit now is the pound.conf, usually located in /etc directory.

Prepare your landscape


I assume you already have your web servers ready. In some future article I may delve deeper into the actual web server configuration, but basically what you want to do is make an exact copy or copies of your web server, changing only the IP adress, hostname and possibily deleting old logfiles. You will need the IP addresses of your server. Under normal circumstances, I am all for using hostnames, instead of IP addresses, but this is one of the few exceptions, where usage of IP adresses is justifiable and even preferred.

For the purpose of this article, I'll assume you have just two web servers with following IP addresses:
  1. Server: 192.168.1.10
  2. Server: 192.168.1.20


/etc/pound.conf


Following is a simple config file, with comments. The comments are delimited by a hash sign (#) and you may remove them if you want to use this file.

# The Control directive contains a path and filename, 
# where pound will create a handle, which can later
# be used for controlling pound via the
# poundctl utility

Control "/tmp/poundctl"


# In this directive we tell pound to listen on port 80
# on any IP address the server has.

ListenHTTP
Address 0.0.0.0
Port 80
End


# For the purpose of this article, we'll have
# just one simple service, which will serve
# any and all requests arriving
# at the balancer's port 80.

Service

# Specify Server 1.
# The Port is the port on the web server, where http
# server listens on.
# The TimeOut directive tells pound how long to wait
# for a response from the web server. If the web
# server does not respond in this amount of seconds,
# it is considered dead and pound removes it
# from the pool of usable server.

BackEnd
Address 192.168.1.10
Port 80
TimeOut 30
End

# Server 2
BackEnd
Address 192.168.1.20
Port 80
TimeOut 30
End

End


Starting pound


If you installed pound from RPM packages, you probably have all you need to launch. Just issue the standard:

service pound start

To enable automatic startup at the server start, issue:

chkconfig pound on

...and you're done


You have just created your first load balancer. It is a very simple one, but it does the job. I will try to describe some more advanced features and techniques in some future article.

After you reach a certain amount of traffic you will want to do some tuning as well, but for many cases, the default settings of pound and Linux kernel are OK.