
Practical mod_perl, Chapter 9: Essential Tools for Performance Tuning

This is the Title of the Book, eMatter Edition
Copyright © 2004 O’Reilly & Associates, Inc. All rights reserved.
Server Benchmarking
Request rate: 854.1 req/s (1.2 ms/req)
Request size [B]: 79.0
Reply rate [replies/s]: min 855.6 avg 855.6 max 855.6 stddev 0.0 (1 samples)
Reply time [ms]: response 19.5 transfer 0.0
Reply size [B]: header 184.0 content 6.0 footer 2.0 (total 192.0)
Reply status: 1xx=0 2xx=5000 3xx=0 4xx=0 5xx=0
CPU time [s]: user 0.33 system 1.53 (user 5.6% system 26.1% total 31.8%)
Net I/O: 224.4 KB/s (1.8*10^6 bps)
Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
As before, we are mostly interested in the average Reply rate—855, almost exactly the
same result reported by ab in the previous section. Notice that when we tried rate
900 for this particular setup, the reported request rate went down drastically, since
the server’s performance gets worse when there are more requests than it can handle.
http_load
http_load is yet another utility that does web server load testing. It can simulate a 33.6
Kbps modem connection (-throttle) and allows you to provide a file with a list of URLs
that will be fetched randomly. You can specify how many parallel connections to run
(-parallel N) and the number of requests to generate per second (-rate N). Finally, you
can tell the utility when to stop by specifying either the test time length (-seconds N) or
the total number of fetches (-fetches N).
Again, we will try to verify the results reported by ab (claiming that the script under
test can handle about 855 requests per second on our machine). Therefore we run
http_load with a rate of 860 requests per second, for 5 seconds in total. We invoke it
on the file urls, containing a single URL:
http://localhost/perl/simple_test.pl
Here is the generated output:
panic% http_load -rate 860 -seconds 5 urls
4278 fetches, 325 max parallel, 25668 bytes, in 5.00351 seconds
6 mean bytes/connection
855 fetches/sec, 5130 bytes/sec
msecs/connect: 20.0881 mean, 3006.54 max, 0.099 min
msecs/first-response: 51.3568 mean, 342.488 max, 1.423 min
HTTP response codes:
code 200 4278
This application also reports almost exactly the same response-rate capability: 855
requests per second. Of course, you may think that it’s because we have specified a
rate close to this number. But no, if we try the same test with a higher rate:
panic% http_load -rate 870 -seconds 5 urls
4045 fetches, 254 max parallel, 24270 bytes, in 5.00735 seconds
6 mean bytes/connection
807.813 fetches/sec, 4846.88 bytes/sec
msecs/connect: 78.4026 mean, 3005.08 max, 0.102 min
we can see that the performance goes down—it reports a response rate of only 808
requests per second.
The nice thing about this utility is that you can list a few URLs to test. The URLs
that get fetched are chosen randomly from the specified file.
Note that when you provide a file with a list of URLs, you must make sure that you
don’t have empty lines in it. If you do, the utility will fail and complain:
./http_load: unknown protocol -
Other Web Server Benchmark Utilities
The following are also interesting benchmarking applications implemented in Perl:
HTTP::WebTest
The HTTP::WebTest module (available from CPAN) runs tests on remote URLs or
local web files containing Perl, JSP, HTML, JavaScript, etc. and generates a
detailed test report.
HTTP::Monkeywrench
HTTP::Monkeywrench is a test-harness application to test the integrity of a user’s
path through a web site.
Apache::Recorder and HTTP::RecordedSession
Apache::Recorder (available from CPAN) is a mod_perl handler that records an HTTP
session and stores it on the web server’s filesystem. HTTP::RecordedSession reads
the recorded session from the filesystem and formats it for playback using
HTTP::WebTest or HTTP::Monkeywrench. This is useful when writing acceptance and
regression tests.
Many other benchmark utilities are available, both for free and for money. If you
find that none of these suits your needs, it’s quite easy to roll your own utility.
The easiest way to do this is to write a Perl script that uses the
LWP::Parallel::UserAgent and Time::HiRes modules. The former module allows you to
open many parallel connections and the latter allows you to take time samples with
microsecond resolution.
Perl Code Benchmarking
If you want to benchmark your Perl code, you can use the Benchmark module. For
example, let’s say that our code generates many long strings and finally prints them
out. We wonder what is the most efficient way to handle this task—we can try to
concatenate the strings into a single string, or we can store them (or references to
them) in an array before generating the output. The easiest way to get an answer is to
try each approach, so we wrote the benchmark shown in Example 9-3.
As you can see, we generate three big strings and then use three anonymous
functions to print them out. The first one (ref_array) stores references to the
strings in an array. The second function (array) stores the strings themselves in
an array. The third function (concat) concatenates the three strings into a single
string. At the end of each function we print the stored data. If the data
structure includes references, they are first dereferenced (relevant for the first
function only). We execute each subtest 100,000 times to get more precise results.
If your results are too close and fall below one second of CPU time, you should
try setting the number of iterations to a bigger number. Let’s execute this
benchmark and check the results:
panic% perl strings_benchmark.pl
Benchmark: timing 100000 iterations of array, concat, ref_array
array: 2 wallclock secs ( 2.64 usr + 0.23 sys = 2.87 CPU)
concat: 2 wallclock secs ( 1.95 usr + 0.07 sys = 2.02 CPU)
ref_array: 3 wallclock secs ( 2.02 usr + 0.22 sys = 2.24 CPU)
First, it’s important to remember that the reported wallclock times can be misleading
and thus should not be relied upon. If during one of the subtests your computer was
Example 9-3. strings_benchmark.pl
use Benchmark;
use Symbol;

my $fh = gensym;
open $fh, ">/dev/null" or die $!;

my($one, $two, $three) = map { $_ x 4096 } 'a'..'c';

timethese(100_000, {
    ref_array => sub {
        my @a;
        push @a, \($one, $two, $three);
        my_print(@a);
    },
    array => sub {
        my @a;
        push @a, $one, $two, $three;
        my_print(@a);
    },
    concat => sub {
        my $s;
        $s .= $one;
        $s .= $two;
        $s .= $three;
        my_print($s);
    },
});

sub my_print {
    for (@_) {
        print $fh ref($_) ? $$_ : $_;
    }
}
more heavily loaded than during the others, it’s possible that this particular
subtest will take more wallclock time to complete, but this doesn’t matter for our
purposes. What matters is the CPU time, which tells us the exact amount of CPU
time each test took to complete. You can also see the fraction of the CPU
allocated to usr and sys, which stand for the user and kernel (system) modes,
respectively. This tells us what proportions of its time the subtest has spent
running code in user mode and in kernel mode.
Now that you know how to read the results, you can see that concatenation
outperforms the two array functions, because concatenation only has to grow the
size of the string, whereas the array functions have to extend the array and,
during the print, iterate over it. Moreover, the array method also creates a
string copy before appending the new element to the array, which makes it the
slowest method of the three.
Let’s make the strings much smaller. Using our original code with a small correction:
my($one, $two, $three) = map { $_ x 8 } 'a'..'c';
we now make three strings of 8 characters, instead of 4,096. When we execute the
modified version we get the following picture:
Benchmark: timing 100000 iterations of array, concat, ref_array
array: 1 wallclock secs ( 1.59 usr + 0.01 sys = 1.60 CPU)
concat: 1 wallclock secs ( 1.16 usr + 0.04 sys = 1.20 CPU)
ref_array: 2 wallclock secs ( 1.66 usr + 0.05 sys = 1.71 CPU)
Concatenation still wins, but this time the array method is a bit faster than
ref_array, because the overhead of taking string references before pushing them
into an array and dereferencing them afterward during print() is bigger than the
overhead of making copies of the short strings.
As these examples show, you should benchmark your code by rewriting parts of the
code and comparing the benchmarks of the modified and original versions.
Also note that benchmarks can give different results under different versions of the
Perl interpreter, because each version might have built-in optimizations for some of
the functions. Therefore, if you upgrade your Perl interpreter, it’s best to benchmark
your code again. You may see a completely different result.
Another Perl code benchmarking method is to use the Time::HiRes module, which
allows you to get the runtime of your code with a fine-grained resolution on the
order of microseconds. Let’s compare a few methods to multiply two numbers (see
Example 9-4).
Example 9-4. hires_benchmark_time.pl
use Time::HiRes qw(gettimeofday tv_interval);

my %subs = (
    obvious => sub {
        $_[0] * $_[1]
    },
    decrement => sub {
We have used two methods here. The first (obvious) is doing the normal
multiplication, $z=$x*$y. The second method (decrement) uses a trick from systems
that have no built-in multiplication operation and provide only addition and
subtraction: it adds the second argument to an accumulator as many times as the
value of the first argument (as you did in school before you learned
multiplication).
When we execute the code, we get:
panic% perl hires_benchmark_time.pl
decrement: Doing 10 * 10 = 100 took 0.000064 seconds
obvious : Doing 10 * 10 = 100 took 0.000016 seconds
decrement: Doing 10 * 100 = 1000 took 0.000029 seconds
obvious : Doing 10 * 100 = 1000 took 0.000013 seconds
decrement: Doing 100 * 10 = 1000 took 0.000098 seconds
obvious : Doing 100 * 10 = 1000 took 0.000013 seconds
decrement: Doing 100 * 100 = 10000 took 0.000093 seconds
obvious : Doing 100 * 100 = 10000 took 0.000012 seconds
Note that if the processor is very fast or the OS has a coarse time-resolution
granularity (i.e., cannot count microseconds) you may get zeros as reported times.
This of course shouldn’t be the case with applications that do a lot more work.
If you run this benchmark again, you will notice that the numbers will be slightly
different. This is because the code measures absolute (wallclock) time, not CPU
time (unlike the previous benchmark using the Benchmark module).
Example 9-4. hires_benchmark_time.pl (continued)
        my $a = shift;
        my $c = 0;
        $c += $_[0] while $a--;
        $c;
    },
);

for my $x (qw(10 100)) {
    for my $y (qw(10 100)) {
        for (sort keys %subs) {
            my $start_time = [ gettimeofday ];
            my $z = $subs{$_}->($x, $y);
            my $end_time   = [ gettimeofday ];
            my $elapsed = tv_interval($start_time, $end_time);
            printf "%-9.9s: Doing %3d * %3d = %5d took %f seconds\n",
                $_, $x, $y, $z, $elapsed;
        }
        print "\n";
    }
}
You can see that doing 10*100 as opposed to 100*10 produces quite different
results for the decrement method. When the arguments are 10*100, the code performs
the add-100 operation only 10 times, which is obviously faster than the second
invocation, 100*10, where the code performs the add-10 operation 100 times.
However, the normal multiplication takes a constant time.
Let’s run the same code using the Benchmark module, as shown in Example 9-5.
Now let’s execute the code:
panic% perl hires_benchmark.pl
Testing 10*10
Benchmark: timing 300000 iterations of decrement, obvious
decrement: 4 wallclock secs ( 4.27 usr + 0.09 sys = 4.36 CPU)
obvious: 1 wallclock secs ( 0.91 usr + 0.00 sys = 0.91 CPU)
Testing 10*100
Benchmark: timing 300000 iterations of decrement, obvious
decrement: 5 wallclock secs ( 3.74 usr + 0.00 sys = 3.74 CPU)
obvious: 0 wallclock secs ( 0.87 usr + 0.00 sys = 0.87 CPU)
Testing 100*10
Benchmark: timing 300000 iterations of decrement, obvious
decrement: 24 wallclock secs (24.41 usr + 0.00 sys = 24.41 CPU)
obvious: 2 wallclock secs ( 0.86 usr + 0.00 sys = 0.86 CPU)
Example 9-5. hires_benchmark.pl
use Benchmark;

my %subs = (
    obvious => sub {
        $_[0] * $_[1]
    },
    decrement => sub {
        my $a = shift;
        my $c = 0;
        $c += $_[0] while $a--;
        $c;
    },
);

for my $x (qw(10 100)) {
    for my $y (qw(10 100)) {
        print "\nTesting $x*$y\n";
        timethese(300_000, {
            obvious   => sub { $subs{obvious}->($x, $y)   },
            decrement => sub { $subs{decrement}->($x, $y) },
        });
    }
}
Testing 100*100
Benchmark: timing 300000 iterations of decrement, obvious
decrement: 23 wallclock secs (23.64 usr + 0.07 sys = 23.71 CPU)
obvious: 0 wallclock secs ( 0.80 usr + 0.00 sys = 0.80 CPU)
You can observe exactly the same behavior, but this time using the average CPU
time collected over 300,000 tests and not the absolute time collected over a
single sample. Obviously, you can use the Time::HiRes module in a benchmark that
will execute the same code many times to report a more precise runtime, similar to
the way the Benchmark module reports the CPU time.
However, there are situations where getting the average speed is not enough. For
example, if you’re testing some code with various inputs and calculate only the
average processing times, you may not notice that for some particular inputs the
code is very inefficient. Let’s say that the average is 0.72 seconds. This doesn’t
reveal the possible fact that there were a few cases when it took 20 seconds to
process the input. Therefore, getting the variance* in addition to the average may
be important. Unfortunately, Benchmark.pm cannot provide such results: system
timers are rarely good enough to measure fast code that well, even on single-user
systems, so you must run the code thousands of times to get any significant CPU
time. If the code is slow enough that each single execution can be measured, most
likely you can use the profiling tools.
Process Memory Measurements
A very important aspect of performance tuning is to make sure that your
applications don’t use too much memory. If they do, you cannot run many servers,
and therefore in most cases, under a heavy load the overall performance will be
degraded. The code also may leak memory, which is even worse, since if the same
process serves many requests and more memory is used after each request, after a
while all the RAM will be used and the machine will start swapping (i.e., using
the swap partition). This is a very undesirable situation, because when the system
starts to swap, the performance will suffer badly. If memory consumption grows
without bound, it will eventually lead to a machine crash.
The simplest way to figure out how big the processes are and to see whether they
are growing is to watch the output of the top(1) or ps(1) utilities.
For example, here is the output of top(1):
8:51am up 66 days, 1:44, 1 user, load average: 1.09, 2.27, 2.61
95 processes: 92 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: 54.0% user, 9.4% system, 1.7% nice, 34.7% idle
* See Chapter 15 in the book Mastering Algorithms with Perl, by Jon Orwant, Jarkko Hietaniemi, and John
Macdonald (O’Reilly). Of course, there are gazillions of statistics-related books and resources on the Web;
http://mathforum.org/ and http://mathworld.wolfram.com/ are two good starting points for anything that has
to do with mathematics.
Mem: 387664K av, 309692K used, 77972K free, 111092K shrd, 70944K buff
Swap: 128484K av, 11176K used, 117308K free 170824K cached
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
29225 nobody 0 0 9760 9760 7132 S 0 12.5 2.5 0:00 httpd_perl
29220 nobody 0 0 9540 9540 7136 S 0 9.0 2.4 0:00 httpd_perl
29215 nobody 1 0 9672 9672 6884 S 0 4.6 2.4 0:01 httpd_perl
29255 root 7 0 1036 1036 824 R 0 3.2 0.2 0:01 top
376 squid 0 0 15920 14M 556 S 0 1.1 3.8 209:12 squid
29227 mysql 5 5 1892 1892 956 S N 0 1.1 0.4 0:00 mysqld
29223 mysql 5 5 1892 1892 956 S N 0 0.9 0.4 0:00 mysqld
29234 mysql 5 5 1892 1892 956 S N 0 0.9 0.4 0:00 mysqld
This starts with overall information about the system and then displays the most
active processes at the given moment. So, for example, if we look at the
httpd_perl processes, we can see the size of the resident (RSS) and shared (SHARE)
memory segments.* This sample was taken on a production server running Linux.
But of course we want to see all the apache/mod_perl processes, and that’s where
ps(1) comes in. The options of this utility vary from one Unix flavor to another, and
some flavors provide their own tools. Let’s check the information about mod_perl
processes:
panic% ps -o pid,user,rss,vsize,%cpu,%mem,ucomm -C httpd_perl
PID USER RSS VSZ %CPU %MEM COMMAND
29213 root 8584 10264 0.0 2.2 httpd_perl
29215 nobody 9740 11316 1.0 2.5 httpd_perl
29216 nobody 9668 11252 0.7 2.4 httpd_perl
29217 nobody 9824 11408 0.6 2.5 httpd_perl
29218 nobody 9712 11292 0.6 2.5 httpd_perl
29219 nobody 8860 10528 0.0 2.2 httpd_perl
29220 nobody 9616 11200 0.5 2.4 httpd_perl
29221 nobody 8860 10528 0.0 2.2 httpd_perl
29222 nobody 8860 10528 0.0 2.2 httpd_perl
29224 nobody 8860 10528 0.0 2.2 httpd_perl
29225 nobody 9760 11340 0.7 2.5 httpd_perl
29235 nobody 9524 11104 0.4 2.4 httpd_perl
Now you can see the resident (RSS) and virtual (VSZ) memory segments (and the
shared memory segment if you ask for it) of all mod_perl processes. Please refer to
the top(1) and ps(1) manpages for more information.
You probably agree that using top(1) and ps(1) is cumbersome if you want to use
memory-size sampling during the benchmark test. We want to have a way to print
memory sizes during program execution at the desired places. The GTop module,
which is a Perl glue to the libgtop library, is exactly what we need for that task.
You are fortunate if you run Linux or any of the BSD flavors, as the libgtop C
library from the GNOME project is supported on those platforms. This library
provides an API to access various system-wide and process-specific information.
(Some other operating systems also support libgtop.)
* You can tell top to sort the entries by memory usage by pressing M while viewing the top screen.
With GTop, if we want to print the memory size of the current process we’d just
execute:
    use GTop ();
    print GTop->new->proc_mem($$)->size;
$$ is the Perl special variable that gives the process ID (PID) of the currently
running process.
If you want to look at some other process and you have the necessary permission,
just replace $$ with the other process’s PID and you can peek inside it. For
example, to check the shared size, you’d do:
    print GTop->new->proc_mem($$)->share;
Let’s try to run some tests:
panic% perl -MGTop -e 'my $g = GTop->new->proc_mem($$); \
printf "%5.5s => %d\n",$_,$g->$_( ) for qw(size share vsize rss)'
size => 1519616
share => 1073152
vsize => 2637824
rss => 1515520
We have just printed the memory sizes of the process: the real, the shared, the
virtual, and the resident (not swapped out).
There are many other things GTop can do for you; please refer to its manpage for
more information. We are going to use this module in our performance tuning tips
later in this chapter, so you will be able to exercise it a lot.
If you are running a true BSD system, you may use BSD::Resource::getrusage
instead of GTop. For example:
    use BSD::Resource;
    print "used memory = ".(BSD::Resource::getrusage)[2]."\n";
For more information, refer to the BSD::Resource manpage.
The Apache::VMonitor module, with the help of the GTop module, allows you to watch
all your system information using your favorite browser, from anywhere in the
world, without the need to telnet to your machine. If you are wondering what
information you can retrieve with GTop, you should look at Apache::VMonitor, as it
utilizes a large part of the API GTop provides.
Apache::Status and Measuring Code Memory Usage
The Apache::Status module allows you to peek inside the Perl interpreter in the
Apache web server. You can watch the status of the Perl interpreter: what modules
and Registry scripts are compiled in, the content of variables, the sizes of the
subroutines, and more.
To configure this module you should add the following section to your httpd.conf
file:
<Location /perl-status>
SetHandler perl-script
PerlHandler +Apache::Status
</Location>
and restart Apache.
Now when you access the location http://localhost:8000/perl-status you will see a
menu (shown in Figure 9-1) that leads you into various sections that will allow you
to explore the innards of the Perl interpreter.
When you use this module for debugging, it’s best to run the web server in
single-server mode (httpd -X). If you don’t, you can get confused, because various
child processes might show different information. It’s simpler to work with a
single process.
To enable the Apache::Status module to present more exotic information, make sure
that the following modules are installed: Data::Dumper, Apache::Peek, Devel::Peek,
B::LexInfo, B::Deparse, B::Terse, and B::TerseSize. Some of these modules are
bundled with Perl; others should be installed by hand.
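If these helper modules are installed, you can also turn on all the optional menu items at once via a configuration variable (StatusOptionsAll is the name used by the Apache::Status documentation; check the version you have installed for the exact variables it honors):

```
<Location /perl-status>
    SetHandler perl-script
    PerlHandler +Apache::Status
    PerlSetVar StatusOptionsAll On
</Location>
```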
Figure 9-1. Main menu for Apache::Status
