Subj : Performance tuning (apache, sendmail, samba, kernel, i/o)
-------------------------------------------------------------------------------
Apache:
--------
If you are experiencing long delays while "Waiting for reply...." with
Apache servers, you may try the following hack, which I use frequently to
speed up system performance. Apache comes with a limit of 256 child
processes, defined by HARD_SERVER_LIMIT in its source code. This can kill
Apache when it is used for a very, very busy site. Normally you can only
set MaxClients in httpd.conf to a maximum of 256, which is not enough for
a busy site. To change this limit (which the makers of Apache suggest as
well), edit src/include/httpd.h and change the value of HARD_SERVER_LIMIT.
Bump it up to 1000, then set MaxClients in your httpd.conf to something
like 512. That should be ample. Recompile Apache and you are done.
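The header edit can be scripted. This is a minimal sketch, assuming the
Apache 1.3 layout in which httpd.h contains a line of the form
"#define HARD_SERVER_LIMIT 256"; the helper names raise_hard_server_limit
and patch_header_file are hypothetical, not part of Apache. It refuses to
do anything if that exact define is missing, so pointing it at a different
source tree fails loudly instead of silently.

```python
import re

# Assumed layout (Apache 1.3 httpd.h): a line such as
#     #define HARD_SERVER_LIMIT 256
# Only a define whose value is exactly 256 is touched.
DEFINE_RE = re.compile(
    r"^([ \t]*#[ \t]*define[ \t]+HARD_SERVER_LIMIT[ \t]+)256\b", re.MULTILINE
)


def raise_hard_server_limit(header_text: str, new_limit: int = 1000) -> str:
    """Return the header text with HARD_SERVER_LIMIT raised from 256 to new_limit.

    Hypothetical helper; it only rewrites text and never touches the disk.
    """
    patched, count = DEFINE_RE.subn(
        lambda m: m.group(1) + str(new_limit), header_text
    )
    if count == 0:
        raise ValueError("no '#define HARD_SERVER_LIMIT 256' line found")
    return patched


def patch_header_file(path: str, new_limit: int = 1000) -> None:
    """Patch the header in place, keeping the original as <path>.orig."""
    with open(path) as f:
        original = f.read()
    patched = raise_hard_server_limit(original, new_limit)
    with open(path + ".orig", "w") as f:
        f.write(original)
    with open(path, "w") as f:
        f.write(patched)
```

Typical use would be patch_header_file("src/include/httpd.h", 1000) from
the top of the Apache source tree, followed by a "MaxClients 512" line in
httpd.conf and a rebuild.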
Note: Linux itself has a per-user "max user processes" limit. Add this to
root's .bashrc file (or whatever startup script your particular shell
uses):
ulimit -u unlimited
You must log out and log back in before starting your new Apache!
Otherwise you will run into problems. To verify that you are ready to go,
type ulimit -a as root and make sure it shows "unlimited" next to max user
processes.
(Note: you may also run ulimit -u unlimited at the command prompt before
starting httpd instead of adding it to .bashrc, but I always forgot, so I
just added it to .bashrc as a safety net. Another good place to put
ulimit -u unlimited is the httpd startup script in /etc/rc.d/init.d.)
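The "ulimit -a" check can also be done from a script. The limit that
bash's "ulimit -u" shows is RLIMIT_NPROC, which Python exposes through its
Unix-only resource module. A minimal sketch; the helper names describe,
nproc_limits and ready_for_apache are hypothetical.

```python
import resource

# RLIMIT_NPROC is the "max user processes" limit behind bash's "ulimit -u";
# processes started from this shell (such as httpd) inherit it.


def describe(limit: int) -> str:
    """Render a limit like ulimit does: 'unlimited' or the number itself."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def nproc_limits() -> tuple[str, str]:
    """Return the (soft, hard) max-user-processes limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return describe(soft), describe(hard)


def ready_for_apache() -> bool:
    """True if the soft limit is unlimited, the state the text asks for."""
    soft, _hard = nproc_limits()
    return soft == "unlimited"
```

Run from the same login shell that will start httpd, ready_for_apache()
should return True once the .bashrc change has taken effect.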
------------
Compiling programs:
--------
To squeeze the most performance out of your x86 programs, you can compile
with full optimization using the -O6 flag instead of the -O2 found in many
Makefiles. With stock gcc, any level above -O3 has the same effect as -O3,
so -O6 is effectively the highest level. Higher optimization usually
increases the size of the generated code, but the result usually runs
faster.
You can also use pgcc to compile for extra optimization on 586 or later
CPUs. More information can be found at http://www.goof.com/pcg/
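Changing the level by hand in every Makefile gets tedious. The sketch
below rewrites the -O flag on CFLAGS assignment lines only; it assumes a
conventional Makefile where optimization is set through a CFLAGS variable,
and set_opt_level is a hypothetical helper. The default of -O3 reflects
what stock gcc actually honours; pass "-O6" if you follow the text
literally or use pgcc.

```python
import re

# A whole-word -O flag: -O, -O2, -O6 ... (but not -Os or -Ofast).
OPT_FLAG_RE = re.compile(r"(?<!\S)-O\d*(?!\S)")
# A CFLAGS assignment: "CFLAGS = ...", "CFLAGS := ...", "CFLAGS += ..." etc.
CFLAGS_RE = re.compile(r"\s*CFLAGS\s*[:+?]?=")


def set_opt_level(makefile_text: str, flag: str = "-O3") -> str:
    """Replace the -O flag on CFLAGS lines; all other lines are left untouched."""
    lines = []
    for line in makefile_text.splitlines(keepends=True):
        if CFLAGS_RE.match(line):
            line = OPT_FLAG_RE.sub(flag, line)
        lines.append(line)
    return "".join(lines)
```

Lines such as LDFLAGS are deliberately left alone, and a CFLAGS line with
no -O flag comes back unchanged rather than gaining one.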
------------
When compiling, use -fomit-frame-pointer. This makes the compiler access
local variables through the stack pointer instead of a frame pointer,
which frees up an extra register. Unfortunately, debugging is almost
impossible with this option.
Do not use shared libraries: shared libraries are roughly 10% slower,
because they need to access variables via a base register, and there is
some overhead for relocating jumps into a shared library. But shared
libraries can increase performance when the system is short of memory,
because shared executables are generally much smaller.
For numerical applications: avoid numeric exceptions.
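Put together, the two tips amount to a couple of extra gcc flags. A
minimal sketch that only builds the argument list; gcc_command is a
hypothetical helper, while -fomit-frame-pointer and -static are standard
gcc options (-static links the libraries into the executable instead of
using the shared versions).

```python
def gcc_command(source: str, output: str, opt: str = "-O2",
                omit_frame_pointer: bool = True, static: bool = True) -> list[str]:
    """Build a gcc argument list that applies the tips above (hypothetical helper)."""
    cmd = ["gcc", opt]
    if omit_frame_pointer:
        # Frees the frame-pointer register; makes debugging much harder.
        cmd.append("-fomit-frame-pointer")
    if static:
        # Link libraries into the executable instead of using shared ones.
        cmd.append("-static")
    cmd += ["-o", output, source]
    return cmd
```

The list can be handed straight to subprocess.run(..., check=True). For a
debug build, call it with omit_frame_pointer=False and add -g yourself.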
--------------
Sendmail:
--------
Here's the key to configuring sendmail for optimal speed and security:
don't use the m4 configuration system. It's overly complex and lets in too
much cruft (like the old % hack and uucp).
I took the time to write a custom sendmail.cf, and have found it much
easier to put in the features I want while keeping a secure shop.
---------------
Samba:
--------
Try testing a large (I recommend 8Mb) file copy operation to a local disk
before and after each smb.conf change.
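The before/after copy test is easy to repeat consistently with a small
timing helper. A minimal sketch: make_test_file and copy_throughput are
hypothetical names, 1 MB is taken as 2**20 bytes, and the source is
assumed to live on the Samba share (for example a mounted share; the
mount point is up to you) while the destination is on local disk.

```python
import os
import shutil
import time

MB = 2**20  # 1 MB taken as 2**20 bytes in this sketch


def make_test_file(path: str, megabytes: int = 8) -> None:
    """Write a file of random data; 8 MB mirrors the size suggested above."""
    chunk = os.urandom(MB)
    with open(path, "wb") as f:
        for _ in range(megabytes):
            f.write(chunk)


def copy_throughput(src: str, dst: str) -> float:
    """Copy src to dst and return the achieved throughput in MB/s."""
    size = os.path.getsize(src)
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return size / MB / elapsed
```

Run it a few times per setting: caching on either side can make repeated
copies of the same file look faster than the first one.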
---------------
With a 10Mb/s network card, copying an 8Mb file from Win NT 4.0 on a 586
AMD 133MHz to RH Linux 2.2.10 on a PII 400MHz using Samba 2.0.4b took
12.39s with TCP_NODELAY on its own, but 16.02s with IPTOS_LOWDELAY
SO_SNDBUF=4096 SO_RCVBUF=4096. Here is part of my smb.conf file (just in
case...):
[global]
workgroup = WG
server string = Samba Server
encrypt passwords = Yes
smb passwd file = /etc/smbpasswd
log file = /var/log/samba/log.%m
max log size = 50
security = user
socket options = TCP_NODELAY
domain master = Yes
local master = Yes
preferred master = Yes
os level = 65
wins support = Yes
dns proxy = No
name resolve order = wins lmhosts bcast host
interfaces = 192.168.1.1/24 127.0.0.0/8
bind interfaces only = Yes
hosts allow = 192.168.1.3 127.
hosts deny = ALL
debug level = 1
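When comparing several smb.conf variants it helps to confirm which socket
options a given file actually sets. smb.conf is close enough to INI format
for Python's configparser, provided interpolation is switched off
(otherwise the %m in "log file" would be treated as an interpolation
marker). A minimal sketch; socket_options is a hypothetical helper.

```python
import configparser


def socket_options(smb_conf_text: str) -> list[str]:
    """Return the [global] 'socket options' of an smb.conf as a list of tokens."""
    # interpolation=None keeps Samba's %m, %U ... macros as literal text;
    # strict=False tolerates a parameter that appears more than once.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(smb_conf_text)
    if not parser.has_option("global", "socket options"):
        return []
    return parser.get("global", "socket options").split()
```

For scale, taking 8Mb as 8 megabytes, the two timings above work out to
about 8 / 12.39 = 0.65 MB/s with TCP_NODELAY alone and 8 / 16.02 = 0.50
MB/s with the longer option list.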
------------------
Ftpd:
--------
ftpd is bottlenecked by the inetd process, which has to create a child
process for each incoming FTP request, and by the further processes
created for each directory listing. One of the most immediate ways to
improve FTP performance is to replace the standard ftpd server with a
high-performance "no-forks" variant. By pre-spawning the FTP server
processes, and by handling (and caching) directory listings internally,
such a server avoids a significant amount of process-creation overhead.
One well-known such product is NcFTPd. While not free software, it does
come with a no-cost license for personal use or .edu domains.
NcFTPd and more information about it can be found at
http://www.ncftpd.com/ncftpd/
Comparison metrics between NcFTPd and wu-ftpd can be found at
http://www.ncftpd.com/ncftpd/perf-linux.html
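The listing cache is the part of the "no-forks" idea that is easiest to
picture. The sketch below illustrates the general technique, not how
NcFTPd is actually implemented: a directory is re-read only when its
modification time changes, so repeated listings are served from memory
without creating a process. ListingCache is a hypothetical class.

```python
import os


class ListingCache:
    """Cache directory listings, re-reading a directory only when its mtime changes."""

    def __init__(self) -> None:
        # path -> (directory mtime in ns when read, sorted names)
        self._cache: dict[str, tuple[int, list[str]]] = {}

    def listing(self, path: str) -> list[str]:
        mtime = os.stat(path).st_mtime_ns
        hit = self._cache.get(path)
        if hit is not None and hit[0] == mtime:
            return hit[1]  # served from memory, no directory read
        names = sorted(os.listdir(path))
        self._cache[path] = (mtime, names)
        return names
```

On filesystems with coarse timestamps a change made within the same tick
as the previous read can go unnoticed, so a real server would also expire
entries after some age.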
--------------------
Named:
--------
Name lookups will be quicker if you install a local nameserver and forward
requests to your ISP's DNS servers.
In /etc/resolv.conf add the line:
nameserver 127.0.0.1
and in the options section of named.conf add:
forwarders { x.x.x.x; y.y.y.y; };
where x.x.x.x and y.y.y.y are the IP addresses of your ISP's DNS server
and another DNS server, respectively.
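The resolver tries the nameservers in resolv.conf in the order listed, so
the local cache only helps if 127.0.0.1 comes first. A minimal sketch of
that check; nameservers and uses_local_cache are hypothetical helpers, and
the parsing is deliberately simple (keyword, then address).

```python
def nameservers(resolv_conf: str) -> list[str]:
    """Return the nameserver addresses from resolv.conf text, in query order."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        # Blank lines and comment lines never start with the "nameserver"
        # keyword, so they are skipped by this test.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_local_cache(resolv_conf: str) -> bool:
    """True if the local nameserver (127.0.0.1) is the first one queried."""
    servers = nameservers(resolv_conf)
    return bool(servers) and servers[0] == "127.0.0.1"
```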
There are two really simple configuration changes that can lower the CPU
and memory that named requires to run.
i. Use the listen-on directive in the options section of named.conf.
For each interface a nameserver listens on, a pair of filehandles is
opened. On a busy nameserver, saving every filehandle is a big win.
ii. Turn off unnecessary zone transfers!
It's often interesting to measure how many zone transfers a nameserver
that is authoritative for a zone actually performs. I've measured 65,000
in an hour, and blocking the unauthorised transfers knocked around 15% of
CPU usage off the named process (down to 10% of an UltraSPARC 1).
Do this either in the firewall or with the allow-transfer directive in the
options section of named.conf, and watch the number of zone transfers that
result.
These are common configuration "misses" that really do make a difference
in the way the daemons run.
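Both directives, plus the optional forwarders line, live in the options
block of named.conf and take address match lists. The sketch below only
renders such a block as text to merge by hand into an existing
named.conf; named_options and match_list are hypothetical helpers. An
empty allow-transfer list becomes "{ none; }", which refuses transfers to
everyone, so list your secondary servers' addresses if you have any.

```python
from typing import Optional


def match_list(items: list[str]) -> str:
    """Render a BIND address match list; an empty list becomes '{ none; }'."""
    return "{ " + " ".join(f"{item};" for item in (items or ["none"])) + " }"


def named_options(listen_on: list[str], allow_transfer: list[str],
                  forwarders: Optional[list[str]] = None) -> str:
    """Render a named.conf 'options' block with listen-on and allow-transfer."""
    lines = [
        "options {",
        f"    listen-on {match_list(listen_on)};",
        f"    allow-transfer {match_list(allow_transfer)};",
    ]
    if forwarders:  # forwarders has no 'none' form, so omit it when empty
        lines.append(f"    forwarders {match_list(forwarders)};")
    lines.append("};")
    return "\n".join(lines) + "\n"
```

For example, named_options(["127.0.0.1", "192.168.1.1"], []) produces a
block that listens only on loopback and one LAN address (example
addresses) and refuses all zone transfers.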
--------------------
Linux kernel:
-------------
Networking:
There are a few settings in /proc/sys/net/ipv4 that can be tuned for
intranet servers (and, unfortunately, benchmarks). Linux 2.2.x fully
supports and uses TCP options, including window scaling, TCP timestamps
and selective acknowledgements. These give much better general
responsiveness and bandwidth utilization on typical Internet links, i.e.
lossy, congested, long-delay, erratic connections. But on an intranet
where you can keep an eye on congestion they might not help. When you are
sure that almost all traffic is going to be on good Ethernet or similar
high-quality, high-bandwidth networks, you might want to try disabling
them one by one, or all at once. Don't forget to check whether you really
get better results: usually the difference is not detectable, and in that
case you should revert to the defaults, which are better for the network.
Selective acknowledgements and window scaling are not expected to improve
results on LANs, but they usually cost no more than a few bytes per
connection; time stamping, however, might give better performance on
longer transfers anywhere. Not using TCP options at all (i.e. disabling
all of them) makes the kernel's work only very slightly easier.
To disable TCP timestamps:
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
To disable window scaling:
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
To disable selective acknowledgements:
echo 0 > /proc/sys/net/ipv4/tcp_sack
To enable any of them again, use echo 1 in the same way. In 2.2.x kernels
all of them default to 1 (enabled), which is generally best for
Internet-connected hosts.
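Because the advice is to disable the options one at a time and compare,
it helps to read and set them from a script. A minimal sketch;
read_tcp_options and set_tcp_option are hypothetical helpers that do the
same thing as the echo commands above. The root parameter is an addition
for trying it against a scratch directory, and on a live system writing
requires root. Any non-zero value counts as enabled, since some kernels
accept values other than 0 and 1 for these files.

```python
from pathlib import Path

PROC_IPV4 = "/proc/sys/net/ipv4"
TCP_OPTIONS = ("tcp_timestamps", "tcp_window_scaling", "tcp_sack")


def read_tcp_options(root: str = PROC_IPV4) -> dict[str, bool]:
    """Report which of the three TCP options are enabled (any non-zero value)."""
    return {name: Path(root, name).read_text().strip() != "0" for name in TCP_OPTIONS}


def set_tcp_option(name: str, enabled: bool, root: str = PROC_IPV4) -> None:
    """Same effect as 'echo 1 > ...' or 'echo 0 > ...' for one of the options."""
    if name not in TCP_OPTIONS:
        raise ValueError(f"unknown TCP option {name!r}")
    Path(root, name).write_text("1\n" if enabled else "0\n")
```

Recording read_tcp_options() before experimenting makes it easy to put the
defaults back if no improvement shows up.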
Linux as shipped with Red Hat 5.2 has a default limit of 256 IP
aliases... bad news for IP-based web hosting.
You can change the alias maximum by modifying the following file:
/proc/sys/net/core/net_alias_max
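A change made in /proc is lost at reboot, so it usually ends up in a boot
script. A minimal sketch of a helper that only ever raises such a limit;
raise_limit is hypothetical, and it assumes the file holds a single
integer. It returns None when the file does not exist, since this tunable
is specific to the kernels of that era.

```python
from pathlib import Path
from typing import Optional


def raise_limit(path: str, minimum: int) -> Optional[int]:
    """Make sure an integer tunable in /proc is at least `minimum`.

    Returns the value now in effect, or None if the file does not exist.
    """
    tunable = Path(path)
    if not tunable.exists():
        return None
    # Assumption: the file holds a single integer, e.g. "256".
    current = int(tunable.read_text().split()[0])
    if current < minimum:
        tunable.write_text(f"{minimum}\n")
        return minimum
    return current
```

For example, raise_limit("/proc/sys/net/core/net_alias_max", 1024) would
lift a 256 default to 1024 (an arbitrary example value) and leave a higher
existing setting alone.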
--------------
I/O:
--------
I've gotten 2x performance increases on massive disk I/O operations (like
cloning disks) by setting the IDE drivers to use DMA and 32-bit transfers.
The kernel seems to use more conservative settings unless told otherwise.
The commands are
# /sbin/hdparm -c 1 /dev/hda (or hdb, hdc etc.)
to use 32-bit I/O over the PCI bus. (The hdparm(8) manpage says that you
may need to use -c 3 for some chipsets.)
Use
# /sbin/hdparm -d 1 /dev/hda (or hdb, hdc etc.)
to enable DMA. This may depend on support for your motherboard chipset
being compiled into your kernel.
You can test the results of your changes by running hdparm in performance
test mode:
# /sbin/hdparm -t /dev/hda (or hdb, hdc etc)
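Single runs of "hdparm -t" can vary, so it is worth repeating the test
and comparing a summary figure before and after each change. A minimal
sketch, assuming output lines of the form " Timing buffered disk reads:
64 MB in 3.02 seconds = 21.19 MB/sec" (the exact wording may differ
between hdparm versions); parse_hdparm_t and benchmark are hypothetical
helpers, and benchmark needs root.

```python
import re
import statistics
import subprocess

# Assumed format: " Timing buffered disk reads:  64 MB in  3.02 seconds =  21.19 MB/sec"
RATE_RE = re.compile(r"Timing buffered disk reads:.*=\s*([\d.]+)\s*MB/sec")


def parse_hdparm_t(output: str) -> float:
    """Extract the MB/sec figure from 'hdparm -t' output."""
    match = RATE_RE.search(output)
    if match is None:
        raise ValueError("no 'Timing buffered disk reads' line found")
    return float(match.group(1))


def benchmark(device: str = "/dev/hda", runs: int = 3) -> float:
    """Run 'hdparm -t' several times and return the median MB/sec."""
    rates = []
    for _ in range(runs):
        result = subprocess.run(["/sbin/hdparm", "-t", device],
                                capture_output=True, text=True, check=True)
        rates.append(parse_hdparm_t(result.stdout))
    return statistics.median(rates)
```

The median is used rather than the best run so that one lucky or unlucky
measurement does not decide whether a setting is kept.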
When you've found the optimal settings, you should consider running
# /sbin/hdparm -k 1 /dev/hda (or hdb, hdc etc.)
to keep these settings across an IDE reset. I've seen the kernel reset the
IDE controller occasionally, and if you haven't set -k 1, the other
settings will be reset to their defaults and you'll lose all your
performance gains.
The -m option can be used to change the number of sectors transferred on
each interrupt. You may get additional gains by tweaking this, but it
didn't do anything for me.
--------------------
You can use the -X12 option of hdparm to set the disk to PIO mode 4. This
should give you a slight performance gain. Only use it if you have mode 4
disks. Also, once you have a set of hdparm options that you are happy
with, don't forget to put the command line in your /etc/rc.d/rc.local file
so it runs every time you reboot the machine.
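The rc.local line can be assembled from the options chosen above. A
minimal sketch; hdparm_line and pio_mode are hypothetical helpers, flags
use the attached-value style seen in this text (-c1, -m16, -X12), and
None means "leave this setting alone". Per hdparm(8), the -X value for a
PIO mode is the mode number plus 8, which is why PIO mode 4 is -X12.

```python
from typing import Optional


def pio_mode(n: int) -> int:
    """hdparm -X value for PIO mode n: the mode number plus 8 (mode 4 -> 12)."""
    return 8 + n


def hdparm_line(device: str, io32: Optional[int] = 1, dma: Optional[bool] = True,
                multcount: Optional[int] = None, xfer_mode: Optional[int] = None,
                keep: bool = True) -> str:
    """Build one hdparm command line for /etc/rc.d/rc.local (hypothetical helper)."""
    args = ["/sbin/hdparm"]
    if io32 is not None:
        args.append(f"-c{io32}")        # 32-bit I/O (1, or 3 for some chipsets)
    if dma is not None:
        args.append(f"-d{int(dma)}")    # DMA on/off
    if multcount is not None:
        args.append(f"-m{multcount}")   # sectors per interrupt
    if xfer_mode is not None:
        args.append(f"-X{xfer_mode}")   # transfer mode, e.g. pio_mode(4)
    if keep:
        args.append("-k1")              # keep settings across an IDE reset
    args.append(device)
    return " ".join(args)
```

The resulting string is appended to rc.local once per drive.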
--------------------
It's worth pointing out that the kernel uses 16-bit EIDE I/O by default.
Setting the -m option of hdparm to 16 means 16 sectors are moved per
interrupt instead of the default 1 (if I am not mistaken), so moving 16
sectors takes one interrupt instead of 16. Setting -m to 16 could actually
be detrimental on slower machines; -m 4 would be perfect for my old DX50.
YMMV (benchmark!). This option gives me several additional MB/s on my ALi
chipset machine (ASUS P5A board): I get 12 MB/s with hdparm -c1 -m16
-a128, compared to around 6 MB/s with the defaults, on a 10G WD drive.
Note that this is WITHOUT UDMA support (I can't find ALi M1541 UDMA
patches anywhere, though I've heard they exist!).
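The interrupt saving is simple division. A minimal worked sketch;
interrupts_needed is a hypothetical helper, and it treats -m 0 (multiple
sector mode off) as one sector per interrupt, matching the "default 1"
described above.

```python
def interrupts_needed(sectors: int, multcount: int) -> int:
    """Interrupts needed to move `sectors` sectors at `multcount` sectors per interrupt."""
    per_irq = max(multcount, 1)  # -m 0 behaves like one sector per interrupt
    return -(-sectors // per_irq)  # ceiling division
```

Assuming 512-byte sectors, an 8 MB read is 16384 sectors: 16384 interrupts
at one sector each, 4096 with -m 4, and 1024 with -m 16.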