Nowadays, especially during and even after the economic crisis, sysadmins are dealing with one big problem, usually posed by their business managers: “How do we squeeze more and more computing power out of the server farm?” During the “dot-com era”, the big hardware players were generally racing towards a single, ever faster CPU and ever larger amounts of RAM. Now we have multi-CPU servers that demand more and more power, we pack more cores into one processor and build faster SSDs, and we also build multi-server farms that give us more computing power than we ever imagined. But what is the outcome of this process?
- In order to control the environment, we developed automation tools for extremely large datacenters; Puppet is a very good example,
- We developed high-level load balancers like HAProxy or Linux Virtual Server that ensure the coherency and scalability of our server farms,
- We learned how to synchronize geographically dispersed datacenters with complex clustered filesystems that run smoothly over the TCP/IP stack,
- Long ago we observed that it is generally a good idea to avoid very expensive context switches, so we decided to use hardware partitioning, placing different types of I/O on separate boxes,
- We started to separate even high-level applications such as memory buffer stores (memcached) and data storage nodes (MySQL NDB), or to implement query partitioning, so that our databases work faster and faster,
- As system bottlenecks appeared, we learned to choose hardware wisely so that they would not occur again.
We could continue this list, but with all the hardware-driven development of the system utilities that are every sysadmin's daily bread, we have completely forgotten about the very powerful tools that reside inside the Linux kernel. So usually, when bottlenecks appear, we act only after the problem has occurred. How about being more proactive?
Generally, when we lack computing power, we add more servers to the rack and everybody is happy. But what can we do when adding more servers is not an option? Optimize the application code? Rearrange system services and rebuild the system layout from scratch? Or, finally, tell our customer that we cannot handle more load? The last one, of course, is not an option. Let's look inside the kernel instead and see what we can squeeze out of it, using a very simple example: a single instance of the Cherokee web server. Everything follows one assumption: keep it simple and less time-consuming than rebuilding our server farm from scratch.
The research consists of benchmarking the static content throughput of the Cherokee web server on Debian Squeeze under various kernel configurations, using ApacheBench (ab), a simple and widely used tool for measuring the performance of any HTTP application.
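To compare kernels we only need one number per ab run: the mean requests per second that ab prints in its summary. Here is a minimal Python sketch that pulls that figure out of captured ab output; it assumes the usual "Requests per second:" summary line, and the sample text below is made up to mimic that format, not a real measurement.

```python
import re

# ab prints a summary line such as:
#   Requests per second:    1234.56 [#/sec] (mean)
_RPS_RE = re.compile(r"^Requests per second:\s+([0-9.]+)", re.MULTILINE)


def requests_per_second(ab_output: str) -> float:
    """Extract the mean requests-per-second figure from ApacheBench output."""
    match = _RPS_RE.search(ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line found in ab output")
    return float(match.group(1))


if __name__ == "__main__":
    # Made-up sample that only mimics the layout of ab's summary.
    sample = (
        "Concurrency Level:      1000\n"
        "Complete requests:      10000\n"
        "Requests per second:    1234.56 [#/sec] (mean)\n"
    )
    print(requests_per_second(sample))  # 1234.56
```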
We will be tampering with the CONFIG_HZ kernel option. To fully understand this optimization, we first have to understand what the timer frequency, the HZ setting in the kernel, actually is:
HZ is the frequency with which the system's timer hardware is programmed to interrupt the kernel. Much of the kernel's internal housekeeping, including process accounting, scheduler time slice accounting, and internal time management, is done in the timer interrupt handler. Thus, the frequency of the timer interrupt affects a number of things; in particular, it puts an upper bound on the resolution of timers used within the kernel. If HZ is 1000 (the i386 default for 2.6 kernels through 2.6.12), then timers will have a best-case resolution of 1 ms. If, instead, HZ is 100 (the 2.4 and prior default), that resolution is 10 ms. In other words, the HZ setting tells the system how often it should wake up and check whether there is something to do. A higher timer frequency such as 1000 Hz makes the system more responsive, which suits desktops, but it is less efficient for large I/O-bound workloads, because the CPU is interrupted more often and long-running work is cut into smaller time slices. We also have to remember that the interrupt handler itself, which in this case runs every 1 ms, costs CPU time too. So the simple picture is: higher frequency = lower latency + lower throughput, and conversely, lower frequency = less responsiveness + higher throughput. There are times, however, when the interrupt can be unwelcome. Many processors, when idle, can go into a low-power state until some work comes along. To such processors, the timer interrupt looks like work.
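The resolution argument is plain arithmetic: one tick lasts 1000/HZ milliseconds, and a tick-driven timer can only expire on a tick boundary. The sketch below uses a simplified model (an assumption, not the kernel's exact code path) in which a requested delay is rounded up to whole ticks, similar in spirit to how the kernel converts milliseconds to jiffies.

```python
import math


def tick_period_ms(hz: int) -> float:
    """Length of one timer tick (one jiffy) in milliseconds."""
    return 1000.0 / hz


def timer_delay_ms(requested_ms: float, hz: int) -> float:
    """Actual delay of a tick-based timer in a simplified model.

    The request is rounded up to a whole number of ticks, so the timer
    never fires earlier than asked, but it may fire later.
    """
    ticks = math.ceil(requested_ms * hz / 1000.0)
    return ticks * tick_period_ms(hz)


for hz in (100, 250, 1000):
    print(f"HZ={hz:4}: tick={tick_period_ms(hz):5.1f} ms, "
          f"{hz} ticks/s, "
          f"a 15 ms timer fires after {timer_delay_ms(15, hz):.1f} ms")
```

With HZ=100 the 15 ms timer waits 20 ms, while with HZ=1000 it waits exactly 15 ms, but the CPU takes ten times as many timer interrupts per second.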
The concept above is illustrated in Picture 1.
Let’s see how the timer frequency setting affects Cherokee's performance.
Below is our system configuration:
- The Debian stock kernel that comes with a fresh Debian Squeeze installation: “Linux debian 2.6.32-5-686 #1 SMP Sat Jul 24 02:27:10 UTC 2010 i686 GNU/Linux”,
- The Cherokee web server, installed from the repositories via apt-get, with keep-alive and chunked encoding disabled, error and access logs disabled, the dynamic thread policy and the I/O cache enabled. All of this can be set easily through cherokee-admin, the handy administration interface that comes with the server.
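Before changing anything it helps to know which timer options the stock kernel was built with. Debian installs each kernel's build configuration as /boot/config-&lt;release&gt;, so a small parser is enough. This is a minimal sketch, assuming that file layout; the three symbols printed are just the ones relevant to this article.

```python
import os
import platform


def parse_kconfig(text: str) -> dict:
    """Parse kernel .config text into {symbol: value}.

    'CONFIG_HZ=250' maps to '250'; '# CONFIG_HZ_1000 is not set' maps to 'n'.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            name = line[2:-len(" is not set")]
            options[name] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value.strip('"')  # string options are quoted
        # every other line (plain comments, blanks) is ignored
    return options


def running_kernel_config_path() -> str:
    # Debian ships the build configuration next to each kernel image.
    return os.path.join("/boot", "config-" + platform.release())


if __name__ == "__main__":
    path = running_kernel_config_path()
    if os.path.exists(path):
        with open(path) as f:
            cfg = parse_kconfig(f.read())
        for key in ("CONFIG_HZ", "CONFIG_NO_HZ", "CONFIG_HIGH_RES_TIMERS"):
            print(key, "=", cfg.get(key, "(absent)"))
    else:
        print("no config file at", path)
```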
Let’s do the simplest test: simulate 1000 concurrent clients, each making 10 static content requests, and repeat the test to obtain an average result. Tests were performed on four kernel variants:
- Stock kernel with no modifications (NO_HZ)
- Stock kernel with CONFIG_HZ=1000
- Stock kernel with CONFIG_HZ=100
- Custom minimalist kernel with NO_HZ: only the needed drivers were left in the kernel, and I tried to remove as many unneeded features as possible
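In ab terms, "1000 clients, 10 requests each" means a concurrency of 1000 (-c) and 10000 requests in total (-n counts all requests, not requests per client). A minimal sketch of building that command and averaging repeated runs follows; the URL is a hypothetical placeholder, and the runs could be executed with subprocess.run(cmd, capture_output=True, text=True).

```python
import statistics
from typing import List, Sequence, Tuple


def ab_command(url: str, clients: int, requests_per_client: int) -> List[str]:
    """Build an ab invocation: -c is concurrency, -n is the TOTAL request count."""
    total = clients * requests_per_client
    return ["ab", "-n", str(total), "-c", str(clients), url]


def summarize(rps_runs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of repeated runs (requests/s)."""
    mean = statistics.mean(rps_runs)
    stdev = statistics.stdev(rps_runs) if len(rps_runs) > 1 else 0.0
    return mean, stdev


if __name__ == "__main__":
    # Hypothetical URL of a static test file served by Cherokee.
    print(" ".join(ab_command("http://127.0.0.1/test-2k.bin", 1000, 10)))
```

Reporting the standard deviation next to the mean makes it easier to tell whether a difference between two kernels is larger than the run-to-run noise.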
I prepared three tests for each kernel: a 2 KB file (Picture 2), an 80 KB file (Picture 3) and a 10 MB file (Picture 4).
- In the first diagram (Picture 2) we can see that the NO_HZ and custom kernels win the competition (the more, the better).
- Picture 3 shows the same situation again: the more, the better.
- The last test (Picture 4) is very interesting: it turns out that the custom kernel with many optimizations won the competition.
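The three static files can be created with a few lines of Python. This sketch assumes binary units (1 KB = 1024 bytes) and hypothetical file names; it fills the files with random bytes so that any on-the-fly compression by the server gains nothing, keeping the transferred size close to the nominal one.

```python
import os
import tempfile

# Sizes of the three test files, assuming binary units.
TEST_FILES = {
    "test-2k.bin": 2 * 1024,
    "test-80k.bin": 80 * 1024,
    "test-10m.bin": 10 * 1024 * 1024,
}


def create_test_files(docroot: str) -> dict:
    """Write each test file filled with random bytes; return {name: path}."""
    paths = {}
    for name, size in TEST_FILES.items():
        path = os.path.join(docroot, name)
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths[name] = path
    return paths


if __name__ == "__main__":
    # A temporary directory stands in for the real Cherokee document root.
    with tempfile.TemporaryDirectory() as docroot:
        for name, path in create_test_files(docroot).items():
            print(name, os.path.getsize(path))
```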
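"The more, the better" comparisons boil down to ranking the kernels by mean requests per second and expressing each one relative to a baseline. A minimal sketch follows; the numbers in the usage block are placeholders, not the measurements behind the pictures.

```python
def relative_gain_percent(baseline_rps: float, candidate_rps: float) -> float:
    """Throughput change of candidate vs. baseline, in percent (positive = faster)."""
    if baseline_rps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (candidate_rps - baseline_rps) / baseline_rps * 100.0


def rank_kernels(results: dict) -> list:
    """Sort kernel variants by mean requests/s, best first."""
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    results = {"stock NO_HZ": 1000.0, "HZ=1000": 950.0, "custom": 1200.0}
    baseline = results["stock NO_HZ"]
    for name, rps in rank_kernels(results):
        print(f"{name:12} {rps:8.1f} req/s "
              f"{relative_gain_percent(baseline, rps):+6.1f}%")
```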
Let’s summarize.
During the tests I found that by optimizing the kernel for higher throughput we can gain up to 20% in performance. The “custom configuration” that won the competition was created with menuconfig by following the steps below:
- Remove every unneeded kernel interface for external, usually unused, applications, for example Auditing Support (SELinux), Profiling support and Virtualization support
- Remove optimizations for embedded systems (huh, is a dual-Xeon server with 32 GB of RAM embedded?) and security features like “Heap randomization”
- Remove low-latency and preemption-related features like High Resolution Timer Support, the HPET timer and SMT (Hyper-Threading) scheduler support
- Choose the “Server” preemption model (No Forced Preemption), as we don’t want the system to have desktop-style responsiveness
- Remove unneeded hardware support, such as old buses (ISA/PCI)
- Remove security options (do we really need extra system security on a large farm member sitting behind a firewall?)
- Disabling unneeded TCP/IP options, like QoS support, will also help: let the switches and routers do that job
- Keep the NO_HZ (tickless) option enabled, so the periodic timer tick is stopped while a CPU is idle instead of firing at a fixed frequency; this may also save up to 15% of electric power.
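After a menuconfig session it is easy to lose track of which options were actually changed. The sketch below checks a kernel .config against the steps above. The mapping from each step to a Kconfig symbol is my own assumption based on 2.6.32-era x86 names; the embedded and heap-randomization steps are left out because they do not map cleanly to a single symbol. A symbol missing from .config is treated as disabled, since Kconfig omits options whose dependencies are not met.

```python
import os

# Hypothetical step-to-symbol mapping (x86, 2.6.32-era names).
# "n" = should be disabled, "y" = should be built in.
WANTED = {
    "CONFIG_AUDIT": "n",            # Auditing support
    "CONFIG_PROFILING": "n",        # Profiling support
    "CONFIG_VIRTUALIZATION": "n",   # Virtualization
    "CONFIG_HIGH_RES_TIMERS": "n",  # High Resolution Timer Support
    "CONFIG_HPET_TIMER": "n",       # HPET timer
    "CONFIG_SCHED_SMT": "n",        # SMT (Hyperthreading) scheduler support
    "CONFIG_PREEMPT_NONE": "y",     # No Forced Preemption (Server)
    "CONFIG_ISA": "n",              # legacy ISA bus
    "CONFIG_SECURITY": "n",         # security models
    "CONFIG_NET_SCHED": "n",        # QoS and/or fair queueing
    "CONFIG_NO_HZ": "y",            # tickless idle
}


def parse_kconfig(text: str) -> dict:
    """Parse .config text; '# CONFIG_X is not set' becomes 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value.strip('"')
    return options


def mismatches(options: dict, wanted: dict = WANTED) -> dict:
    """Return {symbol: (expected, actual)} for every option that differs."""
    result = {}
    for name, expected in wanted.items():
        actual = options.get(name, "n")  # absent means not built
        if actual != expected:
            result[name] = (expected, actual)
    return result


if __name__ == "__main__":
    if os.path.exists(".config"):  # run from the kernel source tree
        with open(".config") as f:
            for name, (want, got) in mismatches(parse_kconfig(f.read())).items():
                print(f"{name}: expected {want}, found {got}")
```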
Do you remember the '90s, when you recompiled the old 2.2 and 2.4 kernels to remove every unneeded module and generally keep the kernel small? This approach is still alive! As you can see, we can gain so much computing power from the same hardware that server costs could be reduced by almost 20%, and monthly electricity costs by 10%! We have to remember that stock kernels are built by distribution developers who make many trade-offs to fulfill the “good-for-all” concept.
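One detail in the cost estimate is worth making explicit: a 20% throughput gain per server does not remove 20% of the servers. The fleet shrinks by 1 - 1/1.2, about 16.7%, before rounding up to whole machines. The sketch below works this through with hypothetical numbers (a peak of 100000 requests/s and 1000 requests/s per server), which give 100 servers before and ceil(100000 / 1200) = 84 servers after, i.e. 16% fewer.

```python
import math


def servers_needed(peak_rps: float, rps_per_server: float) -> int:
    """Smallest number of identical servers that can carry peak_rps."""
    return math.ceil(peak_rps / rps_per_server)


def server_saving_percent(peak_rps: float, rps_per_server: float,
                          speedup_percent: float):
    """Servers before/after a per-server speedup, and the saving in percent."""
    before = servers_needed(peak_rps, rps_per_server)
    faster = rps_per_server * (1 + speedup_percent / 100.0)
    after = servers_needed(peak_rps, faster)
    return before, after, (before - after) / before * 100.0


if __name__ == "__main__":
    # Hypothetical load and per-server capacity, for illustration only.
    print(server_saving_percent(100000, 1000, 20))
```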
But is “good-for-all” good for you?



