Sunday, December 15, 2013

HIVE with HBASE, No more secrets...

Use the HBaseStorageHandler to register HBase tables with the Hive metastore. You can optionally specify the HBase table as EXTERNAL, in which case Hive will not create or drop that table directly; you’ll have to use the HBase shell to do so.

CREATE [EXTERNAL] TABLE foo(...)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
TBLPROPERTIES ('hbase.table.name' = 'bar');


Registering the table is only the first step. As part of that registration, you also need to specify a column mapping. This is how you link Hive column names to the HBase table’s rowkey and columns. Do so using the hbase.columns.mapping SerDe property.


CREATE TABLE foo(rowkey STRING, a STRING, b STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1,f:c2')
TBLPROPERTIES ('hbase.table.name' = 'bar');


The values provided in the mapping property correspond one-for-one with the column names of the Hive table. HBase column names are fully qualified by column family, and you use the special token :key to represent the rowkey.
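The one-for-one correspondence can be sketched in a few lines of Python. This only illustrates how the entries of the mapping string line up with the Hive columns; it is not how Hive parses the property internally, and the function name pair_columns is hypothetical.

```python
def pair_columns(hive_columns, mapping):
    """Pair each Hive column with its entry in an hbase.columns.mapping string."""
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    pairs = {}
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            # The special token maps the Hive column to the HBase rowkey.
            pairs[column] = ("rowkey", None, None)
        else:
            # Everything else is family:qualifier.
            family, _, qualifier = entry.partition(":")
            pairs[column] = ("column", family, qualifier)
    return pairs


if __name__ == "__main__":
    # Same columns and mapping as the CREATE TABLE example above.
    for hive_col, target in pair_columns(["rowkey", "a", "b"], ":key,f:c1,f:c2").items():
        print(hive_col, "->", target)
```

A mismatch between the number of Hive columns and mapping entries raises an error here, which mirrors the requirement that the two lists line up position by position.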

With the column mappings defined, you can now access HBase data just like you would any other Hive data. Only simple query predicates are currently supported.

SELECT * FROM foo WHERE ...;

You can also populate an HBase table using Hive. This works with both INTO and OVERWRITE clauses.

FROM source_hive_table INSERT INTO TABLE my_hbase_table
SELECT source_hive_table.* WHERE ...;



There’s still a little finesse required to get everything wired up properly at runtime. The HBase interaction module is completely optional, so you have to make sure it and its HBase dependencies are available on Hive’s classpath.

The installation environment could do a better job of handling this for users, but for the time being you must manage it yourself. Ideally the hive bin script can detect the presence of HBase and automatically make the necessary CLASSPATH adjustments. This enhancement appears to be tracked in HIVE-2055. The last mile is provided by the distribution itself, ensuring the environment variables are set for hive. This functionality is provided by BIGTOP-955.

PROBLEMS GENERALLY FACED

When creating a table with a large number of columns, a very common issue is the limited size of the TYPE_NAME column in the Hive metastore's COLUMNS table. To solve this problem, connect to the configured metastore database and increase the size of this column.

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Add request failed : INSERT INTO COLUMNS (SD_ID,COMMENT,"COLUMN_NAME",TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?)
NestedThrowables:
java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'struct<prop1:int,prop2:int,prop3:int,prop4:int,prop5:int,pro&' to length 4000.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

For a metastore configured with Derby (the default), connect to the Derby database using ‘ij’ and increase the column length as required.

 ALTER TABLE columns ALTER type_name SET DATA TYPE VARCHAR(8000);


Saturday, August 3, 2013

Apache web server alternatives

Web servers have become an important part of today's infrastructure for business, trading, entertainment and information. Some websites take millions of hits every day, or even every hour. With so many legacy servers running Apache for PHP, most of us ignore its drawbacks, and later end up fighting load and memory issues that limit performance. Because Apache is among the most mature servers in terms of performance and features, we generally overlook the alternatives to it.
Besides the good practice of separating responsibilities across different servers, we overlook the fact that we don't need megabytes of forked processes to serve simple files. We should not use a sword where simple talk will do. So instead of using the same heavy service to serve both dynamic and static pages, consider a lighter server for the static content.

How to measure web server performance?

  • First, make sure your web server is tuned for maximum performance.
  • Turn off throttling by setting Outbound Bandwidth to 0.
  • Set the value of Max Keep-Alive Requests to a big number like 100000.
  • Set Max Connections as high as possible.
  • Set Connection Soft Limit and Connection Hard Limit to 1000 or higher in Per Client Throttling Control, depending on how many concurrent connections you need from one IP address.
  • Set Follow Symbolic Link to Yes and Restrained to No for the testing virtual host.

There are three commonly used benchmarking tools: ApacheBench, httperf and Autobench.

Using ApacheBench:

ApacheBench is a command line performance-testing tool bundled with Apache httpd. It can simulate hundreds of HTTP/1.0 clients simultaneously accessing the same resource on the server.
You can simply run it with command:
 ab -n 10000 -c 100 http://localhost:8088/index.html

or, to use keep-alive requests:
 ab -n 10000 -c 100 -k http://localhost:8088/index.html

For detailed information, please check the Apache documentation.
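When comparing several servers it helps to pull the headline number out of each ab run automatically. The sketch below assumes ab is on the PATH and relies on the "Requests per second:" line that ab prints in its summary; the helper names run_ab and requests_per_second are hypothetical, and the SAMPLE text uses made-up numbers purely to show the parsing.

```python
import re
import subprocess


def run_ab(url, requests=10000, concurrency=100, keepalive=False):
    """Run ApacheBench against url and return its text report (assumes ab is installed)."""
    command = ["ab", "-n", str(requests), "-c", str(concurrency)]
    if keepalive:
        command.append("-k")
    command.append(url)
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


def requests_per_second(ab_output):
    """Extract the mean requests-per-second figure from an ab report, or None."""
    match = re.search(r"^Requests per second:\s+([\d.]+)", ab_output, re.MULTILINE)
    return float(match.group(1)) if match else None


# Illustrative excerpt with made-up numbers, not a real measurement.
SAMPLE = """\
Concurrency Level:      100
Complete requests:      10000
Requests per second:    1234.56 [#/sec] (mean)
"""

if __name__ == "__main__":
    print(requests_per_second(SAMPLE))
```

Against a live server you would call requests_per_second(run_ab("http://localhost:8088/index.html")) once per server and compare the results.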

Using Httperf:
You can get httperf from http://www.hpl.hp.com/personal/David_Mosberger/httperf.html.
Httperf uses the HTTP/1.1 protocol by default and always uses keep-alive requests. It has more command options; for detailed information, please refer to its documentation.
Here is an example:
./httperf --server localhost --port 8088 --uri /index.html --rate 1000 --num-conns 100 --num-calls 100 --timeout 50


Using Autobench:
Autobench is a simple Perl script calling httperf that automates the benchmark process of a web server.
You can get autobench from http://www.xenoclast.org/autobench/. For detailed information, please refer to its documentation.

You can use a tiny, lightning fast server to handle static documents & images, and pass any more complicated requests on to Apache on the same machine. This way Apache won't tie up its multi-megabyte processes serving simple streams of bytes. You can have Apache only get used, for example, when a php script needs to be executed.

TUX / Red Hat Content Accelerator (at least 6 times faster than Apache)
Red Hat Content Accelerator is a kernel-based Web server. It is currently limited to serving static webpages and coordinating with kernel-space modules, user-space modules, and regular user-space Web server daemons to provide dynamic content. Regular user-space Web servers do not need to be altered in any way for Red Hat Content Accelerator to coordinate with them. Red Hat Content Accelerator can serve static content very efficiently from within the Linux kernel.
Red Hat Content Accelerator also has the ability to cache dynamic content. To respond to a request for dynamic data, a Red Hat Content Accelerator module can send a mix of dynamically-generated data and cached pre-generated objects, taking maximal advantage of Red Hat Content Accelerator's zero-copy architecture.
A quick-start demo of installing and starting the service is mentioned here.
The main differences between TUX and other webservers include:

  1. TUX runs partly within a customized version of the Linux kernel and partly as a userspace daemon.
  2. With a capable network card, TUX enables scatter-gather DMA from the page cache directly to the network.
  3. TUX is only able to serve static web pages.



kHTTPd
kHTTPd is a http-daemon (webserver) for Linux. kHTTPd is different from other webservers in that it runs from within the Linux-kernel as a module (device-driver). kHTTPd handles only static (file based) web-pages, and passes all requests for non-static information to a regular userspace-webserver such as Apache or Zeus. kHTTPd is actually not much different from a normal http dæmon in principle. The main difference is that it bypasses the syscall layer. “Accelerating” the simple case of serving static pages within the kernel leaves user-space dæmons free to do what they are very good at: generating user-specific, dynamic content. A user-space web server such as Apache, typically loaded with many features and many execution paths, can't be as fast as kHTTPd. There are, however, a few web servers that are as simple as kHTTPd but implemented in user space, so they are not expensive consumers of processor cycles, even compared with kHTTPd.
The README is pretty clear and can get you off to a quick start.

thttpd
thttpd is a simple, small, portable, fast, and secure HTTP server. There are several scenarios in which it may be useful:
  • Slow machines--Machines with old hardware that are able to run a Unix-like operating system (such as NetBSD) are often powerful enough to serve static content with thttpd.
  • Heavily loaded machines--You may have a very powerful machine that runs other heavy processes, such as a DBMS, yet you need it to serve some web content (statistics, for example). In this case, thttpd can do well.
  • Simple requirements--Sometimes you may want to provide web content to the public, but you do not need much of the fancy stuff provided by powerful servers such as Apache. In this case, a lightweight server may be enough for your needs. Furthermore, given their smaller code sizes, they have fewer chances to fail and you can audit their code more easily.
  • Serving static content alongside a powerful server--You may have a server running Apache with a load of modules, parsing very complex dynamic pages. These pages often need to include other files, most commonly static images. In this case, thttpd can serve the static data alongside Apache, which will exclusively handle the complex content.
The thttpd notes can help get you started.


LiteSpeed Web Server
LiteSpeed Web Server describes itself as the leading high-performance, high-scalability web server. It is completely Apache interchangeable, so LiteSpeed Web Server can quickly replace a major bottleneck in your existing web delivery platform. It also offers a comprehensive range of features and an easy-to-use web administration console.

According to LiteSpeed's own benchmarks, it is more than 6 times faster than Apache. When serving static content, LiteSpeed is reported to surpass well-respected content accelerators including thttpd, boa and TUX. When it comes to dynamic content, LiteSpeed is reported to be more than 50% faster in PHP content delivery than Apache with mod_php. You can download the LiteSpeed free or trial version and check these claims for yourself.


Caudium & Roxen
Caudium is a web server that began as a fork of Roxen. The server is written in C and Pike, and Pike is also used to create extensions to the server. Caudium differs from Apache in many ways including the directory structure, programming language, and type of configuration.
The Caudium HOWTO at http://www.tldp.org/HOWTO/Caudium-HOWTO/index.html can help you get started.

If you are worried about running dynamic pages
G-WAN is the fastest to date
G-WAN powers the next-generation, massively-scalable EON, Inc PaaS, able to deploy the most demanding web applications using a variety of programming languages in an elastic, fail-safe, and remarkably efficient cloud. According to G-WAN's published figures, it serves a hello.java with 10x less CPU and 24x less RAM, handling 11x more requests in 13x less time than Apache Tomcat on a 6-core machine. Many other languages (PHP, C#, JS...) benefit even more.

 [1-1000 range] hello world (dynamic contents, HTTP keep-alives)

 G-WAN + C       Average RPS: 801,585   Time:  1,551 seconds [00:25:51]
 G-WAN + Java    Average RPS: 759,726   Time:  1,648 seconds [00:27:28]
 G-WAN + Scala   Average RPS: 757,767   Time:  1,660 seconds [00:27:40]
 G-WAN + JS      Average RPS: 768,659   Time:  1,696 seconds [00:28:16]
 G-WAN + Go      Average RPS: 784,113   Time:  1,892 seconds [00:31:32]
 G-WAN + Lua     Average RPS: 757,588   Time:  1,920 seconds [00:32:00]
 G-WAN + Perl    Average RPS: 782,088   Time:  1,977 seconds [00:32:57]
 G-WAN + Ruby    Average RPS: 778,087   Time:  2,054 seconds [00:34:14]
 G-WAN + Python  Average RPS: 774,180   Time:  2,110 seconds [00:35:10]
 G-WAN + PHP     Average RPS: 613,389   Time:  2,212 seconds [00:36:52]

 Tomcat          Average RPS:  76,556   Time: 20,312 seconds [05:38:32]
 Node.js         Average RPS:  14,209   Time: 80,102 seconds [22:15:02]
 Google Go       Average RPS:  12,801   Time: 84,811 seconds [23:33:31]

Performance report


Concurrency | Apache 1.3.33 | Apache 2.0.55 | thttpd 2.25b | Lighttpd 1.4.8 | LiteSpeed 2.1 Standard | LiteSpeed 2.1 Enterprise | LiteSpeed 2.1 Enterprise (2 Clients)
1           | 2059          | 2151          | 2296         | 2355           | 2631                   | 2620                     | -
10          | 9731          | 10705         | 11185        | 11566          | 11527                  | 15878                    | 20100
100         | 9241          | 10064         | 11897        | 11558          | 13173                  | 17674                    | 27312
200         | 9581          | 10010         | 11730        | 11920          | 13178                  | 16750                    | 26808
500         | 8664          | 9543          | 11540        | 11539          | 12032                  | 16035                    | 25213
1000        | 8590          | 9236          | 10867        | 10570          | 11442                  | 15232                    | 24820

Small Static File (Non-Keepalive) Benchmark

Concurrency | Apache 1.3.33 | Apache 2.0.52 | thttpd 2.25b | Lighttpd 1.4.8 | LiteSpeed 2.1 Standard | LiteSpeed 2.1 Enterprise | LiteSpeed 2.1 Enterprise (2 Clients)
1           | 3678          | 3641          | 2288         | 3992           | 5546                   | 5540                     | -
10          | 15782         | 16714         | 10760        | 15218          | 21720                  | 37764                    | 42000
100         | 15384         | 15926         | 11968        | 14909          | 21687                  | 56721                    | 91352
200         | 14825         | 15216         | 11580        | 14865          | 21663                  | 58411                    | 93143
500         | 14734         | 15094         | 11230        | 13888          | 21505                  | 57045                    | 83010
1000        | 13478         | 14241         | 10906        | 14056          | 17403                  | 52137                    | 81050




Saturday, July 27, 2013

What's hot in Android 4.3 Jelly Bean

As expected, Google officially confirmed Android 4.3 at its event on Wednesday with Android chief Sundar Pichai. Google is rolling out the update to its own devices first, so users like me have been able to see 4.3 in action on our devices since this Friday. Among the new features and improvements in the update are a redesigned camera interface, Bluetooth Low Energy support, performance improvements such as smoother animations, and multi-user restricted profiles. But there’s apparently something else that Google didn’t talk about. Android Police has unearthed a hidden app permissions manager that allows users to selectively disable certain permissions for apps.


New Camera UI


Android 4.3 also offers an updated Camera app that features a new arc-based menu, which makes it easier to control and switch camera settings.

Bluetooth Low Energy support



You may not know it, but a whole new family of Bluetooth devices has been arriving. What makes them different from their predecessors is Bluetooth Smart Ready. These are designed as sensors. So, for example, one might check if all windows are locked, while another might measure your heart rate. You get the idea.

Android 4.3  features some Bluetooth updates that let you pair an Android device with low-power gadgets like these sensors. During Google's presentation, we saw an Android device connecting with a Bluetooth Smart-enabled heart-rate monitor that was being powered by the popular Runtastic fitness app.

The update also came with Bluetooth AVRCP 1.3 support, which lets your device now transmit metadata, like a song's title and artist, to Bluetooth controllers.

In Android 4.3, with application programming interface (API) support for Bluetooth Generic Attribute Profile (GATT) services, you can create Android apps that will support these devices. This represents a new and potentially very profitable market for Android developers and their Bluetooth hardware partners.



Multi-user restricted profiles

The biggest addition to Android 4.3 is the Multi-User Restricted Profiles feature, which lets you control the usage of apps and other content on a user level. Multiple user profiles were already available in 4.2.2, but the ability to create restrictions has long been requested, so it's sure to be a big hit.
This feature is for users who have kids. Android has allowed you to have multiple users for some time now, but with this version you can finally have restricted profiles. Technically, it means that you can set up separate environments for each user with fine-grained restrictions in the apps that are available in those environments. Keep junior out of your, ah, questionable apps or Web sites. 
Each restricted profile offers an isolated and secure space with its own local storage, home screens, widgets, and settings. Unlike with users, profiles are created from the tablet owner’s environment, based on the owner’s installed apps and system accounts. The owner controls which installed apps are enabled in the new profile, and access to the owner’s accounts is disabled by default.

OpenGL ES 3.0

A big deal for gamers, OpenGL ES 3.0 makes the new version of Android more efficient and just plain better at displaying graphics. Google's demo showed us impressive textures, lens flares, and reflections that the older OS would have had trouble displaying. While the upgraded graphics might be indiscernible to the average user, OpenGL ES 3.0 support is still important because of the new possibilities it opens up for developers. Game developers can now take advantage of OpenGL ES 3.0 and EGL extensions as standard features of Android, with access from either framework or native APIs.

New media capabilities

A modular DRM framework enables media application developers to more easily integrate DRM into their own streaming protocols such as MPEG DASH. Apps can also access a built-in VP8 encoder from framework or native APIs for high-quality video capture.

Notification access

Your apps can now access and interact with the stream of status bar notifications as they are posted. You can display them in any way you want, including routing them to nearby Bluetooth devices, and you can update and dismiss notifications as needed.

Improved profiling tools


 New tags in the Systrace tool and on-screen GPU profiling give you new ways to build great performance into your app.

Permission Manager

There is an app available in the Google Play Store called “Permission Manager” and installing this will grant you access to the App Ops functionality. The real question will be whether you want and/or need access to App Ops. For that, read on to see just what can be done using it. In short, App Ops will allow you to set permissions based on individual apps.


Notification Access

People love those notifications at the top of their Android display. I know I do. I'm constantly checking them. Until this new version of Android appeared, developers couldn't access this data stream. Now they can. That is, if you, the user, allow them to.

What developers can do is register a notification listener service that, with your blessing, will receive all the data notifications when they're displayed in the status bar. Developers can then launch applications or services for a new class of "smart" apps.

Better Digital Rights Management (DRM)

OK, go ahead and boo. I know you want to. I hate DRM too. But here's the painful truth: DRM is here to stay and we might as well try to make the best of it.

That's exactly what Google has done with its new modular DRM framework. This will enable developers to more easily integrate DRM into their own streaming protocols such as MPEG Dynamic Adaptive Streaming over HTTP (DASH).

Google has also added new media DRM framework APIs and improved the existing ones to provide an integrated set of services for managing licensing and provisioning, accessing low-level codecs, and decoding encrypted media data.

The net effect of these changes is that DRM will be easier to manage, and video streams with DRM, which are pretty much all of them these days, should look and play better. Like I said, Google is making the best of an annoying commercial video necessity.

Keyboard & input

Android 4.3 comes with an upgraded algorithm for tap-typing recognition that makes text input easier while chatting via messages or even while composing emails. It also brings a new emoji keyboard, which we've previously seen in iOS. The update also adds lower latency input for gamepad buttons and joysticks.

Server performance improvement tweaks

Tickless System

Previously, the Linux kernel periodically interrupted each CPU on a system at a predetermined frequency — 100 Hz, 250 Hz, or 1000 Hz, depending on the platform. The kernel queried the CPU about the processes that it was executing, and used the results for process accounting and load balancing. Known as the timer tick, the kernel performed this interrupt regardless of the power state of the CPU. Therefore, even an idle CPU was responding to up to 1000 of these requests every second. On systems that implemented power saving measures for idle CPUs, the timer tick prevented the CPU from remaining idle long enough for the system to benefit from these power savings.

The tickless kernel feature allows for on-demand timer interrupts. This means that during idle periods, fewer timer interrupts will fire, which should lead to power savings, cooler running systems, and fewer useless context switches.

Kernel option: CONFIG_NO_HZ=y
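The options a kernel was built with are recorded in /boot/config-x.x.x.x-generic, so you can check the setting there. Changing it means rebuilding the kernel with the option set; editing that file alone has no effect on the running kernel.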

Timer Frequency

You can select the rate at which timer interrupts in the kernel will fire. When a timer interrupt fires on a CPU, the process running on that CPU is interrupted while the timer interrupt is handled. Reducing the rate at which the timer fires allows for fewer interruptions of your running processes. This option is particularly useful for servers with multiple CPUs where processes are not running interactively.
Kernel options: CONFIG_HZ_100=y and CONFIG_HZ=100
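Checking many of these options by hand gets tedious, so here is a minimal sketch that parses kernel config text and reports the options you care about. The helper names parse_kernel_config and report are hypothetical, and the SAMPLE text is an invented fragment in the usual config file format, not output from a real machine.

```python
def parse_kernel_config(text):
    """Return {option: value} for every CONFIG_*=value line in kernel config text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def report(options, wanted):
    """Look up each wanted option, using 'not set' for anything missing."""
    return {name: options.get(name, "not set") for name in wanted}


# Invented fragment for illustration only.
SAMPLE = """\
CONFIG_NO_HZ=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ_100=y
CONFIG_HZ=100
"""

if __name__ == "__main__":
    wanted = ["CONFIG_NO_HZ", "CONFIG_HZ_100", "CONFIG_HZ", "CONFIG_HZ_1000"]
    for name, value in report(parse_kernel_config(SAMPLE), wanted).items():
        print(f"{name}: {value}")
```

On a distribution that ships the build config under /boot, you could feed the function the contents of the file matching your running kernel instead of SAMPLE; that file location is an assumption about your system.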

Connector

The connector module is a kernel module which reports process events such as fork, exec, and exit to userland. This is extremely useful for process monitoring. You can build a simple system to watch mission-critical processes. If the processes die due to a signal (like SIGSEGV or SIGBUS) or exit unexpectedly, you’ll get an asynchronous notification from the kernel. The processes can then be restarted by your monitor, keeping downtime to a minimum when unexpected events occur.
Applications that may find these events useful include accounting / auditing (for example, ELSA), system activity monitoring (for example, top), security, and resource management (for example, CKRM). Semantics provide the building blocks for features like per-user-namespace, "files as directories" and versioned file systems.
Kernel options: CONFIG_CONNECTOR=y and CONFIG_PROC_EVENTS=y

Networking

TCP segmentation offload (TSO)

A popular feature among newer NICs is TCP segmentation offload (TSO). This feature allows the kernel to offload the work of dividing large packets into smaller packets to the NIC. This frees up the CPU to do more useful work and reduces the amount of overhead that the CPU passes along the bus. A related technology, the TCP offload engine (TOE), is used in network interface cards (NICs) to offload processing of the entire TCP/IP stack to the network controller. It is primarily used with high-speed network interfaces, such as Gigabit Ethernet and 10 Gigabit Ethernet, where the processing overhead of the network stack becomes significant. If your NIC supports TSO, you can enable it with:
sudo ethtool -K eth1 tso on
Data corruption on NFS file systems might be encountered on network adapters without support for error-correcting code (ECC) memory that also have TCP segmentation offloading (TSO) enabled in the driver. Note: data that might be corrupted by the sender still passes the checksum performed by the IP stack of the receiving machine. A possible workaround for this issue is to disable TSO on network adapters that do not support ECC memory.
You can check whether it is enabled using:
sudo ethtool -k eth1
On Windows, netstat shows the offload state of each connection:
netstat -nt | findstr /i offloaded
  TCP    10.100.44.52:49157     1.58.20.40:50442   ESTABLISHED     Offloaded
  TCP    10.100.44.52:49157     1.58.25.15:1191    ESTABLISHED     Offloaded
  TCP    10.100.44.52:49157     1.148.8.6:58308    ESTABLISHED     Offloaded
  TCP    10.100.44.52:49449     1.10.3.2:1025      ESTABLISHED     Offloaded
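
The ethtool check above can also be scripted. The sketch below parses the text printed by ethtool -k and reports whether TSO is on; the helper names are hypothetical, the SAMPLE text is an illustrative fragment rather than captured output, and the parser normalizes spaces to hyphens on the assumption that older ethtool releases label the feature "tcp segmentation offload" while newer ones print "tcp-segmentation-offload".

```python
import subprocess


def parse_offload_features(ethtool_output):
    """Turn 'name: on|off' lines from `ethtool -k` into a {name: state} dict."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, state = line.partition(":")
        state = state.strip()
        if sep and state:
            # Drop trailing markers such as "[fixed]" and normalize the name.
            features[name.strip().replace(" ", "-")] = state.split()[0]
    return features


def tso_enabled(ethtool_output):
    """True when the parsed output reports TCP segmentation offload as on."""
    return parse_offload_features(ethtool_output).get("tcp-segmentation-offload") == "on"


def query_interface(interface):
    """Run ethtool for a live interface (assumes ethtool is installed)."""
    result = subprocess.run(["ethtool", "-k", interface],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Illustrative fragment, not captured from a real NIC.
SAMPLE = """\
Features for eth1:
rx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
generic-receive-offload: off [fixed]
"""

if __name__ == "__main__":
    print("TSO enabled:", tso_enabled(SAMPLE))
```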


Intel I/OAT DMA Engine

This kernel option enables the Intel I/OAT DMA engine that is present in recent Xeon CPUs. This option increases network throughput as the DMA engine allows the kernel to offload network data copying from the CPU to the DMA engine. This frees up the CPU to do more useful work.
To check whether it is enabled:
dmesg | grep ioat
There’s also a sysfs interface where you can get some statistics about the DMA engine. Check the directories under /sys/class/dma/.

Kernel options: CONFIG_DMADEVICES=y and CONFIG_INTEL_IOATDMA=y and CONFIG_DMA_ENGINE=y and CONFIG_NET_DMA=y and CONFIG_ASYNC_TX_DMA=y



Direct Cache Access (DCA)

Intel’s I/OAT also includes a feature called Direct Cache Access (DCA). DCA allows a driver to warm a CPU cache. A few NICs support DCA, the most popular (to my knowledge) is the Intel 10GbE driver (ixgbe). Refer to your NIC driver documentation to see if your NIC supports DCA. To enable DCA, a switch in the BIOS must be flipped. Some vendors supply machines that support DCA, but don’t expose a switch for DCA.
If that is the case, see this blog post for how to enable DCA manually. To check whether DCA is active:
dmesg | grep dca
dca service started, version 1.8

If DCA is possible on your system but disabled you’ll see:
ioatdma 0000:00:08.0: DCA is disabled in BIOS
This means you’ll need to enable it in the BIOS or manually.
Kernel option: CONFIG_DCA=y


NAPI

New API (also referred to as NAPI) is an interface to use interrupt mitigation techniques for networking devices in the Linux kernel. Such an approach is intended to reduce the overhead of packet receiving. The idea is to defer incoming message handling until there is a sufficient amount of them so that it is worth handling them all at once.
High-speed networking can create thousands of interrupts per second, all of which tell the system something it already knew: it has lots of packets to process. NAPI allows drivers to run with (some) interrupts disabled during times of high traffic, with a corresponding decrease in system load.
When the system is overwhelmed and must drop packets, it’s better if those packets are disposed of before much effort goes into processing them. NAPI-compliant drivers can often cause packets to be dropped in the network adaptor itself, before the kernel sees them at all. 
NAPI was first incorporated in the 2.5/2.6 kernel but was also backported to the 2.4.20 kernel. Note that use of NAPI is entirely optional, drivers will work just fine (though perhaps a little more slowly) without it. A driver may continue using the old 2.4 technique for interfacing to the network stack and not benefit from the NAPI changes. NAPI additions to the kernel do not break backward compatibility.
Many recent NIC drivers automatically support NAPI, so you don’t need to do anything. Some drivers need you to explicitly specify NAPI in the kernel config or on the command line when compiling the driver.
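
To make the interrupt-mitigation idea concrete, here is a toy model in Python. It is a deliberately simplified simulation under stated assumptions (discrete time steps, a fixed poll budget per step, no packet drops), not the kernel's implementation; the function names and the default budget of 64 are illustrative choices.

```python
def per_packet_interrupts(arrivals):
    """Classic model: every received packet raises its own interrupt."""
    return sum(arrivals)


def napi_interrupts(arrivals, budget=64):
    """NAPI-style model: one interrupt starts polling; interrupts stay
    masked until the receive queue has been drained.

    arrivals is a list with the number of packets arriving in each time step.
    Returns (interrupt_count, packets_processed).
    """
    queue = 0
    polling = False
    interrupts = 0
    processed = 0
    for arrived in arrivals:
        queue += arrived
        if arrived and not polling:
            interrupts += 1      # first packet raises an interrupt...
            polling = True       # ...and the handler masks further ones
        if polling:
            handled = min(queue, budget)   # poll at most `budget` packets
            queue -= handled
            processed += handled
            if queue == 0:
                polling = False  # queue drained: re-enable interrupts
    while queue:                 # finish any backlog after traffic stops
        handled = min(queue, budget)
        queue -= handled
        processed += handled
    return interrupts, processed


if __name__ == "__main__":
    heavy = [100] * 10           # sustained burst: 100 packets per step
    light = [1, 0, 1, 0]         # occasional single packets
    print(per_packet_interrupts(heavy), napi_interrupts(heavy))
    print(per_packet_interrupts(light), napi_interrupts(light))
```

In this model the light load costs the same number of interrupts either way, while the sustained burst is handled with a single interrupt because polling never finds the queue empty.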

To check your driver:
# ethtool -i eth0
driver: e1000e
version: 2.1.4-k
firmware-version: 0.13-3
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no


Performance under high packet load

Psize   Ipps     Tput      Rxint    Txint   Done     Ndone
60      890000   409362    17       27622   7        6823
128     758150   464364    21       9301    10       7738
256     445632   774646    42       15507   21       12906
512     232666   994445    241292   19147   241192   1062
1024    119061   1000003   872519   19258   872511   0
1440    85193    1000003   946576   19505   946569   0


Enable NAPI

Download the latest driver version by visiting the following URL:
  1. Linux kernel driver for the Intel(R) PRO/100 Ethernet devices, Intel(R) PRO/1000 gigabit Ethernet devices, and Intel(R) PRO/10GbE devices.
To enable NAPI, compile the driver module, passing in a configuration option:
make CFLAGS_EXTRA=-DE1000_NAPI install
Once done simply install new drivers.
See Intel e1000 documentation for more information 

Some drivers allow the user to specify the rate at which the NIC will generate interrupts. The e1000e driver allows you to pass a command line option, InterruptThrottleRate, when loading the module with insmod. For the e1000e there are two dynamic interrupt throttle mechanisms, specified on the command line as 1 (dynamic) and 3 (dynamic conservative). The adaptive algorithm classifies traffic into different classes and adjusts the interrupt rate appropriately. The difference between dynamic and dynamic conservative is the rate for the “Lowest Latency” traffic class: dynamic (1) has a much more aggressive interrupt rate for this traffic class.
insmod e1000e.ko InterruptThrottleRate=1

oprofile

OProfile is a profiling system for Linux 2.6 and higher systems on a number of architectures. It is capable of profiling all parts of a running system, from the kernel (including modules and interrupt handlers) to shared libraries to binaries. OProfile can profile the whole system in the background, collecting information at a low overhead. These features make it ideal for profiling entire systems to determine bottlenecks in real-world systems.
Many CPUs provide "performance counters", hardware registers that can count "events"; for example, cache misses, or CPU cycles. OProfile provides profiles of code based on the number of these occurring events: repeatedly, every time a certain (configurable) number of events has occurred, the PC value is recorded. This information is aggregated into profiles for each binary image.
oprofile is a system-wide profiler that can profile both kernel and application level code. There is a kernel driver for oprofile which collects data from the x86's Model Specific Registers (MSRs) to give very detailed information about the performance of running code. oprofile can also annotate source code with performance information to make fixing bottlenecks easy. See oprofile’s homepage for more information.
Kernel options: CONFIG_OPROFILE=y and CONFIG_HAVE_OPROFILE=y
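
The event-based sampling described above can be illustrated with a toy model. This is only a sketch of the counting idea, not OProfile's code: the trace format, the function names in it, and the rule of charging each sample to whichever function was running when the counter overflowed are all simplifying assumptions.

```python
from collections import Counter


def sample_profile(trace, sample_every):
    """Attribute one sample to the running function each time the event
    counter reaches sample_every.

    trace is an iterable of (function_name, events) pairs in execution order.
    """
    counter = 0
    samples = Counter()
    for function, events in trace:
        counter += events
        while counter >= sample_every:
            counter -= sample_every      # counter overflow: record a sample
            samples[function] += 1
    return samples


if __name__ == "__main__":
    # Hypothetical trace: events could be cache misses or CPU cycles.
    trace = [("parse", 250), ("hash", 1000), ("parse", 250)]
    for function, count in sample_profile(trace, sample_every=100).most_common():
        print(function, count)
```

A smaller sample_every gives a finer-grained profile at the cost of more overhead, which is the same trade-off the configurable event count controls in a real profiler.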

epoll

epoll(7) is useful for applications which must watch for events on large numbers of file descriptors. The epoll interface is designed to easily scale to large numbers of file descriptors. epoll is already enabled in most recent kernels, but some strange distributions (which will remain nameless) have this feature disabled.
Kernel option: CONFIG_EPOLL=y
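
As a small illustration of watching several descriptors at once, the sketch below uses Python's standard selectors module, whose DefaultSelector picks the epoll-based implementation on Linux when epoll is available. The socket pairs stand in for real client connections, and the function name wait_for_readable is hypothetical.

```python
import selectors
import socket


def wait_for_readable(num_pairs=3, timeout=1.0):
    """Register several sockets, make one readable, and return what is ready."""
    selector = selectors.DefaultSelector()   # epoll-backed on Linux
    pairs = [socket.socketpair() for _ in range(num_pairs)]
    try:
        for index, (reader, _writer) in enumerate(pairs):
            reader.setblocking(False)
            # data lets us recognise which descriptor fired.
            selector.register(reader, selectors.EVENT_READ, data=index)

        # Only the last pair receives data, so only it should be reported.
        pairs[-1][1].send(b"ping")

        ready = selector.select(timeout=timeout)
        return [(key.data, key.fileobj.recv(16)) for key, _events in ready]
    finally:
        selector.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()


if __name__ == "__main__":
    print(wait_for_readable())
```

The cost of each select call depends on the number of ready descriptors rather than the number registered, which is why this style scales to servers holding many idle connections.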