<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>itwik&#039;s Blog</title>
	<atom:link href="http://itwik.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://itwik.wordpress.com</link>
	<description>this is strange..</description>
	<lastBuildDate>Thu, 17 Feb 2011 15:23:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='itwik.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>itwik&#039;s Blog</title>
		<link>http://itwik.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://itwik.wordpress.com/osd.xml" title="itwik&#039;s Blog" />
	<atom:link rel='hub' href='http://itwik.wordpress.com/?pushpress=hub'/>
		<item>
		<title>embedded perl in net-snmp is not *interpreted*</title>
		<link>http://itwik.wordpress.com/2011/02/17/embedded-perl-in-net-snmp-is-not-interpreted/</link>
		<comments>http://itwik.wordpress.com/2011/02/17/embedded-perl-in-net-snmp-is-not-interpreted/#comments</comments>
		<pubDate>Thu, 17 Feb 2011 15:21:28 +0000</pubDate>
		<dc:creator>itwik</dc:creator>
				<category><![CDATA[monitoring]]></category>
		<category><![CDATA[snmp]]></category>

		<guid isPermaLink="false">http://itwik.wordpress.com/?p=47</guid>
		<description><![CDATA[Lately I started to dig into Perl since I need to provide some custom statistics via SNMP from one of our servers. The server is running stock CentOS 5 with net-snmp, which provides easy ways to extend it via its embedded Perl. Why did I choose Perl over Python? Easy, this is CentOS and net-snmp comes with [...]]]></description>
			<content:encoded><![CDATA[<p>Lately I started to dig into Perl since I need to provide some custom statistics via SNMP from one of our servers. The server is running stock CentOS 5 with net-snmp, which provides easy ways to extend it via <a href="http://www.net-snmp.org/docs/perl-SNMP-README.html">its embedded Perl</a>. Why did I choose Perl over Python? Easy, this is CentOS and net-snmp comes with Perl bindings &#8211; none for Python. Just install net-snmp-perl and off you go.</p>
<p>Once I got over the culture shock I managed to come up with a working solution that presented a dynamically assembled table with my own content, using <a href="http://www.net-snmp.org/wiki/index.php/Tut:Extending_snmpd_using_perl">this howto</a>. I was developing on the command line and my code produced correct output. After moving over to net-snmp and snmpwalk, however, the output became stale. No matter how often I ran snmpwalk and how long I waited, the result table always contained the same values. I looked for some caching or expiration setting, but without success. At the same time my command line version [<a href="#1">1</a>] always showed correct output.</p>
<p><span id="more-47"></span></p>
<p>It took me a while and actually I&#8217;m quite embarrassed about it. My SNMP handler function was working with a globally defined hash as the source for the SNMP table. The hash was populated with proper values on script invocation &#8211; which happened every time on the command line, but only once on snmpd startup!</p>
<p>So, important lesson learned: an embedded script gets evaluated once on startup and from there on works with the data structures it built back then. You require updated information? Make sure the function you call (the handler) updates it.</p>
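<p>The pitfall is not specific to Perl. Here is a small shell analogy, which is not the handler from this post: <code>STATS_FILE</code>, <code>load_stats</code>, <code>stale_handler</code> and <code>fresh_handler</code> are made-up names, and a plain text file stands in for whatever the real script reads. It only shows the difference between data loaded once at startup and data refreshed inside the handler.</p>

```shell
# Hypothetical data source; the real script fills a Perl hash instead.
STATS_FILE=${STATS_FILE:-/tmp/my_stats.txt}

# "Startup": runs once, like the top level of a script loaded by snmpd.
load_stats() {
    STATS_CACHE=$(cat "$STATS_FILE")
}

# Stale variant: only ever answers with what was loaded at startup.
stale_handler() {
    printf '%s\n' "$STATS_CACHE"
}

# Fixed variant: refreshes the data on every request before answering.
fresh_handler() {
    load_stats
    printf '%s\n' "$STATS_CACHE"
}
```

<p>If you call <code>load_stats</code> once, change the file and then ask both handlers, only <code>fresh_handler</code> should show the new content.</p>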
<p><a name="1">1</a>: In case you wonder how to call the code from the command line, simply replace the following section:<br />
<code><br />
$agent-&gt;register($program, $regoid, \&amp;my_snmp_handler);<br />
</code><br />
with<br />
<code><br />
my $rootOID = ".1.3.6.1.4.1.8072.998";<br />
my $regoid = new NetSNMP::OID($rootOID);<br />
$regat = '.1.3.6.1.4.1.8072.998';<br />
$extension = '1';<br />
$mibdata = '/root/passwd';<br />
$delimT='';<br />
$delimV=':';<br />
&amp;my_snmp_handler<br />
</code><br />
from the tutorial&#8217;s example.</p>
]]></content:encoded>
			<wfw:commentRss>http://itwik.wordpress.com/2011/02/17/embedded-perl-in-net-snmp-is-not-interpreted/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3f107eb07a650066c1e84787f183c8b3?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">itwik</media:title>
		</media:content>
	</item>
		<item>
		<title>FusionIO for OpenNMS arrived</title>
		<link>http://itwik.wordpress.com/2010/06/28/fusionio-for-opennms-arrived/</link>
		<comments>http://itwik.wordpress.com/2010/06/28/fusionio-for-opennms-arrived/#comments</comments>
		<pubDate>Mon, 28 Jun 2010 12:45:36 +0000</pubDate>
		<dc:creator>itwik</dc:creator>
				<category><![CDATA[monitoring]]></category>
		<category><![CDATA[opennms]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://itwik.wordpress.com/?p=26</guid>
		<description><![CDATA[Using OpenNMS to monitor service availability and gather performance data is a pretty cool thing to do. OpenNMS automatically discovers and monitors all newly deployed machines on its own. Since we started to use OpenNMS five or so years ago we have no longer been caught by those &#8220;damn, we forgot to add that node / [...]]]></description>
			<content:encoded><![CDATA[<p>Using <a href="http://www.opennms.org">OpenNMS</a> to monitor service availability and gather performance data is a pretty cool thing to do. OpenNMS automatically discovers and monitors all newly deployed machines on its own. Since we started to use OpenNMS five or so years ago we have no longer been caught by those &#8220;damn, we forgot to add that node / that service to monitoring&#8221; 3am calls &#8211; every node was &#8220;just in&#8221;. Even more interesting is the amount of performance data that we started to collect on our systems via JMX (we use lots of Tomcat/JBoss) and SNMP. Currently we&#8217;re collecting 6GB in 35k jrb files from 500 nodes / 1200 interfaces using the <a href="http://www.opennms.org/wiki/RRD_store_by_group_feature">store-by-group</a> feature. This results in approximately 200k IO ops every 5 minutes. Together with the database load this was more than the local disks could handle: the system was still busy writing all that data to disk after more than 5 minutes, when the next bunch of requests came in&#8230;</p>
<p><span id="more-26"></span></p>
<p>To solve this we added another 12 GB of RAM to the machine and put the jrb files on a RAM disk. While this solved the immediate problem of writing the data collected by OpenNMS to disk, it opened another can of worms for us: <em>power (and data) loss</em>. We agreed on backing up the RAM disk to the local disks every 30 minutes, so that a small gap after a power outage was acceptable. This also went fine at first, but eventually the load on the machine went above 20 every now and then during the disk backup and the machine became unavailable for minutes.</p>
<p>At that point we had three options:</p>
<ol>
<li>move /var/opennms out to our NetApp FAS3020</li>
<li>attach two Dell PV220s each with 12 15k 300GB disks</li>
<li>look out for those fast-as-hell PCIe SLC flash drives</li>
</ol>
<p>Extending the workload of the monitoring system to our primary storage backend device was not a very popular idea (and had chicken-and-egg problems of its own). Attaching some old stinky hardware via deprecated SCSI just <em>because &#8220;we have it in stock&#8221;</em> also seemed ill-fated. After some internal discussion we presented management with some promising TCO arguments regarding power consumption, rack space and vendor support contract costs, and convinced them to buy us a <a href="http://fusionio.com/products/iodrive/">FusionIO 80GB ioDrive</a> for storing our performance data. This beast is fast enough to do some 100k IO ops, with Single Level Cell (SLC) NAND flash chips connected directly to your PCIe bus. Woah.</p>
<p>So we went out to look for a FusionIO distributor, starting with our bread-and-butter vendor Dell. However, as always when you need something more than a cheap server with some additional network cards, you suffer. They do indeed <a href="http://search.dell.com/results.aspx?s=gen&amp;c=us&amp;l=en&amp;cs=&amp;k=ioDrive&amp;cat=all&amp;x=0&amp;y=0">offer the FusionIO on their website</a>, but it is available neither in Germany nor in China, where we have offices. So we looked elsewhere and found the guys from <a href="http://www.hamburgnet.de/">HamburgNET</a>, who had the card in stock and delivered it to us within a couple of days. Too bad for Dell.</p>
<p>So finally the card arrived and we put it into a Dell r620 quad-xeon with 48GB RAM for playing around. After booting CentOS 5.4 we found a new device with lspci:</p>
<pre>[root@xxx ~]# lspci
...
04:00.0 Mass storage controller: Unknown device 1aed:1005 (rev 01)
...</pre>
<p>We checked the FusionIO website, downloaded a driver source RPM and ran rpmbuild on it. Our kernel is 2.6.18-164.15.1.el5xen, while the latest kernel release supported by FusionIO is 2.6.18_128.1.6.el5xen; however, that did not pose a problem for the driver:</p>
<pre># install needed software
[root@xxx ~]# yum install kernel-devel kernel-headers rpm-build gcc
# build driver
[root@xxx ~]# rpmbuild --rebuild iodrive-driver-source-1.2.7.2-1.0.src.rpm
# install driver
[root@xxx ~]# rpm -ivh /usr/src/redhat/RPMS/x86_64/iodrive-driver-1.2.7.2-1.0_2.6.18_164.15.1.el5xen.x86_64.rpm
# load driver
[root@xxx ~]# depmod -a
[root@xxx ~]# modprobe fio-driver
[root@xxx ~]# lspci
...
04:00.0 Mass storage controller: Fusion-io ioDIMM3 320GB (rev 01)
...</pre>
<p>There it is, /dev/fioa, 80 GB in size.</p>
<p>We will start doing some heavy benchmarking now to get some numbers for that beast. A first shot at local write performance with ext2:</p>
<pre>[root@xxx fusionio]# time dd if=/dev/zero of=./bigfile bs=64k
dd: writing `./bigfile': No space left on device
1206304+0 records in
1206303+0 records out
79056330752 bytes (79 GB) copied, 106.389 seconds, 743 MB/s</pre>
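<p>As a quick sanity check, the rate dd printed can be recomputed from the byte count and the runtime in its last line. This is only a sketch: <code>throughput_mb_per_s</code> is a name I made up, and it assumes that dd counts 1 MB as 1,000,000 bytes, which is what GNU dd does.</p>

```shell
# throughput_mb_per_s BYTES SECONDS
# Prints the rate in whole MB/s, with 1 MB = 1000000 bytes (assumption, see above).
throughput_mb_per_s() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", bytes / secs / 1000000 }'
}

# Values taken from the dd output above; prints 743
throughput_mb_per_s 79056330752 106.389
```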
]]></content:encoded>
			<wfw:commentRss>http://itwik.wordpress.com/2010/06/28/fusionio-for-opennms-arrived/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3f107eb07a650066c1e84787f183c8b3?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">itwik</media:title>
		</media:content>
	</item>
		<item>
		<title>Impressions from the OUCE 2010</title>
		<link>http://itwik.wordpress.com/2010/05/08/impressions-from-the-ouce-2010/</link>
		<comments>http://itwik.wordpress.com/2010/05/08/impressions-from-the-ouce-2010/#comments</comments>
		<pubDate>Sat, 08 May 2010 13:46:23 +0000</pubDate>
		<dc:creator>itwik</dc:creator>
				<category><![CDATA[monitoring]]></category>
		<category><![CDATA[opennms]]></category>

		<guid isPermaLink="false">http://itwik.wordpress.com/?p=22</guid>
		<description><![CDATA[I went to the OpenNMS User Conference 2010 last week in Frankfurt, Germany. The venue was the same as last year, the Méridien Parkhotel near Frankfurt central station. Although the location itself is good (and the food excellent), the air supply in the conference rooms was bad. Having participated in the African social event the evening [...]]]></description>
			<content:encoded><![CDATA[<p>I went to the <a href="http://www.opennms-conference.info/">OpenNMS User Conference 2010</a> last week in Frankfurt, Germany. The venue was the same as last year, the Méridien Parkhotel near Frankfurt central station. Although the location itself is good (and the food excellent), the air supply in the conference rooms was bad. Having participated in the African social event the evening before created some tough challenges for my circulatory system..</p>
<div>
<p>Apart from that, the conference itself improved over last year; the number as well as the quality of the presentations and workshops was very good. It was fun to listen to the <a href="http://www.adventuresinoss.com/">mouth of opennms</a> as well as to the technical presentations about the new reporting API and the easy ways to extend OpenNMS by means of BeanShell. Sadly the documentation of OpenNMS is in a bad state, so the conference was a good chance to see what technical solutions are included in the product. The BeanShell approach in particular gives us administrators an easy way to add monitors inside the JVM instead of forking out to the OS and running scripts via the GpMonitor. Jay promised to put up his BeanShell slides, so once they are up we will have good examples of how to extend OpenNMS with custom monitors in a very easy manner.</p>
<p>It was good to talk to other people from the community about their use cases for OpenNMS. People from different companies with different organizational structures and policies presented their best-practice solutions together with the problems they faced. It was commonly agreed that the project needs some kind of contrib/ space in the wiki where typical example configurations, custom monitors or additionally used software like net-snmp extensions, etc. could be posted. I guess this idea was well received and we will soon see this kind of space.</p>
<p>Overall I think the conference was a huge success and I really enjoyed being there. If all plays out well we will meet again next year!</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://itwik.wordpress.com/2010/05/08/impressions-from-the-ouce-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3f107eb07a650066c1e84787f183c8b3?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">itwik</media:title>
		</media:content>
	</item>
		<item>
		<title>speeding up postgres onlinebackup compression</title>
		<link>http://itwik.wordpress.com/2010/04/30/speeding-up-postgres-onlinebackup-compression/</link>
		<comments>http://itwik.wordpress.com/2010/04/30/speeding-up-postgres-onlinebackup-compression/#comments</comments>
		<pubDate>Fri, 30 Apr 2010 10:26:00 +0000</pubDate>
		<dc:creator>itwik</dc:creator>
				<category><![CDATA[database]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[shell]]></category>

		<guid isPermaLink="false">http://itwik.wordpress.com/?p=16</guid>
		<description><![CDATA[Recently I stumbled over this blog entry where the benefits of xargs -P are outlined. In case you don&#8217;t know about -P yet, it allows you to specify the number of parallel processes run by xargs. So together with the -n switch you can run an arbitrary number of parallel jobs for your task without [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I stumbled over <a href="http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/">this blog entry</a> where the benefits of xargs -P are outlined. In case you don&#8217;t know about -P yet, it allows you to specify the number of parallel processes run by xargs. So together with the -n switch you can run an arbitrary number of parallel jobs for your task without having to care about job control.</p>
<p>For quite some time I have been watching the online backup of our big PostgreSQL data warehouse being compressed for 4 days, so when I read this it instantly came to my mind that we could drastically reduce the time it takes to bzip2 the backup file. The changes to our backup script were easy. Instead of tar&#8217;ing everything into one big file I piped the output to split and cut it into nice 512MB chunks.</p>
<pre>tar c --ignore-failed-read --numeric-owner --exclude 'lost+found' --exclude 'pg_xlog' -f - /mnt/myPGDATA | split -b 536870912 -d -a 3 - myPGDATAbackup.tar.</pre>
<p>This command will create the files myPGDATAbackup.tar.000, myPGDATAbackup.tar.001, myPGDATAbackup.tar.002, etc. in your current working directory, each at most 512MB in size. Note the -a 3: split -d uses two-digit suffixes by default, so three-digit names like these need it.</p>
<p>Afterwards you just run xargs over these files, with CoreCount being the number of cores you want to compress the files in parallel:</p>
<p>ls -1 myPGDATAbackup.tar.*|xargs -r -n 1 -P $CoreCount bzip2 -9</p>
<p>You can assign as many cores as you want to your backup job (with CoreCount=0 xargs runs as many processes in parallel as it can), however you should be careful not to bring your IO backend down. After increasing the memory to 512MB * $CoreCount we were able to hold all chunks currently being compressed in the buffer cache, so that the cores do not have to wait for the IO subsystem to catch up with read requests.</p>
<p>The performance improvement is amazing. Our online backup compression time went from 49 hours to 5.5 hours! Now we should speed up the data gathering via tar, but that is another story.</p>
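<p>Put together, the two steps could look roughly like the sketch below. It is not our backup script: <code>split_stream</code> and <code>compress_chunks</code> are names I made up, the compressor is a parameter that defaults to bzip2, and it assumes GNU-style split, find and xargs. I use find -print0 with xargs -0 instead of ls so that unusual file names cannot confuse xargs.</p>

```shell
# split_stream SIZE PREFIX
# Cut stdin into SIZE-byte chunks named PREFIX000, PREFIX001, ...
split_stream() {
    split -b "$1" -d -a 3 - "$2"
}

# compress_chunks DIR PREFIX JOBS [COMPRESSOR]
# Compress every file in DIR whose name starts with PREFIX,
# running up to JOBS compressor processes in parallel.
compress_chunks() {
    find "$1" -maxdepth 1 -type f -name "$2*" -print0 |
        xargs -0 -r -n 1 -P "$3" "${4:-bzip2}" -9
}

# Example with the names from this post (not run here):
#   tar cf - /mnt/myPGDATA | split_stream 536870912 myPGDATAbackup.tar.
#   compress_chunks . myPGDATAbackup.tar. 4
```

<p>Running <code>compress_chunks</code> a second time would also pick up the already compressed files, so call it only once per backup or narrow the name pattern.</p>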
]]></content:encoded>
			<wfw:commentRss>http://itwik.wordpress.com/2010/04/30/speeding-up-postgres-onlinebackup-compression/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3f107eb07a650066c1e84787f183c8b3?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">itwik</media:title>
		</media:content>
	</item>
		<item>
		<title>Slow shared memory allocation</title>
		<link>http://itwik.wordpress.com/2010/04/22/slow-shared-memory-allocation/</link>
		<comments>http://itwik.wordpress.com/2010/04/22/slow-shared-memory-allocation/#comments</comments>
		<pubDate>Thu, 22 Apr 2010 14:26:23 +0000</pubDate>
		<dc:creator>itwik</dc:creator>
				<category><![CDATA[database]]></category>
		<category><![CDATA[kernel]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[sybase]]></category>

		<guid isPermaLink="false">http://itwik.wordpress.com/?p=4</guid>
		<description><![CDATA[Recently we were deploying a Sybase ASE 12.5.4 64bit on a CentOS 5.3 with kernel 2.6.18-164.11.1.el5 and 48GB RAM available. The ASE was configured to use 32GB shared memory in &#8216;lock shared memory = 1&#8217; and &#8216;allocate max shared memory = 1&#8217; mode. The database was running smoothly until we created a ramfs filesystem as [...]]]></description>
			<content:encoded><![CDATA[<p>Recently we were deploying a Sybase ASE 12.5.4 64bit on a CentOS 5.3 with kernel 2.6.18-164.11.1.el5 and 48GB RAM available. The ASE was configured to use 32GB shared memory in &#8216;lock shared memory = 1&#8217; and &#8216;allocate max shared memory = 1&#8217; mode. The database was running smoothly until we created a ramfs filesystem as a storage place for our tempdb device files. We created 4 empty files with dd for log and data with a total size of 10GB. The creation of the devices from within the ASE went without problems, and the targeted performance boost was measurable.<br />
However, the next time we restarted the database it did not come up properly but hung with the following log message:</p>
<p>00:00000:00000:2010/04/17 10:19:58.70 kernel  Using config area from primary master device.<br />
00:00000:00000:2010/04/17 10:19:58.72 kernel  Detected 16 physical CPU&#8217;s<br />
00:00000:00000:2010/04/17 10:19:58.73 kernel  Locking shared memory into physical memory.<br />
<span id="more-4"></span><br />
The system started to behave very weirdly: a &#8216;ps xa&#8217; was hanging and an &#8216;ls /proc&#8217; also became unresponsive. We investigated this issue further with strace and found out that the ASE was hanging while calling mlock(2):</p>
<p>6128  shmget(0x1107fde5, 34359738368, IPC_CREAT|IPC_EXCL|0600) = 98307<br />
6128  shmat(98307, 0, 0)                = ?<br />
6128  mlock(0x2aaaabddd000, 34359738368</p>
<p>After several rounds of flipping config switches we were able to track the problem down to the ramdisk we were using. Since the ramdisk is empty after every reboot, we must create the Sybase tempdb device files prior to the start of the ASE. If we do not do this, the database is not able to finish its startup recovery and does not come up properly. However, if we do not create the device files, the database gets past its critical mlock call and starts up normally within seconds (up to the point where it tries to read the tempdb device files..). This behavior cropped up regardless of whether we used ramfs or tmpfs for the ramdisk. To add more confusion, the behavior was only reproducible with shared memory allocations &gt; 16GB.</p>
<p>The big surprise came after we left the database in this state in the evening and came back to work the next day, just to discover that the database was up and running happily. Indeed, the mlock call was not hanging forever; it took 2 hours to finish, and then the startup of the ASE continued. At this point we involved Sybase technical support, but they weren&#8217;t very helpful as this problem seemed to be new to them as well. Several rounds of strace runs and Q&amp;A mails went back and forth without any result.</p>
<p>Eventually we were able to work around this issue by starting the Sybase ASE without the tempdb device files created and putting the database immediately in the background. We executed a &#8220;sleep 4&#8221; on the shell and created the device files just in time for Sybase to start recovery on them. Obviously this is a very crappy workaround and the timing depends on a number of factors, but it bought us some time to find the real issue and we were able to hand over the database to the application developers for testing.</p>
<p>I asked a friend of mine to write me a small program which does nothing else than allocate an arbitrary amount of shared memory and release it afterwards. And indeed, the problem was fully reproducible with this program! The memory allocation was dog slow when grabbing the memory with mlock, so Sybase itself was out of the game.<br />
A couple of Google rounds later I found an interesting <a href="http://lwn.net/Articles/286485/">article on LWN</a> which described the behavior we faced. Together with kernel/Documentation/sysctl/vm.txt this led me to the kernel tunable /proc/sys/vm/zone_reclaim_mode, which was set to 2 on our system. Setting this value to 0 made the kernel process mlock way faster and gave us back the normal fast startup speed of Sybase.</p>
<p>To make a long story short, the kernel tries to allocate my 32GB of shared memory sequentially from its memory zone pool. Since we had already allocated 10GB of RAM to a ramdisk, it hits a lot of non-reclaimable, non-swappable pages. Iterating through all 10GB of occupied RAM takes time &#8211; in our case 2 hours. Setting zone_reclaim_mode = 0 makes the kernel skip allocated pages during its iteration, hence the shared memory allocation is faster and the database starts quickly.</p>
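<p>If you want to check the setting on your own box, something like the following should do. Treat it as a sketch: <code>ZONE_RECLAIM_FILE</code> and <code>show_zone_reclaim_mode</code> are names I made up, and the path can be overridden only so that the function can be tried against a plain file. The commands in the comments change the running kernel and need root.</p>

```shell
# Path of the tunable; overridable for trying the function on a copy.
ZONE_RECLAIM_FILE=${ZONE_RECLAIM_FILE:-/proc/sys/vm/zone_reclaim_mode}

# Print the current value, or "unavailable" if the kernel does not expose it.
show_zone_reclaim_mode() {
    if [ -r "$ZONE_RECLAIM_FILE" ]; then
        cat "$ZONE_RECLAIM_FILE"
    else
        echo "unavailable"
    fi
}

# To switch zone reclaim off at runtime (as root):
#   sysctl -w vm.zone_reclaim_mode=0
# To keep the setting across reboots, add this line to /etc/sysctl.conf:
#   vm.zone_reclaim_mode = 0
```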
<p>It seems that the kernel developers also <a href="http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-05/msg05709.html">think that zone_reclaim_mode = 0 is a good idea</a> and that it will be the default in future kernel versions. Let&#8217;s hope Sybase and Red Hat (CentOS) take this change from upstream and integrate it into their products / troubleshooting guides. I also think that this issue could hit you on Oracle, Postgres or other databases as well &#8211; probably anything that allocates shared memory.</p>
]]></content:encoded>
			<wfw:commentRss>http://itwik.wordpress.com/2010/04/22/slow-shared-memory-allocation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/3f107eb07a650066c1e84787f183c8b3?s=96&#38;d=monsterid&#38;r=G" medium="image">
			<media:title type="html">itwik</media:title>
		</media:content>
	</item>
	</channel>
</rss>
