<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>IT From All Angles</title>
	<atom:link href="http://blog.allanglesit.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.allanglesit.com</link>
	<description>Trading Management For Administration One Idea At A Time</description>
	<lastBuildDate>Tue, 21 Feb 2012 18:53:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Adventures in ZFS: Storage Sizing</title>
		<link>http://blog.allanglesit.com/2012/02/adventures-in-zfs-storage-sizing/</link>
		<comments>http://blog.allanglesit.com/2012/02/adventures-in-zfs-storage-sizing/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 12:00:35 +0000</pubDate>
		<dc:creator>matthew.mattoon</dc:creator>
				<category><![CDATA[General Information]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[Solaris]]></category>
		<category><![CDATA[ZFS]]></category>
		<category><![CDATA[adventures in zfs]]></category>
		<category><![CDATA[planning]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[storage sizing]]></category>
		<category><![CDATA[zfs]]></category>

		<guid isPermaLink="false">http://blog.allanglesit.com/?p=854</guid>
		<description><![CDATA[One of the most complex parts of storage in general and ZFS in particular is correctly assessing the amount and types of storage that you will need to meet your requirements.  You can of course just purchase what you can afford and hope for the best.  But after reading this article you can relatively easily [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most complex parts of storage in general, and ZFS in particular, is correctly assessing the amount and types of storage that you will need to meet your requirements.  You can of course just purchase what you can afford and hope for the best.  But after reading this article you should be able to determine approximately how much storage to plan for, based on a few factors from your existing environment.  This article focuses primarily on ZFS, however a lot of this is true of other storage platforms as well&#8230;</p>
<p><strong>Factor One &#8211; Use Case</strong></p>
<p>The use case is the single most important factor, because it tells you how you plan on using the data.  Is this going to serve files (NFS or CIFS)?  Will you be using it to serve block devices (Fibre Channel or iSCSI)?  Or perhaps you will be using it as a consolidation point for multiple sources of data in a backup scenario.  The answer to this question will primarily determine whether you need log (ZIL) or cache devices and whether you need spares.  These of course will take up slots where data disks could otherwise go, so you will want to factor that into your larger capacity planning.</p>
<p><em>Physical Capacity</em></p>
<p>Obviously the machine that you buy will have a limit to how many disks you can fit in it.  After that you can plan on adding expansion chassis.  You need to understand how your zpools will be laid out and the types of disks that you will be using so that you can begin to see how much data you will be able to squeeze onto the spindles.  One easy thing to overlook is the need for drive bays for other purposes (spares, log, cache, system drives, etc).  If you plan on expanding with an additional chassis, will you already have used up all of your expansion slots with various cards (10Gb, Fibre Channel, NICs, SSD, etc)?  Also keep in mind that capacity is not the only metric to consider when choosing disks.  Your workload will determine how performant the solution must be to meet requirements.  For example, a backup server will not require the same performance as a storage head which is handing out block devices which have operating systems installed on them (read: virtual machine environments).</p>
<p><em>Spare Devices</em></p>
<p>Depending on your use case you might need to allocate some drives as spares to give you the time to procure and/or replace failed drives.  This will simply protect you from additional failure(s) in the meantime.  If I have the space, I like to have 1 spare for every 12-15 drives.  This is really straightforward: just make sure that, as with any large system, the company accepts the risks involved with your decision.  If they are not willing to accept the risks, then provide them with an option which will protect them from that risk.  They will then either pay for the lower risk system or accept the risk of the cheaper system.</p>
<p><em>Cache Devices</em></p>
<p>Cache devices (L2ARC) are a really quick and cheap way to increase your read IOPS on a system-wide basis.  Basically, as data is read from ZFS it also makes its way onto the cache device, so that subsequent read requests for that data can be served directly from the cache.  This works in combination with the ARC, which is where all of your memory goes in ZFS.  A cache device is really necessary if you are using deduplication.  When you enable deduplication all subsequently written blocks are indexed in the dedup table.  Before a write can be committed it must be compared to the entries that are already in the dedup table, to see whether the block should actually be written to disk or whether a pointer to the previously existing block will do.  Initially this will not pose a problem, as your dedup table will be relatively small and it can be stored in the 25% of the ARC that it is relegated to.  However, at some point you will run out of space in the ARC, and if you do not have an L2ARC then parts of your dedup table will have to be read back from the pool disks, which of course means that it will be holding up writes.  In other words, it will be slow.  Since a cache device&#8217;s purpose is to speed up reads, you want to look for an SSD which is slanted towards better performance on the read side.  I like the Crucial M4 for this purpose, but please keep in mind they had a <a title="Adventures in ZFS: Crucial M4 Firmware SMART Issue" href="http://blog.allanglesit.com/2012/01/adventures-in-zfs-crucial-m4-firmware-smart-issue/" target="_blank">serious firmware bug</a> in a previous version, so make sure you end up with v0309 on the disk and update to it if you don&#8217;t.</p>
<p><em>Log Devices</em></p>
<p>Log devices (dedicated devices for the ZFS Intent Log, or ZIL) are a really cool way to speed up technologies which utilize synchronous writes.  Synchronous writes are used to ensure that data is committed to disk before additional data is sent, which essentially serializes the connection.  Most commonly you will find synchronous writes in databases, NFS and iSCSI.  So if you are planning on using one of these technologies then you will want to consider using a mirrored log device.  Now I am sure someone out there is saying, &#8220;Whoa, mirrored?  But that takes up two slots man&#8230;&#8221; and you would be correct.  However, keep in mind that if your log device fails, it holds actual data which has not yet been committed to the pool disks (read: data loss).  As such you really ought to have a mirrored log device or none at all.  You can also stripe multiple mirrors as a log if you have a heavier synchronous write workload.  Since log devices are meant to improve writes (albeit a certain type of write), you will want to look for an SSD which is slanted towards better write performance.  I tend to like the OCZ Vertex 3 for this purpose.</p>
<p><strong>Factor Two &#8211; Data Growth</strong></p>
<p>Now we get into actually sizing data.  When you try to project growth you need to make sure that you (1) use an appropriately sized sample set of data and (2) factor in any relevant business or technical factors which could have skewed your sample.  I like to size my storage based off of backup size.  Simply grab the size of your full backups for the data in question.  If you are going to use this to serve block devices, collecting this data becomes a bit more complex, because now you are talking more of a machine sprawl type of growth, which can be harder to account for.  Here are a few formulas to get you started (make sure you use the same unit throughout, i.e. B, KB, MB, GB, TB).</p>
<p><em>Calculate Weekly Growth</em></p>
<pre class="qoate-code">
(full_week1 - full_week0) = growth_week1
(full_week2 - full_week1) = growth_week2
</pre>
<p><em>Calculate Weekly Growth Rate</em></p>
<pre class="qoate-code">
(growth_week1 / full_week0) = growth_rate_week1
</pre>
<p><em>Average Weekly Growth</em></p>
<pre class="qoate-code">
(growth_week1 + growth_week2 + growth_week3 + growth_week4) / number_of_weeks = average_weekly_growth
</pre>
<p><em>Average Weekly Growth Rate</em></p>
<pre class="qoate-code">
(growth_rate_week1 + growth_rate_week2 + growth_rate_week3 + growth_rate_week4) / number_of_weeks = average_weekly_growth_rate
</pre>
<p><em>Annualized Weekly Growth</em></p>
<pre class="qoate-code">
(average_weekly_growth * 53) = annual_growth
</pre>
<p>So, for example, suppose we start with these weekly full backup sizes&#8230;</p>
<table border="0" frame="VOID" rules="NONE" cellspacing="0">
<colgroup>
<col width="86" />
<col width="86" />
<col width="160" /></colgroup>
<tbody>
<tr>
<td align="LEFT"><strong>Week</strong></td>
<td align="RIGHT"><strong>Full backup size</strong></td>
<td align="LEFT"><strong>Notes</strong></td>
</tr>
<tr>
<td align="LEFT" width="86" height="17">week0</td>
<td align="RIGHT" width="86">100</td>
<td align="LEFT" width="160"></td>
</tr>
<tr>
<td align="LEFT" height="17">week1</td>
<td align="RIGHT">105</td>
<td align="LEFT"></td>
</tr>
<tr>
<td align="LEFT" height="17">week2</td>
<td align="RIGHT">120</td>
<td align="LEFT"></td>
</tr>
<tr>
<td align="LEFT" height="17">week3</td>
<td align="RIGHT">122</td>
<td align="LEFT"></td>
</tr>
<tr>
<td align="LEFT" height="17">week4</td>
<td align="RIGHT">122</td>
<td align="LEFT"></td>
</tr>
<tr>
<td align="LEFT" height="17">week5</td>
<td align="RIGHT">118</td>
<td align="LEFT">data expired</td>
</tr>
<tr>
<td align="LEFT" height="17">week6</td>
<td align="RIGHT">130</td>
<td align="LEFT">added new servers</td>
</tr>
<tr>
<td align="LEFT" height="17">week7</td>
<td align="RIGHT">135</td>
<td align="LEFT"></td>
</tr>
<tr>
<td align="LEFT" height="17">week8</td>
<td align="RIGHT">136</td>
<td align="LEFT"></td>
</tr>
</tbody>
</table>
<p>Notice that in this example we have a couple of points where I saw fit to make notes.  One was when our dataset decreased, and the other was when it grew more sharply than the trend up to that point.  I made these notes because if you have a very large deviation then you might want to make some sort of adjustment so as to not skew your results.  In this case both of these happened in the regular course of business, so they should legitimately be considered part of the growth curve.  However, if some data had dropped out of your backups because of a failure, and you were already accounting for that data in another way, you would not want to double count it when it came back into the backups in the middle of your sample.</p>
<p><em>Weekly Growth</em></p>
<pre class="qoate-code">
week1 (105 - 100) = 5
week2 (120 - 105) = 15
week3 (122 - 120) = 2
week4 (122 - 122) = 0
week5 (118 - 122) = -4
week6 (130 - 118) = 12
week7 (135 - 130) = 5
week8 (136 - 135) = 1
</pre>
<p><em>Weekly Growth Rate</em></p>
<pre class="qoate-code">
week1 (5 / 100) = .05 = 5%
week2 (15 / 105) = .143 = 14.3%
week3 (2 / 120) = .017 = 1.7%
week4 (0 / 122) = 0 = 0%
week5 (-4 / 122) = -.033 = -3.3%
week6 (12 / 118) = .102 = 10.2%
week7 (5 / 130) = .039 = 3.9%
week8 (1 / 135) = .007 = .7%
</pre>
<p><em>Average Weekly Growth</em></p>
<pre class="qoate-code">
(5 + 15 + 2 + 0 + -4 + 12 + 5 + 1) / 8 = 4.5
</pre>
<p><em>Average Weekly Growth Rate</em></p>
<pre class="qoate-code">
(5 + 14.3 + 1.7 + 0 + -3.3 + 10.2 + 3.9 + .7) / 8 = 4.1%
</pre>
<p><em>Annualized Growth</em></p>
<pre class="qoate-code">
(4.5 * 53) = 238.5
</pre>
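<p>The growth arithmetic above can also be sketched in a few lines of Python.  This is only an illustration of the formulas, not a tool from the original exercise: the function names and the WEEKS_PER_YEAR constant are made up for this sketch, and the multiplier of 53 is taken straight from the annualization formula above.</p>

```python
def weekly_growth(fulls):
    """Difference between each weekly full backup and the one before it."""
    return [later - earlier for earlier, later in zip(fulls, fulls[1:])]


def weekly_growth_rates(fulls):
    """Each week's growth divided by the previous week's full backup size."""
    return [(later - earlier) / earlier for earlier, later in zip(fulls, fulls[1:])]


def average(values):
    return sum(values) / len(values)


WEEKS_PER_YEAR = 53  # assumption: same multiplier as the article's formula

fulls = [100, 105, 120, 122, 122, 118, 130, 135, 136]  # week0..week8 from the table
growth = weekly_growth(fulls)
average_weekly_growth = average(growth)
average_weekly_growth_rate = average(weekly_growth_rates(fulls))
annual_growth = average_weekly_growth * WEEKS_PER_YEAR

print("weekly growth:", growth)
print("average weekly growth:", average_weekly_growth)
print("average weekly growth rate: {:.1%}".format(average_weekly_growth_rate))
print("annualized growth:", annual_growth)
```

<p>Swap in your own list of weekly full backup sizes, all in the same unit, to get your own averages.</p>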
<p>As you can see from this example the numbers add up quickly: in a year we have more than doubled our dataset.  Now why is this important?  Most businesses perform budgeting at the beginning of the year, so expenses need to be planned out, and any expansion that you do perform will need to <em>at least</em> last through the year.  Either way, you need to know how long you can reasonably expect to use this hardware at its current capacity before having to look for upgrades.  Your project will generally be regarded as successful if it meets the technical requirements with minimal involvement from the business (having to ask for more money), especially if you can tell them up front that this storage will need to be expanded in x number of months.</p>
<p><strong>Factor Three &#8211; Data Churn</strong></p>
<p>Churn is freaking scary when it comes to calculating capacity requirements.  Churn is the amount of data that changes in a given week.  Some churn is actually growth, so we will want to keep that in mind when we are performing our analysis.  It is amazing the levels of churn that some companies have.  This is especially disconcerting if you plan on utilizing snapshots or send/receive as a form of backup.  We still use backups to calculate churn, but this time we will use our mid-week backups instead of our fulls.  If you are using incremental backups then you will total all of those during the week to get your weekly churn.  If you are using differentials you can use the final differential of the week.</p>
<p>Now the tricky thing about churn is that growth is churn, but churn is not growth.  So after we calculate our growth and our churn we will subtract our growth from our churn to get our actual churn; otherwise we would be double counting our growth.  I don&#8217;t bother doing this until we get to the averages and the annualized numbers.  You will notice that these formulas are largely the same as the growth formulas.  Just remember here that you need to calculate these against your aggregated incrementals or your final differential to get valid numbers.</p>
<p><em>Calculate Weekly Churn (differentials)</em></p>
<pre class="qoate-code">
diff_week1 = churn_week1
</pre>
<p><em>Calculate Weekly Churn (incrementals)</em></p>
<pre class="qoate-code">
(inc1_week1 + inc2_week1 + inc3_week1 + inc4_week1 + inc5_week1) = churn_week1
</pre>
<p><em>Calculate Weekly Churn Rate</em></p>
<pre class="qoate-code">
(churn_week1 / full_week0) = churn_rate_week1
</pre>
<p><em>Average Weekly Churn</em></p>
<pre class="qoate-code">
(churn_week1 + churn_week2 + churn_week3 + churn_week4) / number_of_weeks = average_weekly_churn
</pre>
<p><em>Average Weekly Churn Rate</em></p>
<pre class="qoate-code">
(churn_rate_week1 + churn_rate_week2 + churn_rate_week3 + churn_rate_week4) / number_of_weeks = average_weekly_churn_rate
</pre>
<p><em>Annualized Weekly Churn</em></p>
<pre class="qoate-code">
(average_weekly_churn * 53) = annual_churn
</pre>
<p>Expanding on our previous example, suppose we had weekly churn that looked like this&#8230;</p>
<table border="0" frame="VOID" rules="NONE" cellspacing="0">
<colgroup>
<col width="86" />
<col width="86" /></colgroup>
<tbody>
<tr>
<td align="LEFT"><strong>Week</strong></td>
<td align="RIGHT"><strong>Churn</strong></td>
</tr>
<tr>
<td align="LEFT" width="86" height="17">week1</td>
<td align="RIGHT" width="86">8</td>
</tr>
<tr>
<td align="LEFT" height="17">week2</td>
<td align="RIGHT">6</td>
</tr>
<tr>
<td align="LEFT" height="17">week3</td>
<td align="RIGHT">5</td>
</tr>
<tr>
<td align="LEFT" height="17">week4</td>
<td align="RIGHT">2</td>
</tr>
<tr>
<td align="LEFT" height="17">week5</td>
<td align="RIGHT">2</td>
</tr>
<tr>
<td align="LEFT" height="17">week6</td>
<td align="RIGHT">15</td>
</tr>
<tr>
<td align="LEFT" height="17">week7</td>
<td align="RIGHT">6</td>
</tr>
<tr>
<td align="LEFT" height="17">week8</td>
<td align="RIGHT">1</td>
</tr>
</tbody>
</table>
<p><em>Weekly Churn Rate</em></p>
<pre class="qoate-code">
week1 (8 / 100) = .08 = 8%
week2 (6 / 105) = .057 = 5.7%
week3 (5 / 120) = .042 = 4.2%
week4 (2 / 122) = .016 = 1.6%
week5 (2 / 122) = .016 = 1.6%
week6 (15 / 118) = .127 = 12.7%
week7 (6 / 130) = .046 = 4.6%
week8 (1 / 135) = .007 = .7%
</pre>
<p><em>Average Weekly Churn</em></p>
<pre class="qoate-code">
(8 + 6 + 5 + 2 + 2 + 15 + 6 + 1) / 8 = 5.6
</pre>
<p>Now this is only partly correct.  We still need to account for the growth we have already included by subtracting the average growth from our average churn.</p>
<pre class="qoate-code">
(5.6 - 4.5) = 1.1
</pre>
<p>Giving us a growth-adjusted weekly average churn of 1.1.</p>
<p><em>Average Weekly Churn Rate</em></p>
<pre class="qoate-code">
(8 + 5.7 + 4.2 + 1.6 + 1.6 + 12.7 + 4.6 + .7) / 8 = 4.9%
</pre>
<p>Again we must remove our growth, this time using the average weekly growth rate (4.1%) rather than the average weekly growth.</p>
<pre class="qoate-code">
(4.9 - 4.1) = 0.8
</pre>
<p>Giving us a growth-adjusted weekly average churn rate of 0.8%.</p>
<p><em>Annualized Churn</em></p>
<pre class="qoate-code">
(1.1 * 53) = 58.3
</pre>
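<p>The churn adjustment can be sketched the same way.  Again this is just an illustration with made-up variable names, reusing the multiplier of 53 from the formulas above.  It keeps full precision instead of rounding the averages to one decimal place.</p>

```python
def average(values):
    return sum(values) / len(values)


WEEKS_PER_YEAR = 53  # assumption: same multiplier as the article's formula

weekly_churn = [8, 6, 5, 2, 2, 15, 6, 1]      # week1..week8 from the churn table
weekly_growth = [5, 15, 2, 0, -4, 12, 5, 1]   # week1..week8 from the growth example

average_weekly_churn = average(weekly_churn)
average_weekly_growth = average(weekly_growth)

# Growth is already counted once, so remove it from churn before annualizing.
adjusted_weekly_churn = average_weekly_churn - average_weekly_growth
annual_churn = adjusted_weekly_churn * WEEKS_PER_YEAR

print("average weekly churn:", average_weekly_churn)
print("growth-adjusted weekly churn:", adjusted_weekly_churn)
print("annualized churn:", annual_churn)
```

<p>Because nothing is rounded along the way, the sketch reports 5.625, 1.125 and 59.625 where the worked example shows 5.6, 1.1 and 58.3.  The difference is purely rounding.</p>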
<p>Now that we have calculated our storage needs we can start to work out our final configurations based on the goals of our project.  In our example we have a current dataset of 136, an annualized growth of 238.5 and an annualized growth-adjusted churn of 58.3.  If we keep snapshots around, so that churned data continues to occupy space, then each year we need room for another 296.8 (238.5 + 58.3), which is more than twice our current dataset.  So if we had a goal of engineering a system which would not need any upgrades (based on current use cases) in the first two years, we would need a minimum size of 136 + (2 * 296.8) = 729.6.</p>
<p>One final consideration when sizing ZFS is pool capacity.  ZFS uses copy on write to turn random writes into sequential writes, which is very good for performance.  However, when a pool exceeds 80% full ZFS will not be able to do this as well, and your writes will become semi-random (since parts of files will have to be written into whatever non-sequential blocks happen to be free).  So we should also make sure that our projection fills no more than 80% of the pool, so that we can ensure the same level of performance throughout the system&#8217;s life.  Dividing 729.6 by 0.8 gives us a final number of 912.</p>
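<p>As a rough sketch of how these pieces could be combined, the hypothetical helper below adds annualized growth and growth-adjusted churn to the current dataset for a number of years, then sizes the pool so that the projection fills no more than 80% of it.  Counting churn as consumed space assumes that snapshots are retained for the whole period, and the function and parameter names are invented for this example.</p>

```python
def required_pool_size(current, annual_growth, annual_churn, years, max_fill=0.8):
    """Pool size that keeps the projected dataset at or below max_fill.

    Assumes linear growth and that churned data stays on disk (for example
    in snapshots) for the whole planning period.
    """
    projected = current + years * (annual_growth + annual_churn)
    return projected / max_fill


# Values from the worked example: dataset 136, growth 238.5, churn 58.3.
two_year_pool = required_pool_size(136, 238.5, 58.3, years=2)
print("minimum pool size:", round(two_year_pool, 1))
```

<p>If you do not keep snapshots, pass 0 for annual_churn, and adjust max_fill if you are comfortable with a fuller pool.</p>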
<p>Please keep in mind that these formulas will allow you to work through your own sizing exercise, but I am not suggesting that you have a weekly growth of 4.5 and a weekly churn of 1.1.  This was merely an example, and you will need to use your own numbers to come up with projections which are applicable to your scenario.  You will also notice that in my example I did not use a unit of size (MB, GB, TB, PB).  I did this intentionally so as to not confuse you.  These formulas will work regardless of your unit, just ensure you use the same one in all of your calculations.</p>
<p>Happy Sizing!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.allanglesit.com/2012/02/adventures-in-zfs-storage-sizing/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Securing SSH with Publicly Accessible Servers</title>
		<link>http://blog.allanglesit.com/2012/02/securing-ssh-with-publicly-accessible-servers/</link>
		<comments>http://blog.allanglesit.com/2012/02/securing-ssh-with-publicly-accessible-servers/#comments</comments>
		<pubDate>Thu, 16 Feb 2012 12:00:17 +0000</pubDate>
		<dc:creator>matthew.mattoon</dc:creator>
				<category><![CDATA[How To]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Solaris]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ssh]]></category>
		<category><![CDATA[sshkeys]]></category>
		<category><![CDATA[ubuntu]]></category>

		<guid isPermaLink="false">http://blog.allanglesit.com/?p=851</guid>
		<description><![CDATA[Just about every IT environment has some sort of remotely managed environment which requires that they have SSH open to the Internet.  Perhaps this is a VPS, dedicated server, or colocation.  Regardless of your reason, the fact is that there are times when you need to have SSH open to the Internet. However [...]]]></description>
			<content:encoded><![CDATA[<p>Just about every IT environment has some sort of remotely managed environment which requires that they have SSH open to the Internet.  Perhaps this is a VPS, dedicated server, or colocation.  Regardless of your reason, the fact is that there are times when you need to have SSH open to the Internet.</p>
<p>However, just because it is necessary doesn&#8217;t mean that you should simply open it up and walk away&#8230;  One look at your auth.log will show why.</p>
<pre class="qoate-code"> # tail /var/log/auth.log
Feb  5 21:12:32 mail sshd[3105]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=69-214-200-46.pool.ukrtel.net
Feb  5 21:12:34 mail sshd[3105]: Failed password for invalid user admin from 46.200.214.69 port 1714 ssh2
Feb  5 21:12:35 mail sshd[3107]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=69-214-200-46.pool.ukrtel.net  user=root
Feb  5 21:12:38 mail sshd[3107]: Failed password for root from 46.200.214.69 port 1817 ssh2</pre>
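<p>If you want a quick feel for how much of this is going on, a short Python sketch like the one below can count failed password attempts per source address.  It is only an illustration: the regular expression is written against the two &#8220;Failed password&#8221; lines shown above, and other sshd versions may log in a slightly different format.</p>

```python
import re
from collections import Counter

# Matches lines such as:
#   "... sshd[3107]: Failed password for root from 46.200.214.69 port 1817 ssh2"
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+")


def failed_logins_by_host(lines):
    """Count failed password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


sample = [
    "Feb  5 21:12:34 mail sshd[3105]: Failed password for invalid user admin from 46.200.214.69 port 1714 ssh2",
    "Feb  5 21:12:38 mail sshd[3107]: Failed password for root from 46.200.214.69 port 1817 ssh2",
]
print(failed_logins_by_host(sample))
```

<p>To run it against a real log, pass an open file handle for /var/log/auth.log instead of the sample list.</p>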
<p>The fact of the matter is that once you have SSH open, folks will try to hit it and brute force their way in.  There are several ways to deal with this.</p>
<p>1) Move SSH to a non-standard port number.<br />
2) Disable Root Logins over SSH and use non-standard usernames.<br />
3) Use fail2ban to proactively disconnect users who are attempting to brute force your server.<br />
4) Use SSH keys to secure your logins, and disable all password authentication.</p>
<p>Now options 1 and 2 are really just garbage.  They don&#8217;t actually do anything with regard to security, they simply obfuscate your environment in the hopes that your attacker will give up and go home.  Option 3 is good, and option 4 is a sledgehammer which is crude in its implementation.</p>
<p>Instead I will be implementing a modified version of option 4.  What we will be doing is allowing public key authentication from the entire Internet while allowing password authentication only from trusted IP space, which can be an entire IP block or single IPs.  This relies on the Match directive in sshd_config, so please make sure your version of sshd supports it before attempting this.</p>
<p><strong>Configure SSH Keys</strong></p>
<pre class="qoate-code"># ssh-keygen -t rsa</pre>
<p><strong>Copy SSH Keys to your Server</strong></p>
<pre class="qoate-code"># ssh-copy-id root@yourserver.domain</pre>
<p><strong>Validate SSH Key Authentication</strong></p>
<p>If you did it properly then you will not be asked for a password.</p>
<pre class="qoate-code"># ssh root@yourserver.domain</pre>
<p><strong>Secure SSH to Only Allow Password Auth from Trusted Networks</strong></p>
<p>Before you mess with this part ensure you have an alternate way of getting into the system in case you make a mistake which keeps you from using SSH.</p>
<p>In /etc/ssh/sshd_config, disable password authentication.</p>
<pre class="qoate-code"># cat /etc/ssh/sshd_config
...
PasswordAuthentication no
...</pre>
<p>Then add the following to the end of the file, where x.x.x.x and y.y.y.y are your trusted IP addresses.  Here /24 represents an entire block of addresses and /32 represents a single IP address.</p>
<pre class="qoate-code"># cat /etc/ssh/sshd_config
...
Match Address x.x.x.x/24,y.y.y.y/32
    PasswordAuthentication yes</pre>
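<p>To sanity check which clients would fall inside your trusted networks before you lock yourself out, the CIDR matching can be mimicked with Python&#8217;s ipaddress module.  This is a sketch of the idea only, not how sshd itself evaluates Match blocks, and the networks below are documentation-range placeholders rather than real addresses.</p>

```python
import ipaddress


def password_auth_allowed(client_ip, trusted_networks):
    """Return True if client_ip falls inside any of the trusted CIDR blocks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in trusted_networks)


# Placeholder networks: one /24 block and one single host (/32).
trusted = ["192.0.2.0/24", "198.51.100.7/32"]

for ip in ["192.0.2.55", "198.51.100.7", "198.51.100.8"]:
    print(ip, password_auth_allowed(ip, trusted))
```

<p>Any address that comes back False would be limited to public key authentication under the configuration above.</p>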
<p><strong>Validate your Configuration</strong></p>
<p>Once everything is configured, check the configuration for syntax errors, then restart ssh and test both key and password logins.</p>
<pre class="qoate-code"># /usr/sbin/sshd -t
# /etc/init.d/ssh restart</pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.allanglesit.com/2012/02/securing-ssh-with-publicly-accessible-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>vCenter: VPXD Crash</title>
		<link>http://blog.allanglesit.com/2012/02/vcenter-vpxd-crash/</link>
		<comments>http://blog.allanglesit.com/2012/02/vcenter-vpxd-crash/#comments</comments>
		<pubDate>Wed, 15 Feb 2012 12:00:13 +0000</pubDate>
		<dc:creator>matthew.mattoon</dc:creator>
				<category><![CDATA[How To]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[db2]]></category>
		<category><![CDATA[vcenter]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[vmware]]></category>

		<guid isPermaLink="false">http://blog.allanglesit.com/?p=837</guid>
		<description><![CDATA[Recently I was trying to resolve an issue on a small vCenter 5.0.0 deployment.  Basically the primary symptom of the issue was that the vmware-vpxd service was crashing and needing to be restarted frequently. The first thing I did was examine the log files, while the service was still in a failed state. [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I was trying to resolve an issue on a small vCenter 5.0.0 deployment.  Basically the primary symptom of the issue was that the vmware-vpxd service was crashing and needing to be restarted frequently.</p>
<p>The first thing I did was examine the log files, while the service was still in a failed state.</p>
<pre class="qoate-code"># cat /var/log/vmware/vpx/vpxd.log</pre>
<p>I won&#8217;t bore you with the entire log, however I will put some of the more helpful lines from the log file here.  Perhaps they will help someone find this article if they are having a similar issue and not gaining any traction.</p>
<pre class="qoate-code">Unable to get exclusive access to vCenter repository.
Error deleting from VPX_SESSIONL
Alert:false@ /build/mts/release/bora-455964/bora/vpx/vpxd/util/vpxdVdb.cpp:408
Registry Item DB 5 value is ''
Failed to intialize VMware VirtualCenter. Shutting down...</pre>
<p>Now this environment is based off of SLES 11 SP1 with an embedded DB2 database.  This might affect other configurations, but I don&#8217;t know.  So use at your own risk.</p>
<p>Here is the VMware KB which &#8220;describes&#8221; the fix&#8230;  But it is not very verbose.</p>
<p>http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&#038;cmd=displayKC&#038;externalId=1021581</p>
<p>Here is the step-by-step.</p>
<p>Please keep in mind, this article contains two fixes for two similarly logged but different problems.  Fix one should only work if your service crashes and is unable to restart.  Fix two is actually what solved the problem that we were experiencing.</p>
<p><strong>FIX ONE</strong> &#8211; Which did not work for me.</p>
<p><strong>Gather the Database Connection Information</strong></p>
<pre class="qoate-code"># cat /etc/vmware-vpx/embedded_db.cfg
EMB_DB_INSTALL_DIR='/opt/db2/current'
EMB_DB_HOME='/opt/db2/home/'
EMB_DB_TYPE='db2'
EMB_DB_SERVER='127.0.0.1'
EMB_DB_PORT='50000'
EMB_DB_INSTANCE='VCDB'
EMB_DB_USER='vc'
EMB_DB_PASSWORD='YOURPASSWORDHERE'</pre>
<p><strong>Stop the Service</strong></p>
<pre class="qoate-code"># /etc/init.d/vmware-vpxd stop
Stopping VMware vSphere Profile-Driven Storage Service...
Stopped VMware vSphere Profile-Driven Storage Service.
Stopping VMware Inventory Service...
Stopped VMware Inventory Service.
Stopping tomcat: success
Stopping vmware-vpxd: success
Shutting down ldap-server..done</pre>
<p><strong>Connect to DB2 Database</strong></p>
<pre class="qoate-code"># /opt/db2/v9.7.2/bin/db2
(c) Copyright IBM Corporation 1993,2007
Command Line Processor for DB2 Client 9.7.2

db2 =&gt; connect to vcdb user vc using YOURPASSWORDHERE

Database Connection Information

Database server        = DB2/LINUXX8664 9.7.2
SQL authorization ID   = VC
Local database alias   = VCDB</pre>
<p><strong>List Tables to Test Database Connection</strong></p>
<pre class="qoate-code">db2 =&gt; list tables
...
VPX_SESSIONLOCK                 VC              T     2011-11-16-17.28.59.418535
...
287 record(s) selected.</pre>
<p><strong>Delete All Rows from VPX_SESSIONLOCK Table</strong></p>
<pre class="qoate-code">db2 =&gt; delete from vpx_sessionlock
DB20000I  The SQL command completed successfully.</pre>
<p><strong>Quit and Restart Service</strong></p>
<pre class="qoate-code">db2 =&gt; quit
# /etc/init.d/vmware-vpxd start
Waiting for embedded DB2 database to startup: .success
Cleaning session lock table: success
Verifying EULA acceptance: success
Starting ldap-server..done
Starting vmware-vpxd: success
Waiting for vpxd to initialize: ....success
Starting tomcat: success
Executing startup scripts...
Starting VMware Inventory Service...Waiting for VMware Inventory Service........................
VMware Inventory Service started.

Starting VMware vSphere Profile-Driven Storage Service...Waiting for VMware vSphere Profile-Driven Storage Service......
VMware vSphere Profile-Driven Storage Service started.</pre>
<p><strong>FIX TWO</strong> &#8211; Which did work for me.</p>
<p>Apparently the problem here is that DB2 is configured with a transaction log which is too small, which results in the service crashing.  According to VMware&#8217;s KB [http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&amp;cmd=displayKC&amp;externalId=2006812] you should expect errors indicating that the transaction log is full, however we did not see these at all.  So we really went out on a limb, simply because transaction logs made logical sense as the culprit.  VMware also recommends the sizes that we are going to be configuring the logs to in these steps, so it was relatively low risk.  Keep in mind that we have only 3 ESXi hosts and maybe 30 guests, so this can happen regardless of your size.</p>
<p><strong>Stop the Service</strong></p>
<pre class="qoate-code"># service vmware-vpxd stop</pre>
<p><strong>Change to db2inst1 and Connect to Database</strong></p>
<pre class="qoate-code"># su db2inst1
db2inst1@vcenter00:~&gt; db2 connect to vcdb</pre>
<p><strong>Retrieve Current Database Configuration</strong></p>
<pre class="qoate-code">db2inst1@vcenter00:~&gt; db2 get database config for vcdb | grep LOG
Catalog cache size (4KB)              (CATALOGCACHE_SZ) = 300
Log buffer size (4KB)                        (LOGBUFSZ) = 256
Log file size (4KB)                         (LOGFILSIZ) = 1024
Number of primary log files                (LOGPRIMARY) = 13
Number of secondary log files               (LOGSECOND) = 4
Changed path to log files                  (NEWLOGPATH) =
Path to log files                                       = /storage/db/db2/home/db2inst1/db2inst1/NODE0000/SQL00001/SQLOGDIR/
Overflow log path                     (OVERFLOWLOGPATH) =
Mirror log path                         (MIRRORLOGPATH) =
Block log on disk full                (BLK_LOG_DSK_FUL) = NO
Block non logged operations            (BLOCKNONLOGGED) = NO
Percent max primary log space by transaction  (MAX_LOG) = 0
Num. of active log files for 1 active UOW(NUM_LOG_SPAN) = 0
Log retain for recovery enabled             (LOGRETAIN) = OFF
First log archive method                 (LOGARCHMETH1) = OFF
Options for logarchmeth1                  (LOGARCHOPT1) =
Second log archive method                (LOGARCHMETH2) = OFF
Options for logarchmeth2                  (LOGARCHOPT2) =
Log pages during index build            (LOGINDEXBUILD) = OFF</pre>
<p><strong>Update Database Configuration</strong></p>
<pre class="qoate-code">db2inst1@vcenter00:~&gt; db2 update db CFG FOR VCDB USING logprimary 16 logsecond 112 logfilsiz 8192
DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully.
SQL1363W  One or more of the parameters submitted for immediate modification
were not changed dynamically. For these configuration parameters, all
applications must disconnect from this database before the changes become
effective.</pre>
<p><strong>Validate Database Configuration</strong></p>
<pre class="qoate-code">db2inst1@vcenter00:~&gt; db2 get database config for vcdb | grep LOG
Catalog cache size (4KB)              (CATALOGCACHE_SZ) = 300
Log buffer size (4KB)                        (LOGBUFSZ) = 256
Log file size (4KB)                         (LOGFILSIZ) = 8192
Number of primary log files                (LOGPRIMARY) = 16
Number of secondary log files               (LOGSECOND) = 112
Changed path to log files                  (NEWLOGPATH) =
Path to log files                                       = /storage/db/db2/home/db2inst1/db2inst1/NODE0000/SQL00001/SQLOGDIR/
Overflow log path                     (OVERFLOWLOGPATH) =
Mirror log path                         (MIRRORLOGPATH) =
Block log on disk full                (BLK_LOG_DSK_FUL) = NO
Block non logged operations            (BLOCKNONLOGGED) = NO
Percent max primary log space by transaction  (MAX_LOG) = 0
Num. of active log files for 1 active UOW(NUM_LOG_SPAN) = 0
Log retain for recovery enabled             (LOGRETAIN) = OFF
First log archive method                 (LOGARCHMETH1) = OFF
Options for logarchmeth1                  (LOGARCHOPT1) =
Second log archive method                (LOGARCHMETH2) = OFF
Options for logarchmeth2                  (LOGARCHOPT2) =
Log pages during index build            (LOGINDEXBUILD) = OFF</pre>
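<p>As a rough sanity check on what the new values mean for disk usage, the arithmetic can be sketched in the shell.  This assumes LOGFILSIZ is counted in 4KB pages (as the output above labels it) and ignores any per-file overhead; the function name is only for illustration.</p>
<pre class="qoate-code">#!/bin/sh
# Estimate the maximum log space, in MB, for a set of DB2 log settings.
# Arguments: logprimary logsecond logfilsiz (logfilsiz is in 4KB pages).
log_space_mb() {
    echo $(( ($1 + $2) * $3 * 4 / 1024 ))
}

log_space_mb 13 4 1024     # original settings, prints 68
log_space_mb 16 112 8192   # updated settings, prints 4096</pre>
<p>By this estimate the original settings top out at about 68MB of log files, while the new ones allow up to about 4096MB (4GB) if every secondary log file is in use, so make sure the log path has that much room.</p>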
<p><strong>Exit and Restart Service</strong></p>
<pre class="qoate-code">db2inst1@vcenter00:~&gt; exit
# service vmware-vpxd start</pre>
<p>So there you go.  Hopefully one of these methods will help you resolve this issue in your environment.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.allanglesit.com/2012/02/vcenter-vpxd-crash/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>KVM Guests: Graceful Shutdown</title>
		<link>http://blog.allanglesit.com/2012/02/kvm-guests-graceful-shutdown/</link>
		<comments>http://blog.allanglesit.com/2012/02/kvm-guests-graceful-shutdown/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 12:00:29 +0000</pubDate>
		<dc:creator>matthew.mattoon</dc:creator>
				<category><![CDATA[How To]]></category>
		<category><![CDATA[Linux-KVM]]></category>
		<category><![CDATA[Windows]]></category>
		<category><![CDATA[kvm]]></category>
		<category><![CDATA[kvm guests]]></category>
		<category><![CDATA[libvirt]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[shutdown]]></category>
		<category><![CDATA[virsh]]></category>
		<category><![CDATA[virtualization]]></category>
		<category><![CDATA[windows]]></category>

		<guid isPermaLink="false">http://blog.allanglesit.com/?p=819</guid>
		<description><![CDATA[So a fairly trivial (but critical) aspect of using KVM is of course performing graceful shutdowns of your domains, without having to reach inside of the guest to perform the shutdown.  Now when it comes to turning off your guests you have two ways of proceeding (with virsh)&#8230; Which is a HARD power off of [...]]]></description>
			<content:encoded><![CDATA[<p>So a fairly trivial (but critical) aspect of using KVM is of course performing graceful shutdowns of your domains, without having to reach inside of the guest to perform the shutdown.  Now when it comes to turning off your guests you have two ways of proceeding (with virsh)&#8230;</p>
<pre class="qoate-code"># virsh destroy vmname</pre>
<p>Which is a HARD power off of your guest.  This is the same as pulling the power cord from the wall: nothing is shut down, but the noise stops just the same.  This method can result in data corruption if writes are interrupted.</p>
<pre class="qoate-code"># virsh shutdown vmname</pre>
<p>Now in theory this will perform a graceful shutdown, or more precisely it sends a signal to the guest which results in the guest initiating a shutdown.  However, this is not guaranteed: if your virtual machine configuration or guest OS is not configured to receive and act on these signals, then nothing will happen.</p>
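<p>Because the shutdown request can be silently ignored, it can be worth wrapping it in a timeout.  The following is a minimal sketch, not a hardened tool: the function name, the default 120 second timeout and the 5 second polling interval are assumptions of mine, and it assumes a persistent (defined) guest for which <code>virsh domstate</code> reports "shut off" once it has stopped.  It returns 0 on a graceful shutdown, 1 if it had to fall back to a hard power off, and 2 if the shutdown request itself failed.</p>
<pre class="qoate-code">#!/bin/sh
# Ask a guest to shut down gracefully; fall back to a hard power off
# if it is still running after the timeout.
# Usage: graceful_shutdown vmname [timeout_seconds] [poll_seconds]
graceful_shutdown() {
    vm=$1
    timeout=${2:-120}
    interval=${3:-5}
    waited=0

    virsh shutdown "$vm" || return 2

    while [ "$waited" -lt "$timeout" ]; do
        # domstate prints "shut off" once the guest has stopped
        if [ "$(virsh domstate "$vm")" = "shut off" ]; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done

    # The guest ignored the request, so pull the virtual power cord
    virsh destroy "$vm"
    return 1
}</pre>
<p>For example, <code>graceful_shutdown vmname 300</code> would wait up to five minutes before giving up on the guest.</p>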
<p><strong>Enable ACPI on the Virtual Hardware</strong></p>
<p>Now the signal that is sent is an ACPI signal, which means that we need to configure the virtual machine to expose ACPI in its virtual hardware.  This is accomplished by adding &lt;acpi/&gt; inside the features section of the virtual machine&#8217;s XML file.</p>
<p>The following prints just the features block from the XML file; it does not make any changes.</p>
<pre class="qoate-code"># awk '/&lt;features&gt;/,/&lt;\/features&gt;/' /etc/libvirt/qemu/vmname.xml
&lt;features&gt;
&lt;acpi/&gt;
&lt;apic/&gt;
&lt;pae/&gt;
&lt;/features&gt;</pre>
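<p>If acpi is missing from that output, it needs to be added to the guest definition.  A minimal sketch of a check follows; the function names are my own, and the grep is a loose text match rather than real XML parsing.  It points at <code>virsh edit</code>, which lets libvirt validate and reload the definition instead of changing the file by hand; the change should take effect the next time the guest is started from a powered off state.</p>
<pre class="qoate-code">#!/bin/sh
# Succeeds if the named guest has ACPI enabled in its current definition.
has_acpi() {
    virsh dumpxml "$1" | grep -q "acpi/"
}

# Report the result, and point at virsh edit if the feature is missing.
check_acpi() {
    if has_acpi "$1"; then
        echo "$1: acpi enabled"
    else
        echo "$1: acpi missing, add it with: virsh edit $1"
    fi
}</pre>
<p>Running <code>check_acpi vmname</code> against each guest is a quick way to find the ones that will ignore a virsh shutdown for this reason.</p>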
<p><strong>Configure the Guest Operating System (Linux)</strong></p>
<p>Linux is pretty simple: just make sure that acpid is installed and running.</p>
<p>Using APT</p>
<pre class="qoate-code"># apt-get install acpid</pre>
<p>Using YUM</p>
<pre class="qoate-code"># yum install acpid</pre>
<p><strong>Configure the Guest Operating System (Windows)</strong></p>
<p>Post Server 2000 Microsoft decided that by default servers should not respond to any idiot in a Datacenter with at least one finger.  Which is not an unreasonable assumption.  This had the unintended consequence of hamstringing virtualization, since obviously virtual machines don&#8217;t have the weakness of a power button which can be &#8220;bumped&#8221; by a one-fingered idiot.</p>
<p>I have seen different behavior based on the version of the OS you use, as I assume Microsoft has tried to make the behavior a bit more friendly and workable.  Basically, this is what I have noticed: Windows 2003 has to be logged on fully for a virsh shutdown to be successful.  Windows 2008 has to have the console open and the user has to have pressed CTRL-ALT-DEL, but not necessarily logged in.  Windows 2008 R2 simply needs the console open.</p>
<p>Regardless, if you&#8217;d like to disable this protection on your VMs, there is a registry change which can be made (set the value below to 1).  Of course, this can also be applied via a group policy.</p>
<pre class="qoate-code">HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\System\Shutdownwithoutlogon

0=disabled
1=enabled</pre>
<p>This of course needs to be done on all of your Windows guests.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.allanglesit.com/2012/02/kvm-guests-graceful-shutdown/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Changing Face of Enterprise Support</title>
		<link>http://blog.allanglesit.com/2012/02/the-changing-face-of-enterprise-support/</link>
		<comments>http://blog.allanglesit.com/2012/02/the-changing-face-of-enterprise-support/#comments</comments>
		<pubDate>Fri, 03 Feb 2012 12:00:58 +0000</pubDate>
		<dc:creator>matthew.mattoon</dc:creator>
				<category><![CDATA[General Information]]></category>
		<category><![CDATA[Rants]]></category>
		<category><![CDATA[business practices]]></category>
		<category><![CDATA[customer care]]></category>
		<category><![CDATA[enterprise support]]></category>

		<guid isPermaLink="false">http://blog.allanglesit.com/?p=786</guid>
		<description><![CDATA[Now let me preface this post by saying&#8230; I am writing this article from my professional experiences, inside of a software company which provided enterprise support for its products, as well as from the perspective of the Enterprise Customer.  I am intentionally discarding all experiences with support as a home user, and even as a [...]]]></description>
			<content:encoded><![CDATA[<p>Now let me preface this post by saying&#8230;</p>
<p>I am writing this article from my professional experiences, inside of a software company which provided enterprise support for its products, as well as from the perspective of the Enterprise Customer.  I am intentionally discarding all experiences with support as a home user, and even as a small business owner.  This is specific to Enterprise Support, consumer support plays by different rules.</p>
<p>&nbsp;</p>
<p><strong>Value Your Customers&#8217; Time And Investments</strong></p>
<p>Your customers have invested in your products.  Once a customer buys a product they are no longer just a customer, they are a partner.  Your partners are extremely intelligent people who chose your product because of that aforementioned intelligence (if this is not true then why do you sell it?).  Your partner&#8217;s employees are also an investment, and if you waste those employees&#8217; time, you are robbing your partner of productivity which will <strong>NEVER</strong> be regained.</p>
<p>This is real money, folks.  If your partner has someone making $50/hour on a phone call with support and they spend 1 hour trying to resolve the issue, then not only is your customer losing the $50 that they are paying that employee, but there is another invisible $50 which is lost due to the lost productivity.  Your customer is willing to deal with this <strong>IF</strong> you keep the waste to a minimum.</p>
<p><strong>Don&#8217;t Make Support A Profit Center</strong></p>
<p>Support should never be a profit center.  Once an organization switches support from being a cost center (something that costs the company money) to a profit center (something that makes the company money), the company will start to make decisions which increase profitability and alienate customers.  This will result in more marketing dollars being spent to neutralize the ill effects of your support organization.  Don&#8217;t misunderstand me, I don&#8217;t have a problem with paying for support, and I am not advocating that it be free or even cheap.  Ultimately the support organization should <strong>NEARLY</strong> pay for itself (unless of course there is some large recall required).  What should <strong>NEVER</strong> happen is a customer being pressured to put a failed drive back in a system to see if it will rebuild.  This sort of thing might be common practice, but it shows the customer that we don&#8217;t value their data and their business.</p>
<p><strong>Market The Organization Not The Product</strong></p>
<p>Products come and products go&#8230;  A good support organization will actually pull in its own Enterprise Customers.  I would submit that a portion of your marketing dollars could actually be spent on making critical changes in your support organization.  This approach is invaluable when your organization is small and flexible, and the products don&#8217;t sell themselves.</p>
<p>Once the changes are made and you have a phenomenal support organization then you should spend a little bit of your marketing budget on advertising your support organization, since from an Enterprise Customer perspective it is all about risk mitigation and cost containment.  Highlight this, show the world the changes that you have made.</p>
<p><strong>Build Your Organization For Success</strong></p>
<p>If you are a US-based company, you had better make sure that if you use any overseas support, the staff speak English really well (this also goes for really deep accents from anywhere).  I don&#8217;t have a problem if the support center is in India or Mexico or Singapore or wherever, but if we cannot communicate clearly then forget it.  It won&#8217;t matter how good the support was; all I will remember is straining to understand what you were saying.  Remember that when I call you, I am already having a bad day at best, so don&#8217;t make it worse.  Your employees should always provide attentive customer service that is prompt and respectful.  However, they should also be technically adept, and there should be a clear and logical path of escalations.  If someone needs a script then let them go work for your competitor; your organization will be much better off.</p>
<p>&nbsp;</p>
<p>These four components will create a support organization which will actually help sell your product, instead of giving customers a reason to complain about your company.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.allanglesit.com/2012/02/the-changing-face-of-enterprise-support/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
