March 5th, 2012

One of our final validation points in an ongoing project has been enabling ESX to use a LAG on both the Management and VM Networks. The last time we tried this we could not get it to work, and it turns out the solution was actually very simple. The core of the issue was that we were attempting to use a dynamic (LACP) LAG, which the ESX standard vSwitch does not support; instead you need to use a static LAG.

Create New LAG

(switch01)# configure
(switch01) (Config)# port-channel esxhost00-vmnet
(switch01) (Config)# exit

Show New LAG

(switch01) #show port-channel

Logical Interface Group Id Port-Channel Name Link State Mbr Ports Active Ports
----------------- -------- ----------------- ---------- --------- ------------
0/1/11            11       esxhost00-vmnet   Down       

The command above gives us the logical interface (0/1/11), which we will need later. But if we look at the LAG with a different command, we can see the problem.

(switch01) #show port-channel all

Log.    Port-Channel           Adm.  Link Trap  STP            Mbr    Port   Port
Intf    Name             Link  Mode  Mode       Mode  Type     Ports  Speed  Active
------  ---------------  ----  ----  ---------  ----  -------  -----  -----  ------
lag 11  esxhost00-vmnet  Up    En.   En.        En.   Dynamic

The problem here is that the switch creates LAGs in dynamic (LACP) mode by default, whereas ESX requires static mode.
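If you have many LAGs to audit, a small script can flag the ones still in dynamic mode. The following is a minimal sketch. It assumes the whitespace-separated layout of the `show port-channel all` output shown above: interface, channel name, link, three mode columns, then type. Other firmware versions may lay the columns out differently, and the function name `find_dynamic_lags` is hypothetical.

```python
# Sketch: flag LAGs whose Type column is not "Static" in `show port-channel all`
# output. Assumes the column layout shown in this article; adjust for your firmware.

def find_dynamic_lags(output):
    """Return the names of LAGs whose Type column is not Static."""
    flagged = []
    for line in output.splitlines():
        tokens = line.split()
        # LAG rows start with "lag <id>"; header and member-continuation rows do not.
        if len(tokens) >= 8 and tokens[0] == "lag":
            name, lag_type = tokens[2], tokens[7]
            if lag_type != "Static":
                flagged.append(name)
    return flagged


sample = """\
lag 11 esxhost00-vmnet Up     En.  En.  En.    Dynamic
"""
print(find_dynamic_lags(sample))  # ['esxhost00-vmnet']
```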

Set the LAG Mode to Static

(switch01)# configure
(switch01) (Config)#interface 0/1/11
(switch01) (Interface 0/1/11)#port-channel static
(switch01) (Interface 0/1/11)#exit

Down the Switchport for the Standby Vmnic

(switch01)# configure
(switch01) (Config)#interface 2/0/2
(switch01) (Interface 2/0/2)#shutdown
(switch01) (Interface 2/0/2)#exit

Review Existing ESX Portgroup Configuration

# esxcli network vswitch standard portgroup policy failover get --portgroup-name="VM Network"
Load Balancing: srcport
Network Failure Detection: link
Notify Switches: true
Failback: true
Active Adapters: vmnic4
Standby Adapters: vmnic6
Unused Adapters:
Override Vswitch Load Balancing: false
Override Vswitch Network Failure Detection: false
Override Vswitch Notify Switches: false
Override Vswitch Failback: false
Override Vswitch Uplinks: true
# esxcli network vswitch standard portgroup policy failover get --portgroup-name="Management Network"
Load Balancing: srcport
Network Failure Detection: link
Notify Switches: true
Failback: true
Active Adapters: vmnic4
Standby Adapters: vmnic6
Unused Adapters:
Override Vswitch Load Balancing: false
Override Vswitch Network Failure Detection: false
Override Vswitch Notify Switches: false
Override Vswitch Failback: false
Override Vswitch Uplinks: true

Change Existing ESX Portgroup Configuration

# esxcli network vswitch standard portgroup policy failover set --portgroup-name="VM Network" --load-balancing=iphash --failure-detection=link --notify-switches true --failback false --active-uplinks=vmnic4,vmnic6
# esxcli network vswitch standard portgroup policy failover set --portgroup-name="Management Network" --load-balancing=iphash --failure-detection=link --notify-switches true --failback false --active-uplinks=vmnic4,vmnic6
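When several port groups need identical settings, generating the commands avoids typos between them. The following minimal sketch only builds the command strings and runs nothing. It reuses exactly the flags shown above. The helper name `failover_set_command` is hypothetical.

```python
import shlex

def failover_set_command(portgroup, uplinks):
    """Return one `esxcli ... failover set` command line as a shell-quoted string."""
    args = [
        "esxcli", "network", "vswitch", "standard", "portgroup", "policy",
        "failover", "set",
        "--portgroup-name=" + portgroup,
        "--load-balancing=iphash",      # IP hash, to match the static LAG on the switch
        "--failure-detection=link",
        "--notify-switches", "true",
        "--failback", "false",
        "--active-uplinks=" + ",".join(uplinks),  # every LAG member active, none standby
    ]
    return shlex.join(args)  # quotes arguments that contain spaces

for pg in ("VM Network", "Management Network"):
    print(failover_set_command(pg, ["vmnic4", "vmnic6"]))
```

`shlex.join` puts the port group name in single quotes, which the shell treats the same as the double quotes used in the commands above.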

Review New ESX Portgroup Configuration

# esxcli network vswitch standard portgroup policy failover get --portgroup-name="VM Network"
Load Balancing: iphash
Network Failure Detection: link
Notify Switches: true
Failback: false
Active Adapters: vmnic4, vmnic6
Standby Adapters:
Unused Adapters:
Override Vswitch Load Balancing: true
Override Vswitch Network Failure Detection: true
Override Vswitch Notify Switches: true
Override Vswitch Failback: true
Override Vswitch Uplinks: true
# esxcli network vswitch standard portgroup policy failover get --portgroup-name="Management Network"
Load Balancing: iphash
Network Failure Detection: link
Notify Switches: true
Failback: false
Active Adapters: vmnic4, vmnic6
Standby Adapters:
Unused Adapters:
Override Vswitch Load Balancing: true
Override Vswitch Network Failure Detection: true
Override Vswitch Notify Switches: true
Override Vswitch Failback: true
Override Vswitch Uplinks: true
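The before and after listings can also be checked programmatically. Below is a minimal sketch that parses the `Key: value` lines of `failover get` output. It tests for the setup used in this post: IP hash load balancing, at least two active uplinks, and no standby adapters. The function names and the "two or more uplinks" threshold are illustrative assumptions.

```python
def parse_failover(output):
    """Turn 'Key: value' lines into a dict; empty values stay as ''."""
    policy = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            policy[key.strip()] = value.strip()
    return policy

def adapters(policy, field):
    """Split a comma-separated adapter list, ignoring blanks."""
    return [a.strip() for a in policy.get(field, "").split(",") if a.strip()]

def matches_static_lag_setup(policy):
    """True if the port group uses IP hash with two or more active uplinks and no standby."""
    return (
        policy.get("Load Balancing") == "iphash"
        and len(adapters(policy, "Active Adapters")) >= 2
        and not adapters(policy, "Standby Adapters")
    )

before = parse_failover("""\
Load Balancing: srcport
Active Adapters: vmnic4
Standby Adapters: vmnic6
""")
after = parse_failover("""\
Load Balancing: iphash
Active Adapters: vmnic4, vmnic6
Standby Adapters:
""")
print(matches_static_lag_setup(before), matches_static_lag_setup(after))  # False True
```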

Add Both Switchports to the LAG

(switch01)# configure
(switch01) (Config)#interface 2/0/3
(switch01) (Interface 2/0/3)#addport 0/1/11
(switch01) (Interface 2/0/3)#exit
(switch01) (Config)#interface 2/0/2
(switch01) (Interface 2/0/2)#addport 0/1/11
(switch01) (Interface 2/0/2)#exit
(switch01) (Config)#exit

Review LAG Configuration

(switch01) #show port-channel all

Log.    Port-Channel           Adm.  Link Trap  STP            Mbr    Port   Port
Intf    Name             Link  Mode  Mode       Mode  Type     Ports  Speed  Active
------  ---------------  ----  ----  ---------  ----  -------  -----  -----  ------
lag 11  esxhost00-vmnet  Up    En.   En.        En.   Static   2/0/3  Auto   True
                                                               2/0/2  Auto   False

Verify ESX Connectivity

Now that the LAG is configured and operational, albeit on only one active port, connectivity will either work or it won't. If you have connectivity, you should be able to safely enable the disabled port (in my case 2/0/2) without any problems with your traffic. If you do not, you probably have a problem such as VLANs not being configured correctly on the LAG; we ran into exactly that when we put this into production.

Up the Switchport for the Standby Vmnic

(switch01) #configure
(switch01) (Config)#interface 2/0/2
(switch01) (Interface 2/0/2)#no shutdown
(switch01) (Interface 2/0/2)#exit
(switch01) (Config)#exit

Review Final LAG Configuration

(switch01) #show port-channel all

Log.    Port-Channel           Adm.  Link Trap  STP            Mbr    Port   Port
Intf    Name             Link  Mode  Mode       Mode  Type     Ports  Speed  Active
------  ---------------  ----  ----  ---------  ----  -------  -----  -----  ------
lag 11  esxhost00-vmnet  Up    En.   En.        En.   Static   2/0/3  Auto   True
                                                               2/0/2  Auto   True

Notice that the LAG interface is now up with both member ports active. Assuming you still have basic connectivity to each of your vSwitches, you should be good to go.
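The final check can be scripted as well. Below is a minimal sketch that reads the member ports and their Port Active values from `show port-channel all`. It assumes the layout shown above: the first member sits on the LAG row after the Type column, and further members appear on continuation rows holding port, speed and active state. The function name is hypothetical.

```python
def lag_member_status(output):
    """Map each member port to True/False from the Port Active column."""
    status = {}
    for line in output.splitlines():
        tokens = line.split()
        if tokens[:1] == ["lag"] and len(tokens) >= 11:
            # First member is on the LAG row itself, after the Type column.
            port, active = tokens[8], tokens[10]
        elif len(tokens) == 3 and tokens[0].count("/") == 2:
            # Further members appear on continuation rows: port, speed, active.
            port, active = tokens[0], tokens[2]
        else:
            continue
        status[port] = active == "True"
    return status

final = """\
lag 11 esxhost00-vmnet Up     En.  En.  En.    Static  2/0/3 Auto      True
                                                       2/0/2 Auto      True
"""
members = lag_member_status(final)
print(members, all(members.values()))  # {'2/0/3': True, '2/0/2': True} True
```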

February 29th, 2012

We use a couple of Dell PowerConnect 6224 switches in our storage network, and when they were first deployed we ran into an issue with the stacking modules: when we brought up the switches, both saw themselves as master and would not “stack” correctly. It turns out the solution was rather easy, but not well documented. As a side note, a friend of mine ran into the same issue with a couple of PowerConnect 6248 switches, so it appears the issue spans at least the whole family.

The core of the issue is that the previous generation had a 10G module and a separate stacking module. With the current generation, Dell decided to ship a single module that can operate as either 10G Ethernet or stacking, and you set its mode to toggle between the two. This is good for Dell, because it lets them maintain smaller inventories. It is also good for you, because if your needs change you can re-purpose the same module for a completely different use case.

I am not a “switch” guy; I find the stuff boring in most cases. However, this was an interesting enough issue with a simple enough fix that I decided to document it. This article assumes that you know how to connect to the console of your switch and that you are familiar with the basics of configuring a switch.

Enter Enable Mode

> enable 

View Current Stack Configuration

# show stack-port
                  Configured  Running
                  Stack       Stack       Link         Link
Unit  Interface   Mode        Mode        Status       Speed (Gb/s)
----  ---------   ----------  ----------  -----------  ------------
1     xg1         Ethernet    Ethernet    Link Down    Unknown
1     xg2         Ethernet    Ethernet    Link Down    Unknown
1     xg3         Ethernet    Ethernet    Link Down    Unknown
1     xg4         Ethernet    Ethernet    Link Down    Unknown

Enter Configuration Mode

# config

Enter Stack Configuration Mode

(config)# stack

Change From Ethernet To Stack Mode

(config-stack)# stack-port 1/xg1 stack
(config-stack)# stack-port 1/xg2 stack
(config-stack)# exit

Verify New Stack Configuration

# show stack-port
                  Configured  Running
                  Stack       Stack       Link         Link
Unit  Interface   Mode        Mode        Status       Speed (Gb/s)
----  ---------   ----------  ----------  -----------  ------------
1     xg1         Stack       Ethernet    Link Down    Unknown
1     xg2         Stack       Ethernet    Link Down    Unknown
1     xg3         Ethernet    Ethernet    Link Down    Unknown
1     xg4         Ethernet    Ethernet    Link Down    Unknown

Once you are done with one switch, rinse and repeat on the other switch(es). One last thing to notice: the output shows two different modes, “configured stack mode” and “running stack mode”. The difference reflects that we have made the configuration changes needed to flip the modules over, but the switch has not yet been rebooted to load the new configuration. So when you have some downtime, reboot the switches. The first one that comes up will become the master, and the second one will end up with a default configuration. In our case the only environment-specific configuration is jumbo frames, which can be configured switch-wide.
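With more than a couple of units, comparing the two mode columns by eye gets tedious. The minimal sketch below lists the stack ports whose configured mode differs from their running mode, i.e. the ports that still need a reboot for the change to take effect. It assumes the whitespace-separated `show stack-port` layout shown above. The function name is hypothetical.

```python
def ports_pending_reboot(output):
    """Return 'unit/interface' for rows whose configured and running modes differ."""
    pending = []
    for line in output.splitlines():
        tokens = line.split()
        # Data rows start with a numeric unit id followed by an xg interface.
        if len(tokens) >= 4 and tokens[0].isdigit() and tokens[1].startswith("xg"):
            unit, interface, configured, running = tokens[:4]
            if configured != running:
                pending.append(f"{unit}/{interface}")
    return pending

after_change = """\
1    xg1              Stack      Ethernet   Link Down    Unknown
1    xg2              Stack      Ethernet   Link Down    Unknown
1    xg3              Ethernet   Ethernet   Link Down    Unknown
1    xg4              Ethernet   Ethernet   Link Down    Unknown
"""
print(ports_pending_reboot(after_change))  # ['1/xg1', '1/xg2']
```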

Enter Enable Mode

> en

Enter Configuration Mode

# configure

Enter Interface Configuration Mode For All Ethernet Interfaces

(config)# interface range ethernet all

Enable Jumbo Frames

(config-if)# mtu 9216
(config-if)# exit

Well that does it.  You now have a stacked Dell PowerConnect with Jumbo Frames enabled.

February 28th, 2012

A lot of my time is spent handling storage from the array side. Recently, however, I needed to test both sides of the process to run some exclusionary tests against a fully adaptable environment without impacting our production environments. So I ended up using OpenIndiana 151a on both the target and initiator sides of the connection. This is how you handle the initiator side.

Identify the Initiator IQN

First we need to identify the IQN of the iSCSI initiator; we need it to configure security on the iSCSI target.

# iscsiadm list initiator-node
Initiator node name: iqn.1986-03.com.sun:01:c0ca4ce904ff.4f45b957
Initiator node alias: openindiana
Login Parameters (Default/Configured):
Header Digest: NONE/-
Data Digest: NONE/-
Authentication Type: NONE
RADIUS Server: NONE
RADIUS Access: disabled
Tunable Parameters (Default/Configured):
Session Login Response Time: 60/-
Maximum Connection Retry Time: 180/-
Login Retry Time Interval: 60/-
Configured Sessions: 1

Configure iSCSI Static Discovery

Here we configure the type of discovery that we want to use; in our case, static discovery.

# iscsiadm modify discovery --static enable

Next we add the actual discovery parameters: the IQN of the target, and the IP address and port that we will connect to.

# iscsiadm add static-config iqn.2010-09.org.openindiana:02:1a7a530f-8508-4736-f269-d6363a8cb5e6,10.0.0.21:3260
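The static-config argument is simply `<target IQN>,<IP>:<port>`, so it is easy to get wrong by hand. Below is a minimal sketch that checks the pieces before assembling them. The IQN pattern is a simplified version of the `iqn.<yyyy-mm>.<reversed domain>[:<string>]` form from the iSCSI naming rules (RFC 3720). It handles only `iqn.` names and IPv4 addresses. The helper name is hypothetical.

```python
import ipaddress
import re

# Simplified iqn.<yyyy-mm>.<reversed domain>[:<unique string>] check (lowercase only).
IQN_RE = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9][a-z0-9.-]*(:.+)?$")

def static_config_arg(target_iqn, ip, port=3260):
    """Build the '<target-iqn>,<ipv4>:<port>' argument used with static-config."""
    if not IQN_RE.match(target_iqn):
        raise ValueError(f"not an iqn-format name: {target_iqn!r}")
    ipaddress.IPv4Address(ip)  # raises a ValueError subclass if the address is malformed
    return f"{target_iqn},{ip}:{port}"

target = "iqn.2010-09.org.openindiana:02:1a7a530f-8508-4736-f269-d6363a8cb5e6"
print(static_config_arg(target, "10.0.0.21"))
```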

View Available LUNs

With everything working on both sides of the connection, we can now see the details of our connection and our available LUNs.

# iscsiadm list target -vS
Target: iqn.2010-09.org.openindiana:02:1a7a530f-8508-4736-f269-d6363a8cb5e6
Alias: -
TPGT: 1
ISID: 4000002a0000
Connections: 1
CID: 0
IP address (Local): 10.0.0.22:59749
IP address (Peer): 10.0.0.21:3260
Discovery Method: Static
Login Parameters (Negotiated):
Data Sequence In Order: yes
Data PDU In Order: yes
Default Time To Retain: 20
Default Time To Wait: 2
Error Recovery Level: 0
First Burst Length: 65536
Immediate Data: yes
Initial Ready To Transfer (R2T): yes
Max Burst Length: 262144
Max Outstanding R2T: 1
Max Receive Data Segment Length: 32768
Max Connections: 1
Header Digest: NONE
Data Digest: NONE

LUN: 0
Vendor:  OI
Product: COMSTAR
OS Device Name: /dev/rdsk/c4t600144F0B7EA490000004F471B750001d0s2

Notice LUN 0 in the output above. This is our LUN; if you expose multiple LUNs, you will see each of them listed here. It is not yet a usable disk; for that, we need to use the format utility to identify the disk name.

# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
0. c3t0d0 <ATA-WDCWD5003ABYX-1-1S02 cyl 60798 alt 2 hd 255 sec 63>
/pci@0,0/pci1028,4dd@1f,2/disk@0,0
1. c4t600144F0B7EA490000004F471B750001d0 <OI-COMSTAR-1.0 cyl 1303 alt 2 hd 255 sec 63>
/scsi_vhci/disk@g600144f0b7ea490000004f471b750001

We now have a completed iSCSI session, and we have a usable volume on our client side.
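The disk name that format shows matches the OS Device Name from `iscsiadm list target -vS`. Only the `/dev/rdsk/` prefix and the slice suffix (`s2`) are stripped. For this device, the target part of the name is also the LU name reported on the array side. Below is a minimal sketch of that mapping. It assumes the `c#t<GUID>d#s#` form seen in this example, and the function name is hypothetical.

```python
import re

# Matches raw device paths like /dev/rdsk/c4t<GUID>d0s2 (controller, target, disk, slice).
DEVICE_RE = re.compile(r"^/dev/rdsk/(?P<disk>c\d+t(?P<guid>[0-9A-F]+)d\d+)s\d+$")

def disk_from_device(os_device):
    """Return (format disk name, LU GUID) for an OS Device Name."""
    m = DEVICE_RE.match(os_device)
    if m is None:
        raise ValueError(f"unexpected device path: {os_device!r}")
    return m.group("disk"), m.group("guid")

disk, guid = disk_from_device("/dev/rdsk/c4t600144F0B7EA490000004F471B750001d0s2")
print(disk)  # c4t600144F0B7EA490000004F471B750001d0
print(guid)  # 600144F0B7EA490000004F471B750001
```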


The most critical thing about any storage implementation is flexibility: you need to be able to adapt your solution to changing environmental considerations. The nice thing about a ZFS-based solution is that all of the building blocks are available. If you want to use it as a NAS device, you can use CIFS or NFS. If you want to provision Fibre Channel, that is available too. Even some of the more sought-after SAN features are included: replication, deduplication, thin provisioning, and many more. In today's article we are going to discuss how to set up iSCSI targets. I personally am not a fan of iSCSI (I have been bitten by it too many times). However, you really can't argue with the flexibility of being able to quickly provision storage on existing or readily available hardware. The best thing about this process is that it is very similar to provisioning Fibre Channel targets. If you are familiar with that process (covered in an earlier article), you will be able to quickly adapt to the iSCSI variation.

I did my testing on OpenIndiana 151a, which had up to date packages as of Feb 23, 2012.  The process should be largely similar if not the same on Solaris 11 Express and Solaris 11 GA.

Install Prerequisite Packages

# pkg install network/iscsi/target

Start SCSI Target Mode Framework Service

# svcadm enable svc:/system/stmf:default

Start iSCSI Target Service

# svcadm enable svc:/network/iscsi/target:default

Display the Status of the SCSI Target Service

The ALUA Status is what allows a LUN to be served via FC and iSCSI at the same time.  We do not require or want it.

# stmfadm list-state
Operational Status: online
Config Status     : initialized
ALUA Status       : disabled
ALUA Node         : 0

Create a File System (Volume) to be Provisioned

Before we can present a LUN, we need to create a ZFS volume (a zvol) so that it can be used as a block device.

# zfs create -V 10G rpool/testvol

Notice the lack of a mount point; this is because the file system was provisioned as a volume.

# zfs list rpool/testvol
NAME            USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol  10.3G   436G    16K  -

Create a LUN for the Volume

Now that we have created a volume, we need to make STMF (the SCSI Target Mode Framework) aware of it so that it can hand the volume out as a block device.

# sbdadm create-lu /dev/zvol/rdsk/rpool/testvol
Created the following LU:

GUID                    DATA SIZE           SOURCE
--------------------------------  -------------------  ----------------
600144f0b7ea490000004f471b750001  10737418240          /dev/zvol/rdsk/rpool/testvol

We will need the path and the name of the LUN. Both are available from the create-lu output (as SOURCE and GUID), but stmfadm presents them in a more readable format.

# stmfadm list-lu -v
LU Name: 600144F0B7EA490000004F471B750001
Operational Status: Online
Provider Name     : sbd
Alias             : /dev/zvol/rdsk/rpool/testvol
View Entry Count  : 0
Data File         : /dev/zvol/rdsk/rpool/testvol
Meta File         : not set
Size              : 10737418240
Block Size        : 512
Management URL    : not set
Vendor ID         : OI
Product ID        : COMSTAR
Serial Num        : not set
Write Protect     : Disabled
Writeback Cache   : Enabled
Access State      : Active
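If you script around this listing, the `Key : value` layout is straightforward to parse. Below is a minimal sketch that groups fields by LU Name. It also confirms that the `10G` passed to `zfs create -V` corresponds to 10 GiB (10 × 1024³ bytes), which matches the Size field. The function name is hypothetical, and the sample is a trimmed copy of the output above.

```python
def parse_list_lu(output):
    """Group 'Key : value' lines from `stmfadm list-lu -v` by LU Name."""
    lus, current = {}, None
    for line in output.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "LU Name":
            current = lus.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return lus

sample = """\
LU Name: 600144F0B7EA490000004F471B750001
Operational Status: Online
Data File         : /dev/zvol/rdsk/rpool/testvol
Size              : 10737418240
Block Size        : 512
"""
lu = parse_list_lu(sample)["600144F0B7EA490000004F471B750001"]
size = int(lu["Size"])
print(lu["Data File"])                # /dev/zvol/rdsk/rpool/testvol
print(size == 10 * 1024 ** 3)         # True
print(size // int(lu["Block Size"]))  # 20971520 blocks of 512 bytes
```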

Create an iSCSI Qualified Name (iqn) Target

This simply creates the target iqn for the machine we will be serving out iSCSI volumes from.

# itadm create-target
Target iqn.2010-09.org.openindiana:02:1a7a530f-8508-4736-f269-d6363a8cb5e6 successfully created

Put the New Target in Offline Mode

Before we can add the target to the target group we are about to create (to control storage views), we must take the target offline.

# stmfadm offline-target iqn.2010-09.org.openindiana:02:1a7a530f-8508-4736-f269-d6363a8cb5e6

Create a Target Group

This is the group that will control the target side of the storage connection; in other words, it lives on your array, and I like to name it accordingly. So if your array were named “array001”, I would give your target group the same name.

# stmfadm create-tg array001

Add the Target to the Target Group

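# stmfadm add-tg-member -g array001 iqn.2010-09.org.openindiana:02:1a7a530f-8508-4736-f269-d6363a8cb5e6

Once the target is a member of the group, bring it back online.

# stmfadm online-target iqn.2010-09.org.openindiana:02:1a7a530f-8508-4736-f269-d6363a8cb5e6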

Review the Target Group Configuration

# stmfadm list-tg -v
Target Group: array001
Member: iqn.2010-09.org.openindiana:02:1a7a530f-8508-4736-f269-d6363a8cb5e6

Create a Host Group

This is the group that identifies where the traffic is coming from, so for organization I like to name it after the server on the initiator side of the connection. So if your server were “server001”, I would give your host group a matching name. This helps keep things nice and tidy and makes your views much more legible.

# stmfadm create-hg server001

Add the Host to the Host Group

Here you will need the IQN of the initiator (the client side of the storage connection), as reported by iscsiadm list initiator-node on the client, and insert it into this command.

# stmfadm add-hg-member -g server001 iqn.1986-03.com.sun:01:c0ca4ce904ff.4f45b957

Create a View

A view is what ties everything together: we take the target group, the host group, and the LUN and put them in a view. If a connection does not match both the target group and the host group, it does not get to see the LUN. This is important when you have multiple servers accessing your array (which you will). You don't want multiple servers writing to the same LUNs at the same time without certain precautions being taken first, so we want to block that behavior.
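The masking rule described above can be modelled in a few lines. The following is a simplified model: a LUN is visible only when the initiator belongs to the view's host group and the target belongs to its target group. It is not how COMSTAR implements views internally. The `View` class and `visible_lus` function are hypothetical, and so is the second initiator IQN. The group names, IQNs and LU name come from this article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class View:
    target_group: str
    host_group: str
    lu_name: str

def visible_lus(initiator_iqn, target_iqn, views, host_groups, target_groups):
    """LUs an initiator can see through a target: both group memberships must match."""
    return sorted(
        v.lu_name
        for v in views
        if initiator_iqn in host_groups.get(v.host_group, set())
        and target_iqn in target_groups.get(v.target_group, set())
    )

TARGET = "iqn.2010-09.org.openindiana:02:1a7a530f-8508-4736-f269-d6363a8cb5e6"
SERVER001 = "iqn.1986-03.com.sun:01:c0ca4ce904ff.4f45b957"
OTHER = "iqn.1986-03.com.sun:01:hypothetical-other-host"  # hypothetical second server

host_groups = {"server001": {SERVER001}}
target_groups = {"array001": {TARGET}}
views = [View("array001", "server001", "600144F0B7EA490000004F471B750001")]

print(visible_lus(SERVER001, TARGET, views, host_groups, target_groups))
# ['600144F0B7EA490000004F471B750001']
print(visible_lus(OTHER, TARGET, views, host_groups, target_groups))
# []
```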

# stmfadm add-view -t array001 -h server001 600144F0B7EA490000004F471B750001

Create a Target Portal Group

The Target Portal Group is what the initiator actually connects to in order to find the storage. Here we need to use the storage IP address of array001.

# itadm create-tpg array001portal 10.0.0.21:3260

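Create the iSCSI Device Nodes (Initiator Side)

Once static discovery has been enabled and configured on the initiator, run the following on the initiator to create device nodes for the newly discovered iSCSI LUNs.

# devfsadm -i iscsi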

At this point, once you have configured the iSCSI initiator on the client side, you should be able to see your iSCSI block device.

That wraps it up.  Put a neat little bow on it and send it on its way.

A few things to remember with iSCSI and your ZFS box. You will need a mirrored ZIL (mirrored because you don't want to lose data) to counteract the performance penalties associated with synchronous writes. Most importantly, iSCSI is a protocol suited to flexibility rather than performance. If you need performance, you should be considering Fibre Channel. Remember that a good chunk of the cost comes from the fabric, so if you already have a fabric for another SAN, you can easily connect another array to it. It is a Storage Area NETWORK, after all.

February 21st, 2012

One of the most complex parts of storage in general, and ZFS in particular, is correctly assessing the amount and types of storage that you will need to meet your requirements. You can of course just purchase what you can afford and hope for the best. But after reading this article you should be able to determine, relatively easily, approximately how much storage to plan for based on a few factors from your existing environment. This article primarily focuses on ZFS; however, a lot of it applies to other storage platforms as well.

Factor One – Use Case

The use case is the single most important factor; it tells you how you plan on using the data. Is this going to serve files (NFS or CIFS) or block devices (Fibre Channel or iSCSI)? Or perhaps it will act as a consolidation point for multiple sources of data in a backup scenario? The answer will largely determine whether you need a ZIL or cache device and whether you need spares. These of course take up slots where data disks could go, so you will want to factor them into your larger capacity planning.

Physical Capacity

Obviously the machine that you buy will have a limit to how many disks you can fit in it; beyond that, you can plan on adding expansion chassis. You need to understand how your zpools will be laid out and the types of disks you will be using, so that you can see how much data you will be able to squeeze onto the spindles. One easy thing to overlook is the need for drive bays for other purposes (spares, log, cache, system drives, etc.). If you plan on expanding with an additional chassis, will you have used up all of your expansion slots with various cards (10Gb, Fibre Channel, NICs, SSD, etc.)? Also keep in mind that capacity is not the only metric to consider: your workload will determine how performant the solution must be to meet requirements. For example, a backup server will not require the same performance as a storage head that hands out block devices with operating systems installed on them (read: virtual machine environments).
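To see how reserved bays eat into capacity, here is a rough estimate for a pool built from equal-width raidz vdevs. It is a minimal sketch that ignores ZFS metadata, allocation padding and TB-versus-TiB differences. Parity is 1, 2 or 3 for raidz1/2/3, and a two-way mirror works as width 2 with parity 1. The 24-bay chassis, 2 TB disks and the reservations below are hypothetical.

```python
def usable_capacity(total_bays, reserved_bays, vdev_width, parity, disk_size):
    """Return (vdev_count, leftover_bays, approximate_usable_capacity)."""
    free_bays = total_bays - sum(reserved_bays.values())
    if free_bays < vdev_width:
        raise ValueError("not enough free bays for a single vdev")
    vdevs, leftover = divmod(free_bays, vdev_width)
    # Each raidz vdev stores data on (width - parity) of its disks.
    return vdevs, leftover, vdevs * (vdev_width - parity) * disk_size

reserved = {"spares": 2, "log": 2, "cache": 1, "system": 2}  # hypothetical layout
print(usable_capacity(24, reserved, vdev_width=8, parity=2, disk_size=2))
# (2, 1, 24)
```

Here the seven reserved bays leave 17, which only fits two 8-wide raidz2 vdevs, so one bay goes unused.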

Spare Devices

Depending on your use case, you might need to allocate some drives as spares to give you time to procure and/or replace failed drives. This simply protects you from additional failures. If I have the space, I like to have 1 spare for every 12-15 drives. This is really straightforward. For any large system, just make sure the company accepts the risks involved with your decision. If they are not willing to accept the risks, then provide them with an option that protects them from that risk. They will then either pay for the lower-risk system or accept the risk of the cheaper system.
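The "1 spare for every 12-15 drives" rule of thumb is a simple round-up. Below is a minimal sketch; the 40-drive pool is a hypothetical example.

```python
import math

def spares_needed(drives, drives_per_spare):
    """One hot spare for every `drives_per_spare` drives, rounded up."""
    return math.ceil(drives / drives_per_spare)

for ratio in (12, 15):
    print(ratio, spares_needed(40, ratio))  # 12 -> 4 spares, 15 -> 3 spares
```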

Cache Devices

Cache devices (L2ARC) are a really quick and cheap way to increase your read IOPS on a system-wide basis. The L2ARC works in combination with the ARC, which is where all of your memory goes in ZFS. Data being read is cached in the ARC, and blocks that are about to be evicted from the ARC are copied to the cache device. Subsequent read requests for that data can then be served directly from the cache instead of the pool disks. A cache device becomes really necessary if you are using deduplication. When you enable deduplication, every subsequently written block is indexed in the dedup table (DDT). Before a write can be committed, it must be compared against the entries already in the table to see whether an identical block exists; if one does, only a pointer to the existing block is written. Initially this will not pose a problem, because your dedup table will be relatively small and can be held in the 25% of the ARC that is set aside for metadata. At some point, however, the table will outgrow that space. If you do not have an L2ARC, dedup table lookups will then have to go to the pool disks, which holds up writes; in other words, it will be slow. Since a cache device's purpose is to speed up reads, look for an SSD that is slanted towards read performance. I like the Crucial M4 for this purpose, but keep in mind that it had a serious firmware bug in a previous version. Make sure your disk is running firmware v0309, and update it if it isn't.
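To judge when the dedup table will outgrow the ARC's metadata share, a back-of-the-envelope estimate helps. Below is a minimal sketch. It assumes the worst case, where every block is unique, and uses an assumed in-memory cost per DDT entry (`bytes_per_entry`). Figures around 320 bytes are commonly quoted, but that number is an assumption, not something from this article or a fixed ZFS constant. The 1 TiB of data and 64 KiB average block size are also hypothetical.

```python
GiB = 1024 ** 3
TiB = 1024 ** 4

def ddt_estimate(unique_data_bytes, avg_block_bytes, bytes_per_entry):
    """Worst-case dedup table size: one entry per (unique) block."""
    blocks = unique_data_bytes // avg_block_bytes
    return blocks * bytes_per_entry

def ddt_fits_in_arc(ddt_bytes, arc_bytes, metadata_fraction=0.25):
    """Compare against the share of the ARC available to metadata (25% per the text)."""
    return ddt_bytes <= arc_bytes * metadata_fraction

ddt = ddt_estimate(1 * TiB, 64 * 1024, bytes_per_entry=320)  # assumed entry size
print(ddt / GiB)                       # 5.0
print(ddt_fits_in_arc(ddt, 16 * GiB))  # False: only 4 GiB for metadata
print(ddt_fits_in_arc(ddt, 32 * GiB))  # True: 8 GiB for metadata
```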

Log Devices

Log devices (the ZFS Intent Log, or ZIL) are a really cool way to speed up technologies that use synchronous writes. A synchronous write is not acknowledged until the data has been committed to stable storage, which essentially serializes the connection. Most commonly you will find synchronous writes in databases, NFS and iSCSI. If you are planning on using one of these technologies, you will want to consider a mirrored ZIL device. Now I am sure someone out there is saying, “Whoa, mirrored? But that takes up two slots, man…”, and you would be correct. However, keep in mind that if your ZIL fails, it may hold data that has been acknowledged but not yet committed to the pool disks (read: data loss). As such, you really ought to have a mirrored ZIL or none at all. You can also stripe across multiple mirrors for the ZIL if you have a heavier synchronous write workload. Since log devices are meant to improve writes (albeit a certain type of write), look for an SSD that is slanted towards better write performance. I tend to like the OCZ Vertex 3 for this purpose.

Factor Two – Data Growth

Now we get into actually sizing data. When you try to project growth, you need to make sure that you (1) use an appropriately sized sample set of data, and (2) factor in any relevant business or technical factors that could have skewed your sample. I like to size my storage based on backup size: simply grab the size of your full backups for the data in question. If you are going to use this system to serve block devices, collecting this data becomes a bit more complex. You are then dealing with machine-sprawl growth, which can be harder to account for. Here are a few formulas to get you started (make sure you use the same unit throughout, i.e. B, KB, MB, GB, or TB).

Calculate Weekly Growth

(full_week1 - full_week0) = growth_week1
(full_week2 - full_week1) = growth_week2

Calculate Weekly Growth Rate

(growth_week1 / full_week0) = growth_rate_week1

Average Weekly Growth

(growth_week1 + growth_week2 + growth_week3 + growth_week4) / number_of_weeks = average_weekly_growth

Average Weekly Growth Rate

(growth_rate_week1 + growth_rate_week2 + growth_rate_week3 + growth_rate_week4) / number_of_weeks = average_weekly_growth_rate

Annualized Weekly Growth

(average_weekly_growth * 53) = annual_growth
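The growth formulas above translate directly into code. Below is a minimal sketch that takes a list of weekly full-backup sizes, oldest first. It uses the article's annualization factor of 53 weeks. The function names are hypothetical, and the sample data are the weekly fulls from the example that follows.

```python
WEEKS_PER_YEAR = 53  # the article's annualization factor

def mean(values):
    return sum(values) / len(values)

def weekly_growth(fulls):
    """growth_weekN = full_weekN - full_week(N-1)"""
    return [curr - prev for prev, curr in zip(fulls, fulls[1:])]

def weekly_growth_rates(fulls):
    """growth_rate_weekN = growth_weekN / full_week(N-1)"""
    return [(curr - prev) / prev for prev, curr in zip(fulls, fulls[1:])]

def annual_growth(fulls):
    """average_weekly_growth * 53"""
    return mean(weekly_growth(fulls)) * WEEKS_PER_YEAR

fulls = [100, 105, 120, 122, 122, 118, 130, 135, 136]  # week0 .. week8
print(weekly_growth(fulls))                        # [5, 15, 2, 0, -4, 12, 5, 1]
print(mean(weekly_growth(fulls)))                  # 4.5
print(round(mean(weekly_growth_rates(fulls)), 3))  # 0.041
print(annual_growth(fulls))                        # 238.5
```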

For example, suppose our weekly full backups look like this…

week0 100
week1 105
week2 120
week3 122
week4 122
week5 118 data expired
week6 130 added new servers
week7 135
week8 136

Notice that at a couple of points in this example I saw fit to make notes: once when our dataset decreased, and once when it grew more sharply than the trend up to that point. I made these notes to point out that if you have a very large deviation, you might want to make some sort of adjustment so as not to skew your results. In this case both changes happened in the regular course of business, so they should legitimately be considered part of the growth curve. However, suppose your backups of some data had been failing and you were already accounting for that data in another way. You would not want to double count it when it came back into the backups in the middle of your sample.

Weekly Growth

week1 (105 - 100) = 5
week2 (120 - 105) = 15
week3 (122 - 120) = 2
week4 (122 - 122) = 0
week5 (118 - 122) = -4
week6 (130 - 118) = 12
week7 (135 - 130) = 5
week8 (136 - 135) = 1

Weekly Growth Rate

week1 (5 / 100) = .05 = 5%
week2 (15 / 105) = .143 = 14.3%
week3 (2 / 120) = .017 = 1.7%
week4 (0 / 122) = 0 = 0%
week5 (-4 / 122) = -.033 = -3.3%
week6 (12 / 118) = .102 = 10.2%
week7 (5 / 130) = .038 = 3.8%
week8 (1 / 135) = .007 = .7%

Average Weekly Growth

(5 + 15 + 2 + 0 + -4 + 12 + 5 + 1) / 8 = 4.5

Average Weekly Growth Rate

(5 + 14.3 + 1.7 + 0 + -3.3 + 10.2 + 3.8 + .7) / 8 = 4.05 ≈ 4.1%

Annualized Growth

(4.5 * 53) = 238.5

As you can see from this example, the numbers add up quickly: in a year we have easily doubled our dataset. Why is this important? Most businesses perform budgeting at the beginning of the year, so expenses need to be planned out, and any expansion that you perform will need to _at least_ last through the year. Either way, you need to know how long you can reasonably expect to use this hardware at its capacity before having to look at upgrades. Your project will generally be regarded as successful if it meets the technical requirements with minimal involvement from the business (having to ask for more money). That is especially true if you can tell them up front that the storage will need to be expanded in x number of months.

Factor Three – Data Churn

Churn is freaking scary when it comes to calculating capacity requirements. Churn is the amount of data that changes in a given week. Some churn is actually growth, so we will want to keep that in mind when performing our analysis. It is amazing the levels of churn that some companies have, and this is especially disconcerting if you plan on using snapshots or send/receive as a form of backup. We still use backups to calculate churn, but this time we use our mid-week backups instead of our fulls. If you are using incremental backups, total all of the incrementals during the week to get that week's churn. If you are using differentials, you can use the final differential of the week.

The tricky thing about churn is that growth is churn, but churn is not growth. So after we calculate our growth and our churn, we subtract our growth from our churn to get our actual churn; otherwise we will be double counting our growth. I don't bother doing this until we get to the averages and the annualized numbers. You will notice that these formulas are largely the same as the growth formulas. Just remember that you need to calculate them against your aggregated incrementals or your final differential to get valid numbers.

Calculate Weekly Churn (differentials)

diff_week1 = churn_week1

Calculate Weekly Churn (incrementals)

(inc1_week1 + inc2_week1 + inc3_week1 + inc4_week1 + inc5_week1) = churn_week1

Calculate Weekly Churn Rate

(churn_week1 / full_week0) = churn_rate_week1

Average Weekly Churn

(churn_week1 + churn_week2 + churn_week3 + churn_week4) / number_of_weeks = average_weekly_churn

Average Weekly Churn Rate

(churn_rate_week1 + churn_rate_week2 + churn_rate_week3 + churn_rate_week4) / number_of_weeks = average_weekly_churn_rate

Annualized Weekly Churn

(average_weekly_churn * 53) = annual_churn
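The churn formulas, including the growth adjustment described above, can be sketched the same way. This minimal sketch takes weekly churn figures and the weekly fulls they are measured against. The function names are hypothetical, and the sample numbers are those of the worked example that follows.

```python
WEEKS_PER_YEAR = 53  # the article's annualization factor

def mean(values):
    return sum(values) / len(values)

def churn_from_incrementals(incrementals):
    """With incrementals, a week's churn is the sum of that week's incrementals."""
    return sum(incrementals)

def churn_rates(weekly_churn, fulls):
    """churn_rate_weekN = churn_weekN / full_week(N-1); zip pairs them up."""
    return [c / full for c, full in zip(weekly_churn, fulls)]

def growth_adjusted_churn(weekly_churn, weekly_growth):
    """Subtract average growth from average churn, then annualize."""
    adjusted = mean(weekly_churn) - mean(weekly_growth)
    return adjusted, adjusted * WEEKS_PER_YEAR

fulls = [100, 105, 120, 122, 122, 118, 130, 135, 136]
growth = [5, 15, 2, 0, -4, 12, 5, 1]
churn = [8, 6, 5, 2, 2, 15, 6, 1]
print(round(mean(churn_rates(churn, fulls)), 3))  # 0.049
print(growth_adjusted_churn(churn, growth))       # (1.125, 59.625)
```

Without intermediate rounding, the growth-adjusted weekly churn comes to 1.125 and the annualized churn to about 59.6. The worked example below rounds to 1.1 before annualizing, which gives 58.3.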

Expanding on our previous example, suppose our weekly churn looked like this…

week1 8
week2 6
week3 5
week4 2
week5 2
week6 15
week7 6
week8 1

Weekly Churn Rate

week1 (8 / 100) = .08 = 8%
week2 (6 / 105) = .057 = 5.7%
week3 (5 / 120) = .042 = 4.2%
week4 (2 / 122) = .016 = 1.6%
week5 (2 / 122) = .016 = 1.6%
week6 (15 / 118) = .127 = 12.7%
week7 (6 / 130) = .046 = 4.6%
week8 (1 / 135) = .007 = .7%

Average Weekly Churn

(8 + 6 + 5 + 2 + 2 + 15 + 6 + 1) / 8 = 5.6

This is only partly correct: we still need to account for the growth we have already included, by subtracting the average growth from our average churn.

(5.6 - 4.5) = 1.1

Giving us a growth-adjusted weekly average churn of 1.1.

Average Weekly Churn Rate

(8 + 5.7 + 4.2 + 1.6 + 1.6 + 12.7 + 4.6 + .7) / 8 = 4.9%

Again we must remove our growth, this time using the average weekly growth rate (4.1%).

(4.9 - 4.1) = 0.8

Giving us a growth-adjusted weekly average churn rate of 0.8%

Annualized Churn

(1.1 * 53) = 58.3

Now that we have calculated our storage needs, we can start to work out our final configuration based on the goals of our project. In our example, our current dataset is 136 and our annualized growth is 238.5, so in one year our dataset will grow by roughly 175% (238.5 / 136). If our goal were a system that would not need any upgrades (based on current use cases) for the first two years, we would need room for a minimum of 136 + (2 * 238.5) = 613.

One final consideration when sizing ZFS is pool capacity. ZFS uses copy-on-write to turn random writes into sequential writes, which is very good for performance. However, once a pool exceeds about 80% full, ZFS cannot do this as well, and your writes become semi-random, since parts of files have to be written into whichever non-sequential blocks happen to be free. So we should also keep our projected data at or below 80% of the pool, to ensure the same level of performance throughout the system's life. That means dividing our projection by 0.8, which gives us a final number of (613 / 0.8) = 766.3.

Please keep in mind that these formulas will let you work through your own sizing exercise. I am not suggesting that you will see an average weekly growth of 4.5 or a growth-adjusted churn of 1.1; these were merely example figures, and you will need to use your own numbers to come up with projections applicable to your scenario. You will also notice that I did not use a size unit (MB, GB, TB, PB) in my example. I did this intentionally so as not to confuse you: these formulas work regardless of unit, as long as you use the same unit in all of your calculations.

Happy Sizing!

 
