June 25th, 2014

Today we are going over one of the fundamental aspects of administering a DNS infrastructure: maintaining consistency across multiple replicas.  The simplest method of accomplishing this is a master/slave relationship.  Essentially you have a read-write (master) copy and a read-only (slave) copy of the zone, and the slave looks to the master for information on changes.

Basic BIND Configuration

Here we are going to look at the key components of a BIND configuration.  These should be the same regardless of whether the server is a master or a slave.

# cat /etc/named.conf
options {
 directory "/var/named";
 version "unknown";
 pid-file "/var/named/named.pid";
 auth-nxdomain no;
 forwarders { 8.8.8.8; 8.8.4.4; }; // these are the forwarders to be used for all non-authoritative queries.
 allow-transfer { "none"; }; // this is always none, we enable transfers on a zone-by-zone basis in each zone definition.
 allow-notify { "none"; }; // this is always none, we enable notifications on a zone-by-zone basis in each zone definition.
 allow-recursion { 127.0.0.1; 192.168.1.0/24; }; // include all networks you want to be able to recursively perform queries.
 notify explicit; // this turns off the default behavior of all servers sending notifies to all servers with an NS record.
 also-notify { 192.168.1.12; 192.168.1.13; }; // this turns on notifications for specific servers (so master would notify slaves).
};

Creating ACL Statements

Here we can get a little more fancy.  The acl statement is primarily used for BIND views; however, it also works as shorthand for repetitive items.  Here we are going to use it to define the CIDR notation of our slave servers.

 acl slaves { 192.168.1.12/32; 192.168.1.13/32; };

Required Zones

Now we get into setting up our required zones.

zone "." {
 type hint;
 file "root.servers";
};

zone "localhost" {
 type master;
 file "localhost.db";
 allow-update{ none; };
};

zone "0.0.127.in-addr.arpa" {
 type master;
 file "localhost.rev.db";
 allow-update{ none; };
};

Configure Master Zones

Here is the definition for our master zones, in this case domain.local and 192.168.1.x.  I have not included the zone files themselves, but as long as you have a working zone file you can use this method to keep it in sync across multiple servers.

# cat /etc/named.conf
zone "domain.local" {
 type master;
 file "master/domain.db";
 allow-transfer { slaves; };
};

zone "1.168.192.in-addr.arpa" {
 type master;
 file "master/192.168.1.rev.db";
 allow-transfer { slaves; };
};
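
For reference, here is a rough sketch of what a minimal zone file for domain.local might look like; the host names and the address assignments are hypothetical, chosen to match the master (192.168.1.11) and slave (192.168.1.12) addresses used elsewhere in this post.

# cat /var/named/master/domain.db
$TTL 3h
@ IN SOA ns01.domain.local. hostmaster.domain.local. (
 2014062401 ; se = serial number
 12h ; ref = refresh
 15m ; ret = update retry
 3w ; ex = expiry
 3h ; min = minimum
 )
 IN NS ns01.domain.local.
 IN NS ns02.domain.local.
ns01 IN A 192.168.1.11
ns02 IN A 192.168.1.12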

Configure Slave Zones

Define our slave zones.  Here we need to define where the master is and which servers are allowed to send notifies, which in this case is the master.  Another thing to notice is that the zone files are not kept in the standard location.

# cat /etc/named.conf
zone "domain.local" {
 type slave;
 file "/var/cache/named/domain.db";
 masters { 192.168.1.11; };
 allow-notify { 192.168.1.11; };
};

zone "1.168.192.in-addr.arpa" {
 type slave;
 file "/var/cache/named/192.168.1.rev.db";
 masters { 192.168.1.11; };
 allow-notify { 192.168.1.11; };
};

When dealing with slave zones, it is really important to be disciplined about updating the serial numbers in the zone files.  The numbering scheme really doesn’t matter as long as it always increases, but I tend to use YYYYMMDD##, so the first update of June 24, 2014 would have a serial number of 2014062401, the second would be 2014062402, and so on.
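
One way to confirm that a slave has picked up a change is to compare the SOA serial on the master and a slave.  A quick check (assuming dig is available and using the addresses from the configuration above) might look like this:

# dig @192.168.1.11 domain.local SOA +short
# dig @192.168.1.12 domain.local SOA +short

If the serial in both answers matches, the slave is in sync; if not, the transfer has not happened yet.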

Check Your Configuration

We can use the named-checkconf utility to parse the configuration and ensure that it is correct (from a syntax and structure perspective).  There is also a counterpart for zone files, named-checkzone.

# named-checkconf /etc/named.conf
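
named-checkzone takes the zone name and the path to the zone file.  For example, checking the master copy of domain.local defined above (the file lives under the /var/named directory set in the options block) might look like this:

# named-checkzone domain.local /var/named/master/domain.db
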
June 19th, 2014

Generally on this blog I tend to be far more technologist than pundit, in that I normally don’t write about something unless it involves doing something technically interesting.  However, in this case I think a bit of discussion about what happened and why might be in order for the architecture-minded out there.

What Happened?

Let me first start by saying that I have no inside information here; I have never been a customer of codespaces.com, and have never in any capacity performed any work for them.  The information I have is all pulled from the Google cache of their site.  According to codespaces.com, on Tuesday June 17, 2014 they suffered a DDoS attack, and it appears that as a result of this attack an attacker was able to gain access to their EC2 control panel.  The attacker’s ultimate goal was extortion; however, when codespaces.com attempted to regain control, the attacker still had access to the control panel and began deleting backups, storage, and instances, either strategically or indiscriminately.  Keep in mind these are the current theories about what happened; it could turn out that the attacker already had access prior to the DDoS attack, perhaps from a previous compromise, and was just planning the next move.

How Was This Possible?

Some will say this is because of the cloud.  I couldn’t disagree more.  Quite simply, this is because of abstraction at all levels.  When I was first writing this I felt the proper word was virtualization, but I don’t want people to read that and assume that virtualization vendors are to blame; they are only part of the problem.  The problem has three parts: (1) centralized storage, (2) OS virtualization, and (3) simplification of services.  Of course, these three problems have also been huge advancements in our field.  Let’s take a moment and unpack that.

Centralized Storage

We have had massive improvements in storage technologies that have resulted in significant cost savings, because we can scale a consolidated storage platform (or platforms) rather than having to scale storage on every physical server in the organization.  This doesn’t just pose a problem in the cloud, though.  Think of the damage an attacker could do with access to the administrative interface of your SAN/NAS devices.

OS Virtualization

As with storage, we have been able to minimize waste at the compute level by leveraging virtualization; this allows us to utilize RAM and CPU more effectively and ensure that systems are utilized evenly across the entire organization.  This has resulted in arguably bigger cost savings than centralized storage when you factor in the software license savings for some operating systems and third-party software.  But again, the damage an attacker could do to an organization with access to this interface is immense.

Simplification of Services

Finally, the last reason for this is that services have gotten much simpler.  Frankly, this has removed the voodoo and black arts from the IT organization, because at the end of the day IT can provide a solution for $15K or a business user or developer can provision a service for $100/month.  The cost savings are fantastic, but the problem is that removing your own experts from the equation leaves the business vulnerable to risks that perhaps the business user or developer doesn’t fully understand.

Now cloud providers blend these three things together, which increases the value but also magnifies the risk.  This can be very bad; it can also be very good.  To ensure that we get more good than bad we need to architect around the weaknesses in the system.  This is true for all things.  For example, a pistol has a trigger, which is used to make the gun fire.  Firing a pistol can be a very dangerous operation, especially if you remove the trigger guard (the piece of metal, or other material, that encloses the trigger to minimize accidental firing).  Now imagine that police officers didn’t have trigger guards on their pistols; I would imagine we would have much higher rates of officers shooting themselves in the foot or leg when holstering their weapons.  This is something we take for granted, but it is a very real risk.  To illustrate this, take a gander at this pistol sans trigger guard and imagine carrying it in your waistband.

[Image: Colt Paterson revolver, which has no trigger guard]

Lessons Learned for AWS and Similar Providers

1) Provide better segmentation between operational services and backup services, perhaps with a grace period on storage deletions (24 to 48 hours).

2) Implement some sort of extortion protocol that the customer can request (this of course could be exploited as well) which would result in full validation of all accounts and keys by AWS and the customer in tandem.

Lessons Learned for DevOps Shops

1) Know your infrastructure, know its weaknesses, hire experts even if only for short term “health-checks” to make sure that the Dev doesn’t overcome the Ops.

2) If you don’t already have a full-time “Ops” guy, then you need one.  This should not be a DevOps guy or a Dev guy.  This is the person who will provide the voice of reason, and they should be architecture-minded.

3) If you are currently 100% AWS or 100% Azure or any other cloud provider then you are 100% vulnerable to the exact same type of exploitation.  If you want to maintain a 100% cloud model you must look at extending beyond a single provider.

4) The final component of any DR plan (or even a simple backup plan) should be a contingency for the event that the people involved missed something or, worse, were incompetent.  Simulate a failure, one you haven’t even contemplated; perhaps even set aside some time with folks just to brainstorm how to simulate it.

Lessons Learned for IT Professionals

1) Traditional IT invented the DevOps model; this is our fault.  Our lack of flexibility and responsiveness has left us open to having solutions built by someone who said that they could.

2) Be more flexible; quit driving the business into partially thought-out models.

3) Advise the business of the risks associated with any solution (cloud or not), and, most importantly, of the mitigations for those risks.  The fact is that for a lot of reasons cloud services are very attractive; find a way to incorporate them into the architecture without betting the business on them.

4) A repeat from DevOps because it matters here too…  The final component of any DR plan (or even a simple backup plan) should be a contingency for the event that the people involved missed something or, worse, were incompetent.  Simulate a failure, one you haven’t even contemplated; perhaps even set aside some time with folks just to brainstorm how to simulate it.

Lessons Learned for Customers of Cloud Providers

1) This failure was one of numerous types of failures that could have occurred.  The bottom line is that you need to take responsibility for your data.

2) If your business is actually dependent on something you need to fully understand the risk of allowing someone else control over it.

The Verdict

The bottom line here is that codespaces.com does bear responsibility in this situation, though the types of mistakes that were made are very easy to fall into, even for a seasoned IT professional.  However, we need to be careful not to throw the baby out with the bath water.  The fact of the matter is that a crime was committed, and it should be fully investigated and prosecuted, with as much as possible recovered for codespaces.com customers to make them whole again.  One final point: the folks who worked at codespaces.com have just gone through a very difficult, frankly once-in-a-lifetime experience, and they have learned lessons that your best guy hasn’t yet learned.  Do not hesitate to hire them because of this situation.  I personally think that, given a little time to process what happened, they will be very valuable resources for understanding how to ensure this doesn’t happen again (in other words, someone you want on your side if you utilize a cloud model, and frankly even if you don’t).  After all, I can think of thousands of mistakes (big and small) that have taught me lessons, but I really can’t think of any successes that have.

A final thought to leave you with…  Comfort breeds more of the same; Pain breeds change.

June 16th, 2014

Today I am going to document the process of installing BIND on Solaris 11.  I am using a Solaris 11.1 zone for this task, though nothing here is specific to zones, and it should work on previous releases of Solaris 11.  I have done this quite a few times and it is not a very intuitive process.  As of the time of this writing, the version of BIND in IPS is 9.6.3.7.2 (9.6-ESV-R7-P2).

Install BIND

Using the IPS we can install the BIND software.

# pkg install pkg://solaris/network/dns/bind
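
To confirm what was installed (the version should match the 9.6-ESV-R7-P2 noted above), pkg info can be used; a quick optional check:

# pkg info network/dns/bind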

Create Required Directories

We will need a few directories to have a functional name server.

# mkdir -p /var/named/master

Create Group to Run BIND

We don’t want to run the service as root, so we will need to create a group.  Disregard the warning; it is due to the gid being <100.

# groupadd -g 98 named
UX: groupadd: WARNING: gid 98 is reserved.

Create User to Run BIND

We will also create a user to run the software.  Disregard the warning; it is due to the uid being <100.

# useradd -g named -d /var/named -u 98 named
UX: useradd: WARNING: uid 98 is reserved.

Set Directory Ownership

Permissions will need to be modified as well to support the new user/group.

# chown -R named:named /var/named

Modify Start User

Here we are going to set the dns/server service (full FMRI: svc:/network/dns/server) to use the named user we created earlier.

# svccfg -s dns/server:default setprop start/user=named

Modify Start Group

Here we are going to set the dns/server service (full FMRI: svc:/network/dns/server) to use the named group we created earlier.

# svccfg -s dns/server:default setprop start/group=named
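
To verify both property changes before moving on, svcprop can read them back; a quick sanity check (not strictly required):

# svcprop -p start/user -p start/group dns/server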

Build Basic Configuration

Solaris 11 doesn’t come with a sample configuration, so we need to start from scratch.  To test our previous steps we will simply set up a caching, forward-only DNS server.  Keep in mind you will need to allow any additional networks that should be able to query, or name resolution will not work for them.

# cat /etc/named.conf
options {
 directory "/var/named";
 version "unknown";
 pid-file "/var/named/named.pid";
 forwarders { 8.8.8.8; 8.8.4.4; };
 forward only;
 allow-transfer { "none"; };
 allow-query {192.168.1.0/24; 192.168.2.0/24;};
};

zone "localhost" {
 type master;
 file "localhost.db";
 allow-update{none;};
};

zone "0.0.127.in-addr.arpa" {
 type master;
 file "localhost.rev.db";
 allow-update{none;};
};

Build Localhost Zone

# cat /var/named/localhost.db

$TTL 3h
@ IN SOA ns01.yourdomain.local. hostmaster.yourdomain.net. (
 2014061201 ; se = serial numbers
 12h ; ref = refresh
 15m ; ret = update retry
 3w ; ex = expiry
 3h ; min = minimum
 )

IN NS ns01.yourdomain.local.
 IN NS ns02.yourdomain.local.
 IN NS ns03.yourdomain.local.

@ IN NS @
 IN A 127.0.0.1

Build Localhost Reverse Zone

# cat /var/named/localhost.rev.db

$TTL 3h
@ IN SOA ns01.yourdomain.local. hostmaster.yourdomain.net. (
 2014061201 ; se = serial numbers
 12h ; ref = refresh
 15m ; ret = update retry
 3w ; ex = expiry
 3h ; min = minimum
 )

IN NS ns01.yourdomain.local.
 IN NS ns02.yourdomain.local.
 IN NS ns03.yourdomain.local.

1 IN PTR localhost.

Add BIND Authorization

This allows the named user to administer the dns/server service.

# usermod -A solaris.smf.manage.bind named
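
You can confirm the authorization was applied with the auths command; solaris.smf.manage.bind should appear in the list for the named user:

# auths named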

Refresh Service Configuration

This will re-read the configuration from the SMF and capture the user and group changes we made earlier.

# svcadm refresh dns/server

Start DNS Server

Now we are done and ready to validate.  Let’s start the service.

# svcadm enable dns/server

Check the Service

If the following command generates no output, the service has started properly with no errors.

# svcs -x

To confirm with actual output we can use the following command.

# svcs dns/server
STATE STIME FMRI
online 18:58:31 svc:/network/dns/server:default

But let’s assume for a second that we have a problem; this is what you will see.

# svcs -x
svc:/network/dns/server:default (BIND DNS server)
State: maintenance since June 14, 2014 06:57:34 PM CDT
Reason: Start method failed repeatedly, last exited with status 1.
See: http://support.oracle.com/msg/SMF-8000-KS
See: named(1M)
See: /var/svc/log/network-dns-server:default.log
Impact: This service is not running.

Now we have two things to check: the SMF service log file /var/svc/log/network-dns-server:default.log and /var/adm/messages.  Usually tailing the service log and grepping for named in the messages file will reveal the problem.  A common problem I have had is forgetting to set permissions properly or forgetting to refresh the service after the user and group are updated; the latter is visible by checking which user is running the named process.  If you do run into a problem, once it is fixed you can clear the maintenance state using the following.

# svcadm clear dns/server
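
For completeness, the diagnostic steps described above might look something like the following (a rough sketch; the log path matches the svcs -x output shown earlier):

# tail /var/svc/log/network-dns-server:default.log
# grep named /var/adm/messages | tail
# ps -ef | grep named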

 

May 19th, 2014

In this article we are going to go over Datalink Multipathing (DLMP) aggregations, available in Solaris.  DLMP is similar to IPMP; however, there are some key differences, the biggest being the layer at which it operates.  DLMP operates at the datalink layer of the OSI model, while IPMP operates at the network layer.  Because of that difference, DLMP opens up a lot of possibilities that were not practical with IPMP.  For example, if you had a requirement for redundant networking for a service, either IPMP or DLMP would meet that requirement very well.  However, if the service has to run inside a zone or a logical domain, the level of work with IPMP becomes much higher.  This is because these hypervisors assign datalink devices to their guests; since IPMP’s redundancy is built a layer higher, we need to assign multiple non-redundant interfaces to the guests and then build the interfaces and IPMP groups inside each guest.

Benefits of DLMP

  1. Virtualization friendly: you configure the aggregation on the control domain (or global zone) and hand out a redundant interface to a guest.
  2. Single command to configure a DLMP aggregation group.
  3. More portable; no switch-side configuration or support is required.

Drawbacks of DLMP

  1. Requires the same media speed for all members (differing speeds get put into standby and will not receive a failover).
  2. Requires a switch to mediate the connections, so no direct server-to-server connections.

Create a DLMP Aggregation

Creating a DLMP-based aggregation is really similar to creating an LACP aggregation; simply change the mode to dlmp.

# dladm create-aggr -m dlmp -l net0 -l net1 aggr0
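
As mentioned above, one of the big wins is handing the redundant link to a guest.  A minimal sketch of that (the VNIC name vnic0 is arbitrary, and the zone or logical domain configuration itself is not shown) is to create a VNIC over the aggregation and assign it to the guest:

# dladm create-vnic -l aggr0 vnic0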

Change an Existing Aggregation to DLMP

If you forgot to include the mode flag, that can be added using modify-aggr.

# dladm modify-aggr -m dlmp aggr0

Add Additional Interfaces to an Aggregation

Add additional interfaces (net2 and net3) to the existing aggregation group aggr0.

# dladm add-aggr -l net2 -l net3 aggr0

Remove Interfaces from an Aggregation

Remove interfaces (net2 and net3) from the existing aggregation group aggr0.  You cannot remove the last interface using this method.

# dladm remove-aggr -l net2 -l net3 aggr0

Delete an Aggregation Group

To delete an existing aggregation group, you can use the following command.

# dladm delete-aggr aggr0

Show Detailed Aggregation Information

The command below shows additional information about the aggregations.  In this case I find the speed, duplex, and portstate fields helpful.  Additionally, you can see the MAC address of each interface.

# dladm show-aggr -x
LINK       PORT           SPEED DUPLEX   STATE     ADDRESS            PORTSTATE
xgaggr1    --             10000Mb full   up        0:10:e0:2d:ec:a4   --
           net0           10000Mb full   up        0:10:e0:2d:ec:a4   attached
           net1           10000Mb full   up        0:10:e0:2d:ec:a5   attached
aggr1      --             1000Mb full    up        a0:36:9f:1e:b5:9c  --
           net8           1000Mb full    up        a0:36:9f:1e:b5:88  attached
           net4           1000Mb full    up        a0:36:9f:1e:b5:9c  attached

 

May 15th, 2014

One of the biggest benefits of migrating to the Service Management Facility is that we can introduce dependencies on a service.  These dependencies can be other services, which is very valuable, but they can also be file based.  This can be used in a number of ways.  Say, for example, you have some application trees that live on NFS mounts; if the mount hasn’t completed successfully, SMF will still attempt to start the service.  However, if we make the service dependent on a file, then if the file doesn’t exist SMF won’t even attempt to start the service.

MANUAL PROCESS

We can use svccfg to navigate the service tree and set the properties that we require.  The big gotcha comes when defining the file (the config_file/entities property below): notice we include file://localhost/ as part of the path.  If you don’t include this, SMF will be unable to locate the file and your test will fail.  In this example we are assuming that this configuration file is needed for the service to start.

# svccfg -s application/xvfb
svc:/application/xvfb> select default
svc:/application/xvfb:default> addpg config_file dependency
svc:/application/xvfb:default> setprop config_file/grouping = astring: require_all
svc:/application/xvfb:default> setprop config_file/entities = fmri: file://localhost/etc/xvfb.conf
svc:/application/xvfb:default> setprop config_file/type = astring: path
svc:/application/xvfb:default> setprop config_file/restart_on = astring: refresh
svc:/application/xvfb:default> end

STREAMLINED COMMANDS

We can also execute the same actions without entering the interactive svccfg shell.

# svccfg -s application/xvfb:default addpg config_file dependency
# svccfg -s application/xvfb:default setprop config_file/grouping = astring: require_all
# svccfg -s application/xvfb:default setprop config_file/entities = fmri: file://localhost/etc/xvfb.conf
# svccfg -s application/xvfb:default setprop config_file/type = astring: path
# svccfg -s application/xvfb:default setprop config_file/restart_on = astring: refresh

INCLUDE IN A SERVICE DEFINITION

Now of course if your service doesn’t exist yet, then the best way is to include it in the service definition.  Dependencies belong inside of the instance tags (shown below).

<instance name='default' enabled='true'>
</instance>

Here is how the above example of a file-based dependency would look inside of a service definition.

<dependency name='config_file' grouping='require_all' restart_on='refresh' type='path'>
<service_fmri value='file://localhost/etc/xvfb.conf'/>
</dependency>
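
Putting the two pieces together, a sketch of the dependency nested inside the instance tags (using the /etc/xvfb.conf path from the earlier examples) would look something like this:

<instance name='default' enabled='true'>
 <dependency name='config_file' grouping='require_all' restart_on='refresh' type='path'>
  <service_fmri value='file://localhost/etc/xvfb.conf'/>
 </dependency>
</instance>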

Also keep in mind when using this approach it is invaluable to use the svccfg utility to validate the structure of your service definition.

# svccfg validate xvfb.xml

TESTING

Of course all of this is without value if we don’t test the outcome.  The best way to do this is by moving your file (so that it doesn’t exist) and restarting the service.  Needless to say, this will cause downtime (that is the point of the test), so please ensure that you have coordinated everything necessary.
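
A rough sketch of that test, assuming /etc/xvfb.conf is the dependency file as in the examples above (the service should refuse to come online while the file is missing):

# mv /etc/xvfb.conf /etc/xvfb.conf.bak
# svcadm restart application/xvfb
# svcs -x application/xvfb
# mv /etc/xvfb.conf.bak /etc/xvfb.conf
# svcadm restart application/xvfb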
