Complexity as a Single Point of Failure

A network can run flawlessly for months, seemingly validating every design decision you made. Sometimes though, all it takes is one packet interacting with an implementation quirk to expose the setup as the house of cards it really is.

The setup

I needed to extend my home network to a barn about 300 feet away. To do this, I installed an EnStation5-AC at the barn, configured it as a client bridge, and pointed it at my house.

My home network is segmented into multiple VLANs. Since I needed at least 2 of these VLANs at the barn, and most Wi-Fi gear does not support VLAN-tagged traffic, I chose to implement multiple VXLAN tunnels instead (one per VLAN). Each VLAN was bridged to its corresponding VXLAN tunnel endpoint (VTEP) on both sides of the Wi-Fi link.

Figure 1. VXLAN-based network architecture

If at this point you, the reader, are wondering: hey, the barn is only 300 feet away, wouldn’t it have been less complicated to just pull some fiber? You would be right. Unfortunately for you and me, we have not even finished describing the complexity of this setup.

See how it says “OPNsense 1” in the diagram above? That’s right, there’s more than one. I have OPNsense set up for high availability, which means there are actually 2 OPNsense routers in an active-passive failover configuration. If one fails (or needs to be taken offline for maintenance), the other one seamlessly takes over.

To make sure that the VXLAN tunnels to the barn aren’t cut off if OPNsense 1 is unavailable, OPNsense 2 also needs to be configured with VTEPs and associated bridges. Since VXLAN encapsulates Layer 2 frames in UDP, and the VTEPs are configured to listen on the virtual IP shared between the two routers, there would be no conflicts between the two sets of VTEPs: only one of the routers would own the virtual IP at any given point in time.

Figure 2. High availability VXLAN endpoint architecture

The fact that each router had bridges between the same logical network segments seemed a bit network-loop-ish to me, so I ran through a few scenarios to make sure there were no such gremlins lurking. For example:

  1. A broadcast packet is sent from the barn over the VXLAN tunnel for VLAN 1
  2. The packet is received and decapsulated by the VTEP on OPNsense 1
  3. The packet exits OPNsense 1 after being forwarded by bridge br1 to interface vlan01
  4. The packet is broadcast by the switch, and enters OPNsense 2 on interface vlan01
  5. The packet is forwarded by bridge br1 to the VTEP, and is encapsulated
  6. The VTEP is unable to send the encapsulated packet, as OPNsense 2 does not have control of the virtual IP the VTEP is bound to

Everything seemed alright, and the network ran fine for months with this setup. Plus, even if there were any network loops, that’s what STP is for, right?

Everything was not, in fact, alright.

One day, the network went down. Hard. Devices on the network seemed to lose not only internet access, but access to other devices on the network as well. Restarting the primary router fixed the network, so hey, it was probably just a wayward solar flare, right?

Nope. The network went down again, and again, seemingly without rhyme or reason. While devices on the network could not access the internet, monitoring from the internet side would never show any problems; services that I had hosted at home would continue to be accessible throughout the outage. And of course, at this point I still had no clue what was causing the outage, and therefore not even an educated guess as to how to reproduce it.

When I have Wireshark open, you know I’m having an excellent day.

Eventually, I got Wireshark capturing during one of the outages. The problem was immediately apparent, scrolling by at a thousand packets per second: a storm of mDNS query responses for the hostname of a new Mac. Great, that makes sense: the Mac was a recent addition to the network, which lines up with when the outages started happening. If the outages were caused by the Mac, taking it off the network must solve the problem, right? Nope. While turning on the Mac had a 99% probability of triggering an outage due to the storm of mDNS responses, turning the Mac off would never stop the storm.

Looking a bit closer at the storm, each packet originated from one of 2 MAC addresses. Since there were 2, I suspected that they belonged to the routers. Sure enough, inspection of the routers showed that those were the MAC addresses for bridge br1 on each one.

Bridges are Layer 2 constructs; when they forward packets, Layer 2 details (e.g. source/destination MAC address) should not change. The fact that these packets had source MAC addresses belonging to the routers means that these mDNS response packets were being generated by the routers themselves, not just being forwarded through.
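
The same tally of source MAC addresses can be done outside Wireshark. Here is a minimal sketch, assuming the capture has already been exported as a list of raw Ethernet frames; the function names and the sample frames are made up for illustration and are not the addresses from my network.

```python
from collections import Counter


def source_mac(frame: bytes) -> str:
    """Return the source MAC of a raw Ethernet frame as aa:bb:cc:dd:ee:ff."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # Bytes 0-5 hold the destination MAC, bytes 6-11 the source MAC.
    return ":".join(f"{byte:02x}" for byte in frame[6:12])


def tally_sources(frames):
    """Count how many frames each source MAC sent."""
    return Counter(source_mac(frame) for frame in frames)


if __name__ == "__main__":
    # Hypothetical frames: multicast destination, two different senders.
    frame_a = bytes.fromhex("3333000000fb" "021122334455" "86dd") + b"payload"
    frame_b = bytes.fromhex("3333000000fb" "02aabbccddee" "86dd") + b"payload"
    for mac, count in tally_sources([frame_a, frame_b, frame_a]).most_common():
        print(mac, count)
```

A storm like the one I saw would show up as two addresses with enormous counts and everything else far behind.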

Why is my router impersonating a client?

My first guess was the mDNS repeater. If it was running on both OPNsense machines, then maybe it was bouncing a response between VLANs (i.e. OPNsense 1 picks up an mDNS response on VLAN 1 and sends it on VLAN 2, where it gets picked up by OPNsense 2 and sent back on VLAN 1). However, I was only seeing the packet storm on VLAN 1; if the mDNS repeater were responsible, I would’ve expected to see the same storm on VLAN 2. Additionally, disabling the mDNS repeater did not stop the outages from happening.

I could not think of anything else that would cause OPNsense to generate a packet storm like this. In any case, I figured I needed to take a closer look at the start of a storm to make sense of the whole thing.

Wireshark running? Check. Mac ready to be turned on? Check. Lights, camera, action!

As it came online, the Mac sent a whole bunch of mDNS queries asking about services on its own hostname. It then immediately replied to those queries.1 The packet storm trigger? One of these replies was too big for a single IPv6 packet, and had to be fragmented.
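
For a sense of when a UDP reply stops fitting in one IPv6 packet, here is a rough sketch of the arithmetic. It assumes a 1500-byte link MTU and no extension headers other than the Fragment header; the function name is my own, and the payload sizes are examples rather than the size of the actual mDNS reply.

```python
IPV6_HEADER = 40      # fixed IPv6 header
FRAGMENT_HEADER = 8   # IPv6 Fragment extension header
UDP_HEADER = 8


def ipv6_fragment_count(udp_payload_len: int, mtu: int = 1500) -> int:
    """Number of IPv6 packets needed to carry one UDP datagram."""
    datagram = UDP_HEADER + udp_payload_len  # the fragmentable part
    if IPV6_HEADER + datagram <= mtu:
        return 1  # fits in a single packet, no Fragment header needed
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    return -(-datagram // per_fragment)  # ceiling division


if __name__ == "__main__":
    for size in (500, 1452, 1453, 3000):
        print(size, ipv6_fragment_count(size))
```

With these assumptions, anything above 1452 bytes of UDP payload gets split into at least two fragments.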

So, what’s wrong with a fragmented packet?

The core issue is a quirk in the way pf (the OPNsense/BSD packet filter) handles packet fragmentation.

pf is a Layer 3 firewall. For it to filter packets properly, it needs to see the whole packet, which means it will reassemble any fragmented packets it encounters. However, when a fragmented packet is reassembled, all the original Layer 2 headers are thrown away. The firewall now doesn’t know anything (nor does it care) about the source or destination MAC addresses.

This is not a problem in most cases, since routers like OPNsense usually sit between different Layer 2 network segments. Once an incoming packet goes through the firewall, it is sent on the next network segment with the source MAC address set to that of the router and the destination MAC address set to that of the next hop. The data from the original Layer 2 header is irrelevant on the new network segment.

In this case, though, we’re dealing with a bridge; both sides are on the same Layer 2 network segment. All that usually irrelevant Layer 2 header data is now very relevant; treating the packet like one that needs to be routed at Layer 3 leads to some very interesting potential modes of failure.

Since I have two routers on the same Layer 2 network segment with the same quirk, the following set of events would happen (in a loop) every time the Mac sent out a fragmented mDNS response packet:

  1. OPNsense 1 receives the fragmented packet on bridge br1.
  2. pf on OPNsense 1 reassembles the packet as part of its inspection, kicking it onto the Layer 3 processing path.
  3. OPNsense 1 “routes” the packet back out onto the same network, re-fragmenting it in the process. The source MAC address is set to that of bridge br1.
  4. Events 1 through 3 happen on OPNsense 2.

This quirk turned a single connection between 2 bridges into a de facto network loop.2
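
The loop is easier to see as a toy model. The sketch below is a deliberate simplification and not a model of pf itself: it assumes each router re-emits every fragmented packet it hears from anyone else, stamped with its own bridge MAC, and simply ignores packets that are not fragmented. The router labels are placeholders.

```python
def simulate_storm(first_packet_fragmented: bool, rounds: int = 5):
    """Return how many packets the routers emit in each round."""
    routers = ["opnsense1-br1", "opnsense2-br1"]  # placeholder MAC labels
    # Each packet on the wire is (source MAC label, fragmented?).
    on_wire = [("mac-client", first_packet_fragmented)]
    emitted_per_round = []
    for _ in range(rounds):
        next_wire = []
        for source, fragmented in on_wire:
            for router in routers:
                if source == router or not fragmented:
                    continue  # own packet, or nothing to reassemble
                # Reassemble, "route" back out, re-fragment, new source MAC.
                next_wire.append((router, True))
        emitted_per_round.append(len(next_wire))
        on_wire = next_wire
    return emitted_per_round


if __name__ == "__main__":
    print(simulate_storm(True))   # [2, 2, 2, 2, 2]
    print(simulate_storm(False))  # [0, 0, 0, 0, 0]
```

In this model a single fragmented packet keeps both routers busy forever, which is also why turning the Mac off did nothing: after the first round, the client is no longer involved.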

Let’s step away from the firewall, then.

Since the quirk causing the issue was primarily a BSD thing, I tried moving the entire highly available VXLAN endpoint architecture onto a set of Linux machines instead, using VRRP to share a virtual IP address. Since the Linux machines did not have to do any packet filtering, there would be no packet reassembly required, and therefore no cause for issue.

While this approach did fix the packet storms causing the network outages, it did introduce another problem: none of the IP cameras at the barn were accessible from the house. This came down to the combination of the following:

  • VXLAN tunnels have some overhead. The MTU of a tunnel is slightly smaller than that of its transport.
  • There is no way to configure the IP cameras to use a smaller MTU, and no way to increase the MTU of the Wi-Fi link to allow a larger MTU inside the tunnel.
  • Layer 2 does not perform fragmentation. Instead, packets larger than the MTU are silently dropped.
  • Fragmentation at Layer 3 is only performed by routers.
  • With the new setup, the tunnel endpoint was no longer on the same machine as the router.
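
To put a number on the tunnel overhead, here is a small sketch using the standard VXLAN header sizes. The 1500-byte transport MTU is an assumption for illustration, not a value measured on my Wi-Fi link, and the function name is my own.

```python
INNER_ETHERNET = 14  # Ethernet header of the encapsulated frame
VXLAN_HEADER = 8
UDP_HEADER = 8
IPV4_HEADER = 20
IPV6_HEADER = 40


def vxlan_inner_mtu(transport_mtu: int = 1500, outer_ipv6: bool = False) -> int:
    """Largest IP packet that fits inside a VXLAN tunnel over the transport."""
    outer_ip = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return transport_mtu - outer_ip - UDP_HEADER - VXLAN_HEADER - INNER_ETHERNET


if __name__ == "__main__":
    print(vxlan_inner_mtu())                 # 1450
    print(vxlan_inner_mtu(outer_ipv6=True))  # 1430
```

A camera that insists on sending 1500-byte packets is 50 bytes over budget under these assumptions, and nothing on the path was in a position to fragment them.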

It was at this point that I decided VXLAN was not the way.

What now?

In my search to find something that I could use to extend multiple VLANs across a Wi-Fi link without reducing MTU, I found B.A.T.M.A.N. advanced (batman-adv). Commonly used as part of mesh Wi-Fi solutions, batman-adv is a transport-agnostic Layer 2 mesh solution.

It encapsulates and forwards all traffic until it reaches the destination, hence emulating a virtual network switch of all nodes participating. Therefore all nodes appear to be link local and are unaware of the network’s topology as well as unaffected by any network changes.

The easiest way to think about a batman-adv mesh network is to imagine it as a distributed switch. Each system (often called “node”) running batman-adv is equal to a switch port.

The best part is, batman-adv understands that the underlying transport may not always have a large enough MTU to be able to transmit full-sized Layer 2 packets after encapsulation, so it implements transparent Layer 2 fragmentation.

Wait a minute… mesh?

At the house, I use UniFi access points, which support wireless meshing. The firmware on UniFi APs is based on OpenWRT; so is the firmware on the EnStation5-AC at the barn. Finding batman-adv got me thinking… can I configure the EnStation5-AC to play nice with UniFi’s wireless meshing solution? This would quite literally be exactly what I need: a wireless link that can push multiple networks to a remote access point.

UniFi allows root access to their access points if you enable SSH in the controller, and it’s not hard to get root access to most EnGenius access points (including the EnStation5-AC). This made it easy to poke around and figure out how configuration changes in the UI affected the configuration of the underlying OpenWRT system.

The solution

Once wireless uplink (a.k.a. Mesh Parent) is enabled on the UniFi access point, /etc/hostapd/vwire*.cfg contains the details of the “mesh3” network. Turns out, all that is necessary after that is to configure the EnStation5-AC to operate in “WDS Station” mode and fill in the SSID and passphrase from the config file on the UniFi access point.

Not quite as simple as fiber, but definitely much closer.

Figure 3. WDS-based network architecture

Some final thoughts

So far, the new setup has worked perfectly for both tagged VLANs. However, untagged traffic does not pass properly, and the whole thing stops working if the EnStation5-AC is configured with a management VLAN other than the default. I suspect this has something to do with the way the bridge on the EnStation5-AC gets configured, but I somehow also managed to break SSH access in the process of debugging it. Oh well, at some point I’ll have the time to factory reset it and try again… ∎

Footnotes

  1. Why it does this, I have no idea. Seems conceptually similar to Gratuitous ARP. ↩︎
  2. A “loop” that STP had no chance of solving, I might add… ↩︎
  3. It’s not actually a mesh (no 802.11s), it’s just WDS. ↩︎

Enable Soft Keys on the Samsung Galaxy SII International (GT-I9100) (CyanogenMod 11)

Install “ES File Explorer” from the Play Store, and then open it.

Swipe in from the left edge to open the menu, and then click “Tools”.
Scroll down, and enable “Root Explorer”.

A dialog might pop up asking you to grant superuser access. Click “Allow”.
Enter your PIN if necessary.

Navigate to /system/. Click on build.prop.
Open it with “ES Note Editor”.
Click the 3 dots, then click “Edit”.
Add the line qemu.hw.mainkeys=0 to the end of the file.
Press the back button, then when it asks you to save, click “Yes”.
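
If you would rather make the build.prop change on a copy of the file pulled to a computer, the edit itself is a one-liner. This is only a sketch of the text change; the function name is hypothetical, and getting the file off and back onto the phone is not covered here.

```python
def ensure_property(text: str, key: str, value: str) -> str:
    """Set key=value in build.prop-style text, appending it if missing."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == key:
            lines[index] = f"{key}={value}"  # overwrite an existing entry
            break
    else:
        lines.append(f"{key}={value}")  # not present yet: add to the end
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "ro.build.id=EXAMPLE\n"  # stand-in content, not a real build.prop
    print(ensure_property(sample, "qemu.hw.mainkeys", "0"), end="")
```

Running it twice leaves a single qemu.hw.mainkeys=0 line, so it is safe to reapply after a system update.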

Navigate to /system/usr/keylayout/. We will need to edit the following files:

Generic.kl
gpio-keys.kl
melfas-touchkey.kl
qwerty.kl
sec_key.kl
sec_touchkey.kl

Click on Generic.kl. In the “Open As” dialog, click “Text”.
Open it with “ES Note Editor” as before. Change to edit mode, then comment out the following lines by putting a “#” at the beginning of the line:

key 102 MOVE_HOME
key 139 MENU WAKE_DROPPED
key 158 BACK WAKE_DROPPED

Save the file.

Follow the same steps for the rest of the files, but comment out the following lines instead:

For gpio-keys.kl, comment out the following line:

key 102 HOME WAKE

For melfas-touchkey.kl, comment out the following lines:

key 158 BACK VIRTUAL
key 139 MENU VIRTUAL

For qwerty.kl, comment out the following lines:

key 158 BACK WAKE_DROPPED
key 139 MENU WAKE_DROPPED
key 102 HOME WAKE

For sec_key.kl, comment out the following line:

key 102 HOME WAKE

For sec_touchkey.kl, comment out the following lines:

key 158 BACK VIRTUAL
key 139 MENU VIRTUAL
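
Since these edits have to be redone after every system update, the commenting-out can also be scripted against copies of the files. Here is a minimal sketch: the per-file scancodes come from the lists above, while the function name and the sample text are made up for illustration.

```python
# Scancodes to comment out in each file, taken from the steps above.
KEYS_TO_DISABLE = {
    "Generic.kl": {102, 139, 158},
    "gpio-keys.kl": {102},
    "melfas-touchkey.kl": {139, 158},
    "qwerty.kl": {102, 139, 158},
    "sec_key.kl": {102},
    "sec_touchkey.kl": {139, 158},
}


def comment_out_keys(text, scancodes):
    """Prefix '#' to every active 'key <scancode> ...' line in scancodes."""
    result = []
    for line in text.splitlines():
        parts = line.split()
        is_key_line = len(parts) >= 2 and parts[0] == "key" and parts[1].isdigit()
        if is_key_line and int(parts[1]) in scancodes:
            line = "#" + line  # lines that are already commented do not match
        result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    sample = "key 102 HOME WAKE\nkey 114 VOLUME_DOWN\nkey 139 MENU VIRTUAL\n"
    print(comment_out_keys(sample, KEYS_TO_DISABLE["qwerty.kl"]), end="")
```

Lines for other scancodes, and lines that are already commented out, are left untouched.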

Exit ES File Explorer.

Open the Settings app. Under “Device”, click “Buttons”.
Click “Backlight”, and uncheck “Illuminate Buttons”.

Reboot, and enjoy your shiny new soft keys!
NOTE: You will have to repeat these steps (with the exception of the backlight setting) every time you do a system update.

Install Pyrit with CAL++ support on Ubuntu 14.04 / Linux Mint 17.1

First, open a root shell. Enter your password when necessary:

sudo -i

Install the required dependencies:

apt-get install python-dev libssl-dev libpcap-dev zlib1g-dev cmake libboost1.54

Create a working directory to store all files:

mkdir -p /temp/pyrit
cd /temp/pyrit

Get the Pyrit source code:

svn checkout http://pyrit.googlecode.com/svn/trunk/ /temp/pyrit/svn

Build and install Pyrit:

cd /temp/pyrit/svn/pyrit
python setup.py build
python setup.py install --record installed-files.txt

Go to http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/#appsdkdownloads, and download the AMD APP SDK. (This guide was tested with v2.9.1, 64-bit.)
Move the downloaded file into the directory /temp/pyrit, then execute the following to install the AMD APP SDK:

cd /temp/pyrit
bunzip2 AMD-APP-SDK*.tar.bz2
tar xvf AMD-APP-SDK*.tar
./AMD-APP-SDK*.sh

Get the CAL++ source code:

svn co https://svn.code.sf.net/p/calpp/code/trunk/ /temp/pyrit/calpp

Make a necessary modification to the CAL++ source code:

cd /temp/pyrit/calpp
sed -i.bak 's/ATISTREAMSDKROOT/AMDAPPSDKROOT/g' CMakeLists.txt

Build and install CAL++:

cmake .
make
make install

Make a necessary modification to the Pyrit CAL++ extension source code:

cd /temp/pyrit/svn/cpyrit_calpp
sed -i.bak -e 's/ATISTREAMSDKROOT/AMDAPPSDKROOT/g' -e "s/'include'/'include\/CAL'/" -e 's/0.4.0-dev/0.4.1-dev/' setup.py

Build and install the Pyrit CAL++ extension:

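# setup.py reads AMDAPPSDKROOT, so set it before building. The glob is expanded
# explicitly, because bash does not expand it inside an assignment.
export AMDAPPSDKROOT="$(echo /opt/AMDAPPSDK*)"
python setup.py build
python setup.py install --record installed-files.txt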

Clean up:

cat /temp/pyrit/svn/pyrit/installed-files.txt /temp/pyrit/svn/cpyrit_calpp/installed-files.txt > $HOME/pyrit-installed-files.txt
rm -rvf /temp/pyrit

Reboot to finish the installation.

To uninstall Pyrit and the Pyrit CAL++ extension:

cat $HOME/pyrit-installed-files.txt | xargs rm -rvf
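
A slightly more cautious take on the same uninstall is sketched below. It walks the recorded file list and removes only entries that are still regular files, so a stray blank line or directory entry does nothing. The function name is my own, and the example path assumes the list was saved as described above and that you are running as root.

```python
import os


def remove_recorded_files(manifest_path):
    """Delete the files listed in manifest_path; return the paths removed."""
    removed = []
    with open(manifest_path) as manifest:
        for line in manifest:
            path = line.strip()
            # Skip blank lines, directories, and files that are already gone.
            if path and os.path.isfile(path):
                os.remove(path)
                removed.append(path)
    return removed


if __name__ == "__main__":
    manifest = os.path.expanduser("~/pyrit-installed-files.txt")
    for path in remove_recorded_files(manifest):
        print("removed", path)
```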

How To Build Your Own Computer, Part 1

Have you ever wanted to build your own computer? Building your own computer has several advantages. You get to choose which parts go in your PC, and you can customize almost every aspect of it.

The first step to building a custom PC is choosing the parts. PCPartPicker is a very useful tool to choose the parts for your PC. It automatically checks compatibility between the parts that you choose. I used the following parts:

You can see the part list at http://pcpartpicker.com/user/abrakev/saved/. Note: The hard disks aren’t included, because I couldn’t find them in the PCPartPicker database.