Friday, January 18, 2019

dynv6.com: IPv6 dynamic DNS done right

Sometimes, your home PC or router does not have a static IP address. In this case, if you want to access your home network remotely, a common solution is to use a dynamic DNS provider, and configure your router to update its A record (a DNS record that holds an IPv4 address) each time the external address changes. Then you can be sure that your domain name always points to the current external IPv4 address of the router.

For accessing PCs behind the router, some trickery is needed, because usually there is only one public IPv4 address available for the whole network. Therefore, the router performs network address translation (NAT), and home PCs are not directly reachable. So, you have to either use port forwarding, or a full-blown VPN. Still, you need only one dynamic DNS record, because there is only one dynamic IP — the external IP of your router, and that's where you connect to.

Enter the IPv6 world. There is no NAT anymore, the ISP allocates a whole /64 (or maybe larger) prefix for your home network(s), and every home PC becomes reachable using its individual IPv6 address. Except, now all addresses are dynamic. On "Rostelecom" ISP in Yekaterinburg, Russia, they are dynamic even if you order a static IP address, i.e. only IPv4 is static then, and there is no way to get a statically allocated IPv6 network.

A typical IPv6 network has a prefix length of 64. It means that the first 64 bits denote the network, and are assigned (dynamically) by the ISP, while the lower 64 bits refer to the host and do not change when the ISP assigns the new prefix. Often but not always, the host part is just a MAC address with the second-lowest bit in the first octet inverted, and ff:fe inserted into the middle. This mechanism is often called EUI-64. For privacy reasons, typically there are also other short-lived IPv6 addresses on the interface, but let's ignore them.
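
The MAC-to-host-part conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name mac_to_host_part is made up for this example, and it assumes a 48-bit MAC address written with colons.

```python
import ipaddress


def mac_to_host_part(mac: str) -> ipaddress.IPv6Address:
    """Return the EUI-64 style host part for a MAC, as an address with a zero prefix."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    # Invert the second-lowest bit of the first octet.
    octets[0] ^= 0x02
    # Insert ff:fe into the middle, turning 48 bits into 64.
    eui64 = octets[:3] + b"\xff\xfe" + octets[3:]
    return ipaddress.IPv6Address(int.from_bytes(eui64, "big"))


if __name__ == "__main__":
    for mac in ("00:06:29:6c:f3:e5", "52:54:00:12:34:56"):
        print(mac, "->", mac_to_host_part(mac))
```

Python prints the result in compressed form, so leading zeros in each group are dropped.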

Unfortunately, many dynamic DNS providers have implemented their IPv6 support equivalently to IPv4, even though it does not really make sense. That is, a dynamic DNS client can update its own AAAA record using some web API call, and that's it. If you run a dynamic DNS client on the router, then only the router's DNS record is updated, and there is still no way to access home PCs individually, short of running DynDNS clients on all of them. In other words, the fact that the addresses in the LAN are, in fact, related, and should be updated as a group, is usually completely ignored.

The dynv6.com dynamic DNS provider is a pleasant exception. After registration, you get a third-level domain corresponding to your home network. You can also add records to that domain, corresponding to each of your home PCs. And while doing so, you can either specify the full IPv6 address (as you can do with the traditional dynamic DNS providers), or only the host part, or the MAC address. The ability to specify only the host part (or infer it from the MAC address) is what makes their service useful. Indeed, if the parent record (corresponding to your whole network) changes, then its network part is reapplied to all host records that don't specify the network part of their IPv6 address explicitly. So, you can run only one dynamic DNS client on the router, and get domain names corresponding to all of your home PCs.

Let me illustrate this with an example.

Suppose that your router has obtained the following addresses from the ISP:

2001:db8:5:65bc:d0a2:1545:fbfe:d0b9/64 for the WAN interface
2001:db8:b:9a00::/56 as a delegated prefix

Then, it will (normally) use 2001:db8:b:9a00::1/64 as its LAN address, and PCs will get addresses from the 2001:db8:b:9a00::/64 network. You need to configure the router to update the AAAA record (let's use example.dynv6.net) with its LAN IPv6 address. Yes, LAN (and many router firmwares, including OpenWrt, get it wrong by default), because the WAN IPv6 address is completely unrelated to your home network. Then, using the web interface, create some additional AAAA records under the example.dynv6.net domain:

desktop AAAA ::0206:29ff:fe6c:f3e5   # corresponds to MAC address 00:06:29:6c:f3:e5
qemu AAAA ::5054:00ff:fe12:3456   # corresponds to MAC address 52:54:00:12:34:56

Or, you could enter MAC addresses directly.

As I have already mentioned, the beauty of dynv6.com is that it does not interpret these addresses literally, but prepends the proper network part. That is, name resolution would actually yield reachable addresses:

example.dynv6.net. AAAA 2001:db8:b:9a00::1   # The only record that the router has to update
desktop.example.dynv6.net. AAAA 2001:db8:b:9a00:206:29ff:fe6c:f3e5   # Generated
qemu.example.dynv6.net. AAAA 2001:db8:b:9a00:5054:ff:fe12:3456    # Also generated
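
How the generated records can be derived is easy to model. The sketch below is my guess at the logic, not dynv6.com's actual code: the function name apply_prefix is invented, and it assumes that the parent record is given together with its prefix length (here /64).

```python
import ipaddress


def apply_prefix(parent: str, host_part: str) -> ipaddress.IPv6Address:
    """Combine the network part of `parent` (address/length) with a host part."""
    network = ipaddress.IPv6Interface(parent).network
    # Keep only the host bits of the host part, then OR in the network bits.
    host_bits = int(ipaddress.IPv6Address(host_part)) & int(network.hostmask)
    return ipaddress.IPv6Address(int(network.network_address) | host_bits)


if __name__ == "__main__":
    parent = "2001:db8:b:9a00::1/64"   # the record that the router updates
    for name, host in (("desktop", "::0206:29ff:fe6c:f3e5"),
                       ("qemu", "::5054:00ff:fe12:3456")):
        print(name, apply_prefix(parent, host))
```

When the ISP assigns a new prefix, only the parent value changes, and calling the function again yields the new addresses for all hosts.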

And you can, finally, connect remotely to any of those devices.

Sunday, January 6, 2019

Resizing Linux virtual machine disks

Sometimes, one runs out of disk space on a virtual machine, and realizes that it was a mistake to provide such a small disk to it in the first place. Fortunately, unlike real disks, the virtual ones can be resized at will. A handy command for this task comes with QEMU (and, if you are on Linux, why are you using anything else?). Here is how to extend a raw disk image to 10 GiB:

qemu-img resize -f raw vda.img 10G

After running this command, the beginning of the disk will contain the old bytes that were there before, and at the end there will be a long run of zeros. qemu-img is smart enough to avoid actually writing these zeros to the disk image; it creates a sparse file instead.
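
To illustrate what "sparse" means here, the following Python sketch extends a file without writing the new zeros. It is not how qemu-img is implemented (I have not checked), the helper name extend_image is made up, and whether the result really is sparse depends on the filesystem.

```python
import os
import tempfile


def extend_image(path: str, new_size: int) -> None:
    """Grow a raw image file to new_size bytes without writing the added zeros."""
    with open(path, "r+b") as image:
        image.truncate(new_size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "vda.img")
        with open(path, "wb") as image:
            image.write(b"old data")
        extend_image(path, 10 * 1024 ** 2)
        info = os.stat(path)
        print("apparent size:", info.st_size)
        # st_blocks is only available on Unix-like systems.
        blocks = getattr(info, "st_blocks", None)
        if blocks is not None:
            print("allocated bytes:", blocks * 512)
```

On a filesystem that supports holes, the allocated size stays far below the apparent size.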

Resizing the disk image is only one-third of the job. The partition table still lists partitions of the old sizes, and the end of the disk is unused. Traditionally, fdisk has been the tool for altering the partition table. You can run it either from within your virtual machine, or directly on the disk image. All that is needed is to delete the last partition, and then recreate it with the same start sector, but with the correct size, so that it also covers the new part of the disk. Here is an example session with a simple MBR-based disk with two partitions:

# fdisk /dev/vda

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): p
Disk /dev/vda: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc02e3411

Device     Boot  Start      End  Sectors  Size Id Type
/dev/vda1  *      2048   997375   995328  486M 83 Linux
/dev/vda2       997376 12580863 11583488  5.5G 83 Linux

Command (m for help): d
Partition number (1,2, default 2): 2

Partition 2 has been deleted.

Command (m for help): n
Partition type
   p   primary (1 primary, 0 extended, 3 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (2-4, default 2): 2
First sector (997376-20971519, default 997376): 
Last sector, +sectors or +size{K,M,G,T,P} (997376-20971519, default 20971519): 

Created a new partition 2 of type 'Linux' and of size 9.5 GiB.
Partition #2 contains a ext4 signature.

Do you want to remove the signature? [Y]es/[N]o: n

Command (m for help): p

Disk /dev/vda: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc02e3411

Device     Boot  Start      End  Sectors  Size Id Type
/dev/vda1  *      2048   997375   995328  486M 83 Linux
/dev/vda2       997376 20971519 19974144  9.5G 83 Linux

Command (m for help): w
The partition table has been altered.
Syncing disks.

As you see, it went smoothly. The kernel will pick up the new partition table after a reboot, and then you will be able to resize the filesystem with resize2fs (or some other tool if you are not using ext4).
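
The numbers in the session are easy to double-check. A small Python sketch, assuming 512-byte sectors as reported by fdisk (the function names are only for this example):

```python
SECTOR_SIZE = 512  # bytes, as reported by fdisk above


def sector_count(start: int, end: int) -> int:
    """Number of sectors in a partition; both boundaries are inclusive."""
    return end - start + 1


def size_gib(sectors: int) -> float:
    """Convert a sector count to GiB."""
    return sectors * SECTOR_SIZE / 1024 ** 3


if __name__ == "__main__":
    old = sector_count(997376, 12580863)
    new = sector_count(997376, 20971519)
    print(old, round(size_gib(old), 1))
    print(new, round(size_gib(new), 1))
```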

Things are not so simple if the virtual disk is partitioned with GPT, not MBR, to begin with. The complication stems from the fact that there is a backup copy of GPT at the end of the disk. When we added zeros to the end of the disk, the backup copy ended up in the middle of the disk, not to be found. Also, the protective MBR now covers only the first part of the disk. The kernel is able to deal with this, but some versions of fdisk (at least fdisk found in Ubuntu 18.04) cannot. What happens is that fdisk is not able to create partitions that extend beyond the end of the old disk. And saving anything (in fact, even saving what already exists) fails with a rather unhelpful error message:

# fdisk /dev/vda

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

GPT PMBR size mismatch (12582911 != 20971519) will be corrected by w(rite).
GPT PMBR size mismatch (12582911 != 20971519) will be corrected by w(rite).

Command (m for help): w
GPT PMBR size mismatch (12582911 != 20971519) will be corrected by w(rite).
fdisk: failed to write disklabel: Invalid argument

Modern versions of fdisk do not have this problem:

# fdisk /dev/vda

Welcome to fdisk (util-linux 2.33).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

GPT PMBR size mismatch (12582911 != 20971519) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.
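
The sector numbers in these warnings follow from the GPT layout: the backup header occupies the very last sector, and the backup partition entries sit right before it. Here is a sketch of the arithmetic in Python, assuming the usual 128 entries of 128 bytes each and 512-byte sectors (the function name gpt_layout is made up):

```python
def gpt_layout(total_sectors: int, entries: int = 128,
               entry_size: int = 128, sector_size: int = 512) -> dict:
    """Return the sectors where the backup GPT structures are expected."""
    # Sectors needed for the partition entries, rounded up (32 with the defaults).
    table_sectors = -(-entries * entry_size // sector_size)
    last_lba = total_sectors - 1
    return {
        "backup_header": last_lba,
        "backup_table_start": last_lba - table_sectors,
        "last_usable": last_lba - table_sectors - 1,
    }


if __name__ == "__main__":
    print("before resize:", gpt_layout(12582912))
    print("after resize: ", gpt_layout(20971520))
```

With the old disk size, the backup header lands on sector 12582911, which is exactly the number from the "PMBR size mismatch" message.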


Still, with GPT and not-so-recent versions of fdisk, it looks like we cannot use fdisk to take advantage of the newly added disk space. There is another tool, gdisk, that can manipulate GPT structures. However, it claims that there is almost no usable free space on the disk, and thus refuses to usefully resize the last partition by default.

# gdisk /dev/vda
GPT fdisk (gdisk) version 1.0.3

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): p
Disk /dev/vda: 20971520 sectors, 10.0 GiB
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 61E744EF-1CD3-5145-BC59-4646E6CB03DE
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 2048, last usable sector is 12582878
Partitions will be aligned on 2048-sector boundaries
Total free space is 2015 sectors (1007.5 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            4095   1024.0 KiB  EF02  
   2            4096          999423   486.0 MiB   8300  
   3          999424        12580863   5.5 GiB     8300  


What we need to do is to use the "expert" functionality in order to move the backup GPT to the end of the disk. After that, new free space will be available, and we will be able to resize the last partition.

Command (? for help): x

Expert command (? for help): e
Relocating backup data structures to the end of the disk

Expert command (? for help): m

Command (? for help): d
Partition number (1-3): 3

Command (? for help): n
Partition number (3-128, default 3): 
First sector (999424-20969472, default = 999424) or {+-}size{KMGTP}: 
Last sector (999424-20969472, default = 20969472) or {+-}size{KMGTP}: 
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): 
Changed type of partition to 'Linux filesystem'

Command (? for help): p
Disk /dev/vda: 20971520 sectors, 10.0 GiB
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 61E744EF-1CD3-5145-BC59-4646E6CB03DE
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 2048, last usable sector is 20969472
Partitions will be aligned on 2048-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            4095   1024.0 KiB  EF02  
   2            4096          999423   486.0 MiB   8300  
   3          999424        20969472   9.5 GiB     8300  Linux filesystem

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/vda.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.


OK, it worked, but was a bit too complicated, to the degree sufficient to consider sticking with MBR where possible. And it changed the UUID of the last partition, which may or may not be OK in your setup (it is definitely not OK if /etc/fstab or the kernel command line mentions PARTUUID). The same warning about PARTUUID applies to modern versions of fdisk, too.

Anyway, it turns out that gdisk is not the simplest solution to the problem of resizing a GPT-based disk. The "sfdisk" program that comes with util-linux (i.e. with the same package that provides fdisk, even with not-so-recent versions) works just as well. We need to dump the existing partitions, edit the resulting script, and feed it back to sfdisk so that it recreates these partitions for us from scratch, with the correct sizes, and we can preserve all partition UUIDs, too.

Here is what this dump looks like:

# sfdisk --dump /dev/vda > disk.dump
GPT PMBR size mismatch (12582911 != 20971519) will be corrected by w(rite).
# cat disk.dump
label: gpt
label-id: 61E744EF-1CD3-5145-BC59-4646E6CB03DE
device: /dev/vda
unit: sectors
first-lba: 2048
last-lba: 12582878

/dev/vda1 : start=        2048, size=        2048, type=[...], uuid=[...]
/dev/vda2 : start=        4096, size=      995328, type=[...], uuid=[...]
/dev/vda3 : start=      999424, size=    11581440, type=[...], uuid=[...]

We need to fix the "last-lba" parameter and change the size of the last partition. Fortunately, sfdisk has reasonable defaults (use as much space as possible) for both parameters, so we can just delete them instead. Quite easy to do with sed:

# sed -i -e '/^last-lba:/d' -e '$s/size=[^,]*,//' disk.dump
# cat disk.dump
label: gpt
label-id: 61E744EF-1CD3-5145-BC59-4646E6CB03DE
device: /dev/vda
unit: sectors
first-lba: 2048

/dev/vda1 : start=        2048, size=        2048, type=[...], uuid=[...]
/dev/vda2 : start=        4096, size=      995328, type=[...], uuid=[...]
/dev/vda3 : start=      999424,  type=[...], uuid=[...]
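
For those who prefer not to decipher sed, the same edit can be sketched in Python. The function name grow_last_partition is hypothetical; the logic mirrors the two sed expressions: drop the "last-lba:" line, and remove the "size=" field from the last line of the dump.

```python
import re


def grow_last_partition(dump: str) -> str:
    """Edit an sfdisk dump so that the last partition takes all available space."""
    lines = dump.splitlines()
    result = []
    for index, line in enumerate(lines):
        if line.startswith("last-lba:"):
            continue  # let sfdisk compute the last usable sector itself
        if index == len(lines) - 1:
            # Remove "size=...," so that sfdisk picks the maximum size.
            line = re.sub(r"size=[^,]*,", "", line, count=1)
        result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    with open("disk.dump") as dump_file:
        print(grow_last_partition(dump_file.read()), end="")
```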

Then, with some flags to turn off various checks, sfdisk loads the modified partition table dump:

# sfdisk --no-reread --no-tell-kernel -f --wipe never /dev/vda < disk.dump
GPT PMBR size mismatch (12582911 != 20971519) will be corrected by w(rite).
Disk /dev/vda: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 61E744EF-1CD3-5145-BC59-4646E6CB03DE

Old situation:

Device      Start      End  Sectors  Size Type
/dev/vda1    2048     4095     2048    1M BIOS boot
/dev/vda2    4096   999423   995328  486M Linux filesystem
/dev/vda3  999424 12580863 11581440  5.5G Linux filesystem

>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new GPT disklabel (GUID: 61E744EF-1CD3-5145-BC59-4646E6CB03DE).
/dev/vda1: Created a new partition 1 of type 'BIOS boot' and of size 1 MiB.
/dev/vda2: Created a new partition 2 of type 'Linux filesystem' and of size 486 MiB.
Partition #2 contains a ext4 signature.
/dev/vda3: Created a new partition 3 of type 'Linux filesystem' and of size 9.5 GiB.
Partition #3 contains a ext4 signature.
/dev/vda4: Done.

New situation:
Disklabel type: gpt
Disk identifier: 61E744EF-1CD3-5145-BC59-4646E6CB03DE

Device      Start      End  Sectors  Size Type
/dev/vda1    2048     4095     2048    1M BIOS boot
/dev/vda2    4096   999423   995328  486M Linux filesystem
/dev/vda3  999424 20971486 19972063  9.5G Linux filesystem

The partition table has been altered.


Let me repeat. The following lines, ready to be copy-pasted without much thinking except for the disk name, resize the last partition to occupy as much disk space as possible, and they work both on GPT and MBR:

DISK=/dev/vda
sfdisk --dump $DISK > disk.dump
sed -i -e '/^last-lba:/d' -e '$s/size=[^,]*,//' disk.dump
sfdisk --no-reread --no-tell-kernel -f --wipe never $DISK < disk.dump

On Ubuntu, there is also a "cloud-guest-utils" package that provides an even easier "growpart" command. Here is how it works:

# growpart /dev/vda 3
CHANGED: partition=3 start=999424 old: size=11581440 end=12580864 new: size=19972063,end=20971487

Just as with MBR, after resizing the partition, you have to reboot your virtual machine, so that the kernel picks up the new partition size, and then you can resize the filesystem to match the partition size.

Thursday, January 3, 2019

Using Let's Encrypt certificates with GeoDNS

Let's Encrypt is a popular free TLS certificate authority. It currently issues certificates valid for only 90 days, and thus it is a good idea to automate their renewal. Fortunately, there are many tools to do so, including the official client called Certbot.

When Certbot or any other client asks Let's Encrypt for a certificate, it must prove that it indeed controls the domain names that are to be listed in the certificate. There are several ways to obtain such proof, by solving one of the possible challenges. HTTP-01 challenge requires the client to make a plain-text file with a given name and content available under the domain in question via HTTP, on port 80. DNS-01 challenge requires publishing a specific TXT record in DNS. There are other, less popular, kinds of challenges. HTTP-01 is the challenge which is the simplest to use in a situation where you have only one server that needs to have a non-wildcard TLS certificate for a given domain name (or several domain names).

Sometimes, however, you need to have a certificate for a given domain name available on more than one server. Such need arises e.g. if you use GeoDNS or DNS-based load balancing, i.e. answer DNS requests for your domain name (e.g., www.example.com) differently for different clients. E.g., you may want to have three servers, one in France, one in Singapore, and one in USA, and respond based on the client's IP address by returning the IP address of the geographically closest server. However, this presents a problem when trying to obtain a Let's Encrypt certificate. E.g., the HTTP-01 challenge fails out of the box because Let's Encrypt will likely connect to a different node than the one asking for the certificate, and will not find the text file that it looks for.

A traditional solution to this problem would be to set up a central server, let it respond to the challenges, and copy the certificates from it periodically to all the nodes.

Making the central server solve DNS-01 challenges is trivial — all that is needed is an automated way to change DNS records in your zone, and scripts are available for many popular DNS providers. I am not really comfortable with this approach, because if an intruder gets access to your central server, they can not only get a certificate and a private key for www.example.com, but also take over the whole domain, i.e. point the DNS records (including non-www) to their own server. This security concern can be alleviated by the use of CNAMEs that point _acme-challenge to a separate DNS zone with separate permissions, but doing so breaks all Let's Encrypt clients known to me. Some links: two bug reports for the official client, and my own GitHub gist for a modified Dehydrated hook script for Amazon Route 53.

For HTTP-01, the setup is different: you need to make the central server available over HTTP on a separate domain name (e.g. auth.example.com), and configure all the nodes to issue redirects when Let's Encrypt tries to verify the challenge. E.g., http://www.example.com/.well-known/acme-challenge/anything must redirect to http://auth.example.com/.well-known/acme-challenge/anything, and then Certbot running on auth.example.com will be able to obtain certificates for www.example.com without the security risk inherent for DNS-01 challenges. Proxying the requests, instead of redirecting them, also works.

Scripting the process of certificate distribution back to cluster nodes, handling network errors, reloading Apache (while avoiding needless restarts) and monitoring the result is another matter.

So, I asked myself a question: would it be possible to simplify this setup, if there are only a few nodes in the cluster? In particular, avoid the need to copy files from server to server, and to get rid of the central server altogether. And ideally get rid of any fragile custom scripts. It turns out that, with a bit of Apache magic, you can do that. No custom scripts are needed, no ssh keys for unattended distribution of files, no central server, just some simple rewrite rules.

Each of the servers will run Certbot and request certificates independently. The idea is to have a server ask someone else when it doesn't know the answer to the HTTP-01 challenge.

To do so, we need to enable mod_rewrite, mod_proxy, and mod_proxy_http on each server. Also, I assume that you already have some separate domain names (not for the general public) pointing to each of the cluster nodes, just for the purpose of solving the challenges. E.g., www-fr.example.com, www-sg.example.com, and www-us.example.com.

So, here is the definition of the Apache virtual host that responds to unencrypted HTTP requests. The same configuration file works for all cluster nodes.

<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias example.com
    ServerAlias www-fr.example.com
    ServerAlias www-sg.example.com
    ServerAlias www-us.example.com

    ProxyPreserveHost On
    RewriteEngine On

    # First block of rules - solving known challenges.
    RewriteCond /var/www/letsencrypt/.well-known/acme-challenge/$2 -f
    RewriteRule ^/\.well-known/acme-challenge(|-fr|-sg|-us)/(.*) \
        /var/www/letsencrypt/.well-known/acme-challenge/$2 [L]

    # Second block of rules - passing unknown challenges further.
    # Due to RewriteCond in the first block, we already know at this
    # point that the file does not exist locally.
    RewriteRule ^/\.well-known/acme-challenge/(.*) \
        http://www-fr.example.com/.well-known/acme-challenge-fr/$1 [P,L]
    RewriteRule ^/\.well-known/acme-challenge-fr/(.*) \
        http://www-sg.example.com/.well-known/acme-challenge-sg/$1 [P,L]
    RewriteRule ^/\.well-known/acme-challenge-sg/(.*) \
        http://www-us.example.com/.well-known/acme-challenge-us/$1 [P,L]
    RewriteRule ^/\.well-known/acme-challenge-us/(.*) - [R=404]

    # HTTP to HTTPS redirection for everything not matched above
    RewriteRule /?(.*) https://www.example.com/$1 [R=301,L]
</VirtualHost>

For a complete example, add a virtual host for port 443 that serves your web application on https://www.example.com.

<VirtualHost *:443>
    ServerName www.example.com
    ServerAlias example.com

    # You may want to have a separate virtual host or a RewriteRule
    # for redirecting browsers who visit https://example.com or any
    # other unwanted domain name to https://www.example.com.
    # E.g.:

    RewriteEngine On
    RewriteCond %{HTTP_HOST} !=www.example.com [NC]
    RewriteRule /?(.*) https://www.example.com/$1 [R=301,L]

    # Configure Apache to serve your content
    DocumentRoot /var/www/example

    SSLEngine on
    Include /etc/letsencrypt/options-ssl-apache.conf

    # Use any temporary certificate here, even a self-signed one works.
    # This piece of configuration will be replaced by Certbot.
    SSLCertificateFile /etc/ssl/certs/ssl-cert-snakeoil.pem
    SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key
</VirtualHost>

Run Certbot like this, on all servers:

mkdir -p /var/www/letsencrypt/.well-known/acme-challenge
certbot -d example.com -d www.example.com -w /var/www/letsencrypt \
    --noninteractive --authenticator webroot --installer apache

Any other Let's Encrypt client that works by placing files into a directory will also be good enough. Apache's mod_md will not work, though, because it deliberately blocks all requests for unknown challenge files, which is contrary to what we need.

Let's see how it works.

Certbot asks Let's Encrypt for a certificate. Let's Encrypt tells Certbot the file name that it will try to fetch, and the expected contents. Certbot places this file under /var/www/letsencrypt/.well-known/acme-challenge and tells Let's Encrypt that they can verify that it is there. Let's Encrypt resolves www.example.com (and example.com, but let's forget about it) in the DNS, and then asks for this file under http://www.example.com/.well-known/acme-challenge.

If their verifier is lucky enough to hit the same server that asked for the certificate, the RewriteCond for the first RewriteRule will be true (it just tests the file existence), and, due to this rule, Apache will serve the file. Note that the rule responds not only to acme-challenge URLs, but also to acme-challenge-fr, acme-challenge-sg, and acme-challenge-us URLs used internally by other servers.

If the verifier is unlucky, then the challenge file will not be found, and the second block of RewriteRule directives will come into play. Let's say that it was the Singapore server that requested the certificate (and thus can respond), but Let's Encrypt has contacted the server in USA.

For the request sent by Let's Encrypt verifier, we can see that only the first rule in the second block will match. It will (conditionally on the file not being found locally, as tested by the first block) proxy the request from the server in USA to the French server, and use the "acme-challenge-fr" directory in the URL to record the fact that it is not the original request. The French server will not find the file either, so will skip the first block, and apply the second rule in the second block of RewriteRules (because it sees "acme-challenge-fr" in the URL). Thus, the request will be proxied again, this time to the Singapore server, and with "acme-challenge-sg" in the URL. As it was the Singapore server who requested the certificate, it will find the file and respond with its contents, and through the French and US servers, Let's Encrypt verifier will get the response and issue the certificate.

The last RewriteRule in the second block terminates the chain for stray requests not originating from Let's Encrypt. Such requests get proxied three times and finally get a 404.
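
The walk through the chain can also be modeled in a few lines of Python. This is a toy model of the rewrite rules, not something to deploy: node names are shortened to "fr", "sg" and "us", and the function name verify is invented for this example.

```python
from typing import List, Optional, Tuple

# For each URL suffix: the node the request is proxied to, and the new suffix.
NEXT_HOP = {
    "": ("fr", "-fr"),      # acme-challenge    -> www-fr, acme-challenge-fr
    "-fr": ("sg", "-sg"),   # acme-challenge-fr -> www-sg, acme-challenge-sg
    "-sg": ("us", "-us"),   # acme-challenge-sg -> www-us, acme-challenge-us
}


def verify(entry_node: str, holder: Optional[str]) -> Tuple[int, List[str]]:
    """Return the HTTP status and the list of nodes that handled the request.

    entry_node is the node contacted by the verifier; holder is the node
    that has the challenge file (None for a stray request).
    """
    node, suffix = entry_node, ""
    visited = [node]
    while True:
        if node == holder:
            return 200, visited      # first block: the file exists locally
        if suffix not in NEXT_HOP:
            return 404, visited      # acme-challenge-us: end of the chain
        node, suffix = NEXT_HOP[suffix]
        visited.append(node)


if __name__ == "__main__":
    print(verify("us", "sg"))   # the scenario described above
    print(verify("us", None))   # a stray request
```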

The proposed scheme is, in theory, extensible to any number of servers — all that is needed is that they are all online, and the chain of proxying the request through all of them is not too slow. But, there is a limit on Let's Encrypt side on the number of duplicate certificates, 5 per week. I would guess (but have not verified) that, in practice, due to both factors, it means at most 5 servers in the cluster would be safe — which is still good enough for some purposes.

Friday, December 28, 2018

LDAP with STARTTLS considered harmful

A few weeks ago, I hit a security issue on the company's LDAP server. Namely, it was not protected well enough against misconfigured clients that send passwords in cleartext.

There are two mechanisms defined in LDAP that protect passwords in transit:

  1. SASL binds
  2. SSL/TLS

SASL is an extensible framework that allows arbitrary authentication mechanisms. However, all of the widely-implemented ones are either not based on passwords at all (so not suitable for our use case), or send the password in cleartext (so not better than a simple non-SASL bind), or require the server to store the password in cleartext for verification (even worse). In addition to this non-suitability for our purposes, web applications usually do not support LDAP with SASL binding.

SSL/TLS, on the other hand, is a widely-supported industry standard for encrypting and authenticating the data (including passwords) in transit.

OpenLDAP assigns so-called Security Strength Factor to each authentication mechanism, based on how well it protects authentication data on the network. For SSL, it is usually (but not always) the number of bits in the symmetric cipher key used during the session.

For LDAP, the "normal" way of implementing SSL is to support the "STARTTLS" request on port 389 (the same port as for unencrypted LDAP sessions). There is also a not-really-standard way, with TLS right from the start of the connection (i.e. no STARTTLS request), on port 636. This is called "ldaps".

OpenLDAP can require a certain minimum Security Strength Factor for authentication attempts. In slapd.conf, it is set like this: "security ssf=128". There are also related configuration directives, TLSProtocolMin, which sets the minimum SSL/TLS protocol version, and localSSF, which is the Security Strength Factor assumed on local unix-socket connections (ldapi:///).

So, we configure certificates, set an appropriate security strength factor, disable anonymous bind, and that's it? No. This still doesn't prevent the password leak.

Suppose that someone configures Apache like this:

    <Location />
        AuthType basic
        AuthName "example.com users only"
        AuthBasicProvider ldap
        AuthLDAPInitialBindAsUser on
        AuthLDAPInitialBindPattern (.+) uid=$1,ou=users,dc=example,dc=com
        AuthLDAPURL "ldap://ldap.example.com/ou=users,dc=example,dc=com?uid?sub?"
        Require valid-user
    </Location>

See the mistake? They forgot to tell the client to use STARTTLS (i.e., forgot to add the word "TLS" as the last parameter to AuthLDAPURL).

Let's look at what happens if a user tries to log in. Apache will connect to the LDAP server on port 389, successfully. Then, it will create an LDAP request for a simple bind, using the user's username and password, and send it. And it will be sent successfully, in cleartext over the network. Of course, OpenLDAP will carefully receive the request, parse it, and then refuse to authenticate the user, but it's too late. The password has already been sent in cleartext, and somebody between the servers has already captured it with the government-mandated tcpdump equivalent.

This would not have happened if the LDAP server were listening on port 636 (SSL) only. In this case, requests to port 389 will get an RST before Apache gets a chance to send the password. And requests which use the ldaps:// scheme are always encrypted. An additional benefit is that PHP-based software that is not specifically coded to use STARTTLS for LDAP (i.e. does not contain an ldap_start_tls() function call) will continue to work when given the ldaps:// URL, and I don't have to audit it for this specific issue. Isn't that wonderful? So, please (after reconfiguring all existing clients) make sure that your LDAP server does not listen on port 389, and listens securely on port 636, instead.
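
While reconfiguring the clients, it may help to sort the LDAP URLs found in their configuration by scheme. Below is a rough Python sketch of such a check; the function name and the wording of the returned labels are mine, and the check looks only at the URL scheme, not at any separate STARTTLS option.

```python
from urllib.parse import urlsplit


def transport_security(url: str) -> str:
    """Classify an LDAP URL by what its scheme guarantees about encryption."""
    scheme = urlsplit(url).scheme.lower()
    if scheme == "ldaps":
        return "always encrypted"
    if scheme == "ldap":
        return "cleartext unless the client requests STARTTLS"
    if scheme == "ldapi":
        return "local unix-domain socket"
    return "not an LDAP URL"


if __name__ == "__main__":
    for url in ("ldap://ldap.example.com/ou=users,dc=example,dc=com?uid?sub?",
                "ldaps://ldap.example.com/ou=users,dc=example,dc=com?uid?sub?"):
        print(url, "->", transport_security(url))
```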

Friday, October 19, 2018

I received a fake job alert

Today I received a job alert from a Russian job site, https://moikrug.ru/, which I do use to get job alerts. The job title was "Application Security Engineer", and it looked like I almost qualify. So, I decided to go to the company page behind the offer, and see what else they do.

Result: they do a lot of interesting research, and they also have a "jobs" page, which is empty. Also, the job offer that I have received contained some links to Google Docs, and an email address not on the company's domain, which looked quite non-serious for a company of that size.

So, I went to the "contact us" page, and called their phone. The secretary was unaware of the exact list of the currently open positions, but told me that all official offers would be on a different job site, https://hh.ru/. It lists 8 positions, but nothing related to what I have received, and nothing that I have the skills for. So, we concluded that I had received a fake job offer from impostors who illegally used the company name.

Conclusion: please beware of such scams, even from big and reputable job sites.

Tuesday, July 10, 2018

Possibly unexpected local access to OpenLDAP

The OpenLDAP server, slapd, can listen on multiple sockets. In Ubuntu 18.04, by default (see SLAPD_SERVICES in /etc/default/slapd), it listens on TCP port 389 (ldap:///), which is indeed the purpose of an LDAP server. It also listens on a UNIX-domain socket (ldapi:///), which is necessary for access to the config database to work for root. It, by default, does not listen on the non-standard SSL port 636 (ldaps:///), but some people add it.
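
To see at a glance which kinds of listeners are configured, the SLAPD_SERVICES line can be parsed. This is a small Python sketch that assumes the usual shell-style assignment in /etc/default/slapd, with the listener URLs as a space-separated, optionally quoted list; the function name is made up for illustration.

```python
import shlex


def slapd_listener_schemes(defaults_text):
    """Return the URL schemes listed in the last SLAPD_SERVICES assignment."""
    schemes = []
    for line in defaults_text.splitlines():
        line = line.strip()
        if not line.startswith("SLAPD_SERVICES="):
            continue  # skips comments and unrelated settings
        value = line.split("=", 1)[1]
        # shlex removes the shell quoting; the URLs are then split on spaces.
        urls = " ".join(shlex.split(value)).split()
        schemes = [url.split(":", 1)[0] for url in urls]
    return schemes


if __name__ == "__main__":
    with open("/etc/default/slapd") as f:
        print(slapd_listener_schemes(f.read()))
```

If "ldapi" shows up in the output, the server accepts connections over the local UNIX-domain socket.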

When configuring OpenLDAP, it is essential to set proper access control lists. People usually think in terms of anonymous users, authenticated users, subtrees, regular expressions, and the like. Then they apply the syntax documented in the OpenLDAP admin guide. Then they try to connect to port 389 with some DNs in the tree and verify that these DNs can indeed access what is needed and cannot access or modify sensitive or read-only information. Often, anonymous read access is limited only to dn.exact="", so that the search bases are discoverable by various administration tools. And then, the task of securing the OpenLDAP server is declared done.

But is it really done? No! The mistake here is to test only access via port 389 and DNs from the tree.

Everybody who runs slapd (and especially those who grant permissions to "users"), please follow these steps:

  1. Log in to your server using ssh as an unprivileged user.
  2. ldapsearch -H ldapi:/// -Y EXTERNAL -b '' -s base '*' '+'
  3. Note the value of the "namingContexts" attribute. Let's say it's "dc=example,dc=com".
  4. ldapsearch -H ldapi:/// -Y EXTERNAL -b 'dc=example,dc=com'
  5. Verify that it is not against your security policy for local users (e.g. www-data if your web app is compromised) to be able to extract this data.

What happens here is that a local user, just by virtue of having a UID and a GID, successfully authenticates via the UNIX-domain socket, using the "EXTERNAL" SASL mechanism. The relevant DN for authentication looks like this: gidNumber=1000+uidNumber=1000,cn=peercred,cn=external,cn=auth

In other words, please be sure to close unneeded access for "dn.sub=cn=peercred,cn=external,cn=auth". Or, if you don't use the local socket for managing the configuration database (or are still on slapd.conf instead of slapd.d), consider configuring slapd not to listen on ldapi:/// at all.
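
To know which DN a given local account will show up as, the DN can be assembled from its numeric UID and GID. Here is a minimal Python sketch that follows the pattern quoted above; the function name is made up for illustration.

```python
import os


def peercred_dn(uid, gid):
    """Build the DN that a local user gets over ldapi:/// with SASL EXTERNAL."""
    return "gidNumber={0}+uidNumber={1},cn=peercred,cn=external,cn=auth".format(gid, uid)


if __name__ == "__main__":
    # Print the DN for the account that runs this script.
    print(peercred_dn(os.getuid(), os.getgid()))
```

Running "ldapwhoami -H ldapi:/// -Y EXTERNAL" as the same user should report the same DN, which makes it easy to check what your access control lists actually have to match.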

Thursday, May 3, 2018

Downtime

A few days ago I had an interesting case of a server downtime. The server is just a playground for developers, so no big deal. But still, lessons learned.

The reports came almost simultaneously from developers and from the monitoring system, "cannot connect". And indeed, the server was not pingable. Someone else's server, with IP equal to the IP of our server with the last octet increased by 2, was pingable, so I concluded it was not a network problem.

Next reaction: look at the server's screen, using remote KVM provided by the hoster. Kernel panic! OK, need to screenshot it (done) and reboot the server. Except that the Power Control submenu in the viewer is grayed out, so I can't. And a few months ago, when we needed a similar kind of reset, it was there.

OK, so I created a ticket for resetting the server manually. And I had to remind them that the remote reboot functionality is supposed to work. Here is the hoster's reply (PDU = power distribution unit):

Dear Alexander,

Upon checking on the PDU, the PDU is refusing connection.

We'll arrange a PDU replacement the soonest possible.

We apologise for the inconvenience caused.

Everybody reading this post, now, please check that you don't fall into the same trap. Run your iKVM viewer against each of your servers that it can connect to, and check that it runs, and that the menu item to reset the server still exists. Create a calendar reminder to periodically recheck it.

And maybe append "panic=10" to your Linux kernel command line, so that the kernel reboots by itself 10 seconds after a panic and manual intervention is not needed next time.
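
Whether a server already has this setting can be checked without rebooting it. The Python sketch below looks for a panic=N parameter in a kernel command line string; the function name is made up for illustration, and the main part assumes a Linux system where /proc/cmdline and /proc/sys/kernel/panic are readable.

```python
def panic_timeout_from_cmdline(cmdline):
    """Return N from the last panic=N parameter, or None if there is none."""
    value = None
    for token in cmdline.split():
        if token.startswith("panic="):
            try:
                value = int(token[len("panic="):])
            except ValueError:
                pass  # ignore a malformed value and keep looking
    return value


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("panic= on command line:", panic_timeout_from_cmdline(f.read()))
    # The value currently in effect (0 means: do not reboot on panic).
    with open("/proc/sys/kernel/panic") as f:
        print("kernel.panic sysctl:", f.read().strip())
```

A value of None from the command line together with 0 from the sysctl means that the server will sit at the panic screen until somebody resets it.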