Let's Encrypt is a popular free TLS certificate authority. It currently issues certificates valid for only 90 days, and thus it is a good idea to automate their renewal. Fortunately, there are many tools to do so, including the official client called Certbot.
When Certbot or any other client asks Let's Encrypt for a certificate, it must prove that it controls the domain names to be listed in the certificate. It does so by solving one of several possible challenges. The HTTP-01 challenge requires the client to make a plain-text file with a given name and content available under the domain in question via HTTP, on port 80. The DNS-01 challenge requires publishing a specific TXT record in DNS. There are other, less popular, kinds of challenges. HTTP-01 is the simplest challenge to use when only one server needs a non-wildcard TLS certificate for a given domain name (or several domain names).
Sometimes, however, you need a certificate for a given domain name on more than one server. This need arises, for example, if you use GeoDNS or DNS-based load balancing, i.e. answer DNS requests for your domain name (e.g., www.example.com) differently for different clients. You may want to have three servers, one in France, one in Singapore, and one in the USA, and respond to each client with the IP address of the geographically closest server. However, this presents a problem when trying to obtain a Let's Encrypt certificate: the HTTP-01 challenge fails out of the box, because Let's Encrypt will likely connect to a different node than the one asking for the certificate and will not find the text file that it is looking for.
A traditional solution to this problem would be to set up a central server, let it respond to the challenges, and copy the certificates from it periodically to all the nodes.
Making the central server solve DNS-01 challenges is trivial — all that is needed is an automated way to change DNS records in your zone, and scripts are available for many popular DNS providers. I am not really comfortable with this approach, because if an intruder gets access to your central server, they can not only get a certificate and a private key for www.example.com, but also take over the whole domain, i.e. point the DNS records (including non-www) to their own server. This security concern can be alleviated by the use of CNAMEs that point _acme-challenge to a separate DNS zone with separate permissions, but doing so breaks all Let's Encrypt clients known to me. Some links: two bug reports for the official client, and my own GitHub gist for a modified Dehydrated hook script for Amazon Route 53.
For HTTP-01, the setup is different: you need to make the central server available over HTTP on a separate domain name (e.g. auth.example.com), and configure all the nodes to issue redirects when Let's Encrypt tries to verify the challenge. For example, http://www.example.com/.well-known/acme-challenge/anything must redirect to http://auth.example.com/.well-known/acme-challenge/anything. Certbot running on auth.example.com will then be able to obtain certificates for www.example.com without the security risk inherent in DNS-01 challenges. Proxying the requests, instead of redirecting them, also works.
Scripting the process of certificate distribution back to cluster nodes, handling network errors, reloading Apache (while avoiding needless restarts) and monitoring the result is another matter.
So, I asked myself a question: would it be possible to simplify this setup if there are only a few nodes in the cluster? In particular, could we avoid copying files from server to server, get rid of the central server altogether, and ideally drop any fragile custom scripts? It turns out that, with a bit of Apache magic, you can do that. No custom scripts are needed, no ssh keys for unattended distribution of files, no central server, just some simple rewrite rules.
Each of the servers will run Certbot and request certificates independently. The idea is to have a server ask someone else when it doesn't know the answer to the HTTP-01 challenge.
To do so, we need to enable mod_rewrite, mod_proxy, and mod_proxy_http on each server. Also, I assume that you already have some separate domain names (not for the general public) pointing to each of the cluster nodes, just for the purpose of solving the challenges. E.g., www-fr.example.com, www-sg.example.com, and www-us.example.com.
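Before going further, it can help to confirm that the three modules are actually loaded. The following is only a minimal sketch and not part of the original setup: `missing_modules` is a hypothetical helper name, and it assumes the listing format printed by `apache2ctl -M` on Debian or Ubuntu (one ` name_module (shared)` entry per line).

```shell
#!/bin/sh
# missing_modules: read an `apache2ctl -M` style listing on stdin and
# print every required module that does not appear in it.
# The helper name and the assumed input format are illustrative only.
missing_modules() {
    loaded=$(cat)
    for m in rewrite_module proxy_module proxy_http_module; do
        # The trailing space keeps proxy_module from matching proxy_http_module.
        printf '%s\n' "$loaded" | grep -q "^ *${m} " || echo "$m"
    done
}

# Typical use on a node (assumes apache2ctl is installed):
#   apache2ctl -M | missing_modules
```

If the helper prints anything, on Debian-style systems `a2enmod rewrite proxy proxy_http` followed by an Apache restart is the usual way to enable the missing modules.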
So, here is the definition of the Apache virtual host that responds to unencrypted HTTP requests. The same configuration file works for all cluster nodes.
    <VirtualHost *:80>
        ServerName www.example.com
        ServerAlias example.com
        ServerAlias www-fr.example.com
        ServerAlias www-sg.example.com
        ServerAlias www-us.example.com

        ProxyPreserveHost On
        RewriteEngine On

        # First block of rules - solving known challenges.
        RewriteCond /var/www/letsencrypt/.well-known/acme-challenge/$2 -f
        RewriteRule ^/\.well-known/acme-challenge(|-fr|-sg|-us)/(.*) \
            /var/www/letsencrypt/.well-known/acme-challenge/$2 [L]

        # Second block of rules - passing unknown challenges further.
        # Due to RewriteCond in the first block, we already know at this
        # point that the file does not exist locally.
        RewriteRule ^/\.well-known/acme-challenge/(.*) \
            http://www-fr.example.com/.well-known/acme-challenge-fr/$1 [P,L]
        RewriteRule ^/\.well-known/acme-challenge-fr/(.*) \
            http://www-sg.example.com/.well-known/acme-challenge-sg/$1 [P,L]
        RewriteRule ^/\.well-known/acme-challenge-sg/(.*) \
            http://www-us.example.com/.well-known/acme-challenge-us/$1 [P,L]
        RewriteRule ^/\.well-known/acme-challenge-us/(.*) - [R=404]

        # HTTP to HTTPS redirection for everything not matched above
        RewriteRule /?(.*) https://www.example.com/$1 [R=301,L]
    </VirtualHost>
For a complete example, add a virtual host for port 443 that serves your web application on https://www.example.com.
    <VirtualHost *:443>
        ServerName www.example.com
        ServerAlias example.com

        # You may want to have a separate virtual host or a RewriteRule
        # for redirecting browsers who visit https://example.com or any
        # other unwanted domain name to https://www.example.com.
        # E.g.:
        RewriteEngine On
        RewriteCond %{HTTP_HOST} !=www.example.com [NC]
        RewriteRule /?(.*) https://www.example.com/$1 [R=301,L]

        # Configure Apache to serve your content
        DocumentRoot /var/www/example

        SSLEngine on
        Include /etc/letsencrypt/options-ssl-apache.conf

        # Use any temporary certificate here, even a self-signed one works.
        # This piece of configuration will be replaced by Certbot.
        SSLCertificateFile /etc/ssl/certs/ssl-cert-snakeoil.pem
        SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key
    </VirtualHost>
Run Certbot like this, on all servers:
    mkdir -p /var/www/letsencrypt/.well-known/acme-challenge
    certbot -d example.com -d www.example.com -w /var/www/letsencrypt \
        --noninteractive --authenticator webroot --installer apache
Any other Let's Encrypt client that works by placing files into a directory will also be good enough. Apache's mod_md will not work, though, because it deliberately blocks all requests for unknown challenge files, which is contrary to what we need.
Let's see how it works.
Certbot asks Let's Encrypt for a certificate. Let's Encrypt tells Certbot the file name that it will try to fetch, and the expected contents. Certbot places this file under /var/www/letsencrypt/.well-known/acme-challenge and tells Let's Encrypt that they can verify that it is there. Let's Encrypt resolves www.example.com (and example.com, but let's forget about it) in the DNS, and then asks for this file under http://www.example.com/.well-known/acme-challenge.
If their verifier is lucky enough to hit the same server that asked for the certificate, the RewriteCond for the first RewriteRule will be true (it just tests the file existence), and, due to this rule, Apache will serve the file. Note that the rule responds not only to acme-challenge URLs, but also to acme-challenge-fr, acme-challenge-sg, and acme-challenge-us URLs used internally by other servers.
If the verifier is unlucky, then the challenge file will not be found, and the second block of RewriteRule directives will come into play. Let's say that it was the Singapore server that requested the certificate (and thus can respond), but Let's Encrypt has contacted the server in the USA.
For the request sent by the Let's Encrypt verifier, only the first rule in the second block will match. It will (conditionally on the file not being found locally, as tested by the first block) proxy the request from the server in the USA to the French server, and use the "acme-challenge-fr" directory in the URL to record the fact that it is not the original request. The French server will not find the file either, so it will skip the first block and apply the second rule in the second block of RewriteRules (because it sees "acme-challenge-fr" in the URL). Thus, the request will be proxied again, this time to the Singapore server, and with "acme-challenge-sg" in the URL. As it was the Singapore server that requested the certificate, it will find the file and respond with its contents. The response travels back through the French and US servers to the Let's Encrypt verifier, which then issues the certificate.
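One way to check this behaviour by hand, without involving Let's Encrypt at all, is to drop a dummy file into the webroot of a single node and request it through every hostname. The helper below is a sketch under stated assumptions: `make_probe` and the `probe-` file contents are made up for illustration, while the webroot layout and the hostnames are the ones used in this post.

```shell
#!/bin/sh
# make_probe WEBROOT TOKEN
# Create a dummy challenge file under WEBROOT and print the URLs at which
# it should be reachable once the rewrite rules are in place on all nodes.
# The function name and the "probe-" contents are illustrative only.
make_probe() {
    webroot=$1
    token=$2
    mkdir -p "$webroot/.well-known/acme-challenge"
    printf 'probe-%s\n' "$token" > "$webroot/.well-known/acme-challenge/$token"
    for host in www www-fr www-sg www-us; do
        echo "http://$host.example.com/.well-known/acme-challenge/$token"
    done
}

# Typical use, on one node only:
#   make_probe /var/www/letsencrypt manual-test
# Then fetch each printed URL from any machine, for example with curl.
```

If the chain works, every URL should return the probe contents, no matter which node holds the file. Remember to delete the dummy file afterwards.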
The last RewriteRule in the second block terminates the chain for stray requests not originating from Let's Encrypt. Such requests get proxied three times and finally get a 404.
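The routing logic described above can also be played through offline. The following toy model is only a sketch of my reading of the rules, not something Apache runs: `trace_chain` is a hypothetical name, nodes are reduced to the labels fr, sg and us, and "the file exists locally" is modelled as "this node is the owner".

```shell
#!/bin/sh
# trace_chain ENTRY OWNER
# ENTRY: node contacted first (fr, sg or us).
# OWNER: node whose Certbot placed the file, or "" if no node has it.
# Prints the path of the request and the final HTTP status.
trace_chain() {
    node=$1
    owner=$2
    marker=""    # URL suffix added by the previous hop: "", -fr, -sg or -us
    path=$node
    while :; do
        # First block of rules: serve the file if it exists locally.
        if [ "$node" = "$owner" ]; then
            echo "$path: 200"
            return 0
        fi
        # Second block of rules: the marker in the URL selects the next hop.
        case "$marker" in
            "")  next=fr ;;
            -fr) next=sg ;;
            -sg) next=us ;;
            -us) echo "$path: 404"; return 0 ;;
        esac
        marker="-$next"
        node=$next
        path="$path -> $node"
    done
}

trace_chain us sg   # prints: us -> fr -> sg: 200
trace_chain fr ""   # prints: fr -> fr -> sg -> us: 404
```

The second call shows a stray request: it is proxied three times before the last rule answers with a 404. Note that a request entering at the French node is first proxied to that same node under its www-fr name, which is harmless but counts as a hop.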
The proposed scheme is, in theory, extensible to any number of servers. All that is needed is that they are all online, and that the chain of proxying the request through all of them is not too slow. However, Let's Encrypt limits the number of duplicate certificates to 5 per week, and every node in this scheme requests a certificate for the same set of names. I would guess (but have not verified) that, in practice, due to both factors, at most 5 servers in the cluster would be safe, which is still good enough for some purposes.