tag:blogger.com,1999:blog-78445494852701531602024-02-14T00:37:51.710-08:00My BlogAlexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.comBlogger60125tag:blogger.com,1999:blog-7844549485270153160.post-47775851160075426832022-02-09T02:27:00.007-08:002022-02-09T05:33:40.302-08:00SSD failed<p>I have seen HDDs fail multiple times, both in other people's servers and in my own computers. They usually develop unreadable sectors, and the rest of the data can be recovered under Linux to another HDD using tools like <code>ddrescue</code>. But I had never seen an SSD failure myself. Until today.</p><p><strike>Well, this case is not that different.</strike></p><p>Today, my desktop computer failed to boot, with some error messages from <code>systemd</code> about failing to start services. I thought it might be a one-off error and rebooted, only to find out that the root partition (XFS on LUKS on /dev/sda2) failed to mount. The error in <code>dmesg</code> told me to run <code>xfs_repair</code>, which I did not do initially.</p><p>It did mount with the <code>ro,norecovery</code> options, but I rebooted the system afterwards instead of copying the files somewhere immediately. It was a stupid move, and a lesson for the future.</p><p>Then I ran <code>xfs_repair</code>, but it complained a lot about I/O errors, and afterwards the filesystem was no longer mountable even with the <code>norecovery</code> option.</p><p>The majority of sectors are still readable, so, as of now, <code>ddrescue</code> + <code>xfs_repair</code> still looks like a valid recovery strategy. I will update the blog post if it isn't. Even if it isn't, only a day of work is lost.</p><p><b>Update 1</b>: it worked, but I had to run <code>xfs_repair</code> twice, and the <code>node_modules</code> directory from one project ended up in <code>lost+found</code>. So apparently nothing important was lost.</p><p>And here is what smartctl in my rescue system says about the drive.</p><p><br /></p>
<pre>$ sudo smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.4-arch2-1] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Indilinx Barefoot 3 based SSDs
Device Model: OCZ-VECTOR
Serial Number: OCZ-Z5CB4KC20X0ZG7F8
LU WWN Device Id: 5 e83a97 27d603391
Firmware Version: 3.0
User Capacity: 512 110 190 592 bytes [512 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
TRIM Command: Available
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Feb 9 15:09:11 2022 +05
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x1d) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x00) Error logging NOT supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 0) minutes.
Extended self-test routine
recommended polling time: ( 0) minutes.
SMART Attributes Data Structure revision number: 18
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Runtime_Bad_Block 0x0000 033 033 000 Old_age Offline - 33
9 Power_On_Hours 0x0000 100 100 000 Old_age Offline - 31212
12 Power_Cycle_Count 0x0000 100 100 000 Old_age Offline - 5821
171 Avail_OP_Block_Count 0x0000 080 080 000 Old_age Offline - 72682832
174 Pwr_Cycle_Ct_Unplanned 0x0000 100 100 000 Old_age Offline - 406
195 Total_Prog_Failures 0x0000 100 100 000 Old_age Offline - 0
196 Total_Erase_Failures 0x0000 100 100 000 Old_age Offline - 0
197 Total_Unc_Read_Failures 0x0000 100 100 000 Old_age Offline - 33
208 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 317
210 SATA_CRC_Error_Count 0x0000 100 100 000 Old_age Offline - 60
224 In_Warranty 0x0000 100 100 000 Old_age Offline - 0
233 Remaining_Lifetime_Perc 0x0000 090 090 000 Old_age Offline - 90
241 Host_Writes_GiB 0x0000 100 100 000 Old_age Offline - 53191
242 Host_Reads_GiB 0x0000 100 100 000 Old_age Offline - 34251
249 Total_NAND_Prog_Ct_GiB 0x0000 100 100 000 Old_age Offline - 7808432714
Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
Selective Self-tests/Logging not supported
</pre>
<p>So, despite the I/O errors, the drive still considers itself healthy (SMART overall-health self-assessment test result: PASSED). What a liar.</p><p><b>Update 2</b>: after a reboot, the system does not detect the drive at all, even in the BIOS. So I was lucky to be able to copy all the data just in time. <br /></p>Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-7763666195081125272021-07-24T10:19:00.001-07:002021-07-24T10:20:42.553-07:00A bad job with a very good description<p>Today I received a job invitation for a C++ developer position. My C++ skills are not up to date, so I initially doubted whether to apply. Nevertheless, I clicked the link to a more detailed job description. And it was the best job description that I have seen so far. No long list of desired skills. A clear view of the job duties.<br /></p><p>Let me translate and quote the relevant parts:</p><blockquote><p>Our product is <link removed>. You can download and try it on your PC.</p></blockquote><p>Well, this is why I refused. I do not want to be involved in "black SEO" (which is obvious from the link that I removed). But let's continue with the description:<br /></p><blockquote><p>Our mission is to allow users to create separate browser instances, with their own cookies, local storage, etc. And with their own substitutions of values such as navigator.platform, navigator.language and so on. <br /></p></blockquote><blockquote><p>For these tasks, we currently use a stock Google Chrome browser and embed the needed property replacements as JavaScript code using its API (the DevTools Protocol). But this approach has certain drawbacks that do not allow bringing the browser operation to the desired "perfect" state. Therefore, we are starting development of our own browser build based on the Chromium sources, so that all the needed property substitutions and modifications come from the core and not from injected JavaScript code.<br /></p></blockquote><p>A perfectly clear and logical
description of what they are trying to achieve, and, importantly, why. It is obvious that the person who wrote this has a coherent product vision. <br /></p><p>Let me snip the part regarding work conditions, as that's irrelevant for the purpose of this blog post, and continue with the best part of the job post:<br /></p><blockquote><p>And here is a benchmark that allows you to see if you fit this role. Create a Chromium build for Windows with a replaced icon. Just any icon, but make sure that it shows up both as an icon of the executable and in the taskbar. Also, modify the browser so that the navigator.platform property is always MacIntel.<br /></p></blockquote><p>This benchmark allows candidates to judge their own fit much better than long lists of required skills. What if I don't have this one skill from the list - is it really important, or can I just learn it on the fly? With the candidate self-test approach, as demonstrated above, such a question cannot arise, and there is less need to conduct in-person coding interviews.<br /></p><p>Another good aspect of this benchmark is that it involves modification of an existing code base, and not coding something from scratch. The ability to read and understand existing code in real-world projects is what really matters, and traditional coding interviews do not measure that.<br /></p><p>I wish all IT job descriptions were written like that. It's unfortunate that this position comes from the bad guys - otherwise I would have definitely applied.</p><p>P.S. If you need a remote Linux sysadmin, DevOps engineer, or a C, Python, or PHP developer, feel free to <a href="mailto:patrakov@gmail.com">contact me via email</a>.<br /></p>Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com1tag:blogger.com,1999:blog-7844549485270153160.post-35483722693801119932020-01-23T02:05:00.003-08:002021-08-03T11:00:26.686-07:00VPN privacy policies and privacy threats<div dir="ltr" style="text-align: left;" trbidi="on">
People use commercial VPNs as anonymity and privacy tools. To be useful as such a tool, the VPN provider must not store any information that would identify the real IP address of a user when given the details of IP packets from/to a website that the user has visited or that their software (e.g. a BitTorrent client) has automatically communicated with. In other words, a user who uses BitTorrent or visits "shady" sites expects that the VPN provider will not be able to point to them when asked "who torrented this" or "who has visited this site".<br />
<br />
Off-the-shelf VPN server software does log connections by default, and this is useful in a corporate setting for incident investigations. So, VPN providers often make an explicit "no-logs" statement in their privacy policies to indicate that they, well, don't log certain data, or discard those logs after a predetermined amount of time. Here is an example of such a policy statement, taken from <a href="https://www.ivacy.com/legal/#privacy-policy">Ivacy</a>:<br />
<blockquote class="tr_bq">
<i>We strictly do not log or monitor, online browsing activities, connection logs, VPN IPs assigned, original IP addresses, browsing history, outgoing traffic, connection times, data you have accessed and/or DNS queries generated by your end. We have no information that could associate specific activities to specific users.</i></blockquote>
It looks like this statement is short and to the point. Is it enough? Unfortunately, by itself, the statement above is insufficient. It would be perfectly compliant with the wording above if they streamed the connection events, VPN IPs assigned, encryption keys negotiated, etc., and mirrored the traffic (including the original connection IPs) to a third party in real time. Exfiltration is not the same as logging, and it is still true that the hypothetical evil Ivacy keeps no information that could associate specific activities to specific users.<br />
<br />
It is not only Ivacy who has the problem with the privacy policy focusing only on logging as the privacy threat — the problem is in fact very common, probably because nobody except me really thought about other ways to betray privacy. It became even more important in 2018, when it became the norm for VPN providers to undergo audits. And guess what, one common form of an audit is called a "No-Log Audit". Not a general "Privacy Policy Adherence Audit", but a narrow "No-Log Audit"!<br />
<br />
It is an interesting question what an auditor should do here. If this is a "No-Log Audit", then, formally, deliberate real-time exfiltration is out of scope. So are deliberately introduced cryptographic weaknesses that would allow third parties to fully decrypt connections (though, this is in scope for a "Security Audit", which is a completely different thing). Even during a more general policy compliance audit, targeted at the entirety of the privacy policy, formally, an auditor has the right not to report deliberate exfiltration as a finding, provided that the policy is worded carefully enough (so that data exfiltration is not a privacy policy violation as worded).<br />
<br />
Note the word "formally" above: it is all about the worst case. Some auditors do care about the spirit of the policy, not only about the letter. I have asked three companies that conducted No-Log Audits of various VPN providers in the past about this dilemma, and, so far, received one useful reply (from <a href="https://cure53.de/">cure53</a>, regarding their <a href="https://cure53.de/audit-report_ivpn.pdf">No-Log & Privacy Audit</a> of <a href="https://www.ivpn.net/">IVPN</a>).<br />
<blockquote class="tr_bq">
<i>Q: Was there any attempt during the audit to check that personally identifying information does not leave the company via network connections, as opposed to the on-disk logs that you have already confirmed as non-existing?</i><br />
A: Yes and no. We checked that on the servers we got access to. In the IVPN case we could not find any evidence that points towards them attempting that. While this doesn't mean that they, IVPN, don't do it at all, we at least didn't catch them trying.<br />
<...><br />
<i>Q: Would any of the above privacy violations (if IVPN were engaging in such activities) be caught/flagged/result in a failed audit?</i><br />
A: As a matter of fact, yes. Short after IVPN we audited another provider and it ended up in massive drama because they indeed logged and found different excuses every time. It was pathetic.</blockquote>
<div>
NordVPN is one of the VPN providers that does address the issue with the wording. Let me quote the relevant bits from their <a href="https://my.nordaccount.com/legal/privacy-policy/nordvpn/">privacy policy</a> (emphasis mine):<br />
<blockquote><i>Nord guarantees a strict no-logs policy for NordVPN Services, meaning that your internet activity while using NordVPN Services is not monitored, recorded, logged, stored <b>or passed to any third party</b>.</i></blockquote>
Much clearer and more reassuring. Let's hope that other VPN providers read this blog post and apply the same simple fix.<br />
<br />
Disclaimer: I am a customer of some VPN services mentioned here. I have, at the time of this writing, absolutely no evidence that they engage, or engaged in the past, in the hypothetical malpractice described in this post. It was just an example. I am sure that, for every existing privacy policy in the world, a sufficiently advanced hairsplitter can figure out a way to "comply" in a similar way.</div>
</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-27674567057805893142019-11-17T03:43:00.002-08:002019-11-17T03:49:56.661-08:00Fail2ban and network misconfiguration<div dir="ltr" style="text-align: left;" trbidi="on">
I had a rather stupid security incident recently, and want you to check if your company network is vulnerable to the same issue.<br />
<br />
The network in question had a bastion host, with an externally accessible SSH server and a (more recently added) VPN. Because of someone's laziness (i.e. routing not configured properly), the bastion host was performing network address translation, so, no matter who connected, over VPN or just double SSH, all internal hosts would see incoming SSH connections from the internal IP address of the bastion host.<br />
<br />
The real issue that triggered a short SSH outage was that the internal hosts ran fail2ban, and I had a slightly broken keyboard (since replaced) that would sometimes double-register one of the keys, and the character on that key was in my password. So, I managed to mistype my password twice when connecting to one of the internal servers.<br />
<br />
Result: that internal server banned me. Or rather, it banned the IP address of the bastion host. And it was not just me: all other people who were working there also got banned for some time, which is clearly unwanted.<br />
<br />
Here are some recommendations on how to avoid repeating this incident.<br />
<br />
<ol style="text-align: left;">
<li>If you can, make sure that internal servers can see the VPN IP address of the admin, not just the internal IP of the bastion host. I.e., remove NAT.</li>
<li>If you can't remove NAT, or have to support double-SSH through the bastion host, make sure that its address is whitelisted (see the sketch after this list). On the other hand, fail2ban does protect against the situation when there is some password-guessing malware on the admin's PC. So, my recommendation would be, once NAT is removed, to deprecate the double-SSH workflow.</li>
<li>Use SSH keys.</li>
<li>When it is acceptable (i.e. would not be counted as an incident), actually perform the test, just to see if the ban is applied correctly.</li>
</ol>
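<br />
For item 2, a minimal sketch of such a whitelist is shown below. It assumes that fail2ban reads /etc/fail2ban/jail.local and that the internal address of the bastion host is 10.0.0.10; both are illustrative and must be adjusted to your network.<br />
<br />
<pre># File: /etc/fail2ban/jail.local (sketch; 10.0.0.10 is an illustrative address)
[DEFAULT]
# Never ban localhost or the internal address of the bastion host.
ignoreip = 127.0.0.1/8 10.0.0.10
</pre>
<br />
After reloading fail2ban, you can also verify the effect, which helps with item 4:<br />
<br />
<pre>fail2ban-client reload        # apply the new configuration
fail2ban-client status sshd   # list the currently banned addresses (replace sshd with your jail name)
</pre>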
<br />
<br /></div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-13868160807060153812019-07-10T08:58:00.004-07:002019-07-10T08:58:49.241-07:00How to work around slow IPMI virtual media<div dir="ltr" style="text-align: left;" trbidi="on">
Sometimes, you need to perform a custom installation of an operating system (likely Linux) on a dedicated server. This happens, e.g., if the hoster does not support your preferred Linux distribution, or if you want to make some non-default decisions during the installation that are not easy to change later, like custom partitioning or other storage-related settings. For this use case, some hosters offer IPMI access. Using a Java applet, you can remotely look at the display of the server and send keystrokes and mouse input - even to the BIOS. Also, you can connect a local ISO image as a virtual USB CD-ROM and install an operating system from it.<br />
<br />
The problem is that such a virtual CD-ROM is unbearably slow if your server is not in the same city. The CD-ROM access pattern when loading the installer is all about random reads, because it consists mostly of loading (i.e., page faulting) executable programs and libraries from a squashfs image. Therefore, there is an inherent speed limit based on the read access size and the latency between the IPMI viewer and the server. Even for linear reads, where the readahead mechanism should help, the virtual CD-ROM read speed is limited to something like 250 kilobytes per second if the latency is 100 ms. This is lower than 2x CD speed. If you are working from home over an ADSL or LTE connection, the speed will be even worse.<br />
<br />
With distant servers or slow internet links on the viewer side, Linux installers sometimes time out loading their own components from the virtual media. In fact, when trying to install CentOS 7 over IPMI on a server located in Latvia, I always got a disconnection before it could load 100 megabytes from the installation CD. Therefore, this installation method was unusable for me, even with a good internet connection. This demonstrates why it is a good idea to avoid installing operating systems over IPMI from large ISO images. Even "minimal" or "netinstall" images are typically too large!<br />
<br />
So, what are the available alternatives?<br />
<br />
Having another host near the server being installed, with a remote desktop and a good internet connection, definitely helps, and is the best option if available.<br />
<br />
<a href="http://ftp.debian.org/debian/dists/stable/main/installer-amd64/current/images/netboot/">Debian</a> and <a href="http://cdimages.ubuntu.com/netboot/">Ubuntu</a> offer special well-hidden "netboot" CD images which are even smaller than "netinstall". They are less than 75 MB in size and therefore have a better chance of success. In fact, even when not constrained by IPMI, I never use anything else to install these Linux distributions.<br />
<br />
And it turns out that, if your server should obtain its IP address with DHCP, there is an even easier option: load the installer from the network. There is a special website, <a href="https://netboot.xyz/">netboot.xyz</a>, that hosts iPXE configuration files. All you need to download is a very small (1 MB) <a href="https://boot.netboot.xyz/ipxe/netboot.xyz.iso">ISO</a> with iPXE itself. Boot the server from it, and you will get a menu with many Linux variants available for installation.<br />
<br />
There are two limitations with the netboot.xyz approach. <a href="https://github.com/antonym/netboot.xyz/issues/342">First</a>, to install some Linux variants, you do need DHCP, and not all hosting providers use it for their dedicated servers. Even though there is a failsafe menu where you can set a static IP address, some installers, including CentOS, assume DHCP. <a href="https://github.com/antonym/netboot.xyz/issues/268">Second</a>, you will not get UEFI, because their UEFI ISO is broken. For both limitations, bugs are filed and hopefully will get fixed soon.<br />
<br />
A last-resort option, but still an option, would also be to install the system as you need it in a local virtual machine, and then transfer the block device contents to the server's disks over the network. To do so, you need a temporary lightweight Linux distribution. Tiny Core Linux (especially <a href="http://tinycorelinux.net/ports.html">the Pure 64 port</a>) fits the bill quite well. You can boot from their ISO image (which is currently only 26 MB in size, and supports UEFI) without installing it, configure the network if needed, install openssh using the "tce-load" tool, set the "tc" user password, and do what you need over ssh.<br />
<br />
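To make that last option concrete, here is a sketch of the transfer step. It assumes that the local virtual disk image is a raw file called vda.img, that the server has booted Tiny Core Linux with openssh running, and that its system disk is /dev/sda; all of these names are illustrative, so double-check the target device before overwriting it.<br />
<br />
<pre>gzip -c vda.img | ssh tc@server 'gunzip -c | sudo dd of=/dev/sda bs=1M && sync'
</pre>
<br />
Compression helps a lot here, because a freshly installed system image consists mostly of zeroes.<br />
<br />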
With the three techniques demonstrated so far, I can now say with confidence that slow IPMI virtual media will no longer stop me from remotely installing the Linux distribution that I need, exactly as I need.</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-41027593527283056582019-06-16T05:45:00.000-07:002019-06-16T05:45:11.991-07:00KExec on modern distributions<div dir="ltr" style="text-align: left;" trbidi="on">
Many years ago, I <a href="https://patrakov.blogspot.com/2010/11/minimizing-server-downtime.html">mentioned</a> that KExec is a good way to reduce server downtime due to reboots. It is still the case, and it is especially true for hosters such as OVH and Hetzner that set their dedicated servers to boot from network (so that one can activate their rescue system from the web) before passing the control to the boot loader installed on the hard disk. Look, it takes 1.5 minutes to reboot an OVH server, and, on some server models, you can reduce this time to 15 seconds by avoiding all the time spent in the BIOS and in their netbooted micro-OS! Well, to be fair, on OVH you can avoid the netbooted environment by changing the boot priorities from the BIOS.<br />
<br />
The problem is that distributions make KExec too complicated to use. E.g., Debian and Ubuntu, when one installs kexec-tools, present a prompt asking whether KExec should handle reboots. The catch is that it works only with sysvinit, not with systemd. With systemd, you are supposed to remember to type "systemctl kexec" if you want the next reboot to be handled by KExec. And it's not only distributions: since systemd version 236, KExec is supported only together with UEFI and the "sd-boot" boot loader, while the majority of hosters still stick with the legacy boot process and the majority of Linux distributions still use GRUB2 as their boot loader. An attempt to run "systemctl kexec" on something unsupported results in this error message:<br />
<br />
<pre>Cannot find the ESP partition mount point.</pre>
<br />
Or, if /boot is on mdadm-based RAID1, another, equally stupid and unhelpful, error:<br />
<br />
<pre>Failed to probe partition scheme of "/dev/block/9:2": Input/output error</pre>
<br />
While switching to UEFI and sd-boot is viable in some cases, it is not always the case. Fortunately, there is a way to override systemd developers' stance on what's supported, and even make the "reboot" command invoke KExec. Note that the setup is a big bad unsupported hack. There are no guarantees that the setup below will work with future versions of systemd.<br />
<br />
The trick is to create the service that loads the new kernel and to override the commands that systemd executes when doing the actual reboot.<br />
<br />
Here is the unit that loads the new kernel.<br />
<code></code><br />
<pre><code># File: /etc/systemd/system/kexec-load.service
[Unit]
Description=Loading new kernel into memory
Documentation=man:kexec(8)
DefaultDependencies=no
Before=reboot.target
RequiresMountsFor=/boot
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStop=/sbin/kexec -d -l /vmlinuz --initrd=/initrd.img --reuse-cmdline
[Install]
WantedBy=default.target
</code></pre>
<br />
It assumes that symlinks to the installed kernel and the initrd are available in the root directory, which is true for Debian and Ubuntu. On other systems please adjust the paths as appropriate. E.g., on Arch Linux, the correct paths are /boot/vmlinuz-linux and /boot/initramfs-linux.img.<br />
<br />
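With those Arch Linux paths, the ExecStop line of the unit above would become (a sketch, assuming the stock "linux" kernel package):<br />
<br />
<pre><code>ExecStop=/sbin/kexec -d -l /boot/vmlinuz-linux --initrd=/boot/initramfs-linux.img --reuse-cmdline
</code></pre>
<br />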
There are other variants of this unit circulating around. A common mistake is that nothing ensures that the attempt to load the new kernel from /boot happens before the /boot partition is unmounted. The unit above does not have this race condition issue.<br />
<br />
The second part of the puzzle is an override file that replaces the commands that reboot the system.<br />
<code></code><br />
<pre><code># File: /etc/systemd/system/systemd-reboot.service.d/override.conf
[Service]
Type=oneshot
ExecStart=
ExecStart=-/bin/systemctl --force kexec
ExecStart=/bin/systemctl --force reboot
</code></pre>
<br />
That's it: try to kexec, and hopefully it does not return. If it does, then ignore the error and try the regular reboot.<br />
<br />
For safety, let's also create a script that temporarily disables the override and thus performs one normal BIOS-based reboot.<br />
<code></code><br />
<pre><code>#!/bin/sh
# File: /usr/local/bin/normal-reboot
mkdir -p /run/systemd/transient/systemd-reboot.service.d/
ln -sf /dev/null /run/systemd/transient/systemd-reboot.service.d/override.conf
ln -sf /dev/null /run/systemd/transient/kexec-load.service
systemctl daemon-reload
reboot
</code></pre>
<br />
Give the script proper permissions and enable the service:<br />
<code></code><br />
<pre><code>chmod 0755 /usr/local/bin/normal-reboot
systemctl enable kexec-load.service
reboot
</code></pre>
<br />
If everything goes well, this will be the last BIOS-based reboot. Further reboots will be handled by KExec, even if you type "reboot".<br />
<br />
This blog post would be incomplete without instructions on what to do if the setup fails. And it can fail for various reasons, e.g. due to incompatible hardware or some driver assuming that its device has been properly reset by the BIOS.<br />
<br />
Well, the most common problem is with a corrupted graphical framebuffer console. In this case, it may be sufficient to add "nomodeset" to the kernel command line.<br />
<br />
Other systems may not be fixable so easily, or at all. E.g., on some OVH dedicated servers (in particular, on their "EG-32" product which is based on the Intel Corporation S1200SPL board), the kexec-ed kernel cannot properly route IRQs, and therefore does not detect SATA disks, and the on-board Ethernet adapter also becomes non-functional. In such cases, it is necessary to hard-reset the server and undo the setup. Here is how:<br />
<code></code><br />
<pre><code>systemctl disable kexec-load.service
rm -rf /etc/systemd/system/systemd-reboot.service.d
rm -f /etc/systemd/system/kexec-load.service
rm -f /usr/local/bin/normal-reboot
systemctl daemon-reload
reboot
</code></pre>
<br />
This reboot, and all further reboots, will go through the BIOS.
</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com3tag:blogger.com,1999:blog-7844549485270153160.post-67077915413378262702019-01-18T18:36:00.000-08:002019-01-18T18:36:16.068-08:00dynv6.com: IPv6 dynamic DNS done right<div dir="ltr" style="text-align: left;" trbidi="on">
Sometimes, your home PC or router does not have a static IP address. In this case, if you want to access your home network remotely, a common solution is to use a dynamic DNS provider, and configure your router to update its A record (a DNS record that holds an IPv4 address) each time the external address changes. Then you can be sure that your domain name always points to the current external IPv4 address of the router.<br />
<br />
For accessing PCs behind the router, some trickery is needed, because usually there is only one public IPv4 address available for the whole network. Therefore, the router performs network address translation (NAT), and home PCs are not directly reachable. So, you have to either use port forwarding, or a full-blown VPN. Still, you need only one dynamic DNS record, because there is only one dynamic IP — the external IP of your router, and that's where you connect to.<br />
<br />
Enter the IPv6 world. There is no NAT anymore, the ISP allocates a whole /64 (or maybe larger) prefix for your home network(s), and every home PC becomes reachable using its individual IPv6 address. Except, now all addresses are dynamic. On "Rostelecom" ISP in Yekaterinburg, Russia, they are dynamic even if you order a static IP address, i.e. only IPv4 is static then, and there is no way to get a statically allocated IPv6 network.<br />
<br />
A typical IPv6 network has a prefix length of 64. It means that the first 64 bits denote the network, and are assigned (dynamically) by the ISP, while the lower 64 bits refer to the host and do not change when the ISP assigns the new prefix. Often but not always, the host part is just a MAC address with the second-lowest bit in the first octet inverted, and ff:fe inserted into the middle. This mechanism is often called EUI-64. For privacy reasons, typically there are also other short-lived IPv6 addresses on the interface, but let's ignore them.<br />
<br />
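For the curious, the EUI-64 derivation described above fits in a few lines of shell (a sketch; it relies on hexadecimal arithmetic in $(( )), so run it with bash or another shell that supports it):<br />
<br />
<pre>mac_to_eui64() {
    # Split the MAC address into its six octets.
    old_ifs=$IFS; IFS=:; set -- $1; IFS=$old_ifs
    # Flip the universal/local bit (0x02) of the first octet and insert ff:fe in the middle.
    printf '::%02x%s:%sff:fe%s:%s%s\n' $(( 0x$1 ^ 0x02 )) "$2" "$3" "$4" "$5" "$6"
}

mac_to_eui64 00:06:29:6c:f3:e5    # prints ::0206:29ff:fe6c:f3e5
mac_to_eui64 52:54:00:12:34:56    # prints ::5054:00ff:fe12:3456
</pre>
<br />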
Unfortunately, many dynamic DNS providers have implemented their IPv6 support equivalently to IPv4, even though it does not really make sense. That is, a dynamic DNS client can update its own AAAA record using some web API call, and that's it. If you run a dynamic DNS client on the router, then only the router's DNS record is updated, and there is still no way to access home PCs individually, short of running DynDNS clients on all of them. In other words, the fact that the addresses in the LAN are, in fact, related, and should be updated as a group, is usually completely ignored.<br />
<br />
The <a href="https://dynv6.com/">dynv6.com</a> dynamic DNS provider is a pleasant exception. After registration, you get a third-level domain corresponding to your home network. You can also add records to that domain, corresponding to each of your home PCs. And while doing so, you can either specify the full IPv6 address (as you can do with the traditional dynamic DNS providers), or only the host part, or the MAC address. The ability to specify only the host part (or infer it from the MAC address) is what makes their service useful. Indeed, if the parent record (corresponding to your whole network) changes, then its network part is reapplied to all host records that don't specify the network part of their IPv6 address explicitly. So, you can run only one dynamic DNS client on the router, and get domain names corresponding to all of your home PCs.<br />
<br />
Let me illustrate this with an example.<br />
<br />
Suppose that your router has obtained the following addresses from the ISP:<br />
<br />
2001:db8:5:65bc:d0a2:1545:fbfe:d0b9/64 for the WAN interface<br />
2001:db8:b:9a00::/56 as a delegated prefix<br />
<br />
Then, it will (normally) use 2001:db8:b:9a00::1/64 as its LAN address, and PCs will get addresses from the 2001:db8:b:9a00::/64 network. You need to configure the router to update the AAAA record (let's use example.dynv6.net) with its LAN IPv6 address. Yes, LAN (and many router firmwares, including OpenWRT, get it wrong by default), because the WAN IPv6 address is completely unrelated to your home network. Then, using the web, create some additional AAAA records under the example.dynv6.net address:<br />
<br />
desktop AAAA ::0206:29ff:fe6c:f3e5 # corresponds to MAC address 00:06:29:6c:f3:e5<br />
qemu AAAA ::5054:00ff:fe12:3456 # corresponds to MAC address 52:54:00:12:34:56<br />
<br />
Or, you could enter MAC addresses directly.<br />
<br />
As I have already mentioned, the beauty of dynv6.com is that it does not interpret these addresses literally, but prepends the proper network part. That is, name resolution would actually yield reachable addresses:<br />
<br />
example.dynv6.net. AAAA 2001:db8:b:9a00::1 # The only record that the router has to update<br />
desktop.example.dynv6.net. AAAA 2001:db8:b:9a00:206:29ff:fe6c:f3e5 # Generated<br />
qemu.example.dynv6.net. AAAA 2001:db8:b:9a00:5054:ff:fe12:3456 # Also generated<br />
<br />
And you can, finally, connect remotely to any of those devices.</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com9tag:blogger.com,1999:blog-7844549485270153160.post-70828469839439472122019-01-06T18:37:00.000-08:002020-03-07T20:17:47.310-08:00Resizing Linux virtual machine disks<div dir="ltr" style="text-align: left;" trbidi="on">
Sometimes, one runs out of disk space on a virtual machine, and realizes that it was a mistake to provide such a small disk to it in the first place. Fortunately, unlike real disks, the virtual ones can be resized at will. A handy command for this task comes with QEMU (and, if you are on Linux, why are you using anything else?). Here is how to extend a raw disk image to 10 GB:<br />
<br />
<pre>qemu-img resize -f raw vda.img 10G
</pre>
<br />
After running this command, the beginning of the disk will contain the old bytes that were there before, and at the end there will be a long run of zeroes. qemu-img is smart enough to avoid actually writing these zeroes to the disk image; it creates a sparse file instead.<br />
<br />
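This is easy to check (an illustration; the allocated size will of course differ on your image):<br />
<br />
<pre>du -h --apparent-size vda.img   # the logical size, now 10G
du -h vda.img                   # the space actually allocated on the host, unchanged by the resize
</pre>
<br />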
Resizing the disk image is only one-third of the job. The partition table still lists partitions of the old sizes, and the end of the disk is unused. Traditionally, fdisk has been the tool for altering the partition table. You can run it either from within your virtual machine, or directly on the disk image. All that is needed is to delete the last partition, and then recreate it with the same start sector, but with the correct size, so that it also covers the new part of the disk. Here is an example session with a simple MBR-based disk with two partitions:<br />
<br />
<pre># <b>fdisk /dev/vda</b>
Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): <b>p</b>
Disk /dev/vda: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc02e3411
Device Boot Start End Sectors Size Id Type
/dev/vda1 * 2048 997375 995328 486M 83 Linux
/dev/vda2 997376 12580863 11583488 5.5G 83 Linux
Command (m for help): <b>d</b>
Partition number (1,2, default 2): <b>2</b>
Partition 2 has been deleted.
Command (m for help): <b>n</b>
Partition type
p primary (1 primary, 0 extended, 3 free)
e extended (container for logical partitions)
Select (default p): <b>p</b>
Partition number (2-4, default 2): <b>2</b>
First sector (997376-20971519, default 997376):
Last sector, +sectors or +size{K,M,G,T,P} (997376-20971519, default 20971519):
Created a new partition 2 of type 'Linux' and of size 9.5 GiB.
Partition #2 contains a ext4 signature.
Do you want to remove the signature? [Y]es/[N]o: <b>n</b>
Command (m for help): <b>p</b>
Disk /dev/vda: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc02e3411
Device Boot Start End Sectors Size Id Type
/dev/vda1 * 2048 997375 995328 486M 83 Linux
/dev/vda2 997376 20971519 19974144 9.5G 83 Linux
Command (m for help): <b>w</b>
The partition table has been altered.
Syncing disks.
</pre>
<br />
As you see, it went smoothly. The kernel will pick up the new partition table after a reboot, and then you will be able to resize the filesystem with resize2fs (or some other tool if you are not using ext4).
<br />
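For completeness, that last step is a one-liner (a sketch, assuming the filesystem is ext4 on /dev/vda2; for XFS you would run xfs_growfs on the mount point instead):<br />
<br />
<pre>resize2fs /dev/vda2    # grow the ext4 filesystem to fill the resized partition
</pre>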
<br />
Things are not so simple if the virtual disk is partitioned with GPT, not MBR, to begin with. The complication stems from the fact that there is a backup copy of GPT at the end of the disk. When we added zeros to the end of the disk, the backup copy ended up in the middle of the disk, not to be found. Also, the protective MBR now covers only the first part of the disk. The kernel is able to deal with this, but some versions of fdisk (at least fdisk found in Ubuntu 18.04) cannot. What happens is that fdisk is not able to create partitions that extend beyond the end of the old disk. And saving anything (in fact, even saving what already exists) fails with a rather unhelpful error message:<br />
<br />
<pre># <b>fdisk /dev/vda</b>
Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
GPT PMBR size mismatch (12582911 != 20971519) will be corrected by w(rite).
GPT PMBR size mismatch (12582911 != 20971519) will be corrected by w(rite).
Command (m for help): <b>w</b>
GPT PMBR size mismatch (12582911 != 20971519) will be corrected by w(rite).
fdisk: failed to write disklabel: Invalid argument
</pre>
<br />
Modern versions of fdisk do not have this problem:<br />
<pre># <b>fdisk /dev/vda</b>
Welcome to fdisk (util-linux 2.33).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
GPT PMBR size mismatch (12582911 != 20971519) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.
</pre>
<br />
Still, with GPT and not-so-recent versions of fdisk, it looks like we cannot use fdisk to take advantage of the newly added disk space. There is another tool, gdisk, that can manipulate GPT structures. However, it claims that there is almost no usable free space on the disk, and thus refuses to usefully resize the last partition by default.<br />
<br />
<pre># <b>gdisk /dev/vda</b>
GPT fdisk (gdisk) version 1.0.3
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Command (? for help): <b>p</b>
Disk /dev/vda: 20971520 sectors, 10.0 GiB
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 61E744EF-1CD3-5145-BC59-4646E6CB03DE
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 2048, last usable sector is 12582878
Partitions will be aligned on 2048-sector boundaries
Total free space is 2015 sectors (1007.5 KiB)
Number Start (sector) End (sector) Size Code Name
1 2048 4095 1024.0 KiB EF02
2 4096 999423 486.0 MiB 8300
3 999424 12580863 5.5 GiB 8300
</pre>
<br />
What we need to do is to use the "expert" functionality in order to move the backup GPT to the end of the disk. After that, new free space will be available, and we will be able to resize the last partition.<br />
<br />
<pre>Command (? for help): <b>x</b>
Expert command (? for help): <b>e</b>
Relocating backup data structures to the end of the disk
Expert command (? for help): <b>m</b>
Command (? for help): <b>d</b>
Partition number (1-3): <b>3</b>
Command (? for help): <b>n</b>
Partition number (3-128, default 3):
First sector (999424-20969472, default = 999424) or {+-}size{KMGTP}:
Last sector (999424-20969472, default = 20969472) or {+-}size{KMGTP}:
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300):
Changed type of partition to 'Linux filesystem'
Command (? for help): <b>p</b>
Disk /dev/vda: 20971520 sectors, 10.0 GiB
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 61E744EF-1CD3-5145-BC59-4646E6CB03DE
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 2048, last usable sector is 20969472
Partitions will be aligned on 2048-sector boundaries
Total free space is 0 sectors (0 bytes)
Number Start (sector) End (sector) Size Code Name
1 2048 4095 1024.0 KiB EF02
2 4096 999423 486.0 MiB 8300
3 999424 20969472 9.5 GiB 8300 Linux filesystem
Command (? for help): <b>w</b>
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!
Do you want to proceed? (Y/N): <b>y</b>
OK; writing new GUID partition table (GPT) to /dev/vda.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
</pre>
<br />
OK, it worked, but it was complicated enough to make one consider sticking with MBR where possible. And it changed the UUID of the last partition, which may or may not be OK in your setup (it is definitely not OK if /etc/fstab or the kernel command line mentions PARTUUID). The same warning about PARTUUID applies to modern versions of fdisk, too.<br />
<br />
Anyway, it turns out that gdisk is not the simplest solution to the problem of resizing a GPT-based disk. The "sfdisk" program that comes with util-linux (i.e. with the same package that provides fdisk, even with not-so-recent versions) works just as well. We need to dump the existing partitions, edit the resulting script, and feed it back to sfdisk so that it recreates these partitions for us from scratch, with the correct sizes, and we can preserve all partition UUIDs, too.<br />
<br />
Here is what this dump looks like:<br />
<br />
<pre># <b>sfdisk --dump /dev/vda > disk.dump</b>
GPT PMBR size mismatch (12582911 != 20971519) will be corrected by w(rite).
# <b>cat disk.dump</b>
label: gpt
label-id: 61E744EF-1CD3-5145-BC59-4646E6CB03DE
device: /dev/vda
unit: sectors
first-lba: 2048
last-lba: 12582878
/dev/vda1 : start= 2048, size= 2048, type=[...], uuid=[...]
/dev/vda2 : start= 4096, size= 995328, type=[...], uuid=[...]
/dev/vda3 : start= 999424, size= 11581440, type=[...], uuid=[...]
</pre>
<br />
We need to fix the "last-lba" parameter and change the size of the last partition. Also, newer versions of sfdisk add a "sector-size: 512" line that older versions do not understand. Fortunately, sfdisk has reasonable defaults (use as much space as possible) for all three parameters, so we can just delete them instead. This is quite easy to do with sed:<br />
<br />
<pre># <b>sed -i -e '/^sector-size:/d' -e '/^last-lba:/d' -e '$s/size=[^,]*,//' disk.dump</b></pre>
<pre># <b>cat disk.dump</b>
label: gpt
label-id: 61E744EF-1CD3-5145-BC59-4646E6CB03DE
device: /dev/vda
unit: sectors
first-lba: 2048
/dev/vda1 : start= 2048, size= 2048, type=[...], uuid=[...]
/dev/vda2 : start= 4096, size= 995328, type=[...], uuid=[...]
/dev/vda3 : start= 999424, type=[...], uuid=[...]
</pre>
<br />
Then, with some flags to turn off various checks, sfdisk loads the modified partition table dump:<br />
<br />
<pre># <b>sfdisk --no-reread --no-tell-kernel -f --wipe never /dev/vda < disk.dump</b>
GPT PMBR size mismatch (12582911 != 20971519) will be corrected by w(rite).
Disk /dev/vda: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 61E744EF-1CD3-5145-BC59-4646E6CB03DE
Old situation:
Device Start End Sectors Size Type
/dev/vda1 2048 4095 2048 1M BIOS boot
/dev/vda2 4096 999423 995328 486M Linux filesystem
/dev/vda3 999424 12580863 11581440 5.5G Linux filesystem
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new GPT disklabel (GUID: 61E744EF-1CD3-5145-BC59-4646E6CB03DE).
/dev/vda1: Created a new partition 1 of type 'BIOS boot' and of size 1 MiB.
/dev/vda2: Created a new partition 2 of type 'Linux filesystem' and of size 486 MiB.
Partition #2 contains a ext4 signature.
/dev/vda3: Created a new partition 3 of type 'Linux filesystem' and of size 9.5 GiB.
Partition #3 contains a ext4 signature.
/dev/vda4: Done.
New situation:
Disklabel type: gpt
Disk identifier: 61E744EF-1CD3-5145-BC59-4646E6CB03DE
Device Start End Sectors Size Type
/dev/vda1 2048 4095 2048 1M BIOS boot
/dev/vda2 4096 999423 995328 486M Linux filesystem
/dev/vda3 999424 20971486 19972063 9.5G Linux filesystem
The partition table has been altered.
</pre>
<br />
Let me repeat. The following lines, ready to be copy-pasted without much thinking except for the disk name, resize the last partition to occupy as much disk space as possible, and they work both on GPT and MBR:<br />
<br />
<pre>DISK=/dev/vda
sfdisk --dump $DISK > disk.dump
sed -i -e '/^sector-size:/d' -e '/^last-lba:/d' -e '$s/size=[^,]*,//' disk.dump
sfdisk --no-reread --no-tell-kernel -f --wipe never $DISK < disk.dump
</pre>
<br />
On Ubuntu, there is also a "cloud-guest-utils" package that provides an even easier "growpart" command; here is how it works:
<br />
<pre># <b>growpart /dev/vda 3</b>
CHANGED: partition=3 start=999424 old: size=11581440 end=12580864 new: size=19972063,end=20971487
</pre>
<br />
Just as with MBR, after resizing the partition, you have to reboot your virtual machine (or run partprobe) so that the kernel picks up the new partition size, and then you can resize the filesystem to match the partition size.
</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-79389834114305923902019-01-03T12:47:00.000-08:002019-01-03T12:47:53.483-08:00Using Let's Encrypt certificates with GeoDNS<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
Let's Encrypt is a popular free TLS certificate authority. It currently issues certificates valid for only 90 days, and thus it is a good idea to automate their renewal. Fortunately, there are many tools to do so, including the official client called Certbot.<br />
<br />
When Certbot or any other client asks Let's Encrypt for a certificate, it must prove that it indeed controls the domain names that are to be listed in the certificate. There are several ways to obtain such proof, by solving one of the possible challenges. The HTTP-01 challenge requires the client to make a plain-text file with a given name and content available under the domain in question via HTTP, on port 80. The DNS-01 challenge requires publishing a specific TXT record in DNS. There are other, less popular, kinds of challenges. HTTP-01 is the simplest challenge to use when you have only one server that needs a non-wildcard TLS certificate for a given domain name (or several domain names).<br />
<br />
Sometimes, however, you need to have a certificate for a given domain name available on more than one server. Such a need arises, e.g., if you use GeoDNS or DNS-based load balancing, i.e. you answer DNS requests for your domain name (e.g., www.example.com) differently for different clients. E.g., you may want to have three servers, one in France, one in Singapore, and one in the USA, and respond based on the client's IP address by returning the IP address of the geographically closest server. However, this presents a problem when trying to obtain a Let's Encrypt certificate. E.g., the HTTP-01 challenge fails out of the box because Let's Encrypt will likely connect to a different node than the one asking for the certificate, and will not find the text file that it looks for.<br />
<br />
A traditional solution to this problem would be to set up a central server, let it respond to the challenges, and copy the certificates from it periodically to all the nodes.<br />
<br />
Making the central server solve DNS-01 challenges is trivial — all that is needed is an automated way to change DNS records in your zone, and scripts are available for many popular DNS providers. I am not really comfortable with this approach, because if an intruder gets access to your central server, they can not only get a certificate and a private key for www.example.com, but also take over the whole domain, i.e. point the DNS records (including non-www) to their own server. This security concern can be alleviated by the use of CNAMEs that point _acme-challenge to a separate DNS zone with separate permissions, but doing so breaks all Let's Encrypt clients known to me. Some links: two <a href="https://github.com/certbot/certbot/issues/6566">bug</a> <a href="https://github.com/certbot/certbot/issues/5877">reports</a> for the official client, and my own GitHub <a href="https://gist.github.com/patrakov/fbf0a09c027c0d32712c8703ab614868">gist</a> for a modified Dehydrated hook script for Amazon Route 53.<br />
<br />
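For reference, such a delegation is just one CNAME record per protected name; acme-auth.example.net below is a hypothetical separate zone with its own, more limited, credentials:<br />
<br />
<pre>_acme-challenge.www.example.com. IN CNAME _acme-challenge.www.acme-auth.example.net.
</pre>
<br />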
For HTTP-01, the setup is different: you need to make the central server available over HTTP on a separate domain name (e.g. auth.example.com), and configure all the nodes to issue redirects when Let's Encrypt tries to verify the challenge. E.g., http://www.example.com/.well-known/acme-challenge/anything must redirect to http://auth.example.com/.well-known/acme-challenge/anything, and then Certbot running on auth.example.com will be able to obtain certificates for www.example.com without the security risk inherent for DNS-01 challenges. Proxying the requests, instead of redirecting them, also works.<br />
<br />
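On each node, such a redirect can be a single mod_alias directive (a sketch; auth.example.com is the central server from the example above):<br />
<br />
<pre>RedirectMatch temp "^/\.well-known/acme-challenge/(.*)$" http://auth.example.com/.well-known/acme-challenge/$1
</pre>
<br />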
Scripting the process of certificate distribution back to cluster nodes, handling network errors, reloading Apache (while avoiding needless restarts) and monitoring the result is another matter.<br />
<br />
So, I asked myself a question: would it be possible to simplify this setup, if there are only a few nodes in the cluster? In particular, avoid the need to copy files from server to server, and to get rid of the central server altogether. And ideally get rid of any fragile custom scripts. It turns out that, with a bit of Apache magic, you can do that. No custom scripts are needed, no ssh keys for unattended distribution of files, no central server, just some simple rewrite rules.<br />
<br />
Each of the servers will run Certbot and request certificates independently. The idea is to have a server ask someone else when it doesn't know the answer to the HTTP-01 challenge.<br />
<br />
To do so, we need to enable mod_rewrite, mod_proxy, and mod_proxy_http on each server. Also, I assume that you already have some separate domain names (not for the general public) pointing to each of the cluster nodes, just for the purpose of solving the challenges. E.g., www-fr.example.com, www-sg.example.com, and www-us.example.com.<br />
<br />
So, here is the definition of the Apache virtual host that responds to unencrypted HTTP requests. The same configuration file works for all cluster nodes.<br />
<br />
<pre>
<VirtualHost *:80>
ServerName www.example.com
ServerAlias example.com
ServerAlias www-fr.example.com
ServerAlias www-sg.example.com
ServerAlias www-us.example.com
ProxyPreserveHost On
RewriteEngine On
# First block of rules - solving known challenges.
RewriteCond /var/www/letsencrypt/.well-known/acme-challenge/$2 -f
RewriteRule ^/\.well-known/acme-challenge(|-fr|-sg|-us)/(.*) \
/var/www/letsencrypt/.well-known/acme-challenge/$2 [L]
# Second block of rules - passing unknown challenges further.
# Due to RewriteCond in the first block, we already know at this
# point that the file does not exist locally.
RewriteRule ^/\.well-known/acme-challenge/(.*) \
http://www-fr.example.com/.well-known/acme-challenge-fr/$1 [P,L]
RewriteRule ^/\.well-known/acme-challenge-fr/(.*) \
http://www-sg.example.com/.well-known/acme-challenge-sg/$1 [P,L]
RewriteRule ^/\.well-known/acme-challenge-sg/(.*) \
http://www-us.example.com/.well-known/acme-challenge-us/$1 [P,L]
RewriteRule ^/\.well-known/acme-challenge-us/(.*) - [R=404]
# HTTP to HTTPS redirection for everything not matched above
RewriteRule /?(.*) https://www.example.com/$1 [R=301,L]
</VirtualHost>
</pre>
<br />
For a complete example, add a virtual host for port 443 that serves your web application on https://www.example.com.<br />
<pre>
<VirtualHost *:443>
ServerName www.example.com
ServerAlias example.com
# You may want to have a separate virtual host or a RewriteRule
# for redirecting browsers who visit https://example.com or any
# other unwanted domain name to https://www.example.com.
# E.g.:
RewriteEngine On
RewriteCond %{HTTP_HOST} !=www.example.com [NC]
RewriteRule /?(.*) https://www.example.com/$1 [R=301,L]
# Configure Apache to serve your content
DocumentRoot /var/www/example
SSLEngine on
Include /etc/letsencrypt/options-ssl-apache.conf
# Use any temporary certificate here, even a self-signed one works.
# This piece of configuration will be replaced by Certbot.
SSLCertificateFile /etc/ssl/certs/ssl-cert-snakeoil.pem
SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key
</VirtualHost>
</pre>
<br />
Run Certbot like this, on all servers:<br />
<pre>
mkdir -p /var/www/letsencrypt/.well-known/acme-challenge
certbot -d example.com -d www.example.com -w /var/www/letsencrypt \
--noninteractive --authenticator webroot --installer apache
</pre>
<br />
Any other Let's Encrypt client that works by placing files into a directory will also be good enough. Apache's mod_md will not work, though, because it deliberately blocks all requests for unknown challenge files, which is contrary to what we need.<br />
<br />
Let's see how it works.<br />
<br />
Certbot asks Let's Encrypt for a certificate. Let's Encrypt tells Certbot the file name that it will try to fetch, and the expected contents. Certbot places this file under /var/www/letsencrypt/.well-known/acme-challenge and tells Let's Encrypt that they can verify that it is there. Let's Encrypt resolves www.example.com (and example.com, but let's forget about it) in the DNS, and then asks for this file under http://www.example.com/.well-known/acme-challenge.<br />
<br />
If their verifier is lucky enough to hit the same server that asked for the certificate, the RewriteCond for the first RewriteRule will be true (it just tests the file existence), and, due to this rule, Apache will serve the file. Note that the rule responds not only to acme-challenge URLs, but also to acme-challenge-fr, acme-challenge-sg, and acme-challenge-us URLs used internally by other servers.<br />
<br />
If the verifier is unlucky, then the challenge file will not be found, and the second block of RewriteRule directives will come into play. Let's say that it was the Singapore server that requested the certificate (and thus can respond), but Let's Encrypt has contacted the server in USA.<br />
<br />
For the request sent by Let's Encrypt verifier, we can see that only the first rule in the second block will match. It will (conditionally on the file not being found locally, as tested by the first block) proxy the request from the server in USA to the French server, and use the "acme-challenge-fr" directory in the URL to record the fact that it is not the original request. The French server will not find the file either, so will skip the first block, and apply the second rule in the second block of RewriteRules (because it sees "acme-challenge-fr" in the URL). Thus, the request will be proxied again, this time to the Singapore server, and with "acme-challenge-sg" in the URL. As it was the Singapore server who requested the certificate, it will find the file and respond with its contents, and through the French and US servers, Let's Encrypt verifier will get the response and issue the certificate.<br />
<br />
The last RewriteRule in the second block terminates the chain for stray requests not originating from Let's Encrypt. Such requests get proxied three times and finally get a 404.<br />
<br />
The proposed scheme is, in theory, extensible to any number of servers — all that is needed is that they are all online, and the chain of proxying the request through all of them is not too slow. But, there is a limit on Let's Encrypt side on the number of duplicate certificates, 5 per week. I would guess (but have not verified) that, in practice, due to both factors, it means at most 5 servers in the cluster would be safe — which is still good enough for some purposes.
</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-11794713484949512002018-12-28T05:45:00.000-08:002018-12-28T05:45:50.885-08:00LDAP with STARTTLS considered harmful<div dir="ltr" style="text-align: left;" trbidi="on">
A few weeks ago, I hit a security issue on the company's LDAP server. Namely, it was not protected well-enough against misconfigured clients who send passwords in cleartext.<br />
<br />
There are two mechanisms defined in LDAP that protect passwords in transit:<br />
<br />
<ol>
<li>SASL binds</li>
<li>SSL/TLS</li>
</ol>
SASL is an extensible framework that allows arbitrary authentication mechanisms. However, all of the widely-implemented ones are either not based on passwords at all (so not suitable for our use case), or send the password in cleartext (so not better than a simple non-SASL bind), or require the server to store the password in cleartext for verification (even worse). In addition to this non-suitability for our purposes, web applications usually do not support LDAP with SASL binding.<br />
<br />
SSL/TLS, on the other hand, is a widely-supported industry standard for encrypting and authenticating the data (including passwords) in transit.<br />
<br />
OpenLDAP assigns so-called Security Strength Factor to each authentication mechanism, based on how well it protects authentication data on the network. For SSL, it is usually (but not always) the number of bits in the symmetric cipher key used during the session.<br />
<br />
For LDAP, the "normal" way of implementing SSL is to support the "STARTTLS" request on port 389 (the same port as for unencrypted LDAP sessions). There is also a not-really-standard way, with TLS right from the start of the connection (i.e. no STARTTLS request), on port 636. This is called "ldaps".<br />
<br />
OpenLDAP can require a certain minimum Security Strength Factor for authentication attempts. In slapd.conf, it is set like this: "security ssf=128". There are also related configuration directives, TLSProtocolMin, which sets the minimum SSL/TLS protocol version, and localSSF, which is the Security Strength Factor assumed on local unix-socket connections (ldapi:///).<br />
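<br />
To make this concrete, here is a sketch of the relevant slapd.conf lines (the values are examples, not a recommendation for any particular deployment):<br />
<br />
<pre># refuse binds and other operations on connections weaker than 128 bits
security ssf=128
# 3.3 means TLS 1.2; reject anything older
TLSProtocolMin 3.3
# the strength assumed for local ldapi:/// (unix-socket) connections
localSSF 128
</pre>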
<br />
So, we configure certificates, set an appropriate security strength factor, disable anonymous bind, and that's it? No. This still doesn't prevent the password leak.<br />
<br />
Suppose that someone configures Apache like this:<br />
<br />
<pre> <Location />
AuthType basic
AuthName "example.com users only"
AuthBasicProvider ldap
AuthLDAPInitialBindAsUser on
AuthLDAPInitialBindPattern (.+) uid=$1,ou=users,dc=example,dc=com
AuthLDAPURL "ldap://ldap.example.com/ou=users,dc=example,dc=com?uid?sub?"
Require valid-user
</Location></pre>
<br />
See the mistake? They forgot to tell the client to use STARTTLS (i.e., forgot to add the word "TLS" as the last parameter to AuthLDAPURL).<br />
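<br />
For reference, a corrected directive would look roughly like one of these (same placeholder hostname and base DN as above; the trailing keyword is what makes mod_authnz_ldap issue STARTTLS, and the ldaps:// variant is the one argued for below):<br />
<br />
<pre># variant 1: keep port 389, but explicitly request STARTTLS
AuthLDAPURL "ldap://ldap.example.com/ou=users,dc=example,dc=com?uid?sub?" TLS
# variant 2: encrypted from the very first byte, on port 636
AuthLDAPURL "ldaps://ldap.example.com/ou=users,dc=example,dc=com?uid?sub?"
</pre>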
<br />
Let's look at what happens if a user tries to log in through the misconfigured (non-TLS) setup. Apache will connect to the LDAP server on port 389, successfully. Then, it will create an LDAP request for a simple bind, using the user's username and password, and send it. And it will be sent successfully, in cleartext over the network. Of course, OpenLDAP will carefully receive the request, parse it, and then refuse to authenticate the user, but it's too late. The password has already been sent in cleartext, and somebody between the servers has already captured it with the government-mandated tcpdump equivalent.<br />
<br />
This would not have happened if the LDAP server were listening on port 636 (SSL) only. In that case, requests to port 389 get an RST before Apache has a chance to send the password. And requests that use the ldaps:// scheme are always encrypted. An additional benefit is that PHP-based software that is not specifically coded to use STARTTLS for LDAP (i.e. does not contain an ldap_start_tls() call) will continue to work when given the ldaps:// URL, and I don't have to audit it for this specific issue. Isn't that wonderful? So, please (after reconfiguring all existing clients) make sure that your LDAP server does not listen on port 389 at all, and listens securely on port 636 instead.</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-16372410712503361142018-10-19T05:06:00.003-07:002018-10-19T05:06:47.029-07:00I received a fake job alert<div dir="ltr" style="text-align: left;" trbidi="on">
Today I received a job alert from a Russian job site, <a href="https://moikrug.ru/">https://moikrug.ru/</a>, which I do use to get job alerts. The job title was "Application Security Engineer", and it looked like I almost qualified. So, I decided to go to the <a href="http://www.rqc.ru/">company page</a> behind the offer and see what else they do.<br />
<br />
Result: they do a lot of interesting research, and they also have a "jobs" page, which is empty. Also, the job offer that I received contained some links to Google Docs, and an email address not on the company's domain, which looked quite unprofessional for a company of that size.<br />
<br />
So, I went to the "contact us" page and called their phone. The secretary was unaware of the exact list of currently open positions, but told me that all official offers would be on a different job site, <a href="https://hh.ru/">https://hh.ru/</a>. It lists 8 positions, but nothing related to what I received, and nothing matching my skills. So, we concluded that I had received a fake job offer from impostors who illegally used the company name.<br />
<br />
Conclusion: please beware of such scams, even from big and reputable job sites.</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-31646703842452461012018-07-10T03:00:00.000-07:002018-07-10T22:19:29.494-07:00Possibly unexpected local access to OpenLDAP<div dir="ltr" style="text-align: left;" trbidi="on">
OpenLDAP server, slapd, can listen on multiple sockets. In Ubuntu 18.04, by default (see SLAPD_SERVICES in /etc/default/slapd), it listens on TCP port 389 (ldap:///), which is indeed the purpose of a LDAP server. It also listens on a UNIX-domain socket (ldapi:///), which is necessary for access to the config database to work for root. It, by default, does not listen on the non-standard SSL port 636 (ldaps:///), but some people add it.<br />
<br />
When configuring OpenLDAP, it is essential to set proper access control lists. People usually think in terms of anonymous users, authenticated users, subtrees, regular expressions, and such like. Then they apply the syntax documented in <a href="https://www.openldap.org/doc/admin24/access-control.html#Access%20Control%20via%20Dynamic%20Configuration">OpenLDAP admin guide</a>. Then they try to connect to port 389 with some DNs in the tree and verify that these DNs can indeed access what is needed and cannot access or modify sensitive or read-only information. Often, anonymous read access is limited only to dn.exact="", so that the search bases are discoverable by various administration tools. And then, the task of securing the OpenLDAP server is declared done.<br />
<br />
But is it really done? No! The mistake here is to test only access via port 389 and DNs from the tree.<br />
<br />
Everybody who runs slapd (and especially those who grant permissions to "users"), please follow these steps:<br />
<br />
<ol style="text-align: left;">
<li>Login to your server using ssh as an unprivileged user.</li>
<li>ldapsearch -H ldapi:/// -Y EXTERNAL -b '' -s base '*' '+'</li>
<li>Note the value of the "namingContexts" attribute. Let's say it's "dc=example,dc=com".</li>
<li>ldapsearch -H ldapi:/// -Y EXTERNAL -b 'dc=example,dc=com'</li>
<li>Verify that it is not against your security policy for local users (e.g. www-data if your web app is compromised) to be able to extract this data.</li>
</ol>
What happens here is that a local user, just by virtue of having a UID and a GID, successfully authenticates via the unix-domain socket, using the "EXTERNAL" SASL mechanism. The relevant DN for authentication looks like this: gidNumber=1000+uidNumber=1000,cn=peercred,cn=external,cn=auth<br />
<br />
In other words, please be sure to close unneeded access for "dn.sub=cn=peercred,cn=external,cn=auth"; a sketch of such an ACL is below. Or, if you don't use the local socket for managing the configuration database (or are still on slapd.conf instead of slapd.d), consider configuring slapd not to listen on ldapi:/// at all.<br />
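<br />
A minimal slapd.conf-style sketch (the suffix is a placeholder, and the exact root DN used for managing the server over ldapi:/// is an assumption; adjust it to whatever your setup actually uses):<br />
<br />
<pre># local root keeps full access, other local peercred identities get nothing
access to dn.subtree="dc=example,dc=com"
    by dn.exact="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" manage
    by dn.subtree="cn=peercred,cn=external,cn=auth" none
    by users read
    by * none
</pre>
</div>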
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-45069556259000329652018-05-03T05:56:00.000-07:002018-05-03T05:56:12.033-07:00Downtime<div dir="ltr" style="text-align: left;" trbidi="on">
A few days ago I had an interesting case of a server downtime. The server is just a playground for developers, so no big deal. But still, lessons learned.<br />
<br />
The reports came almost simultaneously from developers and from the monitoring system, "cannot connect". And indeed, the server was not pingable. Someone else's server, with IP equal to the IP of our server with the last octet increased by 2, was pingable, so I concluded it was not a network problem.<br />
<br />
Next reaction: look at the server's screen, using remote KVM provided by the hoster. Kernel panic! OK, need to screenshot it (done) and reboot the server. Except that the Power Control submenu in the viewer is grayed out, so I can't. And a few months ago, when we needed a similar kind of reset, it was there.<br />
<br />
OK, so I created a ticket for resetting the server manually. And I had to remind them that the remote reboot functionality is supposed to work. Here is the hoster's reply (PDU = power distribution unit):<br />
<br />
<blockquote class="tr_bq">
<span style="background-color: white; color: #222222; font-family: Verdana, Arial, Helvetica; font-size: x-small;">Dear Alexander,</span><br style="background-color: white; color: #222222; font-family: Verdana, Arial, Helvetica; font-size: small;" /><br style="background-color: white; color: #222222; font-family: Verdana, Arial, Helvetica; font-size: small;" /><span style="background-color: white; color: #222222; font-family: Verdana, Arial, Helvetica; font-size: x-small;">Upon checking on the PDU, the PDU is refusing connection.</span><br style="background-color: white; color: #222222; font-family: Verdana, Arial, Helvetica; font-size: small;" /><br style="background-color: white; color: #222222; font-family: Verdana, Arial, Helvetica; font-size: small;" /><span style="background-color: white; color: #222222; font-family: Verdana, Arial, Helvetica; font-size: x-small;">We'll arrange a PDU replacement the soonest possible.</span><br style="background-color: white; color: #222222; font-family: Verdana, Arial, Helvetica; font-size: small;" /><br style="background-color: white; color: #222222; font-family: Verdana, Arial, Helvetica; font-size: small;" /><span style="background-color: white; color: #222222; font-family: Verdana, Arial, Helvetica; font-size: x-small;">We apologise for the inconvenience caused.</span></blockquote>
<br />
Everybody reading this post, now, please check that you don't fall into the same trap. Run your iKVM viewer against each of your servers, check that it can actually connect, and check that the menu item to reset the server still exists. Create a calendar reminder to recheck this periodically.<br />
<br />
And maybe append "panic=10" to your Linux kernel command line, so that the machine reboots itself after a panic and manual intervention is not needed next time.<br />
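<br />
On a GRUB-based system, that could look roughly like this (a sketch; the file path and the update-grub command assume a Debian/Ubuntu-style GRUB setup, so adjust for your boot loader):<br />
<br />
<pre># add panic=10 to the kernel command line and regenerate the GRUB config
sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 panic=10"/' /etc/default/grub
update-grub
</pre>
</div>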
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-45943392674059089862018-02-17T07:27:00.000-08:002018-02-17T07:49:49.306-08:00A case of network throughput optimization<div dir="ltr" style="text-align: left;" trbidi="on">
The company that I work for has servers in several countries, including Germany, China, USA and Malaysia. We run MySQL with replication, and also sometimes need to copy images of virtual machines or LXC containers between servers. And, until recently, this was painfully slow, except between Germany and USA. We often resorted to recreating virtual machines and containers from the same template and doing the same manipulations, instead of just copying the result (e.g. using rsync or scp). We often received <a href="http://munin-monitoring.org/">Munin</a> alerts about MySQL replication not working well (i.e.: a test UPDATE that is done every two minutes on the master is not visible on the slave), and could not do anything about it. Because, well, it is just a very slow (stabilizes at 5 Mbit/s or so between USA and Malaysia, and even worse between China and anything else) network, and it is not our network.<br />
<br />
So, it looked sad, except that raw UDP tests performed using <a href="https://iperf.fr/">iperf</a> indicated much higher bandwidth (95 Mbit/s between USA and Malaysia, with only 0.034% packet loss) than what was available for scp or for MySQL replication between the same servers. So, it was clearly the case that the usual "don't tune anything" advice is questionable here, and the system could, in theory, work better.<br />
<br />
For the record, the latency, as reported by ping between the servers in USA and Malaysia, is 217 ms.<br />
<br />
The available guides for Linux network stack tuning usually begin with sysctls regarding various buffer sizes. E.g., setting <span style="font-family: "courier new" , "courier" , monospace;">net.core.rmem_max</span> and <span style="font-family: "courier new" , "courier" , monospace;">net.core.wmem_max</span> to bigger values based on the bandwidth-delay product. In my case, the estimated bandwidth-delay product (which is the same as the amount of data in flight) would be about 2.7 megabytes. So, setting both to 8388608 and retesting with a larger TCP window size (4 M) should be logical. Except, it didn't really work. The throughput was only 8 Mbit/s instead of 5. I didn't try to modify <span style="font-family: "courier new" , "courier" , monospace;">net.ipv4.tcp_rmem</span> or <span style="font-family: "courier new" , "courier" , monospace;">net.ipv4.tcp_wmem</span> because the default values were already of the correct order of magnitude.<br />
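<br />
In concrete terms, that attempt amounted to something like this (a sketch; to make the change survive reboots, the same values would also go into a file under /etc/sysctl.d/):<br />
<br />
<pre>sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
</pre>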
<br />
Other guides, including <a href="https://access.redhat.com/sites/default/files/attachments/20150325_network_performance_tuning.pdf">the official one from RedHat</a>, talk about things like NIC ring buffers, interrupts, adapter queues and offloading. But these things are relevant for multi-gigabit networks, not for the mere 95 Mbit/s that we are aiming at.<br />
<br />
The thing that actually helped was to change the TCP congestion control algorithm. This algorithm is what decides when to speed up data transmission and when to slow it down.<br />
<br />
Linux comes with many modules that implement TCP congestion control algorithms. And, in newer kernels, there are new algorithms and some improvements in the old ones. So, it pays off to install a new kernel. For Ubuntu 16.04, this means installing the <span style="font-family: "courier new" , "courier" , monospace;">linux-generic-hwe-16.04-edge</span> package.<br />
<br />
The available modules are in <span style="font-family: "courier new" , "courier" , monospace;">/lib/modules/`uname -r`/kernel/net/ipv4/</span> directory. Here is how to load them all, for testing purposes:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">cd /lib/modules/`uname -r`/kernel/net/ipv4/</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">for mod in tcp_*.ko ; do modprobe -v ${mod%.ko} ; done</span><br />
<br />
<span style="font-family: inherit;">For each of the loaded congestion control algorithms, it is possible to run iperf with the </span><span style="font-family: "courier new" , "courier" , monospace;">--linux-congestion <algo></span> parameter to benchmark it. Here are the results in my case, as reported by the server, with a 4 M window (changed by the kernel to 8 M).<br />
<br />
bbr: 56.7 Mbits/sec<br />
bic: 24.5 Mbits/sec<br />
cdg: 0.891 Mbits/sec<br />
cubic: 8.38 Mbits/sec<br />
dctcp: 17.6 Mbits/sec<br />
highspeed: 1.50 Mbits/sec<br />
htcp: 3.55 Mbits/sec<br />
hybla: 20.6 Mbits/sec<br />
illinois: 7.24 Mbits/sec<br />
lp: 2.13 Mbits/sec<br />
nv: 1.47 Mbits/sec<br />
reno: 2.36 Mbits/sec<br />
scalable: 2.50 Mbits/sec<br />
vegas: 1.51 Mbits/sec<br />
veno: 1.70 Mbits/sec<br />
westwood: 3.83 Mbits/sec<br />
yeah: 3.20 Mbits/sec<br />
<br />
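For reference, each of the numbers above came from a run of roughly this form (a sketch: the hostname is a placeholder, and this is iperf2, whose -Z / --linux-congestion option selects the congestion control algorithm on the sending side):<br />
<br />
<pre># on the receiving server
iperf -s -w 4M

# on the sending side, once per algorithm
iperf -c receiver.example.com -w 4M -Z bbr
</pre>
<br />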
The fact that the speeds mentioned above are from the server-side reports (the iperf server is the receiver of the data) is important. The client always reports higher throughput. This happens because the kernel buffers the client's data and reports the transfer as finished even though a lot of data still sits in the buffer waiting to be sent. The server sees the actual duration of the transfer and is thus in a position to provide an accurate report.<br />
<br />
A good question is whether a large window, or the increased <span style="font-family: "courier new" , "courier" , monospace;">net.core.rmem_max</span> and <span style="font-family: "courier new" , "courier" , monospace;">net.core.wmem_max</span> values, are really needed. I don't think that benchmarking all algorithms again makes sense, because bbr is the clear winner. Actually, for cdg, which is the worst algorithm according to the above benchmark, leaving the window size and r/wmem_max at their default values resulted in a speed boost to 6.53 Mbits/sec. And here are the results for bbr:<br />
<br />
Default window size, default r/wmem_max: 56.0 Mbits/sec<br />
Default window size (85 or 128 KB), 8M r/wmem_max: 55.4 Mbits/sec<br />
4M window, 8M r/wmem_max: 56.7 Mbits/sec (copied from the above)<br />
<br />
I.e.: in this case, the <i>only</i> tuning needed was to switch the TCP congestion control algorithm to something modern. We did not achieve the maximum possible throughput, but even this is a 10x improvement.<br />
<br />
Here is how to make the changes persistent:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">echo tcp_bbr > /etc/modules-load.d/tcp.conf</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">echo net.ipv4.tcp_congestion_control=bbr > /etc/sysctl.d/91-tcp.conf</span><br />
<br />
There are some important notes regarding the bbr congestion control algorithm:<br />
<br />
<ol style="text-align: left;">
<li>It is only available starting with linux-4.9.</li>
<li>In kernels before 4.13, it only operated correctly when combined with the "fq" qdisc (see the sketch after this list).</li>
<li>There are also important fixes, regarding recovery from the idle state of the connection, that happened in the 4.13 timeframe.</li>
</ol>
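On such older kernels, pairing bbr with the fq qdisc could be persisted like this (a sketch, reusing the /etc/sysctl.d/91-tcp.conf file created above):<br />
<br />
<pre>echo net.core.default_qdisc=fq >> /etc/sysctl.d/91-tcp.conf
sysctl -p /etc/sysctl.d/91-tcp.conf
</pre>
<br />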
In other words, just use the latest kernel.<br />
<br />
I will not repeat the mechanism that makes bbr good on high-latency, high-throughput, slightly-lossy networks. <a href="https://groups.google.com/forum/#!forum/bbr-dev">Google's presentations</a> explain it better. Google uses it for YouTube and other services, and it needs to be present on the sender's side only. And it eliminated the MySQL replication alerts for us. So maybe you should use it, too?</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-65632587629689817232016-06-30T20:34:00.001-07:002016-06-30T20:34:53.444-07:00If you want to run a business in China<div dir="ltr" style="text-align: left;" trbidi="on">
...then you will need a Chinese phone number. I.e. a phone number with the country code +86. Your customers will use this number to reach your company, and you will use this number for outgoing calls to them, too.<br />
<br />
There are many SIP providers that offer Chinese phone numbers, but not all of them are good. Here is why.<br />
<br />
The phone system in China has an important quirk: it mangles Caller ID numbers on incoming international calls. This is not VoIP specific, and applies even to simple mobile-to-mobile calls. E.g., my mobile phone number in Russia starts with +7 953, and if I place a call to almost any other country, they will see that +7 953 XXX XXXX is calling. But, if I call a phone number in China, they will instead see something else, with no country code and no common suffix with my actual phone number.<br />
<br />
The problem is that some SIP providers land calls to China (including calls from a Chinese number obtained from their pool) on gateways that are outside China. If you use such a provider and call a Chinese customer, they will not recognize you, because the call will be treated as international (even though it is intended to be between two Chinese phone numbers), and your caller ID will be mangled.<br />
<br />
As far as I know, there is no way to tell if a SIP provider is affected by this problem, without trying their service or calling their support.</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com4tag:blogger.com,1999:blog-7844549485270153160.post-46338527350662780422016-05-24T13:55:00.000-07:002016-05-24T13:55:20.473-07:00Is TSX busted on Skylake, too? No, it's just buggy software<div dir="ltr" style="text-align: left;" trbidi="on">
The story about Intel <a href="https://techreport.com/news/26911/errata-prompts-intel-to-disable-tsx-in-haswell-early-broadwell-cpus">recalling</a> Transactional Synchronization Extensions from the Haswell and Broadwell lines of their CPUs by means of a microcode update has hit the web in the past. But it looks like this is not the end of the story.<br />
<br />
The company I work for has a development server in Hetzner, and it uses this type of CPU:<br />
<br />
<br />
<pre>processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
stepping : 3
microcode : 0x39
cpu MHz : 3825.265
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep
bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1
dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 6816.61
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
</pre>
<br />
<br />
I.e. it is a Skylake. The server is running Ubuntu 16.04, and the CPU has HLE and RTM families of instructions.<br />
<br />
One of my recent tasks was to prepare, on this server, an LXC container based on Ubuntu 16.04 with a lightweight desktop accessible over VNC, for "remote classroom" purposes. We already have such containers on other servers, but they were based on Ubuntu 14.04. Such containers work well on this server, too, but it's time to upgrade. In these old containers, we use a regular Xorg server with a "dummy" video driver, and export the screen using <a href="http://www.karlrunge.com/x11vnc/">x11vnc</a>.<br />
<br />
So, I decided to clone the old container and update Ubuntu there. Result: x11vnc, or sometimes Xorg, now crashes (SIGSEGV) when one attempts to change the desktop resolution. The backtrace points into the __lll_unlock_elision() function which is a part of glibc implementation of mutexes for CPUs with Hardware Lock Elision instructions.<br />
<br />
This crash doesn't happen when I run the same container on a server with an older CPU (which doesn't have TSX in the first place), or if I try to reproduce the bug at home (where I have a Haswell, with TSX disabled by the new microcode).<br />
<br />
So, all apparently points to a bug related to these extensions. Or does it?<br />
<br />
The __lll_unlock_elision() function has this helpful comment in it:<br />
<br />
<pre> /* When the lock was free we're in a transaction.
When you crash here you unlocked a free lock. */
</pre>
<br />
And indeed, there is some <a href="https://bugs.debian.org/807244">discussion of another crash</a> in __lll_unlock_elision(), related to the NVidia driver (which is not used here). In that discussion, it was highlighted that an unlock of an already-unlocked mutex would be silently ignored by a mutex implementation not optimized for TSX, but a CPU with TSX would expose such a latent bug. Locking balance bugs are easily verified using valgrind. And indeed:<br />
<br />
<pre>DISPLAY=:1 valgrind --tool=helgrind x11vnc
...
==4209== ---Thread-Announcement------------------------------------------
==4209==
==4209== Thread #1 is the program's root thread
==4209==
==4209== ----------------------------------------------------------------
==4209==
==4209== Thread #1 unlocked a not-locked lock at 0x9CDA00
==4209== at 0x4C326B4: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==4209== by 0x4556B2: ??? (in /usr/bin/x11vnc)
==4209== by 0x45A35E: ??? (in /usr/bin/x11vnc)
==4209== by 0x466646: ??? (in /usr/bin/x11vnc)
==4209== by 0x410E30: ??? (in /usr/bin/x11vnc)
==4209== by 0x717D82F: (below main) (libc-start.c:291)
==4209== Lock at 0x9CDA00 was first observed
==4209== at 0x4C360BA: pthread_mutex_init (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==4209== by 0x40FECC: ??? (in /usr/bin/x11vnc)
==4209== by 0x717D82F: (below main) (libc-start.c:291)
==4209== Address 0x9cda00 is in the BSS segment of /usr/bin/x11vnc
==4209==
==4209==
</pre>
<br />
It is a software bug, not a CPU bug. But still - until such bugs are eliminated from the distribution, I'd rather not use it on a server with a CPU with TSX.</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com4tag:blogger.com,1999:blog-7844549485270153160.post-36092801147989084502016-05-08T07:52:00.001-07:002016-05-08T08:13:52.350-07:00Root filesystem snapshots and kernel upgrades<div dir="ltr" style="text-align: left;" trbidi="on">
On my laptop (which is running Arch), I decided to have periodic snapshots of the filesystem, in order to revert bad upgrades (especially those involving a large and unknown set of interdependent packages) easily. My toolset for this task is <a href="https://sourceware.org/lvm2/">LVM2</a> and <a href="http://snapper.io/">Snapper</a>. Yes, I know that LVM2 is kind-of discouraged, and Snapper also supports btrfs, but most of the points below apply to btrfs, too.<br />
<br />
Snapper, when used with LVM2, requires not just LVM2, but thinly-provisioned LVM2 volumes. Fortunately, Arch <a href="https://wiki.archlinux.org/index.php/LVM#Special_preparations_for_root_on_thinly-provisioned_volume">can have root filesystem</a> on such volumes, so this is not a problem.<br />
<br />
So, I have /boot on /dev/sda1, LVM on LUKS on /dev/sda2, root on a thinly-provisioned logical volume, and /home on another thinly-provisioned volume. And also swap on a non-thinly-provisioned volume. A separate /boot partition is needed because boot loaders generally don't understand thinly-provisioned LVM volumes, especially on encrypted disks. A separate volume for /home is needed because I don't want my new files in /home to be lost if I revert the system to its old snapshot. The same need to make a separate volume applies to other directories that contain data that should be preserved, but there are no such directories on my laptop. They can appear if I install e.g. PostgreSQL.<br />
<br />
And now there is a problem. Rollback to a snapshot works, but only if there were no kernel updates between the time when the snapshot was taken and when an attempt to revert was made. The root cause is that the kernel image is in /boot, and loadable modules for it are in /usr/lib/modules. The modules are reverted, but the boot loader still loads a new kernel, which now has no corresponding modules.<br />
<br />
There are two solutions: either revert the kernel and its initramfs, too, when reverting the root file system, or make sure that modules are not reverted. I have not investigated how to make the first option possible, even though it would be a perfect solution. However, I have tried to make sure that modules are not reverted, and I am not satisfied with the result.<br />
<br />
The idea was to move modules to /boot/modules, and make this location available somehow as /usr/lib/modules. Here "somehow" can mean either a symlink, or a bind mount. A symlink doesn't work, because the kernel upgrade in Arch will restore it back to a directory. A bind mount doesn't work, either. The issue is that, by putting modules on a non-root filesystem, one creates a circular dependency between local filesystem mounting and udev (this would apply to a symlink, too).<br />
<br />
Indeed, systemd-udevd, on startup, maps the /usr/lib/modules/`uname -r`/modules.alias.bin file into memory. So, now it has a (real) dependency on /usr/lib/modules being mounted. However, mounting local filesystems from /etc/fstab sometimes depends on systemd-udevd, because of device nodes. So, bind-mounting /usr/lib/modules merely from /etc/fstab, using built-in systemd tools, cannot work.<br />
<br />
But it can work from a wrapper that starts before the real init:<br />
<code></code><br />
<pre><code>#!/bin/sh
mount -n /boot # /dev/sda1 is in devtmpfs and doesn't need udev
mount -n /usr/lib/modules # there is still a line in fstab about that
exec /sbin/init "$@" </code></pre>
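<br />
For completeness, the two /etc/fstab lines that this wrapper expects would look roughly like this (a sketch; /dev/sda1 matches the layout described above, while the filesystem type is an assumption):<br />
<br />
<pre><code>/dev/sda1        /boot             ext4   defaults   0  2
/boot/modules    /usr/lib/modules  none   bind       0  0
</code></pre>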
<br />
But that's ugly. In the end, I removed the wrapper, installed an old known-working "linux" package, made a copy of the kernel, its initramfs and modules, upgraded the kernel again, and put the saved files back, so that they are now not controlled by the package manager. So now I have a known good kernel down in the boot menu, and knowledge that its modules will always be present in my root filesystem if I don't revert further than up to today's state.<br />
<br />
And now one final remark. Remember that I said: "The same need to make a separate volume applies to other directories that contain data that should be preserved"? There is a temptation to apply this to the whole /var directory, but that would be wrong. If a system is being reverted to its old snapshot, a package database (which is in /var/lib/pacman) should be reverted, too. But /var/lib/pacman is under /var.<br />
<br />
The conclusion is that Linux plumbers should think a bit about this "revert the whole system" use case, and maybe move some directories.</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-67377006144965126142015-12-20T07:56:00.001-08:002015-12-20T07:56:47.598-08:00Ready to drop Gentoo<div dir="ltr" style="text-align: left;" trbidi="on">
I have been a Gentoo user since 2010. For me, it was, at that time, a source of fresh, well-maintained packages, without the multimedia-related US-lawyer-induced brain damage that plagued Debian. Also, by compiling the packages on my local PC, it neatly sidestepped legal problems related to redistribution of GPL-ed packages with GPL-incompatible dependencies, and trademark issues related to Mozilla products. Also, it offered enough choice, in the form of USE flags, to sidestep too-raw technologies.<br />
<br />
Today, I am re-evaluating this decision. I still care about perfect multimedia support, even if it relies on technologies that are illegal in some country (even if that country is my own). I still care about Firefox identifying itself as Firefox in the User-Agent header, so as to avoid broken sites (such as <a href="https://room.co/">https://room.co/</a>), but I don't want to use binaries from Mozilla, because they rely on outdated technology (i.e. are appropriate for something like RHEL 5). And, obviously, I care about modern and bug-free packages, or at least about non-upstream bugs (and, ideally, upstream bugs, too) being fixed promptly.<br />
<br />
Also, I rely on a feature that is not found upstream in any desktop environment anymore: full-screen color correction, even in games. Yes, I have a colorimeter.<br />
<br />
This was necessary with my old Sony VAIO Z23A4R laptop, because it had a wide-gamut screen (94% coverage of Adobe RGB) and produced very oversaturated colors by default. This is also necessary on my new laptop, Lenovo Ideapad Yoga 2 Pro, because otherwise it is very hard to convince it to display the yellow color. Contrary to popular claims, it <i>can</i> display yellow, even in Linux, given the exact RGB values, but even slight changes (that would only produce a slightly different shade of yellow on normal screens) cause it to display either yellowish-red or yellowish-green color.<br />
<br />
So, it must be easy for me to install extra packages (such as <a href="http://sourceforge.net/projects/compicc/">CompICC</a>) from source, and, ideally, have them integrated into package management. And, the less the number of such extra packages needed for full-screen color correction, the better.<br />
<br />
Now back to Gentoo. It still allows me to ignore lawyers, too-radical Free Software proponents, and their crippling effect on the software that I want to use. It, mostly, still allows me to take suspicious too-new infrastructure out of the equation. For full-screen color correction, I need exactly one ebuild that is not in the main Portage tree (CompICC). But other packages started to suffer from bitrot.<br />
<br />
Problem 1: MATE desktop environment stuck at version 1.8. Probably just due to lack of manpower to review the updates. This is <a href="https://bugs.gentoo.org/show_bug.cgi?id=551588">bug 551588</a>.<br />
Problem 2: Attempt to upgrade GNOME to version 3.18 brought in a lot of C++11 related breakage that wasn't handled promptly enough, e.g., by reverting the upgrade. This is <a href="https://bugs.gentoo.org/show_bug.cgi?id=566328">bug 566328</a>.<br />
Problem 3: QEMU will not let Windows 8 guests to use resolutions higher than 1024x768. Upstream QEMU does not have this bug - it is an invention of overzealous unbundling that replaced a perfectly working bundled version of VGA BIOS with an inferior copy of Bochs VGA BIOS. This is <a href="https://bugs.gentoo.org/show_bug.cgi?id=529862">bug 529862</a>.<br />
<br />
I don't yet know which Linux distribution I will use. Maybe Arch (but it requires so much stuff from AUR to build CompICC! maybe I should use Compiz-CMS instead), maybe something else. We'll see.</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com3tag:blogger.com,1999:blog-7844549485270153160.post-76665061821915698362015-10-18T11:24:00.001-07:002015-10-18T11:27:05.307-07:00Still using icims.com for recruiting? Think again!<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
If your company has open vacancies and uses some system for pre-screening candidates (e.g. by giving them questions), I have a "small" task for you. Go to your system, answer the questions as if you were a candidate, validate the answers as you would expect from a candidate (e.g. actually perform the actions that the answer describes), and then save the results. Look at the whole process. Make a conclusion for yourself whether your system is usable for the stated purpose. Communicate it to your management, if needed.<br />
<br />
If you are using <a href="http://icims.com/">icims.com</a> for hiring technical candidates, the answer is most probably "not suitable at all".<br />
<br />
The most annoying bug that icims.com has is that it does not allow the candidate to enter certain characters in certain positions. The exact error message is:<br />
<blockquote class="tr_bq">
<span class="iCIMS_Forms_ErrorMsg">Q3 2
Contains invalid characters. You cannot use the characters: ' " \
/ or ` in an enclosing instance of <>, <<, >> or
><.</span></blockquote>
This triggers at least on the following types of input:<br />
<ul style="text-align: left;">
<li>XML or HTML</li>
<li>Command redirections, e.g.: echo "foo bar" >> baz.txt </li>
<li>Sequences of menu items to click, e.g.: "File > New > Folder", if a bad character happens to be before that</li>
</ul>
</div>
So, you cannot ask questions about HTML, shell scripting, or even general questions about using GUI-based applications.<br />
<br />
This error message probably means that they are concerned about XSS attacks. However, filtering out invalid characters is a very sloppy way of protection against such attacks. And it imposes completely unreasonable restrictions on the user input.<br />
<br />
In fact, any kind of input (including XML, bash scripts or text about clicking the menu) should be suitable, and can be made to display safely and properly in any browser, just by escaping the special characters when generating the HTML page. Many template engines exist that do this escaping for you automatically. Today, there is simply no reason not to use them.<br />
<br />
If a candidate sees such an error, he/she becomes demotivated. It is a pointless barrier standing between the candidate and getting a correct answer to you. It also indicates that you don't care about your customers (by choosing business partners that allow such sloppy practices). Worse, some of your candidates (who see icims.com for the first time) may think that it is your product, or your internal system, and that <i>you</i> (not icims.com) have web developers with insufficient skills. I.e. that your company is not good enough to work in, because you don't weed out underqualified workers.<br />
<br />
You don't want to lose candidates. So you don't want to use icims.com. Really.</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com1tag:blogger.com,1999:blog-7844549485270153160.post-68083791137113647602014-09-15T12:15:00.000-07:002014-09-15T12:15:38.512-07:00Why static analyzers should see all the code<div dir="ltr" style="text-align: left;" trbidi="on">
Just for fun, I decided to run a new "<a href="https://github.com/jgm/stmd">standard markdown</a>" C code through a <a href="http://clang-analyzer.llvm.org/">static analyzer</a> provided by the <a href="http://clang.llvm.org/">Clang</a> project. On the surface, this looks very easy:<br />
<br />
<code></code><br />
<pre><code>CCC_CC=clang scan-build make stmd
</code></pre>
<br />
It even finds bugs. A lot of dead assignments, and some logic & memory errors: dereferencing a null pointer, memory leaks and a double-free. However, are they real?<br />
<br />
E.g., it complains that the following piece of code in src/bstrlib.c introduces a possible leak of the memory pointed to by buff, which was previously allocated in the same function:<br />
<br />
<code></code><br />
<pre><code>bdestroy (buff);
return ret;
</code></pre>
<br />
It does not understand that bdestroy is a memory deallocation function. Indeed, it could be anything. It could be defined in a different file. It indeed does not destroy the buffer and thus leaks the memory if some integrity error occurs (and the return code is never checked).<br />
<br />
So indeed, the code of bdestroy smells somewhat. But is it a problem? How can we trick clang into understanding that this can't happen?<br />
<br />
Part of the problem stems from the fact that clang looks at one file at a time and thus does not understand dependencies between functions defined in different files. There is, however, a way to fix it.<br />
<br />
All we need to do is to create a C source file that includes all other C source files. Let's call it "all.c".<br />
<br />
<code></code><br />
<pre><code>#include "blocks.c"
#include "bstrlib.c"
#include "detab.c"
#include "html.c"
#include "inlines.c"
#include "main.c"
#include "print.c"
#include "scanners.c"
#include "utf8.c"
</code></pre>
<br />
Unfortunately, it does not compile out of the box, because of the conflicting "advance" macros in inlines.c and utf8.c (fixable by undefining these macros at the end of each file), and because of the missing header guard around stmd.h (fixable trivially by adding it). With that, one can submit this all-inclusive file to the static analyzer:<br />
<br />
<pre><code>
scan-build clang -g -O3 -Wall -std=c99 -c -o src/all.o src/all.c
</code></pre>
<br />
Result: no bugs found, except dead assignments.<br />
<br />
<br /></div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-73504246975586911602014-05-17T09:58:00.002-07:002014-05-17T09:58:45.460-07:00Antispam misconfigurations<div dir="ltr" style="text-align: left;" trbidi="on">
<h3 style="text-align: left;">
Introduction </h3>
<div dir="ltr" style="text-align: left;" trbidi="on">
This blog post is about ensuring correct operation of one particular antispam solution. However, I think that the thoughts about possible misconfigurations expressed here apply to most of them.<br />
<br />
The following combination of mail-related software is quite popular: <a href="http://www.postfix.org/">Postfix</a> + <a href="http://dspam.nuclearelephant.com/">DSPAM</a> + <a href="http://www.dovecot.org/">Dovecot</a>. Each of these products comes with an extensive user manual, and packages are available for almost every linux distribution. So, I decided to use it for the company mail. In fact, Postfix and Dovecot were already installed (with all users being virtual), and it only remained to install DSPAM, because spam became a problem for some users.<br />
<br />
Here is what kinds of non-spam messages go through our server: business mail (invoices, documents, commercial offers), technical support, discussions within the team, bugtracker tickets, automated notifications (e.g. when contracts are about to expire).<br />
<br />
There are many manuals on setting up DSPAM together with Postfix and Dovecot. Below are the common things mentioned in them.<br />
<br />
Postfix should pass the incoming mail into DSPAM. The preferred protocol for doing this is LMTP over a unix-domain socket. DSPAM should add X-DSPAM-* headers to them and reinject into Postfix. Then Postfix should contact Dovecot via LMTP, and then the message finally gets delivered to the user's mailbox (or the spam folder, with the help of a sieve filter). If DSPAM makes a mistake, the user can move the message appropriately via IMAP, and the dovecot-antispam plugin will train DSPAM about this incident.<br />
<br />
So far so good. I installed DSPAM (with a simple hash driver backend) and configured the rest of mail-related software to use it. It even appeared to work for me after initial training. But then, we encountered problems, not explicitly mentioned in the manuals, described below. If you are reading this post, please test your mail servers for them, too.<br />
<h3 style="text-align: left;">
Training did not work for some users</h3>
<div dir="ltr" style="text-align: left;" trbidi="on">
Some users, including myself, used their full e-mail (including the company domain) as their IMAP username, and some didn't include the domain part. Both setups worked for sending and receiving mail. However, in the initial configuration, the user's login was passed to dspam-train as-is:</div>
<br />
<pre>antispam_dspam_args = --deliver=;--client;--user;%u</pre>
<br />
Result: for some users (those who didn't append the domain to their IMAP username), the retraining process looked for the hash file in /var/spool/dspam/data/local, while that hash file is always in /var/spool/dspam/data/ourdomain.ru. The fix is to spell the domain explicitly:<br />
<br />
<pre>antispam_dspam_args = --deliver=;--client;--user;%n@ourdomain.ru</pre>
<br />
In fact, I think that any use of %u in Dovecot configuration is wrong if you have only one domain on the mail server.<br />
<h3 style="text-align: left;">
Duplicate e-mail from monitoring scripts</h3>
<div style="text-align: left;">
Monitoring scripts send e-mail to root@ourdomain.ru from other hosts if something bad happens. However, after configuring DSPAM, each of such messages arrived twice to my mailbox. This happened because the "root" alias is expanded recursively (this is OK, as root is virtual and has nothing to do with uid 0). We want to archive all root mail for easy reference, as well as to deliver it to the actual sysadmins. The alias expansion happened twice: once before DSPAM and once after it. The solution is to disable it once. I disabled it before DSPAM:<br />
<br />
<pre>smtp inet n - n - - smtpd
-o content_filter=lmtp:unix:/var/run/dspam/dspam.sock
-o receive_override_options=no_address_mappings
</pre>
<br />
However, this was a mistake.
</div>
<h3 style="text-align: left;">
Training still did not work for sales</h3>
The sales team complained that they were not able to train DSPAM so that the incoming commercial queries end up in their inbox, and not in the spam folder. Manual training didn't help, either. This appeared to be a variation of the first problem: wrong path to the hash file.<br />
<br />
The sales team has a "sales" mail alias that expands to all of them. As such, due to the previous "fix", Postfix told DSPAM that the mail is addressed to sales@ourdomain.ru:<br />
<br />
<pre>smtp inet n - n - - smtpd
-o content_filter=lmtp:unix:/var/run/dspam/dspam.sock
-o receive_override_options=no_address_mappings
</pre>
<br />
Thus, DSPAM placed the hash file in /var/spool/dspam/data/ourdomain.ru/sales, while the training process looked in /var/spool/dspam/data/ourdomain.ru/$person. The solution was to move the no_address_mappings option after DSPAM, i.e. to the reinjection service, as sketched below. This way, both DSPAM and the dovecot-antispam plugin see the expanded recipient addresses.<br />
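<br />
A sketch of what that ends up looking like in master.cf (the reinjection listener address and port here are assumptions; use whatever service DSPAM is actually configured to deliver back into):<br />
<br />
<pre># before DSPAM: alias expansion still happens here
smtp inet n - n - - smtpd
 -o content_filter=lmtp:unix:/var/run/dspam/dspam.sock

# after DSPAM: the reinjection listener, with no further alias expansion
127.0.0.1:10026 inet n - n - - smtpd
 -o content_filter=
 -o receive_override_options=no_address_mappings
</pre>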
<h3 style="text-align: left;">
Some e-mail from new team members was marked as spam</h3>
A general expectation is that authenticated e-mail from one user to another user on the same corporate mail server is not spam. However, the new team members (and even some old ones) misconfigured their e-mail clients to use port 25 (with STARTTLS and authentication) for outgoing e-mail. As such, all their outgoing e-mail was processed by DSPAM, because the only factor that decides whether to process the e-mail is the port. The solution was to educate everyone on the team to use port 587 for outgoing e-mail, which is not configured to process messages with DSPAM. Also, it would have been nice to make authentication always fail on port 25, but I haven't done this yet.<br />
<h3 style="text-align: left;">
Outgoing e-mail was sometimes marked as spam</h3>
The general expectation is that outgoing mail should never be marked as spam, even if it is spam. If you disagree, then please note that there is nobody to notice the problem, and nobody except root can retrain the spam filter in such case.<br />
<br />
This is mostly a duplicate of the previous item, with an interesting twist. Namely, there are some web scripts and cron jobs that send mail to external users, and both connect to 127.0.0.1:25 without authentication. I solved this by splitting the default smtp line in master.cf into two: one for 127.0.0.1:smtp, and one for my external IP address. Spam filtering is enabled only for the second line.<br />
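<br />
In master.cf terms, that split looks roughly like this (a sketch; the external address is a placeholder):<br />
<br />
<pre># local clients (web scripts, cron jobs): no spam filtering
127.0.0.1:smtp inet n - n - - smtpd

# the outside world: everything goes through DSPAM
203.0.113.10:smtp inet n - n - - smtpd
 -o content_filter=lmtp:unix:/var/run/dspam/dspam.sock
</pre>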
<h3 style="text-align: left;">
Conclusion</h3>
<br />
It works! Or at least pretends to work. With so many pitfalls already seen, I cannot be sure.<br />
<br />
<br />
</div>
</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-62572525044883093202012-12-18T20:59:00.000-08:002012-12-18T20:59:37.176-08:00Stupid MySQL Help and Parser<div dir="ltr" style="text-align: left;" trbidi="on">
Just stumbled upon this:<br />
<code></code><br />
<pre><code>mysql> <b>help CAST</b>
Name: 'CAST'
Description:
Syntax:
CAST(expr AS type)
The CAST() function takes a value of one type and produce a value of
another type, similar to CONVERT(). See the description of CONVERT()
for more information.
URL: <a href="http://dev.mysql.com/doc/refman/5.1/en/cast-functions.html">http://dev.mysql.com/doc/refman/5.1/en/cast-functions.html</a>
mysql> <b>select CAST ('2012-01-01 12:00:00' AS DATETIME);</b>
ERROR 1305 (42000): FUNCTION CAST does not exist
mysql> <b>select CAST('2012-01-01 12:00:00' AS DATETIME);</b>
+-----------------------------------------+
| CAST('2012-01-01 12:00:00' AS DATETIME) |
+-----------------------------------------+
| 2012-01-01 12:00:00 |
+-----------------------------------------+
1 row in set (0,00 sec)
</code></pre>
<p>I.e. the help claims that CAST is a function, but if you call it with a stray space after the name, it does not exist.</p>
</div>Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com1tag:blogger.com,1999:blog-7844549485270153160.post-5881637509741793012012-10-06T06:39:00.000-07:002012-10-06T06:39:16.668-07:00Monkey-patching bash functions<div dir="ltr" style="text-align: left;" trbidi="on">
Disclaimer: the whole post is a big bad hack.<br />
<br />
Suppose that there is a big regularly-updated library of bash functions, and your script sources it or is sourced by it. One of these functions is not exactly right for your purpose (e.g. it contains a bug or misses a feature), but fixing the bug or adding the feature there is not practical. This might happen if the file containing the function library is managed by a package manager, and your bugfix will be overwritten at the next package update.<br />
<br />
A straightforward sledgehammer-like solution is to make a copy of the library, fix the bug there, and source or be sourced by your modified copy (thus losing all future updates). This is not good.<br />
<br />
If the offending function is called by your script directly, then, of course, you can define a differently-named function that is otherwise-identical to the original one, but has the needed fix, directly in your script, and use it. However, this approach does not work (or, rather, requires you to duplicate the whole call chain) if your script calls the offending function only indirectly.<br />
<br />
A possibly-better solution (that may or may not work) is to redefine the offending function in your script. Indeed, out of the many possibly existing definitions for a function, bash uses the last one it encountered. Here is an interactive example of such overloading:<br />
<pre><code>
$ <b>barf() { echo "123" ; }</b>
$ <b>barf</b>
123
$ <b>barf() { echo "456" ; }</b>
$ <b>barf</b>
456
</code></pre>
<br />
So, now you know how to completely replace a function. But, what if only a small change is required? E.g., if one command is missing at the end? This is also solvable thanks to an introspection feature of bash. I am talking about the "type" builtin. Here is what it does when applied to a function:<br />
<br />
<pre><code>
$ <b>type barf</b>
barf is a function
barf ()
{
echo "456"
}
</code></pre>
<br />
So, you have one line of a meaningless header and then the full slightly-reformatted source of the function on standard output. Let's grab this into a variable:<br />
<pre><code>
$ <b>def=$( type barf )</b>
</code></pre>
<br />
You can then post-process it. E.g., let's transform this into a definition of a function that does exactly the same but also prints "789" at the end. The easiest way to do that is to remove the first line (the header) and insert echo "789" before the last line. Uhm, it is not easy to remove a line from a variable in pure bash... no problem, we'll comment it out instead!
<br />
<pre><code>
$ <b>def="# $( type barf )"</b>
$ <b>echo "$def"</b>
# barf is a function
barf ()
{
echo "456"
}
</code></pre>
<br />
And now remove the last character (a closing brace) and replace it with the correct code:
<br />
<pre><code>
$ <b>def="${def%\}}echo \"789\" ; }"</b>
$ <b>echo "$def"</b>
# barf is a function
barf ()
{
echo "456"
echo "789" ; }
</code></pre>
<br />
All that remains is to feed the contents of this variable back to bash as a piece of code. That's what eval does:
<br />
<pre><code>
$ <b>eval "$def"</b>
</code></pre>
<br />
Now the function looks correct and does what is intended:
<br />
<pre><code>
$ <b>type barf</b>
barf is a function
barf ()
{
echo "456";
echo "789"
}
$ <b>barf</b>
456
789
</code></pre>
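<br />
To wrap up, the same trick can be packaged as a small reusable helper (a sketch; the helper's name is mine and not anything standard):<br />
<pre><code>
append_to_function() {        # usage: append_to_function NAME 'extra code'
    local def
    def="# $( type "$1" )"    # full source of the function, header commented out
    def="${def%\}}$2 ; }"     # swap the final brace for the extra code
    eval "$def"
}

barf() { echo "456" ; }
append_to_function barf 'echo "789"'
barf    # prints 456, then 789
</code></pre>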
</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com3tag:blogger.com,1999:blog-7844549485270153160.post-43042050141343456922012-08-27T01:51:00.000-07:002012-08-27T02:04:06.914-07:00SafeDNS.com is now open for registration<div dir="ltr" style="text-align: left;" trbidi="on">
The company I work for decided to establish its presence in USA. So, please welcome a new player in the international market of content filtering solutions: <a href="https://www.safedns.com/">SafeDNS</a>. Strictly speaking, we are still in beta and need YOU to help us kill the last bugs.<br />
<br />
This service can be useful if you want to protect yourself or your kids against accidentally opening sites with unsuitable content. Or, to prevent your employees from wasting time at work on such things as social networking and videos. Or even to evade a bad filter set up by your ISP :)<br />
<br />
We have more than 4 million sites sorted into more than 50 categories, and it's you who decides what to block and what to let through. All you need is an e-mail address, a public static IPv4 address on your router, and the ability to change DNS settings on your computers.<br />
<br />
<a href="https://www.safedns.com/auth/register" rel="nofollow">Register now</a> (it's free!), read the <a href="https://www.safedns.com/guide/">guide</a>, and help us improve the service by sending <a href="https://www.safedns.com/feedback" rel="nofollow">feedback</a>.</div>
Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0tag:blogger.com,1999:blog-7844549485270153160.post-40972423877962177502012-07-28T00:01:00.002-07:002012-07-28T00:01:28.009-07:00On the negative side of duties<div dir="ltr" style="text-align: left;" trbidi="on">
Yesterday Lennart Poettering presented a talk with the title "The UI and the OS". The main idea of the talk was simplification of the architecture of the whole OS. <span style="background-color: white;">On one of the slides, there was approximately this seemingly-obvious text: "The distributions' job is to distribute your software".</span><br />
<br />
This phrase (while completely appropriate for the slide in question) reminded me of one of the first lessons on philosophy that I received from <a href="http://www.famous-scientists.ru/12139/">G. V. Boldygin</a> at the Urals State University while studying there. In that lesson, we discussed the role of science (including theoretical science) in society. The obvious idea immediately proposed: theoretical science leads to technological progress by telling us to do things that nobody has done before. The surprising fact given to us as students was that there were significant practical technological advances (achieved by trial and error) even before theoretical science was formed. The conclusion was that it is almost always incorrect to refer to only one side of something's duties (i.e. to what it does, as opposed to what it prevents, or vice versa). Indeed, we were told that science not only drives technological progress directly, but also saves us from wasting money and time on certain "scientifically impossible" projects, such as trying to build a perpetual motion machine.<br />
<br />
So, let's apply this lesson to refute Lennart's phrase: distributions exist not only to distribute your software. Indeed, as illustrated by the Windows world, one doesn't need any help for this. The other side of their duties is that they reject bad (in their opinion) software by not distributing it, thus creating an anchor of trust for software they do provide to users.</div>Alexander E. Patrakovhttp://www.blogger.com/profile/15370096336423115833noreply@blogger.com0