Wednesday, February 9, 2022

SSD failed

I have seen multiple times how HDDs fail, both in others' servers and in my own computers. They usually develop unreadable sectors, and the rest of the data can be recovered under Linux to another HDD using tools like ddrescue. But I had never seen any SSD failure myself before. Until today.

Well, this case is not that different.

Today, my desktop computer failed to boot, with some error messages from systemd about services failing to start. I thought it might be a one-off error and rebooted, only to find out that the root partition (XFS on LUKS on /dev/sda2) failed to mount. The error in dmesg told me to run xfs_repair, which I did not do initially.

It did mount with the ro,norecovery options, but I rebooted the system afterwards instead of copying the files somewhere immediately. It was a stupid move, and a lesson for the future.

Then I ran xfs_repair, but it complained a lot about I/O errors, and afterwards, the filesystem was no longer mountable even with the norecovery option.

The majority of sectors are still readable, so, as of now, ddrescue + xfs_repair still looks like a valid recovery strategy. I will update the blog post if it isn't. Even if it isn't, only a day of work is lost.

Update 1: it worked, but I had to run xfs_repair twice, and the node_modules directory from one project ended up in lost+found. So, apparently, nothing important was lost.
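For reference, the recovery boils down to a few commands. This is only a sketch with hypothetical device names (/dev/sda as the failing SSD, /dev/sdb as the healthy target; the LUKS mapping name is made up), wrapped in a function so that nothing runs by accident:

```shell
# Sketch of the recovery; device names and the mapping name are hypothetical.
rescue_failing_ssd() {
    # Pass 1: copy everything that reads cleanly, skipping bad areas quickly.
    ddrescue -f -n /dev/sda /dev/sdb rescue.map
    # Pass 2: retry the remaining bad sectors up to 3 times.
    ddrescue -f -r3 /dev/sda /dev/sdb rescue.map
    # Open the LUKS container on the copy and repair XFS there,
    # leaving the original drive untouched.
    cryptsetup open /dev/sdb2 rescued_root
    xfs_repair /dev/mapper/rescued_root
}
```

The map file is what lets ddrescue resume and keep track of which sectors are still missing; run the function manually only after double-checking the device names.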

And here is what smartctl in my rescue system says about the drive.

$ sudo smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.4-arch2-1] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Model Family:     Indilinx Barefoot 3 based SSDs
Device Model:     OCZ-VECTOR
Serial Number:    OCZ-Z5CB4KC20X0ZG7F8
LU WWN Device Id: 5 e83a97 27d603391
Firmware Version: 3.0
User Capacity:    512 110 190 592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Feb  9 15:09:11 2022 +05
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x1d) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Abort Offline collection upon new
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x00)	Error logging NOT supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   0) minutes.
Extended self-test routine
recommended polling time: 	 (   0) minutes.

SMART Attributes Data Structure revision number: 18
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Runtime_Bad_Block       0x0000   033   033   000    Old_age   Offline      -       33
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       31212
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       5821
171 Avail_OP_Block_Count    0x0000   080   080   000    Old_age   Offline      -       72682832
174 Pwr_Cycle_Ct_Unplanned  0x0000   100   100   000    Old_age   Offline      -       406
195 Total_Prog_Failures     0x0000   100   100   000    Old_age   Offline      -       0
196 Total_Erase_Failures    0x0000   100   100   000    Old_age   Offline      -       0
197 Total_Unc_Read_Failures 0x0000   100   100   000    Old_age   Offline      -       33
208 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       317
210 SATA_CRC_Error_Count    0x0000   100   100   000    Old_age   Offline      -       60
224 In_Warranty             0x0000   100   100   000    Old_age   Offline      -       0
233 Remaining_Lifetime_Perc 0x0000   090   090   000    Old_age   Offline      -       90
241 Host_Writes_GiB         0x0000   100   100   000    Old_age   Offline      -       53191
242 Host_Reads_GiB          0x0000   100   100   000    Old_age   Offline      -       34251
249 Total_NAND_Prog_Ct_GiB  0x0000   100   100   000    Old_age   Offline      -       7808432714

Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

So, despite the I/O errors, the drive still considers itself healthy (SMART overall-health self-assessment test result: PASSED). What a liar.

Update 2: after a reboot, the system does not detect the drive at all, even in the BIOS. So I was lucky to be able to copy all the data just in time.

Saturday, July 24, 2021

A bad job with a very good description

Today I received a job invitation for a C++ developer position. My C++ skills are not up to date, so I initially hesitated about whether to apply. Nevertheless, I clicked the link to the more detailed job description. And it was the best job description that I have seen so far. No long list of desired skills. A clear view of the job duties.

Let me translate and quote the relevant parts:

Our product is <link removed>. You can download and try it on your PC.

Well, this is why I refused. I do not want to be involved in "black SEO" (which is obvious from the link that I removed). But let's continue with the description:

Our mission is to allow users to create separate browser instances, with their own cookies, local storage, etc. And with their own substitutions of values such as navigator.platform, navigator.language and so on.

For these tasks, we currently use a stock Google Chrome browser and embed the needed property replacements as JavaScript code using its API (DevTools Protocol). But this approach has certain drawbacks that do not allow bringing the browser operation to the desired "perfect" state. Therefore, we are starting development of our own browser build based on the Chromium sources, so that all the needed property substitutions and modifications come from the core and not from injected JavaScript code.

A perfectly clear and logical description of what they are trying to achieve and, importantly, why. It is obvious that the person who wrote this has a coherent product vision.

Let me snip the part regarding work conditions, as that's irrelevant for the purpose of this blog post, and continue with the best part of the job post:

And here is a benchmark that allows you to see if you fit this role. Create a Chromium build for Windows with a replaced icon. Just any icon, but make sure that it shows up both as an icon of the executable and in the taskbar. Also, modify the browser so that the navigator.platform property is always MacIntel.

This benchmark allows candidates to assess themselves much better than long lists of required skills do. What if I don't have this one skill from the list - is it really important, or can I just learn it on the fly? With the candidate self-test approach demonstrated above, such a question cannot arise, and there is less need to conduct in-person coding interviews.

Another good aspect of this benchmark is that it involves modification of an existing code base, and not coding something from scratch. The ability to read and understand existing code in real-world projects is what really matters, and traditional coding interviews do not measure that.

I wish all IT job descriptions were written like that. It's unfortunate that this position comes from the bad guys - otherwise I would have definitely applied.

P.S. If you need a remote Linux sysadmin, DevOps engineer, or a C, Python, or PHP developer, feel free to contact me via email.

Thursday, January 23, 2020

VPN privacy policies and privacy threats

People use commercial VPNs as anonymity and privacy tools. To be useful as such, the VPN provider must not store any information that would identify the real IP address of a user when given the details of IP packets from/to a website that the user has visited or that his software (e.g. a BitTorrent client) has automatically communicated with. In other words, a user who uses BitTorrent or visits "shady" sites expects that the VPN provider will not be able to point to him when asked "who torrented this" or "who visited this site".

Off-the-shelf VPN server software does log connections by default, and this is useful in a corporate setting for incident investigations. So, VPN providers often make an explicit "no-logs" statement in their privacy policies to indicate that they, well, don't log certain data, or that they discard those logs after a predetermined amount of time. Here is an example of such a policy statement, taken from Ivacy:
We strictly do not log or monitor, online browsing activities, connection logs, VPN IPs assigned, original IP addresses, browsing history, outgoing traffic, connection times, data you have accessed and/or DNS queries generated by your end. We have no information that could associate specific activities to specific users.
It looks like this statement is short and to the point. Is it enough? Unfortunately, by itself, the statement above is insufficient. It would be perfectly compliant with the wording above if they streamed the connection events, VPN IPs assigned, encryption keys negotiated, etc., and mirrored the traffic (including the original connection IPs) to a third party in real time. Exfiltration is not the same as logging, and it is still true that the hypothetical evil Ivacy keeps no information that could associate specific activities to specific users.

It is not only Ivacy whose privacy policy focuses on logging as the only privacy threat — the problem is in fact very common, probably because nobody except me has really thought about other ways to betray privacy. The issue became even more important in 2018, when it became the norm for VPN providers to undergo audits. And guess what: one common form of audit is called a "No-Log Audit". Not a general "Privacy Policy Adherence Audit", but a narrow "No-Log Audit"!

It is an interesting question what an auditor should do here. If this is a "No-Log Audit", then, formally, deliberate real-time exfiltration is out of scope. So are deliberately introduced cryptographic weaknesses that would allow third parties to fully decrypt connections (though, this is in scope for a "Security Audit", which is a completely different thing). Even during a more general policy compliance audit, targeted at the entirety of the privacy policy, formally, an auditor has the right not to report deliberate exfiltration as a finding, provided that the policy is worded carefully enough (so that data exfiltration is not a privacy policy violation as worded).

Note the word "formally" above: it is all about the worst case. Some auditors do care about the spirit of the policy, not only about the letter. I have asked three companies that conducted No-Log Audits of various VPN providers in the past about this dilemma, and, so far, received one useful reply (from cure53, regarding their No-Log & Privacy Audit of IVPN).
Q: Was there any attempt during the audit to check that personally identifying information does not leave the company via network connections, as opposed to the on-disk logs that you have already confirmed as non-existing?
A: Yes and no. We checked that on the servers we got access to. In the IVPN case we could not find any evidence that points towards them attempting that. While this doesn't mean that they, IVPN, don't do it at all, we at least didn't catch them trying.
Q: Would any of the above privacy violations (if IVPN were engaging in such activities) be caught/flagged/result in a failed audit?
A: As a matter of fact, yes. Short after IVPN we audited another provider and it ended up in massive drama because they indeed logged and found different excuses every time. It was pathetic.
NordVPN is one of the VPN providers that does address the issue with the wording. Let me quote the relevant bits from their privacy policy (emphasis mine):
Nord guarantees a strict no-logs policy for NordVPN Services, meaning that your internet activity while using NordVPN Services is not monitored, recorded, logged, stored or passed to any third party.
Much clearer and more reassuring. Let's hope that other VPN providers read this blog post and apply the same simple fix.

Disclaimer: I am a customer of some VPN services mentioned here. I have, at the time of this writing, absolutely no evidence that they engage, or engaged in the past, in the hypothetical malpractice described in this post. It was just an example. I am sure that, for every existing privacy policy in the world, a sufficiently advanced hairsplitter can figure out a way to "comply" in a similar way.

Sunday, November 17, 2019

Fail2ban and network misconfiguration

I had a rather stupid security incident recently, and want you to check if your company network is vulnerable to the same issue.

The network in question had a bastion host, with an externally accessible SSH server and a (more recently added) VPN. Because of someone's laziness (i.e. routing not configured properly), the bastion host was performing network address translation, so, no matter who connected, over VPN or just double SSH, all internal hosts would see incoming SSH connections from the internal IP address of the bastion host.

The real issue that triggered a short SSH outage was that the internal hosts ran fail2ban, and I had a slightly broken keyboard (since replaced) that would sometimes double-register one of the keys, and the character on that key was in my password. So, I managed to mistype my password twice when connecting to one of the internal servers.

Result: that internal server banned me. Or rather, it banned the IP address of the bastion host. And that meant not only me: all the other people who were working there also got banned for some time - which is clearly unwanted.

Here are some recommendations on how to avoid repeating this incident.

  1. If you can, make sure that internal servers can see the VPN IP address of the admin, not just the internal IP of the bastion host. I.e., remove NAT.
  2. If you can't remove NAT, or have to support double SSH through the bastion host, make sure that the bastion's IP address is whitelisted (in fail2ban terms, added to ignoreip). OTOH, fail2ban does protect against the situation when there is some password-guessing malware on the admin's PC. So, my recommendation would be, once NAT is removed, to deprecate the double-SSH workflow.
  3. Use SSH keys.
  4. When it is acceptable (i.e. it would not be counted as an incident), actually trigger a ban on purpose, just to see that it is applied as expected.
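In fail2ban terms, whitelisting the bastion host means adding it to the ignoreip setting. A minimal sketch, with a hypothetical internal address for the bastion:

```ini
# File: /etc/fail2ban/jail.d/bastion.local
[DEFAULT]
# - hypothetical internal IP of the bastion host
ignoreip = ::1
```

Since the [DEFAULT] section is inherited by all jails, this whitelists the bastion everywhere. Keep in mind that it also hides real password-guessing attempts coming through the bastion, which is the trade-off described above.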

Wednesday, July 10, 2019

How to work around slow IPMI virtual media

Sometimes, you need to perform a custom installation of an operating system (likely Linux) on a dedicated server. This happens, e.g., if the hoster does not support your preferred Linux distribution, or if you want to make some non-default decisions during the installation that are not easy to change later, like custom partitioning or other storage-related settings. For this use case, some hosters offer IPMI access. Using a Java applet, you can remotely look at the display of the server and send keystrokes and mouse input - even to the BIOS. Also, you can connect a local ISO image as a virtual USB CD-ROM and install an operating system from it.

The problem is that such a virtual CD-ROM is unbearably slow if your server is not in the same city. The CD-ROM access pattern while loading the installer is dominated by random reads, because it consists mostly of loading (i.e., page-faulting) executable programs and libraries from a squashfs image. Therefore there is an inherent speed limit determined by the read request size and the latency between the IPMI viewer and the server. Even for linear reads, where the readahead mechanism should help, the virtual CD-ROM read speed is limited to something like 250 kilobytes per second at 100 ms of latency. This is lower than 2x CD speed. If you are working from home over an ADSL or LTE connection, the speed will be even worse.
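The speed limit above is just the request size divided by the round-trip time, because the reads are synchronous. Assuming (my guess, chosen to match the observed figure) that roughly 25 KB is served per request:

```shell
rtt=0.1          # seconds of viewer-to-server latency
req_bytes=25600  # assumed bytes served per synchronous request
# Throughput = bytes per request / round-trip time.
awk -v s="$req_bytes" -v r="$rtt" \
    'BEGIN { printf "%.0f KB/s\n", s / r / 1000 }'
# Prints "256 KB/s", in the same ballpark as the ~250 KB/s observed.
```

Halving the latency doubles the throughput, which is why having a jump host near the server helps so much.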

With distant servers or slow internet links on the viewer side, Linux installers sometimes time out loading their own components from the virtual media. In fact, when trying to install CentOS 7 over IPMI on a server located in Latvia, I always got a disconnection before it could load 100 megabytes from the installation CD. Therefore, this installation method was unusable for me, even with a good internet connection. This demonstrates why it is a good idea to avoid installing operating systems over IPMI from large ISO images. Even "minimal" or "netinstall" images are typically too large!

So, what are the available alternatives?

Having another host near the server being installed, with a remote desktop and a good internet connection, definitely helps, and is the best option if available.

Debian and Ubuntu offer special well-hidden "netboot" CD images which are even smaller than "netinstall". They are less than 75 MB in size and therefore have a better chance of success. In fact, even when not constrained by IPMI, I never use anything else to install these Linux distributions.

And it turns out that, if your server obtains its IP address via DHCP, there is an even easier option: load the installer from the network. There is a special website, <link removed>, that hosts iPXE configuration files. All you need to download is a very small (1 MB) ISO with iPXE itself. Boot the server from it, and you will get a menu with many Linux variants available for installation.

There are two limitations to this approach. First, to install some Linux variants, you do need DHCP, and not all hosting providers use it for their dedicated servers. Even though there is a failsafe menu where you can set a static IP address, some installers, including CentOS's, assume DHCP. Second, you will not get UEFI, because their UEFI ISO is broken. Bugs are filed for both limitations and will hopefully get fixed soon.

A last-resort option, but still an option, is to install the system as you need it in a local virtual machine, and then transfer the block device contents to the server's disks over the network. To do so, you need a temporary lightweight Linux distribution. Tiny Core Linux (especially the Pure 64 port) fits the bill quite well. You can boot from its ISO image (currently only 26 MB in size, with UEFI support) without installing it, configure the network if needed, install openssh using the "tce-load" tool, set the "tc" user's password, and do what you need over ssh.
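The transfer itself can be as simple as piping dd over ssh. A sketch with hypothetical names (local VM disk /dev/vda, target disk /dev/sda, server reachable as "server"), wrapped in a function so that nothing runs by accident; it also assumes the "tc" user may run sudo, which is Tiny Core's default:

```shell
transfer_disk_image() {
    # Stream the local VM disk to the server's disk, compressing in
    # transit, since the network link is the bottleneck here.
    dd if=/dev/vda bs=1M \
        | gzip -c \
        | ssh tc@server 'gunzip -c | sudo dd of=/dev/sda bs=1M'
}
```

Run it manually only after double-checking both device names: the remote dd overwrites the server's disk irrevocably.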

With the techniques demonstrated above, I can now say with confidence that slow IPMI virtual media will no longer stop me from remotely installing the Linux distribution that I need, exactly as I need.

Sunday, June 16, 2019

KExec on modern distributions

Many years ago, I mentioned that KExec is a good way to reduce server downtime due to reboots. It is still the case, and it is especially true for hosters such as OVH and Hetzner that make their dedicated servers boot from the network (so that one can activate the rescue system from the web) before passing control to the boot loader installed on the hard disk. Look: it takes 1.5 minutes to reboot an OVH server, and on some server models you can reduce this time to 15 seconds by avoiding all the time spent in the BIOS and in their netbooted micro-OS! Well, to be fair, on OVH you can also avoid the netbooted environment by changing the boot priorities in the BIOS.

The problem is, distributions make KExec too complicated to use. E.g., Debian and Ubuntu, when one installs kexec-tools, show a prompt asking whether KExec should handle reboots. The catch is that this works only with sysvinit, not with systemd. With systemd, you are supposed to remember to type "systemctl kexec" if you want the next reboot to be handled by KExec. And it's not only distributions: since systemd version 236, KExec is supported only together with UEFI and the "sd-boot" boot loader, while the majority of hosters still stick with the legacy boot process and the majority of Linux distributions still use GRUB2 as their boot loader. An attempt to run "systemctl kexec" on an unsupported setup results in this error message:

Cannot find the ESP partition mount point.

Or, if /boot is on mdadm-based RAID1, another, equally stupid and unhelpful, error:

Failed to probe partition scheme of "/dev/block/9:2": Input/output error

While switching to UEFI and sd-boot is viable in some cases, it is not always the case. Fortunately, there is a way to override systemd developers' stance on what's supported, and even make the "reboot" command invoke KExec. Note that the setup is a big bad unsupported hack. There are no guarantees that the setup below will work with future versions of systemd.

The trick is to create the service that loads the new kernel and to override the commands that systemd executes when doing the actual reboot.

Here is the unit that loads the new kernel.

# File: /etc/systemd/system/kexec-load.service
[Unit]
Description=Loading new kernel into memory
DefaultDependencies=no
Before=shutdown.target umount.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/sbin/kexec -d -l /vmlinuz --initrd=/initrd.img --reuse-cmdline

[Install]
WantedBy=default.target

It assumes that symlinks to the installed kernel and the initrd are available in the root directory, which is true for Debian and Ubuntu. On other systems please adjust the paths as appropriate. E.g., on Arch Linux, the correct paths are /boot/vmlinuz-linux and /boot/initramfs-linux.img.

There are other variants of this unit circulating around. A common mistake is that nothing ensures that the attempt to load the new kernel from /boot happens before the /boot partition is unmounted. The unit above does not have this race condition issue.

The second part of the puzzle is an override file that replaces the commands that reboot the system.

# File: /etc/systemd/system/systemd-reboot.service.d/override.conf
[Service]
ExecStart=
ExecStart=-/bin/systemctl --force kexec
ExecStart=/bin/systemctl --force reboot

That's it: try to kexec, and hopefully it does not return. If it does, then ignore the error and try the regular reboot.

For safety, let's also create a script that temporarily disables the override and thus performs one normal BIOS-based reboot.

#!/bin/sh
# File: /usr/local/bin/normal-reboot
mkdir -p /run/systemd/transient/systemd-reboot.service.d/
ln -sf /dev/null /run/systemd/transient/systemd-reboot.service.d/override.conf
ln -sf /dev/null /run/systemd/transient/kexec-load.service
systemctl daemon-reload

Give the script proper permissions and enable the service:

chmod 0755 /usr/local/bin/normal-reboot
systemctl enable kexec-load.service

If everything goes well, this will be the last BIOS-based reboot. Further reboots will be handled by KExec, even if you type "reboot".

This blog post would be incomplete without instructions on what to do if the setup fails. And it can fail for various reasons, e.g. due to incompatible hardware or a driver assuming that its device has been properly reset by the BIOS.

Well, the most common problem is with a corrupted graphical framebuffer console. In this case, it may be sufficient to add "nomodeset" to the kernel command line.

Other systems may not be fixable so easily, or at all. E.g., on some OVH dedicated servers (in particular, on their "EG-32" product which is based on the Intel Corporation S1200SPL board), the kexec-ed kernel cannot properly route IRQs, and therefore does not detect SATA disks, and the on-board Ethernet adapter also becomes non-functional. In such cases, it is necessary to hard-reset the server and undo the setup. Here is how:

systemctl disable kexec-load.service
rm -rf /etc/systemd/system/systemd-reboot.service.d
rm -f /etc/systemd/system/kexec-load.service
rm -f /usr/local/bin/normal-reboot
systemctl daemon-reload

This reboot, and all further reboots, will be going through the BIOS.

Friday, January 18, 2019

IPv6 dynamic DNS done right

Sometimes, your home PC or router does not have a static IP address. In this case, if you want to access your home network remotely, a common solution is to use a dynamic DNS provider, and configure your router to update its A record (a DNS record that holds an IPv4 address) each time the external address changes. Then you can be sure that your domain name always points to the current external IPv4 address of the router.

For accessing PCs behind the router, some trickery is needed, because usually there is only one public IPv4 address available for the whole network. Therefore, the router performs network address translation (NAT), and home PCs are not directly reachable. So, you have to either use port forwarding, or a full-blown VPN. Still, you need only one dynamic DNS record, because there is only one dynamic IP — the external IP of your router, and that's where you connect to.

Enter the IPv6 world. There is no NAT anymore, the ISP allocates a whole /64 (or maybe larger) prefix for your home network(s), and every home PC becomes reachable using its individual IPv6 address. Except, now all addresses are dynamic. On "Rostelecom" ISP in Yekaterinburg, Russia, they are dynamic even if you order a static IP address, i.e. only IPv4 is static then, and there is no way to get a statically allocated IPv6 network.

A typical IPv6 network has a prefix length of 64. It means that the first 64 bits denote the network, and are assigned (dynamically) by the ISP, while the lower 64 bits refer to the host and do not change when the ISP assigns the new prefix. Often but not always, the host part is just a MAC address with the second-lowest bit in the first octet inverted, and ff:fe inserted into the middle. This mechanism is often called EUI-64. For privacy reasons, typically there are also other short-lived IPv6 addresses on the interface, but let's ignore them.
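The EUI-64 transformation is easy to check by hand. Here is a small helper (a sketch; it assumes a plain aa:bb:cc:dd:ee:ff MAC format and an interface without privacy extensions):

```shell
# Derive the EUI-64 host part of an IPv6 address from a MAC address.
mac_to_eui64() {
    # Split the MAC into its six octets.
    set -- $(echo "$1" | tr ':' ' ')
    # Invert the second-lowest bit (0x02) of the first octet...
    first=$(printf '%02x' $(( 0x$1 ^ 0x02 )))
    # ...and insert ff:fe between the third and fourth octets.
    printf '%s%s:%sff:fe%s:%s%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
}

mac_to_eui64 00:06:29:6c:f3:e5   # prints 0206:29ff:fe6c:f3e5
mac_to_eui64 52:54:00:12:34:56   # prints 5054:00ff:fe12:3456
```

These are exactly the host parts that appear in the example AAAA records below.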

Unfortunately, many dynamic DNS providers have implemented their IPv6 support as a carbon copy of the IPv4 logic, even though this does not really make sense. That is, a dynamic DNS client can update its own AAAA record using some web API call, and that's it. If you run a dynamic DNS client on the router, then only the router's DNS record is updated, and there is still no way to access home PCs individually, short of running dynamic DNS clients on all of them. In other words, the fact that the addresses in the LAN are related, and should be updated as a group, is usually completely ignored.

The dynamic DNS provider <link removed> is a pleasant exception. After registration, you get a third-level domain corresponding to your home network. You can also add records to that domain, one for each of your home PCs. While doing so, you can either specify the full IPv6 address (as with traditional dynamic DNS providers), or only the host part, or the MAC address. The ability to specify only the host part (or infer it from the MAC address) is what makes the service useful. Indeed, if the parent record (corresponding to your whole network) changes, its network part is reapplied to all host records that do not specify the network part of their IPv6 address explicitly. So, you can run only one dynamic DNS client, on the router, and get domain names corresponding to all of your home PCs.

Let me illustrate this with an example.

Suppose that your router has obtained the following addresses from the ISP:

2001:db8:5:65bc:d0a2:1545:fbfe:d0b9/64 for the WAN interface
2001:db8:b:9a00::/56 as a delegated prefix

Then, it will (normally) use 2001:db8:b:9a00::1/64 as its LAN address, and PCs will get addresses from the 2001:db8:b:9a00::/64 network. You need to configure the router to update the AAAA record (let's call it <domain removed>) with its LAN IPv6 address. Yes, the LAN address (many router firmwares, including OpenWRT, get this wrong by default), because the WAN IPv6 address is completely unrelated to your home network. Then, using the web interface, create some additional AAAA records under that domain:

desktop AAAA ::0206:29ff:fe6c:f3e5   # corresponds to MAC address 00:06:29:6c:f3:e5
qemu AAAA ::5054:00ff:fe12:3456   # corresponds to MAC address 52:54:00:12:34:56

Or, you could enter MAC addresses directly.

As I have already mentioned, the beauty of this provider is that it does not interpret these addresses literally, but prepends the proper network part. That is, name resolution would actually yield reachable addresses:

<domain removed>         AAAA 2001:db8:b:9a00::1                  # The only record that the router has to update
desktop.<domain removed> AAAA 2001:db8:b:9a00:206:29ff:fe6c:f3e5  # Generated
qemu.<domain removed>    AAAA 2001:db8:b:9a00:5054:ff:fe12:3456   # Also generated

And you can, finally, connect remotely to any of those devices.