Tuesday, May 24, 2016

Is TSX busted on Skylake, too? No, it's just buggy software

The story about Intel recalling Transactional Synchronization Extensions
from Haswell and Broadwell lines of their CPUs by means of a microcode update has hit the web in the past. But it looks like this is not the end of the story.

The company I work for has a development server in Hetzner, and it uses this type of CPU:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model  : 94
model name : Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
stepping : 3
microcode : 0x39
cpu MHz  : 3825.265
cache size : 8192 KB
physical id : 0
siblings : 8
core id  : 0
cpu cores : 4
apicid  : 0
initial apicid : 0
fpu  : yes
fpu_exception : yes
cpuid level : 22
wp  : yes
flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology 
nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt 
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt 
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep 
bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 
dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs  :
bogomips : 6816.61
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

I.e. it is a Skylake. The server is running Ubuntu 16.04, and the CPU has HLE and RTM families of instructions.

One of my recent tasks was to prepare, on this server, an LXC container based on Ubuntu 16.04 with a lightweight desktop accessible over VNC, for "remote classroom" purposes. We already have such containers on other servers, but they were based on Ubuntu 14.04. Such containers work well on this server, too, but it's time to upgrade. In these old containers, we use a regular Xorg server with a "dummy" video driver, and export the screen using x11vnc.

So, I decided to clone the old container and update Ubuntu there. Result: x11vnc, or sometimes Xorg, now crashes (SIGSEGV) when one attempts to change the desktop resolution. The backtrace points into the __lll_unlock_elision() function which is a part of glibc implementation of mutexes for CPUs with Hardware Lock Elision instructions.

This crash doesn't happen when I run the same container on a server with an older CPU (which doesn't have TSX in the first place), or if I try to reproduce the bug at home (where I have a Haswell, with TSX disabled by the new microcode).

So, all apparently points to a bug related to these extensions. Or does it?

The __lll_unlock_elision() function has this helpful comment in it:

  /* When the lock was free we're in a transaction.
     When you crash here you unlocked a free lock.  */

And indeed, there is some discussion of another crash in __lll_unlock_elision(), related to NVidia driver (which is not used here). In that discussion, it was highlighted that an unlock of already-unlocked mutex would be silently ignored if a mutex implementation not optimized for TSX is used, but a CPU with TSX would expose such latent bug. Locking balance bugs are easily verified using valgrind. And indeed:

DISPLAY=:1 valgrind --tool=helgrind x11vnc
==4209== ---Thread-Announcement------------------------------------------
==4209== Thread #1 is the program's root thread
==4209== ----------------------------------------------------------------
==4209== Thread #1 unlocked a not-locked lock at 0x9CDA00
==4209==    at 0x4C326B4: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==4209==    by 0x4556B2: ??? (in /usr/bin/x11vnc)
==4209==    by 0x45A35E: ??? (in /usr/bin/x11vnc)
==4209==    by 0x466646: ??? (in /usr/bin/x11vnc)
==4209==    by 0x410E30: ??? (in /usr/bin/x11vnc)
==4209==    by 0x717D82F: (below main) (libc-start.c:291)
==4209==  Lock at 0x9CDA00 was first observed
==4209==    at 0x4C360BA: pthread_mutex_init (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==4209==    by 0x40FECC: ??? (in /usr/bin/x11vnc)
==4209==    by 0x717D82F: (below main) (libc-start.c:291)
==4209==  Address 0x9cda00 is in the BSS segment of /usr/bin/x11vnc

It is a software bug, not a CPU bug. But still - until such bugs are eliminated from the distribution, I'd rather not use it on a server with a CPU with TSX.


Peter Cordes said...

BTW, the exact reason that double-unlock crashes is that the XEND instruction raises a #GP(0) exception when executed outside of a transactional region. Apparently Linux delivers a SIGSEGV in this case.

So unless the unlock code checks (with XTEST) that you're inside a transaction before executing XEND, it's always going to get an exception from this bug.

I don't think slowing down the fast-path so that buggy software can double-unlock without noticing would be a good change.

I guess it would be nice if there was some kind of workaround to let programs keep working if they somehow happen to work correctly even though they have double-unlocking bugs, but without slowing down the fast path.

So they'd still run XEND, but instead of terminating from the resulting #GP(0), they'd instead keep running. They could install a SIGSEGV signal handler that checked the instruction bytes at the faulting address to see if it was 0F 01 D5 (XEND), and if so, adjust RIP in the process-context struct that is passed to signal handlers to resume execution after the END. (And maybe also log a message).

Sakari Soravuori said...

So let me get this straight, it's the end software that is misbehaving but thus far has been able to get away with it since pre-TSX processors do not care about the double-unlock? Also glibc has no error handling for this due to performance reasons I'm assuming? __lll_unlock_elision() is just there to blindly unlock the mutex?

Alexander Patrakov said...

Yes, you are right

Alexander Patrakov said...

For the record, we have encountered two more instances of this bug: