Hot Reloading Rust: Windows and Linux

Recently, I’ve been implementing hot reloading for a new game engine I’m writing in Rust. If you’re interested in the subject, you may have seen the Faster Than Lime article, which is one of the most detailed sources out there. Although Amos encountered major issues implementing hot reloading on Linux, I was surprised to discover that hot reloading works just fine on Windows!

What makes these platforms different? Read on!

Hot Reloading and Thread Local Storage

Check out the Faster Than Lime article for the gory details, but to summarize: Hot reloading interacts poorly with values in Thread Local Storage, when these values have destructors. (Any object that allocates, such as Vec, Box, String, or Rc, must run a destructor to free the underlying memory.) To see why, let’s consider this situation:

  1. The main thread loads a dynamic library.

  2. Thread 2 calls a function in the library, which creates a TLS value x, associated with Thread 2.

  3. The main thread unloads the dynamic library.

  4. Thread 2 exits.

When should the destructor for x be run?

Normally, TLS destructors run when a thread exits (step 4), but the dynamic library has already unloaded! The code for the destructor may be long gone! Clearly it cannot run at step 4.

Step 3 is a possibility, but it’s unsafe to access a TLS value from a different thread than the one it was created on. At step 3, we’re running on the main thread. Thread 2 may be busy, or even using that TLS value, and we can’t just interrupt it to tell it to run a destructor. Clearly, we cannot destruct the value at step 3!

We shouldn’t leak the value — it could be holding a socket or other RAII resource.

What Linux Does

The second half of the Faster Than Lime article explores this problem. What Linux does, in this case, is to skip unloading the dynamic library. Even if you request unloading the library at step 3, it will hang on to the library until all threads with TLS values from the dynamic library have exited. (By waiting for the thread to exit, the TLS destructors can be run normally, because the dynamic library is still loaded.)

This means any thread that enters the dynamic library becomes “poisoned”. The library now can’t be unloaded until all of these threads exit. Worse, if the main thread ever calls into the library, the library can never be unloaded. Once Rust registers a TLS destruction callback via __cxa_thread_atexit, we’re stuck.
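The bookkeeping described above can be pictured with a toy model. This is only a sketch of the behavior as summarized in this article, not glibc’s actual code: `LoadedLibrary`, `register_tls_dtor`, `thread_exited`, and `request_unload` are hypothetical names, and the single per-library counter is an assumption made for illustration.

```rust
/// Toy model of a loaded library (hypothetical; NOT how glibc is implemented).
#[derive(Debug)]
struct LoadedLibrary {
    /// TLS destructors registered by library code that have not run yet.
    pending_tls_dtors: usize,
    /// Whether the library's code is still mapped into the process.
    mapped: bool,
}

impl LoadedLibrary {
    fn load() -> Self {
        LoadedLibrary { pending_tls_dtors: 0, mapped: true }
    }

    /// A thread created a TLS value with a destructor inside the library.
    fn register_tls_dtor(&mut self) {
        self.pending_tls_dtors += 1;
    }

    /// A thread owning `count` such values exited and ran their destructors.
    fn thread_exited(&mut self, count: usize) {
        self.pending_tls_dtors = self.pending_tls_dtors.saturating_sub(count);
    }

    /// Models an unload request; returns true only if the code was unmapped.
    fn request_unload(&mut self) -> bool {
        if self.pending_tls_dtors == 0 {
            self.mapped = false;
        }
        !self.mapped
    }
}

fn main() {
    let mut lib = LoadedLibrary::load();
    lib.register_tls_dtor(); // thread 2 touches a TLS value in the library
    println!("unloaded: {}", lib.request_unload()); // unloaded: false
    lib.thread_exited(1); // thread 2 exits, its destructor runs
    println!("unloaded: {}", lib.request_unload()); // unloaded: true
}
```

In this model, a main thread that registers a destructor never reaches `thread_exited` while the program is running, which is the “can never be unloaded” case.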

Obviously, this is not great for our hot-reloading plans. Two options cross my mind:

Frankly, neither of those solutions is great. If you’re aware of a better solution on Linux, I’d love to know. But as I’m primarily developing for Windows, I won’t dig deeper here.

What Windows Does

Unlike Linux, the Windows TLS API does not take a destruction callback. Instead, the DllMain function of the dynamic library is called whenever threads are created and destroyed. It’s up to the author of the dynamic library to manually manage the creation and destruction of TLS values. So in this case, it’s up to Rust how it wants to implement TLS Destructors.

What does Rust do?

Let’s write a short program to test out the behavior of TLS destructors on Windows, simulating the scenario at the start of this article. For our TLS value, we’re using a custom type that prints whenever it is constructed or drops.
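A TLS value of this kind could look like the sketch below. It is a standalone, single-process illustration, not the article’s harness or plugin: the names `X`, `TLS_X`, `run_worker`, and the two counters are hypothetical, and no dynamic library is involved. It only shows the two properties that matter later: the value is created lazily on first use, and its destructor normally runs when the owning thread exits.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use std::thread::{self, ThreadId};

// Global counters so the caller can observe constructions and drops.
static INITS: AtomicUsize = AtomicUsize::new(0);
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical TLS payload that reports when it is created and dropped.
struct X {
    /// Recorded at construction so the destructor need not query the thread.
    owner: ThreadId,
}

impl X {
    fn new() -> Self {
        let owner = thread::current().id();
        INITS.fetch_add(1, SeqCst);
        println!("(TLS) Initialized X on thread {:?}", owner);
        X { owner }
    }
}

impl Drop for X {
    fn drop(&mut self) {
        DROPS.fetch_add(1, SeqCst);
        println!("(TLS) Dropping X on thread {:?}", self.owner);
    }
}

thread_local! {
    // Lazily initialized: `X::new` runs on the first access from each thread.
    static TLS_X: X = X::new();
}

/// Spawns a worker, optionally touches the TLS value, and joins it.
/// Returns how many (inits, drops) happened because of that worker.
fn run_worker(touch: bool) -> (usize, usize) {
    let before = (INITS.load(SeqCst), DROPS.load(SeqCst));
    thread::spawn(move || {
        if touch {
            TLS_X.with(|x| assert_eq!(x.owner, thread::current().id()));
        }
    })
    .join()
    .expect("worker thread panicked");
    (INITS.load(SeqCst) - before.0, DROPS.load(SeqCst) - before.1)
}

fn main() {
    println!("untouched worker: {:?}", run_worker(false));
    println!("touching worker:  {:?}", run_worker(true));
}
```

The run shown next comes from the article’s own harness and plugin, not from this sketch.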

PS C:\Projects\hotreload> .\harness.exe
Loading library on main thread. Thread=19944.
Calling plugin function on thread 2. Thread=28996
(TLS) Initialized X on thread 28996
Unloading plugin on main thread. Thread=19944
Finished thread 2. Thread=28996

The behavior we see is:

  1. The plugin is loaded.

  2. The TLS value is lazily initialized on first use. (Note the fact that TLS values are lazy! If you don’t use them, they never get created.)

  3. The plugin is unloaded.

  4. The TLS value is not dropped.

Odd... The TLS value is never dropped: its ‘Dropping’ message never printed. But the plugin library was successfully unloaded, which we can confirm with WinDBG’s module explorer. What’s happening here?

Let’s dig a bit deeper. The beauty of an open source standard library is that we can just browse the code! Destructors for TLS values are handled in the standard library by std::sys::windows::thread_local_key::on_tls_callback.

The documentation for this file is absolutely fantastic, so I highly recommend giving it a read. on_tls_callback is called by the OS when a thread is destroyed (DLL_THREAD_DETACH), or when the process exits or the library is unloaded (DLL_PROCESS_DETACH).

We can set a breakpoint in this function to see what happens. Note that in Rust, the standard library is compiled statically into both the main executable and the dynamic library, so there are actually two copies of on_tls_callback in our program. The one in the dynamic library is the one that will be called when the library unloads.

In WinDBG we can use the command: bu plugin!std::sys::windows::thread_local_key::on_tls_callback.

Stepping through we see the following behavior:

dwReason (aka register edx) contains the value of 0 (DLL_PROCESS_DETACH) when the library unloads. This is what we expect based on Microsoft documentation, for an unloading dynamic library.

Both run_dtors and run_keyless_dtors are called, but when stepping through, both of them have an empty list of destructors! So we’re not running any destructors at all.

This makes some sense though. The library is being unloaded on the main thread, but in our use case, the main thread never calls into the library! The TLS values in our plugin are never initialized on the main thread, so there’s nothing to destroy.
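The reason-code check discussed above can be sketched in plain Rust. The numeric values are the ones defined in the Windows SDK headers; `reason_name` and `should_run_tls_dtors` are hypothetical helpers written for this illustration, and the condition is an assumption based on the description above (destructors are attempted on thread detach and on process detach), not a copy of the standard library’s code.

```rust
// Reason codes passed to DllMain and TLS callbacks (values from the Windows SDK).
const DLL_PROCESS_DETACH: u32 = 0;
const DLL_PROCESS_ATTACH: u32 = 1;
const DLL_THREAD_ATTACH: u32 = 2;
const DLL_THREAD_DETACH: u32 = 3;

/// Hypothetical helper: human-readable name for a reason code.
fn reason_name(reason: u32) -> &'static str {
    match reason {
        DLL_PROCESS_DETACH => "DLL_PROCESS_DETACH",
        DLL_PROCESS_ATTACH => "DLL_PROCESS_ATTACH",
        DLL_THREAD_ATTACH => "DLL_THREAD_ATTACH",
        DLL_THREAD_DETACH => "DLL_THREAD_DETACH",
        _ => "unknown",
    }
}

/// Assumed condition: TLS destructors are attempted only on the two
/// detach notifications, and only for the thread receiving the callback.
fn should_run_tls_dtors(reason: u32) -> bool {
    matches!(reason, DLL_THREAD_DETACH | DLL_PROCESS_DETACH)
}

fn main() {
    for reason in 0..=3u32 {
        println!(
            "{} ({}): run destructors = {}",
            reason_name(reason),
            reason,
            should_run_tls_dtors(reason)
        );
    }
}
```

A reason of 0 passes the check, so the destructor lists are walked on unload. They are simply empty for the main thread in this scenario.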

So what happens if we instead unload the plugin on thread 2?

PS C:\Projects\hotreload> .\harness.exe
Loading library on main thread. Thread=6456.
Calling plugin function on thread 2. Thread=14024
(TLS) Initialized X on thread 14024
Unloading plugin on thread 2. Thread=14024
(TLS) Dropping X on thread 14024
Finished thread 2. Thread=14024

Oho! So now we’re properly dropping our TLS value.


This leads to our final conclusion:

Rust will run the TLS destructors for the thread that unloads the library. All other TLS values are leaked. Note: Rust only runs the TLS destructors associated with the unloading library. The TLS values associated with the primary executable will be left alone — as they should, because the rest of the program is still running!
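This rule can be captured in a small model. It is a toy simulation of the observed behavior, not the standard library’s implementation: `PluginTls`, `init`, and `unload_on` are hypothetical names, TLS values are represented by plain labels, and the thread ids are borrowed from the first run above purely for illustration.

```rust
use std::collections::HashMap;

/// Toy model: labels of the plugin's live TLS values, grouped by owning thread.
struct PluginTls {
    values: HashMap<u32, Vec<&'static str>>,
}

impl PluginTls {
    fn new() -> Self {
        PluginTls { values: HashMap::new() }
    }

    /// `thread` used a TLS value for the first time, creating it.
    fn init(&mut self, thread: u32, name: &'static str) {
        self.values.entry(thread).or_default().push(name);
    }

    /// The plugin is unloaded from `thread`. Returns (dropped, leaked):
    /// only the unloading thread's values get their destructors run.
    fn unload_on(mut self, thread: u32) -> (Vec<&'static str>, Vec<&'static str>) {
        let dropped = self.values.remove(&thread).unwrap_or_default();
        let mut leaked: Vec<_> = self.values.into_values().flatten().collect();
        leaked.sort();
        (dropped, leaked)
    }
}

fn main() {
    const MAIN: u32 = 19944; // illustrative ids taken from the first run
    const THREAD_2: u32 = 28996;

    // Scenario 1: value created on thread 2, plugin unloaded on the main thread.
    let mut plugin = PluginTls::new();
    plugin.init(THREAD_2, "X");
    println!("{:?}", plugin.unload_on(MAIN)); // ([], ["X"])

    // Scenario 2: value created and plugin unloaded on thread 2.
    let mut plugin = PluginTls::new();
    plugin.init(THREAD_2, "X");
    println!("{:?}", plugin.unload_on(THREAD_2)); // (["X"], [])
}
```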

The leaks are unfortunate, but to be fair, that’s probably the best we can do in this situation. Windows is unloading the library whether we like it or not, and the TLS destructors on the current thread are the only ones that are safe to access.

A silver lining of this approach is that if the library is only used from a single thread, hot-reloading works perfectly: it can be loaded and unloaded promptly, and without TLS leaks. And unlike Linux, that thread can continue to live on past the usage of the dynamic library.

Hot-reloading in Rust

So which strategy is better, Windows or Linux? Honestly, it feels like a bit of a wash.

Linux is arguably the safest. It guarantees that no TLS values are leaked. However, it does that... by leaking the entire dynamic library itself. I’m not sure that’s an obvious win, especially if you’re running on the main thread, and it’s certainly a problem for implementing hot-reloading on Linux.

Rust’s Windows implementation is more flexible, but this leads to situations where TLS values are leaked, which is dangerous for RAII resources. That flexibility, however, lets us make our own choice. If we know the TLS values are safe to leak for our plugins, hot-reloading works just fine. And so far I haven’t actually found a leaked TLS value that has caused issues. I’d be curious to hear from others who have examples of TLS values with RAII semantics.

At the end of the day, hot-reloading in Rust is usable on Windows by default, so it’s hard not to at least give it a point there. At least for me as a game developer, hot-reloaded modules are a critical engine feature.

Perhaps the real criminal here is thread-local storage itself. The very nature of tying value lifetime to thread lifetime conflicts with the ability to unload dynamic libraries. But as far as performance goes, it’s hard to imagine life without it. Perhaps one day Rust could expose a different primitive that provides the benefits of both!
