Using thread_local in C++11

thread_local is a specifier for variable declarations that C++11 introduced to help write thread-safe code.

Introduction to thread_local

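thread_local is a storage class specifier.

A storage class specifier is part of a declaration and controls two properties of the declared name: its storage duration and its linkage. The specifiers in this family are:
auto: automatic storage duration (a storage class specifier only before C++11; since C++11 the keyword is used for type deduction instead);
register: automatic storage duration, plus a hint that the compiler should keep the variable in a register (deprecated in C++11);
static: static or thread storage duration, and internal linkage;
extern: static or thread storage duration, and external linkage;
thread_local: thread storage duration;
mutable: does not affect storage duration or linkage.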

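The reference description of thread_local reads:

The thread_local keyword is only allowed for objects declared at namespace scope, objects declared at block scope, and static data members. It indicates that the object has thread storage duration. It can be combined with static or extern to specify internal or external linkage respectively (except for static data members, which always have external linkage), but that additional static does not affect the storage duration.
Thread storage duration: the storage for the object is allocated when the thread begins and deallocated when the thread ends. Each thread has its own instance of the object. Only objects declared thread_local have this storage duration. thread_local can appear together with static or extern to adjust linkage.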

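One important point follows from this: for a block-scope variable, static thread_local and thread_local are equivalent declarations. Both give the variable thread storage duration, so it persists between calls the way a static local does, but separately in each thread. What does that mean in practice? A code example makes it concrete.

Below is a thread-safe function that generates uniformly distributed random numbers: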

#include <random>

inline void random_uniform_float(float *const dst, const int len, const int min = 0, const int max = 1)
{
    // The generator is created only once per thread; the distribution can be rebuilt on every call.
    static thread_local std::default_random_engine generator;     // heavy
    std::uniform_real_distribution<float> distribution(min, max); // light
    for (int i = 0; i < len; ++i)
    {
        dst[i] = distribution(generator);
    }
}

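generator is a function-local static variable. In principle such a variable is one and the same object across all calls to the function (static storage duration), whereas distribution is an ordinary local that is created on every call. With thread_local, the storage duration of generator changes from static to thread: within one thread it behaves exactly like a function-local static, but each thread has its own instance. You can think of thread_local as narrowing the variable's storage duration from the whole program to a single thread. The C++ standard also states that a block-scope thread_local variable is implicitly static: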

When thread_local is applied to a variable of block scope the storage-class-specifier static is implied if it does not appear explicitly
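The implied static and the per-thread instance can both be observed by comparing addresses. The following is a minimal sketch rather than part of the original example; the helper names engine_address and engines_are_per_thread are made up for illustration.

```cpp
#include <random>
#include <thread>

// Hypothetical helper: returns the address of the calling thread's engine.
// `static` is omitted on purpose; block-scope thread_local implies it.
inline std::default_random_engine* engine_address()
{
    thread_local std::default_random_engine generator;
    return &generator;
}

// Returns true if two calls on one thread see the same engine
// while a second thread sees a different one.
inline bool engines_are_per_thread()
{
    std::default_random_engine* first = engine_address();
    std::default_random_engine* second = engine_address();

    bool other_thread_differs = false;
    // The comparison happens inside the worker, while its engine is still alive.
    std::thread worker([&] { other_thread_differs = (engine_address() != first); });
    worker.join();  // join() makes the worker's write visible to this thread

    return first == second && other_thread_differs;
}
```

Calling engines_are_per_thread() from any thread should return true: repeated calls in one thread reuse one engine, and a new thread gets its own.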

I won't dwell on the definition of thread_local any further; the code examples below show how it behaves.

thread_local usage examples

Global variables

#include <iostream>
#include <mutex>
#include <string>
#include <thread>
std::mutex cout_mutex;    // serializes printing from multiple threads

thread_local int x = 1;

void thread_func(const std::string& thread_name) {
    for (int i = 0; i < 3; ++i) {
        x++;
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "thread[" << thread_name << "]: x = " << x << std::endl;
    }
    return;
}

int main() {
    std::thread t1(thread_func, "t1");
    std::thread t2(thread_func, "t2");
    t1.join();
    t2.join();
    return 0;
}

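In main, the code above creates two threads, t1 and t2, and passes each a string to use as the thread name in its output. It then calls join() to wait for both threads to finish.

In thread_func, each thread increments x three times and prints the value after every increment. Because x is thread_local, each thread works on its own copy and the two threads do not interfere with each other, so both count 2, 3, 4 independently. std::cout, on the other hand, is shared, so std::lock_guard<std::mutex> lock(cout_mutex) guards it to keep the lines from different threads from interleaving.

Output (the order in which the two threads' lines appear may vary between runs):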

thread[t2]: x = 2  
thread[t2]: x = 3  
thread[t2]: x = 4  
thread[t1]: x = 2  
thread[t1]: x = 3  
thread[t1]: x = 4

As the output shows, a global thread_local variable is incremented separately in each thread, with no interference between threads.
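The same isolation applies to the thread that starts the workers: its own copy is never touched. The sketch below illustrates this under assumed names (counter, bump, Result, and run_two_workers are not from the original) and returns the values instead of printing them, so no mutex is needed.

```cpp
#include <thread>

thread_local int counter = 1;  // hypothetical name; one instance per thread

// Increments the calling thread's copy `times` times and returns it.
int bump(int times)
{
    for (int i = 0; i < times; ++i) {
        ++counter;
    }
    return counter;
}

struct Result {
    int worker1;  // final value seen by the first worker thread
    int worker2;  // final value seen by the second worker thread
    int caller;   // value seen by the thread that started the workers
};

Result run_two_workers()
{
    Result r{0, 0, 0};
    // Each lambda writes a different member of r, so there is no data race.
    std::thread t1([&r] { r.worker1 = bump(3); });
    std::thread t2([&r] { r.worker2 = bump(5); });
    t1.join();
    t2.join();
    r.caller = counter;  // the calling thread never incremented its own copy
    return r;
}
```

If the calling thread has not modified counter itself, run_two_workers() should yield 4 and 6 for the workers (each starts from 1) and 1 for the caller.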

Local variables

#include <iostream>
#include <mutex>
#include <string>
#include <thread>
std::mutex cout_mutex;    // serializes printing from multiple threads

void thread_func(const std::string& thread_name) {
    for (int i = 0; i < 3; ++i) {
        thread_local int x = 1;
        x++;
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "thread[" << thread_name << "]: x = " << x << std::endl;
    }
    return;
}

int main() {
    std::thread t1(thread_func, "t1");
    std::thread t2(thread_func, "t2");
    t1.join();
    t2.join();
    return 0;
}

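This code uses std::thread and std::mutex from the standard library to run several threads concurrently. thread_func takes a string parameter, thread_name, that names the thread. Inside the loop it declares a thread_local variable x, of which each thread has its own instance. On every iteration x is incremented by 1, and a mutex keeps the output thread-safe. main creates two threads, t1 and t2, that both run thread_func, and calls join to wait for them before the program exits.

Output: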

thread[t2]: x = 2 
thread[t2]: x = 3 
thread[t2]: x = 4 
thread[t1]: x = 2 
thread[t1]: x = 3 
thread[t1]: x = 4

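Although x is a local variable, every iteration of the for loop within a thread uses the same per-thread object. This confirms that a block-scope thread_local variable is implicitly static.

Without thread_local, the output is: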

thread[t2]: x = 2  
thread[t2]: x = 2  
thread[t2]: x = 2  
thread[t1]: x = 2  
thread[t1]: x = 2  
thread[t1]: x = 2

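This is how an ordinary local variable behaves: x is re-created and re-initialized to 1 on every iteration.

One more point deserves attention. thread_local changes a variable's storage duration, but it does not change where the name can be used, that is, its scope. The local variable above cannot be used outside the for loop; doing so is a compile error.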

void thread_func(const std::string& thread_name) {
    for (int i = 0; i < 3; ++i) {
        thread_local int x = 1;
        x++;
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "thread[" << thread_name << "]: x = " << x << std::endl;
    }
    x++;    // compile error: ‘x’ was not declared in this scope
    return;
}

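In the code above, thread_func is the function passed to the std::thread constructor. Its for loop declares a thread_local variable x in the loop body. thread_local declares a thread-local variable: every thread has its own independent instance, so the x in one thread never affects the x in another.

Inside the loop body, x is incremented, and std::lock_guard protects the print statement so that output from concurrent threads does not get mixed up. The code then prints the thread name and the value of x.

After the loop, the code tries to increment x again. At that point the scope of x has already ended, so the compiler reports error: ‘x’ was not declared in this scope, and the line is ill-formed.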
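Since the object outlives the scope of its name, one common way to reach it from elsewhere is to hand out a reference from the function that declares it. This is only a sketch of that idea; per_thread_x, loop_then_increment, and run_in_new_thread are illustrative names that do not appear in the original code.

```cpp
#include <thread>

// Hypothetical accessor: the name x stays scoped to this function,
// but the object it names lives as long as the calling thread.
int& per_thread_x()
{
    thread_local int x = 1;
    return x;
}

// Runs the loop from the example, then touches x after the loop
// through the accessor instead of through the out-of-scope name.
int loop_then_increment()
{
    for (int i = 0; i < 3; ++i) {
        per_thread_x()++;
    }
    per_thread_x()++;  // legal: the name x is not used outside its scope
    return per_thread_x();
}

// Runs loop_then_increment() on a fresh thread and returns its result.
int run_in_new_thread()
{
    int result = 0;
    std::thread t([&result] { result = loop_then_increment(); });
    t.join();
    return result;
}
```

Every fresh thread starts from x = 1, so run_in_new_thread() should return 5 each time it is called (three increments in the loop plus one after it).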

Class objects

#include <iostream>
#include <mutex>
#include <string>
#include <thread>
std::mutex cout_mutex;

// Define the class
class A {
public:
    A() {
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "create A" << std::endl;
    }

    ~A() {
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "destroy A" << std::endl;
    }

    int counter = 0;
    int get_value() {
        return counter++;
    }
};

void thread_func(const std::string& thread_name) {
    for (int i = 0; i < 3; ++i) {
        thread_local A* a = new A();
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "thread[" << thread_name << "]: a.counter:" << a->get_value() << std::endl;
    }
    return;
}

int main() {
    std::thread t1(thread_func, "t1");
    std::thread t2(thread_func, "t2");
    t1.join();
    t2.join();
    return 0;
}

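The code above uses thread-local storage to give each thread its own instance of a class.

It first defines a class A with a counter member and a member function get_value(), which returns the current value of the counter and then increments it.

main creates two threads, t1 and t2, that both run thread_func(). thread_func() declares a thread_local pointer a to an A object, then calls get_value() through a and prints the returned counter value.

Because a is thread-local, each thread has its own pointer and therefore its own A object, and every call to get_value() in a thread operates on that thread's object. This avoids the data races that would arise if several threads modified the same object.

The constructor and destructor of A also lock the mutex. The lock is there to protect the shared std::cout: only one thread at a time can print, so the create A and destroy A messages cannot interleave with other output.

Output: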

create A  
thread[t1]: a.counter:0  
thread[t1]: a.counter:1  
thread[t1]: a.counter:2  
create A  
thread[t2]: a.counter:0  
thread[t2]: a.counter:1  
thread[t2]: a.counter:2

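As the output shows, class objects behave like built-in types here: the object is created once per thread rather than once per loop iteration. Note that destroy A is never printed, because only the pointer a is thread-local and the A object allocated with new is never deleted. The behavior is the same when the instance is created in another function or obtained from a function's return value: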

A* creatA() {
    return new A();
}

void loopin_func(const std::string& thread_name) {
    thread_local A* a = creatA();    // initializer runs once per thread
    std::lock_guard<std::mutex> lock(cout_mutex);
    std::cout << "thread[" << thread_name << "]: a.counter:" << a->get_value() << std::endl;
    return;
}

void thread_func(const std::string& thread_name) {
    for (int i = 0; i < 3; ++i) {
        loopin_func(thread_name);
    }
    return;
}

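This version moves the creation and use of the object into separate functions.

creatA() returns a pointer to a new A object. In loopin_func, the pointer returned by creatA() initializes thread_local A* a, so the pointer variable has thread storage duration. The function then calls get_value() through a and prints the returned value.

thread_func calls loopin_func in a loop. Only the first call in each thread runs the initializer and creates an A object; later calls reuse that object and print its increasing counter.

Because a is thread-local, each thread has its own a and its own A object. The mutex again keeps the output thread-safe.

Output: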

create A
thread[t1]: a.counter:0 
thread[t1]: a.counter:1 
thread[t1]: a.counter:2 
create A 
thread[t2]: a.counter:0 
thread[t2]: a.counter:1 
thread[t2]: a.counter:2

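Although creatA() appears to be called several times, it actually runs only once per thread: a thread_local variable is initialized the first time the thread's execution reaches its declaration, and it is initialized only once.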
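The once-per-thread initialization can also be checked by counting how often the initializer runs. The sketch below replaces class A with a plain int to stay short; init_calls, make_start_value, next_value, and count_initializations are assumed names standing in for creatA() and get_value().

```cpp
#include <atomic>
#include <thread>

std::atomic<int> init_calls{0};  // hypothetical counter of initializer runs

// Stand-in for creatA(): records each call and returns a start value.
int make_start_value()
{
    ++init_calls;
    return 0;
}

// Returns the previous value of the calling thread's counter, like get_value().
int next_value()
{
    thread_local int value = make_start_value();  // runs once per thread
    return value++;
}

// Calls next_value() three times on each of two threads and
// returns how many times the initializer ran in total.
int count_initializations()
{
    init_calls = 0;
    auto body = [] {
        for (int i = 0; i < 3; ++i) {
            next_value();
        }
    };
    std::thread t1(body);
    std::thread t2(body);
    t1.join();
    t2.join();
    return init_calls.load();
}
```

Six calls to next_value() are made in total, but count_initializations() should return 2, one initialization for each thread.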

By the same reasoning, things change if the pointer is assigned rather than initialized:

void loopin_func(const std::string& thread_name) {
    thread_local A* a;
    a = creatA();
    std::lock_guard<std::mutex> lock(cout_mutex);
    std::cout << "thread[" << thread_name << "]: a.counter:" << a->get_value() << std::endl;
    return;
}

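This version of loopin_func declares the thread_local pointer a without an initializer, so a is zero-initialized to a null pointer once per thread. The next line is an ordinary assignment: it calls creatA() to create a new A object and stores the pointer in a. Finally, the function prints the object's counter value.

Because the assignment is a normal statement rather than part of the initialization, it runs on every call to loopin_func. Each call therefore creates a new A object whose counter starts at 0, and the pointer to the previous object is overwritten, which leaks that object.

Output: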

create A
thread[t1]: a.counter:0
create A
thread[t1]: a.counter:0
create A
thread[t1]: a.counter:0
create A
thread[t2]: a.counter:0
create A
thread[t2]: a.counter:0
create A
thread[t2]: a.counter:0

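Clearly, a thread_local variable is initialized only once but can be assigned many times, so the distinction between initialization and assignment in C++ really matters (only half joking).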
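The assignment version also never deletes the objects it replaces, and the raw-pointer examples never print destroy A at all. One possible way to get cleanup, sketched here as an assumption rather than as the original author's approach, is to make the thread_local variable a std::unique_ptr. Tracked is a simplified stand-in for class A, and the counters and function names are made up for the illustration.

```cpp
#include <atomic>
#include <memory>
#include <thread>

std::atomic<int> created{0};
std::atomic<int> destroyed{0};

// Simplified stand-in for class A that only counts its lifetime events.
struct Tracked {
    Tracked() { ++created; }
    ~Tracked() { ++destroyed; }
    int counter = 0;
};

// Reassigns on every call, as in the example above, but through unique_ptr:
// each reset deletes the previous object, and the last object is deleted
// when the thread exits and its thread_local unique_ptr is destroyed.
void reassign_each_call()
{
    thread_local std::unique_ptr<Tracked> a;
    a.reset(new Tracked());
    a->counter++;
}

struct Counts {
    int constructed;
    int destructed;
};

// Runs `calls` reassignments on one new thread and reports the totals.
Counts run_one_thread(int calls)
{
    created = 0;
    destroyed = 0;
    std::thread t([calls] {
        for (int i = 0; i < calls; ++i) {
            reassign_each_call();
        }
    });
    t.join();  // the thread, including its thread_local cleanup, has finished
    return Counts{created.load(), destroyed.load()};
}
```

With three calls, run_one_thread(3) should report three constructions and three destructions, so nothing is leaked.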

Class member variables

Rule: a thread_local data member of a class must be declared static.

class B {
public:
    B() {
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "create B" << std::endl;
    }
    ~B() {}
    thread_local static int b_key;
    //thread_local int b_key;    // error: a non-static data member cannot be thread_local
    int b_value = 24;
    static int b_static;
};

thread_local int B::b_key = 12;
int B::b_static = 36;

void thread_func(const std::string& thread_name) {
    B b;
    for (int i = 0; i < 3; ++i) {
        b.b_key--;
        b.b_value--;
        b.b_static--;   // not thread safe
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "thread[" << thread_name << "]: b_key:" << b.b_key << ", b_value:" << b.b_value << ", b_static:" << b.b_static << std::endl;
        std::cout << "thread[" << thread_name << "]: B::key:" << B::b_key << ", b_value:" << b.b_value << ", b_static: " << B::b_static << std::endl;
    }
    return;
}

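This code defines a class B with three data members: b_key, b_value, and b_static. b_key is a static thread_local member, b_value is an ordinary non-static member, and b_static is an ordinary static member. The constructor prints create B, and the destructor is empty.

thread_func takes a string parameter that names the current thread. It creates a B object b and loops three times. On each iteration it decrements b_key, b_value, and b_static, then locks the mutex and prints the three values twice: once accessing the static members through the object b, and once through the class name (B::b_key and B::b_static).

Output: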

create B
thread[t2]: b_key:11, b_value:23, b_static:35
thread[t2]: B::key:11, b_value:23, b_static: 35
thread[t2]: b_key:10, b_value:22, b_static:34
thread[t2]: B::key:10, b_value:22, b_static: 34
thread[t2]: b_key:9, b_value:21, b_static:33
thread[t2]: B::key:9, b_value:21, b_static: 33
create B
thread[t1]: b_key:11, b_value:23, b_static:32
thread[t1]: B::key:11, b_value:23, b_static: 32
thread[t1]: b_key:10, b_value:22, b_static:31
thread[t1]: B::key:10, b_value:22, b_static: 31
thread[t1]: b_key:9, b_value:21, b_static:30
thread[t1]: B::key:9, b_value:21, b_static: 30

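b_key is thread_local: although it is also static, there is one instance per thread, shared by all uses within that thread. b_static is a plain static member: there is one instance for the whole program, shared by all threads.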
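The code above marks the decrement of b_static as not thread safe. The sketch below shows the same contrast with the shared member made atomic, which is one possible fix and not something the original code does. Counters, Snapshot, and decrement_on_two_threads are illustrative names.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Hypothetical variant of class B: the shared static is atomic so that
// concurrent decrements are well defined.
struct Counters {
    static thread_local int per_thread;  // one instance per thread
    static std::atomic<int> shared;      // one instance for the whole program
};

thread_local int Counters::per_thread = 12;
std::atomic<int> Counters::shared{36};

struct Snapshot {
    int t1_per_thread;
    int t2_per_thread;
    int shared;
};

Snapshot decrement_on_two_threads()
{
    Counters::shared = 36;
    Snapshot s{0, 0, 0};
    // Decrements both counters three times, then reports this thread's copy.
    auto body = [](int& out) {
        for (int i = 0; i < 3; ++i) {
            --Counters::per_thread;
            --Counters::shared;
        }
        out = Counters::per_thread;
    };
    std::thread t1(body, std::ref(s.t1_per_thread));
    std::thread t2(body, std::ref(s.t2_per_thread));
    t1.join();
    t2.join();
    s.shared = Counters::shared.load();
    return s;
}
```

Each thread should end with its own per_thread value of 9, while the shared counter drops by six in total to 30, matching the final values in the output above.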
