C++ 中文周刊 2024-06-30 第162期

2024-07-30 15:23:21 浏览数 (2)

本期文章由 HMY lhmouse赞助

资讯

标准委员会动态/ide/编译器信息放在这里

最近c 26还在开会,前方记者Mick有报道

后续有其他报道咱们也会转发一下

编译器信息最新动态推荐关注hellogcc公众号 本周更新 2024-06-26 第260期

文章

Member ordering and binary sizes

https://www.sandordargo.com/blog/2024/06/26/member-ordering-and-binary-size

就是字段排序可能存在空洞 导致padding填充

比如

代码语言:javascript复制
class Example1 {
public:
    double a;// = 4.2;   // 8 bytes
    int b;// = 1;      // 4 bytes
    float c;    // 4 bytes
    char d;     // 1 byte
    bool e;     // 1 byte
    bool f;     // 1 byte
    // Assuming typical alignment, 'a' (8 bytes) should be first,
    // followed by 'b' and 'c' (both 4 bytes), and then 'd' and 'e' (1 byte each).
};

static_assert(sizeof(Example1) == 24);
struct Example2 {
    int b;//=1;      // 4 bytes
    char d;     // 1 byte
    float c;    // 4 bytes
    bool e;     // 1 byte
    double a;// = 4.2;   // 8 bytes
    bool f;
};

static_assert(sizeof(Example2) == 32);
代码语言:javascript复制

popcnt也能向量化?

看个乐

UB or not UB: How gcc and clang handle statically known undefined behaviour

https://diekmann.uk/blog/2024-06-25-statically-known-undefined-behaviour.html

省流 gcc遇到UB倾向于生成ud2 崩溃,clang遇到UB倾向于不崩溃,把影响抹除

How the STL uses explicit

https://quuxplusone.github.io/blog/2024/06/25/most-stl-ctors-arent-explicit-but-yours-still-should-be/

标准库用的还是比较保守的

Efficiently allocating lots of std::shared_ptr

https://www.lukas-barth.net/blog/efficiently-allocating-shared-ptr/

单线程分配shared_ptr对象,比较new make_shared 对象池,压测结果

new

make_shared

fast_pool_allocator

GCC

69.0

38.1

34.0

Clang libstdc

69.2

38.6

35.7

Clang libc

76.5

40.8

40.4

GCC tcmalloc

57.2

30.3

33.8

GCC jemalloc

86.7

42.7

33.9

看一乐 当然结果是满足直觉的,池化快一些,或者别用一堆shared ptr,别这么用

How much memory does a call to ‘malloc’ allocates?

https://lemire.me/blog/2024/06/27/how-much-memory-does-a-call-to-malloc-allocates/

讲的是个常识,你分配的内存总是向上取整的,不一定是你要多少分配多少

size, speed and order tradeoffs

https://github.com/seanbutler/cache-speed-tests

其实就是访问l1 l2 cache会有不同延迟,通过不同大小文件来测试,有空可以跑一下代码

为什么C 的std::forward会有两种重载

https://zhuanlan.zhihu.com/p/705380238

超详细!spdlog源码解析

https://zhuanlan.zhihu.com/p/674073158 https://zhuanlan.zhihu.com/p/674689537 https://zhuanlan.zhihu.com/p/675918624

Latency-Sensitive Applications and the Memory Subsystem: Keeping the Data in the Cache

https://johnnysswlab.com/latency-sensitive-applications-and-the-memory-subsystem-keeping-the-data-in-the-cache/

while循环,没干活,干活逻辑是数据访问,那没干活分支应该可以热数据

比如原来的逻辑

代码语言:javascript复制
std::unordered_map<int32_t, order> my_orders;
...
packet_t* p;
while(!exit) {
    p = get_packet();
    // If packet arrived
    if (p) {
        // Check if the identifier is known to us
        auto it = my_orders.find(p->id);
        if (it != my_orders.end()) {
            send_answer(p->origin, it->second);
        }
    }
}

while里是个干活逻辑,但是有个大的if,我们可以把这个if拆出来分成干活不干活两个逻辑

代码语言:javascript复制
std::unordered_map<int32_t, order> my_orders;
...
packet_t* p;
int64_t total_random_found = 0;
while(!exit) {
    // 增加个检查header 然后再判断packet,不满足就去warm
    // 如果header没满足,packet必不满足
    if (packet_header_arrived()) {
        p = get_packet();
        // If packet arrived
        if (p) {
            // Check if the identifier is known to us
            auto it = my_orders.find(p->id);
            if (it != my_orders.end()) {
                send_answer(p->origin, it->second);
            }
        }
    } else {
        // 不干活就Cache warming 
        auto random_id = get_random_id();
        auto it = my_orders.find(random_id);
        // 随便干点啥避免被编译器优化掉
        total_random_found  = (it != my_orders.end());
    }
}
std::cout << "Total random found " << total_random_found << "n";

当然这种cache warm不一定非得随机,有可能副作用

可以从历史值来用,有个词怎么说来着,启发式

硬件层也有cache warm 比如

https://johnnysswlab.com/wp-content/uploads/Introducing-Cache-Pseudo-Locking-to-Reduce-Memory-Access-Latency-Reinette-Chatre-Intel.pdf

amd也有 L3 Cache Range Reservation 不过没例子

作者测试了软件模拟cache warm,随机访问

数据,迭代多次的延迟,越小越好

hashmap数据量

正常访问hashmap

没有访问的时候只warm 0

没有访问的时候随机warm

1 K

226.1 (219.0)

213.3 (205.1)

132.5 (67.3)

4 K

324.7 (296.3)

350.7 (331.3)

140.1 (95.4)

16 K

396.8 (341.1)

389.1 (354.5)

208.7 (134.5)

64 K

425.5 (376.1)

416.0 (360.6)

232.1 (152.6)

256 K

514.2 (451.5)

473.3 (480.6)

338.8 (317.6)

1 M

599.8 (550.2)

615.1 (573.6)

466.3 (429.8)

4 M

702.1 (647.0)

619.7 (649.2)

531.3 (508.3)

16 M

756.7 (677.6)

668.8 (707.4)

543.2 (499.9)

64 M

769.1 (702.3)

735.9 (734.2)

641.0 (774.4)

能看到随机访问 随机warm效果显著

Latency-Sensitive Application and the Memory Subsystem Part 2: Memory Management Mechanisms

https://johnnysswlab.com/latency-sensitive-application-and-the-memory-subsystem-part-2-memory-management-mechanisms

这篇文章的视角比较奇怪,可能和已知的信息不同,目标是低延迟避免内存机制影响

page fault会引入延迟,所以要破坏page fault的生成条件 怎么做?

尽可能分配好,而不是用到在分配,有概率触发page fault

  • • mmap使用MAP_POPULATE
  • • 使用calloc不用malloc,用malloc/new 强制0填充
  • • 零初始化数组,立马使用上
  • • vector 创造时直接构造好大小,不用reserve reserve不一定内存预分配,可能还会造成page fault()
    • • 或者重载allocator,预先分配内存
    • • 其他容器也是有类似的问题
  • • 使用内存大页
  • • 禁用 5-Level Page Walk
  • • TLB shotdown规避 这个一时半会讲不完 可以看这个 https://www.jabperf.com/how-to-deter-or-disarm-tlb-shootdowns/
  • • 关闭swap

视频

C Weekly - Ep 434 - GCC's Amazing NEW (2024) -Wnrvo

https://www.youtube.com/watch?v=PTCFddZfnXc&ab_channel=C++WeeklyWithJasonTurner

-Wnrvo 帮助分析,效果显著

Mirko Arsenijević — Lifting the Pipes - Beyond Sender/Receiver and Expected Outcome — 26.6.2024.
https://www.youtube.com/watch?v=B5uNxPe-MVQ&ab_channel=C++Serbia

介绍他的dag库,没开源

开源项目介绍

  • • https://github.com/lhmouse/asteria 一个脚本语言,可嵌入,长期找人,希望胖友们帮帮忙,也可以加群753302367和作者对线

互动环节

最近睡眠很差,如果觉得内容有误大家多多指出,困了,先睡

练了一下午街霸6,好难啊我靠,我年纪真是大了,反应跟不上连招连不上,也有可能是设备不行

0 人点赞