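This issue is sponsored by HMY and lhmouse.
News
Standards committee updates, IDE news and compiler news go here.
The C++26 committee meeting is still in progress, and our correspondent Mick has been reporting from it.
We will forward further reports as they come in.
For the latest compiler news we recommend following the hellogcc official account. This week's update: 2024-06-26, issue 260.
Articles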
Member ordering and binary sizes
https://www.sandordargo.com/blog/2024/06/26/member-ordering-and-binary-size
Depending on how members are ordered, a class can end up with holes that the compiler fills with padding.
For example:
class Example1 {
public:
    double a;  // = 4.2; // 8 bytes
    int b;     // = 1;   // 4 bytes
    float c;   // 4 bytes
    char d;    // 1 byte
    bool e;    // 1 byte
    bool f;    // 1 byte
    // Assuming typical alignment, 'a' (8 bytes) should be first,
    // followed by 'b' and 'c' (both 4 bytes), and then 'd', 'e' and 'f' (1 byte each).
};
static_assert(sizeof(Example1) == 24);
struct Example2 {
    int b;     // = 1;   // 4 bytes
    char d;    // 1 byte
    float c;   // 4 bytes
    bool e;    // 1 byte
    double a;  // = 4.2; // 8 bytes
    bool f;    // 1 byte
};
static_assert(sizeof(Example2) == 32);
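To see where the holes are, a minimal sketch along the following lines prints member offsets with offsetof. The names Scattered, Sorted, kPayload and print_layout are made up for illustration, and the exact offsets depend on the ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical struct: same members and order as Example2 above.
struct Scattered {
    int b;
    char d;
    float c;
    bool e;
    double a;
    bool f;
};

// Hypothetical struct: the same members sorted by descending alignment.
struct Sorted {
    double a;
    int b;
    float c;
    char d;
    bool e;
    bool f;
};

// Bytes the members would need if no padding were inserted at all.
constexpr std::size_t kPayload = sizeof(double) + sizeof(int) + sizeof(float) +
                                 sizeof(char) + 2 * sizeof(bool);

// Padding is whatever the compiler added on top of the payload.
constexpr std::size_t padding_scattered() { return sizeof(Scattered) - kPayload; }
constexpr std::size_t padding_sorted() { return sizeof(Sorted) - kPayload; }

// Print where selected members land so the holes become visible.
void print_layout() {
    std::printf("Scattered: size=%zu c@%zu a@%zu f@%zu padding=%zu\n",
                sizeof(Scattered), offsetof(Scattered, c), offsetof(Scattered, a),
                offsetof(Scattered, f), padding_scattered());
    std::printf("Sorted:    size=%zu b@%zu d@%zu f@%zu padding=%zu\n",
                sizeof(Sorted), offsetof(Sorted, b), offsetof(Sorted, d),
                offsetof(Sorted, f), padding_sorted());
}
```

Calling print_layout() from main shows which members are pushed to a later offset by alignment. On a typical x86-64 ABI the two sizes should match the 32 and 24 bytes asserted above.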
Can popcnt be vectorized too?
Just for fun.
UB or not UB: How gcc and clang handle statically known undefined behaviour
https://diekmann.uk/blog/2024-06-25-statically-known-undefined-behaviour.html
TL;DR: when GCC meets statically known UB it tends to emit ud2 and crash, while Clang tends not to crash and instead erases the affected code.
How the STL uses explicit
https://quuxplusone.github.io/blog/2024/06/25/most-stl-ctors-arent-explicit-but-yours-still-should-be/
The standard library is fairly conservative in its use of explicit.
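As a reminder of what explicit changes, here is a minimal sketch. ImplicitMeters, ExplicitMeters and the two helper functions are hypothetical types and names, not taken from the linked post.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type with a converting constructor: a bare double converts silently.
struct ImplicitMeters {
    ImplicitMeters(double v) : value(v) {}
    double value;
};

// Hypothetical type with an explicit constructor: the conversion must be spelled out.
struct ExplicitMeters {
    explicit ExplicitMeters(double v) : value(v) {}
    double value;
};

double twice_implicit(ImplicitMeters m) { return 2 * m.value; }
double twice_explicit(ExplicitMeters m) { return 2 * m.value; }

// The type traits make the difference visible at compile time.
static_assert(std::is_convertible<double, ImplicitMeters>::value,
              "double converts implicitly");
static_assert(!std::is_convertible<double, ExplicitMeters>::value,
              "double does not convert implicitly");
static_assert(std::is_constructible<ExplicitMeters, double>::value,
              "direct construction still works");
```

With these definitions twice_implicit(1.5) compiles, whereas twice_explicit(1.5) is rejected and has to be written as twice_explicit(ExplicitMeters{1.5}).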
Efficiently allocating lots of std::shared_ptr
https://www.lukas-barth.net/blog/efficiently-allocating-shared-ptr/
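Single-threaded allocation of shared_ptr objects, comparing new, make_shared and a pool allocator. Benchmark results:
Configuration | new | make_shared | fast_pool_allocator |
---|---|---|---|
GCC | 69.0 | 38.1 | 34.0 |
Clang libstdc++ | 69.2 | 38.6 | 35.7 |
Clang libc++ | 76.5 | 40.8 | 40.4 |
GCC tcmalloc | 57.2 | 30.3 | 33.8 |
GCC jemalloc | 86.7 | 42.7 | 33.9 |
Just for fun. The results match intuition: pooling is generally a bit faster. Better still, do not create piles of shared_ptr in the first place.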
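One reason make_shared tends to beat new is that it can place the object and the control block in a single allocation. The sketch below counts calls to a replaced global operator new to make that visible. Widget, g_allocations and the two counting functions are hypothetical names, and the exact counts depend on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Counter for calls to the replaced global operator new (single-threaded sketch).
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size != 0 ? size : 1)) {
        return p;
    }
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical payload type.
struct Widget {
    int payload[4] = {};
};

// Object first, control block second: typically two allocations.
std::size_t count_allocations_new(std::shared_ptr<Widget>& out) {
    const std::size_t before = g_allocations;
    out = std::shared_ptr<Widget>(new Widget);
    return g_allocations - before;
}

// Object and control block together: typically one allocation.
std::size_t count_allocations_make_shared(std::shared_ptr<Widget>& out) {
    const std::size_t before = g_allocations;
    out = std::make_shared<Widget>();
    return g_allocations - before;
}
```

The pointers are handed back through an out parameter so the compiler has no reason to optimize the allocations away.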
How much memory does a call to ‘malloc’ allocates?
https://lemire.me/blog/2024/06/27/how-much-memory-does-a-call-to-malloc-allocates/
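This covers common knowledge: the allocator always rounds your request up, so you do not necessarily get exactly the amount you asked for.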
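On Linux with glibc the rounding can be observed with malloc_usable_size, a glibc extension rather than standard C++. The helper names usable_bytes and print_rounding below are hypothetical, and the sketch assumes a glibc-compatible allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <malloc.h>  // malloc_usable_size: glibc extension, not standard C++

// Returns how many bytes the allocator actually made usable for a request.
std::size_t usable_bytes(std::size_t requested) {
    void* p = std::malloc(requested);
    if (p == nullptr) {
        return 0;
    }
    const std::size_t usable = malloc_usable_size(p);
    std::free(p);
    return usable;
}

// Print requested versus usable size for a few request sizes.
void print_rounding() {
    const std::size_t requests[] = {1, 24, 25, 100, 1000};
    for (std::size_t requested : requests) {
        std::printf("requested %zu, usable %zu\n", requested,
                    usable_bytes(requested));
    }
}
```

The usable size is never smaller than the request. How much larger it is depends on the allocator and its size classes.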
size, speed and order tradeoffs
https://github.com/seanbutler/cache-speed-tests
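In short, accesses that hit the L1 and L2 caches have different latencies, and the repo tests this with files of different sizes. Worth running the code when you have time.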
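A common way to observe this effect in memory is a pointer chase over working sets of different sizes. The sketch below is not the repo's code; ns_per_access and its parameters are hypothetical, and the timings it returns vary by machine.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Average nanoseconds per dependent load over a working set of `elements` indices.
double ns_per_access(std::size_t elements, std::size_t steps) {
    if (elements < 2 || steps == 0) {
        return 0.0;
    }
    std::vector<std::size_t> next(elements);
    std::iota(next.begin(), next.end(), std::size_t{0});

    // Sattolo's algorithm: shuffle into one single cycle so the chase
    // eventually visits every slot instead of looping in a small subset.
    std::mt19937_64 rng(12345);
    for (std::size_t i = elements - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }

    std::size_t index = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t s = 0; s < steps; ++s) {
        index = next[index];  // each load depends on the previous one
    }
    const auto stop = std::chrono::steady_clock::now();

    // Publish the final index so the compiler cannot drop the loop.
    volatile std::size_t sink = index;
    (void)sink;

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(steps);
}
```

Calling it with element counts that fit in L1, then L2, then beyond should show the per-access time rising in steps, although timer noise and prefetching can blur the picture.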
Why does C++'s std::forward have two overloads?
https://zhuanlan.zhihu.com/p/705380238
A very detailed walkthrough of the spdlog source code
https://zhuanlan.zhihu.com/p/674073158 https://zhuanlan.zhihu.com/p/674689537 https://zhuanlan.zhihu.com/p/675918624
Latency-Sensitive Applications and the Memory Subsystem: Keeping the Data in the Cache
https://johnnysswlab.com/latency-sensitive-applications-and-the-memory-subsystem-keeping-the-data-in-the-cache/
Consider a polling while loop that is idle most of the time and whose real work is a data lookup. The idle branch can be used to keep that data hot in the cache.
For example, the original logic:
std::unordered_map<int32_t, order> my_orders;
...
packet_t* p;
while(!exit) {
    p = get_packet();
    // If packet arrived
    if (p) {
        // Check if the identifier is known to us
        auto it = my_orders.find(p->id);
        if (it != my_orders.end()) {
            send_answer(p->origin, it->second);
        }
    }
}
The loop body does the work, but it sits behind one big if. We can pull that if apart into two paths: one that does the work and one for when there is nothing to do.
std::unordered_map<int32_t, order> my_orders;
...
packet_t* p;
int64_t total_random_found = 0;
while(!exit) {
    // Check for a packet header first, then fetch the packet; otherwise go warm the cache.
    // If no header has arrived, no packet can have arrived either.
    if (packet_header_arrived()) {
        p = get_packet();
        // If packet arrived
        if (p) {
            // Check if the identifier is known to us
            auto it = my_orders.find(p->id);
            if (it != my_orders.end()) {
                send_answer(p->origin, it->second);
            }
        }
    } else {
        // Nothing to do, so warm the cache
        auto random_id = get_random_id();
        auto it = my_orders.find(random_id);
        // Use the result so the compiler cannot optimize the lookup away
        total_random_found += (it != my_orders.end());
    }
}
std::cout << "Total random found " << total_random_found << "\n";
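Of course, this kind of cache warming does not have to be random, and random lookups may have side effects.
It can draw on historical values instead, in other words use a heuristic.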
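One possible heuristic, sketched under assumptions, is to remember the ids seen recently and replay them while idle. RecentIdWarmer and its members are hypothetical names, and order here is only a stand-in for the article's type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the order type used in the loop above.
struct order {
    std::int32_t quantity = 0;
};

// Remembers recently seen ids in a ring buffer and replays them while idle.
class RecentIdWarmer {
public:
    // Record an id taken from a real packet.
    void remember(std::int32_t id) {
        ids_[write_ % ids_.size()] = id;
        ++write_;
    }

    // Look up one remembered id to pull its entry into the cache.
    // Returns true if the id is currently in the map.
    bool warm_one(const std::unordered_map<std::int32_t, order>& orders) {
        if (write_ == 0) {
            return false;  // nothing remembered yet
        }
        const std::size_t filled = write_ < ids_.size() ? write_ : ids_.size();
        const std::int32_t id = ids_[read_ % filled];
        ++read_;
        const bool found = orders.find(id) != orders.end();
        hits_ += found ? 1 : 0;  // keep a visible side effect of the lookup
        return found;
    }

    std::size_t hits() const { return hits_; }

private:
    std::array<std::int32_t, 64> ids_{};
    std::size_t write_ = 0;
    std::size_t read_ = 0;
    std::size_t hits_ = 0;
};
```

In the loop above, remember(p->id) would go into the branch that handles a packet and warm_one(my_orders) into the idle branch. Whether this beats random warming depends on how well past ids predict future ones.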
Hardware offers cache warming support as well, for example:
https://johnnysswlab.com/wp-content/uploads/Introducing-Cache-Pseudo-Locking-to-Reduce-Memory-Access-Latency-Reinette-Chatre-Intel.pdf
AMD has L3 Cache Range Reservation too, though there are no examples of it.
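The author benchmarked cache warming emulated in software, with random accesses.
The data below is latency over many iterations; lower is better.
Hash map size | Regular hash map access | Idle warming with key 0 only | Idle warming with random keys |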
---|---|---|---|
1 K | 226.1 (219.0) | 213.3 (205.1) | 132.5 (67.3) |
4 K | 324.7 (296.3) | 350.7 (331.3) | 140.1 (95.4) |
16 K | 396.8 (341.1) | 389.1 (354.5) | 208.7 (134.5) |
64 K | 425.5 (376.1) | 416.0 (360.6) | 232.1 (152.6) |
256 K | 514.2 (451.5) | 473.3 (480.6) | 338.8 (317.6) |
1 M | 599.8 (550.2) | 615.1 (573.6) | 466.3 (429.8) |
4 M | 702.1 (647.0) | 619.7 (649.2) | 531.3 (508.3) |
16 M | 756.7 (677.6) | 668.8 (707.4) | 543.2 (499.9) |
64 M | 769.1 (702.3) | 735.9 (734.2) | 641.0 (774.4) |
For random accesses, random warming has a marked effect.
Latency-Sensitive Application and the Memory Subsystem Part 2: Memory Management Mechanisms
https://johnnysswlab.com/latency-sensitive-application-and-the-memory-subsystem-part-2-memory-management-mechanisms
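This article takes an unusual angle that may differ from what you already know. The goal is low latency, which means keeping memory management mechanisms out of the way.
Page faults add latency, so the idea is to remove the conditions that trigger them. How?
Allocate and touch memory up front as far as possible. Allocating at first use may trigger a page fault.
- • Use MAP_POPULATE with mmap
- • Use calloc rather than malloc; with malloc/new, force a zero fill
- • Zero-initialize arrays so they are touched right away
- • Construct a vector with its size directly instead of using reserve; reserve does not necessarily pre-fault the memory and page faults may still occur
- • Or override the allocator to pre-allocate memory
- • Other containers have similar problems
- • Use huge pages
- • Disable 5-Level Page Walk
- • Avoid TLB shootdowns. This is too much to cover here, see https://www.jabperf.com/how-to-deter-or-disarm-tlb-shootdowns/
- • Disable swap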
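Two of the items above, MAP_POPULATE and sized vector construction, can be sketched as follows. This assumes Linux, since MAP_POPULATE is a Linux-specific flag, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <sys/mman.h>  // mmap, munmap, MAP_POPULATE (Linux-specific flag)

// Map `bytes` of anonymous memory and ask the kernel to populate the pages
// immediately instead of on first touch. Returns nullptr on failure.
void* map_prefaulted(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Sized construction value-initializes every element, so the elements are
// written, and their pages touched, before the vector is returned.
std::vector<int> make_touched_vector(std::size_t count) {
    return std::vector<int>(count);
}

// reserve only obtains capacity: no elements exist yet, and the pages are
// typically not touched until the first write.
std::vector<int> make_reserved_vector(std::size_t count) {
    std::vector<int> v;
    v.reserve(count);
    return v;
}
```

Memory returned by map_prefaulted must be released with munmap using the same length. The sized vector trades startup time for fewer page faults on the hot path.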
Videos
C++ Weekly - Ep 434 - GCC's Amazing NEW (2024) -Wnrvo
https://www.youtube.com/watch?v=PTCFddZfnXc&ab_channel=C++WeeklyWithJasonTurner
-Wnrvo
It helps with analysis, and the effect is significant.
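As a rough illustration of what the flag reports, the sketch below contrasts a function where NRVO can apply with one where it generally cannot. The function names are hypothetical, and the warning is expected with a GCC version that supports -Wnrvo.

```cpp
#include <cassert>
#include <string>

// NRVO candidate: one named local is returned on every path.
std::string build_greeting(const std::string& name) {
    std::string result = "hello, ";
    result += name;
    return result;
}

// NRVO is generally not possible here: two different locals are returned,
// and both cannot be constructed in the caller's return slot.
// With -Wnrvo, GCC is expected to warn on these return statements.
std::string pick_greeting(bool formal, const std::string& name) {
    std::string casual = "hi, " + name;
    std::string polite = "good day, " + name;
    if (formal) {
        return polite;
    }
    return casual;
}
```

Both functions return the same values either way. The difference is only whether the returned object is built in place or moved.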
Mirko Arsenijević — Lifting the Pipes - Beyond Sender/Receiver and Expected Outcome — 26.6.2024.
https://www.youtube.com/watch?v=B5uNxPe-MVQ&ab_channel=C++Serbia
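A presentation of his DAG library, which is not open source.
Open source projects
- • https://github.com/lhmouse/asteria An embeddable scripting language. Contributors are always wanted, so please lend a hand. You can also join group 753302367 to argue with the author.
Interaction
I have been sleeping badly lately. If you spot mistakes in the content, please point them out. I am tired and off to bed.
I spent the whole afternoon practicing Street Fighter 6. It is so hard. I really am getting old: my reactions cannot keep up and my combos do not connect. Or maybe my equipment is just not good enough.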