C++ 中文周刊 第122期

2024-07-30 14:15:14 浏览数 (2)

资讯

标准委员会动态/ide/编译器信息放在这里

#include cleanup in Visual Studio[1]

支持提示删除没用的头文件,我记得clion是不是早就支持了?

编译器信息最新动态推荐关注hellogcc公众号 本周更新 2023-07-12 第210期

Xmake v2.8.1 发布,大量细节特性改进[2]

支持grpc-plugin了,那么别的支持也能实现,感觉更强了,可以用大项目试试

文章

  • • [Did you know that C 26 added `@, $, and `` to the basic character set? ](https://github.com/tip-of-the-week/cpp/blob/master/tips/338.md)
代码语言:javascript复制
auto $dollar_sign = 42;
auto @commerical_at = 42;
auto `grave_accent = 42;

没懂有啥用

  • • 字符串截断问题

在某群和群友讨论 std::string能不能存带 '/0'的字符,我之前遇到过坑,就想当然的认为不能,然后被群友教育了一种用法

代码语言:javascript复制
#include <iostream>
#include <vector>
#include <string>

int main() {
    std::string s1 = std::string{"123"};
    std::cout<<"---n";
    std::cout<< s1.size()<<"n";
    std::cout<<"---n";
    std::string s2;
    s2.append("");
    s2.append("1");
    std::cout<<"---n";
    std::cout<< s2.size()<<"n";
    std::cout<< s2<<"n";
    std::cout<<"---n";

    std::vector<char> v = {'','','2','3','','5','6'};
    std::string s3(v.begin(), v.end());
    std::cout<<"---n";
    std::cout<< s3.size()<<"n";
    std::cout<< s3<<"n";
    std::cout<<"---n";
    std::string s4 = std::string{"a23", 17};
    std::cout<<"---n";
    std::cout<< s4.size()<<"n";
    std::cout<< s4<<"n";
    std::cout<<"---n";
    return 0;
}
代码语言:javascript复制

s1 s2都是截断的,但是s3不是,想要std::string包含/0只能通过迭代器构造,不能从c字符串构造,因为c字符串复制判断结尾用的/0

同理,s4加上了长度构造,就包含/0不会截断了。c字符串的缺陷,没有长度信息

我这里先入为主觉得所有构造都这样了。稍为想的远一点

  • • Inside boost::concurrent_flat_map[3]

boost::concurrent_flat_map开链的并发hashmap速度不输tbb boost 1.83发布

  • • 为什么新版内核将进程pid管理从bitmap替换成了radix-tree?

有点意思

  • • The C Type Loophole (C 14)[4]

经典的友元函数注入

代码语言:javascript复制
template<int N> struct tag{};

template<typename T, int N>
struct loophole_t {
  friend auto loophole(tag<N>) { return T{}; };
};
auto loophole(tag<0>);
sizeof(loophole_t<std::string, 0> );
statc_assert(std::is_same<std::string, decltype(loophole(tag<0>{})) >::value, "same");
代码语言:javascript复制

这玩意属于缺陷,说不定以后就修了,为啥要讲这个老文章?看下面这个

  • • Concept到底,来自te::conceptify的救赎![5]

看代码

代码语言:javascript复制
//Conceptify it (Requires C  20)
struct Drawable {
  void draw(std::ostream &out) const {
    te::call([](auto const &self, auto &out)
      -> decltype(self.draw(out)) { self.draw(out); }, *this, out);
  }
};

struct Square {
  void draw(std::ostream &out) const { out << "Square"; }
};

template<te::conceptify<Drawable> TDrawable>
void draw(TDrawable const &drawable) { 
  drawable.draw(std::cout);
}

int main() {
  auto drawable = Square{};
  draw(drawable);  // prints Square
}
代码语言:javascript复制

我咋感觉te::conceptify<Drawable>看上去和实现concept没区别?

这种实现能定义te::poly<Drawable>保存在容器里然后遍历?

实现原理是通过一个mapping类注册类型和typelist,typelist绑上lambda, 我说的不精确,可以看原文

call注册

代码语言:javascript复制
template <...>
constexpr auto call(const TExpr expr, const I &interface, Ts &&... args) {
  ...
  return detail::call_impl<I>(...);
}

template <...>
constexpr auto call_impl(...) {
  void(typename mappings<I, N>::template set<type_list<TExpr, Ts...> >{});
  return ...;
}
代码语言:javascript复制

mapping是这样的

通过这个友元注入get 来记住T类型,也就是之前保存的typelist,构造出lambda

代码语言:javascript复制
template <class, std::size_t>
struct mappings final {
  friend auto get(mappings);
  template <class T>
  struct set {
    friend auto get(mappings) { return T{}; }
  };
};
代码语言:javascript复制

template <class T, class TExpr, class... Ts>
constexpr auto requires_impl(type_list<TExpr, Ts...>)
    -> decltype(&TExpr::template operator()<T, Ts...>);

template <class I, class T, std::size_t... Ns>
constexpr auto requires_impl(std::index_sequence<Ns...>) -> type_list<
    decltype(requires_impl<I>(decltype(get(mappings<T, Ns   1>{})){}))...>;
}  // namespace detail


template <class I, class T>
concept bool conceptify = requires {
  detail::requires_impl<I, T>(
      std::make_index_sequence<detail::mappings_size<T, I>()>{});
};

挺巧妙,友元函数注入 typelist绑定

问题在于lambda每次都是构造的,可能有小对象问题,但愿编译器能优化掉

  • • A find-Function to Append .and_then(..).or_else(..) in C [6]

find还要判断结果,很烦,类似optional,封装一下

代码语言:javascript复制
#include <iostream>
#include <vector>
#include <algorithm>
#include <optional>

namespace cwt {    
    // first we craete a type which will hold our find result
    template<typename T> 
    class find_result {
        public: 
            find_result() = default;
            find_result(T value) : m_value{value} {}

            // this is the and_then method which gets a callback
            template<typename Func>
            const find_result<T>& and_then(const Func&& func) const {
                // we only call the callback if we have a value 
                if (m_value.has_value()) {
                    func(m_value.value());
                }
                // and to further add or_else we return *this
                return *this;
            }

            // almost same here, just with return type void
            template<typename Func>
            void or_else(const Func&& func) const {
                if (!m_value.has_value()) {
                    func();
                }
            }
        private:
            // and since we don't know if we found a value
            // we hold possible one as optional
            std::optional<T> m_value{std::nullopt}; 
    };

    // this my find function, where try to find a value in a vector
    template<typename T>
    find_result<T> find(const std::vector<T>& v, const T value) {
        // like before we use the iterator
        auto it = std::find(v.begin(), v.end(), value);
        // and if we didn't find the value we return
        // find_result default constructed
        if (it == v.end()) {
            return find_result<T>();
        } else {
            // or with the found value
            return find_result<T>(*it);
        }
    }
} // namespace cwt

int main() {
    // lets create a simple vector of int values
    std::vector<int> v = {1,2,3,4};

    // we use our find function
    // and since we return find_result<int> 
    // we can append or call and_then or_else directly
    cwt::find(v, 2)
    .and_then([](int result){ std::cout << "found " << result << 'n'; })
    .or_else([](){ std::cout << "found nothingn"; })
    ;

    cwt::find(v, 10)
    .and_then([](int result){ std::cout << "found " << result << 'n'; })
    .or_else([](){ std::cout << "found nothingn"; })
    ;

    return 0;
}

看个乐

  • • Packing a string of digits into an integer quickly[7]

SIMD环节,需求,把 "20141103 012910"转成数字0x20141103012910

代码语言:javascript复制
#include <x86intrin.h> // Windows: <intrin.h>
#include <string.h>

// From "20141103 012910", we want to get
// 0x20141103012910
uint64_t extract_nibbles(const char* c) {
  uint64_t part1, part2;
  memcpy(&part1, c, sizeof(uint64_t));
  memcpy(&part2 , c   7, sizeof(uint64_t));
  part1 = _bswap64(part1);
  part2 = _bswap64(part2);
  part1 = _pext_u64(part1, 0x0f0f0f0f0f0f0f0f);
  part2 = _pext_u64(part2, 0x0f000f0f0f0f0f0f);
  return (part1<<24) | (part2);
}
代码语言:javascript复制

汇编

代码语言:javascript复制
movbe rax, QWORD PTR [rdi]
movbe rdx, QWORD PTR [rdi 7]
movabs rcx, 1085102592571150095
pext rax, rax, rcx
movabs rcx, 1080880467920490255
sal rax, 24
pext rdx, rdx, rcx
or rax, rdx

pext在amd zen3架构上开销很大,但本身也非常小巧了

ARM平台

代码语言:javascript复制
#include <arm_neon.h>
// From "20141103 012910", we want to get
// 0x20141103012910
uint64_t extract_nibbles(const char *c) {
  const uint8_t *ascii = (const uint8_t *)(c);
  uint8x16_t in = vld1q_u8(ascii);
  // masking the high nibbles,
  in = vandq_u8(in, vmovq_n_u8(0x0f));
  // shuffle the bytes
  const uint8x16_t shuf = {14, 13, 12, 11, 10, 9, 7, 6,
    5, 4, 3, 2, 1, 0, 255, 255};
  in = vqtbl1q_u8(in, shuf);
  // then shift/or
  uint16x8_t ins =
    vsraq_n_u16(vreinterpretq_u16_u8(in),
    vreinterpretq_u16_u8(in), 4);
  // then narrow (16->8),
  int8x8_t packed = vmovn_u16(ins);
  // extract to general register.
  return vget_lane_u64(vreinterpret_u64_u16(packed), 0);
}

汇编

代码语言:javascript复制
adrp x8, .LCPI0_0
ldr q1, [x0]
movi v0.16b, #15
ldr q2, [x8, :lo12:.LCPI0_0]
and v0.16b, v1.16b, v0.16b
tbl v0.16b, { v0.16b }, v2.16b
usra v0.8h, v0.8h, #4
xtn v0.8b, v0.8h
fmov x0, d0
  • • Recognizing string prefixes with SIMD instructions[8]

SIMD环节,在一个字符串数组里找子串是否存在

普通实现,bsearsh

代码语言:javascript复制
std::string *lookup_symbol(const char *input) {
  return bsearch(input, strings.data(), strings.size(),
  sizeof(std::string), compare);
}

或者trie

或者SIMD

代码我就不贴了,https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2023/07/13/benchmarks/benchmark.cpp

天书

参考阅读 https://trent.me/is-prefix-of-string-in-table/ 天书

  • • Software Performance and Class Layout[9]

这个讲的是局部性问题

比如

代码语言:javascript复制
class my_class {
   int a;
   int b;
   ...
   int z;
};
int sum_all(my_class* m, int n) {
    int sum = 0;
    for (int i = 0; i < n; i  ) {
        sum  = m[i].a   m[i].z;
    }
    return sum;
}

循环用到了a和z,那么a和z就应该靠近点

代码语言:javascript复制
class my_class {
   int a;
   int z;
   int b;
   ...
};

用不到的拆出来

代码语言:javascript复制
class my_class {
   int m1;
   int m2;
   int m3;
};
int sum_all(my_class* m, int n) {
    int sum = 0;
    for (int i = 0; i < n; i  ) {
        sum  = m[i].m1   m[i].m2;
    }
    return sum;
}

没用到m3,那就把它拿出来

代码语言:javascript复制
class my_class_base {
   int m1;
   int m2;
};
class my_class_aux {
   int m3;
};
int sum_all(my_class_base* m, int n) {
    int sum = 0;
    for (int i = 0; i < n; i  ) {
        sum  = m[i].m1   m[i].m2;
    }
    return sum;
}

同理,如果两个类有互相使用,就放在一起

代码语言:javascript复制
class my_class1 {
    int m1;
    int m2;
};
class my_class2 {
    int a1;
    int a2;
}
int sum_all(my_class1* m1, my_class2* m2, int n) {
    int sum = 0;
    for (int i = 0; i < n; i  ) {
        sum  = m1[i].m1   m1[i].m2   m2[i].a1;
    }
    return sum;
}

改成

代码语言:javascript复制
class my_class1 {
    int m1;
    int m2;
    int a1;
};
class my_class2 {
    int a2;
};
int sum_all(my_class1* m1, int n) {
    int sum = 0;
    for (int i = 0; i < n; i  ) {
        sum  = m1[i].m1   m1[i].m2   m1[i].a1;
    }
    return sum;
}

不要循环中访问指针

代码语言:javascript复制
class my_class {
   int m1;
   int* p_a1;
};
int sum_all(my_class* m, int n) {
    int sum = 0;
    for (int i = 0; i < n; i  ) {
        sum  = m[i].m1   *m[i].p_a1;
    }
    return sum;
}

这个pa1非常不合理,应该改成值

再比如这种猥琐的公用

代码语言:javascript复制
class shared {
    int a1;
};
class my_class_1 {
   int m1;
   shared* s;
};
class my_class_2 {
   int m1;
   shared* s;
};
int sum_all_1(my_class_1* m, int n) {
    int sum = 0;
    for (int i = 0; i < n; i  ) {
        sum  = m[i].m1   m[i].s->a1;
    }
    return sum;
}
int sum_all_2(my_class_2* m, int n) {
    int sum = 0;
    for (int i = 0; i < n; i  ) {
        sum  = m[i].m1   m[i].s->a1;
    }
    return sum;
}

更新s省事了但是实际上循环访问低效,也得改成值

Structure Of Arrays (SOA)结构体数组改成数组结构体

把数据集改小,比如

代码语言:javascript复制
class big_class {
   int index;
   ...
};
void my_sort(std::vector<big_class>& v) {
    std::sort(v.begin(), v.end(), [](const big_class& l, const big_class& r) { return l.index < r.index; });
}

一堆不相关的数据参与了数据加载,可以改成这个

代码语言:javascript复制
class small_class {
    int index;
    int pointer;
};
void my_sort(std::vector<big_class>& v) {
    std::vector<small_class> tmp;
    tmp.reserve(v.size());
    for (int i = 0; i < v.size(); i  ) {
        tmp.push_back({v[i].index, i});
    }
    std::sort(tmp.begin(), tmp.end(), [](const small_class& l, const small_class& r) { return l.index < r.index; });
    std::vector<big_class> result;
    result.reserve(tmp.size());
    for (int i = 0; i < tmp.size(); i  ) {
        result.push_back(v[tmp[i].index]);
    }
    v = std::move(result);
}

这种改法得测量一下,未必有收益,可能有,但不多,主要取决于数据结构,只要bigclass比smallclass大很多的话,收益肯定是有的

  • • Why does the compiler complain about a missing constructor when I’m just resizing my std::vector to a smaller size?[10]

你用std::vector::resize缩容,提示构造告警

代码语言:javascript复制
std::vector<Thing> things;
things.resize(n); // keep only the first n
代码语言:javascript复制

但你是缩容,不需要构造啊。问题在于resize包括扩容和缩容,不知道你是缩容

还是erase吧

代码语言:javascript复制
things.erase(things.begin()   n, things.end());

  • • How to clone a Windows Runtime vector in the face of possible concurrent modification, part 1[11]
  • • How to clone a Windows Runtime vector in the face of possible concurrent modification, part 2[12]
  • • How to wait for multiple C coroutines to complete before propagating failure, concluding remarks[13]
  • • How to wait for multiple C coroutines to complete before propagating failure, finding the awaiter[14]

看不懂

开源项目需要人手

  • • asteria[15] 一个脚本语言,可嵌入,长期找人,希望胖友们帮帮忙,也可以加群384042845和作者对线
  • • Unilang[16] deepin的一个通用编程语言,点子有点意思,也缺人,感兴趣的可以github讨论区或者deepin论坛看一看。这里也挂着长期推荐了
  • • gcc-mcf[17] 懂的都懂

新项目介绍/版本更新

  • • https://github.com/MikePopoloski/boost_unordered 把boost unorderd抽出来了 boost很重
  • • https://github.com/intel/x86-simd-sort/releases/tag/v2.0 simd sort。能快点

本文永久链接[18]

如果有疑问评论最好在上面链接到评论区里评论,这样方便搜索,微信公众号有点封闭/知乎吞评论

引用链接

[1] #include cleanup in Visual Studio: https://devblogs.microsoft.com/cppblog/include-cleanup-in-visual-studio/ [2] Xmake v2.8.1 发布,大量细节特性改进: https://zhuanlan.zhihu.com/p/642825956 [3] Inside boost::concurrent_flat_map: http://bannalia.blogspot.com/2023/07/inside-boostconcurrentflatmap.html [4] The C Type Loophole (C 14): https://alexpolt.github.io/type-loophole.html [5] Concept到底,来自te::conceptify的救赎!: https://zhuanlan.zhihu.com/p/642889170 [6] A find-Function to Append .and_then(..).or_else(..) in C : www.codingwiththomas.com/blog/a-find-function-to-append-andthenorelse-in-c [7] Packing a string of digits into an integer quickly: https://lemire.me/blog/2023/07/07/packing-a-string-of-digits-into-an-integer-quickly/ [8] Recognizing string prefixes with SIMD instructions: https://lemire.me/blog/2023/07/14/recognizing-string-prefixes-with-simd-instructions/ [9] Software Performance and Class Layout: https://johnnysswlab.com/software-performance-and-class-layout/?utm_source=feedly&utm_medium=rss&utm_campaign=software-performance-and-class-layout [10] Why does the compiler complain about a missing constructor when I’m just resizing my std::vector to a smaller size?: https://devblogs.microsoft.com/oldnewthing/20230711-00/?p=108408 [11] How to clone a Windows Runtime vector in the face of possible concurrent modification, part 1: https://devblogs.microsoft.com/oldnewthing/20230712-00/?p=108412 [12] How to clone a Windows Runtime vector in the face of possible concurrent modification, part 2: https://devblogs.microsoft.com/oldnewthing/20230713-00/?p=108446 [13] How to wait for multiple C coroutines to complete before propagating failure, concluding remarks: https://devblogs.microsoft.com/oldnewthing/20230710-00/?p=108405 [14] How to wait for multiple C coroutines to complete before propagating failure, finding the awaiter: https://devblogs.microsoft.com/oldnewthing/20230707-00/?p=108402 [15] asteria: https://github.com/lhmouse/asteria [16] Unilang: https://github.com/linuxdeepin/unilang [17] gcc-mcf: https://gcc-mcf.lhmouse.com/ [18] 本文永久链接:

0 人点赞