Linux Netlink

2023-03-18 17:26:01

When you write a Linux application that needs to communicate between the kernel and userspace, in either direction, the typical answer is to use ioctl and sockets.

This is a simple mechanism for sending information from userspace down into the kernel, either to request information or to direct the kernel to perform an operation on behalf of the userspace application.

A good example of this type of communication between a userspace application and the kernel is the venerable ethtool configuration utility. The tool itself is a userspace application that talks to the kernel through a socket, and the kernel provides the API that the application uses for that communication.

Setting a NIC's channels

Let's look at an example use case for ethtool with a modern multi-queue network interface card (NIC). Modern NICs have the hardware to use multiple channels for sending and receiving packets. These channels take advantage of multi-core CPUs to balance the load of transmitting (Tx) and receiving (Rx) traffic. Historically, all the traffic (and the associated interrupts) was handled by a single core; spreading the workload across multiple cores can significantly improve performance.

How would we use ioctl to set the number of combined channels on a NIC that supports the feature?

#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <linux/sockios.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

typedef struct example_ethtools_channels example_ethtools_channels_t;

struct example_ethtools_channels
{
    unsigned int num_rx_channels; /**< Number of rx channels */
    unsigned int num_tx_channels; /**< Number of tx channels */
    unsigned int num_other_channels; /**< Number of other channels */
    unsigned int num_combined_channels; /**< Number of combined channels */
};

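/* Copy src into dst (capacity size), truncating if needed and always
 * NUL-terminating. Returns the number of bytes copied. */
size_t string_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len;
    if (size == 0)
        return 0;
    len = strlen(src);
    if (len >= size)
        len = size - 1; /* leave room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}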

int set_channels(const char *if_name, const unsigned int if_idx, const example_ethtools_channels_t *channels)
{
    int fd = -1;
    int result = 0;
    struct ifreq ifr;
    struct ethtool_channels ethcmd;

    fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_IP);
    if (fd == -1) {
        return -1;
    }

    /* Set the interface name in this request. */
    memset(&ifr, 0, sizeof(ifr));
    string_strlcpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name));


    /* Set the ethtool command; zero the struct so unused fields are defined. */
    memset(&ethcmd, 0, sizeof(ethcmd));
    ethcmd.cmd = ETHTOOL_SCHANNELS;
    ethcmd.rx_count = channels->num_rx_channels;
    ethcmd.tx_count = channels->num_tx_channels;
    ethcmd.other_count = channels->num_other_channels;
    ethcmd.combined_count = channels->num_combined_channels;

    printf("setting rx_count[%u] tx_count[%u] other_count[%u] combined_count[%u]\n",
                    channels->num_rx_channels,
                    channels->num_tx_channels,
                    channels->num_other_channels,
                    channels->num_combined_channels);

    /* Set the ifrequest data to point to the ethtool command and submit. */
    ifr.ifr_data = (void *)&ethcmd;

    int status = ioctl(fd, SIOCETHTOOL, &ifr);

    if (status != 0) {
        int err = errno; /* save it before printf can overwrite it */
        printf("error setting channels [%d]\n", status);
        printf("errno: %d (%s)\n", err, strerror(err));
        result = -1;
    }
    close(fd);

    return result;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <interface>\n", argv[0]);
        return 1;
    }
    char *if_name = argv[1];
    unsigned int if_idx = if_nametoindex(if_name);
    example_ethtools_channels_t ch;
    ch.num_combined_channels = 4;
    ch.num_rx_channels = 0;
    ch.num_tx_channels = 0;
    ch.num_other_channels = 0;
    printf("Setting new channel details...\n");
    int result = set_channels(if_name, if_idx, &ch);
    if(result == -1) {
        printf("failed to set channels\n");
        return 1;
    }
    return 0;
}

Netlink

In contrast to the previous options for communication between an application and the kernel, adding a new protocol with netlink only requires adding a constant to netlink.h; the kernel and the application can then communicate immediately through a sockets-based API.

The original goal of netlink was to provide a better way of modifying network-related settings and transferring network-related information between userspace and the kernel. Importantly, communication between userspace and the kernel is bidirectional: a netlink socket is a duplex socket.

With this means of communicating both to and from the kernel, there is now a good way to develop applications that, by design, need frequent update events directly from the kernel. What started as a more effective way to relay and modify network-related information has become a generic communication fabric between the kernel and userspace via NETLINK_GENERIC.

The downsides

The advantages of netlink over syscalls or ioctl sound fantastic, but there is a catch. The simplicity of sending and receiving a message with ioctl is gone: netlink is a more complex messaging system, particularly when it comes to constructing the messages themselves.
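
Much of that complexity is alignment bookkeeping: every header and attribute is padded to a 4-byte boundary. As a rough sketch, the size of a link request carrying one IFLA_IFNAME attribute can be computed with the NLMSG_ALIGN, NLA_ALIGN and NLA_HDRLEN macros from linux/netlink.h. The helper name link_request_size is hypothetical, and the name is assumed to be stored without its terminating NUL.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: total bytes of an rtnetlink link request that
 * carries a single IFLA_IFNAME attribute. */
size_t link_request_size(const char *ifname)
{
    size_t hdr, extra, attr;

    assert(ifname != NULL);
    hdr   = NLMSG_ALIGN(sizeof(struct nlmsghdr));   /* netlink header       */
    extra = NLMSG_ALIGN(sizeof(struct ifinfomsg));  /* rtnetlink header     */
    attr  = NLA_ALIGN(NLA_HDRLEN + strlen(ifname)); /* attribute + padding  */
    return hdr + extra + attr;
}
```

For "eth0" this comes to 16 + 16 + 8 = 40 bytes. A two-character name such as "lo" also needs 40 bytes, because its 6-byte attribute is padded up to 8.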

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
#include <time.h>
#include <errno.h>
#include <sys/socket.h>

#include <netinet/in.h>
#include <netinet/tcp.h>

#include <linux/sockios.h>
#include <linux/if.h>
#include <linux/if_link.h>
#include <linux/rtnetlink.h>

#define ALIGNTO    4
#define ALIGN(len)    (((len) + ALIGNTO-1) & ~(ALIGNTO-1))
#define ATTR_HDRLEN  ALIGN(sizeof(struct nlattr))
#define SOCKET_BUFFER_SIZE (sysconf(_SC_PAGESIZE) < 8192L ? sysconf(_SC_PAGESIZE) : 8192L)

int main()
{
  int nls = -1;
  struct sockaddr_nl kernel_nladdr;
  struct iovec io;
  struct msghdr msg;
  struct ifinfomsg *ifm;
  unsigned int change, flags, seq;
  char *ifname;
  char buf[SOCKET_BUFFER_SIZE]; /* page size, capped at 8192 bytes */

  struct nlmsghdr *nlmsg;
  seq = time(NULL);

  /* The netlink message is destined to the kernel so nl_pid == 0. */
  memset(&kernel_nladdr, 0, sizeof(kernel_nladdr));
  kernel_nladdr.nl_family = AF_NETLINK;
  kernel_nladdr.nl_groups = 0; /* unicast */
  kernel_nladdr.nl_pid = 0;

  nls = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
  if (nls == -1)
  {
    printf("cannot open socket %s\n", strerror(errno));
    return -1;
  }

  int br;

  br = bind(nls, (struct sockaddr *) &kernel_nladdr, sizeof (kernel_nladdr));
  if (br == -1)
  {
    printf("cannot bind to socket: %s\n", strerror(errno));
    close(nls);
    return -1;
  }

  /* netlink header at the start of the buffer */
  int hlen = ALIGN(sizeof(struct nlmsghdr));
  nlmsg = (struct nlmsghdr *)buf;
  memset(buf, 0, hlen);
  nlmsg->nlmsg_len = hlen;

  nlmsg->nlmsg_type = RTM_NEWLINK;
  nlmsg->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
  nlmsg->nlmsg_seq = seq;

  /* extra header (struct ifinfomsg) directly after the netlink header */
  char *ptr = (char *)nlmsg + nlmsg->nlmsg_len;
  size_t ehlen = ALIGN(sizeof(*ifm));
  nlmsg->nlmsg_len += ehlen;
  memset(ptr, 0, ehlen);

  /* put interface down */
  change = 0;
  flags = 0;
  change |= IFF_UP;
  flags &= ~IFF_UP; /* down = !up, obviously */

  ifm = (void *)ptr;
  ifm->ifi_family = AF_UNSPEC;
  ifm->ifi_change = change;
  ifm->ifi_flags = flags;
  
  /* add payload details - nlattr & padding */
  ifname = "eth0";
  struct nlattr *attr = (struct nlattr *)((char *)nlmsg + ALIGN(nlmsg->nlmsg_len));
  uint16_t payload_len = ALIGN(sizeof(struct nlattr)) + strlen(ifname);
  int pad;

  attr->nla_type = IFLA_IFNAME;
  attr->nla_len = payload_len; /* header + name, excluding padding */
  memcpy((char *)attr + ATTR_HDRLEN, ifname, strlen(ifname));
  pad = ALIGN(strlen(ifname)) - strlen(ifname);
  if (pad > 0)
    memset((char *)attr + ATTR_HDRLEN + strlen(ifname), 0, pad);

  nlmsg->nlmsg_len += ALIGN(payload_len);

  /* end of inner netlink nlattr details */

  /* Stick the request in an io vector */
  io.iov_base = (void *)nlmsg;
  io.iov_len = nlmsg->nlmsg_len;
  
  /* Wrap it in a msg */  
  memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &io;
  msg.msg_iovlen = 1;
  msg.msg_name = (void *)&kernel_nladdr;
  msg.msg_namelen = sizeof(kernel_nladdr);

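  /* Send it */
  ssize_t res = sendmsg(nls, &msg, 0);
  if (res == -1)
    printf("sendmsg failed: %s\n", strerror(errno));
  else
    printf("result of send: %zd\n", res);

  close(nls);
  return res == -1 ? -1 : 0;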
}

References

  • https://medium.com/thg-tech-blog/on-linux-netlink-d7af1987f89d
  • https://www.infradead.org/~tgr/libnl/doc/core.html
  • https://www.netfilter.org/projects/libmnl/index.html
