What problems do we need to solve?
1. Locating the source of a problem along the log chain
When a request is sent from the upper-layer platform, the user does not know how data is passed between the links of the chain, yet still wants to quickly determine where the problem occurred: in the cloud management platform, in OpenStack, or at the operating-system level. Structured log data helps us locate the problem quickly.
2. Mapping cloud-management-platform resources to OpenStack resources
Users normally request resources from the OpenStack API through the cloud management platform or directly via the API. When the cloud management platform sends an HTTP request to OpenStack, OpenStack adds a request-id to the response; with this request-id we can trace the OpenStack service scheduling process through the logs.
3. Comparing logs on a timeline to find the source of a problem
OpenStack is a complex system, and a failed API call can have several causes. For example, when provisioning a virtual machine fails on the cloud-management side, errors may appear in many places (nova-api, nova-compute, cinder-volume, neutron-vswitch, and so on), but what we really need to know is which module was mainly responsible for the failure. This is where the timeline comes in: by comparing the error logs of every module over the same time window, we can determine where the problem originated.
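As a minimal sketch of that timeline comparison (assuming the logs have already been shipped into Elasticsearch as described later in this article, and using Python's `requests` library against the Elasticsearch REST API; the endpoint, index patterns, and time window are placeholders), one could count ERROR entries per module over the same window:

```python
import requests

# Placeholder Elasticsearch endpoint and index patterns; adjust to your deployment.
ES = "http://10.192.31.160:9200"
INDEX_PATTERNS = ["nova-api-log-*", "nova-compute-log-*",
                  "cinder-volume-log-*", "neutron-server-log-*"]

def error_count(index, start, end):
    """Count ERROR-level entries of one module within a time window."""
    query = {
        "query": {"bool": {"filter": [
            {"match": {"level": "ERROR"}},
            {"range": {"@timestamp": {"gte": start, "lte": end}}},
        ]}}
    }
    return requests.post(f"{ES}/{index}/_count", json=query).json().get("count", 0)

if __name__ == "__main__":
    start, end = "2021-06-01T10:00:00", "2021-06-01T10:10:00"  # example window
    for index in INDEX_PATTERNS:
        print(index, error_count(index, start, end))
```

The module whose errors appear first, or spike hardest, within that window is usually the place to start looking.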
Architecture
filebeat -> kafka -> logstash -> elasticsearch -> kibana -> LogChainAnalysis
- filebeat: runs as a container on the controller and compute nodes and collects the log data.
- kafka: as the environment keeps growing, the log volume increases and more product lines are onboarded to the logging service; during traffic peaks the write throughput to ES drops, the CPU is saturated, and the cluster is at constant risk of going down. Putting a message queue in front of ES smooths out these peaks.
- Logstash: turns the unstructured data into structured data.
- Elasticsearch: stores the log data and exposes the indices to Kibana and LogChainAnalysis for analysis.
- Kibana: the dashboard for browsing logs.
- LogChainAnalysis: displays the structured data.
Log collection for the cloud management platform
The cloud-management logs are the first logs we collect and process, and I will start from here to build up the whole chain step by step.
A cloud management platform usually runs a large number of services, and these services produce a large amount of logs, for example the resource, identity, and gateway services. Among them, only the resource service calls the OpenStack APIs to operate on virtual resources, so it is the resource service's logs that we need to post-process. When the cloud management platform sends a request to OpenStack, OpenStack, once the request passes authentication, returns the x-openstack-request-id and x-compute-request-id response headers. We can use this request-id to look up the detailed log entries in each of the underlying components' logs.
Moreover, from the processed, structured log we can also read the resource ID used by the cloud management platform (which I refer to as the UUID below). The UUID and the request-id therefore form a mapping, and this mapping is a key building block of LogChainAnalysis; a small sketch follows.
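To make that mapping concrete, here is a minimal sketch (the endpoint, token, and platform UUID are hypothetical placeholders, and the `requests` library is my own assumption) of how a call to the OpenStack API yields the request-id headers mentioned above, which the resource service can then record next to its own resource UUID:

```python
import requests

# Hypothetical values, for illustration only.
NOVA_ENDPOINT = "http://controller:8774/v2.1/servers"
TOKEN = "<keystone-token>"
platform_uuid = "<cloud-platform-resource-uuid>"  # the UUID used on the platform side

resp = requests.get(NOVA_ENDPOINT, headers={"X-Auth-Token": TOKEN})

# Nova returns its request id in the response headers.
request_id = (resp.headers.get("x-openstack-request-id")
              or resp.headers.get("x-compute-request-id"))

# Logging this pair is exactly the UUID -> request-id mapping described above.
print(f"uuid={platform_uuid} request_id={request_id}")
```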
Log collection for OpenStack
OpenStack produces a huge amount of logs, and finding the source of a problem across them is tedious. Each OpenStack service returns a request ID header with its HTTP response, and this value is useful for tracking problems down in the logs. However, tracing becomes difficult once an operation crosses service boundaries, because every service generates a new ID for each inbound request: nova's request-id does not help the user find the debugging information of the other services nova called while completing the request. This becomes especially problematic when many requests are being handled at the same time.
The request-id is generated when request processing begins, and it is the ID the user sees when the request returns. When nova calls another OpenStack service such as glance, that service sends its response with its own request ID header. By logging the mapping between the two request IDs (nova -> glance), the user can easily look up in the nova-compute log the request ID that glance returned. With the glance request ID in hand, the user can then check the glance logs for the debugging information that corresponds to the nova request. Producing such a log message therefore requires two request IDs: one generated by nova and one contained in the response from the other service; the request ID generated by nova lives in the context passed to the Python client wrapper. This is the idea on which I later built LogChainAnalysis; a minimal sketch of the chain-walking follows.
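A minimal sketch of that chain-walking idea, using nothing but regular expressions over raw nova-compute log lines (the log path is a placeholder, and the exact line layout depends on your oslo.log configuration):

```python
import re

NOVA_LOG = "/var/log/nova/nova-compute.log"               # placeholder path
parent_req = "req-d9e461b1-860e-4b50-9d5a-55b66371032a"    # the nova request-id

REQ_ID = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

child_ids = set()
with open(NOVA_LOG) as f:
    for line in f:
        if parent_req in line:
            # Any other request-id on the same line belongs to a downstream
            # call (glance, cinder, neutron, ...) made for this request.
            child_ids.update(r for r in REQ_ID.findall(line) if r != parent_req)

print("request-ids to look up in the other services' logs:", child_ids)
```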
Deployment environment
Deploying Elasticsearch
1. Prepare the environment
Prepare three Linux machines; this tutorial uses CentOS 7 with the following IP addresses:
10.192.31.160
10.192.31.161
10.192.31.162
2. Download Elasticsearch
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.1-linux-x86_64.tar.gz
3. System settings
Elasticsearch cannot be started as the root user, so we create a regular user on each of the three machines:
# create the elastic user
useradd elastic
# set the user's password
passwd elastic
# switch to the elastic user
su elastic
On each of the three machines, create an elasticsearch directory under /home/elastic/, and inside it create data and logs directories:
cd /home/elastic/
mkdir -p elasticsearch/data
mkdir -p elasticsearch/logs
In production we store the index data that Elasticsearch generates in our own directories:
data: stores the Elasticsearch index data
logs: stores the log files
4. Configure Elasticsearch
First upload the downloaded elasticsearch-7.5.1-linux-x86_64.tar.gz archive to the /home/elastic/elasticsearch/ directory on each of the three machines; it does not matter which machine you start with.
Extract elasticsearch-7.5.1-linux-x86_64.tar.gz:
tar -xvf elasticsearch-7.5.1-linux-x86_64.tar.gz
5. Edit elasticsearch.yml
Open the elasticsearch.yml configuration file with the following command:
vi elasticsearch-7.5.1/config/elasticsearch.yml
The main settings to change are listed below (these are the master node's values; on the other two nodes only the network.host parameter needs to be adjusted, the rest stays the same):
- http.port: the port Elasticsearch listens on; the default `9200` is usually fine, but it can be changed.
- network.host: the IP address Elasticsearch binds to, through which other hosts reach this node; usually the machine's own IP, or `0.0.0.0` (reachable from any address).
- discovery.seed_hosts: the IP addresses of all Elasticsearch nodes.
- cluster.initial_master_nodes: the nodes eligible to be elected as master.
- xpack.monitoring.collection.enabled: whether to collect monitoring data; defaults to false (not collected).
6. Start Elasticsearch
Elasticsearch can be started in the background: ./bin/elasticsearch -d
Start Elasticsearch on all three machines; it is best to wait until one node has started successfully before starting the next.
7. Check the cluster
The three-node cluster is now set up and running.
Next, verify that the cluster has actually formed by sending an HTTP request to any of the three servers: [http://10.192.31.160:9200/_cat/health?v](http://10.192.31.160:9200/_cat/health?v)
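The same check can be scripted; here is a small sketch using Python's `requests` library (my own addition, not part of the original steps) that asks any reachable node for the cluster health:

```python
import requests

# Any of the three nodes can answer cluster-level APIs.
for node in ("10.192.31.160", "10.192.31.161", "10.192.31.162"):
    try:
        health = requests.get(f"http://{node}:9200/_cluster/health", timeout=5).json()
        print(node, health["status"], "nodes:", health["number_of_nodes"])
        break  # one successful answer is enough
    except requests.RequestException as exc:
        print(node, "not reachable:", exc)
```

A healthy three-node cluster should report status green and number_of_nodes equal to 3.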
Deploying Kibana
Containerized deployment:
docker pull kibana:7.10.1
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://10.192.31.160:9200 -p 5601:5601 -d kibana:7.10.1
Deploying Kafka
1. Prepare the environment
Install Java: on CentOS 7, `yum install java -y` installs Java 8 by default.
Download Kafka from: https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/
2. Configure and start Kafka
Start ZooKeeper first, then Kafka:
./zookeeper-server-start.sh ../config/zookeeper.properties
./kafka-server-start.sh ../config/server.properties
3. Configure systemd services
vim /etc/systemd/system/zookeeper.service
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/root/kafka/bin/zookeeper-server-start.sh /root/kafka/config/zookeeper.properties
ExecStop=/root/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
User=root
Group=root
[Install]
WantedBy=multi-user.target
vim /etc/systemd/system/kafka.service
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
[Service]
Type=simple
ExecStart=/root/kafka/bin/kafka-server-start.sh /root/kafka/config/server.properties
ExecStop=/root/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
4. Start the services
systemctl start zookeeper    # start zookeeper
systemctl enable zookeeper   # enable at boot
systemctl start kafka        # start kafka
systemctl enable kafka       # enable at boot
5. Verify that Kafka is running
Check whether the topic is receiving data:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic nova-api-log
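If you prefer to check from Python, here is a sketch using the kafka-python package (an assumption on my side; it is not part of this deployment) that reads a few messages from the nova-api-log topic:

```python
from kafka import KafkaConsumer  # assumption: pip install kafka-python

consumer = KafkaConsumer(
    "nova-api-log",
    bootstrap_servers="10.192.31.163:9092",  # the Kafka host used in this setup
    auto_offset_reset="earliest",            # read from the beginning of the topic
    consumer_timeout_ms=5000,                # stop if nothing arrives within 5 s
)

for i, msg in enumerate(consumer):
    print(msg.topic, msg.partition, msg.offset, msg.value[:120])
    if i >= 4:                               # five messages are enough for a smoke test
        break
```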
Deploying Logstash
1. Install with yum
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
Add the following to a file with a .repo suffix in your /etc/yum.repos.d/ directory, for example logstash.repo:
[logstash-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
Install: sudo yum install logstash
2. Edit the configuration files
cd /usr/share/logstash/
vim /etc/logstash/conf.d/logstash.conf
openstack-logstash.conf
input {
kafka {
bootstrap_servers => "10.192.31.163:9092"
topics => ["nova-compute-log", "nova-api-log","nova-scheduler-log", "nova-conductor-log", "cinder-volume-log", "cinder-api-log","cinder-scheduler-log" , "keystone-log", "neutron-server-log", "openvswitch-agent-log", "glance-api-log", "glance-registry-log"]
group_id => "LogChainAnalysis"
decorate_events => true
auto_offset_reset => "latest"
consumer_threads => 5
codec => "json"
}
}
filter{
if [@metadata][kafka][topic] == "nova-compute-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "nova-compute-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if [@metadata][kafka][topic] == "nova-api-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "nova-api-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if [@metadata][kafka][topic] == "nova-scheduler-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "nova-scheduler-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if [@metadata][kafka][topic] == "nova-conductor-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "nova-conductor-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if [@metadata][kafka][topic] == "cinder-volume-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "cinder-volume-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if [@metadata][kafka][topic] == "cinder-api-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "cinder-api-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if [@metadata][kafka][topic] == "cinder-scheduler-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "cinder-scheduler-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if [@metadata][kafka][topic] == "keystone-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "keystone-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if [@metadata][kafka][topic] == "neutron-server-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "neutron-server-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if [@metadata][kafka][topic] == "openvswitch-agent-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "openvswitch-agent-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if [@metadata][kafka][topic] == "glance-api-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "glance-api-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if [@metadata][kafka][topic] == "glance-registry-log" {
grok {
match => { "message" => "(?m)^(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}%{SPACE}%{TIME})%{SPACE}%{NUMBER:pid}?%{SPACE}?%{LOGLEVEL:level} \[?\b%{NOTSPACE:module}\b\]?%{SPACE}\[?\b(?<request_id>req-%{UUID:uuid})%{SPACE}(?<user_id>[a-z0-9]{32}|-)%{SPACE}(?<project_id>[a-z0-9]{32}|-)%{SPACE}-%{SPACE}-%{SPACE}-\]?%{SPACE}?%{GREEDYDATA:logmessage}?"}
}
mutate {
add_field => {"[@metadata][index]" => "glance-registry-log-%{+YYYY.MM.dd}"}
add_field => {"vip" => "%{[fields][vip]}"}
}
}
if ![request_id] { drop {} }
mutate {
remove_field => ["kafka"]
remove_field => ["message"]
}
}
output {
stdout { }
elasticsearch {
hosts => ["http://10.192.31.160:9200", "http://10.192.31.161:9200", "http://10.192.31.162:9200"]
index => "%{[@metadata][index]}"
timeout => 300
}
}
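Before loading this pipeline it can help to dry-run the pattern; the following is a rough Python approximation of the grok expression used above, applied to a made-up oslo.log-style line (both the regex and the sample line are illustrative, not the exact grok semantics):

```python
import re

# Rough Python approximation of the OpenStack grok pattern above.
PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} [\d:.]+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<module>\S+) "
    r"\[(?P<request_id>req-[0-9a-f-]{36}) (?P<user_id>[a-z0-9]{32}|-) "
    r"(?P<project_id>[a-z0-9]{32}|-) - - -\] (?P<logmessage>.*)$"
)

# Made-up sample line, for illustration only.
line = ("2021-06-01 10:15:30.123 2873 INFO nova.compute.manager "
        "[req-d9e461b1-860e-4b50-9d5a-55b66371032a "
        "0123456789abcdef0123456789abcdef fedcba9876543210fedcba9876543210 - - -] "
        "Took 12.34 seconds to build instance.")

m = PATTERN.match(line)
if m:
    print(m.groupdict())
```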
resource-logstash.conf
input {
kafka {
bootstrap_servers => "127.0.0.1:9092"
topics => ["resource-log"]
group_id => "LogChainAnalysis"
decorate_events => true
consumer_threads => 5
auto_offset_reset => "latest"
enable_auto_commit => true
codec => "json"
}
}
filter{
if [@metadata][kafka][topic] == "resource-log" {
if [message] =~ "\tat" {
grok {
match => ["message", "^(\tat)"]
add_tag => ["stacktrace"]
}
}
grok {
match => [ "message",
"(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}T%{TIME})%{SPACE}*%{LOGLEVEL:level}%{SPACE}*%{NOTSPACE:module}[%{UUID:uuid}]- (?<logmessage>.*)"
]
}
mutate {
add_field => {"vip" => "%{[fields][vip]}"}
add_field => {"[@metadata][index]" => "resource-log-%{+YYYY.MM.dd}"}
}
}
mutate {
remove_field => ["kafka"]
remove_field => ["message"]
}
}
output {
stdout { }
elasticsearch {
hosts => ["http://10.192.31.160:9200", "http://10.192.31.161:9200", "http://10.192.31.162:9200"]
index => "%{[@metadata][index]}"
timeout => 300
}
}
3. Start Logstash
systemctl enable logstash
systemctl start logstash
Deploying Filebeat
1. Pull the Filebeat image
docker pull elastic/filebeat:7.10.2
https://www.elastic.co/guide/en/beats/filebeat/current/running-on-docker.html
2. Filebeat configuration file
Create a configuration directory: mkdir -p /data/filebeat
Then edit the Filebeat configuration file at /data/filebeat/filebeat.docker.yml:
filebeat.inputs:
- type: log
enabled: true
paths:
- /log/haihe/resource/resource.log
multiline:
pattern: '^\['
negate: true
match: after
fields:
log_topic: resource-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/nova/nova-api.log
exclude_files: ['\.gz$']
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: nova-api-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/nova/nova-compute.log
exclude_files: ['\.gz$']
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: nova-compute-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/nova/nova-scheduler.log
exclude_files: ['\.gz$']
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: nova-scheduler-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/nova/nova-conductor.log
exclude_files: ['\.gz$']
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: nova-conductor-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/cinder/api.log
exclude_files: ['\.gz$']
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: cinder-api-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/cinder/volume.log
exclude_files: ['\.gz$']
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: cinder-volume-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/cinder/scheduler.log
exclude_files: ['\.gz$']
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: cinder-scheduler-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/neutron/server.log
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: neutron-server-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/neutron/openvswitch-agent.log
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: openvswitch-agent-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/glance/api.log
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: glance-api-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/glance/registry.log
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: glance-registry-log
vip: 172.118.32.30
- type: log
enabled: true
paths:
- /log/keystone/keystone.log
multiline:
pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
negate: true
match: after
fields:
log_topic: keystone-log
vip: 172.118.32.30
output.kafka:
enabled: true
hosts: ['10.192.31.163:9092']
topic: '%{[fields.log_topic]}'
codec.json:
pretty: false
partition.round_robin:
reachable_only: false
required_acks: 1
compression: gzip
Parameter notes:
1. multiline: merges the lines of a multi-line log entry into one event
2. log_topic: the Kafka topic the log is published to
3. vip: the cluster VIP, which we will use later
3. Start the container with Docker
docker run -d \
  --restart=always \
  --log-driver json-file \
  --name=filebeat \
  --user=root \
  --volume="/data/filebeat/filebeat.docker.yml:/usr/share/filebeat/filebeat.yml:ro" \
  --volume="/var/log/:/log/" \
  --volume="/var/lib/docker/containers:/var/lib/docker/containers:ro" \
  --volume="/var/run/docker.sock:/var/run/docker.sock:ro" \
  elastic/filebeat:7.10.2 filebeat -e -strict.perms=false
Deploying LogChainAnalysis
Clone the project from Git:
git clone https://github.com/zelat/LogChainAnalysis
cd LogChainAnalysis
nohup python3 manage.py >/dev/null 2>&1 &
Parsing Spring Boot logs with Logstash
We first look at the logback output of the Spring Boot services; the grok pattern used to match this log format is:
(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}T%{TIME})%{SPACE}*%{LOGLEVEL:level}%{SPACE}*%{NOTSPACE:module}\[%{UUID:uuid}\]- (?<logmessage>.*)
`%-5level` in the logback pattern means the log level is printed left-aligned with a minimum width of 5 characters.
In Logstash we therefore use `%{SPACE}*` to match an arbitrary number of consecutive spaces.
We need to split the log into the following fields:
Name | Description |
---|---|
vip | cluster VIP address |
@timestamp | timestamp |
level | log level, typically INFO, DEBUG, ERROR, etc. |
module | module name |
uuid | upper-layer resource ID |
logmessage | the log message body |
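To sanity-check this outside Logstash, here is a rough Python equivalent of the grok expression, applied to a made-up log line in the same format (both the regex and the sample line are illustrative):

```python
import re

# Rough Python equivalent of the Spring Boot grok pattern above.
PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T[\d:.]+)\s*"
    r"(?P<level>[A-Z]+)\s*"
    r"(?P<module>\S+)\[(?P<uuid>[0-9a-fA-F-]{36})\]- (?P<logmessage>.*)$"
)

# Made-up sample line, for illustration only.
line = ("2021-06-01T10:15:30.123  INFO  ResourceService"
        "[d9e461b1-860e-4b50-9d5a-55b66371032a]- create server request accepted")

m = PATTERN.match(line)
if m:
    print(m.groupdict())
```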
Parsing OpenStack logs with Logstash
Split the OpenStack logs to obtain the vip, @timestamp, level, request_id, project_id, user_id, module, and logmessage fields:
NAME | DESCRIPTION |
---|---|
vip | cluster VIP address |
@timestamp | timestamp |
level | log level, typically INFO, DEBUG, ERROR, etc. |
request_id | OpenStack request ID |
project_id | OpenStack project ID |
user_id | OpenStack user ID |
module | module name |
logmessage | the log message body |
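Once these fields are indexed, collecting every entry that belongs to one request across all the OpenStack indices is a single query; a sketch using Python's `requests` library (the endpoint and request-id are examples):

```python
import requests

ES = "http://10.192.31.160:9200"
request_id = "req-d9e461b1-860e-4b50-9d5a-55b66371032a"  # example request-id

query = {
    "size": 100,
    "sort": [{"@timestamp": "asc"}],
    "query": {"match": {"request_id": request_id}},
}

# One wildcard pattern covers all the per-service daily indices created by Logstash.
resp = requests.post(f"{ES}/*-log-*/_search", json=query).json()
for hit in resp["hits"]["hits"]:
    src = hit["_source"]
    print(hit["_index"], src.get("level"), src.get("module"), src.get("logmessage"))
```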
Feeding the data into LogChainAnalysis
1. The LogChainAnalysis interface
vip: enter the cluster VIP address
UUID: the resource ID on the cloud management platform
2. Getting the log chain
A brief explanation of what this JSON file means: the UUID on the cloud-management side corresponds to the underlying request-id req-d9e461b1-860e-4b50-9d5a-55b66371032a, which appears in the logs of the nova-api, nova-compute, nova-conductor, and nova-scheduler components; nova-compute additionally called other services such as cinder-api, neutron-server, and cinder-volume, and each of those components returns its own request-id to nova-compute.
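Written out as an illustrative Python structure (the child request-ids are placeholders, since only the parent request-id is shown here), the chain looks roughly like this:

```python
# Illustrative shape of one log chain as described above; not actual output.
log_chain = {
    "uuid": "<cloud-platform-resource-uuid>",
    "request_id": "req-d9e461b1-860e-4b50-9d5a-55b66371032a",
    "found_in": ["nova-api", "nova-compute", "nova-conductor", "nova-scheduler"],
    "downstream_calls": {
        "cinder-api": "req-<cinder-api-request-id>",
        "neutron-server": "req-<neutron-server-request-id>",
        "cinder-volume": "req-<cinder-volume-request-id>",
    },
}
```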
3. Viewing a specific log entry
Click a log name to see the detailed log message.
Demo
Problems encountered
1. Filebeat keeps printing `[publisher] pipeline/retry.go:219 retryer: send unwait signal to consumer`. Cause: Filebeat may be unable to connect to Kafka; edit Kafka's server.properties and set the advertised listener to the internal IP of the Kafka host: advertised.listeners=PLAINTEXT://192.168.1.142:9092
2. The log volume is too large
Add the following to the Logstash configuration to drop events that do not carry a request_id:
if ![request_id] { drop {} }
3. Sending back the host IP on which Filebeat runs. Add the following to the Filebeat configuration file:
processors:
- add_host_metadata: ~
- drop_fields:
fields: ["host.architecture", "host.containerized", "host.id", "host.os.name", "host.os.family", "host.os.version", "host.os.kernel"]
4. The Elasticsearch cluster fails to start with `failed to send join request to master`: https://blog.csdn.net/diyiday/article/details/83926488
5. After the cluster starts, Elasticsearch reports `master_not_discovered_exception`: https://yanglinwei.blog.csdn.net/article/details/105274464