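Customers often notice problems in our product before we do, and we only find out when they report them. To change that, we need to monitor the exceptions our services raise. Conveniently, SBA2 (Spring Boot Admin 2) provides exception statistics for HTTP requests, so with a little design we can send a notification every time a new exception occurs. The principle is simple: if the latest cumulative exception count is greater than the previously recorded value, a new exception has occurred.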
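The comparison rule can be illustrated in isolation. The following is a minimal sketch, not the actual implementation: `CounterDeltaSketch` and `observe` are hypothetical names, and the counts are passed in by hand instead of being read from a service. It assumes that the first observation only records a baseline, and that a drop in the count (for example after the monitored service restarts) re-baselines without alerting.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterDeltaSketch {
    // Last cumulative count seen per exception name (hypothetical helper, not part of SBA2).
    private final Map<String, Integer> lastSeen = new HashMap<>();

    /** Returns true only when the cumulative count grew since the previous observation. */
    public boolean observe(String exception, int currentCount) {
        // Map.put returns the previous value, or null if this is the first observation.
        Integer previous = lastSeen.put(exception, currentCount);
        // A first observation records a baseline; a lower value silently re-baselines.
        return previous != null && currentCount > previous;
    }

    public static void main(String[] args) {
        CounterDeltaSketch sketch = new CounterDeltaSketch();
        System.out.println(sketch.observe("NullPointerException", 3)); // false: baseline
        System.out.println(sketch.observe("NullPointerException", 3)); // false: unchanged
        System.out.println(sketch.observe("NullPointerException", 5)); // true: count grew
        System.out.println(sketch.observe("NullPointerException", 0)); // false: counter reset
    }
}
```

Because the counter is cumulative, a single `true` means "at least one new exception since the last check", not exactly one.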
Here is what the alert looks like:
As before, straight to the code:
NotifierAutoConfiguration.exceptionAlarm
@Bean(initMethod = "start", destroyMethod = "stop")
@ConditionalOnProperty(prefix = "spring.boot.admin.notify.exception", name = "enabled", havingValue = "true")
@ConfigurationProperties("spring.boot.admin.notify.exception")
@Lazy(false)
public ExceptionAlarm exceptionAlarm(InstanceRepository repository, AlarmMessage alarmMessage) {
    return new ExceptionAlarm(repository, alarmMessage);
}
ExceptionAlarm
@Slf4j
public class ExceptionAlarm {
    private final RestTemplate restTemplate = new RestTemplate();
    private Scheduler scheduler;
    private Disposable subscription;
    private InstanceRepository repository;
    private AlarmMessage alarmMessage;
    /**
     * Switch that turns the check on or off
     */
    private boolean enabled = true;
    /**
     * Check interval, in seconds
     */
    private long interval = 10;
    /**
     * Exceptions to exclude from alerting
     */
    private String exclude = "None,BizException";
    /**
     * Instances to exclude from alerting
     */
    private String excludeInstances = "";
    /**
     * Alert template; reads "Service instance [%s] raised exception [%s]"
     */
    private final String ALARM_TPL = "服务实例【%s】,发生异常【%s】";
    /**
     * Exception counts recorded at the last check: instance name -> (exception name -> count)
     */
    private final Map<String, Map<String, Integer>> instanceCount = new HashMap<>();

    public ExceptionAlarm(InstanceRepository repository, AlarmMessage alarmMessage) {
        this.repository = repository;
        this.alarmMessage = alarmMessage;
    }

    private void checkFn(Long aLong) {
        if (!enabled) {
            return;
        }
        log.debug("check exception for all instances");
        // Check every registered instance, skipping the excluded ones
        repository.findAll()
                .filter(instance -> !excludeInstances.contains(instance.getRegistration().getName()))
                .map(instance -> {
                    String instanceName = instance.getRegistration().getName();
                    List<String> exceptionList = getExceptionTag(instance);
                    for (String exception : exceptionList) {
                        Integer value = getValue(instance, exception);
                        Integer lastValue = instanceCount.getOrDefault(instanceName, new HashMap<>()).get(exception);
                        // A count greater than the recorded value means a new exception has occurred
                        if (lastValue != null && value > lastValue) {
                            String content = String.format(ALARM_TPL, instanceName, exception);
                            alarmMessage.sendData(content);
                            instanceCount.get(instanceName).put(exception, value);
                        } else {
                            // First sighting or no increase: just record the current count
                            Map<String, Integer> map = instanceCount.getOrDefault(instanceName, new HashMap<>());
                            map.put(exception, value);
                            instanceCount.put(instanceName, map);
                        }
                    }
                    return Mono.just(0d);
                }).subscribe();
    }

    private Integer getValue(Instance instance, String tags) {
        // Query the cumulative count of HTTP requests that ended with the given exception
        String reqUrl = instance.getRegistration().getManagementUrl() + "/metrics/http.server.requests?tag=exception:" + tags;
        ResponseEntity<String> responseEntity = restTemplate.getForEntity(reqUrl, String.class);
        String body = responseEntity.getBody();
        JSONObject bodyObject = JSON.parseObject(body);
        JSONArray measurementsArray = bodyObject.getJSONArray("measurements");
        if (measurementsArray != null && !measurementsArray.isEmpty()) {
            return measurementsArray.getJSONObject(0).getInteger("value");
        }
        return 0;
    }

    private List<String> getExceptionTag(Instance instance) {
        try {
            String reqUrl = instance.getRegistration().getManagementUrl() + "/metrics/http.server.requests";
            log.debug("check jvm {},uri {}", instance.getRegistration().getName(), reqUrl);
            ResponseEntity<String> responseEntity = restTemplate.getForEntity(reqUrl, String.class);
            String body = responseEntity.getBody();
            JSONObject bodyObject = JSON.parseObject(body);
            JSONArray tagsArray = bodyObject.getJSONArray("availableTags");
            if (tagsArray != null && !tagsArray.isEmpty()) {
                for (Object tag : tagsArray) {
                    JSONObject tagObject = (JSONObject) tag;
                    // Only the values of the "exception" tag are of interest
                    if ("exception".equals(tagObject.getString("tag"))) {
                        List<String> valuesList = tagObject.getJSONArray("values").toJavaList(String.class);
                        return valuesList.stream().filter(s -> !exclude.contains(s)).collect(Collectors.toList());
                    }
                }
            }
        } catch (Exception ex) {
            log.error(ex.getMessage());
        }
        return Collections.emptyList();
    }

    public long getInterval() {
        return interval;
    }

    public void setInterval(long interval) {
        this.interval = interval;
    }

    public String getExclude() {
        return exclude;
    }

    public void setExclude(String exclude) {
        this.exclude = exclude;
    }

    public boolean isEnabled() {
        return enabled;
    }

    public void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    public String getExcludeInstances() {
        return excludeInstances;
    }

    public void setExcludeInstances(String excludeInstances) {
        this.excludeInstances = excludeInstances;
    }

    private void start() {
        this.scheduler = Schedulers.newSingle("exception-check");
        this.subscription = Flux.interval(Duration.ofSeconds(this.interval)).subscribeOn(this.scheduler).subscribe(this::checkFn);
        initInstanceCount();
    }

    private void initInstanceCount() {
        repository.findAll().map(instance -> {
            String instanceName = instance.getRegistration().getName();
            List<String> exceptionList = getExceptionTag(instance);
            for (String exception : exceptionList) {
                Integer value = getValue(instance, exception);
                Map<String, Integer> map = instanceCount.getOrDefault(instanceName, new HashMap<>());
                map.put(exception, value);
                instanceCount.put(instanceName, map);
            }
            return Mono.just(0d);
        }).subscribe();
    }

    private void stop() {
        if (this.subscription != null) {
            this.subscription.dispose();
            this.subscription = null;
        }
        if (this.scheduler != null) {
            this.scheduler.dispose();
            this.scheduler = null;
        }
    }
}
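In this class, the following field:
/**
 * Exceptions to exclude from alerting
 */
private String exclude = "None,BizException";
lists the exceptions that should not trigger an alert. Many of our business exceptions are also recorded by SBA2, and since they carry little diagnostic value, we exclude them.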
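One detail worth knowing: `getExceptionTag` tests each name with `exclude.contains(s)`, which is a substring match on the whole setting. With the default value, a tag named plain `Exception` would therefore also be skipped, because it is a substring of `BizException`. If exact matching is preferred, one possible variant is sketched below. `ExcludeFilterSketch`, `parseExcludes`, and `filter` are hypothetical names used for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ExcludeFilterSketch {
    /** Splits a comma-separated setting such as "None,BizException" into exact names. */
    static Set<String> parseExcludes(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    /** Keeps only the exception names that are not listed exactly in the exclude setting. */
    static List<String> filter(List<String> exceptions, String csv) {
        Set<String> excluded = parseExcludes(csv);
        return exceptions.stream()
                .filter(name -> !excluded.contains(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("None", "BizException", "Exception", "NullPointerException");
        // Exact matching keeps "Exception"; the substring check would have dropped it.
        System.out.println(filter(tags, "None,BizException")); // [Exception, NullPointerException]
    }
}
```

Whether substring or exact matching is the better fit depends on how the exclusion list is maintained; substring matching lets one entry cover a family of names, at the cost of accidental matches.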
private void start() {
    this.scheduler = Schedulers.newSingle("exception-check");
    this.subscription = Flux.interval(Duration.ofSeconds(this.interval)).subscribeOn(this.scheduler).subscribe(this::checkFn);
    initInstanceCount();
}

private void initInstanceCount() {
    repository.findAll().map(instance -> {
        String instanceName = instance.getRegistration().getName();
        List<String> exceptionList = getExceptionTag(instance);
        for (String exception : exceptionList) {
            Integer value = getValue(instance, exception);
            Map<String, Integer> map = instanceCount.getOrDefault(instanceName, new HashMap<>());
            map.put(exception, value);
            instanceCount.put(instanceName, map);
        }
        return Mono.just(0d);
    }).subscribe();
}
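This code reloads each service's current exception counts into memory when SBA2 starts. By default the counts are kept only in memory, so they are lost on every restart. Everything else works the same way as the JVM monitoring and is not repeated here.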