diff --git a/doc/images/apollo-client-monitor-jmx.jpg b/doc/images/apollo-client-monitor-jmx.jpg new file mode 100644 index 00000000000..3b0a2f4d75a Binary files /dev/null and b/doc/images/apollo-client-monitor-jmx.jpg differ diff --git a/doc/images/apollo-client-monitor-prometheus.png b/doc/images/apollo-client-monitor-prometheus.png new file mode 100644 index 00000000000..f709bb57d23 Binary files /dev/null and b/doc/images/apollo-client-monitor-prometheus.png differ diff --git a/docs/en/client/java-sdk-user-guide.md b/docs/en/client/java-sdk-user-guide.md index 2f2d887ef77..b4e0366ea10 100644 --- a/docs/en/client/java-sdk-user-guide.md +++ b/docs/en/client/java-sdk-user-guide.md @@ -1281,4 +1281,4 @@ The interface is `com.ctrip.framework.apollo.spi.ConfigServiceLoadBalancerClient The Input is multiple ConfigServices returned by meta server, and the output is a ConfigService selected. -The default service provider is `com.ctrip.framework.apollo.spi.RandomConfigServiceLoadBalancerClient`, which chooses one ConfigService from multiple ConfigServices using random strategy . +The default service provider is `com.ctrip.framework.apollo.spi.RandomConfigServiceLoadBalancerClient`, which chooses one ConfigService from multiple ConfigServices using random strategy . \ No newline at end of file diff --git a/docs/zh/client/java-sdk-user-guide.md b/docs/zh/client/java-sdk-user-guide.md index 4250c2bdb9b..c9b4b27a6a7 100644 --- a/docs/zh/client/java-sdk-user-guide.md +++ b/docs/zh/client/java-sdk-user-guide.md @@ -398,6 +398,48 @@ apollo.label=YOUR-APOLLO-LABEL 3. 
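Via the `app.properties` configuration file
    * You can specify `apollo.override-system-properties=true` in `classpath:/META-INF/app.properties`

+
+#### 1.2.4.9 Enabling client monitoring
+
+> Applies to version 2.4.0 and above
+
+Starting with version 2.4.0, the client's observability has been enhanced. Users can obtain a ConfigMonitor from ConfigService to read client status information directly, and can report that status to a monitoring system as metrics. The relevant configuration options are listed below.
+
+1. Whether to enable the Monitor mechanism
+```properties
+# Whether to enable the Monitor mechanism, i.e. whether ConfigMonitor is enabled; defaults to false
+apollo.client.monitor.enabled = true
+```
+
+2. Whether to expose Monitor data via JMX
+```properties
+# Whether to expose Monitor data via JMX; once enabled, it can be viewed with JConsole or similar tools; defaults to false
+apollo.client.monitor.jmx.enabled = true
+```
+
+3. The maximum number of exceptions the Monitor stores
+
+```properties
+# Maximum number of exceptions the Monitor keeps; defaults to 25, evicted first-in, first-out
+apollo.client.monitor.exception-queue-size = 30
+```
+
+4. The Exporter type of the monitoring system that metrics are exported to
+
+```properties
+# Exporter type of the target monitoring system; for example, after adding apollo-plugin-client-prometheus, set this to prometheus to enable it.
+# The accepted values depend on the MetricsExporter SPI implementations the user has added
+apollo.client.monitor.external.type = prometheus
+```
+
+5. How often the Monitor's status information is exported as metrics
+
+```properties
+# How often the Exporter converts the Monitor's status information into metrics, in seconds; defaults to once every 10 seconds
+apollo.client.monitor.external.export-period = 20
+```
+
+
 # II. Maven Dependency
 The Apollo client jar has been uploaded to the central repository; applications only need to add it as follows.
 ```xml
@@ -491,6 +533,86 @@ ConfigFile configFile = ConfigService.getConfigFile("test", ConfigFileFormat.XML
 String content = configFile.getContent();
 ```

+### 3.1.5 Obtaining client monitoring metrics
+
+In version 2.4.0, apollo-client substantially improved its observability. It provides the ConfigMonitor API as well as JMX and Prometheus metric export. For the configuration that enables these, see [1.2.4.9 Enabling client monitoring](#_1249-enabling-client-monitoring)
+
+
+#### 3.1.5.1 Obtaining monitoring data through ConfigMonitor
+
+```java
+    ConfigMonitor configMonitor = ConfigService.getConfigMonitor();
+    // Exception monitoring API
+    ApolloClientExceptionMonitorApi exceptionMonitorApi = configMonitor.getExceptionMonitorApi();
+    List<Exception> apolloConfigExceptionList = exceptionMonitorApi.getApolloConfigExceptionList();
+    // Namespace monitoring API
+    ApolloClientNamespaceMonitorApi namespaceMonitorApi = configMonitor.getNamespaceMonitorApi();
+    List<String> namespace404 = namespaceMonitorApi.getNotFoundNamespaces();
+    // Bootstrap arguments monitoring API
+    ApolloClientBootstrapArgsMonitorApi runningParamsMonitorApi = 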
configMonitor.getBootstrapArgsMonitorApi();
+    String bootstrapNamespaces = runningParamsMonitorApi.getBootstrapNamespaces();
+    // Thread pool monitoring API
+    ApolloClientThreadPoolMonitorApi threadPoolMonitorApi = configMonitor.getThreadPoolMonitorApi();
+    ApolloThreadPoolInfo remoteConfigRepositoryThreadPoolInfo = threadPoolMonitorApi.getRemoteConfigRepositoryThreadPoolInfo();
+```
+
+#### 3.1.5.2 Exposing status information via JMX
+
+Enable the relevant configuration
+
+```properties
+apollo.client.monitor.enabled = true
+apollo.client.monitor.jmx.enabled = true
+```
+
+After starting the application, open JConsole or a similar tool to view the data. JConsole is used as the example here.
+
+![showing Apollo client monitoring metrics in JMX](https://cdn.jsdelivr.net/gh/apolloconfig/apollo@master/doc/images/apollo-client-monitor-jmx.jpg)
+
+#### 3.1.5.3 Exporting client metrics to an external monitoring system
+
+Users can integrate Prometheus or another monitoring system as needed. The client provides an SPI for this; see [7.2 MetricsExporter extension](#_7.2_MetricsExporter扩展)
+
+*Metrics reference tables*
+
+**Namespace Metrics**
+
+Corresponding API: ApolloClientNamespaceMonitorApi
+
+| Metric name                                         | Label     | Corresponding Monitor API                    |
+| --------------------------------------------------- | --------- | -------------------------------------------- |
+| apollo_client_namespace_usage_total                 | namespace | namespaceMetrics.getUsageCount()             |
+| apollo_client_namespace_item_num                    | namespace | namespaceMetrics.getFirstLoadTimeSpendInMs() |
+| apollo_client_namespace_not_found                   |           | namespaceMonitorApi.getNotFoundNamespaces()  |
+| apollo_client_namespace_timeout                     |           | namespaceMonitorApi.getTimeoutNamespaces()   |
+| apollo_client_namespace_first_load_time_spend_in_ms | namespace | namespaceMetrics.getLatestUpdateTime()       |
+
+**Thread Pool Metrics**
+
+Corresponding API: ApolloClientThreadPoolMonitorApi
+
+| Metric name                                        | Label            | Corresponding Monitor API                  |
+| -------------------------------------------------- | ---------------- | ------------------------------------------ |
+| apollo_client_thread_pool_pool_size                | thread_pool_name | threadPoolInfo.getPoolSize()               |
+| apollo_client_thread_pool_maximum_pool_size        | thread_pool_name | threadPoolInfo.getMaximumPoolSize()        |
+| apollo_client_thread_pool_largest_pool_size        
| thread_pool_name | threadPoolInfo.getLargestPoolSize()        |
+| apollo_client_thread_pool_completed_task_count     | thread_pool_name | threadPoolInfo.getCompletedTaskCount()     |
+| apollo_client_thread_pool_queue_remaining_capacity | thread_pool_name | threadPoolInfo.getQueueRemainingCapacity() |
+| apollo_client_thread_pool_total_task_count         | thread_pool_name | threadPoolInfo.getTotalTaskCount()         |
+| apollo_client_thread_pool_active_task_count        | thread_pool_name | threadPoolInfo.getActiveTaskCount()        |
+| apollo_client_thread_pool_core_pool_size           | thread_pool_name | threadPoolInfo.getCorePoolSize()           |
+| apollo_client_thread_pool_queue_size               | thread_pool_name | threadPoolInfo.getQueueSize()              |
+
+**Exception Metrics**
+
+Corresponding API: ApolloClientExceptionMonitorApi
+
+| Metric name                       | Corresponding Monitor API                          |
+| --------------------------------- | -------------------------------------------------- |
+| apollo_client_exception_num_total | exceptionMonitorApi.getExceptionCountFromStartup() |
+
+
+
 ## 3.2 Spring integration

 ### 3.2.1 Configuration
@@ -1220,3 +1342,274 @@ The interface is `com.ctrip.framework.apollo.spi.ConfigServiceLoadBalancerClient`.
 The input is the multiple ConfigServices returned by the meta server, and the output is one ConfigService.

 The default service provider is `com.ctrip.framework.apollo.spi.RandomConfigServiceLoadBalancerClient`, which uses a random strategy, that is, it picks one ConfigService at random from the multiple ConfigServices.
+
+
+
+## 7.2 Exporting metrics to Prometheus
+
+Java client version 2.4.0 and above adds support for collecting and exporting metrics. Prometheus is supported by default, and users can extend the client to integrate other monitoring systems.
+
+### Connecting the client to Prometheus
+Add the official dependency
+```xml
+<dependency>
+    <groupId>com.ctrip.framework.apollo</groupId>
+    <artifactId>apollo-plugin-client-prometheus</artifactId>
+    <version>2.4.0</version>
+</dependency>
+```
+Adjust the configuration
+```properties
+apollo.client.monitor.external.type= prometheus
+```
+
+You can then obtain the ExporterData from ConfigMonitor (its format depends on the monitoring system you configured) and expose an endpoint for Prometheus to scrape.
+
+Sample code
+
+```java
+@RestController
+@ResponseBody
+public class TestController {
+
+    @GetMapping("/metrics")
+    public String metrics() {
+        ConfigMonitor configMonitor = ConfigService.getConfigMonitor();
+        return configMonitor.getExporterData();
+    }
+}
+```
+
+After starting the application, configure Prometheus to scrape this endpoint. Logging the requests shows output similar to the following
+
+```
+# TYPE 
apollo_client_thread_pool_active_task_count gauge
+# HELP apollo_client_thread_pool_active_task_count apollo gauge metrics
+apollo_client_thread_pool_active_task_count{thread_pool_name="RemoteConfigRepository"} 0.0
+apollo_client_thread_pool_active_task_count{thread_pool_name="AbstractApolloClientMetricsExporter"} 1.0
+apollo_client_thread_pool_active_task_count{thread_pool_name="AbstractConfigFile"} 0.0
+apollo_client_thread_pool_active_task_count{thread_pool_name="AbstractConfig"} 0.0
+# TYPE apollo_client_namespace_timeout gauge
+# HELP apollo_client_namespace_timeout apollo gauge metrics
+apollo_client_namespace_timeout 0.0
+# TYPE apollo_client_thread_pool_pool_size gauge
+# HELP apollo_client_thread_pool_pool_size apollo gauge metrics
+apollo_client_thread_pool_pool_size{thread_pool_name="RemoteConfigRepository"} 1.0
+apollo_client_thread_pool_pool_size{thread_pool_name="AbstractApolloClientMetricsExporter"} 1.0
+apollo_client_thread_pool_pool_size{thread_pool_name="AbstractConfigFile"} 0.0
+apollo_client_thread_pool_pool_size{thread_pool_name="AbstractConfig"} 0.0
+# TYPE apollo_client_thread_pool_queue_remaining_capacity gauge
+# HELP apollo_client_thread_pool_queue_remaining_capacity apollo gauge metrics
+apollo_client_thread_pool_queue_remaining_capacity{thread_pool_name="RemoteConfigRepository"} 2.147483647E9
+apollo_client_thread_pool_queue_remaining_capacity{thread_pool_name="AbstractApolloClientMetricsExporter"} 2.147483647E9
+apollo_client_thread_pool_queue_remaining_capacity{thread_pool_name="AbstractConfigFile"} 0.0
+apollo_client_thread_pool_queue_remaining_capacity{thread_pool_name="AbstractConfig"} 0.0
+# TYPE apollo_client_exception_num counter
+# HELP apollo_client_exception_num apollo counter metrics
+apollo_client_exception_num_total 1404.0
+apollo_client_exception_num_created 1.729435502796E9
+# TYPE apollo_client_thread_pool_largest_pool_size gauge
+# HELP apollo_client_thread_pool_largest_pool_size apollo gauge metrics
+apollo_client_thread_pool_largest_pool_size{thread_pool_name="RemoteConfigRepository"} 1.0
+apollo_client_thread_pool_largest_pool_size{thread_pool_name="AbstractApolloClientMetricsExporter"} 1.0
+apollo_client_thread_pool_largest_pool_size{thread_pool_name="AbstractConfigFile"} 0.0
+apollo_client_thread_pool_largest_pool_size{thread_pool_name="AbstractConfig"} 0.0
+# TYPE apollo_client_thread_pool_queue_size gauge
+# HELP apollo_client_thread_pool_queue_size apollo gauge metrics
+apollo_client_thread_pool_queue_size{thread_pool_name="RemoteConfigRepository"} 352.0
+apollo_client_thread_pool_queue_size{thread_pool_name="AbstractApolloClientMetricsExporter"} 0.0
+apollo_client_thread_pool_queue_size{thread_pool_name="AbstractConfigFile"} 0.0
+apollo_client_thread_pool_queue_size{thread_pool_name="AbstractConfig"} 0.0
+# TYPE apollo_client_namespace_usage counter
+# HELP apollo_client_namespace_usage apollo counter metrics
+apollo_client_namespace_usage_total{namespace="application"} 11.0
+apollo_client_namespace_usage_created{namespace="application"} 1.729435502791E9
+# TYPE apollo_client_thread_pool_core_pool_size gauge
+# HELP apollo_client_thread_pool_core_pool_size apollo gauge metrics
+apollo_client_thread_pool_core_pool_size{thread_pool_name="RemoteConfigRepository"} 1.0
+apollo_client_thread_pool_core_pool_size{thread_pool_name="AbstractApolloClientMetricsExporter"} 1.0
+apollo_client_thread_pool_core_pool_size{thread_pool_name="AbstractConfigFile"} 0.0
+apollo_client_thread_pool_core_pool_size{thread_pool_name="AbstractConfig"} 0.0
+# TYPE apollo_client_namespace_not_found gauge
+# HELP apollo_client_namespace_not_found apollo gauge metrics
+apollo_client_namespace_not_found 351.0
+# TYPE apollo_client_thread_pool_total_task_count gauge
+# HELP apollo_client_thread_pool_total_task_count apollo gauge metrics
+apollo_client_thread_pool_total_task_count{thread_pool_name="RemoteConfigRepository"} 353.0
+apollo_client_thread_pool_total_task_count{thread_pool_name="AbstractApolloClientMetricsExporter"} 4.0
+apollo_client_thread_pool_total_task_count{thread_pool_name="AbstractConfigFile"} 0.0
+apollo_client_thread_pool_total_task_count{thread_pool_name="AbstractConfig"} 0.0
+# TYPE apollo_client_namespace_first_load_time_spend_in_ms gauge
+# HELP apollo_client_namespace_first_load_time_spend_in_ms apollo gauge metrics
+apollo_client_namespace_first_load_time_spend_in_ms{namespace="application"} 108.0
+# TYPE apollo_client_thread_pool_maximum_pool_size gauge
+# HELP apollo_client_thread_pool_maximum_pool_size apollo gauge metrics
+apollo_client_thread_pool_maximum_pool_size{thread_pool_name="RemoteConfigRepository"} 2.147483647E9
+apollo_client_thread_pool_maximum_pool_size{thread_pool_name="AbstractApolloClientMetricsExporter"} 2.147483647E9
+apollo_client_thread_pool_maximum_pool_size{thread_pool_name="AbstractConfigFile"} 2.147483647E9
+apollo_client_thread_pool_maximum_pool_size{thread_pool_name="AbstractConfig"} 2.147483647E9
+# TYPE apollo_client_namespace_item_num gauge
+# HELP apollo_client_namespace_item_num apollo gauge metrics
+apollo_client_namespace_item_num{namespace="application"} 9.0
+# TYPE apollo_client_thread_pool_completed_task_count gauge
+# HELP apollo_client_thread_pool_completed_task_count apollo gauge metrics
+apollo_client_thread_pool_completed_task_count{thread_pool_name="RemoteConfigRepository"} 1.0
+apollo_client_thread_pool_completed_task_count{thread_pool_name="AbstractApolloClientMetricsExporter"} 3.0
+apollo_client_thread_pool_completed_task_count{thread_pool_name="AbstractConfigFile"} 0.0
+apollo_client_thread_pool_completed_task_count{thread_pool_name="AbstractConfig"} 0.0
+# EOF
+```
+
+The same information is also visible in the Prometheus console
+
+![Prometheus console showing Apollo client metrics](https://cdn.jsdelivr.net/gh/apolloconfig/apollo@master/doc/images/apollo-client-monitor-prometheus.png)
+
+## 7.3 Exporting metrics to a custom monitoring system
+### SkyWalking as an example
+
+Create a SkyWalkingMetricsExporter class that extends AbstractApolloClientMetricsExporter (the general-purpose metrics export framework)
+
+After extending it, the code looks roughly like this
+
+```java
+
+public class SkyWalkingMetricsExporter extends
+    AbstractApolloClientMetricsExporter implements ApolloClientMetricsExporter {
+
+  @Override
+  public void doInit() {
+    // filled in below
+  }
+
+  @Override
+  public boolean isSupport(String form) {
+    // filled in below
+  }
+
+
+  @Override
+  public void registerOrUpdateCounterSample(String name, Map<String, String> tags, double incrValue) {
+    // filled in below
+  }
+
+
+  @Override
+  public void registerOrUpdateGaugeSample(String name, Map<String, String> tags, double value) {
+    // filled in below
+  }
+
+  @Override
+  public String response() {
+    // filled in below
+  }
+}
+
+```
+
+The doInit method is a hook for users to add their own initialization logic. It is called from the init method of AbstractApolloClientMetricsExporter
+
+```java
+  @Override
+  public void init(List collectors, long collectPeriod) {
+    // code
+    doInit();
+    // code
+  }
+```
+
+
+Here the SkyWalking Micrometer dependency is added
+```xml
+<dependency>
+    <groupId>org.apache.skywalking</groupId>
+    <artifactId>apm-toolkit-micrometer-1.10</artifactId>
+</dependency>
+```
+Following Micrometer's conventions, initialize a SkywalkingMeterRegistry along with several maps used to store the metric data
+```java
+  private static final String SKYWALKING = "skywalking";
+  private SkywalkingMeterRegistry registry;
+  private Map<String, Meter> map;
+  private Map<String, Gauge> gaugeMap;
+  private Map<String, AtomicReference<Double>> gaugeValues;
+
+  @Override
+  public void doInit() {
+    registry = new SkywalkingMeterRegistry();
+    map = new ConcurrentHashMap<>();
+    gaugeValues = new ConcurrentHashMap<>();
+    gaugeMap = new ConcurrentHashMap<>();
+  }
+```
+
+The isSupport method is called when DefaultApolloClientMetricsExporterFactory loads MetricsExporter implementations through SPI. It ensures that, when several SPI implementations are present, exactly the Exporter the user configured is enabled
+
+For example, if you want to enable SkyWalking and you define the value of apollo.client.monitor.external.type to be skywalking, implement the method as follows. The comparison is case-sensitive, so the configured value must match the constant exactly
+
+```java
+  @Override
+  public boolean isSupport(String form) {
+    return SKYWALKING.equals(form);
+  }
+```
+
+registerOrUpdateCounterSample and registerOrUpdateGaugeSample are the methods that register Counter and Gauge metrics. They only need to register the metric and update its value based on the arguments passed in
+
+```java
+  @Override
+  public void registerOrUpdateCounterSample(String name, Map<String, String> tags, double incrValue) {
+    String key = name + tags.toString();
+    Counter counter = (Counter) map.get(key);
+
+    if 
(counter == null) {
+      counter = createCounter(name, tags);
+      map.put(key, counter);
+    }
+
+    counter.increment(incrValue);
+  }
+
+  private Counter createCounter(String name, Map<String, String> tags) {
+    return Counter.builder(name)
+        .tags(tags.entrySet().stream()
+            .map(entry -> Tag.of(entry.getKey(), entry.getValue()))
+            .collect(Collectors.toList()))
+        .register(registry);
+  }
+
+
+  @Override
+  public void registerOrUpdateGaugeSample(String name, Map<String, String> tags, double value) {
+    String key = name + tags.toString();
+    Gauge gauge = gaugeMap.get(key);
+    if (gauge == null) {
+      createGauge(name, tags, value);
+    } else {
+      gaugeValues.get(key).set(value);
+    }
+
+  }
+
+  public void createGauge(String name, Map<String, String> tags, double value) {
+    String key = name + tags.toString();
+    // The gauge reads its current value from this holder each time it is sampled
+    AtomicReference<Double> valueHolder = gaugeValues.computeIfAbsent(key, k -> new AtomicReference<>(value));
+    gaugeMap.computeIfAbsent(key, k -> Gauge.builder(name, valueHolder::get)
+        .tags(tags.entrySet().stream()
+            .map(entry -> Tag.of(entry.getKey(), entry.getValue()))
+            .collect(Collectors.toList()))
+        .register(registry));
+  }
+```
+
+The response method exists for monitoring systems that pull metrics, such as Prometheus. SkyWalking more commonly uses push, so no real implementation is needed here; users configure SkyWalking themselves
+
+```java
+  @Override
+  public String response() {
+    // Return whatever response content is needed
+    return "This method does not need to be implemented for SkyWalking's push mode";
+  }
+ }
+```
+
+At this point, the client's metric data is connected to SkyWalking.
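
The `isSupport` check described above is what lets the client pick a single exporter when several SPI implementations are on the classpath. The following is a minimal, self-contained sketch of that selection idea only. `ExporterSelectionSketch`, `Exporter`, `NamedExporter`, and `select` are hypothetical names introduced for illustration; this is not the actual `DefaultApolloClientMetricsExporterFactory` code, and it assumes the factory simply takes the first implementation whose `isSupport` returns true.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ExporterSelectionSketch {

  /** Hypothetical stand-in for the exporter SPI; only isSupport is modeled here. */
  interface Exporter {
    boolean isSupport(String form);
  }

  /** Hypothetical exporter that supports exactly one configured type name. */
  static final class NamedExporter implements Exporter {
    private final String name;

    NamedExporter(String name) {
      this.name = name;
    }

    @Override
    public boolean isSupport(String form) {
      return name.equals(form);
    }

    String name() {
      return name;
    }
  }

  /** Returns the first candidate that reports support for the configured type, if any. */
  static Optional<NamedExporter> select(List<NamedExporter> candidates, String configuredType) {
    return candidates.stream()
        .filter(exporter -> exporter.isSupport(configuredType))
        .findFirst();
  }

  public static void main(String[] args) {
    List<NamedExporter> candidates =
        Arrays.asList(new NamedExporter("prometheus"), new NamedExporter("skywalking"));
    // The type would normally come from apollo.client.monitor.external.type.
    System.out.println(select(candidates, "skywalking").map(NamedExporter::name).orElse("none"));
    // String.equals is case-sensitive, so a differently cased value matches nothing.
    System.out.println(select(candidates, "skyWalking").map(NamedExporter::name).orElse("none"));
  }
}
```

Because the comparison in this sketch is case-sensitive, a configured value such as `skyWalking` would not select an exporter whose constant is `skywalking`; keeping the configured value and the constant identical avoids that.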