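Anyone using Spring Cloud's gateway for the first time is bound to be confused by all its different timeouts: Hystrix has some, and so does Ribbon. We also set them more or less at random at first, and few projects on GitHub get them right. Our look into timeouts started with a warning in the log: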
The Hystrix timeout of 60000 ms for the command “foo” is set lower than the combination of the Ribbon read and connect timeout, 200000ms.
Hystrix timeout

The warning is logged from AbstractRibbonCommand.java, so we might as well dig into the source.
Assumption: the gateway forwards requests to a service with serviceName=foo.
```java
protected static int getHystrixTimeout(IClientConfig config, String commandKey) {
    // Total time Ribbon may spend on a request, including retries
    int ribbonTimeout = getRibbonTimeout(config, commandKey);
    DynamicPropertyFactory dynamicPropertyFactory = DynamicPropertyFactory.getInstance();
    int defaultHystrixTimeout = dynamicPropertyFactory.getIntProperty(
            "hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds", 0).get();
    int commandHystrixTimeout = dynamicPropertyFactory.getIntProperty(
            "hystrix.command." + commandKey + ".execution.isolation.thread.timeoutInMilliseconds", 0).get();
    int hystrixTimeout;
    // Precedence: per-command value > default value > Ribbon total
    if (commandHystrixTimeout > 0) {
        hystrixTimeout = commandHystrixTimeout;
    } else if (defaultHystrixTimeout > 0) {
        hystrixTimeout = defaultHystrixTimeout;
    } else {
        hystrixTimeout = ribbonTimeout;
    }
    // Warn when Hystrix would give up before Ribbon has finished its retries
    if (hystrixTimeout < ribbonTimeout) {
        LOGGER.warn("The Hystrix timeout of " + hystrixTimeout + "ms for the command " + commandKey
                + " is set lower than the combination of the Ribbon read and connect timeout, "
                + ribbonTimeout + "ms.");
    }
    return hystrixTimeout;
}
```
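The following sketch lets you try the precedence rules without a running Archaius setup. Its assumptions: a plain Map<String, Integer> stands in for DynamicPropertyFactory (the real code reads properties dynamically), and the class and method names are hypothetical. The sketch uses the 200000 ms Ribbon total reported in the log and needs Java 9+ for Map.of.

```java
import java.util.Map;

public class HystrixTimeoutSketch {

    // Mirrors the precedence in getHystrixTimeout: per-command value first,
    // then hystrix.command.default, then fall back to the Ribbon total.
    // A plain Map replaces Archaius' DynamicPropertyFactory (assumption).
    public static int resolve(Map<String, Integer> props, String commandKey, int ribbonTimeout) {
        String prefix = "hystrix.command.";
        String suffix = ".execution.isolation.thread.timeoutInMilliseconds";
        int defaultTimeout = props.getOrDefault(prefix + "default" + suffix, 0);
        int commandTimeout = props.getOrDefault(prefix + commandKey + suffix, 0);

        int hystrixTimeout;
        if (commandTimeout > 0) {
            hystrixTimeout = commandTimeout;
        } else if (defaultTimeout > 0) {
            hystrixTimeout = defaultTimeout;
        } else {
            hystrixTimeout = ribbonTimeout;
        }
        if (hystrixTimeout < ribbonTimeout) {
            System.out.println("WARN: Hystrix timeout " + hystrixTimeout + "ms for command "
                    + commandKey + " is lower than Ribbon total " + ribbonTimeout + "ms");
        }
        return hystrixTimeout;
    }

    public static void main(String[] args) {
        // Only the default Hystrix timeout is set, as in our configuration
        Map<String, Integer> props = Map.of(
                "hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds", 60000);
        System.out.println(resolve(props, "foo", 200000)); // prints the warning, then 60000
    }
}
```

If you set hystrix.command.foo.execution.isolation.thread.timeoutInMilliseconds, it overrides the default for foo only. If you set neither key, the Hystrix timeout simply follows the Ribbon total.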
Next, here is our configuration:
```yaml
hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 60000

ribbon:
  ReadTimeout: 50000
  ConnectTimeout: 50000
  MaxAutoRetries: 0
  MaxAutoRetriesNextServer: 1
```
Ribbon timeout

Here Ribbon's ReadTimeout and ConnectTimeout are both 50000 ms. Why, then, does the log report a Ribbon timeout of 200000 ms?
Let's keep reading the source:
```java
protected static int getRibbonTimeout(IClientConfig config, String commandKey) {
    int ribbonTimeout;
    if (config == null) {
        // No client config: use the built-in defaults, no retries counted
        ribbonTimeout = RibbonClientConfiguration.DEFAULT_READ_TIMEOUT
                + RibbonClientConfiguration.DEFAULT_CONNECT_TIMEOUT;
    } else {
        int ribbonReadTimeout = getTimeout(config, commandKey, "ReadTimeout",
                IClientConfigKey.Keys.ReadTimeout, RibbonClientConfiguration.DEFAULT_READ_TIMEOUT);
        int ribbonConnectTimeout = getTimeout(config, commandKey, "ConnectTimeout",
                IClientConfigKey.Keys.ConnectTimeout, RibbonClientConfiguration.DEFAULT_CONNECT_TIMEOUT);
        int maxAutoRetries = getTimeout(config, commandKey, "MaxAutoRetries",
                IClientConfigKey.Keys.MaxAutoRetries, DefaultClientConfigImpl.DEFAULT_MAX_AUTO_RETRIES);
        int maxAutoRetriesNextServer = getTimeout(config, commandKey, "MaxAutoRetriesNextServer",
                IClientConfigKey.Keys.MaxAutoRetriesNextServer,
                DefaultClientConfigImpl.DEFAULT_MAX_AUTO_RETRIES_NEXT_SERVER);
        // Each attempt may take read + connect time; attempts =
        // (retries on the same server + 1) * (additional servers + 1)
        ribbonTimeout = (ribbonReadTimeout + ribbonConnectTimeout)
                * (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }
    return ribbonTimeout;
}
```
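So ribbonTimeout is a total budget that covers every retry. With our configuration it is (50000 + 50000) * (0 + 1) * (1 + 1) = 200000 ms, exactly the number in the log. The author therefore expects hystrixTimeout to be larger than ribbonTimeout. Otherwise Hystrix times the command out while Ribbon is still retrying, and the retries become pointless.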
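The arithmetic can be checked with a small sketch that copies the formula from getRibbonTimeout into a standalone method. The class and method names are hypothetical, and the sketch skips the config lookup by taking the four values as plain ints.

```java
public class RibbonTimeoutSketch {

    // Same formula as getRibbonTimeout: every attempt may spend up to
    // ReadTimeout + ConnectTimeout, with (MaxAutoRetries + 1) attempts on
    // each of (MaxAutoRetriesNextServer + 1) servers.
    public static int ribbonTimeout(int readTimeout, int connectTimeout,
                                    int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (readTimeout + connectTimeout)
                * (maxAutoRetries + 1)
                * (maxAutoRetriesNextServer + 1);
    }

    public static void main(String[] args) {
        // Values from the ribbon section of our configuration
        int total = ribbonTimeout(50000, 50000, 0, 1);
        System.out.println(total);         // 200000
        // The configured Hystrix default is smaller, hence the warning
        System.out.println(60000 < total); // true
    }
}
```

To silence the warning and give every retry a chance, the Hystrix timeout would need to be at least 200000 ms. The other option is to lower the Ribbon timeouts or retry counts.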
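Per-service Ribbon settings

That answers the original question. Hystrix also lets you set a timeout per command (that is, per service), but can Ribbon do the same? Back to the source, this time DefaultClientConfigImpl.java: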
```java
public <T> T get(IClientConfigKey<T> key, T defaultValue) {
    T value = get(key);
    if (value == null) {
        value = defaultValue;
    }
    return value;
}

protected Object getProperty(String key) {
    if (enableDynamicProperties) {
        String dynamicValue = null;
        DynamicStringProperty dynamicProperty = dynamicProperties.get(key);
        if (dynamicProperty != null) {
            dynamicValue = dynamicProperty.get();
        }
        if (dynamicValue == null) {
            // Per-client key first, e.g. foo.ribbon.ReadTimeout
            dynamicValue = DynamicProperty.getInstance(getConfigKey(key)).getString();
            if (dynamicValue == null) {
                // Then the global key, e.g. ribbon.ReadTimeout
                dynamicValue = DynamicProperty.getInstance(getDefaultPropName(key)).getString();
            }
        }
        if (dynamicValue != null) {
            return dynamicValue;
        }
    }
    // Finally, properties set directly on this config object
    return properties.get(key);
}
```
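Taking our service and the ReadTimeout key as an example:
getConfigKey(key) returns foo.ribbon.ReadTimeout
getDefaultPropName(key) returns ribbon.ReadTimeout

The answer is clear. To override a Ribbon property for a single service, use {serviceName}.ribbon.{propertyName}. The per-service key is checked first, and ribbon.{propertyName} acts as the global fallback.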
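The lookup order can be modelled in a few lines. This is a simplified sketch under stated assumptions. A HashMap stands in for Archaius' global configuration. configKey and defaultPropName are hypothetical stand-ins for getConfigKey and getDefaultPropName, with the namespace hard-coded to "ribbon". The dynamicProperties cache and the properties fallback of the real getProperty are omitted.

```java
import java.util.HashMap;
import java.util.Map;

public class RibbonPropertyLookupSketch {

    // Stand-in for Archaius' global configuration (assumption: a plain Map)
    private final Map<String, String> config;
    private final String clientName;

    public RibbonPropertyLookupSketch(String clientName, Map<String, String> config) {
        this.clientName = clientName;
        this.config = config;
    }

    // Hypothetical analogue of getConfigKey: "{serviceName}.ribbon.{propertyName}"
    String configKey(String property) {
        return clientName + ".ribbon." + property;
    }

    // Hypothetical analogue of getDefaultPropName: "ribbon.{propertyName}"
    String defaultPropName(String property) {
        return "ribbon." + property;
    }

    // Per-service value first, then the global ribbon.* value, then the fallback
    public String get(String property, String fallback) {
        String value = config.get(configKey(property));
        if (value == null) {
            value = config.get(defaultPropName(property));
        }
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("ribbon.ReadTimeout", "50000");     // applies to every service
        config.put("foo.ribbon.ReadTimeout", "10000"); // applies to foo only

        System.out.println(new RibbonPropertyLookupSketch("foo", config).get("ReadTimeout", "1000")); // 10000
        System.out.println(new RibbonPropertyLookupSketch("bar", config).get("ReadTimeout", "1000")); // 50000
    }
}
```

Service foo picks up its own ReadTimeout, while any other service falls back to the global ribbon.ReadTimeout.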
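Summary

The code that reads Ribbon and Hystrix settings is somewhat convoluted, which is why people often struggle to configure them. Spring Cloud's code keeps evolving, and both the examples on GitHub and the documentation may lag behind it. Reading the source and stepping through it in a debugger is the surest way to find out what actually happens.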