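Resurrector: automatic repair
The Resurrector is a plugin for the Health Monitor that automatically recreates VMs that can no longer be reached. When the Health Monitor stops receiving heartbeats from an Agent, it submits a task to the Director to try to resurrect the VM. The Director then does one of two things:
- Creates a new VM if the old one can no longer be found.
- Replaces the VM if the Agent on that VM is unresponsive.
Resurrection can be turned off from the command line with bosh vm resurrection. The Resurrector plugin is disabled by default in the Health Monitor.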
Enabling the Resurrector
- Edit the deployment manifest of the Health Monitor (typically colocated with the Director):
  properties:
    hm:
      resurrector_enabled: true
- Optional configuration parameters:
  minimum_down_jobs: If the total number of instances that are down in a deployment (within time interval T, set by time_threshold) is below this number, the Resurrector will always request to fix instances.
  properties:
    hm:
      resurrector_enabled: true
      resurrector:
        minimum_down_jobs: 5    # if fewer instances than this are down, always attempt repair; takes precedence over percent_threshold
        percent_threshold: 0.2  # fraction of the deployment that is down; 0.2 means 20%
        time_threshold: 600     # time window, in seconds, over which the values above are counted
Using UAA user management
  properties:
    uaa:
      clients:
        hm:
          override: true
          authorized-grant-types: client_credentials
          scope: ""
          authorities: bosh.admin
          secret: "hm-password"

  properties:
    hm:
      director_account:
        client_id: hm
        client_secret: "hm-password"
Using preconfigured users
  properties:
    director:
      user_management:
        provider: local
        local:
          users:
          - {name: admin, password: admin-password}
          - {name: hm, password: hm-password}

  properties:
    hm:
      director_account:
        user: hm
        password: hm-password
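The hm fragments above (enabling the Resurrector and pointing the Health Monitor at a Director account) share the properties.hm key, so in a single manifest they would be merged rather than written as separate properties blocks. A minimal sketch for the preconfigured-user case, assuming all of these properties live in the same manifest; the passwords are the placeholder values used above:

```yaml
properties:
  director:
    user_management:
      provider: local
      local:
        users:
        - {name: admin, password: admin-password}
        - {name: hm, password: hm-password}   # account the Health Monitor logs in with
  hm:
    resurrector_enabled: true
    director_account:
      user: hm                # must match a user defined under director.user_management
      password: hm-password
```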
Finally, deploy.
Customizing for your deployment
Small Deployment
If your deployment consists of only five VMs, you may not want the Resurrector to attempt to recreate your entire deployment in the event of a catastrophic failure. In this scenario, we recommend that you set minimum_down_jobs to 1 or 2.
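As an illustration of the recommendation above, a manifest fragment for a five-VM deployment might look like the following. This is a sketch only: the value 2 is one of the two suggested settings, and the other thresholds are left out on the assumption that their defaults apply.

```yaml
properties:
  hm:
    resurrector_enabled: true
    resurrector:
      minimum_down_jobs: 2   # suggested range for a small deployment is 1 or 2
```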
Large Deployment
If your deployment consists of 1000 VMs, and you use the defaults, the Resurrector notifies the Director to recreate at least five VMs and up to 200 VMs. Depending on your deployment, you may consider even 100 down instances a catastrophic failure. In this scenario, set percent_threshold to 5% so that the Director resurrects 50 instances or fewer.
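The arithmetic behind those numbers: the default percent_threshold of 0.2 applied to 1000 VMs gives 0.2 × 1000 = 200, and lowering it to 5% gives 0.05 × 1000 = 50. A sketch of the corresponding fragment, assuming the parameter is written as a fraction (as the default of 0.2 suggests) rather than as a whole-number percentage:

```yaml
properties:
  hm:
    resurrector_enabled: true
    resurrector:
      percent_threshold: 0.05   # 5% of 1000 VMs = at most 50 instances resurrected
```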
Disabling the Resurrector
- Change the configuration:
  properties:
    hm:
      resurrector_enabled: false
- Redeploy.
- (Optional) Remove the user account used by the Health Monitor.
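Viewing Resurrector activity
Viewing and managing tasks on the Director is routine work. Use bosh tasks --no-filter to see Resurrector actions that are currently running or queued, and bosh tasks recent --no-filter to see tasks that have finished.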