New feature for PAM(64bit IOS XR)

Author: Rory

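PAM (Platform Automated Monitoring) was introduced in release 6.1.2 (64-bit IOS XR only, not 32-bit) and is enabled by default. It monitors process crashes, memory leaks, CPU hogs, tracebacks, disk usage, and similar events. When it detects such an event, it automatically collects relevant information and saves it, by default, under harddisk:/cisco_support for troubleshooting. This function is fully automatic and cannot currently be configured manually. For concrete examples, see the following document: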

PAM Events

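Release 6.6.1 introduced a new feature, on-demand EDCD (Event Driven CLI Database). Combined with PAM, it provides two functions:

  1. PAM Schedule: collect information at a fixed interval
  2. PAM EEM Agent: monitor syslog and trigger information collection when a message matches a condition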

EDCD Ondemand – Create

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand ?
  add-update          Add or update ondemand EDCD entries
  add-update-trigger  Add or update ondemand EDCD entries
  delete              Delete ondemand EDCD entries
  delete-all          Delete all EDCD entries
  trigger             Trigger the collection of traces associated with given identifier

// Create a command list, for example:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand add-update identifier xuxing_test commands "show run;show plat;show install active su"
Sun Apr 25 09:30:18.903 UTC

Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)

RP/0/RSP0/CPU0:ASR9910-B#
RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database
Sun Apr 25 09:30:58.713 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
------------------------------------------------------------
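
The transcript above shows that the commands argument is a single quoted string with entries separated by semicolons. A minimal Python sketch of assembling that CLI line from a list is given below; the helper name `build_add_update` is hypothetical, and the only syntax it relies on is what appears in the transcript.

```python
def build_add_update(identifier, commands):
    """Build an 'edcd ondemand add-update' line from a list of commands.

    Hypothetical helper: it only formats the string seen in the transcript
    and does not talk to a router.
    """
    cleaned = [c.strip() for c in commands if c.strip()]
    if not cleaned:
        raise ValueError("at least one command is required")
    for cmd in cleaned:
        # ';' separates commands and '"' delimits the whole argument,
        # so neither can appear inside a single command here.
        if ";" in cmd or '"' in cmd:
            raise ValueError("unsupported character in command: %r" % cmd)
    return 'edcd ondemand add-update identifier {} commands "{}"'.format(
        identifier, ";".join(cleaned)
    )


if __name__ == "__main__":
    print(build_add_update(
        "xuxing_test",
        ["show run", "show plat", "show install active su"],
    ))
```

Run as a script, this prints the same command that was entered by hand above.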

// To add commands to an existing command list, use the same command:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand add-update identifier xuxing_test commands "show clock"
Sun Apr 25 09:41:01.362 UTC

Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)

RP/0/RSP0/CPU0:ASR9910-B#
RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database
Sun Apr 25 09:41:08.848 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
 4: show clock   <<<<<
------------------------------------------------------------

RP/0/RSP0/CPU0:ASR9910-B#

// Admin CLI and shell CLI commands are supported as well:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand add-update identifier xuxing_test commands "admin show plat;run ng_show_version"
Sun Apr 25 09:48:36.510 UTC

Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)

RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database
Sun Apr 25 09:48:39.145 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
 4: admin show plat    <<<<
 5: run ng_show_version    <<<<
------------------------------------------------------------

RP/0/RSP0/CPU0:ASR9910-B#

EDCD Ondemand – Delete

You can delete a single command or the entire list:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand delete identifier xuxing_test ?
  commands  Specify a list of commands that to be deleted (if missing all entries under this sub-pattern will be deleted)
  <cr>

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand delete identifier xuxing_test commands "show clock"
Sun Apr 25 09:43:31.815 UTC

Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)

RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database
Sun Apr 25 09:43:34.277 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
------------------------------------------------------------

EDCD Ondemand – Trigger

How do you check that the command list works? Use the following command:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand trigger identifier xuxing_test
Sun Apr 25 09:49:43.479 UTC
RP/0/RSP0/CPU0:ASR9910-B#

RP/0/RSP0/CPU0:Apr 25 09:36:40.033 UTC: run_cmd[69017]: %INFRA-INFRA_MSG-5-RUN_LOGIN : User cisco logged into shell from vty0
RP/0/RSP0/CPU0:Apr 25 09:36:46.775 UTC: run_cmd[69017]: %INFRA-INFRA_MSG-5-RUN_LOGOUT : User cisco logged out of shell from vty0
RP/0/RSP0/CPU0:Apr 25 09:49:54.118 UTC: logger[67945]: %OS-SYSLOG-4-LOG_WARNING : PAM has completed on-demand data collection for xuxing_test. All files are archived and saved at 0/RSP0/CPU0 : harddisk:/cisco_support/PAM-asr9k-ondemand-xr-xuxing_test-2021Apr25-094953.tgz (Please copy tgz file out of the router and send to Cisco support. This tgz file will be removed after 14 days.

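As shown above, the system creates a tar archive, "harddisk:/cisco_support/PAM-asr9k-ondemand-xr-xuxing_test-2021Apr25-094953.tgz". After copying it off the device and extracting it, the contents look like this:

[Image: contents of the extracted archive]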
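
If the syslog is collected off-box, the archive location can be pulled out of the completion message automatically. The sketch below assumes the message keeps the wording shown in the log above ("saved at <node> : harddisk:/...tgz"); the function names `find_archive` and `list_archive` are hypothetical, and `list_archive` expects the .tgz to have already been copied to the local machine.

```python
import re
import tarfile

# Matches the "saved at <node> : <path>.tgz" part of the PAM completion message.
ARCHIVE_RE = re.compile(r"saved at (?P<node>\S+) : (?P<path>harddisk:/\S+\.tgz)")


def find_archive(log_line):
    """Return (node, path) from a PAM on-demand completion message, or None."""
    match = ARCHIVE_RE.search(log_line)
    if match is None:
        return None
    return match.group("node"), match.group("path")


def list_archive(local_path):
    """List member names of a .tgz that was copied off the router."""
    with tarfile.open(local_path, "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    line = (
        "RP/0/RSP0/CPU0:Apr 25 09:49:54.118 UTC: logger[67945]: "
        "%OS-SYSLOG-4-LOG_WARNING : PAM has completed on-demand data "
        "collection for xuxing_test. All files are archived and saved at "
        "0/RSP0/CPU0 : harddisk:/cisco_support/"
        "PAM-asr9k-ondemand-xr-xuxing_test-2021Apr25-094953.tgz (Please copy"
    )
    print(find_archive(line))
```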

PAM Schedule

RP/0/RSP0/CPU0:ASR9910-B#edcd scheduler add-update cadence '*/10 * * * *' ?    <<<< two options: schedule a single command, or schedule a previously configured command list
  command     Command to be executed at the above cadence
  identifier  An identifier linked to a list of CLIs (defined in ondemand EDCD)
  <cr>
RP/0/RSP0/CPU0:ASR9910-B#edcd scheduler add-update cadence '*/10 * * * *' identifier xuxing_test
Sun Apr 25 10:03:26.302 UTC
Adding */10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
Updating job file on remote RP
The following job has been added successfully:
*/10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
RP/0/RSP0/CPU0:ASR9910-B#
RP/0/RSP0/CPU0:ASR9910-B#show edcd scheduler    <<<<   view the existing scheduler jobs
Sun Apr 25 10:03:33.842 UTC
<Job ID>: <job content>
1: */10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
RP/0/RSP0/CPU0:ASR9910-B#

'*/10 * * * *' means the job runs every 10 minutes. For how to set these fields, see the Linux crontab introduction:

Linux crontab
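
To make the cadence fields concrete: the five fields are minute, hour, day of month, month, and day of week. The Python sketch below is a deliberately small matcher that understands only `*`, `*/n`, and plain numbers, which is enough to show why `*/10 * * * *` fires at minutes 0, 10, 20, and so on. It is an illustration, not the router's scheduler: it does not handle ranges or lists, and it requires all five fields to match, whereas standard cron treats day of month and day of week as alternatives when both are restricted.

```python
from datetime import datetime


def field_matches(field, value, minimum=0):
    """Match one cron field; supports '*', '*/n', and a plain integer only."""
    if field == "*":
        return True
    if field.startswith("*/"):
        step = int(field[2:])
        # Steps count from the field's lowest legal value (0 or 1).
        return (value - minimum) % step == 0
    return int(field) == value


def cadence_matches(cadence, when):
    """Return True if the five-field cadence matches the given datetime."""
    minute, hour, dom, month, dow = cadence.split()
    # cron numbers Sunday as 0; Python's weekday() numbers Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(dom, when.day, minimum=1)
        and field_matches(month, when.month, minimum=1)
        and field_matches(dow, cron_dow)
    )


if __name__ == "__main__":
    hits = [
        m for m in range(60)
        if cadence_matches("*/10 * * * *", datetime(2021, 4, 25, 10, m))
    ]
    print(hits)  # [0, 10, 20, 30, 40, 50]
```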

To delete the schedule:

RP/0/RSP0/CPU0:ASR9910-B#edcd scheduler delete job-id 1    <<<< delete by job ID; get the job-id from "show edcd scheduler"
Sun Apr 25 10:08:42.937 UTC
The following job has been deleted:
*/10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test

Updating job file on remote RP
RP/0/RSP0/CPU0:ASR9910-B#
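
Because deletion works by job ID, a script that manages schedules first has to map an identifier to its job ID. The sketch below parses captured "show edcd scheduler" output, assuming the "<Job ID>: <job content>" layout seen in the transcript, and builds the matching delete command. The function names are hypothetical and nothing here connects to a device.

```python
import re

# A job line starts with a numeric job ID followed by a colon.
JOB_RE = re.compile(r"^(\d+):\s+(.*)$")


def parse_scheduler(output):
    """Return {job_id: job_content} from 'show edcd scheduler' output."""
    jobs = {}
    for line in output.splitlines():
        match = JOB_RE.match(line.strip())
        if match:
            jobs[int(match.group(1))] = match.group(2)
    return jobs


def job_ids_for(jobs, identifier):
    """Return the sorted job IDs whose job content ends with '-i <identifier>'."""
    return sorted(
        job_id for job_id, content in jobs.items()
        if content.split()[-2:] == ["-i", identifier]
    )


if __name__ == "__main__":
    captured = """Sun Apr 25 10:03:33.842 UTC
<Job ID>: <job content>
1: */10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
RP/0/RSP0/CPU0:ASR9910-B#"""
    for job_id in job_ids_for(parse_scheduler(captured), "xuxing_test"):
        print("edcd scheduler delete job-id {}".format(job_id))
```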

PAM EEM