
New feature for PAM (64-bit IOS XR)


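PAM (Platform Automated Monitoring) was introduced in IOS XR 6.1.2 (64-bit only, not available in 32-bit) and is enabled by default. It monitors for process crashes, memory leaks, CPU hogs, tracebacks, disk usage, and similar events. When it detects such an event, it automatically collects relevant information and saves it, by default, under harddisk:/cisco_support for troubleshooting. This monitoring is fully automatic and cannot currently be configured manually. For concrete examples, see the following document: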

PAM Events

Starting with 6.6.1, a new feature was added: on-demand EDCD (Event Driven CLI Database). Combined with PAM, it enables two capabilities:

  1. PAM Schedule: collect information at a fixed interval
  2. PAM EEM Agent: monitor syslog and trigger collection when a message matches a condition

EDCD Ondemand – Create

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand ?
  add-update          Add or update ondemand EDCD entries
  add-update-trigger  Add or update ondemand EDCD entries
  delete              Delete ondemand EDCD entries
  delete-all          Delete all EDCD entries
  trigger             Trigger the collection of traces associated with given identifier

//Create a command list, for example:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand add-update identifier xuxing_test commands "show run;show plat;show install active su"
Sun Apr 25 09:30:18.903 UTC

Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)

RP/0/RSP0/CPU0:ASR9910-B#
RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database
Sun Apr 25 09:30:58.713 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
------------------------------------------------------------

//To append commands to an existing command list, use the same command:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand add-update identifier xuxing_test commands "show clock"
Sun Apr 25 09:41:01.362 UTC

Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)

RP/0/RSP0/CPU0:ASR9910-B#
RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database
Sun Apr 25 09:41:08.848 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
 4: show clock   <<<<<
------------------------------------------------------------

RP/0/RSP0/CPU0:ASR9910-B#

//Admin CLI and shell CLI commands are supported as well:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand add-update identifier xuxing_test commands "admin show plat;run ng_show_version"
Sun Apr 25 09:48:36.510 UTC

Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)

RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database
Sun Apr 25 09:48:39.145 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
 4: admin show plat    <<<<
 5: run ng_show_version    <<<<
------------------------------------------------------------

RP/0/RSP0/CPU0:ASR9910-B#
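
The transcripts above pass several commands to `edcd ondemand add-update` as one quoted, semicolon-separated string. When a command list is maintained outside the router, that string can be assembled programmatically. The following is a minimal sketch; the helper name `build_add_update` is hypothetical, and the rule that individual commands must not contain `;` or `"` is an assumption inferred from the examples, not documented behaviour.

```python
def build_add_update(identifier, commands):
    """Build the 'edcd ondemand add-update' CLI line for a list of commands.

    Assumption: commands are joined with ';' inside one double-quoted string,
    as in the transcripts above, so a single command may not contain ';' or '"'.
    """
    cleaned = [command.strip() for command in commands]
    for command in cleaned:
        if not command:
            raise ValueError("empty command")
        if ";" in command or '"' in command:
            raise ValueError(f"unsupported character in command: {command!r}")
    joined = ";".join(cleaned)
    return f'edcd ondemand add-update identifier {identifier} commands "{joined}"'


# Reproduces the first example from this section.
print(build_add_update("xuxing_test", ["show run", "show plat", "show install active su"]))
```

The generated line is only text; it still has to be entered on the router, and `show edcd ondemand database` remains the way to verify the result.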

EDCD Ondemand – Delete

You can delete a single command or the entire list:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand delete identifier xuxing_test ?
  commands  Specify a list of commands that to be deleted (if missing all entries under this sub-pattern will be deleted)
  <cr>

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand delete identifier xuxing_test commands "show clock"
Sun Apr 25 09:43:31.815 UTC

Ondemand EDCD has been updated (execute 'show edcd ondemand database' to verify.)

RP/0/RSP0/CPU0:ASR9910-B#show edcd ondemand database
Sun Apr 25 09:43:34.277 UTC
============================================================
               Identifier: xuxing_test
============================================================
 1: show run
 2: show plat
 3: show install active su
------------------------------------------------------------

EDCD Ondemand – Trigger

How do you test whether the command list works? Use the following command:

RP/0/RSP0/CPU0:ASR9910-B#edcd ondemand trigger identifier xuxing_test
Sun Apr 25 09:49:43.479 UTC
RP/0/RSP0/CPU0:ASR9910-B#

RP/0/RSP0/CPU0:Apr 25 09:36:40.033 UTC: run_cmd[69017]: %INFRA-INFRA_MSG-5-RUN_LOGIN : User cisco logged into shell from vty0
RP/0/RSP0/CPU0:Apr 25 09:36:46.775 UTC: run_cmd[69017]: %INFRA-INFRA_MSG-5-RUN_LOGOUT : User cisco logged out of shell from vty0
RP/0/RSP0/CPU0:Apr 25 09:49:54.118 UTC: logger[67945]: %OS-SYSLOG-4-LOG_WARNING : PAM has completed on-demand data collection for xuxing_test. All files are archived and saved at 0/RSP0/CPU0 : harddisk:/cisco_support/PAM-asr9k-ondemand-xr-xuxing_test-2021Apr25-094953.tgz (Please copy tgz file out of the router and send to Cisco support. This tgz file will be removed after 14 days.

As shown above, the system generates a tar archive, "harddisk:/cisco_support/PAM-asr9k-ondemand-xr-xuxing_test-2021Apr25-094953.tgz". Copy it off the device and extract it to view the collected output.
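
When collection is scripted, the archive path can be pulled out of the completion syslog message rather than typed by hand. The sketch below is based only on the single sample message shown above; the pattern and the function name `find_pam_archive` are assumptions, and other releases or platforms may word the message differently.

```python
import re

# Pattern derived from the one sample message in this post (an assumption,
# not a documented format): the identifier follows "data collection for",
# and the archive is the first harddisk:/... path ending in .tgz.
PAM_DONE = re.compile(
    r"PAM has completed on-demand data collection for (?P<identifier>\S+?)\. "
    r".*?(?P<path>harddisk:/\S+\.tgz)"
)


def find_pam_archive(syslog_line):
    """Return (identifier, archive_path), or None if the line does not match."""
    match = PAM_DONE.search(syslog_line)
    if match is None:
        return None
    return match.group("identifier"), match.group("path")
```

A caller would feed each syslog line to `find_pam_archive` and copy the returned path off the router before the archive is removed (after 14 days, according to the message).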


PAM Schedule

RP/0/RSP0/CPU0:ASR9910-B#edcd scheduler add-update cadence '*/10 * * * *' ?    <<<< two options: schedule a single command, or schedule a previously configured command list
  command     Command to be executed at the above cadence
  identifier  An identifier linked to a list of CLIs (defined in ondemand EDCD)
  <cr>
RP/0/RSP0/CPU0:ASR9910-B#edcd scheduler add-update cadence '*/10 * * * *' identifier xuxing_test
Sun Apr 25 10:03:26.302 UTC
Adding */10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
Updating job file on remote RP
The following job has been added successfully:
*/10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
RP/0/RSP0/CPU0:ASR9910-B#
RP/0/RSP0/CPU0:ASR9910-B#show edcd scheduler    <<<<   list the existing scheduler jobs
Sun Apr 25 10:03:33.842 UTC
<Job ID>: <job content>
1: */10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test
RP/0/RSP0/CPU0:ASR9910-B#

'*/10 * * * *' means the job runs every 10 minutes. The cadence uses crontab syntax; for how to set these fields, see an introduction to Linux crontab:

Linux crontab
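
To make the cadence concrete, the sketch below expands the first (minute) field of a crontab expression into the minutes of the hour at which the job fires. It is an illustration of common crontab syntax only (`*`, `N`, `A-B`, an optional `/step`, and comma-separated lists); it is not how the router evaluates the cadence, and the function name `expand_minute_field` is hypothetical.

```python
def expand_minute_field(field):
    """Expand a crontab minute field into the sorted list of matching minutes.

    Handles '*', 'N', 'A-B', an optional '/step' on '*' or a range, and
    comma-separated combinations. Other cron extensions are not covered.
    """
    minutes = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = 0, 59
        elif "-" in base:
            low, high = base.split("-")
            start, end = int(low), int(high)
        else:
            start = end = int(base)
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


print(expand_minute_field("*/10"))  # [0, 10, 20, 30, 40, 50]
```

So `*/10` in the minute field fires at minutes 0, 10, 20, 30, 40 and 50 of every hour, aligned to the clock rather than to the moment the job was added.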

To delete the schedule:

RP/0/RSP0/CPU0:ASR9910-B#edcd scheduler delete job-id 1    <<<< delete by job ID; get the job-id from "show edcd scheduler"
Sun Apr 25 10:08:42.937 UTC
The following job has been deleted:
*/10 * * * * root /pkg/bin/pam_is_active_rp && /pkg/bin/edcd_cli.py ondemand --operation trigger -i xuxing_test

Updating job file on remote RP
RP/0/RSP0/CPU0:ASR9910-B#
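
Since deletion is by job ID, a script that manages schedules first has to map an identifier back to its job ID. The following sketch parses captured `show edcd scheduler` output for that purpose. It assumes the `<Job ID>: <job content>` layout shown above and that identifier-based jobs end in `-i <identifier>`; both are inferred from this one transcript, and the function names are hypothetical.

```python
import re

# A job line starts with a numeric ID followed by ': ' (layout assumed from
# the 'show edcd scheduler' transcript above).
JOB_LINE = re.compile(r"^(?P<job_id>\d+): (?P<content>.+)$")


def parse_scheduler_jobs(output):
    """Map job ID -> job content for every job line in the captured output."""
    jobs = {}
    for line in output.splitlines():
        match = JOB_LINE.match(line.strip())
        if match:
            jobs[int(match.group("job_id"))] = match.group("content")
    return jobs


def job_ids_for_identifier(output, identifier):
    """Return the IDs of jobs whose content ends with '-i <identifier>'."""
    suffix = f"-i {identifier}"
    return [
        job_id
        for job_id, content in parse_scheduler_jobs(output).items()
        if content.endswith(suffix)
    ]
```

Each returned ID can then be passed to `edcd scheduler delete job-id`.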

PAM EEM



