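Just because Hadoop itself ships no CPU monitoring does not mean we cannot add one. Some programs use little memory but burn a lot of CPU, for example by starting a large number of threads that each do something trivial, which drives the machine's CPU usage too high. With that in mind, I added monitoring of the CPU usage percentage.
First, define the configuration switch that enables the feature: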
/** Specifies whether cpu vcores check is enabled. */
public static final String NM_VCORES_CHECK_ENABLED = NM_PREFIX
+ "vcores-check-enabled";
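public static final boolean DEFAULT_NM_VCORES_CHECK_ENABLED = false;
Since this is a new feature, it is off by default. You also need to define a usage threshold between 0 and 1: once a container's CPU usage exceeds this fraction of the machine's total CPU, the container is killed.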
/** Limit ratio of Virtual CPU Cores which can be allocated for containers. */
public static final String NM_VCORES_LIMITED_RATIO = NM_PREFIX
+ "resource.cpu-vcores.limited.ratio";
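public static final float DEFAULT_NM_VCORES_LIMITED_RATIO = 0.8f;
The default is 0.8, and you can set it to whatever suits your cluster. The monitoring logic closely mirrors the memory monitoring, so I will go through it quickly.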
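To make the two settings concrete, here is a minimal, self-contained sketch of how they could be resolved with their defaults. It is not Hadoop code: the class `VcoresCheckSettings` and its `fromMap` method are hypothetical, a plain `Map` stands in for the real configuration object, the key strings assume that `NM_PREFIX` is `"yarn.nodemanager."`, and the range validation is my own addition rather than something the patch does.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for reading the two new settings; not part of the patch. */
final class VcoresCheckSettings {
    // Key strings assume NM_PREFIX is "yarn.nodemanager." (assumption).
    static final String ENABLED_KEY = "yarn.nodemanager.vcores-check-enabled";
    static final String RATIO_KEY = "yarn.nodemanager.resource.cpu-vcores.limited.ratio";
    static final boolean DEFAULT_ENABLED = false;
    static final float DEFAULT_RATIO = 0.8f;

    final boolean enabled;
    final float limitedRatio;

    private VcoresCheckSettings(boolean enabled, float limitedRatio) {
        this.enabled = enabled;
        this.limitedRatio = limitedRatio;
    }

    /** Reads both settings from a plain map, falling back to the defaults. */
    static VcoresCheckSettings fromMap(Map<String, String> conf) {
        String enabledRaw = conf.get(ENABLED_KEY);
        boolean enabled = enabledRaw == null
            ? DEFAULT_ENABLED
            : Boolean.parseBoolean(enabledRaw.trim());

        String ratioRaw = conf.get(RATIO_KEY);
        float ratio = ratioRaw == null
            ? DEFAULT_RATIO
            : Float.parseFloat(ratioRaw.trim());

        // The threshold is described as lying between 0 and 1; rejecting
        // anything else here is an extra safeguard assumed for this sketch.
        if (!(ratio > 0f && ratio <= 1f)) {
            throw new IllegalArgumentException(
                RATIO_KEY + " must be in (0, 1], got " + ratioRaw);
        }
        return new VcoresCheckSettings(enabled, ratio);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ENABLED_KEY, "true");
        conf.put(RATIO_KEY, "0.5");
        VcoresCheckSettings settings = fromMap(conf);
        System.out.println(settings.enabled + " " + settings.limitedRatio);
    }
}
```

Failing fast on an out-of-range ratio is a design choice for the sketch: a value such as 80 (meant as a percentage) would otherwise silently disable the check, because no container could ever exceed it.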
Define two more fields:
private boolean pmemCheckEnabled;
...
private boolean vcoresCheckEnabled;
private float vcoresLimitedRatio;
Then initialize them from the configuration in serviceInit:
...
pmemCheckEnabled = conf.getBoolean(YarnConfiguration.NM_PMEM_CHECK_ENABLED,
    YarnConfiguration.DEFAULT_NM_PMEM_CHECK_ENABLED);
vmemCheckEnabled = conf.getBoolean(YarnConfiguration.NM_VMEM_CHECK_ENABLED,
    YarnConfiguration.DEFAULT_NM_VMEM_CHECK_ENABLED);
vcoresCheckEnabled = conf.getBoolean(YarnConfiguration.NM_VCORES_CHECK_ENABLED,
    YarnConfiguration.DEFAULT_NM_VCORES_CHECK_ENABLED);
LOG.info("Physical memory check enabled: " + pmemCheckEnabled);
LOG.info("Virtual memory check enabled: " + vmemCheckEnabled);
LOG.info("Cpu vcores check enabled: " + vcoresCheckEnabled);
// Only read the threshold when the check is actually switched on.
if (vcoresCheckEnabled) {
  vcoresLimitedRatio = conf.getFloat(YarnConfiguration.NM_VCORES_LIMITED_RATIO,
      YarnConfiguration.DEFAULT_NM_VCORES_LIMITED_RATIO);
  LOG.info("Vcores limited ratio: " + vcoresLimitedRatio);
}
Then reuse the CPU percentage that the monitor code already computes:
LOG.debug("Constructing ProcessTree for : PID = " + pId
    + " ContainerId = " + containerId);
ResourceCalculatorProcessTree pTree = ptInfo.getProcessTree();
pTree.updateProcessTree(); // update process-tree
long currentVmemUsage = pTree.getVirtualMemorySize();
long currentPmemUsage = pTree.getRssMemorySize();
// if machine has 6 cores and 3 are used,
// cpuUsagePercentPerCore should be 300% and
// cpuUsageTotalCoresPercentage should be 50%
float cpuUsagePercentPerCore = pTree.getCpuUsagePercent();
float cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore /
    resourceCalculatorPlugin.getNumProcessors();
Finally, do the comparison:
....
} else if (isVcoresCheckEnabled()
    // cpuUsageTotalCoresPercentage is a percentage (e.g. 50), while
    // vcoresLimitedRatio is a fraction in 0~1, so scale before comparing.
    && cpuUsageTotalCoresPercentage / 100f > vcoresLimitedRatio) {
  msg = String.format(
      "Container [pid=%s,containerID=%s] is running beyond %s vcores limits."
      + " Current usage: %s. Killing container.\n",
      pId, containerId, vcoresLimitedRatio,
      cpuUsageTotalCoresPercentage / 100f);
  isCpuVcoresOverLimit = true;
  containerExitStatus = ContainerExitStatus.KILLED_EXCEEDED_VCORES;
}
if (isMemoryOverLimit || isCpuVcoresOverLimit) {
  // Memory (virtual or physical) or cpu vcores over limit.
  // Fail the container and remove the corresponding process tree
  LOG.warn(msg);
  // warn if not a leader
  if (!pTree.checkPidPgrpidForMatch()) {
    LOG.error("Killed container process with PID " + pId
        + " but it is not a process group leader.");
  }
  // kill the container
  eventDispatcher.getEventHandler().handle(
      new ContainerKillEvent(containerId, containerExitStatus, msg));
  it.remove();
  LOG.info("Removed ProcessTree with root " + pId);
} else {
Oh, and a new exit code also needs to be added to ContainerExitStatus:
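/**
 * Container terminated because of exceeding allocated cpu vcores.
 */
public static final int KILLED_EXCEEDED_VCORES = -108;
That is all the code that changes for CPU monitoring. The complete code for this feature is available at the link at the end of the article. I should state clearly that ProcfsBasedProcessTree is not supported on my local machine, so the unit tests could not run and I have not fully tested this feature yet. In theory it works; feel free to try it out and send me feedback. I hope you find it useful.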
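Since the real unit tests depend on ProcfsBasedProcessTree, the threshold arithmetic can at least be checked in isolation. The following is a minimal sketch, not code from the patch: the class `VcoresLimitCheck` and its methods are hypothetical. It assumes the per-core reading is a percentage in which 300 means three fully busy cores, that a negative reading means the value is unavailable, and that the threshold is a fraction between 0 and 1, so the percentage is scaled by 100 before the comparison.

```java
/** Hypothetical helper that isolates the vcores threshold check; not part of the patch. */
final class VcoresLimitCheck {
    private VcoresLimitCheck() {}

    /**
     * Converts a per-core CPU percentage (300 means three fully busy cores)
     * into a fraction of the whole machine in the range 0..1.
     * Returns -1 when the reading is negative (assumed to mean "unavailable")
     * or the processor count is not positive.
     */
    static float totalCoresRatio(float cpuUsagePercentPerCore, int numProcessors) {
        if (cpuUsagePercentPerCore < 0f || numProcessors <= 0) {
            return -1f;
        }
        // e.g. 300 / 6 = 50 (percent of the whole machine)
        float totalCoresPercentage = cpuUsagePercentPerCore / numProcessors;
        // e.g. 50% -> 0.5, the same unit as the configured ratio
        return totalCoresPercentage / 100f;
    }

    /** True only when the check is enabled and the usage fraction exceeds the limit. */
    static boolean isOverLimit(boolean checkEnabled, float cpuUsagePercentPerCore,
                               int numProcessors, float limitedRatio) {
        if (!checkEnabled) {
            return false;
        }
        float ratio = totalCoresRatio(cpuUsagePercentPerCore, numProcessors);
        return ratio >= 0f && ratio > limitedRatio;
    }

    public static void main(String[] args) {
        // 3 of 6 cores busy: 0.5 of the machine, below the 0.8 limit.
        System.out.println(isOverLimit(true, 300f, 6, 0.8f)); // false
        // 5.4 of 6 cores busy: 0.9 of the machine, above the 0.8 limit.
        System.out.println(isOverLimit(true, 540f, 6, 0.8f)); // true
    }
}
```

Keeping the conversion in one place makes the unit of each value explicit, which matters here because a percentage such as 50 compared directly against a fraction such as 0.8 would flag almost every container.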
GitHub patch link: https://github.com/linyiqun/open-source-patch/tree/master/yarn/others/YARN-VcoresMonitor