从Container内存监控限制到CPU使用率限制方案 - 走在前往架构师的路上 - 博客频道 - CSDN.NET 走(2)_H5之家

// Multiply by 1000 to avoid losing data when converting to int int milliVcoresUsed = (int) (cpuUsageTotalCoresPercentage * 1000 * maxVCoresAllottedForContainers /nodeCpuPercentageForYARN); // as processes begin with an age 1, we want to see if there // are processes more than 1 iteration old. long curMemUsageOfAgedProcesses = pTree.getVirtualMemorySize(1); long curRssMemUsageOfAgedProcesses = pTree.getRssMemorySize(1); long vmemLimit = ptInfo.getVmemLimit(); long pmemLimit = ptInfo.getPmemLimit();而这个pememLimit就不是pTree的信息,而是来自于外界所启动container时候所传进来的值,这个值其实就是java.opts的值.

ContainerId containerId = monitoringEvent.getContainerId(); switch (monitoringEvent.getType()) { case START_MONITORING_CONTAINER: ContainerStartMonitoringEvent startEvent = (ContainerStartMonitoringEvent) monitoringEvent; synchronized (this.containersToBeAdded) { ProcessTreeInfo processTreeInfo = new ProcessTreeInfo(containerId, null, null, startEvent.getVmemLimit(), startEvent.getPmemLimit(), startEvent.getCpuVcores()); this.containersToBeAdded.put(containerId, processTreeInfo); } break;然后是内存监控的核心逻辑

.... } else if (isPmemCheckEnabled() && isProcessTreeOverLimit(containerId.toString(), currentPmemUsage, curRssMemUsageOfAgedProcesses, pmemLimit)) { // Container (the root process) is still alive and overflowing // memory. // Dump the process-tree and then clean it up. msg = formatErrorMessage("physical", currentVmemUsage, vmemLimit, currentPmemUsage, pmemLimit, pId, containerId, pTree); isMemoryOverLimit = true; containerExitStatus = ContainerExitStatus.KILLED_EXCEEDED_PMEM; ....传入当前的内存使用量和限制值然后做比较,isProcessTreeOverLimit最终会调用到下面的这个方法.

/** * Check whether a container's process tree's current memory usage is over * limit. * * When a java process exec's a program, it could momentarily account for * double the size of it's memory, because the JVM does a fork()+exec() * which at fork time creates a copy of the parent's memory. If the * monitoring thread detects the memory used by the container tree at the * same instance, it could assume it is over limit and kill the tree, for no * fault of the process itself. * * We counter this problem by employing a heuristic check: - if a process * tree exceeds the memory limit by more than twice, it is killed * immediately - if a process tree has processes older than the monitoring * interval exceeding the memory limit by even 1 time, it is killed. Else it * is given the benefit of doubt to lie around for one more iteration. * * @param containerId * Container Id for the container tree * @param currentMemUsage * Memory usage of a container tree * @param curMemUsageOfAgedProcesses * Memory usage of processes older than an iteration in a container * tree * @param vmemLimit * The limit specified for the container * @return true if the memory usage is more than twice the specified limit, * or if processes in the tree, older than this thread's monitoring * interval, exceed the memory limit. False, otherwise. */ boolean isProcessTreeOverLimit(String containerId, long currentMemUsage, long curMemUsageOfAgedProcesses, long vmemLimit) { boolean isOverLimit = false; if (currentMemUsage > (2 * vmemLimit)) { LOG.warn("Process tree for container: " + containerId + " running over twice " + "the configured limit. Limit=" + vmemLimit + ", current usage = " + currentMemUsage); isOverLimit = true; } else if (curMemUsageOfAgedProcesses > vmemLimit) { LOG.warn("Process tree for container: " + containerId + " has processes older than 1 " + "iteration running over the configured limit. Limit=" + vmemLimit + ", current usage = " + curMemUsageOfAgedProcesses); isOverLimit = true; } return isOverLimit; }有2种情况会导致内存超出的现象,1个是使用内存超出限制内存2倍,理由是新的jvm会执行fork和exec操作,fork操作会拷贝父进程的信息,还有1个就是内存年龄值的限制.其他的上面注释已经写的很清楚了,如果还看不懂英文的话,自行找工具翻译.

最后如果发现container内存使用的确是超出内存限制值了,之后,就会发送container kill的event事件,会触发后续的container kill的动作.

.... } else if (isVcoresCheckEnabled() && cpuUsageTotalCoresPercentage > vcoresLimitedRatio) { msg = String.format( "Container [pid=%s,containerID=%s] is running beyond %s vcores limits." + " Current usage: %s. Killing container.\n", pId, containerId, vcoresLimitedRatio); isCpuVcoresOverLimit = true; containerExitStatus = ContainerExitStatus.KILLED_EXCEEDED_VCORES; } if (isMemoryOverLimit) { // Virtual or physical memory over limit. Fail the container and // remove // the corresponding process tree LOG.warn(msg); // warn if not a leader if (!pTree.checkPidPgrpidForMatch()) { LOG.error("Killed container process with PID " + pId + " but it is not a process group leader."); } // kill the container eventDispatcher.getEventHandler().handle( new ContainerKillEvent(containerId, containerExitStatus, msg)); it.remove(); LOG.info("Removed ProcessTree with root " + pId); } else { ...

这就是container的内存监控的整个过程.我们当时又恰巧把这个功能给关了,所以导致了大量的Ful gc的现象.

为什么只对Container内存做监控

对于小标题上的问题,不知道有没有哪位同学想过?当时我在解决掉这个问题之后,我就在想,同样是很关键的指标,CPU的使用监控为什么不在ContainersMonitorImpl一起做掉呢.下面是我个人所总结出来的几点原因.

1.内存问题所造成的结果比CPU使用造成的影响更大,因为OOM问题一旦发生,就会引起gc.

2.内存问题比较CPU使用问题更加常见.因为大家在平常生活或写程序时,经常发碰到类似"啊,内存不够用了"等类似的问题,相对比较少碰到"CPU不够用了"的问题.

3.内存问题与Job运行规模,数据量使用规模密切相关.内存的使用与Job所处理的数据量密切相关,一般大Job,处理数据量大了,内存使用自然会变多,CPU也会变多,但不会那么明显.

综上3点原因,所以CPU监控并没有被加入到监控代码中(个人分析而言).