SQL SERVER中用户定义标量函数（scalar user defined function）的性能问题 - 潇湘隐(2)_H5之家

回到正题，上面两个SQL语句的实际执行计划的Cost比值为81%：19%；Compute Scalar（进行一个标量计算并返回计算值）的Number of Executions都是1次。但是实际的CPU time &elapsed time的比值比这个大了好多。另外第一个SQL的Compute Scalar的代价比值居然只有1%。为什么会这样呢？我们是不是很迷惑？

clipboard[5]

关于这个大家疑惑的地方，T-SQL User-Defined Functions: the good, the bad, and the ugly (part 1)里面给了我们一个阐述，截取文章中两段在此(翻译如有不当，请参考原文)：

英文：

However, you may not be aware that the “Actual Execution Plan” is a dirty rotten liar. Or maybe I should say that the terms “Actual Execution Plan” and “Estimated Execution Plan” are misleading. There is only one execution plan, it gets created when the queries are compiled, and then the queries are executed. The only difference between the “Actual” and the “Estimated” execution plan is that the estimated plan only tells you the estimates for how many rows flow between iterators and how often iterators are executed, and the actual plan adds the actual data for that. But no “actual operator cost” or “actual subtree cost” is added to the corresponding estimated values – and since those costs are the values that the percentages are based on, the percentages displayed in an actual execution plan are still based only on the estimates.

翻译：

然而，你可能不知道“实际执行计划”其实是一个肮脏的烂骗子，或者我应该说“实际执行计划”和“估计执行计划”误导你了。当查询语句编译后，只有一个实际的执行计划。“实际执行计划”与“估计执行计划”的区别就在于“估计执行计划”只告诉你估计了有多少行流向迭代和迭代器执行频率，而实际执行计划将实际数据应用进来。但是“实际操作成本”或“实际子树成本”并没有添加到“实际执行计划”的估计值里面，因为这些代价都是基于百分比的值，在实际执行计划中显示的百分比仍然基于只估计数。

英文：

But note that, again, the execution plan is lying. First, it implies that the UDF is invoked only once, which is not the case. Second, look at the cost. You may think that the 0% is the effect of rounding down, since a single execution of the function costs so little in relation to the cost of accessing and aggregating 100,000 rows. But if you check the properties of the iterators of the plan for the function, you’ll see that all operator and subtree costs are actually estimated to be exactly 0. This lie is maybe the worst of all – because it’s not just the plan lying to us, it is SQL Server lying to itself. This cost estimate of 0 is actually used by the query optimizer, so all plans it produces are based on the assumption that executing the function is free. As a result, the optimizer will not even consider optimizations it might use if it knew how costly calling a scalar UDF actually is.

翻译：

但是需要再次注意，执行计划在欺骗你，首先，它意味着只调用了UDF一次，其实不是这样。其次，从成本（Cost）来看，你可能会认为0%是向下舍入影响，因为单次执行函数的开销如此之小，以至于执行100,000次的成本也很小。但如果你检查执行计划的功能迭代器的属性，你会发现所有的操作代价和子树代价实际的估计为0，这是一个最糟糕的谎言。因为它可能不只是为了欺骗我们，而是SQL SERVER为了欺骗它自己。实际上是查询优化器认为调用函数的成本为0，因此它生成的所有执行计划都是基于调用UDF是免费的。其结果是即使调用标量UDF的代价非常昂贵，查询优化器也不会考虑优化它。

如何优化UDF标量函数（Scalar-Valued Function）

如何优化上面SQL语句呢？从原理上来讲就是不用用户定义函数或减少调用次数。其实我在实际应用中，减少调用次数一般通过下面方法优化：

1：减少用户定义标量函数调用次数（子查询）

SET STATISTICS TIME ON; SELECT ProductID, Sales.FetchProductOrderNum(ProductID)FROM( SELECT DISTINCT ProductID FROM Sales.SalesOrderDetailWHERE ProductID=870) T SET STATISTICS TIME OFF;

clipboard[6]

2：减少用户定义标量函数调用次数（临时表）

SET STATISTICS TIME ON; SELECT DISTINCT ProductID INTO #SalesOrderDetail FROM Sales.SalesOrderDetailWHERE ProductID=870; SELECT ProductID, Sales.FetchProductOrderNum(ProductID)FROM #SalesOrderDetail SET STATISTICS TIME OFF;

为什么要用临时表呢？不是子查询就可以解决问题吗？问题是实际应用当中，有些逻辑复杂的地方需要借助临时表解决，有时候子查询反而不是一个好的解决方法。

clipboard[7]