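Which hashing algorithm does Hive's built-in HASH() function use?

Which hashing algorithm does the built-in HASH() function use?

Ideally I am looking for a SHA512/SHA256 hash, similar to what the SHA() function offers in the linkedin datafu UDFs for Pig.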

Best answer

The HASH function (as of Hive 0.11) uses an algorithm similar to java.util.List#hashCode.

Its code looks like this:

int hashCode = 0; // Hive HASH uses 0 as the seed, List#hashCode uses 1. I don't know why.
for (Object item : items) {
    hashCode = hashCode * 31 + (item == null ? 0 : item.hashCode());
}
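To make the seed difference concrete, here is a minimal runnable sketch of the loop above next to List#hashCode. The class and method names (HiveStyleHash, hiveStyleHash) are made up for illustration, and each item is hashed with plain Java hashCode() as in the snippet; Hive's own per-item hashing may differ for some types, so do not expect these numbers to match Hive output in general.

```java
import java.util.Arrays;
import java.util.List;

public class HiveStyleHash {
    // Same loop as the snippet above: seed 0, multiplier 31.
    // Per-item hashing uses Java's hashCode(), which is an assumption here.
    static int hiveStyleHash(List<?> items) {
        int hashCode = 0;
        for (Object item : items) {
            hashCode = hashCode * 31 + (item == null ? 0 : item.hashCode());
        }
        return hashCode;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b");
        System.out.println(hiveStyleHash(items)); // prints 3105 (seed 0)
        System.out.println(items.hashCode());     // prints 4066 (seed 1)
    }
}
```

For a list of n items the two results differ by 31^n (modulo 2^32), which is just the contribution of the seed; here 4066 - 3105 = 961 = 31 * 31.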

Basically, this is the classic hash algorithm recommended in the book Effective Java. To quote a great man (and a great book):

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.
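The shift-and-subtract identity from the quote is easy to check directly. The following is only a small sketch (the class name ShiftIdentity and the sample values are arbitrary); it relies on Java int arithmetic wrapping around on overflow, which is why the identity also holds for very large and negative values.

```java
public class ShiftIdentity {
    // True when 31 * i equals (i << 5) - i under 32-bit int arithmetic.
    static boolean holdsFor(int i) {
        return 31 * i == (i << 5) - i;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int i : samples) {
            System.out.println(i + " -> " + holdsFor(i)); // prints "true" for every sample
        }
    }
}
```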

I digress. You can look at the HASH source here.

If you want to use SHAxxx in Hive, you can use the Apache DigestUtils class and Hive's built-in reflect function (I hope that will work):

SELECT reflect('org.apache.commons.codec.digest.DigestUtils', 'sha256Hex', 'your_string')
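To check what value that query should return, you can compute the same digest outside Hive. This is a sketch using only the JDK's MessageDigest instead of commons-codec; the class name Sha256HexSketch is made up, and it assumes the string is encoded as UTF-8, which is what DigestUtils.sha256Hex(String) does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256HexSketch {
    // Returns the SHA-256 digest of the UTF-8 bytes of input as lowercase hex.
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("your_string")); // 64 hex characters
    }
}
```

The result is always 64 lowercase hex characters; for SHA-512 the same approach works with the algorithm name "SHA-512" and yields 128 characters.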

This question and answer (hive - Which hashing algorithm does Hive's built-in HASH() function use) come from a question on Stack Overflow: https://stackoverflow.com/questions/21176602/

References

Hive Operators and User-Defined Functions (UDFs): https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inFunctions
