神刀安全网

Hive Learning Series, Part 4 (User-Defined Functions)

If the input arguments are simple data types, extend UDF directly and implement one or more evaluate methods; Hive picks the overload whose signature matches the call.
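The overloading idea can be sketched in plain Java (a real UDF would extend org.apache.hadoop.hive.ql.exec.UDF and use Hadoop writable types such as Text; the class name LowerSketch and the two-argument overload are hypothetical, chosen so the sketch runs standalone):

```java
// A Hive-free sketch: one class, several overloaded evaluate() methods,
// one per argument signature the function should accept.
public class LowerSketch {
    // String input -> lower-cased string, null-safe like the article's example
    public String evaluate(String s) {
        return s == null ? null : s.toLowerCase();
    }

    // A second (hypothetical) signature: lower-case only the first n characters
    public String evaluate(String s, int n) {
        if (s == null) {
            return null;
        }
        int cut = Math.min(n, s.length());
        return s.substring(0, cut).toLowerCase() + s.substring(cut);
    }

    public static void main(String[] args) {
        LowerSketch f = new LowerSketch();
        System.out.println(f.evaluate("AbcDEfg"));    // abcdefg
        System.out.println(f.evaluate("ABCdef", 3));  // abcdef
    }
}
```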

The workflow is as follows:

1. Implement a UDF that converts uppercase characters to lowercase.

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class Lower extends UDF {
    public Text evaluate(final Text s) {
        if (s == null) {
            return null;
        }
        return new Text(s.toString().toLowerCase());
    }
}

2. Package it into a jar.

Create a Maven project and build the jar with Maven.
The resulting jar here is hiveudf-1.0.0.jar.

3. Upload the jar to an HDFS path.

[root@master /opt]# hadoop fs -mkdir -p /user/hive/udf
18/06/07 09:41:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@master /opt]# hadoop fs -put hiveudf-1.0.0.jar /user/hive/udf
18/06/07 09:41:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@master /opt]# hadoop fs -ls /user/hive/udf
18/06/07 09:41:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   3 root supergroup       8020 2018-06-07 09:41 /user/hive/udf/hiveudf-1.0.0.jar
[root@master /opt]#

4. Create the function in the Hive command line.

add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar;
create temporary function lower as 'com.example.hive.udf.Lower';

hive> delete jar hiveudf-1.0.0.jar;
hive> list jars
    > ;
hive> add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar
    > ;
Added [/tmp/416cfcca-9ea0-4eaf-9e54-8154b440f3a9_resources/hiveudf-1.0.0.jar] to class path
Added resources: [hdfs:///user/hive/udf/hiveudf-1.0.0.jar]
hive> list jars;
/tmp/416cfcca-9ea0-4eaf-9e54-8154b440f3a9_resources/hiveudf-1.0.0.jar
hive> create temporary function lower as 'com.example.hive.udf.Lower';
OK
Time taken: 0.594 seconds
hive>

5. The registered function can now be used.

hive> select lower('AbcDEfg')
    > ;
OK
abcdefg
Time taken: 1.718 seconds, Fetched: 1 row(s)
hive>

For complex input data types, such as Array, extend GenericUDF instead.

1. As before, start by writing a class, this time extending GenericUDF.

This custom function converts a point, given its latitude and longitude, into a string.

package com.zbra.udf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.DoubleObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

/**
 * For complex argument handling.
 */
public class GeoUdf extends GenericUDF {

    private DoubleObjectInspector doubleObjectInspector01;
    private DoubleObjectInspector doubleObjectInspector02;

    public ObjectInspector initialize(ObjectInspector[] objectInspectors) throws UDFArgumentException {
        if (objectInspectors.length != 2) {
            throw new UDFArgumentLengthException("geo_hash only takes 2 arguments: double, double");
        }
        // 1. Check that the expected argument types were received
        ObjectInspector a = objectInspectors[0];
        ObjectInspector b = objectInspectors[1];
        if (!(a instanceof DoubleObjectInspector) || !(b instanceof DoubleObjectInspector)) {
            throw new UDFArgumentException("first argument must be a double, second argument must be a double");
        }

        this.doubleObjectInspector01 = (DoubleObjectInspector) a;
        this.doubleObjectInspector02 = (DoubleObjectInspector) b;

        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
        // Check for NULL arguments before unboxing: DoubleObjectInspector.get()
        // returns a primitive double, so a null check after that call can never fire.
        Object latObj = deferredObjects[0].get();
        Object lngObj = deferredObjects[1].get();
        if (latObj == null || lngObj == null) {
            return "";
        }

        double lat = this.doubleObjectInspector01.get(latObj);
        double lng = this.doubleObjectInspector02.get(lngObj);

        return new GeoHash(lat, lng).getGeoHashBase32();
    }

    public String getDisplayString(String[] strings) {
        if (strings.length == 2) {
            return "geo_hash(" + strings[0] + ", " + strings[1] + ")";
        } else {
            return "wrong arguments passed in...";
        }
    }
}
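The GeoHash helper class used by evaluate is not shown in the article. Assuming it implements standard geohash base32 encoding (which is consistent with the article's output, wdpkqbtc for the point (12, 123)), a minimal standalone sketch might look like this:

```java
// Hypothetical stand-in for the author's GeoHash helper class, assuming it
// performs standard geohash encoding: repeatedly halve the longitude and
// latitude ranges, interleave the resulting bits (longitude first), and emit
// one base32 character per 5 bits.
public class GeoHash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    private final double lat;
    private final double lng;

    public GeoHash(double lat, double lng) {
        this.lat = lat;
        this.lng = lng;
    }

    // Encode to an 8-character geohash (the precision seen in the article's output).
    public String getGeoHashBase32() {
        double latLo = -90, latHi = 90, lngLo = -180, lngHi = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // even bits encode longitude, odd bits latitude
        int bit = 0, idx = 0;
        while (hash.length() < 8) {
            if (evenBit) {
                double mid = (lngLo + lngHi) / 2;
                if (lng >= mid) { idx = idx * 2 + 1; lngLo = mid; }
                else            { idx = idx * 2;     lngHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { idx = idx * 2 + 1; latLo = mid; }
                else            { idx = idx * 2;     latHi = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { // every 5 interleaved bits yield one base32 character
                hash.append(BASE32.charAt(idx));
                bit = 0;
                idx = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(new GeoHash(12.0, 123.0).getGeoHashBase32()); // wdpkqbtc
    }
}
```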

2. Package it into a jar.

In this article it is packaged as hiveudf-1.0.0.jar.

3. Upload it to an HDFS path in the same way.

[root@master /opt]# hadoop fs -mkdir -p /user/hive/udf
18/06/07 09:41:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@master /opt]# hadoop fs -put hiveudf-1.0.0.jar /user/hive/udf
18/06/07 09:41:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@master /opt]# hadoop fs -ls /user/hive/udf
18/06/07 09:41:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   3 root supergroup       8020 2018-06-07 09:41 /user/hive/udf/hiveudf-1.0.0.jar
[root@master /opt]#

4. Create the custom function.

hive> list jars;
/tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar
hive> delete jar /tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar
    > ;
Deleted [/tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar] from class path
hive> add jar hdfs:///user/hive/udf/hiveudf-1.0.0.jar;
Added [/tmp/3794df3a-687a-45dd-93d3-d6a712c43e85_resources/hiveudf-1.0.0.jar] to class path
Added resources: [hdfs:///user/hive/udf/hiveudf-1.0.0.jar]
hive> create temporary function geohash as 'com.zbra.udf.GeoUdf';
OK
Time taken: 0.145 seconds

5. Use it as follows:

hive> select geohash(12.0d, 123.0d);
OK
wdpkqbtc
Time taken: 0.8 seconds, Fetched: 1 row(s)
hive> select geohash(cast('12' as Double), cast('123' as Double));
OK
wdpkqbtc
Time taken: 0.733 seconds, Fetched: 1 row(s)
hive>

