Writing a Hive array_except GenericUDF


=Start=

Background:

A quick write-up of how to take UDF/GenericUDF code newly added to Apache Hive on GitHub and make it available on the company's internal platform, for future reference.

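Main text:

Reference solution:

Use the collect_set function to build array1, the deduplicated array of devices used today;
Use the collect_set function to build array2, the deduplicated array of devices used over the past 30 days;
Take the elements that are in array1 but not in array2; these are the devices first seen today, and their count is the number of new devices.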

size(array_except(array1, array2)) > 0

size(nvl(array_except(array1, array2), array())) > 0
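# Hive does not ship an array_except function out of the box (at least not in the version used here), so you need to write your own UDF. The Hive GitHub repository does, however, contain a GenericUDFArrayExcept implementation that can serve as a reference.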
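
The three steps above can be sketched in plain Java to make the intended semantics concrete. This is only an illustration, not the Hive implementation: the class and method names (NewDeviceSketch, arrayExcept, hasNewDevice) are made up, returning null for a null input is an assumption that mirrors why the query wraps the call in nvl(..., array()), and keeping the order and duplicates of the first array is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not Hive code: mirrors the new-device check in plain Java.
final class NewDeviceSketch {

    // Elements of first that do not occur in second, in the order they appear in first.
    // A null input yields null (assumption), the case that nvl(..., array()) guards against.
    static <T> List<T> arrayExcept(List<T> first, List<T> second) {
        if (first == null || second == null) {
            return null;
        }
        Set<T> exclude = new HashSet<>(second);
        List<T> result = new ArrayList<>();
        for (T item : first) {
            if (!exclude.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }

    // Plain-Java counterpart of size(nvl(array_except(array1, array2), array())) > 0.
    static <T> boolean hasNewDevice(List<T> today, List<T> last30Days) {
        List<T> diff = arrayExcept(today, last30Days);
        List<T> safe = (diff == null) ? Collections.<T>emptyList() : diff;
        return !safe.isEmpty();
    }

    public static void main(String[] args) {
        List<String> today = Arrays.asList("dev-a", "dev-b", "dev-c");
        List<String> past = Arrays.asList("dev-a", "dev-c");
        System.out.println(arrayExcept(today, past));   // [dev-b]
        System.out.println(hasNewDevice(today, past));  // true
        System.out.println(hasNewDevice(null, past));   // false
    }
}
```

Putting the second array into a HashSet keeps the lookup cheap even when the 30-day array is much larger than today's.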

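array_except performs a set-difference operation between two arrays: it returns a new array containing the elements of the first array that do not appear in the second.

The syntax is:
ARRAY<T> array_except(ARRAY<T> a1, ARRAY<T> a2)

Here a1 and a2 are the two arrays to take the difference of, and T is the data type of the array elements.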

Example:
select array_except(array(1, 2, 3), array(2, 3, 4)) as result;

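Result:
[1]

At first I did not actually plan to use the code from the Hive repository **directly**. I only wanted to use it as a reference to get more familiar with how a Hive GenericUDF is written: implementing the set difference between arrays in Java is not complicated in itself, but I am not very familiar with how the ObjectInspector variables are handled or with some of the specific functions that the GenericUDF conventions call for, and this seemed like a good chance to learn them. It turned out, though, that this Hive code has several layers of wrapping (`GenericUDFArrayExcept` extends `AbstractGenericUDFArrayBase` rather than extending `GenericUDF` directly, and the code also relies on some helpers wrapped in `GenericUDFUtils`), so understanding it from scratch would take some time (and I happened to have something else to deal with). So I decided to get the whole flow working end to end first. This quick approach can also be used later whenever a feature is needed urgently.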

GenericUDFUtils.java
AbstractGenericUDFArrayBase.java
GenericUDFArrayExcept.java

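The simplest approach is to download the 3 files above and copy them into the corresponding directory of your project. Then change the package declaration in each Java file accordingly; nothing else needs to be touched. After that, build the jar according to your company's conventions, upload it to the relevant platform, and use it as usual.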
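
The package change can of course be done by hand in an editor. For illustration, here is a minimal Java sketch that rewrites the package declaration of a source text. The class and method names (PackageRewriteSketch, rewritePackage) are hypothetical; the source package is taken from the repository path of the files, and the target package com.ixyzero.hive.udf is simply the one used in the create temporary function statement in this post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites the package declaration of a Java source text.
final class PackageRewriteSketch {

    // Matches the first "package a.b.c;" declaration that starts a line
    // (leading spaces or tabs allowed, so a license header above it is left alone).
    private static final Pattern PACKAGE_LINE =
            Pattern.compile("(?m)^[ \\t]*package\\s+[\\w.]+\\s*;");

    static String rewritePackage(String source, String newPackage) {
        Matcher m = PACKAGE_LINE.matcher(source);
        return m.replaceFirst(Matcher.quoteReplacement("package " + newPackage + ";"));
    }

    public static void main(String[] args) {
        // Shortened stand-in for a real source file, for illustration only.
        String original = "package org.apache.hadoop.hive.ql.udf.generic;\n\n"
                + "public class GenericUDFArrayExcept {}\n";
        System.out.print(rewritePackage(original, "com.ixyzero.hive.udf"));
    }
}
```

If the compiler afterwards reports a class it can no longer resolve because it used to live in the same package, adding an explicit import for it would be the likely fix.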

hive> add jar /path/to/ixyzero-udf.jar;
hive> create temporary function array_except as 'com.ixyzero.hive.udf.GenericUDFArrayExcept';
hive> SELECT array_except(array(1,2,3,4), array(2,3));
[1,4]

Browse the repository to see what other functions are implemented there; they can be picked up directly when needed later.

$ grep "Description(name = " -A5 hive/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/*.java
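
The grep above looks for the Description annotation that carries the SQL-level function name. The same idea as a small Java sketch, in case the names need to be collected programmatically: the class and method names (DescriptionNameSketch, functionNames) are hypothetical, and the sample annotation text in main is made up for illustration rather than copied from the Hive sources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the name attribute out of @Description(name = "...") annotations.
final class DescriptionNameSketch {

    private static final Pattern NAME =
            Pattern.compile("@Description\\(\\s*name\\s*=\\s*\"([^\"]+)\"");

    static List<String> functionNames(String source) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME.matcher(source);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up snippet for illustration; not copied from the Hive sources.
        String sample = "@Description(name = \"array_except\", value = \"_FUNC_(a1, a2) - ...\")\n"
                + "public class GenericUDFArrayExcept {}\n";
        System.out.println(functionNames(sample)); // [array_except]
    }
}
```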

$ ls -l hive/ql/src/java/org/apache/hadoop/hive/ql/udf/generic

generic/AbstractGenericUDAFResolver.java
generic/AbstractGenericUDFArrayBase.java
generic/AbstractGenericUDFReflect.java
generic/BaseMaskUDF.java
generic/Collector.java
generic/GenericUDAFApproximateDistinct.java
generic/GenericUDAFAverage.java
generic/GenericUDAFBinarySetFunctions.java
generic/GenericUDAFBloomFilter.java
generic/GenericUDAFBridge.java
generic/GenericUDAFCollectList.java
generic/GenericUDAFCollectSet.java
generic/GenericUDAFComputeBitVectorBase.java
generic/GenericUDAFComputeBitVectorFMSketch.java
generic/GenericUDAFComputeBitVectorHLL.java
generic/GenericUDAFComputeStats.java
generic/GenericUDAFContextNGrams.java
generic/GenericUDAFCorrelation.java
generic/GenericUDAFCount.java
generic/GenericUDAFCovariance.java
generic/GenericUDAFCovarianceSample.java
generic/GenericUDAFCumeDist.java
generic/GenericUDAFDenseRank.java
generic/GenericUDAFEvaluator.java
generic/GenericUDAFExceptionInVertex.java
generic/GenericUDAFFirstValue.java
generic/GenericUDAFHistogramNumeric.java
generic/GenericUDAFLag.java
generic/GenericUDAFLastValue.java
generic/GenericUDAFLead.java
generic/GenericUDAFLeadLag.java
generic/GenericUDAFMax.java
generic/GenericUDAFMin.java
generic/GenericUDAFMkCollectionEvaluator.java
generic/GenericUDAFNTile.java
generic/GenericUDAFParameterInfo.java
generic/GenericUDAFPercentRank.java
generic/GenericUDAFPercentileApprox.java
generic/GenericUDAFPercentileCont.java
generic/GenericUDAFPercentileDisc.java
generic/GenericUDAFRank.java
generic/GenericUDAFResolver.java
generic/GenericUDAFResolver2.java
generic/GenericUDAFRowNumber.java
generic/GenericUDAFStd.java
generic/GenericUDAFStdSample.java
generic/GenericUDAFStreamingEvaluator.java
generic/GenericUDAFSum.java
generic/GenericUDAFSumEmptyIsZero.java
generic/GenericUDAFVariance.java
generic/GenericUDAFVarianceSample.java
generic/GenericUDAFnGrams.java
generic/GenericUDF.java
generic/GenericUDFAbs.java
generic/GenericUDFAddMonths.java
generic/GenericUDFAesBase.java
generic/GenericUDFAesDecrypt.java
generic/GenericUDFAesEncrypt.java
generic/GenericUDFArray.java
generic/GenericUDFArrayContains.java
generic/GenericUDFArrayDistinct.java
generic/GenericUDFArrayExcept.java
generic/GenericUDFArrayIntersect.java
generic/GenericUDFArrayJoin.java
generic/GenericUDFArrayMax.java
generic/GenericUDFArrayMin.java
generic/GenericUDFArrayRemove.java
generic/GenericUDFArraySlice.java
generic/GenericUDFArrayUnion.java
generic/GenericUDFAssertTrue.java
generic/GenericUDFAssertTrueOOM.java
generic/GenericUDFBRound.java
generic/GenericUDFBaseArithmetic.java
generic/GenericUDFBaseBinary.java
generic/GenericUDFBaseCompare.java
generic/GenericUDFBaseDTI.java
generic/GenericUDFBaseNumeric.java
generic/GenericUDFBaseNwayCompare.java
generic/GenericUDFBasePad.java
generic/GenericUDFBaseTrim.java
generic/GenericUDFBaseUnary.java
generic/GenericUDFBetween.java
generic/GenericUDFBridge.java
generic/GenericUDFBucketNumber.java
generic/GenericUDFCardinalityViolation.java
generic/GenericUDFCase.java
generic/GenericUDFCastFormat.java
generic/GenericUDFCbrt.java
generic/GenericUDFCeil.java
generic/GenericUDFCharacterLength.java
generic/GenericUDFCoalesce.java
generic/GenericUDFConcat.java
generic/GenericUDFConcatWS.java
generic/GenericUDFCurrentAuthorizer.java
generic/GenericUDFCurrentCatalog.java
generic/GenericUDFCurrentDatabase.java
generic/GenericUDFCurrentDate.java
generic/GenericUDFCurrentGroups.java
generic/GenericUDFCurrentSchema.java
generic/GenericUDFCurrentTimestamp.java
generic/GenericUDFCurrentUser.java
generic/GenericUDFDate.java
generic/GenericUDFDateAdd.java
generic/GenericUDFDateDiff.java
generic/GenericUDFDateFormat.java
generic/GenericUDFDateSub.java
generic/GenericUDFDatetimeLegacyHybridCalendar.java
generic/GenericUDFDecode.java
generic/GenericUDFDeserialize.java
generic/GenericUDFElt.java
generic/GenericUDFEncode.java
generic/GenericUDFEnforceConstraint.java
generic/GenericUDFEpochMilli.java
generic/GenericUDFExceptionInVertex.java
generic/GenericUDFExtractUnion.java
generic/GenericUDFFactorial.java
generic/GenericUDFField.java
generic/GenericUDFFloor.java
generic/GenericUDFFloorCeilBase.java
generic/GenericUDFFormatNumber.java
generic/GenericUDFFromUnixTime.java
generic/GenericUDFFromUtcTimestamp.java
generic/GenericUDFGreatest.java
generic/GenericUDFGrouping.java
generic/GenericUDFHash.java
generic/GenericUDFIf.java
generic/GenericUDFIn.java
generic/GenericUDFInBloomFilter.java
generic/GenericUDFInFile.java
generic/GenericUDFIndex.java
generic/GenericUDFInitCap.java
generic/GenericUDFInstr.java
generic/GenericUDFInternalInterval.java
generic/GenericUDFJsonRead.java
generic/GenericUDFLTrim.java
generic/GenericUDFLag.java
generic/GenericUDFLastDay.java
generic/GenericUDFLead.java
generic/GenericUDFLeadLag.java
generic/GenericUDFLeast.java
generic/GenericUDFLength.java
generic/GenericUDFLevenshtein.java
generic/GenericUDFLikeAll.java
generic/GenericUDFLikeAny.java
generic/GenericUDFLocate.java
generic/GenericUDFLoggedInUser.java
generic/GenericUDFLower.java
generic/GenericUDFLpad.java
generic/GenericUDFMacro.java
generic/GenericUDFMap.java
generic/GenericUDFMapKeys.java
generic/GenericUDFMapValues.java
generic/GenericUDFMask.java
generic/GenericUDFMaskFirstN.java
generic/GenericUDFMaskHash.java
generic/GenericUDFMaskLastN.java
generic/GenericUDFMaskShowFirstN.java
generic/GenericUDFMaskShowLastN.java
generic/GenericUDFMonthsBetween.java
generic/GenericUDFMurmurHash.java
generic/GenericUDFNDVComputeBitVector.java
generic/GenericUDFNamedStruct.java
generic/GenericUDFNextDay.java
generic/GenericUDFNullif.java
generic/GenericUDFOPAnd.java
generic/GenericUDFOPDTIMinus.java
generic/GenericUDFOPDTIPlus.java
generic/GenericUDFOPDivide.java
generic/GenericUDFOPEqual.java
generic/GenericUDFOPEqualNS.java
generic/GenericUDFOPEqualOrGreaterThan.java
generic/GenericUDFOPEqualOrLessThan.java
generic/GenericUDFOPFalse.java
generic/GenericUDFOPGreaterThan.java
generic/GenericUDFOPLessThan.java
generic/GenericUDFOPMinus.java
generic/GenericUDFOPMod.java
generic/GenericUDFOPMultiply.java
generic/GenericUDFOPNegative.java
generic/GenericUDFOPNot.java
generic/GenericUDFOPNotEqual.java
generic/GenericUDFOPNotEqualNS.java
generic/GenericUDFOPNotFalse.java
generic/GenericUDFOPNotNull.java
generic/GenericUDFOPNotTrue.java
generic/GenericUDFOPNull.java
generic/GenericUDFOPNumericMinus.java
generic/GenericUDFOPNumericPlus.java
generic/GenericUDFOPOr.java
generic/GenericUDFOPPlus.java
generic/GenericUDFOPPositive.java
generic/GenericUDFOPScaleUpDecimal64.java
generic/GenericUDFOPTrue.java
generic/GenericUDFOctetLength.java
generic/GenericUDFParamUtils.java
generic/GenericUDFPosMod.java
generic/GenericUDFPower.java
generic/GenericUDFPrintf.java
generic/GenericUDFQuarter.java
generic/GenericUDFQuote.java
generic/GenericUDFRTrim.java
generic/GenericUDFReflect.java
generic/GenericUDFReflect2.java
generic/GenericUDFRegExp.java
generic/GenericUDFRestrictInformationSchema.java
generic/GenericUDFRound.java
generic/GenericUDFRpad.java
generic/GenericUDFSQCountCheck.java
generic/GenericUDFSentences.java
generic/GenericUDFSha2.java
generic/GenericUDFSize.java
generic/GenericUDFSortArray.java
generic/GenericUDFSortArrayByField.java
generic/GenericUDFSoundex.java
generic/GenericUDFSplit.java
generic/GenericUDFStringToMap.java
generic/GenericUDFStringToPrivilege.java
generic/GenericUDFStruct.java
generic/GenericUDFStructField.java
generic/GenericUDFSubstringIndex.java
generic/GenericUDFSurrogateKey.java
generic/GenericUDFTimestamp.java
generic/GenericUDFToArray.java
generic/GenericUDFToBinary.java
generic/GenericUDFToChar.java
generic/GenericUDFToDate.java
generic/GenericUDFToDecimal.java
generic/GenericUDFToIntervalDayTime.java
generic/GenericUDFToIntervalYearMonth.java
generic/GenericUDFToMap.java
generic/GenericUDFToString.java
generic/GenericUDFToStruct.java
generic/GenericUDFToTimestampLocalTZ.java
generic/GenericUDFToUnixTimeStamp.java
generic/GenericUDFToUtcTimestamp.java
generic/GenericUDFToVarchar.java
generic/GenericUDFTranslate.java
generic/GenericUDFTrim.java
generic/GenericUDFTrunc.java
generic/GenericUDFTumbledWindow.java
generic/GenericUDFTypeOf.java
generic/GenericUDFUnion.java
generic/GenericUDFUnixTimeStamp.java
generic/GenericUDFUpper.java
generic/GenericUDFUtils.java
generic/GenericUDFValidateAcidSortOrder.java
generic/GenericUDFWhen.java
generic/GenericUDFWidthBucket.java
generic/GenericUDTF.java
generic/GenericUDTFExplode.java
generic/GenericUDTFGetSQLSchema.java
generic/GenericUDTFGetSplits.java
generic/GenericUDTFGetSplits2.java
generic/GenericUDTFInline.java
generic/GenericUDTFJSONTuple.java
generic/GenericUDTFParseUrlTuple.java
generic/GenericUDTFPosExplode.java
generic/GenericUDTFReplicateRows.java
generic/GenericUDTFStack.java
generic/ISupportStreamingModeForWindowing.java
generic/InstantDateTimeFormatter.java
generic/InstantFormatter.java
generic/InstantFormatterCache.java
generic/InstantSimpleDateFormatter.java
generic/LeadLagBuffer.java
generic/NDV.java
generic/NGramEstimator.java
generic/NumericHistogram.java
generic/RoundUtils.java
generic/SimpleGenericUDAFParameterInfo.java
generic/UDTFCollector.java
generic/package-info.java

References:

How to tell whether a device is a new device in Hive SQL?
https://ixyzero.com/blog/archives/5368.html

hive array_except
https://juejin.cn/s/hive%20array_except

GenericUDFUtils.java
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUtils.java

AbstractGenericUDFArrayBase.java
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFArrayBase.java

GenericUDFArrayExcept.java
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayExcept.java#L38

Array functions and operators
https://www.alibabacloud.com/help/zh/sls/user-guide/array-functions-and-operators

Complex type functions
https://help.aliyun.com/zh/maxcompute/user-guide/complex-type-functions#section-e0m-o6l-r0k

Implementing the Hive UDF array_except
https://blog.csdn.net/qq_35515661/article/details/130161544

Does the Spark SQL function array_except(arr1, arr2) preserve the original order of the elements in arr1?
https://blog.csdn.net/qq_35515661/article/details/130141316

A simple Hive GenericUDF example
https://ixyzero.com/blog/archives/5432.html

Find elements which are present in first array and not in second
https://www.geeksforgeeks.org/find-elements-present-first-array-not-second/

Finding if an array contains all elements in another array
https://stackoverflow.com/questions/16524709/finding-if-an-array-contains-all-elements-in-another-array

java: check if any element in array 1 is present in array 2
https://stackoverflow.com/questions/57000384/java-check-if-any-element-in-array-1-is-present-in-array-2

=END=

