如何用SQL查看某个字符串中某个字符出现的次数


=Start=

缘由:

简单记录一下,方便后面有需要的时候参考。

正文:

参考解答:
思路整理:
* 先将「非特定字符」替换成「空」,然后再计算字符串的剩余长度:
* 对于单个字符来说,原始字符串长度和替换后的字符串长度的差值就是单个字符出现的次数;
* 对于子串来说,原理一样只是最后用差值除以子串的长度得到子串出现的次数,还有就是对于 Hive SQL 来说,计算字符串长度的函数是 length 而不是 len

# Counting the occurrences of a character
SELECT
LEN(summary) - LEN(REPLACE(summary, 'x', '')) AS occurrences
FROM article

# Counting the occurrences of a substring of any length
If we don’t know the length of the substring, the expression to use is a bit more complex:

SELECT
(LEN(summary) - LEN(REPLACE(summary, 'France',''))) / LEN('France') AS occurrences
FROM article

# Case-sensitivity
Single character, case insensitive:

SELECT
LEN(summary) - LEN(REPLACE(LOWER(summary), LOWER('x'), '')) AS occurrences
FROM article

Any number of characters, case insensitive:

SELECT
(LEN(summary) - LEN(REPLACE(LOWER(summary), LOWER('France','')))) / LEN('France') AS occurrences
FROM article

# Performance
In the examples above, the expressions should be relatively fast (though, of course, it can be a problem when applied to a huge number of rows).

Nothing prevents us to use those expressions in a WHERE clause or in an ORDER BY clause, but this may heavily affect performance. The reason is that, if there is an index on the summary column, it cannot be used because the query optimizer cannot make assumptions on the results of the expressions.

If it is really necessary, this problem can be solved by building a computer column and an index on it, but this is out of the scope of this article.
Hive SQL中的字符串处理函数说明

replace(string A, string OLD, string NEW)

Returns the string A with all non-overlapping occurrences of OLD replaced with NEW (as of Hive 1.3.0 and 2.1.0).
Example:
select replace("ababab", "abab", "Z");
returns "Zab".

==

length(string A)

Returns the length of the string.
参考链接:

SQL Server: How to count occurrences of a substring in a string
https://sql-bits.com/sql-server-how-to-count-occurrences-of-a-substring-in-a-string/

Hive – Count number of occurences of character
https://stackoverflow.com/questions/36736022/hive-count-number-of-occurences-of-character

Hive LanguageManual UDF
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions

=END=

,

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注