How to determine the encoding/encryption type of a string


=Start=

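Background:

Some thoughts and experiments prompted by a recent case of data transformation (encryption, encoding). I am recording them briefly here for future reference.

This is an elementary form of steganography. Client-side transformations like this come in endless variations and can never be fully enumerated, so this is only an exploration. Doing data security well really requires a "shift-left" mindset.

Main text:

Reference answer: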

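In actual use Ciphey was only mediocre, probably because its features do not match what I expected. I wanted it to quickly tell me the encoding/encryption type of an input string or file, to help decide what to do next. According to its own description, however, its job is automatic decryption, decoding and hash cracking.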

# Install Ciphey on macOS
brew install ciphey

# Three ways to run Ciphey

1. File input

ciphey -f encrypted.txt

2. Unqualified input

ciphey -- "Encrypted input"

3. Normal way

ciphey -t "Encrypted input"

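Reading the file in Python and checking whether the resulting string consists of printable characters does not work well either. Some people store encoded data as text and others store it as raw bytes.

- For text, the check says little: the data has usually already been through Base64 or a similar encoding, so it is entirely printable anyway.
- For raw bytes, the check cannot tell you anything either.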
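The following is a quick illustration of why the printable-character check says little about encoded text. Base64 output uses only printable ASCII characters by design, so str.isprintable() returns True for it, just as it does for ordinary text. The helper name looks_printable is hypothetical.

```python
import base64
import os

def looks_printable(s):
    """The naive check: True if every character in s is printable."""
    return s.isprintable()

plain = "hello world"
# Base64 of 32 random bytes: opaque data, yet made only of printable ASCII
encoded = base64.b64encode(os.urandom(32)).decode("ascii")

print(looks_printable(plain))    # True
print(looks_printable(encoded))  # True: indistinguishable from plain text by this test
```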

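After some thought, I arrived at a workflow that basically works:

1. Use commands such as file and wc to determine the file type.
2. Compute statistics such as the number of Base64 lines in the file.
3. Attach some tags and predictions to the file.

There is no need to decrypt or decode automatically. Record first and process later, because trying to process everything on the spot may not keep up, and both accuracy and performance cost can suffer.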

import subprocess
'''
Keywords in the output of the file command that deserve attention:
with very long lines
Multitracker Version
data
'''
def get_file_info(filepath, info_type='file'):
    # Build the command as an argument list and run it without a shell,
    # so file names containing spaces or shell metacharacters are safe
    if info_type.strip().lower() == 'wc':
        cmd = ['wc', filepath]
    else:
        cmd = ['file', '-b', filepath]

    # Run the command and return (return code, output decoded to text)
    try:
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except (OSError, ValueError) as e:
        print("{0} failed, reason {1}".format(cmd, str(e)))
        return -1, str(e)
    stdout_data, stderr_data = p.communicate()
    if p.returncode != 0:
        print("{0} failed, status code {1} stdout {2} stderr {3}".format(cmd, p.returncode, stdout_data, stderr_data))
        return p.returncode, stderr_data.decode(errors='replace').strip()
    return p.returncode, stdout_data.decode(errors='replace').strip()

print(get_file_info("1.txt"))
print(get_file_info("2.txt"))
print(get_file_info("3.txt"))
print(get_file_info("1.txt", "wc"))
print(get_file_info("2.txt", "wc"))
print(get_file_info("3.txt", "wc"))
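The keywords listed in the docstring above can be turned into the tags mentioned earlier. This is a minimal sketch that assumes the outputs of file -b and wc have already been captured as text, for instance with get_file_info. The label names and both helper functions are hypothetical. Matching "data" exactly rather than as a substring matters, because descriptions such as "SQLite 3.x database" also contain that word.

```python
def labels_from_file_output(desc):
    """Map keywords in `file -b` output to coarse labels (label names are hypothetical)."""
    labels = []
    if "with very long lines" in desc:
        labels.append("long_lines")           # e.g. one large encoded blob per line
    if "Multitracker Version" in desc:
        labels.append("multitracker_magic")   # keyword flagged above as worth a second look
    if desc.strip() == "data":
        labels.append("unrecognized_binary")  # exact match only: "data" also occurs in "database"
    return labels

def parse_wc_output(out):
    """Parse default `wc FILE` output: line, word and byte counts, then the file name."""
    lines, words, nbytes = out.split()[:3]
    return {"lines": int(lines), "words": int(words), "bytes": int(nbytes)}

print(labels_from_file_output("ASCII text, with very long lines, with no line terminators"))
print(labels_from_file_output("data"))
print(parse_wc_output("       1       1    4097 3.txt"))  # illustrative wc output
```

Combining these labels with the Base64 line rate from the next script gives a small per-file record that can be stored and reviewed later, in line with the "record first, process later" idea.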

#!/usr/bin/env python3
# coding=utf-8

import base64
import sys

'''
a    YQ==
ab    YWI=
abc    YWJj

hello    aGVsbG8=
hello7    aGVsbG83
'''

base64_char_set = {
    'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'
    ,'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'
    ,'0','1','2','3','4','5','6','7','8','9'
    ,'+','/','='
}

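# true->1, false->0
def is_str_base64_encode(astr):

    # The length of Base64-encoded data is always a multiple of 4
    str_len = len(astr)
    if 0 == str_len or str_len%4 != 0:
        return 0

    # '=' may only appear as trailing padding: none, one or two of them
    stripped = astr.rstrip('=')
    if len(astr) - len(stripped) > 2 or '=' in stripped:
        return 0

    # Base64 output only contains A-Z, a-z, 0-9, '+', '/' and '='.
    # A set lookup is used instead of a regex; passing this check alone does not prove Base64
    for x in astr:
        if x in base64_char_set:
            continue
        else:
            return 0
    #for

    # Finally try a strict decode (binascii.Error is a subclass of ValueError).
    # This only proves the string is syntactically valid Base64:
    # plain words such as "test" or "abcd" also pass every check above
    try:
        base64.b64decode(astr, validate=True)
    except ValueError as e:
        print(e)
        return 0

    return 1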

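def main():
    if len(sys.argv) != 2:
        print('usage: {0} FILE'.format(sys.argv[0]))
        sys.exit(1)
    line_count = 0
    b64_count = 0
    with open(sys.argv[1], 'rb') as fp:
        for line in fp:
            # Decode leniently so binary content does not raise, see https://stackoverflow.com/a/50359833
            line = line.decode(errors='ignore').strip()
            if line:
                line_count += 1
                b64_count += is_str_base64_encode(line)
            #if
        #for
    #with
    # Avoid ZeroDivisionError on empty files
    b64_line_rate = b64_count / line_count if line_count else 0.0
    print('{0}: b64_line_rate = {3}\nline_count = {1}\nb64_count = {2}\n'.format(sys.argv[1], line_count, b64_count, b64_line_rate))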

if __name__ == '__main__':
    main()
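The final decode in is_str_base64_encode only shows that a string is syntactically valid Base64. It does not show that anyone actually Base64-encoded it: any alphanumeric word whose length is a multiple of 4, such as test, passes every check.

Below is a slightly stricter sketch, assuming the standard alphabet with padding. It also requires the decoded bytes to re-encode to exactly the same text. This rejects non-canonical strings like YR==, whose unused bits are not zero. It still cannot reject test, so the result remains a hint for the b64_line_rate statistic rather than proof. The function name is hypothetical.

```python
import base64

def is_canonical_base64(s):
    """Heuristic: s decodes strictly and re-encodes to exactly the same text."""
    if not s or len(s) % 4 != 0:
        return False
    try:
        raw = base64.b64decode(s, validate=True)
    except ValueError:  # binascii.Error subclasses ValueError; non-ASCII str also raises ValueError
        return False
    return base64.b64encode(raw).decode("ascii") == s

print(is_canonical_base64("aGVsbG8="))  # True: this is b64encode(b"hello")
print(is_canonical_base64("YR=="))      # False: decodes to b"a", which re-encodes as "YQ=="
print(is_canonical_base64("test"))      # True: a plain word is still a false positive
```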

References:

Hacker Tools: Ciphey – Automatic decryption, decoding & cracking (decrypts, decodes and cracks hashes automatically without knowing the key or password)
https://blog.intigriti.com/2021/08/11/hacker-tools-ciphey/
https://github.com/Ciphey/Ciphey

Ciphey currently supports 51 encryptions, encodings, compression methods, and hashes.
https://github.com/Ciphey/Ciphey/wiki/Supported-Ciphers

The Cyber Swiss Army Knife – a web app for encryption, encoding, compression and data analysis
https://github.com/gchq/CyberChef
https://gchq.github.io/CyberChef/

Cipher Identifier – Tool to identify/recognize the type of encryption/encoding applied to a message (more than 200 ciphers/codes are detectable). Cipher identifier to quickly decrypt/decode any text.
https://www.dcode.fr/cipher-identifier

Enjoy Encoding & Decoding!
https://dencode.com/

Encryption vs. Hashing vs. Salting – What’s the Difference?
https://www.pingidentity.com/en/resources/blog/post/encryption-vs-hashing-vs-salting.html

Encryption vs Encoding vs Hashing
https://www.geeksforgeeks.org/encryption-encoding-hashing/

Cryptography with Python – Quick Guide
https://www.tutorialspoint.com/cryptography_with_python/cryptography_with_python_quick_guide.htm

How to Encrypt and Decrypt Files in Python
https://thepythoncode.com/article/encrypt-decrypt-files-symmetric-python

How to determine what type of encoding/encryption has been used?
https://security.stackexchange.com/questions/3989/how-to-determine-what-type-of-encoding-encryption-has-been-used

Test if a python string is printable
https://stackoverflow.com/questions/3636928/test-if-a-python-string-is-printable/50731077#50731077

TypeError: a bytes-like object is required, not ‘str’
https://bobbyhadz.com/blog/python-typeerror-bytes-like-object-is-required-not-str

=END=


2 comments on "How to determine the encoding/encryption type of a string"

  1. How to determine what type of encoding/encryption has been used?
    https://security.stackexchange.com/questions/3989/how-to-determine-what-type-of-encoding-encryption-has-been-used
    `
    Summary: reverse engineering, or experiments and educated guesses based on experience

    ==

    Question:
    Is there a way to find what type of encryption/encoding is being used? For example, I am testing a web application which stores the password in the database in an encrypted format (WeJcFMQ/8+8QJ/w0hHh+0g==). How do I determine what hashing or encryption is being used?

    Answer 1:
    Your example string (WeJcFMQ/8+8QJ/w0hHh+0g==) is Base64 encoding for a sequence of 16 bytes, which do not look like meaningful ASCII or UTF-8. If this is a value stored for password verification (i.e. not really an “encrypted” password, rather a “hashed” password) then this is probably the result of a hash function computed over the password; the one classical hash function with a 128-bit output is MD5. But it could be about anything.

    The “normal” way to know that is to look at the application code. Application code is incarnated in a tangible, fat way (executable files on a server, source code somewhere…) which is not, and cannot be, as much protected as a secret key can. So reverse engineering is the “way to go”.

    Barring reverse engineering, you can make a few experiments to try to make educated guesses:

    * If the same user “changes” his password but reuses the same, does the stored value changes ? If yes, then part of the value is probably a randomized “salt” or IV (assuming symmetric encryption).

    * Assuming that the value is deterministic from the password for a given user, if two users choose the same password, does it result in the same stored value ? If no, then the user name is probably part of the computation. You may want to try to compute MD5(“username:password”) or other similar variants, to see if you get a match.

    * Is the password length limited ? Namely, if you set a 40-character password and cannot successfully authenticate by typing only the first 39 characters, then this means that all characters are important, and this implies that this really is password hashing, not encryption (the stored value is used to verify a password, but the password cannot be recovered from the stored value alone).

    ==
    Thanks for the inputs.. Pls tell me more about how you confirmed its a Base64 encoding for a sequence of 16 bytes. Regarding your experiments, Yes, this is a value stored for password verification. 1) if a user changes password, then the stored value changes too.. 2) if two users choose same password, the stored value is the same 3) password length is not limited.

    @Learner: any sequence of 24 characters, such that the first 22 are letters, digits, ‘+’ or ‘/’, and the last two are ‘=’ signs, is a valid Base64 encoding of a 128-bit value. And any 128-bit value, when encoded with Base64, yields such a sequence.

    ==

    Answer 2:

    Generally speaking, using experience to make educated guesses is how these things are done.
    `

  2. How can I detect if hashes are salted? [duplicate]
    https://security.stackexchange.com/questions/105438/how-can-i-detect-if-hashes-are-salted
    `
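    Summary:
    If you do not know in advance which hash was used and have no source code or binaries to reverse engineer, it basically comes down to guessing. To add a little pressure: choose carefully, because picking the wrong hash can waste a lot of time!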

    ==
    question:
    Is it possible to detect hash function of a hash if I don’t have access to PHP code? I know that if a hash is some kind of MD5, but I don’t know if there is salt etc.

    answer(s):

    Some tools make an educated guess regarding the encryption and salt type but there are numerous types of encryption schemes, some so closely related that the hashes nearly looks the same.

    Searched around and found some interesting tools to find the encryption type and they can be broken down into two categories namely with source / binary available and without any source binary.

    Finding the encryption type through reverse engineering can be achieved via tools such as:

    http://www.autistici.org/ratsoul/iss.html – A plugin for immunity debugger that identifies common encryption or encoding functions / structures etc.
    http://aluigi.altervista.org/mytoolz.htm#signsrch – is the binary version of the immunity plugin version
    http://www.hexblog.com/?p=27 – a plugin for OllyDbg to determine the type of encryption
    https://www.hex-rays.com/products/ida/tech/flirt/index.shtml – a plugin for IDA Pro to determine standard called libraries, could be used to identify encryption libraries

    Then there is the “educated” guess script:

    http://code.google.com/p/hash-identifier/ is a script that compares various attributes such as length, contained char types etc to produce a possible hash type used. Seems to be included in Backtrack5 standard.

    And the websites that allow for manual verification such as:

    http://www.insidepro.com/hashes.php – Allows you to enter a password and compare the hash to your example hash
    http://forum.insidepro.com/viewtopic.php?t=8225 – Lists various encrypted hashes to allow for a manual comparison

    So basically it seems that without prior knowledge of the hash used, with no source / binary available to reverse engineer, you are basically left with serious guess work. And to add a little pressure, choose carefully since choosing the wrong hash can lead to a LOT of wasted time!
    `
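Both comments end with the same advice: when there is no code to reverse engineer, fall back on educated guesses. Below is a minimal sketch of such guesses, assuming the value is either a Base64 string or a hex/modular-crypt string. It checks the byte length behind a Base64 value, tries a few MD5 input constructions as suggested in the first comment, and guesses candidate algorithms from prefix and length in the spirit of hash-identifier.

The helper names, the candidate constructions and the small length table are illustrative. They are not taken from any of the tools linked above. A bare MD5-sized value looks the same whether or not a salt was mixed in, so only self-describing formats such as bcrypt's $2b$... reveal the salt directly.

```python
import base64
import hashlib
import re

def b64_byte_length(value):
    """Number of bytes a strict Base64 string decodes to (raises ValueError if invalid)."""
    return len(base64.b64decode(value, validate=True))

def b64_md5(text):
    """Base64 of the raw 16-byte MD5 digest of `text` encoded as UTF-8."""
    return base64.b64encode(hashlib.md5(text.encode("utf-8")).digest()).decode("ascii")

def guess_md5_construction(stored, username, password):
    """Try a few hypothetical ways an application might have built the hashed input."""
    candidates = {
        "MD5(password)": password,
        "MD5(username:password)": username + ":" + password,
        "MD5(password + username)": password + username,
    }
    for name, text in candidates.items():
        if b64_md5(text) == stored:
            return name
    return None

# Modular-crypt-format prefixes: these schemes carry their salt inside the string
MCF_PREFIXES = {
    "$1$": "MD5-crypt (salted)",
    "$2a$": "bcrypt (salted)",
    "$2b$": "bcrypt (salted)",
    "$2y$": "bcrypt (salted)",
    "$5$": "SHA-256-crypt (salted)",
    "$6$": "SHA-512-crypt (salted)",
    "$argon2": "Argon2 (salted)",
}

# Illustrative, incomplete table: hex digest length -> algorithms with that output size
HEX_LENGTHS = {
    32: ["MD5", "MD4", "NTLM"],
    40: ["SHA-1", "RIPEMD-160"],
    64: ["SHA-256", "SHA3-256"],
    128: ["SHA-512", "SHA3-512"],
}

def guess_hash_type(value):
    """Guess candidate algorithms from prefix, character set and length only."""
    value = value.strip()
    for prefix, name in MCF_PREFIXES.items():
        if value.startswith(prefix):
            return [name]
    if re.fullmatch(r"[0-9a-fA-F]+", value):
        return HEX_LENGTHS.get(len(value), [])
    return []

sample = "WeJcFMQ/8+8QJ/w0hHh+0g=="  # value quoted in the first comment
print(b64_byte_length(sample))       # 16 -> 128 bits: MD5-sized, but "could be about anything"

stored = b64_md5("alice:secret")     # hypothetical stored value built for this demo
print(guess_md5_construction(stored, "alice", "secret"))  # MD5(username:password)

print(guess_hash_type("5f4dcc3b5aa765d61d8327deb882cf99"))  # ['MD5', 'MD4', 'NTLM']
print(guess_hash_type("$2b$12$" + "a" * 53))                # ['bcrypt (salted)']
```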
