Specifying a file's encoding at the web server


=Start=

Background:

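For convenience, I sometimes drop text files (TXT or HTML) into the blog's root directory on my VPS, for example http://ixyzero.com/blog/awk_sed.txt and http://ixyzero.com/blog/regex2.html . When I open these pages in a browser, though, the text comes out garbled, especially for TXT files. For HTML files I can at least tell the browser which encoding to use through the meta information inside the <head> tag: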

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
or
<meta charset="UTF-8">

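For TXT files, though, I had no idea what to do, since the file content was already UTF-8 encoded. (One article I read said that some browsers will recognize a text file saved as UTF-8 with a BOM as UTF-8, but that felt like treating the symptom rather than the cause and did not really solve the problem.) After some searching I solved it, and I am recording the fix here for future reference.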
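
To make the symptom concrete, here is a minimal Python sketch of what happens when UTF-8 bytes are decoded with the wrong charset. The sample string and the choice of `latin-1` as the wrong fallback are assumptions for illustration only; the encoding a real browser falls back to depends on the browser and its locale settings.

```python
import codecs

text = "编码测试"               # hypothetical file content
raw = text.encode("utf-8")      # the bytes the server actually sends

# Decoding with the wrong charset does not fail; it silently yields garbage.
garbled = raw.decode("latin-1")
print(garbled == text)          # False

# No bytes were lost, so undoing the wrong decode recovers the original text.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == text)        # True

# A UTF-8 BOM is just three marker bytes placed in front of the same content.
with_bom = text.encode("utf-8-sig")
print(with_bom.startswith(codecs.BOM_UTF8))     # True
print(with_bom[len(codecs.BOM_UTF8):] == raw)   # True
```

The bytes are identical in every case; only the charset used to interpret them changes, which is why declaring the charset in the response fixes the display without touching the file.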

Solution:

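Add or modify the following directive in the http, server, or location block of the Nginx configuration file. Nginx will then automatically add a charset parameter to the Content-Type header of responses within that scope: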

charset UTF-8;
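
Conceptually, the server appends a charset parameter to the Content-Type of textual responses. The Python sketch below only illustrates that idea and is not how Nginx is implemented: the function name `with_charset` is hypothetical, and the rule "every text/* type that lacks a charset" is a simplifying assumption (real servers apply their own list of eligible types).

```python
def with_charset(content_type: str, charset: str = "utf-8") -> str:
    """Append a charset parameter to text/* types that do not declare one."""
    value = content_type.strip()
    main_type = value.split(";", 1)[0].strip().lower()
    has_charset = "charset=" in value.lower()
    if main_type.startswith("text/") and not has_charset:
        return f"{value}; charset={charset}"
    # Non-text types and already-labelled types pass through unchanged.
    return value


print(with_charset("text/plain"))               # text/plain; charset=utf-8
print(with_charset("text/html; charset=gbk"))   # text/html; charset=gbk
print(with_charset("image/png"))                # image/png
```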

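After editing the Nginx configuration file, remember to check that it is valid and then reload it. The response headers before and after the change: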

$ curl -I http://ixyzero.com/blog/awk_sed.txt
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 21 Sep 2016 11:36:05 GMT
Content-Type: text/plain
Content-Length: 21406
Last-Modified: Tue, 20 Sep 2016 13:16:12 GMT
Connection: keep-alive
Vary: Accept-Encoding
ETag: "57e1369c-53e5"
Accept-Ranges: bytes

# nginx -t
# nginx -s reload

$ curl -I http://ixyzero.com/blog/awk_sed.txt
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 21 Sep 2016 11:44:04 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 21406
Last-Modified: Wed, 21 Sep 2016 11:40:16 GMT
Connection: keep-alive
Vary: Accept-Encoding
ETag: "57e271a0-539e"
Accept-Ranges: bytes
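
The same before/after check can be done in code. This is a small sketch using only the Python standard library; the helper name `parse_content_type` is made up for this example, and the two header values are copied from the curl output above.

```python
from email.message import Message


def parse_content_type(header_value):
    """Return (media_type, charset) parsed from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_type(), msg.get_content_charset()


before = parse_content_type("text/plain")
after = parse_content_type("text/plain; charset=utf-8")
print(before)   # ('text/plain', None)
print(after)    # ('text/plain', 'utf-8')
```

A charset of `None` means the client has to guess the encoding, which is the situation that produced the garbled text.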

The corresponding directive in the Apache configuration file is:

AddDefaultCharset utf-8

=END=


7 comments on “Specifying a file's encoding at the web server”

  1. Common MIME types
    https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types

    MIME types (IANA media types)
    https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types

    Media Types
    https://www.iana.org/assignments/media-types/media-types.xhtml

    What is the meaning of “vnd” in MIME types?
    https://stackoverflow.com/questions/5351093/what-is-the-meaning-of-vnd-in-mime-types
    `
    vnd indicates vendor-specific MIME types, which means they are MIME types that were introduced by corporate bodies rather than e.g. an Internet consortium.

    .docx application/vnd.openxmlformats-officedocument.wordprocessingml.document
    .ico image/vnd.microsoft.icon
    .mpkg application/vnd.apple.installer+xml #Apple Installer Package
    `

    Do I need Content-Type: application/octet-stream for file download?
    https://stackoverflow.com/questions/20508788/do-i-need-content-type-application-octet-stream-for-file-download
    `
    No.

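    RFC 2046 defines application/octet-stream as "arbitrary binary data". That overlaps to some degree with entities whose sole purpose is to be saved to disk and which are therefore not "web" content of any kind. Or, looking at it from another angle, the only safe thing to do with application/octet-stream is to save it to a file and hope someone else knows what it is for.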
    `

    Which JSON content type do I use?
    https://stackoverflow.com/questions/477816/which-json-content-type-do-i-use
    `
    For JSON text:
    application/json

    For JSONP (runnable JavaScript) with callback:
    application/javascript
    `

  2. Content-Type
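    ```
    There are too many types. Start with a whitelist approach (track only specific Content-Types); later you can switch to a blacklist approach (track only new Content-Types).

    # JSON strings
    application/json;charset=utf-8
    application/json

    # Arbitrary binary data
    application/octet-stream

    # The tab-separated TSV file format is very similar to CSV, except that fields in a CSV file are separated by commas rather than tabs. Both are delimiter-separated value formats.
    text/tab-separated-values;charset=utf-8

    # JS/CSS
    text/javascript;charset=utf-8
    text/css;charset=utf-8

    # Images
    image/gif
    image/jpeg
    image/png
    image/webp

    # Video
    video/mp4
    video/mp2t

    # Fonts
    font/woff2
    font/woff
    font/ttf
    application/font-woff
    application/font-woff2

    # Text
    text/plain
    text/json # Not the type IANA registered for JSON, but browsers can still recognize and handle it
    text/html
    text/xml
    ...
    ```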

  3. Which JSON content type do I use?
    https://stackoverflow.com/questions/477816/which-json-content-type-do-i-use
    `
    For JSON text:
    application/json

    For JSONP (runnable JavaScript) with callback:
    application/javascript
    ==
    IANA has registered the official MIME Type for JSON as application/json.

    When asked about why not text/json, Crockford seems to have said JSON is not really JavaScript nor text and also IANA was more likely to hand out application/* than text/*.

    A lot of stuff got put into the text/* section in the early days that would probably be put into the application/* section these days.

    The most widely supported non-standard media types are text/json or text/javascript. But some big names even use text/plain.
    ==
    `
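
The whitelist approach described in comment 2 can be sketched in a few lines of Python. This is only an illustration: the `ALLOWED_TYPES` set and the function names are hypothetical, and the sketch assumes that parameters such as `charset` should be ignored when matching.

```python
# Hypothetical whitelist; a real one would be tuned to the traffic of interest.
ALLOWED_TYPES = {"application/json", "text/plain", "text/html", "text/xml"}


def media_type(header_value: str) -> str:
    """Strip parameters (for example charset) and normalise case."""
    return header_value.split(";", 1)[0].strip().lower()


def is_allowed(header_value: str, allowed=ALLOWED_TYPES) -> bool:
    """True if the Content-Type's media type is on the whitelist."""
    return media_type(header_value) in allowed


print(is_allowed("application/json;charset=utf-8"))   # True
print(is_allowed("Text/HTML; charset=UTF-8"))         # True
print(is_allowed("application/octet-stream"))         # False
```

Normalising before the lookup matters because the same media type appears both with and without a charset parameter, as the list in comment 2 shows.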
