如何用PHP检测字符串是否为UTF-8编码


很常见的一个需求,但说实话,想要完全检测出来也不现实,只能求得个大概了,下面是从网上找的一些方法以及对应的参考链接:

方法一:
<?php
define('_is_utf8_split',5000);

function is_utf8($string) { // v1.01
	if (strlen($string) > _is_utf8_split) {
		// Based on: http://mobile-website.mobi/php-utf8-vs-iso-8859-1-59
		for ($i=0,$s=_is_utf8_split,$j=ceil(strlen($string)/_is_utf8_split);$i < $j;$i++,$s+=_is_utf8_split) {
			if (is_utf8(substr($string,$s,_is_utf8_split)))
				return true;
		}
		return false;
	} else {
		// From http://w3.org/International/questions/qa-forms-utf-8.html
		return preg_match('%^(?:
			[x09x0Ax0Dx20-x7E]			# ASCII
			| [xC2-xDF][x80-xBF]		# non-overlong 2-byte
			|  xE0[xA0-xBF][x80-xBF]		# excluding overlongs
			| [xE1-xECxEExEF][x80-xBF]{2}	# straight 3-byte
			|  xED[x80-x9F][x80-xBF]		# excluding surrogates
			|  xF0[x90-xBF][x80-xBF]{2}	# planes 1-3
			| [xF1-xF3][x80-xBF]{3}		# planes 4-15
			|  xF4[x80-x8F][x80-xBF]{2}	# plane 16
			)*$%xs', $string);
	}
}

var_dump(is_utf8('haha'));
方法二:
function detectUTF8($string) {
    return preg_match('%(?:
        [xC2-xDF][x80-xBF]             # non-overlong 2-byte
        |xE0[xA0-xBF][x80-xBF]        # excluding overlongs
        |[xE1-xECxEExEF][x80-xBF]{2} # straight 3-byte
        |xED[x80-x9F][x80-xBF]        # excluding surrogates
        |xF0[x90-xBF][x80-xBF]{2}     # planes 1-3
        |[xF1-xF3][x80-xBF]{3}         # planes 4-15
        |xF4[x80-x8F][x80-xBF]{2}     # plane 16
        )+%xs', $string);
}

 

还从网上找了个判断GB2312的PHP函数,不过还没测试正确性,先记录一下吧:

function isGb2312($string) {
	for($i=0; $i<=127; $i++) {
		if( ($v >= 228) && ($v < = 233) ){
			if( ($i+2) >= (strlen($string) - 1))
				return true;
			$v1 = ord( $string[$i+1] );
			$v2 = ord( $string[$i+2] );
			if( ($v1 >= 128) && ($v1 < =191) && ($v2 >=128) && ($v2 < = 191) )
				return false;
			else
				return true;
		}
	}
	return true;
}

 

参考网址:
,

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注