{"id":1555,"date":"2014-11-08T14:30:12","date_gmt":"2014-11-08T14:30:12","guid":{"rendered":"http:\/\/ixyzero.com\/blog\/?p=1555"},"modified":"2014-11-08T14:30:12","modified_gmt":"2014-11-08T14:30:12","slug":"%e7%94%a8python%e8%ae%a1%e7%ae%97%e6%96%87%e6%9c%ac%e7%9a%84%e7%9b%b8%e4%bc%bc%e5%ba%a6","status":"publish","type":"post","link":"https:\/\/ixyzero.com\/blog\/archives\/1555.html","title":{"rendered":"Computing Text Similarity with Python"},"content":{"rendered":"<p><strong>Computing Text Similarity with Python<\/strong><\/p>\n<p>I will need this later, so I am preparing ahead of time: how can I measure how similar the content returned by two web pages is?<\/p>\n<p>I picked some keywords and started searching: <a href=\"http:\/\/search.aol.com\/aol\/search?q=use+python+to+calculate++text+similarity\" target=\"_blank\">http:\/\/search.aol.com\/aol\/search?q=use+python+to+calculate++text+similarity<\/a><\/p>\n<p>That turned up several Python methods and libraries:<\/p>\n<ul>\n<li>The <a href=\"https:\/\/docs.python.org\/2\/library\/difflib.html\">difflib<\/a> library<\/li>\n<li>Google's <a href=\"http:\/\/code.google.com\/p\/google-diff-match-patch\/\">diff-match-patch<\/a> library<\/li>\n<li>The <a href=\"http:\/\/en.wikipedia.org\/wiki\/Levenshtein_distance\">Levenshtein<\/a> extension<\/li>\n<li>And the fancier \u201c<a 
href=\"http:\/\/zh.wikipedia.org\/wiki\/TF-IDF\">TF-IDF method<\/a>\u201d, which I came across in the book The Beauty of Mathematics (\u6570\u5b66\u4e4b\u7f8e) but will not cover here<\/li>\n<\/ul>\n<p>The notes below show how to compute the similarity between two pieces of text with different Python libraries (the end goal is a percentage):<\/p>\n<p><strong>Method 1: <\/strong><strong>difflib<\/strong><\/p>\n<pre class=\"lang:default decode:true\">&gt;&gt;&gt; import difflib\n\n&gt;&gt;&gt; difflib.SequenceMatcher(None, 'abcde', 'abcde').ratio()\n1.0\n\n&gt;&gt;&gt; difflib.SequenceMatcher(None, 'abcde', 'zbcde').ratio()\n0.80000000000000004\n\n&gt;&gt;&gt; difflib.SequenceMatcher(None, 'abcde', 'zyzzy').ratio()\n0.0<\/pre>\n<p><strong>Method 2: <\/strong><strong>Levenshtein<\/strong><\/p>\n<p>Running import Levenshtein failed with: ImportError: No module named Levenshtein<\/p>\n<p>So I downloaded the source from <a href=\"https:\/\/github.com\/miohtama\/python-Levenshtein\" target=\"_blank\">python-Levenshtein<\/a> and installed it (prebuilt exe installers are also available at <a href=\"http:\/\/www.lfd.uci.edu\/~gohlke\/pythonlibs\/#python-levenshtein\" target=\"_blank\">http:\/\/www.lfd.uci.edu\/~gohlke\/pythonlibs\/#python-levenshtein<\/a>). The first install attempt failed with error: Unable to find vcvarsall.bat, even though I had VS2010 installed. distutils on Python 2.6\/2.7 looks for the Visual Studio 2008 tools through the VS90COMNTOOLS variable, so pointing that variable at the VS2010 tools works around the error. The following steps made the install succeed:<\/p>\n<p>1. Set the environment variable by running:<\/p>\n<p>SET VS90COMNTOOLS=%VS100COMNTOOLS%<\/p>\n<p>2. Run the install again:<\/p>\n<p>setup.py 
install<\/p>\n<p>After that it compiled and installed without problems.<\/p>\n<pre class=\"lang:default decode:true\">$ python\n&gt;&gt;&gt; import Levenshtein\n&gt;&gt;&gt; help(Levenshtein.ratio)\nratio(...)\n    Compute similarity of two strings.\n\n    ratio(string1, string2)\n\n    The similarity is a number between 0 and 1, it's usually equal or\n    somewhat higher than difflib.SequenceMatcher.ratio(), becuase it's\n    based on real minimal edit distance.\n\n    Examples:\n    &gt;&gt;&gt; ratio('Hello world!', 'Holly grail!')\n    0.58333333333333337\n    &gt;&gt;&gt; ratio('Brian', 'Jesus')\n    0.0\n\n&gt;&gt;&gt; help(Levenshtein.distance)\ndistance(...)\n    Compute absolute Levenshtein distance of two strings.\n\n    distance(string1, string2)\n\n    Examples (it's hard to spell Levenshtein correctly):\n    &gt;&gt;&gt; distance('Levenshtein', 'Lenvinsten')\n    4\n    &gt;&gt;&gt; distance('Levenshtein', 'Levensthein')\n    2\n    &gt;&gt;&gt; distance('Levenshtein', 'Levenshten')\n    1\n    &gt;&gt;&gt; distance('Levenshtein', 'Levenshtein')\n    0<\/pre>\n<p><strong>Method 3: <\/strong><strong><a href=\"https:\/\/github.com\/seatgeek\/fuzzywuzzy\" target=\"_blank\">FuzzyWuzzy<\/a><\/strong><\/p>\n<pre class=\"lang:default decode:true\">git clone https:\/\/github.com\/seatgeek\/fuzzywuzzy.git fuzzywuzzy\ncd fuzzywuzzy\npython setup.py install\n\n&gt;&gt;&gt; from fuzzywuzzy import fuzz\n&gt;&gt;&gt; from fuzzywuzzy import process\n\n# Simple Ratio\n&gt;&gt;&gt; fuzz.ratio(\"this is a test\", \"this is a test!\")\n    96\n\n# Partial Ratio\n&gt;&gt;&gt; fuzz.partial_ratio(\"this is a test\", \"this is a test!\")\n    100\n\n# Token Sort Ratio\n&gt;&gt;&gt; fuzz.ratio(\"fuzzy wuzzy was a bear\", \"wuzzy fuzzy was a bear\")\n    90\n&gt;&gt;&gt; fuzz.token_sort_ratio(\"fuzzy wuzzy was a bear\", \"wuzzy fuzzy was a bear\")\n    100\n\n# Token Set Ratio\n&gt;&gt;&gt; fuzz.token_sort_ratio(\"fuzzy was a bear\", \"fuzzy fuzzy 
was a bear\")\n    84\n&gt;&gt;&gt; fuzz.token_set_ratio(\"fuzzy was a bear\", \"fuzzy fuzzy was a bear\")\n    100<\/pre>\n<p><strong>Method 4: <\/strong><strong><a href=\"http:\/\/code.google.com\/p\/google-diff-match-patch\/\" target=\"_blank\">google-diff-match-patch<\/a><\/strong><\/p>\n<pre class=\"lang:default decode:true\">import diff_match_patch\n\ntextA = \"the cat in the red hat\"\ntextB = \"the feline in the blue hat\"\n\ndmp = diff_match_patch.diff_match_patch()  # create a diff_match_patch object\ndiffs = dmp.diff_main(textA, textB)  # every diff job starts by calling diff_main()\n\n# Levenshtein distance implied by the diff\nd_value = dmp.diff_levenshtein(diffs)\nprint(d_value)\n\n# Normalize the distance by the length of the longer string\nmaxLength = max(len(textA), len(textB))\nprint(float(d_value) \/ float(maxLength))\n\n# Convert the normalized distance to a similarity percentage\nsimilarity = (1 - float(d_value) \/ float(maxLength)) * 100\nprint(similarity)<\/pre>\n<p>The code above follows the same idea: first compute the Levenshtein distance, then divide it by the length of the longer of the two strings and subtract the result from 1 to get the similarity. (I am not sure how this differs from using the Levenshtein extension directly; that one is written in C, so it is probably faster and more direct.)<\/p>\n<p><strong>References:<\/strong><\/p>\n<ul>\n<li><a href=\"http:\/\/search.aol.com\/aol\/search?q=use+python+to+calculate++text+similarity\" target=\"_blank\">use python to calculate text similarity &#8211; AOL Search Results<\/a><\/li>\n<li><a href=\"http:\/\/stackoverflow.com\/questions\/682367\/good-python-modules-for-fuzzy-string-comparison\" target=\"_blank\">Good Python modules for fuzzy string comparison? 
&#8211; Stack Overflow<\/a><\/li>\n<li><a href=\"http:\/\/stackoverflow.com\/questions\/145607\/text-difference-algorithm\" target=\"_blank\">c# &#8211; Text difference algorithm &#8211; Stack Overflow<\/a><\/li>\n<li><a href=\"http:\/\/stackoverflow.com\/questions\/246961\/algorithm-to-find-articles-with-similar-text\" target=\"_blank\">language agnostic &#8211; Algorithm to find articles with similar text &#8211; Stack Overflow<\/a><\/li>\n<li><a href=\"http:\/\/search.aol.com\/aol\/search?q=python+difflib\" target=\"_blank\">python difflib &#8211; AOL Search Results<\/a><\/li>\n<li><a href=\"http:\/\/search.aol.com\/aol\/search?q=python+Levenshtein\" target=\"_blank\">python Levenshtein &#8211; AOL Search Results<\/a><\/li>\n<li><a href=\"http:\/\/graus.nu\/thesis\/string-similarity-with-tfidf-and-python\/\" target=\"_blank\">Computing string similarity with TF-IDF and Python | thesis | graus.nu<\/a><\/li>\n<li><a href=\"http:\/\/en.wikipedia.org\/wiki\/Levenshtein_distance\" target=\"_blank\">Levenshtein distance &#8211; Wikipedia, the free encyclopedia<\/a><\/li>\n<li><a href=\"http:\/\/zh.wikipedia.org\/wiki\/TF-IDF\" target=\"_blank\">TF-IDF &#8211; Wikipedia, the free encyclopedia (Chinese)<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/seatgeek\/fuzzywuzzy\" target=\"_blank\">https:\/\/github.com\/seatgeek\/fuzzywuzzy<\/a><\/li>\n<li><a href=\"http:\/\/stackoverflow.com\/questions\/12649740\/building-an-html-diff-patch-algorithm\" target=\"_blank\">python &#8211; Building an HTML Diff\/Patch Algorithm &#8211; Stack Overflow<\/a><\/li>\n<li><a href=\"http:\/\/useless-factor.blogspot.com\/2008\/01\/matching-diffing-and-merging-xml.html\" target=\"_blank\">Useless Factor: Matching, diffing and merging XML<\/a><\/li>\n<li><a href=\"https:\/\/code.google.com\/p\/google-diff-match-patch\/wiki\/API\" 
target=\"_blank\">https:\/\/code.google.com\/p\/google-diff-match-patch\/wiki\/API<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Computing Text Similarity with Python I will need this later, so I am preparing ahead of time: how can I measure how similar the content returned by [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[23,12],"tags":[386,387,388,389,8],"class_list":["post-1555","post","type-post","status-publish","format-standard","hentry","category-knowledgebase-2","category-tools","tag-diff-match-patch","tag-difflib","tag-fuzzywuzzy","tag-levenshtein","tag-python"],"views":5131,"_links":{"self":[{"href":"https:\/\/ixyzero.com\/blog\/wp-json\/wp\/v2\/posts\/1555","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ixyzero.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ixyzero.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ixyzero.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ixyzero.com\/blog\/wp-json\/wp\/v2\/comments?post=1555"}],"version-history":[{"count":0,"href":"https:\/\/ixyzero.com\/blog\/wp-json\/wp\/v2\/posts\/1555\/revisions"}],"wp:attachment":[{"href":"https:\/\/ixyzero.com\/blog\/wp-json\/wp\/v2\/media?parent=1555"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ixyzero.com\/blog\/wp-json\/wp\/v2\/categories?post=1555"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ixyzero.com\/blog\/wp-json\/wp\/v2\/tags?post=1555"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}