Commit 37940a17 authored by Andreas Mueller

Merge pull request #19 from petrushev/unicode-parsing

Parse words with unicode letters
parents a8fbf5bc 05bb5624
@@ -178,7 +178,7 @@ def process_text(text, max_features=200, stopwords=None):
         stopwords = STOPWORDS
     d = {}
-    for word in re.findall(r"\w[\w']*", text):
+    for word in re.findall(r"\w[\w']*", text, flags=re.UNICODE):
         if word.isdigit():
             continue
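
To illustrate what the added flag changes, here is a minimal sketch, not code from the repository. It assumes Python 3, where `\w` in a `str` pattern is already Unicode-aware and `re.UNICODE` is redundant, so `re.ASCII` is used to emulate the ASCII-only matching the old line would have produced on Python 2 byte strings. The helper `find_words` and the sample text are hypothetical.

```python
import re

# Same pattern as in the diff: a word character followed by
# any number of word characters or apostrophes.
WORD_PATTERN = r"\w[\w']*"


def find_words(text, flags=re.UNICODE):
    """Hypothetical helper: return the word-like tokens found in text."""
    return re.findall(WORD_PATTERN, text, flags=flags)


sample = "naïve café isn't 2024"

# ASCII-only matching splits words at every non-ASCII letter.
ascii_words = find_words(sample, flags=re.ASCII)

# Unicode-aware matching keeps accented words intact.
unicode_words = find_words(sample, flags=re.UNICODE)

print(ascii_words)
print(unicode_words)
```

With ASCII-only matching, "naïve" and "café" are broken into fragments, which would show up as separate, meaningless entries in the word counts. The purely numeric token is still returned by the pattern in both cases; in `process_text` it is skipped afterwards by the `word.isdigit()` check.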