programming matrix: Adding an nltk stop word filter

Monday, December 19, 2016

Adding an nltk stop word filter

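Loading the NLTK English stop word list fails with a LookupError when the stopwords corpus has not been downloaded yet:

$ python

>>> from nltk.corpus import stopwords
>>> nltk_stopwords = stopwords.words('english')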

except LookupError: raise e
LookupError: 
**********************************************************************
  Resource u'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/Users/depappas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

Process finished with exit code 1

Install NLTK Data
The download takes a few minutes on an Internet connection of around 15 Mbps.

Installation instructions: http://www.nltk.org/data.html

$ python

>>> import nltk

>>> nltk.download()

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

In the downloader window that opens, click the Corpora tab, select the stopwords package, and wait for the download to finish.
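If you would rather skip the downloader window, the same package can be fetched from code. The following is a minimal sketch, not the only way to do it: `nltk.data.find` and `nltk.download('stopwords')` are NLTK calls, while the helper name `ensure_stopwords` is made up for this example.

```python
def ensure_stopwords(quiet=True):
    """Download the stopwords corpus only if NLTK cannot already find it."""
    import nltk  # imported lazily so this file loads even before NLTK is set up

    try:
        # Raises LookupError when the corpus is missing from every search path
        nltk.data.find('corpora/stopwords')
        return True
    except LookupError:
        # Fetches just this one package instead of opening the downloader GUI
        return nltk.download('stopwords', quiet=quiet)
```

Calling `ensure_stopwords()` once at the top of a script should avoid the LookupError shown earlier on a machine where the data has not been installed yet.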

from nltk.corpus import

Now the import should work:


$ python

Python 2.7.12 (default, Oct 11 2016, 05:20:59) 

[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin

Type "help", "copyright", "credits" or "license" for more information.

>>> import nltk

>>> nltk.download()

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

True

>>> from nltk.corpus import stopwords

>>> nltk_stopwords = stopwords.words('english')

>>>

Done!
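With the list loading, the filter itself is short. This is a rough sketch under a few assumptions: the helper names `load_english_stopwords` and `remove_stopwords` are invented for illustration, the input is assumed to be already split into tokens, and matching is done on the lowercase form of each token.

```python
def load_english_stopwords():
    """Return NLTK's English stop words as a set (needs the downloaded corpus)."""
    from nltk.corpus import stopwords  # lazy import: only needed when called
    return set(stopwords.words('english'))


def remove_stopwords(tokens, stop_words):
    """Drop every token whose lowercase form appears in stop_words."""
    return [token for token in tokens if token.lower() not in stop_words]


if __name__ == '__main__':
    tokens = 'The cat is on the mat'.split()
    # A tiny hand-written stop list keeps this demo independent of NLTK;
    # pass load_english_stopwords() instead to use the real list.
    print(remove_stopwords(tokens, {'the', 'is', 'on'}))  # ['cat', 'mat']
```

Converting the list to a set makes each membership check constant time, which matters once the filter runs over a large number of tokens.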
