Monday, December 19, 2016

Adding an NLTK stop word filter

$ python
>>> from nltk.corpus import stopwords
>>> nltk_stopwords = stopwords.words('english')

The stopwords corpus is not installed yet, so the lookup fails:

except LookupError: raise e
LookupError: 
**********************************************************************
  Resource u'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/Users/depappas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

Process finished with exit code 1

Install NLTK Data
This will take a few minutes if your Internet download speed is around 15 Mbps.

http://www.nltk.org/data.html
$ python
>>> import nltk
>>> nltk.download()

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

In the downloader window that opens, click on the Corpora tab, select the stopwords package, and wait for the download to finish.
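
If you would rather skip the downloader window, nltk.download() also accepts a package id, so the stopwords corpus can be fetched on its own. Here is a minimal sketch of that; ensure_stopwords is a name I made up for illustration, and it assumes NLTK itself is already installed.

```python
def ensure_stopwords():
    """Download the stopwords corpus only if NLTK cannot already find it.

    Hypothetical helper, not part of NLTK. Returns True if a download
    was triggered and False if the corpus was already present.
    """
    import nltk  # imported here so the sketch loads even before NLTK is installed

    try:
        # Same resource path that appears in the LookupError message.
        nltk.data.find('corpora/stopwords')
        return False
    except LookupError:
        # Fetch just this one package, without opening the GUI.
        nltk.download('stopwords')
        return True
```

Calling ensure_stopwords() once at the start of a script should be enough; later runs find the corpus on disk and skip the download.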



With the corpus installed, the import and the lookup should now work:
 

$ python
Python 2.7.12 (default, Oct 11 2016, 05:20:59) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
True
>>> from nltk.corpus import stopwords
>>> nltk_stopwords = stopwords.words('english')
>>> 
Done!
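
For the filter itself, here is a rough sketch of how the list could be used. The function names load_english_stopwords and filter_stopwords are my own, not NLTK's, and the code assumes the text is already split into tokens. The list is turned into a set so each membership check is fast.

```python
def load_english_stopwords():
    """Return NLTK's English stop words as a set (needs the corpus installed)."""
    from nltk.corpus import stopwords  # lazy import: only needed when called
    return set(stopwords.words('english'))


def filter_stopwords(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words.

    Hypothetical helper for illustration; token order is preserved.
    """
    return [token for token in tokens if token.lower() not in stop_words]
```

Usage would look like filter_stopwords("the quick brown fox".split(), load_english_stopwords()), which should drop "the" and keep the other words.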