Text Analysis

Python is a phenomenally good tool for text analysis, and there are several good libraries you can use.

Natural Language Tool Kit (NLTK)

The most widely used library in social science is probably the “Natural Language Tool Kit”, normally referred to as “NLTK”. The library has lots of tools and is very user-friendly.

Moreover, the NLTK website includes some simple examples on its main page. It also hosts the full contents of the book “Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit” by Steven Bird, Ewan Klein, and Edward Loper (talk about generous!). If you want a paper copy, you can also buy it on Amazon.
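To give a flavor of the interface, here is a minimal sketch that tokenizes a short (made-up) text, stems the words, and counts them. It assumes NLTK is installed (`pip install nltk`). It deliberately uses `RegexpTokenizer` and `PorterStemmer`, which need no extra downloads. By contrast, the more common `nltk.word_tokenize` and the stopword lists require fetching data packages with `nltk.download()` first.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Example text made up for illustration
text = ("Voters said the economy mattered. "
        "Many voters argued the economy was improving.")

# Split on runs of word characters; this tokenizer needs no downloaded models
tokenizer = RegexpTokenizer(r"\w+")
tokens = [t.lower() for t in tokenizer.tokenize(text)]

# Reduce words to their stems so that, e.g., "voter" and "voters" are counted together
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

# FreqDist works like a Counter with extra helpers for text analysis
freq = FreqDist(stems)
print(freq.most_common(3))
```

Stemming here is a crude normalization. NLTK also offers lemmatizers, but those need the WordNet data package.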

TextBlob

TextBlob is a nice-looking new library that seems to have a very streamlined interface. The library appears to use NLTK and another library (Pattern) in the background, so much of its functionality should be similar to that of NLTK itself.

Stanford CoreNLP

There are some other libraries that are more powerful but less user-friendly. One of the most powerful libraries for Natural Language Processing is Stanford CoreNLP. The library itself is written in Java (not Python), but a number of people have written Python interfaces (also called wrappers or APIs) for it. Again, these are a little harder to use and the documentation is not as good, but they’re good options if you run into limitations with NLTK.