NLTK: group similar words

FreqDist(words)
# remove stopwords
stopwords = nltk.
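The fragment above is cut off, so what follows is only a minimal sketch of one way to do this, not the original code. It counts words with `FreqDist`, drops stopwords, and then groups words that are spelled alike. Several parts are assumptions made for illustration: the function name `group_similar_words`, the tiny hard-coded `STOPWORDS` set (standing in for a real stopword list), the use of `difflib.SequenceMatcher` as the similarity measure, and the `0.8` threshold. If NLTK is not installed, the sketch falls back to `collections.Counter`, which `FreqDist` subclasses.

```python
from collections import Counter
from difflib import SequenceMatcher

try:
    # FreqDist is NLTK's frequency counter; it subclasses collections.Counter.
    from nltk import FreqDist
except ImportError:
    # Fallback so the sketch still runs without NLTK installed.
    FreqDist = Counter

# Assumption: a tiny illustrative stopword set, not a real stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "on"}


def group_similar_words(words, threshold=0.8):
    """Return (freq, groups): word counts and lists of similarly spelled words.

    A word joins the first group whose representative (the group's first
    word) has a SequenceMatcher ratio >= threshold; otherwise it starts a
    new group. The threshold value is an assumption chosen for illustration.
    """
    # Lowercase, then drop stopwords and non-alphabetic tokens.
    kept = [w.lower() for w in words if w.isalpha() and w.lower() not in STOPWORDS]
    freq = FreqDist(kept)

    groups = []
    # Visit words from most to least frequent so that common forms
    # become the group representatives.
    for word, _count in freq.most_common():
        for group in groups:
            if SequenceMatcher(None, word, group[0]).ratio() >= threshold:
                group.append(word)
                break
        else:
            groups.append([word])
    return freq, groups


if __name__ == "__main__":
    sample = ["color", "colour", "colors", "the", "color", "shape"]
    freq, groups = group_similar_words(sample)
    print(freq.most_common(2))
    print(groups)
```

Spelling similarity is only one reading of "similar". Grouping by stem, lemma, or meaning would need a different measure, and the greedy first-match strategy here depends on the order in which words are visited.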