Topic Modeling with MALLET: Analyzing the Results

Initially, it was difficult for me to understand the definition and purpose of topic modeling. However, after using MALLET, a topic modeling tool, to find patterns in Sherlock Holmes stories, I began to understand how topic modeling works.

After entering the Sherlock Holmes stories into MALLET, I found 10 good topics. The first six topics came from a run with 50 topics, 1000 iterations, and 20 topic words printed. I named them Letter Writing, Crime, Marriage, Death, Clues, and Physical Description (Male). The other four topics came from a run with 70 topics, 1500 iterations, and 15 topic words printed. These were Holmes in his Chair, Rooms in a House, London Finance, and Investigation Process. I experimented with other combinations of iterations, topics, and topic words printed, but only had time to upload these output files onto my computer. By testing many different combinations, I found that the more iterations and topic words you have, the easier it is to identify the topic name.

After I picked out my 10 topics, I clicked on the topic words within them in order to see the top-ranked documents within that topic. MALLET then allowed me to see the number of words in a specific document that were assigned to that topic. I found, for example, that 22 words in a document from The Stock Broker’s Clerk were assigned to the London Finance topic. The words in this topic were: money business work hundred answered good pounds company asked thousand advertisement city price headed pay. The document excerpt that MALLET showed at the top of the page revealed that this part of the story was about a “gigantic robbery” in which “nearly a hundred thousand pounds worth of American railway bonds” were found in the robber’s bag. This explains why 22 of the words within the document were assigned to London Finance. MALLET also showed that only 12% of the words in that entire document were assigned to this topic. I went through this same process with all of my topics to figure out which Sherlock Holmes stories discussed certain topics, and how many words in each story were assigned to those topics.
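To make the "22 words" and "12%" figures more concrete, here is a minimal Python sketch of how a per-topic word count and share could be computed for one document. It assumes the topic assigned to each word is already available as a simple list; the function name `topic_shares` and the toy document are made up for illustration and are not MALLET's own output format or code.

```python
from collections import Counter

def topic_shares(assignments):
    """Return {topic: (word_count, share_of_document)} for one document.

    `assignments` is a list with one topic label per word in the document.
    """
    counts = Counter(assignments)
    total = len(assignments)
    return {topic: (n, n / total) for topic, n in counts.items()}

# Hypothetical toy document (not real MALLET output): 10 words in total.
toy_doc = ["London Finance"] * 3 + ["Crime"] * 5 + ["Clues"] * 2

shares = topic_shares(toy_doc)
count, share = shares["London Finance"]
print(f"{count} words, {share:.0%} of the document")
```

The same arithmetic applies to the real example: if 22 words make up about 12% of a document, that document contains roughly 22 / 0.12, or about 180, words.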

Altogether, I think topic modeling with MALLET is a great way of distant reading. MALLET proved to be efficient: it sifted through massive amounts of text from the Sherlock Holmes stories and found patterns within them faster than most of us could finish reading just one of those stories. There were a few aspects of MALLET, however, that I disliked. First, it creates enormous files. These files take up a lot of space, which makes transferring them to Google Drive and to other computers extremely slow. On top of this, some of the topics it creates are extremely difficult to name because their words don’t seem to have much in common. A lot of the topics also reappeared after I changed the number of iterations, topics, and topic words (e.g., London Finance, Death, Holmes in his Chair). I suppose that was inevitable, though, because the text being read by MALLET didn’t change.
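One rough way to check whether a topic has "reappeared" in a different run is to measure how many top words two topics share. The sketch below uses Jaccard similarity (shared words divided by all distinct words). The first word list is the London Finance topic quoted above; the second list is hypothetical, invented only to show the calculation, and the function name `jaccard` is my own.

```python
def jaccard(words_a, words_b):
    """Share of distinct words that two topic word lists have in common."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# London Finance topic words from the run described above.
run_a = ("money business work hundred answered good pounds company "
         "asked thousand advertisement city price headed pay").split()

# Hypothetical topic words from another run (made up for illustration).
run_b = "money business pounds company thousand city pay office bank paid".split()

print(f"overlap: {jaccard(run_a, run_b):.2f}")
```

A score near 1 would suggest the same topic showing up again under different settings, while a score near 0 would suggest two unrelated topics.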

After completing this project, I understand that topic modeling tools such as MALLET are useful in that they can take texts and then find patterns in the use of words. Topic modeling is most effective when we have many documents/texts that we want to understand without actually closely reading each individual text (distant reading!).

Mary Dellas

