In our collaboration, and by reviewing our topic modeling results, we learned that the number of topics and the number of iterations both have a major effect on the results. Increasing the number of topics made it easier to find cohesive topics with an identifiable label, though it also made picking through the data much more labor intensive, and the output became overwhelming as the numbers grew. That seems like a small sacrifice to make, since reducing the number of topics produced more unusable topics. We both agreed that 40-60 topics was an ideal range for achieving good results. As for iterations, increasing the number noticeably improved how well the words within each topic related to one another. We both raised the number of iterations with each run and found that the topics became easier to identify. To us, the ideal settings for the topic modeling tool were 50 topics, at least 2000 iterations, and 20-25 words printed per topic.
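To make the three settings above concrete, here is a minimal sketch of a topic model written from scratch in Python. It is not the tool we used and makes no claim about how that tool works internally; it is a toy collapsed Gibbs sampler for LDA, and the function name `train_lda`, the `alpha`, `beta`, and `seed` parameters, and the tiny sample documents are all assumptions made for illustration. The defaults simply mirror the settings we preferred (50 topics, 2000 iterations, 20 words printed).

```python
import random

def train_lda(docs, num_topics=50, iterations=2000, top_words=20,
              alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns the top words per topic.

    alpha, beta, and seed are illustrative assumptions, not settings
    taken from any particular topic modeling tool.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Count tables: topic-word, document-topic, and total tokens per topic.
    topic_word = [[0] * V for _ in range(num_topics)]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][word_id[w]] += 1
            doc_topic[d][z] += 1
            topic_total[z] += 1
        assignments.append(zs)
    # Each iteration revisits every token and resamples its topic.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[z][v] -= 1
                doc_topic[d][z] -= 1
                topic_total[z] -= 1
                # Weight each topic by how common it is in this document
                # and how strongly it is associated with this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                topic_word[z][v] += 1
                doc_topic[d][z] += 1
                topic_total[z] += 1
    # "Words printed": keep the highest-count words for each topic.
    topics = []
    for k in range(num_topics):
        ranked = sorted(range(V), key=lambda v: topic_word[k][v], reverse=True)
        topics.append([vocab[v] for v in ranked[:top_words] if topic_word[k][v] > 0])
    return topics

# Toy documents, invented for illustration only (not our corpus).
toy_docs = [
    ["paper", "note", "table", "read", "paper", "note"],
    ["revolver", "knife", "wound", "bullet", "knife", "wound"],
    ["paper", "envelope", "note", "read"],
]
for words in train_lda(toy_docs, num_topics=2, iterations=200, top_words=3):
    print(" ".join(words))
```

In a sketch like this, more iterations give the sampler more passes to settle the word assignments, which is one plausible reading of why our topics became easier to label as we raised that number. Pure Python is slow, so the full settings are only practical on a small corpus.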
For our three favorite topics, we settled on suicide, physical appearance, and written document.
Suicide: found man body dead lay blood head struck hand shot revolver blow knife stick heavy weapon unfortunate left death sign lying wound bullet handle formidable pistol finally escaped wounded tied fired carried world struggle dragged grotesque injury spot shirt gun
This topic was most prevalent in The Norwood Builder and least prevalent in The Empty House.
- What can these topics tell us about Sir Arthur Conan Doyle’s writing style?
- Was “suicide” an actual term in twentieth-century London?
Physical Appearance: black red white hair hat head large broad coat heavy small middle set short dress cut brown round thick centre grey faced dressed clean glancing
This topic was most prevalent in A Case of Identity and least prevalent in The Blue Carbuncle.
- What do the colors symbolize in this short story?
- Did the weather factor into the physical appearance of characters in short stories set in twentieth-century London?
Written Document: paper note table read papers box book pocket put handed writing written drew sheet glanced picked document slip envelope piece
This topic was most prevalent in The “Gloria Scott” and least prevalent in The Second Stain.
- How prevalent is this document in The “Gloria Scott”?
- Were written documents important for all investigations?
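
The "most prevalent" and "least prevalent" stories named for each topic above come from comparing how much of each story a topic accounts for. As a rough sketch of that comparison, assuming the proportions have been exported into a dictionary keyed by story, something like the following would pick out both ends. The function name `prevalence_extremes`, the story keys, and every number here are placeholders invented for illustration, not our actual output.

```python
def prevalence_extremes(doc_topic, topic):
    """Return (most_prevalent_story, least_prevalent_story) for one topic."""
    # Sort story names by the share of the story assigned to this topic.
    ranked = sorted(doc_topic, key=lambda story: doc_topic[story][topic])
    return ranked[-1], ranked[0]

# Placeholder proportions, invented for illustration only.
example = {
    "story_a": {"written_document": 0.12, "physical_appearance": 0.03},
    "story_b": {"written_document": 0.02, "physical_appearance": 0.09},
    "story_c": {"written_document": 0.05, "physical_appearance": 0.01},
}

most, least = prevalence_extremes(example, "written_document")
print(most, least)
```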