I Write Like Me

Check which famous writer you write like with this statistical analysis tool, which analyzes your word choice and writing style and compares them with those of famous writers.

Not trusting a single sample, I tested fifteen writing samples, including stories and blog posts (excluding those with block quotes). Cory Doctorow was the most common result, at six.

I also received David Foster Wallace (3), Arthur Conan Doyle (3), J.K. Rowling (2), and Isaac Asimov (1).

There was a clear pattern to the results.

  1. Cory Doctorow: Topic was work. The analyzer probably keyed on the dispassionately objective word choice.
  2. David Foster Wallace: Topic was my personal life. The analyzer probably keyed on my portrayal of its absurdities.
  3. Arthur Conan Doyle: Topic was an adventure story that originated in high school. I probably thought too much like Sherlock Holmes then.
  4. J.K. Rowling: Topic was also an adventure story, composed in early college. I probably thought too much like Harry Potter then.
  5. Isaac Asimov: Topic was science. It's hard not to use scientific jargon when writing about science.

It was interesting that my high school and college stories came out differently, and that the results also split by whether I was writing about work, my personal life, or science. I would have liked almost every sample of my writing to map to a single author. Since they did not, the results seem skewed toward word choice rather than style.

From the developer, Dmitry Chestnykh, on how this works:

Actually, the algorithm is not a rocket science, and you can find it on every computer today. It’s a Bayesian classifier, which is widely used to fight spam on the Internet. Take for example the “Mark as spam” button in Gmail or Outlook. When you receive a message that you think is spam, you click this button, and the internal database gets trained to recognize future messages similar to this one as spam. This is basically how “I Write Like” works on my side: I feed it with “Frankenstein” and tell it, “This is Mary Shelley. Recognize works similar to this as Mary Shelley.” Of course, the algorithm is slightly different from the one used to detect spam, because it takes into account more stylistic features of the text, such as the number of words in sentences, the number of commas, semicolons, and whether the sentence is a direct speech or a quotation.
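To make the "feed it a book and tell it who wrote it" idea concrete, here is a minimal sketch of a word-based naive Bayes classifier in Python. It is not the tool's actual code: the class name `AuthorClassifier`, its methods, and the tiny training snippets are all invented for illustration (the snippets are made-up sentences, not real quotations), and it leaves out the stylistic features the developer mentions, such as sentence length and punctuation counts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class AuthorClassifier:
    """Hypothetical multinomial naive Bayes classifier over word counts."""

    def __init__(self):
        self.word_counts = {}        # author -> Counter of words seen
        self.doc_counts = Counter()  # author -> number of training texts
        self.vocab = set()           # every word seen during training

    def train(self, author, text):
        """Record the words of one training text under the given author."""
        tokens = tokenize(text)
        self.word_counts.setdefault(author, Counter()).update(tokens)
        self.doc_counts[author] += 1
        self.vocab.update(tokens)

    def log_scores(self, text):
        """Return log P(author) + sum of log P(word | author) per author."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for author, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[author] / total_docs)
            for token in tokens:
                # Add-one (Laplace) smoothing so unseen words do not zero out.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            scores[author] = score
        return scores

    def classify(self, text):
        """Return the author with the highest log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data: invented sentences, not real excerpts.
clf = AuthorClassifier()
clf.train("Mary Shelley", "the creature wandered the frozen mountains in misery and despair")
clf.train("Arthur Conan Doyle", "the detective examined the footprints and deduced the culprit")

print(clf.classify("the detective deduced the culprit from footprints"))
```

Because this sketch scores nothing but word frequencies, a classifier like it will follow topic vocabulary closely, which would fit the pattern in my results above.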

The Bayesian filters I’ve seen give an item a score for how likely it is to be something. I would like to see the strength of the scores, including distributions, and a comparison of a given result to other close results. I guess I am just someone who wants to know why.
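As a rough sketch of the kind of output I mean, the Python below turns per-author log scores into a ranked list of probabilities, so a narrow win is visible next to the runner-up. The function names and the scores in `example_scores` are hypothetical numbers I made up for illustration. They are not output from I Write Like, and I am assuming the tool has log scores like these internally.

```python
import math


def normalize(log_scores):
    """Convert per-author log scores into probabilities that sum to 1."""
    top = max(log_scores.values())
    # Subtract the max before exponentiating to avoid floating-point underflow.
    weights = {author: math.exp(s - top) for author, s in log_scores.items()}
    total = sum(weights.values())
    return {author: w / total for author, w in weights.items()}


def ranked(log_scores):
    """Return (author, probability) pairs, most likely first."""
    probs = normalize(log_scores)
    return sorted(probs.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical log scores, invented for this example.
example_scores = {
    "Cory Doctorow": -120.0,
    "David Foster Wallace": -121.0,
    "Isaac Asimov": -130.0,
}

for author, prob in ranked(example_scores):
    print(f"{author}: {prob:.3f}")
```

With a listing like this, a sample that lands on one author by a hair would look very different from one that matches decisively.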

