Multiple Guess

Security Watch: The myth of online anonymity – CNET reviews:

In his talk, Krawetz mentioned several studies on gendered use of keywords which, when weighted (with specific numerical values for male and different numerical values for female), can determine the gender of the author. Sounds too simple to be true, but research (including Gender, Genre, and Writing Style in Formal Written Texts by Shlomo Argamon, et al., and Sexed Texts by Charles McGrath) has shown that some words are more likely to be written by one gender or the other. In informal writing, men are more likely to write “some,” “this,” and “as” while women are more likely to write “actually,” “everything,” and “because.” In formal writing, men write “around,” “more,” and “what” while women write “if,” “with,” and “where.” By determining the point totals in a given document, Dr. Krawetz can predict the gender of the author.

Dr. Krawetz admits upfront that this method is only 60 to 70 percent accurate, but it is far better than guessing, which is only 50 percent accurate. He further cautions that text including citations from poetry, quotes from others, and even the influence of copy editors on the original can all skew the results. It is best to collect a large number of examples, then average the point totals.
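To make the point-total idea concrete, here is a minimal Python sketch of how I imagine such a scorer might work. The six informal-writing words come from the quote above; the weights (all 1) and the function names `score`, `guess_gender`, and `average_margin` are my own placeholders, not Dr. Krawetz's actual values or code.

```python
import re

# Words are from the quoted article (informal writing).
# The weights are hypothetical: the article does not give the real numbers.
MALE_WEIGHTS = {"some": 1, "this": 1, "as": 1}
FEMALE_WEIGHTS = {"actually": 1, "everything": 1, "because": 1}


def score(text):
    """Return (male_points, female_points) for one piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    male = sum(MALE_WEIGHTS.get(w, 0) for w in words)
    female = sum(FEMALE_WEIGHTS.get(w, 0) for w in words)
    return male, female


def guess_gender(text):
    """Pick whichever side has the higher point total; ties are 'unknown'."""
    male, female = score(text)
    if male > female:
        return "male"
    if female > male:
        return "female"
    return "unknown"


def average_margin(samples):
    """Average (male - female) margin over many samples.

    A positive result leans male, a negative one leans female.
    """
    if not samples:
        raise ValueError("need at least one sample")
    margins = [m - f for m, f in map(score, samples)]
    return sum(margins) / len(margins)
```

Averaging over many samples, as the article suggests, should dampen the effect of any single text that happens to be full of quotations or heavy copy editing.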

Is this method significantly better than guessing? I’d prefer a much higher accuracy rate: 85 percent at the minimum, though 90 to 95 percent would be much better. Maybe in forensics accuracy isn’t so important?
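Whether 60 to 70 percent is statistically distinguishable from coin-flipping depends on how many documents were tested, which the article does not say. As a rough check, here is a small sketch of a one-sided exact binomial test against 50 percent guessing. The function name `p_value_vs_guessing` and the sample sizes in the note below are my own assumptions for illustration.

```python
from math import comb


def p_value_vs_guessing(correct, total):
    """Chance of getting at least `correct` right out of `total`
    if every prediction were a fair coin flip (one-sided exact binomial)."""
    tail = sum(comb(total, k) for k in range(correct, total + 1))
    return tail / 2 ** total
```

With 7 correct out of 10, the result is about 0.17, so a lucky guesser could easily do as well. With 65 correct out of 100, it drops below 0.01. So the answer is "it depends on the sample size," and even a statistically real edge over guessing still leaves the method wrong about a third of the time.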
