Jack Schofield 

Microsoft patent on Bayesian spam filtering

Last year, I got quite excited about the idea of Bayesian probabilistic spam filtering, which grew out of a series of developments prompted by Paul Graham's A Plan for Spam. That's dated August 2002, and it was certainly a new idea to me. However, it doesn't seem to have been a new idea to Microsoft. It seems Eric Horvitz and others from Microsoft Research wrote a paper, A Bayesian approach to filtering junk email, for the AAAI Workshop on Learning for Text Categorization in July 1998. Not surprisingly, a patent application went in just before that, on June 23, 1998, for a "Technique which utilizes a probabilistic classifier to detect 'junk' e-mail by automatically updating a training and re-training the classifier based on the updated training set". Curiously enough, the Microsoft development was reported in Fortune magazine, but either the reporter did not use the key words, Bayes or Bayesian, or a protective subeditor removed them, as they do. Perhaps using "Bayesian" would have made the piece far too hard for Fortune readers to understand, but leaving it out effectively hid the piece from anyone trying to find information on the topic.
  
  



 
