Aug. 27th, 2005

totient: (Default)
Self-adhesive #10 envelopes auto-feed quite nicely through Cartman, it turns out, so I'll be saving a lot of trouble on the program mailing by doing exactly that. But they ought to be going through at half speed (that is, 720 impressions per hour) and instead they are stalling. Stall recovery is only 20 seconds, so it's still faster than full speed on some printers I've had, but with 500 envelopes to print I'd rather not be at it for three hours. I think the problem is in the printer driver on my laptop; maybe separating the print into several smaller batches will help. I hope it's not that the driver thinks it's printing sideways, because I don't really have a plan for what to do about that.
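As a rough check on the three-hour estimate, assume each stall simply adds its 20-second recovery on top of the half-speed rate. The total time for $N$ envelopes at rate $r$ with $s$ stalls of $t_s$ seconds each is then:

$$T = \frac{N}{r} + s\,t_s, \qquad \frac{N}{r} = \frac{500}{720\ \text{h}^{-1}} = 2500\ \text{s} \approx 42\ \text{min}$$

$$s = \frac{10800\ \text{s} - 2500\ \text{s}}{20\ \text{s}} = 415$$

A three-hour run would therefore mean roughly 415 stalls, close to one per envelope. Without stalls, the same job would take about 42 minutes.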
It's not like I get all that much spam. But some of it causes Eudora 4.2 to crash, and I don't feel like using the adware version or paying all over again for a program that's fairly similar in features. So a while ago I installed CRM114, and I've been training it since. It got to 98 or 99% accuracy quickly, but it stubbornly refuses to get any better than that, and the failures happen in both directions.

Recently I've been trying a compromise between TOE and TEFT: whenever a message comes through with a confidence level under 100, I train on it. That hasn't really helped CRM114 converge any quicker. I think fast convergence really requires shared data sets, à la Google.
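The difference between the two training regimes, and the compromise described above, comes down to one decision per message. The following is a minimal Python sketch, assuming a hypothetical classifier that reports a label plus a confidence from 0 to 100. The function name `should_train` and the policy strings are illustrative only, not CRM114's interface.

```python
# Sketch of three training regimes for a statistical spam filter.
# Assumption: the classifier returns a label ("spam"/"ham") and a confidence
# in [0, 100]; the default threshold of 100 mirrors the "under 100" rule.

def should_train(policy: str, predicted: str, actual: str, confidence: float,
                 threshold: float = 100.0) -> bool:
    """Decide whether a message should be fed back into the filter."""
    if policy == "TEFT":
        # Train Everything: learn from every message.
        return True
    if policy == "TOE":
        # Train On Error: learn only from misclassified messages.
        return predicted != actual
    if policy == "threshold":
        # Compromise: learn from errors, plus correct verdicts that
        # were not fully confident.
        return predicted != actual or confidence < threshold
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    print(should_train("TOE", "spam", "spam", 80))        # correct verdict, no training
    print(should_train("threshold", "spam", "spam", 80))  # correct but unsure, train
```

The threshold policy trains on a superset of what TOE trains on and a subset of what TEFT trains on, which is what makes it a compromise between the two.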

Speaking of which, I've also got a Gmail account (with the same username as this one). I use it for signing up for commercial services that I think will sell my address or otherwise be annoying, and I mostly check that address only when I'm expecting a particular piece of mail. I thought of giving up on maintaining my own Bayesian filters and just forwarding all my mail to Gmail (which, as near as I can tell, uses pattern-based filtering maintained by ever-vigilant professional pattern authors). But Eudora 4.2 doesn't talk POP over SSL, I want to be able to read mail offline, and I want to search current and historic mail together. And I do like Bayesian filters' ability to give me only the portion of a mailing list's traffic that will actually interest me, even when the uninteresting parts aren't spam per se.

So why not send just the spam to Gmail? Forwarding only the high-confidence spam should keep Eudora from crashing, while any false positives, which should fall among the low-confidence messages, still reach my Eudora client where I can see them. But the Eudora mailbox filter that separated low-confidence from high-confidence spam ran only after whatever was making Eudora crash. Fortunately, rewriting the filter in procmail wasn't too hard, and now my high-confidence spam goes to my Gmail account, where it can rot for 30 days before Google automatically deletes it.
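The routing rule itself is simple. The filter described here is a procmail recipe; what follows is only a Python sketch of the same decision, not that recipe. It assumes a status header shaped like `X-CRM114-Status: SPAM  ( pR: -152.3 )`, where a negative score leans spam, and a hypothetical cut-off of 100 on the score's magnitude. The header format, the threshold, and the `route` function are all assumptions for illustration.

```python
import re
from email import message_from_string
from email.message import Message

# Assumed header format, modeled loosely on CRM114's status header:
#   X-CRM114-Status: SPAM  ( pR: -152.3 )
# Negative scores lean spam; larger magnitude means higher confidence.
STATUS_RE = re.compile(
    r"(?P<label>\w+)\s*\(\s*pR:\s*(?P<score>-?\d+(?:\.\d+)?)\s*\)")

HIGH_CONFIDENCE = 100.0  # hypothetical cut-off on the score's magnitude


def route(msg: Message) -> str:
    """Return 'forward' for high-confidence spam, otherwise 'local'."""
    status = msg.get("X-CRM114-Status", "")
    match = STATUS_RE.search(status)
    if match is None:
        # Unscored mail is delivered locally so nothing is silently lost.
        return "local"
    score = float(match.group("score"))
    if match.group("label").upper() == "SPAM" and score <= -HIGH_CONFIDENCE:
        return "forward"
    # Low-confidence spam (where false positives live) and good mail stay local.
    return "local"


if __name__ == "__main__":
    raw = "X-CRM114-Status: SPAM  ( pR: -152.3 )\nSubject: offer\n\nbody\n"
    print(route(message_from_string(raw)))
```

Defaulting to local delivery whenever the header is missing or unparseable keeps the failure mode safe: a misconfigured filter costs an occasional crash rather than lost mail.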

CRM114: Crash's Bayesian mail filtering program.
TOE: Train On Error.
TEFT: Train Everything.