Archive for the ‘Blogging’ Category

reCAPTCHA, I Think I Get It

I received a very helpful (and FAST) response from the reCAPTCHA team earlier today:

Basically, when you get a reCAPTCHA incorrect on wordpress, the comment is saved but it is marked as spam. When a legitimate user gets the CAPTCHA incorrect, they are redirected to a page that saves their comment and allows them to correct the CAPTCHA. Once we recover the comment, it’s deleted from the database. However, if a spam bot enters the comment, it will not follow the redirect, so sometimes the comment stays in the DB.

Basically, when using reCAPTCHA, any comment you see in the moderation queue was caught by reCAPTCHA — it doesn’t mean that the CAPTCHA is broken.

The only thing I’ll add to this is my observation that WordPress does not always make it clear which “comments” come from the comment form versus trackbacks and pingbacks. So “any comment you see in the moderation queue” MIGHT have come via reCAPTCHA, but it might also have originated via trackbacks. Read on…

Following Their Advice

Today I turned off email notification for the moderation queue, as the reCAPTCHA FAQ suggests. But this has a side-effect…I also had this option selected in WordPress: “Comment author must have a previously approved comment”.

Damn. This means I’ll no longer see email notifications for those legitimate human comments. So…I went ahead and turned that off as well.

Trackback Vulnerability

After changing these settings, I very quickly saw a spam comment make it onto my blog, because (I think) the “Comment author must have a previously approved comment” checkbox is now unselected. I missed my opportunity to moderate the comment (which is actually a trackback), so it went on through.

AFTER the spam makes it through the system, WordPress sends an email notification. Part of this email contains this phrase:

You can see all trackbacks on this post here:

Aha! Now I know this came from a trackback, not from a normal comment.

I now believe most of the comment spam I’ve been seeing in recent weeks has originated from trackbacks; WordPress just does not make it easy to tell whether a comment came from a trackback or from the comment form. reCAPTCHA does a very good job with the comment form, but does not address trackback or pingback spam in any way.

Closing Thoughts

I really, really like reCAPTCHA. It is very effective and the people on their team are always helpful and promptly answer questions. Having reCAPTCHA in place saves me significant time. It’s also damn cool that they are harnessing people power to digitize books. Watching Luis von Ahn’s Google Tech Talk was the reason I tried reCAPTCHA in the first place. It is an incredibly clever idea that any geek can appreciate.

I used Akismet in the past, and wading through thousands of spams in the Akismet queue grew very burdensome. Furthermore, it occasionally marked valid comments as spam — definitely more often than once every 6 months — so I never felt comfortable ignoring the spams.

Maybe WordPress needs to improve its moderation queue? Some ideas:

  • Offer separate moderation queues for comments, trackbacks, and pingbacks.
  • Let me configure different moderation policies for these different queues.
  • At all stages, make it very obvious if a comment originated from the comment form, a trackback, or a pingback.

Trackbacks and pingbacks are now off. Comments — from people — are open and encouraged. Spammers, you suck.

Spam Problems, I Need Help

For a while, reCAPTCHA solved my WordPress spam problem. Then I started seeing a handful of spams, and now I receive at least 5 per day in my moderation queue. I first blogged about this a bit over a month ago.

Here is a typical email I receive from my WordPress moderation queue:

[Image: SPAM Email]

Here is what the WordPress moderation queue looks like:

[Image: Moderation Queue]

Is this from a trackback or did a human enter this comment? How can I tell the difference between comments and trackbacks?

What should I do next to fight spam on this blog? Disable trackbacks? Install some sort of trackback SPAM fighting plugin? Help!

Atom Valid on WordPress 2.3

I just upgraded to WordPress 2.3; the whole process took a matter of minutes. According to the W3C Feed Validation Service, my Atom 1.0 feed is now valid. Woohoo!

Apostrophes Finally Work

Valid Atom means characters like & and < will render properly in an increasing number of tools. Thanks to clearly defined text constructs, the meaning of such characters is not ambiguous.

WordPress is but one link in the chain of syndication tools. Using Atom helps ensure sites like JavaBlogs, DZone, and others can consume and syndicate content accurately. As I wrote in early September, I explicitly customized my WordPress installation to disable RSS. I simply do not trust RSS because every person writing an RSS tool might interpret entities differently.

Apostrophe Rant

Now that WordPress Atom feeds appear to handle apostrophes a bit better, what can we do about greengrocers’ apostrophes? It must be embarrassing to work at companies where the official company name makes this stupid mistake. (Answer: the plural of CD is CDs, not CD’s.)

Now I fully expect my faithful readers to point out all of the grammatical and spelling errors in this blog. That should teach me to mock apostrophe abuse.

Origins of @InsertNameHere

I just recently caught on to the trend of referencing other people’s blog comments by prefixing their username with @. I like it. And I’m probably one of the last people on Planet Internet to “catch on”. (again, feeling behind the curve here)

Did this convention originate with Twitter?

I fully expect a whole bunch of comments where you guys reference each other’s comments with @.

Go.

Comic Tribute to Java Versus Ruby

This has been a great week for Java/Ruby geek blogging.

Gavin and Obie

It all started here, which eventually led to this. I love it.

Has reCAPTCHA Been Defeated?

Without reCAPTCHA, it is not uncommon to receive hundreds of spam comments per day. Akismet can help, but it still requires blog authors to spend a lot of time wading through suspected spam. Even worse, I’d see almost weekly “false positives” from Akismet. They do an admirable job, but spam remains a big burden on site maintainers.

With reCAPTCHA, my spam count immediately dropped to zero. So I deactivated Akismet. For a while, the problem was solved.

Last week, however, I received a spam comment. Today, one more. Both originated from Amsterdam. The first comment made absolutely no sense. It said something along the lines of “nice post”, but the web site it linked to was invalid. I don’t understand spam like this. If the spam has no usable links, how does it help the spammer sell a product? Today’s spam at least gets credit for including a link to a real web site.

For now, it is a minor nuisance. Since I moderate all comments, the spams still aren’t making it past me. There’s no way to know if a machine defeated reCAPTCHA or if a human typed in the text. Or perhaps they are exploiting a bug in WordPress or reCAPTCHA. I suppose if the problem gets worse, I’ll have to re-activate Akismet as a second line of defense after reCAPTCHA.

Spammers, you all suck. But blogs that disallow comments also suck. So I will continue to fight this battle.

404

Does your blog have a decent 404 error page?

Car Crashing into Water

I’m sure you can find mine all by yourself.

Let me know if you can think of other famous things we’ve never found.

Google Code Prettify

I just installed google-code-prettify. It only takes a few minutes and the results are really cool:

import java.util.*;

public class Foo {
  private String firstName;

  public Foo(String firstName) {
    this.firstName = firstName;

    // some other junk
  }

  /**
   * Comment here.
   */
  public int getSomething() { ... }
}

Thank you again, Google.

On WordPress and Atom

Although WordPress defaults to RSS, I prefer Atom. Here is how I configured my WordPress installation to suppress RSS.

The Feed Icon

When you view this and other blogs in a modern browser, you should see a little feed icon somewhere in the browser. In Firefox 2.0.x, this icon appears in the “location bar” as they call it. In IE 7, the feed icon appears in a toolbar. For a default WordPress installation, clicking this feed icon takes you to the RSS feed. In my blog, it takes you to my Atom feed:

http://stuffthathappens.com/blog/feed/atom/

Making the feed icon work with Atom instead of RSS is very easy in WordPress. You need to edit header.php in your WordPress theme. (I’m assuming your theme is roughly based on the ‘default’ theme; if not, your PHP file might be called something else.) At any rate, simply replace the existing RSS <link> tag with this:

<head>
  ...
  <link rel="alternate" type="application/atom+xml"
      title="<?php bloginfo('name'); ?> Atom Feed"
      href="<?php bloginfo('atom_url'); ?>" />
  ...
</head>

That’s easy! And the results are instantly visible when you reload the page in your browser. Now, on to the footer.

The Footer

Scroll down to the bottom of any page on this site and you’ll see links to my Atom feed and my Atom Comments feed. The relevant PHP code is found in footer.php. Here is what I changed my footer to:

<p>
  Australopithecus afarensis (a.k.a. "Lucy") reads
  <cite><?php bloginfo('name'); ?></cite> every day via the
  <a href="<?php bloginfo('atom_url'); ?>">Entries</a>
  and <a href="<?php bloginfo('comments_atom_url'); ?>">Comments</a>
  Atom feeds.
</p>

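Again, this is a piece of cake. Basically just search through your theme’s PHP files and replace the RSS feed references with their Atom equivalents (for example, ‘atom_url’ and ‘comments_atom_url’ in the bloginfo() calls shown above).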

A Missing Feature

WordPress also gives you the ability to subscribe to comments for a particular post as RSS, but not (that I know of) as Atom. I found this code in single.php:

...You can follow any responses to this entry
through the <?php comments_rss_link('RSS 2.0'); ?> feed.

I tried changing that to comments_atom_link but had no success with that. So ultimately, I eliminated that from my blog.

Redirecting Away from RSS

After replacing or removing all RSS references from PHP files in my custom theme, I still had to find a way to keep people from navigating directly to an RSS feed. These feeds are built into WordPress, so I had two options:

  • Edit PHP files in WordPress
  • Redirect users by editing .htaccess

Editing the PHP files that ship with WordPress is a Bad Idea. Up until this point, all of the PHP editing I’ve done has been in my custom theme. To really remove RSS, you need to hack up WordPress itself, and that will cause Bad Things when you try to upgrade to the next WordPress release.

Instead, I opted to edit .htaccess.

This file is found in the WordPress installation directory. If you’ve never heard of .htaccess, do a quick Google search for terms like “WordPress” and “.htaccess” to learn more about that. Here are the specific lines I added:

# Redirect RSS people to the Atom feeds
RewriteRule ^feed/$ /blog/feed/atom/ [R,L]
RewriteRule ^feed/rss/$ /blog/feed/atom/ [R,L]
RewriteRule ^feed/rss2/$ /blog/feed/atom/ [R,L]
RewriteRule ^comments/feed/$ /blog/comments/feed/atom/ [R,L]
RewriteRule ^comments/feed/rss/$ /blog/comments/feed/atom/ [R,L]
RewriteRule ^comments/feed/rss2/$ /blog/comments/feed/atom/ [R,L]

I’m sure I can make that shorter with some regular expressions, and I still think you can get to the RSS feed by entering the URL of the .php file directly. But for the most part, people who enter the most common RSS URLs are now greeted with an HTTP redirect that takes them to the “official” Atom feed.
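As a rough sketch of the regular-expression shortcut mentioned above, the six rules could probably collapse into a single pattern with an optional "comments/" prefix and an optional "rss/" or "rss2/" suffix. The one-line RewriteRule in the comment below is an assumption that has not been tried on a live server, and the class and method names are made up for illustration. The Java code only checks that the pattern maps the six RSS URLs to the same targets as the original rules and leaves the Atom URL alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical one-line replacement for the six rules (untested in Apache):
//   RewriteRule ^(comments/)?feed/(rss2?/)?$ /blog/$1feed/atom/ [R,L]
final class FeedRedirectCheck {
    // Same pattern as the RewriteRule sketch in the comment above.
    static final Pattern RSS_FEED =
            Pattern.compile("^(comments/)?feed/(rss2?/)?$");

    /** Returns the redirect target, or null when the path is not an RSS feed URL. */
    static String redirectTarget(String path) {
        Matcher m = RSS_FEED.matcher(path);
        if (!m.matches()) {
            return null;
        }
        // Group 1 is "comments/" for the comments feed and absent otherwise.
        String prefix = (m.group(1) == null) ? "" : m.group(1);
        return "/blog/" + prefix + "feed/atom/";
    }

    public static void main(String[] args) {
        String[] paths = {
            "feed/", "feed/rss/", "feed/rss2/",
            "comments/feed/", "comments/feed/rss/", "comments/feed/rss2/",
            "feed/atom/"  // must not match, or the redirect would loop
        };
        for (String path : paths) {
            System.out.println(path + " -> " + redirectTarget(path));
        }
    }
}
```

The Atom URL deliberately falls outside the pattern, so a redirected request is not redirected again.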

Why, Oh Why?

Because this is my blog and I prefer Atom. I prefer Atom because I’ve read both the Atom spec and the RSS spec, and I’ve written tools that produce and consume both types of feeds. Atom is clearly specified while RSS is ambiguous. When writing tools that produce syndication feeds, Atom allows you to clearly indicate if a “<” character is part of an HTML tag, an XML tag, or just plain text. That alone is enough reason to prefer Atom for information exchange.
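To make the “<” point concrete, here is a minimal sketch of how a feed producer might emit Atom text constructs. The class and method names are invented for this example, and the <title> element is just one place a text construct can appear. The idea is that the type attribute tells the consumer how to read the content once the XML escaping has been undone: as literal characters for type="text", or as HTML markup for type="html".

```java
// Illustrative only: names are hypothetical, not taken from any Atom library.
final class AtomTextConstructs {
    /** Escapes the characters that are special inside XML element content. */
    static String escapeXml(String s) {
        // "&" must be replaced first so the other entities are not re-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** type="text": the content is plain text, so "<" is just a character. */
    static String textTitle(String plainText) {
        return "<title type=\"text\">" + escapeXml(plainText) + "</title>";
    }

    /** type="html": the content is HTML markup, escaped once for transport. */
    static String htmlTitle(String htmlMarkup) {
        return "<title type=\"html\">" + escapeXml(htmlMarkup) + "</title>";
    }

    public static void main(String[] args) {
        // A literal less-than sign in plain text.
        System.out.println(textTitle("Is 1 < 2?"));
        // Real markup: the consumer unescapes it and then parses it as HTML.
        System.out.println(htmlTitle("Is 1 &lt; 2? <em>Yes</em>"));
    }
}
```

Both methods escape the same way. What differs is the declared type, and that declaration is what RSS leaves to guesswork.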

When given a choice, I will always choose Atom over RSS.

Fresh Start

Since people started commenting within minutes of me installing WordPress with a dummy Lorem Ipsum post, I figured I’d better get this started for real.

My blog is back but the old content is not, nor will it ever be. I was not happy with the quality and it lacked focus. Hopefully that will change this time around.

If you want to subscribe, please use the Atom feed. I will completely disable any RSS feeds, even if I have to modify WordPress.

That’s all for now. Please let me know if you see any problems with this theme in your browser!