Archive for the ‘Software Development’ Category

Optimizing

I spent the last few days profiling and optimizing one of our app’s primary GUI screens. Here are some observations:

  1. Our app includes a built-in diagnostic tool that logs every server call. We can bring up a dialog and watch the calls as they occur. For our first optimization, we moved one of these calls off the EDT to a SwingWorker. Although the call itself only took 10-30ms, it was called often enough to introduce noticeable sluggishness. Having built-in diagnostic tools is priceless, because you can monitor what is happening behind the scenes.
  2. NetBeans includes a decent profiler, and the price is right: $0. Despite using IDEA for most coding, I always turn to NetBeans for profiling. IDEA could really use an integrated profiler.
  3. Having a super fast PC is helpful, because profiling is extremely taxing on your system. I don’t have a super fast PC, which makes the profiling experience painful and tedious.
  4. Despite the pain of a slow PC, profiling is very rewarding when you find a problem and fix it.
  5. We found several hot spots — NetBeans does a great job making these obvious. Each time we’d fix one issue, we’d run our scenarios again and find the next most critical issue.
  6. When optimizing, start with the worst hot spot and work your way down the list. A “hot spot” might mean a method is called once and consumes 13 seconds, or perhaps a method only takes 3 milliseconds but is called 800,000 times.
  7. Profiling is always surprising. One bottleneck was caused by an obvious programming error, and that error has been in our code since 2005. Ouch.
  8. You will hit dead ends. For example, one hot spot involves a small helper class that is executed tens of thousands of times. Unfortunately, the class is mutable with a wide open public API, making optimization quite hard. If this class were immutable and had fewer methods, I’m sure I could optimize it further.
  9. I try to fire up the profiler and spend time looking for problems before customers complain about bugs. I tend to focus on either runtime performance or else memory leaks, but generally not both in the same session. I may spend a few days each month focusing on one or the other.
  10. Optimizing always involves risk. Fixing one thing may introduce a new bug, so proceed with caution. Mitigate this risk by focusing on the most critical bottlenecks first. Optimize enough to make these bottlenecks go away, but don’t go so far that your code becomes unmaintainable.
  11. I found that after optimizing, the code is often cleaner than before. This might seem counterintuitive, but spending time meticulously analyzing a particular method line-by-line often gives you insight into better implementations. I think it is wrong to blindly assume that all time spent on optimization and efficiency necessarily leads to cryptic code.
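
For the first observation above, here is a minimal sketch of moving a short server call off the EDT with a SwingWorker. The class name ServerCallWorker, the callServer() stand-in, and the 20 ms delay are assumptions made up for illustration; this is not our actual code.

```java
import java.util.concurrent.ExecutionException;
import java.util.function.Consumer;
import javax.swing.SwingWorker;

// Hypothetical worker: runs a slow call on a background thread and
// hands the result back on the EDT.
final class ServerCallWorker extends SwingWorker<String, Void> {
    private final Consumer<String> onResult; // invoked on the EDT

    ServerCallWorker(Consumer<String> onResult) {
        this.onResult = onResult;
    }

    // Stand-in for the real server call (assumed to take 10-30 ms).
    static String callServer() throws InterruptedException {
        Thread.sleep(20);
        return "status: ok";
    }

    @Override
    protected String doInBackground() throws Exception {
        return callServer(); // executes on a worker thread, not the EDT
    }

    @Override
    protected void done() {
        // done() is invoked on the EDT, so updating Swing components here is safe.
        try {
            onResult.accept(get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            e.getCause().printStackTrace(); // a real app would report this in the UI
        }
    }
}
```

From an event handler you would call something like new ServerCallWorker(label::setText).execute(), where label is whatever component shows the result. The handler returns immediately and the GUI stays responsive while the call runs.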

TODO: Fix the TODOs

One of the apps I work on contained 29 TODO comments. This is a problem because with so many loose ends lingering on in the code, newly introduced TODOs simply get lost in the noise and never get fixed. In fact, some of those 29 TODOs dated back to 2005.

So this afternoon, two of us decided to take a look at each of those 29 issues. We found a room with a projector, fired up IDEA, and analyzed every TODO. In about an hour, we eliminated almost every issue. We generated change requests for each of the remaining 5 or so TODOs, and will fix those tomorrow.

I am simply amazed we never got around to doing this sooner. We’ve been looking at this noise for years now, and it ended up taking an hour to eliminate most of the problems. It might take another few hours tomorrow for the “hard cases”, but that’s time well spent.

Next up? I think we have some nagging errors in the application server log files. These bogus errors, warnings, and overly chatty log statements must all die.

Advice

  1. Don’t let your TODOs, broken tests, and cluttered log files fester. Resolve the problems as soon as possible. (for the record, our tests always pass 100% on this project)
  2. Once you get to a clean slate, keep it clean. TODOs are fine for quick fixes and temporary workarounds, but make a concerted effort to resolve the TODOs as soon as possible. A lot of small, minor improvements are easier to deal with than one massive cleanup effort down the road.
  3. Noise masks true problems. If your IDE and log files flood you with false warnings, you will fail to notice really important issues.
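
One way to keep the slate clean is to count TODOs automatically and watch the number. The following is only a rough sketch, assuming Java sources with a .java extension and a plain substring match on "TODO"; the TodoCounter name is made up, and your IDE's TODO view is probably more precise.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class TodoCounter {
    private TodoCounter() {}

    // Counts the lines containing "TODO" in every .java file under root.
    static long countTodos(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".java"))
                .mapToLong(TodoCounter::countInFile)
                .sum();
        }
    }

    // Counts matching lines in a single file (read as UTF-8).
    private static long countInFile(Path file) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> line.contains("TODO")).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A build step could call countTodos on the source root and complain when the count goes up, which makes newly introduced TODOs harder to lose in the noise.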

Low Standards, Stupidity, or Both?

This is a rant. If you don’t like rants, feel free to stop reading now. You’ve been warned.

Set the Bar Low

For my first example, let’s take a look at a portion of the Prudential Patterson home page:

234 Items in Select

Here we find the “Quick Search” section. That <select> component shows three choices. In reality, it contains 234 items! It has so many items, the scroll bar is not even visible. You have to hold the down arrow for more than TEN SECONDS to scroll through the entire list; you can only see 1.28% of the available choices at any given time. This is their definition of “Quick Search”.

Their home page also features this gigantic image:

Unavailable

This is the “Featured Property”…perhaps a working picture is in order?

A programmer actually created this web page and said “This is Production Quality” and let it go through to the web site. Unbelievable.

Nobody Cares

Now let’s talk about the Suburban Journals. Here is a picture of their page that lets you submit classified advertisements online:

Classifieds

I filled this out because I did not want to place my ad by telephone. My ad contained a cryptic URL from tinyurl.com, so I wanted to ensure it was correct. After waiting several days, I tried again. A week later, after nobody called back, I finally drove to the office to place my ad in person.

While placing my order with a woman who typed my information into her computer, I told her about the broken web site. Her response? Something like:

That’s not my department.

This is kind of funny…because a few minutes into our transaction, she had to call THAT VERY DEPARTMENT to ask for help with her data entry task. Did she mention the broken web page? Of course not.

So the web page continues to exist. Who knows how many requests end up going to /dev/null each day.

MapQuest

Any web site that still uses old-school MapQuest makes me angry.

Development Environment From Hell

Brad left this wonderful comment on my blog the other day:

I am forced to use IBM Rational Application Developer for WebSphere at work. It blows. You have to remember all kinds of quirks (like manually saving your files before running). I use it to autogenerate web service stubs for my EJBs, then move the project to Idea. What really blows is that if I need to ask the tech lead a question, I have to go back to RAD. I also have to check in my RAD project to CVS. I pray daily for a return to Ant or Maven. It takes 25 grueling GUI commands in hidden submenus of hidden menus to build a project from scratch (to our spec.).

Here is how it should work:

ant dist

But no…it takes 25 manual steps in a bloated overpriced IDE from hell. There are two possibilities:

  1. Laziness. The first team member lazily used the GUI wizard to generate the first part of the application in its infancy. Then the next guy came along and tacked on some tiny additional step. Hey, it’s just one tiny thing. No raindrop thinks it is responsible for the flood.
  2. Stupidity. Someone actually “designed” this system and intentionally chose IDE wizards and human steps instead of automated scripts. I’ll just pretend nobody is stupid enough to intentionally design a build system requiring 25 manual steps.

I’m a Fireman

When your job consists of “putting out fires” all day (because of horrific software), maybe your job title should be “fireman”.

Or maybe when you’re embarrassed to admit you work in this industry, you should just lie and tell people “I’m a Fireman”. Show them the callus on your mouse hand and tell them it’s from climbing the ladder.

Python Study Group

I’m starting a Python study group at work and am looking for ideas. We’ll probably meet weekly and will review Learning Python, by Mark Lutz. Weighing in at 700 pages, however, it hardly seems practical to cover the entire book. Instead, I think it might be more fun to solve small programming puzzles each week.

So hoosgot a list of small programming tasks we can choose from each week? Perhaps something along the lines of Project Euler, but much easier and less Math-focused. Our goal is to learn Python from the ground up, starting with simple programming exercises.

Ideas?

MapReduce Reading

For my 8 hour trip to Kalamazoo last week, I printed some Google white papers for some “light reading”. One of these was MapReduce: Simplified Data Processing on Large Clusters, which was recently updated. I read the original version last year and wanted to catch up. From the paper:

Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google’s clusters every day, processing a total of more than twenty petabytes of data per day.

Meh. 20 petabytes? You should see the JAR files in *my* app, I tell ya…

Seriously, I did not read that article and think…“hmm…that’s kind of like a database.” I cannot imagine anyone thinking that. Nor is MapReduce an index. You can use it to *create* an index, for example. But still…MapReduce does not compete with a database in any way. It is entirely different, for an entirely different kind of problem. Yet, we have the DeWitt/Stonebraker article, described next.

More Interesting Reading

The punchline is that upon my triumphant return to O’Fallon, MO, I discover everyone’s blogging about MapReduce. OK, that’s just a weird coincidence. I think this unbelievably inaccurate article written by David J. DeWitt and Michael Stonebraker (everybody’s saying they are database experts…) sparked a lot of the debate. It’s generally not cool to link to really awful material, but this one is worth it for the sheer entertainment factor. I recommend you read in this order:

The most telling part of all this is the fact that the original authors do not participate in the comments, at all. They are being ripped to pieces — FOR GOOD REASON — and they say nothing.

David J. DeWitt and Michael Stonebraker should retract their highly inaccurate article.

JTable: Time to Rethink NIH?

You know what Not Invented Here (NIH) means; Google it if you need help. I’m here to talk about one case where we may need more innovation: Swing’s JTable component.

Background

I really like JTable. JTable is a remarkably powerful GUI component, accommodating a huge spectrum of applications. Its design provides clear separation between the table, scroll pane, table header, cell renderers, cell editors, and the table model. The component is usable out of the box, and its modular design facilitates a great deal of customization. For instance, here is a screen shot from the RichJTable project:

RichJTable

You can find many other JTable extensions:

That’s just a quick search…there are many more out there.
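
To make the model/view separation from the Background paragraph concrete, here is a small sketch of a custom table model. PriceTableModel and its sample rows are invented for illustration. The point is only that the data and the editing rules live in the model, while JTable handles display.

```java
import javax.swing.table.AbstractTableModel;

// Hypothetical model: JTable asks it for data and never stores cells itself.
final class PriceTableModel extends AbstractTableModel {
    private final String[] columns = {"Item", "Price"};
    private final Object[][] rows = {{"Widget", 9.99}, {"Gadget", 24.50}};

    @Override
    public int getRowCount() {
        return rows.length;
    }

    @Override
    public int getColumnCount() {
        return columns.length;
    }

    @Override
    public String getColumnName(int col) {
        return columns[col];
    }

    @Override
    public Object getValueAt(int row, int col) {
        return rows[row][col];
    }

    // Only the price column is editable.
    @Override
    public boolean isCellEditable(int row, int col) {
        return col == 1;
    }

    @Override
    public void setValueAt(Object value, int row, int col) {
        rows[row][col] = value;
        fireTableCellUpdated(row, col); // notifies any attached JTable
    }
}
```

You would display it with new JScrollPane(new JTable(new PriceTableModel())). Renderers, editors, and the header can then be customized without touching the model.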

Strong Specific, Weak General

All of these customizations (spreadsheets, tree tables, etc.) extend JTable. The variety of custom components demonstrates just how flexible the original JTable design is. Is this a problem?

From page 164 of The Invisible Future: The Seamless Integration Of Technology Into Everyday Life:

In design, there is a trade-off between weak general and strong specific systems. Tools, like people, seem to be either a jack-of-all-trades, or specialized. Inherently “super-appliances” fall into the jack-of-all-trades weak-general category.

This weak general vs. strong specific trade-off plays a role in the design of all tools, including chisels, screwdrivers, bicycles and supercomputers. For reasons that we shall see, strong general systems have never been an option. One has always had to make a choice.

You can read Bill Buxton’s complete essay here. The point is, a Swiss Army Knife has a lot of tools, but they are all relatively weak. A dedicated spoon works a lot better than the weak spoon found on a Swiss Army Knife.

Lack of Outside Innovation

JTable is really good, but at its heart it is a weak-general tool. Writing a general-purpose table component is really hard, and the Swing team did a damn fine job with JTable. Thus, everybody else down the food chain (that I’m aware of) simply extends JTable.

JTable is good enough for 95% (?) of the use cases out there, but it is not as good as a dedicated spreadsheet. This is the problem. Grid components in dedicated spreadsheet applications are designed with one purpose in life: to display and edit spreadsheets. As such, they are strong-specific GUI components. In a dedicated spreadsheet, the GUI is crisp and responsive, focus traversal works, mouse clicks work, and data commits at the right time.

Many people have tried — including myself — to extend and coerce JTable into a spreadsheet component. But Swing JTable is subject to engineering tradeoffs like every other piece of software, and it can only be pushed so far. You can make JTable look and behave a lot like a good spreadsheet, but I doubt we will ever achieve perfection.

Consider IntelliJ IDEA, a product renowned for keyboard navigation. Unfortunately, keyboard navigation sucks whenever they use an editable JTable, such as in the Change Signature dialog:

IDEA Table

Perhaps I need to embed a little video here to demonstrate the problems…basically, keyboard and mouse navigation just feel incredibly broken and wrong. JTable does the best that it can, but it is not a dedicated data entry component. JTable’s primary function is tabular data display; tabular data entry is a secondary feature for this general purpose GUI component.

A Radical Idea

I submit to you this idea:

Someone should write a dedicated Swing data entry grid component from the ground up, without extending JTable.

Random thoughts…

  • This is a really hard project.
  • This is a strong-specific component, designed for one purpose: data entry in a grid, like a spreadsheet.
  • There are no constraints. You don’t have to support existing renderers, editors, or anything else.
  • You should be able to achieve 100% fidelity with a dedicated spreadsheet component.
  • Many businesses would *love* a data entry grid that *really works* flawlessly.

Anybody writing a data entry grid from scratch would certainly be accused of NIH. I can hear it now…“You are writing what??? You should just extend JTable! 95% of the functionality is already there!”

I believe that starting with a general-purpose component might make that final 5% impossible.

Conclusion

I hope I make it clear that this article is not a criticism of JTable at all. Pretending that a single GUI component can be perfect at *every* task is foolish and naive. Yet…I cannot say I have ever heard of someone writing a Java data entry grid — of Excel caliber — from scratch. Every single grid component I have seen extends JTable.

For most purposes, extending JTable is the right choice. But surely there are cases where a strong-specific, dedicated data entry grid component is more appropriate.

Scala Will Do

When the only problems left are the really hard ones…there is only one thing left to do…

Scala Hype

I give this fad 3 weeks.

Questioning Dogma

I am an increasingly skeptical developer. Things that make me think twice…

Shouting
  1. Whenever someone whips out the “premature optimization is the root of all evil” quote. I think most programmers do not realize this quote is taken out of context. The quote refers to “micro-optimization”, not large-scale architectural decisions. Many performance-related decisions DO have to be made early in a project. Thinking about performance is not the same thing as obsessing over micro-optimization.
  2. On a related note, many people (in classes I teach) are resolute in their belief that ORM tools like Hibernate or JPA are simply too slow. Sigh. These people are obsessing over performance too much, as opposed to ignoring performance considerations.
  3. Striving for 100% code coverage when testing. I’ve discussed this before, and subsequent comments and discussions have led me to believe that 100% code coverage is not all that important. It is probably more important to demand better tools that ignore trivial code, to ensure we do test things that really need testing. Filtering out “trivial” code and alerting us to complex code that probably SHOULD be tested…now that’s an interesting problem!
  4. Whenever people argue for or against Ruby/Rails, I am skeptical. Sorry, but the claims (on both sides) are just too nutty and people are too passionate to be objective. It’s not that black and white. Let’s just sit back and watch the comments this provokes…
  5. I feel the same way about Linux, Mac, and Windows. These are such heated topics that I tend to be skeptical of everyone. In fact, too much fan-boy-ism is a repellent for me.
  6. People who preach against Big Design Up Front (BDUF). Be careful, for many of these people warp the theory into “no design is ever good”. I’m no fan of BDUF, but I certainly think about many aspects of design very early on! For example, if I know my app needs to store billions of records, you can be damn sure I’ll be thinking about storage options and sorting algorithms from a completely different angle than for an app storing 300 records.
  7. The notion that unit tests are THE documentation. Uh, no. Tests are for testing. They have a happy side effect of serving as good examples, and I often refer to tests to learn how to exercise an API. But to argue that the whole point of writing tests is to create documentation…that seems wrong to me.
  8. Speaking of testing, people who rail AGAINST testing make me wonder…has this person EVER written a test? Maybe they make good points, but to NEVER write tests borders on incompetence.
  9. XML is all evil, all the time. Wrong.
  10. Ant sucks! Maven rules! Wrong. Or…Maven sucks! Ant rules! Again, wrong.
  11. YAGNI. To a certain extent, but you still need to anticipate and plan for the future.
  12. We need to lock down PCs. Restrict Internet access. Force procedures. Ick. Those things make me unhappy and I am very skeptical these techniques work. In fact, I believe as you clamp down on people, the best and brightest will leave.
  13. SOAP sucks! Web services suck! Yes, I agree. (what did you think I’d say???)

What are you skeptical about? Hopefully someone says no-name bloggers with overinflated egos… :-)

Legacy-Proof Your Frameworks

You know the drill. You design a perfect framework, release it to a group of programmers, and receive rave reviews. Yeah! Then you start version 2, but discover all those happy customers are actually using your API in ways you never intended…or imagined.

Public API

Sure, it’d be nice to rename that method…but you cannot, because that will break someone else’s app. Every time you change something, other programmers have to update their own code, which costs money and time. Features quickly become frozen and virtually impossible to change. You now have legacy code. What happened?

Minimize and Segregate

This article examines two goals when sharing code. First, you should make the public API as small as possible. The fewer things you share, the less tightly coupled clients are. If you share a public API with 500 members, clients have 500 potential connection points to your framework. An API with 20 members has far fewer couplings with client code.

Second, you should segregate public code from “private” code. Make it very clear which parts of your framework clients are supposed to use. In some cases, you can introduce barriers that make it harder for clients to accidentally use internal implementation details.

Hopefully you can guarantee stability with a very small, well-segregated public API, and retain some freedom to improve hidden implementation details behind the scenes.

Minimize Visibility

We’ll start with an easy one. Rather than blindly writing public getters/setters for every field, only expose what you really need. Use the private keyword liberally; only use public for things that need to be available to callers.

Basically, minimize visibility whenever possible. This helps keep your public API small.
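
Here is a small sketch of what that looks like in practice. Temperature is a made-up class, and it is left package-private only so the example fits in one file; in a real framework it would be public.

```java
// Hypothetical value class exposing one factory and one query method.
final class Temperature {
    private final double celsius; // internal representation stays hidden

    private Temperature(double celsius) { // callers go through the factory
        this.celsius = celsius;
    }

    public static Temperature ofCelsius(double celsius) {
        return new Temperature(celsius);
    }

    // The one thing callers actually need.
    public double fahrenheit() {
        return toFahrenheit(celsius);
    }

    // Private helper: free to change or disappear in the next release.
    private static double toFahrenheit(double c) {
        return c * 9.0 / 5.0 + 32.0;
    }
}
```

Clients can couple to exactly two members, ofCelsius and fahrenheit. The field, the constructor, and the helper can all be changed later without breaking anyone.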

Hide Behind Interfaces

Interfaces provide another mechanism to hide internal implementation details. Programmers commonly define interfaces with accessors:

public interface Name {
  String getFirst();
  String getLast();
}

As you can see, the Name is immutable. We don’t see constructors in interfaces. Interface-heavy APIs play nicely with dependency injection tools like Guice or Spring and can significantly reduce the need to code directly against implementation classes.

Use an impl Package

Interfaces need implementations. Where do these go? And what about “helper classes” like LangUtils or ComparisonHelper?

I recommend putting implementation details into an explicitly-named impl package. This gives a clear cue that everything in that package is an implementation detail that may change in future releases.

Keep in mind that Java packages only provide a hint to programmers. It is very likely that many classes in your impl package need to be public in order for other parts of your framework to use them.

Provide Static Factories

So how do you encourage people to only use public interfaces but avoid classes in the impl package? You could encourage them to use dependency injection. Or, you can borrow a technique from the Glazed Lists project. Check out the GlazedLists class.

This is a class in the “public” package with static factory methods that return interfaces. Internally, however, these factory methods construct implementation-specific classes from non-public packages.

Although this does not guarantee programmers avoid implementation details, it does provide an officially-supported mechanism to get to the implementations.
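
A minimal sketch of the static factory idea follows. This is not the Glazed Lists code; the Names class and DefaultName are hypothetical, and a private nested class stands in for what would normally live in an impl package so the example fits in one file.

```java
// Would be a public interface in the "public" package.
interface Name {
    String getFirst();
    String getLast();
}

// The officially supported entry point: static factories that return interfaces.
final class Names {
    private Names() {} // no instances

    // Returns the interface type, so callers never name the implementation.
    public static Name of(String first, String last) {
        return new DefaultName(first, last);
    }

    // Stand-in for a class that would live in an impl package.
    private static final class DefaultName implements Name {
        private final String first;
        private final String last;

        DefaultName(String first, String last) {
            this.first = first;
            this.last = last;
        }

        @Override
        public String getFirst() {
            return first;
        }

        @Override
        public String getLast() {
            return last;
        }
    }
}
```

Client code writes Name n = Names.of("Grace", "Hopper") and never mentions DefaultName, so that class can be renamed or replaced in a later release.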

Use Exotic Class Names

Taking the impl package one step further, you might want to name implementation-specific classes using awkward names like _MyApiImpl_LangUtils. It is quite easy for programmers to inadvertently include an impl package in an import statement.

But seeing a butt-ugly class name like _MyApiImpl_LangUtils directly in the source code makes it significantly more obvious that programmers are using something they should avoid.

Include Weird Methods in Interfaces

While it is usually good to encourage interfaces, sometimes you might prefer people extend an abstract base class instead. As soon as you add a method to an interface, everybody implementing that interface is broken.

However if people extend a base class, you can add new methods to that base class without (usually) affecting people.
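
To illustrate, here is a sketch with made-up types (Shape, BaseShape, Square). The describe() method plays the role of something added in a later release. Because clients extend BaseShape, they pick it up without changing a line.

```java
// Version 1 of a hypothetical API.
interface Shape {
    double area();
}

// Clients are asked to extend this instead of implementing Shape directly.
abstract class BaseShape implements Shape {
    // Added in "version 2": existing subclasses inherit it unchanged.
    public String describe() {
        return getClass().getSimpleName() + " with area " + area();
    }
}

// Client code written against version 1; it still compiles against version 2.
final class Square extends BaseShape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}
```

Had describe() been added to the Shape interface as an abstract method instead, every direct implementer would stop compiling. Since Java 8, default methods on interfaces offer another way around this.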

For Example…

Check out the Matcher code from the Hamcrest project:

public interface Matcher<T> extends SelfDescribing {
  boolean matches(Object item);

  /**
   * This method simply acts a friendly reminder not to implement
   * Matcher directly and instead extend BaseMatcher. It's easy to
   * ignore JavaDoc, but a bit harder to ignore compile errors .
   *
   * @see Matcher for reasons why.
   */
  void _dont_implement_Matcher___instead_extend_BaseMatcher_();
}

Another Reason…

I just used this technique for another reason. I’m writing a little P2P framework and each node in the network has an Address, which is an interface. Behind the scenes, however, the framework has to downcast to an implementation-specific subclass of Address. Without getting into too much gory detail, adding a huge ugly method to the interface helps remind programmers to get their Address from my framework, rather than by implementing the interface directly.
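
A rough sketch of that situation is below. Every name in it (Network, NodeAddress, the reminder method) is hypothetical and is not taken from the actual P2P framework. It only shows why the framework wants to hand out its own Address instances.

```java
// Package-private so the sketch fits in one file; would be public in a real API.
interface Address {
    String display();

    // Ugly on purpose: a reminder to obtain an Address from the framework.
    void _dont_implement_Address___ask_the_framework_for_one_();
}

final class Network {
    private Network() {}

    // The supported way to obtain an Address.
    static Address addressOf(String host, int port) {
        return new NodeAddress(host, port);
    }

    // Framework code that needs implementation details downcasts internally.
    static String route(Address address) {
        if (!(address instanceof NodeAddress)) {
            throw new IllegalArgumentException("Address was not created by Network.addressOf");
        }
        NodeAddress node = (NodeAddress) address;
        return node.host + ":" + node.port;
    }

    private static final class NodeAddress implements Address {
        private final String host;
        private final int port;

        NodeAddress(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String display() {
            return host;
        }

        @Override
        public void _dont_implement_Address___ask_the_framework_for_one_() {
            // intentionally empty
        }
    }
}
```

An Address implemented by client code would fail the instanceof check in route(), which is exactly the mistake the ugly method name is meant to head off at compile time.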

Require a Key

This is a little hack I’m quite fond of. Suppose you have a public factory:

public class MyFactory {
  // public nested class, but private constructor; it must be static
  // so the static createWidget() method can instantiate it
  public static final class Key {
    private Key() {}
  }

  public static Widget createWidget() {
    return new WidgetImpl(new Key());
  }
}

Let’s study that for a bit. The Key has a private constructor, so MyFactory is the only class that can construct it. This forces programmers (unless they use reflection tricks) to go through our createWidget() method to construct new widgets.

Here is how you’d write the WidgetImpl class, ideally in an impl package:

public class WidgetImpl implements Widget {
  public WidgetImpl(MyFactory.Key key) {
    if (key == null) {
      throw new IllegalArgumentException("Please use MyFactory");
    }
  }
}

Since you cannot construct a Key (again, without reflection tricks), you must use the factory to construct new Widget objects.

(my apologies if the above has a typo or something…I’m coding in WordPress…)
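
For completeness, here is a single-file sketch of the key technique with a small check added. It assumes an empty Widget marker interface and adds a hypothetical KeyDemo helper; the types are package-private only so everything fits in one file.

```java
// Assumed marker interface; the real Widget would declare actual methods.
interface Widget {}

final class MyFactory {
    private MyFactory() {}

    // Static nested class with a private constructor: only MyFactory can create one.
    public static final class Key {
        private Key() {}
    }

    public static Widget createWidget() {
        return new WidgetImpl(new Key());
    }
}

// Would live in an impl package in a real framework.
final class WidgetImpl implements Widget {
    WidgetImpl(MyFactory.Key key) {
        if (key == null) {
            throw new IllegalArgumentException("Please use MyFactory");
        }
    }
}

final class KeyDemo {
    private KeyDemo() {}

    // Returns true if constructing a WidgetImpl without a real key is rejected.
    static boolean nullKeyRejected() {
        try {
            new WidgetImpl(null);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }
}
```

Short of reflection, the only thing a caller can pass without going through the factory is null, and the constructor check rejects that.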

Not the JarJar from Star Wars…

You might consider a tool like Jar Jar Links to hide the fact that your framework uses some specific third party JAR file.

Summary

I hope you find some of these techniques useful. Some people will always try to bypass your public API and rely on implementation details. Who among us has NEVER used some com.sun.* code in our own apps? Ahem…moving on…

The point is that framework developers need to make it very clear which interfaces, classes, and packages are considered “stable” and part of the “public” API. Good frameworks will make every effort to maintain stability of these public APIs from release to release.

Although you could rely on JavaDoc comments or simple package names, it is better to make it as hard as possible for programmers to accidentally rely on internal implementation details. Code completion in modern IDEs makes it really easy to accidentally import something from an implementation package.

Just Fixed a Threading Bug…

…I think.

Now I wonder…did I really fix this thing? Or perhaps the problem occurs once per month instead of several times daily?