Static typing and correctness

I've been pointed to a post by Joel Spolsky advocating Hungarian notation so that code that fails to properly sanitise strings "looks wrong".

Here's Joel's example. The idea is that you prefix the names of all string variables and functions returning strings with either "s" or "us" depending on whether they're safe or unsafe respectively. Then assignments that have "s" on one side and "us" on the other just look wrong.

us = UsRequest("name") // ok, both sides start with US
s = UsRequest("name") // bug
usName = us // ok
sName = us // certainly wrong.
sName = SEncode(us) // certainly correct.

Well, I can think of one pitfall already. How do you mark whether a function expects safe or unsafe data in a way that makes wrong code look wrong?

us = UsRequest("name") // okay
s = SEncode(us) // okay
RandomMangler(s) // okay
RandomMangler(us) // errm... wrong?

If you're using a language that supports it, there is a far better way. Think about it. What does Hungarian notation do? It marks the fact that a variable contains data that statically has some property (that is, we know it has that property at compile time). What else can you think of that tells you about static properties of data? What, for example, do we know about any random string? We know it is a string, and that's about it. But that gives us a clue. In most languages a string has some type that marks it as a string. If it's not a string, it can't have that type. What if we introduced a type for safe strings, and made sure that a string that isn't safe can never have that type?

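// Java-like pseudocode: in real Java, String is final and cannot be extended.
class SafeString extends String {
    public static SafeString sanitize(String s) {
        ... // sanitise it
        return new SafeString(s);
    }
    // Private, so sanitize() is the only way to obtain a SafeString.
    private SafeString(String stock) { super(stock); }
    ...
}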

Now we can have something that is still a string (SafeString is a subtype of String), but if we want something that only accepts safe strings we can give its argument the type SafeString. Then because the only way we can create a safe string is by sanitising it first, we can never pass an unsafe string to something needing a safe one. To put it another way, wrong code now looks wrong to the compiler. And the compiler is much, much better at spotting things that look wrong than you are:

us = Request("name") // ok
s = Request("name") // type error
usName = us // ok
sName = us // type error
sName = SafeString.sanitize(us) // ok
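
In real Java the class above could not extend String, since java.lang.String is final, so the closest equivalent is probably a small wrapper class. What follows is only a minimal sketch under that assumption: the HTML-escaping rules inside sanitize(), the render() method, and the SafeStringDemo class are hypothetical stand-ins of mine, not anything from Joel's post. It also shows one answer to the RandomMangler question: the parameter type says what the function expects.

```java
// Sketch only: the escaping rules and render() are illustrative assumptions.
final class SafeString {
    private final String value;

    // Private: sanitize() is the only way to obtain a SafeString.
    private SafeString(String value) {
        this.value = value;
    }

    // Hypothetical sanitiser: escapes the characters that are special in HTML.
    static SafeString sanitize(String unsafe) {
        StringBuilder out = new StringBuilder();
        for (char c : unsafe.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return new SafeString(out.toString());
    }

    @Override
    public String toString() {
        return value;
    }
}

class SafeStringDemo {
    // The parameter type states the requirement; no naming convention needed.
    static String render(SafeString name) {
        return "<p>Hello, " + name + "</p>";
    }

    public static void main(String[] args) {
        String raw = "<script>alert(1)</script>"; // stands in for Request("name")
        // render(raw); // does not compile: String is not a SafeString
        System.out.println(render(SafeString.sanitize(raw)));
    }
}
```

The trade-off is that a wrapper is no longer a subtype of String, so anything that wants a plain String has to call toString() explicitly. That is arguably a feature: handing a SafeString to code that accepts arbitrary strings is harmless, while the reverse direction stays impossible without going through sanitize().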

These type errors are compile-time errors -- the code simply won't compile, so this particular mistake can't cause security problems, because it never becomes an executable program. Of course, this only works if your language is strongly, statically typed. If the typing is weak, you can pass unsafe strings to things expecting safe ones and the program still compiles, so you get binaries that quietly do the wrong thing. If the typing is dynamic, passing an unsafe string to a function expecting a safe one blows up with a runtime error instead. You might or might not consider runtime errors acceptable, depending on your perspective, but letting type errors slide is to be avoided.
