Sunday, October 28, 2007

Best of this Week Summary 22 Oct - 28 Oct 2007

Sunday, October 21, 2007

Best of this Week Summary 14 Oct - 21 Oct 2007

  • Very interesting post about the Fotolog architecture. Note that they re-implemented a PHP part of the site in Java to improve performance! And they have to scale to handle numbers like these: "300 Million photos and over 500,000 photos are uploaded each day. Over 30,000 new members are added each day and attracts more than 4.6 million daily users."

  • This is a good comparison and overview of 4 open source version control systems: CVS, SVN, Bazaar and Mercurial. The last two take a distributed repository approach, in contrast to the centralized repository approach of CVS and SVN.

  • Good high-level overview of the history of software development, from the Waterfall Model to the current Agile (iterative) model. The main gain over all these years is that it is now possible to create quite complex systems with significantly fewer people, thanks to the progress modern programming languages and libraries have made. All of which you already knew of course, even if you didn't start in the 70's... ;-)

  • James Gosling made the (quite logical) announcement earlier this week that in the future (mobile) devices will get so powerful that there won't be a need for a separate Java Micro Edition anymore; there will be only one Java SE (Standard Edition). He also made a few interesting points about the upcoming Java 6 Update N (preloading of the Consumer Java Runtime at computer boot), which should be available a few months into 2008, preparing desktops for JavaFX.

Saturday, October 13, 2007

Outstanding issues with OpenID and tips for improvements

In this fourth post about OpenID I'll try to give a complete overview of the outstanding issues with OpenID. At the end I'll give a couple of tips for improving on those issues. You can find my previous posts here, here and here.

Not average-internet-joe ready
An OpenID is just not in a format the average user currently understands. My parents, for example, would not be able to "grasp" the idea of a URL being your identification. XRI is a work in progress that lets people pick an identifier that looks more like a regular username, for example '=Paul.Smith'.

Many websites that support OpenID just point to OpenID.net (just renewed by the way, and it's now a lot easier to find OpenID providers there). From there on the user is just "on her own" to figure out what to do. This is not really OpenID's fault, but if the OpenID-enabled sites (consumers) don't make it easy for the user to create an account, users will just give up and sign in the "old fashioned" way. These sites should provide direct pointers to solid OpenID providers.

Delegation is quite hard to explain and understand, and even harder to actually use; it is just too hard for the average internet user to set up. Yet delegation is a really important aspect of OpenID, because it means you don't depend on a single OpenID provider. If you don't set up delegation from the start (so that your own URL is your OpenID, which you can then point (delegate) to any OpenID provider you want), you are out of luck if your OpenID provider ever goes bust. A sketch of how delegation looks from the consumer side follows below.
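
To make that more concrete, here is a minimal sketch in Java of what a consumer does during discovery on a claimed identifier. The URLs are hypothetical, and the regex parsing is only for illustration; a real consumer library does proper HTML and Yadis discovery. The two OpenID 1.1-style link tags shown in the comment are what a user adds to their own page to delegate.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A user enables delegation by adding two OpenID 1.1-style link tags
// to the <head> of their own page, for example:
//   <link rel="openid.server"   href="http://provider.example/auth">
//   <link rel="openid.delegate" href="http://provider.example/user/alice">
// The consumer discovers these during authentication:
public class DelegationDiscovery {

    private static final Pattern SERVER = Pattern.compile(
            "<link[^>]*rel=[\"']openid\\.server[\"'][^>]*href=[\"']([^\"']+)[\"']");
    private static final Pattern DELEGATE = Pattern.compile(
            "<link[^>]*rel=[\"']openid\\.delegate[\"'][^>]*href=[\"']([^\"']+)[\"']");

    public static void main(String[] args) throws Exception {
        // The claimed identifier: the user's own URL (hypothetical here).
        URL claimedId = new URL("http://alice.example.com/");

        StringBuilder html = new StringBuilder();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(claimedId.openStream()));
        for (String line; (line = in.readLine()) != null; ) {
            html.append(line).append('\n');
        }
        in.close();

        Matcher server = SERVER.matcher(html);
        if (server.find()) {
            System.out.println("Provider endpoint: " + server.group(1));
            // If a delegate tag is present, the consumer authenticates that
            // identity at the provider instead of the claimed URL itself.
            // Switching providers then only means editing these two tags.
            Matcher delegate = DELEGATE.matcher(html);
            if (delegate.find()) {
                System.out.println("Delegated identity: " + delegate.group(1));
            }
        }
    }
}
```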

Different OpenID providers show different, sometimes even confusing, messages when the user has to confirm the site they want to sign in to. It gets even more difficult when you can assign multiple personas (see my second post for an explanation of personas). Which one to pick? And why?


Security
Of course there's the phishing issue (man-in-the-middle): a malicious consumer site can just redirect the user to a fake OpenID provider. A solution some providers take is forcing the user to log in via their regular login page (or via a bookmarklet) first. Though it provides a small barrier, it makes the whole OpenID process more confusing to the user. And it is not a full solution against phishing: the phishing site can simply tell the user that the separate login page is no longer needed (just a bit of social engineering :-). Note also that OpenID trusts DNS to direct the given URL to the correct machine, and DNS servers are known to get hacked too.

There's also the replay-attack issue, where a sniffer can grab the authentication response and replay it to the consumer. A partial barrier is the use of a nonce (number used once; see my third post for some references). Version 2.0 of OpenID should contain this nonce fix for replay attacks by default. It does not protect against the case where the man-in-the-middle is the first to use the response URL (more of a "pre-play" attack).
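
To illustrate the mechanism, here is a minimal consumer-side sketch in Java. The class and method names are hypothetical; a real implementation would also bound the nonce's timestamp window and evict old entries so the set can't grow without limit.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of a consumer-side nonce check against replayed
// authentication responses (hypothetical names, in-memory only).
public class NonceChecker {

    private final Set<String> seenNonces =
            Collections.synchronizedSet(new HashSet<String>());

    /** Accepts a response only the first time its nonce is presented. */
    public boolean acceptResponse(String responseNonce) {
        // Set.add() returns false if the nonce was already present,
        // meaning this response is a replay and must be rejected.
        return seenNonces.add(responseNonce);
    }

    public static void main(String[] args) {
        NonceChecker checker = new NonceChecker();
        String nonce = "2007-10-13T12:00:00Zabc123"; // hypothetical value
        System.out.println(checker.acceptResponse(nonce)); // true: first use
        System.out.println(checker.acceptResponse(nonce)); // false: replay
    }
}
```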

If an attacker gains access to a user's OpenID login, he immediately has access to every site that user can log in to with that single OpenID/password combination.

Since all OpenID providers offer the option to stay logged in (thus authenticating without providing a password), CSRF attacks become very easy: no password prompt stands in the attacker's way.
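
As a sketch of the classic countermeasure a provider could apply (a hypothetical servlet, not taken from any real provider): the trust confirmation page embeds a random token bound to the session, and the approving POST is only honored when that token matches, which a cross-site request cannot forge.

```java
import java.io.IOException;
import java.security.SecureRandom;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal sketch of a CSRF token check on a provider's confirmation
// page. Class and parameter names are hypothetical.
public class ConfirmTrustServlet extends HttpServlet {

    private static final SecureRandom RANDOM = new SecureRandom();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Generate a fresh token and remember it in the user's session.
        String token = Long.toHexString(RANDOM.nextLong());
        req.getSession().setAttribute("csrfToken", token);
        resp.setContentType("text/html");
        resp.getWriter().println(
                "<form method='post'>"
              + "<input type='hidden' name='csrfToken' value='" + token + "'/>"
              + "<input type='submit' value='Allow this site'/></form>");
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String expected = (String) req.getSession().getAttribute("csrfToken");
        if (expected == null || !expected.equals(req.getParameter("csrfToken"))) {
            // A cross-site request cannot read the token, so it fails here.
            resp.sendError(HttpServletResponse.SC_FORBIDDEN, "Bad CSRF token");
            return;
        }
        // ... approve the consumer site for this logged-in user ...
    }
}
```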

An XSS flaw on a trusted domain such as something.CNN.com or else.microsoft.com can be exploited to prevent an OpenID provider from knowing which site the user is really signing in to. For a full explanation see "[OpenID] What's broken in OpenID 2.0? (IIW session)".

How can a consumer use OpenID in an API it provides? The consumer cannot ask the user for credentials at each API call; it should ask via the OpenID provider. Work in this area is being done in the OAuth protocol, which I'll cover a bit more in my next post about OpenID.

Privacy
Since all authentication (proof of ownership of the OpenID) goes via the OpenID provider, the provider can track all the sites its users are accessing.


Improvement tips
Below I list a couple of ways OpenID can be improved to tackle the above-mentioned problems:

  • Integrate the flow of signing up with an OpenID provider into your consumer/relying-party (OpenID-enabled) website.
  • OpenID providers should provide clarity upfront to users on whether their service will always be free, whether it supports multiple personas, etc.
  • Consumer sites should implement OpenID more transparently. There is no need to distinguish at registration between OpenID and a regular account: if the user enters no password, it's probably an OpenID, so try OpenID authentication; otherwise it's probably a regular signup (see the sketch after this list).
  • Find a better solution to handle phishing and replay attacks. SSL client certificates could be a solution, but then you'd have to install the public/private key pair in every browser you use and remove it again afterwards. More generally, the solution could be public-key cryptography (thus not using a password at all).
  • OpenID providers should not recycle inactive accounts, or should at least use a nonce, which consumers should also check.
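
As an illustration of the third tip, here is a minimal sketch of the branching a combined signup/login form could do. All names are hypothetical, and a real implementation would hand the OpenID case to a consumer library for the redirect.

```java
// Minimal sketch of "transparent" OpenID handling on a combined
// signup/login form (hypothetical names).
public class SignupHandler {

    /** Decides how to handle a submitted identifier/password pair. */
    public String handle(String identifier, String password) {
        if (password == null || password.length() == 0) {
            // No password given: treat the identifier as an OpenID URL
            // and start OpenID authentication against its provider.
            return "redirect-to-openid-provider:" + identifier;
        }
        // A password was given: fall back to a classic local signup.
        return "create-local-account:" + identifier;
    }

    public static void main(String[] args) {
        SignupHandler h = new SignupHandler();
        System.out.println(h.handle("http://alice.example.com/", ""));
        System.out.println(h.handle("bob", "s3cret"));
    }
}
```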


References
- The Identity Corner » The problem(s) with OpenID
- Beginner's guide to OpenID phishing
- Single Sign-On for the Internet: A Security Story

Sunday, October 7, 2007

Best of this Week Summary 30 Sept - 07 Oct 2007

  • Interesting and provocative point of view in the blog post "Why most large-scale Web sites are not written in Java". Quite an extensive discussion can be followed in this referring TSS post. I think the main reason is that the choice of programming language and stack depends on the requirements of the application. Most of the example websites given do not have serious transactional requirements, including transactions that span multiple systems and require XA. For those example websites no real harm is done when a transaction occasionally fails. Note also that most Java/JEE implementations use at least some part of the LAMP stack, like Linux and Apache. See here for other reasons.

  • Danny Ayers is wondering whether JSON is missing a DTD or XSD. If you take an arbitrary JSON document from the Internet, you can't tell what it contains, and usually you can't find out either. Is that a good or a bad thing? I'd say you've got XML and its associated XSD or DTD for interfaces that need to be well-defined and passed on to other, potentially external, third parties without too much effort. For interfaces that stay internal to your system (for example an AJAX call from the browser to the server), adding a validation format like an XSD or DTD causes just too much overhead, losing the gain of JSON's compactness. The data gets validated by the frontend and backend anyway, though with a bit more programming effort (see the sketch below).
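
For reference, this is roughly what the XML side of that trade-off looks like with the standard JAXP validation API (the file names are hypothetical); JSON has no comparable standard schema language, so JSON payloads get checked in application code instead.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Minimal sketch: validating an XML document against an XSD with the
// standard JAXP validation API (available since Java 5).
public class XsdValidation {
    public static void main(String[] args) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(
                XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File("interface.xsd"));
        Validator validator = schema.newValidator();
        // Throws a SAXException if the document violates the schema.
        validator.validate(new StreamSource(new File("message.xml")));
        System.out.println("message.xml is valid against interface.xsd");
    }
}
```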