Friday, August 24, 2007

Best of this Week Summary 20 August - 25 August 2007

No real world-shocking discoveries for me this week. Still quite interesting, though, were:

  • Spring Web Services 1.0 was announced this week. One of its major features is that it facilitates contract-first web service creation. This is where you create/generate the WSDL first, then build the implementation (closely related to the interface-based approach of the Spring framework). This is different from JAX-WS, where you typically generate the WSDL from the Java (implementation) classes. Definitely check the comments too, for example to get a feel for how these standards/frameworks relate to each other: JAX-RPC, JAX-WS, XFire, Axis2, Spring-WS and REST.

  • On an I-wonder-why-they-did-this side note: Sun has changed its Nasdaq symbol from SUNW to JAVA. I'm quite surprised they did it; Sun is a lot more than Java, and one day Java will be replaced by another programming language... really... trust me ;-)

  • Interesting support from Yahoo! for the Apache project Hadoop. Quoting the About page:

    "Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework."

  • Finally, check this IBM developerWorks article about the new 2.0 release of Mylyn (formerly called Mylar), a task-driven management tool for Eclipse. It adds two facilities to Eclipse: integrated task management and automated context management.
    "Task management integrates your task/bug/defect/ticket/story/issue tracker into Eclipse and provides advanced task-editing and task-scheduling facilities. Context management monitors your interactions with Eclipse, automatically identifies information relevant to the task at hand, and focuses structured views and editors to show only the relevant information."

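The map/reduce paradigm from the Hadoop quote above can be illustrated with a tiny single-process word count. This is only a sketch of the idea in plain Python, not Hadoop's API; the function names and the sample input are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(fragment):
    """Emit (word, 1) pairs for one small fragment of the input."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(fragments):
    # Each fragment is mapped independently of the others, which is why a
    # real cluster could execute (or re-execute) it on any node.
    mapped = chain.from_iterable(map_phase(f) for f in fragments)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["the quick fox", "the lazy dog", "The end"])
print(counts)
```

Because no fragment depends on another, a failed fragment can simply be run again, which is the property the quote refers to when it says node failures are handled by the framework.
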
Sunday, August 19, 2007

An introduction to OpenID

Recently I've been working on extending an existing site with OpenID. In the coming weeks I'll be going into details of OpenID in different ways. This week I'm going to give an overview of OpenID and the areas of the specification that can be improved for readability. In the following weeks I'll be addressing:

  • A comparison of OpenID authentication providers.
  • A description of my experience implementing OpenID, including a comparison of libraries.
  • Outstanding issues with OpenID (security).
  • Where OpenID is heading.

In this first post I'm not going to give a full detailed explanation of OpenID. There are many sources that provide quite a good description. Here's a list of things I am not going to describe, but which are good starting points for learning about OpenID:

  • Wikipedia has a good definition:

    "OpenID is a decentralized single sign-on system. Using OpenID-enabled sites, web users do not need to remember traditional authentication tokens such as username and password. Instead, they only need to be previously registered on a website with an OpenID "identity provider", sometimes called an i-broker. Since OpenID is decentralized, any website can employ OpenID software as a way for users to sign in; OpenID solves the problem without relying on any centralized website to confirm digital identity."
  • A good starting point is of course the home of the OpenID specification. You'll see there that the current version is 1.1 and 2.0 is in draft.
  • The difference between SAML and OpenID. Here's a good starting point.
  • Examples of major websites supporting OpenID are: WordPress, LiveJournal, AOL and Digg.

I'd like to focus on a few elements that I found not very well explained on the OpenID website. For example, not very well described (to me :-) is how the delegation of your OpenID provider works. It took me quite some investigation and looking at other sites to figure out exactly how it works. What it comes down to is that OpenID is all about you being able to prove that you are the owner of a URL. And a URL is basically just a webpage. The idea is that normally that page contains a <link> tag in its HTML, within the <head> tag, stating where your OpenID provider/authenticator is located. Say you have an OpenID account named 'mytest' at Then you can actually go to the URL with your browser. When you look at the HTML you'll find this:

<link rel="openid.server" href="" />

Use <link rel="openid2.provider" href="" /> for version 2.0 providers

This tells you which OpenID server should be used to authenticate the URL There are two major disadvantages to this:

  • Maybe you want to use a different URL, not with "" in it. E.g.
  • What if goes out of business? You can't log in anymore to *any* of the sites you registered with that OpenID!

The solution for this is to use delegation. In that case the <link> tag in the page returned when going to the URL would look like this:

<link rel="openid.server" href="" />

<link rel="openid.delegate" href="" />

Note the additional "openid.delegate" <link> tag. In the above example it points to the OpenID provider and uses for authentication: authentication of is delegated to If ever is not available anymore, you can just create an account at another OpenID provider and put that in the href attribute in the above <link> tag instead of This is using the essential OpenID principle that you actually own the URL. Since you can change the OpenID provider that was contained in the URL, you must be owner of the URL!
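
The lookup described above can be sketched in a few lines of Python. This is a minimal sketch, assuming OpenID 1.x <link> tags only (no openid2.provider, no XRI/XRDS); the names OpenIDLinkParser and discover, and all example.com/example.org URLs, are hypothetical and only there to make the example runnable.

```python
from html.parser import HTMLParser


class OpenIDLinkParser(HTMLParser):
    """Collect OpenID 1.x <link> tags that appear inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            # rel may contain several space-separated values.
            for rel in (attributes.get("rel") or "").lower().split():
                if rel in ("openid.server", "openid.delegate"):
                    # Keep the first occurrence of each relation.
                    self.links.setdefault(rel, attributes.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def discover(html, claimed_url):
    """Return (server, identity) for the page behind claimed_url."""
    parser = OpenIDLinkParser()
    parser.feed(html)
    server = parser.links.get("openid.server")
    if server is None:
        raise ValueError("no openid.server link found in <head>")
    # Without a delegate, the claimed URL itself is what gets authenticated.
    identity = parser.links.get("openid.delegate") or claimed_url
    return server, identity


# Hypothetical page that delegates to an account at a provider.
PAGE = """<html><head>
<link rel="openid.server" href="https://provider.example.com/server" />
<link rel="openid.delegate" href="https://mytest.provider.example.com/" />
</head><body>My homepage</body></html>"""

server, identity = discover(PAGE, "https://www.example.org/")
print(server, identity)
```

Switching providers then only means changing the two href values in the page you own; the URL you type into OpenID-enabled sites stays the same.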

Another thing to realize is that the OpenID protocol defines no failover requirements for OpenID providers. You, as owner of the URL, will have to arrange that yourself and make sure that you can still authenticate your URL in case the OpenID provider is down or out of business. The only way you can do that is via delegation. I find this one of the weaker points of OpenID. Delegation is quite hard to understand for "average" Internet users, so putting the responsibility for "failover" in their (own) hands can cause quite some surprises for them.

The third area where I found the OpenID home and this other good source of OpenID information lacking is a good visual diagram of the OpenID protocol. Much clearer is the very detailed description of the protocol flow in OpenID 2.0, including a nice sequence diagram, in this ServerSide article. It already discusses version 2.0, but it also applies to 1.x except for the XRI/XRDS parts. That diagram is shown below for easy reference:

Sunday, August 12, 2007

Best of this Week Summary 05 August - 12 August 2007

  • A bunch of tips on branching and merging in Subversion (svn).

  • There was quite a lot of security-related news this week. Check this good short overview of what happened at Black Hat 2007.

    At the conference it was shown that many Web 2.0 sites are making the same mistakes that were made in Web 1.0. For example:
    - Improper use of cookies (e.g. CSRF)
    - Putting business logic only in the JavaScript client

    If you want to dive into some more low-level security details, here's a presentation from the conference which shows three security-related issues. It gives ways to exploit these issues and ways to prevent and/or detect them:
    - DNS rebinding, which circumvents the Same Origin Policy in your browser. Also known as cross-IP scripting or TCP relaying. It allows an external attacker to access your internal network, thus bypassing your firewall!
    - Provider hostility, i.e. Internet providers modifying the content of data from websites you visit.
    - Audio captchas, which are "speech, distorted and overlaid with a quieter speech".
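
The cookie misuse mentioned in the security item above (CSRF) comes from a server trusting the session cookie alone, since the browser attaches that cookie to forged requests as well. One common countermeasure is a token that the server also expects in the form data. The following is a minimal sketch of that idea, not taken from the conference material; SERVER_SECRET and both function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret; a real site would load it from config.
SERVER_SECRET = secrets.token_bytes(32)


def issue_csrf_token(session_id: str) -> str:
    """Derive a token bound to the session, to embed in every form."""
    return hmac.new(SERVER_SECRET, session_id.encode(), hashlib.sha256).hexdigest()


def is_valid_request(session_id: str, submitted_token: str) -> bool:
    """Accept a state-changing request only if the form token matches."""
    expected = issue_csrf_token(session_id)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, submitted_token)


token = issue_csrf_token("session-abc")
print(is_valid_request("session-abc", token))
print(is_valid_request("session-abc", "forged"))
```

The check runs on the server, which also addresses the second mistake in the list: anything enforced only in the JavaScript client can be bypassed by sending the request directly.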

Saturday, August 4, 2007

Best of this Week Summary 29 July - 04 August 2007

  • This is a very interesting overview (basically a summary) of YouTube's architecture and how it handles scalability. Python is used all over the place. Includes lessons learned! Of course YouTube needs neither heavy business logic nor a solid transaction architecture, as mentioned here too. This makes scaling a little less of a challenge; but since the numbers are so huge, the challenges are nevertheless large.
    This (a bit older) article describes how Digg handles scalability. Definitely interesting too.
    And here's the inside scoop on MySpace architecture and scalability.
    Here are details on Google's file system GFS and how it helps solve scalability problems.
    This PDF shows how Japan's largest social networking site (SNS) Mixi is handling scalability with MySQL.
    Finally, interesting regarding scalability and availability is this architecture "mashup": Java JVM is used to improve scalability in Drupal.

  • This looks like an interesting upcoming feature in HtmlUnit for unit testing AJAX-enabled web applications. It will actually re-synchronize AJAX calls that would normally run asynchronously.