Four Starters | webapplications

Archive for the 'webapplications' Category

Monday, December 10th, 2007

Federating Social Networks review

Yesterday, as I mentioned earlier, we had the Federating Social Networks meetup (Upcoming) at Mediamatic. This meant an early morning for those coming from outside Amsterdam. Tijs, Mark, Pascal and I took the early train, and Blaine Cook and David Recordon had flown in from San Francisco for the meetup. Fortunately there was an espresso machine to keep us alert.

The aim of the day was to talk (see the Jaiku backchannel) about how we could use the technologies at our disposal and the data available to create better experiences for users. It looks like most of the specs are there and the conversation has been going on for most of the year. We should begin building stuff.

Mediamatic, at least, is committed to building something on top of anyMeta, which needs to be ready by Q1 2008. The workshop was an effort to gather thoughts on the best way to move forward.

This week also saw the announcement of the DiSo project by Chris Messina, a way of building open social networks using WordPress as the platform.

Problem statement

We spent quite some time on the problem statement and use cases. The discussion went all over the place in subject matter, scope and level of detail. At this point I don’t think the philosophical considerations are very useful anymore. This is a broad subject and, yes, everybody has an opinion about it. We need to be a lot more concrete about what we want to build and the steps we can take right now to get there. I thought that was clear from the event description, but apparently not everybody read it.

Laptop Crowd

The vision that is on the table is quite grand. By distributing and taking ownership of your own data, be it profile information, your relationships, your writing, your pictures or your videos, you get full control. Sites that want to participate in this effort will need to abide by your rules and read and write accepted standards. This means a dramatic redefinition of the way the internet works, so dramatic that it will not happen anytime soon. Also, I don’t think that we can standardize all that in an afternoon.

The problem that we need to solve and which is currently causing painful experiences is: Almost every site and application can be enhanced by adding information about the people you know. How do you do that without replicating effort both for developers and users time and time again?

Finally the list of use cases that we came up with:

  • Profile Aggregator (OpenID)
  • Access Privacy (profile, contacts, stuff I own, claim stuff/ publications)
  • Migration of data/ ownership
  • Content discovery/ finding stuff
  • Set privacy (noindex, etc)
  • Consolidation of data/ profiles
  • Personal Messaging
  • OpenID - reflection of profiles/ relations
  • Referencing across sites
  • Control of representation of copy
  • Pingback when your object has been used/ altered

XMPP does it all

A lot of work, especially by Ralph, is focused on creating a Jabber/XMPP pubsub specification that can be used to post content and updates, and that makes it easy for interested parties to be notified of those publications. This is very nice, and Ralph’s presentation extols most of the virtues of XMPP.

Ralph

Still it will take some time before this becomes relevant for the rest of the web. XMPP is the best thing since sliced bread and I imagine that the guys building it can make it do pretty much everything. There are two problems that hamper its adoption.
First, the language and the concepts are sufficiently different that people need a lot of introduction before they are up and running.
Secondly, once you understand it, there is not much it will do for your blog running on a shared PHP host. And if you do run XMPP on your own server, you can interface with existing services and do pretty much anything, but there are no well-defined interactions yet.

Mediamatic is aware of this and for their own (PHP based) anyMeta sites and for the rest of the world that wants to participate they are going to provide a bridging server where websites can POST updates using HTTP and the service will publish notifications to interested parties both using XMPP and HTTP depending on the capabilities of the receivers.
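To make the bridging idea a bit more tangible, here is a minimal sketch of the fan-out decision such a service would have to make. It is not Mediamatic's design: the `Bridge` and `Subscriber` names, the topic strings and the addresses are all made up for illustration, and nothing is actually sent over HTTP or XMPP.

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    address: str         # hypothetical: a JID for XMPP, a callback URL for HTTP
    supports_xmpp: bool  # capability of the receiver, as described above


@dataclass
class Bridge:
    # topic -> list of subscribers; an assumed in-memory layout
    subscribers: dict = field(default_factory=dict)

    def subscribe(self, topic, subscriber):
        self.subscribers.setdefault(topic, []).append(subscriber)

    def publish(self, topic, payload):
        """Plan one delivery per subscriber, picking the transport it can handle.

        Returns (transport, address, payload) tuples instead of sending anything.
        """
        deliveries = []
        for sub in self.subscribers.get(topic, []):
            transport = "xmpp" if sub.supports_xmpp else "http"
            deliveries.append((transport, sub.address, payload))
        return deliveries


if __name__ == "__main__":
    bridge = Bridge()
    bridge.subscribe("blog/posts", Subscriber("reader@jabber.example.org", True))
    bridge.subscribe("blog/posts", Subscriber("https://site.example.com/callback", False))
    for delivery in bridge.publish("blog/posts", "new post published"):
        print(delivery)
```

The point of the sketch is only that the publishing site needs to know nothing about XMPP: it hands over an update once, and the bridge decides per receiver how to notify.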

HTTP may not be ideal, and people fluent in XMPP describe most of the stuff it has been forced to do as hackish. Still, HTTP has a lot going for it. With Atom and REST, HTTP already drives a lot of application functionality over the internet. And with Comet-style interaction starting to catch on, non-blocking HTTP servers will become more and more normal. This will make real-time interaction, and other things that currently do not scale, easier.

Moving data

Luis Villa’s post eloquently makes the case for being able to move our data wherever we want. This is quite a big problem and not one that is going to be solved easily, if at all.
Sites such as Flickr will allow you to get your data but there needs to be more incentive to open up and more standardization in container formats.

Presentation

The use case that was discussed of being able to own your pictures, the permalinks pointing to them as well as the comments on those pictures and being able to move that wholesale to a different site strikes me as somewhat too utopian. A site such as Flickr offers you their hosted application and hosts your pictures for you. As it happens Flickr has an API which allows you to get your data back but you will never be able to make a 1-to-1 mapping to another service.

Owning your namespace on a server not your own is a known problem: e-mail has the same problem and it still hasn’t really been solved. A few hosts such as GMail are gracious enough to let you POP your emails off their server, but you still have a middle man that you can’t cut out. Owning your own domain and forwarding it to another service (like Google Applications) seems like the way to go.

I don’t see this issue as going to be solved any time soon. The stakes are too high, the subject is too complex and in most cases a local copy will have to suffice. I have gotten used to losing some data at every significant computer migration. You can’t have your cake and eat it. If you really want to be in control, install phpAlbum on your domain on a generic host and move that around all you want.

Concrete steps towards the future

Marc promised a documentation server with the findings and draft specifications soon. Somewhere early next year Mediamatic will publish their public HTTP to XMPP bridge. Blaine, David and Ralph were supposed to draft something of a spec, but I don’t know when it’ll be made available.

Tijs has been creating quite the list of interesting sites in this space. Like the Attribute Exchange schema supported by OpenID 2.0 which looks very interesting. And a start page for all the standards for this initiative: Data Portability.org.

Another thing would be to start implementing the WordPress plugins listed at the DiSo wiki. I have an hAvatar plugin lying around which needs some testing before release.

A WordPress plugin that will speak to the XMPP bridge service would need to do the following:

  • Add XMPP autodiscovery links to the <head> of the blog.
  • Ping the bridge service using HTTP every time a post is made or updated.
  • Maybe: listen to notifications from the server for stuff such as blogrolling or trackback.
  • Maybe: Publish your friend list as XFN to the bridge so interested parties can subscribe to that.

This won’t be too difficult to implement, but it has to wait for the pubsub bridge to become public. It looks like the best way to converge is to create stuff.
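The ping step from the list above could look roughly like the sketch below. The bridge is not public yet, so the endpoint URL and the JSON field names are pure assumptions, and a real plugin would be written in PHP; Python is used here only to show the shape of the request. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: the real bridge address and payload format are not known yet.
BRIDGE_URL = "https://bridge.example.org/publish"


def build_ping(post_url, title, updated):
    """Build (but do not send) the HTTP POST announcing a new or updated post."""
    body = json.dumps({"url": post_url, "title": title, "updated": updated})
    return urllib.request.Request(
        BRIDGE_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    ping = build_ping("https://blog.example.com/?p=42", "Hello world", "2007-12-10T09:00:00Z")
    print(ping.get_method(), ping.full_url)
    print(ping.data.decode("utf-8"))
```

Sending it would be a single `urllib.request.urlopen(ping)` call once there is a real service to talk to.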

Tuesday, October 2nd, 2007

10 Really Interesting Things To Ask At FOWA

It’s less than 10 hours to the start of the Future of Web Apps conference in London, and I started to think about some interesting things to ask all the startups at the expo. I decided that, instead of asking the obvious things like “what does your app do?” it might be more interesting to focus on some of the bad issues we have with web applications these days. Here are the 10 probably really interesting questions (in no particular order) we will be asking at FOWA tomorrow:

FOWA

  1. Why would we really need this application?
  2. Do I need to login to Facebook before I can use your application?
  3. Do I need a Twitter account before I can use your application?
  4. What other company does your corporate logo feel close to?
  5. Do I need to register before I can even look at your application?
  6. Do I need to re-add all my friends when I join your site?
  7. If I join, will you start spamming my friends with invites?
  8. If you get bought, will you screw over your users?
  9. If you get bought, will you screw over your users?
  10. Does your mom understand how to use your app?

Got more questions we should ask? Then add them to the comments before tomorrow.

Sunday, September 23rd, 2007

Quechup - How Invites Really Became Spam

At the beginning of this month I wrote about why I thought invites for uninteresting webapps could more and more be considered a type of spam. I was blaming both the users that let the app invite everyone they wanted, and the apps that just didn’t keep record of who they already spammed.

Now there is news of a new webapp named Quechup (I’m not linking to them on purpose) that recently took it to the next level by just inviting everyone in your address book, even if you have told them not to do that. Eventually this leads to a “virus” much like those old-school email viruses that spread themselves by mailing everyone in the receiver’s address book. The app has been in the news quite a lot, and I think it proves some of my points, including the fact that it is stupid that people are starting to regard invites as something normal without thinking.

Obviously this shows an even worse trend: people giving their email login details to any random app! People should realize that email currently functions much like a single sign-on, meaning that with your email login details anyone could get access to your accounts on any other website. I hope more people will start to realize that we need to develop and adopt new technologies that enable apps to intercommunicate without having to share login details.

Tuesday, September 4th, 2007

How Invites Became Spam

Recently I have been spoiled with invites for so-called “private Betas”, so much so that I had to add some of these web companies to my spam list. I don’t even want to try out these web apps anymore, because they simply annoyed me too much.

What is the problem? It is that these web2.0 tools have every new user select everyone from their Gmail/Hotmail/YahooMail contact list and send them a mass invite. Not only is this not personal, it can also lead to more than 10 invites for one person. I already received 11 invites for Doostang, an app that I am not going to try, although it might be very useful. (I actually received more than 1 invite from 1 person! Thank you Chakib!)

I seriously like some of the new web apps, but in the beginning of the “invite” hype it was an honour to get an invite simply because they were scarce. These days, an invite is as common as the air we breathe (although I don’t consider air spam!) and I simply consider them spam because they annoy me and I can’t seem to unsubscribe from them. Invites these days are overrated, impersonal, and highly annoying.

So how can web apps change this phenomenon? A couple of things come to mind. First of all I would like to see a “refuse all future invites” button that I can click to stop (so-called) friends from sending me invites. If I can’t do that, then an unwanted invite means as much to me as unsolicited mail, a.k.a. spam.

Secondly, I think that any product should be able to engage people to make contacts on their own. If people need to make contacts to further enjoy the product, then it will automatically motivate them to invite their friends. And if I get a personal invite from a friend via Twitter, MSN, or email, then that creates so much more impact than some automated message. Obviously the problem that arises here is that if your product actually sucks, it makes sense to have people invite their friends automatically as soon as you can!

In the end I realized that any web app that has to use some kind of automated and impersonal invite scheme to get its users to invite their friends is probably a rubbish product to start with. So remember: if you send me loads of invites for the same product, I will automatically thank you for the “regard this product as useless” notification you just virtually generated.

Tuesday, September 4th, 2007

GMail Meets the Desktop [Update]

The people that follow my personal tech-blog might already know about Mailplane, but I thought it was time to spread the word about Mailplane to a more mature crowd. Mailplane is a Mac OS X application that has Google Mail (Gmail) meet the desktop. It combines the power of a conventional desktop application (like Mail.app) with the flexibility and quality of an online email client.

Explaining the origin of the idea is best left to Ruben Bakker, the creator of the application:

“I really love Gmail. It is superior to Outlook and MS Exchange I experience at work: The Outlook Webclient is a joke, I get more spam than normal messages and server space is so limited I constantly must delete messages” …. “But I missed quite a few features that Mail.app and other traditional mail clients offer. Gmail with its browser interface just didn’t reach my desktop. For example uploading an attachment involved too many steps: Exporting the image from iPhoto, somehow resizing the picture and then attaching it by using the ‘Choose file’ button.”

So what does Mailplane really offer? For me the features and advantages are simple:

  1. Gmail in its own application, instead of hanging around in my Safari, in a tab that I never close.
  2. Integration with iPhoto, making it possible to simply email a photo from iPhoto, just as you would do with Mail.app. No templates though, so no nice photo emails as you can send from Mail.app.
  3. Drag and drop attachments. Just drag and drop any file from your desktop straight into Mailplane as an attachment. Way easier than the web browser method. No support for folders though (it would be nice to have it auto-zip folders, especially .app folders).

Mailplane runs on Tiger (no Leopard support yet as I tested yesterday) and even already supports iPhoto ‘08. It currently comes in multiple languages, with a Dutch version being added soon. I took the honor of localizing this app to Dutch, making this my first localization project. I hope it’s not too bad as my Dutch has degraded since my move to the UK.

For now it is free, but there are plans to license the application. I am hoping the price will stay low, as I don’t see many people wanting to pay for something that is normally free. The current version is at 1.51 and is clearly still in Beta, with some bugs and lots of features still to be added. That said, I expect that Ruben will make future revisions of this app more and more interesting for people who want to use Gmail as a desktop app.

If you want an invite for the Beta, then drop me an email with your details at cbetta[at]gmail.com.

Update: Ruben just announced version 1.51 which comes in 6 more languages, including Dutch.

Wednesday, July 25th, 2007

Regexps are a security leak?

This is a -very- technical post, so if you aren’t a programmer you may not be able to follow along.

Regular expressions are used in virtually all webservices. mod_rewrite, a very popular Apache module, uses them. Django, a popular Python web framework, uses regexps to map URLs onto the code that can handle the requests. Perl is virtually built on regular expressions. Virtually all languages popular for web development support regexp parsing.

Unfortunately, certain regular expressions have what I call ‘runaway nature’. A regexp with ‘runaway nature’ has the following property:

There exists at least 1 input string which will cause the act of matching this input string against the regexp to take a very long time.

Simple example: given the regexp (x+x+)+y and the input string xxxxxxxxxxxxxxxxxxxx, most regexp parsers just hang. Smart ones realize this can’t work, as all matching strings must end in a y but the input string does not. Unfortunately, most aren’t that intelligent. It turns out that with, e.g., the C# regexp parser, a reasonably powerful machine needs 25 SECONDS to realize that the input does not match the pattern. See this Coding Horror article for the details of this particular case. Clearly the regexp (x+x+)+y has runaway nature, at least on the C# regexp parser.
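To see the effect for yourself, here is a small sketch using Python's standard `re` module, which is also a backtracking engine. The helper name `time_failed_match` is made up for this example, and the input lengths are kept deliberately small; absolute timings depend on your machine and Python version, so no numbers are claimed here.

```python
import re
import time

RUNAWAY = re.compile(r"(x+x+)+y")


def time_failed_match(n):
    """Time how long the engine needs to reject a string of n x's with no y."""
    text = "x" * n
    start = time.perf_counter()
    result = RUNAWAY.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # there is no 'y', so this can never match
    return elapsed


if __name__ == "__main__":
    # Keep n small: every extra character multiplies the work.
    for n in (8, 12, 16):
        print(f"n={n:2d}  {time_failed_match(n):.6f}s")
```

Watching how the time grows as n goes up is more telling than any single measurement.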

There are many regexps which have ‘runaway nature’ on only certain platforms. However, no implementation of a regexp parser that I know of is completely immune to ‘runaway nature’ - some regexp strings just implicitly have it, regardless of implementation.

This is a security leak; causing one of the CPU cores of a webserver to hang for 25 seconds makes it totally trivial to crash the server; this is known as a Denial of Service attack. No data is compromised, but the server just stops working.

There are 2 ways this issue can be fixed, that I can see.

  1. Determine if it is possible for a machine to determine in constant time if a certain regexp pattern has ‘runaway nature’, and generate a warning if this is true. This allows web programmers to be warned in advance that they have a security risk.
  2. When running a string against a pattern, allow the programmer to specify a ‘limit’. Once the regexp parser backtracks that many times, it just quits and throws an error instead of getting bogged down. By choosing a careful limit, a web programmer can trade off ‘correctness’ against server security. I get the feeling that any input string that causes runaway performance troubles is very likely to be an invalid usecase anyway.

    Unfortunately, neither fix is available as standard solution in any mainstream programming language that I know of.

    I’m not sure how large this problem really is but I can imagine there are lots and lots of webservices out there which can be brought to a grinding halt by feeding it the right (wrong) input.
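To illustrate the second fix, here is a toy sketch of a matcher with a step limit. It is hand-written for the single pattern (x+x+)+y and is not a general regexp engine; the names `match_runaway_pattern` and `BacktrackLimitExceeded` are invented for this example. It counts every candidate split it tries and gives up with an error once the budget is spent.

```python
class BacktrackLimitExceeded(Exception):
    """Raised when the toy matcher has used up its step budget."""


def match_runaway_pattern(text, limit):
    """Toy backtracking matcher for (x+x+)+y, anchored at the start of text.

    Returns True or False like a normal matcher, but raises
    BacktrackLimitExceeded once more than `limit` candidate splits were tried.
    """
    steps = 0

    def tick():
        nonlocal steps
        steps += 1
        if steps > limit:
            raise BacktrackLimitExceeded(f"gave up after {limit} steps")

    def x_plus(pos):
        # Yield every position where x+ could stop, longest run first (greedy).
        end = pos
        while end < len(text) and text[end] == "x":
            end += 1
        for stop in range(end, pos, -1):
            tick()
            yield stop

    def group(pos):
        # One pass through (x+x+): two consecutive x+ runs.
        for middle in x_plus(pos):
            yield from x_plus(middle)

    def group_plus(pos):
        # (x+x+)+ : try further repetitions before settling for this one.
        for after in group(pos):
            yield from group_plus(after)
            yield after

    for end in group_plus(0):
        if end < len(text) and text[end] == "y":
            return True
    return False


if __name__ == "__main__":
    print(match_runaway_pattern("xxxxy", 1000))
    try:
        match_runaway_pattern("x" * 30, 10_000)
    except BacktrackLimitExceeded as error:
        print("rejected early:", error)
```

With a limit in place, a hostile input costs a bounded amount of work, and the caller decides whether to treat the error as "no match" or as a bad request.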


    NB: This issue crossed my mind when I crafted the following regexp to check if an input string appears to be a URL. I’m not sure if this regexp has ‘runaway nature’. If you’re a real regexp guru and can figure this out, or if you spot any errors, help me out and let me know in the comments! Thanks a lot!

    • ^([hH][tT][tT][pP][sS]?://)?

    • ((?:[a-zA-Z0-9][a-zA-Z0-9-]*?[a-zA-Z0-9]?)(?:\.[a-zA-Z0-9][a-zA-Z0-9-]*?[a-zA-Z0-9]?)+)
    • (:\d+)?
    • (/[\w/\.;\?:\&=+\$,#]*)?$

    (1,2,3,4 stands for: protocol, server, port, path string).
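One way to probe the question empirically is to join the four parts into a single pattern and time it on inputs that cannot match. The sketch below uses Python's `re`; the helper name `time_rejection` and the choice of test input are my own assumptions. The input is built from two-letter labels because the lazy middle part and the optional last character can split such a label in two ways, which is exactly the kind of ambiguity worth testing. It proves nothing about other engines.

```python
import re
import time

# The four parts from the post, joined into one pattern.
URL_PATTERN = re.compile(
    r"^([hH][tT][tT][pP][sS]?://)?"
    r"((?:[a-zA-Z0-9][a-zA-Z0-9-]*?[a-zA-Z0-9]?)"
    r"(?:\.[a-zA-Z0-9][a-zA-Z0-9-]*?[a-zA-Z0-9]?)+)"
    r"(:\d+)?"
    r"(/[\w/\.;\?:\&=+\$,#]*)?$"
)


def time_rejection(labels):
    """Time a non-matching input: `labels` two-letter labels followed by '!'."""
    text = ".".join(["ab"] * labels) + "!"
    start = time.perf_counter()
    matched = URL_PATTERN.match(text)
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    match = URL_PATTERN.match("http://example.com:8080/path")
    print(match.groups())
    # Keep the label count small and watch how the time grows.
    for labels in (4, 8, 12):
        matched, elapsed = time_rejection(labels)
        print(f"labels={labels:2d}  matched={matched}  {elapsed:.6f}s")
```

If the time roughly doubles with every added label, the pattern has runaway nature on that engine.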

Tuesday, July 24th, 2007

It’s Time to Replace eBay

To me, eBay is more and more becoming a useless product as it totally fails to take care of what is so important to me and many others: the social aspect of a transaction. I believe eBay is one of the last big Web1.0 players that really has to start an innovation cycle, as they keep on focusing on business and barely on any social aspect of a selling/buying activity. Yes they have a nice rating system which even gives me a rating of 48, but that still barely ever guarantees me a pleasant experience with any user.

Basically I can think of a whole list of things that are wrong with eBay, and I thought I would just put them here to inspire some people that can or want to make a change:

  • Make it possible to publicly comment on a listing or on a person. This would allow for far quicker and more natural protection against scams.
  • Don’t remove a listing until the seller approves that it is sold. All those scammers out there make it too annoying to have to re-list stuff over and over again.
  • Allow for real listings instead of auctions. Sites like GumTree.com and Marktplaats.nl prove that this works. An auction is a very business-like concept, and for most people a word-of-mouth deal with personal local delivery will do fine.
  • Integrate the social side of eBay with the listing side. EBay already has some social parts like a community and a forum, but they are simply not directly bound to the real social objects of the site: transactions.
  • If you make your site social, then you should be social yourself too. The eBay community forum is a good example of how NOT to do it, as no eBay employee reads the forum to give feedback.
  • Don’t use the help/faq as a measure against complaints. If I report a scammer, don’t send me an email with a printout of the help page about what a scammer is. This is what most annoys me about eBay: respect your smart users in a way that they can make a constructive contribution.

All in all pretty simple ideas, and I think that most of you might have another 100 ideas. If I wasn’t studying I might have made my own eBay clone, using PayPal AND Google Checkout. Anyone else up for the task?

Sunday, June 17th, 2007

All Transactions are based on Trust - Part 3

The series finale. (previously: intro, part 1 and part 2)

Part 3: Future of Trust: Ponderings on the future of the social web

In parts 1 and 2 we’ve created a pretty sweet hypothetical article recommendation engine based on networks of trust relations.

That was merely an example; almost everything you do on the web involves trust. Consider the following current internet practices that really need some sort of trust web to solve a bunch of defects:

  • eBay needs more trust. Not just the general “Can I trust this guy to actually deliver what he’s selling?”, but even simpler, what if you could reduce a buyer/seller’s feedback score to only the feedback given by your trust network?
  • Receiving e-mail: What if your spam filter could take into account the trust relation between you and the email sender? A respectable company would be able to get their form emails easily past your spam filter, and any companies that do engage in spam will see real repercussion and cost: Massive loss of trust, undermining any future endeavours. If the trust vectors are interconnected, this loss of trust hurts them on the entire web, not just on email.
  • Blog commenting: Almost analogous to receiving email: no more need for Akismet. Knowing the trustworthiness of a server operator also helps directly in cutting down on linkjacking and shill blogging. (Trust #3 in the reddit/digg/delicious analysis, “Can I trust the host of the linked article to be honest?”, is satisfied with such a system!)
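The eBay idea from the list above, reducing a feedback score to only the feedback given by your trust network, can be sketched in a few lines. The data shapes are assumptions made for illustration: feedback as (reviewer, rating) pairs with ratings of +1, 0 or -1, and the trust network as a plain set of reviewer names.

```python
def trusted_feedback(feedback, trust_network):
    """Keep only ratings left by reviewers who are in `trust_network`.

    feedback: iterable of (reviewer, rating) pairs, rating being +1, 0 or -1.
    Returns (score, number_of_ratings_counted).
    """
    ratings = [rating for reviewer, rating in feedback if reviewer in trust_network]
    return sum(ratings), len(ratings)


if __name__ == "__main__":
    seller_feedback = [("alice", 1), ("bob", -1), ("mallory", 1), ("carol", 1)]
    my_network = {"alice", "bob", "carol"}
    print(trusted_feedback(seller_feedback, my_network))
```

Returning the count alongside the score matters: a score built from three trusted ratings should be read differently from one built from three hundred.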

You can come up with similarly elegant fixes to just about everything you do on the web.

Trust is universal

The world is your oyster

The one problem with setting all web services up to work with webs of trust is that it’s annoying to upload your list of friends to all these webservers. Optimally you really want a single site/space/page where you can drop your list of people you trust, and let all other services —your email provider, digg, reddit, del.icio.us, eBay, your blog software, your web browser, your flickr account, etc.— simply read out your trust web from there.

There are 2 separate movements underway to help out in this regard.

Facebook and Open web APIs
A number of disjointed web platforms already are aware of (some of) your web of trust. For example, your average ‘social network’ server (facebook, MySpace, Hyves, etc) knows about your friends, your friends’ friends, etcetera. In theory at least you trust your friends at least somewhat. Facebook has made the bold first move of making it relatively easy to ‘surf’ this network of trust, which should make it possible for other sites to simply glean your trust network from there instead of re-inventing the wheel.
Hopefully other services that hold part of the network of trust relations will open up as well. For example, mutual email conversations (you both sending and receiving) are a pretty good indicator of trust. Some crafty database queries on the gmail server could produce a very useful web of trust graph. The blogs you have in your RSS feed are also a (usually) positive reflection of the amount of trust you have in a given user, in this case the author/operator of the sites behind those feeds.

Opportunity

Interesting startup idea here - or probably more likely a lucrative opportunity for an existing social network service, like facebook: You leave your username and password details of a number of web services, to get a heuristic attempt at recreating your complete trust web. Because the indicators I named above sometimes might be wrong, this site should also offer a simple way to give someone an explicit positive or negative review.

The biggest challenge here is simply realizing that two accounts on two different web services belong to the same person. Not all web services use email, and most people have more than one e-mail address.

OpenID
The OpenID movement is taking a more distributed approach to the problem. We at Four Starters have written lots about OpenID, but the basic gist is simply the ability to store all the information you usually need to fill in to register at sites (username, password, email, home address, website, thumbnail photo, etcetera) on one server, so that you can then allow other websites to simply ask that server for the information. The ‘OpenID’ server, upon getting a request for any sort of information (including just authenticating that you are you), then asks you to identify yourself. The upshot is that only your OpenID provider ever needs to know a password. For all other sites you simply enter your OpenID, which is a URL. Mine is http://reinier.zwitserloot.com/ for example.

OpenID Logo

The amount of data you can put on the OpenID server is extensible; it doesn’t have to be limited to just the usual name, email, address information. You could stuff your trusted contacts in your OpenID database as well —a list of OpenIDs combined with a trust percentage. This system solves the problem of linking identities that the aggregate existing services plan listed above suffers from.
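As a rough idea of what "a list of OpenIDs combined with a trust percentage" could look like on the wire, here is a sketch that reads such a list from a JSON document. JSON is my own assumption, as are the example URLs and the function name `load_trust_list`; this is not part of any OpenID specification.

```python
import json


def load_trust_list(document):
    """Parse a JSON object mapping OpenID URLs to trust percentages (0-100)."""
    entries = json.loads(document)
    trust = {}
    for openid, percentage in entries.items():
        if not 0 <= percentage <= 100:
            raise ValueError(f"trust for {openid} must be between 0 and 100")
        trust[openid] = percentage
    return trust


if __name__ == "__main__":
    # Hypothetical document a provider might hand out on request.
    document = '{"http://alice.example.org/": 90, "http://bob.example.net/": 35}'
    contacts = load_trust_list(document)
    # Anyone not on the list gets no trust at all.
    print(contacts.get("http://alice.example.org/", 0))
    print(contacts.get("http://stranger.example.com/", 0))
```

Because each contact is identified by their own OpenID URL, the same list can be read by any service without guessing which accounts belong to the same person.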

This is really the solution Dick Hardt seems to be talking about in his world famous Identity 2.0 presentation. I had the good fortune to see an extended and updated version of it live at The Next Web 2007 where he presented.

I’d love to delve deep into what needs to happen to the web to make this solution work, but I couldn’t possibly do as good a job at it as Dick’s presentation, so I will simply suggest you watch it, if you haven’t already.

Maintaining the trust web

Wrench

As Cristiano wrote yesterday, and as Deborah Schultz talks about in her presentation, there are gradations of friends. There’s a parallel here to trust: There’s also a gradation of trust. Some people I trust almost completely, others I only trust a little bit. Just like friends, these levels are also dynamic —sometimes trust (and friendship) waters down over time, sometimes you make new friends, or learn to trust new people and sometimes someone does something to lose your trust.

Because it’s important to keep your web of trust updated, the idea of letting each site run its own little web of trust doesn’t scale very well. Centralizing your web of trust into a single repository is crucial to making this vision of the web a reality. It also means that this trust relation thing really needs to be a read/write proposal: It must be possible for me, optimally speaking, to very very quickly downgrade or upgrade an individual’s trust percentage in reaction to for example getting screwed on/satisfactorily completing a transaction with someone on eBay. There’s no good reason why OpenID (or the facebook API) can’t be extended in such a way as to make this possible.
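The "read/write" part could be as small as the sketch below: one function that nudges a contact's trust percentage up or down after a transaction and keeps it within 0 to 100. The function name, the default of 0 for unknown contacts and the example URL are assumptions for illustration only.

```python
def adjust_trust(contacts, openid, delta):
    """Raise or lower one contact's trust percentage, clamped to 0..100.

    contacts: dict mapping OpenID URLs to percentages; unknown contacts start at 0.
    Returns the new percentage.
    """
    current = contacts.get(openid, 0)
    contacts[openid] = max(0, min(100, current + delta))
    return contacts[openid]


if __name__ == "__main__":
    contacts = {"http://seller.example.org/": 60}
    # Got screwed on a transaction: downgrade sharply.
    print(adjust_trust(contacts, "http://seller.example.org/", -80))
    # A stranger completed a deal satisfactorily: start trusting them a bit.
    print(adjust_trust(contacts, "http://newbuyer.example.net/", 30))
```

If every service writes to the same central list, one bad experience on eBay immediately affects how that person's email and blog comments are treated too.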

It can be argued that trust depends on the type of action. For example, I trust my baker to make me a nice pie much more than, e.g., some of my friends who can’t cook for beans. I doubt this distinction is needed, though. After all, I DO trust my friends not to try and saddle me with a nasty-tasting pie I don’t actually want.

Security: Hurdles ahead

Unfortunately it is now time to delve into the issues that will have to be solved before this is going to work.

Hurdle

Primarily, there is identity theft. It’s already a big problem now, but with trust webs, getting your identity jacked is even more of an issue. Lots of spam is already sent from compromised computers. It’s a small leap to go from there to also jacking that user’s OpenID login, so that the spam software can add itself as a trusted resource, or, alternatively, to just identify itself as you. Either way, everyone who trusts the user with a compromised computer now also trusts the spammer. It doesn’t even have to involve keylogging. The world doesn’t change in a day; in practice we’ll be stuck with old services using user/pass based login for decades. Random users are very likely to use the same password there as they do for their OpenID provider. In effect we create a single point of failure by centralizing identity in this way.

One solution is to not use a password to identify for OpenID. Instead, use a ‘shape password’ (the act of drawing a little image), or a ‘visual password’ (the act of picking an image or a series of images out of a large set of them). By aggregating all the user/pass stuff into a single page, it is possible to be a little more thorough and intelligent about the way this site verifies your identity. Another option is to use hardware, like a USB key, to serve as authentication device.

Still, none of these solutions are completely impervious to security leaks of some sort. As Bruce Schneier explains, in general security products tend to suck. Designing for failure is going to be necessary.

I don’t really have the answer here, unfortunately. Brighter minds will have to crack this nut.

Going the distance

A couple of web-based services that currently aren’t really feasible would be made possible by such a centralized web of trust. Just to really dig deep into the possibilities, imagine a political system based on this web of trust. Instead of electing a representative based solely on ideas, you elect on trust: basically on the idea that a given individual will be honest and act with integrity in representing you. If a system exists to anonymously inform your representative about your preferences, the representative will then have to filter and interpret the spirit of his constituents’ opinions. Attempts to pander to company lobbyists, or to go too far against the opinions of those who voted such an individual into power, should lead to a loss of trust, which will prevent re-election or, preferably, mean that he is simply ‘fired’ from his job as representative the moment his trust level drops too far.

Vote but better

Thursday, June 14th, 2007

All Transactions are based on Trust - part 2

The series continues. (previously: intro and part 1)

Part 2: Analysing a trust-aware internet transaction: del.icio.us network

In Part 1, we analysed a typical transaction on reddit, an article aggregator with the principal function of recommending you interesting articles to read when you are bored or just in need of some news.

Today, we look at a service with a very similar premise - del.icio.us network. While the premise is exactly the same (give you a list of articles which might be interesting), and while the basic notion is similar (a disconnected set of people basically ‘vote’ on stories), the actual implementation is completely different. Specifically, the way trust is interwoven into the network feature is entirely different from reddit’s system.

Where reddit seems to actively try to eliminate trust as a factor (it is for example impossible to see who votes for what, only comments and submissions can be found, though not easily) - del.icio.us network works solely on trust relationships.

Let’s revisit the same transaction of part 1, but this time with del.icio.us network.

del.icio.us recommending me something to read

I go to my del.icio.us/network page. I will need to trust the operators of del.icio.us, which can be problematic, as del.icio.us is owned by Yahoo, a business. Businesses, in theory, have no morals. Fortunately in practice I can take off my paranoia hat and trust healthy competition - Google does not point me to any convincing evidence that Yahoo is trying to surreptitiously hawk political views or allow unmarked advertising. I’ll trust this site - enough, at least, to let it recommend articles to me.

The network page is a lot like any reddit page - a bunch of articles, some with very obvious descriptions, some less so. There’s some extra fluff (total number of del.icio.us users who bookmarked a given article, and a tag list). While potentially interesting, from a trust point of view this information is just as useless as reddit’s article score.

The next issue of trust is, for each article that appears here, if I can actually trust that I should give it my due attention. This is where del.icio.us/network differs from reddit and digg: An article is on that page ONLY because one of my direct connections thought it was sufficiently cool to bookmark it. I trust those people I manually add to my delicious network. Thus I can directly trust the articles that show up on my network page. The exact mechanism of trust is left to the user; I may trust one of my network contacts because they are my friend. I may trust someone else because I like his blog and the articles he links to there seem interesting. Regardless of why I trust my network contacts - the point is that I personally trust them.
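
The rule described above (an article shows up only if a direct contact bookmarked it) can be sketched in a few lines. This is only an illustration: the `network` and `bookmarks` dictionaries, the user names and the example.org links are made up, and nothing here reflects how del.icio.us actually stores its data.

```python
def network_page(me, network, bookmarks):
    """Return {article: [direct contacts who bookmarked it]} for one user.

    network   -- hypothetical mapping: user -> list of users in their network
    bookmarks -- hypothetical mapping: user -> list of bookmarked articles
    """
    page = {}
    for contact in network.get(me, []):
        for article in bookmarks.get(contact, []):
            page.setdefault(article, []).append(contact)
    return page


network = {"me": ["jack", "jill"]}
bookmarks = {
    "jack": ["example.org/a", "example.org/b"],
    "jill": ["example.org/b"],
    "schmoe": ["example.org/spam"],  # not in my network, so never shown to me
}

page = network_page("me", network, bookmarks)
for article, contacts in page.items():
    print(article, "bookmarked by", ", ".join(contacts))
```

Whatever ‘schmoe’ bookmarks never reaches the page, because nobody I added vouches for it.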

The final step of trust is - once I decide to read an article, can I trust that the operators of the server that hosts the article are trustworthy? This step is also much more adequately addressed: One of my personally trusted contacts saw fit to go through the trouble of bookmarking it. At least a modicum of due diligence has probably been applied.

From a trust point of view then, del.icio.us/network is on the up and up. There is no problem here - trust-wise, this system will not collapse under the weight of its own popularity. If some schmoe manages to sign up for a del.icio.us account and starts bookmarking spam, tripe, and drivel, I don’t even notice.

London Eye

Basically, my network is a wheel: I’m at the center, with all my connections arranged around me, feeding article recommendations to me.

There’s even a responsibility system built in: If one of the users in my network keeps bookmarking crappy articles, I can remove them. One common problem with responsibility (a.k.a. karma systems - scores for users) is that the trust issue isn’t addressed at all: The karma of any given user is again determined by untrustable, unaccountable masses. Removing someone from recommending articles to you completely is much more effective from a trust point of view.

Trust is necessary… but not sufficient

Unfortunately, though, just because you built a system that maintains trust in the transaction, doesn’t mean your idea is any good.

Some problems with del.icio.us:

  • Traffic - once you run out of articles, there are no more. On reddit and digg, there are always more stories to read because the pool of submitters is much larger.
  • GroupThink - If all the users in your network read the same blogs, work in the same area, and have the same thoughts, your network is very unlikely to bring you new ideas in new topics, or well written arguments for viewpoints you do not hold. In practice large communities suffer just as much (Digg and Reddit have of late sported front pages where every single article is either extolling the virtues of one Ron Paul, presidential candidate for the 2008 elections in the United States of America, or taking the mickey out of George W. Bush).
  • Rating - While on reddit each article has a score and thus you can sort them, on del.icio.us an article is either on your network page, or it isn’t. Once your network produces more articles than you can handle, there is no way to prioritize them usefully.

Fortunately, trust can help us out here, if you apply some more of it to del.icio.us network. None of the steps I’m going to explain here have been implemented by del.icio.us yet. It would make for a much better experience if they would.

A wheel does not a network make!

By acknowledging that a network is more than just a wheel with spokes, these problems can be addressed!

In the ‘wheel’ view of a del.icio.us/network, I can actually check out the networks of friends, check out people THEY have deemed fit to add to their network, check out what those people have been posting, and if I like it, add it to my network. That’s one way of solving a dearth of articles: Just add more people to the network.

So, instead of a wheel, I can treat delicious as a connected network:

Social Network

There’s really no reason why this can’t be done automatically. Anytime I’m out of articles, so to speak, it should be possible to just say: Go to the ‘next layer’ - give me articles recommended by friends of my friends. Trust is more or less multiplicative, after all: If I trust Jack, and Jack trusts Joe (I don’t know Joe), I can trust Joe to some extent. Once 2 layers no longer give me enough articles, I can go to a third layer, ad nauseam.
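
As a rough sketch of that ‘next layer’ idea: walk the network breadth-first and give everyone in layer n a trust of 0.8 to the power n. The 80% per hop is the illustrative figure used elsewhere in this series, and the `network` dictionary and names are hypothetical.

```python
def trust_layers(network, me, max_layers, per_hop=0.8):
    """Return {user: (layer, trust)} for everyone within max_layers hops of me.

    network -- hypothetical mapping: user -> list of users in their network
    per_hop -- assumed trust retained per hop (0.8 is only an example value)
    """
    seen = {me: (0, 1.0)}
    frontier = [me]
    for layer in range(1, max_layers + 1):
        next_frontier = []
        for user in frontier:
            for contact in network.get(user, []):
                if contact not in seen:  # keep the shortest route only
                    seen[contact] = (layer, per_hop ** layer)
                    next_frontier.append(contact)
        frontier = next_frontier
    del seen[me]
    return seen


network = {"me": ["jack"], "jack": ["joe"], "joe": ["ann"]}
layers = trust_layers(network, "me", max_layers=2)
for user, (layer, trust) in layers.items():
    print(f"{user}: layer {layer}, trust {trust:.2f}")
```

Raising `max_layers` by one is the ‘go to a third layer’ step: ann only appears once I ask for three layers.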

We can solve the other problems in a similar fashion, but a more holistic approach solves them all.

First, we establish a scoring system on a per-article basis, dependent on the network. The network of del.icio.us basically consists of users, connected to each other (each connection represents someone being in the network of someone else). Now add the articles themselves to this network: Anytime I bookmark an article, I am connected to the article directly. Anytime a friend of mine bookmarks it, I’m connected to it through my friend.

It is of course possible that I’m connected to an article in a number of ways. A friend of a friend bookmarked it, a colleague’s brother’s girlfriend’s classmate bookmarked it, and one of the bloggers read by someone whose opinions I admire bookmarked it, for example. In the network this is represented by 3 different ‘paths’ I can take to arrive at the article.

These paths can be distilled into one final personalized score. Each connection multiplies the score by 80% - so a friend’s friend’s friend, 3 steps, is .8 * .8 * .8 = 0.512 in total score. For multiple different paths, you can’t just sum them up (or you could end up with a score above 100%), but there are a number of algorithms. Naively: of all paths, take the highest scoring and divide it by 2. Take the next highest scoring and divide it by 4. Take the third highest scoring and divide it by 8, ad nauseam, then add them all up. This number can never exceed 100%. Another way of doing this is to consider each link in the network as a resistor in an electric circuit. Multiple resistors placed in series add up their resistance and thus reduce current. Multiple resistors placed in parallel lower the total resistance, but, whatever you do, you can never get more power out than you put in. Now replace resistors with links on the network and you have an algorithm!
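
Here is a minimal sketch of the naive halving scheme, assuming every step keeps 80% of the score. The function names are invented for this example, and the step counts for the three paths (2, 4 and 2) are my own reading of the friend-of-a-friend, classmate and blogger routes mentioned earlier.

```python
def path_score(steps, per_step=0.8):
    """Score of a single path: each step keeps per_step of the score."""
    return per_step ** steps


def combine_paths(scores):
    """Naive combination: best path / 2, next best / 4, third / 8, and so on.

    Because every path score is at most 1, the total stays below 1 (100%).
    """
    ranked = sorted(scores, reverse=True)
    return sum(score / 2 ** (rank + 1) for rank, score in enumerate(ranked))


# Three routes to the same article, measured in steps (assumed values).
paths = [2, 4, 2]
scores = [path_score(steps) for steps in paths]
total = combine_paths(scores)
print(f"personalized score: {total:.4f}")
```

The halving weights are what keep the total bounded: even a thousand perfect paths add up to less than 100%. The price is that the fourth and later paths barely move the score.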

This scoring/recommendation algorithm can even be extended to del.icio.us users: The score of a user is then entirely dependent on how well he’s connected to your own network (though relying too much on this can lead to GroupThink!).

Such a system solves all 3 problems. To wit:

Traffic

Research in social networks finds that usually social networks are virtually completely connected. There’s a path from any one person to any other. Thus, it’s possible to derive a score for every article and you can just keep reading indefinitely, though, of course, as you keep reading, each further article has a lower score.

There’s some excellent research by GustavoG on the social network of Flickr (also a web app that allows you to set up a network of friends). Very pretty pictures of tightly interwoven networks, such as this one:


Flickr’s demographics in January 2005. Click on the image for the full story and more graph images.

Rating

As already explained, any given article is no longer a simple yes/no proposal: Articles recommended by a number of your direct friends rate highly. Articles only recommended by one distant link (A friend’s friend’s friend, and that’s it) rate lowly.

GroupThink

This is where it gets very interesting. Because everyone builds their own unique community, GroupThink is no longer a virtual guarantee. For example, on digg or reddit, if a well written article happens to put a ‘taboo’ topic in a good light (like Java, Microsoft, George Bush, traditional media, and a few others), or a ‘holy’ topic in a bad light (Ruby on Rails, web2.0, digg/reddit itself, Apple, Linux, and a few others), chances are very high it gets drowned out in the noise of the crowd. Even if all the people whose judgement I actually trust did vote it up, I never see it. Contrast this with your own unique community, where articles at least have a chance.

There are two forms of GroupThink: Accidental and intentional. On both Digg and Reddit, you occasionally see a post imploring people to put an end to the flood of the latest meme-of-the-day posts. Ironically these also get voted up with some frequency. Clearly then not all GroupThink is actually desired by those experiencing it. In a social network this GroupThink is eliminated; you can simply hunt down which elements in your network are fielding the majority of an onslaught of a certain meme, and toss them from your network or at least lower your level of trust in them.

The other type is intentional: Where a reader actually wants to read more about the same topic over and over again. There’s not all that much to be done; trying to force reading other things onto such a person is tantamount to censorship and very hard to distinguish from forced propaganda.

In practice, in real life, GroupThink is somewhat rare, because you have friends from many places. Colleagues, family, old school buddies - friends of people you’ve dated that you kept in contact with, etcetera. If these real life bonds also exist in your del.icio.us network, ostensibly the chance of GroupThink is much reduced.

I could be wrong, but a system like that sounds like the ultimate source of articles. As much or as little as you want to read, resilient to GroupThink, nearly impossible to spam, and ever evolving to your tastes. Unfortunately, as far as I know, nothing quite like it exists just yet.

… or does it? The remarkable quality of the early phase

A version of this ultimate article recommendation engine did exist, briefly.

reddit itself matched this system! At least, it did, in the first few months after the launch. The users of reddit back then amounted to a single connected social network. A number of important features weren’t there (all votes were equal instead of being attenuated by the distance in ‘friend links’ from you, for example), but on the whole this was it. If you happened to use reddit in those days, or you know someone who did (I fortunately managed to catch the tail end of those days), you may remember or hear about the amazing quality of articles.

This idea actually can be observed in many budding social networks. For a little while, Orkut (Google’s ‘myspace’) was a trove of excellent networking opportunities. This was back when Orkut required very scarce invites.

Invites are an excellent way to keep a social network in its efficient phase as long as you can, but of course they do restrict growth - by definition that’s how they manage to keep the efficiency of the social network high. In fact, a number of more or less ‘secret’ smaller social networks that work on invites and a strong sense of responsibility (a misbehaving user gets kicked, and the one who invited the abuser also gets kicked!) have been running strong for years. The one problem with that tactic is that it can’t scale.

A trust network can!

The final part 3 will be posted the day after tomorrow (Friday evening). In it: expanding this idea to other walks of the web and of life in general, the importance of identity in such a trust-bound world, and how Identity 2.0 and open APIs are the beginning of a brave new world. As an encore, part 3 will also briefly discuss a problem I’ve so far omitted: Doing all these scoring calculations is, computationally speaking, extremely difficult. Spoiler: There’s a way out of it, more or less!

To continue reading, go to part 3.

Tuesday, June 12th, 2007

All Transactions are based on Trust - part 1

As promised, today part one of a series on trust.

Part 1: Analysing a typical web transaction: The Reddit Breakdown

Flashback to a year ago. Reddit is relatively new and has a limited but very active userbase.

I go to reddit.com. I will need to trust the site which implies I need to trust its operators, as it is impossible to trust a computer (they do what they are told, without questioning orders, hence it’s folly to trust a machine implicitly). This will be referred to as Trust #1.

Trust #2. I see an article on the front page, with 100 votes. I need to trust those who have voted that this actually means it’s a good article - I basically need to trust that this score number has any meaning.

Trust #3. I follow the link and read the story. I trust that the story doesn’t lie and that any further action I take, like bookmarking it, or recommending it to others, won’t get me any surprises (I’ll need to trust the site author that he didn’t e.g. linkjack the content if I’m going to share it with others).

In the early days of reddit, all 3 forms of trust are more or less met to my satisfaction. Here’s a break down:

For #1: The mere fact that Paul Graham recommends these guys is good enough; I trust Paul Graham. Not very much, I don’t know him personally, but he has a lot of reputation to lose if he recommends a bunch of swindlers. Thus I trust Paul Graham’s judgement enough for me to be satisfied here. Note here that it’s possible to trust someone purely on what they have to lose.

This is an example of trust-by-chaining (I trust the operators of reddit because Paul Graham trusts them, and I trust Paul Graham) and trust-by-buy-in (By recommending them, Paul Graham has effectively placed money (the value of his reputation) on the table which he will lose if reddit is swindling my time by e.g. making crappy articles look highly rated for cash. Paul Graham has a certain level of buy-in to this recommendation). He could be wrong - but trust doesn’t need to be perfect.

For #2: This is where it gets interesting. I trust the votes (back then) because reddit was only known to those ‘in the know’ - the first redditors were personal friends of Graham and the authors of reddit, and had personal buy-in not to screw it up for their friends. The userbase then exploded outwards like a viral infection but elitism kept the quality high for quite a while. Posting lolcats all day, or ‘abusing’ the site by downvoting well written insightful articles that don’t happen to coincide exactly with a voter’s viewpoints, for example, didn’t happen, because the vast majority of the redditors, by mere virtue of being so in the loop that they knew about reddit in the first place, didn’t do that sort of thing. There’s also the issue of intent: There was very little to gain by gaming the site. Unfortunately, trolls and social rejects exist, but as a rule there are far fewer negative influences if there’s nothing to be had. Back then the user base was too small and too new and thus flew under the spammers’ radar.

This is an example of trust-by-chaining, but without me actually seeing the chains: I trust the authors to only recommend reddit to friends who are known by them to have a modicum of netiquette. This type of trust isn’t very ‘strong’, but I don’t need much just to accept a recommendation to read an interesting article. Note that trust is multiplicative: If I trust Jack for 80%, and Jack trusts Joe for 80%, I can trust complete stranger Joe for 64%.
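
The Jack and Joe arithmetic generalizes to chains of any length, and the links need not all carry the same trust. A tiny worked example follows; the helper name `chained_trust` is made up, and `math.prod` needs Python 3.8 or later.

```python
import math


def chained_trust(links):
    """Multiply the trust on every link of a chain.

    links -- trust values between 0 and 1, one per link in the chain;
             an empty chain means I am judging myself, so trust is 1.
    """
    return math.prod(links)


trust_in_joe = chained_trust([0.8, 0.8])       # me -> Jack -> Joe
trust_further = chained_trust([0.8, 0.8, 0.5]) # one more, weaker link
print(f"Joe: {trust_in_joe:.0%}, one step further: {trust_further:.0%}")
```

Since every factor is at most 1, adding a link can only lower the result, which is why this kind of trust never gets very strong over long chains.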

You may realize at this point that the ‘trust from elitism’ argument no longer applies to reddit, nor to digg - they are too famous now. It also explains why almost all ‘open’ social systems, where every user’s vote has an effect, start off stellarly well and always drop off. From kuro5hin, to digg, to reddit.

For #3: This is a bigger problem. The only practical ‘proof’ you have for the majority of links (specifically: Every link to an article on a site that you aren’t familiar with) being trustworthy is the personal recommendation of the original submitter. I don’t trust the voters enough to expect them to have done due diligence on the trustworthiness of the operator of the linked article. For the same reasons as #2, in general this trust was at least satisfied to some extent in the early days.

Fast forward a year.

Reddit is now so famous, the trustworthiness percentage of any one vote has dropped to absolute 0. Not 0.00001, there’s nothing left. The value of a normal redditor’s vote is extremely low, and some of the redditors are actively abusing the system, voting their own blogs up just for the traffic, posting their own linkjacked material, voting other stories down just so that their own has a better shot, etcetera. These cancel with the very low value of the vote of an unknown internet user, resulting in a value of absolute 0. The value of a vote is also not negative (which would mean I could just read the most negatively rated articles!) because the same scammers would force the equilibrium back to 0 if reading the most lowly rated articles ever became a useful way to use reddit.

lottery

A trust level of 0 is the only trust level which is utterly worthless. The information available to me for any given recommendation from reddit is: The ‘score’ (upvotes - downvotes) + the username of the one that posted the article. Knowing that 150 more reddit users thought article X was good enough to vote it up versus down, coupled with the fact that user ‘foobar’ thought it was good enough to post to reddit, assuming I don’t know ‘foobar’, has no practical value. I might as well pick a completely random web link to read - it’s a lottery.

There is no such thing as a ‘wisdom of the crowds’ unless you meet the stringent requirements for this: No attempts to screw up the system (or those attempts are symmetric and thus cancel out), and no practical way for mob mentality to form - no way for each individual of the crowd to be influenced by the rest of the crowd. Reddit and digg definitely do not meet the qualifications and thus there is no trust to be found by knowing “a bunch of” random people’s opinions happened to coalesce. Hence: The recommendation has no value. It is worthless. Practically, this will probably manifest itself as sucky submissions, and this is in fact exactly what’s going on. The function has changed - it’s a rolling window on the current meme of the web, no longer a site that recommends interesting articles.

There are ways out of this dilemma. Specifically: If I did trust the user ‘foobar’ directly, for example because I remember that his submissions have been excellent so far, I’m satisfied for Trust #2 and Trust #3 and I can go read the article (A form of trust by past performance - inductive reasoning is behind the assumption that he will continue to do so. It’s certainly not worth 100% trustworthiness, but it’s enough for article recommendations). Unfortunately that completely goes against the idea of popularity aggregators and as expected both digg and reddit make it very hard for you to work in this manner. There’s no easy way to mark someone as a ‘friend’, for example.

A site which actually works almost entirely on that principle (articles recommended by people you already trust) is del.icio.us, the online bookmark service. You can add people as ‘friends’ and watch the stuff they bookmark in your inbox. While each link does list the # of random users who also bookmarked that link, no amount of ‘votes’ (in the sense that bookmarking is a vote) will make a story appear in my del.icio.us inbox until someone in my personal circle of friends (obviously, people I trust) personally bookmarks it, thus allowing transfer of trust: I trust my friend, he apparently trusts the link he just bookmarked, and thus I trust the link.

Lots of startup ideas I hear about base themselves on the notion that the opinion of random unknown people has an intrinsic value. The problem is, in the early stages of a startup, it does, because the people from whom you cull the opinion aren’t actually unknowns: They are tied to you by your viral marketing scheme. Your startup is doomed to fail unless you can manage to toss some form of trust in there, for example by allowing users to reduce the site experience to just those people they personally trust, or by explicitly staying small.

To continue reading, go to part 2.