Data Portability


  "Do not underestimate the power of the Dark Side" 
  Originally uploaded by yahooza.

Google provides free email, but you’ve got to store your email on their servers (unless you use a POP client). Same with Google Calendar, Writely, JotSpot, Google Reader (they’ve got your OPML file), etc. Same with any other web app you use, like Hotmail, TypePad, Facebook, Flickr, and YouTube.

Free web apps come with a hidden price. You are storing your private data on someone else’s servers. And the biggest complaint I hear about the coming world of web apps is that many users are not comfortable with their data being stored anywhere but on their own servers.

I think anyone who provides a web app should give users options for where the data gets stored. The default option should always be to store the data on the web app provider’s servers. Most people will choose that option because they don’t care enough about this issue to do anything else.

The next option should be to store the data on some other web storage system, like Amazon’s S3. I think Google should make a deal with Amazon to offer S3 as an option for all of Google’s web apps. You’d pay for this privilege; maybe you pay Google and they pay Amazon. Or maybe you just pay Amazon.

The third option is to save the data on a “dumb storage appliance”. A dumb storage appliance could be something like a Buffalo TeraStation or an Infrant box, which offers a terabyte of storage for less than $1000. In my imagination, you’d simply put the appliance on your network, get an IP address, and enter that IP address in the web app storage configuration page, and all your data goes there instead. I am guessing that it’s not that simple, and certainly firewalls wreak havoc on this scheme, but I do think it’s not impossible to do this simply and easily.
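
One way to picture the three options is as interchangeable storage backends behind a single interface. The sketch below is only an illustration under stated assumptions: `StorageBackend`, `ProviderStorage`, `ApplianceStorage`, and `choose_backend` are hypothetical names, the provider's servers are stood in for by an in-memory dict, and the appliance is stood in for by a local directory. A hosted option such as S3 would be a third class with the same two methods; it is left out here rather than guessing at a real API.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface a web app could write user data through."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class ProviderStorage(StorageBackend):
    """Default option: data stays with the provider (a dict stands in here)."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class ApplianceStorage(StorageBackend):
    """Third option: a user-owned box, modeled here as a plain directory."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Flatten slashes so each key maps to one file name (a simplification).
        return os.path.join(self.root, key.replace("/", "_"))

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()


def choose_backend(settings: dict) -> StorageBackend:
    """Pick a backend from a user's (hypothetical) storage settings."""
    if settings.get("storage") == "appliance":
        return ApplianceStorage(settings["path"])
    return ProviderStorage()  # most users never change the default


if __name__ == "__main__":
    backend = choose_backend({"storage": "appliance", "path": tempfile.mkdtemp()})
    backend.put("posts/2006-11-02", b"Data Portability")
    print(backend.get("posts/2006-11-02"))
```

The point of the design is that the web app only ever calls `put` and `get`; where the bytes land is the user's choice. It does nothing about the firewall problem mentioned above.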

Why does this matter? Because trust is going to become a bigger issue going forward. I realize that many people trust Google and others to safeguard their data. But the best way to garner trust is to tell people that they “own their data” and they have the right to put it anywhere they want. The simply act of doing that will garner even more trust.

To be honest, I would rather be storing all my blog posts somewhere other than TypePad. My RSS feed provides a simple way to pull all of that content out of TypePad, so it doesn’t keep me up at night. But I’d love it if I was able to back up three years of posts (well over 1000 posts) onto the Infrant box in my basement. Same with my Flickr account and my YouTube account.
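
A minimal sketch of that kind of backup, assuming an RSS 2.0 feed with `item`, `title`, `link`, and `description` elements. The helper names `parse_posts` and `save_posts`, the sample feed, and the output file name are made up for illustration; fetching the real feed (for example with `urllib.request.urlopen`) is left out so the example runs offline. A feed often carries only the most recent posts, so this may not capture a full archive.

```python
import json
import xml.etree.ElementTree as ET

# Stand-in feed so the example needs no network access.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example blog</title>
<item><title>First post</title><link>http://example.com/1</link>
<description>Hello</description></item>
<item><title>Second post</title><link>http://example.com/2</link>
<description>World</description></item>
</channel></rss>"""


def parse_posts(feed_xml: str) -> list:
    """Extract title, link, and body text from every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "body": item.findtext("description", default=""),
        })
    return posts


def save_posts(posts: list, path: str) -> None:
    """Write the posts to a local JSON file, e.g. on a box in the basement."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(posts, f, indent=2)


if __name__ == "__main__":
    posts = parse_posts(SAMPLE_FEED)
    save_posts(posts, "blog_backup.json")
    print(len(posts), "posts saved")
```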

So I would like to see someone take some leadership in this area. Google is the logical leader. But if they won’t do it, then Microsoft and Yahoo! and others should. I think whoever takes a leadership position on this issue will get a lot of good karma from users as a result.

Comments (28) | Posted November 2, 2006 in Venture Capital and Technology

Comments

Fred, I agree that the issue of trust is going to be important to many web companies moving forward. Many services today still focus on keeping consumers tightly locked in, which restricts what they can eventually do with "their" data. The fact that Gmail offers POP capability for free is great and I think others should follow suit and allow data to flow more freely. I don't think that companies should be concerned with their users defecting if they make their data or platform more open. Instead, I think users would actually appreciate that option and flexibility.

I don't know if you have been following this, but Adobe is making great strides in data persistence and web/desktop applications. Adobe has Flex 2, which, combined with Flex Data Services, can allow browser-based Flash applications to keep the data you are working on even while offline. Then there's the unreleased Apollo, which is a software development platform for building web applications that run on the desktop, eventually closing the gap between local storage-based and web-based activities.

For Convos, we're using Flex 2 for the interface and expect to utilize its data persistence capabilities down the road.

Posted by: JP Checa | Nov 2, 2006 8:29:31 AM

Great idea! Web app leaders like 37signals make a big deal about how safe your data is. But what happens if they get into financial problems and the hosting gets turned off?

I think that for 99% of people the backup has to be another webapp in itself, like the S3 service. If you've got to own, support and maintain your own backup server then you're missing out on a big part of the benefit of using a web app in the first place.

Posted by: Tom Nixon | Nov 2, 2006 8:41:05 AM

I agree with JP about Adobe Apollo. It has the potential to be a game changing technology.

Posted by: Dan Cornish | Nov 2, 2006 9:17:02 AM

Fred, I think you may be confusing storage space with bandwidth. If, for example, you opted to store all of your email on a local device rather than a Borg hosted disk, then you run into some problems with the bandwidth between that disk and the GMail application. Say you wanted to search your 2GB of mail for everything written by 'Greg'; then GMail (which runs the search application) would have to query your disk (which is just a dumb storage device). Even if you have super fancy internet connectivity, there is no way around the problem of Google trying to access gigabytes of email over a connection whose speed is measured in megabits.

If Google were to let you host their _application_ on your local box, then this would work. But I don't think they are about to open source their search IP. Even if it remains closed source, this is still not a likely option.

The other alternative would be for both you and Google to keep a copy of your mail. If you want to search it, or apply any fancy operation beyond simple store & retrieve, then Google will use its local copy. And you would have your own copy in case Google has a snafu and loses your mail.

This, of course, does nothing to assuage any fears that a nefarious web app provider will use your data for some purpose that you did not sign up for. But until internet connectivity speeds (megabits) approach disk access speeds (many many gigabits), this ain't going to work for any complex operation, such as search.

Posted by: Josh Reich | Nov 2, 2006 9:22:33 AM
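
A back-of-the-envelope version of the bandwidth point in the comment above, as a hedged sketch. The 2 GB mailbox comes from the comment; the 10 Mbit/s home link and the 1 Gbit/s local disk figure are assumptions picked only to show the scale of the gap, and `transfer_seconds` is a hypothetical helper name.

```python
def transfer_seconds(size_gb: float, rate_mbit_per_s: float) -> float:
    """Seconds to move size_gb gigabytes at rate_mbit_per_s megabits/second."""
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return megabits / rate_mbit_per_s


MAIL_GB = 2             # mailbox size used in the comment
HOME_LINK_MBIT = 10     # assumed speed of the link to the home appliance
LOCAL_DISK_MBIT = 1000  # assumed local disk throughput (1 Gbit/s)

if __name__ == "__main__":
    over_link = transfer_seconds(MAIL_GB, HOME_LINK_MBIT)   # 1600.0 seconds
    from_disk = transfer_seconds(MAIL_GB, LOCAL_DISK_MBIT)  # 16.0 seconds
    print(f"scan over the link: {over_link:.0f} s (~{over_link / 60:.0f} min)")
    print(f"scan from local disk: {from_disk:.0f} s")
```

Under these assumed numbers, a single full scan of the mailbox takes about 27 minutes over the link versus 16 seconds from a local disk, which is why the search index has to live next to the data.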

I would add that at the very least the web-based apps should provide some mechanism for backup. As you mentioned, Google does a decent job with this as you can access it via POP (how I wish they would roll out IMAP) and you can export calendars. But you brought up an interesting point that I have thought of before, namely Flickr. Although I have redundant backups of my photos, I have almost come to rely on Flickr as a secondary backup source. However, I know of no mechanism to download your photos (en masse) if you had to. I would suspect that the folks at Yahoo are probably looking into this for a price.

As for Amazon's S3 being the host... based on Google's history, I would think that it would be just as likely that they would develop a competing product. I use S3 myself and love it, but we are still far removed from online storage that the South Koreans offer. I work for a Korean company and our backup is a 1TB online storage which is far cheaper and faster than S3. Of course you have to read Korean (which I do not). But the point is that there is a lot of development that we, in the US, do not hear about. I would like it if Silicon Valley had a bit of competition.

Posted by: Ted | Nov 2, 2006 9:26:11 AM

For many web-apps (like blogs), I would think to be able to export and import a single XML file would suffice.

But for more advanced applications, such as Google spreadsheet and Writely, it's not enough to just store the data locally, but you would want to run the app locally sometimes as well. This means that it's Microsoft who is in the position to make a nice move here. They could just allow (user-designated) information to be mirrored on their websites. Synch up Outlook to Hotmail, etc. so that you can either use the app and view your information on-line, or run it locally. The difference should be transparent.

Posted by: David | Nov 2, 2006 10:57:45 AM

For some reason I want a soundtrack for this post:

Charlton Heston commanding "let my data go"

Posted by: Fraser | Nov 2, 2006 11:12:50 AM

Very timely post now, with the potential customer migration off JotSpot as a result of the Google acquisition.

Another issue to think of is the "hidden SaaS business model", i.e. data mining and benchmarking based on customer data on the provider's servers. This is actually a positive potential, but there will be a lot of security / confidentiality / privacy implications the industry needs to deal with.

Posted by: Zoli Erdos | Nov 2, 2006 11:14:53 AM

You use the line:

"The simply act of doing that will garner even more trust."

It should likely be:

"The simple act of doing that will garner even more trust."

Also, you might want to include a general tips@email email address so people can send you tips.

Posted by: Bill | Nov 2, 2006 11:45:31 AM

When I first started tinkering with setting up a blog, I wondered why blog storage, blog editing, and blog publishing all have to be the same system... You can already get a local application for editing blog posts; now I wish that the object store and the rendering layer were separate too...

Posted by: joshua schachter | Nov 2, 2006 11:46:21 AM

Fred,

There are several options with TypePad for saving a local copy of your blog to your own drive (not server). You can export a copy of your blog and save it to disk, which is the easiest way to back up. There are two drawbacks to this method: one is that you need to do it every time you post if you want to have the full archive, and the second is that it exports the entire blog as one file.

A better way to handle backing up a TypePad blog is to use ecto. You can set ecto to download your posts and store them locally on your hard drive. I don't always use ecto to post to my blog, so periodically I hit the refresh button and it checks the local copy against the one on Typepad's servers. Ecto automatically downloads any posts that aren't in the local database and gives me the option of replacing local files with the ones on the TP server if it finds they've been edited or changed.

As for email, I didn't realize that Gmail offered POP… thanks for pointing that out. I've been using mail.app for years because I like having my mail available when I'm offline and having local archiving capabilities. On the other hand, I've lost a fair amount of email a few times when my machine crashed and wiped the database. Using Gmail and Mail together would solve that problem nicely.

I'm not really as worried about privacy as I am about redundant storage and backups.

Posted by: john t unger | Nov 2, 2006 11:54:24 AM

Your desktop, where you work and where your files already reside, is also an option. At WiredReach, we've built peer to web technology that turns your desktop into a lightweight web server that can be accessed across NATs and firewalls. Your data can be distributed across multiple private machines (most secure) or work in conjunction with multiple 3rd party web apps.

Posted by: Ash Maurya | Nov 2, 2006 11:59:46 AM

I think it is important to separate privacy from backup concerns. All the suggestions mentioned above work for the latter but not the former. If you want to search your email using Gmail then the indexes need to be there, unless the search functionality is local and it stops being a web2.0 solution. Having the index but not the content is not enough to protect your privacy, as you can reconstruct practically the whole thing from the indices anyway.

Posted by: Max | Nov 2, 2006 12:58:14 PM

I think it's worth exploring *why* users want to store their own data.

Is it to have a back-up in case they want (or have to) exit later?

Or a privacy concern? That applies during the active life of an account? Or after you leave? (If you cancel your GMail account, does all your old email get *really* deleted? So if they got a subpoena later they wouldn't have anything to cough up?)

Posted by: Bill Seitz | Nov 2, 2006 2:17:16 PM

Hmm, seems like there would be an interesting business to provide an online backup service, but instead of backing up from the user's hard drive, you back up all their web-resident data via screenscraping (or API's or pop mailboxes where available). Wonder if anyone would pay for that though?

Posted by: Joe Agliozzo | Nov 2, 2006 2:19:24 PM

great post Fred. organizing all the data on my machines is hard enuff. The on-line part makes it even more messy.

Is Ecto better than Mars Edit? I've been using Mars Edit for a few weeks and it seems good.

Posted by: iain | Nov 2, 2006 4:50:44 PM

Fred, Sounds like a great idea for an incubation project. I think the fun part would be when a user wants to have their gmail interact with flickr and salesforce and then track/analyze it with omniture then sell it on ebay and be paid via paypal.

It would be extremely valuable to have a kind of third party neutral arbitrator that had viewing access of files going in and out of all of the aforementioned networks to ensure each 'vendor', for lack of a better term, is living up to their end of the deal. Similar to what a checksum accomplishes, but deeper and not packet based per se.

At least as things are currently, if you lose your gmail data you don't lose your flickr data. That doesn't do anything to quell the privacy concern and that is why I think the arbitrator play is a good one.

Posted by: tomo | Nov 2, 2006 9:56:27 PM

the evolution of computing?

1st wave: mainframe
2nd wave: client/server
3rd wave: personal computer
4th wave: back to client/server (web server/browser)
5th wave: server/server? (fred's suggestion of need for local data maintenance)

Posted by: steve | Nov 3, 2006 9:57:34 AM

I think the best solution would be to store the data in a P2P partitioned and encrypted data store somewhat like Oceanstore (http://oceanstore.cs.berkeley.edu/). The main problem such a system would suffer is a lack of bandwidth. Of course the application provider could cache the data for performance reasons, I wonder if the cache could be seized by authorities? If not that would be an immediate advantage of such a system. Also if our data reside outside of the application provider clutches, the biggest advantage would be that we can revoke a relationship and switch provider.

Posted by: Jean-Francois Noel | Nov 3, 2006 9:57:44 AM

My view is that people can and should take control over their own data, then leverage it individually and en masse for a better "digital deal". No one whose business is based on leveraging the unfettered use of others' data is going to trade their revenues for good karma alone.

Interoperability, transparency, access and representation should be regarded as rights, not privileges, when it comes to user-generated and associated assets. This problem is only going to heat up as the ownership of user-side data becomes even more contested in the face of the need of media to come to grips with slippery concepts such as "return on engagement" a term yet to have been defined, but supposedly central to the coming connected metrics of the overall media environment.

Posted by: Nicholas Givotovsky | Nov 3, 2006 10:25:04 AM

>>>The third option is to save the data on a “dumb storage appliance”.

I used to think this was an option too, but the issue here is, do people trust themselves with their own data? Can they secure their home servers? How will they handle data backup?

Once your personal server gets to 500GB or 1TB, you start to wonder how you're going to deal with all the data, how you are going to back it up. Which takes us back to your second option as a necessary piece of the future puzzle.

Plus, in terms of redundancy, connectivity, and where things are going with virtual servers running on grid networks, you have an extremely stable environment.

I think what needs to happen, is the sites which provide accounts, data services, etc. need to work with a set of standards/specs for working with data structures like social net profiles, and to support standards like MetaWeblog API. Then these sites would need to provide options for their users to select the online storage provider they use, and authorize the storing of your data and media.

This would probably require an additional fee to the user, but it gives them a complete backup of all their data, blog posts, music, etc. This also paves the way for what Marc Canter talks about, portability of one's data/media.

Posted by: Gideon Marken | Nov 3, 2006 11:06:12 AM

Huge problem & opportunity in the enterprise space. Having worked with a lot of banks / brokers / insurance companies over the years, I've found that as soon as an app handles a single piece of client data, their strong preference moves to local storage.

Wouldn't Google Desktop Search be a good model here? You just point it at the local storage folders you want to pick up and it goes from there. Sure, it's an executable and not a web app, but it leverages their IP and still pulls the results back into the browser. More of the web apps companies should create client side apps imo.

Posted by: Bill Davenport | Nov 3, 2006 4:50:03 PM

I think the problem is deeper. Backing up is one thing, but exporting profiles and information and seamlessly moving them between services and apps would be a very useful thing to have. Of course, this would require everyone giving out an API or scraping their stuff, but I think it is doable and would provide a great service to sort of organize all the networking and content (pics, vids, blogs) based tools, sort of a meta networking tool...the question is, how do you make money off of this? I don't think there is a business model...

Richie

Posted by: Rich Hecker | Nov 3, 2006 7:50:30 PM

Fred,

I think the next generation of web applications would have pluggable data storage, something we already see on the desktop. To give an example, you have a single filesystem but multiple applications massaging the videos, photos, etc.

As storage becomes cheap and reliable (no longer a myth!), we would see users demanding that feature. The first movers should be the photo sharing sites -- instead of charging users a fee to recover the infrastructure costs -- have a discount for using S3. We can dream plenty about it!

Indus

Posted by: Indus Khaitan | Nov 7, 2006 1:46:57 PM

Most online sites have an API that lets you grab your data for backing up (and if not, they should). I've written stuff to automatically back up my Wordpress.com blog, etc.

What's missing is software that understands the APIs for a lot of these sites, so that general users could easily create backups.

The first thing I look for in any online web application is import/export.

Posted by: engtech | Nov 7, 2006 3:47:05 PM
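
The "software that understands the APIs for a lot of these sites" idea from the comment above could be organized as a registry of per-site exporters behind one `backup_all` call. This is a rough sketch only: the `exporter` decorator, the site names, and the returned records are all invented for illustration, and each placeholder function stands where a call to a real site's API would go.

```python
from typing import Callable, Dict, List

# Maps a site name to a function that returns that site's exported records.
EXPORTERS: Dict[str, Callable[[], List[dict]]] = {}


def exporter(site: str):
    """Register a function as the exporter for one site (hypothetical design)."""
    def register(fn):
        EXPORTERS[site] = fn
        return fn
    return register


@exporter("blog")
def export_blog() -> List[dict]:
    # Placeholder data; a real exporter would call the blog host's API.
    return [{"title": "Data Portability"}]


@exporter("photos")
def export_photos() -> List[dict]:
    # Placeholder data; a real exporter would page through a photo API.
    return [{"file": "a.jpg"}, {"file": "b.jpg"}]


def backup_all() -> Dict[str, List[dict]]:
    """Run every registered exporter and collect the results by site."""
    return {site: fn() for site, fn in EXPORTERS.items()}


if __name__ == "__main__":
    for site, records in backup_all().items():
        print(site, len(records), "records")
```

Supporting a new site then means writing one more decorated function; the user-facing backup command does not change.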
