Sunday, February 15, 2009

Google App Engine 2, Conclusion of Foray #1 into the cloud.

In the previous post I mentioned several limitations of Google App Engine. In the time that's passed since then (less than a week), Google has announced that several of these restrictions are no more with the release of 1.1.9. Specifically:
  • It is now permissible to use urllib, urllib2, or httplib to make HTTP requests (previously users were restricted to urlfetch). Python programmers will be familiar with these modules and will welcome the change; they won't have to revise modules that use urllib2, as I did with my Blip API wrapper.
  • The dreaded 10-second deadline for a request has been expanded to 30 seconds. While it's still not good form (actually, it's horrible form) to keep a user waiting for 30 seconds, this prevents errors if a website or API you are querying behind the scenes is slow.
  • No more 'high CPU request' warnings. Note that just as George Carlin once observed that buying a 'safe car' doesn't excuse you from the responsibility of learning how to drive, this is not Google's way of saying 'to hell with everything, write wasteful code now'.
  • The old 1MB limit on requests and responses was raised to 10MB.
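Requests can still time out, of course, so it's worth passing an explicit timeout when using these modules. A minimal sketch of the sort of call that's now allowed (the URL and User-Agent string are my own placeholders, and I'm showing Python's urllib.request, the modern descendant of the urllib2 module mentioned above):

```python
import urllib.request  # plays the role urllib2 did back then

def build_request(url):
    """Build a plain HTTP request, the way urllib2 users would expect to."""
    return urllib.request.Request(url, headers={"User-Agent": "blip-demo/0.1"})

# The actual fetch is a network call, so it's commented out here:
# with urllib.request.urlopen(build_request(url), timeout=5) as resp:
#     body = resp.read()
```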
The take-away point here is that Google listens to user feedback (up to a point: Ruby/PHP/etc users can still suck it, as far as Google App Engine goes), which is encouraging to those investing time and effort in learning the platform.

Unfortunately, I ran into some other issues with my application. My restructuring (using naive, hand-rolled Javascript, because I wasn't yet familiar with jQuery or the like) led to something much more robust, and the addition of a simple progress bar made waiting for Blip to respond more tolerable. In testing, however, the API failed on a call to pull back certain users' 'blips'. Further investigation revealed that the Blip API was returning a 500 error when certain characters were present in the blip's string.
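The real fix here has to happen on Blip's side, but as a general matter, percent-encoding your query parameters on the way out guards against a whole class of special-character problems. A quick illustration (the parameter names are made up, not Blip's actual API):

```python
from urllib.parse import urlencode, parse_qs

def blip_query(params):
    """Percent-encode query parameters so characters like '&', '#',
    or quotes can't mangle the request URL."""
    return urlencode(params)

qs = blip_query({"username": "some dj", "title": 'Track "A" & B #1'})
# parse_qs(qs) round-trips the original values intact
```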

The good people of Blip were, as has always been my experience, quick to respond, and a fix is on the way, though it's not in place yet. As I mentioned in the previous post, the API is still in private beta, so this is more a 'shame on me' matter. But as also mentioned in the previous post, this exercise is mostly an excuse to play around with Python and Google App Engine, learn more about both, and generally keep the brain from freezing, and as far as that goes, success was had. We'll revisit the app once Blip.fm has a fix in place.

For now, some good resources I've found for learning about the Google App Engine follow.

Web App Wednesday - Michael Bernstein puts out a new web app, plus the code, every week.

Giftag - BestBuy used Google App Engine to put together Giftag, a gift registry add-on for Firefox and Explorer. The blog is a good source of GAE info.

App Engine Fan - This guy has been experimenting with GAE since it was first released, recording the results of his efforts in this blog.

App Engine Samples - code samples from Google itself.

Monday, February 9, 2009

Google App Engine, or How I learned to Stop Worrying and Love Javascript (Part I)

It has been over a week since my last post, so things are not exactly getting off to a rip-roaring start here. Rather than dwell on the past, though, here's an entry on some experimenting I've done recently with Google App Engine.

There is much hype about 'the cloud', and like a lot of hype, most of it is not necessarily worth the paper it is or isn't printed on. However, as a programmer and not a computer guy (to some of you, this will make sense), the idea of abstracting away the server provisioning process is not without appeal. As a tightwad, the idea of 'only paying for what you use' is not without appeal. Finally, as an extreme tightwad, the idea of free (which Google App Engine is, up to 10 apps) sold me. I'm not starting a business here, I'm just getting my feet wet.

As far as what to do: I am a huge fan of the website blip.fm (the twitter-length pitch: 'It's like twitter, but for music'), and not long ago they released an API. It is currently in private beta, where it has been for a while now. At any rate, I got my keys, re-wrote the sample PHP wrapper for the API in Python, and I was ready to go. At least I thought I was.
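The wrapper itself is nothing fancy. A stripped-down sketch of the general shape (the base URL, method name, and parameters below are placeholders of my own, not Blip's real private-beta API):

```python
from urllib.parse import urlencode

class BlipClient:
    """Toy API wrapper: builds signed request URLs; actual fetching omitted."""

    BASE = "http://api.example.com"  # placeholder, not Blip's real endpoint

    def __init__(self, api_key):
        self.api_key = api_key

    def url_for(self, method, **params):
        # Every call carries the API key; sort params for a stable URL.
        params["apiKey"] = self.api_key
        return "%s/%s?%s" % (self.BASE, method, urlencode(sorted(params.items())))
```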

At this point I would like to praise the Google App Engine Launcher for Mac OS X (and, indirectly, praise Mac OS X). It was put together by John Grabowski at Google in his 20% time, so apparently 20% time is not a Google urban legend, unlike the one about Sergey or Larry sometimes giving an underperformer a brand-new Prius on the sole condition that they 'drive away, far away'. In this case the Launcher made it quick and simple to get something up and running on the development server. It's very intuitive, and the interactive console is handy.

Not so fast, pal

After a bit of hacking I had something fairly simple up and running: it would grab info for a user via the Blip API, then use an intensity map to show the global distribution of that user's 'listeners' and 'favorites'. For this I used the lovely Visualization API from Google. It also showed whom you are following who aren't following you back, essential info not easily obtainable via Blip's website.
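That last bit is just a set difference once you have the two lists in hand. Roughly (the names and data are illustrative):

```python
def not_following_back(following, listeners):
    """People you follow who don't follow (i.e., 'listen to') you back."""
    return sorted(set(following) - set(listeners))

not_following_back(["amy", "bob", "cal"], ["bob", "dee"])  # -> ['amy', 'cal']
```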

The problem I encountered right off the bat was that queries against the API take time, and there currently isn't a good way to fetch just the subset of info you want. On the development server, requests could take a while, especially for users with thousands of listeners (such people exist).

Curious as to how I'd fare on the real thing, I deployed (a one-click operation w/ the GAE Launcher) the app to appspot.com. At this point all hell broke loose.

Read the fine print

It turns out that, amongst the other limitations of GAE you may have heard about (no background/batch processes, no mischief with sockets, etc.), there are a couple of tight limitations I should have looked into before deploying. (For more immature Beavis and Butthead laffs, check out the Google App Engine Backup and Restore tool, aka GAEBAR.)
  • If a call to urlfetch takes more than 5 seconds, you lose, it times out.
  • If your request takes more than 10 seconds, you lose: 'DeadlineExceededError'.
(It is worth noting here that according to this release from Google today, some of these limitations will vanish in the next 6 months).

In addition, there is a limit to the CPU that can be consumed by a single request. So even if you can fetch your data quickly, if you do too much crunching per request you are going to violate a quota and start getting errors. (The specifics of this limit: a 'high CPU request' is one consuming more than 0.84 CPU seconds, with the CPU in this case being a 1.2 GHz Intel x86. You are allowed 2 of these per minute.)

Thus, I needed to factor in that there'd be a lot of retries going on, but I would not be doing those retries within a request. The inescapable conclusion was that I'd have to use the back end as a simple data store, and rely on Javascript on the front end to handle the retries, putting the pieces together, boiling down the data, and pumping out the results. Also, I'd obviously need a progress indicator to keep the poor end-user updated as things proceeded, rather than leave them hanging.
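In outline, the back end's job shrinks to serving small, fast chunks that the front end can poll for until it has everything. A sketch of that idea (the field names and page size are my own invention, not anything Blip- or GAE-specific):

```python
import json

def page_of(items, offset, limit=50):
    """One small slice of the result set, plus a 'next' offset telling the
    Javascript front end where to resume (or None when it's done). Each
    call stays comfortably inside the request deadline."""
    chunk = items[offset:offset + limit]
    nxt = offset + limit if offset + limit < len(items) else None
    return json.dumps({"items": chunk, "next": nxt})
```

The front end then just loops: request a page, render it, and if 'next' isn't null, request the next one, updating the progress bar as it goes.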

At this point I will leave you, the reader, hanging, until next time when we get into the Javascript side of things in Part II.