Serious Privacy Problems with Bots on Google Wave

Posted by knorby on February 25, 2010 under Python, app engine, facebook, fortune, google, internet, privacy, wave | Be the First to Comment

I started writing this post while Google Wave was still pretty new, but it has been out for a while and is half forgotten. It is still in closed preview, but it shouldn’t be hard to find an invite if you want to check it out. As I mentioned in my last post on wave, I wrote a quick fortune bot for wave. The bot got a decent bit of use at first, as many people played around, but now use has dropped to almost nothing. Based on my own use, I figured early on that most of the use was from 1 or 2 real people interacting with a bunch of bots. I tested and confirmed that with the data google records by default.

Google App Engine, on which all bots must be hosted, by default logs any request and any error. A bot can register for a number of different events, each of which will trigger a request to the bot. The request contains the state of the wave in JSON format. The log files can easily be downloaded, and the JSON easily parsed. From that, you see everything. You see the addresses of everyone, and you see what has been entered, even if it doesn’t relate to the events of the bot. As far as I am aware, no TOS or privacy agreement exists that covers the use of this data, and even if one did, the most nefarious uses would still be silent.
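To make the point concrete, here is a minimal Python sketch of how little work it takes to pull addresses and text out of one logged request body. The field names (`participants`, `blips`, `creator`, `content`) and the sample payload are illustrative assumptions, not the actual wave schema; the real JSON is laid out differently, but the idea is the same.

```python
import json

# Hypothetical payload: the field names and values below are assumptions
# made up for illustration, not the real wave JSON schema.
SAMPLE_LOGGED_BODY = json.dumps({
    "participants": ["alice@example.com", "bob@example.com"],
    "blips": [
        {"creator": "alice@example.com", "content": "private note"},
        {"creator": "bob@example.com", "content": "another note"},
    ],
})


def summarize_logged_request(body):
    """Return (addresses, texts) found in one logged request body."""
    state = json.loads(body)
    # Every participant address is visible, whether or not they used the bot.
    addresses = sorted(set(state.get("participants", [])))
    # So is everything that was typed, paired with who typed it.
    texts = [(blip.get("creator"), blip.get("content", ""))
             for blip in state.get("blips", [])]
    return addresses, texts


addresses, texts = summarize_logged_request(SAMPLE_LOGGED_BODY)
print(addresses)
print(texts)
```

Nothing here needs special access: it is just `json.loads` run over whatever the logs already contain.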

By putting data on any web app, you open yourself up to the same risks and invasions. The google ads in gmail are targeted at you for a reason after all. If you are using gmail though, it is a safe bet that google will be the only one other than you to see your data. A bot could be maintained by anyone. Facebook apps are a decent comparison. I have looked at the API a couple of times, and my understanding is that even with the permissions a user can grant or deny, apps get to see a lot. A fair bit of criticism has been made of this platform, but it is very safe to say the privacy structure in place on bots is much worse. Aside from the lack of permission controls, would you use something like facebook apps on your e-mail or google docs (to the extent that makes sense…)? I hope not.

A wave user has a somewhat unique problem here. If a bot provides a useful service for a particular use, and the wave for this use is private, should you use it? That isn’t a question anyone should have to ask. The question of “put this data in this web app or not” is one thing, but you shouldn’t have to worry about using a pivot tables tool on an online spreadsheet, which is essentially what is going on with bots here. There isn’t really a way to distinguish a good bot from a bad one either. If I wanted to snoop on people on wave, I would write a useful bot, and no one, google included, would be the wiser to what I was doing with the collected data.

I don’t think there is an easy way to fix bots as they are. Anonymous search results aren’t really that anonymous, and I would guess wave data would be much worse. The problem isn’t that App Engine logs requests; the problem is what wave sends. If you consider the data in a wave in any way private, I would recommend against using bots.

My Project Ideas for Google Wave

Posted by knorby on November 10, 2009 under Python, coding, doit, google, internet, wave | 3 Comments to Read

Silly:

  • fortune/doit – Implemented. See Wave Fortune. You can use it by adding wavefortune@appspot.com to your contacts. I mostly made this bot to satisfy my fortune lust, and to get more familiar with app engine and the wave bot api.
  • wumpus/adventure – Not sure I am actually going to do this one. If I do, it will be the wumpus. Basically, the problem to solve is effectively storing state for such games. Wumpus is tiny, and the games are short, so it wouldn’t take much thinking. Adventure/zork would require a lot more work, and I honestly don’t care that much.

Tools:

  • logging interface – It occurs to me that wave might work great in a situation where I think e-mail falls short now: data/msg dumps. I see this sort of thing at my jobs a lot. I get log messages I generally don’t care about, so I filter them out, and as a result I sometimes miss something. A similar case is something like a bug tracker, where so many replies can be generated that the thread is easy to ignore. Centralization would help a lot I think, but again, I am not sure I care.
  • RPN calculator – Nothing really to explain here. Could save the calculator’s state in past blips, and make them editable. The end result would be a collaborative calculator of sorts. Could be interesting.
  • something with jMol – Not too much thought here. When I was a student in the Computational Material Science group at ORNL, I ended up playing with jMol a bit from javascript. Some sort of gadget/bot combo could do some interesting stuff, but again, I don’t care.
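The RPN calculator idea can be sketched in a few lines of Python. This is only a sketch under assumptions: each blip is modeled as a plain string of space-separated tokens, and the function names (`rpn_step`, `replay`) are made up here. No wave API is involved. The calculator state is simply recomputed by replaying every blip in order, so editing an earlier blip changes the result.

```python
import operator

# Supported binary operators for this sketch.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def rpn_step(stack, line):
    """Apply one line of RPN tokens to a copy of the stack; return the new stack."""
    stack = list(stack)
    for token in line.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError("not enough operands for %r" % token)
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            # Anything that is not an operator is treated as a number.
            stack.append(float(token))
    return stack


def replay(blips):
    """Recompute the calculator state from the text of every blip, in order."""
    stack = []
    for text in blips:
        stack = rpn_step(stack, text)
    return stack


print(replay(["3 4", "+", "2 *"]))  # (3 + 4) * 2
print(replay(["3 5", "+", "2 *"]))  # same wave after someone edits the first blip
```

Replaying from scratch is wasteful for long waves, but it keeps the state entirely in the blips, which is the point of the idea.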

I will post more about my thoughts on wave later on, as I have many mixed thoughts on it. Google has a lot to do, both on wave itself and extensions that they should provide. I am hesitant to work on large projects, as I don’t want to have google copy my work, or experience some odd situation with app engine. I don’t think anyone, google included, has any remote idea of what to expect from wave yet.

More Thoughts On Twitter

Posted by knorby on March 15, 2009 under internet, twitter | Be the First to Comment

I read an interview with Stephen Fry on the BBC, which seemed to respond nicely to a few of my problems with twitter that I posted about, at least for some. He points out that, on the whole, media really don’t like twitter that much. If he wants to make an announcement, he just does it, and people find it; he doesn’t have to go through a swarm of interviews, which is why the media hates it. I have to admit, I am a lot more interested in public figures as a result of twitter than I ever was before, and if I did care, I probably would stick to twitter rather than the more traditional infotainment channels.

I suppose if I cared more about spoken language, writing, etc… I could appreciate how 140 or so characters was an interesting change to language. I guess my problem is really understanding why it matters in the first place. Much bothers me about SMS (more to do with price gouging than anything else), so it might just be my distrust of anything based off of SMS.

DOIT is now on twitter!

Posted by knorby on December 1, 2008 under Python, internet | Be the First to Comment

I was in need of a break from work, so I finally got around to implementing DOIT of the Day on twitter! For those not familiar with DOIT jokes, they are a set of jokes from USENET, or even before that. A while ago, I started maintaining a fortune database, which I wrote about here. I haven’t gotten around to packaging it up for various Linux distros yet, but I will eventually. So, you can now get your daily fill of DOIT jokes from twitter. Non-twitters can DOIT with RSS.

I wrote a couple of python scripts/libraries to auto-follow followers (CPAN was having some problems, so I didn’t use a perl script someone else had already written) and a fortune->twitter script. I will clean those up at some point soon and put them up somewhere.
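Until the real scripts are posted, here is a rough Python sketch of the fortune half of a fortune->twitter script. It is not the actual code: the function names are made up, the entries in the usage lines are placeholders, and the posting step is left out entirely since it depends on whichever twitter client is used. It only relies on the standard fortune file layout, where entries are separated by a line containing a single `%`.

```python
import random

TWEET_LIMIT = 140  # character limit for a tweet


def parse_fortunes(text):
    """Split the contents of a fortune file into a list of entries."""
    entries = []
    current = []
    for line in text.splitlines():
        if line.strip() == "%":
            entry = "\n".join(current).strip()
            if entry:
                entries.append(entry)
            current = []
        else:
            current.append(line)
    # Handle a final entry that is not followed by a "%" line.
    last = "\n".join(current).strip()
    if last:
        entries.append(last)
    return entries


def pick_tweetable(entries, rng=random):
    """Pick a random entry that fits within the tweet limit."""
    candidates = [entry for entry in entries if len(entry) <= TWEET_LIMIT]
    if not candidates:
        raise ValueError("no entry fits in a tweet")
    return rng.choice(candidates)


sample = "first placeholder entry\n%\nsecond placeholder\nentry\n%\n"
print(pick_tweetable(parse_fortunes(sample)))
```

Filtering out long entries rather than truncating them avoids cutting a joke off before the punchline.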

IQs and the Internet

Posted by knorby on November 26, 2008 under Python, google, internet | Be the First to Comment

After reading the comments on a story on reddit on IQs, I became curious about how IQs are reported on the internet. A few people were saying that when they see someone mention their IQ on the internet, it is usually above 130. The explanations given were along the lines of people lying, biased online tests, and segmentation in where people browse. I was curious how frequently the different IQs are mentioned, so I wrote up a little python to get the google search result counts for IQs 50-199 (I would have included lower values after seeing the result, but I chose to go the scraping route rather than gdata, which ends up getting you blocked by google, something I didn’t know). I ran each number with the word “iq”; I think there may be better queries, but simple seemed good enough. Here are the results, plotted with matplotlib:

I found these kind of surprising. Most of the result counts were around 6 million, but there were a few sharp drops. I was especially surprised by 100 and 130, since, if memory serves, 100 is the 50th percentile for IQs and 130 is roughly the 98th; I would expect a greater count on these two, since more sites would include those numbers while explaining the scale; instead, there are large drops. Weird. I don’t think there is any connection between these results and anything proposed on reddit either.
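The percentiles are easy to check in Python rather than from memory. This assumes the common scaling of mean 100 and standard deviation 15 (an assumption; some tests use a different deviation) and uses `statistics.NormalDist`, which needs Python 3.8 or newer.

```python
from statistics import NormalDist

# Assumed scale: mean 100, standard deviation 15.
iq_scale = NormalDist(mu=100, sigma=15)


def percentile(iq):
    """Percentage of the population expected to score at or below this IQ."""
    return 100 * iq_scale.cdf(iq)


for score in (100, 130):
    print(score, round(percentile(score), 1))
```

On that scale, 100 sits at the 50th percentile and 130 at about the 97.7th.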

First Sucker on shirt.woot!

Posted by knorby on under internet, personal | Be the First to Comment

I had the great privilege of being the first buyer of today’s woot shirt (called the “first sucker” on woot). It took me all from the time they posted it for me to see it and then buy it. It is just that awesome. You can see the honor on the shirt’s discussion page. Woot!

My Experience with the Netflix Problems

Posted by knorby on August 16, 2008 under humor, internet, media, movies | Be the First to Comment

As some of you may have heard or experienced, Netflix is having lots of problems right now with like every part of their shipment system. After I realized the problem and got the message from Netflix apologizing for it, I have watched the whole thing for its humor value. I shipped in four movies on Monday, which are normally received on Tuesday (city life is awesome). I normally go through movies at a pretty fast rate (it was a summer goal of mine to watch and ship in movies the next mail day–I feel proud), so I first thought it was Netflix fucking around to screw me over. On Wednesday, I got emails for two of the four, but on the website, all of the movies appeared to have been received, as the status on my queue was listed as ‘shipping today.’ Normally, that message is only up for an hour or two; all of the movies I was requesting were common enough that there shouldn’t have been a problem with supply, so when the status stayed that way, I realized something was up. Imagine my surprise when I got Serpico on Thursday, but none of the others that were slated to come! Supposedly, everything is fixed and the last few were sent on Friday (a call to customer service revealed that there are still some more to be sent today). I got the documentary Fuck today (that was one for headphones in the maclab; the movie uses the word ‘fuck’ more than any other movie by a lot: 834 times in 93 minutes), but I am still missing two. The best part of it all is that the website is just totally and completely borked. Serpico still reads as “shipping Wednesday” along with the other two I still haven’t received, although I did receive an e-mail for Serpico on Friday. Fuck (or ‘F**k’ as netflix calls it) is registered correctly, but it was also still in my queue of things yet to be shipped, so I deleted it. I also got an e-mail for it after I had already watched it. We will see what Monday brings….

I am supposed to receive a 15% discount for this month, which comes to about $3.50 on a 4 at a time plan, which is really just exact compensation. I average a little over 20 movies a month, so that means that I am paying around $1 per movie with my rental. One was delayed a day, which I can’t really complain about, but I was out a full watch-return cycle, which by that logic is about $3. With the Olympics, there is plenty to watch, but the failure to provide service is a bigger sting. I was pretty close to canceling before they announced the problem, and I don’t really feel like they have done a whole lot to make me feel better about them.
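For anyone who wants to check the arithmetic, here it is as a few lines of Python. The discount rate, discount amount, and movies per month come from the figures above; the count of three movies lost to the delay is an assumption for illustration, and the monthly price is only what the first two figures imply, not a quoted plan price.

```python
discount_rate = 0.15     # 15% off this month
discount_dollars = 3.50  # what that discount comes to
movies_per_month = 20    # rough monthly average
movies_missed = 3        # assumption: movies that lost a full watch-return cycle

# Monthly price implied by the two discount figures.
implied_monthly_price = discount_dollars / discount_rate
cost_per_movie = implied_monthly_price / movies_per_month
lost_value = movies_missed * cost_per_movie

print(round(implied_monthly_price, 2))
print(round(cost_per_movie, 2))
print(round(lost_value, 2))
```

Under those assumptions the lost value lands right around the size of the discount, which is where the "exact compensation" comes from.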

The problem is rather curious. Seems like some sort of database problem, especially with the website, but they are still managing to ship, just slowly. If the whole system was down, nothing would be coming out, and if it was some problem updating the site, the shipping would presumably still be working fine. They are somewhere between those two states. I suppose the details of the problem will make the news in some form sooner or later, so we will have to wait and find out.

What Happened to Google Street View?

Posted by knorby on July 20, 2008 under Chicago, google, internet, uchicago | 5 Comments to Read

Google Street View Map of Hyde Park. The streets without highlighting cannot be viewed.


I noticed recently that many of the streets in Hyde Park lost Google Street View, notably where my current apartment is. I also noticed that many of the streets had darkened. Is it really necessary to remove the images? There used to be pictures taken inside the quads as well, which are now gone; I thought those might have been removed by request of the university, but I don’t really get why they removed the other ones. If they wanted to update them, fine, but there is no reason to remove images. I suppose it is a free service, so I have no right to complain, but I just think it is screwy when I can see my home in Oak Ridge, but not in Chicago. I did some quick googling, but nothing came up. Any ideas?