Posted by knorby on February 25, 2010 under Python, app engine, facebook, fortune, google, internet, privacy, wave |
I started writing this post while Google Wave was still pretty new, but it has been out for a while and half forgotten. It is still in closed preview, but it shouldn’t be hard to find an invite if you want to check it out. As I mentioned in my last post on wave, I wrote a quick fortune bot for wave. The bot got a decent bit of use at first, as many people played around, but now use has dropped to almost nothing. Based on my own use, I figured early on that most of the use was from 1 or 2 real people interacting with a bunch of bots. I tested and confirmed that with the data google records by default.
Google App Engine, on which all bots must be hosted, by default logs every request and every error. A bot can register for a number of different events, each of which will trigger a request to the bot. The request contains the state of the wave in JSON format. The log files can easily be downloaded, and the JSON easily parsed. From that, you see everything. You see the addresses of everyone, and you see what has been entered, even if it doesn’t relate to the events of the bot. As far as I am aware, no TOS or privacy agreement exists that covers the use of this data, and even if one did, the most nefarious uses would still be silent.
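To give a feel for how little work this takes, here is a minimal sketch. The payload below is made up for illustration (the field names are hypothetical, not the actual wave format), so the code just walks whatever JSON it is given and collects anything that looks like an address.

```python
import json
import re

# Loose pattern for things that look like user@host addresses.
ADDRESS = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def collect_addresses(node, found=None):
    """Recursively walk parsed JSON and gather every address-like string."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            collect_addresses(key, found)
            collect_addresses(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_addresses(item, found)
    elif isinstance(node, str):
        found.update(ADDRESS.findall(node))
    return found


# Hypothetical logged request body; the real structure will differ.
sample = json.loads("""
{
  "participants": ["alice@example.com", "bob@example.com"],
  "blips": [{"creator": "alice@example.com", "content": "private notes"}]
}
""")

print(sorted(collect_addresses(sample)))
```

Nothing here depends on which events the bot registered for; whatever arrives in the request ends up in the log, and the log is just text.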
By putting data on any web app, you put yourself up to the same risks and invasions. The google ads in gmail are targeted at you for a reason after all. If you are using gmail though, it is a safe belief that google will be the only one other than you to see your data. A bot could be maintained by anyone. Facebook apps are a decent comparison. I have looked at the API a couple times, but my understanding is even with the permissions a user can grant or deny, apps get to see a lot. A fair bit of criticism has been made of this platform, but it is very safe to say the privacy structure in place on bots is much worse. Aside from the lack of permission controls, would you use something like facebook apps on your e-mail or google docs (to the extent that makes sense…)? I hope not.
A wave user has a somewhat unique problem here. If a bot provides a useful service for a particular use, and the wave for this use is private, should you use it? That isn’t a question anyone should have to ask. The question of “put this data in this web app or not” is one thing, but you shouldn’t have to worry about using a pivot tables tool on an online spreadsheet, which is essentially what is going on with bots here. There isn’t really a way to distinguish a good bot from a bad one either. If I wanted to snoop on people on wave, I would write a useful bot, and no one, google included, would be the wiser to what I was doing with the collected data.
I don’t think there is an easy way to fix bots as they are. Anonymous search results aren’t really that anonymous, and I would guess wave data would be much worse. The problem isn’t that App Engine logs requests; the problem is what wave sends. If you consider the data in a wave in any way private, I would recommend against using bots.
Posted by knorby on November 10, 2009 under Python, coding, doit, google, internet, wave |
Silly:
- fortune/doit – Implemented. See Wave Fortune. You can use it by adding wavefortune@appspot.com to your contacts. I mostly made this bot to satisfy my fortune lust, and to get more familiar with app engine and the wave bot api.
- wumpus/adventure – Not sure I am actually going to do this one. If I do, it will be wumpus. Basically, the problem to solve is effectively storing state for such games. Wumpus is tiny, and the games are short, so it wouldn’t take much thinking. Adventure/zork would require a lot more work, and I honestly don’t care that much.
Tools:
- logging interface – It occurs to me that wave might work great in a situation where I think e-mail falls short now: data/msg dumps. I see this sort of thing at my jobs a lot. I get log messages I generally don’t care about, and I filter them out, and as a result I sometimes miss something. A similar case is something like a bug tracker, where so many replies can be generated that the thread is easy to ignore. Centralization would help a lot I think, but again, I am not sure I care.
- RPN calculator - Nothing really to explain here. Could save the calculator’s state in past blips, and make them editable. The end result would be a collaborative calculator of sorts. Could be interesting.
- something with jMol – Not too much thought here. When I was a student in the Computational Material Science group at ORNL, I ended up playing with jMol a bit from javascript. Some sort of gadget/bot combo could do some interesting stuff, but again, I don’t care.
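For the RPN calculator idea, the state-per-blip part could look roughly like the sketch below. It has nothing to do with the wave API; `rpn_step` and `history` are names made up here, with each entry in `history` standing in for the stack that would be stored alongside one blip.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def rpn_step(stack, line):
    """Apply one line of RPN input to a stack and return the new stack."""
    stack = list(stack)  # copy, so the earlier state stays intact
    for token in line.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError("not enough operands for %r" % token)
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    return stack


# One saved stack per "blip"; editing an old blip would mean replaying
# every step after it.
history = [[]]
for line in ["3 4", "+", "2 *"]:
    history.append(rpn_step(history[-1], line))

print(history[-1])  # [14.0]
```

Because each step returns a fresh stack, earlier states are never modified, which is what would make past blips editable.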
I will post more about my thoughts on wave later on, as I have many mixed thoughts on it. Google has a lot to do, both on wave itself and extensions that they should provide. I am hesitant to work on large projects, as I don’t want to have google copy my work, or experience some odd situation with app engine. I don’t think anyone, google included, has any remote idea of what to expect from wave yet.
Posted by knorby on June 9, 2009 under Android, Python, doit, fortune, google, humor |
I downloaded the Android Scripting Environment (ASE) on my G1 last night (oh yeah, I got a G1), which adds Python and Lua functionality in a basic way. Along with the Text-To-Speech library, I hacked up a quick script to play back random DOITs. The slow, dull computerized British voice inspired me to make a fortune library out of Walter’s My Secret Life, the secret sex diary of a British Victorian gentleman. Though I first found out about this book’s existence in actual book form, it is online. Anyway, I hacked up a quick script to scrape that site into a fortune library, which you can find here. For copyright reasons, the script is the only thing I am putting up. The script is written in python with lxml (you may need BeautifulSoup as well). It is horribly documented, but you should just need to run the script. To make it a library you can use with fortune, you need to run strfile on it, which the script should do if strfile is in the path. The book is 12 volumes, and almost every paragraph is lurid. The script is hacked together, so there might be some problems, but I haven’t found any glaring ones yet. By the way, the playback is fantastic.
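For anyone who hasn't built a fortune library before, the last step is simple. This is a minimal sketch rather than the actual script: a fortune file is plain text with a line containing only `%` between entries, and strfile builds the index next to it. The function names are made up for illustration.

```python
import shutil
import subprocess


def to_fortune_text(entries):
    """Join entries in the fortune text format: a lone '%' line between entries."""
    cleaned = [entry.strip() for entry in entries if entry.strip()]
    return "\n%\n".join(cleaned) + "\n"


def write_fortune_file(entries, path):
    """Write the entries to path and index them with strfile if it is available.

    Returns True if strfile was found and run, False otherwise.
    """
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_fortune_text(entries))
    strfile = shutil.which("strfile")
    if strfile is None:
        return False  # no strfile in the path; index it by hand later
    subprocess.run([strfile, path], check=True)  # writes path + ".dat"
    return True
```

If strfile is missing, you still get the text file and can run strfile on it yourself later.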
Posted by knorby on December 1, 2008 under Python, internet |

I was in need of a break from work, so I finally got around to implementing DOIT of the Day on twitter! For those not familiar with DOIT jokes, they are a set of jokes from USENET, or even before that. A while ago, I started maintaining a fortune database, which I wrote about here. I haven’t gotten around to packaging it up for various Linux distros yet, but I will eventually. So, you can now get your daily fill of DOIT jokes from twitter. Non-twitters can DOIT with RSS.
I wrote a couple of python scripts/libraries: one to auto-follow followers (CPAN was having some problems, so I didn’t use a perl script someone else had already written) and a fortune->twitter script. I will clean those up at some point soon and put them up somewhere.
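The fortune side of such a script is short. Here is a minimal sketch, not the cleaned-up code: it reads the plain-text fortune format (entries separated by a lone `%` line) and picks a random entry that fits in a 140-character tweet. Posting to twitter is left out, and the function names are made up for illustration.

```python
import random


def parse_fortunes(text):
    """Split fortune-format text into a list of entries."""
    entries = []
    current = []
    # The extra "%" at the end flushes the final entry.
    for line in text.splitlines() + ["%"]:
        if line.strip() == "%":
            entry = "\n".join(current).strip()
            if entry:
                entries.append(entry)
            current = []
        else:
            current.append(line)
    return entries


def pick_tweetable(entries, limit=140, rng=random):
    """Pick a random entry that is no longer than limit characters."""
    fitting = [entry for entry in entries if len(entry) <= limit]
    if not fitting:
        raise ValueError("no entry fits in %d characters" % limit)
    return rng.choice(fitting)


# Placeholder entries, not real jokes.
sample = "short one\n%\n" + "x" * 200 + "\n%\nanother\n"
print(pick_tweetable(parse_fortunes(sample)))
```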
Posted by knorby on November 29, 2008 under IT, Linux, Python, Solaris, coding, shell scripting |
I am mostly jotting this down so I can work on this later and to see if anyone has any suggestions.
I was thinking last night that I need to set up my own personal environment on new systems quite a bit, and that isn’t going to change anytime soon. I do not want to work off of something too centralized, as I really don’t have that option. I need to be able to maintain a setup on my home machines, various UChicago machines, and various other machines. In some cases, I just need a work environment for a short period, such as on maclab machines (although I usually just ssh into one of the linux cluster machines and use X11 forwarding to load up XEmacs GUI goodness). sshfs isn’t an option, as it sucks, and FUSE isn’t always installed everywhere (for good reason). Subversion might serve nicely, but I can’t assume that it is installed, as it often isn’t; I tend to think Subversion or other repository systems shouldn’t be used for much beyond software development. I also need to worry about various differences in systems. I can’t always install software to the system, only to my home directory, and various UNIX flavors have their own quirks, especially Solaris. So what I want is an initializing setup script that downloads and extracts a basic environment from some central server. Everything in this set of scripts should be sectionalized. There should be some decent metadata format (probably some XML format) to store information about these sections and about which sections are installed. There should be some update system on top of that. In the case that a package management system is available, the system should be able to use it, and as a fallback, download and install a few specific packages into my home directory. Things like python would use already existing systems for setting this sort of thing up. Given that the system would assume almost nothing, most of the initial processing would probably need to happen on the server side. Other than that, the only software that the system would assume would be ssh, bash, and tar (maybe).
This thing will take time, but I think it would be useful for a long time to come.
Posted by knorby on November 26, 2008 under Python, google, internet |
After reading the comments on a story on reddit on IQs, I became curious about how IQs are reported on the internet. A few people were saying that when they see someone mention their IQ on the internet, it is usually above 130. The explanations given were along the lines of people lying, biased online tests, and segmentation in where people browse. I was curious how frequently the different IQs are mentioned, so I wrote up a little python to get the google search result counts for IQs 50-199 (I would have included lower values after seeing the result, but I chose to go the scraping route rather than gdata, which ends up getting you blocked by google, something I didn’t know). I ran each number with the word “iq”; I think there may be better queries, but simple seemed good enough. Here are the results, plotted with matplotlib:

I found these kind of surprising. Most of the result counts were around 6 million, but there were a few sharp drops. I was especially surprised by 100 and 130, since, if memory serves, 100 is the 50th percentile for IQs and 130 is roughly the 98th; I would expect a greater count on these two, since more sites would include those numbers while explaining the scale; instead, there are large drops. Weird. I don’t think there is any connection between these results and anything proposed on reddit either.
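As a quick check on those percentiles, here is a small sketch. It assumes the common scaling where IQ scores are normally distributed with mean 100 and standard deviation 15 (some tests use a different deviation), and it needs Python 3.8 or later for `statistics.NormalDist`.

```python
from statistics import NormalDist

# Assumed scale: mean 100, standard deviation 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_percentile(score):
    """Percentage of the population at or below the given score."""
    return 100 * IQ_SCALE.cdf(score)


for score in (100, 130):
    print(score, round(iq_percentile(score), 1))
```

Under that assumption, 100 lands at the 50th percentile and 130 at roughly the 98th.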
Posted by knorby on October 23, 2008 under Python, coding, javascript |
I have always enjoyed putting as much work as possible into a line, especially in higher level languages. In the crazy javascript (speaking of which, MochiKit 1.4 was finally released!) system I wrote (still need to put the final touches on that….), I was thrilled when I was able to combine all of the parts of my system into one line. Since I mainly program in python, list comprehensions and generator expressions make it pretty easy to use one liners a lot, and at least with generators, it often means the result is efficient too. Basically, what I am trying to say is that I love the one liner. Last night, while I was bored doing my discrete homework, I came up with a memoized factorial function one liner. The standard, naive version is straightforward:
fac = lambda n: int(n==0) or n*fac(n-1)
Since I was bored, I wanted to see if I could memoize that expression, and still keep it in one line. Here is what I came up with:
fac_dict, fac = {}, lambda x: ((x in fac_dict or fac_dict.update({x:(lambda n: int(n==0) or fac(n-1)*n)(x)})) and False) or fac_dict[x]
which can be simplified slightly to:
fac_dict, fac = {0:1}, lambda x: ((x in fac_dict or fac_dict.update({x:(lambda n: fac(n-1)*n)(x)})) and False) or fac_dict[x]
There are still issues with recursion depth for large values, but a helper function could probably solve that.
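For comparison, here is one way to sidestep the recursion limit. It is no longer a one liner, and it swaps the recursion for a loop: the cache is filled from the largest value already stored up to the one requested. A sketch, not the only way to do it.

```python
fac_dict = {0: 1}  # cache; always holds every key from 0 to len(fac_dict) - 1


def fac(n):
    """Memoized factorial that fills the cache iteratively instead of recursing."""
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    # Fill in any missing entries, starting just past the largest cached key.
    for k in range(len(fac_dict), n + 1):
        fac_dict[k] = fac_dict[k - 1] * k
    return fac_dict[n]


print(fac(5))  # 120
print(len(str(fac(3000))))  # digits in 3000!, computed without deep recursion
```

The trade-off is memory: every intermediate factorial stays in the dictionary, just as it does with the one liners above.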
I needed to write something for my blog, as I have gotten out of the habit, and this thing seemed as good as anything.
Posted by knorby on May 24, 2008 under ACM, Chicago, GSoC, Python, globus, google, personal, uchicago |
The ACM (just Borja really) organized a trip to Google Chicago, where all of the Google Summer of Code students accepted from UChicago (and in the US), me included, gave lightning talks on our projects. The other GSoC students were Marcus Westin, Jordon Lewis, and Nick Edds. I put up my talk, as well as a more general page for my project on my CS site. Marcus and I both have projects with the Globus Alliance, so I was quite happy that he went before me, as I didn’t have to explain what Globus is. My project is fairly straightforward to explain and I still don’t know the Globus Toolkit (GT) that well, so I couldn’t answer too many questions, and I ended up going under time. Everyone seemed most interested in Nick’s project, since it is on the 2to3 tool in python, and a decent amount of the audience used Python, some with a great deal of dedication (it was at Google after all). I am pretty excited to see how Nick’s project turns out; we both went to the talk that his mentor, Collin Winter, gave at PyCon on the tool and the issues that Nick is working to fix.
The Chicago office’s engineering crew is dominated by subversion developers (in the small selection of software I like), but most of the presentations were about mostly unrelated projects. Ben Collins-Sussman discussed a VM for interactive fiction games like zork (I’ll still play my zork on the SDF TWENEX Machine; the version of zork installed is from 1981!). Karl Fogel, not a current Google developer, but a subversion developer and good friend of the other googlers, gave a talk on a script he wrote to help track patches from non-core developers based on logs. He put up some stats on the differences between subversion and GNU Emacs as projects; it further strengthened my reasoning for using XEmacs. The night before, I went to a Russian choir concert, as I had to go to a concert from a genre I don’t have any familiarity with, and he was apparently in it; what a small world I live in. Brian Fitzpatrick gave a shortened version of the keynote he gave at PyCon on balancing functional complexity with usability in software. Like all the other talks I have heard him give, it was an excellent talk; he makes some of the best use of slide shows I have seen, and I always end up thinking about the talks much later. There was also a talk from a developer for Blogger (he said he was now on feedburner); I would give his name, but I can’t remember it at the moment. I talked to him for a bit; I think my social awkwardness was in full swing at the time. I asked him about something I read on Valleywag about Google adding some preference in search rankings for Blogger (I can’t find the post at the moment; I will link to it if I do); as I am sure is the case, he said that Google does no such evil. He also mentioned that Google crawls its own sites with the same bot, which makes sense, but I hadn’t thought about it before. I wish I knew Blogger better, as I used it once for something else and had a couple thoughts about its workings.
It was a fairly awesome evening. I was very sleep deprived after one of my harder weeks here, so I was definitely in a strange state for the entirety of the thing. My thanks and appreciation go out to Borja and Google for this event. Apparently, my glorious face might end up on the GSoC blog or the Google open source blog.