
kanorben.net - blog

My personal blog on technology, programming, life, and the random

 


    Digg is the Tabloid of the Internet

    November 6th, 2007 by knorby

    I have always thought that the social dynamics of digg were a little odd, to say the least, but I have never been able to put my finger on why. I don’t quite know why I even browse digg, but I do; at least some of the posted stories are interesting or useful. Digg is one of the few high-traffic sites I have seen where headings like “BREAKING,” “AMAZING,” or some other word in all caps are somehow considered acceptable. I guess the most noticeable dynamic, though, is the pure sensationalism. It is hard to believe half of the stories posted. Sure, the Internet is famous for bullshit, but “web 2.0” + pure bullshit seems to be in a category of its own. Perhaps digg is simply the combined expression of Internet culture, but I believe it is a force far darker.

    Posted in culture, digg, internet | No Comments

    Using Firefox to Screen Scrape from the Command-line

    November 6th, 2007 by knorby

    So, here is the problem. I want to be able to get the source for a page after it has been rendered by Firefox (that is, after any load-time JavaScript manipulations have been made, etc.). In other words, I want to be able to serialize the DOM in Firefox from the command line. Essentially, I am trying to write a massive hack.

    There are a few problems that need to be overcome first. For one, Firefox requires some display. Since I only really care about Linux/BSD/Sun systems, I have to go through X11 (speaking of massive hacks...). Basically, I need a dummy X11 session. I don’t care what is displayed; I just want to send it somewhere. VNC, fortunately, provides this interface. It is worth noting that I have not fully written this yet (laziness + hard-ass school = project stagnation), but I have a very good idea of what it will do.

    Anyway, the display is one small part of the problem; the trick here is getting the DOM out. I had some fun here. Unfortunately, DOM serialization must be done through JavaScript. Gecko provides a really nice little tool: XMLSerializer. I am not aware of anything like it in another browser, which just further supports my belief that Firefox (or anything Gecko-based) is simply the lesser of evils (bad design being evil, of course). Why Mozilla decided that the mix of XUL (an XML format Mozilla came up with for designing interfaces) and JavaScript would be a sensible thing to build a browser around, I don’t know, but it is useful here. The normal browser interface can be found at chrome://browser/content/browser.xul. You can have a lot of fun loading lots of these inside each other (see image). If you load browser.xul with Firebug, you can play around with all of Firefox’s standard functions, which is always fun.

    browser.xul window multiload
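The dummy X11 session idea above could be sketched roughly as follows. This is only an illustration, not the finished tool: it assumes a `vncserver` wrapper script is installed (as shipped with TightVNC or TigerVNC), and the helper names and the display number are made up for the example. It only builds the commands and the environment; actually launching anything is left out.

```python
import os


def vnc_start_command(display_number):
    """Command that asks the vncserver wrapper to start X display :N."""
    return ["vncserver", f":{display_number}"]


def vnc_stop_command(display_number):
    """Command that shuts the same display down again."""
    return ["vncserver", "-kill", f":{display_number}"]


def dummy_display_env(display_number, base_env=None):
    """Copy of the environment with DISPLAY pointed at the dummy session.

    Any X11 program started with this environment (Firefox included)
    draws into the VNC display instead of a real screen.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = f":{display_number}"
    return env


if __name__ == "__main__":
    # Display 99 is an arbitrary choice, assumed to be unused.
    print(vnc_start_command(99))
    print(dummy_display_env(99, {"PATH": "/usr/bin"}))
    print(vnc_stop_command(99))
```

The commands and environment could then be handed to `subprocess.run` or `subprocess.Popen`, with Firefox started using the modified environment.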

    If you are creating a traditional extension, I suppose you would want to look at this stuff as well, but it is especially helpful here. Once this set of deep Firefox functions has been revealed, the actual loading of a page is rather trivial. The real problem is I/O. I need to be able to pass Firefox the link I want to open from the command line, and have it write the result to a specified location. Fortunately, there is JSLib, which provides things like file I/O in JavaScript.

    From here, the solution is simple. I just want to make a copy of browser.xul and add a few scripts to it. I then want to parse the GET arguments on this file when it is loaded, since I can pass these to Firefox from the command line. I would want one for the URL and one for the output path. Of course, these would have to be escaped before they could actually be passed to Firefox. That’s it! I was planning on calling it FireScraper. Hopefully I can finish it soon.
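The escaping step mentioned above can be sketched on the command-line side like this. Everything named here is an assumption for illustration: the chrome:// location of the modified browser.xul copy and the `url` and `out` parameter names are hypothetical, since the tool has not been written yet. The sketch only shows that both values survive a round trip through a query string, even when the target URL has its own `?` and `&`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical location of the modified copy of browser.xul.
SCRAPER_XUL = "chrome://firescraper/content/browser.xul"


def build_scraper_url(page_url, output_path, base=SCRAPER_XUL):
    """Attach the page URL and output path as escaped GET arguments."""
    # urlencode percent-escapes characters such as ':', '/', '?' and '&',
    # so the target URL's own query string cannot leak into ours.
    query = urlencode({"url": page_url, "out": output_path})
    return f"{base}?{query}"


def parse_scraper_args(scraper_url):
    """Recover (page_url, output_path), as the script inside the page would."""
    args = parse_qs(urlsplit(scraper_url).query)
    return args["url"][0], args["out"][0]


if __name__ == "__main__":
    target = "http://example.com/?a=1&b=2"
    built = build_scraper_url(target, "/tmp/out file.html")
    print(built)
    print(parse_scraper_args(built))
```

The decoding half would really live in JavaScript inside the copied browser.xul; it is shown in Python here only to check that the encoding is reversible.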

    Posted in VNC, XUL, coding, design, firebug, firefox, internet, javascript, mozilla, screen scraping | 3 Comments

     
    Creative Commons License - © 2007 Karl Norby