Waxy.org
Waxy.org is the sandbox of Andy Baio, an independent journalist and programmer living in Portland, Oregon. I created Upcoming.org and some other stuff too.

Contact Me: log@waxy.org or waxpancake on AIM

The Faces of Mechanical Turk

Posted Nov 20, 2008

When you experiment with Amazon's Mechanical Turk, it feels like magic. You toss 500 questions into the ether, and the answers instantly start rolling in from anonymous workers around the world. It's great for getting work done, but who are these people? I've seen the demographics, but that was too abstract for me.

Last week, I started a new Turk experiment to answer two questions: what do these people look like, and how much does it cost for someone to reveal their face?

Answer #1. This is what Mechanical Turk looks like:

Answer #2. About $0.50.

Results

Here's my original request:

Upload a photo of yourself holding a handwritten sign that says "I Turk for ...", filling out why you turk. For example, "I Turk for Cash," "I Turk for My Kids," "I Turk to Kill Time," or whatever else you like. Be honest, be funny, be whatever you like.

As a good faith gesture, here's my photo.

If you have a webcam, you can simply go to Cameroid to snap a photo from your web browser, download the JPG, and upload it below. (Don't worry if the text is backwards, I can fix that myself.) DON'T provide any identifiable information, like your name or email, since that's a violation of MTurk policy.

The result will be used in a collage that can be found on my personal weblog, http://waxy.org. By uploading your image and accepting payment for the image, you give permission to me, Andy Baio, to use your image in all forms and media for any lawful purposes. (That's just cover-my-ass language. I'm almost certainly only going to restrict it to this one project.) The collage will show up there shortly after the HIT is complete. Thanks, everybody!

I started the task at $.05, but only two people responded in the first 24 hours. (And one of those was Joshua Schachter, who I'd told about the project.) Clearly, that was too low, so I increased it to $.25, receiving only eight submissions in 48 hours. (For reference, all 500 of my Girl Talk tasks were done in about an hour.) Increasing it to $.50 got me 20 more submissions in about 48 hours, after which it started to drop off quickly. I wasn't about to give dollar bills to random people for their photos, so I ended the experiment there. People aren't willing to give up their anonymity for cheap.

The final results: 30 people total — 10 women, 20 men. Almost all were white, mostly in their 20s and 30s. 21 said they turked for money, 9 for fun or boredom.

Thanks for pulling back the curtain, Turkers.

29 comments

Musicians Get Meta in Guitar Hero and Rock Band

Posted Nov 19, 2008 (Updated Dec 4, 2008)

There's something satisfyingly self-referential about watching talented musicians try to play their own music in Rock Band and Guitar Hero. Especially when they're worse than you.

Here's a list of every video I could find. Let me know if I missed any.

Anthrax's Scott Ian, "Madhouse" at Best Buy

"You suck. You're going to have to write easier songs... 20 years ago."

Continue reading (260 more words)...
8 comments

Deconstructing Google Mobile's Voice Search on the iPhone

Posted Nov 18, 2008 (Updated Nov 19, 2008)

I've experimented with audio transcription lately, but always with big, clumsy humans. I'd happily use speech recognition software, but even today, automatic conversion of voice-to-text is still flawed. Naturally, I was intrigued when Google announced they were adding voice searching to their Google Mobile iPhone app.

Google's flirted with voice-to-text conversion in the past, with GOOG-411 and their Audio Indexing of political videos on YouTube. But this is the first time they're offering a web-accessible interface for speech conversion, albeit completely undocumented, so I decided to poke around a bit to see what I could find.

Over the last few hours, I've been analyzing the traffic proxied through my network, trying to reverse-engineer it to get to something usable, but I've hit my limits. I'm posting this in the hope that someone out there can run with it and find out more.

Behind the Scenes

Here's what we know so far: When you first start speaking into the microphone, the app opens a connection to Google's server and starts sending over chunks of audio, almost certainly encoded with the open-source Speex codec.

The waveform image is generated on the phone and displayed along with a "Working" indicator and the adorable "beep-boop" sounds. In the background, a tiny file is being sent as a POST request to http://www.google.com/m/appreq/gmiphone. Here's what the headers look like:

POST /m/appreq/gmiphone HTTP/1.1
User-Agent: Google/0.3.142.951 CFNetwork/339.3 Darwin/9.4.1
Content-Type: application/binary
Content-Length: 271
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Pragma: no-cache
Connection: keep-alive
Connection: keep-alive
Host: www.google.com

The response from Google is an even smaller attachment. These two files are the same for every query, so they don't contain any meaningful information.

HTTP/1.1 200 OK
Content-Type: application/binary
Content-Disposition: attachment
Date: Tue, 18 Nov 2008 13:06:53 GMT
X-Content-Type-Options: nosniff
Expires: Tue, 18 Nov 2008 13:06:53 GMT
Cache-Control: private, max-age=0
Content-Length: 114
Server: GFE/1.3

After the audio's sent to Google, they return an HTML page with the results and a second request is triggered, this time a GET request to clients1.google.com with the converted voice-to-text string.

GET /complete/search?client=iphoneapp&hjson=t&types=t
    &spell=t&nav=2&hl=en&q=chicken%20soup HTTP/1.1
User-Agent: Google/0.3.142.951 CFNetwork/339.3 Darwin/9.4.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Pragma: no-cache
Connection: keep-alive
Connection: keep-alive
Host: clients1.google.com
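
For anyone who wants to poke at this themselves, here's a minimal sketch that rebuilds the URL from the captured request. The host, path, and parameters are copied from the request above; the http scheme and the function name build_suggest_url are my assumptions, and since the endpoint is undocumented, it may change or vanish at any time. The sketch only builds the URL and makes no network call.

```python
from urllib.parse import quote, urlencode

# Host and path copied from the captured request; the endpoint is
# undocumented, so treat both as subject to change.
SUGGEST_HOST = "clients1.google.com"
SUGGEST_PATH = "/complete/search"


def build_suggest_url(query: str) -> str:
    """Return the autocomplete URL for a recognized phrase (no network access)."""
    params = [
        ("client", "iphoneapp"),
        ("hjson", "t"),
        ("types", "t"),
        ("spell", "t"),
        ("nav", "2"),
        ("hl", "en"),
        ("q", query),
    ]
    # quote_via=quote encodes spaces as %20, as in the captured request,
    # instead of urlencode's default "+".
    # Assumption: plain http, like the POST request shown earlier.
    return f"http://{SUGGEST_HOST}{SUGGEST_PATH}?{urlencode(params, quote_via=quote)}"


print(build_suggest_url("chicken soup"))
```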

The response is an array of search terms in JSON format, for use in search autocompletion.

["chicken soup",[["http://www.chickensoup.com/","Chicken Soup for the Soul",5,""],["http://www.chickensoupforthepetloverssoul.com/","Chicken Soup for the Pet Lover's Soul",5,""],["chicken soup recipe","489,000 results",0,"2"],["chicken soup for the soul","1,470,000 results",0,"3"],["chicken soup dog food","462,000 results",0,"4"],["chicken soup with rice","467,000 results",0,"5"],["chicken soup diet","453,000 results",0,"6"],["chicken soup from scratch","364,000 results",0,"7"],["chicken soup for the soul quotes","398,000 results",0,"8"],["chicken soup crock pot","604,000 results",0,"9"]]]
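
The response is easy to pick apart with any JSON parser. Here's a rough sketch using an abbreviated copy of the response above. What the third field means is my guess from the sample alone (5 seems to mark a direct site link, 0 a suggested query), and parse_suggestions is just a name I made up for illustration.

```python
import json

# Abbreviated copy of the response shown above (three of the ten entries).
raw = '''["chicken soup",[["http://www.chickensoup.com/","Chicken Soup for the Soul",5,""],["chicken soup recipe","489,000 results",0,"2"],["chicken soup for the soul","1,470,000 results",0,"3"]]]'''


def parse_suggestions(text):
    """Split the response into site links and suggested queries."""
    query, entries = json.loads(text)
    sites, queries = [], []
    for first, second, kind, _rank in entries:
        if kind == 5:
            # Assumption: 5 marks a direct site link given as (URL, title).
            sites.append((first, second))
        else:
            # Assumption: anything else is a suggested query whose second
            # field looks like "489,000 results".
            count = int(second.split()[0].replace(",", ""))
            queries.append((first, count))
    return query, sites, queries


query, sites, queries = parse_suggestions(raw)
print(query)
print(sites)
print(queries)
```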

Help!

Unfortunately, until we can isolate and decode the audio stream, playing with the voice recognition features is out of reach.

Any ideas on cracking this mystery would be hugely appreciated. Anonymity for Google insiders is guaranteed!

Updates

As several commenters figured out, and as Google confirmed to me, the audio is being sent to Google's servers for voice recognition. The two binaries I posted above aren't the actual transmission. They're identical for every query, so they can be disregarded. Sorry about the red herring.

Gummi Hafsteinsson, product manager for Google's Voice Search, says, "I can confirm that we split the audio down to a smaller byte stream, which is then sent to Google for recognition, but we can't really provide any details beyond that." Responding to my request for a public API, he added, "I appreciate the suggestion to provide voice recognition as a service. Right now we have nothing to announce, but we'll take this feedback as we look at future product ideas."

Also, Chris Messina discovered some secret settings in the application's preferences file, including alternate color schemes and sound sets for "Monkey" and "Chicken." Beep-boop!

Next step: As Paul discovered in the comments, the Legal Notices page says clearly that the app uses the open-source Speex codec for voice encoding. Can anyone capture and decode the audio being sent to Google?

November 19: I rewrote most of this entry to reflect the new information, since it was confusing new readers.

29 comments

Yes We Did

Posted Nov 4, 2008 (Updated Nov 6, 2008)

(Credit: Michael Buchino, also available as a shirt)

9 comments

Girl Talk's Feed the Animals: The Official Sample List

Posted Oct 29, 2008 (Updated Nov 10, 2008)

Last month, I dissected Girl Talk's Feed the Animals using the list of samples lovingly collected by hundreds of Wikipedia users. But that was totally unofficial, a crowdsourced attempt to find musical needles in a giant mashup haystack.

Well, the official CDs were shipped out last week to everyone who donated more than $10. Inside, as promised, was the official sample list — a one-page insert with every single sample on the album. Steve Heil was the first to scan it and contact me.

Unfortunately, a huge block of printed small-caps text isn't very useful for my kind of fun, so I tried throwing it into several OCR engines on WeOCR to turn the image into text. Tesseract gave the best results, but it was still a mess that needed quite a bit of cleanup.

Anyway, here it is. The complete list of all 322 samples in Girl Talk's Feed the Animals, available as a CSV, Excel, or Google Spreadsheets document.

Continue reading (227 more words)...
18 comments

Memeorandum Colors: Visualizing Political Bias with Greasemonkey

Posted Oct 10, 2008 (Updated Nov 10, 2008)

Like the rest of the world, I've been completely obsessed with the presidential election and nonstop news coverage. My drug of choice? Gabe Rivera's Memeorandum, the political sister site of Techmeme, which constantly surfaces the most controversial stories being discussed by political bloggers.

While most political blogs are extremely partisan, their biases aren't immediately obvious to outsiders like me. I wanted to see, at a glance, how conservative or liberal the blogs were without clicking through to every article.

With help from del.icio.us founder Joshua Schachter, I used a recommendation algorithm to score every blog on Memeorandum based on its linking activity in the last three months. Then I wrote a Greasemonkey script to pull that information out of Google Spreadsheets and colorize Memeorandum on-the-fly. Left-leaning blogs are blue and right-leaning blogs are red, with darker colors representing stronger biases. Check out the screenshot below, and install the Greasemonkey script or standalone Firefox extension to try it yourself.

Note: The colors don't necessarily represent each blogger's personal views or biases. They reflect each blogger's linking activity. The algorithm looks at the stories that bloggers linked to before, relative to all other bloggers, and groups them accordingly. People who link to things that only conservatives find interesting will be classified as bright red, even if they are personally moderate or liberal, and vice-versa. The algorithm can't read minds, so don't be offended if you feel misrepresented. It's only looking at the data.

For example, while Nate Silver of FiveThirtyEight may be a Democrat, he tends to link to the stories conservative bloggers are discussing slightly more often than to the ones liberal bloggers are discussing, so he's shaded very slightly red. (Geeks can read on for more details about how this works.)
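
To make the coloring idea concrete, here's a minimal sketch of how a bias score could be turned into a shade. It isn't the actual script: the score range of -1 (left) to 1 (right), the specific shades, and the name bias_to_color are all assumptions for illustration.

```python
def bias_to_color(score: float) -> str:
    """Map a bias score in [-1, 1] to a CSS hex color.

    Negative scores shade toward blue (left-leaning), positive scores toward
    red (right-leaning), and stronger scores give deeper shades. The range
    and the shades are illustrative assumptions, not the script's real values.
    """
    score = max(-1.0, min(1.0, score))  # clamp out-of-range scores
    strength = abs(score)
    # Fade the two non-dominant channels from 255 (white) down to 64.
    faded = round(255 - strength * (255 - 64))
    if score < 0:
        r, g, b = faded, faded, 255
    elif score > 0:
        r, g, b = 255, faded, faded
    else:
        r, g, b = 255, 255, 255
    return f"#{r:02x}{g:02x}{b:02x}"


for s in (-1.0, 0.0, 1.0):
    print(s, bias_to_color(s))
```

A blogger shaded "very slightly red" would sit just above zero on this scale and come out close to white with a faint red tint.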

Continue reading (971 more words)...
85 comments

Found Footage: Sarah Palin's 1984 Miss Alaska Pageant Video, Swimsuit Competition

Posted Sep 26, 2008 (Updated Nov 10, 2008)

Somehow, a 22-year-old University of Alaska student named Richard Millay got his hands on a videotape that's eluded the media since John McCain asked Sarah Palin to be his running-mate — original footage of her 1984 Miss Alaska Pageant.

Of course, this is all very frivolous and has nothing to do with the current campaign. But like Barack Obama's high school basketball footage, it's a little glimpse into the early life of a highly visible national figure.

In the first part added to YouTube, he posted the portion from the swimsuit competition, prefaced by a brief introduction mentioning the demand for the "88 minutes of Alaska Gold."

Update: The original video was removed, but I managed to save a copy of the relevant footage without Richard's original intro. YouTube's removing every copy of this video, so I'm streaming the clip below from my own server. It won't be removed.

Continue reading (366 more words)...
54 comments

Kickstarter

Posted Sep 23, 2008

I wanted to take a moment to announce that I've joined the board of directors for Kickstarter, a brand-new startup based out of Brooklyn and Chicago.

Kickstarter aims to let creative people of all kinds — journalists, artists, musicians, game developers, entrepreneurs, bloggers — raise money for their projects by connecting directly with fans, who receive exclusive access and rewards in exchange for their patronage. More than just a fundraising app, Kickstarter's a publishing platform where project creators can communicate with the people that are supporting them. (Think Jill Sobule, A Swarm of Angels, or Sean Tevis.)

I was introduced to founders Charles Adler, Perry Chen, and Yancey Strickler by Caterina Fake back in June, and sealed the deal after a trip to NYC to meet the team. They're a great group of guys with a strong vision, and I feel lucky to be involved.

Ultimately, everybody should be able to support themselves doing what they love using the web, and I think Kickstarter will be a great way to get there. Expect to hear more on Waxy.org as launch day gets closer.

They're currently looking for a CTO to join the founding team. I've been helping guide some of the technology decisions and build the development team, but we're looking for a passionate and talented person to devote themselves to the project full-time.

If you're interested, drop me an email or IM and I'll introduce you!

3 comments

Cheap, Easy Audio Transcription with Mechanical Turk

Posted Sep 22, 2008 (Updated Nov 10, 2008)

After recording last week's interview, I was left with a 36-minute MP3 and a profound feeling of dread. You see, I hate transcribing audio. I used to transcribe interviews in high school, and it's always tedious, taking upwards of eight times the length of the clip itself.

Bracing for a good four or five hours of rewinding and writing and rewinding, I remembered that this is The Future! So, instead, I tossed the job over to the global anonymous workforce at Amazon Mechanical Turk.

The result: my 36-minute recording was transcribed while I slept, in less than three hours, for a grand total of $15.40.

This is a fraction of the cost/time of any other transcription service online, including the Turk-driven Casting Words, though you potentially sacrifice some quality. In my experience, though, there were virtually no errors.

Here's how to do it yourself, with no programming knowledge required. The instructions below are verbose, but using my template, it shouldn't take you more than five minutes of setup per job.

Continue reading (1070 more words)...
70 comments

Interview with David Winton, Director of "Code Rush" Mozilla Documentary

Posted Sep 19, 2008

First, the bad news. Two days ago, I received a polite email from David Winton, the director of Code Rush, asking me to take the out-of-print documentary off of Waxy.org. As promised, I immediately complied.

Now, the good news: in my reply, I asked David if he'd mind being interviewed, and he agreed! He's an accomplished director and producer, the creator of the Big Thinkers series for TechTV, and the cofounder of Winton/duPont Films, located in San Francisco's Presidio.

We had a wonderful conversation about the film, which revealed for the first time that he's planning on not only re-releasing Code Rush digitally, but considering releasing the original outtakes (100 hours of footage) to the public domain on Archive.org.

I wish all my takedown notices were like this! Read on for the full interview, with selected clips from Code Rush, used by permission.

Continue reading (2864 more words)...
7 comments
Waxy Links
January 6, 2009
The Perils of Zero-Gravity Videography — Matt Harding discovers hard drive-based camcorders don't work in zero-gravity (via)
Screenshot: 4chan hacks MacRumorsLive during Apple keynote — the 4chan thread shows how they found the admin interface, password hashes, and finally cracked a user's password
January 5, 2009
xkcd's Guide to Converting to Metric — even Liberia and Myanmar are mostly metric, compared to the U.S.
Crowdsourcing an Ethical Dilemma — Dolores Labs uses Mechanical Turk to answer the Trolley Problem
January 3, 2009
Stamen's Mike Migurski on extreme programming vs. interaction design — the linked interview is great
January 2, 2009
Jason Scott on the closure of AOL's online communities — like physical evictions, there need to be laws protecting community data in the event of closure
JPG Magazine to stop publishing, turn off website — with only three days notice; here's the response from Derek and the JPG community
December 31, 2008
Wikipedia over DNS — loony hack serves summaries of Wikipedia articles; also available as JSON and JS
Leap year bug caused every 30GB Zune to crash at 2am this morning — as strange as the Android bug that ran every keystroke as root
Metafilter's exhaustive tour of the early origins of Adult Swim — the Cartoon Network breathed new life into old cartoons, while constantly trying to find the next big thing
December 30, 2008
Infochimps' massive scrape of Twitter's friend network — Twitter gave their blessing on sharing the 56-million records, which includes 10M tweets and 220k hashtags
The Lonely Island's We Like Sportz — the sequel to Just 2 Guyz
Niall Kennedy documents the undocumented Google Reader API — whoops, this was three years ago; here's an updated version
Sakurako Shimizu's Waveform Jewelry — the "I Do" wedding band and Atari chip ring are cute, too
Fimoculous' 30 Most Notable Blogs of 2008 — an incredibly well-researched list, with related recommendations for every entry
December 29, 2008
DJ Earworm's United State of Pop 2008 — mashing up the top 25 singles of the year into a single song and video
Twit 4 Dead, four Twitter bots fight zombies in real-time — watch their collected activity here
Facebook sentiment mining predicts presidential polls — like StateStats, Facebook Lexicon is tons of fun
Giganews reports Usenet upload growth since 2001 — note this doesn't reflect Usenet popularity, but most likely the rise of huge Blu-Ray and HD rips
December 28, 2008
List of Starbucks employee jargon — culled from the Starbucks Gossip blog
December 27, 2008
Rocketboom covers the history of the Lip Dub — the Know Your Meme series is consistently well-researched and fun to watch
Jennifer 8. Lee on the history of General Tso's Chicken — different cultures each localized their own versions of Chinese food around the world (via)
Top 20 freeware games released by Cactus, this year — is Jonatan the most prolific game developer alive?
December 26, 2008
Paul and Storm finish their 25 Days of Randy Newman — hosted on Bandcamp, and now with the solo piano track used in each song
AutoPager, infinite scrolling for Firefox — love the idea, but too clunky for everyday use (via)
December 24, 2008
Net Cafe archives, dot-com nostalgia TV show from 1996-2002 — Sergey Brin in 2000 at the newly-opened Metreon, Mondo 2000 and Boing Boing, awkward Webby broadcasts, and hundreds of dead dot-coms (via)
The Offworld's best indie and overlooked games of 2008 — also: Gamasutra's top 5 indie games
Left 4k Dead — lo-fi zombie shooter in 4k of Java (via)
NORAD's Santa Tracker on Twitter — they just passed through Kazakhstan; also tracking on Google Maps and in 3D on Google Earth
December 22, 2008
ScummVM adds support for 7th Guest — I didn't realize they expanded into non-Scumm engines last year, including the Sierra AGI games

Andy Baio lives here. Some rights reserved, for your pleasure.