Waxy.org
Waxy.org is the sandbox of Andy Baio, an independent journalist and programmer living in Portland, Oregon. I created Upcoming.org and some other stuff too.

Contact Me: log@waxy.org or waxpancake on AIM

Deconstructing Google Mobile's Voice Search on the iPhone

Posted Nov 18, 2008

I've experimented with audio transcription lately, but always with big, clumsy humans. I'd happily use speech recognition software, but even today, automatic voice-to-text conversion is still flawed. Naturally, I was intrigued when Google announced they were adding voice searching to their Google Mobile iPhone app.

Google's flirted with voice-to-text conversion in the past, with GOOG-411 and their Audio Indexing of political videos on YouTube. But this is the first time they're offering a web-accessible interface for speech conversion, albeit completely undocumented, so I decided to poke around a bit to see what I could find.

Over the last few hours, I've been analyzing the traffic proxied through my network, trying to reverse-engineer it to get to something usable, but I've hit my limits. I'm posting this with the hopes that someone out there can run with it and find out more.

Behind the Scenes

Here's my best guess: When you first start speaking into the microphone, the iPhone app opens a connection to Google's server, waits for you to finish talking, and then does a quick-and-dirty conversion into a smaller binary representation of the waveform. (And I do mean tiny. These files are between 100 and 300 bytes.) These binary files aren't the audio; read the Updates section below for more.

The waveform image is generated on the phone and displayed along with a "Working" indicator and the adorable "beep-boop" sounds. In the background, the binary file is being sent as a POST request to http://www.google.com/m/appreq/gmiphone. Here's what the headers look like:

POST /m/appreq/gmiphone HTTP/1.1
User-Agent: Google/0.3.142.951 CFNetwork/339.3 Darwin/9.4.1
Content-Type: application/binary
Content-Length: 271
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Pragma: no-cache
Connection: keep-alive
Connection: keep-alive
Host: www.google.com
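If you want to replay a capture yourself, here's a minimal Python sketch that rebuilds this POST with the standard library's urllib.request, copying the path and headers from the capture above. The helper name is hypothetical, and it only builds the request object: the endpoint is undocumented, and whether Google accepts replayed bodies is unknown.

```python
import urllib.request

# Endpoint and headers copied from the captured request above.
GMIPHONE_URL = "http://www.google.com/m/appreq/gmiphone"
CAPTURED_HEADERS = {
    "User-Agent": "Google/0.3.142.951 CFNetwork/339.3 Darwin/9.4.1",
    "Content-Type": "application/binary",
    "Accept": "*/*",
    "Accept-Language": "en-us",
    "Pragma": "no-cache",
}


def build_replay_request(payload: bytes) -> urllib.request.Request:
    """Build (but don't send) a POST mimicking the iPhone app's request.

    build_replay_request is a hypothetical helper, not any Google API.
    Content-Length and Host are added by urllib when the request is sent.
    Accept-Encoding is left out because urllib won't gunzip responses.
    """
    return urllib.request.Request(
        GMIPHONE_URL, data=payload, headers=CAPTURED_HEADERS, method="POST"
    )
```

To try it, read a body saved from your proxy (for example `open("capture.bin", "rb").read()`, with whatever file name you used) and pass the resulting request to `urllib.request.urlopen`.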

The response from Google is an even smaller binary attachment. This is probably just an encrypted or compressed version of the converted text, in this case the words "chicken soup." These binaries are irrelevant; read the Updates section below for more.

HTTP/1.1 200 OK
Content-Type: application/binary
Content-Disposition: attachment
Date: Tue, 18 Nov 2008 13:06:53 GMT
X-Content-Type-Options: nosniff
Expires: Tue, 18 Nov 2008 13:06:53 GMT
Cache-Control: private, max-age=0
Content-Length: 114
Server: GFE/1.3
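To eyeball small bodies like this one, a plain hex dump helps: readable text such as "chicken soup" would jump out of the ASCII column. A minimal Python sketch; the helper name and the 16-byte row width are arbitrary choices.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset / hex / printable-ASCII columns."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes become '.', so any embedded text stands out.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)
```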

After receiving the binary response to the POST, a second request is triggered, this time a GET request to clients1.google.com with the converted voice-to-text string.

GET /complete/search?client=iphoneapp&hjson=t&types=t
    &spell=t&nav=2&hl=en&q=chicken%20soup HTTP/1.1
User-Agent: Google/0.3.142.951 CFNetwork/339.3 Darwin/9.4.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Pragma: no-cache
Connection: keep-alive
Connection: keep-alive
Host: clients1.google.com
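The GET is easy to reproduce. This Python sketch rebuilds the same URL with urllib.parse; the parameter values are copied verbatim from the capture, and since what flags like hjson, types, spell, and nav control is undocumented, they're simply passed through. The function name is hypothetical.

```python
from urllib.parse import quote, urlencode

# Host and path as seen in the captured GET request above.
SUGGEST_BASE = "http://clients1.google.com/complete/search"


def suggest_url(query: str, hl: str = "en") -> str:
    """Rebuild the autocomplete URL the app requested (hypothetical helper)."""
    params = {
        "client": "iphoneapp",
        "hjson": "t",
        "types": "t",
        "spell": "t",
        "nav": "2",
        "hl": hl,
        "q": query,
    }
    # quote_via=quote encodes spaces as %20, matching the capture;
    # urlencode's default (quote_plus) would produce "+" instead.
    return SUGGEST_BASE + "?" + urlencode(params, quote_via=quote)
```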

The response is an array of search terms in JSON format, for use in search autocompletion.

["chicken soup",[["http://www.chickensoup.com/","Chicken Soup for the Soul",5,""],["http://www.chickensoupforthepetloverssoul.com/","Chicken Soup for the Pet Lover's Soul",5,""],["chicken soup recipe","489,000 results",0,"2"],["chicken soup for the soul","1,470,000 results",0,"3"],["chicken soup dog food","462,000 results",0,"4"],["chicken soup with rice","467,000 results",0,"5"],["chicken soup diet","453,000 results",0,"6"],["chicken soup from scratch","364,000 results",0,"7"],["chicken soup for the soul quotes","398,000 results",0,"8"],["chicken soup crock pot","604,000 results",0,"9"]]]
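The structure can be read straight off the sample: the original query, then a list of four-field entries. The first two entries are URLs (with 5 in the third field) and the rest are suggested queries with result counts (with 0). Since the meaning of that third field isn't documented, this Python sketch splits entries by whether the first field looks like a URL, an assumption based only on this one response. The sample string is trimmed from the response above.

```python
import json

# First four entries of the response above, copied verbatim.
SAMPLE = (
    '["chicken soup",[["http://www.chickensoup.com/","Chicken Soup for the Soul",5,""],'
    '["http://www.chickensoupforthepetloverssoul.com/","Chicken Soup for the Pet Lover\'s Soul",5,""],'
    '["chicken soup recipe","489,000 results",0,"2"],'
    '["chicken soup for the soul","1,470,000 results",0,"3"]]]'
)


def split_suggestions(raw: str):
    """Split a response into (query, navigation links, query suggestions).

    Assumption: entries whose first field is an http(s) URL are direct
    links; everything else is a suggested query plus a result count.
    """
    query, entries = json.loads(raw)[:2]
    nav, queries = [], []
    for text, description, *_rest in entries:
        if text.startswith(("http://", "https://")):
            nav.append((text, description))
        else:
            queries.append((text, description))
    return query, nav, queries
```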

Aaand that's as far as I can get.

Help!

Unfortunately, until I can figure out the format of the binary request and response to/from Google, playing with the voice recognition features is out of reach.

How much processing is happening on the phone, and how much on Google's servers? If it's happening remotely, in what form is the audio being transmitted and the results being returned? As Ilya points out in the comments, the response binary file is too limited to even hold the text.

Any ideas on cracking this mystery would be hugely appreciated. Anonymity for Google insiders is guaranteed!

Updates

As several commenters figured out, and as Google confirmed to me, the audio is being sent to Google's servers for voice recognition. The two binaries I posted above aren't the actual transmission; they're identical for every query, so they can be disregarded. Sorry about the red herring.
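For anyone capturing their own traffic, a cheap way to catch this kind of red herring is to record several different spoken queries and compare hashes of the bodies. A minimal Python sketch, assuming the bodies have already been pulled out of the proxy logs into memory; the labels and helper names are arbitrary.

```python
import hashlib


def payload_fingerprints(payloads: dict[str, bytes]) -> dict[str, str]:
    """Map each capture label to the SHA-1 hex digest of its body."""
    return {label: hashlib.sha1(body).hexdigest() for label, body in payloads.items()}


def all_identical(payloads: dict[str, bytes]) -> bool:
    """True if every captured body is byte-for-byte the same."""
    return len(set(payload_fingerprints(payloads).values())) <= 1
```

If two different phrases produce the same digest, the body can't be carrying the audio.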

Gummi Hafsteinsson, product manager for Google's Voice Search, says, "I can confirm that we split the audio down to a smaller byte stream, which is then sent to Google for recognition, but we can't really provide any details beyond that." Responding to my request for a public API, he added, "I appreciate the suggestion to provide voice recognition as a service. Right now we have nothing to announce, but we'll take this feedback as we look at future product ideas."

Also, Chris Messina discovered some secret settings in the application's preferences file, including alternate color schemes and sound sets for "Monkey" and "Chicken." Beep-boop!

Next step: Can anyone figure out the format of the audio and spoof a request to Google? Some commenters think it's in AMR format, which makes sense.
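One cheap first test of the AMR theory: AMR data stored in the standard file format (RFC 4867, section 5) begins with the magic string `#!AMR\n` (narrowband) or `#!AMR-WB\n` (wideband). A stream could easily carry bare AMR frames with no file header, so a negative result doesn't rule AMR out. A Python sketch, with a hypothetical helper name:

```python
from typing import Optional

# Single-channel magic numbers from the AMR storage format (RFC 4867, sec. 5).
AMR_NB_MAGIC = b"#!AMR\n"
AMR_WB_MAGIC = b"#!AMR-WB\n"


def sniff_amr(data: bytes) -> Optional[str]:
    """Return 'AMR-NB' or 'AMR-WB' if data starts with an AMR file header."""
    if data.startswith(AMR_WB_MAGIC):
        return "AMR-WB"
    if data.startswith(AMR_NB_MAGIC):
        return "AMR-NB"
    # No header: could still be raw AMR frames, or another codec entirely.
    return None
```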

Yes We Did

Posted Nov 4, 2008 (Updated Nov 6, 2008)

(Credit: Michael Buchino, also available as a shirt)

Girl Talk's Feed the Animals: The Official Sample List

Posted Oct 29, 2008 (Updated Nov 10, 2008)

Last month, I dissected Girl Talk's Feed the Animals using the list of samples lovingly collected by hundreds of Wikipedia users. But that was totally unofficial, a crowdsourced attempt to find musical needles in a giant mashup haystack.

Well, the official CDs were shipped out last week to everyone who donated more than $10. Inside, as promised, was the official sample list — a one-page insert with every single sample on the album. Steve Heil was the first to scan it and contact me.

Unfortunately, a huge block of printed small-caps text isn't very useful for my kind of fun, so I tried throwing it into several OCR engines on WeOCR to turn the image into text. Tesseract gave the best results, but it was still a mess that needed quite a bit of cleanup.

Anyway, here it is. The complete list of all 322 samples in Girl Talk's Feed the Animals, available as a CSV, Excel, or Google Spreadsheets document.

Memeorandum Colors: Visualizing Political Bias with Greasemonkey

Posted Oct 10, 2008 (Updated Nov 10, 2008)

Like the rest of the world, I've been completely obsessed with the presidential election and nonstop news coverage. My drug of choice? Gabe Rivera's Memeorandum, the political sister site of Techmeme, which constantly surfaces the most controversial stories being discussed by political bloggers.

While most political blogs are extremely partisan, their biases aren't immediately obvious to outsiders like me. I wanted to see, at a glance, how conservative or liberal the blogs were without clicking through to every article.

With the help of del.icio.us founder Joshua Schachter, we used a recommendation algorithm to score every blog on Memeorandum based on its linking activity in the last three months. Then I wrote a Greasemonkey script to pull that information out of Google Spreadsheets and colorize Memeorandum on the fly. Left-leaning blogs are blue and right-leaning blogs are red, with darker colors representing stronger biases. Check out the screenshot below, and install the Greasemonkey script or standalone Firefox extension to try it yourself.

Note: The colors don't necessarily represent each blogger's personal views or biases. They're a reflection of their linking activity. The algorithm looks at the stories that bloggers linked to before, relative to all other bloggers, and groups them accordingly. People who link to things that only conservatives find interesting will be classified as bright red, even if they are personally moderate or liberal, and vice versa. The algorithm can't read minds, so don't be offended if you feel misrepresented. It's only looking at the data.

For example, while Nate Silver of FiveThirtyEight may be a Democrat, he has a tendency to link to stories conservative bloggers are discussing slightly more often than liberal bloggers, so he's shaded very slightly red. (Geeks can read on for more details about how this works.)
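The colorizing step boils down to mapping each blog's score to a shade of blue or red. This Python sketch is not the Greasemonkey script's actual code; it assumes a hypothetical score in [-1, 1], with negative meaning left-leaning, and picks darker shades for larger magnitudes.

```python
def bias_color(score: float) -> str:
    """Map a hypothetical bias score in [-1, 1] to a CSS hex color.

    Negative = left-leaning (blue), positive = right-leaning (red),
    0 = neutral (white). Larger magnitudes give darker colors.
    """
    s = max(-1.0, min(1.0, score))  # clamp out-of-range scores
    # Fade the two non-dominant channels from 255 toward 0 as |s| grows...
    fade = round(255 * (1 - abs(s)))
    # ...and dim the dominant channel from 255 to 128 so strong biases read darker.
    strong = round(255 - 127 * abs(s))
    if s < 0:
        r, g, b = fade, fade, strong
    else:
        r, g, b = strong, fade, fade
    return f"#{r:02x}{g:02x}{b:02x}"
```

A slightly positive score, like the one described for FiveThirtyEight, lands on a very pale red.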

Found Footage: Sarah Palin's 1984 Miss Alaska Pageant Video, Swimsuit Competition

Posted Sep 26, 2008 (Updated Nov 10, 2008)

Somehow, a 22-year-old University of Alaska student named Richard Millay got his hands on a videotape that's eluded the media since John McCain asked Sarah Palin to be his running-mate — original footage of her 1984 Miss Alaska Pageant.

Of course, this is all very frivolous and has nothing to do with the current campaign. But like Barack Obama's high school basketball footage, it's a little glimpse into the early life of a highly visible national figure.

In the first part added to YouTube, he posted the portion from the swimsuit competition, prefaced by a brief introduction mentioning the demand for the "88 minutes of Alaska Gold."

Update: The original video was removed, but I managed to save a copy of the relevant footage without Richard's original intro. YouTube's removing every copy of this video, so I'm streaming the clip below from my own server. It won't be removed.

Kickstarter

Posted Sep 23, 2008

I wanted to take a moment to announce that I've joined the board of directors for Kickstarter, a brand-new startup based out of Brooklyn and Chicago.

Kickstarter aims to let creative people of all kinds — journalists, artists, musicians, game developers, entrepreneurs, bloggers — raise money for their projects by connecting directly with fans, who receive exclusive access and rewards in exchange for their patronage. More than just a fundraising app, Kickstarter's a publishing platform where project creators can communicate with the people that are supporting them. (Think Jill Sobule, A Swarm of Angels, or Sean Tevis.)

I was introduced to founders Charles Adler, Perry Chen, and Yancey Strickler by Caterina Fake back in June, and sealed the deal after a trip to NYC to meet the team. They're a great group of guys with a strong vision, and I feel lucky to be involved.

Ultimately, everybody should be able to support themselves doing what they love using the web, and I think Kickstarter will be a great way to get there. Expect to hear more on Waxy.org as launch day gets closer.

To help them on their way, they're currently looking for a CTO to join the founding team. I've been helping guide some of the technology decisions and building the development team, but we're looking for a passionate and talented person to devote themselves to the project full-time.

If you're interested, drop me an email or IM and I'll introduce you!

Cheap, Easy Audio Transcription with Mechanical Turk

Posted Sep 22, 2008 (Updated Nov 10, 2008)

After recording last week's interview, I was left with a 36-minute MP3 and a profound feeling of dread. You see, I hate transcribing audio. I used to transcribe interviews in high school, and it's always tedious, taking upwards of eight times the length of the clip itself.

Bracing for a good four or five hours of rewinding and writing and rewinding, I remembered that this is The Future! So, instead, I tossed the job over to the global anonymous workforce at Amazon Mechanical Turk.

The result: my 36-minute recording was transcribed while I slept, in less than three hours, for a grand total of $15.40.

This is a fraction of the cost/time of any other transcription service online, including the Turk-driven Casting Words, though you potentially sacrifice some quality. In my experience, though, there were virtually no errors.
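The numbers are easy to sanity-check. This Python snippet uses only figures from this post: 36 minutes of audio, $15.40 on Mechanical Turk, and roughly eight times the clip's length for manual transcription.

```python
# Figures from this post.
RECORDING_MIN = 36      # length of the interview, in minutes
TURK_COST_USD = 15.40   # total paid on Mechanical Turk
MANUAL_FACTOR = 8       # manual transcription takes ~8x the clip length

cost_per_minute = TURK_COST_USD / RECORDING_MIN   # dollars per audio minute
manual_hours = RECORDING_MIN * MANUAL_FACTOR / 60  # hours of manual work

print(f"${cost_per_minute:.2f}/min of audio; manual estimate {manual_hours:.1f} h")
```

That prints `$0.43/min of audio; manual estimate 4.8 h`, which lines up with the four or five hours of rewinding mentioned above.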

Here's how to do it yourself, with no programming knowledge required. The instructions below are verbose, but using my template, it shouldn't take you more than five minutes of setup per job.

Interview with David Winton, Director of "Code Rush" Mozilla Documentary

Posted Sep 19, 2008

First, the bad news. Two days ago, I received a polite email from David Winton, the director of Code Rush, asking me to take the out-of-print documentary off of Waxy.org. As promised, I immediately complied.

Now, the good news — In my reply, I asked David if he'd mind being interviewed, and he agreed! He's an accomplished director and producer, the creator of the Big Thinkers series for TechTV, and the cofounder of Winton/duPont Films, located in San Francisco's Presidio.

We had a wonderful conversation about the film, which revealed for the first time that he's not only planning to re-release Code Rush digitally, but also considering releasing the original outtakes (100 hours of footage) to the public domain on Archive.org.

I wish all my takedown notices were like this! Read on for the full interview, with selected clips from Code Rush, used by permission.

Oddpost Co-Founder Launches Bandcamp, Publishing Platform for Musicians

Posted Sep 16, 2008 (Updated Sep 17, 2008)

Ethan Diamond, co-founder of the pioneering webmail service that became Yahoo! Mail, today lifted the veil on his new startup and gave me an exclusive first look.

Bandcamp is a free hosted publishing platform for musicians, taking the technical challenge out of setting up a site — transcoding music into different formats, streaming audio, analytics, payment processing, and so on.

Band websites are often pretty bad, hacked together by a friend of the band with Flash and Dreamweaver, or worse, by the record label. There are exceptions, but mostly, it's a sea of Flash intros, popup windows, mystery navigation, and 30-second sound clips.

Bandcamp is trying to change that, giving every album and track its own page with clean URLs and semantic markup, with the accompanying SEO benefits. Even before launch, they're topping Google results for many searches for song titles of participating bands.

As an infoviz geek, I'm particularly fond of their analytics and audio visualizations. Detailed stats let bands track recent activity on their songs and albums, including where people are coming from, trend tracking, and which songs were skipped, played partially, or played in full. A number of real-time audio visualizations in Flash are available on each song's page, which can be shared and embedded on other websites.

Like Oddpost, the team's small and nimble — only four people, all splitting engineering and design duties. Co-founder/CTO Shawn Grunberger (also formerly with Oddpost and Yahoo! Mail) and two engineers working from Seattle and Vermont round out the distributed team.

Ethan was kind enough to sit down with me on launch day to talk about their inspiration and process developing Bandcamp.

Computability: Steve Allen and Jayne Meadows' Computer Video from 1984

Posted Sep 14, 2008 (Updated Nov 10, 2008)

Election coverage, natural disasters, and Wall Street meltdown got you down? Let's go back to a simpler time — 1984! It's morning in America again, and the dawn of a new information age.

Fortunately, one unlikely celebrity couple is here to guide us through the brave new world of spread sheets, data banks, and modems. In Computability, an instructional VHS tape from 1984, comedian Steve Allen and actress Jayne Meadows "take us on a light-hearted but detailed tour of the ways a home computer can change your life by simply using the correct software packages to suit your needs."

The video was originally inspired by the couple's Grammy-nominated "Everything You Wanted to Know About Home Computers," a vinyl LP released by Casablanca/Polygram Records in 1983. The LP's completely unavailable, but thanks to Sammy Reed's wonderfully strange podcast, I was able to recreate the full album. (Stream it below or download the 11 MB MP3.)

Waxy Links
November 18, 2008
Bike Hero, biking a Guitar Hero level in the real world — most likely a commercial viral, and maybe even fake, but does it matter? beyond awesome
Chuck Klosterman reviews Chinese Democracy — mostly posting this just to beat Rex to it
The A.V. Club's 27 popular websites that became books — though they missed Belle de Jour, The Washingtonienne, Fucked Company, Fark, and ZUG
Speed Guitar goes to the Los Angeles County Museum of Art — every hour, on the hour, for one solid minute of metal complete with gothic arch and smoke machine
MGMT's "Kids" on the iPhone Ocarina — "the iPhone Ocarina officially replaces the recorder as the nerdiest instrument I can play"
Mena Trott responds to Valleywag article about their Disneyland vacation — my favorite was Space Mountain Snob
LIFE Magazine photo archive hosted by Google — millions of high-res photos, most never published
Amazon launches CloudFront, their pay-as-you-go CDN — very complementary with S3
November 17, 2008
John Hodgman, Jonathan Coulton, and the Long Winters perform "Tonight You Belong to Me" — "Thank you, normal-sized man."
Jerry Yang stepping down from Yahoo's CEO post — it never really fit him well, though I'll miss his e.e. cummings memos
Woman asks Apple community about an unusual iPhone glitch — no, raunchy photos don't accidentally attach themselves to outbound email
Greasemonkey script to pull WikiDashboard visualization into Wikipedia — I made a LazyWeb plea for this last week, and Paul Irish came through
Lee Byron's Fireflies, anaglyph 3D game for Mac — part of Kokoromi's Gamma 3D showcase of anaglyph games
Flickr Boundaries, tool to explore Flickr's shapefiles — read Tom Taylor's entry for more information
Cooking Mama, the Unauthorized PETA Edition — a strangely obscure target for their attention, with a petition to write to the game's publisher (via)
Boing Boing launches gaming blog, Offworld — good writing in a nice design from Brandon Boyer, former news editor of Gamasutra
"Violet" wins the Interactive Fiction Comp 2008 — play it online; glancing at the charts, it looks like Buried in Shoes was the most divisive
Trailer for J.J. Abrams' Star Trek prequel — looks surprisingly good, but I'm a sucker for origin stories; I even liked Enterprise
What would Depression 2009 look like? — Tim sums up the thought-provoking Boston Globe article
The Pirate Bay hits 25 million simultaneous peers — that's not unique people, but concurrent connections; Napster peaked at 26M users
Peter Hirschberg releases Adventure as a free iPhone app — related: Chasing Ghosts will finally be released on BitTorrent Showtime in December (via)
The Big Picture on the California wildfires — also: first-person coverage on Twitter and YouTube, like this freeway on fire and aftermath
Tim-Tams available at Target until March, first time available in the U.S. — best chocolate cookies ever, the Tim Tam Slam is a chocolaty revelation (via)
JS-909, a Javascript drum machine without Flash — through a hack, it even works in IE 6
November 14, 2008
Esquire's hosting Between, the new two-player networked game by Jason Rohrer — from the creator of Passage
"What's that buzzing noise from my BBQ?" — he thought he was killing a few bees, but ends up annihilating an entire colony (via)
November 13, 2008
Kottke explains how to embed high-quality YouTube videos — I knew how to save, link, and change the default, but the embedding hack was new to me
Web 2.0 Origami — lazyweb, please build a converter that creates folding patterns from an uploaded image
Pixar's Burn-E short on YouTube — here's an interview with the director
Valleywag folded into Gawker, all but Owen Thomas laid off — I won't miss it; they hurt a lot of good people and interesting projects in the quest for pageviews (via)

Andy Baio lives here. Some rights reserved, for your pleasure.