Waxy.org
Waxy.org is the sandbox of Andy Baio, an independent journalist and programmer living in Portland, Oregon. I created Upcoming.org and some other stuff too.

Contact Me: log@waxy.org or waxpancake on AIM

Memeorandum Colors: Visualizing Political Bias with Greasemonkey

Posted Oct 10, 2008 (Updated Oct 12, 2008)

Like the rest of the world, I've been completely obsessed with the presidential election and nonstop news coverage. My drug of choice? Gabe Rivera's Memeorandum, the political sister site of Techmeme, which constantly surfaces the most controversial stories being discussed by political bloggers.

While most political blogs are extremely partisan, their biases aren't immediately obvious to outsiders like me. I wanted to see, at a glance, how conservative or liberal the blogs were without clicking through to every article.

With the help of del.icio.us founder Joshua Schachter, we used a recommendation algorithm to score every blog on Memeorandum based on their linking activity in the last three months. Then I wrote a Greasemonkey script to pull that information out of Google Spreadsheets, and colorize Memeorandum on-the-fly. Left-leaning blogs are blue and right-leaning blogs are red, with darker colors representing strong biases. Check out the screenshot below, and install the Greasemonkey script or standalone Firefox extension to try it yourself.

Note: The colors don't necessarily represent each blogger's personal views or biases. It's a reflection of their linking activity. The algorithm looks at the stories that bloggers linked to before, relative to all other bloggers, and groups them accordingly. People that link to things that only conservatives find interesting will be classified as bright red, even if they are personally moderate or liberal, and vice-versa. The algorithm can't read minds, so don't be offended if you feel misrepresented. It's only looking at the data.

For example, while Nate Silver of FiveThirtyEight may be a Democrat, he has a tendency to link to stories conservative bloggers are discussing slightly more often than liberal bloggers, so he's shaded very slightly red. (Geeks can read on for more details about how this works.)


Install it!

Greasemonkey users: memeorandum_colors.user.js
Standalone Firefox Extension: memeorandumcolors.xpi

After it's installed, go to any page on Memeorandum and wait a second for the coloring to appear. I hope you like it!


How It Works (Nerds Only)

The first challenge was getting the data. I emailed Gabe Rivera, and he graciously offered a full dump of every blog listed on Memeorandum. This didn't include relationship data showing which blogs linked to which stories, so Joshua and I crawled the site instead. Using the historical archives, we took a snapshot of the site's homepage every six hours for the last three months, about 360 snapshots in all. With a Python script, Joshua scraped the links from the saved HTML to get the link data.
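Joshua's scraper isn't published, so here is only a minimal sketch of the first step, assuming each saved snapshot is a plain HTML file: it collects every link target using Python's standard-library `html.parser`. Deciding which links are blog posts and which are the stories they discuss depends on Memeorandum's page markup, which this sketch doesn't attempt; `LinkCollector` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in one saved homepage snapshot."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical snapshot fragment; real Memeorandum markup will differ.
snapshot = ('<div><a href="http://example.com/story">Story</a> '
            '<a href="http://blog.example.org/post">Blog</a></div>')
print(extract_links(snapshot))
```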

Armed with a spreadsheet of over 50,000 blogger-to-article relationships, we needed some way to find correlations in the data. We used Singular Value Decomposition (SVD), a technique for breaking a matrix down into its component parts. It's extremely flexible, used in applications as diverse as weather prediction, movie recommendations, genome modeling, clustering search results, and image compression.

Inspired by GovTrack's use of SVD to visualize the political spectrum for members of Congress, we attempted to do the same thing for political blogs.

Here's how Joshua describes the methodology:

I created an adjacency matrix, with discussion sites as the rows and the discussed articles as the columns. When a site discusses an article on Memeorandum, we fill in a 1 in that cell; everything else is left as zero.

Every site becomes a very high dimensionality vector into link-space. This is very difficult to visualize. (Unless your monitor displays many dimensions. Mine only has two.) Since a bunch of sites tend to link to the same groups in the same way, we don't need all those dimensions. So, very roughly, what SVD lets us do is reproject the points in space into a new coordinate system, so that the points that are similar are near each other and we know which dimensions are most important. We can take just the most significant ones.

We could use two or three for a nifty visualization, but we wanted to show the bias as a spectrum, which is just a single dimension. In this case, the second most significant dimension (v2) ends up corresponding to linking similarity. The first dimension (v1) corresponds to how much linking they do in general.

Curiously, when running the exact same analysis on Techmeme, the second most significant dimension ends up being Business vs. Technology. (The conservatives/liberals of the geek world?)

Did you get all that? If you'd like to try to figure out what the other dimensions represent, take a look at columns v3-v5 on the full spreadsheet below and let us know if you come up with anything. (We didn't have much luck.)
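To make the procedure concrete, here is a toy sketch using NumPy, not Joshua's actual code: a handful of hypothetical (site, article) pairs become a 0/1 adjacency matrix, and `numpy.linalg.svd` factors it. The site and article names are invented, and scaling each site's coordinate by its singular value is one common convention among several.

```python
import numpy as np

# Hypothetical (site, article) link pairs standing in for the scraped data.
links = [
    ("left_a", "story1"), ("left_a", "story2"), ("left_a", "shared"),
    ("left_b", "story1"), ("left_b", "story2"),
    ("right_a", "story3"), ("right_a", "story4"), ("right_a", "shared"),
    ("right_b", "story3"), ("right_b", "story4"),
]

sites = sorted({s for s, _ in links})
articles = sorted({a for _, a in links})
row = {s: i for i, s in enumerate(sites)}
col = {a: j for j, a in enumerate(articles)}

# Adjacency matrix: 1 where a site discussed an article, 0 everywhere else.
A = np.zeros((len(sites), len(articles)))
for s, a in links:
    A[row[s], col[a]] = 1.0

# Thin SVD: A = U @ diag(S) @ Vt, with singular values sorted largest first.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Each site's coordinate along the two most significant dimensions.
v1 = U[:, 0] * S[0]
v2 = U[:, 1] * S[1]
for name, x in zip(sites, v2):
    print(f"{name:8s} {x:+.3f}")
```

In this toy data the first dimension gives every site the same sign, while the second puts the two "left" sites at +1 and the two "right" sites at -1 (or the reverse, since SVD's signs are arbitrary). With real data you would orient the axis by checking a blog whose leaning you already know.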

Once we'd realized that the second dimension (v2) correlated highly with political leaning, we uploaded the spreadsheet into Google Spreadsheets and created a new column with a normalized score, scaled to the range -1 to 1. The spreadsheet, with all of the sources and their respective scores, is below. (Download the Excel document or CSV if you want to sort or filter the data.)
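The post doesn't say exactly how v2 was normalized. One simple option, sketched below as an assumption, is to divide every raw value by the largest magnitude, which maps the scores into [-1, 1] while keeping zero neutral. `normalize_scores` and the blog names are hypothetical.

```python
def normalize_scores(raw):
    """Scale raw v2 values into [-1, 1] by dividing by the largest magnitude.

    Dividing by max |x| (rather than min-max rescaling) keeps 0 at 0, so a
    site with no lean stays neutral. This is an assumed choice, not
    necessarily the one used for the published spreadsheet.
    """
    peak = max(abs(x) for x in raw.values())
    if peak == 0:
        return {site: 0.0 for site in raw}
    return {site: x / peak for site, x in raw.items()}


# Hypothetical raw v2 values for three blogs.
raw_v2 = {"blog_a": -0.42, "blog_b": 0.05, "blog_c": 0.21}
print(normalize_scores(raw_v2))
```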

After deriving the scores, writing the Greasemonkey script was straightforward. Google offers XML feeds for Spreadsheets, so I queried this public feed of our data using XMLHttpRequest, parsed it, and colored each blog based on its score.
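The coloring itself lives in the JavaScript user script; to keep this page's examples in one language, here is a Python sketch of one plausible score-to-color mapping. The blend from white toward pure blue or pure red is an assumption, not the script's actual palette, and `score_to_hex` is a hypothetical name.

```python
def score_to_hex(score):
    """Map a score in [-1, 1] to a hex background color.

    Negative (left-leaning) scores fade from white toward blue, positive
    (right-leaning) scores toward red; larger magnitudes give stronger shades.
    The exact colors are illustrative, not the ones the script uses.
    """
    score = max(-1.0, min(1.0, score))    # clamp out-of-range input
    fade = round(255 * (1 - abs(score)))  # 255 = white, 0 = full color
    if score < 0:
        rgb = (fade, fade, 255)  # blue channel stays full
    else:
        rgb = (255, fade, fade)  # red channel stays full
    return "#{:02x}{:02x}{:02x}".format(*rgb)


for s in (-1.0, -0.5, 0.0, 0.25, 1.0):
    print(s, score_to_hex(s))
```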

If you have any improvements to the code, please pass them on by emailing me or IMing me using my contact information at the top of the page.


Conclusion

I'd love to know what dedicated Memeorandum fans think of this. For me, it makes the site much easier to skim. At a glance, I can see what left-wing and right-wing bloggers each find interesting and, more importantly, when there's an article that's of genuine interest to both parties. It's also interesting to quickly see which bloggers cross party lines, willing to link to stories that don't favor their own candidates.

I hope you like it, and please contribute your changes to make it better!


Further Reading

Puffinware's SVD tutorial is one of the most concise, coherent explanations of SVD I could find for the layman. Ilya Grigorik applied SVD to build a recommendation system in Ruby, with great explanations and source code. Simon Funk explains how he used SVD to tie for third in the Netflix Prize leaderboard (for a short time).

This was my first Greasemonkey script, and I found Mark Pilgrim's Dive into Greasemonkey invaluable. I highly recommend writing a couple scripts yourself; it's incredibly empowering to modify other people's websites.

A special thanks to Gabe Rivera for building Memeorandum and Techmeme and for supporting this little project.


Updates

October 10: J. Chris Anderson built a bookmarklet for use with non-Firefox browsers, or by anyone who just wants to test it out without installing an extension. This also has the benefit of working on sites beyond Memeorandum, like Google News. (Though, of course, it will only color sites that appear in our spreadsheet.)

October 11: Brendan O'Connor compared our unsupervised machine-derived rankings to human judgments of political bias on Skewz, and found there's a significant correlation. He released the code and full dataset on his entry.


Found Footage: Sarah Palin's 1984 Miss Alaska Pageant Video, Swimsuit Competition

Posted Sep 26, 2008 (Updated Oct 6, 2008)

Somehow, a 22-year-old University of Alaska student named Richard Millay got his hands on a videotape that's eluded the media since John McCain asked Sarah Palin to be his running mate: original footage of her 1984 Miss Alaska Pageant.

Of course, this is all very frivolous and has nothing to do with the current campaign. But like Barack Obama's high school basketball footage, it's a little glimpse into the early life of a highly visible national figure.

In the first part added to YouTube, he posted the portion from the swimsuit competition, prefaced by a brief introduction mentioning the demand for the "88 minutes of Alaska Gold."

Update: The original video was removed, but I managed to save a copy of the relevant footage without Richard's original intro. YouTube's removing every copy of this video, so I'm streaming the clip below from my own server. It won't be removed.


As the future vice-presidential candidate parades on stage, an off-screen announcer reads her early biography: "Contestant #8, Sarah Heath. Sarah says that she wants to prepare for a career in television broadcasting by majoring in Telecommunications and Political Science. It's no wonder that she has also been recognized by Who's Who, since she has displayed her leadership in all areas, from academics to student politics to athletics, having led her basketball team to the championship at the state tournament. Ladies and gentleman, contestant #8, Sarah Heath."

I've emailed Richard asking for a brief interview, and will update here if he gets in touch. (Thanks to Jeff Milner for the original tip this morning.)

Continue reading (174 more words)...

Kickstarter

Posted Sep 23, 2008

I wanted to take a moment to announce that I've joined the board of directors for Kickstarter, a brand-new startup based out of Brooklyn and Chicago.

Kickstarter aims to let creative people of all kinds — journalists, artists, musicians, game developers, entrepreneurs, bloggers — raise money for their projects by connecting directly with fans, who receive exclusive access and rewards in exchange for their patronage. More than just a fundraising app, Kickstarter's a publishing platform where project creators can communicate with the people that are supporting them. (Think Jill Sobule, A Swarm of Angels, or Sean Tevis.)

I was introduced to founders Charles Adler, Perry Chen, and Yancey Strickler by Caterina Fake back in June, and sealed the deal after a trip to NYC to meet the team. They're a great group of guys with a strong vision, and I feel lucky to be involved.

Ultimately, everybody should be able to support themselves doing what they love using the web, and I think Kickstarter will be a great way to get there. Expect to hear more on Waxy.org as launch day gets closer.

To help them on their way, they're currently looking for a CTO to join the founding team. I've been helping guide some of the technology decisions and building the development team, but we're looking for a passionate and talented person to devote themselves to the project full-time.

If you're interested, drop me an email or IM and I'll introduce you!


Cheap, Easy Audio Transcription with Mechanical Turk

Posted Sep 22, 2008 (Updated Sep 23, 2008)

After recording last week's interview, I was left with a 36-minute MP3 and a profound feeling of dread. You see, I hate transcribing audio. I used to transcribe interviews in high school, and it's always tedious, taking upwards of eight times the length of the clip itself.

Bracing for a good four or five hours of rewinding and writing and rewinding, I remembered that this is The Future! So I tossed the job over to the global anonymous workforce at Amazon Mechanical Turk instead.

The result: my 36-minute recording was transcribed while I slept, in less than three hours, for a grand total of $15.40.

This is a fraction of the cost/time of any other transcription service online, including the Turk-driven Casting Words, though you potentially sacrifice some quality. In my experience, though, there were virtually no errors.

Here's how to do it yourself, with no programming knowledge required. The instructions below are verbose, but using my template, it shouldn't take you more than five minutes of setup per job.


Step 1: Prepare your audio.

First, I split my 35-minute audio file into seven five-minute MP3s. Why? Mechanical Turk workers are all working in parallel, so the more discrete tasks, the faster the job gets done. This also diminishes the risk of one bad worker ruining your whole job. (Though you're always allowed to reject bad submissions, and you'll never have to pay for those.)

I used the open-source Audacity to split the files, but you could just as easily use any audio utility or editing software. Optionally, you might want to make each clip overlap by a few seconds, so you'll be able to easily recognize where each segment of the transcript starts and stops.
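If you'd like to plan the cuts before opening Audacity, a few lines of Python can compute where each clip starts and ends, including an optional overlap. This is a hypothetical helper (`plan_clips`); it only produces timestamps and file names, and you still do the actual cutting in your audio editor.

```python
def plan_clips(total_seconds, clip_seconds=300, overlap_seconds=5, prefix="interview"):
    """Plan (filename, start, end) cut points for splitting a recording.

    Clips begin every `clip_seconds`; each clip runs `overlap_seconds` past
    its nominal end (capped at the end of the recording), so adjacent
    transcripts share a few seconds of audio. Times are in seconds.
    """
    clips = []
    start = 0
    index = 1
    while start < total_seconds:
        end = min(start + clip_seconds + overlap_seconds, total_seconds)
        clips.append((f"{prefix}_{index}.mp3", start, end))
        start += clip_seconds
        index += 1
    return clips


# A 35-minute recording split into five-minute clips with 5 s of overlap.
for name, start, end in plan_clips(35 * 60):
    print(name, start, end)
```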

Name the files sequentially. In my case, they were interview_1.mp3 through interview_7.mp3. When you're done, upload the files somewhere they can be downloaded publicly. You'll need the full URLs later.


Step 2: Design your HIT template.

Mechanical Turk jobs are called HITs — short for the dystopic-sounding "Human Intelligence Tasks." After you've signed up as a new Requester on Mechanical Turk, you can design a new template from the homepage using one of several samples. Choose the Default Template.

On the Properties screen, we'll write a short description of the task, define how many people we want to work on it, and how much we're willing to pay them.

For a five-minute MP3, I think allotting two hours per assignment is ample time, and I set the entire HIT to expire after 12 hours because I was in a hurry. As for pay rate, you'll need to determine the "Reward per Assignment" based on the difficulty of the task and what you think is fair. I chose $2.00 per five-minute MP3, or about $0.40/minute. Depending on the difficulty, you might want to try going higher or lower.

I only wanted one worker to attempt each clip, so I changed the "number of assignments per HIT" to 1. (If you want redundant transcripts for each clip, change this to 2 or 3... But be aware your costs will double or triple!)
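To estimate a batch's cost before publishing, multiply clips by assignments by reward, then add Amazon's commission. The fee structure below (10% of each reward, with a half-cent minimum per assignment) is an assumption on my part, though it does reproduce the $15.40 total above: seven clips at $2.00 plus $0.20 commission each. Check current pricing before relying on it; `batch_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def batch_cost(clips, reward, assignments=1,
               fee_rate=Decimal("0.10"), min_fee=Decimal("0.005")):
    """Estimate what a batch costs the requester, in dollars.

    Assumes a commission of `fee_rate` on each reward with a floor of
    `min_fee` per assignment (assumed values; verify against current
    Mechanical Turk pricing).
    """
    reward = Decimal(reward)
    fee = max(reward * fee_rate, min_fee)
    total = clips * assignments * (reward + fee)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(batch_cost(7, "2.00"))                 # seven clips, one worker each
print(batch_cost(7, "2.00", assignments=3))  # three transcripts per clip
```

The same assumptions give $13.20 for 528 two-cent answers, the total reported in the Girl Talk post further down this page.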


Step 3: Design your layout.

On the Design Layout screen, you design the template that gets displayed to each worker, using basic variables that will be substituted later. For this template, we need only one variable, named "${url}". You can call it anything you like.

The basics you'll need are a title, some simple rules, the link to the audio file with a substitution variable, and a text form for the worker to type the transcription into. If you'd like to use my template HTML, here it is. (Make sure you change the path to your own audio files!)

Two things to notice in my example. First, the "${url}" variables will be substituted with values in the "url" column of the spreadsheet we'll create in the next step. Second, every form element you create ends up in your final output from Mechanical Turk, so you can name it whatever you like. I called mine simply "transcription." Here's what the relevant part looks like in the final template:

Please transcribe this five-minute MP3:
<a href="${url}">${url}</a>

Enter your transcription below:
<textarea name="transcription" cols="80" rows="30"></textarea>
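Python's standard `string.Template` happens to use the same `${url}` placeholder syntax, so it offers a quick way to preview what a worker would see for a given clip. This only approximates Mechanical Turk's own substitution, and the clip URL is hypothetical.

```python
from string import Template

# The relevant part of the HIT layout, using the same ${url} placeholder.
layout = Template(
    'Please transcribe this five-minute MP3:\n'
    '<a href="${url}">${url}</a>\n'
    '\n'
    'Enter your transcription below:\n'
    '<textarea name="transcription" cols="80" rows="30"></textarea>\n'
)

# Hypothetical clip URL; use the address of your own uploaded file.
print(layout.substitute(url="http://example.com/audio/interview_1.mp3"))
```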

For the worker's convenience, I also added an embedded Flash player for the MP3, but this is entirely optional.

On the next screen, make sure it looks the way you like, and click "Preview and Finish" to save the HIT template.


Step 4: Upload the data for your HITs.

Once we're done designing our template, we can select it to create a new HIT batch. We'll be creating a simple comma-separated file (.CSV) filled with the data that will be substituted into our template.

On the Publish tab, select the template you just created by clicking the "Select" button.

Amazon then generates a sample CSV for you to fill in with the URLs of your MP3s. Click the link to "Download a sample input file" and open the downloaded CSV in a text editor. If you've done everything right, it should look like this:

url
Hit1_url_data
Hit2_url_data
Hit3_url_data

Replace the "Hit1_url_data" lines with the full URLs to your own MP3 files. For me, this looked something like:

url
http://waxy.org/temp/phonecall_1.mp3
http://waxy.org/temp/phonecall_2.mp3
http://waxy.org/temp/phonecall_3.mp3

And so on. Save the CSV file, and upload it to Amazon. When you're done, your uploaded file should appear, with the number of input lines.
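For more than a handful of clips, you can generate the input file instead of editing it by hand. Here's a minimal sketch with Python's `csv` module, assuming your clips follow the sequential naming from Step 1; the host name and the `write_hit_input` helper are hypothetical. The header row must match the template variable name.

```python
import csv


def write_hit_input(path, base_url, prefix, count):
    """Write a Mechanical Turk input file with one clip URL per row.

    The header must match the template variable name ("url" here).
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url"])
        for i in range(1, count + 1):
            writer.writerow([f"{base_url}/{prefix}_{i}.mp3"])


# Hypothetical host and file names; substitute your own.
write_hit_input("hits.csv", "http://example.com/audio", "interview", 7)
```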


Step 5: Publish your HITs.

Select your uploaded input file, and preview the finished batch of HITs. You'll be able to page through each HIT, seeing exactly what workers will see. Use this opportunity to test that your audio files can be downloaded and heard properly. If it all looks good, click "Next" to confirm and publish your batch.

If you don't have any money in your Amazon Payments account, you'll be prompted to fund it with a credit card. After you've paid, click "Publish HITs" and you're done!

Your HITs will publish out to the Mechanical Turk workers, who will find and work on your task. Depending on the length and number of your MP3s, expect some work back within an hour.

As they're working, you can browse and approve the results. The final output is an exported CSV, a spreadsheet of all the finished work that can be opened in Excel for your review.
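Once the results are in, the transcript pieces need to be stitched back together in clip order. The sketch below assumes the downloaded results file has `Input.url`, `Answer.transcription`, and `AssignmentStatus` columns (check your own header row) and that each URL ends in `_<number>.mp3`; `assemble_transcript` is a hypothetical helper. Sorting on the parsed number, rather than the text, keeps clip 10 after clip 9.

```python
import csv
import io
import re


def assemble_transcript(results_csv):
    """Join non-rejected transcriptions in clip order.

    Assumes "Input.url", "Answer.transcription" and "AssignmentStatus"
    columns, and clip URLs ending in _<number>.mp3 (raises on any that don't).
    """
    pieces = []
    for row in csv.DictReader(io.StringIO(results_csv)):
        if row["AssignmentStatus"] == "Rejected":
            continue  # rejected work is neither paid for nor used
        clip_number = int(re.search(r"_(\d+)\.mp3$", row["Input.url"]).group(1))
        pieces.append((clip_number, row["Answer.transcription"].strip()))
    pieces.sort()  # numeric sort, so clip 10 follows clip 9
    return "\n\n".join(text for _, text in pieces)


# A tiny stand-in for a downloaded results file.
sample = (
    "Input.url,Answer.transcription,AssignmentStatus\n"
    "http://example.com/audio/interview_2.mp3,Second part.,Approved\n"
    "http://example.com/audio/interview_1.mp3,First part.,Approved\n"
    "http://example.com/audio/interview_3.mp3,Unusable attempt.,Rejected\n"
)
print(assemble_transcript(sample))
```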


Conclusion

You'd be insane not to use this for your own transcription projects. Absolutely nothing else comes close in price and speed.

One thought: I suspect it'd get even faster if you split clips into more pieces. I'd bet that splitting into one-minute segments would cut the time by at least half. You could probably offer lower rates for smaller MP3s, too, since the time commitment per task would be lower, drawing more workers to compete for them. If anyone experiments along these lines, please let me know!


Interview with David Winton, Director of "Code Rush" Mozilla Documentary

Posted Sep 19, 2008

First, the bad news. Two days ago, I received a polite email from David Winton, the director of Code Rush, asking me to take the out-of-print documentary off of Waxy.org. As promised, I immediately complied.

Now, the good news: in my reply, I asked David if he'd mind being interviewed, and he agreed! He's an accomplished director and producer, the creator of the Big Thinkers series for TechTV, and the cofounder of Winton/duPont Films, located in San Francisco's Presidio.

We had a wonderful conversation about the film, which revealed for the first time that he's planning to re-release Code Rush digitally and is considering releasing the original outtakes (100 hours of footage) to the public domain on Archive.org.

I wish all my takedown notices were like this! Read on for the full interview, with selected clips from Code Rush, used by permission.

Continue reading (2864 more words)...

Oddpost Co-Founder Launches Bandcamp, Publishing Platform for Musicians

Posted Sep 16, 2008 (Updated Sep 17, 2008)

Ethan Diamond, co-founder of the pioneering webmail service that became Yahoo! Mail, today lifted the veil on his new startup and gave me an exclusive first look.

Bandcamp is a free hosted publishing platform for musicians, taking the technical challenge out of setting up a site — transcoding music into different formats, streaming audio, analytics, payment processing, and so on.

Band websites are often pretty bad, hacked together by a friend of the band with Flash and Dreamweaver, or worse, by the record label. There are exceptions, but mostly, it's a sea of Flash intros, popup windows, mystery navigation, and 30-second sound clips.

Bandcamp is trying to change that, giving every album and track its own page with clean URLs and semantic markup, with the accompanying SEO benefits. Even before launch, they're topping Google results for many searches for song titles of participating bands.

As an infoviz geek, I'm particularly fond of their analytics and audio visualizations. Detailed stats let bands track recent activity on their songs and albums, including where people are coming from, trends over time, and which songs were skipped, played partially, or played in full. A number of real-time audio visualizations in Flash are available on each song's page, and they can be shared and embedded on other websites.

Like Oddpost, the team's small and nimble — only four people, all splitting engineering and design duties. Co-founder/CTO Shawn Grunberger (also formerly with Oddpost and Yahoo! Mail) and two engineers working from Seattle and Vermont round out the distributed team.

Ethan was kind enough to sit down with me on launch day to talk about their inspiration and process developing Bandcamp.

Continue reading (1676 more words)...

Computability: Steve Allen and Jayne Meadows' Computer Video from 1984

Posted Sep 14, 2008 (Updated Sep 15, 2008)

Election coverage, natural disasters, and Wall Street meltdown got you down? Let's go back to a simpler time — 1984! It's morning in America again, and the dawn of a new information age.

Fortunately, one unlikely celebrity couple is here to guide us through the brave new world of spread sheets, data banks, and modems. In Computability, an instructional VHS tape from 1984, comedian Steve Allen and actress Jayne Meadows "take us on a light-hearted but detailed tour of the ways a home computer can change your life by simply using the correct software packages to suit your needs."

The video was originally inspired by the couple's Grammy-nominated "Everything You Wanted to Know About Home Computers," a vinyl LP released by Casablanca/Polygram Records in 1983. The LP's completely unavailable, but thanks to Sammy Reed's wonderfully strange podcast, I was able to recreate the full album. (Stream it below or download the 11 MB MP3.)

With an Apple II, a Kaypro 2, cheeseball computer animation and a grab-bag of corny jokes, this is classic computing from the VHS era. Keep an eye out for references to Wargames, hackers, Boy George, Ronald Reagan, and more.

I've marked the different sections and my own favorite moments in the video's comments, but feel free to add your own on Viddler.

Special thanks go to Dave Cassel from 10 Zen Monkeys for finding and loaning the VHS tape to me, and to Colin Devroe at Viddler for his support and their brilliant service.


Girl Turk: Mechanical Turk Meets Girl Talk's "Feed the Animals"

Posted Sep 10, 2008 (Updated Sep 11, 2008)

Girl Talk's Feed the Animals is one of my favorite albums this year, a hyperactive mish-mash sampling hundreds of songs from the last 45 years of popular music. Gregg Gillis created a beautiful, illegal mess of copyright clearance hell, which you should download immediately. (It's free, but I kicked in $20 for Gregg's legal fund and a copy of the CD.)

Last month, Rex Sorgatz asked about collecting metadata on the album for data crunching. Having already spelunked through Billboard's chart history, I thought that sounded like my idea of a good time.

So I compiled all the data into spreadsheets, used Amazon's Mechanical Turk to collect some additional information, and pulled out a few charts. As always, I've provided CSV downloads for all the data along with the original output from Mechanical Turk, for those interested in experimenting with the platform.

Results

Here's the final spreadsheet with all the collected data. You can download the CSV or browse it using Google Spreadsheets. For more information about how the data was collected with Wikipedia and Amazon's Mechanical Turk, I wrote about my methodology in the next section.

There are 14 tracks on Feed the Animals, with a total of 264 sampled songs. "What It's All About" and "Like This" tie for the most, with 26 sampled songs each, while "Don't Stop" has the fewest at 11. Overall, the album averages about 18.9 sampled songs per track (264 / 14).

The timeline below shows where each sample was triggered across the entire album, as a percentage of the song's duration. (For example, a marker at the 50% mark on the 9th line means that a sample started halfway through track #9, "Hands In the Air.") You can get a sense of the flow of the album, how Gregg spaces samples apart and occasionally switches moods entirely by introducing three samples in quick succession.

Using the sample release dates collected from Mechanical Turk, the chart below shows the median sample age for each track. (The bars above and below each point represent the earliest and latest years for each track.) I was surprised to see a trend — the album uses relatively recent songs for the first three tracks, before taking us back to the late '80s and early '90s for the middle of the album, with the exception of "No Pause." Then, every song from track 9 to the end of the album gets progressively more modern. For the whole album, 1995 was the median year.
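The per-track numbers in that chart come down to a median, a minimum, and a maximum of release years. Here's a sketch using Python's `statistics` module on a few made-up (track, year) pairs; the real input would be the accepted answers from the spreadsheet, and `track_summary` is a hypothetical helper.

```python
from statistics import median


def track_summary(samples):
    """Return {track: (median_year, earliest_year, latest_year)}.

    `samples` is an iterable of (track_number, release_year) pairs.
    """
    by_track = {}
    for track, year in samples:
        by_track.setdefault(track, []).append(year)
    return {
        track: (median(years), min(years), max(years))
        for track, years in sorted(by_track.items())
    }


# Made-up pairs; the real data has 264 rows across 14 tracks.
samples = [(1, 2006), (1, 1999), (1, 2007),
           (2, 1988), (2, 1991), (2, 1979), (2, 2005)]
for track, (mid, earliest, latest) in track_summary(samples).items():
    print(track, mid, earliest, latest)
print("album median:", median(year for _, year in samples))
```

With an even number of samples, `median` averages the two middle years, which is why a track's median can land on a half year.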

The chart below shows the sample release years in more detail, telling another story. Here, we can see how heavily Gregg uses samples from the last three years, and strongly avoids samples for the previous three-year period from 2001 to 2004. (Too old to be cool, but not old enough to be retro?)

I'm sure there's more that can be explored here, so feel free to send on your own analysis.

Methodology

Getting the sample list was easy. I took a snapshot of the album's Wikipedia entry and extracted all the samples using Excel's Text to Columns feature.

Now, I had a spreadsheet of all 264 songs sampled across 14 tracks, with each sample's original artist and song name. But to get the sample's release year, I'd need to go elsewhere. The Last.fm and Yahoo! Music APIs all support album release dates, but during testing, I found that the dates were unreliable. (Compilation albums and reissues led to incorrect dates, and some artist/song searches led to incorrect results.)

Instead, I decided to use human labor to fill in the gaps using Amazon's Mechanical Turk. I created a new request using the new web-based tools for generating HITs (or "Human Intelligence Tasks") from a simple spreadsheet.

I paid $0.02 for each request, with each song verified by two different workers. Each worker was asked to search for the song on Billboard.com, All Music Guide, Wikipedia, or Google, and fill in the original release year. Here's an example of one of the requests.

Within an hour, all but 4 answers were submitted. The median time to finish a request was an impressive 26 seconds. (Amazingly, over 110 answers were completed in under 10 seconds without any errors.)

For 193 songs (about 73%), the two workers agreed on the year, so those answers were approved immediately. For the remaining 71 songs (27%), the workers came up with different answers, so I checked them manually. (In hindsight, I should have required three workers per song to resolve disagreements.)
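That approval logic is easy to automate. This hypothetical sketch splits songs into agreed and contested sets, and adds the majority vote that a third worker per song would have allowed; ties come back as `None` for manual review. The song titles and years are placeholders.

```python
from collections import Counter


def split_by_agreement(answers):
    """Split songs into unanimous answers and ones needing review.

    `answers` maps each song to the list of years its workers submitted.
    """
    agreed, contested = {}, {}
    for song, years in answers.items():
        if len(set(years)) == 1:
            agreed[song] = years[0]   # every worker gave the same year
        else:
            contested[song] = years   # check these by hand
    return agreed, contested


def majority_year(years):
    """Return the year most workers chose, or None if the top answers tie."""
    ranked = Counter(years).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Placeholder data: two workers per song.
answers = {"Song A": [1984, 1984], "Song B": [1991, 1992], "Song C": [2006, 2006]}
agreed, contested = split_by_agreement(answers)
print(f"{len(agreed)} of {len(answers)} agreed ({len(agreed) / len(answers):.0%})")
print(contested)

# With a third worker, most disagreements resolve by majority.
print(majority_year([1991, 1992, 1991]))
```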

Surprisingly, I couldn't find a correlation between the amount of time spent on each task and the error rate. Workers who made mistakes took just as long as the accurate workers.

The spreadsheet below is the source data from Amazon's Mechanical Turk. (View it on Google Docs or download it in Excel format.) The "raw" sheet is the default output from Amazon, while the rest of the sheets are my own edits, breaking out the final set of accepted answers, the responses that were immediately approved, and the ones that were contested.

Overall, it cost me $13.20 for all 528 answers and took a little over two hours, an hourly rate of about $1.64. Simple to use, affordable, and I'll almost certainly use it again — for something a little more interesting next time.

If anyone out there wants to take a pass at getting the sample endings, sample genres, or any other additional metadata with Mechanical Turk or otherwise, send it along and I'll add it to the spreadsheet. Thanks!

Update: If you're in the San Francisco Bay Area, you might want to wrangle an invite to Yahoo!'s Open Hack Day in Sunnyvale tomorrow. Hint, hint.


Pirating the Olympics, Then and Now

Posted Aug 12, 2008 (Updated Aug 13, 2008)

Back in 2004, I wrote about how high-quality videos from the Olympics in Athens were being digitized and posted online, in defiance of the networks and the IOC's rules.

At the time, NBC's online coverage was restrictive by today's standards — mostly highlight clips and no live video, delayed until after the events aired on TV, and required a valid credit card to verify residency in the United States.

But that was four years ago! YouTube hadn't launched yet, HD-quality streaming video on Vimeo was three years away, and BitTorrent and HDTV were only popular with early adopters.

This year, it's much improved, albeit with some caveats. NBC's official video is great quality, if you and your computer can stomach Silverlight (unavailable on non-Intel Macs). Their coverage is fantastic, though still tape-delayed. And, because of IOC regulations forbidding international distribution, NBC won't allow you to download, embed, or transcode any videos for your iPod or phone.

Is this availability enough to satiate the pirates, and what does the quality look like compared to 2004? I went poking through Usenet and some public and private BitTorrent trackers to see.

Usenet

Back in 2004, the place to go for illegal Olympic videos wasn't BitTorrent, popular trackers like Suprnova, or mainstream P2P clients. The best coverage, surprisingly, was found in the old-school Usenet binaries. It was a mish-mash of events, skewed heavily towards events with bikini-clad women, Brazilians, or bikini-clad Brazilian women, but other popular events and the opening ceremonies also showed up.

Today, the event coverage in Usenet is just as sporadic, but the quality is dramatically better. Compare the three videos below. The first is a sample from the gymnastics high bar finals at the 2004 games; the other two show the same footage of Michael Phelps' win in Saturday's 400m IM final, one as seen on NBCOlympics.com and one as a 720p HDTV rip found in Usenet.


Sample Videos (right-click to download):

  • Men's Gymnastics High Bar Finals - Usenet, 2004 (25MB MPEG1)
  • Men's Swimming 400m IM Final - NBCOlympics.com, 2008 (5MB MPEG-4)
  • Men's Swimming 400m IM Final - Usenet, 2008 (15MB MPEG-4)

Here's the full list of Olympics videos currently up on Usenet, as of this evening:

Olympic Games Opening Ceremony (720p)
Football - Group A - Ivory Coast vs. Argentina Extended Highlights
Football - Group B - Netherlands vs. Nigeria Extended Highlights
Football - Round 1 Highlights
Gymnastics - Men's Qualifying - USA
Shooting - Women's 10m Air Pistol Final
Swimming - Men's 100m Backstroke Semifinals
Swimming - Men's 100m Breaststroke Final
Swimming - Men's 200m Freestyle Semifinals
Swimming - Men's 400m Individual Medley (720p)
Swimming - Men's 4x100m Freestyle Final
Swimming - Women's 100m Backstroke Semifinals
Swimming - Women's 100m Breaststroke Semifinals
Swimming - Women's 100m Butterfly Final
Swimming - Women's 400m Freestyle Final
Volleyball - Women's Preliminaries - China vs. Switzerland

Most of these are in alt.binaries.tv, but some are also posted to alt.binaries.multimedia.sports. I'll update this list at the end of next week.

BitTorrent

But the trend for this year is clear — Usenet passed the torch to BitTorrent.

A quick search on Mininova or BTJunkie returns a huge list of every video found on Usenet, plus dozens more, and the list grows hourly. Beyond public trackers, I've seen extensive activity on several private communities. On one, members compiled a list of every event and were slowly adding their own recordings to create a massive archive of Olympics video.

And this is only Day 4! It'll be interesting to see how much of the Olympics was captured, digitized, and uploaded by the end of the games.

Also interesting: If this chart from Mark Ghuneim is accurate, the thirst for pirated Olympics coverage is greatest in China.


Friendfeed and Flickr

Posted Jul 23, 2008

How often is Friendfeed hitting Flickr, and how many Friendfeed users are on Flickr?

We now have a glimpse into Monday's traffic, thanks to a snapshot provided by Kellan and Rabble in their talk, Beyond REST: Building Data Services with XMPP PubSub, presented earlier today at OSCON in Portland:

On July 21, 2008, Friendfeed hit Flickr 2.9 million times to get the latest photos of 45,754 users, of which 6,721 visited Flickr in that 24-hour period, and could have potentially uploaded a photo.

Nearly three million requests for fewer than 7,000 possible updates. Clearly, polling isn't ideal. Don't miss the rest of the slides.
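For a rough sense of how lopsided that is, here's a small Python sketch that reruns the arithmetic from the numbers on the slide. It assumes every request is a single poll for one user's latest photos, and it counts each of the 6,721 users who visited Flickr as at most one useful update (the same simplification as above). The `PollingDay` class and its method names are purely illustrative.

```python
# Back-of-the-envelope look at the Friendfeed -> Flickr polling numbers.
# Assumptions (not from the talk): each request is one poll for one user,
# and each user who visited Flickr that day is at most one useful update.
from dataclasses import dataclass


@dataclass
class PollingDay:
    requests: int      # total polls sent to Flickr in 24 hours
    users_polled: int  # distinct Flickr users whose photos were polled
    users_active: int  # polled users who visited Flickr that day

    def polls_per_user(self) -> float:
        """Average number of polls per polled user over the day."""
        return self.requests / self.users_polled

    def minutes_between_polls(self) -> float:
        """Average polling interval for each user, in minutes."""
        return 24 * 60 / self.polls_per_user()

    def polls_per_possible_update(self) -> float:
        """Requests spent for each user who could have uploaded anything."""
        return self.requests / self.users_active

    def active_fraction(self) -> float:
        """Share of polled users who could have had anything new."""
        return self.users_active / self.users_polled


day = PollingDay(requests=2_900_000, users_polled=45_754, users_active=6_721)
print(f"{day.polls_per_user():.1f} polls per user per day")
print(f"one poll every {day.minutes_between_polls():.1f} minutes per user")
print(f"{day.polls_per_possible_update():.0f} requests per possible update")
print(f"{day.active_fraction():.1%} of polled users were active")
```

With those inputs it works out to about 63 polls per user per day, or one roughly every 23 minutes, and around 431 requests for every user who could have posted something. That gap is the case the talk makes for push-based XMPP PubSub, where updates are delivered when they happen instead of being asked for over and over.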

(Also, at peak times, Flickr currently receives 60 uploaded photos a second, "roughly 10 times the number of people born on Earth per second.")

Waxy Links
October 12, 2008
Ultima creator Richard Garriott launched into space — Lord British spending 10 days on the International Space Station for a cool $30 million
Moral psychology testing on Amazon Mechanical Turk — Brendan O'Connor's blog is one of my new favorites; he works at Dolores Labs
Conplot, a command-line plotter with ASCII art — handy for piping in output from sort|uniq -c and the like
October 11, 2008
Preview of Gomibako, like Tetris with garbage — every object has physical properties, so trash can be crushed, burned, or toppled
October 10, 2008
Drew on the financial crisis — see also: Woot's Google ads (via)
Gary Vaynerchuk on recession-proof marketing and dumb advertising — don't miss the part where he queries his UStream followers in real-time
You Fell Asleep Watching A DVD (via)
October 9, 2008
Content-aware scaling in Photoshop CS4 — from SIGGRAPH demo to product in a year (via)
Unicode Snowman for You — ☃
October 8, 2008
Tuttuki Bako, poke virtual characters in a little box — like Tamagotchi meets Levelhead
Kevin Mitnick on the indictment of Sarah Palin's email hacker — he also touches on his own recent encounter with U.S. customs
Portal: Prelude, extensive fan-made Portal mod, released a day early — GameSpy loved it, but warns that it's very hard
Inspired by an xkcd comic, YouTube adds audio previews for comments — this works nicely for quick speech synthesis with simple URL hacking
NYT to close International Herald Tribune website — I remember when their 2000 redesign blew away everyone with impressive DHTML features
Steven Levy visits Jay Walker's insane personal library — funny, I keep all my priceless artifacts in cardboard boxes in the basement
October 7, 2008
Yahoo! Calendar finally, finally launches redesign — ten years in the making
Chuck Klosterman's Brief History of the 21st Century — like Kottke said, there's too much in here to like; related: Phone Sex AI (via)
YouTube in Super HD! — after it buffers, try clicking "Restart" to get it in sync (via)
Little Big Computer, a virtual electronic 8-bit calculator built in LittleBigPlanet — see also: Mechanical 5-bit Calculator in the Half-Life engine
October 6, 2008
Mail Goggles, Gmail tries to prevent late-night drunk emails — this could also be used to keep you from answering personal email during work hours
DJ Z-Trip's Obama Mix — very listenable pastiche of rock and hip-hop from Pink Floyd to Saul Williams with a strong political undercurrent
This American Life's Another Frightening Show About the Economy — followup to The Giant Pool of Money episode from May
Take on Me: The Literal Version — if songs sang what was happening in the music video (via)
TIGSource's Bootleg Demake competition winners — best game competition ever
Sarah Palin's evening gown entry to the 1984 Miss Alaska pageant — "In Alaska, we have mosquitoes."
Damn It Feels Good To Be a Banksta — Banksta 4 Life
The Big Picture on Yann Arthus-Bertrand's Earth from Above photographs — with convenient Google Maps links for each
October 4, 2008
Keith Loutit's tilt-shifted time-lapse videos — reminds me of an ant colony (via)
The VP Debate on Auto-Tune — it's got a beat and you can dance to it (via)
October 3, 2008
Flickr adds rainbow-vomiting panda feature to Explore — finally, some innovation in the photo sharing space (via)

Andy Baio lives here. Some rights reserved, for your pleasure.