
Note: probably broken at the moment. This thing needs more and more memory as GitHub grows, and my poor Linode is way underprovisioned.

The wycats number: the Erdős number or Bacon number of the Ruby community.

How it works:

  1. Yehuda Katz (wycats) has wycats number 0.
  2. If you and wycats have collaborated on the same public GitHub repository, then your wycats number is 1.
  3. Otherwise, your wycats number is one more than the lowest wycats number among the people you have collaborated with on a public GitHub project.
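
The three rules above amount to a breadth-first search outward from wycats. Here is a minimal Python sketch of that idea, not the site's actual code. The `repos` mapping (repository name to set of committers), the function name `collaboration_numbers`, and the sample data are all made up for illustration.

```python
from collections import deque


def collaboration_numbers(repos, center):
    """Return {author: number} for every author reachable from `center`.

    `repos` maps a repository name to the set of authors who committed to it.
    This input shape is an assumption made for the example.
    """
    # Index repositories by author so collaborators can be looked up quickly.
    repos_by_author = {}
    for repo, authors in repos.items():
        for author in authors:
            repos_by_author.setdefault(author, set()).add(repo)

    numbers = {center: 0}  # rule 1: the center has number 0
    queue = deque([center])
    while queue:
        author = queue.popleft()
        for repo in repos_by_author.get(author, ()):
            for collaborator in repos[repo]:
                if collaborator not in numbers:
                    # Rules 2 and 3: one more than the closest collaborator.
                    numbers[collaborator] = numbers[author] + 1
                    queue.append(collaborator)
    return numbers


# Made-up sample data: carol never worked with anyone connected to wycats.
sample_repos = {
    "rails/rails": {"wycats", "alice"},
    "alice/gem": {"alice", "bob"},
    "carol/solo": {"carol"},
}
numbers = collaboration_numbers(sample_repos, "wycats")
```

Anyone who cannot be reached from the center, like carol in the sample, simply has no number.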

Why wycats?
Paul Erdős worked with a whole bunch of different mathematicians, and so he makes a natural center for the mathematical community. Kevin Bacon starred in a whole bunch of different movies, and so he makes a natural center for the set of movie stars. Likewise, Yehuda Katz worked with a whole bunch of people on a whole bunch of important Ruby projects, so he makes a natural center for the Ruby community.

That's dumb. Clearly, so-and-so is the logical choice.
Sure! You can get your so-and-so number too; just type their GitHub username into the box.

Why don't I have a wycats number of 1? I totally have a fork of Rails that I've committed a bunch of stuff to.
Forks don't count. Your commits to Rails have to make it to the canonical repository. Same goes for forks of any other projects.

Can't I just make some public project on my repository, make a couple of commits as me@example.com, and then a couple using wycats's email address?
Yes, but please don't. It's like peeing in the pool: you'll feel good for a minute, but if others notice, they'll think you're a jerk.

The technology behind this is composed of three main parts: the fetcher, the API, and the page.

The fetcher is the GitHub-crawling robot that's responsible for fetching collaboration data for public repositories. It's written in Ruby and is basically just a set of Resque jobs that use the GitHub API to populate the database and enqueue other Resque jobs.
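
To make the "jobs that enqueue other jobs" idea concrete, here is a rough Python sketch of that crawl pattern. It is not the real fetcher, which is Ruby on Resque. The two callables `fetch_contributors` and `fetch_repos_for` are hypothetical stand-ins for GitHub API requests, a plain `deque` stands in for the Resque queue, and a dict stands in for the database.

```python
from collections import deque


def crawl(seed_repo, fetch_contributors, fetch_repos_for):
    """Crawl outward from `seed_repo`; return {repo: set of contributors}.

    `fetch_contributors(repo)` and `fetch_repos_for(author)` are hypothetical
    callables standing in for GitHub API requests.
    """
    collaborations = {}                   # stands in for the database
    jobs = deque([("repo", seed_repo)])   # stands in for the job queue
    seen = {("repo", seed_repo)}          # avoids enqueueing a job twice
    while jobs:
        kind, name = jobs.popleft()
        if kind == "repo":
            # A repo job records its contributors and queues one job per author.
            authors = set(fetch_contributors(name))
            collaborations[name] = authors
            follow_ups = [("author", a) for a in sorted(authors)]
        else:
            # An author job queues one job per repository the author touched.
            follow_ups = [("repo", r) for r in sorted(fetch_repos_for(name))]
        for job in follow_ups:
            if job not in seen:
                seen.add(job)
                jobs.append(job)
    return collaborations


# Made-up data served by in-memory stubs instead of real API calls.
FAKE_CONTRIBUTORS = {
    "rails/rails": ["wycats", "alice"],
    "alice/gem": ["alice", "bob"],
}
FAKE_REPOS = {
    "wycats": ["rails/rails"],
    "alice": ["rails/rails", "alice/gem"],
    "bob": ["alice/gem"],
}
result = crawl(
    "rails/rails",
    lambda repo: FAKE_CONTRIBUTORS.get(repo, []),
    lambda author: FAKE_REPOS.get(author, []),
)
```

The `seen` set is what keeps the crawl from looping forever, since repositories and authors point back at each other.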

The API is the webapp responsible for actually computing the paths between different authors. It's a Compojure application that provides a couple of API routes. It manages to be as fast as it is by building a graph out of all the authors and projects when it starts up and then just keeping that in memory. That makes it a bona fide memory hog, but it's the only way I could get it reasonably fast without using many, many terabytes of disk to store the graph's predecessor matrix.
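
For the curious, here is a small Python sketch of the general approach: hold an author/project graph in memory and run a breadth-first search per query instead of precomputing every path. The real API is a Compojure (Clojure) app, so this is only an illustration. The node encoding, the function names `build_graph` and `find_path`, and the sample data are assumptions.

```python
from collections import deque


def build_graph(repos):
    """Build an undirected graph linking author nodes to project nodes."""
    graph = {}
    for repo, authors in repos.items():
        repo_node = ("project", repo)
        for author in authors:
            author_node = ("author", author)
            graph.setdefault(repo_node, set()).add(author_node)
            graph.setdefault(author_node, set()).add(repo_node)
    return graph


def find_path(graph, source, target):
    """Return a shortest author/project/author/... path, or None if none exists."""
    start, goal = ("author", source), ("author", target)
    if start not in graph or goal not in graph:
        return None
    predecessor = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node[1])
                node = predecessor[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in predecessor:
                predecessor[neighbor] = node
                queue.append(neighbor)
    return None


# Made-up sample data.
sample_repos = {
    "rails/rails": {"wycats", "alice"},
    "alice/gem": {"alice", "bob"},
    "carol/solo": {"carol"},
}
graph = build_graph(sample_repos)
path = find_path(graph, "bob", "wycats")
```

Each query only stores predecessors for the nodes it actually visits, which is the trade against keeping a full predecessor matrix for every pair of nodes.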

The page is this thing you're looking at right now. It's a JavaScript + jQuery application written by some bozo who doesn't really do JavaScript applications. It seems to work, but that's about all you can say for it.

The code for all three parts lives in the same repository; take a look if you're curious.