A very weird experiment involving word embeddings.

Rainbow Zero is a... toy? widget? thingy? that allows you to explore a part of the space defined by the GloVe word vectors.

Before trying to explain any of the details of this strange gadget, a brief description of how to interact with it:

  • After a bit of loading you'll be presented with a description of "where" you are and which "way" you are "facing". You will always begin at the word "rainbow" and be facing the word "zero".
  • You'll also get a list of words that are "nearby" and a little log of messages.
  • You can "move" by clicking the "Step zero-ward" button -- this will move you along the direction you are currently facing.
  • If you want to turn toward a different direction, enter a word in the text box above the "Turn" and "Teleport" buttons, then click "Turn".
  • If the word is known to Rainbow Zero your direction will be changed so that you are "looking towards" the word you entered from your current position; otherwise you'll be told the word isn't recognized.
  • You can also click the "Teleport" button to change your current location to the location of the word you've entered.
  • Note that everything is a bit slow, since there is rather a lot of mostly un-optimized JavaScript-y math happening.
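
For the curious, the stepping, turning, and teleporting described above can be sketched in a few lines. This is only an illustration in Python, not the JavaScript that Rainbow Zero actually runs: the toy 3-dimensional vectors, the function names, the fixed step size, and the use of straight-line (Euclidean) distance for "nearby" are all assumptions made for the example.

```python
import math

# Toy 3-dimensional vectors invented for this example. Real GloVe vectors
# have far more components and entirely different values.
VECTORS = {
    "rainbow": [0.0, 0.0, 0.0],
    "zero": [4.0, 0.0, 0.0],
    "color": [1.0, 0.5, 0.0],
    "one": [3.5, 0.5, 0.0],
}


def distance(a, b):
    """Straight-line distance between two points (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def turn(position, word):
    """Return a unit vector 'looking towards' word, or None if unknown."""
    target = VECTORS.get(word)
    if target is None:
        return None  # the word isn't recognized
    delta = [t - p for p, t in zip(position, target)]
    length = math.sqrt(sum(d * d for d in delta))
    if length == 0:
        return None  # already standing on the word; no direction to face
    return [d / length for d in delta]


def step(position, facing, step_size=1.0):
    """Move along the facing direction; step_size is a hypothetical value."""
    return [p + step_size * f for p, f in zip(position, facing)]


def teleport(word):
    """Return a copy of the word's location, or None if unknown."""
    target = VECTORS.get(word)
    return None if target is None else list(target)


def nearby(position, count=3):
    """The count closest known words to the current position."""
    return sorted(VECTORS, key=lambda w: distance(position, VECTORS[w]))[:count]


# Begin at "rainbow" facing "zero", then take one step zero-ward.
position = teleport("rainbow")
facing = turn(position, "zero")
position = step(position, facing)
print(position, nearby(position, 2))
```

Note that nothing stops you from stepping past the word you are facing: the direction is fixed when you turn, so repeated steps simply carry on along the same line.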

General Strangeness

I haven't explored too much myself, but here are a couple of things to try / notice:

  1. Teleport to "king" and turn to face "bagel", then step bagel-ward a few times; now teleport back to "king" and face "doughnut" and take a few steps... apparently there is some sort of forest between the monarchy and toroidal foodstuffs.
  2. Getting from "dance" to "beer" takes just two steps!
  3. If you start from "tomorrow" and move towards "yesterday" there seems to be a pretty vast tract of "yesterday"-ness that continues in that direction.

Wait, What?

Because of the work I've been doing on WordSalad recently, I've had a lot of interest in word lists, natural language processing, and so on. One interesting technique that you come across when searching for these kinds of things is the idea of word embeddings or word vectors.

The general idea (and I'll speak vaguely because the topic is complicated and I am not at all an expert) is that you can use some variation of machine learning to examine a large corpus of text and assign each word in the text a (long) set of numeric values. Words with similar meanings (or at least similar usage) should wind up getting similar sets of values.

Since these representations of words are just vectors of decimal numbers, you can do math on them -- and that sometimes reveals interesting things about the relationships between the words and about what has been encoded. You might, for example, be able to compute something like "tree" - "branch" and wind up with a word vector that is close to "trunk".
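
As a rough illustration of that kind of arithmetic, here is a small Python sketch. The 3-dimensional vectors are made up and deliberately chosen so that the subtraction lands near "trunk"; real GloVe vectors are much longer and may not behave this neatly. The function names and the choice of cosine similarity as the measure of "close" are assumptions for the example.

```python
import math

# Made-up vectors, hand-picked so the example works out. These are NOT
# real GloVe values.
TOY = {
    "tree": [1.0, 1.0, 0.0],
    "branch": [0.0, 1.0, 0.1],
    "trunk": [1.0, 0.1, 0.0],
    "leaf": [0.0, 0.9, 0.5],
}


def subtract(a, b):
    """Component-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_word(vector, exclude=()):
    """The known word whose vector points most nearly the same way."""
    candidates = [w for w in TOY if w not in exclude]
    return max(candidates, key=lambda w: cosine_similarity(vector, TOY[w]))


# "tree" - "branch", then look for the nearest remaining word.
result = subtract(TOY["tree"], TOY["branch"])
print(closest_word(result, exclude=("tree", "branch")))
```

The input words are excluded from the search because the result of this sort of arithmetic often stays closest to one of the words you started with.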

It's not unusual to see these word vectors flattened down by one or another sort of analysis into 2D vectors that can be plotted on a graph -- and the result is usually a handful of satisfying clusters of words that are related to one another. That feels somehow ... incomplete to me though...

These word vectors normally have 50, or 100, or hundreds of elements and I wondered what it might mean to try to explore them more directly, in their own nonsensically high-dimensional space.

I started out by grabbing the GloVe 6B package of word vectors. The data set is pretty straightforward, but I did do a bit of preprocessing and clean-up (mostly to remove entries that were more confusing than interesting -- URLs, arbitrary numbers, and the like -- and restructure the whole thing as JSON).
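
A clean-up pass along those lines might look something like the Python sketch below. The GloVe text files store one entry per line (the token followed by space-separated numbers), but everything else here is an assumption: the filter rule (keep only tokens made entirely of lowercase letters), the function name, and the sample lines are hypothetical, and the actual rules used for Rainbow Zero are not spelled out.

```python
import json
import re

# Hypothetical filter: keep tokens made only of lowercase ASCII letters.
# This drops URLs, numbers, and punctuation, but it is just one possible rule.
WORD_PATTERN = re.compile(r"^[a-z]+$")


def parse_glove_lines(lines):
    """Turn 'word v1 v2 ...' lines into a {word: [floats]} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if not WORD_PATTERN.match(word):
            continue  # skip entries that are more confusing than interesting
        vectors[word] = [float(v) for v in values]
    return vectors


# Invented sample lines in the GloVe text layout (values are not real).
sample = [
    "rainbow 0.1 -0.2 0.3",
    "www.example.com 0.5 0.5 0.5",
    "1234 0.9 0.8 0.7",
    "zero -0.4 0.0 0.2",
]

vectors = parse_glove_lines(sample)
print(json.dumps(vectors))
```

With a real file you would pass the open file object in place of `sample`, since iterating over a file yields its lines.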

The strangest part of the whole thing, really, is that there just aren't any words that describe being in or moving through a space with 50 dimensions (though I did eventually trim it down to just 25, mostly so the performance was... tolerable). That's what ultimately leads to the idea of turning to "face" or "look at" a word; you can't meaningfully go up or north but you can, at least in some sense, go towards a word that exists in the space.

Anyway, this is all a bit long-winded, but hopefully it explains something about what is going on with this strange project. I don't know that there is any use for this thing (I suppose it does let you find synonym-like words -- but it's probably the least efficient way to do that ever invented), but it is a very unusual way to navigate a computer's attempt at understanding language.
