
Robots Weekly: RNNs 101

Posted on 08/24/2018 by Kyle Mensing

What does it stand for?

So, what is RNN short for? Robot News Network, obviously. ::insert joke about major news networks here::

But seriously, RNN stands for Recurrent Neural Network. I’m sure that clarifies things. An RNN is a neural network that applies the same operation to each segment of the input, one segment at a time, carrying what it has seen so far into the next step. In other words, it is a model built around one recurring action. Puzzling out the name doesn’t help as much as getting into the nuts and bolts of how these things work and what they do, so let’s move on.

How do they work? ⚙

RNNs process sequential data in order to provide an output. Sequential data is a string of information such as:

audio
text
DNA
video

These examples all have one thing in common: they unfold in a sequence that determines their meaning. Rearranging the words of Romeo & Juliet creates an entirely new story (or an impenetrable soup of words, but that’s not the point). For the model to “understand” the input, it must process it in sequence.

I’m not going to get into the technical workings, but basically the model uses the context of earlier parts of the sequence to classify the current part. So given the sentence:

I would like a glass of orange ______.

The model will use the words and the order in which they appear to guess that the last word is “juice” (we hope).
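For the curious, here is a toy sketch of that "use the earlier context" idea. It is not a real language model: the function names and the weights (`w_x`, `w_h`, `b`) are made up for illustration, and a real RNN learns its weights from data and works on vectors rather than single numbers. The point is only that one step is reused at every position, and that the running "memory" makes the order of the input matter.

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.9, b=0.0):
    """One recurrent step: mix the current input with the running hidden state.

    The weights here are arbitrary illustrative values, not learned ones.
    """
    return math.tanh(w_x * x + w_h * h + b)

def run_rnn(sequence):
    """Feed a sequence through the same step, one element at a time."""
    h = 0.0  # hidden state starts out empty
    for x in sequence:
        h = rnn_step(x, h)  # the same step is reused at every position
    return h

# Same numbers, different order: the final hidden state differs.
forward = run_rnn([1.0, 2.0, 3.0])
backward = run_rnn([3.0, 2.0, 1.0])
print(forward, backward)
```

Shuffle the input and the final state changes, which is the Romeo & Juliet point in miniature.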

If you want to nerd out on this stuff, check out this post on Hacker Noon.

What can they do?

I’d say RNNs are usually used to create or interpret things. Expanding on the list of sequential data above, RNNs can:

Perform speech recognition: turning an audio file into text
- this is how your phone takes dictation to write text messages from whatever you yell at it
Create music (or something approximating music)
Classify the sentiment of online reviews: for instance, a model could take the text of a review and generate a star rating based on what the text says
- a lot of tech companies are developing this to do all kinds of things with reviews
Translation: take text in one language and output the same text in another language
- pretty sure this is what Google Translate is doing
Generate text
- like I showed in last week’s post

Let’s make some music!

Last week I showed off some RNN text generation. If you want to try your hand at it, check out textgenrnn; there are links to a free, cloud-based tool you can run it on.

This week let’s make some music. For this piece I used a model that had been trained on some jazz music. It turns the notes in the audio into values, which turns music generation into a math problem. It then runs these values through its layers to create a probabilistic model of the music.

What now? The model generates probabilities for how likely one value is to follow another. To borrow an example from text: the letter “h” is pretty likely to follow the letter “t”, but the letter “z” is very unlikely to.
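To make the "h after t" idea concrete, here is a rough sketch that simply counts which character follows which in a scrap of text and turns the counts into probabilities. To be clear, this is a plain counting table standing in for the idea, not an RNN and not the jazz model; `next_char_probabilities` and `sample_text` are names I made up for the example.

```python
from collections import Counter, defaultdict

def next_char_probabilities(text):
    """For each character, estimate how likely each other character is to follow it."""
    counts = defaultdict(Counter)
    # Walk through the text as (current, following) pairs and tally them.
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {ch: n / total for ch, n in followers.items()}
    return probs

# Made-up sample text, just for illustration.
sample_text = "the cat sat on the mat with the other hat"
probs = next_char_probabilities(sample_text)

print(probs["t"])                # followers of "t" with their probabilities
print(probs["t"].get("z", 0.0))  # "z" never follows "t" in this text
```

An RNN gets to its probabilities by learning from far more context than the single previous character, but the output has the same flavor: a score for each possible next value.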

The model then uses these values to create its own piece of robo-jazz. Enjoy!
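If you are wondering what "uses these values to create" might look like, here is a minimal sketch of the sampling step. The `transitions` table is entirely hypothetical (three notes with invented probabilities), and a trained model would produce its probabilities from everything it has heard so far, not just the previous note. The generation loop is the part worth looking at: pick the next note at random, weighted by how likely it is, then repeat.

```python
import random

# Hypothetical table: for each note, how likely each next note is.
# These numbers are invented for illustration, not taken from any model.
transitions = {
    "C": {"E": 0.5, "G": 0.3, "C": 0.2},
    "E": {"G": 0.6, "C": 0.4},
    "G": {"C": 0.7, "E": 0.3},
}

def generate_melody(start, length, rng):
    """Build a melody by repeatedly sampling the next note from the table."""
    melody = [start]
    while len(melody) < length:
        options = transitions[melody[-1]]
        # Weighted random pick: likelier notes get chosen more often.
        next_note = rng.choices(list(options), weights=list(options.values()))[0]
        melody.append(next_note)
    return melody

# A seeded generator makes the run repeatable; change the seed for a new tune.
melody = generate_melody("C", 8, random.Random(0))
print(" ".join(melody))
```

Because the pick is random rather than always taking the most likely note, each run with a different seed gives a different melody.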

Want more AI-made music goodness?