Synesthetic Phonetics Part 2

Use this fiddle to color your own lyrics!


GitHub for the CLI


In my previous blog post I described a program I wrote that colorizes the phonetic patterns in song lyrics. I included a few examples demonstrating what the program can do, but I didn't give others the opportunity to play with it. After a year and change, I decided to implement the changes necessary to expose this nifty program to the public. The result is above. Enjoy!


If you’re looking for implementation details on the colorization of lyrics, see my previous blog post.


When AWS Lambda was announced, I knew it would be the perfect avenue for this project. The serverless, atomic nature of Lambda suits a straightforward I/O application like this one. This section describes how I retrofitted my program to work with AWS Lambda. To summarize: a JavaScript front-end POSTs a request to an AWS API Gateway endpoint, which routes the input to an AWS Lambda function that returns the colorized lyrics as HTML. Using AWS Lambda + API Gateway did not come without challenges. I tried a number of misguided workarounds to get my application working, but I’ll only describe what actually worked.
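The front-end's request is nothing exotic: a plain-text POST that API Gateway forwards to Lambda. Here's a minimal sketch of that request using Python's standard library; the endpoint URL is a placeholder, not the real one.

```python
import urllib.request

# Placeholder endpoint -- the real API Gateway URL is not shown here.
url = "https://example.execute-api.us-east-1.amazonaws.com/prod/colorize"

lyrics = "His palms are sweaty, knees weak, arms are heavy"

# POST the raw lyrics as text/plain; API Gateway's mapping template
# wraps the body in JSON before it reaches the Lambda function.
req = urllib.request.Request(
    url,
    data=lyrics.encode("utf-8"),
    headers={"Content-Type": "text/plain"},
    method="POST",
)

# response = urllib.request.urlopen(req)  # would return the colorized HTML
```

The browser front-end does the equivalent with an XMLHttpRequest/fetch POST; any HTTP client that can send a plain-text body works the same way.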

Supporting Libraries

The AWS Lambda execution environment is a specific flavor of 64-bit Amazon Linux that does not include 32-bit libraries. I originally developed this application on 64-bit Ubuntu with 32-bit compatibility libraries installed. eSpeak, the program I used to convert text to phonetic symbols, is also not available on the Lambda flavor of Linux, so I had to produce a portable 64-bit build of eSpeak. Solving this problem was a headache and a half, but the solution ended up being relatively straightforward. I pulled the eSpeak source off SourceForge and spun up an EC2 instance running the same flavor of Linux that Lambda uses. There I compiled the source with a few tweaks. By default, eSpeak expects its dictionary files to live in /usr/share/espeak-data, but a Lambda function doesn’t have permissions on that path, so I made a config change to load the dictionary files from the folder where the program executes. eSpeak also expects an audio library at build time because one of its main functions is text-to-speech. Luckily, commenting out the audio compilation steps in the Makefile worked without breaking everything. Lessons learned:

  1. Don’t try to port an application to another environment without stepping into that environment.
  2. Don’t fear the Makefile, but respect the Makefile.
  3. Portable programs often require nontrivial additional work.

Supporting Python3

At the time of writing, AWS Lambda did not support Python3. This recently changed.

AWS Lambda also supports only a few specific runtimes: Python 2.7, Java 8, Node.js, and .NET. My application was a mix of bash, Python3, and a program fetched from APT (espeak). Luckily, you can shell out to the system from any of these limited runtimes, and a python3 binary just so happens to be available on the host. So, with this workaround, Python3 is technically supported. I believe other scripting languages like Ruby are also reachable using this hack.
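To make the workaround concrete, here's a sketch of what a handler on the Python 2.7 runtime can do; the handler body is a stand-in for the real colorizer invocation, not the actual code.

```python
import subprocess

# Hypothetical Lambda handler running on the Python 2.7 runtime.
# The real function would invoke the bundled colorizer script with
# python3; here we just demonstrate that shelling out to python3 works.
def handler(event, context):
    script = "import sys; print('python' + str(sys.version_info[0]))"
    out = subprocess.check_output(["python3", "-c", script])
    return out.decode("utf-8").strip()

print(handler({}, None))  # -> python3
```

The supported runtime is just the launcher; the actual work happens in whatever interpreter the subprocess call reaches.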

I/O Workarounds

AWS Lambda really wants you to use JSON. It’s understandable; I’m sure most applications using Lambda are talking to other applications. However, I wanted plain text in and HTML out. This blog post was very useful for returning HTML from Lambda + API Gateway. To accept plain text as input, I used the Integration Request template “Method Request passthrough,” which maps the body of the request to a JSON element called “body-json”. The Lambda function I wrote reads in this JSON element. You cannot (easily) avoid the JSON in AWS Lambda.
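Putting the pieces together, the handler's I/O shape looks roughly like this. The colorizing step is stubbed out; only the “body-json” key comes from the mapping template described above.

```python
# Sketch of the Lambda handler's I/O shape. API Gateway's
# "Method Request passthrough" template puts the raw POST body
# under the "body-json" key of the event dict.
def handler(event, context):
    lyrics = event.get("body-json", "")
    # Stand-in for the real eSpeak + coloring pipeline.
    colorized = "<pre>{}</pre>".format(lyrics)
    return colorized

html = handler({"body-json": "mom's spaghetti"}, None)
print(html)  # -> <pre>mom's spaghetti</pre>
```

API Gateway then returns that string to the browser with a text/html content type, which is the part the linked blog post explains.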


I’m glad I decided to finish this project. This experience gave me the opportunity to learn about the intricacies of serverless deployments and how to make them work. I will certainly consider using serverless architecture providers like AWS Lambda in the future.

Synesthetic Phonetics

“Lose Yourself” by Eminem, phonetically colored.

This blog post explores synesthesia as it relates to music and lyrics. I wrote a program that colorizes song lyrics to expose the complex rhyming patterns used by talented lyricists.

What follows are the lyrics to Eminem’s “Lose Yourself” paired with their International Phonetic Alphabet notation. I recommend listening to the song first, or while scrolling through this frame, to fully experience the patterns.


Synesthesia is really weird. To quote Wikipedia, synesthesia is “a neurological phenomenon in which stimulation of one sensory or cognitive pathway leads to automatic, involuntary experiences in a second sensory or cognitive pathway.” Some people see colors when they see letters or numbers, some see numbers in two- or three-dimensional space, and some even taste guacamole when they hear the word “Chipotle” (ok, maybe that’s everyone). I’ve always been fascinated by synesthesia and its possible practical applications. If numbers were innately colored, not just as individual digits, what kind of patterns would emerge? What if words were colored? Would they be easier to read? A form of synesthesia you’re experiencing right now is the association of these words with sounds in your mind… unless you’re mute. Let’s go further. Some forms of synesthesia combine audible experiences like music with colors. Could lyrics in music evoke color?

Patterns in Music

Music theory in a nutshell: music sounds good because it follows certain patterns. As demonstrated in my previous blog post, even a little structure can make random garbage sound good. The best music, however, creatively combines various structures in ways that engage the listener. The most standard way to visualize music is notes drawn on a set of bars that denote what is to be played, which lets the musically talented translate what they see into sound. As I write this post, I’m imagining colored sheet music. It would be ugly, but wouldn’t it be convenient if every note had its own color? Sure enough, someone thought of it first:

Colored musical notes.

I’m surprised this isn’t more common. The most useful aspect of this coloring would be for notes that are far above or below the staff. I’d like to see something like this, without the letters, on a more complex piece. Another project for another day.

Poetry in Motion

The words that artists use in their music add an entirely new dimension to the art form. Combine poetry and music, and what you get is a song. Poetry is an art unto itself, with various techniques and patterns lyricists use to make words sound more pleasing. The most common of these is rhyme, and there is an incredible amount of depth in the subject. Many songs use rhyme, but rap relies on the synergy of words and their collective sounds more than any other genre. Check out this video for a demonstration of the outstanding detail in the rhyme of Eminem’s “Lose Yourself”:

This fantastic video on rhyming inspired me to write this program, and it made me wonder: what if each phonetic sound had a unique color superimposed over the lyrics? Wouldn’t it be cool to see the lyrical detail exposed in color?


So that’s what I did. I made a tool that colorizes the most common phonetic sounds in song lyrics and converts them to an HTML page for people to view. From here I’ll go into the technical details of how I wrote this program.

Words to Sounds

First, I began looking for ways to convert the complex English language into phonetic symbols like one might find in the dictionary. I knew that this task alone was an enormous undertaking, so I furiously googled for a program that already existed. After multiple attempts to use third party libraries or scrape Wiktionary, I found the solution under my nose. Linux distributions usually come preinstalled with a program called eSpeak. This made converting lyrics to the International Phonetic Alphabet as easy as

cat loseYourself | espeak --ipa -q

This funnels the lyrics into eSpeak and outputs the words in IPA. The -q flag prevents eSpeak from audibly speaking the lyrics. eSpeak is no Eminem.

eSpeak has some limitations. It implies an interesting, possibly European, accent and will never be able to capture slant rhymes. Some words may be translated into phonetic sounds we don’t really use in America, but it’s close enough.

Sounds to Colors

The next challenge was converting the newly phonetic lyrics into some sort of meaningful color scheme. First, I needed a method of coloring. I looked into libraries that render text to an image, like the ImagingBook Python library, but I needed something faster. Then I remembered that the Linux terminal supports colors! Luckily, someone had already written a Python library for formatting terminal output with color: termcolor. Next, I needed to figure out how to color these lyrics.

I considered searching for rhyme patterns directly, but that became too complex. Instead, I decided to give each IPA symbol its own color and let the individual colors work together to expose the rhyme patterns. However, the terminal offers a limited number of colorings (6 foreground colors and 6 background colors). This limitation was unfortunate, but I think it keeps the final product from becoming too cluttered. To cope with it, I decided that vowel sounds matter most for rhyming patterns, so I filtered out everything but the vowel sounds and vowel modifiers. IPA has a number of modifiers that change the way a character sounds; the most common is ‘ː’, which indicates a long vowel. I paired each modifier with its associated vowel and treated the pair as a single unique sound character. For example, ‘uː’ represents the long ‘u’ sound.
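To make this concrete, here's a small sketch of the idea: filter an IPA string down to vowels, fold the ‘ː’ modifier into the preceding vowel, and hand out the limited colors by frequency. The vowel set is abbreviated, the ANSI codes stand in for what termcolor emits, and none of this is the program's actual code.

```python
from collections import Counter

# Abbreviated IPA vowel inventory; the real program uses more symbols.
VOWELS = set("aeiouɪʊɛæʌɔəɜ")

def vowel_tokens(ipa):
    """Keep only vowels, folding a following length modifier into
    the vowel so that 'uː' is treated as one sound character."""
    tokens = []
    for ch in ipa:
        if ch in VOWELS:
            tokens.append(ch)
        elif ch == "ː" and tokens:
            tokens[-1] += ch  # pair the modifier with its vowel
    return tokens

# Six background + six foreground colorings, matching the terminal's
# limits; the most frequent sounds get the backgrounds that stand out.
CODES = ["\033[41m", "\033[42m", "\033[43m", "\033[44m", "\033[45m", "\033[46m",
         "\033[31m", "\033[32m", "\033[33m", "\033[34m", "\033[35m", "\033[36m"]

def build_palette(tokens):
    """Map the 12 most common vowel tokens to terminal color codes."""
    counts = Counter(tokens)
    return {t: CODES[i] for i, (t, _) in enumerate(counts.most_common(12))}

tokens = vowel_tokens("luːz jɔːɹsɛlf")  # ['uː', 'ɔː', 'ɛ']
palette = build_palette(tokens)
```

Each colored phonetic line is then just the IPA text with these escape codes wrapped around the matching vowel tokens.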

Given these phonetic vowel characters, I found the 12 most used and gave each a unique color. The most popular characters got the background colors because they stand out the most. Once I had a reliable coloring scheme, I had a bunch of gibberish on the screen that was color coded for some reason. To make this readable, I printed each plain English line before its colorized phonetic line. There was some additional black magic hackery to make the lines match properly, but that’s less interesting. Finally, I used this script to convert the ANSI output to HTML and added some of my own formatting. The end product is fairly readable and the patterns are evident. Here’s one of my favorite verses from the Wu-Tang Clan’s “Da Mystery of Chessboxing”:


Synesthesia is a cool concept/phenomenon that can enhance our perception of reality. I think mixing senses helps us grok things more quickly, and there are plenty of other ways we can combine senses for our collective benefit. This tool is my first stab at it, but maybe I’ll make something better in the future. I have an Oculus Rift DK2, which I noticed has a distinct lack of smell-o-vision.

So this is the part where I link to the code on GitHub, right? The code is pretty nasty right now, so I think I’ll take some time to clean it up before sharing. However, if you have a sick rhyme that needs coloring, let me know.

I promise my next blog post won’t be about music!

Creating a Pop Ballad Generator

Randomly Generated Content

Procedurally generated content is content created by algorithms. The technique originally served to save space for video games on systems with limited memory, but game developers soon discovered that they could create near-infinite unique experiences by procedurally generating content with the power of random number generators.

Game devs continue to improve their techniques for random content generation; random numbers let them multiply the creative potential of their works. One of the most popular examples is Minecraft, which creates an entirely new world every time a player starts a new game. While algorithms guide the creation of each world, every player’s world is unique: a wonderful hodgepodge of forests, oceans, deserts, tundra, caves, enemies, and treasures. This trend has taken off and provided humanity with some of the world’s best gaming experiences.

A thing of beauty.

Randomly Generated Music

I continue to enjoy games that use this type of content creation. In fact, it may be my favorite genre of games. I suppose it was only a matter of time before my obsession led me to think of other mediums where we could use random numbers to procedurally generate content. The first thing that came to mind was music. I’m not going to tell you that I’m the first person to think of this concept. However, I want to present my attempt at outsourcing musical creativity to the power of computers and random numbers.


Middle school band, where dreams are made

As a quick musical autobiography: I played piano for two years starting in 2nd grade and trumpet for four years from 5th to 8th grade. In middle school I played in the jazz band, where our instructor introduced us to a beautiful thing: improvisation. When learning to improvise, students are given a set of rules: play at X beats per minute and use these seven specific notes (a key) in any octave. Given those rules and some minor proficiency in scales, basic improvisation isn’t very hard. By mixing up the rhythm and the seven given notes, improvisation almost comes naturally.

Pop knows best, right?

Have you ever thought that all pop music sounds the same? Have you ever been frustrated with hipsters who say these things? Well, there might be some truth to their ire. There is a common formula (read: algorithm) behind many of the pop songs we hear on the radio and television: any key combined with a special chord progression, I-V-vi-IV. It is truly stunning how many songs use this chord progression to drive their theme. Somehow it manages to capture the human ear in a special way that unites Western culture. Could the popularity of the chord progression simply be self-perpetuating? Maybe, but I think there’s more to it than that. I’m not a music theorist or any sort of sociologist; I’m a software engineer. So I’ll just write something in JavaScript that’s useful for about five minutes before people move on to the next thing on the internet.

Here’s a fun example of the four chords in action. There’s no denying its influence.

Bringing it all together

So now I’ve gone over a few concepts: Random content generation, improvisation, and the pop mega chord progression of your dreams/nightmares. Based on these concepts, let’s make some really naive assumptions.

  • Random content generation is awesome.
  • Improvisation is easy when given a beat and a key.
  • Music is easy to make in I-V-vi-IV.

Given these assumptions, anyone can make a hit pop record by randomly playing notes in a key over the pop chord progression. Even a computer. I ran with this idea and created the music generator at the top of the page. Listen for a moment and it sounds like a chaotic pop ballad, hence the name!


Now I’ll go over the technical details of the project. If you’re not technical or familiar with music theory, this section may get a little hairy.

Picking a platform

I wanted a computer to attempt to make random music in a key with a special chord progression. I also wanted people I know to be able to use it free of charge. The only platform I know of that’s ubiquitously available in that manner is the web browser. JavaScript runs on almost every web browser, from your phone to your PC to your Mac. This availability made it perfect for my program. And luckily, it has a relatively simple API for generating sound.

Determining frequencies

JavaScript’s audio API, AudioContext, lets you create pure oscillators that take an input frequency and play a basic waveform until you tell them to stop. There isn’t any built-in logic for musical notes, so I needed to set up the math to work within the framework of modern music. I was surprised to learn that there is a very specific equation, based on a constant derived from a fractional exponent, for determining musical notes. Given this constant, you can determine each note of a scale by taking the appropriate number of half steps. Below is the equation for the nth half step above a base note, which in this case is the key.
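For reference, that relationship is the standard equal-temperament formula, where the constant (the `a` in the code below) is the twelfth root of two:

```latex
f_n = f_0 \cdot a^{\,n}, \qquad a = 2^{1/12} \approx 1.059463
```

Each half step multiplies the frequency by a, so twelve half steps exactly double it; A4 at 440 Hz puts A5, one octave up, at 880 Hz.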

Deriving scales

Given this equation, I was able to generate 7-note scales from a base key. Scales follow another formula that determines each note: two whole steps, a half step, three whole steps, and finally one half step. If this sounds completely foreign, take a look at a piano keyboard. Each directly adjacent key is a half step, so if there is a black key in between, the two white keys are a whole step apart. The C scale is a great example because it uses no black keys.

A scale starting at C; E-F and B-C are half steps.

Here’s the for loop I use to build the array of frequencies for a scale. The base variable comes from a set of constant frequencies for each key, multiplied by the desired octave. The range is the number of notes in the scale we want to generate.

// Build scale
var buildScale = function () {
  notes = [];
  var step = 0;
  for (var i = 0; i < range; i++) {
    notes[i] = base * Math.pow(a, step);
    // handle half steps: scale degrees 3-4 (E-F) and 7-8 (B-C)
    if (i % 7 != 2 && i % 7 != 6) {
      step += 2; // whole step to the next note
    } else {
      step += 1; // half step to the next note
    }
  }
};

Chord Progression

Chord progressions, in a nutshell, are a sequence of chords played behind a song to help drive its theme and feel. A chord can be loosely defined as three notes played simultaneously. Now that I have an array of notes to pick from, it’s easy to define the notes I need to play for each chord in the progression. Here is the data structure I use:

//I   V    vi     IV
//Standard Pop Progression
chordProg = [
  [0, 2, 4], //I
  [4, 6, 8], //V
  [5, 7, 9], //vi
  [3, 5, 7]  //IV
];

The Beat

You can’t make music without a beat. For simplicity, I use four beats per measure and change the chord on each measure. I originally implemented this music generator with four quarter notes per measure, but I knew that real improvisation mixes up the duration of notes along with their frequencies. Currently, each time the length of a minimum note elapses, the program randomly chooses whether to change the note, doing so minNote/qtrNote * 100% of the time. The default minimum note is an eighth note, so the melody changes notes 50% of the time on every eighth note. This isn’t the greatest variety, but I think it gives just enough to make things interesting without going off the rails. The random note length and time signature implementation certainly have room for improvement.

The Notes

Given the scale and the range, the program selects a new, random note in that range each time the program decides to change note length. This has the added benefit of further randomizing note length given the chance that the same note is played.

Chord Progression + Beat + Notes = Jam

Let’s walk through the melody function. To preface, the melody function is called on a JavaScript interval. Before the interval is defined, the oscillators melody, chor1, chor2, and chor3 are started and continue to generate sound until the program is stopped. Every interval (defined in ms), the function runs again. The interval length is the minimum note we expect to play; in the program’s configuration, this is set to an eighth note. (An eighth note at 100 bpm is 300 ms.)

//run melody function based on minimum note length
melodyInt = setInterval(melodyFun, minNote);

//melody function
var melodyFun = function () {
  //chord progression: move to the next chord at the start of each measure
  if (time % (notesPerMeasure * (qtrNote / minNote)) == 0 && time != 0) {
    chord++;
    chord %= 4;
    chor1.frequency.value = notes[chordProg[chord][0]];
    chor2.frequency.value = notes[chordProg[chord][1]];
    chor3.frequency.value = notes[chordProg[chord][2]];
  }
  //Random note length: with qtrNote/minNote == 2 this is true about half
  //the time (time % 0 is NaN, which never equals 0)
  if (time % (Math.floor(Math.random() * (qtrNote / minNote))) == 0) {
    //random note, index 0 to range-1
    var note = Math.floor(Math.random() * range);
    melody.frequency.value = notes[note];
  }
  time++;
};

Everything is based on an integer time counter and the minimum note. If you want to simplify the logic, imagine qtrNote == minNote. Through liberal use of the modulo operator, we determine when to change chords and notes. The chord changes every measure, i.e. every (predefined) 4 quarter notes per measure. The chord index runs through the 2D array we defined earlier, assigning each chord oscillator the correct frequency from the notes array that holds our scale. The melody randomly decides to change notes at a rate dependent on the shortest note possible, as discussed above. Then the function picks which note to play within the scale and range and sets the oscillator to that frequency.

The Final Product

So there you have it, a random pop ballad generator. I’d like to note that this is the most tonally basic implementation possible. It uses four AudioContext oscillators (the maximum), three for the chords and one for the melody. I’m aware the code isn’t perfect. I’m not a JavaScript developer, and I don’t feel like refactoring it.

Play it for more than 5 seconds. I can’t say that it will sound amazing to you, but it will be unique to you.

Room for Improvement

There’s plenty of room for improvement. Here’s a few of my ideas:

  • More chord progressions
    • This shouldn’t be too hard to implement based on how the code is structured, but it isn’t there today.
  • Minimum note length toggles
    • I want to add a “solo” button that temporarily lowers the minimum note to a sixteenth note.
  • New time signatures
  • Pauses
  • Stock percussion?
  • Harmony? Unlikely given the current limits of AudioContext.
  • Move to GitHub (if there’s ever enough pressure I will, but right now it’s not a huge priority)