A Better Playlist Shuffle with Go

On the rare occasions that I hit “shuffle” in music players like Spotify, I don’t really want complete randomization. In fact, what I want isn’t very random at all. More than anything, my intention is “please play these songs with as much track-to-track variation as possible”.

Published April 19 2018, updated September 28 2018

Let’s visualize this with an example:

Nirvana: Smells Like Teen Spirit
Nirvana: Come as You Are
Nirvana: Heart-Shaped Box
Nirvana: Something in the Way
Nirvana: Aneurysm
Pearl Jam: Once
Pearl Jam: Daughter
Leviathan: Dawn Vibration
Leviathan: The Smoke of Their Torment
Nile: Black Seeds of Vengeance

A whopping half of the tracks are by the same band. As there are relatively few orderings of this list where there aren’t two Nirvana tracks back to back, a purely random shuffle has a good chance of playing me two Nirvana tracks in a row. I don’t want that - I want those Nirvana tracks spaced out evenly.

A good shuffle of this playlist would play Nirvana every other song, like this:

Nirvana
?
Nirvana
?
Nirvana
?
Nirvana
?
Nirvana
?

Similarly, I would like those two Pearl Jam tracks spaced as evenly apart as possible. If I was distributing them in a vacuum, I’d go for the following:

Pearl Jam
?
?
?
?
Pearl Jam
?
?
?
?

Unfortunately, both those places are already taken, so instead I’ll settle for:

Nirvana
Pearl Jam
Nirvana
?
Nirvana
?
Nirvana
Pearl Jam
Nirvana
?

There are two Leviathan tracks as well, I’d like those spaced out too:

Nirvana
Pearl Jam
Nirvana
Leviathan
Nirvana
?
Nirvana
Pearl Jam
Nirvana
Leviathan

The final track is plotted into the only available spot:

Nirvana
Pearl Jam
Nirvana
Leviathan
Nirvana
Nile
Nirvana
Pearl Jam
Nirvana
Leviathan

Attempting an implementation

Let’s have a go at this using, ehm, Go.

Disclaimer: I used this task to learn Go. Perhaps unsurprisingly, this sort of data processing isn’t really Go’s strongest suit. Feel free to see the companion post on processing Data With Clojure and Go for an implementation in a dynamic functional langugage.

Disclaimer 2: Go code should be indented with tabs. For presentational reasons, I’ve used spaces in the listings here, meaning you can’t expect to copy-paste it without editing. Don’t worry though, there’s a Github link to working code at the bottom.

Disclaimer 3: I probably should’ve used more pointers, but 1) I’m not a fan of mutation and will avoid it if possible, and 2) I don’t think it would make a big difference perfomance-wise for the kinds of datasets this code will work on.

Distributing the Artists

We’ll start with a function Distribute. Its task is to distribute the artist the specified number of times as evenly as possible. The passed in distribution gives us the total amount of tracks, but it’ll also inform us of which positions have already been filled.

func Distribute(distribution []string, artist string, tracks int) []string {
  stepSize := len(distribution) / tracks
  index := 0

}

We’ll now go into an “indefinite” loop that will run until we’ve distributed all the tracks. As with all things recursive and indefinite, we’ll start with the termination condition:

func Distribute(distribution []string, artist string, tracks int) []string {
  stepSize := len(distribution) / tracks
  index := 0

  for {
    if tracks == 0 {
      return distribution
    }
  }
}

Do not run this particular sample. When there are no more tracks we’re done, and return the (now updated) distribution. The implied mutation is a bit gross, but Go does not have built-in persistent data structures, so we’re going with the flow.

Next, we’ll add a clause that identifies a suitable position:

func Distribute(distribution []string, artist string, tracks int) []string {
  stepSize := len(distribution) / tracks
  index := 0
  remainder := stepSize

  for {
    if tracks == 0 {
      return distribution
    }

    if remainder == stepSize {
      // Place track
    }

    // Prepare next iteration
  }
}

With 5 tracks to distribute among 10 tracks, stepSize will be 2. We cannot simply drop the one track every 2 indexes, as they might be occupied. So we use the remainder to flag when to position a track. The remainder is initialized to the stepSize to force a track in the first available position.

func Distribute(distribution []string, artist string, tracks int) []string {
  stepSize := len(distribution) / tracks
  index := 0
  remainder := stepSize

  for {
    if tracks == 0 {
      return distribution
    }

    if remainder >= stepSize {
      index = IndexOf(distribution, "", index)
      distribution[index] = artist
      remainder -= stepSize
      tracks--
    }

    // Prepare next iteration
  }
}

When the remainder is at least as much as the required step size, we find the first available position from the current index. IndexOf is a function I wrote. In Go, every type has a “nil type”, and for strings this is "". This is why we look up the first index in the distribution that holds an empty string, starting at the desired index.

We then put the artist into the position we found, subtract the step size from the remainder, and reduce the number of tracks to distribute. To prepare the next iteration, we increment index and remainder:

func Distribute(distribution []string, artist string, tracks int) []string {
  stepSize := len(distribution) / tracks
  index := 0
  remainder := stepSize

  for {
    if tracks == 0 {
      return distribution
    }

    if remainder >= stepSize {
      index = IndexOf(distribution, "", index)
      distribution[index] = artist
      remainder -= stepSize
      tracks--
    }

    index++
    remainder++
  }
}

This will work so long as the stepSize is an integer. Let’s replace one of the Nirvana tracks with something else, and see what happens:

Nirvana: Smells Like Teen Spirit
Nirvana: Come as You Are
Nirvana: Heart-Shaped Box
Nirvana: Something in the Way
Pearl Jam: Once
Pearl Jam: Daughter
Leviathan: Dawn Vibration
Leviathan: The Smoke of Their Torment
Nile: Black Seeds of Vengeance
Deathspell Omega: Abscission

Distributing this yields:

[]string{"Nirvana", "", "Nirvana", "", "Nirvana", "", "Nirvana", "", "", ""}

Oops! Not what we wanted. The problem is that we defined stepSize as an int, but it really should be 10/4 == 2.5 in this case. Two type conversions, and we’re all good:

func Distribute(distribution []string, artist string, tracks int) []string {
  stepSize := float64(len(distribution)) / float64(tracks)
  index := 0
  remainder := stepSize

  for {
    if tracks == 0 {
      return distribution
    }

    if remainder >= stepSize {
      index = IndexOf(distribution, "", index)
      distribution[index] = artist
      remainder -= stepSize
      tracks--
    }

    index++
    remainder++
  }
}

Which yields:

[]string{"Nirvana", "", "", "Nirvana", "", "Nirvana", "", "", "Nirvana", ""}

Voila!

We can call the function twice to distribute two artists:

distribution := DistributeSimple(make([]string, 10), "Nirvana", 4)
distribution = DistributeSimple(distribution, "Pearl Jam", 2)
//=>
[]string{
  "Nirvana",
  "Pearl Jam",
  "",
  "Nirvana",
  "",
  "Nirvana",
  "Pearl Jam",
  "",
  "Nirvana",
  "",
}

Looks good. However, note that the order of the calls matter:

distribution := DistributeSimple(make([]string, 10), "Pearl Jam", 2)
distribution = DistributeSimple(distribution, "Leviathan", 2)
distribution = DistributeSimple(distribution, "Nirvana", 4)
//=>
[]string{
  "Pearl Jam",
  "Leviathan",
  "Nirvana",
  "Nirvana",
  "",
  "Pearl Jam",
  "Leviathan",
  "Nirvana",
  "",
  "Nirvana",
}

If we distribute the artists with fewer tracks first, there might not be room left to properly spread the ones with more tracks. We’ll create another function to distribute a whole list taking this into account.

Distributing all Tracks by Artist

Next up is a function that takes a playlist and “shuffles” it by evenly distributing tracks from the same artist. This is how we’ll do it:

Group all the tracks by artist
Sort the groups descending on tracks per artist
Loop through the sorted grouping and create the distribution
Loop through the distribution and replace artist name with actual track

The playlist will be a slice of this data type:

type Item interface {
  GroupingKey(string) string
}

That is, it’ll work for anything that implements the GroupingKey function that takes a string, like "artist", and returns a grouping key for the specific track, like "Nirvana".

1. Group Tracks by Artist

Because Go is a low-level language, and because it does not have generics, the language does not provide niceties like “group by” out of the box. Instead we’ll have to write our own for the desired data type. I wrote one that outputs the following type:

type GroupedItems map[string][]Item

In other words, a map of string keys to slices of Items. This type declaration is there mostly to improve readability of the consuming code. The GroupBy function takes a slice of Items and a function (Item -> string) and returns a GroupedItems:

func GroupBy(items []Item, fn func(Item) string) GroupedItems {
  res := make(GroupedItems)

  for _, item := range items {
    key := fn(item)

    if res[key] == nil {
      res[key] = make([]Item, 1)
      res[key][0] = item
    } else {
      res[key] = append(res[key], item)
    }
  }

  return res
}

Note that this function already introduces some light randomness into our algorithm - map access is random in Go. We’ll sort this later, but artists with the same number of tracks will come out at random positions because of how map access works.

2. Sort Groups Descending on Tracks per Artist

Next up, we need to sort the entries in the map by the length of the []Item slice. Boy, oh, boy, sorting is no easy task in Go. No generics, remember? We have to implement an interface in order to sort a collection. Because Go’s sort.Sort function is very generic, we not only have to implement the comparison of two elements, we also have to tell Go how to get the length of the collection and how to swap two items:

type Bucket struct {
  label       string
  occurrences int
}

type byOccurrences []Bucket

func (s byOccurrences) Len() int {
  return len(s)
}

func (s byOccurrences) Swap(i, j int) {
  s[i], s[j] = s[j], s[i]
}

func (s byOccurrences) Less(i, j int) bool {
  return s[i].occurrences > s[j].occurrences
}

Now we’ll loop through our grouped map, turn it into a slice of Buckets and sort it:

import (
  "sort"
)

func bucketsByOccurrences(grouped GroupedItems) []Bucket {
  res := make([]Bucket, len(grouped))
  i := 0

  for bucket, bucketItems := range grouped {
    res[i] = Bucket{bucket, len(bucketItems)}
    i++
  }

  sort.Sort(byOccurrences(res))
  return res
}

If you’re wondering why I’m mixing UpperCamelCase and lowerCamelCase, in Go, any UpperCamelCase name is public, while lowerCamelCase names are private.

3. Create the Distribution

We now have almost all the building blocks to create the distribution:

func DistributeByArtist(items []Item) []Item {
  grouped := GroupBy(items, func (track Item) {
    return item.GroupingKey("artist")
  })

  buckets := bucketsByOccurrences(grouped)
  distribution := make([]string, len(items))

  for _, bucket := range buckets {
    Distribute(distribution, bucket.label, bucket.occurrences)
  }

  return ??
}

4. Create the Final Playlist

But what to return? Currently we have this:

distribution //=>
[]string{
  "Nirvana",
  "Pearl Jam",
  "Leviathan",
  "Nirvana",
  "Nile",
  "Nirvana",
  "Pearl Jam",
  "Leviathan",
  "Nirvana",
  "Deathspell Omega",
}

grouped //=>
GroupedItems{
  "Nirvana": []Item{
     {"Artist": "Nirvana", "Track": "Smells Like Teen Spirit"},
     {"Artist": "Nirvana", "Track": "Come As You Are"},
     {"Artist": "Nirvana", "Track": "Heart-Shaped Box"},
     {"Artist": "Nirvana", "Track": "Something in the Way"},
     {"Artist": "Pearl Jam", "Track": "Once"},
     {"Artist": "Pearl Jam", "Track": "Daughter"},
     {"Artist": "Leviathan", "Track": "Dawn Vibration"},
     {"Artist": "Leviathan", "Track": "The Smoke of Torment"},
     {"Artist": "Nile", "Track": "Black Seeds of Vengeance"},
     {"Artist": "Deathspell Omega", "Track": "Abscission"},
   }
}

The final step is to loop over the distribution, replacing the first occurrence of "Nirvana" with the first Nirvana track, the second occurrence with the second track, and so on. We’ll call this ReifyDistribution:

func DistributeByArtist(items []Track) []Track {
  grouped := GroupBy(items, func (track Track))
  buckets := bucketsByOccurrences(grouped)
  distribution := make([]string, len(items))

  for _, bucket := range buckets {
    Distribute(distribution, bucket[0], len(bucket[1]))
  }

  return ReifyDistribution(distribution, grouped)
}

Which can be implemented as such:

func ReifyDistribution(distribution []string, groups GroupedItems) []Item {
  items := make([]Item, len(distribution))

  for i, key := range distribution {
    items[i], groups[key] = groups[key][0], groups[key][1:]
  }

  return items
}

Random Much?

We now have a completely deterministic shuffler. This is all fine and dandy, but a “shuffle” should have some randomness to it, right? Yes. We can introduce two levels of randomness:

Randomize the initial playlist
Randomize the starting point

1. Randomize the Initial Playlist

Randomizing the initial playlist serves to randomize the order the tracks by each artist are played.

Go doesn’t provide a ready to use “randomize collection” function, but it does offer rand.Perm(int), which creates a random permutation of the specified length. We can build on this to randomize the collection:

import (
  "rand"
)

func randomize(src []Item) []Item {
  dest := make([]Item, len(src))
  perm := rand.Perm(len(src))

  for i, v := range perm {
    dest[v] = src[i]
  }

  return dest
}

We’ll call this from our new, random wrapper around DistributeByArtist:

func ShuffleByArtist(items []Item) []Item {
  return DistributeByArtist(shuffle(items))
}

2. Randomize the Starting Point

We can view the resulting playlist as a circle, and then randomly select a starting point. This will help ensure that every permutation of our playlist does not start with Nirvana, at the cost of one potentially weakly distributed section (original end -> start). That seems like a small price to pay for a good amount of perceived randomness:

func startRandomly(items []Item) []Item {
  index := rand.Intn(len(items))
  return append(items[index:], items[:index]...)
}

func ShuffleByArtist(items []Item) []Item {
  return startRandomly(DistributeByArtist(randomize(items)))
}

And there you have it - a playlist shuffler that respects the listeners desire for a random, but balanced listening experience.

Parting Thoughts

While we’ve come some way from randomize(playlist), our implementation still has weakenesses. Let’s consider an example output from our shuffler:

Deathspell Omega: Abscission
Nirvana: Something in the Way
Pearl Jam: Once
Leviathan: Dawn Vibration
Nirvana: Smells Like Teen Spirit
Nile: Black Seeds of Vengeance
Nirvana: Come as You Are
Pearl Jam: Daughter
Leviathan: The Smoke of Their Torment
Nirvana: Heart-Shaped Box

Looks good enough. However, let’s see the album names:

Deathspell Omega: Abscission - Paracletus
Nirvana: Something in the Way - Nevermind
Pearl Jam: Once - Ten
Leviathan: Dawn Vibration - Scar Sighted
Nirvana: Smells Like Teen Spirit - Nevermind
Nile: Black Seeds of Vengeance - Black Seeds of Vengeance
Nirvana: Come as You Are - Nevermind
Pearl Jam: Daughter - VS
Leviathan: The Smoke of Their Torment - Scar Sighted
Nirvana: Heart-Shaped Box - In Utero

Look at those Nirvana tracks: Nevermind, Nevermind, Nevermind, In Utero. Terrible! Even worse, take a look at those genres:

Deathspell Omega: Abscission - Paracletus / Black Metal
Nirvana: Something in the Way - Nevermind / Grunge
Pearl Jam: Once - Ten / Grunge
Leviathan: Dawn Vibration - Scar /Sighted, Black Metal
Nirvana: Smells Like Teen Spirit - Nevermind / Grunge
Nile: Black Seeds of Vengeance - Black /Seeds of Vengeance, Death Metal
Nirvana: Heart-Shaped Box - In /Utero, Grunge
Pearl Jam: Daughter - VS / Grunge
Leviathan: The Smoke of Their Torment - Scar /Sighted, Black Metal
Nirvana: Come as You Are - Nevermind / Grunge

Very poor distribution indeed. What our shuffler needs to do is to operate along multiple grouping keys. Ideally, we’d distribute by genre, then artist, then album. Had we done that, we migth have ended up with something like this:

Nirvana: Come as You Are - Nevermind / Grunge
Leviathan: The Smoke of Their Torment - Scar /Sighted, Black Metal
Pearl Jam: Daughter - VS / Grunge
Nile: Black Seeds of Vengeance - Black /Seeds of Vengeance, Death Metal
Nirvana: Heart-Shaped Box - In /Utero, Grunge
Deathspell Omega: Abscission - Paracletus / Black Metal
Nirvana: Something in the Way - Nevermind / Grunge
Pearl Jam: Once - Ten / Grunge
Leviathan: Dawn Vibration - Scar /Sighted, Black Metal
Nirvana: Smells Like Teen Spirit - Nevermind / Grunge

To achieve this, DistributeByArtist needs to become DistributeBy and take an additional parameter: a list of functions to group the playlist by. Imagine this call:

DistributeBy(playlist, []func(Item) string{
  func(c Item) string {
    return c.GroupingKey("genre")
  },
  func(c Item) string {
    return c.GroupingKey("artist")
  },
})

It would first create a distribution of genres. Before reifying the distribution with the tracks in that genre, it would recursively distribute all the songs in each genre by artist. If you’re curious how the code needs to change to support this, check out the Go shuffle code on Github.