Matching is an important issue in modern life, so much so that the 2012 Nobel Prize in Economics was awarded for advances in it. Finding the perfect match isn’t only a task for dating, although it’s obviously important in this area. We use matching algorithms to place med students at hospitals, to decide which patients should receive organs from donors, and to evaluate which financial products someone buying their first home should own. Matchmaking has evolved beyond human intuition towards using the data we have on ourselves and others to make a connection. As the amount of information we acquire about how we live, work, and love increases, we will lean more heavily on algorithms to match us well.
You can imagine a future where matching algorithms have detailed information on our desires, capabilities, needs, and preferences.
They might know a lot more about our health, wealth, and happiness. Imagine needing to submit blood work to join a dating site, or showing your bank statements to be properly matched with an organ donation, or having to take a survey about your relative levels of happiness before applying for an apartment. In a world where we share the secrets of our personal lives freely online, those examples might not sound too unreasonable or far-fetched. But for others, the prospect of providing that information is dystopian. The data scientists building these algorithms might think that with enough data we can make perfect matches. But for anyone who has been picked last for a middle school basketball game, the feeling of isolation is hard to shake. And now the stakes are much higher than suffering through a bad date.
How the algorithms do the matching
There are two general approaches used when building these algorithms: collaborative filtering and content-based filtering.
Let’s say we built a website that recommends movies to users, and for each user we know which movies they have liked in the past. We could use collaborative filtering to match two users, say Alex and Bess, on their respective tastes in film. So if Alex and Bess have the same taste in movies, and Alex has watched and liked The Notebook, then we could confidently recommend it to Bess, even if she had never heard of the movie previously. In addition to identifying the similarity between users, this kind of recommendation system can make suggestions based on items. We have all read the line: “Users who liked this item also liked item X.”
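The “users who liked this item also liked item X” idea can be sketched in a few lines. This is only an illustration under stated assumptions: the `likes` data, the user Cara, and the function names `also_liked` and `recommend_for` are all made up for this example, and simple co-occurrence counting stands in for whatever scoring a real site would use.

```python
from collections import Counter

# Hypothetical data: each user maps to the set of movies they liked.
likes = {
    "Alex": {"Titanic", "The Notebook", "La La Land"},
    "Bess": {"Titanic", "La La Land"},
    "Cara": {"Titanic", "Alien"},
}


def also_liked(movie, likes):
    """Count the other movies liked by users who liked `movie`."""
    counts = Counter()
    for liked in likes.values():
        if movie in liked:
            counts.update(liked - {movie})
    return counts


def recommend_for(user, likes):
    """Rank movies the user has not liked yet by how often they co-occur."""
    scores = Counter()
    for movie in likes[user]:
        scores.update(also_liked(movie, likes))
    # Drop anything the user already liked.
    for movie in likes[user]:
        scores.pop(movie, None)
    return [movie for movie, _ in scores.most_common()]


print(also_liked("Titanic", likes))
print(recommend_for("Bess", likes))
```

Note that nothing here looks at what the movies are about. The recommendation for Bess comes entirely from the overlap between her likes and those of other users.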
The data science part of this task lies in the creation of the similarity score. There are different ways of identifying similarity, but a common method is cosine similarity. This measures the cosine of the angle between two vectors, so it captures how closely they point in the same direction rather than how far apart they are. (Christian Perone has an excellent blog post on similarity.) The greater the angle between these vectors, the more dissimilar they are from each other. Imagine building a vector of properties representing movies in our database. If we were to visualize the cosine similarity scores between these movie vectors it might look something like the following:
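As a rough sketch of the computation itself, cosine similarity is the dot product of two vectors divided by the product of their lengths. The property vectors below (romance, action, comedy) and their values are invented for illustration and are not taken from any real movie database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical property vectors: [romance, action, comedy]
movies = {
    "Titanic": [0.9, 0.3, 0.1],
    "The Notebook": [1.0, 0.0, 0.2],
    "Die Hard": [0.1, 1.0, 0.3],
}

# Print the score for every pair of movies.
names = list(movies)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        score = cosine_similarity(movies[first], movies[second])
        print(f"{first} vs {second}: {score:.2f}")
```

With these made-up vectors, the two romance-heavy films score much closer to 1 with each other than either does with the action film.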
Let’s say instead that our movie recommendation company has a lot of information on the movies themselves. We might know the genre, the setting, and the theme of each movie. If this is true then we can use content-based recommendations to make guesses about what kinds of movies Alex and Bess might like. So if Alex liked Titanic, which has the tags “romance”, “love”, and “vintage”, then he might enjoy movies with similar tags, such as The Notebook. Because Alex has identified which kinds of movies he liked in the past, we can confidently recommend films of similar style to him in the future. Note that in this case the relationship between Alex and Bess is not considered.
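A content-based recommender along these lines might look like the sketch below. The tag catalog is hypothetical (only the Titanic tags come from the example above), and the Jaccard overlap used as the score is one simple choice among many, not a method the article prescribes.

```python
# Hypothetical tag catalog; only the Titanic tags come from the text above.
catalog = {
    "Titanic": {"romance", "love", "vintage"},
    "The Notebook": {"romance", "love", "drama"},
    "Alien": {"sci-fi", "horror"},
}


def tag_overlap(tags_a, tags_b):
    """Jaccard overlap: shared tags divided by all tags across both sets."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


def recommend_by_content(liked, catalog, top_n=3):
    """Rank unseen movies by how well their tags match the user's liked movies."""
    # The user's profile is the union of tags from everything they liked.
    profile = set().union(*(catalog[movie] for movie in liked))
    scored = [
        (tag_overlap(profile, tags), movie)
        for movie, tags in catalog.items()
        if movie not in liked
    ]
    # Keep only movies that share at least one tag, best match first.
    ranked = sorted((pair for pair in scored if pair[0] > 0), reverse=True)
    return [movie for _, movie in ranked[:top_n]]


print(recommend_by_content(["Titanic"], catalog))
```

Only Alex's own history and the movie tags are used here. No other user's data enters the calculation, which is the key difference from collaborative filtering.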
To summarize these strategies: for collaborative filtering, we need to know a lot about our users and particularly how these users relate to each other and we don’t need a lot of information about our products; for content-based recommendation systems we need to know a lot about our products and we don’t care as much about user relationships.
Both of these strategies can work as the basis for recommendation systems and have been used to build successful businesses. Which strategy should your business use? Well, it depends on the kinds of data that you can collect about your users and products. With content-based recommenders, you need a lot of tagged data on your products, which is sometimes expensive to obtain. Song recommendation website Pandora is attempting to build a content-based music recommendation system by tagging many attributes of songs, which is a labor-intensive process.
Since astrological compatability is such a thing, I wonder why dating apps don’t use that as part of the matching algorithm.
— Soulja Girl (@OGNellyV) March 12, 2018
However, the Spotify algorithm, which is a more advanced form of collaborative filtering, has arguably been much more successful in its matching. It has been so successful that it’s studied as part of computer science courses. The following is an excerpt from a blog post on a Cornell course studying this algorithm:
Spotify isn’t unique in their music recommendation as most other music streaming sites like Pandora, Soundcloud, etc. have similar algorithms, however Spotify’s is unique in that it has a much more complex algorithm which takes more factors into account. For example, Pandora’s recommendation algorithm recommends songs almost solely on the binary ranking (thumbs up=like or thumbs down=don’t like) that you’ve given songs before and based on those adjust its recommendation accordingly. Spotify, however, combines numerous factors in its decision to recommend and group certain songs. For one, it analyzes playlists that other users have created and, based on songs you’ve already listened to repeatedly or saved to your library, suggests songs frequently grouped with it. For example, if a playlist contains 7 of your favorite songs, Spotify would recommend another song in that same playlist considering you and the playlist’s creator share similar tastes. Secondly, the algorithm takes your “taste profile” into account. Your “taste profile” uses music analytics to break down your music taste into the most niche of musical subgenres to make the most accurate suggestions in the future. Finally, the last ingredient is Spotify’s version of Google’s PageRank algorithm that we learned about in Networks I. Similarly to how Google used clicks as votes, this algorithm analyzes a user’s personal use of a song from the Discover Weekly playlist of the week(s) prior to see what songs the user likes to hear that song alongside and uses what songs they incorporated from Discover Weekly as votes. Then by analyzing the user’s personal playlist and seeing how they group songs together the Spotify algorithm computes these 3 main factors (along with a few less significant factors) to suggest new songs. The beauty of the algorithm is that it isn’t stringent and adapts to each week and shift in the user’s streaming history to suggest songs that might fit the user’s current mindset. 
While not necessarily the traditional matching model of strict preferences and matching to single items, Spotify’s algorithm shows how a complex network of users and their individual preferences can be used to match others to similar data. Source
But there is something else we could do entirely.
When a new user or new piece of content comes into our system we could introduce a bit of randomness into the equation.
And this issue highlights the importance of including serendipity in our recommendation system.
If there is no randomness in our system then users will be condemned to be recommended the same kinds of items all the time. Sure, they might like this content, but what about other content they know nothing about? Perhaps you would like the taste of some exotic new food but neither you nor your friends have ever tried it. How would you know? We have all felt unfairly rated by others at times. We didn’t get enough sleep the night before a critical math class, or we had the flu the day of the basketball tryouts. We should not be forever assigned to ranks in these areas because of poor performance on a certain day or because we liked a certain item thoughtlessly.
As a data scientist creating these matching algorithms, you need to acknowledge that if you’re too stringent you can damn individuals to forever live in a certain category. The most successful matching algorithms introduce a degree of randomness to their suggestions. It’s important to recognize that we don’t know everything about everyone. In our movie website example, there are other films individuals might enjoy if they just had the chance to watch them. This movie site example is benign: people don’t die or make significant life choices because they were recommended a bad film or missed one they really would have loved. In other areas where matching systems are being deployed, the effects of poor matching can ripple through individuals’ lives forever.
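One simple way to mix randomness into a recommender is sketched below, in the spirit of an epsilon-greedy strategy. That name and technique are not from this article; they are a common approach swapped in for illustration. The function name `recommend_with_serendipity`, its parameters, and the sample data are all hypothetical.

```python
import random


def recommend_with_serendipity(ranked, catalog, k=5, epsilon=0.2, rng=None):
    """Fill k slots, usually with the best-ranked item, sometimes with a random one.

    ranked:  items ordered best-first by some existing recommender
    catalog: every item available on the site
    epsilon: probability that a slot is given to a random item outside `ranked`
    """
    rng = rng or random.Random()
    best = list(ranked)
    already_ranked = set(ranked)
    pool = [item for item in catalog if item not in already_ranked]
    picks = []
    while len(picks) < k and (best or pool):
        # Explore when the dice say so, or when the ranked list has run out.
        explore = bool(pool) and (not best or rng.random() < epsilon)
        if explore:
            picks.append(pool.pop(rng.randrange(len(pool))))
        else:
            picks.append(best.pop(0))
    return picks


ranked = ["The Notebook", "Titanic", "La La Land"]
catalog = ranked + ["Alien", "Die Hard", "Spirited Away"]
print(recommend_with_serendipity(ranked, catalog, k=3, epsilon=0.2))
```

Setting `epsilon` to 0 reproduces the original ranking, while higher values give unfamiliar items more chances to be seen. How much randomness is appropriate depends on how costly a poor match is in the domain at hand.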
How to beat the algorithms
“Our matching algorithms have identified you as a great candidate”
_looks at job req_
* 10 – 12 years engineering exp
* Masters or PHD in CompSci
* Experience with EDA, C++
I don’t have any of those things. Your algorithm is broke af.
— Cecy Correa ⭐ (@cecycorrea) February 27, 2018
There are other, darker, aspects of matching we need to consider. Suppose you don’t think you’ll be able to compete with others in your space as you are. Perhaps you don’t think you have the right education to get looked at for a new job even though you have non-quantifiable skills that make you the perfect fit. What do you do when the algorithms are blocking you because the data about you is limited, incorrect, or incomplete?
Knowing how these algorithms work means having an edge in getting past them.
To get around applicant screening bots you could include many different kinds of keywords in your résumé and cover letter. If you’re a single man looking for a date on a local matching platform you can embellish your height (to some extent). If you want your iTunes podcast to rank higher you can release an episode each day for several days in a row to give it some juice.
These algorithms have become such a significant part of our lives that some individuals have rightly become a bit obsessed with how they might appear to the system. We’ll leave you with a story to think about: