Fuzzy pandas


Back in my graduate school days, I would sometimes play Eternal Lands, a free MMORPG. Once I had finished school, I stopped playing for several years, until recently when my son discovered the game and started playing. As I was exploring with my old account the other day, I noticed that sometimes a trading bot or NPC is selling an item for less than another bot/NPC is paying for that item, creating an opportunity for profit (through arbitrage). I developed some code to find all of these opportunities at any given moment. This required some interesting data manipulation, web scraping, and fuzzy text matching.

The complete code for this small project can be found on my github, but I think the most interesting part for most readers will be how to extend the capabilities of pandas dataframes so that they can be merged in a “fuzzy” way. If you want to skip to those details, see the section below titled “fuzzy merging with pandas.”


System overview

Here are the general steps used to find the biggest opportunities for arbitrage:

1.  Pull from this page a list of all items currently being bought and sold by bots/NPCs, which looks something like this:


2.  Merge the “selling” part of the dataframe with the “buying” part, to create a dataframe of opportunities, as follows:

oppo = df[df.Action == 'Selling'].merge(
    df[df.Action == 'Buying'], on = 'Item'

Filter this new dataframe to only cases where the buying price is higher than the selling price:

oppo = oppo[oppo.Price_y > oppo.Price_x]

3.  For each item, find its weight and stackability by webscraping the Eternal Lands Wiki. This will be useful to know because my character can only carry up to 260 units of weight at a time, or up to 35 items if the items are not “stackable.”

This is the step that requires fuzzy merging, because the names of items scraped from the Eternal Lands Wiki do not always match perfectly with the names in the opportunities table. For example, a “Fluffy Summoning Stone” in the opportunities table is called a “Fluffy Rabbit Summoning Stone” in the Wiki.

The line of code to fuzzy merge the opportunities table with the table of weights is as follows. More details on this are given later.

oppo = oppo.fuzzymerge(
    weights, left_on = 'Item', right_on = 'WikiPage'

4.  Finally, the opportunities for trading between bots are sorted according to how much each trip from one bot to the other is worth, based on per-item profit, weight, etc. The largest few opportunities are then reported in plain English.



When I ran the system recently, the top opportunity given was as follows:

5000gc: Buy 20 Red Dragon Scale from Locuas (MM(C1) 84,115) 
for 55000gc and sell to mufo (MM(C1) 66,113)

This means that 5000 gold coins of profit could be made by buying 20 red dragon scales from a bot called Locuas in Morcraven Marsh (“MM(C1)”) at coordinates (84, 115) for 55000 gold coins…


…and then selling the 20 red dragon scales to a bot called mufo, who happened to also be in Morcraven Marsh, at coordinates (66, 113).


It was only by lucky coincidence that the two bots happened to be in the same part of the game world in this example; in general, they could be quite far apart, which means that some domain knowledge is necessary to weed out cases where the opportunity is too small when compared to the time or effort required.


Fuzzy merging with pandas!

Again, the complete code for my system is on github, but the way I enabled fuzzy merging in pandas deserves special attention. The following code adds a new fuzzy merging method to the standard pandas dataframe object:

import pandas as pd
from difflib import get_close_matches

# Define the fuzzy merging method for dataframes
def fuzzymerge(self, right, on = None, left_on = None, right_on = None, 
               how = 'left', cutoff = 0): 
    '''Fuzzy merging. Same as standard pandas merge, but the 
    `on` columns do not have to match exactly. Parameter `cutoff` 
    controls this, e.g.: 
        * If cutoff = 0 (default), a match is always found.
        * If cutoff = 1, only a perfect match is acceptable.'''

    # Preliminary handling of `on` parameters
    if on is not None:
        left_on, right_on = on, on
    elif left_on is None or right_on is None:
        raise ValueError('Either `on`, or both `left_on` '
                         'and `right_on`, must be specified.')
    if type(left_on) != str or type(right_on) != str:
        raise TypeError('Must fuzzymerge on a single '
                        'column name in each frame.')

    # Fuzzy matching function
    right_values = right[right_on].unique()
    def fm(s):
        ret = get_close_matches(s, right_values, 1, cutoff)
        try: return ret[0]
        except: return None

    # Build intermediate dataframe for matching
    matches = self[[left_on]].drop_duplicates()
    matches['__matchcol__'] = matches[left_on].apply(fm) 
    matches.dropna(subset = ['__matchcol__'], inplace = True)

    # Return results
    return self.merge(matches, on = left_on, how = how).merge(
        right, how = how, 
        left_on = '__matchcol__', 
        right_on = right_on
    ).drop('__matchcol__', 1)

# Add fuzzy merging capability to all dataframes
pd.DataFrame.fuzzymerge = fuzzymerge

For example, the following line would merge dataframes df1 and df2 by fuzzy matching column col_A in df1 to column col_B in df2:

df1.fuzzymerge(df2, left_on = 'col_A', right_on = 'col_B')


Closing thoughts

It would be fun to factor in the travel distance between each pair of bots when reporting the largest opportunities, along with a few more subtle aspects of the game’s mechanics, but I think at this point I’ve spent enough time on this silly little project.

The most broadly useful thing to come out of this, at least for me, is the “FuzzyFrame” above. For the really large fuzzy matching tasks I sometimes need to carry out at work, I generally use a Fuzzy Lookup Transformation in SSIS. But now that I’ve seen how to accomplish the same thing via pandas, at least for smaller cases, I imagine I’ll start doing it this way from time to time.


Title image source: Eternal Lands Wiki

Bonus content if you got this far: http://bit.ly/wranglingpandas