Advent of Code 2020, day 21

Leaving aside day 20 for a bit, let's look at day 21 Allergen Assessment. We must gather provisions for a journey by raft to our destination. The problem? The ingredient list on the foods is in an unknown language. The allergen information is in English, but they're not great about always marking an allergen on a given food. So we must do some work to find out which ingredients are allergens. For part 1 we need to determine the number of individual ingredients that don't contain an allergen. Let's get started.

To begin with, we'll work on the sample list of foods:

foodlist = [
    'mxmxvkd kfcds sqjhc nhms (contains dairy, fish)',
    'trh fvjkl sbzzf mxmxvkd (contains dairy)',
    'sqjhc fvjkl (contains soy)',
    'sqjhc mxmxvkd sbzzf (contains fish)',
]

We can solve this puzzle by hand. The first two foods contain dairy, so dairy must be "mxmxvkd". The first and fourth contain fish -- since "mxmxvkd" is dairy, that means that "sqjhc" must be fish. That leaves soy as "fvjkl". So we have "kfcds", "nhms", "trh", and "sbzzf" -- "sbzzf" appears twice, so counting each individual non-allergen ingredient we have the answer of 5.

Now to do this in code. First we need to separate the words for foods and for allergens, while keeping them linked. I did that by using two dictionaries with the same key:

def parse_allergens(foodlist):
    foodlist_dict={}
    allergen_dict={}
    count = 0
    for food in foodlist:
        (ingreds, allergs) = food.replace(')', '').split(' (contains ')
        foodlist_dict[count] = ingreds.split(' ')
        allergen_dict[count] = allergs.split(', ')
        count+=1
    return (foodlist_dict, allergen_dict)

So foodlist_dict contains the ingredients for item 0, item 1, etc with item number as the key. allergen_dict similarly contains the allergens for each item with the item number as the key.

Next up, we need to match the allergens to the ingredients. My solution ends up feeling like a house of cards -- I've cleaned it up a bit for the final version, but you can see the original working solution on my github and it's messy. Let me just explain the subroutines with words and a bit of code and how they link together.

def match_allergens(food_dict, allergen_dict):

Get a set of all the allergens, each listed once: allergens = set(flatten(allergen_dict.values()))

Step through these allergens, determine which items (keys) contain that allergen. Send that list of keys to collect possibilities and save the result in yet another dictionary "all_food_match" (i.e. allergen- food match; I realised later that "all"=allergen is a bit confusing...)

Then, take that list of possibilities for each allergen and send it to the narrow_to_final function.

def collect_possibilities(keys, food_dict):

Here we have collected all the keys that contain a particular allergen. We take the first key and populate "possibilities" with all the foods from that key.

Then we step through the other keys and remove any foods from our list that don't appear in other items. Note: "possibilities" is a terrible variable name. It's difficult to spell when typing -- too many i's. Super annoying. Note 2: I actually create a new list each time I look at a new key-- that's because I loop through my list of possibilities and remove those that aren't present in the new key -- removing items from a list you're looping through is problematic, so I create a new list that and append only items that are also present in the new key.

def narrow_to_final(all_food_match):

We take our dictionary where the keys are allergens and the values are all the possible ingredient matches. I set up a "while" loop that won't terminate until each allergen is only associated with one ingredient. Note: this is not a great idea for actual software -- there should be an escape in case this condition is never met. But for a one-shot solution for a puzzle that we know has a unique answer, I didn't worry about creating an escape.

I create a list of keys (keys = list(all_food_match.keys())) for the same reason I created a new list in collect_possibilities -- I'm going to step through the keys and when I get a unique answer for an allergen, I'll add that to my done list and remove that key from the all_food_match dictionary. Removing keys that you are stepping through is a problem, so I create a new list to keep track of the original list of keys -- this list of keys is updated each time through the while loop.

If the list of foods only contains one value, I add it to my food_match_final dictionary and delete it from all_food_match.

If the list of foods has more than one value, I compare the values to my list of "done" foods (i.e. foods that have already been matched with allergens) and remove any "done" foods from my list.

Once all foods have been matched with one allergen, this routine is down and returns to match_ allergens, which then returns the dict to the main program.

In the main program I then:

   #create a list of all the foods
    all_foods =  list(flatten(foodlist_dict.values()))

    #turn that into a set then back into a list to get each food only once
    all_foods_once = list(set(all_foods))

    #go through the allergens and remove them from my list of foods
    for key in allerg_food:
        all_foods_once.remove(allerg_food[key])

    #step through the foods and count how many times they appear
    count=0     
    for food in all_foods_once:
        count+=all_foods.count(food)
    return count

Ta-da! We have an answer. It was pretty messy, but separating different processes into functions helped quite a bit. As I was writing this I did a bit of refactoring: I replaced some nested for loops with list comprehensions and used panda's 'flatten' routine to combine the lists of lists into a single list of values. I changed some variable names to be more descriptive.

If I were writing this code for regular use and not just for a one shot puzzle problem, I would go through and divide things further into unitasker subroutines. I'd replace some of the variable names to make them more streamlined and easier to follow. And I'd add a lot more comments.

Okay, Part 2: Now we need to get the definitive list of allergen foods, ordered alphabetically by their english translation. This is a wonderful case of Part 2 being a trivial subset of our Part 1 solution. I simply add these lines:

    zipped_foods = sorted(zip(allerg_food.keys(), allerg_food.values()))
    sorted_foods = [x for _, x in zipped_foods]
    print("Part 2: ", ','.join(sorted_foods))

to create a zipped list of allergen-food pairs, then sort them by allergen and print the foods. No fuss, no muss. All done.