Advent of Code day 4

On day 4 we count valid passports, i.e. passports that have all the required fields.
In solving it, we have to decide how complex to make the solution. The first part doesn’t need much: we could just search each passport for ‘byr:’, ‘iyr:’, etc. But the second part is likely to need something more robust, so we aim for good programming practice and write a solution that leaves room for added complexity and is easy to extend.

I started creating a passport class, but then decided that it was better to just use a dictionary to hold the passport information. We’ll see if part 2 proves me wrong. I’ve saved my class definition in another file just in case.

I set up the code to build a dictionary from each entry and then write a ‘check_passport’ function that uses ‘has_key’ in a chain of ands to check whether the passport is valid. I’m not happy with the long chain of ands, but it will do for now. The function returns True if the passport is valid and False if it isn’t. The code then counts the valid passports with Python's +=, which works because bool is a subclass of int: adding True adds 1 and adding False adds 0.
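Here is a minimal sketch of the part 1 approach described above: build a dictionary from each entry, check it, and count with +=. The names parse_passport, check_passport and REQUIRED_FIELDS, and the two passports, are made up for illustration; the field list (with cid treated as optional) is my recollection of the puzzle statement, so check it against your puzzle text. I've written the check with all() and the in operator rather than a chain of has_key calls, since has_key no longer exists in Python 3.

```python
# Hypothetical names and made-up data, for illustration only.
# Required fields as I recall them from the puzzle statement ('cid' is optional).
REQUIRED_FIELDS = ("byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid")


def parse_passport(entry):
    """Build a dict from 'key:value key:value ...'; split() with no argument splits on any whitespace."""
    return dict(pair.split(":", 1) for pair in entry.split())


def check_passport(passport):
    """Return True if every required field is a key of the passport dict."""
    return all(field in passport for field in REQUIRED_FIELDS)


entries = [
    "byr:1980 iyr:2015 eyr:2025 hgt:170cm hcl:#123abc ecl:brn pid:012345678",
    "byr:1980 iyr:2015 hgt:170cm hcl:#123abc ecl:brn pid:012345678 cid:100",  # no eyr
]

valid = 0
for entry in entries:
    # bool is a subclass of int, so += adds 1 for True and 0 for False.
    valid += check_passport(parse_passport(entry))

print(valid)  # 1
```

Using all() over a tuple also takes care of the long chain of ands: requiring another field means adding one string to the tuple.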

I try to run my code and I get a very unhelpful error:

/Documents/coding/python/AOC/AOC_2020/day4.py"
File "<stdin>", line 1
Documents/coding/python/AOC/AOC_2020/day4.py"
^
SyntaxError: invalid syntax

I check all my indents and make sure all the parentheses are closed... Still not running. So I shut down the IDE (Visual Studio Code) and restart it.

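Ah! We've moved on to a more helpful error. Sometimes the old "turn it off and back on" trick works. (In hindsight, the <stdin> in the first error was probably the clue: it suggests the command to run my file went into an interactive Python session rather than the shell, so Python tried to read the file path as code. Restarting the IDE cleared that.) The new error: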

if (passport.has_key('byr') and passport.has_key('iyr') and 

AttributeError: 'dict' object has no attribute 'has_key'

What!? A quick search on Stack Overflow finds that has_key was removed in Python 3 (it had been deprecated since Python 2). Now we use “in”, as in 'byr' in passport. Okay. Got it.
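A quick check of what in does on a dictionary, using a made-up two-field passport. One thing worth knowing: in tests the keys, not the values.

```python
# Made-up passport for illustration.
passport = {"byr": "1980", "hcl": "#123abc"}

# Python 2's passport.has_key('byr') is gone in Python 3; 'in' checks the keys instead.
print(hasattr(passport, "has_key"))  # False
print("byr" in passport)             # True: 'byr' is a key
print("1980" in passport)            # False: 'in' looks at keys, not values
print("1980" in passport.values())   # True: search the values explicitly
```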

Now the sample input returns the right answer, so I move on to the real input.

Then I deal with a couple more errors in parsing the official input: the newlines within each passport create a bit of a problem. I use passports.split('\n\n') to separate the individual passports and then passport.replace('\n', ' ') to turn the newlines within each passport into spaces.
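A sketch of that parsing step on a small made-up input (in real code the string would come from reading the puzzle input file). The strip() call is my addition, to drop the newline at the end of the file so the last passport doesn't carry it along.

```python
# Made-up input: two passports, each wrapped over two lines, separated by a blank line.
raw = """byr:1980 iyr:2015
eyr:2025 hgt:170cm

hcl:#123abc ecl:brn
pid:012345678
"""

# A blank line ("\n\n") separates passports; strip() removes the trailing newline first.
blocks = raw.strip().split("\n\n")

# Within a passport, fields can wrap onto several lines; turn those newlines into spaces.
entries = [block.replace("\n", " ") for block in blocks]

print(entries)
# ['byr:1980 iyr:2015 eyr:2025 hgt:170cm', 'hcl:#123abc ecl:brn pid:012345678']
```

Since str.split() with no argument splits on any whitespace, newlines included, the replace step is optional if each entry is later split into fields that way.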

With that I have the solution to part 1.

Moving on...

The second part is pretty straightforward. Rather than just looking for the existence of each field, we now check the values as well to make sure they’re valid. That should be easy to implement.

It’s mostly bookkeeping, but the hair colour requirement cries out for us to use regex.
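One way to keep the bookkeeping manageable, and extensible as planned at the start, is a table mapping each field to a small validation function. This is a sketch rather than my actual solution: VALIDATORS and fields_valid are hypothetical names, only three fields are shown, and the hcl check uses the pattern worked out later in this post. The byr rule (four digits, 1920 to 2002) and the ecl colour list are my reading of the puzzle statement, so check them against your puzzle text.

```python
import re

# Hypothetical validator table; only three fields shown.
# byr and ecl rules: assumed from the puzzle statement, verify against your own puzzle text.
VALIDATORS = {
    "byr": lambda v: re.fullmatch(r"[0-9]{4}", v) is not None and 1920 <= int(v) <= 2002,
    "hcl": lambda v: re.fullmatch(r"#[0-9a-f]{6}", v) is not None,
    "ecl": lambda v: v in {"amb", "blu", "brn", "gry", "grn", "hzl", "oth"},
}


def fields_valid(passport):
    """True if every field that has a validator is present and passes its check."""
    return all(field in passport and check(passport[field])
               for field, check in VALIDATORS.items())


print(fields_valid({"byr": "1980", "hcl": "#123abc", "ecl": "brn"}))  # True
print(fields_valid({"byr": "1980", "hcl": "#123ABC", "ecl": "brn"}))  # False: uppercase hex
```

Adding a rule means adding one entry to the dictionary, and the checking loop never changes.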

hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.

I speak regex, but I’m not fluent, so I consult Stack Overflow, which suggests the line: match = re.search(r'^#(?:[0-9a-fA-F]{1,2}){3}$', str)

To break that down (for the reader, but more importantly to test my own understanding): we use the re module, which handles regular expressions (so ‘import re’ goes above that line). Then, piece by piece:

^ -- anchors the match to the start of the string, so nothing can come before the '#'

# -- the hair color will start with '#'

(?: ... ) -- the parentheses make a group, and the ?: right after the opening parenthesis makes it a non-capturing group (I don’t understand this part well; more later in this post).

[0-9a-fA-F] -- matches a single hex digit, lowercase or uppercase

{1,2} -- Each colour can have 1 or 2 digits

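{3} -- there are 3 colours, so the group (not the whole pattern) repeats 3 times

$ -- anchors the match to the end of the string, so nothing can come after the digits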
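Python can hold a breakdown like this right next to the pattern: with the re.VERBOSE flag, whitespace in the pattern is ignored and comments are allowed. This is just the Stack Overflow pattern spread over several lines; HEX_COLOUR is a made-up name. One catch: in verbose mode an unescaped '#' starts a comment, so the literal '#' has to be written as '\#'.

```python
import re

# The Stack Overflow pattern, one piece per line, with re.VERBOSE.
HEX_COLOUR = re.compile(r"""
    ^                       # anchor: the match must start at the beginning of the string
    \#                      # a literal hash (escaped, because an unescaped one starts a comment here)
    (?:                     # open a non-capturing group
        [0-9a-fA-F]{1,2}    # one or two hex digits, upper or lower case
    ){3}                    # close the group and repeat it three times
    $                       # anchor: the match must end at the end of the string
    """, re.VERBOSE)

print(bool(HEX_COLOUR.search("#123abc")))  # True
print(bool(HEX_COLOUR.search("#12")))      # False: only two digits
```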

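I modify this a bit to match the stated requirements. As written, the pattern accepts anywhere from 3 to 6 hex digits (three groups of one or two) in either case, which is too loose here. So I change [0-9a-fA-F] to [0-9a-f], since only lowercase is allowed, and change {1,2}...{3} to {6}: we want exactly 6 numbers/letters.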
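A quick comparison of the Stack Overflow pattern and the modified one on a few made-up hair colour values. I've written the modified version without the group, as ^#[0-9a-f]{6}$, since with a plain {6} the group has nothing left to do.

```python
import re

ORIGINAL = r"^#(?:[0-9a-fA-F]{1,2}){3}$"  # the Stack Overflow pattern
MODIFIED = r"^#[0-9a-f]{6}$"              # exactly six lowercase hex digits

candidates = ["#123abc", "#abc", "#12345", "#123ABC", "#123abz"]
results = {c: (bool(re.search(ORIGINAL, c)), bool(re.search(MODIFIED, c))) for c in candidates}

for candidate, (original_ok, modified_ok) in results.items():
    print(candidate, original_ok, modified_ok)
# #123abc True True
# #abc True False      (only three digits)
# #12345 True False    (five digits)
# #123ABC True False   (uppercase)
# #123abz False False  ('z' is not a hex digit)
```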

After that it’s just a matter of filling in the rest of the requirements, tracking down some bugs and submitting my answer. All done.

Okay, more on the non-capturing group. It’s not something I knew well enough to explain, so I did some reading. I didn't find a great resource, but these helped a bit: regular-expressions.info/brackets.html, regular-expressions.info/named.html

From what I understand, a non-capturing group groups part of the pattern but doesn't save what it matched for later use, for example in a search and replace. So it's a way of grouping a regex combination so that a quantifier applies to all of it, for example to look for multiple repetitions of the same combination. In the example code above, we look for the group to repeat 3 times, once for each colour. A capturing group (plain parentheses, without the ?:) does save what it matched, so you can refer back to it later: with a backreference such as \1 further along in the same pattern, or in a replacement string. You can also give a capturing group a name with (?P<name>...) and refer to it by that name. We don't need any of that, so we just keep it as a non-capturing group. Now, if I understand this all correctly, since we now just check for 6 hex digits, not 3 groups of 1 or 2, we don't need a group at all. I tried re.search(r'^#[0-9a-f]{6}$', passport['hcl']) in my code and it still gave the right answer.
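A small experiment shows the difference between capturing and non-capturing groups, along with named groups and backreferences. The input strings are made up; re.match, re.fullmatch, groups() and group() are all part of Python's standard re module.

```python
import re

text = "#12ab34"  # made-up colour

capturing = re.match(r"#([0-9a-f]{2}){3}", text)
non_capturing = re.match(r"#(?:[0-9a-f]{2}){3}", text)

print(capturing.groups())      # ('34',): a repeated group only remembers its last repetition
print(non_capturing.groups())  # (): nothing was saved
print(non_capturing.group(0))  # #12ab34: the overall match is still available

# Giving a capturing group a name with (?P<name>...) lets you look it up by that name.
red = re.match(r"#(?P<red>[0-9a-f]{2})", text)
print(red.group("red"))        # 12

# A backreference (\1) means "the exact text group 1 matched", reused later in the pattern.
print(bool(re.fullmatch(r"#([0-9a-f]{2})\1\1", "#ababab")))  # True
print(bool(re.fullmatch(r"#([0-9a-f]{2})\1\1", "#ab12cd")))  # False
```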
So, breaking the original regex down again, this time with an emphasis on grouping:

r'^# -- the r marks a raw string; ^# means the string must start with '#'

(?:[0-9a-fA-F]{1,2}) -- a non-capturing group matching one or two hex digits

{3} -- have that grouping repeat 3 times

$ -- the string must end here
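To see why the group is needed for the {3}: a quantifier applies only to the single item right before it, so the parentheses are what let it repeat a whole sequence. A toy example with made-up strings:

```python
import re

# With the group, {3} repeats the whole two-character unit "ab"...
print(bool(re.fullmatch(r"(?:ab){3}", "ababab")))  # True
# ...without it, {3} applies only to the single character just before it, the "b".
print(bool(re.fullmatch(r"ab{3}", "ababab")))      # False
print(bool(re.fullmatch(r"ab{3}", "abbb")))        # True
```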

I hope that helps someone else. It certainly helped me.