Advent of Code (AoC) is an annual computer programming event that began in 2015 and is found at https://adventofcode.com. The problems are presented in the context of a whimsical/silly story arc with some kind of semi-Christmas-y theme (e.g., eventually recovering Santa's lost keys through a long series of intermediate adventures). This page captures some recommended topics to be familiar with when participating in it, as well as some tips and tricks learned while working through its problems.
AoC happens in December each year, with a total of 25 problems (1 through 25) in the full event, which opens on Dec 1 and culminates on Dec 25. (NOTE: The Advent of Code creator, Eric Wastl, announced that, starting in 2025 and going forward for each event, there will be only 12 days of problems, rather than 25 (still starting on Dec 1). Further details TBD, but all prior events' problems remain available.) Each problem number, N, becomes available at midnight EST on December N. (So, problem 1 is revealed at exactly the start of Dec 1 in EST, problem 2 on Dec 2, etc.) Each problem has two parts, with Part 1 being revealed at midnight on its day (and year) and Part 2 revealed upon solving Part 1. After a problem is made available, it remains available forever; this includes all previous events' problems, as well as the current event's previous days' problems. Individual problem availability is not dependent upon solving earlier problems in its event; i.e., each problem is revealed on its date, independently of those problems preceding it. (Though there are some problems that refer to earlier problems, they do not require that you have solved them first.)
For each year's event, every solved Part earns you a star. The event's objective is to earn 50 stars. (Well, really it's to save Christmas, but you do that by earning all the stars.) Part 2 of Problem 25 provides you a star for free, but it also requires that you have earned all the event's other 49 stars to actually complete it. In the big picture, the problems trend from easier to harder as the days increase, but there's definitely some choppiness in the difficulty curve. (Some problems may seem easier than others that came earlier.) There is no penalty for submitting wrong answers since you can simply try again; however, if you submit several wrong tries in succession, there is an increasing enforced delay for submitting yet more. (This prevents spamming submissions until "correct".)
AoC requires that you create a site account in order to participate. This is so your progress can be saved. (The event does not spam you at all.) It also provides a mechanism for creating/joining Private Leaderboards so you and your friends can track each other's progress.
NOTE: If you feel any important information is missing here, or something is unclear, please let me know so I can address it! This is a work in progress, and it's meant to be helpful.
AoC requires programming to solve the problems but does not require the use of any specific language. I recommend Python and will focus on it in the next section, but this section's comments are language-agnostic.
An example use case:
Store all the neighbor-cell offsets for a grid problem as a list of [dx, dy] pairs, then loop through them, processing them all identically. (This is trivially easy to update if diagonal cells aren't needed, then are, and it makes for highly readable/maintainable code.) See examples under zip of the "Especially useful functions" Python-specific section.
Certainly, not all problems need them, but they can drastically simplify code and effort for a lot of problems.
Sets and dictionaries are hash-type storage objects and so are generally far more efficient to check if something is in them than are, for example, lists (arrays). They also ensure uniqueness (rather than multiplicity) of membership. Of course, they're also not (necessarily) guaranteed to be in a particular order, so there can be both pros and cons to them.
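For instance, a membership test that's O(n) on a list is O(1) on average for a set. Here's a tiny timing illustration (exact numbers will vary by machine):

import time

items = list(range(1_000_000))
as_set = set(items)

t0 = time.time()
print(999_999 in items)   # O(n): scans the whole list (worst case: last element)
t1 = time.time()
print(999_999 in as_set)  # O(1) average: a hash lookup
t2 = time.time()
print(f"list: {t1 - t0:.5f}s  set: {t2 - t1:.6f}s")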
Sometimes a somewhat simple/naïve approach may work perfectly well for Part 1 of a problem, but you'll need to adapt your approach to a different and more efficient data representation for its Part 2. Keep such possibilities in mind. (See, for example, 2021's problem 6.)
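As a sketch of the general idea (loosely in the spirit of that problem, not its actual rules): rather than simulating every individual item in an exponentially growing population, track how many items are in each state; each step then costs time proportional to the number of distinct states, no matter how large the population grows.

from collections import Counter

# Hypothetical countdown process: a timer at 0 resets to 6 and spawns a new item at 8.
population = Counter([3, 4, 3, 1, 2])  # counts per timer value, not one entry per item
for _ in range(256):                   # a per-item list would grow astronomically here
    nxt = Counter()
    for timer, count in population.items():
        if timer == 0:
            nxt[6] += count   # all the expiring items reset...
            nxt[8] += count   # ...and each spawns a new item
        else:
            nxt[timer - 1] += count
    population = nxt
print(sum(population.values()))  # total population after 256 steps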
Also, a bit of a side note: It's very efficient to implement "circular storage" and the like by using lists (arrays) with modulus math. That is, you can just keep incrementing an index and always mod it by the length of the list; so, i=(i+1)%len(alist), and alist[i], will just keep endlessly cycling through alist. One caveat to keep in mind for those languages that support negative indexing (such as Python) is to keep your indexing non-negative (≥0), or you may encounter problems. If you want to decrement an index by 1, add len(alist)-1 to it (and mod) instead of subtracting 1; in modulus math, this results in "i-1" regardless of where in the list the index is, including at i=0. (Of course, this only applies to non-empty lists. Not even i=0 is valid for empty lists.)
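A minimal sketch of the pattern:

alist = ['a', 'b', 'c', 'd']
i = 0
for _ in range(10):           # prints: a b c d a b c d a b
    print(alist[i], end=' ')
    i = (i + 1) % len(alist)  # endlessly cycles 0, 1, 2, 3, 0, 1, ...
print()

i = 0
i = (i + len(alist) - 1) % len(alist)  # "decrement" while staying non-negative
print(alist[i])                        # 'd' (the element "before" index 0)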
Some of the problems are deliberately designed so that you must identify patterns in order to solve them. For example, maybe Part 1 of a problem has you implement a process and compute a result by iterating it a few hundred times, but then Part 2 asks you what the result will be after iterating it several trillion times! If you don't notice that there's a repeated pattern in the computational behavior, and figure out how to leverage it to your advantage, then you will never get the answer; but if you do identify that fact, and you're careful in how you exploit it, you can chop out the vast majority of the computation in the middle that's fundamentally repeatedly redoing the same work, and instead only do the lead-in and run-out "clean up" processing in order to arrive at the correct answer before the heat death of the universe occurs.
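One common way to exploit such a cycle, sketched here with a hypothetical step function and hashable states (the names are mine, not from any particular problem): record the step number at which each state is first seen; when a state repeats, you know the cycle length, can jump over all the full cycles with modulus math, and then simulate only the leftover steps.

def run(state, total_steps, step):
    # Advance state by total_steps applications of step, fast-forwarding
    # across any cycle that appears. Assumes states are hashable.
    seen = {}  # state -> step number at which it was first seen
    t = 0
    while t < total_steps:
        if state in seen:
            cycle_len = t - seen[state]
            remaining = (total_steps - t) % cycle_len  # skip all the full cycles at once
            for _ in range(remaining):                 # run-out: just the leftover steps
                state = step(state)
            return state
        seen[state] = t
        state = step(state)
        t += 1
    return state

print(run(0, 10**12, lambda s: (s * 3 + 1) % 7))  # 5, near-instantly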
This is definitely a minority case, but it does crop up here and there. Most problems expect general solutions, but some require that the data be structured in a particular way, and discovering that and using it may be essential to solving them. (Basically, the fundamental structure of the data is part of the problem statement, but that fact may not be clearly stated. Noticing it is important.)
Sometimes even if it's not required, it's still easier to solve for the data given rather than the general problem. The objective is to get the correct answers to this problem-set. Corner cases that may crop up in more general hypothetical input data, but do not in the provided data, may safely go unaccounted for (albeit, perhaps with a comment in the code, noting this), and sometimes you can actually leverage this in order to significantly simplify solution development.
I know of a few past problems that needed more of a reverse-engineering type of solution than a programmatic one. And, there are some others that are more easily solved in less than strictly programmatic ways (especially via visualization, but not only that).
DFS is more often used for exploring solution-space, but there's often a problem or two thrown in where BFS is needed instead. (Note that this may be more of a conceptual approach than a traversal of an actual graph data structure.)
So, keep this in mind, i.e., the idea that sometimes the best way forward is . . . all of them. And, prune the results as needed.
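Here's a minimal BFS skeleton over a grid, using a deque as the frontier; it advances all candidate states level by level, pruning via a visited set. (The grid and names here are made up for illustration.)

from collections import deque

def bfs_steps(grid, start, goal):
    # Fewest steps from start to goal through '.' cells; None if unreachable.
    rows, cols = len(grid), len(grid[0])
    frontier = deque([(start, 0)])  # (cell, steps-so-far)
    visited = {start}
    while frontier:
        (r, c), steps = frontier.popleft()  # popleft() = FIFO = BFS (pop() would be DFS-like)
        if (r, c) == goal:
            return steps
        for dr, dc in ((-1, 0), (0, 1), (1, 0), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '.' and (nr, nc) not in visited:
                visited.add((nr, nc))
                frontier.append(((nr, nc), steps + 1))
    return None

maze = ["..#.",
        ".#..",
        "...."]
print(bfs_steps(maze, (0, 0), (2, 3)))  # 5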
Of course, it may also be helpful to be familiar with some common/useful graph algorithms, such as Dijkstra's algorithm.
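And a compact Dijkstra sketch using heapq, assuming a hypothetical graph in the form of a dict mapping node -> list of (neighbor, weight) pairs:

import heapq

def dijkstra(graph, start):
    # Cheapest total cost from start to every reachable node.
    best = {start: 0}
    heap = [(0, start)]  # (cost-so-far, node); heapq always pops the smallest cost
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float('inf')):
            continue  # stale heap entry; a cheaper path to node was found already
        for nbr, wt in graph.get(node, []):
            new_cost = cost + wt
            if new_cost < best.get(nbr, float('inf')):
                best[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return best

g = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('D', 6)], 'C': [('D', 3)]}
print(dijkstra(g, 'A'))  # {'A': 0, 'B': 1, 'C': 3, 'D': 6}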
Recursion often results in cleaner code. Iteration is typically more performant. One generally successful approach for converting a recursive solution to an iterative solution is to use a "work queue". Create a data structure to capture the "state" (usually a tuple works fine), and then create a work queue (e.g., a Python list, or even better, a deque) with a single state element representing the starting state. Remove the first element from the queue, evaluate it to determine all new possible states, and append those to the queue. Repeat until the queue is empty, or you have found the end state you were looking for.
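A minimal sketch of that pattern (the problem here, counting down/right paths through a grid, is made up for illustration):

from collections import deque

def count_paths(rows, cols):
    # Count distinct down/right paths from the top-left to the bottom-right
    # of a rows x cols grid, iteratively, via a work queue of (row, col) states.
    queue = deque([(0, 0)])  # a single state element representing the starting state
    paths = 0
    while queue:
        r, c = queue.popleft()             # remove the first element...
        if (r, c) == (rows - 1, cols - 1):
            paths += 1
            continue
        for nr, nc in ((r + 1, c), (r, c + 1)):  # ...determine all new possible states...
            if nr < rows and nc < cols:
                queue.append((nr, nc))           # ...and append those to the queue
    return paths

print(count_paths(3, 3))  # 6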
Lambda functions (aka anonymous functions) are very handy for inline purposes, from doing filtering, to transforming data, to mapping special functionality.
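For example:

pts = [(3, 'c'), (1, 'a'), (2, 'b')]
print(sorted(pts, key=lambda p: p[0]))                # custom sort key: [(1, 'a'), (2, 'b'), (3, 'c')]
print(list(filter(lambda n: n % 3 == 0, range(10))))  # filtering: [0, 3, 6, 9]
print(list(map(lambda s: s.upper(), "abc")))          # transforming/mapping: ['A', 'B', 'C']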
Closures are good to understand if you want to use generators, coroutines, and more.
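A quick closure example, in which the inner function "closes over" count, so the value persists between calls:

def make_counter():
    count = 0
    def next_id():
        nonlocal count  # rebind the enclosing function's variable, not a new local
        count += 1
        return count
    return next_id      # the returned function carries its enclosing scope with it

new_id = make_counter()
print(new_id(), new_id(), new_id())  # 1 2 3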
This programming technique is somewhat less known to a lot of people, but it is very useful for certain problems that occasionally appear in the events, so it's recommended as something to be familiar with. I've seen it drop execution times from many minutes, or hours (or potentially even much longer), down to near-instant, and with very little additional effort required, so it's a good thing to be aware of. (If using Python, see a potential memoization gotcha in the Miscellaneous section bullet, Default-value parameters do not necessarily "reset" to default value. Also, see the Python-specific bullet: A full dynamic programming example.)
This is a bit of a hodge-podge of topics/tips that could probably be better crafted.
A few examples:
evens = [i for i in range(0, 101, 2)] ⟶ list of all even numbers from 0 through 100
mul_tbl = [[col*row for col in range(1, 13)] for row in range(1, 13) if row != 5] ⟶ 12x12 multiplication table (list of lists), but skipping row 5
for n in (i for i in range(1000000000) if i not in set_a): ⟶ loop n from 0 to a billion, skipping any values in set_a, and generating one value at a time, not a whole list at once (as would be the case if brackets were used instead of parentheses)
The first two examples use list comprehensions. The third uses a generator expression. These are conceptually and syntactically very similar, although there are big differences in what happens under the covers that make it important to understand what you're doing, especially when creating large amounts of data.
A comprehension is special syntax that allows in-place initialization of a list, set, or dictionary using a very succinct expression:
[ expression for variable in iterable ] ⟶ list comprehension
{ expression for variable in iterable } ⟶ set comprehension
{ immutable : expression for variable in iterable } ⟶ dictionary comprehension (keys must be immutable)
Note that comprehensions are fully populated when they are evaluated. Thus, they can consume large amounts of memory, and for lists in particular, operations on the result can be very slow. (Many list operations, such as membership tests, are O(n).)
The syntax for a generator expression is very similar to a list comprehension:
( expression for variable in iterable )
The difference is that a generator expression only produces its elements one at a time, on demand. This keeps generator expressions lightweight in terms of memory usage. However, it also means that certain operations are not available (e.g., accessing an element by index).
Note that generator expressions are distinct from lists, sets, and dictionaries; they're their own thing, even though they may appear very similar syntactically and may generate individual elements similar in form to any of them. There is no "going back" or "random access" nature to them; there's only "give me the next item" (until there are no more to give).
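A quick demonstration of that "give me the next item" behavior:

g = (i * i for i in range(4))
print(next(g))  # 0
print(next(g))  # 1
print(list(g))  # [4, 9] (consumes whatever remains)
print(list(g))  # [] (exhausted; there is no rewinding a generator)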
Generator functions offer a more general-purpose approach with similar functionality to generator expressions, albeit with completely different syntax. To create a generator function, you simply use the keyword yield. This is best explained by example:
def mygen(n):
    for i in range(n):
        yield i ** 2

for i in mygen(5):
    print(i)
OUTPUTS:
0
1
4
9
16
The important thing to note here is that the mygen function effectively "pauses" after each yield. The generator function above is equivalent to the generator expression ( i ** 2 for i in range(n) ); however, you have considerably more flexibility in a generator function to create complicated generator functionality.
It's also possible to use the yield keyword in the context of creating coroutines, though I won't discuss these in detail here since I haven't found them necessary in AoC. (Feel free to research them, though.) That said, here's a very short code sample to show one in action:
def a_coroutine():  # define a coroutine function
    while True:
        val_rcvd = yield  # assigning from yield makes it a coroutine, not a (regular) generator
        print('Value consumed:', val_rcvd)

c = a_coroutine()  # create instance of coroutine
print(type(c), c)  # print coroutine object
next(c)  # prime the coroutine (i.e., start it)
for string in "first", "second", "third":
    c.send(string)  # send in the data (into the coroutine)
OUTPUTS:
<class 'generator'> <generator object a_coroutine at 0x000001BB1724B350>
Value consumed: first
Value consumed: second
Value consumed: third
A few examples:
list_a[::-1] ⟶ list that is the reverse of list_a
list_a[:i] + list_b[j:] + list_c[i:j] ⟶ list that is composed of list_a elements 0 through i-1 (or its end), list_b elements j through its end, and list_c elements i through j-1 (or its end)
"0123456789abcdef"[13:4:-3] ⟶ "da7"
Those are just a few example uses, but the general slicing format is start:stop:step, and any combination of the parts may be explicitly declared or omitted.
A few examples:
a, b = [b, a] ⟶ swaps values of a and b
a, b, c, *rest = [1,2,3,4,5,6,7,8,...] ⟶ a=1, b=2, c=3, rest=[4,5,6,...]
def var_args_func(a, b, *args): ⟶ can call with var_args_func(1,2) (so, a=1, b=2, args=()) or var_args_func(1,2,3) (so, a=1, b=2, args=(3,)) or var_args_func(1,2,3,"abc",5,False,7.0) (so, a=1, b=2, args=(3,'abc',5,False,7.0)), etc.
t = (1,) ⟶ packs a single-element tuple (the trailing comma, not the parentheses, is what makes it a tuple)
There is also the **kwargs notation, which is used to collect named arguments ("keyword arguments") into a dictionary that can then be used inside the function. If *args is also being used with it, then *args should appear before **kwargs, and any more specific parameters should be listed before both, e.g.: def var_args_func(a, b, c, *args, **kwargs):
return args, a, b ⟶ returns a tuple – (args, a, b) – which can be captured in a single variable, or in multiple variables via destructuring
for k, v in my_dict.items(): ⟶ my_dict.items() yields (key, value) tuples, which are then destructured into variables k and v, for use in each loop iteration
print(*grid, sep='\n') ⟶ print a "grid" (list of lists), one row (unpacked from grid) at a time, with intervening separator character '\n' (i.e., newline)
Those are a few somewhat arbitrary examples, but hopefully they communicate some useful information. Packing and unpacking can look similar, other than context, and they also sometimes happen implicitly.
open – for opening files, often best to use within the context of a with block – example:
with open("input.txt") as in_file:len – get the length/size of a list/set/etc.map – transform iterable data from one form to anotheriter – create an iterator for an objectfilter – filter iterable data according to defined criteria, passing through only those items that satisfy the filter testenumerate – handy way to tack on an iteration number in a loopsorted – what it sounds like, but also allows you to customize how it's sorting (see also reversed, and also the functools.cmp_to_key function, which enables deprecated versions of sorted to be updated and still work)type/isinstance/issubclass/callable – do typing-related checkszip – generator that "zips" together n iterables – examples:
zip("abcd", [1,2,3,4,5], (True,False,True)) ⟶; iteratively produces: ('a', 1, True), ('b', 2, False), and ('c', 3, True), then stopping because it ran out of elements in the third iterablezip_longest function in the itertools modulerc = [3,5] # e.g., indices into a grid, perhaps passed into a function as a parameter
for drdc in [[-1,0],[0,1],[1,0],[0,-1]]: # e.g., cycle through all horizontal and vertical neighbor-cell offsets
r, c = [sum(z) for z in zip(rc, drdc)] ⟶ r, c end up as the row, column grid indices of neighbor cells of rc. (Note: This is easy to extend to arbitrary dimensions (change rc, and drdc list), or to include diagonal neighbor cells (edit drdc list).)transposed_matrix = list(zip(*matrix))all – True if all items in an iterable are True; evaluates with short-circuitingany – True if any items in an iterable are True; evaluates with short-circuitingeval – evaluate a string as Python coderound, sum, min, max, abs, divmod, ord, chr, hex, id, hasattr, str/int/..., and so forth; you can look up a full listing of the built-in functions online (e.g., https://docs.python.org/3/library/functions.html), or get an approximate listing programmatically like this:import builtins, types print([nm for nm, obj in vars(builtins).items() if isinstance(obj, types.BuiltinFunctionType)])
re – regular expression module; especially useful functions are compile and match, but there are others
itertools – many useful functions: combinations, permutations, product, groupby, cycle, repeat, accumulate, batched, chain, compress, pairwise, dropwhile, takewhile, ...
functools – probably especially look at reduce (often conceptually grouped with map and filter) and cache (a decorator)
collections – at least Counter, defaultdict, namedtuple, and deque (see the short demo just after this list); note that plain dicts can cover some of the same ground, e.g.: d.setdefault(k, []).append(new_value) OR d.setdefault(k, set()).update(new_set) OR d[k] = d.get(k, 0) + new_nbr, etc.
math – at least comb, and otherwise the usual suspects
copy – copy and deepcopy, at least (if needing to make copies of arbitrarily complex objects)
time – the time function inside it makes it very easy to time code as it executes
os, sys, subprocess, threading, multiprocessing, asyncio, abc
pyvis – a powerful and simple-to-use third-party module for visualizing graphs, which can be immensely useful for some problems (to better understand the structure of input data)
There are, of course, many other modules that could be of use, including other third-party modules, such as numpy.
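As a quick taste of the collections types in action (with made-up data):

from collections import Counter, defaultdict, deque

print(Counter("abracadabra").most_common(2))  # [('a', 5), ('b', 2)]

edges = defaultdict(list)  # missing keys auto-create an empty list
for a, b in [("x", "y"), ("x", "z"), ("y", "z")]:
    edges[a].append(b)
print(dict(edges))         # {'x': ['y', 'z'], 'y': ['z']}

dq = deque([1, 2, 3])      # O(1) appends/pops at both ends
dq.rotate(1)
print(dq)                  # deque([3, 1, 2])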
A lot of the solutions don't really require custom data-type definitions, but they can also be useful for some of them. So, it's still a good idea to review the details of user-defined Python classes, such as self and cls, the so-called dunder functions (__init__, __str__, etc.), getters and setters (@property and @propertyname.setter), static and class methods (@staticmethod and @classmethod), class vs instance variables, leading single- and double- underscore named "private" variables, and of course inheritance, polymorphism, etc. If you really want to get more into this, a potentially relevant module to examine is abc (short for Abstract Base Classes).
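A compact, purely illustrative toy class touching several of those pieces:

class Elf:
    population = 0                       # class variable, shared by all instances

    def __init__(self, name, calories):  # dunder: constructor
        self.name = name                 # instance variables
        self._calories = calories        # leading underscore: "private" by convention
        Elf.population += 1

    def __str__(self):                   # dunder: controls str(elf) / print(elf)
        return f"Elf({self.name}, {self._calories} cal)"

    @property
    def calories(self):                  # getter: read as elf.calories, no parentheses
        return self._calories

    @calories.setter
    def calories(self, value):           # setter: elf.calories = value
        if value < 0:
            raise ValueError("calories can't be negative")
        self._calories = value

    @classmethod
    def census(cls):                     # receives the class itself as cls
        return cls.population

e = Elf("Hermey", 4000)
e.calories += 500           # goes through the getter, then the setter
print(e, Elf.census())      # Elf(Hermey, 4500 cal) 1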
@cache / [custom] decorators (See A full dynamic programming example below.)
Default-value parameters do not necessarily "reset" to default value. A parameter's default value is evaluated once, when the function is defined, not on each call; so, in the example below, def_var gets assigned a new dictionary at definition time. Then on every call where the argument isn't supplied, that same dictionary gets used.
def surprise(txt, def_var={}):
    print(f"{txt}\n\tExpected(?): {{}}. Is: {def_var}")
    def_var[4] = 2
    print(f"\tValue set: def_var[4] = 2.\n\tShould be: {{4: 2}}. Is: {def_var}")

surprise("Call #1; parameter 2 omitted; expected(?) to use specified default = {}; appears to do so.")
surprise("Call #2; parameter 2 omitted; expected(?) to use specified default = {}; DOES NOT!")
surprise("Call #3; parameter 2 explicitly passes as {}; resets its value, as expected.", {})
def better(txt, def_var=None):  # This works because None is an immutable type (and, incidentally, a singleton).
    if def_var is None:
        def_var = {}
    ...

# Another possible approach is to implement something like this as an outer function
# wrapping an inner function (e.g., in the case of memoization), where even a mutable
# type does not hang around outside the outer function after it's exited.
Python supports chained comparisons, so this:

if u < v < w <= x <= y < z: ...

is equivalent to this:

if u < v and v < w and w <= x and x <= y and y < z: ...

The range object efficiently supports membership tests in if-statements (via exactly the same in interface), such as:

if x in range(start, stop, step): ...

Suppose you decide to represent the N/E/S/W direction you're facing as a 2-tuple, i.e., one of these: (-1,0), (0,1), (1,0), (0,-1). You could write a regular expression to parse input, then go into a very simple loop to tear through it, line by line, such as the following. Note the usefulness of the "option-mapping dictionary", here named "go".
import re

go = {  # "go" ... Left-face, Right-face, Backward-face, Forward-move-by-amount
    "L": lambda pos, hdg, _: [pos, (-hdg[1], hdg[0])],    # turn left, regardless of current heading (pre-multiply hdg by [[0,-1],[1,0]])
    "R": lambda pos, hdg, _: [pos, (hdg[1], -hdg[0])],    # turn right, regardless of current heading (pre-multiply hdg by [[0,1],[-1,0]])
    "B": lambda pos, hdg, _: [pos, (-hdg[0], -hdg[1])],   # reverse direction, regardless of current heading
    "F": lambda pos, hdg, amt: [(pos[0]+int(amt)*hdg[0], pos[1]+int(amt)*hdg[1]), hdg],  # move forward "amt" steps in direction of current heading
}
rx = re.compile(r"^(?P<cmd>[LRFB])\s*(?P<amt>\d+)?$")

with open("input.txt") as infile:
    pos = (0, 0)   # initial position
    hdg = (-1, 0)  # initial heading: face "north" ("up", in terms of (row,col) grid indexing)
    for line in infile:
        m = rx.match(line.rstrip('\n'))
        if not m: raise Exception("Uh, oh...")
        grp = m.groupdict()
        pos, hdg = go[grp["cmd"]](pos, hdg, grp["amt"])  # <---- makes for very clean code; doesn't care what action to "go" do; just go do it!
    print(f"Final position: {pos}. Final heading: {hdg}.")
EXAMPLE INPUT:
F 8
L
F 2
L
F 5
R
F 1
B
L
F 7
B
F 15
L
F 5
OUTPUT:
Final position: (5, 2). Final heading: (0, 1).
You're tasked with writing a 'best_sum' function that takes two inputs:
target_sum: an integer
number_pool: a collection of some natural numbers (i.e., positive integers)
It is to return the smallest collection of those numbers that sum to target_sum.
The numbers are re-usable as many times as needed.
If target_sum is impossible to reach, return None.
(For more, see 'bestSum' problem (etc.) here.)
from functools import cache

#@cache  # uncomment this to memoize this function
def best_sum(target_sum, number_pool):
    if target_sum == 0: return []
    if target_sum < 0: return None
    shortest = None
    for n in number_pool:
        remainder_combo = best_sum(target_sum - n, number_pool)
        if remainder_combo is not None:
            combo = [*remainder_combo, n]
            if shortest is None or len(combo) < len(shortest):
                shortest = combo
    return shortest
# Manual Memoization (MM): manual dyn. prog. (more control over exactly what's memoized)
# Identical to above, except where marked with 'memo'.
def best_sum_MM(target, numbers, memo=None):                      # memo
    if memo is None: memo = {}  # memo (avoids default-value non-reset that memo={} causes)
    if target in memo: return memo[target]                        # memo
    if target == 0: return []
    if target < 0: return None
    shortest = None
    for n in numbers:
        remainder_combo = best_sum_MM(target - n, numbers, memo)  # memo
        if remainder_combo is not None:
            combo = [*remainder_combo, n]
            if shortest is None or len(combo) < len(shortest):
                shortest = combo
    memo[target] = shortest                                       # memo
    return shortest
inputs = [  # [target sum, available numbers (repeatable) to try to hit it]
    [7, [5, 3, 4, 7]],
    [9, [6, 7, 4]],
    [8, [2, 3, 5]],
    [100, [1, 2, 5, 25]],  # HORRIBLY BREAKS non-memoized run
    [8, [1, 4, 5]],
]
for tgt, nbrs in inputs:
    # frozenset used since @cache decorator req's all inputs be hashable. (It caches them all.)
    # Manual memoization doesn't cache nbrs so doesn't req it be hashable; only tgt must be.
    print(f"{str(tgt)+', '+str(nbrs):>20} --> ", end="")
    print(best_sum(tgt, frozenset(nbrs)))  # call the best_sum_MM variant instead to run it