Advent of Code (AoC) is an annual computer programming event that began in 2015 and is found at https://adventofcode.com. The problems are presented in the context of a whimsical/silly story arc with some kind of semi-Christmas-y theme (e.g., eventually recovering Santa's lost keys through a long series of intermediate adventures). This page captures some recommended topics to be familiar with when participating in it, as well as some tips and tricks learned while working through its problems.
AoC happens in December each year, with a total of 25 problems (1 through 25) in the full event, which opens on Dec 1 and culminates on Dec 25. (NOTE: The Advent of Code creator, Eric Wastl, announced that, starting in 2025 and going forward for each event, there will be only 12 days of problems, rather than 25 (still starting on Dec 1). Further details TBD, but all prior events' problems remain available.) Each problem number, N, becomes available at midnight EST on December N. (So, problem 1 is revealed at exactly the start of Dec 1 in EST, problem 2 on Dec 2, etc.) Each problem has two parts, with Part 1 being revealed at midnight on its day (and year) and Part 2 revealed upon solving Part 1. After a problem is made available, it remains available forever; this includes all previous events' problems, as well as the current event's previous days' problems. Individual problem availability is not dependent upon solving earlier problems in its event; i.e., each problem is revealed on its date, independently of those problems preceding it. (Though there are some problems that refer to earlier problems, they do not require that you have solved them first.)
For each year's event, every solved Part earns you a star. The event's objective is to earn 50 stars. (Well, really it's to save Christmas, but you do that by earning all the stars.) Part 2 of Problem 25 provides you a star for free, but it also requires that you have earned all the event's other 49 stars to actually complete it. In the big picture, the problems trend from easier to harder as the days increase, but there's definitely some choppiness in the difficulty curve. (Some problems may seem easier than others that came earlier.) There is no penalty for submitting wrong answers since you can simply try again; however, if you submit several wrong tries in succession, there is an increasing enforced delay for submitting yet more. (This prevents spamming submissions until "correct".)
AoC requires that you create a site account in order to participate. This is so your progress can be saved. (The event does not spam you at all.) It also provides a mechanism for creating/joining Private Leaderboards so you and your friends can track each other's progress.
NOTE: If you feel any important information is missing here, or something is unclear, please let me know so I can address it! This is a work in progress, and it's meant to be helpful.
AoC requires programming to solve the problems but does not require the use of any specific language. I recommend Python and will focus on it in the next section, but this section's comments are language-agnostic.
An example use case:
Store all the neighbor-cell offsets for a grid problem as a list of [dx, dy] pairs, then loop through them, processing them all identically. (This is trivially easy to update if diagonal cells aren't needed, then are, and it makes for highly readable/maintainable code.) See examples under zip of the "Especially useful functions" Python-specific section.
Certainly, not all problems need them, but they can drastically simplify code and effort for a lot of problems.
Sets and dictionaries are hash-type storage objects and so are generally far more efficient to check if something is in them than are, for example, lists (arrays). They also ensure uniqueness (rather than multiplicity) of membership. Of course, they're also not (necessarily) guaranteed to be in a particular order, so there can be both pros and cons to them.
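For instance, a membership test that's O(n) on a list is O(1) on average for a set. Here's a tiny timing illustration (exact numbers will vary by machine):

import time

items = list(range(1_000_000))
as_set = set(items)

t0 = time.time()
print(999_999 in items)   # O(n): scans the whole list (worst case: last element)
t1 = time.time()
print(999_999 in as_set)  # O(1) average: a hash lookup
t2 = time.time()
print(f"list: {t1 - t0:.5f}s  set: {t2 - t1:.6f}s")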
Sometimes a somewhat simple/naïve approach may work perfectly well for Part 1 of a problem, but you'll need to adapt your approach to a different and more efficient data representation for its Part 2. Keep such possibilities in mind. (See, for example, 2021's problem 6.)
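As a sketch of the general idea (loosely in the spirit of that problem, not its actual rules): rather than simulating every individual item in an exponentially growing population, track how many items are in each state; each step then costs time proportional to the number of distinct states, no matter how large the population grows.

from collections import Counter

# Hypothetical countdown process: a timer at 0 resets to 6 and spawns a new item at 8.
population = Counter([3, 4, 3, 1, 2])  # counts per timer value, not one entry per item
for _ in range(256):                   # a per-item list would grow astronomically here
    nxt = Counter()
    for timer, count in population.items():
        if timer == 0:
            nxt[6] += count   # all the expiring items reset...
            nxt[8] += count   # ...and each spawns a new item
        else:
            nxt[timer - 1] += count
    population = nxt
print(sum(population.values()))  # total population after 256 steps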
Also, a bit of a side note: It's very efficient to implement "circular storage" and the like by using lists (arrays) with modulus math. That is, you can just keep incrementing an index and always mod it by the length of the list; so, i=(i+1)%len(alist), and alist[i], will just keep endlessly cycling through alist. One caveat to keep in mind for those languages that support negative indexing (such as Python) is to keep your indexing non-negative (≥0), or you may encounter problems. If you want to decrement an index by 1, add len(alist)-1 to it (and mod) instead of subtracting 1; in modulus math, this results in "i-1" regardless of where in the list the index is, including at i=0. (Of course, this only applies to non-empty lists. Not even i=0 is valid for empty lists.)
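A minimal sketch of the pattern:

alist = ['a', 'b', 'c', 'd']
i = 0
for _ in range(10):           # prints: a b c d a b c d a b
    print(alist[i], end=' ')
    i = (i + 1) % len(alist)  # endlessly cycles 0, 1, 2, 3, 0, 1, ...
print()

i = 0
i = (i + len(alist) - 1) % len(alist)  # "decrement" while staying non-negative
print(alist[i])                        # 'd' (the element "before" index 0)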
Some of the problems are deliberately designed so that you must identify patterns in order to solve them. For example, maybe Part 1 of a problem has you implement a process and compute a result by iterating it a few hundred times, but then Part 2 asks you what the result will be after iterating it several trillion times! If you don't notice that there's a repeated pattern in the computational behavior, and figure out how to leverage it to your advantage, then you will never get the answer; but if you do identify that fact, and you're careful in how you exploit it, you can chop out the vast majority of the computation in the middle that's fundamentally repeatedly redoing the same work, and instead only do the lead-in and run-out "clean up" processing in order to arrive at the correct answer before the heat death of the universe occurs.
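One common way to exploit such a cycle, sketched here with a hypothetical step function and hashable states (the names are mine, not from any particular problem): record the step number at which each state is first seen; when a state repeats, you know the cycle length, can jump over all the full cycles with modulus math, and then simulate only the leftover steps.

def run(state, total_steps, step):
    # Advance state by total_steps applications of step, fast-forwarding
    # across any cycle that appears. Assumes states are hashable.
    seen = {}  # state -> step number at which it was first seen
    t = 0
    while t < total_steps:
        if state in seen:
            cycle_len = t - seen[state]
            remaining = (total_steps - t) % cycle_len  # skip all the full cycles at once
            for _ in range(remaining):                 # run-out: just the leftover steps
                state = step(state)
            return state
        seen[state] = t
        state = step(state)
        t += 1
    return state

print(run(0, 10**12, lambda s: (s * 3 + 1) % 7))  # 5, near-instantly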
This is definitely a minority case, but it does crop up here and there. Most problems expect general solutions, but some require that the data be structured in a particular way, and discovering that and using it may be essential to solving them. (Basically, the fundamental structure of the data is part of the problem statement, but that fact may not be clearly stated. Noticing it is important.)
Sometimes even if it's not required, it's still easier to solve for the data given rather than the general problem. The objective is to get the correct answers to this problem-set. Corner cases that may crop up in more general hypothetical input data, but do not in the provided data, may safely go unaccounted for (albeit, perhaps with a comment in the code, noting this), and sometimes you can actually leverage this in order to significantly simplify solution development.
I know of a few past problems that needed more of a reverse-engineering type of solution than a programmatic one. And, there are some others that are more easily solved in less than strictly programmatic ways (especially via visualization, but not only that).
DFS is more often used for exploring solution-space, but there's often a problem or two thrown in where BFS is needed instead. (Note that this may be more of a conceptual approach than a traversal of an actual graph data structure.)
So, keep this in mind, i.e., the idea that sometimes the best way forward is . . . all of them. And, prune the results as needed.
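Here's a minimal BFS skeleton over a grid, using a deque as the frontier; it advances all candidate states level by level, pruning via a visited set. (The grid and names here are made up for illustration.)

from collections import deque

def bfs_steps(grid, start, goal):
    # Fewest steps from start to goal through '.' cells; None if unreachable.
    rows, cols = len(grid), len(grid[0])
    frontier = deque([(start, 0)])  # (cell, steps-so-far)
    visited = {start}
    while frontier:
        (r, c), steps = frontier.popleft()  # popleft() = FIFO = BFS (pop() would be DFS-like)
        if (r, c) == goal:
            return steps
        for dr, dc in ((-1, 0), (0, 1), (1, 0), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '.' and (nr, nc) not in visited:
                visited.add((nr, nc))
                frontier.append(((nr, nc), steps + 1))
    return None

maze = ["..#.",
        ".#..",
        "...."]
print(bfs_steps(maze, (0, 0), (2, 3)))  # 5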
Of course, it may also be helpful to be familiar with some common/useful graph algorithms, such as Dijkstra's algorithm.
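And a compact Dijkstra sketch using heapq, assuming a hypothetical graph in the form of a dict mapping node -> list of (neighbor, weight) pairs:

import heapq

def dijkstra(graph, start):
    # Cheapest total cost from start to every reachable node.
    best = {start: 0}
    heap = [(0, start)]  # (cost-so-far, node); heapq always pops the smallest cost
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float('inf')):
            continue  # stale heap entry; a cheaper path to node was found already
        for nbr, wt in graph.get(node, []):
            new_cost = cost + wt
            if new_cost < best.get(nbr, float('inf')):
                best[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return best

g = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('D', 6)], 'C': [('D', 3)]}
print(dijkstra(g, 'A'))  # {'A': 0, 'B': 1, 'C': 3, 'D': 6}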
Recursion often results in cleaner code. Iteration is typically more performant. One generally successful approach for converting a recursive solution to an iterative solution is to use a "work queue". Create a data structure to capture the "state" (usually a tuple works fine), and then create a work queue (e.g., a Python list, or even better, a deque) with a single state element representing the starting state. Remove the first element from the queue, evaluate it to determine all new possible states, and append those to the queue. Repeat until the queue is empty, or you have found the end state you were looking for.
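A minimal sketch of that pattern (the problem here, counting down/right paths through a grid, is made up for illustration):

from collections import deque

def count_paths(rows, cols):
    # Count distinct down/right paths from the top-left to the bottom-right
    # of a rows x cols grid, iteratively, via a work queue of (row, col) states.
    queue = deque([(0, 0)])  # a single state element representing the starting state
    paths = 0
    while queue:
        r, c = queue.popleft()             # remove the first element...
        if (r, c) == (rows - 1, cols - 1):
            paths += 1
            continue
        for nr, nc in ((r + 1, c), (r, c + 1)):  # ...determine all new possible states...
            if nr < rows and nc < cols:
                queue.append((nr, nc))           # ...and append those to the queue
    return paths

print(count_paths(3, 3))  # 6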
Lambda functions (aka anonymous functions) are very handy for inline purposes, from doing filtering, to transforming data, to mapping special functionality.
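For example:

pts = [(3, 'c'), (1, 'a'), (2, 'b')]
print(sorted(pts, key=lambda p: p[0]))                # custom sort key: [(1, 'a'), (2, 'b'), (3, 'c')]
print(list(filter(lambda n: n % 3 == 0, range(10))))  # filtering: [0, 3, 6, 9]
print(list(map(lambda s: s.upper(), "abc")))          # transforming/mapping: ['A', 'B', 'C']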
Closures are good to understand if you want to use generators, coroutines, and more.
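A quick closure example, in which the inner function "closes over" count, so the value persists between calls:

def make_counter():
    count = 0
    def next_id():
        nonlocal count  # rebind the enclosing function's variable, not a new local
        count += 1
        return count
    return next_id      # the returned function carries its enclosing scope with it

new_id = make_counter()
print(new_id(), new_id(), new_id())  # 1 2 3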
This programming technique is somewhat less known to a lot of people, but it is very useful for certain problems that occasionally appear in the events, so it's recommended as something to be familiar with. I've seen it drop execution times from many minutes, or hours (or potentially even much longer), down to near-instant, and with very little additional effort required, so it's a good thing to be aware of. (If using Python, see a potential memoization gotcha in the Miscellaneous section bullet, Default-value parameters do not necessarily "reset" to default value. Also, see the Python-specific bullet: A full dynamic programming example.)
This is a bit of a hodge-podge of topics/tips that could probably be better crafted.
A few examples:
evens = [i for i in range(0, 101, 2)] ⟶ list of all even numbers from 0 through 100
mul_tbl = [[col*row for col in range(1, 13)] for row in range(1, 13) if row != 5] ⟶ 12x12 multiplication table (list of lists), but skipping row 5
for n in (i for i in range(1000000000) if i not in set_a): ⟶ loop n from 0 to a billion, skipping any values in set_a, and generating one value at a time, not a whole list at once (as would be the case if brackets were used instead of parentheses)
The first two examples use list comprehensions. The third uses a generator expression. These are conceptually and syntactically very similar, although there are big differences in what happens under the covers that make it important to understand what you're doing, especially when creating large amounts of data.
A comprehension is special syntax that allows in-place initialization of a list, set, or dictionary using a very succinct expression:
[ expression for variable in iterable ] ⟶ list comprehension
{ expression for variable in iterable } ⟶ set comprehension
{ immutable : expression for variable in iterable } ⟶ dictionary comprehension (keys must be immutable)
Note that comprehensions are fully populated when they are evaluated. Thus, they can consume large amounts of memory, and for lists in particular, operations on the result can be very slow. (Many list operations, such as membership tests, are O(n).)
The syntax for a generator expression is very similar to a list comprehension:
( expression for variable in iterable )
The difference is that a generator expression only produces its elements one at a time, on demand. This keeps generator expressions lightweight in terms of memory usage. However, it also means that certain operations are not available (e.g., accessing an element by index).
Note that generator expressions are distinct from lists, sets, and dictionaries; they're their own thing, even though they may appear very similar syntactically and may generate individual elements similar in form to any of them. There is no "going back" or "random access" nature to them; there's only "give me the next item" (until there are no more to give).
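A quick demonstration of that "give me the next item" behavior:

g = (i * i for i in range(4))
print(next(g))  # 0
print(next(g))  # 1
print(list(g))  # [4, 9] (consumes whatever remains)
print(list(g))  # [] (exhausted; there is no rewinding a generator)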
Generator functions offer a more general-purpose approach with similar functionality to generator expressions, albeit with completely different syntax. To create a generator function, you simply use the keyword yield. This is best explained by example:
def mygen(n):
    for i in range(n):
        yield i ** 2

for i in mygen(5):
    print(i)
OUTPUTS:
0
1
4
9
16
The important thing to note here is that the mygen function effectively "pauses" after each yield. The generator function above is equivalent to the generator expression ( i ** 2 for i in range(n) ); however, you have considerably more flexibility in a generator function to create complicated generator functionality.
It's also possible to use the yield keyword in the context of creating coroutines, though I won't discuss these in detail here since I haven't found them necessary in AoC. (Feel free to research them, though.) That said, here's a very short code sample to show one in action:
def a_coroutine():  # define a coroutine function
    while True:
        val_rcvd = yield  # assigning from yield makes it a coroutine, not a (regular) generator
        print('Value consumed:', val_rcvd)

c = a_coroutine()  # create instance of coroutine
print(type(c), c)  # print coroutine object
next(c)  # prime the coroutine (i.e., start it)
for string in "first", "second", "third":
    c.send(string)  # send in the data (into the coroutine)
OUTPUTS:
<class 'generator'> <generator object a_coroutine at 0x000001BB1724B350>
Value consumed: first
Value consumed: second
Value consumed: third
A few examples:
list_a[::-1] ⟶ list that is the reverse of list_a
list_a[:i] + list_b[j:] + list_c[i:j] ⟶ list that is composed of list_a elements 0 through i-1 (or its end), list_b elements j through its end, and list_c elements i through j-1 (or its end)
"0123456789abcdef"[13:4:-3] ⟶ "da7"
Those are just a few example uses, but the general slicing format is start:stop:step, and any combination of the parts may be explicitly declared or omitted.
A few examples:
a, b = [b, a] ⟶ swaps values of a and b
a, b, c, *rest = [1,2,3,4,5,6,7,8,...] ⟶ a=1, b=2, c=3, rest=[4,5,6,...]
def var_args_func(a, b, *args): ⟶ can call with var_args_func(1,2) (so, a=1, b=2, args=()) or var_args_func(1,2,3) (so, a=1, b=2, args=(3,)) or var_args_func(1,2,3,"abc",5,False,7.0) (so, a=1, b=2, args=(3,'abc',5,False,7.0)), etc.
t = (1,) ⟶ packs a single-element tuple (the trailing comma, not the parentheses, is what makes it a tuple)
There is also the **kwargs notation, which is used to collect named arguments ("keyword arguments") into a dictionary that can then be used inside the function. If *args is also being used with it, then *args should appear before **kwargs, and any more specific parameters should be listed before both, e.g.: def var_args_func(a, b, c, *args, **kwargs):
return args, a, b ⟶ returns a tuple – (args, a, b) – which can be captured in a single variable, or in multiple variables via destructuring
for k, v in my_dict.items(): ⟶ my_dict.items() yields (key, value) tuples, which are then destructured into variables k and v, for use in each loop iteration
print(*grid, sep='\n') ⟶ print a "grid" (list of lists), one row (unpacked from grid) at a time, with intervening separator character '\n' (i.e., newline)
Those are a few somewhat arbitrary examples, but hopefully they communicate some useful information. Packing and unpacking can look similar, other than context, and they also sometimes happen implicitly.
open – for opening files, often best to use within the context of a with block – example:
with open("input.txt") as in_file:len – get the length/size of a list/set/etc.map – transform iterable data from one form to anotheriter – create an iterator for an objectfilter – filter iterable data according to defined criteria, passing through only those items that satisfy the filter testenumerate – handy way to tack on an iteration number in a loopsorted – what it sounds like, but also allows you to customize how it's sorting (see also reversed, and also the functools.cmp_to_key function, which enables deprecated versions of sorted to be updated and still work)type/isinstance/issubclass/callable – do typing-related checkszip – generator that "zips" together n iterables – examples:
zip("abcd", [1,2,3,4,5], (True,False,True)) ⟶; iteratively produces: ('a', 1, True), ('b', 2, False), and ('c', 3, True), then stopping because it ran out of elements in the third iterablezip_longest function in the itertools modulerc = [3,5] # e.g., indices into a grid, perhaps passed into a function as a parameter
for drdc in [[-1,0],[0,1],[1,0],[0,-1]]: # e.g., cycle through all horizontal and vertical neighbor-cell offsets
r, c = [sum(z) for z in zip(rc, drdc)] ⟶ r, c end up as the row, column grid indices of neighbor cells of rc. (Note: This is easy to extend to arbitrary dimensions (change rc, and drdc list), or to include diagonal neighbor cells (edit drdc list).)transposed_matrix = list(zip(*matrix))all – True if all items in an iterable are True; evaluates with short-circuitingany – True if any items in an iterable are True; evaluates with short-circuitingeval – evaluate a string as Python coderound, sum, min, max, abs, divmod, ord, chr, hex, id, hasattr, str/int/..., and so forth; you can look up a full listing of the built-in functions online (e.g., https://docs.python.org/3/library/functions.html), or get an approximate listing programmatically like this:import builtins, types print([nm for nm, obj in vars(builtins).items() if isinstance(obj, types.BuiltinFunctionType)])
re – regular expression module; especially useful functions are compile and match, but there are others
itertools – many useful functions: combinations, permutations, product, groupby, cycle, repeat, accumulate, batched, chain, compress, pairwise, dropwhile, takewhile, ...
functools – probably especially look at reduce (often conceptually grouped with map and filter) and cache (a decorator)
collections – at least Counter, defaultdict, namedtuple, and deque (see the short demo just after this list); note that plain dicts can cover some of the same ground, e.g.: d.setdefault(k, []).append(new_value) OR d.setdefault(k, set()).update(new_set) OR d[k] = d.get(k, 0) + new_nbr, etc.
math – at least comb, and otherwise the usual suspects
copy – copy and deepcopy, at least (if needing to make copies of arbitrarily complex objects)
time – the time function inside it makes it very easy to time code as it executes
os, sys, subprocess, threading, multiprocessing, asyncio, abc
pyvis – a powerful and simple-to-use third-party module for visualizing graphs, which can be immensely useful for some problems (to better understand the structure of input data)
There are, of course, many other modules that could be of use, including other third-party modules, such as numpy.
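As a quick taste of the collections types in action (with made-up data):

from collections import Counter, defaultdict, deque

print(Counter("abracadabra").most_common(2))  # [('a', 5), ('b', 2)]

edges = defaultdict(list)  # missing keys auto-create an empty list
for a, b in [("x", "y"), ("x", "z"), ("y", "z")]:
    edges[a].append(b)
print(dict(edges))         # {'x': ['y', 'z'], 'y': ['z']}

dq = deque([1, 2, 3])      # O(1) appends/pops at both ends
dq.rotate(1)
print(dq)                  # deque([3, 1, 2])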
A lot of the solutions don't really require custom data-type definitions, but they can also be useful for some of them. So, it's still a good idea to review the details of user-defined Python classes, such as self and cls, the so-called dunder functions (__init__, __str__, etc.), getters and setters (@property and @propertyname.setter), static and class methods (@staticmethod and @classmethod), class vs instance variables, leading single- and double- underscore named "private" variables, and of course inheritance, polymorphism, etc. If you really want to get more into this, a potentially relevant module to examine is abc (short for Abstract Base Classes).
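A compact, purely illustrative toy class touching several of those pieces:

class Elf:
    population = 0                       # class variable, shared by all instances

    def __init__(self, name, calories):  # dunder: constructor
        self.name = name                 # instance variables
        self._calories = calories        # leading underscore: "private" by convention
        Elf.population += 1

    def __str__(self):                   # dunder: controls str(elf) / print(elf)
        return f"Elf({self.name}, {self._calories} cal)"

    @property
    def calories(self):                  # getter: read as elf.calories, no parentheses
        return self._calories

    @calories.setter
    def calories(self, value):           # setter: elf.calories = value
        if value < 0:
            raise ValueError("calories can't be negative")
        self._calories = value

    @classmethod
    def census(cls):                     # receives the class itself as cls
        return cls.population

e = Elf("Hermey", 4000)
e.calories += 500           # goes through the getter, then the setter
print(e, Elf.census())      # Elf(Hermey, 4500 cal) 1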
@cache / [custom] decorators (See A full dynamic programming example below.)
Default-value parameters do not necessarily "reset" to default value. A parameter's default value is evaluated once, when the function is defined, not on each call; so, in the example below, def_var gets assigned a new dictionary at definition time. Then on every call where the argument isn't supplied, that same dictionary gets used.
def surprise(txt, def_var={}):
    print(f"{txt}\n\tExpected(?): {{}}. Is: {def_var}")
    def_var[4] = 2
    print(f"\tValue set: def_var[4] = 2.\n\tShould be: {{4: 2}}. Is: {def_var}")

surprise("Call #1; parameter 2 omitted; expected(?) to use specified default = {}; appears to do so.")
surprise("Call #2; parameter 2 omitted; expected(?) to use specified default = {}; DOES NOT!")
surprise("Call #3; parameter 2 explicitly passes as {}; resets its value, as expected.", {})
def better(txt, def_var=None):  # This works because None is an immutable type (and, incidentally, a singleton).
    if def_var is None:
        def_var = {}
    ...

# Another possible approach is to implement something like this as an outer function
# wrapping an inner function (e.g., in the case of memoization), where even a mutable
# type does not hang around outside the outer function after it's exited.
Python supports chained comparisons, so this:

if u < v < w <= x <= y < z: ...

is equivalent to this:

if u < v and v < w and w <= x and x <= y and y < z: ...

The range object efficiently supports membership tests in if-statements (via exactly the same in interface), such as:

if x in range(start, stop, step): ...

Suppose you decide to represent the N/E/S/W direction you're facing as a 2-tuple, i.e., one of these: (-1,0), (0,1), (1,0), (0,-1). You could write a regular expression to parse input, then go into a very simple loop to tear through it, line by line, such as the following. Note the usefulness of the "option-mapping dictionary", here named "go".
import re

go = {  # "go" ... Left-face, Right-face, Backward-face, Forward-move-by-amount
    "L": lambda pos, hdg, _: [pos, (-hdg[1], hdg[0])],    # turn left, regardless of current heading (pre-multiply hdg by [[0,-1],[1,0]])
    "R": lambda pos, hdg, _: [pos, (hdg[1], -hdg[0])],    # turn right, regardless of current heading (pre-multiply hdg by [[0,1],[-1,0]])
    "B": lambda pos, hdg, _: [pos, (-hdg[0], -hdg[1])],   # reverse direction, regardless of current heading
    "F": lambda pos, hdg, amt: [(pos[0]+int(amt)*hdg[0], pos[1]+int(amt)*hdg[1]), hdg],  # move forward "amt" steps in direction of current heading
}
rx = re.compile(r"^(?P<cmd>[LRFB])\s*(?P<amt>\d+)?$")

with open("input.txt") as infile:
    pos = (0, 0)   # initial position
    hdg = (-1, 0)  # initial heading: face "north" ("up", in terms of (row,col) grid indexing)
    for line in infile:
        m = rx.match(line.rstrip('\n'))
        if not m: raise Exception("Uh, oh...")
        grp = m.groupdict()
        pos, hdg = go[grp["cmd"]](pos, hdg, grp["amt"])  # <---- makes for very clean code; doesn't care what action to "go" do; just go do it!
    print(f"Final position: {pos}. Final heading: {hdg}.")
EXAMPLE INPUT:
F 8
L
F 2
L
F 5
R
F 1
B
L
F 7
B
F 15
L
F 5
OUTPUT:
Final position: (5, 2). Final heading: (0, 1).
You're tasked with writing a 'best_sum' function that takes two inputs:
target_sum: an integer
number_pool: a collection of some natural numbers (i.e., positive integers)
It is to return the smallest collection of those numbers that sum to target_sum.
The numbers are re-usable as many times as needed.
If target_sum is impossible to reach, return None.
(For more, see 'bestSum' problem (etc.) here.)
from functools import cache

#@cache  # uncomment this to memoize this function
def best_sum(target_sum, number_pool):
    if target_sum == 0: return []
    if target_sum < 0: return None
    shortest = None
    for n in number_pool:
        remainder_combo = best_sum(target_sum - n, number_pool)
        if remainder_combo is not None:
            combo = [*remainder_combo, n]
            if shortest is None or len(combo) < len(shortest):
                shortest = combo
    return shortest
# Manual Memoization (MM): manual dyn. prog. (more control over exactly what's memoized)
# Identical to above, except where marked with 'memo'.
def best_sum_MM(target, numbers, memo=None):                      # memo
    if memo is None: memo = {}  # memo (avoids default-value non-reset that memo={} causes)
    if target in memo: return memo[target]                        # memo
    if target == 0: return []
    if target < 0: return None
    shortest = None
    for n in numbers:
        remainder_combo = best_sum_MM(target - n, numbers, memo)  # memo
        if remainder_combo is not None:
            combo = [*remainder_combo, n]
            if shortest is None or len(combo) < len(shortest):
                shortest = combo
    memo[target] = shortest                                       # memo
    return shortest
inputs = [  # [target sum, available numbers (repeatable) to try to hit it]
    [7, [5, 3, 4, 7]],
    [9, [6, 7, 4]],
    [8, [2, 3, 5]],
    [100, [1, 2, 5, 25]],  # HORRIBLY BREAKS non-memoized run
    [8, [1, 4, 5]],
]
for tgt, nbrs in inputs:
    # frozenset used since @cache decorator req's all inputs be hashable. (It caches them all.)
    # Manual memoization doesn't cache nbrs so doesn't req it be hashable; only tgt must be.
    print(f"{str(tgt)+', '+str(nbrs):>20} --> ", end="")
    print(best_sum(tgt, frozenset(nbrs)))  # call the best_sum_MM variant instead to run it