Task 19

2026-01-08

1 Setup

1.1 Libraries

library(httr)
library(xml2)
library(magrittr)
library(dplyr)
library(purrr)
library(stringr)

1.2 Retrieve Data from `AoC`

session_cookie <- set_cookies(session = keyring::key_get("AoC-GitHub-Cookie"))
base_url <- paste0("https://adventofcode.com/", params$year, "/day/", params$task_nr)
puzzle <- GET(base_url,
              session_cookie) %>% 
  content(encoding = "UTF-8") %>% 
  xml_find_all("//article") %>% 
  lapply(as.character)

parse_puzzle_data <- function(text_block = readClipboard()) {
  if (length(text_block) == 1L) {
    text_block <- text_block %>% 
      str_split("\n") %>% 
      extract2(1L) %>% 
      keep(nzchar)
  }
  rules_marker <- str_which(text_block, ":") %>% 
    max()
  rules <- text_block[1:rules_marker]
  lhs <- str_extract(rules, "^\\d+") %>% 
    as.integer() %>% 
    add(1L)
  rhs <- str_remove_all(rules, "^\\d+: |\"") %>% 
    str_split(fixed(" | ")) %>% 
    map(function(rhs) {
      parts <- str_split(rhs, "\\s")
      map(parts, function(x) {
        xv <- suppressWarnings(as.integer(x)) 
        if (any(is.na(xv))) {
          x
        } else {
          xv + 1L
        }
      })
    })
  list(rules = rhs[order(lhs)], msg = text_block[(rules_marker + 1L):length(text_block)])
}

puzzle_data <- local({
  GET(paste0(base_url, "/input"),
      session_cookie) %>% 
    content(encoding = "UTF-8") %>% 
    parse_puzzle_data()
})

2 Puzzle Day 19

2.1 Part 1

2.1.1 Description

— Day 19: Monster Messages —

You land in an airport surrounded by dense forest. As you walk to your high-speed train, the Elves at the Mythical Information Bureau contact you again. They think their satellite has collected an image of a sea monster! Unfortunately, the connection to the satellite is having problems, and many of the messages sent back from the satellite have been corrupted.

They sent you a list of the rules valid messages should obey and a list of received messages they’ve collected so far (your puzzle input).

The rules for valid messages (the top part of your puzzle input) are numbered and build upon each other. For example:

0: 1 2
1: "a"
2: 1 3 | 3 1
3: "b"

Some rules, like 3: "b", simply match a single character (in this case, b).

The remaining rules list the sub-rules that must be followed; for example, the rule 0: 1 2 means that to match rule 0, the text being checked must match rule 1, and the text after the part that matched rule 1 must then match rule 2.

Some of the rules have multiple lists of sub-rules separated by a pipe (|). This means that at least one list of sub-rules must match. (The ones that match might be different each time the rule is encountered.) For example, the rule 2: 1 3 | 3 1 means that to match rule 2, the text being checked must match rule 1 followed by rule 3 or it must match rule 3 followed by rule 1.

Fortunately, there are no loops in the rules, so the list of possible matches will be finite. Since rule 1 matches a and rule 3 matches b, rule 2 matches either ab or ba. Therefore, rule 0 matches aab or aba.
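The expansion described above can be sketched in C++. This is only an illustration for loop-free rules, not the method used in the solution; the `Rule` struct and the `expand` and `example_rules` helpers are hypothetical names introduced for this sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rule representation: either a single literal character
// or a list of alternatives, each a sequence of sub-rule numbers.
struct Rule {
  bool is_terminal = false;
  char terminal = '\0';
  std::vector<std::vector<int>> alternatives;
};

// Return every string the given rule can match.
// Only safe for loop-free rules; a loop would recurse forever.
std::set<std::string> expand(const std::map<int, Rule>& rules, int id) {
  const Rule& rule = rules.at(id);
  if (rule.is_terminal) {
    return {std::string(1, rule.terminal)};
  }
  assert(!rule.alternatives.empty());  // a non-terminal needs an alternative
  std::set<std::string> result;
  for (const auto& alt : rule.alternatives) {
    // Start with the empty prefix and append each sub-rule's matches in turn.
    std::set<std::string> prefixes = {""};
    for (int sub : alt) {
      const std::set<std::string> tails = expand(rules, sub);
      std::set<std::string> next;
      for (const auto& p : prefixes) {
        for (const auto& t : tails) {
          next.insert(p + t);
        }
      }
      prefixes = std::move(next);
    }
    result.insert(prefixes.begin(), prefixes.end());
  }
  return result;
}

// The first example grammar from the description.
std::map<int, Rule> example_rules() {
  std::map<int, Rule> rules;
  rules[0].alternatives = {{1, 2}};
  rules[1].is_terminal = true;
  rules[1].terminal = 'a';
  rules[2].alternatives = {{1, 3}, {3, 1}};
  rules[3].is_terminal = true;
  rules[3].terminal = 'b';
  return rules;
}
```

Calling `expand(example_rules(), 0)` yields the set containing aab and aba, in line with the description. The number of strings can grow quickly with the size of the rule set, so this is meant as a way to check small examples by hand.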

Here’s a more interesting example:

0: 4 1 5
1: 2 3 | 3 2
2: 4 4 | 5 5
3: 4 5 | 5 4
4: "a"
5: "b"

Here, because rule 4 matches a and rule 5 matches b, rule 2 matches two letters that are the same (aa or bb), and rule 3 matches two letters that are different (ab or ba).

Since rule 1 matches rules 2 and 3 once each in either order, it must match two pairs of letters, one pair with matching letters and one pair with different letters. This leaves eight possibilities: aaab, aaba, bbab, bbba, abaa, abbb, baaa, or babb.

Rule 0, therefore, matches a (rule 4), then any of the eight options from rule 1, then b (rule 5): aaaabb, aaabab, abbabb, abbbab, aabaab, aabbbb, abaaab, or ababbb.

The received messages (the bottom part of your puzzle input) need to be checked against the rules so you can determine which are valid and which are corrupted. Including the rules and the messages together, this might look like:

0: 4 1 5
1: 2 3 | 3 2
2: 4 4 | 5 5
3: 4 5 | 5 4
4: "a"
5: "b"

ababbb
bababa
abbbab
aaabbb
aaaabbb

Your goal is to determine the number of messages that completely match rule 0. In the above example, ababbb and abbbab match, but bababa, aaabbb, and aaaabbb do not, producing the answer 2. The whole message must match all of rule 0; there can’t be extra unmatched characters in the message. (For example, aaaabbb might appear to match rule 0 above, but it has an extra unmatched b on the end.)

How many messages completely match rule 0?

2.1.2 Solution

We use the Cocke–Younger–Kasami (CYK) algorithm to decide whether a message can be generated by this grammar. CYK requires the grammar to be in Chomsky normal form (CNF).

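As it happens, the grammar is almost in CNF; we only need to eliminate a few unit productions (alternatives that consist of a single non-terminal) first. We also split right-hand sides with more than two symbols (called chain rules in the code below) in case they occur; this will be needed for part 2.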
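To illustrate the unit-production step, here is a minimal C++ sketch. The `Alt` and `Grammar` types and the `eliminate_units` and `example_grammar` helpers are hypothetical names for this illustration only. The sketch repeats the substitution until no unit alternative is left and assumes the unit alternatives contain no cycle; both are assumptions of the sketch, not statements about the R code further down.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical representation of one alternative of a rule.
struct Alt {
  std::vector<int> symbols;  // sub-rule numbers; empty for a terminal
  char terminal;             // literal character, '\0' for non-terminals
};
using Grammar = std::map<int, std::vector<Alt>>;

// Replace every unit alternative A -> B by the alternatives of B,
// repeating until no unit alternative remains.
Grammar eliminate_units(Grammar g) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (auto& kv : g) {
      std::vector<Alt> rewritten;
      for (const Alt& alt : kv.second) {
        const bool is_unit = alt.terminal == '\0' && alt.symbols.size() == 1;
        if (is_unit) {
          assert(alt.symbols[0] != kv.first);  // A -> A would never terminate
          const std::vector<Alt>& target = g.at(alt.symbols[0]);
          rewritten.insert(rewritten.end(), target.begin(), target.end());
          changed = true;
        } else {
          rewritten.push_back(alt);
        }
      }
      kv.second = std::move(rewritten);
    }
  }
  return g;
}

// Four rules taken from the part 2 example: 1: "a", 14: "b",
// 15: 1 | 14 and 18: 15 15.
Grammar example_grammar() {
  Grammar g;
  g[1] = {Alt{{}, 'a'}};
  g[14] = {Alt{{}, 'b'}};
  g[15] = {Alt{{1}, '\0'}, Alt{{14}, '\0'}};
  g[18] = {Alt{{15, 15}, '\0'}};
  return g;
}
```

After `eliminate_units(example_grammar())`, rule 15 reads 15: "a" | "b", while rule 18 is untouched because it is already a valid binary CNF rule.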

We have a working implementation in R, but it is rather slow (it can be found in the appendix). We therefore re-implemented the CYK part in C++ for a substantial speed gain.

#ifndef STANDALONE
#include <Rcpp.h>
using namespace Rcpp;
#else
#include <iostream>
#endif
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <variant>
#include <vector>

using Key = std::variant<std::pair<int, int>, std::string>;

struct KeyHash {
    size_t operator()(const std::variant<std::pair<int, int>, std::string>& k) const {
      return std::visit(
          [](const auto& v) {
            using T = std::decay_t<decltype(v)>;

            if constexpr (std::is_same_v<T, std::string>) {
              return std::hash<std::string>()(v);
            } else {
              size_t h1 = std::hash<int>()(v.first);
              size_t h2 = std::hash<int>()(v.second);
              return h1 ^ (h2 << 1);
            }
          },
          k);
    }
};

std::vector<bool> cyk_parse(const std::vector<std::string>& words,
                            const std::unordered_map<Key, std::vector<int>, KeyHash>& inverse_rules,
                            int start_symbol) {
  std::vector<bool> results;
  results.reserve(words.size());

  for (const auto& word : words) {
    size_t n = word.size();
    std::vector<std::vector<std::unordered_set<int>>> P(n, std::vector<std::unordered_set<int>>(n));
    for (size_t i = 0; i < n; ++i) {
      Key key = std::string(1, word[i]);
      if (auto it = inverse_rules.find(key); it != inverse_rules.end()) {
        P[i][0].insert(it->second.begin(), it->second.end());
      }
    }

    for (std::size_t j = 1; j < n; ++j) { // j = length of span - 1
      for (std::size_t i = 0; i + j < n; ++i) { // start of span
        for (std::size_t k = 0; k < j; ++k) { // partition of span
          const auto& left = P[i][k];
          const auto& right = P[i + k + 1][j - k - 1];
          if (left.empty() || right.empty()) {
            continue;
          }
          for (int B : left) {
            for (int C : right) {
              Key pair_key = std::make_pair(B, C);
              if (auto it = inverse_rules.find(pair_key); it != inverse_rules.end()) {
                P[i][j].insert(it->second.begin(), it->second.end());
              }
            }
          }
        }
      }
    }
    // An empty word has no table cell to inspect and cannot match a CNF grammar.
    results.push_back(n > 0 && P[0][n - 1].count(start_symbol) > 0);
  }

  return results;
}

#ifndef STANDALONE
// [[Rcpp::export]]
LogicalVector is_member(List inv_rules, CharacterVector words, int start_symbol) {
  std::unordered_map<Key, std::vector<int>, KeyHash> inverse_rules;
  CharacterVector names = inv_rules.names();
  for (int i = 0; i < inv_rules.size(); ++i) {
    std::string key_str = as<std::string>(names[i]);
    IntegerVector rule = inv_rules[i];
    if (key_str.find('|') != std::string::npos) {
      size_t pipe_pos = key_str.find('|');
      int first = std::stoi(key_str.substr(0, pipe_pos));
      int second = std::stoi(key_str.substr(pipe_pos + 1));
      Key key = std::make_pair(first, second);
      inverse_rules[key] = as<std::vector<int>>(rule);
    } else {
      Key key = key_str;
      inverse_rules[key] = as<std::vector<int>>(rule);
    }
  }

  std::vector<std::string> word_vec = as<std::vector<std::string>>(words);
  std::vector<bool> results = cyk_parse(word_vec, inverse_rules, start_symbol);
  return wrap(results);
}
#else
int main() {
  std::unordered_map<Key, std::vector<int>, KeyHash> inverse_rules;
  inverse_rules[std::make_pair(1, 5)] = {6};
  inverse_rules[std::string("b")] = {5};
  inverse_rules[std::string("a")] = {4};
  inverse_rules[std::make_pair(4, 5)] = {3};
  inverse_rules[std::make_pair(5, 4)] = {3};
  inverse_rules[std::make_pair(4, 4)] = {2};
  inverse_rules[std::make_pair(5, 5)] = {2};
  inverse_rules[std::make_pair(2, 3)] = {1};
  inverse_rules[std::make_pair(3, 2)] = {1};
  inverse_rules[std::make_pair(4, 6)] = {0};

  std::vector<std::string> words = {"ababbb", "bababa", "abbbab", "aaabbb", "aaaabbb"};

  int start_symbol = 0;

  std::vector<bool> results = cyk_parse(words, inverse_rules, start_symbol);
  for (size_t i = 0; i < words.size(); ++i) {
    std::cout << "Word: " << words[i] << " is " << (results[i] ? "in" : "not in")
              << " the language." << std::endl;
  }
  return 0;
}
#endif

create_cnf <- function(rules) {
  names(rules) <- as.character(seq_along(rules))
  unit_rules <- keep(rules, ~ any(lengths(.x) == 1L)) 
  for (lhs in names(unit_rules)) {
    rule <- rules[[lhs]]
    new_rhs <- NULL
    for (i in seq_along(rule)) {
      rhs <- rule[[i]]
      if (length(rhs) == 1L && is.integer(rhs)) {
        new_rhs <- c(new_rhs, rules[[as.character(rhs)]])
      } else {
        new_rhs <- c(new_rhs, rule[i])
      }
    }
    rules[[lhs]] <- new_rhs
  } 
  chain_rules <- keep(rules, ~ any(lengths(.x) > 2L))
  if (length(chain_rules) > 0L) {
    # Collect helper rules across all chain rules so that none get lost
    # and every helper receives a unique index.
    add_rules <- list()
    n <- length(rules)
    for (lhs in names(chain_rules)) {
      rule <- rules[[lhs]]
      new_rhs <- NULL
      for (i in seq_along(rule)) {
        rhs <- rule[[i]]
        if (length(rhs) == 3L) {
          # A -> B C D becomes A -> B N and N -> C D
          new_rhs <- c(new_rhs, list(c(rhs[1L], n + 1L)))
          add_rules <- c(add_rules, 
                         list(
                           list(rhs[-1L]))
          )
          n <- n + 1L
        } else {
          new_rhs <- c(new_rhs, rule[i])
        }
      }
      rules[[lhs]] <- new_rhs
    }
    rules <- c(rules, add_rules)
  }
  unname(rules)
}

make_key <- function(...) {
  paste(c(...), collapse = "|")
}

invert_rules <- function(rules) {
  rls <- new.env(parent = emptyenv())
  for (i in seq_along(rules)) {
    rule <- rules[[i]]
    for (r in rule) {
      stopifnot(length(r) %in% 1:2)
      rls[[make_key(r)]] <- c(i, rls[[make_key(r)]])
    }
  }
  rls
}

cyk <- function(words, rules, start = 1L) {
  cnf_rules <- create_cnf(rules)
  inv_rules <- invert_rules(cnf_rules)
  is_member(inv_rules, words, start) %>% 
    sum()
}

cyk(puzzle_data$msg, puzzle_data$rules)

## [1] 224

2.2 Part 2

2.2.1 Description

— Part Two —

As you look over the list of messages, you realize your matching rules aren’t quite right. To fix them, completely replace rules 8: 42 and 11: 42 31 with the following:

8: 42 | 42 8
11: 42 31 | 42 11 31

This small change has a big impact: now, the rules do contain loops, and the list of messages they could hypothetically match is infinite. You’ll need to determine how these changes affect which messages are valid.

Fortunately, many of the rules are unaffected by this change; it might help to start by looking at which rules always match the same set of values and how those rules (especially rules 42 and 31) are used by the new versions of rules 8 and 11.

(Remember, you only need to handle the rules you have; building a solution that could handle any hypothetical combination of rules would be significantly more difficult.)

For example:

42: 9 14 | 10 1
9: 14 27 | 1 26
10: 23 14 | 28 1
1: "a"
11: 42 31
5: 1 14 | 15 1
19: 14 1 | 14 14
12: 24 14 | 19 1
16: 15 1 | 14 14
31: 14 17 | 1 13
6: 14 14 | 1 14
2: 1 24 | 14 4
0: 8 11
13: 14 3 | 1 12
15: 1 | 14
17: 14 2 | 1 7
23: 25 1 | 22 14
28: 16 1
4: 1 1
20: 14 14 | 1 15
3: 5 14 | 16 1
27: 1 6 | 14 18
14: "b"
21: 14 1 | 1 14
25: 1 1 | 1 14
22: 14 14
8: 42
26: 14 22 | 1 20
18: 15 15
7: 14 5 | 1 21
24: 14 1

abbbbbabbbaaaababbaabbbbabababbbabbbbbbabaaaa
bbabbbbaabaabba
babbbbaabbbbbabbbbbbaabaaabaaa
aaabbbbbbaaaabaababaabababbabaaabbababababaaa
bbbbbbbaaaabbbbaaabbabaaa
bbbababbbbaaaaaaaabbababaaababaabab
ababaaaaaabaaab
ababaaaaabbbaba
baabbaaaabbaaaababbaababb
abbbbabbbbaaaababbbbbbaaaababb
aaaaabbaabaaaaababaa
aaaabbaaaabbaaa
aaaabbaabbaaaaaaabbbabbbaaabbaabaaa
babaaabbbaaabaababbaabababaaab
aabbbbbaabbbaaaaaabbbbbababaaaaabbaaabba

Without updating rules 8 and 11, these rules only match three messages: bbabbbbaabaabba, ababaaaaaabaaab, and ababaaaaabbbaba.

However, after updating rules 8 and 11, a total of 12 messages match:

bbabbbbaabaabba
babbbbaabbbbbabbbbbbaabaaabaaa
aaabbbbbbaaaabaababaabababbabaaabbababababaaa
bbbbbbbaaaabbbbaaabbabaaa
bbbababbbbaaaaaaaabbababaaababaabab
ababaaaaaabaaab
ababaaaaabbbaba
baabbaaaabbaaaababbaababb
abbbbabbbbaaaababbbbbbaaaababb
aaaaabbaabaaaaababaa
aaaabbaabbaaaaaaabbbabbbaaabbaabaaa
aabbbbbaabbbaaaaaabbbbbababaaaaabbaaabba

After updating rules 8 and 11, how many messages completely match rule 0?

2.2.2 Solution

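For the second part, we need to adapt the rules. Because we followed R’s 1-based indexing, all rule numbers are shifted by 1: rule 8 is stored as entry 9 and rule 11 as entry 12, and the references to rules 42 and 31 become 43 and 32. The new rule 11 has an alternative with three symbols on its right-hand side, which has to be split into binary rules to keep the grammar in CNF; `create_cnf` takes care of this.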
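The splitting step can be sketched in C++ as follows. The `Alternative` and `Rules` types and the `binarize` function are hypothetical names for this illustration, and the sketch assumes that the next free rule number is one above the largest existing one. An alternative A -> B C D becomes A -> B N together with a new rule N -> C D.

```cpp
#include <cassert>
#include <map>
#include <vector>

using Alternative = std::vector<int>;  // sequence of rule numbers
using Rules = std::map<int, std::vector<Alternative>>;

// Split every alternative with more than two symbols into binary
// alternatives, introducing fresh helper rules as needed.
Rules binarize(Rules rules) {
  int next_id = rules.empty() ? 1 : rules.rbegin()->first + 1;
  Rules helpers;
  for (auto& entry : rules) {
    for (Alternative& alt : entry.second) {
      while (alt.size() > 2) {
        // Move the last two symbols into a helper rule N -> Y Z ...
        Alternative tail(alt.end() - 2, alt.end());
        alt.erase(alt.end() - 2, alt.end());
        helpers[next_id] = std::vector<Alternative>(1, tail);
        // ... and let the shortened alternative end in N instead.
        alt.push_back(next_id);
        ++next_id;
      }
      assert(alt.size() <= 2);  // CNF: at most two symbols remain
    }
  }
  rules.insert(helpers.begin(), helpers.end());
  return rules;
}
```

For a rule set whose largest number is 43 and whose entry 12 reads 12: 43 32 | 43 12 32, this produces 12: 43 32 | 43 44 and the helper rule 44: 12 32. Longer alternatives are handled by repeating the same step from the right.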

new_rules <- puzzle_data$rules
new_rules[[9L]] <- list(
  43L,
  c(43L, 9L)
)
new_rules[[12L]] <- list(
  c(43L, 32L),
  c(43L, 12L, 32L)
)

cyk(puzzle_data$msg, new_rules)

## [1] 436

3 R Legacy Solution

cyk <- function(words, rules, start = 1L) {
  cnf_rules <- create_cnf(rules)
  inv_rules <- invert_rules(cnf_rules)
  is_member <- function(word) {
    chars <- str_split(word, "")[[1L]]
    n <- length(chars)
    V <- vector("list", n * n)
    dim(V) <- c(n, n)
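    for (i in seq_len(n)) {
      V[[i, 1]] <- inv_rules[[chars[i]]]
    }
    # seq_len() avoids the descending sequences that 2:n and 1:0 would produce
    for (j in seq_len(n)[-1L]) {
      for (i in seq_len(n + 1L - j)) {
        cell <- integer(0L)
        for (k in seq_len(j - 1L)) {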
          B <- V[[i, k]]
          C <- V[[i + k, j - k]]
          if (length(B) == 0 || length(C) == 0) {
            next
          }
          
          for (b in B) {
            for (c in C) {
              rhs <- inv_rules[[make_key(b, c)]]
              if (!is.null(rhs)) {
                cell <- c(cell, rhs)
              }
            }
          }
        }
        cell <- unique(cell)
        if (length(cell) > 0L) {
          V[[i, j]] <- cell
        }
      }
    }
    start %in% V[[1L, n]]
  }
  res <- imap_lgl(words, ~ {cat("Testing word ", .y, "...\r", sep =""); is_member(.x)}) 
  cat("\n")
  sum(res)
}