1 Setup
1.1 Libraries
library(httr)
library(xml2)
library(magrittr)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(tidyr)
library(kableExtra)
1.2 Retrieve Data from AoC
session_cookie <- set_cookies(session = keyring::key_get("AoC-GitHub-Cookie"))
base_url <- paste0("https://adventofcode.com/", params$year, "/day/", params$task_nr)
puzzle <- GET(base_url,
session_cookie) %>%
content(encoding = "UTF-8") %>%
xml_find_all("///article") %>%
lapply(as.character)
parse_puzzle_data <- function(text_block = readClipboard()) {
if (length(text_block) == 1L) {
text_block <- text_block %>%
str_split("\n") %>%
extract2(1L) %>%
keep(nzchar)
}
rules <- str_subset(text_block, "=>") %>%
str_split(" => ") %>%
do.call(rbind, .) %>%
set_colnames(c("from", "to")) %>%
as_tibble()
molecule <- str_subset(text_block, "=>", TRUE)
list(rules = rules, molecule = molecule)
}
puzzle_data <- local({
GET(paste0(base_url, "/input"),
session_cookie) %>%
content(encoding = "UTF-8") %>%
parse_puzzle_data()
})
2 Puzzle Day 19
2.1 Part 1
2.1.1 Description
— Day 19: Medicine for Rudolph —
Rudolph the Red-Nosed Reindeer is sick! His nose isn’t shining very brightly, and he needs medicine.
Red-Nosed Reindeer biology isn’t similar to regular reindeer biology; Rudolph is going to need custom-made medicine. Unfortunately, Red-Nosed Reindeer chemistry isn’t similar to regular reindeer chemistry, either.
The North Pole is equipped with a Red-Nosed Reindeer nuclear fusion/fission plant, capable of constructing any Red-Nosed Reindeer molecule you need. It works by starting with some input molecule and then doing a series of replacements, one per step, until it has the right molecule.
However, the machine has to be calibrated before it can be used. Calibration involves determining the number of molecules that can be generated in one step from a given starting point.
For example, imagine a simpler machine that supports only the following replacements:
H => HO
H => OH
O => HH
Given the replacements above and starting with HOH
, the following molecules could be generated:
-
HOOH
(viaH => HO
on the firstH
). -
HOHO
(viaH => HO
on the secondH
). -
OHOH
(viaH => OH
on the firstH
). -
HOOH
(viaH => OH
on the secondH
). -
HHHH
(viaO => HH
).
So, in the example above, there are 4
distinct molecules (not five, because HOOH
appears twice) after one replacement from HOH
. Santa’s favorite molecule, HOHOHO
, can become 7
distinct molecules (over nine replacements: six from H
, and three from O
).
The machine replaces without regard for the surrounding characters. For example, given the string H2O
, the transition H => OO
would result in OO2O
.
Your puzzle input describes all of the possible replacements and, at the bottom, the medicine molecule for which you need to calibrate the machine. How many distinct molecules can be created after all the different ways you can do one replacement on the medicine molecule?
2.1.2 Solution
We use regular expressions to solve this task.
count_new_molecules <- function(molecule, rules) {
unique_patterns <- rules %>%
pull(from) %>%
unique()
replace_molecule <- function(molecule, start, end, to) {
str_sub(molecule, start, end) <- to
molecule
}
all_matches <- map(unique_patterns, function(reg) {
pos <- str_locate_all(molecule, reg) %>%
extract2(1L) %>%
as_tibble()
to <- rules %>%
filter(from == reg) %>%
pull(to)
pos %>%
mutate(to = list(to)) %>%
mutate(from = reg, .before = 1L) %>%
unnest(cols = c(to))
}) %>%
list_rbind() %>%
rowwise()
all_matches %>%
mutate(new_molecule = replace_molecule(molecule, start, end, to)) %>%
pull(new_molecule) %>%
unique() %>%
length()
}
count_new_molecules(puzzle_data$molecule, puzzle_data$rules)
## [1] 576
2.2 Part 2
2.2.1 Description
— Part Two —
Now that the machine is calibrated, you’re ready to begin molecule fabrication.
Molecule fabrication always begins with just a single electron, e
, and applying replacements one at a time, just like the ones during calibration.
For example, suppose you have the following replacements:
e => H
e => O
H => HO
H => OH
O => HH
If you’d like to make HOH
, you start with e
, and then make the following replacements:
-
e => O
to getO
-
O => HH
to getHH
-
H => OH
(on the secondH
) to getHOH
So, you could make HOH
after 3
steps. Santa’s favorite molecule, HOHOHO
, can be made in 6
steps.
How long will it take to make the medicine? Given the available replacements and the medicine molecule in your puzzle input, what is the fewest number of steps to go from e
to the medicine molecule?
2.2.2 Solution
This was a tough nut to crack. First approaches of (guided) brute-force leaded nowhere,
neither some ideas to parse the string into a nested list, which can be eventually
transformed into a tree. There, we walked from the bottom to the top, reducing
Rn
Ar
brackets one after another. However, teh remaining top level string was
still too large to brute force all possible rule combinations.
Some observations we amade along the efforst were:
- Rules that do not contain
Y
,Rn
orAr
. - Rules that are of the form
x Rn z Ar
wherex
andz
are single atoms. - Rules that are of the form
x Rn (z Y w)+ Ar
wherex
,z
andw
are single atoms.
We can regard Rn
as (
, Ar
as )
and Y
as ,
. This translates the relevant rules
to the following:
puzzle_data$rules %>%
filter(str_detect(to, "Ar")) %>%
mutate(to = str_replace_all(to, c(Ar = ")", Y = ",", Rn = "("))) %>%
mutate(" => " = " => ", .after = 1L) %>%
kbl() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE) %>%
column_spec(1:3, monospace = TRUE)
from | => | to |
---|---|---|
Al | => | Th(F) |
B | => | Ti(F) |
Ca | => | P(F) |
Ca | => | Si(F,F) |
Ca | => | Si(Mg) |
H | => | C(Al) |
H | => | C(F,F,F) |
H | => | C(F,Mg) |
H | => | C(Mg,F) |
H | => | N(F,F) |
H | => | N(Mg) |
H | => | O(F) |
N | => | C(F) |
O | => | C(F,F) |
O | => | C(Mg) |
O | => | N(F) |
P | => | Si(F) |
Furthermore, we observe that there are only 4 types of production rules (X
being a
terminal symbol, i.e. not (
, )
or ,
):
XX => X
orXX => e
X(X) => X
X(X,X) => X
X(X,X,X) => X
That is, each rule (when applied from the full string), reduces the string length
(measured as number of tokens and not characters) either by 1 (when we apply rule (1)),
3 (rule (2)), 5 (rule (3)) or 7 (rule (4)). To reduce a string of length n
to a string
of length 1 we would need n - 1 steps
if we applied only rules of type (1). Applying
a bracket rule (2) - (4) needs the same amount of steps, but we reduce the brackets for
free. Applying a rule with a comma, gets 2 symbols for free (,
and X
). Thus we
simply need to count number of brackets, commas and tokens and get an analytical solution.
This gives us the correct answer, however, our idea to implement a reducing
algorithm failed.
N.B. This problem kept bothering me and after reading more on the underlying problem, I figured that it can be solved via an adapted version of the Cocke–Younger–Kasami algorithm. I include my working C++ code for solving this issue with means of the CYK algorithm in the appendix.
count_reduction_steps <- function(molecule) {
str_count(molecule, "[A-Z][a-z]?") -
str_count(molecule, "Rn|Ar") -
2 * str_count(molecule, "Y") -
1
}
count_reduction_steps(puzzle_data$molecule)
## [1] 207
3 CYK Algorithm
The Cocke–Younger–Kasami algorithm (CYK) determines whether a given string is part of a grammar. However, this grammar must be a context-free grammar given in Chomsky normal form (CNF).
The CNF dictates that every rule NT
must resolve either to a single terminal, or to
exactly 2 non terminals. Thus, we needed first to translate the given rules into the
CNF and could then apply the CYK.
The CYK in its original form uses dynamic programming to
determine whether a substring starting at position i
up to position j
can be built
using the rules. It starts with single tokens (they can be built if there is a unit
production rule producing this terminal). In subsequent iterations it uses the previously
calculated values to determine whether a longer string can be formed. The idea being, to
determine whether we can build a substring from position 1 to 2 for example, we simply
look whether we can build this string from the rules producing string (if at all) from 1
to 1 and a rule producing a string from 2 to 2. So if rule NT1
produced a string from
position 1 to 1, and NT2
produced the string from position 2 to 2 and we have a rule
producing NT1 NT2
we can deduce that also a string from position to 2 can be produced.
The deeper we go into the iteration the more combinations must be checked, but eventually we fill the last cell. If in the end this cell is not empty (or more generally can be reduced to the starting symbol), we can conclude that the string can be built.
In the orignal form it test simply for existance, but in our adapted form we count number of replacements on the way (becuase we needed to transfrom the problem to CNF upfront, we must take care to count only the original and not the synthetical rules, which were simply introduced to satisfy the CNF requirement).
A great visual tool for understanding the CYK algorithm can be found here.
#ifndef STANDALONE
#include <Rcpp.h>
using namespace Rcpp;
#endif
#include <iostream>
#include <map>
#include <memory>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>
using Token = std::string;
using DoubleToken = std::pair<Token, std::optional<Token>>;
class Grammar {
private:
std::set<Token> terminals;
std::set<Token> non_terminals;
std::set<Token> original_rules;
std::map<Token, size_t> bin_cnts;
std::map<Token, std::set<DoubleToken>> rules;
std::map<DoubleToken, std::set<Token>> inverse_rules;
Token make_nonterminal(const Token&, bool);
std::vector<Token> tokenize_input(const Token&) const;
void add_rule(const Token&, const std::vector<Token>&);
bool is_original_rule(const Token&, const DoubleToken&) const;
std::map<Token, unsigned int> run_cyk(const std::vector<Token>&) const;
public:
Grammar(const std::map<Token, std::vector<std::vector<Token>>>&);
bool is_member(const Token&) const;
bool is_member(const std::vector<Token>&) const;
unsigned int get_nr_replacements(const Token&) const;
unsigned int get_nr_replacements(const std::vector<Token>&) const;
friend std::ostream& operator<<(std::ostream&, const Grammar&);
};
Grammar::Grammar(const std::map<Token, std::vector<std::vector<Token>>>& rules) {
for (const auto& [symbol, rule] : rules) {
for (const auto& r : rule) {
add_rule(symbol, r);
}
}
}
bool Grammar::is_member(const std::vector<Token>& word_list) const {
return run_cyk(word_list).size() > 0;
}
bool Grammar::is_member(const Token& word) const {
return is_member(tokenize_input(word));
}
unsigned int Grammar::get_nr_replacements(const std::vector<Token>& word_list) const {
std::map<Token, unsigned int> cyk_results = run_cyk(word_list);
unsigned int result = -1; // set to maximum value
for (const auto& [key, n] : cyk_results) {
result = (n < result) ? n : result;
}
return result;
}
unsigned int Grammar::get_nr_replacements(const Token& word) const {
return get_nr_replacements(tokenize_input(word));
}
std::ostream& operator<<(std::ostream& stream, const Grammar& grammar) {
stream << "Terminals:" << std::endl << "\t[";
for (auto it = grammar.terminals.begin(); it != grammar.terminals.end(); ++it) {
stream << "\"" << *it << "\"";
if (next(it) != grammar.terminals.end()) {
stream << ", ";
}
}
stream << "]" << std::endl;
stream << "Non-Terminals:" << std::endl << "\t[";
for (auto it = grammar.non_terminals.begin(); it != grammar.non_terminals.end(); ++it) {
stream << *it;
if (next(it) != grammar.non_terminals.end()) {
stream << ", ";
}
}
stream << "]" << std::endl;
stream << "Original Rules:" << std::endl << "\t[";
for (auto it = grammar.original_rules.begin(); it != grammar.original_rules.end(); ++it) {
stream << *it;
if (next(it) != grammar.original_rules.end()) {
stream << ", ";
}
}
stream << "]" << std::endl << "Rules:" << std::endl;
for (const auto& rule : grammar.rules) {
stream << "\t" << rule.first << " => ";
auto it = rule.second.begin();
while (it != rule.second.end()) {
if (it->second) {
stream << it->first << " " << *(it->second);
} else {
stream << "\"" << it->first << "\"";
}
++it;
if (it != rule.second.end()) {
stream << " | ";
}
}
stream << std::endl;
}
return stream;
}
Token Grammar::make_nonterminal(const Token& symbol, bool add_suffix = false) {
Token nt = "NT_" + symbol;
if (add_suffix) {
size_t suffix = bin_cnts[symbol] + 1;
nt = nt + "_" + std::to_string(suffix);
bin_cnts[symbol]++;
}
return nt;
}
std::vector<Token> Grammar::tokenize_input(const Token& word) const {
Token token;
std::vector<Token> word_list;
for (auto it = word.begin(); it != word.end(); ++it) {
token = *it;
if (std::next(it) != word.end() && std::islower(*std::next(it))) {
token += *std::next(it);
++it; // Skip the next character as it is already added
}
word_list.push_back({token});
}
return word_list;
}
void Grammar::add_rule(const Token& lhs, const std::vector<Token>& rhs) {
DoubleToken p1;
Token nt_lhs, nt_rhs_left, nt_rhs_right, nt_prev;
bool need_bin = rhs.size() > 2;
nt_prev = make_nonterminal(lhs);
original_rules.insert(nt_prev); // add lhs to original rules needed to count only them
for (size_t i = 0; i < rhs.size() - 1; ++i) {
/*
* Original string: H => CRnFYFYFAr
* lhs = "H"
* rhs = {"C", "Rn", "F", "Y", "F", "Ar"}
* NT_H => NT_C NT_H_1 [0]
* NT_H_1 => NT_Rn NT_H_2 [1]
* NT_H_2 => NT_F NT_H_3 [2]
* NT_H_3 => NT_Y NT_H_4 [3]
* NT_H_4 => NT_F NT_Ar [4]
* NT_H => "H"
* NT_C => "C"
* NT_Rn = "Rn"
* NT_F => "F"
* NT_Y => "Y",
* NT_Ar => "Ar"
*/
terminals.insert(rhs[i]);
terminals.insert(rhs[i + 1]);
nt_rhs_left = make_nonterminal(rhs[i]); // NT_C
p1 = std::make_pair(rhs[i], std::nullopt);
rules[nt_rhs_left].insert(p1); // NT_C => "C"
inverse_rules[p1].insert(nt_rhs_left);
if (need_bin && i != (rhs.size() - 2)) {
nt_rhs_right = make_nonterminal(lhs, true);
} else {
nt_rhs_right = make_nonterminal(rhs[i + 1]);
p1 = std::make_pair(rhs[i + 1], std::nullopt);
rules[nt_rhs_right].insert(p1); // NT_Ar => "Ar"
inverse_rules[p1].insert(nt_rhs_right);
}
p1 = std::make_pair(nt_rhs_left, nt_rhs_right);
rules[nt_prev].insert(p1);
inverse_rules[p1].insert(nt_prev);
non_terminals.insert(nt_rhs_left);
non_terminals.insert(nt_rhs_right);
non_terminals.insert(nt_prev);
nt_prev = nt_rhs_right;
}
}
bool Grammar::is_original_rule(const Token& rule, const DoubleToken& rhs) const {
return original_rules.find(rule) != original_rules.end() && rhs.second.has_value();
}
std::map<Token, unsigned int> Grammar::run_cyk(const std::vector<Token>& word_list) const {
DoubleToken p1;
std::set<Token> rhs;
size_t n = word_list.size();
std::map<int, std::map<int, std::map<Token, unsigned int>>> token_table;
for (size_t j = 0; j < n; ++j) {
p1 = std::make_pair(word_list[j], std::nullopt);
if (inverse_rules.find(p1) != inverse_rules.end()) {
rhs = inverse_rules.at(p1);
for (const auto& r : rhs) {
token_table[j][j][r] = 0U;
}
}
for (int i = j; i >= 0; --i) {
for (size_t k = i; k <= j; ++k) {
for (const auto& [rl, il] : token_table[i][k]) {
for (const auto& [rr, ir] : token_table[k + 1][j]) {
p1 = std::make_pair(rl, rr);
if (inverse_rules.find(p1) != inverse_rules.end()) {
rhs = inverse_rules.at(p1);
for (const auto& r : rhs) {
token_table[i][j][r] = il + ir + (is_original_rule(r, p1) ? 1U : 0U);
}
}
}
}
}
}
}
return token_table[0][n - 1];
}
#ifndef STANDALONE
// [[Rcpp::export]]
unsigned int count_replacements(List rules, Token molecule) {
Token symbol;
CharacterVector symbols = rules.names();
std::map<Token, std::vector<std::vector<Token>>> rule_map;
for (R_xlen_t i = 0; i < rules.size(); ++i) {
symbol = as<Token>(symbols[i]);
std::vector<Token> rule_vec = as<std::vector<Token>>(rules[i]);
rule_map[symbol].push_back(rule_vec);
}
Grammar grammar(rule_map);
return grammar.get_nr_replacements(molecule);
}
#else
int main() {
std::map<Token, std::vector<std::vector<Token>>> rules = {{"H", {{"H", "O"}, {"O", "H"}}},
{"O", {{"H", "H"}}}};
Grammar grammar(rules);
Token molecule = "HOH";
unsigned int replacements = grammar.get_nr_replacements(molecule);
std::cout << "Number of replacements for molecule '" << molecule << "': " << replacements
<< std::endl;
return 0;
}
#endif
rules <- puzzle_data$rules %>%
mutate(to = str_split(to, "(?<=\\w)(?=[A-Z])"))
rules <- rules %>%
pull(to) %>%
set_names(rules %>%
pull(from))
count_replacements(rules, puzzle_data$molecule)
## [1] 207