Welcome to my website!
I’m Jonathan Y. Chan (jyc, jonathanyc, 陳樂恩, or 은총), a
🐩 Yeti fan,
🇺🇸 American,
and 🐻 Californian,
living in 🌁 San Francisco: the most beautiful city in the greatest country in the world.
My mom is from Korea and my dad was from Hong Kong.
I am a Christian.
My professional endeavors include:
VLOOKUPs on billions of rows;
Parlan was a spreadsheet with an interface and formula language that looked just like Excel. Under the hood, it compiled formulas to SQL then evaluated them like Spark RDDs. Alas, a former manager’s prophecy about why startups fail proved prescient…
I also helped out with things like trees, road markings, paths, and lines of latitude!
… including copy-paste, a high-fidelity PDF exporter, text layout, scene graph code(gen), and putting fig-foot in your .fig files—while deleting more code than I added!
Harmony is set in a world where technology makes it possible for everyone to be in perfect physical health. In such a world, Project Itoh (the pen name of author ITO Satoshi) thinks the only remaining source of suffering would be free will.
I enjoyed the book: the world is interesting and I was hooked on the plot. But I can’t recommend it.
First, a petty complaint that I need to get out of the way: the book shares the Golden Age of Science Fiction’s unfortunate fixation with writing like:
To prove that these tits, this ass, this belly, aren’t a book.
I’m no Andrea Dworkin but I just don’t think that’s good writing. A male main character who, transposing from the first chapter, had constant thoughts like “While my dick is growing longer… While my balls are still dropping…”, would be equally annoying.
🚨 Warning: spoilers follow!
The conceit that the XML-inspired “ETML” markup, e.g. <anger>, is for readers in the book’s world who no longer have the ability to experience inner emotions is cute.
But the author conflates qualia with rational vs. emotional behavior in a way that doesn’t make sense to me.
For example:
People cried as though they were sad and raged as though they were angry. But these actions carried the same value as the mimicked emotional responses a robot would have had in the previous era. All people had lost their inward minds.
Mankind was in perfect harmony with its medical industrial society.
The instant the old folks had entered their codes and the Harmony program had begun to sing, suicide disappeared from human society.
… and:
“What happens when you lose your consciousness? Do you just sit there all day in your chair, drooling?”
“Nothing of the sort. You go shopping, you eat, you enjoy entertainment–you merely no longer have to make decisions what to do at any given time because everything is self-evident. It’s the difference between having to make choices and having it all be obvious to you. That’s all it is. … From the outside, it’s nearly impossible to tell whether someone has consciousness or is merely acting as though they did. However, because their system of values is fashioned to be in perfect harmony with society, there are far fewer suicides…”
…
“When she came back, Miach said it had been pure ecstasy. … She only had the sensation that she had been in a wonderful, joyous place.”
…
“People with perfect judgement do not require a consciousness, so it does not exist.”
This sounds a lot like a description of philosophical zombies1 (henceforth p-zombies) (“as though they were sad” and “had lost their inward minds”). Although the statement that “it’s nearly impossible to tell” (emphasis mine) implies that unlike p-zombies there is a change in behavior.
The author claims that “having to make choices” leads to consciousness, which leads to experienced emotions, which leads to e.g. suicide. But the characters without consciousness still display emotions even though they don’t experience them.
Surely there is a distinction between experiencing X and “acting as though” you experience X. But why do the people without consciousness cry as though they were sad and rage as though they were angry but not commit suicide as though they were depressed?
I don’t want to get hung up on qualia: the idea that people might try to reduce emotional behavior in order to create a more harmonious society is interesting. If the book were only about using WatchMe to involuntarily reduce everyone’s emotions rather than to totally eliminate their consciousnesses (“All people had lost their inward minds.”) I think I’d like it more.
I don’t think I can recommend the book, but I try to make sense of it in the context of the book’s creation: Project Itoh worked on Harmony in the final year of his life, dying from cancer. The thesis of the book is that in order to eliminate pain and the temptation to commit suicide we would have to give up subjective experience and the ability to choose at all. That says much about what he believed he stood to lose and what we enjoy for a little longer.
Beings otherwise identical to consciousness-experiencing persons, but without consciousness.
From the Wikipedia article on the Mosuo ethnic group in China:
In Mosuo culture, a myth describes that long ago, dogs had life spans of 60 years while humans had life spans of thirteen years. Humans felt their life span was too short, so they traded it with the dogs in exchange for paying homage to them.
From “British logistics in the Falklands War” on Wikipedia:
The Argentine government did not wish to “repatriate” its dead, as it considered that they were already in Argentina. Many were not identified, and were buried with the inscription “Argentine soldier known unto God.”
LLMs can sometimes recognize number patterns, but can they explain their reasoning? See for yourself!
The interactive demo below generates a random program and uses that to compute three number sequences. The LLM is given two of those as examples and asked to pick the third out of a lineup. Click expand to see the actual messages sent to/from the LLM. You can run things yourself if you click Settings and enter an Anthropic API key: check out the FAQ for more details!
If you click expand next to any of the LLM rows, you can see all the messages sent to the API.
In all of the examples, Alice takes multiple turns and uses the eval_js tool.
In some of the examples (e.g. #1) you can see that she even writes out a hypothesis before testing it with code, which I think is pretty clever.
It’s a good idea to experiment with the prompts though!
You don’t need to change any of the code to do that: just change the value of the prompts key in the settings.
See “Q. How can I run this myself” for details.
When you click Run:
The difficulty option in the settings corresponds to the maximum generated program length.
To search for programs, we just iterate over or sample integers between zero and $2^{4 \times \text{difficulty}}$ (equivalently, all bitstrings with up to $4 \times \text{difficulty}$ bits). For example, a simple static analysis (described below) determines that the program - 1 bgtz 8 - will place $\{t_1 \; t_0 \; -\}$ at the top of the stack.
$t_i$ denotes a symbolic value corresponding to the $i$-th value to be popped off the top of the stack.
The curly braces denote that the top of the stack might correspond to any symbolic expression from a set of possible symbolic expressions.
In this case, the analysis is able to detect that the bgtz will always branch because the constant $1$ is pushed immediately beforehand.

The LLM is configured via the api settings key (currently Anthropic’s Claude 3.5 Sonnet).

First, some big warnings (repeated in the settings panel!):
🚨 Many of the values in the settings can contain arbitrary JavaScript which is run inside of your browser!
🚨 The default eval_js tool allows the LLM itself to run arbitrary JavaScript in your browser!
So it’s probably a good idea not to connect this demo to a malevolent superintelligent LLM that knows zero-day exploits for your browser. Having said that:
To provide an Anthropic API key, go to api.headers.x-api-key and replace the {{TODO: ...}} placeholder.
The settings are just YAML.
The api section comes after the prompts, so you’ll have to scroll a little.
You can change a lot through the settings: not just the prompts, but also the HTTP requests the demo makes (by default it’s configured to use Claude 3.5 Sonnet by Anthropic).
For example, to remove the eval_js tool, you can set "tools": [] in the value of api.body.
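For orientation, the relevant parts of the settings look roughly like this (a hypothetical sketch based only on the key names mentioned in this FAQ; the real defaults contain more fields):

```yaml
# Hypothetical sketch of the settings layout; not the exact defaults.
difficulty: 5
prompts:
  # ... the prompts come first ...
api:
  headers:
    x-api-key: "{{TODO: ...}}"   # replace with your Anthropic API key
  body:
    tools: []                    # e.g. emptied to remove the eval_js tool
  parse: |
    // JavaScript that maps the HTTP response to { action, append, output }
```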
You should be able to configure the demo to hit e.g. OpenAI or Mistral’s APIs just by changing settings.
You’d have to change the JavaScript in the api.parse function to handle their output. Just make sure that your function hews to the same format:

- Return { action: 'stop', append, output } to return the string output from the call to the LLM and append append to the list of messages.
- Return { action: 'reply', append } to append append to the list of messages and then hit the API again. This is for tool calling.

The demo does not save anything, so be careful about refreshing! I’d recommend working on the settings YAML in your editor of choice so you get syntax highlighting etc.
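As a purely illustrative example, a parse function for an Anthropic-style response might look something like this. The field names assume the Anthropic Messages API shape (a content array plus a stop_reason), and whether append should be a single message or a list depends on the rest of your settings:

```js
// Hypothetical sketch of an api.parse-style function, not the demo's default.
function parse(response) {
  const message = { role: "assistant", content: response.content };
  if (response.stop_reason === "tool_use") {
    // Hand control back so the demo can run the tool and call the API again.
    return { action: "reply", append: [message] };
  }
  const output = response.content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("");
  return { action: "stop", append: [message], output };
}
```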
My takeaway is that although LLMs are powerful, their reasoning ability is limited in ways that are often unintuitive. Even when the LLM picks the right number sequence, the pattern it gives is generally nonsense. And even Bob and Charlie generally recognize that Alice’s pattern is wrong! Also, as usual, the LLM sounds very confident even when it’s dead wrong.
It’s interesting to me that in spite of the fact that the LLM does so poorly with number sequences in general, it does pretty well with variations of the Fibonacci sequence.
I was also surprised that the patterns (in a sense, explanations) were so low-quality. Given the architecture of autoregressive LLMs, if the explanation is output after the choice, the text of the explanation has no impact on its choice. But it’s still surprising that the explanations can be so bad:
One day in March, I was walking my dog, saw a house numbered 1347, and thought it was a funny pattern. It’s the Fibonacci sequence with a different seed (13 instead of 11).1 I texted a friend about it and then got interested in the problem of automatically generating number patterns.
At the end of March I was going to try to set up a system so Amazon Mechanical Turk workers could compete against LLMs. The infrastructure for that was so soul-crushingly boring I dropped the project. Last week I decided I should just make a simple UI and release it for people to play around with themselves.
I’m glad it’s obvious to you!
I heard that LLMs did pretty well on standardized tests like the SAT and GRE and could already reason as well as college students, but I might have just hallucinated that. I’m glad I did: it would have been kinda misleading for people to say things like that!
I am optimistic that computers will be better at this task someday (perhaps soon)! Obviously it’s incredibly impressive that LLMs are able to do any of this at all, but even good science fiction authors discuss their fictional technologies’ limitations. In particular, seeing as the LLM is good at recognizing variations on the Fibonacci sequence, I wonder how well a model trained exclusively to predict number sequences generated by programs like this could do. How would e.g. the maximum difficulty where an LLM could solve at least half of the problems correlate to the model’s size?
It’s worth noting that if you wanted to cheat at this demo, you could trivially brute-force it: at the default difficulty of 5 you would only need to iterate over $2^{4 \times 5} \approx 1\text{M}$ programs. Your smart fridge could do that.
Maybe you should conclude that you aren’t an (A)GI either! Just kidding, ha ha.
It doesn’t make much sense to argue about whether something is intuitive or not! But I’m happy to walk through how I think about the first example:
Program: - 1 bgtz 8 - (51f85)
Here are 2 number sequences that follow a common pattern:
Which one of the following sequences follows the same pattern?
Personally, when I look at the two examples, I notice that:
At this point I still don’t know a mathematical rule for the pattern. But by process of elimination I’m left with (C), which turns out to be correct.
The mathematical rule is that you subtract the last term from the second-to-last term. The program is far from canonical: in fact, everything after the first instruction is a noop. But remember that we’re not asking anyone to come up with a precise mathematical rule or to reverse-engineer the original program!
Among many other potential experiments, it would be interesting to see if it’s useful to tell the LLM about some of these tricks in the prompt.
Given that it has an eval_js tool to evaluate arbitrary JavaScript, we could even give it a library of functions that would be useful for evaluating number sequences.
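For example, helpers along these lines (purely hypothetical; they are not part of the demo) would let the LLM probe a sequence with one call instead of writing the loop itself:

```js
// Hypothetical helpers we could expose to the LLM alongside eval_js.
function differences(xs) {
  // Successive differences: [1, 2, 3, 5, 8] -> [1, 1, 2, 3]
  return xs.slice(1).map((x, i) => x - xs[i]);
}

function ratios(xs) {
  // Successive ratios: [1, 2, 4, 8] -> [2, 2, 2]
  return xs.slice(1).map((x, i) => x / xs[i]);
}
```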
Alas, I already dropped this project for months due to getting bored—the backend was already finished in March—so I’m just releasing it as-is.
The “top” of the stack is on the right. When I write $$\cdots \rightarrow \cdots 0$$ I mean that when the operator/instruction 0 is executed, $0$ is pushed to the top of the stack.
I tried to stick to Forth names and semantics.
| Kind | Hex | Operator | Stack semantics |
|---|---|---|---|
| Constants | 0 | 0 | $\cdots \rightarrow \cdots 0$ |
| | 1 | 1 | $\cdots \rightarrow \cdots 1$ |
| | 2 | 2 | $\cdots \rightarrow \cdots 2$ |
| | 3 | 3 | $\cdots \rightarrow \cdots 3$ |
| Arithmetic | 4 | + | $\cdots x y \rightarrow \cdots (x + y)$ |
| | 5 | - | $\cdots x y \rightarrow \cdots (x - y)$ |
| | 6 | * | $\cdots x y \rightarrow \cdots (x \times y)$ |
| | 7 | / | $\cdots x y \rightarrow \cdots (x \div y)$ |
| | 8 | % | $\cdots x y \rightarrow \cdots (x \bmod y)$ |
| Stack | 9 | dup | $\cdots x \rightarrow \cdots x x$ |
| | a | drop | $\cdots x \rightarrow \cdots$ |
| | b | swap | $\cdots x y \rightarrow \cdots y x$ |
| | c | rot | $y \cdots \rightarrow \cdots y$ |
| | d | unrot | $\cdots x \rightarrow x \cdots$ |
| | e | len | $\cdots \rightarrow \cdots \operatorname{len}(\cdots)$ |
| Control | f | bgtz n | $\cdots x \rightarrow \cdots$ |
bgtz is the only control flow operator and the only multi-word operator (i.e. the only operator that takes more than 4 bits). The following word in the program defines the branch offset and is interpreted as a number, not an opcode. This means that you can skip up to 15 instructions following the offset; bgtz 0 is a noop. Execution jumps by n instructions when $x > 0$ and continues otherwise.
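As an illustration of the encoding (a hypothetical sketch, not the demo's actual decoder), each hex digit maps to an operator from the table above, except that the word following bgtz is read as a plain number:

```js
const OPS = ["0", "1", "2", "3", "+", "-", "*", "/", "%",
             "dup", "drop", "swap", "rot", "unrot", "len", "bgtz"];

function decode(hex) {
  const words = [];
  let offsetNext = false;
  for (const ch of hex) {
    const nibble = parseInt(ch, 16);
    if (offsetNext) {
      words.push(String(nibble)); // branch offset, interpreted as a number
      offsetNext = false;
    } else {
      words.push(OPS[nibble]);
      offsetNext = OPS[nibble] === "bgtz";
    }
  }
  return words.join(" ");
}

decode("51f85"); // "- 1 bgtz 8 -"
decode("c6d24"); // "rot * unrot 2 +"
```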
The sequence generated by a program is not just the stack after the program has executed. To generate the $(n+1)$-th term of a sequence, we initialize the stack with the $n$ previous terms; the $n$-th term is at the top of the stack. After running the program, the top of the stack becomes the $(n+1)$-th term of the sequence. In the demo, the stack is initialized with the same two seed numbers for all of the choices presented to the LLM.
You can play with the interpreter and analyzer in your browser’s developer console. Use stackbee.evaluate("+", "1 2") to evaluate the program + on the stack 1 2.
You can write rational numbers too:

```js
stackbee.evaluate("/ 1 +", "1 2")
// '3/2'
```

You can also trace execution and run the static analysis from the console. Their output shows the stack on the left and the remaining ops on the right:

```
// Tracing the execution of "/ 1 +" on the stack "1 2":
stack                  ops
1 2                    / 1 +
1/2                    1 +
1/2 1                  +
3/2
// ⇒ '3/2'

// Static analysis of "/ 1 +" on an unknown stack:
stack                  ops
⋯                      / 1 +
⋯ {t₁ t₀ /}            1 +
⋯ {t₁ t₀ /} {1}        +
⋯ {t₁ t₀ / 1 +}
// ⇒ ' ⋯ {t₁ t₀ / 1 +}'
```
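Putting the pieces together, generating a sequence term by term looks roughly like this (a sketch that assumes evaluate returns the final stack as a string with the top on the right, as in the examples above; the demo's internals may differ):

```js
function generate(program, seed, count) {
  const terms = [...seed];
  while (terms.length < count) {
    // Initialize the stack with all previous terms; the new top of the
    // stack becomes the next term of the sequence.
    const stack = stackbee.evaluate(program, terms.join(" "));
    terms.push(stack.trim().split(/\s+/).pop());
  }
  return terms;
}

generate("+", ["1", "2"], 6); // e.g. ["1", "2", "3", "5", "8", "13"]
```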
2 *
This program just doubles whatever’s on top of the stack. Given the seed $0, 4$, for example, it will generate the number sequence: $0, 4, 8, 16, \ldots$
+
This program adds the top two numbers on the stack. A simple program for a simple sequence, and one that LLMs seem likely to have learned a specific circuit for, given how often they miscategorize sequences as Fibonacci! Given the seed $1, 2$ we recover the classic Fibonacci sequence: $1, 2, 3, 5, 8, \ldots$
len 2 * 1 - 2 * * len 1 + /
Catalan numbers show up a lot when counting things.
When the top of the stack is the Catalan number $C_{n-1}$ (and the stack contains the previous Catalan numbers $C_0, \ldots, C_{n-1}$), after this program is executed the top of the stack will be the $n$-th Catalan number $C_n$. That means when the seed is $1, 1$ (the first two Catalan numbers), the program will generate the Catalan numbers: $1, 1, 2, 5, \ldots$
We use the following recurrence:
$$C_n = \frac{2(2n-1)}{n+1}C_{n-1}$$
The execution trace is:
Stack | Program |
---|---|
$\cdots \quad C_{n-1}$ | len 2 * ⋯ |
$\cdots \quad C_{n-1} \quad 2n$ | 1 - ⋯ |
$\cdots \quad C_{n-1} \quad 2n-1$ | 2 * ⋯ |
$\cdots \quad C_{n-1} \quad 2(2n-1)$ | * ⋯ |
$\cdots \quad 2(2n-1)C_{n-1}$ | len 1 + ⋯ |
$\cdots \quad 2(2n-1)C_{n-1} \quad n+1$ | / |
$\cdots \quad C_n$ |
Technically only the top of the stack needs to be the $(n-1)$-th Catalan number; the other values can be whatever, as long as the total count of values on the stack stays the same (the program uses len).
dup 2 % bgtz 5 2 / 1 bgtz 4 3 * 1 +
To apply the Collatz function $C$ to a number $n$: if $n$ is even, divide it by two; if $n$ is odd, multiply it by three and add one.
The Collatz sequence (or hailstone sequence) for a number $n$ is $n, C(n), C(C(n)), \cdots$. The Collatz conjecture is that the Collatz sequence for every positive integer eventually reaches 1.
The program looks funky but is actually simpler than the Catalan numbers program:
Is even? | Even | (go to end) | Odd |
---|---|---|---|
dup 2 % bgtz 5 | 2 / | 1 bgtz 4 | 3 * 1 + |
The 1 bgtz 4 jumps to the end of the program so that the even case doesn’t fall through to the odd case.
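For comparison, the same function in plain JavaScript:

```js
function collatz(n) {
  return n % 2 === 0 ? n / 2 : 3 * n + 1;
}
// collatz(7) === 22, collatz(22) === 11, collatz(11) === 34, ...
```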
I’m sure a better instruction set is possible; it’s just tough to fit all the instructions in 4 bits.
This program is 14 instructions long (so 56 bits).
Unfortunately this is just one over the maximum size of generated programs in the demo (13), which is limited by MAX_SAFE_INTEGER in JavaScript ($2^{53} - 1$).
I apologize if you are reading this in the year 2525 and your computer would otherwise be fast enough to generate this program and test it against an LLM.
That would have been fun.
The symbolic executor exists for two purposes:
The language and the programs are both puny, which means that my primitive static analysis does OK. Conveniently, we can represent symbolic expressions as programs themselves! For example, the symbolic expression representing the first unknown value popped from the stack is $t_0$; the symbolic expression representing that value plus two is $t_0 \; 2 \; +$.
SymExpr = SymVal[]
SymVal = Op | Sym
Sym = $t_i$ | $b_j$ | $\ell_k$

Here $t_i$ is the $i$-th value popped off the (unknown) top of the stack, $b_j$ is the $j$-th value taken from the (unknown) bottom, and $\ell_k$ is the length of the stack when the $k$-th len operator was evaluated.

Values on the stack are represented as sets of symbolic expressions; branching execution paths enlarge sets.
SymExprSet = SymExpr[]
So a symbolic expression set of $\{0, 1\}$ means that the concrete value might be either zero or one.
The symbolic stack itself is represented as two lists: a list of known values at the top, and a list of known values at the bottom.
We can write the stack like this:
$$x y \cdots z w$$
This means $x$ and $y$ are at the bottom and $z$ and $w$ are at the top. The analysis assumes that the length of the stack is unknown, so there may be zero or more values separating the top and bottom.
An implication of that is that the following stack might have only one value $x$ at both the bottom and the top:
$$\cdots x$$
The length of the stack being unknown has a few implications.
For example, rot has two obvious cases:
$$\dfrac{x \cdots \quad {\tt rot}}{\cdots x} \; \text{rot-1}$$

$$\dfrac{\cdots \quad {\tt rot}}{\cdots b_i} \; \text{rot-3}$$

… and then a third case (which I’m calling $\text{rot-2}$ to indicate precedence), which is a little funky:

$$\dfrac{\cdots x \quad {\tt rot}}{\cdots b_i} \; \text{rot-2}$$
The first rule says “if the stack starts out with $x$ at the bottom, then after rot, $x$ will be (rotated) to the top.” The second rule says “if we don’t know anything about the stack, then after rot, we generate a new symbolic value $b_i$ to place at the top of the stack.”
In the third rule, why is there an $x$ at the top of the stack? And why does it disappear afterwards? The reason is that $\cdots x$ might actually be a one-value stack $x$; remember that we don’t know anything about the size of $\cdots$. So $x$ is at both the bottom and top of the stack! If we didn’t delete $x$ and instead set the symbolic stack to $\cdots x b_i$ afterwards, we might mistakenly think the stack contains at least two values now.
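For concreteness, here is a sketch of just these three cases (the representation is hypothetical: bottom[0] is the known bottom of the symbolic stack, the last element of top is the known top, and freshB() mints a new $b_i$):

```js
function rot({ bottom, top }, freshB) {
  if (bottom.length > 0) {
    // rot-1: the known bottom value rotates to the top.
    return { bottom: bottom.slice(1), top: [...top, bottom[0]] };
  }
  if (top.length === 0) {
    // rot-3: we know nothing, so a fresh symbolic value appears on top.
    return { bottom, top: [freshB()] };
  }
  // rot-2: the lone known top value might itself be the bottom of the
  // stack, so we drop it and put a fresh symbolic value on top instead.
  // (The general case, with more than one known top value, needs more care.)
  return { bottom, top: [freshB()] };
}
```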
We say that a program is boring if any of the following are true:
There’s no reason there couldn’t be more or fewer rules; this is just what worked OK in testing.
We say that a program is confusing if it will definitely underflow the stack. We can only approximate this, but we currently look at the top of the stack and check whether
$$i + j + 2 > {\tt SEED\_LEN}$$

where $i$ is the highest index of the symbolic $t_i$ values we see, and $j$ is the highest index of the symbolic $b_j$ values we see. If we see $t_i$, that means at least $i+1$ values were consumed from the top (and likewise with $b_j$ and the bottom), so $i + j + 2$ gives us an estimate of the number of values consumed from the stack.
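A sketch of the check (the representation is hypothetical: symbolic values carry a kind and an index, and a missing kind contributes $-1$, i.e. no evidence of consumption from that end):

```js
function definitelyUnderflows(topExpr, seedLen) {
  let maxT = -1;
  let maxB = -1;
  for (const val of topExpr) {
    if (val.kind === "t") maxT = Math.max(maxT, val.index);
    if (val.kind === "b") maxB = Math.max(maxB, val.index);
  }
  return maxT + maxB + 2 > seedLen;
}
```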
There is also no particular reason for this rule other than that it works OK.
The reason I introduced it is that I noticed programs which underflow the stack tend to make for confusing number sequences.
pop on an empty stack evaluates to $0$, so you can get number sequences which suddenly seem to change rules some number of terms in, after the stack has had a chance to grow.
It’s also a little bit silly because we could track more information about the symbolic stack and avoid having to guess.
The static analysis of this program illustrates many of the current system’s shortcomings:
Program: rot * unrot 2 + (c6d24)
Analysis: $\{t_1 \; 2 \; +\}$
Stack | Ops |
---|---|
$\cdots$ | rot * unrot 2 + |
$\cdots \quad \{b_0\}$ | * unrot 2 + |
$\cdots \quad \{t_0 \; b_0 \; *\}$ | unrot 2 + |
$\{t_0 \; b_0 \; *\} \quad \cdots$ | 2 + |
$\{t_0 \; b_0 \; * \} \quad \cdots \quad \{2\}$ | + |
$\cdots \quad \{t_1 \; 2 \; + \}$ |
Example number sequences:
Because the size of the stack $\cdots$ is unknown, when we see the final + we have to trash the bottom of the stack, viz. $\{t_0 \; b_0 \; * \}$.

Even if the stack size is unknown, we could use a set of two symbolic expressions, {t₀ b₀ * 2 +, t₁ 2 +}, to represent the value at the top of the stack (although this would make the analyses of other programs noisier).

The analysis is still valid; the next entry is always t₁ 2 +, insofar as t₁ just means the second item popped off the top of the stack.
We should arguably detect that this program is confusing because its behavior differs depending on whether the stack is initialized with two vs. three values. We already try to reject programs that underflow the stack, but (again) because the analysis currently doesn’t make any assumptions about the stack size, we approximate this by just looking at the subscripts of $t_i$ and $b_j$ values in the final expression.
bgtz results in two symbolic stacks, which we merge. This can create horrifying symbolic stacks. At the default difficulty of 5 you will very rarely see programs with bgtzs that actually do anything. The symbolic stack for the Collatz program is pretty, though:

$$\cdots \; \{t_0 \; 3 \; * \; 1 \; +, \; t_0 \; 2 \; /\}$$
- eval for templates is horrible, but it definitely saved a lot of time. It’s cool that browsers are hardened to the point where I’m fine with it.
- JSON.stringify as a hack for doing structural comparisons (when you can control key ordering) is good for demos. But it really makes you miss BEAM values.
- <details> elements: having to manually restore whether they are open or not after I re-render is definitely kind of annoying.
- At least half of the benefit would have been just JSX, even without React.

There’s a lot more I could write about, but I figure very few people will read this far anyway. If you did, you’re amazing and I appreciate you!
This was a side project to distract from actual work so I don’t know how much more time I can spend on it, unfortunately.
The source code for the demo (including program execution, analysis, and generation) is here. I’ve licensed it under the BSD 3-clause license. Copyright acknowledgments for libraries used are here.
Lastly, I’m writing this blog post on the third anniversary of the unexpected death of my dad in the year he was planning to retire. I’m embarrassed to foist unsolicited advice upon others, but the anniversary reminds me to think about whether I’d regret it if today were the day I died. If this is my last day and I never retire, I’d prefer to save my energy for the many good people out there.
A helpful Hacker News commenter pointed out that in an earlier version of this post I managed to write “31” instead of “13” twice, which is ironic.
Now I’m also unsure whether the house actually was 3147.
Then the pattern would have been “add the first term to the last term” (unrot +) instead of the Fibonacci sequence (+).
That’s what I get for writing at 3am.
You shall not wrong a sojourner or oppress him, for you were sojourners in the land of Egypt.
— Exodus 22:21, ESV
You shall treat the alien who resides with you no differently than the natives born among you; you shall love the alien as yourself; for you too were once aliens in the land of Egypt. I, the LORD, am your God.
— Leviticus 19:34, NAB
Photographs by Joe Raedle, Alex Wong, and Scott Olson for Getty Images, and Brian Snyder for Reuters.
There’s nothing particularly complex about this, but a few things surprised me along the way so I figured I’d write up some notes.
To generate a random $n$-dimensional unit vector, first generate a vector where each entry is a random sample from a normal distribution then normalize it to a unit vector. Why does this work? I’ll paraphrase a very helpful comment by mindoftea on StackOverflow:
The probability of a point being at a given $[x, y]$ is $P(x) \times P(y)$. The Gaussian distribution has roughly the form $\exp(-x^2)$, so $\exp(-x^2) \times \exp(-y^2)$ is $$\exp(-(x^2+y^2))$$ That is a function only of the distance of the point from the origin, so the resulting distribution is radially symmetric. This generalizes easily to higher dimensions.
Here’s a “visual” version of the algorithm for people familiar with computer graphics, inspired by a friend’s comments:
Observe that an $n$-dimensional Gaussian is radially symmetric around the origin (consider a Bell curve or a Gaussian splat). The radial symmetry means that if you squeeze the probability density function (pdf) onto the unit $n$-sphere you’ll end up with a uniform density. Just make sure to only move probabilities directly towards or away from the origin, which corresponds to only scaling points by scalars.
To generate the $n$-dimensional vector, note that Gaussians are separable, i.e. the $n$-dimensional Gaussian’s pdf is the product of $n$ independent $1$-dimensional Gaussian pdfs.
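Concretely, the standard normal density in $n$ dimensions factors as

$$p(x_1, \ldots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} e^{-x_i^2/2} = (2\pi)^{-n/2} e^{-\lVert x \rVert^2/2},$$

which depends on $x$ only through $\lVert x \rVert$, so the direction $x / \lVert x \rVert$ is uniformly distributed on the unit sphere.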
Here’s an Elixir Nx function that implements the algorithm.
import Nx.Defn
defn random_unit_vector(key, opts) do
import Nx
alias Nx.
case Nx.shape(key) do
-> :ok
_ -> raise
end
dim = opts[:dim]
vectorized_axes = key.vectorized_axes
key_split = Random.split(key, parts: dim)
axis = :random_unit_vector_key
mean = 0
stdev = 1
v =
key_split
|> vectorize(axis)
|> Random.normal_split(mean, stdev)
|> revectorize(vectorized_axes, target_shape: , target_names: [axis])
n = LinAlg.norm(v, axes: [axis], ord: 2)
u =
(v / n)
|> Nx.rename([nil])
end
end
You use it like so:

```elixir
key = Nx.Random.key(37)

# One random 3-dimensional unit vector
u = Math.random_unit_vector(key, dim: 3)

# 8 random 2-dimensional unit vectors, vectorized along the :key axis
us =
  Nx.Random.split(key, parts: 8)
  |> Nx.vectorize(:key)
  |> Math.random_unit_vector(dim: 2)
```
I ran into a few gotchas while writing this.
Nx.Random.split and number arguments

First I hit an error changing def to defn. It turns out that’s because Nx.Random.split expects the value for parts to be a regular Elixir/BEAM number, not a tensor. But defn automatically converts all of its number arguments to tensors:

When numbers are given as arguments, they are always immediately converted to tensors on invocation. If you want to keep numbers as is or if you want to pass any other value to numerical definitions, they must be given as keyword lists.

Hence the dim = opts[:dim].
Next I noticed that my function worked with single keys, but failed when I passed a vectorized tensor of multiple keys (to generate multiple random unit vectors). That’s because initially I called devectorize to remove the temporary vectorization axis I introduced for generating one random value per dimension:
```elixir
axis = :random_unit_vector_key

v =
  key
  |> Random.split(parts: dim)
  |> vectorize(axis)
  |> Random.normal_split(0, 1)
  |> devectorize()
```
That works fine when the caller gives us a scalar key, but when the caller gives us a vectorized scalar for key, devectorize will not only remove my temporary vectorization axis but also any of the caller’s axes! revectorize lets us restore the original vectorized_axes while devectorizing the temporary axis.
For a while I didn’t realize that that’s why I was getting just a bunch of non-unit vectors.
The problem was that LinAlg.norm(v) was happily computing the summed norm of all of the vectors instead of giving me one norm per vector! Then the final v / n was dividing all of the vectors by the same scalar.
I tried using revectorize(vectorized_axes ++ [foo: :auto]) but that actually does nothing; it just renames axis to foo.
The documentation has this identity for revectorize, which didn’t really help me that much because it mentions names like vectorized_sizes which it doesn’t reference subsequently:

```elixir
assert revectorize(tensor, target_axes,
         target_shape: target_shape,
         target_names: target_names
       ) =
         tensor
         |> Nx.devectorize(keep_names: false)
         |> Nx.reshape(vectorized_sizes ++ target_shape, names: target_names)
         |> Nx.vectorize(vectorized_names)
```
It turns out vectorized_names and vectorized_sizes are the keys and values of target_axes, e.g.

```elixir
target_axes = [foo: 1, bar: 2]
vectorized_sizes = [1, 2]
vectorized_names = [:foo, :bar]
```
Personally, these identities help me more:

```elixir
tensor = # ...
vectorized_axes = tensor.vectorized_axes

assert tensor = revectorize(tensor, vectorized_axes)

assert tensor =
         tensor
         |> vectorize(:foo)
         |> revectorize(vectorized_axes,
           target_shape: Nx.shape(tensor),
           target_names: Nx.names(tensor)
         )

assert tensor |> vectorize(:foo) = revectorize(tensor, vectorized_axes ++ [foo: :auto])
```
The ```math fenced math code block syntax, supported by GitHub, is an alternative to the traditional $$ … $$ or \[ … \] syntax for display math.
I learned about it today while working on a small (~200 line) shell script to serve Markdown files as live-reloading webpages w/ KaTeX support: markd.
To use the LaTeX syntax highlighter for triple-backtick ```math fenced code blocks in Neovim, place this at ~/.config/nvim/after/queries/markdown/injections.scm:

```scheme
; extends
; Use :Inspect, :InspectTree, and :EditQuery
; Highlight ```math fenced code blocks using the LaTeX highlighter.
; (Node names are from the markdown tree-sitter grammar.)
(fenced_code_block
  (info_string (language) @_lang)
  (code_fence_content) @injection.content
  (#eq? @_lang "math")
  (#set! injection.language "latex"))
```
I use the following to render KaTeX on this blog and in markd:
Note that this assumes your Markdown to HTML compiler will output ```math blocks as <pre><code class="language-math">. That’s what cmark does.
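A minimal sketch of that kind of snippet (not necessarily the exact one used here; it assumes KaTeX is already loaded on the page):

```js
// Render each ```math block (compiled to <pre><code class="language-math">)
// as display math, replacing the <pre> with the rendered markup.
for (const code of document.querySelectorAll("pre > code.language-math")) {
  const span = document.createElement("span");
  katex.render(code.textContent, span, { displayMode: true, throwOnError: false });
  code.parentElement.replaceWith(span);
}
```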
A golden oldie from Rich Lowry at the National Review in 2008:
I’m sure I’m not the only male in America who, when Palin dropped her first wink, sat up a little straighter on the couch and said, “Hey, I think she just winked at me.”
👀
Photos were not syncing from my Mac to my other devices. On my Mac, the number of photos it was aware of (displayed at the bottom of the main screen) was different from the number of photos displayed in Photos on my iPhone. Both said “Last synced X minutes ago”, where X was some small number, so it seemed like it wasn’t even aware that the two were out of sync! Bizarrely, new photos from my iPhone would still show up on my Mac.
A few things I tried didn’t work: I waited a few days for this to resolve itself and tried checking/unchecking “iCloud Photos” in the “iCloud” section of Settings in Photos on my Mac.
Finally I read about the Photos Repair tool:
To get to the Photos Repair Library tool on your Mac, follow these steps:
- If Photos is open, close the app. Then, while you click to open Photos, hold down the Command and Option keys at the same time.
- In the window that opens, click Repair to start the repair process. You might be asked to enter your user account password.
This fixed it, but it’s pretty lame that I had to do this. It doesn’t inspire much confidence to learn that Apple’s synchronization system can be unaware that it is out of sync.
UPDATE: I’m not sure if it’s related, but today I started running out of space on my computer. I narrowed it down to ~/Library/Caches/CloudKit/com.apple.bird:
```
# 309G  /System/Volumes/Data/Users/jyc/Library/Caches/CloudKit/com.apple.bird
# 309G  /System/Volumes/Data/Users/jyc/Library/Caches/CloudKit/com.apple.bird/3593ccf6c2fd2e7e5212117d4c22421ae134a9a7
# 309G  /System/Volumes/Data/Users/jyc/Library/Caches/CloudKit/com.apple.bird/3593ccf6c2fd2e7e5212117d4c22421ae134a9a7/MMCS
# 309G  /System/Volumes/Data/Users/jyc/Library/Caches/CloudKit/com.apple.bird/3593ccf6c2fd2e7e5212117d4c22421ae134a9a7/MMCS/ClonedFiles
# 310G  /System/Volumes/Data/Users/jyc/Library/Caches/CloudKit
# 336G  /System/Volumes/Data/Users/jyc/Library/Caches
```
It seems like this shouldn’t be related to Photos because “Optimize Mac Storage” is set in Photos’ iCloud settings. And looking at the contents of that folder, the files appear to be a mix of random data from iCloud: ZIP files, HTML files, and some images, but no images from my Photos Library. Strange.
Countries in dark blue grant jus soli without restrictions; all other countries require at least one parent to have citizenship or residency.
Jus soli is the predominant rule in the Americas…
Almost all states in Europe, Asia, Africa and Oceania grant nationality at birth based upon the principle of jus sanguinis (“right of blood”), in which nationality is inherited through parents rather than birthplace, or a restricted version of jus soli in which nationality by birthplace is automatic only for the children of certain immigrants.
The alternative, jus sanguinis (by blood) citizenship, is in my opinion an abomination.
From the Wikipedia article “Jus soli.”
I'm computing the week number by dividing the number of days we are into the year by 7. This gives a different week number from ISO 8601. Suits are ordered diamonds, clubs, hearts, spades (like Big Two, unlike Poker) so that red and black alternate. On leap years there are 366 days in the year; the card for the 366th day is the white joker. Karl Palmen has proposed a different encoding.