Welcome to my website!
I’m Jonathan Y. Chan (jyc, jonathanyc, 陳樂恩, or 은총), a
🐩 Yeti fan,
🇺🇸 American,
and 🐻 Californian,
living in 🌁 San Francisco: the most beautiful city in the greatest country in the world.
My mom is from Korea and my dad was from Hong Kong.
I am a Christian.
I’ve worked on:
VLOOKUPs on billions of rows;
Parlan was a spreadsheet with an interface and formula language that looked just like Excel. Under the hood, it compiled formulas to SQL then evaluated them like Spark RDDs. Alas, a former manager’s prophecy about why startups fail proved prescient…
I also helped out with things like trees, road markings, paths, and lines of latitude!
… including copy-paste, a high-fidelity PDF exporter, text layout, scene graph code(gen), and putting fig-foot in your .fig files, while deleting more code than I added!
Here’s the copy-icloud-photos script I use to back up my photos stored on iCloud to my Synology NAS:
#!/bin/bash
args=(
--delete
--human-readable
--no-perms
--partial
--progress
--times
-v
)
src="/Users/jyc/Pictures/Photos Library.photoslibrary/originals/"
dst="jyc@nas:photos/"  # illustrative destination; the real path isn't shown here

# Only copy originals that are at least a day old, so short-lived Camera Roll
# photos (like screenshots) aren't backed up before I delete them.
cd "$src" || exit 1
find . -type f -mtime +0 -print0 |
  rsync "${args[@]}" --from0 --files-from=- . "$dst"
I added find recently because it’s annoying to accidentally back up temporary photos, like screenshots, that only live in my iPhone’s Camera Roll for a minute or so before I delete them.
I have launchd run that script daily using a configuration plist at ~/Library/LaunchAgents/jyc.copy-icloud-photos.service:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Disabled</key>
    <false/>
    <key>Label</key>
    <string>copy-icloud-photos</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/fdautil</string>
        <string>exec</string>
        <string>/Users/jyc/bin/copy-icloud-photos</string>
    </array>
    <key>StandardErrorPath</key>
    <string>/tmp/copy-icloud-photos.err</string>
    <key>StandardOutPath</key>
    <string>/tmp/copy-icloud-photos.out</string>
    <key>StartInterval</key>
    <integer>86400</integer>
</dict>
</plist>
I set it up via LaunchControl, which is a third-party shareware GUI for launchd that also provides the fdautil wrapper script that makes it possible for the copy-icloud-photos script to have Full Disk Access.
I think it’s possible to get this to work without LaunchControl but I haven’t tried.
Unfortunately a big caveat is that this will back up recently deleted photos until they are truly deleted by iCloud.
Here’s some SQL that lists the filenames of non-deleted, non-hidden photos under ~/Pictures/Photos Library.photoslibrary/originals/ when run on the database at ../Photos.sqlite:
select substr(ZFILENAME, 1, 1) || '/' || ZFILENAME
from ZASSET
where ZTRASHEDSTATE = 0 and ZHIDDEN = 0;
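As a toy sanity check of the query’s shape, here’s a run against an in-memory stand-in for ZASSET rather than the real Photos.sqlite (treating ZTRASHEDSTATE = 0 as “not in Recently Deleted” is my reading of the schema):

```python
import sqlite3

# Tiny stand-in for Photos.sqlite's ZASSET table; the real schema has
# many more columns.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ZASSET (ZFILENAME TEXT, ZTRASHEDSTATE INT, ZHIDDEN INT)")
db.executemany("INSERT INTO ZASSET VALUES (?, ?, ?)", [
    ("ABC.heic", 0, 0),  # kept
    ("DEF.heic", 1, 0),  # in Recently Deleted
    ("GHI.heic", 0, 1),  # hidden
])
rows = db.execute("""
    select substr(ZFILENAME, 1, 1) || '/' || ZFILENAME
    from ZASSET
    where ZTRASHEDSTATE = 0 and ZHIDDEN = 0
""").fetchall()
print(rows)  # [('A/ABC.heic',)]
```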
… but even when I grant bash, copy-icloud-photos, and sqlite3 Full Disk Access in System Settings > Privacy & Security, I can’t get it to work.
I thought I might just need to grant my script Photos access as well, but that doesn’t work.
Maybe Apple really is trying to block all programmatic access except through PhotoKit.
I am Culgi, who has been chosen by Inana for his attractiveness.
…
Because I am a powerful man who enjoys using his thighs, I, Culgi, the mighty king, superior to all, strengthened the roads, put in order the highways of the Land.
…
So that my name should be established for distant days and never fall into oblivion, so that my praise should be uttered throughout the Land, and my glory should be proclaimed in the foreign lands, I, the fast runner, summoned my strength and, to prove my speed, my heart prompted me to make a return journey from Nibru to brick-built Urim as if it were only the distance of a double-hour.
— A praise poem of Shulgi (Shulgi A)
Sumerians didn’t skip leg day or cardio.
Previously: One-Paragraph Reviews, Vol. I
I didn’t manage to stick to the one-paragraph format this time. I’m trying to write down:
… but (1) can be a lot of stuff because the things I’m reading about are generally things on which I’m not an expert! I’ll try moving stuff that isn’t related to the main point into footnotes to cheat. If the trend continues, though, I’ll have to think of how to make things more concise…
“Efficient Natural Language Response Suggestion for Smart Reply” is a paper by Matthew Henderson, Rami Al-Rfou, Brian Strope, Yun-hsuan Sung, Laszlo Lukacs, Ruiqi Guo, Sanjiv Kumar, Balint Miklos, and Ray Kurzweil (2017) on the algorithm behind Google’s pre-LLM1 “Smart Reply” feature, which suggests short replies like “I think it’s fine” or “It needs some work.”
The authors train a model composed of two neural network “towers”, one for the input email and one for the reply: each takes a vector representing an email, encoded as the sum2 of the n-gram embeddings of its words. The model learns to compute two vectors, $h_x$ for input emails and $h_y$ for response emails, such that $P(y|x) = h_x \cdot h_y$ is the probability that an email $y$ is the reply to an email $x$.
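As a toy sketch of the two-tower scoring (the hashed “embeddings” and dimensions here are mine, just to show the plumbing; the real model learns its embeddings):

```python
import hashlib

DIM = 8

def embed(text):
    # Stand-in for a learned n-gram embedding: sum a fixed pseudo-random
    # vector (derived from a hash of each word) over the words of the text.
    vec = [0.0] * DIM
    for word in text.lower().split():
        h = hashlib.sha256(word.encode()).digest()
        for i in range(DIM):
            vec[i] += h[i] / 255.0 - 0.5
    return vec

def score(h_x, h_y):
    # The scoring function is a dot product between the two towers' outputs.
    return sum(a * b for a, b in zip(h_x, h_y))

def best_reply(email, candidates):
    h_x = embed(email)
    return max(candidates, key=lambda y: score(h_x, embed(y)))
```

With learned embeddings, replies whose vectors point in the same direction as the input email’s vector score highest; here the scores are meaningless, but the shapes are the same.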
There are a few post-processing steps:
These days, you might use someone else’s text embedding model for $h_x$ and $h_y$, but you’d still need the post-processing steps; you would also need some transformation from input vectors to reply vectors so that $h_x \cdot h_y$ represents “$h_y$ is a reply to $h_x$” rather than just “$h_y$ is similar to $h_x$.” I wonder if LLMs might become cheap enough that $P_{\text{LM}}$ becomes all you need, similar to how spellchecking used to be an engineering feat but is now “3-5 lines of Python.”
Seq2Seq, the direct ancestor of the current generation of GPT-style LLMs, already existed at the time, but the authors wanted something more efficient.
Cf. the sinusoidal or learned positional encoding used in many current models. Sinusoidal positional encodings have a vaguely geometric interpretation: a word/token at a given position in a sentence is the token’s embedding vector with a translation applied, such that the distance between the translation applied to tokens at two positions is “symmetrical and decays nicely with time”.
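The sinusoidal scheme is short to write down; this follows the Transformer paper’s formula (sin at even indices, cos at odd, with geometrically spaced wavelengths):

```python
import math

def positional_encoding(pos, dim):
    # PE(pos, 2i)   = sin(pos / 10000^(2i/dim))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i/dim))
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

Position 0 encodes to alternating zeros and ones; encodings of nearby positions have a high dot product that falls off with distance.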
They learn a “hierarchical quantization” for each vector such that $h_y \approx \text{VQ}(h_y) + \text{R}^T \text{PQ}(r_y)$, where $\text{VQ}$ is a vector quantization, $\text{R}$ is a rotation, and $\text{PQ}$ is a product quantization (the Cartesian product of $\mathcal K$ independent vector quantizers). Vector quantization just means expressing an $a$-dimensional vector as a linear combination of $b$ other vectors (the “codebook”); compression comes from $b < a$. It feels vaguely reminiscent of $k$-means, which represents each input vector by the nearest of $k$ learned centroids.
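A minimal sketch of the product-quantization half (the codebooks here are hand-picked for illustration; in the paper they’re learned):

```python
def nearest(vec, codebook):
    # Index of the codebook entry closest to vec (squared Euclidean distance).
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])))

def product_quantize(vec, codebooks):
    # Split vec into len(codebooks) contiguous chunks and quantize each chunk
    # independently: the "Cartesian product of K vector quantizers".
    k = len(codebooks)
    chunk = len(vec) // k
    return tuple(nearest(vec[i * chunk:(i + 1) * chunk], codebooks[i])
                 for i in range(k))

# Two codebooks of 2-d centroids for a 4-d vector.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (1.0, 1.0)],
]
codes = product_quantize((0.9, 1.1, 0.1, -0.2), codebooks)
print(codes)  # (1, 0): first chunk is near (1, 1), second near (0, 0)
```

Storing two small codebook indices instead of four floats is where the compression comes from.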
The iteration order for Elixir maps is not just “undefined” in the sense that there is some order at runtime which you don’t know. Different functions that take maps can also iterate over the map in different orders!
Lists have the iteration order you’d expect:
range = 1..32
Enum.map(range, fn a -> a end)
Enum.zip_with(range, range, fn a, b -> {a, b} end)
# [1, 2, 3, ...]
# [{1, 1}, {2, 2}, {3, 3}, ...]
… and so do maps with 32 or fewer entries:
range = 1..32
map = Enum.map(range, &{&1, &1}) |> Enum.into(%{})
IO.inspect(Enum.map(map, fn {k, _} -> k end))
IO.inspect(Enum.zip_with(range, map, fn _, {k, _} -> k end))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
# 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
# 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
… but add one entry to a map and the pattern breaks:
range = 1..33
# ...
# [4, 25, 8, 1, 23, 10, 7, 9, 11, 12, 28, 24, 13, 3, 18, 29, 26, 22, 19, 2, 33,
# 21, 32, 20, 17, 30, 14, 5, 6, 27, 16, 31, 15]
# [15, 31, 16, 27, 6, 5, 14, 30, 17, 20, 32, 21, 33, 2, 19, 22, 26, 29, 18, 3,
# 13, 24, 28, 12, 11, 9, 7, 10, 23, 1, 8, 25, 4]
Enum.zip_with happens to enumerate over the entries of a map in the opposite order from Enum.map!
I think it’s especially funny that this behavior only manifests for maps with more than 32 elements! (Presumably this is because Erlang represents small maps as a sorted array and switches to a hash array mapped trie once they grow past 32 entries.) It reminds me of this plotline (no spoilers) from Cixin Liu’s mind-blowing Remembrance of Earth’s Past trilogy:
“These high-energy particle accelerators raised the amount of energy available for colliding particles by an order of magnitude, to a level never before achieved by the human race. Yet, with the new equipment, the same particles, the same energy levels, and the same experimental parameters would yield different results. Not only the results would vary if different accelerators were used, but even with the same accelerator, experiments performed at different times would give different results. Physicists panicked. …”
“What does this mean?” Wang asked. …
“It means that the laws of physics are not invariant across time and space.”
On a less dramatic note, it reminds me of the Borwein integrals discovered by David Borwein and Jonathan Borwein in 2001:
$$ \int_0^\infty \frac{\sin(x)}{x} dx = \frac{\pi}{2} $$ $$ \int_0^\infty \frac{\sin(x)}{x} \frac{\sin(x/3)}{x/3} dx = \frac{\pi}{2} $$ $$ \int_0^\infty \frac{\sin(x)}{x} \frac{\sin(x/3)}{x/3} \cdots \frac{\sin(x/13)}{x/13} dx = \frac{\pi}{2} $$ $$ \int_0^\infty \frac{\sin(x)}{x} \frac{\sin(x/3)}{x/3} \cdots \frac{\sin(x/15)}{x/15} dx = \frac{\pi}{2} - 2.32 \times 10^{-11} $$
It’s interesting to think about the different kinds of behavior which you can’t know ahead-of-time. Suppose I roll some dice inside of a closed box.
I’d assumed that emoji were all organized into a contiguous Unicode codepoint range, but this is very much not the case!
There are more than a thousand different ranges containing emoji.
The Unicode Consortium makes the complete list available as a file, emoji-data.txt.
Here are a few lines:
25FB..25FE ; Emoji # E0.6 [4] (◻️..◾) white medium square..black medium-small square
2600..2601 ; Emoji # E0.6 [2] (☀️..☁️) sun..cloud
2602..2603 ; Emoji # E0.7 [2] (☂️..☃️) umbrella..snowman
2604 ; Emoji # E1.0 [1] (☄️) comet
I wanted to convert this list into the form U+25FB-25FE,U+2600-2601,... for use with the kitty terminal’s symbol_map configuration option.
I wrote some shell to convert it into that format:
grep -E '^[0-9A-F]' emoji-data.txt |  # skip comment and blank lines
  cut -d ';' -f 1 |                   # keep just the codepoint field
  tr -d ' ' |                         # strip padding
  sed -e 's/\.\./-/' -e 's/^/U+/' |   # "25FB..25FE" -> "U+25FB-25FE"
  paste -s -d ',' -                   # join everything with commas
Some pretty big caveats:
- emoji-data.txt also contains ASCII codepoints like # (0x23), * (0x2a), and 0-9 (0x30-0x39)! Depending on your use case you might want to remove these.
- Some emoji are sequences of multiple codepoints: 🏋️‍♂️ (man lifting weights) is composed from three codepoints: person lifting weights + zero width joiner + male sign. All three codepoints are listed separately in emoji-data.txt.

Going to try and see if this format helps me get through the backlog of reviews I’ve been meaning to write. The schema I’ll try is: (1) why it’s interesting (2) the most interesting insight.
… and that’s it for now!
The denomination of which I am a member, the Presbyterian Church (USA), considers itself to belong to the Reformed/Calvinist tradition.
Mr. Ishiguro won the Nobel Prize in Literature in 2017, so unfortunately I can’t say I read him before he was cool.
You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.
— OpenAI’s system prompt for ChatGPT
Imagine you are an experienced Ethereum developer tasked with creating a smart contract for a blockchain messenger.
— a ChatGPT prompt found on the web
Peter Watts predicted LLM prompts in Blindsight (2006) and Echopraxia (2014):
Imagine you are Siri Keeton:
You wake in an agony of resurrection, gasping after a record-shattering bout of sleep apnea spanning one hundred forty days.
“Something’s coming,” she said at last. “Maybe not Siri.”
“Why do you say that?”
“It just sounds wrong the way it talks there are these tics in the speech pattern it keeps saying Imagine you’re this and Imagine you’re that and it sounds so recursive sometimes it sounds like it’s trying to run some kind of model…”
Imagine you’re Siri Keeton, he remembered. And gleaned from a later excerpt of the same signal: Imagine you’re a machine.
“It’s a literary affectation. He’s trying to be poetic. Putting yourself in the character’s head, that kind of thing.”
Received this official-looking document in the mail by virtue of having my address associated with my failed startup. If you look at the fine print you’ll notice it’s not actually from the government. It’s from a scam company called “Corporate Processing Service” that is generously offering to file a form for you for $243.
The state only charges you $25 and has an online form. See “Misleading Statement of Information Solicitations” on the California Secretary of State’s website.
Just tried to print out a PDF and it’s been an adventure!
All of them render it properly on my computer though.
The PDF is “Compiler and Runtime Support for Continuation Marks” (Flatt & Dybvig, 2020).
If a function is only called from a single place, consider inlining it. – John Carmack
From John Carmack on Inlined Code on Jonathan Blow’s blog.
I'm computing the week number by dividing the number of days we are into the year by 7. This gives a different week number from ISO 8601. Suits are ordered diamonds, clubs, hearts, spades (like Big Two, unlike Poker) so that red and black alternate. On leap years there are 366 days in the year; the card for the 366th day is the white joker. Karl Palmen has proposed a different encoding.
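Here’s a sketch of the card encoding as described above (the black joker for day 365 is my assumption; only the white joker for day 366 is stated):

```python
import datetime

SUITS = "♦♣♥♠"  # diamonds, clubs, hearts, spades: red and black alternate
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

def card_for_date(date):
    day = date.timetuple().tm_yday  # 1..366
    week = (day - 1) // 7           # 0..52; NOT the ISO 8601 week number
    if week >= 52:
        # Days 365 and 366 fall outside the 52 cards; day 366 only exists
        # in leap years. (The black joker for day 365 is my assumption.)
        return "white joker" if day == 366 else "black joker"
    return RANKS[week % 13] + SUITS[week // 13]

print(card_for_date(datetime.date(2024, 1, 1)))    # A♦ (first week of the year)
print(card_for_date(datetime.date(2024, 12, 31)))  # white joker (day 366, leap year)
```

Each suit covers thirteen weeks, so the suit changes once a quarter, and the 52 cards exactly cover days 1 through 364.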