ongoing by Tim Bray

Lounge Penguin 23 Jun 2024, 7:00 pm

Lounge, as in a jazz club. Penguin, as in GoGo Penguin, a piano/bass/drums trio. We caught their show at Jazz Alley in Seattle last week. Maybe you should go hit a jazz lounge sometime.

What happened was

My daughter turned eighteen and graduated high school. She had heard that Car Seat Headrest was playing Seattle’s Woodland Park Zoo, and could tickets and a road trip (me buying and driving) be her present? Seemed reasonable, and she found a friend to take along. I wouldn’t mind seeing the Headrests (decent indie rock stuff) but her party, her friend. I noticed that GoGo Penguin was playing Seattle’s Jazz Alley, and Lauren was agreeable to coming along for the ride and the show.

I only know about GoGo Penguin because YouTube Music drops them into my default stream now and then. I’d thought “sounds good, maybe a little abstract”, couldn’t have named a song, but hey.

The “Jazz Club” concept

You’ve seen it in a million old movies, and the Vic Fontaine episodes of ST:DS9. The lights are low, the audience is sitting at tables with little lamps on them, the band’s on a thrust stage among the tables, there’s expected to be a soft background of clinking glasses and conversation. Some people are focusing in tight on the music, others are socializing at a respectfully low volume.

Of course, usually a gunfight breaks out or an alien materializes on stage… no wait, that’s just on-screen not real-life.

All jazz clubs serve alcohol — fancy cocktails, natch — and many will sell you dinner too. Dimitriou’s Jazz Alley in Seattle is a fine example.

GoGo Penguin at Jazz Alley; June 20th, 2024.
Our table was in the balcony.

We had a decent if conventional Pacific-Northwest dinner (crab and halibut), with a good bottle of local white. They’ve got things set up so most people have finished eating by the time the music starts. The seats were comfy. The decor was pleasing. The service was impeccable. I felt very grown-up.

GoGo Penguin

They’re three youngish guys from Manchester. Their Web site says they’re an “emotive, cinematic break-beat trio”. OK then. Piano/bass/drums is the canonical minimal jazz ensemble. Only they’re not minimal and it’s not jazz. I guess if you redefined “jazz” as complex, rhythmically sophisticated music featuring virtuoso soloing skills, well yeah. Damn, those guys can play. But their music is heavily composed, with not a lot of opportunities for anyone to stretch out and ride the groove.

And it ain’t got that swing; can it still mean a thing?

I guess so, because I enjoyed myself. There wasn’t a microsecond that was boring, plus the arrangements were super intelligent and kept surprising me.

But most of all, the bass. Nick Blacka hit me harder than any bassist since I saw (and blogged!) Robbie Shakespeare of Sly and Robbie in 2004.

It’s really something special. It may be a stand-up acoustic bass, but it’s wired up so he can dominate the band’s sound when he reaches back for it (which he does neither too little nor too much). Plus the instrument’s acoustic texture roars out entirely unmarred, you can feel those strings and wood in your gut. He moves between bowing and plucking and banging and you hardly even notice because it’s always the right thing.

I don’t wanna diss Chris Illingworth on piano or Jon Scott on drums; both of them made me catch my breath. But it’s Blacka’s bass explosions that I took home with me.

That swing?

These days my musical obsessions are Americana (i.e. bluegrass with pretensions) and old blues. The first of which also features instrumental complexity and virtuosity. And, if I’m being honest, both offer a whole lot more soul than the Penguins.

I respect what they’re doing. I’ll go see them again. But I wish they’d get the hell out from behind those diamond-bright razor-sharp arrangements and just get down sometimes.

Next?

Lauren and I had real fun and left feeling a bit guilty that we’ve been ignoring Vancouver’s own jazz clubs. Not that I’m going to stop going to metal or post-punk or baroque concerts. But jazz clubs are a good grown-up option.

Epsilon Love 17 Jun 2024, 7:00 pm

Quamina was for a time my favorite among all my software contributions. But then it stalled after I shipped 1.0 in January of 2023. First of all, I got busy with the expert witness for Uncle Sam gig, and second, there was a horrible problem in there that I couldn’t fix. Except now I have! And I haven’t done much codeblogging recently. So, here are notes on nondeterministic finite automata, epsilon transitions, Ken Thompson, Golang generics, and prettyprinting. If some subset of those things interests you, you’ll probably like this.

(Warning: if you’ve already had your hands on the theory and practice of finite automata, this may all be old hat.)

[Update: This is kind of embarrassing. It looks like what this post refers to as an “epsilon” is not the same epsilon that features in the theory of finite automata. I mean, it still works well for where I’m using it, but I obviously need to dig in harder and deeper.]

Sidebar: What’s a Quamina?

I don’t think there’s much to be gained by duplicating Quamina’s README but in brief: “A fast pattern-matching library in Go with a large and growing pattern vocabulary and no dependencies outside Go’s standard libraries.” If you want much, much more, this Quamina Diary blog series has it.

The problem

Combining too many patterns with wild-cards in them caused Quamina 1.0’s data structures to explode in size with a growth rate not far off the terrifying O(2ᴺ), which meant that once you’d added much more than 20 patterns you couldn’t add any more, because the add-pattern code’s runtime was O(2ᴺ) too.

Those structures are state machines generally, “nondeterministic finite automata” (NFAs) in particular. Which offer good solutions to many software problems, but when they get to be any size at all, are really hard to fit into a human mind. So when I was looking at Quamina’s unreasonably-big automata and trying to figure out how they got that way, my brain was screaming “Stop the pain!”

Lesson: Prettyprint!

At the point I stalled on Quamina, I’d started a refactor based on the theory that the NFAs were huge because of a failure to deduplicate state transitions. But the code I’d written based on that theory was utterly broken; it failed simple unit tests and I couldn’t see why.

During the months when I was ignoring the problem, I privately despaired because I wasn’t sure I could ever crack it, and I couldn’t stomach more struggling with ad-hoc Printf and debugger output. So I decided to generate human-readable renditions of my automata. Given that, if I still couldn’t figure out what was going on, I’d have to admit I wasn’t smart enough for this shit and walk away from the problem.

Which turned out to be a good call. Generating an information-dense but readable display was hard, and I decided to be ruthless about getting the spaces and punctuation in the right places. Because I didn’t want to walk away.

Back in the day, we used to call this “prettyprinting”.

It worked! First of all, my prettyprinter showed me that the automata emitted based on my deduplication theory were just wrong, and what was wrong about them, and I found that code and fixed it.

Bad news: My deduplication theory was also just wrong. Good news: My prettyprinter provided unavoidable proof of the wrongness and made me go back to first principles.

And I just landed a PR that cleanly removed the state explosion.

Free advice

I’ll show off the prettyprinter output below where I dig into the state-explosion fix. But for the moment, a recommendation: If you have a data structure that’s not Working As Intended and is hard to grok, go hide for a couple of days and write yourself a prettyprinter. Prettyprinting is an intelligence amplifier. Your Future Self will thank you heartily.
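
To make that advice concrete, here’s the shape such a thing might take, as a minimal Go sketch. The types are a toy — a deterministic two-field machine, nothing like Quamina’s internals — but the disciplines are the ones that matter: one state per line, stable labels, transitions printed in a deterministic order.

    package main

    import (
        "fmt"
        "sort"
        "strings"
    )

    // A toy deterministic machine, just for illustration.
    type state struct {
        label string
        next  map[byte]*state
    }

    // prettyPrint emits one line per reachable state, with transitions
    // sorted by byte so the output is identical from run to run.
    func prettyPrint(start *state) string {
        var sb strings.Builder
        seen := map[*state]bool{}
        var walk func(*state)
        walk = func(s *state) {
            if seen[s] {
                return
            }
            seen[s] = true
            keys := make([]int, 0, len(s.next))
            for b := range s.next {
                keys = append(keys, int(b))
            }
            sort.Ints(keys)
            fmt.Fprintf(&sb, "%s:", s.label)
            for _, k := range keys {
                fmt.Fprintf(&sb, " '%c' → %s", byte(k), s.next[byte(k)].label)
            }
            sb.WriteByte('\n')
            for _, k := range keys {
                walk(s.next[byte(k)])
            }
        }
        walk(start)
        return sb.String()
    }

    func main() {
        match := &state{label: "match"}
        start := &state{label: "start", next: map[byte]*state{'x': match}}
        fmt.Print(prettyPrint(start))
    }

Run against that two-state machine it prints something like “start: 'x' → match”; the payoff comes when the machine has fifty states and you can still scan it.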

“Back to first principles”?

The single best write-up on NFA and regex basics that I’ve ever encountered is Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) by Russ Cox. It’s a discussion of, and reflection on, the regular expression library constructed by Ken Thompson in the mid-Sixties, before he got mixed up in Unix.

What’s annoying is that I had read this before I started wiring NFAs into Quamina, but ignored most of its important lessons due to a combination of not understanding them and thinking that my existing code could do what Cox described. A couple of weeks ago I went back and read it again, and it all made perfect sense and showed me the way forward. So I guess the lesson is that if you’re not Ken Thompson, you’re going to have trouble understanding what he did until you’ve tried and failed yourself?

So, major thanks to Ken for this (and Unix and other things too) and to Russ for the write-up.

Epsilon transitions

Epsilon transitions are the magic bullet that makes NFAs work. Quamina didn’t have them; now it does. There are other bits and pieces, but that’s the core of the thing.

I think the easiest way to explain is by showing you an NFA as displayed by Quamina’s new prettyprinter. It matches the regular expression "x.*9" — note that the " delimiters are part of the pattern:

 758[START HERE] '"' → 910[on "]
 910[on "] 'x' → 821[gS]
 821[gS] ε → 821[gS] / '9' → 551[gX on 9]
 551[gX on 9] '"' → 937[on "]
 937[on "] 'ℵ' → 820[last step]
 820[last step]  [1 transition(s)]
  • There’s an API to attach labels to states as you build automata, which as a side-effect gives each a random 3-digit number too. This is done in a way that can be turned into a no-op at production time.

  • 758: The start state; the only character that does anything is the opening " delimiter which transitions to state 910.

  • 910: You get here when you see the " and the only exit is if you see an x, which moves to 821.

  • 821: This state is the “glob” * operator. gS in its label stands for “glob spin”. It has an “epsilon” (ε) transition to itself. In Computer-Science theory, they claim that the epsilon transition can occur at any time, spontaneously, la-di-da. In programming practice, you take an epsilon transition for every input character. 821 also has an ordinary transition on 9 to state 551. (There’s a code sketch of this structure just after this list.)

    This possibility of having multiple transitions out of a state on the same input symbol, and the existence of epsilon transitions, are the defining characteristics that make NFAs “nondeterministic”.

  • 551: Its label includes gX for “glob exit”. The only transition is on the closing " delimiter, to 937.

  • 937 has only one transition, on ℵ (which stands for the reserved value Quamina inserts to signal the end of input), to 820.

  • 820 doesn’t do anything, but the [1 transition(s)] label means that if you reach here you’ve matched this field’s value and can transition to working on the next field.
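
Here’s a minimal sketch in Go of how such states might be represented and how the automaton above might be wired up. These types, and the aleph constant standing in for Quamina’s reserved end-of-input value, are assumptions for illustration, not Quamina’s actual internals.

    // Hypothetical representation, not Quamina's actual types: a byte can
    // lead to several states, and epsilon transitions live in their own
    // list because they're followed on every input byte.
    type nfaState struct {
        label   string
        epsilon []*nfaState          // followed on every input byte
        next    map[byte][]*nfaState // ordinary byte transitions
    }

    // aleph stands in for the reserved end-of-input value; the actual
    // byte Quamina uses is an assumption here. 0xF5 never occurs in UTF-8.
    const aleph byte = 0xF5

    // buildXDotStar9 wires up the "x.*9" automaton shown above.
    func buildXDotStar9() *nfaState {
        lastStep := &nfaState{label: "last step"}
        onQuote2 := &nfaState{label: "on \"", next: map[byte][]*nfaState{aleph: {lastStep}}}
        globExit := &nfaState{label: "gX on 9", next: map[byte][]*nfaState{'"': {onQuote2}}}
        globSpin := &nfaState{label: "gS", next: map[byte][]*nfaState{'9': {globExit}}}
        globSpin.epsilon = []*nfaState{globSpin} // the ε self-loop that implements the glob
        onQuote1 := &nfaState{label: "on \"", next: map[byte][]*nfaState{'x': {globSpin}}}
        return &nfaState{label: "START HERE", next: map[byte][]*nfaState{'"': {onQuote1}}}
    }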

Now I’m going to display the prettyprint again so you can look at it as you read the next paragraph.

 758[START HERE] '"' → 910[on "]
 910[on "] 'x' → 821[gS]
 821[gS] ε → 821[gS] / '9' → 551[gX on 9]
 551[gX on 9] '"' → 937[on "]
 937[on "] 'ℵ' → 820[last step]
 820[last step]  [1 transition(s)]

A little thought shows how the epsilon-transition magic works. Suppose the input string is "xyz909". The code will match the leading " then x and hit state 821. When it sees y and z, the only thing that happens is that the epsilon transition loops back to 821 every time. When it hits the first 9, it’ll advance to 551 but then stall out, because the following character is 0, which doesn’t match the only path forward, the closing ". But the epsilon transition keeps looping, and when the second 9 comes along it’ll proceed smoothly through 551, 937, and 820, signaling a match. Yay!
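
In code, one step of that traversal might look like the following, continuing the hypothetical types from the sketch above. It’s simplified (a real implementation would deduplicate the resulting state list), but it shows the two rules at work: follow ordinary transitions on the input byte, and take every epsilon transition no matter what the byte is.

    // step consumes one input byte. From every current state it follows
    // the ordinary transitions on b, and also takes every epsilon
    // transition — which is how state gS keeps looping on 'y', 'z', and
    // '0' in the "xyz909" example. A real implementation would also
    // deduplicate the resulting list.
    func step(currentStates []*nfaState, b byte) []*nfaState {
        var nextStates []*nfaState
        for _, s := range currentStates {
            nextStates = append(nextStates, s.next[b]...)
            nextStates = append(nextStates, s.epsilon...)
        }
        return nextStates
    }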

So now, I have a fuzz test which adds a pattern for each of about thirteen thousand 5-letter words, with one * embedded in each at a random offset, including the leading and trailing positions. The add-pattern code hardly slows down at all. The matching code slows down a lot, to below 10,000/second, in stark contrast to most Quamina instances, which can achieve millions of matches/second.

I’m sort of OK with this trade-off; after all, it’s matching 10K-plus patterns! I’m going to work on optimizing it, but I have to accept that the math, as in finite-automata theory, might be against me. But almost certainly there are some optimizations to be had. There are possibilities suggested by Cox’s description of Thompson’s methods. And the search for paths forward will likely be good blog fodder. Yay!

Ken again

When I re-read Russ Cox’s piece, I was looking at the pictures and narrative, mostly ignoring the C code. When everything was working, I went back and was irrationally thrilled that my bottom-level function for one state traversal had the same name as Ken Thompson’s: step().

Also, when you process an NFA, you can be in multiple states at once; see the "xyz909" example above. When you’re in multiple states and you process an input symbol, you might end up in zero, one, or many new states. Russ writes, of Ken Thompson’s code, “To avoid allocating on every iteration of the loop, match uses two preallocated lists l1 and l2 as clist and nlist, swapping the two after each step.”

Me too! Only mine are called currentStates and nextStates because it’s 2024.
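
Here’s that trick as a Go sketch, restructured from the step function above so the hot loop reuses two preallocated slices instead of building a fresh list per byte. Hypothetical code; only the two slice names come from real life.

    // matches runs the automaton over input, reusing two preallocated
    // state lists and swapping them after each byte, so the hot loop
    // stops allocating once the slices reach a steady-state capacity.
    func matches(start *nfaState, input []byte) bool {
        currentStates := make([]*nfaState, 0, 16)
        nextStates := make([]*nfaState, 0, 16)
        currentStates = append(currentStates, start)
        for _, b := range input {
            nextStates = nextStates[:0] // keep capacity, drop contents
            for _, s := range currentStates {
                nextStates = append(nextStates, s.next[b]...)
                nextStates = append(nextStates, s.epsilon...)
            }
            currentStates, nextStates = nextStates, currentStates // the swap
        }
        for _, s := range currentStates {
            if s.label == "last step" { // stand-in for a real matched-state check
                return true
            }
        }
        return false
    }

Fed the automaton from buildXDotStar9 and the bytes of "xyz909" (quotes included) with a trailing aleph, it returns true.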

And thereby hangs a blog or maybe more than one. Because traversing the NFA is at Quamina’s white-hot center. You really REALLY don’t want to be allocating memory in that code path. Which should be straightforward. But it’s not, for interesting reasons that raise optimization problems I’m just starting to think about, but you’ll probably hear all about it when I do.

Un-generic

In the process of moving Quamina from DFAs to mixed DFA/NFA to pure-NFA I adopted and then abandoned Go’s generics. They hate me. Or I’m not smart enough. Or something. I wrote about the experience back in 2022 and while that piece ended inconclusively, I am personally much happier with generics-free Go code. Maybe they make other people happy.

Hard to understand

And then finally, there’s this one function I wrote in June 2022; doesn’t matter what it does. It has a comment at the top that begins: “Spookeh. The idea is that…” and goes on for a long paragraph which, well, I can’t understand. Then I look at the code and think “that can’t work.” I keep thinking of sequences that should send it off the rails and write the unit tests and they fail to fail, and I use the prettyprinter and the NFA it generates is ruthlessly correct. I go back and look at it every few days and end up shaking my head. This is making me grumpy.

But after all, I did write, in a previous Quamina Diary episode: “The observation that computer programmers can build executable abstractions that work but they then have trouble understanding is not new and not surprising. Lots of our code is smarter than we are.”

But I’ll figure it out. And it’s nice to have interesting computer-programming stuff to blog about.

Sex Edit War! 15 Jun 2024, 7:00 pm

In January 2010 I drove twenty-five minutes across Vancouver to the University of British Columbia’s main library, with the goal of crushing an opponent in a Wikipedia edit war. The battleground was the entry on T.E. Lawrence (better known as Lawrence of Arabia). I won that war. As a consequence, I consider myself the world’s leading living expert on Lawrence’s sexuality.

[Note: This is posted alongside Wikipedia Pain, which is about the issues of truth and expertise in Wikipedia editing, in an effort to share what the process feels like from the inside.]

Why Lawrence, anyhow? My Dad, an Alberta farm boy, became a Professor of Agriculture, and spent most of his career in the Third World, much of it in Lebanon and Jordan. As a result, I spent my youth there, with plentiful opportunities for touristing all over the Middle East, including many of the spots that appear in Lawrence’s monumental war memoir Seven Pillars of Wisdom.

I ran across Seven Pillars in college and devoured it, from time to time thinking “I’ve been there!” While it’s full of camel charges, train-bombings, and other Ripping Yarns, it’s a difficult book, not a light read at all. But I enjoyed it and was left wondering who this guy was. So in the course of time I read Lawrence’s other works, some biographies (there are many) and especially, the collected letters.

Lawrence was an avid correspondent, sending letters almost like we do emails, multiple times most days. I suspect that a whole lot of the Lawrence biographers got the idea by reading the letters and like me thinking “who is this guy?” You might want to do a little Lawrence reading.

Conducting archeology on my blog reveals that I apparently noticed Wikipedia in 2003 and had started contributing to the Lawrence article by 2004; in that year I also wrote “Maybe the Wikipedia is a short-lived fad, maybe it’ll get better, maybe it’ll get worse, but I was surprised that nobody pointed this out: The Wikipedia is beautiful. It’s an unexpected and unexplainable triumph of collective creativity and of order over entropy. I hope it lasts a long time, and those who criticize it Just Don’t Get It.”

At that time popular opinions of The Encyclopedia That Anyone Can Edit ranged from a headshaking blow-off of the idea’s obvious craziness to active fear and hostility. British technology journalist Andrew Orlowski once referred to Wikipedians as “Khmer Rouge in daipers” (sic). I became a partisan, wading into the ring against figures as eminent as Bob McHenry, former Editor of the Britannica, who compared Wikipedia to a public toilet: “you can’t be sure who was there before you.” I enjoyed rolling out my rhetorical and polemical cannon and firing back. From December 2004: “One thing is sure: the Wikipedia dwarfs its critics.”

It must be said that back then, the critics had a point. Those of us who waded in early often found entries about major subjects of history or culture which were a stinking mess. Lawrence was one such; a farrago of conspiracy theories and thinly-sourced fantasies.

Sex!

In particular the section about Lawrence’s sexuality, a subject much discussed by his biographers and occasionally in the popular press. The amount of time I’ve put into making this fact-based would probably be regarded as ridiculous by most sane people. [Would they be wrong? -Ed.] [Pretty sure. -T.]

I have plenty of by-and-about-Lawrence books on my shelves and had read more or less every published letter, which I thought gave me a fair claim to knowing him better as a person than your average Wikipedia editor. By dint of dogged incremental citation-backed edits, I was making good progress by 2009 at introducing order to the chaos.

Edit!

Editing Wikipedia involves regular, often intense, disputes about what should be said. These take place on the “Talk” page that is associated with each article. For a contentious entry, such as Lawrence’s had become, the Talk page can become huge, much larger than the entry itself.

In these disputes, the criteria that matter are “notability” and “verifiability”. To be included, a subject must be notable, i.e. worth mentioning. When is something notable? If, and only if, there are mentions of the subject in multiple credible mainstream sources. Further, any assertion must be verifiable, i.e. there is evidence to establish that the claims in the material are correct. Both criteria are addressed by providing citations from Reliable Sources.

On the subject of verifiability, Wikipedia says to the world: Any material that is not verifiable will eventually be removed. That tenet gives a warm glow to those of us who live on the Internet and care a lot about truth and untruth.

The subject at hand was homosexuality. First, had Lawrence been gay? Second, what was his attitude toward gay people? Remember, this is a man who died in 1935; in his lifetime, homosexuality was publicly much disapproved-of and in fact specifically forbidden by law.

I thought I had the facts on my side. Whatever Lawrence’s orientation, there was no evidence of consensual intimacy with anyone of any gender, and he repeatedly and explicitly denied, in private correspondence, any experience of sex.

On the other hand, his writing includes multiple warm, approving remarks about male/male sexual relationships. So I thought the case for “celibate and tolerant” was pretty well open and shut.

War!

But then I found I had an adversary.

“Factuarius” – the handle of another active Wikipedia editor – came to fight. For reasons opaque to me, Factuarius was pretty convinced that Lawrence had been gay and/or disapproved of homosexuality. He was able to assemble citations where people had alleged relationships between Lawrence and one or another male person, but this was well-plowed ground; biographers had found an absence of evidence for the relationships and reasonably convincing reasons to doubt their having happened.

Factuarius decided that Lawrence’s having disapproved of homosexuality was the hill he was going to die on. He triumphantly produced two citations that supported his position, declared victory, and told me to stand down.

The first was “Khondakar Golam Mowla, 2008 p. 258”. The book is The Judgment Against Imperialism, Fascism and Racism Against Caliphate and Islam: Volume 1. You can buy it from Amazon for $36.49 as I write this. It turns out it is self-published at “AuthorHouse” and that its Foreword denounces, among other things, “Ataturk, a secret Jew”. The tone generally follows from there. I pointed out to Factuarius that I could go to AuthorHouse and generate a book claiming Lawrence was from Mars.

That left him hotly defending his last reference, a Lawrence letter cited in “Homosexuality and Orientalism: Edward Carpenter's journey to the east, P. K. Bakshi, Prose Studies, Volume 13, Issue 1 May 1990, pages 151-177, Routledge”. Seeing no alternative, I made that drive over to the nearest big University research library.

It took a while to track down Prose Studies, whose dusty and clearly-unvisited volumes occupy quite a few shelf-feet. It was founded in 1977 and the Internet tells me it’s still publishing. I really don’t know what this journal is for or what effect on the world, if any, its existence is designed to achieve. [Arrogant, much? -Ed.] [Trying to be polite. -T.]

Sure enough, the article about Edward Carpenter was there in the May 1990 volume. I read it. I photographed (badly, with a 2010 phone-cam) the title and index pages to prove that I had done so. The article mentioned Lawrence twice, suggesting in an off-handed way that he was an example of English fascination with homosexuality and “the Orient”. But there was nothing there that looked like Factuarius’ citation.

Victory!

I was left happy for multiple reasons. It is a wonderful thing that research libraries exist and preserve academic journals for their own sake, whether or not any human will ever consult their pages. It was pretty cool playing scholarly sleuth in the quiet passages of the library. Best of all, Factuarius retired silently from the fray.

Which was actually a pretty minor scuffle by Wikipedia standards. There is a hilarious page entitled Wikipedia:Lamest edit wars, which I recommend just for fun. It even categorizes them. The first-appearing category is “Ethnic and national feuds”, featuring the titanic struggles over the ancestries of Frédéric Chopin and Freddie Mercury. So far, none of these has metamorphosed into a real actual nation-against-nation shooting war, but I’m not saying it couldn’t happen.

Eventually I took the trouble of collecting every citable fact about Lawrence’s sexuality that I could find in all the known published resources – online search in the Gutenberg Project and various other sources helped. I published them in a blog piece entitled Sex and T.E. Lawrence, which has been useful in subsequent (much less dramatic) editing disagreements.

Finally, I gave a talk at a social-media conference sometime in the 2000s entitled Editing Wikipedia in which I had great fun relating this episode, and I think the audience did too. In particular, reading out spicy passages illustrating Lawrence’s real kink – there’s strong evidence that he was a masochist. For example, in later life, he paid to have himself whipped “severely enough to produce a seminal emission”.

The effect, at the end of all this, was that material that was not verifiable – an assertion about a historically-notable person’s viewpoint on a particular issue – was, as it should be, removed from Wikipedia.

Also, pursuing the truth can be its own reward.

Wikipedia Pain 15 Jun 2024, 7:00 pm

There are voices — some loud and well-respected — who argue that Wikipedia is deeply flawed, a hellscape of psychotic editors and contempt for expertise. I mostly disagree, but those voices deserve, at least, to be heard.

[Note: There’s a companion blog post, Sex Edit War!, about my own experience in a Wikipedia Edit War. (I won! It was fun!) I hope it’ll make some of this narrative more concrete.]

Background

If you look at this post’s Reference Publishing topic, you’ll see a lot of Wikipedia-related material. I was one of its early defenders against the early-days waves of attackers who compared it to a public toilet and its editors to the Khmer Rouge.

I should also disclose that, over the years, I have made some 2,300 Wikipedia edits, created seven articles, and (what makes me happiest) contributed 49 images which have been used, in aggregate, 228 times.

I say all this to acknowledge that I am probably predisposed to defend Wikipedia.

What happened was…

Somebody spoke up on the Fediverse, saying “I wonder if reporters know that Wikipedia hallucinates too??” I’m not giving that a link, since they followed up with a post asserting that ChatGPT is better than Wikipedia. Life’s too short for that.

Anyhow, I replied “The difference is, errors in Wikipedia tend to get systematically fixed. Sometimes it takes more work than it should, but the vast majority of articles are moving in the right direction a vast majority of the time.” Much discussion ensued; follow the threads.

Shortly thereafter, the redoubtable JWZ complained about an edit to his page and I spoke up noting that the edit had been reversed, as bad edits (in my experience) usually are. That conversation branched out vigorously, dozens of contributions. Feel free to trawl through the Fediverse threads, but you don’t have to, I’ll summarize.

Gripe: Bad editors

This kept coming back.

I dunno. I don’t want to gaslight those people; if that’s the experience they had, that’s the experience they had. My own experience is different: The editors I’ve interacted with have generally been friendly and supportive, and often exceptionally skilled at digging up quality citations. But I think that these reports are something Wikipedia should worry about.

Gripe: Disrespect of expertise

By number and volume of complaints, this was the #1 issue that came up in those threads:

I generally disagree with these takes. Wikipedia not only respects but requires expert support for its content. However, it uses a very specific definition of “expert”: Someone who can get their assertions published in one or more Reliable Sources.

I think that if you’re about to have an opinion about Wikipedia and expertise and citations, you should give that Reliable-Sources article a careful read first. Here’s why: It is at the white-hot center of any conversation about what Wikipedia should and should not say. Since Wikipedia is commonly the top result for a Web search, and since a couple of generations of students have been taught to consult but not cite it, the article is central to what literate people consider to be true.

Let’s consider the complaints above. Mr Dear literally Wrote the Book. But, I dunno. I went and looked at the PLATO article and subjects linked to it, and, well, it looks good to me? It cites Mr Dear’s book but just once. Maybe the editors didn’t think Mr Dear’s book was very good? Maybe Dear says controversial things that you wouldn’t want to publish without independent evidence? The picture is inconclusive.

As for Mr O’Neill’s complaint, no sympathy. Given the social structure of capitalism, the employees and leadership of a company are the last people who should be considered Reliable Sources on that company. Particularly on anything that’s remotely controversial.

Mr Zawinski is upset that the person who chooses citations from Reliable Sources “knows nothing”, which I take to be an abbreviation for “is not a subject-matter expert”. There’s some truth here.

When it comes to bald statements of fact, you don’t need to be an expert; if more than one quality magazine or academic journal says that the company was incorporated in 1989, you don’t need to know anything about the company or its products to allow “founded in 1989” into an article.

On the other hand, I think we can all agree that people who make significant changes on articles concerning complex subjects should know the turf. My impression is that, for academic subjects, that condition is generally met.

Mr Rosenberg, once again, is upset that his personal expertise about the PS3 is being disregarded in favor of material sourced from a gamer blog. I’d have to know the details, but the best possible outcome would be Mr Rosenberg establishing his expertise by publishing his narrative in a Reliable Source.

Bad Pattern

There’s a pattern I’ve seen a few times where a person sees something in Wikipedia in an area where they think they’re knowledgeable and think it’s wrong and decide “I’ll just fix that.” Then their edits get bounced because they don’t include citations. Even though they’re an “expert”. Then that person stomps away fuming publicly that Wikipedia is crap. That’s unfortunate, and maybe Wikipedia should change its tag-line from “anyone can edit” to “anyone who’s willing to provide citations can edit.”

Implications

This policy concerning expertise has some consequences:

  1. The decision on who is and isn’t an expert is by and large outsourced to the editorial staff of Reliable Sources.

  2. There are ferocious debates among editors about which sources are Reliable and which are not, in the context of some specific article. Which is perfectly appropriate and necessary. For example, last time I checked, Fox News is considered entirely Reliable on the finer points of NFL football, but not at all on US politics.

  3. There are many things which people know to be true but aren’t in Wikipedia and likely never will be, because no Reliable Source has ever discussed the matter. For example, I created the East Van Cross article, and subsequently learned the story of the cross’s origin. I found it entirely convincing but it was from a guy I met at a friend’s party who was a student at the high school where and when the graphic was first dreamed up. I looked around but found no Reliable Sources saying anything on the subject. I doubt it’ll ever be in Wikipedia.

What do you think of those trade-offs? I think they’re pretty well OK.

The notion that anyone should be allowed to add uncited assertions to Wikipedia because they think they’re an expert strikes me as simultaneously ridiculous and dangerous.

Real problems

Obviously, Wikipedia isn’t perfect. There are two problems in particular that bother me all the time, one small, one big.

Small first: The editor culture is a thicket of acronyms and it’s hard to keep them straight. I have considered, in some future not-too-fierce editorial debate, saying “Wait, WP:Potrezebie says you can’t say that!” Then see if anyone calls me on it.

The big problem: The community of editors is heavily male-dominated, and there have repeatedly been credible accusations of misogyny. I have direct experience: I created the article for Sarah Smarsh, because we read her excellent book Heartland in my book club, then I was shocked to find no entry. Despite the existence of that mainstream-published and well-reviewed book, and the fact that she had published in The Guardian and the Columbia Journalism Review, some other editor decreed that that was insufficient notability.

At the time, I reacted by gradually accumulating more and more citations and updating the draft. Eventually she published another book and the argument was over. These days, in that situation I would raise holy hell and escalate the obstruction up the Wikipedia stack.

To Wikipedia’s credit, its leadership knows about this problem and gives the appearance of trying to improve it. I don’t know the details of what they’re trying and whether they’re moving the needle at all. But it’s clearly still a problem.

Once again…

I stand by what I said in December 2004: Wikipedia dwarfs its critics.

Parable of the Sofa 1 Jun 2024, 7:00 pm

When Lauren was pregnant with a child who’s now turning 25, we purchased a comfy dark-brown leather sofa which fits our living room nicely. What with kids and relatives and employees and cats and Standards Committees and friends and book clubs and socials, the butt-support cushions had, a quarter century later, worn out. So we had them replaced, at a fair price, by a small local business. Which is something that modern capitalism is trying to make impossible.

Worn leather with a phone for scale

I’ll be honest; when we realized how ratty the sofa was getting, my first thought was “crap, gonna have to buy a sofa”. But Lauren said “No, because new sofas are junk. Also, Luxcious.”

I’ll get to Luxcious in a bit, but it turns out that new sofas, by and large, really are. Why would that be? Well, check out Why Are (Most) Sofas So Bad? in Dwell magazine, which has a weirdly-intermittent paywall; here’s another version.

From early in the piece: “Sofas made in the past 15 years or so are absolute garbage, constructed of sawdust compressed and bonded with cheap glue, simple brackets in place of proper joinery, substandard spring design, flimsy foam, and a lot of staples.” It’s excellent, well-written, and will take you some surprising places.

But the subtext is drearily familiar. Globalization: Check. Cheap-labor arbitrage: Check. Tax engineering: Check. High profits: Check. Flat-packing: Check. Late Capitalism: Check check fucking check.

Quality furniture is expensive to make, and should be. But it doesn’t wear out fast, and thus deserves extended maintenance.

Luxcious

Its Web site (“Breathe new life into old furniture”) is way prettier than its location, in an old and extremely miscellaneous high-traffic zone: auto-body shops, hipster lounges, self-storage, beauty supplies…

Luxcious on Main Street

They’re family-run and idiosyncratic. You have to know how to find the sketchy rear parking lot and walk in the back door. But they’re friendly and competent. Here’s the new leather they bought for the cushions.

One cow’s worth of leather

And here’s the sofa with the re-covered cushions in place.

Sofa with refinished cushions

Yes, from this angle, the new cushions make the sofa’s back look shabby, but it’s not as obvious to the naked eye and after a decade or so we’ll never notice it.

The whole job cost us $1100 Canadian. Given that the sofa cost three-thousand-plus 1999 dollars and new leather sofas of the “not flat-packed sawdust and glue” variety quickly get into five figures, the choice was a no-brainer.

“Lifestyle”

This kind of transaction is exactly what modern capitalism is trying to stamp out.

A single-location family-owned business that provides a living for a few people? With no plans to load up on debt or other financial engineering? Or for growth into unicorn status? No GenAI dimension? No marketing or public-relations people?

In conversation with venture capitalists, you hear the phrase “lifestyle business”, meaning one that is doing nicely and rewarding the people who run it and which isn’t planning for unbounded growth. The words “lifestyle business” are always, of course, uttered in a voice dripping with contempt. Luxcious is a lifestyle business.

It seems blindingly obvious that an economy with a higher proportion of lifestyle businesses is going to be more resilient, more humane, and immensely more pleasant than the one that the Leaders Of Industry are trying to build.

How would we get there from here? I’m not smart enough to figure out what the regulatory regime is that would ban most of what private-equity does and tilt the playing field in favor of resilient lifestyle businesses.

But I’d sure vote for a political party that convinced me it was trying to achieve that.

Tedeschi Trucks 26 May 2024, 7:00 pm

Saturday night we went to a concert by the Tedeschi Trucks Band (TTB). It was excellent and this is partly a review, but mostly a challenge to the community of touring musicians: “Why aren’t your production values as good as TTB’s?”

Just the Facts

TTB lives squarely in the middle of the Southern Rock genre, as invented by the Allman Brothers in 1970 or so. Derek Trucks is the nephew of the Allmans’ original drummer Butch Trucks and performed in a later iteration of that band. Susan Tedeschi had a successful career as a touring and recording blueswoman. Then she and Derek got married and merged their acts.

Tedeschi Trucks Band in concert in Vancouver in 2024

It’s a twelve-piece band: Susan and Derek on guitar, three backup vocalists, three horns, keyboards, bass, and two drummers (one white, one black, per the Southern-Rock canon). The music is blues and soul, wandering into rock. Some of the songs are their own, others genre chestnuts (Statesboro Blues, High Time We Went). They played a three-hour show, but with not that many songs, because every tune features extended instrumental sections. All twelve members got a chance to shine, Derek had a break on every song, and Susan on quite a few.

What was great

Only a couple of the songs weren’t memorable; they write well and cover only the best chestnuts. The musicianship was stellar, with electric guitar front and center. Derek is fluid and effortless, with beautiful tone; Susan solos less but actually plays more interesting stuff. Susan’s the lead voice but four other members are singers, they all got a featured spot and were all pretty great. Susan doesn’t have the vocal range or the shriek, but she had the most soul.

What was best, though — out into “fucking awesome” territory — was what classical musicians call “ensemble” and I guess I’d call “band musicianship”. The songs’ arrangements are just razor-sharp, full of shifts and breaks and little moments of drama and grace, intros and outros and bridges. The players were effortlessly locked onto the center of the rhythm, “so tight they were loose” as the saying goes. The amount of practicing this takes must be epic.

Which was brilliantly supported by the sound people. Every instrument and voice was distinct and clear, and the dynamic range was maybe the best I’ve ever heard from an electric-guitar-based band. Every moment was multilayered and you could hear all the layers.

You could tell (well, if you know something about concert sound, you could) that, at the soundboard, they were intervening judiciously, for example cranking the horns (or backup singers) and fading the guitars when that’s what the song needed.

It was an audience that was fun to be part of, enthusiastically cheering all the solos and regularly leaping to their feet when a song hit the big up-curve. Especially impressive given that plenty of the crowd was old enough to have been there for the birth of Southern Rock.

On top of which, the lighting was subtle and dramatic and tasteful, and only once in the whole three-hour show did they hurt my brain by obnoxiously flashing brilliant lights in my eyes.

Thus my challenge:

To every touring band: Be like TTB!

Seriously: my time on earth covers most of the history of live electric-music performance, plus I’m an audiophile, and for most of my life, most of the sound has been shitty. But in the last few years I’ve regularly heard sound that was better than acceptable, and occasionally dazzlingly-good. But TTB is the most impressive combination I’ve heard of big ensemble, plenty of electric guitar, and sparkling sound.

There is sort of an excuse: Rock, historically, has been carefully engineered to sound good on car radios; specifically the kind of car radios owned by impecunious youth. Dynamic range and layering are not features of this landscape.

Anyhow, my exposure to TTB, prior to this, has been mostly YouTube, and I’ve enjoyed them, but I dunno, now that I’ve heard the real thing, I suspect the online version will feel thin.

If TTB can do it, any band can. But plenty still don’t. That’s increasingly just not acceptable. I wonder if things will start to get generally better? Because I’m pretty sure the musicians care.

Other observations

Running a 12-piece operation must be freaking expensive. I would love to hear the details of the economics. Saturday night they filled a 2600-seat hall with an average ticket price around C$120. So that’s over C$300K gross. The hall costs C$21K and then there’s Ticketmaster’s cut, which if the claims of the recent DOJ litigation are to be believed, would be egregious.

I wonder how a TTB song gets built? In particular, who does the arrangements? Whoever it is, I’m a fan.

Lauren and I were masked (N95) and looking across the audience as far as we could see revealed one other masked person. I dunno, 2600 people in an enclosed space. Call me crazy, but… no, call them crazy. I’m serious.

Unusually, there were huge line-ups for the men’s washrooms, almost none for the women’s. The lady in the row behind us cackled and said “boomer prostates.”

The Colors of Racism 17 May 2024, 7:00 pm

Recently, somewhat by accident, I stumbled into reading a couple of monstrously racist texts, and I’m going to need to update the Wikipedia entry for a famous British author. But I learned a few things along the way that I want to share.

Disclosure

I try to be antiracist, but I don’t think I’m particularly good at it. I sometimes have bigoted feelings but try hard to recognize and not act on them. I’m convinced that humans are naturally tribal and antiracist work will continue to be required for the foreseeable future.

The Author

Anthony Trollope (1815-1882) wrote 47 novels. I generally like them and we own a whole shelf-full. They are funny and tender and cynical; his characters love and marry and go into business and get elected to Parliament and are corrupt and engage in furious professional conflict. Those characters are, without exception, “gentle”, by which I mean members of the British ruling class.

Anthony Trollope in 1864.

When I was traveling around the world a lot, regularly crossing major oceans before the era of in-air Internet, Trollope was a regular companion; his books tend to be big and thick and extremely readable. Want to get started? Barchester Towers, about a bitter feud among the clergymen of an English country town, is one of the funniest books ever written; also there’s an excellent BBC adaptation, with Alan Rickman deliciously smarmy as the horrid Mr Slope.

What happened was…

I’m on a publishing-oriented mailing list and someone wrote “I stumbled on the fact that Trollope wrote a book that describes race relations in the British West Indies” and someone wrote back “It’s a travelogue not a novel, it’s called The West Indies and the Spanish Main, and be careful, that race-relations stuff may not be pleasant to read.” On a whim, I summoned up the book from our excellent public-library system and, oh my goodness gracious, that “not pleasant” was understating it.

The book

Trollope earned his living, while he was establishing his literary career, as an official of the British Post Office, rising to a high level in the organization and not leaving it until he was almost 50.

In 1859, he was sent to reorganize the Post Office arrangements in the West Indies and the “Spanish Main”, the latter meaning southern Central America and northern South America. The expedition lasted several months and yielded this book. In his autobiography, Trollope wrote that he thought it “the best book which has come from my pen.” I think history would disagree. It’s on the Internet Archive, but I’m not linking to explicit racism.

So why am I going to write about it?! Because now, 165 years after this book, racism and its consequences remain a central focus of our cultural struggles. Understanding the forces we combat is kind of important. Also, I recently researched and wrote about the Demerara Rebellion (of the enslaved against their oppressors, in 1823) so I have more context on Trollope’s observations than most.

Background

Trollope’s tone is grumpy but good-humored. In the places he visits, he is generally contemptuous of the hotels, the food, the weather, and the local government.

The main narrative starts in Jamaica. By way of background, slavery had been abolished in 1833, just 25 years before. Many of the sugar plantations that occupied most of Jamaica had collapsed. Thus this:

By far the greater portion of the island is covered with wild wood and jungle… Through this, on an occasional favourable spot, and very frequently on the roadsides, one see the gardens or provision-grounds of the negroes…

These provision-grounds are very picturesque. They are not filled, as a peasant’s garden in England or in Ireland is filled, with potatoes and cabbages, or other vegetables similarly uninteresting in their growth; but contain cocoa-trees, breadfruit-trees, oranges, mangoes, limes, plantains, jack frout, sour-sop, avocado pears, and a score of others, all of which are luxuriant trees, some of considerable size, and all of them of great beauty… In addition to this, they always have the yam, which is with the negro somewhat as the potato is with the Irishman; only that the Irishman has nothing else, whereas the negro generally has either fish or meat, and has also a score of other fruits beside the yam.

We wouldn’t use that word any more to describe Black people, but it was thought courteous in Trollope’s day. He does deploy the N-word, albeit rarely, and clarifying that it was normally seen, even back then, as an insult.

The bad stuff

It comes on fast. In the Jamaica chapter, the first few subheadings are: “Introduction”, “Town”, “Country”, “Black Men”, “Coloured Men”, and “White Men”. That “Black Men” chapter begins with six or so pages of pure racist dogma about the supposed shortcomings of Black people. I will not enumerate them, and obviously none stand up to the cold light of scientific inquiry.

But then it gets a little weird. Trollope notes that “The first desire of a man in a state of a civilization is for property… Without a desire for property, man could make no progress.” And he is harsh in his criticism of the Black population for declining to work long shifts on the sugar plantations in hopes of building up some capital and getting ahead.

And yet Trollope is forced to acknowledge that his position is weak. He describes an episode of a Black laborer knocking off work early and being abused by an overseer, saying he’ll starve. The laborer replies “No massa; no starve now; God send plenty yam.” Trollope muses “And who can blame the black man? He is free to work or free to let it alone.” It is amusingly obvious that this is causing him extreme cognitive dissonance.

And he seems shockingly oblivious to issues of labor economics. On another occasion it is a group of young women who are declining the hot nasty work in the cane fields:

On the morning of my visit they were lying with their hoes beside them… The planter was with me, and they instantly attacked him. “No, massa; we no workey; money no nuff,” said one. “Four bits no pay! no pay at all!” said another. “Five bits, massa, and we gin morrow ’arly.” It is hardly necessary to say that the gentleman refused to bargain with them… “But will they not look elsewhere for other work?” I asked. “Of course they will,” he said; “… but others cannot pay better than I do.”

(A “bit” was one eighth of a dollar; I can remember my grandfather referring to a quarter, i.e. a 25¢ coin, as “two bits”.)

They’re demanding a 20% raise and, as is very common today, the employer deems that impossible.

Trollope contrasts the situation in Barbados, where there is no spare land and thus no “provision grounds” and the working class (in this case, all-Black) is forced to labor diligently for their daily bread; and is confident that this is better.

He also visits Cuba, where slavery is still legal, and visits a plantation with an enslaved workforce: “During the crop time … from November till May, the negroes sleep during six hours out of the twenty-four, have two for their meals, and work for sixteen! No difference is made on Sunday.” Trollope’s biggest concern was that the enslaved received no religious instruction nor opportunities to worship.

Trollope regularly also has to wrestle with the tension that arises when he meets an accomplished or wise or influential Black person. For example, upon arriving in New Amsterdam (in Demerara):

At ten o’clock I found myself at the hotel, and pronounce it to be, without hesitation, the best inn, not only in that colony, but in any of these Western colonies belonging to Great Britain. It is kept by a negro, one Mr. Paris Brittain, of whom I was informed that he was once a slave… he is merely the exception which proves the rule.

Here are two more samples of Trollope twisting himself in knots over what seems to him an economic mystery.

But if the unfortunate labourers could be made to work, say four days a week, and on an average eight hours a day, would not that in itself be an advantage? In our happy England, men are not slaves; but the competition of the labour market forces upon them long days of continual labour. In our own country, ten hours of toil, repeated six days a week, for the majority of us will barely produce the necessaries of life. It is quite right that we should love the negroes; but I cannot understand that we ought to love them better than ourselves.

The complaint generally resolves itself to this, that free labour in Jamaica cannot be commanded; that it cannot be had always, and up to a certain given quantity at a certain moment; that labour is scarce, and therefore high priced, and that labour being high priced, a negro can live on half a day's wages, and will not therefore work the whole day — will not always work any part of the day at all, seeing that his yams, his breadfruit, and his plantains are ready to his hands.

In what sense is England “happy”? Granted, it’s obvious from the point of view of the “gentle” ruling class, none of whom are doing manual labour sixty hours per week.

That aside, the question he raises still stands, two centuries later: Why should anyone work harder than they need to, when the benefits of that work go to someone else?

“Coloured”

There’s lots more of this, but it’s worth saying that while Trollope was racist against Blacks, he was, oddly, not a white supremacist. He considers the all-white colonial ruling class to be pretty useless, no better than the Blacks he sneers at, and proclaims that the future belongs to the “coloured” (i.e. mixed-race) people. He backs this up with some weird “Race Science” that I won’t go into.

Unforgivable

Trollope’s one episode of pure venom is directed at the already-dying-out Indigenous people of the region, pointing out with approval that one of the island territories had simply deported that whole population, and suggesting that “we get rid of them altogether.” This seems not to be based on race but on the observation that they “more than once endeavoured to turn out their British masters”. Colonialism is right behind racism in the line-up of European bad behaviors. It may also be relevant that he apparently did not meet a single Indigenous West-Indian person.

Meta-Trollope

I finished reading The West Indies and the Spanish Main because Trollope’s portrayals of what he saw were so vivid and I couldn’t help being interested.

I had read Trollope’s autobiography and some more bits and pieces about him, and had encountered not a word to the effect that whatever his virtues and accomplishments, he was shockingly racist. So I checked a couple of biographies out of the local library and yep, hardly a mention. One author noted that The West Indies and the Spanish Main was out of tune with today’s opinions, but there was no serious discussion of the issue. Wikipedia had nothing, and still doesn’t as I write this, but I plan to fix that.

I dug a little harder here and there around the Internet and turned up nothing about anti-Black racism, but a cluster of pieces addressing antisemitism; see Troubled by Trollope? and Why Anthony Trollope Is the Most Jewish of the Great English Novelists. There are a few Jews in Trollope’s novels, ranging from wholly-admirable heroes (and heroines) to revolting villains. So you might think he comes off reasonably well, were it not for casual splashes of antisemitic tropes; the usual crap I’m not going to repeat here.

In case it’s not obvious, Trollope’s writings and opinions were strikingly self-inconsistent, often within the course of a few pages. Well, and so is racism itself.

At that point in history there was an entire absence of intersectionalist discourse about racism being, you know, intrinsically bad, and there were many who engaged in it enthusiastically and sincerely while remaining in polite society.

Trollope’s racism is undeniable, but then he (once again, inconsistently) sounds non-racist in theory. (However, he was gloomy about the attitudes of the white population.) Check this out:

It seems to us natural that white men should hold ascendency over those who are black or coloured. Although we have emancipated our slaves, and done so much to abolish slavery elsewhere, nevertheless we regard the negro as born to be a servant. We do not realize it to ourselves that it is his right to share with us the high places of the world, and that it should be an affair of individual merit whether we wait on his beck or he on ours. We have never yet brought ourselves so to think, and probably never shall.

That text feels remarkably modern to me. I am a little more optimistic than he is in his closing four words; some white people work hard at antiracism. But for a lot of white people, his take remains depressingly accurate.

Degrees of racism?

I suspect that, if Trollope were with us today, his writings would probably be conventionally antiracist. His opinions were solidly in his era’s mainstream and I suspect he would find himself in ours, because he was really a pretty conventional and actually kind of boring person.

With the single exception of those two sentences about the Indigenous people, he seems to exhibit no particular emotional bias against any ethnic group.

Why, you might wonder, do I mention this? Therein lies a tale. In his autobiography, when he discusses The West Indies and the Spanish Main, he notes that it received a favorable review in The Times of London. I thought I’d like, for the sake of context, to read that. (Thanks to William Denton for retrieving the page images.)

I certainly didn’t enjoy reading The West Indies (unsigned) from early 1860 in The Times. It fills most of a broadsheet page, dozens of column-inches one after the other oozing vitriolic hate of Black people. I’m not going to even try to describe it any further; I felt literally nauseated in reading and didn’t make it through to the end.

I suspect that if that Times writer were with us today, he’d be an unreconstructed alt-right dog-whistler, a good ole boy in a MAGA hat.

Reading this crap made me feel a little less angry about Trollope, who generally liked people. Here’s what I think I learned: Racism comes in multiple flavors. There are some people (like Trollope) who are intersectionally bigoted in a sort of unthinking and incurious way, but not that emotionally bound to it. These are the people that need to hear the antiracist message, loudly and clearly, over and over. Because they might listen and learn.

Then there are the others. In 1860, that Times reviewer. Today, the slave-state GOP MAGAs, the Israeli settler movement, Modi’s Hindutva hoodlums. They genuinely hate The Other, down in their bellies. It’s how they define themselves. Talking to them is useless. They have to be defeated and removed from positions of power and influence. Then, thankfully, they can be ignored. Because listening to them is useless too.

Storage Churn 5 May 2024, 7:00 pm

What are the highest-impact Cloud services? Storage would be near the top of any list. Where by “Storage” I mean what S3 does: Blobs-of-bytes storage that is effectively unlimited in capacity, credibly more durable than anything you could build yourself, and easily connected to the world, either directly or through a CDN. I think we’re entering a period of churn where there’s going to be serious competition on storage price and performance. Which, by the way, is crucially relevant to the Fediverse.

Let’s start with AWS, since they invented the modern Storage concept. The most important thing about S3 is this: There appear to be zero credible reports of S3 data loss. Given the number of objects it holds, and the number of years it’s held them, that’s remarkable.

It’s a safe place to store your data. Yeah, the API is a little klunky, and the latency can be high, and the hardwired bucket/object hierarchy is annoying, and so are the namespace issues. And it’s not cheap.

But it’s safe. And fast enough to be useful. And safe. And dead easy to connect up to a CDN. And did I mention that it’s safe?

S3…

AWS, to their credit, aren’t resting on their laurels. Here is a good Register interview with Andy Warfield, one of the lead S3 engineers and also a really good person. He’s talking about another variation on the basic S3 service, called “Express”, which has more filesystem-y semantics, higher performance, but (reading between the lines) a little less durability? (Also, more expensive.)

What’s notable about S3 isn’t this particular feature, but the fact that AWS keeps rolling out new ones. So it’s a moving target for the emerging competition.

…but cheaper…

In recent years and especially over the last few months, alternatives and competitors to S3 keep crossing my radar. A bunch of them have a premise that’s essentially “S3-compatible, but cheaper”: Backblaze B2, Digital Ocean Spaces, Wasabi, IDrive e2, Cloudflare R2, and Telnyx Cloud Storage. I’m sure I’ve missed some.

…and faster!

Some of the products claim to be way faster. Which matters if it’s true, but so far I don’t know of any popular benchmarking standards, so I’d take the numbers with a grain of salt. If I really cared, for a big project, I’d want to try it with my own code.
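To make “try it with my own code” concrete: here’s a minimal Go sketch (bucket and object names are made up, and real benchmarking would want percentiles, concurrency, and a spread of object sizes) that times a run of GETs through the v2 AWS SDK and reports the mean:

package main

import (
    "context"
    "fmt"
    "io"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
    ctx := context.Background()
    cfg, err := config.LoadDefaultConfig(ctx)
    if err != nil {
        panic(err)
    }
    client := s3.NewFromConfig(cfg)
    const n = 100
    var total time.Duration
    for i := 0; i < n; i++ {
        start := time.Now()
        out, err := client.GetObject(ctx, &s3.GetObjectInput{
            Bucket: aws.String("my-benchmark-bucket"), // made-up name
            Key:    aws.String("sample-object"),       // ditto
        })
        if err != nil {
            panic(err)
        }
        io.Copy(io.Discard, out.Body) // include the full body read in the timing
        out.Body.Close()
        total += time.Since(start)
    }
    fmt.Printf("mean GET latency over %d requests: %v\n", n, total/n)
}

Point it at each contender in turn and you’ve at least got numbers you can compare.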

Here are a few of those:

S2

See Designing serverless stream storage. This is still more a research project than a product, but I drop it in here because it says that access to S3 Express made it possible. Its claim to fame appears to be higher performance.

Tigris

Tigris offers what they describe as “Globally Distributed S3-Compatible Object Storage”. I think the best description of what that means is by Xe Iaso of Fly.io, in Globally Distributed Object Storage with Tigris. It’s not just well-written, it’s funny. Apparently Fly.io bundles Tigris in, with command-line and billing integration.

Bunny

“The fastest object storage, replicated to the edge” is their big claim.

CDN?

Bunny sounds like it’s partly a CDN. And it’s not the only one. Which makes obvious sense; if you want to deliver the stuff you’re storing to users around the world at scale, you’re going to be hooking your storage and CDN together anyhow. So those lines are going to stay blurry.

Compatibility and intellectual property

S3 compatibility is an issue. It’s interesting that AWS has apparently decided not to defend the S3 API as intellectual property, and so these things cheerfully claim 100% plug-compatibility. And when they don’t have it, they apologize (that apology looks unusually far under the covers; I enjoyed reading it).
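In practice, “S3-compatible” means your existing SDK code pointed at a different endpoint. A sketch, assuming a recent v2 AWS SDK for Go; the endpoint URL is illustrative, not any real provider’s (each one documents its own):

package main

import (
    "context"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

// newCompatibleClient builds a stock S3 client aimed at a non-AWS store.
func newCompatibleClient(ctx context.Context) (*s3.Client, error) {
    cfg, err := config.LoadDefaultConfig(ctx)
    if err != nil {
        return nil, err
    }
    return s3.NewFromConfig(cfg, func(o *s3.Options) {
        o.BaseEndpoint = aws.String("https://storage.example.com") // illustrative URL
        o.UsePathStyle = true // some compatible stores want path-style addressing
    }), nil
}

func main() {
    client, err := newCompatibleClient(context.Background())
    if err != nil {
        panic(err)
    }
    _ = client // from here, PutObject/GetObject calls work as usual
}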

Durability?

They may claim compatibility, but mostly do not claim equivalent durability. I’ll be honest; if I were picking one, that would worry me. I’d need to see pretty full disclosure of how the services work under the covers.

Unknowns

I just mentioned durability, which is a technology issue. The other big unknowns are about business not technology. First of all, can you sustainably make money selling storage at a price that undercuts AWS’s? I haven’t the vaguest idea.

Second, is this a threat to AWS? There is a vast amount of data that is never gonna migrate off S3 because who’s got the time for that, but if the competition really can save you a lot of money that could hit S3’s growth hard, and Amazon wouldn’t like that. Who knows what might happen?

Now let’s change the subject.

Fediverse storage

I’ll use myself as a Fediverse example. As I write this, my @timbray@cosocial.ca Mastodon account has just over 18K followers, distributed across 3K-and-change instances. So whenever I post a picture or video, each of those instances fetches it and then keeps its own copy, if only in a short-lived cache.

All these files are immutable and identical. Smell an opportunity? Yeah, me too. Someone needs to build an object-store/CDN combo (I’ve already heard people say “FDN”). The API should cater to Mastodon’s quirks. You could split the cost equally or deal it out in proportion to traffic, but either way, I think there’d be big cost savings for nearly every instance.
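One plausible way an FDN could kill the duplication, sketched in Go; to be clear, this is my speculation, not anybody’s shipping design, and the file name is made up. Derive the object key from a hash of the content, so every instance storing the same attachment converges on the same object:

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "os"
)

// contentKey maps identical bytes to an identical object key, no
// coordination between instances required.
func contentKey(path string) (string, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return "", err
    }
    sum := sha256.Sum256(data)
    return "media/" + hex.EncodeToString(sum[:]), nil
}

func main() {
    key, err := contentKey("cat-photo.jpg") // made-up file name
    if err != nil {
        panic(err)
    }
    // Any instance uploading this file computes the same key, so the
    // store holds one copy and the CDN serves it to everyone.
    fmt.Println(key)
}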

Furthermore, it doesn’t feel technically challenging. If I were still at AWS, I’d be working on a PR/FAQ right now. Well, except that, since everything is S3-compatible and CDNs are commoditized, it would be plausible (and attractive) to build your FDN in a way that doesn’t tie you to any particular infrastructure provider.

Someone has already started on this; see Jortage Communal Cloud; small as yet, but pointing in the right direction.

Fun times!

The storage world is a market with no monopolist, where providers are competing on price, performance, and durability. Be still my beating heart.

Photointegrity 29 Apr 2024, 7:00 pm

In March of 2004, just over twenty years ago, I published an ongoing piece entitled, like this one, “Photointegrity”. The issue remains the same, but the rise of AI increases its importance and its difficulty. Here are words on the subject, illustrated by photos all of which have been processed with AI technology.

Pink-orange tulip blossom, folded closed

Tulip blossom, captured with twenty-year old analog technology, enhanced with AI.

There’s an amusing story about the technology behind these flower pictures, down at the bottom of this piece.

Back in 2004

I was already using Photoshop but in fully-autodidactic mode, so I thought I should read a book about it, and selected one by Scott Kelby, “The Photoshop guy” back then and still active in the space, two decades later. It was a good book, but I was left wide-eyed and shocked: I’ll quote from that piece for those of you who don’t want to step back twenty years in time and read it:

Personal Improvement

In particular, Kelby walks through an astounding list of techniques for improving portraits, and I quote: removing blemishes, removing dark circles under the eyes, lessening freckles or facial acne, removing or lessening wrinkles, colorizing hair, whitening the eyes, making eyes that sparkle, enhancing eyebrows and eyelashes, glamour skin softening, transforming a frown into a smile, doing a digital nose job, slimming and trimming, removing love handles, and finally slimming buttocks, arms and thighs.

Integrity?

Screw it, integrity is history. The image is no longer the capture of an instant’s light and colour, it’s, well… whatever you and Photoshop make of it.

Photointegrity

I proposed a definition at the time: “what I’m going to do is strive to balance Truth and Beauty. In practical terms, this means the goal is to make the picture look as much as possible like what I saw, as opposed to as good as possible.”

Simple yellow flower, two buds peeking round its edges

Simple yellow flower, captured with twenty-year old analog technology, enhanced with AI.

I can’t claim that I follow that strictly; most of the pictures in this space come out of the camera looking less pleasing than what I remember seeing, but I will confess that the version you see is often prettier than that memory. Usually, that results from the application of a bunch of Adobe technologies.

Is that OK? It’s a judgment call. Is there anything that isn’t a judgment call? Funny you should ask, because Adobe just announced the Firefly Generative AI 3 model, around which the next version of Photoshop is being built. Hitting those links and just scrolling through the pictures will give you a feeling for what this software will do.

Let me put a stake in the ground. I believe these things:

  1. If you use generative tools to produce or modify your images, you have abandoned photointegrity.

  2. That’s not always wrong. Sometimes you need an image of a space battle or a Triceratops family or whatever.

  3. What is always wrong is using this stuff without disclosing it.

The C2PA angle

Last October, I wrote up C2PA, a useful digital watermarking technology that can be used to label images and video. That piece’s predictions look like they’re coming true; several manufacturers have announced C2PA support. I’m not going to take the space here to describe C2PA again.

I do note that Photoshop already supports C2PA and when it writes a watermark saying “Edited with Photoshop”, that label includes a very few words about what it did: cropping, levels adjustment, and so on; no details.

I believe strongly that when people use Adobe’s Firefly generative AI to create or augment pictures, Photoshop should by default turn C2PA labeling on, and disclose in the watermark whether it is fully-generated or just augmented. Sure, the person generating the image can always take that watermark out, but they can’t change its contents, and assuming C2PA becomes ubiquitous, the absence of a watermark would be reasonable grounds for suspicion.

Cluster of pink fruit-tree blossoms, just opening

Fruit tree blossoms, not open yet, captured with twenty-year old analog technology, enhanced with AI.

AI + photointegrity?

Over the last couple of years, the way I use Adobe Lightroom has changed a whole lot, and it’s mostly because of AI. Specifically, smart select. Lightroom now offers Select functions for Subject, Background, Sky, and Object. There’s also a very useful “Duplicate and invert” for any selection. I use these for almost every photo I take, especially Select Sky. The amount of light in the sky differs from that down here on the surface, and I’m pretty sure that our eyes compensate for that. Almost every picture looks more “real” when you select the sky and dial the brightness down (rarely: up) a touch, and maybe bump the contrast a bit.

This photo would have been a complete failure without those tools.

Allyson’s parents at her memorial

Allyson’s parents speak to the crowd at her memorial.

We were recently at a memorial social for our late friend Allyson. It was on a rooftop, on a bright grey day; the volume of light coming out of the sky was insane, and kept turning my photographic subjects into dark silhouettes.

The photo of Ally’s parents addressing the crowd is not great (her mom’s eyes are closed) but it at least captures a moment. The original was totally unusable, because the subjects are under a canopy and thus shaded, while the sky and cityscape and even mountains were reflecting harshly. So you select the subject, you invert and duplicate, you add light to the subject and subtract from the rest, and you get something that looks exactly like what I saw.

Of course, this depends on a good camera with a lot of dynamic range that can fish detail out of shadows.

I think this process retains photointegrity.

AI-enhanced analog

What happened was, the sun came out after the rain, everything is blooming this time of year, and I wanted to take pictures. I was rummaging for lenses and there was this dark shape at the back of the shelf. “What’s that?” I thought. It turned out to be an old Pentax with “Macro” in its name. Just the thing! Here’s what the combo looks like.

Pentax 100mm macro lens strapped on Fujifilm X-T30

By the way, one reason the Internet is still good is that random enthusiasts maintain obscure databases, for example of camera lenses, from whence this smc Pentax-D FA 100mm F/2.8 Macro, an alternate version of which rejoices in the name “Schneider-Kreuznach D-Xenon”. It seems to have been manufactured only around 2004. I wrote about buying it in 2011 and shooting flowers and dancers with it in 2014; lotsa groovy pix in both.

Anyhow, this lens does a fabulous job of isolating foreground and background. Given this to chew on, Lightroom’s AI gizmo does a fabulous job of selecting just the flower (or background). So it’s easy to sharpen the flower and fade the bokeh; the old lens and the bleeding-edge software were made for each other.

But I digress.

Photointegrity matters

It mattered in 2004 and it matters more every passing year as our level of trust in online discourse falls and the power of generative AI grows. We have the tools to help address this, but we need to think seriously, and use them when appropriate.

Mobile Typing Pain 24 Apr 2024, 7:00 pm

I ran a Fediverse poll asking how people go about entering text on mobile devices. The results shocked me: Half the population just taps away. Do you? Read on for details and speculation.

This ongoing fragment embeds links to previous fragments, because I’ve been worrying and writing about this problem for a long time. Which in itself is interesting, more evidence that the problem is hard.

Mastodon poll on mobile text entry options

The poll post and (long) chain of responses are here on Mastodon.

People care

First, 3.5K poll responses is more than I usually see on the Fediverse; evidence that plenty of people have feelings about this. To reinforce that impression, scroll down through the responses (there are dozens). Many say, essentially, “Entering text on a mobile device is too hard, so I don’t.”

I’m one of those; I regularly start entering a message into a phone, stop, get up, and walk across the room to a real keyboard.

Tap tap wow

I widened my eyes when I saw that half the respondents testify to tapping along letter by letter. I could never. But then I have big fat farmer’s fingers with soft ends, and am not terribly dextrous.

But, probably, I shouldn’t have been surprised; 21 years ago in this blog I remarked that “it's pretty impressive to watch a Japanese person pounding text into their PDA at high speed using just their thumbs.” And today I watch teenage digits dance on devices like maddened maenads; they seem not to find it tedious.

Swiping etc

A quarter of poll respondents reported swiping words into their phones.

I mentioned above that people have been working on this for a long time. Check out this progress report from ongoing in 2011. It’s worth noting that Android’s input method being replaceable was important in driving this innovation.

My own proposal, the Digitator, has failed to break through.

That piece concludes “Anyhow, I’m pretty sure that something will come along.” But on the evidence it hasn’t, really.

The AI angle: Auto-predict and voice

The least popular poll options were tap-plus-autopredict and voice. I guess I’m weird, because those are what I mostly use. I suspect many others should too but don’t, probably because they tried those things a while ago and haven’t revisited them recently.

In my experience (which, granted, is almost all on Google Pixel devices) the autopredict and voice options have gotten stronger with almost every release. Not just a little bit stronger, either. Perhaps it’s just because I’m the white male Anglophone “canonical human” that designers build for, but I get dramatically better results than I used to.

Now obviously, most reasonable people will only talk to their phone when they’re in a private place, which limits the use of that option. But if you can find privacy, the voice option is getting remarkably good.

Which is to say, I can enter message or email text at a pace that is sometimes adequate. Do I enjoy doing this? No, I hate it, as I noted above, and will make a real effort to switch to a keyboard.

In particular if what I want to enter is important, if it might matter.

Because anything that matters deserves editing, and it’s rare indeed that I hit “Send” on a first draft. And while brute-force text entry is edging into adequacy, editing remains a pool of pain.

Subtext

Two and a half decades into this millennium, the most popular communication products are optimized for consumption and barely adequate for creation. If I were paranoid and cynical, I might suspect that this is no accident. Oh wait, I am. But in fact I think it’s just a hard problem.

Meta.ai Oh My! 18 Apr 2024, 7:00 pm

“Meet Your New Assistant” says the announcement, going on with “Built With Llama 3”. And oh my goodness has it ever got a lot of coverage. So I thought I might as well try it.

My first cut was a little unfair; I asked it about a subject on which I am unchallenged as the world’s leading expert: Tim Bray. (That’s probably overstating it: My wife is clearly in the running.)

So I asked meta.ai “What does Tim Bray think of Google?” Twice; once on my phone while first exploring the idea, and again later on my computer. Before I go on, I should remark that both user interfaces are first-rate: Friction-free and ahead of the play-with-AI crowd. Anyhow, here are both answers; it may be relevant that I was logged into my long-lived Facebook account:

meta.ai on Tim Bray and Google, take 1 meta.ai on Tim Bray and Google, take 2

The problem isn’t that these answers are really, really wrong (which they are). The problem is that they are terrifyingly plausible, and presented in a tone of serene confidence. For clarity:

  1. I am not a Computer Scientist. Words mean things.

  2. I worked for Google between March of 2010 and March of 2014.

  3. I was never a VP there nor did I ever have “Engineer” in my title.

  4. I did not write a blog post entitled “Goodbye, Google”. My exit post, Leaving Google, did not discuss advertising nor Google’s activities in China, nor in fact was it critical of anything about Google except for its choice of headquarters location. In fact, my disillusionment with Google (to be honest, with Big Tech generally) was slow to set in and really didn’t reach critical mass until these troubling Twenties.

  5. The phrase “advertising-based business model”, presented in quotes, does not appear in this blog. Quotation marks have meaning.

  6. My views are not, nor have they been, “complex and multifaceted”. I am embarrassingly mainstream. I shared the mainstream enchantment with the glamor of Big Tech until, sometime around 2020, I started sharing the mainstream disgruntlement.

  7. I can neither recall nor find instances of me criticizing Google’s decision-making process, nor praising its Open-Source activities.

What troubles me is that all of the actions and opinions attributed to meta.ai’s version of Tim Bray are things that I might well have done or said. But I didn’t.

This is not a criticism of Meta; their claims about the size and sophistication of their Llama3 model seem believable and, as I said, the interface is nifty.

Is it fair for me to criticize this particular product offering based on a single example? Well, first impressions are important. But for what it’s worth, I peppered it with a bunch of other general questions and the pattern repeats: Plausible narratives containing egregious factual errors.

I guess there’s no new news here; we already knew that LLMs are good at generating plausible-sounding narratives which are wrong. It comes back to what I discussed under the heading of “Meaning”. Still waiting for progress.

The nice thing about science is that it routinely features “error bars” on its graphs, showing both the finding and the degree of confidence in its accuracy.

AI/ML products in general don’t have them.

I don’t see how it’s sane or safe to rely on a technology that doesn’t have error bars.

Topfew Release 1.0 12 Apr 2024, 7:00 pm

Back in 2021-22, I wrote a series of blog posts about a program called “topfew” (tf from your shell command-line). It finds the field values (or combinations of values) which appear most often in a stream of records. I built it to explore large-scale data crunching in Go, and to investigate how performance compared to Rust. There was plentiful input, both ideas and code, from Dirkjan Ochtman and Simon Fell. Anyhow, I thought I was finished with it but then I noticed I was using the tf command more days than not, and I have pretty mainstream command-line needs. Plus I got a couple of random pings about whether it was still live. So I turned my attention back to it on April 12th and on May 2nd pushed v1.0.0.

GitHub sidebar for Topfew

I added one feature: You can provide a regex field separator to override the default space-separation that defines the fields in your records. Which will cost you a little performance, but you’re unlikely to notice.

Its test coverage is much improved and, as you’d expect, there are fewer bugs. Also, better docs.

Plan

I think it’s pretty much done; honestly, I can’t think of any useful new features. At some point, I’ll look into Homebrew recipes and suchlike, if I get the feeling they might be used.

Obviously, please send issues or PRs if you see the opportunity.

Who needs this?

It’s mostly for log files I think. Whenever I’m poking around in one of those I find myself asking questions like “which API call was hit most often?” or “Which endpoint?” or “Which user agent?” or “Which subnet?”

The conventional hammer to drive this nail has always been something along the lines of:

awk '{print $7}' | sort | uniq -c | sort -nr | head 

Which has the advantage of Just Working on any Unix-descended computer. But can be slow when the input is big, and worse than linear too. Anyhow, tf is like that, only faster. In some cases, orders of magnitude faster. Plus, it has useful options that take care of the grep and sed idioms that often appear upstream in the pipe.
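For the curious, the core idea is tiny. Here’s a minimal Go sketch of what that pipeline computes (count field 7, print the top ten); it is emphatically not Topfew’s actual code, just the shape of it:

package main

import (
    "bufio"
    "fmt"
    "os"
    "sort"
    "strings"
)

func main() {
    // Count occurrences of field 7 (1-based) on standard input.
    counts := make(map[string]uint64)
    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        fields := strings.Fields(scanner.Text())
        if len(fields) >= 7 {
            counts[fields[6]]++
        }
    }
    // Rank by descending count and print the top ten.
    type keyCount struct {
        key   string
        count uint64
    }
    ranked := make([]keyCount, 0, len(counts))
    for k, c := range counts {
        ranked = append(ranked, keyCount{k, c})
    }
    sort.Slice(ranked, func(i, j int) bool { return ranked[i].count > ranked[j].count })
    for i := 0; i < len(ranked) && i < 10; i++ {
        fmt.Printf("%d %s\n", ranked[i].count, ranked[i].key)
    }
}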

Topfew’s got a decent README so I’m not going to invest any more words here in explaining it.

But it’s worth pointing out that it’s a single self-contained binary compiled from standalone Go source code with zero dependencies.

Performance

This subject is a bit vexed. After I wrote the first version, Dirkjan implemented it in Rust and it was way faster, which annoyed me because it ought to be I/O-bound. So I stole his best ideas and then Simon chipped in other good ones and we optimized more, and eventually it was at least as fast as the Rust version. Which is to say, plenty fast, and probably faster than what you’re using now.

But you only get the big payoff from all this work when you’re processing a file, as opposed to a stream; then tf feels shockingly fast, because it divides the file up into segments and scans them in parallel. Works remarkably well.
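The segmentation trick, in outline; a sketch rather than Topfew’s real implementation, which also has to be careful that no segment boundary splits a record:

package main

import (
    "fmt"
    "os"
    "runtime"
)

// segments carves a file into one byte-range per CPU, ready to be
// handed to parallel scanner goroutines.
func segments(path string) ([][2]int64, error) {
    info, err := os.Stat(path)
    if err != nil {
        return nil, err
    }
    size := info.Size()
    n := int64(runtime.NumCPU())
    segs := make([][2]int64, 0, n)
    for i := int64(0); i < n; i++ {
        segs = append(segs, [2]int64{i * size / n, (i + 1) * size / n})
    }
    return segs, nil
}

func main() {
    segs, err := segments("access_log") // any big log file
    if err != nil {
        panic(err)
    }
    fmt.Println(segs)
}

Each parallel scanner seeks to its start offset and (except for the first) skips past the next newline before it begins counting.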

Unfortunately that doesn’t happen too often. Normally, you’re grepping for something or teeing off another stream or whatever. In which case, performance is totally limited by reading the stream; I’ve profiled the hell out of this and the actual tf code doesn’t show up in any of the graphs, just the I/O-related buffer wrangling and garbage collection. Maybe I’m missing something. But I’m pretty sure tf will keep up with any stream you can throw at it.

Tooling

Over the years I’ve become an adequate user of GitHub CI. It’s good to watch that ecosystem become richer and slicker; the things you need seem to be there and for an OSS hobbyist like me, are generally free. Still, it bothers me that Everything Is On GitHub. I need to become aware of the alternatives.

I still live in JetBrains-land, in this case specifically Goland, albeit unfashionably in Light mode. It scratches my itches.

Anyhow, everything is easier if you have no dependencies. And our whole profession needs to be more thoughtful about its dependencies.

Dirty secret

I’ve always wanted to ship a two-letter shell command that someone might use. Now I have. And I do think tf will earn a home in a few folks’ toolboxes.

OSQI 1 Apr 2024, 7:00 pm

I propose the formation of one or more “Open Source Quality Institutes”. An OSQI is a public-sector organization that employs software engineers. Its mission would be to improve the quality, and especially safety, of popular Open-Source software.

Why?

The XZ-Utils backdoor (let’s just say #XZ) launched the train of thought that led me to this idea. If you read the story, it becomes obvious that the key vulnerability wasn’t technical, it was the fact that a whole lot of Open-Source software is on the undermaintained-to-neglected axis, because there’s no business case for paying people to take care of it. Which is a problem, because there is a strong business case for paying people to attack it.

There are other essential human activities that lack a business case, for example tertiary education, potable water quality, and financial regulation. For these, we create non-capitalist constructs such as Universities and Institutes and Agencies, because society needs these things done even if nobody can make money doing them.

I think we need to be paying more attention to the quality generally, and safety especially, of the Open-Source software that has become the underlying platform for, more or less, our civilization. Thus OSQI.

They’re out to get us

For me, the two big lessons from #XZ were first, the lack of resources supporting crucial Open-Source infrastructure, but then and especially, the demonstration that the attackers are numerous, skilled and patient. We already knew about numerous and skilled but this episode, where the attacker was already well-embedded in the project by May 2022, opened a few eyes, including mine.

The advantage, to various flavors of malefactor, of subverting core pieces of Open-Source infrastructure, is incalculable. #XZ was the one we caught; how many have we missed?

What’s OSQI?

It’s an organization created by a national government. Obviously, more nations than one could have an OSQI.

The vast majority of the staff would be relatively-senior software engineers, with a small percentage of paranoid nontechnical security people (see below). You could do a lot with as few as 250 people, and the burdened cost would be trivial for a substantial government.

Since it is a matter of obvious fact that every company in the world with revenue of a billion or more is existentially dependent on Open Source, it would be reasonable to impose a levy of, say, 0.1% of revenue on all such companies, to help support this work. The money needn’t be a problem.

Structure

The selection of software packages that would get OSQI attention would be left to the organization, although there would be avenues for anyone to request coverage. The engineering organization could be relatively flat, most people giving individual attention to individual projects, then also ad-hoc teams forming for tool-building or crisis-handling when something like #XZ blows up.

Why would anyone work there?

The pay would be OK; less than you’d make at Google or Facebook, but a decent civil-service salary. There would be no suspicion that your employer is trying to enshittify anything; in fact, you’d start work in the morning confident that you’re trying to improve the world. The default work mode would be remote, so you could live somewhere a not-quite-Google salary would support a very comfortable way of life. There would be decent vacations and benefits and (*gasp*) a pension.

And there is a certain class of person who would find everyday joy in peeking and poking and polishing Open-Source packages that are depended on by millions of programmers and (indirectly) billions of humans. A couple of decades ago I would have been one.

I don’t think recruiting would be a problem.

So, what are OSQI’s goals and non-goals?

Goal: Safety

This has to come first. If all OSQI accomplishes is the foiling of a few #XZ-flavor attacks, and life becoming harder for people making them, that’s just fine.

Goal: Tool-building

I think it’s now conventional wisdom that Open Source’s biggest attack surfaces are dependency networks and build tools. These are big and complex problems, but let’s be bold and set a high bar:

Open-Source software should be built deterministically, verifiably, and reproducibly, from signed source-code snapshots. These snapshots should be free of generated artifacts; every item in the snapshot should be human-written and human-readable.

For example: As Kornel said, Seriously, in retrospect, #autotools itself is a massive supply-chain security risk. No kidding! But then everyone says “What are you gonna do, it’s wired into everything.”

There are alternatives; I know of CMake and Meson. Are they good enough? I don’t know. Obviously, GNU AutoHell can’t be swept out of all of the fœtid crannies where it lurks and festers, but every project from which it is scrubbed will present less danger to the world. I believe OSQI would have the scope to make real progress on this front.

Non-goal: Features

OSQI should never invest engineering resources in adding cool features to Open-Source packages (with the possible exception of build-and-test tools). The Open-Source community is bursting with new-features energy, most coming from people who either want to scratch their own itch or are facing a real blockage at work. They are way better positioned to make those improvements than anyone at OSQI.

Goal: Maintenance

Way too many deep-infra packages grow increasingly unmaintained as people age and become busy and tired and sick and dead. As I was writing this, a plea for help came across my radar from Sebastian Pipping, the excellent but unsupported and unfunded maintainer of Expat, the world’s most popular XML parser.

And yeah, he’s part of a trend, one that notably included the now-infamous XZ-Utils package.

And so I think one useful task for OSQI would be taking over (ideally partial) maintenance duties for a lot of Open-Source projects that have a high ratio of adoption to support. In some cases it would have to take a lower-intensity form, let’s call it “life support”, where OSQI deals with vulnerability reports but flatly refuses to address any requests for features no matter how trivial, and rejects all PRs unless they come from someone who’s willing to take on part of the maintenance load.

One benefit of having paid professionals doing this is that they will blow off the kind of social-engineering harassment that the #XZ attacker inflicted on the XZ-Utils maintainer (see Russ Cox’s excellent timeline) and which is unfortunately too common in the Open-Source world generally.

Goal: Benchmarking

Efficiency is an aspect of quality, and I think it would be perfectly reasonable for OSQI to engage in benchmarking and optimization. There’s a non-obvious reason for this: #XZ was unmasked when a Postgres specialist noticed performance problems.

I think that in general, if you’re a bad person trying to backdoor an Open-Source package, it’s going to be hard to do without introducing performance glitches. I’ve long advocated that unit and/or integration tests should include a benchmark or two, just to avert well-intentioned performance regressions; if they handicap bad guys too, that’s a bonus.
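In Go, at least, this costs nearly nothing. Here’s a generic example (no connection to any particular package) of a benchmark that lives beside the unit tests, in a file ending in _test.go, and runs under “go test -bench=.”; if a change slows the hot path, this number moves, and somebody gets to ask why:

package example // package name is illustrative

import (
    "bytes"
    "compress/flate"
    "io"
    "testing"
)

func BenchmarkInflate(b *testing.B) {
    // Build a fixed compressed input once, outside the timed loop.
    var buf bytes.Buffer
    w, err := flate.NewWriter(&buf, flate.DefaultCompression)
    if err != nil {
        b.Fatal(err)
    }
    w.Write(bytes.Repeat([]byte("all work and no play "), 4096))
    w.Close()
    compressed := buf.Bytes()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        r := flate.NewReader(bytes.NewReader(compressed))
        io.Copy(io.Discard, r) // decompress and discard; this is the timed work
        r.Close()
    }
}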

Goal: Education and evangelism

OSQI staff will develop a deep shared pool of expertise in making Open-Source software safer and better, and specifically in detecting and repelling multiple attack flavors. They should share it! Blogs, conferences, whatever. It even occurred to me that it might make sense to structure OSQI as an educational institution; standalone or as a grad college of something existing.

But what I’m talking about isn’t refereed JACM papers, but what my Dad, a Professor of Agriculture, called “Extension”: Bringing the results of research directly to practitioners.

Non-goal: Making standards

The world has enough standards organizations. I could see individual OSQI employees pitching in, though, at the IETF or IEEE or W3C or wherever, with work on Infosec standards.

Which brings me to…

Non-goal: Litigation

Or really any other enforcement-related activity. OSQI exists to fix problems, build tools, and share lessons. This is going to be easier if nobody (except attackers) sees them as a threat, and if staff don’t have to think about how their work and findings will play out in court.

And a related non-goal…

Non-goal: Licensing

The intersection between the class of people who’d make good OSQI engineers and those who care about Open-Source licenses is, thankfully, very small. I think OSQI should accept the license landscape that exists and work hard to avoid thinking about its theology.

Non-goal: Certification

Once OSQI exists, the notion of “OSQI-approved” might arise. But it’d be a mistake; OSQI should be an engineering organization; the cost (measured by required bureaucracy) to perform certification would be brutal.

Goal: Transparency

OSQI can’t afford to have any secrets, with the sole exception of freshly-discovered but still-undisclosed vulnerabilities. And when those vulnerabilities are disclosed, the story of their discovery and characterization needs to be shared entirely and completely. This feels like a bare-minimum basis for building the level of trust that will be required.

Necessary paranoia

I discussed above why OSQI might be a nice place to work. There will be a downside, though; you’ll lose a certain amount of privacy. Because if OSQI succeeds, it will become a super-high-value target for our adversaries. In the natural course of affairs, many employees would become committers on popular packages, increasing their attractiveness as targets for bribes or blackmail.

I recall a very senior security leader at an Internet giant once saying to me “We have thousands of engineers, and my job requires me to believe that at least one of them also has another employer.”

So I think OSQI needs to employ a small number of paranoid traditional-security (not Infosec) experts to keep an eye on their colleagues, audit their finances, and just be generally suspicious. These people would also worry about OSQI’s physical and network security. Because attackers gonna attack.

Pronunciation

Rhymes with “bosky”, of course. Also, people who work there are OSQIans. I’ve grabbed “osqi.org” and will cheerfully donate it in the long-shot case that this idea gets traction.

Are you serious?

Yeah. Except that I no longer speak with the voice of a powerful employer.

Look: For better or for worse, Open Source won. [Narrator: Obviously, for better.] That means it has become crucial civilizational infrastructure, which governments should actively support and maintain, just like roads and dams and power grids.

It’s not so much that OSQI, or something like it, is a good idea; it’s that not trying to achieve these goals, in 2024, is dangerous and insane.

A057X 30 Mar 2024, 7:00 pm

Yes, “A057X” is cryptic, but my new lens’s official monicker is “150-500mm F/5-6.7 Di III VC VXD” so let’s stick with that part number. It’s from Tamron and this is the Fujifilm X-Mount variation. Lens-geeking is my favorite part of photo-geeking and it’s great that more manufacturers are opening up to third-party lens builders.

Last May I wrote that I wanted a big-ass super-telephoto and now I have one. Let’s start with a little comparo. Here is (roughly) the same 2km-away landscape shot on the decades-old Tokina 400mm I’d been using since 2009, and on the new Tamron.

Distant waterfront, via 400mm Tokina Distant waterfront, via 150-500mm Tamron

If you care about this sort of thing you might want to enlarge these. A person is visible in the bottom picture, and another if you’re using Lightroom on a 4K screen.

Now let’s be honest; the color and flavor of the earlier picture is nicer, because the sun was just right; that’s why I strapped on the old glass. But the new-lens picture shows that yes, we do still make progress in analog technologies, and given the same light, there’d be more you could do with today’s lens.

Anyhow, here’s what it looks like.

Tamron 150-500mm F/5-6.7 Di III VC VXD on Fujifilm X-T2

That’s on a Fujifilm X-T2, one of the bulkier of Fuji’s X-cameras. What’s not instantly obvious is that the camera and lens are sitting on the lens’s tripod shoe. That camera is now eight years old and needs to be replaced, but I’m not fully won over by the latest X-cams and the lens was an easier trigger to pull.

The reviews all said “Considering what it does, it’s amazingly small and light!” Maybe, but in fact it’s a big freakin’ heavy hunk of metal and glass. A tripod really helps.

For the birds

Tripod? But everyone seems to think that this kind of lens is for shooting birds in flight. So I took it to our cabin this weekend to test that hypothesis. Thus I learned that you really can’t shoot birds unless you’re hand-holding the camera. And even then, you can’t unless you’ve been practicing. I managed to get one picture of a bird in flight, but it was just a seagull and not a terribly handsome one either.

Then a couple of visitors settled at the top of a nearby Douglas Fir. Here’s one. Yes, the sky was that blue.

Bald eagle at the top of a cone-scattered evergreen

Isn’t it handsome? If you look close, though, its tail is jammed against a branch. But then it bent over to peer out at something.

Bald eagle at the top of an evergreen, leaning forward

Aren’t those feathers beautiful? This was a big-ass tree and I wasn’t right next to it, either. Yay Tamron.

Little, big

Turns out this thing can focus relatively close-in for an item of its ilk, so you can do, um what would one call it, macro-at-a-distance?

Close-up of a crocus flower among tangled botanical debris

That’s a teeny little blossom. But when I’m looking out over the water, I always end up taking pictures of the mountains on the other side.

Distant mountain

That one is a damn long way away. The picture suffers from being reduced to fit into your browser. I wish I could give everyone in the world Lightroom and a good 4K monitor.

Note that…

None of the pictures via this lens could have been captured on any mobile-phone camera in the world. You have to go pretty far these days to get into that territory.

Bye, Allyson 16 Mar 2024, 7:00 pm

She’s gone. She lived well. We’ll miss her.

Allyson McGrane

We’ve known Ms McGrane since 2005, when she was a co-conspirator on the wonderful little local Northern Voice blogging conference. We worked on other stuff together and hung out now and then and carpooled to the Prairies once and I can’t remember ever getting the slightest bit upset with her.

Here is a good-bye note from her partner Shane. If you want to leave a note somewhere, leave it there.

Ally (rhymes with “valley”) was a fine dog-parent and a strong grant-writer and a first-rate teacher and a connoisseur of fine cooking equipment and Canadian football. If you’ve been to much in the way of Vancouver theatre and dance events over the years, there’s a good chance that she produced the event or secured its funding or educated the people who did those things.

I remember having coffee with her a couple years ago, she advising me on one of my projects, laughing together at the Byzantine complexities of granting bureaucracies and the childlike money-obliviousness of arts leaders and the excellence of the coffee on that morning. Easy to be with.

Mesothelioma is a bitch; 8% 5-year survival rate, and there wasn’t that much they could do for her by the time they got the diagnosis right. We visited her last week and she was herself, cynical about her situation but, it seemed, more or less at peace.

I won’t miss her as much as the dogs will, but there’s still a gap in my life.

Play My Music 10 Mar 2024, 7:00 pm

When I’m away from home, I still want to listen to the music we have at home (well, I can live without the LPs). We had well over a thousand CDs so that’s a lot of music, 12,286 tracks ripped into Apple Lossless. Except for a few MP3s from, well, never mind. This instalment of the De-Google Project is about ways to do that with less Big-Tech involvement.

The former Google Play Music, now YouTube Music, allowed you to load your tunes into the cloud and play them back wherever your phone or computer happened to be. Except that it used to be easy to upload — just point the uploader at your iTunes library — and now it’s hard, and then Google removed YouTube Music’s shuffle-your-uploads feature from Android Auto. Also they fired a bunch of YouTube Music contractors who were trying to unionize. So screw ’em.

I discovered three plausible ways to do this. First and most simply, dump the tunes onto a USB drive; wherever you are in the world, you can usually plug one in and play tunes from it.

Second, there’s Plex; you run a Plex server on one of your computers at home (in our case a recent Mac Mini) which you point at music and video directories, and it’ll serve them to clients on the Web or on phones or on platforms like WebOS and Roku.

Also, it’ll serve your media to anywhere in the world, using UPnP to drill an outgoing hole through your firewall. Obviously, this could make a security-sensitive person nervous and does bother me a bit, because UPnP’s history has featured some nasty vulnerabilities. I have a to-do to check whether the version on my dumbass telco ISP router is reasonably safe. I believe that Tailscale would offer a better security posture, but don’t want one more thing to manage.

Finally, Apple Music can apparently do what YouTube Music does; let you upload your tunes into the cloud and play them anywhere. But moving from one Big-Tech provider to another doesn’t feel like progress.

Does it work?

Setting it up on Plex was a Just-Works experience. The process even reached out through our modern Eero mesh to the old telco router and convinced it to set up the appropriate UPnP voodoo. If you open the Plex server admin interface it occasionally complains about a double-NAT situation but works anyhow.

Getting the USB working was kind of hilarious. First of all, I bought a 512G USB stick. (My Mac says it only has 460GB, but what’s 50G between friends?) USB-A because that’s what the car has. It took a couple of hours to copy all the music onto it.

Then I plugged the USB stick into the car and it showed up instantly in the “Sources” tab of the media player, but greyed out. I snickered when I noticed that all the car infotainment menus were crawling and stuttering. Asking the car’s mighty electronic brain to index that mountain of music was making it sweat. Anyhow, after a few minutes, I could access the USB and now it works fine, mostly.

By “mostly”, I mean that when I tell it to play music off the USB, it takes a few seconds for the music to start, then a minute or more to get its shit together and present a coherent picture of what it’s playing. And on one occasion, the music player just randomly switched over to the radio. So I suspect my inventory is pushing the poor little toy computer in the car pretty hard. But once it’s going, the presentation is nice:

Jaguar infotainment showing current music and weather

A few items to note here:

  1. “Musick” is the name I gave the USB key.

  2. That recording is Jesus’ Blood Never Failed Me Yet, a truly unique piece of work by British composer Gavin Bryars. Opinions vary; I think it’s magical but it’s one of the few pieces of music that I am absolutely forbidden to play anywhere my wife can hear it.

  3. The car software is way more flexible than Android Auto; this is just one of the car’s three screens and there are a lot of options for distributing your music and weather and maps and climate control across them.

Which is better?

It’s complicated. Obviously, the USB option doesn’t require any network bandwidth. And I think the album-art presentation is nicer than Plex’s. (You can see that here).

The audio quality is pretty well a wash. Plex is a little louder, I suspect them of Loudness-War tactics, which is probably OK in a car with its inevitable background noise. Plex also crossfades the song transitions, clever and pleasing but really not essential.

Plex is really nice software and I feel a little guilty that I’m not sending them any money. They do have a “Pro” level of service; must check it out.

Then of course Plex needs Android Auto. Which on the one hand I’m probably going to be running a lot if I’m driving around town to appointments. But… Android Auto is already a little shaky some days, not sure whether it’s crashing or the car software is creaking or it’s just yet another lousy USB-C connection (I am developing a real hate for that form factor).

Realistically, given that our car (a Jaguar I-Pace EV) wasn’t a big seller and is five years old, can I really count on Google and Jaguar to do what it takes to keep Android Auto running?

At this point I need to say a big “Thanks!” to everyone on Fedi/Mastodon who gave me good advice on how to approach this problem.

Anyhow, as of now, we have two alternatives that work well. The De-Googling march continues forward.

The De-Google Project 9 Mar 2024, 8:00 pm

My family, like most, depends on a lot of online services. And again like most, a lot of those services come from Big Tech giants in general and (in our case) Google in particular. And like many people, we are becoming less comfortable with that. So I’m going to try to be systematic about addressing the problem. This post summarizes our dependencies and then I’ll post blog pieces about updates as I work my way through the list. (The first is already posted, see below.)

I’m calling this the “De-Google” project because they’re our chief supplier of this stuff and it’s more euphonious than “De-BigTechInGeneral”.

Need             | Supplier                                    | Alternatives
Office           | Google Workspace                            | ?
Data sharing     | Dropbox                                     | ?
Video meetings   | Google Meet                                 | Jitsi, ?
Maps             | Google Maps                                 | Magic Earth, Here, something OSM-based
Browser          | Apple Safari                                | Firefox, ?
Search           | Google                                      | Bing-based options
Chat             | Signal                                      |
Photo editing    | Adobe Lightroom & Nik                       | Capture One, Darktable, ?
In-car interface | Google Android Auto                         | Automaker software
Play my music    | Plex, USB                                   |
Discover music   | Google YouTube Music                        | Qobuz, Tidal, Deezer, Pandora, ?
TV               | Prime, Roku, Apple, Netflix, TSN, Sportsnet | ?

The “Supplier” color suggests my feelings about what I’m using, with blue standing for neutral.

Criteria

To replace the things that I’m unhappy with, I’m looking for some combination of:

  1. Open source

  2. Not ad-supported

  3. Not VC-funded

  4. Not Google, Apple, Microsoft, or Amazon

Office

We’ve been using Gmail for a really long time and are used to it, and the integration between mail and calendar and maps basically Just Works. The price is OK but it keeps going up, and so do our data storage requirements, what with all the cameras in the family. Finally, Google has stewardship of our lives and is probably monetizing every keystroke. We’re getting a bit creeped out over that.

I think that calendars and email are kind of joined at the hip, so we’d want a provider that does both.

As for online docs, I will not be sorry to shake the dust of Google Drive and Docs from my heels, I find them clumsy and am always having trouble finding something that I know is in there.

Data sharing

Dropbox is OK, assuming you ignore all the other stuff it’s trying to sell you. Maybe one of these years I should look at that other stuff and see if it’s a candidate to replace one or two other services?

Video meetings

I dislike lots of things about Zoom and find Microsoft Teams a pool of pain, but have been pretty happy with Google Meet. Nobody has to download or log into anything and it seems to more or less Just Work. But I’d look at alternatives.

Maps

As I wrote in 2017, Google maps aggregate directions, reviews, descriptions, phone numbers, and office hours. They are potentially a nuclear-powered monopoly engine. I use Maps more and more; if I want to contact or interact with something whose location I know, it’s way quicker to pull up Maps and click on their listing than it is to use Google search and fight through all the ads and spam.

The calendar integration is fabulous. If you have Android Auto and you’re going to a meeting, pull up the calendar app and tap on the meeting and it drops you right into directions.

The quality of the OpenStreetMap data is very good, but obviously they don’t have the Directions functions. Who does? Obviously, Here does, and I was enthused about it in 2019; but Android Auto’s music powers drew me back to Google Maps. Aside from that, Magic Earth is trying, and their business model seems acceptable, but the product was pretty rough-edged last time I tried it.

Browser

Safari is my daily driver. These days Chrome is starting to creep me out a bit; just doesn’t feel like it’s on my side. Also, it’s no longer faster than the competition. I’d like to shift over to Firefox one day when I have the energy.

Then there are the Arcs and Braves and Vivaldis of this world, but I just haven’t yet invested the time to figure out if one of these will do, and I do not detect a wave of consensus out there.

By the way, DuckDuckGo has a browser, a shell over Safari on the Mac and Edge on Windows. Lauren uses it a lot. Probably worth a closer look.

Search

The decline of Google Search is increasingly in everyone’s face. Once again, it refuses to find things on this blog that I know are there.

Others in the family have already migrated to DuckDuckGo, and I now feel like an old-school laggard for still not having migrated off Google. I wish there were someone else taking a serious run at indexing the Web other than Bing — from yet another tech giant — but here we are.

Lauren tells me to have a closer look at Ecosia, which seems very wholesome.

Chat

At the moment you will have to pry Signal out of my cold, dead, hands. You should be using it too. ’Nuff said.

Photo editing

I pay my monthly tribute to Adobe, about whom my feelings aren’t as negative as they are about the mega Tech Giants. I’d like not to pay so much, and I’d like something that runs a little faster than Lightroom, and I’d like to support open source. But… I really like Lightroom, and sometimes one absolutely needs Photoshop, so I’m unlikely to prioritize this particular escape attempt.

In-car interface

Choices are limited. I see little point in migrating between Android Auto and CarPlay, which leaves the software the auto maker installed. Which, in my five-year-old Jaguar is… well, not bad actually. I think I could live with the built-in maps and directions from Here, even with the British Received Pronunciation’s butchery of North American place names.

But, I don’t know, we might stay with Android Auto. Check out this screenshot from my car.

Android Auto showing non-Google applications.

(Pardon the blurs and distortions.)

This is Android Auto displaying, as it normally does when I’m driving, maps and music. By default, Google Maps and YouTube Music. But not here; on the right is Plex, playing my own music stored on a Mac Mini at home.

On the left, it’s even more interesting: This is neither Google maps nor a competitor; it’s Gaia GPS, the app I normally use to mark trail while bushwhacking through Pacific Northwest rain forests. Somehow I fat-fingered it into place either in the car or on my phone.

The lesson here is that (for the moment at least) Android Auto seems to be genuinely neutral. It knows the general concepts of “apps that play music” and “apps that are maps” and is happy to display whichever ones you want, not just Google’s. (As a former Android geek who knows about Intents and Filters, I can see how this works. Clever.)

So far, Android Auto doesn’t show ads, but I suppose it’s monetizing me by harvesting traffic information to enrich its maps and I guess that’s a bargain I can live with. I use that data myself when I want to go somewhere and there are multiple routes and I can see which one is backed up by sewer work or whatever.

Discover music

I’ve been paying for YouTube Music since before it existed, and I’m genuinely impressed with the way its algorithm fishes up new artists that it turns out I really like. But just now Google laid off a bunch of YouTube Music “contractors” (de facto, employees) who tried to organize a union, so screw ’em.

I haven’t investigated any of the alternatives in depth yet.

Play my music

In the decades where Compact Disks were the way to acquire music, I acquired a lot. And ripped it. And pushed it up into Google’s musical cloud. And (until recently) could shuffle my musical life on YouTube Music. But they removed that feature from Android Auto, so screw ’em.

But I now have two good ways to do this. Check this out in Play My Music.

TV

The same gripe as everyone else: The streaming services have re-invented Cable TV, which I only got around to dumping a couple of years ago. The right solution is obvious: Pay-per-view at a reasonably low price, then the services could compete on producing great shows that people will pay to see, rather than sucking you into yet another subscription.

I suspect this column will stay red for quite a while. It’s amazing how much business leaders hate simple business models where there’s a clean clear one-time price for a product and customers have a clean clear choice who they buy their products from.

The path forward

I don’t know if I’ll ever turn the center column all-green. And I don’t need to; progress is progress. Anyhow, doing this sort of investigation is kind of fun.

Money Bubble 25 Feb 2024, 8:00 pm

I think I’m probably going to lose quite a lot of money in the next year or two. It’s partly AI’s fault, but not mostly. Nonetheless I’m mostly going to write about AI, because it intersects the technosphere, where I’ve lived for decades.

I’ve given up having a regular job. The family still has income but mostly we’re harvesting our savings, built up over decades in a well-paid profession. Which means that we are, willy-nilly, investors. And thus aware of the fever-dream finance landscape that is InvestorWorld.

The Larger Bubble

Put in the simplest way: Things have been too good for too long in InvestorWorld: low interest, high profits, the unending rocket rise of the Big-Tech sector, now with AI afterburners. Wile E. Coyote hasn’t actually run off the edge of the cliff yet, but there are just way more ways for things to go wrong than right in the immediate future.

If you want to dive a little deeper, The Economist has a sharp (but paywalled) take in Stockmarkets are booming. But the good times are unlikely to last. Their argument is that profits are overvalued by investors because, in recent years, they’ve always gone up. Mr Market ignores the fact that at least some of those gleaming profits are artifacts of tax-slashing by right-wing governments.

That piece considers the observation that “Many investors hope that AI will ride to the rescue” and is politely skeptical.

Popping the bubble

My own feelings aren’t polite; closer to Yep, you are living in a Nvidia-led tech bubble by Brian Sozzi over at Yahoo! Finance.

Sozzi is fair, pointing out that this bubble feels different from the cannabis and crypto crazes; among other things, chipmakers and cloud providers are reporting big high-margin revenues for real actual products. But he hammers the central point: What we’re seeing is FOMO-driven dumb money thrown at technology by people who have no hope of understanding it. Just because everybody else is and because the GPTs and image generators have cool demos. Sozzi has the numbers, looking at valuations through standard old-as-dirt filters and shaking his head at what he sees.

What’s going to happen, I’m pretty sure, is that AI/ML will, inevitably, disappoint; in the financial sense I mean, probably doing some useful things, maybe even a lot, but not generating the kind of profit explosions that you’d need to justify the bubble. So it’ll pop, and my bet is it takes a bunch of the finance world with it. As bad as 2008? Nobody knows, but it wouldn’t surprise me.

The rest of this piece considers the issues facing AI/ML, with the goal of showing why I see it as a bubble-inflator and eventual bubble-popper.

First, a disclosure: I speak as an educated amateur. I’ve never gone much below the surface of the technology, never constructed a model or built model-processing software, or looked closely at the math. But I think the discussion below still works.

What’s good about AI/ML

Spoiler: I’m not the kind of burn-it-with-fire skeptic that I became around anything blockchain-flavored. It is clear that generative models manage to embed significant parts of the structure of language, of code, of pictures, of many things where that has previously not been the case. The understanding is sufficient to reliably accomplish the objective: Produce plausible output.

I’ve read enough Chomsky to believe that facility with language is a defining characteristic of intelligence. More than that, a necessary but not sufficient ingredient. I dunno if anyone will build an AGI in my lifetime, but I am confident that the task would remain beyond reach without the functions offered by today’s generative models.

Furthermore, I’m super impressed by something nobody else seems to talk about: Prompt parsing. Obviously, prompts are processed into a representation that reliably sends the model-traversal logic down substantially the right paths. The LLMbots of this world may regularly be crazy and/or just wrong, but they do consistently if not correctly address the substance of the prompt. There is seriously good natural-language engineering going on here that AI’s critics aren’t paying enough attention to.

So I have no patience with those who scoff at today’s technology, accusing it of being a glorified Markov chain. Like the song says: Something’s happening here! (What it is ain’t exactly clear.)

It helps that in the late teens I saw neural-net pattern-matching at work on real-world problems from close up and developed serious respect for what that technology can do. An example is EC2’s Predictive Auto Scaling (and gosh, it looks like the competition has it too).

And recently, Adobe Lightroom has shipped a pretty awesome “Select Sky” feature. It makes my M2 MacBook Pro think hard for a second or two, but I rarely see it miss even an isolated scrap of sky off in the corner of the frame. It allows me, in a picture like this, to make the sky’s brightness echo the water’s.

Brightly-lit boats on dark water under a dark sky

And of course I’ve heard about success stories in radiology and other disciplines.

Thus, please don’t call me an “AI skeptic” or some such. There is a there there.

But…

Given that, why do I still think that the flood of money being thrown at this tech is dumb, and that most of it will be lost? Partly just because of that flood. When financial decision makers throw loads of money at things they don’t understand, lots of it is always lost.

In the Venture-Capital business, that’s an understood part of the business cycle; they’re looking to balance that out with a small number of 100x startup wins. But when big old insurance companies and airlines and so on are piling in and releasing effusive statements about building the company around some new tech voodoo, the outcome, in my experience, is very rarely good.

But let’s be specific.

Meaning

As I said above, I think the human mind has a large and important language-processing system. But that’s not all. It’s also a (slow, poorly-understood) computer, with access to a medium-large database of facts and recollections, an ultra-slow numeric processor, and facilities for estimation, prediction, speculation, and invention. Let’s group all this stuff together and call it “meaning”.

Have a look at Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data by Emily Bender and Alexander Koller (2020). I don’t agree with all of it, and it addresses an earlier generation of generative models, but it’s very thought-provoking. It postulates the “Octopus Test”, a good variation on the bad old Chinese-Room analogy. It talks usefully about how human language acquisition works. A couple of quotes: “It is instructive to look at the past to appreciate this question. Computational linguistics has gone through many fashion cycles over the course of its history” and “In this paper, we have argued that in contrast to some current hype, meaning cannot be learned from form alone.”

I’m not saying these problems can’t be solved. Software systems can be equipped with databases of facts, and who knows, perhaps some day estimation, prediction, speculation, and invention. But it’s not going to be easy.

Difficulty

I think there’s a useful analogy between the narratives around AI and of self-driving cars. As I write this, Apple has apparently decided that generative AI is easier than shipping an autonomous car. I’m particularly sensitive to this analogy because back around 2010, as the first self-driving prototypes were coming into view, I predicted, loudly and in public, that this technology was about to become ubiquitous and turn the economy inside out. Ouch.

There’s a pattern: The technologies that really do change the world tend to have strings of successes, producing obvious benefits even in their earliest forms, to the extent that geeks load them in the back doors of organizations just to get shit done. As they say, “The CIO is the last to know.”

Contrast cryptocurrencies and blockchains, which limped along from year to year, always promising a brilliant future, never doing anything useful. As to the usefulness of self-driving technology, I still think it’s gonna get there, but it’s surrounded by a cloud of litigation.

Anyhow, anybody who thinks that it’ll be easy to teach “meaning” (as I described it above) to today’s generative AI is a fool, and you shouldn’t give them your money.

Money and carbon

Another big problem we’re not talking about enough is the cost of generative AI. Nature offers Generative AI’s environmental costs are soaring — and mostly secret. In a Mastodon thread, @Quixoticgeek@social.v.st says We need to talk about data centres, and includes a few hard and sobering numbers.

Short form: This shit is expensive, in dollars and in carbon load. Nvidia pulled in $60.9 billion in 2023, up 126% from the previous year, and is heading for a $100B/year run rate, while reporting a 75% margin.

Another thing these articles don’t mention is that building, deploying, and running generative-AI systems requires significant effort from a small group of people who now apparently constitute the world’s highest-paid cadre of engineers. And good luck trying to hire one if you’re a mainstream company where IT is a cost center.

All this means that for the technology to succeed, it not only has to do something useful, people and businesses will also have to be ready to pay a high price for that something.

I’m not saying that there’s nothing that qualifies, but I am betting that it’s not in ad-supported territory.

Also, it’s going to have to deal with pushback from unreasonable climate-change resisters like, for example, me.

Anyhow…

I kind of flipped out, and was motivated to finish this blog piece, when I saw this: “UK government wants to use AI to cut civil service jobs: Yes, you read that right.” The idea (having citizen input processed and responded to by an LLM) is hideously toxic and broken, and it usefully reveals the kind of thinking that makes morally crippled leaders all across our system love this technology.

The road ahead looks bumpy from where I sit. And when the business community wakes up and realizes that replacing people with shitty technology doesn’t show up as a positive on the financials after you factor in the consequences of customer rage, that’s when the hot air gushes out of the bubble.

It might not take big chunks of InvestorWorld with it. But I’m betting it does.

Social Photos 15 Feb 2024, 8:00 pm

I like taking pictures, and I like sharing pictures wherever I hang out online. A problem with this is knowing that the pictures will very rarely look as good in other people’s browsers and apps as they do to me in Lightroom on a big bright 4K screen. Thus this piece, a basic investigation of how photos are processed and transformed on Mastodon, Bluesky, and Threads.

I was never that much of an Instagram poster; Insta does a good job of taking your crappy phone pix and juicing them up with filters so they look way better. That’s irrelevant to me, because not only do I like taking pictures, I like polishing them with Lightroom and Silver Efex and so on. So with a few exceptions, everything I want to share gets pulled onto my Mac and edited before I share it. And once I’ve done that, why would I post pictures anywhere but where I have my normal conversations?

The picture

Here it is:

Montana from the air

Taken with a Pixel 7 out an airplane window somewhere just west of Havre, Montana. It looks like there are two layers of clouds at the left of the picture, but if you look closely, I think the lower one is the Rocky Mountains in the distance.

That’s a big picture, both in its subject and raw size: The Pixel version, after editing, is 3814x2290. Also it has a lot of fine detail, and rewards zooming in. When I post it, I’d like some sense of the bigness to come across, and when tapped to enlarge, I’d like it to wow people a little, especially those fortunate enough to be looking at big screens. And I’d like it to be at least OK on your phone.

Normally, pictures here in the blog are limited to a maximum of 720x720 in the column of text, with the larger version you get by clicking limited to 1440x960. But in this case, if you click you get a 2558x1536 version, the objective being that that’ll be big enough to fill almost any screen it gets viewed on.
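For the curious: producing those renditions is nothing fancier than a bounded, aspect-preserving resize. Here’s a minimal sketch in Python with Pillow (not my actual publishing script, and the filenames are invented) of how you might generate the in-column and click-through versions:

    from PIL import Image

    # Bounded, aspect-preserving resize: 720x720 max for the in-column
    # rendition, 2560x2560 max for the click-through version (this
    # picture gets the big treatment). thumbnail() shrinks in place,
    # preserves the aspect ratio, and never enlarges.
    for bound, suffix in ((720, "column"), (2560, "click")):
        im = Image.open("montana-original.jpg")  # invented filename
        im.thumbnail((bound, bound), Image.LANCZOS)
        im.save(f"montana-{suffix}.jpg", quality=94)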

Methodology and apology

The question I want to investigate is: Which platforms are going to make my pictures look good? But I haven’t really figured out how to do that yet. To start with, what kind of picture would serve best as a benchmark for judging processing quality?

Anyhow, I picked this one and posted it to Mastodon, Bluesky, and Threads, and here gather data about the results. But hey, why not tap those links on whatever device you’re using right now and see what you think about how the picture looks there?

The columns are:

  1. Bytes: the size of the photo as downloaded.

  2. WxH: width and height, in pixels.

  3. “Q”: the JPEG quality, as reported by ImageMagick’s identify -verbose. The quotes are there because I’m not sure how to interpret it, or even whether it’s any use at all. (There’s a sketch of how to script this, just after the table.)

                       Bytes   WxH         “Q”
Original             1671514   2558x1536    94
Blog form             624961   1440x865     94
Bluesky Android FS    302972    864x663    n/a
Bluesky Android mini   42410    345x345    n/a
Bluesky Web FS        536345   2000x1201    80
Bluesky Web mini      112335   1000x601     80
Mastodon Web FS      1555111   2558x1536    90
Mastodon Web mini      86374    619x372     90
Phanpy Web FS        1555111   2558x1536    90
Phanpy Web mini        86374    619x372     90
Threads Web FS        888067   2160x1297    90
Threads Web mini      888067   2160x1297    90
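
Collecting those three numbers is scriptable, by the way. Here’s a sketch of what I mean in Python, assuming ImageMagick is installed; %w, %h, and %Q are its documented format escapes, and the filename is invented:

    import os
    import subprocess

    def photo_stats(path):
        # Bytes: just the file's size on disk.
        size = os.path.getsize(path)
        # Width, height, and reported JPEG quality via ImageMagick's
        # identify. Treat %Q with the same suspicion as the "Q" column.
        out = subprocess.run(
            ["identify", "-format", "%w %h %Q", path],
            capture_output=True, text=True, check=True,
        ).stdout.split()
        w, h, q = (int(n) for n in out)
        return size, w, h, q

    print(photo_stats("bluesky-web-fs.jpg"))  # invented filename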

Note that each of the posts included not one but two pictures, because I was also interested in how the platforms allocated screen space. The platforms typically have two display modes, “mini”, as shown in the feed, and “FS” for Full Size, what you get when you click on the picture.

Original/Blog form

Ideally, I’d like each platform’s presentation of the picture, when you click on it, to have the same number of pixels as my original, and for each pixel to have the same color value.
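That’s a mechanically checkable property. Here’s a little Pillow sketch (invented filenames again) that answers the question “same dimensions, and the same color value at every pixel?”:

    from PIL import Image, ImageChops

    def pixels_identical(original, downloaded):
        a = Image.open(original).convert("RGB")
        b = Image.open(downloaded).convert("RGB")
        if a.size != b.size:
            return False  # different pixel counts, so no
        # difference() is black everywhere iff the images match exactly;
        # getbbox() returns None when no pixel differs.
        return ImageChops.difference(a, b).getbbox() is None

    print(pixels_identical("montana-original.jpg", "mastodon-fs.jpg"))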

Bluesky

First up are numbers from the Android app, but please don’t take them seriously. The process of extracting them from the Pixel’s screen and getting them onto my Mac involved multiple irritating steps, any one of which may have damaged the bits. So I didn’t repeat the exercise for the other platforms. They’re mostly here to encourage me, should I pursue this further, to find a good clean way to extract this information.

I do note, however, that the “mini” form in the Bluesky Android feed really crushes those poor little pictures down and, for this particular picture, offers no suggestion that it’s big.

The Web version of Bluesky does not preserve my pixels; it coerces the width down to 2000 pixels for the FS version and 1000 for the mini.

Mastodon and Phanpy

Phanpy is an alternate client for Mastodon; I think it’s very good and it’s my daily driver. The table reveals that, in this case, the alternate client pulls in the same images as the official Web client, which is good.

It also reveals that Mastodon preserves the picture’s dimensions but obviously reprocesses it somehow, because the files come out (somewhat) smaller. I wish it didn’t do that. It’s open source; I should peek in and see what it actually does.

Phanpy does a better job of actually showing the pictures in-feed than the official Mastodon client, and both are nicer than Bluesky.

Threads

I had difficulty here: the Threads Web client is a tangly JavaScript fever dream, so it’s really hard to get at the underlying photos. But my efforts suggested that it uses the same picture for the “mini” and “FS” versions, just getting the browser to scale it down.

Furthermore, Threads doesn’t want pictures to be more than 2160 pixels wide.

Maybe I’m wrong

Because the experimental work was manual, and thus highly prone to fumblefingers and brain farts. If you think any of these numbers are wrong, you may be right; please yell at me.

I hesitate to offer conclusions, because this is, as noted at the top, the first step in what could be a large and interesting research project, one that I probably don’t have the expertise to conduct. But here are a few anyhow.

First, they all do a pretty good job. Second, none of them actually offer an opportunity to view my bits exactly as uploaded, which I think they should. Third, client designers should follow Phanpy’s lead in figuring out how to make better use of screen real-estate to highlight images.

What Lewis Carroll Said

And I quote: “What is the use of a book,” thought Alice, “without pictures or conversations?”
