> But no need to wait. At a high level, Gas City is the answer to all your problems. Ha! At least, for certain classes of problem, such as, “How can I bring AI into my company and pass an audit trail,”
The important audit at my company is conducted by the FDA.
I have a feeling when they ask what processes we followed to mitigate any user harm that could be caused by software changes that "I told an AI-mayor in the form of a cartoon fox what to do and he spit out a bunch of vibecode software written by AI-driven virtual cartoon characters" is not among the answers they want to hear.
I did an induction at some ISO certified company some years back, reading their docs. A good 50% of them contained significant content that basically read:
> the thing must be in the place where it should be
With no further information - e.g., what place, how, when, who facilitates that?
> the person who facilitates it, is the person who facilitates it.
Yea thanks. So their ISO accredited process was basically no process. Would have been way better with a talking fox.
So I feel like humans are capable of just as bad. I'd be interested in what answer the Fox could spit out, and I kinda wonder where it might fit on the bell curve of all non-Gas-Town "auditable" processes. I'm all for skepticism, but I feel like it would be more tangible if we criticised the actual response instead of dismissing it as "definitely awful" because it happens to sit on top of a generated stack.
I mean: I don't want it to work, but maybe we're not as good as we think we are, or the stuff we rate as super important is actually way less important with a generated context. As much as I love good code, the thought that gnaws at the back of my head is the truism that some of the most profitable code in history has been some of the "worst" code (e.g. MySpace's janky code base on top of ColdFusion, or Twitter's "Fail Whale" era).
So I'm happy that someone is exploring this space in an open way. I'm just glad I'm not the one finding that out with my face first.
Which ISO certification matters, but the key thing people should be aware of is that the primary value of the certification to customers is that your processes are documented and that deviations are tracked, so that customers can check whether the processes make sense before signing a contract. It's important not to expect the certification itself to guarantee quality.
Ehhh, in my experience compliance auditors are 10 years behind the cutting edge. I still see auditors who don't understand Kubernetes and so ask the same questions they would about on-prem machines. They don't know the questions to ask to get to the real meat of the risks. This leads them to allow things through that probably deserve more scrutiny. I bet the same thing will happen with LLM tools like this. They'll just ask if you use PRs and wave you on through.
If it's all obviously shit then it shouldn't be that hard. Maybe point Claude at it and ask it to find the most stupid stuff, which you can then manually verify as being wtf.
My point is that just calling him names has no substance, but mocking his source specifically does.
> We're supposed to be engineers. Criticising a concept based on conjecture and insult is unbecoming of our culture.
The entire thing is nothing but conjecture. No real software has been produced by the concept to date, except more garbage software that takes hundreds of thousands of lines where a few thousand would do.
And to be clear, Beads and Gastown are unbecoming of our “culture” and any self respecting engineer would recoil in horror at the concept.
There are some good ideas in the tools. I hope our culture has room left for curiosity and exploration in that way. I also highly doubt these will catch on over one-shotting and Jira though. Here’s one quick thought from each:
Beads keeps the issue tracker state in git. This can only work if you don’t have a PR/Review-Gate to submitting patches (to file bugs/issues/etc) but I’ve found it unexpectedly helpful for personal projects. Helpful enough to entertain the idea in other contexts.
Gastown uses an AI Agent _as the orchestrator_, and to kick stalled agents, and, that’s such an obvious thing in hindsight I can’t believe I hadn’t thought of it. I have adopted this in a few other contexts now.
How is it conjecture when you just admitted you're aware of a repo with hundreds of thousands of lines of code?
Your argument betrays laziness or a skill issue. You abandon the possibility of proof to sling mud.
People have made a shit ton of economic value with shit code over the course of the history of software, and now that's accelerating with this sort of shite. I appreciate it's ugly, but I will not follow you in fashioning a duvet out of arrogance and throwing rocks at people investigating the ugly.
What they're doing is at least an interesting spectacle, and I'll wait for the show to end before writing my review.
What it means is that it is easy to shit on other people's work. Much harder to give constructive criticism - especially on what looks like a throwaway account.
It’s not “other people’s work” because Steve didn’t do any work. He vibe coded hundreds of thousands of lines that don’t do what they’re supposed to with many thousands of lines of documentation that are inaccurate at best and aspirational at worst. He wrote some blog posts and got them picked up by vapid outlets that had nothing else to add to boost his exposure.
Case in point: no one talks about beads or gastown on HN because it’s crap that no one uses. Even *claw and that dumb fad get more mileage. meanwhile, CC vs Codex is an ever ongoing battle and Anthropic employees announce policy changes in “Tell HN” posts which stay on the front page for days.
If you actually follow the links you posted, you will see that he didn't create a meme coin and didn't rug pull it. Someone else made the coin and set it up so he got transfer fees.
Does Yegge really think that building production software this way is a good idea?
Let's assume that managing context well is a problem and that this kind of orchestration solves it. But I see another problem with agents:
When designing a system or a component we have ideas that form invariants. Sometimes the invariant is big, like a certain grand architecture, and sometimes it's small, like the selection of a data structure. Eventually, though, you want to add a feature that clashes with that invariant. At that point there are usually three choices:
* Don't add the feature. The invariant is a useful simplifying principle and it's more important than the feature; it will pay dividends in other ways.
* Add the feature inelegantly or inefficiently on top of the invariant. Hey, not every feature has to be elegant or efficient.
* Go back and change the invariant. You've just learnt something new that you hadn't considered and that puts things in a new light, and it turns out there's a better approach.
Often, only one of these is right. Usually, one of these is very, very wrong, and with bad consequences.
But picking among them isn't a matter of context. It's a matter of judgment, and the models - not the harnesses - get this judgment wrong far too often (they go with what they know - the "average" of their training - or they just don't get it). So often, in fact, that mistakes quickly accumulate and compound, and after a few bad decisions like this the codebase is unsalvageable. Today's models are just not good enough (yet) to create a complete sustainable product on their own. You just can't trust them to make wise decisions. Study after study and experiment after experiment show this.
Now, perhaps we make better judgment calls because we have context that the agent doesn't. But we can't really dump everything we know, from facts to lessons, and that pertains to every abstraction layer of the software, into documents. Even if we could, today's models couldn't handle them. So even if it is a matter of context, it is not something that can be solved with better context management. Having an audit trail is nice, but not if it's a trail of one bad decision after another.
I think a lot of it comes down to the training objective, which is to fulfill the user’s request.
They have knowledge about how programs can be structured in ways that improve overall maintainability, but little room to exercise that knowledge over the course of fulfilling the user’s request to add X feature.
They can make changes which lead to an improvement to the code base itself (without adding features); they just need to be asked explicitly to do so.
I’d argue the training objective should be tweaked: before implementing, stop to consider the absolute best way to approach it - potentially making other refactors to accommodate the feature first.
Beads is cool, but I tried to use it, and the backend didn't really make sense. I have to run an SQL database in the background? How does it sync with Git? (I didn't see any files/objects committed to the repo.) Plus, Dolt ended up using a constant 3-30 kB/s of I/O in the background while nothing was actually going on. And Beads has a lot of features I'm not gonna use. All of this was just too complicated for my tiny brain.
So I slapped together my own Beads implementation (https://codeberg.org/mutablecc/dingles) over a day or two. It probably has bugs, I'm sure there are race conditions if you tried to use it with Gas Town, and it likely does not scale. But it has the minimum functionality needed to create and track issues and sync them (locally and remotely, either via a normal merge or a dedicated ticket branch). No SQL, no extra features, just JSONL and Git. I threw a whole large software project at it, and the AI took to it like a duck to water: it used it to make epics for the whole project and methodically worked through them all, dependencies first, across multiple context sessions. The paradigm of making tools the AI wants to use is clearly a winner.
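For anyone wondering what "just JSONL and Git" can look like, here's a minimal sketch of the idea (hypothetical code, not the actual dingles implementation; the `issues.jsonl` name and field layout are my assumptions). The log is append-only, so closing an issue is just appending an updated record, and syncing is an ordinary `git add`/`git commit`/`git merge` of one file:

```python
import json
import uuid
from pathlib import Path

TRACKER = Path("issues.jsonl")  # one JSON object per line, append-only

def create_issue(title, deps=()):
    # Each issue is a task node; deps are ids that must close first.
    issue = {"id": uuid.uuid4().hex[:8], "title": title,
             "deps": list(deps), "status": "open"}
    with TRACKER.open("a") as f:
        f.write(json.dumps(issue) + "\n")
    return issue["id"]

def load_issues():
    # The last record for a given id wins, so an "update" is an append.
    issues = {}
    for line in TRACKER.read_text().splitlines():
        rec = json.loads(line)
        issues[rec["id"]] = rec
    return issues

def close_issue(issue_id):
    rec = dict(load_issues()[issue_id], status="closed")
    with TRACKER.open("a") as f:
        f.write(json.dumps(rec) + "\n")

def ready_issues():
    # An issue is workable once all of its dependencies are closed -
    # this is the "dependencies first" ordering an agent can follow.
    issues = load_issues()
    return [i for i in issues.values() if i["status"] == "open"
            and all(issues[d]["status"] == "closed" for d in i["deps"])]
```

Append-only JSONL is also friendly to git: concurrent branches mostly append distinct lines, so merges tend to be trivial.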
Glad I'm not the only one using Gastown as a space heater. I filed an issue here and hope to find some time this week to research more: https://github.com/dolthub/dolt/issues/10849
I did the same thing, and had the same experience, but it also gave more of an appreciation for Beads' architecture. I think the weird redundancies actually made a lot of sense when I began to understand some of the edge cases where agents crap the bed.
Serious question - there's a lot of fluff talking about Gas Town, but has Gas Town shipped something in public that can be evaluated without all of this surrounding hype and blog-posting?
At this point it should be clear that Gas Town has done something we can evaluate the value of.
>At this point it should be clear that Gas Town has done something we can evaluate the value of.
I see this sentiment often, repeated a couple times in here, but I don't understand why on earth that would be the case. Gas Town was released a little over three months ago. It's an ongoing open-source experiment at the bleeding edge of vendor-agnostic multi-agent orchestration.
I was using gastown for fire-and-forget prototyping of larger projects. It was flaky and scorches tokens, but it was able to get larger prototypes done than I could with a single instance of my daily driver (Claude) alone.
You can always fire it up yourself and see what it's all about. In my experience it generates a lot of code very quickly; that code is probably only ever supposed to be maintained by an LLM, not by people.
I don't think the op meant Gas Town itself (if they did, my bad), but what has Yegge done with Gas Town? By now it should have released some amazing thing if Gas Town increases productivity so much.
What has Yegge done with Gas Town? Well for one, he has posted a bunch of blog content about it which has generated chatter like this and increased his geek mindshare.
Just because he's operating in the realm of smart nerds doesn't mean he is immune to the value-inverting effects of social media.
If the post doesn't have any use cases proving value, then perhaps this is something yet to be validated - i.e., the burden falls on the users, not the creators.
In an era where creating such libraries is much cheaper than validating that they're useful or work, yeah you really should validate it before you expect someone to use it. Nobody is going around trying out every slop project they see, they'd be wasting hours and hours for no gain at all.
All this being said, I do find the idea interesting, but I heeded its advice when it said it's hideously expensive and risky to use. So yes, I do want someone braver, richer, and stupider than me to take the first leap.
You’re very good at this! I have trouble slopping out more than a day or two!
Treat this like art. There are some neat ideas, maybe not executed particularly well. Somewhere around 7/10 IMDB score. The working implementation makes the blog post more impactful more than the other way around.
If it was art, I would find it really quite neat. However it doesn't seem intended as such:
> Gas Town “just works.” It does its job, it has tons of integration points, and it has been stable for many weeks. People are using it to build real stuff.
This is my experience as well. At work, our team is 50/50 on 'mastery' of current AI tools. All of us using parallel agentic workflows have our own flavor of tooling. I'm not convinced there's an agreement yet on what the 'ideal' is here, so experimentation is where it's at. Over-indexing on a massively complex system like Gastown for professional work seems unwise. Lots of us have used it for fun at home though.
> Having spent six weeks or so using Gas Town across multiple simultaneous projects, I believe I can describe the shift concretely. The bottleneck migrates from coding speed to the rate at which you can generate ideas, write specifications, and validate outputs. You are no longer limited by how fast you can build. You are limited by how fast you can think.
Interesting:
> Kubernetes asks “Is it running?” Gas Town asks “Is it done?” Kubernetes optimizes for uptime. Gas Town optimizes for completion.
I’m not sure I find the testimony of a Bain & Company AI consultant (https://www.bain.com/our-team/eric-koziol/) to be compelling for anything outside of generating fees.
This sounds like every LLM workflow, which is 'you tell the LLM what you want'.
The real distinction is of scale - whether you want a REST endpoint or a fully functional word processor.
But real, actual, complex software is at least half spec (either explicit, or implicitly captured by its code). The question is: can LLMs, with Gas Town, specify software thoroughly enough that you get something functioning?
You provided a quote from someone who seems to be an AI-boosting influencer who claimed to use it, but where's the output in the form of code we can look at, or in the form of an app someone can use today?
I'm not an AI-denier. I use LLMs and agentic coding. They increase my productivity.
...but there is still a very real problem with people claiming that some new way of using AI is earth shattering, and changes everything based on vague anecdotes that don't involve a tangible released output that they can point to.
Yeah if this can truly just autonomously make great software, then where is all the new SaaS that is able to undercut incumbents by charging 10-20% of what they are charging?
I loved Beads, but kept running into issues because it is so git-heavy. One: not every system and project I work on uses git. Two: sometimes I'd switch branches, and that would screw up Beads' state entirely. Three: at least as of when I last used it, there's no safety net; Claude would close a bead without validating anything.
I wound up building my own with Claude, I made it SQLite first, syncs to GitHub, can pull down from GitHub, and I added "Gates" to stopgap Claude or whatever agent from marking things complete if they've not been: compiled, unit tests run, or simple human testing / confirmation. The Gates concept improved my experience with Claude, all too often it says it finished something, when in fact it did not. Every task must have a gate, and gates must pass before you can close a task. Gates can be reused across tasks, so if "Run unit tests" is one gate, you can reuse it for every task, when it passes, it passes for that one task <-> gate combination.
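The gates idea above can be sketched with a couple of SQLite tables (a hypothetical schema for illustration, not the parent poster's actual tool): a pass is recorded per (task, gate) pair, so one gate like "Run unit tests" can be reused across every task, and a close attempt reports exactly which gates are still unresolved:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT,
                    status TEXT DEFAULT 'open');
CREATE TABLE gates (id INTEGER PRIMARY KEY, description TEXT);
-- one row per (task, gate) combination; 'passed' is tracked per pair,
-- so the same gate can be attached to (and pass for) many tasks
CREATE TABLE task_gates (task_id INTEGER, gate_id INTEGER,
                         passed INTEGER DEFAULT 0,
                         PRIMARY KEY (task_id, gate_id));
""")

def close_task(task_id):
    # Refuse to close while any attached gate is unresolved; return the
    # pending gate descriptions so the agent sees what still blocks it.
    pending = db.execute(
        "SELECT g.description FROM task_gates tg "
        "JOIN gates g ON g.id = tg.gate_id "
        "WHERE tg.task_id = ? AND tg.passed = 0", (task_id,)).fetchall()
    if pending:
        return [row[0] for row in pending]
    db.execute("UPDATE tasks SET status = 'closed' WHERE id = ?", (task_id,))
    return []
```

The point of returning the pending descriptions (rather than just failing) is that the agent's natural next move is to go read the task and resolve the gate, which is exactly the behavior described above.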
Anyway, I'm happy for Beads, Gas Town not so much my wheelhouse on the other hand.
I have not, and at this point its too late for me to do so, I've already invested in my project that does what Beads did for me with some features I really wanted.
How did you implement gates? Are they simply tasks Claude itself has to confirm it ran, or are they scripts that run to check that the thing in question actually happened, or do they spawn a separate AI agent to check that the thing happened, or what?
Claude (or whatever agent) gets a message when it tries to close a task, telling it which gates are not yet resolved, at which point the agent will instinctively want to read the task. I did run into an issue where I forgot to add gates to a new project, so Claude smooshed over it by making a blanket gate, but I have otherwise never had an issue: when I define what the gate is, Claude usually honors it. I haven't worked on big updates recently, but I noticed other tools like rtk (Rust Token Killer) will add their own instructions to your Claude instructions.md file, so I think I need to craft one to tack on with sane instructions, including never closing tasks without having the user create gates for them first.
In a nutshell, a gate is an entry in the DB with arbitrary text, and Claude is good about following whatever it says. Trying to close a task forces Claude to read it.
Life's gotten slightly busy, but you can see more on the repo. I've been debating giving it a better name, I feel like GuardRails implies security, when the goal is just to validate work slightly.
I suppose. I mean, the LLM is still reading it; the issue is that Beads gives the model a task, and then the model finishes and never checks anything. I kept running into this repeatedly: sometimes I'd go to compile the project after it said "hey, I finished" and it wouldn't compile at all, whereas if it had just tried to build the project, it would have caught that.
I built something similar with verifiable gate tasks. The agent has a command to mark the task as done, which runs the bash script; if it passes, the task closes, and if it doesn't, the failure information is appended to the task description for the agent's next attempt at the task.
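A script-backed gate along those lines might look like this (a sketch under my own assumptions about the task shape, not the parent poster's actual code): the gate is a shell script, exit code 0 closes the task, and any other exit code feeds the script's output back into the task description.

```python
import subprocess

def try_close(task):
    # task is assumed to be a dict like:
    #   {"description": str, "gate_script": path, "status": "open"|"closed"}
    result = subprocess.run(["bash", task["gate_script"]],
                            capture_output=True, text=True)
    if result.returncode == 0:
        task["status"] = "closed"
    else:
        # Append the failure output so the agent's next attempt at the
        # task sees exactly what went wrong.
        task["description"] += "\n\nGate failed:\n" + (result.stderr or result.stdout)
    return task
```

Appending the failure into the description is the interesting bit: the feedback loop lives in the task itself, so any agent that picks the task up later inherits the context of previous failed attempts.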
Gas Town has always struck me as more of a performance art piece than a tool that was actually meant to be used, even among the recent hyped AI projects. If you’re using it for real, what are you using it for?
Gas Town really feels not just vibe coded but also vibe designed.
I looked into it, to see whether multi agent setups really made a difference, the entire design philosophy feels like it was « let’s add one more layer of agent and surely this time it will work » about 10 times in a row.
So now you have agents of type mayor, polecats, witnesses, deacons, dogs, etc., plus a slew of unneeded constructs with incomprehensible names.
In one of the blog posts for gas town I remember reading something by the author along the lines of « it’s super inefficient, but because you burn so many tokens, you still get what you want at the end! » Clearly this is also the design philosophy behind this project: just (get your AI to) throw more random abstractions and more agent types at it until you feel like it kinda works, and don't bother asking yourself if they actually contribute anything.
This gave me the very clear feeling that most of the complexity of gas town is absolutely not needed and probably detrimental.
Ended up building my own thing that is 10x simpler, just a simple main agent you talk to, that can dispatch subagents, they all communicate, wake each other up and keep track of work through a simple CLI. No « refinery » or « wasteland » or « molecule » or « convoys » or « deacons » or …
> Ended up building my own thing that is 10x simpler, just a simple main agent you talk to, that can dispatch subagents, they all communicate, wake each other up and keep track of work through a simple CLI. No « refinery » or « wasteland » or « molecule » or « convoys » or « deacons » or …
You won't get 10k stars and a blog post out of that. Obviously you need some Stoats who have Conferences with the Stump Lord to determine whether they are needed at the Silo or the Bilge. They'll then regroup at the appropriate Decision Epicenter and delegate to the Weasels and Chipmunks who actually do the coding (antiquated term) in the Salt Mine.
Nice timing. I was just noting that beads in an old repo, just ... worked. Updates worked, I didn't have super weird errors to track down... I was like "nice!" Beads bumping to 1.0 is great. I haven't used gas town in a month or so, but a stable gas town sounds very valuable.
I think Yegge's instinct is right: a programmable / editable coordination layer (he calls this gas city) is a great idea. Gas Town's early days were definitely a wild experience in terms of needing to watch carefully lest your system be destroyed, and then I put that energy into OpenClaw - I'll probably spin up Gas City and see what it can do soon though. Very cool.
My experience is that Agentic Coding can legitimately get you mostly-working software. You do, however, still need to spend a few days grokking, validating, and usually nudging/whacking it to conform to the shape you intended vs. what the agent inferred.
It is pretty magical to go from brainstorming an idea in the evening, having ChatGPT Pro spit out a long list of beads to implement it, leaving it running over night in a totally empty repo and waking up to a mostly-implemented project.
It's magical until you start going through the code carefully, line by line, and find yourself typing at the agent: YOU DID WHAT NOW? Then, when you read a few more lines and realise that neither AI nor human will be able to debug the codebase once ten more features are added you find yourself typing: REVERT. EVERYTHING.
yes, this is an issue i see too... also fixing it up takes a lot of time (sometimes more than if i'd just 'one-shotted' it myself)... idk, these tools are useful, but i feel like we are going too far with 'just let the ai do everything'...
I tried Beads and it kept breaking in such frustratingly random ways that I just added a Linear MCP server and called it a day. That's really all you need.
I've been using Beads for 5 different projects, and Beads and/or Dolt failures have been a regular thing. Its own "doctor" feature is sort of disturbing, in that it (1) tells me that my Beads setups are always at least a little bit broken, but (2) can never fix all of the issues. Hopefully the 1.0 designation means that Steve is out of "throw shit at the wall" mode. Beads is fine as a replacement for Markdown files, but I'll never go near Gas Town because of my experience with it.
Agreed. I kinda concluded that the expected `doctor` usage is to have an agent run it for you and then they can try to figure it out when `doctor` can’t fix the issues.
I tried both Beads and Gas Town and had the same experience.
These fully vibe coded tools seem to have near zero QA. The fact that they ship with a `doctor` command that you regularly need to run (even if you didn’t change anything about your environment) tells you all you need to know.
I searched on google about the cost of running Gas Town. The Gemini AI response claimed Gas town costs $100 / hour and can spit out 4000 lines of code per hour, so Gas Town costs 2.5 cents per line of code.
I tried tracking down where those numbers came from and the sources were a bit sketchy. Can anybody who has used Gas Town confirm those numbers, or report their personal numbers?
I think the point is that we are given no real idea at all, not even some ballpark figures, about how much actual money actually needs to be spent in order to run Gas Town. The original article had some handwavy stuff about how it cost An Awful Lot, but no actual numbers.
Cost per line of code is not an amazing metric, but at least it's an attempt to come up with a figure.
(I would also be interested to find out how much it costs to run...)
Well, yes ideally we would eventually also have metrics about error rates and reject rates. Like ideally at some point someone could do a study of "for every 100 PRs Gas Town generates, how many are accepted after code review and how many are rejected" or "for every 100 lines of code Gas Town generates, how many coding errors are detected by human reviewers".
Unfortunately I think things are moving so fast that by the time such a study was done, we would already be on to newer models and newer versions of gas town.
I've experimented quite a lot with multi-agent setups and orchestrations.
In the end, it didn't feel worth it, mostly because of high token overhead (inter-agent communication, agents re-reading the same code, etc.) and synchronization/cooperation issues (who should do what).
What actually works for me and provides good results: multi step workflows with clearly defined steps and strong guidance for the agent.
I'm pretty excited about agentic coding myself, but this does appear to be an extended AI psychosis (I'm not super comfortable with this phrase, but it is becoming pretty recognisable).
I think he's boxed himself in by continually layering more complexity on his approach, rather than stepping back and questioning the fundamentals or the overall direction.
All of the steps Gas Town or Gas City etc are taking are towards reducing human oversight and control. This is profoundly misguided! In a world of infinite cheap software it is precisely this human decision making and control that matters.
> There will be nothing like it. You are going to want to use Gas City.
No. I do not want to talk to the mayor of my software factory, as its cartoonish minions build an infinite mountain of slop. Unreviewable, both in terms of code and the finished product.
Instead, I want to precisely capture human ideas, have those ideas questioned, challenged, improved, and then I want to bring those ideas to life, keeping the human in the loop whenever they want. Neither Beads, Gas Town, nor Gas City or anything like them are required for that.
> All of the steps Gas Town or Gas City etc are taking are towards reducing human oversight and control. This is profoundly misguided!
I mostly agree with you other than this one sentence.
Let's say you're building the literal antithesis of Gastown: AI agent software specifically designed to be human-reviewed and monitored. How do you make this as efficient as possible? By ensuring high-quality results from human oversight and control while spending the least amount of time on it. Which, to be precise, is still reducing human oversight and control per unit of useful work done.
I cannot really get behind Gas Town or any other “agent swarm” setup. They always seem to waste an incredible amount of tokens on passing the buck around as half-finished specs, and even with a healthy amount of tokens pre-allocated they burn money faster than setting my wallet on fire…
The climate crisis is primarily a consequence of fossil fuels, not necessarily energy demand. I feel like it's a poor conflation, though it may hold depending on where the datacentre is based and what power source feeds it.
If we're keeping gas and coal plants online to power this or using gas generators to power data centers, I'd consider that a wasteful contributing factor.
but we could also argue that not investing in extra non-fossil fuel capacity is the issue here. OR not investing in more research on super conductors and/or storage. Iceland could possibly export considerable amounts of renewable energy if we licked those problems.
I mean, under the same logic couldn't we kinda argue that TV has ruined the planet? A lot of energy for something of debatable physical value. OR Motor racing, football, The Olympic games? All that energy and waste just to find out who can throw a stick the furthest every four years.
> I’ve been saying since last year that by the end of 2026, people will be mostly programming by talking to a face. There’s absolutely NO reason to type with the Mayor. You should be able to chat with them like a person. You’ll have a cartoon fox there onscreen, in costume, building and managing your production software, and showing you pretty status updates whenever you ask for one. This is the end state for IDEs.
This is a desirable end state for highly social but perhaps slightly sociopathic extroverts who want to spend all day talking even though they aren't talking to a person.
For anyone else, it's hard to imagine considering that a desirable way to spend eight hours a day.
>This is a desirable end state for [a category of people] who want to spend all day talking even though they aren't talking to a person.
When I am not in actual meetings, I do already spend all day talking to anthropomorphized facets of my personality that represent software architecture, security paranoia, operational practicality, user experience, etc. Often not by speaking aloud, but it's a conversation nonetheless.
So yes, this sounds absolutely grand.
EDIT: But I don't think it should be forced on everyone! Having the option to use the tools that work best for you should be the goal.
Set a budget. Fund an openrouter account with the max you can stomach spending on this test and give it a shot.
At least, that’s what I would do, if I had any interest in testing out gastown with my own money. If my employer wants to pay for the testing, that’s another question entirely.
No. He used to be good with ideas. Now he's drinking too much of his own poison. His blog posts are bloviated monstrosities incapable of coherently describing the very trivial ideas contained in them.
Also I chuckled at the AI-generated "The Overseer is Alnays Right | Vacation Approved" poster in the background of the split image where the mayor is reading so you can supervise. This has strong Boondocks Catcher Freeman vibes. I want to hear the polecats/badgers' version.
I feel Gastown is an attempt at answering: what if I push the multi-agent paradigm to its chaotic end?
But I think the point that Yegge doesn't address, and that I had to discover for myself, is this: getting many agents working in parallel on different things -- while cool and exciting (in an anthropomorphic way) -- might not actually be solving the right problem. The bottleneck in development isn't workflow orchestration (what Gastown does) -- it's problem decomposition.
And Beads doesn't actually handle the decomposed problem well. I thought it did, but all it is is a task-graph system. Each bead is a task, and agents can just pick up tasks to work on. That looks a lot like an SDE picking up a JIRA ticket, right? But the hard part is embedding just enough context in the task that the agent can do it right. Often there isn't enough, so the agent has to guess the missing context, and it often produces plausible code that is wrong.
Decomposing a goal into smaller slices is really where a lot of the difficulty lies. You might say, "I can just tell Claude to write Epics/Stories/Tasks, and it'll figure it out," right? But without something grounding it, like a spec, Claude doesn't do a good job. It won't know exactly how much context to provide to each independent agent.
What I have found useful is spec-driven development, especially of the opinionated variety that Kiro IDE offers. Kiro IDE is a middling Cursor, but an excellent spec generator -- in fact one of the best. It generates 3 specs at 3 levels of abstraction: a Requirements doc in EARS/INCOSE (used at Rolls-Royce and Boeing for reducing spec ambiguity), then a Design doc (commonly done at FAANG), and then a Task list, which cross-references the sections of the requirements/design.
This kind of spec hugely limits the degrees of freedom. The Requirements part of the spec actually captures intent, which is key. The Design part mocks interfaces, embeds glossaries, and also embeds PBTs (property-based tests using Hypothesis -- maybe eventually Hegel?) as gating mechanisms to check invariants. The Task list is what Beads is supposed to do -- but Beads can't do a good job because it doesn't have the other two specs.
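For readers who haven't seen property-based tests used as gates, here's a minimal sketch of the shape such a gate might take. The `slugify` function and its two invariants are invented for illustration -- they are not anything Kiro actually generates:

```python
# Illustrative sketch of a property-based test used as a spec "gate".
# The slugify function and its invariants are assumptions for this
# example, not Kiro output.
import re
from hypothesis import given, strategies as st

def slugify(text: str) -> str:
    # Lowercase, collapse runs of non-alphanumerics into single
    # hyphens, and strip leading/trailing hyphens.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

@given(st.text())
def test_slug_is_url_safe(text):
    # Invariant from the (hypothetical) Requirements doc: output
    # contains only lowercase alphanumerics and hyphens.
    assert re.fullmatch(r"[a-z0-9-]*", slugify(text))

@given(st.text())
def test_slug_is_idempotent(text):
    # Invariant: slugifying an existing slug changes nothing.
    assert slugify(slugify(text)) == slugify(text)
```

A task can then be gated on tests like these passing, which automatically catches a whole class of "plausible but wrong" agent output that a task description alone never would.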
I've deployed 4 products now using Kiro spec-driven dev (+ Simon Willison's tip "do red/green tdd") and they're running in prod and so far so good. They're pressure-tested using real data.
Spec-driven development isn't perfect, but I feel its aim is the correct one -- to capture intentions, to reduce the degrees of freedom, and to constrain agents toward correctness. I tried using Claude Code's /plan mode, but it's nowhere near as rigorous, and there's still spec drift in the generated code. It doesn't pin down the problem sufficiently.
Gastown/Beads are solutions to the workflow-orchestration problem (which is exciting for tech bros), but at its core that's not the most important problem. Problem decomposition is.
Otherwise you're just solving the wrong problem, fast.
Getting sucked into a crypto scam and then deciding to get out, despite the death threats(!)[1], is not a rug pull.
To be clear, the BAGS scam coin he got sucked into is an extractive zero-sum game: someone else creates a coin named after him, offers him trading commission to talk about it, and then makes money off the hype.
He did the correct thing by leaving.
(I worked for a bit at a Web3 place. Went in with an open mind and now have opinions)
> someone else creates a coin named after him, offers him trading commission to talk about it and then makes money off the hype
And we are supposed to believe that someone deep in tech, in 2026, did not know this was going to be the end goal? Was $GAS supposed to be a crypto to help fund poor farmers in Burundi or something? How else is meme coin #16352813 supposed to end? That's the entire point of meme coins.
Would love to also "get sucked" into making $300k.
> Would love to also "get sucked" into making $300k.
Exactly.
It seems harmless - "look, we have this token, it is being traded anyway, do you want to get some of the emissions?"
Who wouldn't say "yes" to that free money?
But it's not clear at all how corrupting the crypto scam is, and how subtly the corruption seeps into what you do. It starts with "oh, can you just tweet about this," and you think "sure -- it's just a tweet," and it slowly grows.
Steve Yegge deserves credit for walking away from it.
yeah, it's kind of sad, because people then have to re-evaluate others whose apologies they didn't believe at the time
like the Hawk Tuah girl, or the Enron-relaunch long-form comedy routine that wound up with a short-lived crypto token, and pretty much anyone with 15 minutes of fame, or celebrities that drop a contract address
for the most part, they themselves are actually the victims of a roving band of deployers running the crypto launch, convincing them they're part of something -- and of course, the consumers have the choice of never getting involved
but the deployers are the ones that should face some form of accountability, or at least the public eye
The description on Wikipedia looks like somebody else created a memecoin in his honor, sent him the profits, and he accepted them? And the only people harmed were people who invest in random memecoins? I don't understand the problem.
To me, no, not quite. I'll give him one free pass. It's more "I'll coast on this pulled rug to see what happens" than that he did the rug pull himself. Not a very wise thing to do, but not malicious either.
I'm a long-time Steve Yegge fan but a major Gas Town hater (now Gas City too, I guess). It's doubling down on all the wrong metaphors.
I also simply detest how Gas Town is modeled fundamentally on an extractive and destructive metaphor, the 19th century factory. I want to live in a verdant software garden, not a dystopian industrialist hellscape.
In my view the StrongDM guys are on the right long-term path.
The important audit at my company is conducted by the FDA.
I have a feeling when they ask what processes we followed to mitigate any user harm that could be caused by software changes that "I told an AI-mayor in the form of a cartoon fox what to do and he spit out a bunch of vibecode software written by AI-driven virtual cartoon characters" is not among the answers they want to hear.
And those cartoon foxes didn't even do anything! I guess these ones do?
Don't put it past the masses. These are crazy times.
https://poignant.guide/book/chapter-3.html
> the thing must be in the place where it should be
With no further information e.g. what place, where, how, when, who facilitates that?
> the person who facilitates it, is the person who facilitates it.
Yea thanks. So their ISO accredited process was basically no process. Would have been way better with a talking fox.
So I feel like humans are capable of just as bad. I'd be interested in what answer the Fox could spit out, and I wonder where it might fit on the bell curve of all non-Gas-Town "auditable" processes. I'm all for skepticism, but I feel it would be more tangible if we criticised the actual response instead of dismissing it as "definitely awful" just because it happens to sit on top of a generated stack.
I mean: I don't want it to work, but maybe we're not as good as we think we are, or the stuff we rate as super important is actually way less important in a generated context. As much as I love good code, the thought that gnaws at the back of my head is the truism that some of the most profitable code in history has been some of the "worst" code (e.g. MySpace's janky code base on top of ColdFusion, or Twitter's "Fail Whale" era).
So I'm happy that someone is exploring this space in an open way. I'm just glad I'm not the one finding that out with my face first.
The sanatorium from American Horror Story Asylum comes to mind.
Dominique, nique, nique…
Where do I even begin to mock that except at the source? That’s just absolute insanity.
My point is that just calling him names has no substance, but mocking his source specifically does.
What does that even mean? Am I supposed to point Claude at garbage code bases? All it will find is garbage.
> My point is that just calling him names has no substance, but mocking his source specifically does.
He is the source. He wrote this stuff under his own volition.
The entire thing is nothing but conjecture. No real software has been produced by the concept to date, except more garbage software that takes hundreds of thousands of lines where a few thousand would do.
And to be clear, Beads and Gastown are unbecoming of our “culture” and any self respecting engineer would recoil in horror at the concept.
Beads keeps the issue-tracker state in git. This can only work if you don't have a PR/review gate for submitting patches (to file bugs/issues/etc.), but I've found it unexpectedly helpful for personal projects. Helpful enough to entertain the idea in other contexts.
Gastown uses an AI Agent _as the orchestrator_, and to kick stalled agents, and, that’s such an obvious thing in hindsight I can’t believe I hadn’t thought of it. I have adopted this in a few other contexts now.
Your argument betrays laziness, or a skill issue. You abandon the possibility of proof to sling mud.
People have made a shit-ton of economic value with shit code over the course of the history of software, and now that's accelerating with this sort of shite. I appreciate it's ugly, but I will not follow you in fashioning a duvet out of arrogance and throwing rocks at people investigating the ugly.
What they're doing is at least an interesting spectacle, and I'll wait for the show to end before writing my review.
Case in point: no one talks about beads or gastown on HN because it’s crap that no one uses. Even *claw and that dumb fad get more mileage. meanwhile, CC vs Codex is an ever ongoing battle and Anthropic employees announce policy changes in “Tell HN” posts which stay on the front page for days.
Could work
Let's assume that managing context well is a problem and that this kind of orchestration solves it. But I see another problem with agents:
When designing a system or a component we have ideas that form invariants. Sometimes the invariant is big, like a certain grand architecture, and sometimes it's small, like the selection of a data structure. Eventually, though, you want to add a feature that clashes with that invariant. At that point there are usually three choices:
* Don't add the feature. The invariant is a useful simplifying principle and it's more important than the feature; it will pay dividends in other ways.
* Add the feature inelegantly or inefficiently on top of the invariant. Hey, not every feature has to be elegant or efficient.
* Go back and change the invariant. You've just learnt something new that you hadn't considered and that puts things in a new light, and it turns out there's a better approach.
Often, only one of these is right. Usually, one of these is very, very wrong, and with bad consequences.
But picking among them isn't a matter of context. It's a matter of judgment, and the models - not the harnesses - get this judgment wrong far too often (they go with what they know - the "average" of their training - or they just don't get it). So often, in fact, that mistakes quickly accumulate and compound, and after a few bad decisions like this the codebase is unsalvageable. Today's models are just not good enough (yet) to create a complete sustainable product on their own. You just can't trust them to make wise decisions. Study after study and experiment after experiment show this.
Now, perhaps we make better judgment calls because we have context that the agent doesn't. But we can't really dump everything we know, from facts to lessons, and that pertains to every abstraction layer of the software, into documents. Even if we could, today's models couldn't handle them. So even if it is a matter of context, it is not something that can be solved with better context management. Having an audit trail is nice, but not if it's a trail of one bad decision after another.
They have knowledge about how programs can be structured in ways that improve overall maintainability, but little room to exercise that knowledge over the course of fulfilling the user’s request to add X feature.
They can make changes which lead to an improvement to the code base itself (without adding features); they just need to be asked explicitly to do so.
I’d argue the training objective should be tweaked. Before implementing, stop to consider the absolutely best way to approach it - potentially making other refactors to accommodate the feature first.
Hang in there, there will be a lot of slop to fix in contract work...
So I slapped together my own Beads implementation (https://codeberg.org/mutablecc/dingles) over a day or two. It probably has bugs, I'm sure it has race conditions if you tried to use it with Gas Town, and it likely does not scale. But it has the minimum functionality needed to create and track issues and sync them (locally and remotely, either via a normal merge or a dedicated ticket branch). No SQL, no extra features, just JSONL and Git. I threw a whole large software project at it, and the AI took to it like a duck to water: it used it to make epics for the whole project and methodically worked through them all, dependencies first, across multiple context sessions. The paradigm of making tools the AI wants to use is clearly a winner.
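For the curious, the core of such a tool really is small. A rough sketch of the approach -- the file name, record fields, and function names here are my own invention, not dingles' actual format:

```python
# Sketch of a git-backed JSONL issue store in the spirit of Beads.
# File name and record fields are illustrative assumptions.
import json, subprocess, uuid
from pathlib import Path

ISSUES = Path("issues.jsonl")  # one JSON object per line, append-only

def add_issue(title, deps=None):
    issue = {"id": uuid.uuid4().hex[:8], "title": title,
             "status": "open", "deps": deps or []}
    with ISSUES.open("a") as f:
        f.write(json.dumps(issue) + "\n")
    return issue["id"]

def load_issues():
    # Later lines with the same id supersede earlier ones, so a
    # status change is just an appended record.
    if not ISSUES.exists():
        return {}
    return {rec["id"]: rec
            for rec in map(json.loads, ISSUES.read_text().splitlines())}

def close_issue(issue_id):
    rec = dict(load_issues()[issue_id], status="closed")
    with ISSUES.open("a") as f:
        f.write(json.dumps(rec) + "\n")

def ready_issues():
    # An issue is workable once all of its dependencies are closed.
    issues = load_issues()
    return [i for i in issues.values() if i["status"] == "open"
            and all(issues[d]["status"] == "closed" for d in i["deps"])]

def sync():
    # State lives in git, so committing the JSONL file *is* the sync.
    subprocess.run(["git", "add", str(ISSUES)], check=True)
    subprocess.run(["git", "commit", "-m", "sync issues"], check=True)
```

Append-only JSONL with last-write-wins keeps merges mostly trivial for a single agent; the race conditions show up once multiple agents start appending concurrently.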
At this point it should be clear that Gas Town has done something we can evaluate the value of.
I see this sentiment often, repeated a couple times in here, but I don't understand why on earth that would be the case. Gas Town was released a little over three months ago. It's an ongoing open-source experiment at the bleeding edge of vendor-agnostic multi-agent orchestration.
I was using gastown for fire-and-forget prototyping of larger projects. It was flaky and scorches tokens, but it was able to get larger prototypes done than I could with a single instance of my daily driver (claude) alone.
Just because he's operating in the realm of smart nerds doesn't mean he is immune to the value-inverting effects of social media.
Or those of hype, e.g. AI hype.
I imagine it doesn't run very cheaply.
But LLMs are trying to mimic people. So if confusion is the human response, what's to stop the LLM from acting confused?
There should be no shortage of examples the creator could provide, unless of course...
This all being said, I do find the idea interesting, but I heeded its advice when it said it's hideously expensive and risky to use. So yes, I do want someone braver, richer, and stupider than me to take the first leap.
Treat this like art. There are some neat ideas, maybe not executed particularly well. Somewhere around a 7/10 IMDB score. The working implementation makes the blog post more impactful, rather than the other way around.
> Gas Town “just works.” It does its job, it has tons of integration points, and it has been stable for many weeks. People are using it to build real stuff.
I am very confident in saying that most individuals successfully using multiple agents have done so by building their own harness.
Interesting:
> Kubernetes asks “Is it running?” Gas Town asks “Is it done?” Kubernetes optimizes for uptime. Gas Town optimizes for completion.
https://embracingenigmas.substack.com/p/exploring-gas-town
edit: was "is your imagination". Changed to fully match https://genius.com/Zombo-zombocom-lyrics
The real distinction is of scale - whether you want a REST endpoint or a fully functional word processor.
But real, actual, complex software is at least half spec (either explicit, or implicitly captured by its code), the question is, can LLMs specify software to the same degree with Gas Town, that you get something functioning?
You provided a quote from someone who seems to be an AI-boosting influencer who claimed to use it, but where's the output in the form of code we can look at, or in the form of an app someone can use today?
I'm not an AI-denier. I use LLMs and agentic coding. They increase my productivity.
...but there is still a very real problem with people claiming that some new way of using AI is earth shattering, and changes everything based on vague anecdotes that don't involve a tangible released output that they can point to.
Sounds like the typical AI post slop.
I wound up building my own with Claude. I made it SQLite-first; it syncs to GitHub and can pull down from GitHub, and I added "Gates" to stop Claude (or whatever agent) from marking things complete when they haven't been compiled, had unit tests run, or passed simple human testing/confirmation. The Gates concept improved my experience with Claude: all too often it says it finished something when in fact it did not. Every task must have a gate, and gates must pass before you can close a task. Gates can be reused across tasks, so if "Run unit tests" is one gate, you can reuse it for every task; when it passes, it passes for that one task <-> gate combination.
Anyway, I'm happy for Beads, Gas Town not so much my wheelhouse on the other hand.
How did you implement gates? Are they simply tasks Claude itself has to confirm it ran, or are they scripts that run to check that the thing in question actually happened, or do they spawn a separate AI agent to check that the thing happened, or what?
In a nutshell, a gate is an entry in the DB with arbitrary text, and Claude is good about following whatever it says. Trying to close a task forces Claude to read the gate.
Life's gotten slightly busy, but you can see more on the repo. I've been debating giving it a better name; I feel like GuardRails implies security, when the goal is just to validate work a little.
https://github.com/Giancarlos/GuardRails
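For anyone wondering how little machinery the gate idea needs, here's a rough sketch of the shape of it. The table and column names are my guesses at something like this, not GuardRails' actual schema:

```python
# Sketch of the "gates" concept: a task cannot be closed until every
# gate attached to it has passed. Schema is an illustrative guess.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT,
                    status TEXT DEFAULT 'open');
CREATE TABLE gates (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE task_gates (task_id INTEGER, gate_id INTEGER,
                         passed INTEGER DEFAULT 0);
""")

def close_task(task_id):
    # Refuse to close while any attached gate is unpassed.
    (unpassed,) = db.execute(
        "SELECT COUNT(*) FROM task_gates WHERE task_id=? AND passed=0",
        (task_id,)).fetchone()
    if unpassed:
        return False
    db.execute("UPDATE tasks SET status='closed' WHERE id=?", (task_id,))
    return True

# Demo: one task gated on one reusable gate.
db.execute("INSERT INTO tasks (id, title) VALUES (1, 'add feature')")
db.execute("INSERT INTO gates VALUES (1, 'Run unit tests')")
db.execute("INSERT INTO task_gates VALUES (1, 1, 0)")
assert not close_task(1)  # gate unpassed: close is refused
db.execute("UPDATE task_gates SET passed=1 WHERE task_id=1 AND gate_id=1")
assert close_task(1)      # gate passed: close succeeds
```

The agent only ever sees the gate's description text; the refusal on close is what forces it to actually read and satisfy the gate.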
It seems like a lot of coding agent features work that way?
So now you have agents of type mayor, polecats, witnesses, deacons, dogs, etc., plus a slew of unneeded constructs with incomprehensible names.
In one of the blog posts for Gas Town, I remember reading something by the author along the lines of "it's super inefficient, but because you burn so many tokens, you still get what you want at the end!" Clearly this is also the design philosophy behind this project: just (get your AI to) throw in more random abstractions and more agent types until you feel like it kinda works, and don't bother asking yourself whether they actually contribute anything.
This gave me the very clear feeling that most of the complexity of Gas Town is absolutely not needed, and is probably detrimental.
I ended up building my own thing that is 10x simpler: just a simple main agent you talk to that can dispatch subagents; they all communicate, wake each other up, and keep track of work through a simple CLI. No "refinery" or "wasteland" or "molecule" or "convoys" or "deacons" or ...
You won't get 10k stars and a blog post out of that. Obviously you need some Stoats who have Conferences with the Stump Lord to determine whether they are needed at the Silo or the Bilge. They'll then regroup at the appropriate Decision Epicenter and delegate to the Weasels and Chipmunks who actually do the coding (antiquated term) in the Salt Mine.
The Stump Lord is an owl.
I think Yegge's instinct is right: a programmable/editable coordination layer (he calls this Gas City) is a great idea. Gas Town in its early days was definitely a wild experience in terms of needing to watch carefully lest your system be destroyed, and I put that energy into OpenClaw instead. I'll probably spin up Gas City and see what it can do soon, though. Very cool.
It is pretty magical to go from brainstorming an idea in the evening, having ChatGPT Pro spit out a long list of beads to implement it, leaving it running over night in a totally empty repo and waking up to a mostly-implemented project.
Seems like I'm back to obscurity.
:)
These fully vibe coded tools seem to have near zero QA. The fact that they ship with a `doctor` command that you regularly need to run (even if you didn’t change anything about your environment) tells you all you need to know.
I tried tracking down where those numbers came from and the sources were a bit sketchy. Can anybody who has used Gas Town confirm those numbers, or report their personal numbers?
Lines of code per hour is a bad metric.
Can it solve a problem using production quality code that doesn’t take four times as long to review? That sounds like something I would pay $100 for.
Cost per line of code is not an amazing metric, but at least it's an attempt to come up with a figure.
(I would also be interested to find out how much it costs to run...)
Unfortunately I think things are moving so fast that by the time such a study was done, we would already be on to newer models and newer versions of gas town.
In the end, it didn't feel worth it, mostly because of high token overhead (inter-agent communications, agents re-reading the same code, etc.) and synchronization/cooperation issues (who should do what).
What actually works for me and provides good results: multi step workflows with clearly defined steps and strong guidance for the agent.
I think he's boxed himself in by continually layering more complexity on his approach, rather than stepping back and questioning the fundamentals or the overall direction.
All of the steps Gas Town or Gas City etc are taking are towards reducing human oversight and control. This is profoundly misguided! In a world of infinite cheap software it is precisely this human decision making and control that matters.
> There will be nothing like it. You are going to want to use Gas City.
No. I do not want to talk to the mayor of my software factory, as its cartoonish minions build an infinite mountain of slop. Unreviewable, both in terms of code and the finished product.
Instead, I want to precisely capture human ideas, have those ideas questioned, challenged, improved, and then I want to bring those ideas to life, keeping the human in the loop whenever they want. Neither Beads, Gas Town, nor Gas City or anything like them are required for that.
I mostly agree with you other than this one sentence.
Let's say you're building the literal antithesis of Gastown: an AI agent system specifically designed to be human-reviewed and monitored. How do you make this as efficient as possible? By ensuring high-quality results from human oversight and control while spending the least amount of time on it. Which, to be precise, is still reducing human oversight and control per unit of useful work done.
I mean, by the same logic couldn't we argue that TV has ruined the planet? A lot of energy for something of debatable physical value. Or motor racing, football, the Olympic Games? All that energy and waste just to find out who can throw a stick the furthest every four years.
This is a desirable end state for highly social but perhaps slightly sociopathic extroverts who want to spend all day talking even though they aren't talking to a person.
For anyone else, it's hard to imagine considering that a desirable way to spend eight hours a day.
When I am not in actual meetings, I do already spend all day talking to anthropomorphized facets of my personality that represent software architecture, security paranoia, operational practicality, user experience, etc. Often not by speaking aloud, but it's a conversation nonetheless.
So yes, this sounds absolutely grand.
EDIT: But I don't think it should be forced on everyone! Having the option to use the tools that work best for you should be the goal.
At least, that’s what I would do, if I had any interest in testing out gastown with my own money. If my employer wants to pay for the testing, that’s another question entirely.
For one, it's a great way to dismiss any sort of criticism, and it makes the positive comments claiming to have built great stuff with it even funnier.
I think we need to take a hard line with AI stuff like this, and put the onus on the creator to prove these ideas have merit.
Thoughtful critique is of course fine, but there's no need to be personal, and it should be something we can learn from.
https://news.ycombinator.com/newsguidelines.html
[0] https://en.wikipedia.org/wiki/Steve_Yegge#Vibe_coding_and_cr...
[1] https://x.com/Steve_Yegge/status/2043127887059210470
I actually think the naming is apt, because we are in the early stages of a second Industrial Revolution.