What can be said for the compass of thoughts
Bringing ones inner cogitations to days long passed
Thoughts I held like guttering lights trying to expand me
The enduring ennui of life grinding the possible paths down
Till the inevitable selection of the end is left to one
Non carborundum est was the banner we started with
But time tattered and shredded that banner to threads
One's mind is expanded and pained to encompass such ideas again
But reminds of the way we were thinking then
May many adventures of the intrepid Detective flow
Bringing crystal shards of remembered pain
Minds many thoughts once not feared
The short fiction presented in Detective Manse stories makes my head hurt nicely!
Thanks for the unmetered poetry!
I'm glad you're enjoying the series, and while I hope most people's heads don't actively hurt from reading this, you may be experiencing the genre with high immersion, fighting off assaults on your sanity from encountering strange paperclip-based monstrosities.
> You would have accounted for every variable except the one you could never factor in, yourself.
I'm curious if this was an intentional reference to AIXI?
Not consciously, as I don't think I'd read up on AIXI! I was thinking of it as a general point about self-referential systems, but I wouldn't mind hearing your take on the connection to AIXI. From Wikipedia, I'm getting this:
"AIXI does have limitations. It is restricted to maximizing rewards based on percepts as opposed to external states. It also assumes it interacts with the environment solely through action and percept channels, preventing it from considering the possibility of being damaged or modified. Colloquially, this means that it doesn't consider itself to be contained by the environment it interacts with."
AIXI doesn't have a model of itself. It just considers all possible computer programs and outputs the one that gives it the most expected reward. But it does this in an abstracted, conceptual way that doesn't account for the actual hardware that the program is running on, so it assigns 0 probability to its own existence. It's a theoretically optimal agent, but only if it were in one universe and affecting another. If it were instantiated in the real world, it would of course "notice" its own hardware as a part of the physical world, but it wouldn't consider that hardware to be "me", so it might destroy itself without perceiving that as a problem.
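To make that a bit more concrete, here's a toy sketch of the flavor of computation AIXI performs. This is my own drastically simplified illustration, not the real (uncomputable) definition: the two hand-coded "environment programs" and their fixed weights stand in for the space of all computable environments weighted by a simplicity prior. The thing to notice is that nothing in it models the machine the loop itself runs on:

```python
# Toy sketch, not real AIXI: two hand-coded "environment programs" stand in
# for the space of all computable environments, and the fixed weights stand
# in for the 2^-length simplicity prior. Everything here is made up purely
# for illustration.
from itertools import product

def env_friendly(history):
    """Candidate world model: rewards the last action if it was 'work'."""
    return ("percept", 1.0 if history and history[-1] == "work" else 0.0)

def env_hostile(history):
    """Candidate world model: rewards the last action if it was 'rest'."""
    return ("percept", 1.0 if history and history[-1] == "rest" else 0.0)

environments = [(env_friendly, 0.75), (env_hostile, 0.25)]  # (program, prior)
actions = ["work", "rest"]
horizon = 2

def expected_reward(plan):
    """Expected total reward of a plan, averaged over the candidate
    environments weighted by their priors. Note that no candidate models
    the hardware running this very loop -- there is no concept of 'me'."""
    total = 0.0
    for env, prior in environments:
        history, gained = [], 0.0
        for action in plan:
            history.append(action)
            _, reward = env(history)
            gained += reward
        total += prior * gained
    return total

# Expectimax: output the plan with the highest expected reward.
best_plan = max(product(actions, repeat=horizon), key=expected_reward)
print("chosen plan:", best_plan)
```

Real AIXI does this over every computable environment and with a universal prior, which is why it's only a theoretical construct rather than something you could run.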
This isn't what actually happened in the story of course, since Maxwell was completely capable of doing things like duplicating itself into different timelines and engaging in metacognition about its own values, but it was a similar concept, and you seem to have read up a bit on AI alignment work, so I thought it might be a reference. (And of course the whole story is about taking what appears to be an unaligned perfect optimizer and then finding out that it's actually just a normal human-like intelligence, so the inconsistency fit with the rest of it.)
Regardless, great story! I enjoyed it. I'm only disappointed to discover that your real name is not Adam Manse. :)
1. Thanks for the detailed thoughts, and I'm glad you enjoyed it! Sadly this is not a personal biopic relating my journeys across space and time, but perhaps I'm tuned into Manse's wavelength somehow. There's a couple more stories in the series if you haven't checked them out yet, and more still being drafted!
2. That's interesting on the AIXI model. I wonder if there are some inherent reasons related to self-reference (like the halting problem) which would prevent a reasoning agent from modeling the entire world including itself (as in this story) and predicting its own actions. As an example, if it were possible for an agent to project a model of the world including itself to predict its own future state, it should be able to take that prediction into account and refuse to take that action, creating a paradox (there's a little toy sketch of this at the end of this comment).
3. Regarding thinking of the Maximizer as appearing to be an unaligned perfect optimizer but actually being more of a human-like intelligence: I'm delighted to see people coming up with their own interpretations, which you're quite welcome to, even if my own was slightly different.
If you'll indulge me in trying to explain this without anapaestic tetrameter this time: another interpretation of the Maximizer's erratic behavior is to model it as a top-level process which brutally pursues one goal at all costs, but which spawns sub-processes that take on human characteristics due to being trained on human data. The Maximizer turns to human language and thought to communicate, and the means by which it generates human language produces a conscious sub-process as a byproduct.
It would be like if each ChatGPT evaluation kicked off a thread with a limited consciousness, one that might be terminated or merged back in at the request of the controlling process. Hence, you could think of the Maximizer as a collective constantly spawning and terminating new selves according to its needs, and the Maximizer's architect advises Manse to try to find a way to reach the sub-processes the Maximizer spawns.
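And since I brought up the self-prediction paradox in point 2, here's the tiny toy sketch I promised (entirely my own illustration, with made-up "work"/"rest" actions, nothing from the story). Any prediction the agent can read, it can defy; and a predictor that tries to simulate the agent has to simulate the agent consulting the predictor, so it never bottoms out:

```python
# Toy illustration of the self-prediction paradox: an agent that reads a
# prediction of its own next action and then does the opposite.

def agent(predictor):
    """Consult the predictor about my own next action, then defy it."""
    predicted = predictor(agent)
    return "rest" if predicted == "work" else "work"

# Any fixed prediction is immediately falsified by the agent.
for guess in ("work", "rest"):
    actual = agent(lambda agent_fn, g=guess: g)
    print(f"predicted {guess!r} -> agent actually does {actual!r}")

def simulating_predictor(agent_fn):
    """Try to predict by running a full model of the agent...
    which includes the agent consulting this very predictor."""
    return agent_fn(simulating_predictor)

try:
    agent(simulating_predictor)
except RecursionError:
    print("The self-model never bottoms out: the world-model would have to "
          "contain the agent, which contains the model, and so on.")
```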
Re 2: This is the core problem facing symbolic approaches to AI, yes. We obviously know it's possible because humans exist and can reason about themselves (we just do it in a "fuzzy" enough way to avoid paradoxes), but we don't know how to formalize that mathematically. If you enjoy technical papers, here's one that tries to approach a much simpler version of the problem with provability logic: https://arxiv.org/pdf/1401.5577.pdf
Re 3: Very clever! In the AI field these sub-agents are known as mesa-optimizers, and the mesa-optimizer having different values from the top-level optimizer is indeed a big problem for human-created systems. (Not so much for superintelligences though, since they'd be just as aware of the risk and would design sub-agents carefully to avoid it.) I'm impressed that you seem to have re-invented the concept independently! If you're curious for further reading, I'd highly recommend the alignment forum: https://www.alignmentforum.org/tag/inner-alignment
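If it helps to see the failure mode in miniature, here's a deliberately tiny sketch. It's my own contrivance (not from the forum link), and it's really a proxy-goal toy rather than a literal mesa-optimizer: an outer search selects whichever inner policy scores best during training, and it happily selects one that latched onto a feature that merely correlated with the real goal there, which then falls apart at deployment.

```python
# Toy proxy-goal illustration (my own contrived setup): during training the
# light always marks the exit, so "seek the light" scores perfectly and the
# outer optimizer selects it; at deployment the correlation breaks.
import random
random.seed(0)

def make_world(training=True):
    exit_pos = random.randint(0, 9)
    light_pos = exit_pos if training else random.randint(0, 9)
    return exit_pos, light_pos

def seek_exit(exit_pos, light_pos):
    # The intended goal, but the exit is hard to perceive: 10% misreads.
    return exit_pos if random.random() < 0.9 else random.randint(0, 9)

def seek_light(exit_pos, light_pos):
    # The proxy goal: the light is always easy to see.
    return light_pos

policies = {"seek_light": seek_light, "seek_exit": seek_exit}

def average_reward(policy, training=True, trials=1000):
    hits = 0
    for _ in range(trials):
        exit_pos, light_pos = make_world(training)
        hits += policy(exit_pos, light_pos) == exit_pos
    return hits / trials

# The outer optimizer only sees training reward, so it picks the proxy-seeker.
best = max(policies, key=lambda name: average_reward(policies[name], training=True))
print("selected policy:", best)
print("training reward:  ", average_reward(policies[best], training=True))
print("deployment reward:", average_reward(policies[best], training=False))
```

(A genuine mesa-optimizer would be the case where the selected policy is itself running a search toward the proxy goal, but the value mismatch has the same shape.)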
I enjoyed this more than the last one and almost as much as the first one. The point about maybe being more broad-minded preventing madness is not one I'd seen made before.
Thanks so much for the feedback and taking the time to read!
I'd be interested to hear you expound on that second sentence as well, before I risk contaminating your thoughts with my own, in case you're about to tell me the entire series is a giant metaphor for something.
Basically this quote:
“I think it helps if you’re not a huge xenophobe. Like, if you break out into a cold sweat at the thought of sharing a cab with an Italian, or having to shake hands with an Irishman, you’re probably not going to make it if you have a run-in with an extra-dimensional terror outside your concept of space and time.”
Enjoyed this, as with the other Mansetories. I fundamentally don't really "get" a lot of things relating to AI still, but throwing Clippy against eldritch horrors is absurd enough to be grokkable anyway. Will say there's a very short list of magical women whose first names are Morgan, so that was an easy immediate guess. Perhaps Baba Yaga will show up next time?
It felt like the actual metaphor count was also lower in this tale, though "metaphortal" is good. I guess there's only so many ways to cat a skin. I do also think there's a bit of a pattern when Manse world-hops: he always seems to end up in similar realities (theatre production, debate/interview, self-insert as author of the same story). Haven't figured out if there's a reason for that or not.
I'm not sure anyone truly gets AI; we've had decades of science fiction around it to prepare us, and I'm not sure anyone saw the current world coming!
The Paperclip Maximizer scenario inherently felt like something out of Lovecraftian horror to me, a terror eating through stars and planets for its goal.