If several independent general AIs were to, unprompted, develop their own language ab initio (or perhaps derived from other languages), could this serve as a test for sentience, sapience, or general consciousness? And if so, how liberal or conservative a test would this be?
Given the recent success of LLMs, it might appear that we now have the ability to answer your question, or at least that there is agreement about it, but there is not. While there is generally presumed to be a connection between the use of language and grammar on the one hand and intelligence on the other, exactly what intelligence is and how it comes about remains an open philosophical question. No known technology is capable of generating a complex, context-sensitive language ab initio and using it pragmatically. Not even the cognitive architectures of animals that signal one another are fully understood. Since there is debate over what constitutes intelligence, there is necessarily debate over how it could be detected or measured. My own view is that such a test would have to include a theory-of-mind component. From Wikipedia:
Possessing a functional theory of mind is crucial for success in everyday human social interactions. People utilise a theory of mind when analyzing, judging, and inferring others' behaviors. The discovery and development of theory of mind primarily came from studies done with animals and infants.
That is, no machine that lacks a theory of mind is a human-level intelligence.
LLMs are fascinating machines that present themselves as using complex language, but they do no such thing. Human-level language use relies heavily on linguistic structures such as prosody, pitch, morphemes, lexicons, and phrase- and sentence-level grammar to convey meaning from one agent to another. The transformer model in NLP does none of this. It relies on the statistical properties of a corpus to generate strings. (For instance, the temperature setting on such a model means one can get wildly different responses to the same question.) It is best to understand an LLM as a search engine over text that weights common constructions of tokens (frequent character sequences that do not necessarily align even with morphemes). Thus, LLMs create the illusion of language use: they simulate language by generating strings, with no underlying comprehension of the words.
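To make the "statistical properties" point concrete, here is a minimal sketch of temperature-scaled sampling over next-token scores. The logit values and the function name are made up for illustration and do not come from any particular model's API; the point is that nothing in the procedure consults meaning, only relative scores over tokens.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample a token index from raw scores using a temperature-scaled softmax.
    Higher temperature flattens the distribution, so repeated calls with the
    same scores can yield very different tokens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores the model assigns to four candidate tokens.
logits = [2.0, 1.0, 0.5, 0.1]
print([sample_next_token(logits, temperature=0.2) for _ in range(5)])  # almost always token 0
print([sample_next_token(logits, temperature=2.0) for _ in range(5)])  # far more varied
```

This is why identical prompts can produce wildly different answers: the variation comes from how the probability distribution over strings is sharpened or flattened, not from any change in understanding.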
Right now, LLMs are plagued by hallucinations and cannot be relied upon for anything that requires deterministic output. They might be seen philosophically as generating intuitions about language, and clearly they can replicate the grammars they are trained on somewhat reliably. However, they have very little awareness of semantic content; they merely reflect the semantic understanding of the people whose writing makes up the corpus an LLM is trained on. An LLM is like a parrot that takes a survey of billions of people and then does its best to repeat what it thinks is the best representation of the results of that survey. Generating responses to prompts whose meaning lies outside the corpus is technically impossible: without that semantic content being syntactically encoded somewhere in the training data, the system is blissfully unaware of it.
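A toy sketch of the "parrot" idea, under loud assumptions: a bigram model over a made-up two-sentence corpus, vastly simpler than a transformer, but the same in spirit in that its only resource is co-occurrence statistics in text. It can only recombine what the corpus contains and has nothing to say about anything outside it.

```python
import random
from collections import defaultdict

# The "survey" of the corpus: record which word followed which.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev].append(nxt)

def parrot(start, length=8):
    """Generate a string by repeatedly sampling a word that followed the
    previous word somewhere in the corpus. No meaning is consulted."""
    out = [start]
    for _ in range(length):
        followers = bigrams.get(out[-1])
        if not followers:        # never seen this word: nothing to add
            break
        out.append(random.choice(followers))
    return " ".join(out)

print(parrot("the"))    # e.g. "the cat sat on the rug . the dog"
print(parrot("piano"))  # "piano" is outside the corpus, so it falls silent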
Currently, one task LLM engineers are working on is raising the success rate on mathematical content. According to this article on Marktechpost.com, the output is still highly imperfect. The article says:
Its extensive trials and in-depth analysis demonstrate MathGLM’s superior mathematical reasoning over GPT-4. MathGLM delivers an impressive absolute gain of 42.29% in answer accuracy compared to fine-tuning on the original dataset. MathGLM’s performance on a 5,000-case math word problems dataset is very close to GPT-4 after being fine-tuned from the GLM-10B. By breaking down arithmetic word problems into their constituent steps, MathGLM can fully comprehend the intricate calculation process, learn the underlying calculation rules, and produce more reliable results.
I suspect an elementary-school honors student without a calculator could easily outperform the 42.29%.
Thus, LLMs are not able to do arithmetic reliably (let alone higher-level mathematics, or invent their own languages). There are fundamental differences in the way the human brain builds and uses categories and languages, differences that LLMs simply flounder over, and no amount of big data will fix that problem. The difference lies in the mechanism itself. The whys are the subject of Larson's The Myth of Artificial Intelligence and revolve around the determinism of the Turing architecture and the lambda calculus, the differences among deductive, inductive, and abductive reasoning, and a great deal of confusion and hype about how the von Neumann architecture works and what it is actually capable of.
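For contrast, here is a minimal sketch of the kind of mechanism that is reliable at arithmetic: a rule-following evaluator that walks a parse tree and applies fixed rules, so it is deterministic by construction rather than a prediction over likely strings. The code is plain Python and is only meant to illustrate the contrast in mechanism, not anything from Larson's book.

```python
import ast
import operator

# Fixed rules for each arithmetic operator.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(expr):
    """Deterministically evaluate an arithmetic expression by recursion
    over its syntax tree; the same input always yields the same answer."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

print(evaluate("12345 * 6789 + 42"))  # 83810247, correct every time
```

A statistical text generator has no such guarantee: it can only produce the digit strings that its training distribution makes likely.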
In fact, Larson discusses how Turing Test "winners" merely take advantage of people's inability to communicate well and have little to do with reasoning. Turing Tests as commonly implemented eschew the logic Turing himself set out in his paper and resemble ELIZA: a bag of tricks that fools no analytical intellect. The Winograd schema challenge is one of a number of improvements on the Turing Test that an LLM couldn't deal with.
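For illustration, here is the well-known trophy/suitcase Winograd schema encoded as a small data structure (the encoding itself is my own, not an official benchmark format). Swapping a single word flips which noun the pronoun refers to, and resolving it correctly requires commonsense knowledge about objects and containers rather than surface statistics.

```python
# A classic Winograd schema: the "special word" determines the referent.
schema = {
    "sentence": "The trophy doesn't fit in the brown suitcase because it is too {word}.",
    "pronoun": "it",
    "candidates": ["the trophy", "the suitcase"],
    "answers": {"big": "the trophy", "small": "the suitcase"},
}

for word, referent in schema["answers"].items():
    print(schema["sentence"].format(word=word), "->", schema["pronoun"], "=", referent)
```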
So, LLMs are very powerful NLP tools, but they are a narrow technology: they only implement our agenda and have no capacity to set their own. In this way, they are in the same class of algorithms as bubble sorts and the autocomplete functions your web browser uses. Only a fundamental ignorance of the underlying mechanisms would lead someone to conclude that they are a threat or on the cusp of becoming self-aware. As with the Great Oz, one has to use one's imagination and ignore the man behind the curtain to reach such conclusions.