The Fool on the Hill: Large Language Models and Knowledge

By: Simon Brooke :: 19 June 2023

[Image: Colin Firth as Mr Darcy, in a BBC television adaptation of Pride and Prejudice]

Large language models digest vast amounts of text and develop statistical models from it in the form of very large, but semantically shallow, neural networks. This has been described as 'artificial intelligence', but I think it is not, because the models do not, as I understand it, have any semantic model of the text they digest; consequently, when asked to infer new knowledge from the text, they 'hallucinate'. That is to say, they produce statements based on frequencies within the text, not valid inferences based on any truth-preserving calculus. These statements often sound plausible, but they are true only sometimes; and it is not generally possible, even for the people who created these systems, to systematically discriminate between the true statements and the false ones.

However, what if one is interested in extracting knowledge which is already present in the text, in a new format? This is, after all, what translation systems do, and modern translation systems are, as I understand it, closely related to large language models.

When representing knowledge in order to do automated inference on it, I like to use Stephen Toulmin's knowledge representation schema. I've been thinking about how to extract knowledge from a corpus of lore, so that, in the context of a video game, I can have characters answer questions about that lore. Obviously the answers they give should not be Toulmin models, because that's not naturalistic speech; but if I can generate Toulmin models from knowledge, I then know how to run truth-preserving inference on that knowledge, and how to produce naturalistic text from it.
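
To make that concrete, here is a minimal sketch of a Toulmin model as a data structure, in Python. This is purely illustrative: the class and field names are my own, drawn from Toulmin's terminology, not from any established library.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ToulminModel:
    """One argument in Toulmin's schema: a claim, the data which support
    it, the warrant linking data to claim, and optional backing,
    qualifier and rebuttal. Illustrative only, not a standard library."""
    claim: str                        # the conclusion being argued for
    data: List[str]                   # grounds: facts offered in support
    warrant: str                      # why the data support the claim
    backing: List[str] = field(default_factory=list)  # support for the warrant
    qualifier: Optional[str] = None   # strength: 'certainly', 'probably'...
    rebuttal: Optional[str] = None    # conditions under which the claim fails

# The answer I was actually looking for, built from the novel alone:
pemberley = ToulminModel(
    claim="Mr Darcy lived at Pemberley, in Derbyshire",
    data=["The narrator states that Pemberley is Mr Darcy's estate",
          "The narrator states that Pemberley is in Derbyshire"],
    warrant="Statements made by the narrator of a novel are true within the novel",
    qualifier="certainly, within the fiction",
)
```

A structure like this is machine-tractable in both directions: an inference engine can chain one model's claim into another's data, and a text generator can walk the fields to produce naturalistic prose.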

The trick I'm trying to achieve, though, is to start with continuous natural text — specifically novels — and generate Toulmin models from it. The obvious way to do this is to take a parser — such as the Stanford Parser — and use it to reduce the text to parse trees, then write code which uses heuristics to extract, from those parse trees, sentences which can be further processed into Toulmin models. This can be done, but it is a lot of work.
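
For what it's worth, here is a very rough sketch of the first stage of that pipeline. Where the paragraph above names the Stanford Parser, I've used stanza, Stanford's current Python library, for the parsing step; the extraction heuristic is a toy of my own invention, which catches only the simplest declarative sentences and which real prose would defeat constantly.

```python
import stanza  # Stanford's neural NLP pipeline: pip install stanza

# Build an English pipeline with dependency parsing.
# stanza.download('en')  # needed once, to fetch the models
nlp = stanza.Pipeline(lang='en', processors='tokenize,pos,lemma,depparse')

def simple_claims(text):
    """Yield crude (subject, relation, object) triples from simple
    declarative sentences, as raw material for Toulmin models.
    A toy heuristic: it looks only for a subject and an object (or
    oblique) hanging directly off the root verb of each sentence."""
    doc = nlp(text)
    for sent in doc.sentences:
        subj = obj = root = None
        for word in sent.words:
            if word.deprel == 'root':
                root = word
            elif word.deprel == 'nsubj':
                subj = word
            elif word.deprel in ('obj', 'obl') and obj is None:
                obj = word
        if subj and root and obj:
            yield (subj.text, root.lemma, obj.text)

for triple in simple_claims("Mr Darcy lived at Pemberley."):
    print(triple)  # ('Darcy', 'live', 'Pemberley'), or similar, per the parse
```

Turning such triples into full Toulmin models, with warrants like 'statements made by the narrator are true within the novel', is where the real heuristic work would lie.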

I thought, this evening, what if I could use a large language model to directly turn text into Toulmin models? So I asked Google Bard:

Describe, in the style of Stephen Toulmin's knowledge representation schema, how you know where Mr Darcy lived in Jane Austen's 'Pride and Prejudice'.

Bard replied:

Sure, here is an argument in the style of Stephen Toulmin's knowledge representation schema about where Mr. Darcy lived in Jane Austen's Pride and Prejudice:

Data:

  • In the novel, Mr. Darcy is said to live in Derbyshire.
  • Chatsworth House is a large country house in Derbyshire that is similar to the description of Pemberley in the novel.
  • Chatsworth House was used as the filming location for Pemberley in the 2005 film adaptation of Pride and Prejudice.

Warrant:

  • Authors often base their fictional works on real places.
  • The author of Pride and Prejudice, Jane Austen, was familiar with Derbyshire.

Qualifier:

  • It is possible that Austen changed some details of Chatsworth House when she created Pemberley.

Backing:

  • Austen lived in Derbyshire for a time, and she visited Chatsworth House.
  • Chatsworth House is similar to the description of Pemberley in the novel.

Rebuttal:

  • Some people argue that Austen did not base Pemberley on Chatsworth House.
  • However, the evidence suggests that Austen was familiar with Chatsworth House and that she may have used it as the inspiration for Pemberley.

Conclusion:

  • It is likely that Mr. Darcy lived in Chatsworth House in Jane Austen's Pride and Prejudice.

This is just one possible argument about where Mr. Darcy lived in Pride and Prejudice. There is no definitive answer, as Austen never explicitly states where Mr. Darcy lived. However, the evidence suggests that Chatsworth House is the most likely location.

OK, that's promising, in a slightly startling way. I was expecting it to tell me that he lived at Pemberley, because that's in the text. Of course, Bard has digested not only Pride and Prejudice but also criticism of the book and of adaptations of it; so it has meta-knowledge. Hence it tells me he lived at Chatsworth, which is quite likely what Austen had in mind, but not what I was seeking.

Nevertheless, so far so good: it did seem from this that Bard might produce knowledge from texts in a format which is at least easier to handle with inference processes.

And Bard also gave me mostly correct answers to two further questions:

List all the named characters in Jane Austen's Pride and Prejudice

This one was OKish. The response included an item 'The Bennets' servants,' who clearly don't constitute a named character; and

In Jane Austen's Pride and Prejudice, what was the relationship between Mr Darcy and Lydia Bennet

To be fair, the answer to this one was superb, much better than I expected, and I detected no errors in it. If you'd submitted it as an answer to an exam question, I would expect good marks.

But when you try to push it just a little further, you learn how brittle it is. I refined my original query:

Describe, in the style of Stephen Toulmin's knowledge representation schema, and using only information from the novel itself, how you know where Mr Darcy lived in Jane Austen's 'Pride and Prejudice'.

Bard replied:

Sure, here is an argument in the style of Stephen Toulmin's knowledge representation schema about where Mr. Darcy lived in Jane Austen's Pride and Prejudice, using only information from the novel itself:

Data:

  • In Chapter 39 of Pride and Prejudice, Elizabeth Bennet visits Pemberley, Mr. Darcy's estate.
  • Elizabeth describes Pemberley as a "handsome" and "extensive" estate that is located in Derbyshire.

Warrant:

  • Authors often use real places as inspiration for their fictional works.
  • Jane Austen was familiar with Derbyshire, as she had visited the county on several occasions.

Qualifier:

  • It is possible that Austen changed some details of Pemberley when she created it.

Backing:

  • The description of Pemberley in the novel is very similar to the description of Chatsworth House, a real estate in Derbyshire.
  • Chatsworth House is one of the largest and most impressive estates in England.

Rebuttal:

  • Some people argue that Austen did not base Pemberley on Chatsworth House.
  • However, the evidence suggests that Austen was familiar with Chatsworth House and that she may have used it as the inspiration for Pemberley.

Conclusion:

  • It is likely that Mr. Darcy lived in Pemberley, an estate in Derbyshire, in Jane Austen's Pride and Prejudice.

This is just one possible argument about where Mr. Darcy lived in Pride and Prejudice. There is no definitive answer, as Austen never explicitly states where Mr. Darcy lived. However, the evidence suggests that Pemberley is the most likely location.

Bard has no semantic understanding of what I asked it. It does not understand what 'using only information from the novel itself' means, and it probably does not know which of the statements it is able to reproduce come from which source. Because, of course, none of them comes from any single source: each comes from a mashup of all the sources it has digested.

So, my conclusions from this evening's exercise are as follows:

  1. It is not obviously possible to reduce corpuses of text to sets of Toulmin models using general-purpose large language models;
  2. It might be possible to reduce corpuses of text to sets of Toulmin models using a specially trained large language model which had not had access to extraneous texts in its training data; but that would be a lot of work, and quite likely to fail completely;
  3. I know how to generate Toulmin models from certain categories of natural sentences, and I can write functions which will identify those categories of sentences from a parse tree; so although that too is a lot of work, and will probably not generate an adequately complete corpus of knowledge, at least it's an investment of time that I know won't be completely wasted.

But it does raise questions in my mind about whether the current generation of large language models is actually good for anything.

When I was last doing serious artificial intelligence research, back in the 1980s and early 1990s, the flavour of the month was neural networks. They could learn from training sets of raw data and generate decisions which were often remarkably accurate: but neither the systems themselves nor the people who wrote them could explain the grounds on which any particular decision had been made.

It seemed to those of us who worked on automated inference engines (known in those days as 'expert systems') that when one is dealing with uncertainty, being able to justify decisions matters. And thus our systems were based on explicit semantic models, often in terms of rules; and many of the systems we built, specifically including the systems I designed and built, were extremely good at generating natural-language, easy-to-understand, logically correct explanations of every decision they made, referencing the data and rules on which the decision turned, and the authorities for those rules.
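
For readers who never met those systems, the core trick is simple: if the engine records, for every conclusion, the rule and the premises which produced it, explanations fall out of the inference for free. A toy sketch of my own, in Python, not any particular historical system:

```python
# A toy forward-chaining rule engine which records, for every derived
# fact, the rule and premises that produced it -- so that every
# conclusion can be explained. Minimal illustration only.

facts = {"darcy lives at pemberley", "pemberley is in derbyshire"}
rules = [
    # (name, premises, conclusion)
    ("transitivity of location",
     ("darcy lives at pemberley", "pemberley is in derbyshire"),
     "darcy lives in derbyshire"),
]

justification = {}  # derived fact -> (rule name, premises)

changed = True
while changed:      # forward-chain to a fixed point
    changed = False
    for name, premises, conclusion in rules:
        if conclusion not in facts and all(p in facts for p in premises):
            facts.add(conclusion)
            justification[conclusion] = (name, premises)
            changed = True

def explain(fact):
    """Produce a justification of a fact by walking the record of the
    rules and premises which derived it."""
    if fact not in justification:
        return f"'{fact}' is a given fact."
    name, premises = justification[fact]
    lines = [f"'{fact}' follows by the rule '{name}' from:"]
    for p in premises:
        lines.append("  - " + explain(p))
    return "\n".join(lines)

print(explain("darcy lives in derbyshire"))
```

Every conclusion in such a system carries its provenance with it; nothing of the kind exists inside a neural network.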

But creating the corpuses of rules was in itself a lot of work; so our approach, based on rigorous formal logic and explicit, usually human-authored rules, was losing out, being seen as less sexy than neural networks which could, given the right data, train themselves.

I'm no longer seriously involved in the field; I haven't been for three decades. But my perception is that what is now called 'artificial intelligence' is pretty much entirely based on neural net approaches, and that the neural nets are just as inscrutable now as they were then.

There are areas of decision making in which a cheap process which is right most of the time is acceptable; in which a certain error rate is acceptable. A program, for example, which can scan thousands of medical images and correctly identify most of those which show evidence of a particular disease can save a lot of time for hard-pressed medical staff, and will result in many more patients getting the correct treatment. If a few cases are missed, if a few people die, well, still a lot more people survive than would survive if each image had to be scanned visually by a skilled medical practitioner. People make errors too, especially when overworked and under pressure.

So neural networks aren't all bad. They do have uses, and in domains where they are a good fit they can be game-changing.

But large language models? I'm not so sure.

The issue is the same. A proportion of the output will be wrong. It may be excused as 'hallucination', and yes, people can be wrong too; but when people are wrong they can at least give reasons for their beliefs. A large language model cannot do this (although it can produce a plausible string of words).

What proportion of the output will be wrong? We don't know.

How can we detect when the output is wrong? In general we can't, without verifying each individual statement against a more reliable source.

Tags: Software Game Worlds Inference
