Post Scarcity

A prototype for a post scarcity programming environment
The amount of memory I'm leaking is now down by an order of magnitude, but the problem is not fixed. Better, but not good enough. And although I'm aware of the extent to which Lisp objects are not being reclaimed, there may also be transient C objects — chiefly strings — which are also not being freed. This is an ongoing process.
The unit test system got into a mess because the bignum tests are failing. But because I know some tests are failing, and the bignum problem feels so intractable that I don't want to tackle it, I've been ignoring the fact that tests are failing; which means I've missed regressions — until I started to get an 'Attempt to take value of unbound symbol' exception for nil, which is extremely serious and broke a lot of things.
That arose out of work on the 'generalised key/value stores' feature, logged under [#20260203](20260203), below. However, because I wasn't paying attention to failing tests, it took me a week to find and fix it.
But I've fixed that one. And I've put a lot of work into cleaning up the unit tests. There is more work to do on this.
I'm also gradually working through cleaning up documentation.
Meantime we have some regressions which are serious, and must be resolved.
The core function equals is now failing, at least for integers. Another core function also fails because I've never implemented a method for it, which I ought to.
The current unit test for cond and that for recursion both fail but I think this is because equals is failing.
I have a horrible new regression in rational arithmetic which looks as though something is being freed when it shouldn't be.
I haven't done a 'release' of Post Scarcity since September 2021, because I've been so despondent about the bignum problem. But actually a lot of this is usable, and it's at least sufficiently interesting that other people might want to play with it, and might possibly even fix some bugs.
So I'm currently planning to release a new master before the end of this month, and publicise it.
If you just start up and immediately abort the current build of psse, you get:
Allocation summary: allocated 19986; deallocated 245; not deallocated 19741.
Allocation summaries from the current unit tests give the following ranges of values:
| | Min | Max |
|---|---|---|
| Allocated | 19991 | 39009 |
| Deallocated | 238 | 1952 |
| Not deallocated | 19741 | 37057 |
The numbers go up broadly in sync with one another — that is to say, broadly, as the number allocated rises, so do both the number deallocated and the number not deallocated. But not exactly.
Write a test wrapper which reads a file of forms, one per line, from standard input, and passes each in turn to a fresh invocation of psse, reporting the form and the allocation summary.
From this exercise: allocating an empty list allocates four additional cells, all of which are deallocated. Allocating 'nil' allocates a further 29 cells, 25 of which are not deallocated. WTF?
Following further work I have this, showing the difference added to the base case of cells allocated, cells deallocated, and, most critically, cells not deallocated.
From this we see that reading and printing nil allocates an additional 33 cells, of which only eight are cleaned up. That's startling, and worrying.
But the next row shows us that reading and printing an empty list costs only four cells, each of which is cleaned up. Further down the table we see that an empty map is also correctly cleaned up. Where we're leaking memory is in reading (or printing, although I doubt this) symbols, either atoms, numbers, or keywords (I haven't yet tried strings, but I expect they're similar.)
| Case | Delta Allocated | Delta Deallocated | Delta Not Deallocated |
|---|---|---|---|
| Basecase | 0 | 0 | 0 |
| nil | 33 | 8 | 25 |
| **()** | 4 | 4 | 0 |
| **(quote ())** | 39 | 2 | 37 |
| **(list )** | 37 | 12 | 25 |
| **(list 1)** | 47 | 14 | 33 |
| **(list 1 1)** | 57 | 16 | 41 |
| **(list 1 1 1)** | 67 | 18 | 49 |
| **(list 1 2 3)** | 67 | 18 | 49 |
| **(+)** | 36 | 10 | 26 |
| **(+ 1)** | 44 | 12 | 32 |
| **(+ 1 1)** | 53 | 14 | 39 |
| **(+ 1 1 1)** | 62 | 16 | 46 |
| **(+ 1 2 3)** | 62 | 16 | 46 |
| **(list 'a 'a 'a)** | 151 | 33 | 118 |
| **(list 'a 'b 'c)** | 151 | 33 | 118 |
| **(list :a :b :c)** | 121 | 15 | 106 |
| **(list :alpha :bravo :charlie)** | 485 | 15 | 470 |
| **{}** | 6 | 6 | 0 |
| **{:z 0}** | 43 | 10 | 33 |
| **{:zero 0}** | 121 | 10 | 111 |
| **{:z 0 :o 1}** | 80 | 11 | 69 |
| **{:zero 0 :one 1}** | 210 | 14 | 196 |
| **{:z 0 :o 1 :t 2}** | 117 | 12 | 105 |
Looking at the entries, we see that:

- the symbol list costs 33 cells, of which 25 are not deallocated, whereas the symbol + costs only one cell fewer, and an additional cell is not deallocated; so cell allocation doesn't seem to scale with the length of the symbol;
- (list :a :b :c) allocates 121 cells and deallocates 15, while (list :alpha :bravo :charlie) allocates 485 and deallocates the same 15;
- the fact that (+ 1 2 3) or (+ 1 1 1) deallocates 16 suggests to me that the list structure is being fully reclaimed but the atoms are not;
- 'a costs more to read than the keyword :a because the reader macro is expanding 'a to (quote a) behind the scenes.

Looking at what happens when we read a single digit number, we get the following:
Many things are worrying here.
- Wherever I create a Lisp string from a C string with c_string_to_lisp_string, I need to make damn sure that that string is deallocated as soon as I'm done with it.
- Wherever I'm dealing with symbols which will be referred to repeatedly in C code, I need to be able to compute, from a C string, the same value that the standard hashing function will return for the lexically equivalent Lisp string, so that I can search hashmap structures from C without having to allocate and deallocate a fresh copy of the Lisp string.
- I'm allocating Lisp zero and Lisp ten each time read_integer is called, and I'm not deallocating them.

I'm consciously avoiding the bignum issue for now. My current thinking is that if the C code only handles 64 bit integers, and bignums have to be done in Lisp code, that's perfectly fine with me.
I now have the oblist working as a hashmap, and also hybrid assoc lists which incorporate hashmaps working. I don't 100% have consistent methods for reading stores which may be plain old assoc lists, new hybrid assoc lists, or hashmaps working but it isn't far off. This also takes me streets further towards doing hierarchies of hashmaps, allowing my namespace idea to work — and hybrid assoc lists provide a very sound basis for building environment structures.
Currently all hashmaps are mutable, and my doctrine is that that is fixable when access control lists are actually implemented.
The function (assoc store key) => value should be the standard way of getting a value out of a store.
The function (put! store key value) => store should become the standard way of setting a value in a store (of course, if the store is an assoc list or an immutable map, a new store will be returned which holds the additional key/value binding).
Currently:
Tested 45, passed 39, failed 6
The failures are all in bignum arithmetic, except that I still have a major memory leak due to not decrefing somewhere where I ought to.
I've also experimented with autotranslating my C into Zig, but this failed. Although I don't think C is the right language for implementing my base Lisp in, it's what I've got; and until I can get some form of autotranslate to bootstrap me into some more modern systems language, I think I need to stick with it.
Right, I'm getting second and subsequent integer cells with negative values, which should not happen. This is probably the cause of (at least some of) the bignum problems. I need to find out why. This is (probably) fixable.
Also, print is printing bignums wrong on ploughwright, but less wrong on mason, which implies a code difference. Investigate.
Thinking further about this, I think at least part of the problem is that I'm storing bignums as cons-space objects, which means that the integer representation I can store has to fit into the size of a cons pointer, which is 64 bits. Which means that to store integers larger than 64 bits I need chains of these objects.
If I stored bignums in vector space, this problem would go away (especially as I have not implemented vector space yet).
However, having bignums in vector space would cause a churn of non-standard-sized objects in vector space, which would mean much more frequent garbage collection; and that collection has to be mark-and-sweep, because with unequal-sized objects anything else leads to heap fragmentation.
So maybe I just have to put more work into debugging my cons-space bignums.
Bother, bother.
There are no perfect solutions.
However however, it's only the node that's short on vector space which has to pause to do a mark and sweep. It doesn't interrupt any other node, because their references to the object will remain the same, even if it is the 'home node' of the object which is sweeping. So all the node has to do is set its busy flag, do GC, and clear its busy flag; the rest of the system can just carry on as normal.
So... maybe mark and sweep isn't the big deal I think it is?
OK, the 60 bit integer cell happens in int128_to_integer in arith/integer.c. It seems to be being done consistently; but there is no obvious reason. MAX_INTEGER is defined in arith/peano.h. I've changed both to use 63 bits, and this makes no change to the number of unit tests that fail.
With this change, (fact 21), which was previously printing nothing, now prints a value: 11,891,611,015,076,642,816. However, this value is definitely wrong; it should be 51,090,942,171,709,440,000. But I hadn't fixed the shift in integer_to_string; I have now... still no change in the number of failed tests...
But (fact 21) gives a different wrong value, 4,974,081,987,435,560,960. Factorial values returned by fact are correct (agree with SBCL running the same code) up to (fact 20), with both 60 bit integer cells and 63 bit integer cells giving correct values.
Uhhhmmm... but I'd missed two other places where I'd had the number of significant bits as a numeric literal. Fixed those and now (fact 21) does not return a printable answer at all, although the internal representation is definitely wrong. So we may be seeing why I chose 60 bits.
Bother.
Printing of bignums definitely doesn't work; I'm not persuaded that reading of bignums works right either, and there are probably problems with bignum arithmetic too.
The internal memory representation of a number rolls over from one cell to two cells at 1152921504606846976, and I'm not at all certain why it does, because this is neither 2^63 nor 2^64.
| Power | Value |
|---|---|
| 2^62 | 4611686018427387904 |
| 2^63 | 9223372036854775808 |
| 2^64 | 18446744073709551616 |
| Mystery number | 1152921504606846976 |
In fact, our mystery number turns out (by inspection) to be 2^60. But why?