Therefore, we cannot "deallocated" or free this memory in the way as the
memory that has been allocated on the stack.
However, if we keep allocating memory from the heap without freeing any
memory that is not used anymore, eventually we run out of memory.
Many imperative language has function to allocate and free memory from the
heap. However, "freeing" memory by the programmer may have
errors. These errors mostly come from freeing the memory that has
not been allocated. The other possible case is forget to free the
memory that is not used causing "memory leak" and running out of memory
after the program run for some time.
These kind of "bugs" are difficult to find and fix.
Modern language allows programmers to allocate memory but performs the
"free" automatically.
To do that, a special function called "garbage collector" is called during
runtime to "free" the memory that is not used anymore.
The most simple situation is when there is no more memory when the memory allocator is called. We can also call it periodically which make runtime behaviour more predictable.
Throughout the evolution of programming languages, there are many
variations of garbage collectors. Two basic types are
1) mark and sweep collector
2) reference count collector
Here are the steps:
1. mark the memory in the heap that is still "alive"
2. scan the whole heap (or sweep it) and reallocate those "alive"
memory to a new space (yes, we need to have extra space to do so).
3. free everything in the heap.
4. update the address of all variables.
step 1. To find out which variables are "alive" we look into all
activation records. All variables that are in them are
alive. To mark them, consider that each chunk of allocated
memory has the "header". This header contains markers for this purpose.
step 2. Sweep the heap. Heap is a big chunk of memory.
Each allocated memory are allocated from the heap and the free space in
heap is shrinking. The heap keeps track of its free space.
Scanning from the beginning, we can navigate the block of allocated
memory. The header of each block contains the size of the block. We
use this information of move from one block to another block. To
reallocate the live memory we use "extra" space and copy each live block
that location. We must remember new location of each block. That
requires an "address map" table each entry contains the old address and
the new address of the reallocated block.
step 3. Now that all live memory are moved to a save place. We
can free the heap. Finally, all blocks are moved back to the heap
and "address map" is updated.
step 4. Look into all activation records and use "address map" the
update the address of the relevant variables.
It takes a long time to collector all garbage at once. Hence, the runtime is difficult to predict. This makes the running user program seems like it pauses for quite a while when the garbage collector works.
In this scheme, all memory allocated from heap has a "counter"
field. The function of this counter is to keep track to the life of
the memory. The counter is increased when there are other variable
that shares this memory. The counter is decreased when the variable that
uses this memory is dead. When the count reaches zero, this memory
can be freed.
To update the counter, the compiler needs to insert a code to do it at the
right place. If the update is not absolutely correct then memory
error can happen. There are many situation that is difficult to
determine how to update the counter.
It can be expensive to update the counter. When a variable enter or leave
its scope the counter needs to be update. Determine when to update the
counter has to be absolutely correct.
The runtime behaviour of this collector is predictable and freeing a
memory is inexpensive. There is no need to sweep the heap or scan
for liveness of variables.
The above two schemes are the basic one. There are many variations that aim to improve the runtime behaviour.
In mark and sweep collector, we can think of the heap as many heaps that have different lifetime. The memory that is new is more likely to be used and free. The memory that survives sweeping is considered to be old. Then, this memory is migrate to the other heap. Sweeping this heap should occur less often than sweeping the young heap.
First from wiki
https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
then classic papers
1)
https://arxiv.org/pdf/1505.00017.pdf
Comparative Analysis of Classic Garbage-Collection Algorithms for a
Lisp-like Language
Tyler Hannan, Chester Holtz, Jonathan Liao (2015)
2)
https://courses.cs.washington.edu/courses/cse590p/05au/p50-bacon.pdf
A Unified Theory of Garbage Collection
David F. Bacon, Perry Cheng, V.T. Rajan (2004)
discussion
https://news.ycombinator.com/item?id=14823054
3)
super classic
https://dl.acm.org/doi/pdf/10.1145/1317203.1317205
Overview of garbage collection in symbolic computing
by TJ McEntee ยท 1987
You should try to implement it !
The base code is here (no garbage collection). You can add in.
Base language
https://www.cp.eng.chula.ac.th/%7Eprabhas/project/recursive-programming/rec-programming.htm
last update 11 March 2024