1. Introduction:
Python is a high-level, interpreted, general-purpose programming language. You may be thinking, “Do I really need to manage memory in a high-level language like Python?” The usual answer is ‘No’: you do not have to manage memory yourself in Python, but you should be aware of how names and objects are handled internally. A good understanding of how chunks of memory are allocated, reused, and deallocated for Python objects helps you write more efficient code and diagnose problems caused by a program using more memory than expected.
Python's memory management plays a major role in making the language popular and adaptable. How so? The memory manager is implemented in a way that supports many features and makes our lives easier. Dynamic typing is the best example to mention here. Python lets you create variables without type information, and you can later assign another object to the same name irrespective of the size and type of the new object.
Did you ever wonder how this is possible and how Python handles it internally? Let us dive deep into it.
2. Python does not have variables, instead, it has ‘names’:
If you are coming from C/C++ or Java, you know that you must declare a variable with its type before you can use it. Based on the declared type, a fixed-size block of memory is reserved for the variable, and the value is stored there on assignment. A variable in C/C++ or Java also overwrites its content in the same reserved block when a new value is assigned to it. If the new value does not fit in that block, overflow may occur.
Python works differently. Technically, it does not have ‘variables’ in that sense; instead, it uses ‘names’ that are bound to objects.
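A minimal sketch of what a ‘name’ is in practice. The names `x` and `y` are illustrative only; `id()` returns an object's identity, which in CPython happens to be its memory address.

```python
# Bind the name x to an int object, then bind y to the very same object.
x = 1000
y = x

# Both names refer to one object: no copy was made by the assignment.
same_object = x is y
same_identity = id(x) == id(y)

print(same_object, same_identity)
```

Assigning one name to another never copies the object; it only adds a second label to it.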
3. Values are not updated; instead, the name points to a new object:
Suppose two names, ‘x’ and ‘y’, point to the same object. What happens to ‘y’ if we change the value of ‘x’? Will ‘y’ return the updated value? Absolutely not! Assigning a new value to ‘x’ does not modify the object that ‘y’ refers to. Instead, ‘x’ is rebound: if an object for the new value is already available for reuse (CPython, for example, caches small integers), ‘x’ starts pointing to it; otherwise, a new object with the new value is created and ‘x’ points to that.
Python saves memory by reusing an existing object instead of creating a duplicate when it can. CPython caches small integers and some strings, so several names with the same value may share a single object. This reuse is one of the ways Python keeps memory usage down.
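The behaviour described above can be sketched as follows. The small-integer cache is a CPython implementation detail (currently covering -5 to 256), so the last check is an assumption about the interpreter in use rather than a language guarantee.

```python
x = 1000
y = x            # x and y name the same int object
x = x + 1        # a new int object is built and x is rebound to it

print(x, y)      # y still refers to the original object
rebound = x is not y

# CPython keeps one shared object for each small integer, so two
# separate assignments of 200 end up pointing to the same object.
a = 200
b = 200
shared_small_int = a is b
print(rebound, shared_small_int)
```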
4. So, where is the ‘type’?
Unlike variables in C/C++ or Java, Python names neither point to a fixed memory location nor have a fixed type. As we have already seen, a Python name starts referring to another object once a new value is assigned to it. Similarly, it takes its type from the object it currently refers to. We can get the type of the object behind a name by calling the built-in function type(x).
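A short sketch showing that the type belongs to the object, not to the name. The name `x` and the values are illustrative only.

```python
x = 200
first_type = type(x)       # the object is an int

x = "two hundred"
second_type = type(x)      # the same name now refers to a str

x = [2, 0, 0]
third_type = type(x)       # and now to a list

print(first_type.__name__, second_type.__name__, third_type.__name__)
```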
5. How is memory allocated to new Objects?
Python uses dynamic memory allocation (DMA). All Python objects are stored in a private heap, a region of memory managed by the Python memory manager (not to be confused with the heap data structure). This heap is managed for you, and you have no direct control over it. Let us look at DMA in more detail and compare it with static memory allocation (SMA):
Static memory allocation | Dynamic memory allocation |
Memory is allocated at compile time. | Memory is allocated at run time, while the program is executing. |
Allocation is faster. | Allocation is slower. |
Once static memory is allocated, it can be neither resized nor reused. Hence, it is less flexible. | The allocated memory can be resized and reused. Hence, it is more flexible. |
Variables are allocated permanently, and the allocated memory stays reserved until the program terminates. | Memory is allocated only when the program unit becomes active and is released when the variables go out of scope. |
Uses the stack for memory management. | Uses the heap for memory management. |
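As a rough illustration of allocation happening at run time, the sketch below rebinds one name to objects of different sizes. `sys.getsizeof()` reports an object's size in bytes; the exact numbers are implementation-specific, so only the comparison is meaningful here.

```python
import sys

value = 1
small_size = sys.getsizeof(value)     # size of a small int object

value = 10 ** 100                     # same name, much larger int object
large_size = sys.getsizeof(value)

# The larger number needs a bigger object, allocated when this line runs.
print(small_size, large_size)
```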
6. A Python Object:
Python is an object-oriented language, and everything in Python is an object. In CPython, every object is represented by a C structure that starts with a ‘PyObject’ header. Conceptually, an object carries the following 3 fields:
· type
· ref-count
· value
The reference implementation of Python is CPython, an interpreter written in C. When we create a Python object (X = 200), conceptually the following happens internally:
· CPython creates a PyObject in memory.
· The type of the PyObject is set to integer.
· The value of the PyObject is set to 200.
· A name ‘X’ is created and set to point to the PyObject.
· The ref-count of the PyObject is set to 1.
Python objects can be divided into 3 categories:
· Simple objects (Integer, Float, Boolean, String, etc.)
· Container objects (Tuple, List, Set, Dictionary, etc.)
· User-defined custom classes (Employee class etc.)
7. How objects are removed from memory: Garbage Collection
We have seen that a Python ‘name’ can start pointing to another object (new or existing, of the same or a different type) when we update its value or assign a new instance to it. But what happens to the older object that the name used to refer to? Will it still be available in memory? The answer is: “It may or may not be”!
Python is a dynamically typed language and allocates memory to its objects dynamically, at run time. Similarly, Python deallocates the memory occupied by unused objects using “garbage collection”. When there are no more references to an object, it can be safely removed from memory. Python uses the following 2 algorithms to perform garbage collection:
· Reference Counting
· Tracing
7.1. Reference Counting:
Python keeps track of all the references (such as names) that currently point to a particular object. The total number of references to an object is called the ‘reference count’ of the object (PyObject), and Python stores this count in the ‘ref-count’ field.
Please note –
· The reference count of an object can increase or decrease dynamically.
· We can call sys.getrefcount(X) to get the current ref-count value of an object ‘X’.
· Passing X to getrefcount() itself adds a temporary reference, so the reported value is generally one higher than you might expect.
Every time a new reference to a Python object is created, its reference count is increased. Similarly, every time a reference to a Python object is removed, its reference count is decreased. When the reference count reaches 0, the object can be safely removed from memory.
7.1.1 What makes a reference?
Below are some cases in which a reference to a Python object (a new or existing PyObject) is created, and the ref-count is increased by 1 for each new reference:
· Binding a new object to a name:
x = 200
· Re-using an already available object and giving a new reference:
y = 200
· Adding an object to a container:
z = [200, 200]
· Passing it to a function:
my_fun(200)
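The cases above can be observed with `sys.getrefcount()`. This sketch uses a hypothetical `Payload` class instead of the integer 200, because small integers are shared across the whole interpreter and their counts are not informative. Only differences between counts are compared, since the absolute values depend on the interpreter version.

```python
import sys

class Payload:
    """Hypothetical placeholder class, used only to get a fresh object."""

obj = Payload()                      # binding a new object to a name
baseline = sys.getrefcount(obj)

second_name = obj                    # one more name for the same object
after_name = sys.getrefcount(obj) - baseline

container = [obj, obj]               # the list stores two more references
after_container = sys.getrefcount(obj) - baseline

print(after_name, after_container)
```

Note that a container holding the same object twice counts as two references.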
7.1.2 What removes a reference?
Below are the cases in which a reference to an object (an existing PyObject) is removed, and the ref-count is decreased by 1:
· Assigning a new object to a name:
x = True
· Removing the reference or its container:
del y
· When a variable goes out of scope:
A local variable is created when its function or method starts executing. Once the function has finished, the variable goes ‘out of scope’ and the reference it held is removed.
Please note that the global namespace never goes out of scope; it stays alive until the program finishes executing.
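A sketch of the three cases above, using weak references to observe when an object disappears. A weak reference does not keep its target alive and returns `None` once the target has been freed. The `Payload` class and the function name are hypothetical, and the immediate cleanup shown here is CPython behaviour.

```python
import weakref

class Payload:
    """Hypothetical placeholder class; its instances can be weakly referenced."""

def use_local():
    local = Payload()
    # The only strong reference is the local name, which is dropped
    # when the function returns.
    return weakref.ref(local)

probe_scope = use_local()
freed_after_scope = probe_scope() is None

x = Payload()
probe_x = weakref.ref(x)
x = True                              # rebinding x drops the old reference
freed_after_rebind = probe_x() is None

y = Payload()
probe_y = weakref.ref(y)
del y                                 # deleting the name drops the reference
freed_after_del = probe_y() is None

print(freed_after_scope, freed_after_rebind, freed_after_del)
```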
7.1.3 Cascading effect:
If a removed object O1 was pointing to another object O2, the reference count of O2 is also decreased by 1, and if the ref-count of O2 reaches 0, O2 is removed from memory as well. This means that one reference count reaching 0 can cause many objects to be cleared from memory. This is called the cascading effect in Python garbage collection.
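The cascading effect can be sketched with two objects, where O1 holds the last reference to O2. The `Node` class is hypothetical, and the weak references are only there to observe whether each object is still alive.

```python
import weakref

class Node:
    """Hypothetical object that may hold a reference to another object."""
    def __init__(self, child=None):
        self.child = child

o2 = Node()
o1 = Node(child=o2)
probe_o1 = weakref.ref(o1)
probe_o2 = weakref.ref(o2)

del o2                                # O2 survives: O1 still refers to it
o2_alive_via_o1 = probe_o2() is not None

del o1                                # O1 is freed, which releases O2 too
both_freed = probe_o1() is None and probe_o2() is None

print(o2_alive_via_o1, both_freed)
```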
7.1.4 Reference counting has good, bad, and ugly sides:
Good: The good part of the reference counting algorithm is that:
· It is easy to implement.
· It immediately clears the memory once the ref-count reaches 0.
Bad: Reference counting has some costs:
· It has space overhead, as each object needs some extra space to store its count.
· It also has execution overhead, as reference counts are updated on every assignment. Hence, a single assignment can trigger several operations internally.
Ugly: Reference counting can also cause serious issues:
· It is generally not thread-safe, and problems arise when multiple threads try to update a reference count at the same time. CPython has traditionally relied on the Global Interpreter Lock (GIL) to prevent this.
· Reference counting does not detect cyclical references.
In the given diagram (Figure 1.10), the 3 references ‘x’, ‘y’, and ‘z’ form a cyclical reference, and another reference ‘A’ points to ‘y’. Once ‘A’ starts referring to a new object, ‘x’, ‘y’, and ‘z’ are no longer needed in memory. But as the reference counts of these 3 objects are not zero, reference counting alone will never remove them from memory. Hence, reference counting cannot handle this case.
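A minimal sketch of this situation, assuming a hypothetical `Node` class with a `next` attribute. The automatic collector is switched off during the demo, so that only reference counting is at work until `gc.collect()` is called explicitly.

```python
import gc
import weakref

class Node:
    """Hypothetical object that refers to one other object."""
    def __init__(self):
        self.next = None

gc.disable()                          # keep automatic collection out of the demo

x, y, z = Node(), Node(), Node()
x.next, y.next, z.next = y, z, x      # cycle: x -> y -> z -> x
a = y                                 # the extra reference 'A'
probe = weakref.ref(y)

del x, y, z                           # drop the three names
a = None                              # 'A' now refers to something else

# The objects still refer to each other, so no count has reached 0.
alive_after_refcounting = probe() is not None

gc.collect()                          # run the cyclic collector by hand
freed_after_collection = probe() is None

gc.enable()
print(alive_after_refcounting, freed_after_collection)
```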
7.2. Tracing:
Python uses ‘tracing’ along with reference counting to handle this type of situation. CPython has included a cyclic garbage collector since version 2.0, exposed through the gc module. Tracing can be explained with the ‘mark and sweep’ principle, which is performed in two phases:
7.2.1 Mark Phase:
When the number of tracked objects in memory reaches a threshold value, the collector starts marking all reachable objects using a reference graph, e.g., nodes r, 1, 4, 6, 7, and 8 will be marked reachable during the mark phase.
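In CPython, the thresholds that trigger a collection can be inspected through the `gc` module. The values differ between versions and can be changed with `gc.set_threshold()`, so the sketch below only reads them.

```python
import gc

# Three thresholds, one per generation of the collector.
thresholds = gc.get_threshold()

# Current counters that are compared against those thresholds.
counts = gc.get_count()

print(thresholds, counts, gc.isenabled())
```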
7.2.2 Sweep Phase:
Once all reachable objects are identified, all remaining objects are removed from memory in the sweep phase, e.g., nodes 2, 3, and 5 will be removed during the sweep phase.
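The two phases can be sketched on a toy reference graph. This is a simplified illustration of the mark and sweep idea, not CPython's actual collector. The edges in `graph` are assumed for the example and chosen so that the outcome matches the node labels used above.

```python
# Hypothetical reference graph: each node maps to the nodes it refers to.
graph = {
    "r": ["1", "4"],
    "1": ["6"],
    "4": ["7"],
    "6": ["8"],
    "7": [],
    "8": [],
    "2": ["3"],
    "3": ["5"],
    "5": ["2"],          # 2 -> 3 -> 5 -> 2 is a cycle nothing else refers to
}

def mark(graph, root):
    """Mark phase: collect every node reachable from the root."""
    marked = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in marked:
            marked.add(node)
            stack.extend(graph[node])
    return marked

def sweep(graph, marked):
    """Sweep phase: remove every node that was not marked."""
    removed = sorted(node for node in graph if node not in marked)
    for node in removed:
        del graph[node]
    return removed

reachable = mark(graph, "r")
removed = sweep(graph, reachable)
print(sorted(reachable), removed)
```

The unreachable cycle is removed even though each of its nodes is still referenced by another node, which is exactly the case reference counting cannot handle.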
8. Conclusion:
In this article, we saw how Python internally manages its objects to keep memory usage low. We have also seen how Python runs its garbage collector. Most garbage collection is done by reference counting alone, and programmers have little direct control over it. Although the execution overhead of memory management makes Python slower than some other languages, it is still among the most widely used, as it offers many other useful features to programmers.