Are pointers a performance optimization?

A controversial question and my view on it

Emre Tanrıverdi
7 min read · Dec 29, 2024

The short answer: No, pointers are not a performance optimization.

While pointers can be useful and may improve performance in specific cases, they are not a magic bullet. In fact, they can even introduce inefficiencies, and this article focuses on those.

It’s important to keep in mind that the performance aspects discussed in this article are micro-optimizations. Always benchmark your code before applying any micro-optimizations to make sure they actually make a noticeable improvement in performance.

What are pointers?

At a basic level, a pointer is a memory address. Instead of directly containing a value, a pointer contains the location in memory where the value is stored. To access the value being pointed at, the program must follow the address to where the value lives; this is referred to as “dereferencing.”
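For example, in Go (a minimal sketch; the variable names are illustrative):

    package main

    import "fmt"

    func main() {
        x := 42
        p := &x         // p holds the memory address of x, not the value 42
        fmt.Println(*p) // dereferencing: follow the address and read 42
        *p = 7          // writing through the pointer changes x itself
        fmt.Println(x)  // prints 7
    }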

How could using a pointer be an optimization?

The main argument for pointers as an optimization revolves around memory copying. When you pass a variable to a function, the default behavior is to copy the variable’s value into the function’s local scope. If the variable is large — such as a struct with many fields — this copying can be expensive in terms of both time and memory.

A pointer, on the other hand, is small: it is typically the size of a machine word (8 bytes on a 64-bit architecture).

Instead of copying the entire variable, you can copy a single pointer, which then allows the function to access the original data. This can reduce memory usage and improve performance in some cases.
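As a sketch of what this looks like in Go (the struct and its size are made up for illustration):

    package main

    // Payload is deliberately large: passing it by value copies
    // all 8 KiB of it on every call.
    type Payload struct {
        Data [1024]int64
    }

    // byValue receives a full copy of the struct.
    func byValue(p Payload) int64 { return p.Data[0] }

    // byPointer receives only an 8-byte address on a 64-bit system;
    // the original data is shared, not copied.
    func byPointer(p *Payload) int64 { return p.Data[0] }

    func main() {
        var p Payload
        _ = byValue(p)
        _ = byPointer(&p)
    }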

But can pointers negatively affect performance?
Absolutely. While pointers can improve performance in some cases, they can also introduce inefficiencies.

1. Dereferencing Costs
Dereferencing a pointer is not free. Although modern CPUs are highly optimized for such operations, there is still an added level of indirection. For example:

  • Accessing the value of a variable directly involves one memory operation.
  • Accessing the value of a pointer involves at least two operations: one to retrieve the address and another to retrieve the value at that address.

This additional cost can add up in performance-critical code, especially when pointers are dereferenced excessively or in tight loops.
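A hedged benchmark sketch of that indirection (the names are illustrative, and real numbers depend heavily on the CPU and on what the compiler optimizes away):

    package deref

    import "testing"

    // A package-level sink keeps the compiler from eliding the loads.
    var sink int64

    func BenchmarkDirect(b *testing.B) {
        var x int64 = 42
        for i := 0; i < b.N; i++ {
            sink = x // read the value directly
        }
    }

    func BenchmarkIndirect(b *testing.B) {
        x := new(int64)
        *x = 42
        for i := 0; i < b.N; i++ {
            sink = *x // follow the pointer, then read the value it points to
        }
    }

Run with go test -bench=. and compare the two.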

2. Heap Allocation
Pointers often lead to data being placed on the heap instead of the stack.
In Go, this is determined by escape analysis: if the compiler detects that a variable’s lifetime exceeds the scope of the function, it places the variable on the heap to ensure it remains accessible.
(feel free to read about Escape Analysis for more details)
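A minimal sketch of the distinction (the function names are made up; building with go build -gcflags=-m prints the compiler’s escape-analysis decisions):

    package main

    // stackAlloc: x's lifetime ends with the function, so it can
    // live on the stack.
    func stackAlloc() int {
        x := 42
        return x
    }

    // heapAlloc: the returned pointer outlives the stack frame, so
    // the compiler typically reports "moved to heap: x" here.
    func heapAlloc() *int {
        x := 42
        return &x
    }

    func main() {
        _ = stackAlloc()
        _ = heapAlloc()
    }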

Heap allocations have several downsides:

  • Allocation Overhead:
    Allocating memory on the heap is more expensive than allocating on the stack because it involves interacting with the runtime memory manager.
  • Garbage Collection:
    Heap-allocated memory is managed by the garbage collector, which adds runtime overhead. The more data you place on the heap, the more work the garbage collector must do to clean up unused memory.

3. Cache Misses
Modern CPUs rely on a hierarchy of caches — L1, L2, and L3 — to improve performance by reducing the time needed to access frequently used data.

  • L1 Cache: The L1 cache is the smallest but fastest cache, located closest to the CPU cores. It typically stores a limited amount of data and instructions that are accessed most frequently. Its extremely low latency ensures near-instantaneous access to this data. However, its small size means only a subset of the working data can fit, leading to potential cache evictions when new data is loaded.
  • L2 Cache: Sitting between the L1 cache and L3 or main memory, the L2 cache is larger but slightly slower than L1. It acts as a secondary buffer for data not found in L1, reducing the number of accesses to the slower L3 cache or main memory. The L2 cache is shared across cores in some CPU designs, while in others, each core has its own dedicated L2 cache.
  • L3 Cache: The L3 cache is the largest and slowest among the three but still significantly faster than accessing main memory. It is typically shared across all cores in a CPU and serves as a final layer of caching before data must be fetched from main memory.

Stack memory often exhibits excellent spatial and temporal locality, meaning data is stored and accessed sequentially or reused frequently within a short period. This predictable access pattern makes it more likely for stack data to remain in faster, smaller caches like the L1 cache, which has the lowest latency.

Heap memory, on the other hand, is dynamically allocated and tends to be scattered across memory. This lack of spatial and temporal locality means heap data is less likely to reside in the L1 cache. Instead, accessing heap data often results in cache misses, requiring the CPU to fetch it from slower caches like L2 or L3 or, in the worst case, from main memory.
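In Go, one common place this shows up is a slice of values versus a slice of pointers. A hedged sketch (the type and sizes are illustrative):

    package locality

    import "testing"

    type point struct{ x, y int64 }

    var sum int64 // sink so the loops aren't optimized away

    // The values sit contiguously in one backing array, so iteration
    // is sequential and cache-friendly.
    func BenchmarkValues(b *testing.B) {
        pts := make([]point, 100_000)
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            for j := range pts {
                sum += pts[j].x
            }
        }
    }

    // Each element is its own heap allocation: the pointers are
    // contiguous, but the points they reference may be scattered.
    func BenchmarkPointers(b *testing.B) {
        pts := make([]*point, 100_000)
        for j := range pts {
            pts[j] = &point{}
        }
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            for j := range pts {
                sum += pts[j].x
            }
        }
    }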

4. Increased Complexity
Using pointers can also make your code more complex and harder to debug. Also, excessive use of pointers may lead to more frequent garbage collection pauses, increasing latency in high-performance systems.
(I’ll mention that later in the article.)

Stack vs Heap

Stack: Function-Local Memory
The stack is a fast, organized section of memory used for function-local variables. Each function call gets its own stack frame, and when the function returns, that area becomes available for the next call; no other cleanup is necessary. While not free, this process is relatively cheap.

Heap: Shared Data
The heap is a region of memory used for dynamic allocation. It’s primarily used to store data that needs to persist beyond the scope of a single function call, such as when a pointer is returned from a function.

Unlike the stack, which operates in a Last-In-First-Out (LIFO) manner and automatically deallocates memory when a function exits, data on the heap is not tied to any single function call: it can live as long as needed and, in Go, is reclaimed by the garbage collector once nothing references it.

Variables normally “disappear” after the function returns. However, if pointers are returned, the pointed-at data needs to be placed somewhere outside the stack so that it will not “disappear.” This is what the heap is for.

There are a few performance related concerns with the heap:

  1. Excessive use of pointers often results in more data being allocated on the heap, which means a larger memory footprint and more garbage collection work (see the benchmark sketch after this list). It’s not a huge overhead, but it is not free either.
  2. If the heap runs out of space, the runtime must request additional memory from the OS. This operation is significantly more expensive than managing stack memory and introduces additional overhead.
  3. Once a value has been placed on the heap, it needs to stay there until no function holds a pointer to it anymore. Cleaning up afterwards is the job of the garbage collector: it has to find all the unreferenced values and mark their space in the heap as free. The more values placed on the heap, the more work the garbage collector has to do, and the greater the potential impact on your application.
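Go’s benchmark tooling can make the first and third points visible. A minimal sketch (the constructors are hypothetical):

    package alloc

    import "testing"

    type user struct {
        name string
        age  int
    }

    // Returning a value: the caller gets a copy; nothing escapes.
    func newUserValue() user { return user{name: "a", age: 1} }

    // Returning a pointer: the struct escapes to the heap.
    func newUserPointer() *user { return &user{name: "a", age: 1} }

    var (
        sinkV user
        sinkP *user
    )

    func BenchmarkValue(b *testing.B) {
        b.ReportAllocs() // expect 0 allocs/op
        for i := 0; i < b.N; i++ {
            sinkV = newUserValue()
        }
    }

    func BenchmarkPointer(b *testing.B) {
        b.ReportAllocs() // expect 1 alloc/op: the escaping struct
        for i := 0; i < b.N; i++ {
            sinkP = newUserPointer()
        }
    }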

The Role of Garbage Collection
Garbage collection (GC) is a mechanism used by many modern programming languages to automate memory management.
It identifies unused objects in the heap and reclaims their memory.

However, the efficiency of garbage collection is closely tied to how memory is utilized in your application. Different programming languages use different systems for GC, but pointers can negatively impact GC performance in all of them. For example:

Reference counting GC systems keep track of how many pointers reference each object. When the reference count drops to zero, the object can be safely deallocated. Every pointer assignment means another count update, and cyclic references (e.g., two objects referencing each other) cannot be reclaimed by counting alone.

Mark-and-Sweep GC systems periodically scan the heap to identify unreachable objects. Overusing pointers increases the heap’s size, slowing down the tracing process.

Concurrent and incremental GC systems, such as those used in Golang, run collection cycles alongside normal program execution. Heap pressure from pointer allocations increases the frequency and intensity of these cycles, consuming CPU resources and introducing additional overhead.
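In Go you can watch this happen. A minimal sketch using the runtime package (the allocation pattern is contrived for illustration; as an alternative, running any Go program with GODEBUG=gctrace=1 prints a summary line per collection):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var before, after runtime.MemStats
        runtime.ReadMemStats(&before)

        // Allocate a burst of heap objects through pointers.
        ptrs := make([]*int, 1_000_000)
        for i := range ptrs {
            n := i
            ptrs[i] = &n // n escapes: one heap allocation per iteration
        }

        runtime.GC() // force a collection for the sake of the example
        runtime.ReadMemStats(&after)

        fmt.Println("GC cycles:", after.NumGC-before.NumGC)
        fmt.Println("total pause (ns):", after.PauseTotalNs-before.PauseTotalNs)
        runtime.KeepAlive(ptrs)
    }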

So why use pointers?

There may be situations where they can be the right choice:

  1. Sharing Data: When you need a function to modify the data you pass to it, using a pointer is appropriate.
  2. Avoiding Large Copies: For large structs or arrays, passing a pointer can meaningfully reduce the cost of copying the data.
  3. Zero vs Unset Values: Pointers let you distinguish between a variable that has been explicitly set to zero and one that has never been set at all, as I have mentioned in this story (a minimal sketch follows this list).
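A minimal sketch of the third point (the Settings type is made up for illustration):

    package main

    import "fmt"

    // Timeout is *int so that "never set" (nil) and "explicitly 0"
    // are distinguishable; a plain int could not tell them apart.
    type Settings struct {
        Timeout *int
    }

    func describe(s Settings) {
        if s.Timeout == nil {
            fmt.Println("timeout not set, using default")
            return
        }
        fmt.Println("timeout explicitly set to", *s.Timeout)
    }

    func main() {
        describe(Settings{}) // timeout not set, using default

        zero := 0
        describe(Settings{Timeout: &zero}) // timeout explicitly set to 0
    }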

Conclusion

Yes, pointers can reduce memory copying and enable data sharing, but they also introduce additional levels of indirection, heap allocations, and potential garbage collection overhead.

The key takeaway is that pointers should not be used purely as a performance optimization unless profiling and benchmarking reveal that they address a specific bottleneck in your application.

In most cases, modern CPUs are so fast at copying memory that avoiding pointers results in simpler, faster, and more maintainable code.


This story had been sitting in my Drafts since early 2023; I decided to give it a shot and publish it.
