December 23, 2022
Go is famously known for its concurrency model and its ability to handle hundreds of thousands of goroutines on a single machine. But how does it do that?
To understand how Go handles concurrency, we must first understand a bit about how the underlying operating system schedules threads. When a program is launched, the operating system creates a new process with a new thread called the “main” thread. This thread is then put in a queue of threads that are ready to be executed. A thread can be in one of 3 states: waiting, runnable, or running. A waiting thread is usually waiting for a system call, like a network request, to finish. The thread can’t continue executing until the system call is fulfilled, so the OS puts it in a waiting state and frees up space on the CPU for another runnable thread to execute. The CPU must always be kept busy, so we can’t have (or at least try not to have) any threads blocking the execution of other threads that are ready to run.
To keep the CPU busy, the OS performs context switches. A context switch is when the OS switches a thread from a running state to a waiting state and puts a runnable thread in its place. Context switches are expensive and time-consuming because the operating system must first save the state of the waiting thread so it can be reloaded later. Then, it must put a runnable thread on a CPU core and switch that thread’s state from “runnable” to “running”. As far as the threads themselves know, however, they never stopped running. Context switches usually happen when a thread is waiting for an I/O task to finish and there are “runnable” threads ready to be executed. That’s why throwing more threads at a problem doesn’t necessarily make it run faster; it usually just means more context switches and more wasted CPU cycles.
In Go, thread scheduling happens differently. Go tries to minimize its reliance on the OS’s scheduling and context switching because they can be very expensive and time-consuming, wasting many CPU cycles that could’ve been used productively. Go’s runtime manages the execution of goroutines, similar to how an operating system manages the execution of threads. Goroutines, however, are much lighter than threads. They start with a 2KB stack (which grows dynamically as needed), while operating system threads usually have a stack size of around 2MB. In fact, the main function in a Go program is executed as a goroutine, called the “main” goroutine. This main goroutine runs on a thread that is scheduled by the operating system.
Ok, so you execute your program, and your main function runs as a goroutine on an OS thread. But what happens when your program (which is currently executing) decides to launch another goroutine?
When you launch a goroutine in your program, the Go runtime does most of the heavy lifting. Your goroutines run on logical processors, each of which is backed by a thread managed by your operating system. By default, the Go runtime creates one logical processor per CPU core and executes only one OS thread on each of them at a time. This way, context switching between threads is minimized. Each logical processor has a local run queue, and there is also a global run queue shared by all of them. When you launch a goroutine, depending on the load, it is placed either on a local run queue or on the global run queue. If a logical processor’s local run queue and the global run queue are both empty, it will try to steal goroutines from the run queue of another logical processor. Notice that the OS is not involved in deciding which goroutine runs and which doesn’t. This is all managed by the Go runtime.
The Go runtime handles scheduling for the execution of goroutines by keeping track of the ones that need to be executed in local and global run queues. Goroutines carry the same states as threads: running, runnable, and waiting. When a goroutine makes a blocking system call, the Go runtime takes a runnable goroutine from the local or global run queue and executes it on the logical processor. In other words, it performs a context switch. However, Go’s context switches are much faster than the ones performed by the operating system because the runtime doesn’t have to save as much state for each goroutine. It knows more about what each goroutine is doing, so switching between goroutines is much faster and more efficient. From the operating system’s perspective, your Go program, which internally can perform many context switches between goroutines, never goes into a waiting state. Your program tries to keep itself executing on the CPU for as long as possible, and it doesn’t clue the OS in that it’s performing any I/O-bound work. As far as the OS is concerned, your Go program is busy doing some kind of CPU-bound work, and it needs to keep executing. This is one of the reasons why Go programs are so quick and efficient.
In the end, you learned that Go is able to handle hundreds of thousands of goroutines because it minimizes its reliance on the operating system’s scheduler and uses its own internal scheduler, which is part of the Go runtime. This way, it can make intelligent decisions about which goroutines to execute and can perform context switches far more efficiently.