Introduction to OpenGL
What is OpenGL?
OpenGL is an application programming interface (API) that allows us to draw 2D and 3D graphics.
When running on a standard PC, the OpenGL API is essentially a set of functions that allow us to control the inner workings of the graphics card. The idea is that it helps us, as developers, to create 2D and 3D graphics easily at fast ‘real-time’ frame rates.
Graphics cards and the software drivers define something called the graphics pipeline. This is a conceptual model of the operations and dataflow that take our 3D model inputs and convert them into the 2D image on the screen. OpenGL is an interface that allows us to access and control this pipeline.
Technically, OpenGL is not itself a library but a standard interface specification. This allows individual hardware vendors (like NVIDIA and AMD) to implement their own drivers and libraries to give developers common access to their hardware. This makes our lives a lot easier because otherwise we’d probably need totally different code for every type of graphics card we wanted to support!
What is a graphics card and how does OpenGL work with it?
The graphics card is a computer component that is specially built to make graphics run faster than they can on a general purpose processor. OpenGL essentially allows us to access and program graphics cards.
What is a GPU?
GPU stands for Graphics Processing Unit. It’s the chip that sits in a graphics card and does the actual processing. A lot of computers don’t have a separate graphics card; instead the GPU may be built into the main processor or embedded somewhere else on the motherboard. It’s therefore more useful to refer to the GPU than to the graphics card.
What is OpenGL ES?
This is essentially a cut-down version of the full OpenGL specification for embedded systems. This means things like mobile phones and tablets, where the graphics capabilities are more limited than those of a desktop graphics card.
What is WebGL?
This is another cut-down version of the OpenGL specification, based on OpenGL ES and designed for web browsers, where it is accessed through JavaScript. It’s proved quite popular and allows 3D graphics within webpages.
What is GLSL?
GLSL is the OpenGL Shading Language. This is a C-like language for short programs, called shaders, which run on the graphics card. There are several types of shaders and they do separate tasks. In modern OpenGL there are two required shaders: the vertex shader, which runs for every vertex that’s drawn and has to work out its position, and the fragment shader, which runs for every fragment (potential pixel) that’s drawn and has to work out its colour.
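As a rough illustration of the two required shaders, here is a minimal sketch of a vertex and fragment shader pair, held as C++ string constants the way many small programs store them. The names `position`, `mvp`, `fragColour`, `kVertexShaderSource` and `kFragmentShaderSource` are illustrative assumptions, as is the choice of GLSL version 3.30; only `gl_Position` is a built-in.

```cpp
#include <string>

// Vertex shader: runs once per vertex and must write the built-in gl_Position.
// Assumption: the application supplies a combined model-view-projection
// matrix in a uniform called `mvp` and positions at attribute location 0.
const std::string kVertexShaderSource = R"glsl(
#version 330 core
layout(location = 0) in vec3 position;
uniform mat4 mvp;
void main() {
    gl_Position = mvp * vec4(position, 1.0);
}
)glsl";

// Fragment shader: runs once per fragment and must work out its colour.
// Here every fragment simply gets the same opaque orange.
const std::string kFragmentShaderSource = R"glsl(
#version 330 core
out vec4 fragColour;
void main() {
    fragColour = vec4(1.0, 0.5, 0.2, 1.0);
}
)glsl";
```

In a real program these strings would typically be handed to `glShaderSource` and compiled with `glCompileShader`, which requires a working OpenGL context and is not shown here.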
What is Vulkan?
This is another API for accessing the graphics pipeline. It’s much lower-level than OpenGL and was originally based on AMD’s Mantle. There are some distinct advantages to Vulkan, such as support for CPU-side parallelisation, but in my opinion the lower-level interface gives it a higher barrier to entry. It should offer much better control of the GPU though, and lower overheads on the CPU side, which better match the requirements of the games industry.
Should I learn OpenGL or DirectX?
I’m often asked this question by graphics and games students. My general opinion is that fundamentally it shouldn’t matter which you learn. APIs come and go; sometimes it feels like as soon as you learn one it changes (or disappears!). What’s important is that you have a flexible mindset and transferable skills so you can adapt to the current situation.
In terms of these two graphics APIs, they are basically two different ways of accessing the same underlying resource: the graphics card. There are, therefore, similarities between the two and it shouldn’t be a huge leap to transfer from one to the other.
Personally, I was taught OpenGL and have not yet needed to switch to DirectX, but I’m not totally against this concept. OpenGL suits my needs because it’s cross-platform (I do a lot of dev work under Linux), it works on mobiles (Android at least) and even in web browsers.
This is a little summary of how OpenGL came into being.
So back in the 80s there was a fun company called Silicon Graphics (later SGI). They were making some pretty revolutionary computer hardware and were building machines that would accelerate graphics.
These were the days before PCs, when a graphics workstation cost more than the worker’s yearly salary. SGI’s machines generally had MIPS processors, a totally custom architecture and graphics accelerators for real-time graphics. These ran their own flavour of Unix called IRIX.
To allow programmers to use their hardware, they developed a 2D and 3D graphics library and called it IRIS GL. Their approach proved popular with developers and in the early 90s they sat down for a redesign.
What they came up with was a reorganised but natural evolution of IRIS GL, which they cleverly licensed to their competitors. They separated the proprietary code and published it as a specification standard. They called it the Open Graphics Library, or OpenGL for short.
At its release in 1992, SGI brought together all interested parties and formed the Architecture Review Board (ARB). This group is responsible for the design and development of the OpenGL standard.
OpenGL was not the only graphics API at the time. IRIS GL continued for a while, as SGI supported its existing customers, and there was the more open PHIGS library which predated OpenGL but was considered harder to use. The most significant competitor to emerge was Microsoft’s Direct3D / DirectX in 1995.
As the 90s progressed, the IBM compatible PC became the dominant force and SGI declined. By the end of the 90s, many of SGI’s graphics engineers moved to a new company called NVIDIA, where they helped develop hardware accelerated graphics for PCs. Unable or unwilling to adapt, SGI became bankrupt. Its website currently redirects to HP.
I like to think of OpenGL as SGI’s legacy. Before SGI’s ultimate demise though, the largely independent OpenGL ARB voted to move to the Khronos Group in 2006 and it’s this group that now oversees its development.
The OpenGL Graphics Pipeline
To learn OpenGL we need to look at the graphics pipeline. The graphics pipeline is essentially a set of operations that take 3D data in and output a 2D image to the screen. OpenGL and DirectX are software interfaces to this pipeline.
I like to use various analogies to visualise this and to explain some of its different aspects.
In some ways, OpenGL is like a child’s marble-run game. You set up the stands and chutes and loops, and then when you’re ready to go you empty the marbles into the top container and watch them go. If you’ve set up everything correctly, the marbles all end in the final container – in the case of OpenGL, giving us a pretty picture as the output. However, if we didn’t set it up right all our marbles end up on the floor – with OpenGL, this often means that nothing reaches the screen and we’re left scratching our heads (and searching for our lost marbles).
OpenGL then can also be thought of as a black box. When something goes wrong, unlike the marble-run, we can’t always see the error. To fix problems it’s therefore important to have a good understanding of the pipeline itself.
Graphics Pipeline Summary
In a nutshell, the graphics pipeline works a bit like this:
- Our 3D models are defined as lists of vertices that make up triangles
- Each vertex will have multiple properties: position, texture coordinate, normal, etc
- We store our 3D models in memory that’s managed by OpenGL – this normally ends up on the graphics card to make it run faster
- To draw our model, it’s like we ask OpenGL to take a copy of every vertex and send it down the pipeline
- The pipeline then does a series of operations on our vertex data:
  - Each vertex goes through a vertex shader, which is a small program we must write and which must perform viewing transformations
  - Vertices are joined back up into triangles
  - Triangles are split into potential pixels, called fragments, in the process known as rasterisation
  - Each fragment is run through a fragment shader, another small program we must write which figures out what colour it should be (we can do lighting calculations in here)
  - There are a few extra tests and then the fragment is drawn to screen as a pixel
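The steps above can be sketched as a toy software pipeline that runs entirely on the CPU. This is not OpenGL code and not how a GPU is implemented; it only mirrors the order of the stages. All names (`Vertex`, `vertexShader`, `fragmentShader`, `edgeFunction`, `drawTriangle`) are hypothetical, and the sketch assumes positions already lie in the [-1, 1] range, uses one normal for the whole triangle, and skips clipping, interpolation and the final per-fragment tests.

```cpp
#include <algorithm>
#include <array>
#include <vector>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };
struct Colour { float r, g, b; };

// A vertex carries several properties, not just a position.
struct Vertex {
    Vec3 position;
    Vec2 texCoord;
    Vec3 normal;
};

// Stand-in for a vertex shader: works out where one vertex lands.
// Assumption: the only transformation left is the mapping to pixels.
Vec2 vertexShader(const Vertex& v, int width, int height) {
    return { (v.position.x * 0.5f + 0.5f) * width,
             (v.position.y * 0.5f + 0.5f) * height };
}

// Stand-in for a fragment shader: works out one fragment's colour.
// Assumption: a single light facing the surface along z, so the
// brightness is just the z component of the normal.
Colour fragmentShader(const Vec3& normal) {
    const float brightness = std::max(0.0f, normal.z);
    return { brightness, brightness, brightness };
}

// Signed area test: tells us which side of the edge a->b the point p is on.
float edgeFunction(const Vec2& a, const Vec2& b, const Vec2& p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Sends one triangle down the toy pipeline and returns how many pixels
// were written. The framebuffer must hold width * height colours.
int drawTriangle(const std::array<Vertex, 3>& tri, int width, int height,
                 std::vector<Colour>& framebuffer) {
    // Vertex stage: every vertex goes through the vertex shader.
    const Vec2 p0 = vertexShader(tri[0], width, height);
    const Vec2 p1 = vertexShader(tri[1], width, height);
    const Vec2 p2 = vertexShader(tri[2], width, height);

    // Primitive assembly: the three results are treated as one triangle.
    if (edgeFunction(p0, p1, p2) == 0.0f) return 0;  // zero area, nothing to draw

    int written = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Rasterisation: a fragment exists where the pixel centre is covered.
            const Vec2 centre{ x + 0.5f, y + 0.5f };
            const float e0 = edgeFunction(p0, p1, centre);
            const float e1 = edgeFunction(p1, p2, centre);
            const float e2 = edgeFunction(p2, p0, centre);
            const bool inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                                (e0 <= 0 && e1 <= 0 && e2 <= 0);
            if (!inside) continue;

            // Fragment stage, then the write to the "screen".
            framebuffer[y * width + x] = fragmentShader(tri[0].normal);
            ++written;
        }
    }
    return written;
}
```

On real hardware the stages run in parallel across many vertices and fragments at once, which is where the speed comes from; the nested loops here only show the flow of data.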
Development of the Graphics Pipeline
The graphics pipeline has changed a lot in its short history. Furthermore, it will continue to change as hardware becomes more powerful and as developers implement more advanced features. This is part of what makes graphics programming so exciting: we’re constantly pushing the boundaries of what’s possible.
Given its constantly changing nature, I feel that having some understanding of its past can help us to cope with its future. If nothing else, it helps us to realise that changes are the norm!
When you’re learning OpenGL you’ll find lots of code online that’s old and makes reference to things that aren’t in the current pipeline – having some understanding of previous features can help you recognise and adapt (or avoid) old code.
Finally, it may come as a shock but not everyone has the latest and greatest graphics hardware. When we develop games or other graphics applications for publishing, we need to consider how to support users with older hardware.
Pipeline circa 2001
The fixed function pipeline had optional elements to it, but was fundamentally limiting in scope. Developers wanted more flexibility to push more exciting graphical features, without just having to select from a fixed menu of options.
This desire for flexibility led to perhaps the most important and fundamental shift in graphics card technology: the movement towards more general and programmable parts of the pipeline.
At first though, these were clunky and awkward to use. We had extensions like register combiners, which allowed fragment data to be combined in different ways to achieve certain texturing and lighting effects. DirectX 8, released in late 2000, and the NVIDIA GeForce 3, which followed in 2001, brought the revolutionary capability of vertex and pixel shaders.
Shaders are short programs that run on the graphics card to process individual vertex or fragment data, but these early versions had to be written in assembly (which is not much fun) and were still very limiting in what could be achieved and the length of the program.
Pipeline circa 2003
Finally, NVIDIA released the Cg language and compiler for programmable shaders and we could use a high-level programming language to write shader code. Cg was originally developed together with Microsoft, but they had some sort of disagreement and Microsoft released their own version for DirectX 9 called HLSL (High-Level Shading Language).
Alongside the increase in usability, the hardware continued to develop and we could write (slightly) longer shaders with more complex operations. I remember being especially excited with the GeForce FX’s ability to allow texture coordinates to be calculated in the fragment shader.
Pipeline circa 2004
Development focussed on shader capabilities and gradually we got longer programs and more
We got access to more precise data types, though doubles remain slower than floats to this day. The vertex shader could now access textures (displacement mapping, anyone?).
Perhaps one of the biggest advances though was the ability to use true conditional statements and loops within shaders. Until this point, a loop had to be ‘unrollable’ at compile time, with a well determined number of times around the loop. Conditional statements are still slow depending on the situation, but at least you can use an ‘if’ statement now should you need one.
Pipeline circa 2007
At the SIGGRAPH 2007 OpenGL ‘Birds of a Feather’ event, OpenGL 3.0 was announced (and we got free t-shirts). The 3.x releases that followed represented a significant rework: 3.0 deprecated the old fixed-function pipeline and immediate mode, 3.1 removed them, and 3.2 added an optional geometry shader and introduced context profiles, with the deprecated features kept out of the core profile.
The geometry shader was driven by ATI and represented a curious addition to the pipeline – the ability to change the nature of a geometric primitive, for example adding or removing vertices. While the most obvious application would be subdivision, my early experiences of geometry shaders led me to feel they were best avoided – they were just unbelievably slow.
NVIDIA was also taking their hardware design much further in flexibility. They came up with a unified shader processor, which could be allocated as either a vertex or fragment shader depending on the pipeline load.
It was also around this time that NVIDIA announced CUDA, which is a more general language to allow a graphics card to be used for non-graphics applications. I imagine this has proven to be a very lucrative decision for them, as NVIDIA cards are frequently found in the top supercomputers now.
Pipeline circa 2014
If we fast-forward to OpenGL 4.5, the main architectural changes were the addition of tessellation shaders and more formal loops back from later stages to the beginning of the pipeline. Most of the real changes were at the lower levels as buffers became more unified, shaders became longer and graphics cards generally became more powerful.
Tessellation shaders are powerful but depending on what you’re doing you might not need them.
Pipeline circa 20??
What does the future hold for the graphics pipeline? Given NVIDIA’s moves into real-time ray tracing, I would expect some convergence here. DirectX already offers access, so OpenGL will surely follow. Given how different ray tracing is from the traditional rasterisation approach of graphics cards, this will be interesting to watch.
The OpenGL API follows a client-server design, where the server is the graphics system and as programmers we write the client software.
This separation is especially evident in the modern OpenGL approach, where we routinely send data to the server and spend time setting up server-side state. It also means that the operations we request are not necessarily executed right away and that execution can take some time, but we're free to continue with other work in the meantime (i.e. it works asynchronously).
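To make the asynchronous idea concrete, here is a toy model of a client submitting work to a server that only executes it later. It is a sketch of the concept, not of how any driver is built: the class `ToyServer` and its methods are hypothetical, and the "framebuffer" is just a list of integers.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy stand-in for the OpenGL "server": it owns the state and only
// changes it when queued commands are actually executed.
class ToyServer {
public:
    using Command = std::function<void(std::vector<int>&)>;

    // Client side: record a command and return immediately.
    void submit(Command command) {
        pending_.push(std::move(command));
    }

    // Server side: run everything queued so far, in submission order.
    void flush() {
        while (!pending_.empty()) {
            pending_.front()(framebuffer_);
            pending_.pop();
        }
    }

    std::size_t pendingCount() const { return pending_.size(); }
    const std::vector<int>& framebuffer() const { return framebuffer_; }

private:
    std::queue<Command> pending_;
    std::vector<int> framebuffer_;
};
```

After a call to `submit` the framebuffer is unchanged until `flush` runs, which is the behaviour to keep in mind when timing or debugging OpenGL code. Real OpenGL offers `glFlush` and `glFinish` for the cases where the client does need to push or wait for outstanding work.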
In versions before OpenGL 3, one of the key advantages to learning OpenGL was immediate mode: you could issue drawing calls in your code and they’d be executed at that point.
Immediate mode was insanely inefficient because it threw away the advantage of the asynchronous client-server architecture. However, it was much easier to set up and we could get things going quite quickly.
As an example, we could specify the viewing matrices then define what geometry to draw, vertex-by-vertex, and it would just draw it. In the current version of OpenGL we’d have to create server-side buffers for the vertex data, send the data to them, create shaders, handle the matrices ourselves, and then issue a drawing call. The new way is much faster, but there’s more setup and you need to have some understanding of a lot more of the pipeline just to get going.
Personally, although immediate mode is tempting it does lead to bad habits (making inefficient code), so I would recommend using the more modern approach. Just be aware that the initial learning curve is a bit steeper, but you’ll end up with an earlier understanding of what you’re doing and how the pipeline works.
A context in OpenGL is loosely the container for an instance of OpenGL. It contains all of the states and data as well as the connection to the displayable window (the framebuffer). We must create a context when we initialise OpenGL, but usually there are utilities and libraries that we use so this is hidden from us. Something we can do is to choose the OpenGL version we want to use, which can potentially give us access to different features.
Until OpenGL 3, all versions were fully backwards compatible. However, with OpenGL 3 the standard started deprecating old functionality and mechanisms. To optionally maintain backward compatibility without compromising access to the latest features, the standard introduced two context profiles: compatibility and core. With the compatibility profile we have full backwards compatibility, while core restricts us to only the functionality from the selected version.
When we set up OpenGL we must choose between these profiles. If we choose the core profile and a modern OpenGL version we will not be able to use the old fixed-function pipeline and features like immediate mode. This is an important distinction for a beginner, especially if you’re trying to piece together projects from code you find online (which may be from different OpenGL versions).
The major (and even minor) releases of OpenGL have traditionally been far apart. I would say this has improved considerably since the ARB moved to Khronos, but it’s still not necessarily enough to keep up with hardware development.
When a graphics card manufacturer releases a card with a new feature, they need to give developers a way of accessing this. The OpenGL method is for them to give an extension, which is typically extra functions and enumerators. Eventually, these are then incorporated into the full core OpenGL API.
There are various ways of accessing extensions at a low level, which I don’t recommend (mostly due to the tedium). Instead I recommend using a utility library to manage extensions. I personally like the GL Extension Wrangler, GLEW. It’s lightweight and generally easy to use.
Using a utility like GLEW is also vital if you’re developing under Windows with Visual Studio, as the OpenGL libraries that VS ships with are typically decades out of date. A simple initialisation of GLEW will get you access to the latest and greatest features. At least in theory.
The nature of extensions means they’re tied to capabilities of specific hardware. This means that if you want to use an extension you should really first be checking whether the underlying hardware is capable of supporting that extension.
When learning OpenGL, this is less likely to be an issue as you’ll probably have good knowledge of your development machine. It only really becomes an issue when you start thinking about distributing your code. Checking and supporting older hardware may, or may not, be important to you and you’ll always have to draw a line somewhere.
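The check itself is simple once you have the list of extension names. The sketch below assumes the names arrive as a single space-separated string, which is the format of the legacy `GL_EXTENSIONS` string; the function name `hasExtension` is my own. It compares whole tokens, because a plain substring search would wrongly report `GL_EXT_texture` as present whenever `GL_EXT_texture3D` is.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in a space-separated
// list of extension names.
bool hasExtension(const std::string& extensionList, const std::string& name) {
    std::istringstream tokens(extensionList);
    std::string token;
    while (tokens >> token) {
        if (token == name) return true;  // exact match only, no prefixes
    }
    return false;
}
```

In a core profile context the names are instead queried one at a time with `glGetStringi(GL_EXTENSIONS, i)`, and if you are using GLEW, `glewIsSupported` performs this kind of check for you.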
Opening a Window, Getting Input, Maths Support, and File Input
When we interact with a computer we generally use a windowed environment and expect the programs we run to respond to our input. Unfortunately OpenGL doesn’t handle any of this. I mean it won’t even open a window.
This comes down to its cross-platform capabilities and the fact it’s only an interface specification. The original IRIS GL did do things like window management, but that’s because it ran on UNIX before X-Windows was invented. To make it cross-platform, SGI stripped out the interaction aspects and concentrated on making OpenGL a graphics only design.
Whatever the reasons, we have no choice but to use alternatives. You can use the OS-provided mechanisms, such as X-Windows, but I always feel this goes against the cross-platform nature that OpenGL gives us.
Instead, you can use libraries such as SDL, SFML or GLFW or, if you want widgets, something like FLTK. I wouldn’t recommend GLUT nowadays; it’s quite old and gives very little control over the game loop.
Similarly, OpenGL doesn’t provide any way of loading files from local storage. Again, this would be operating system dependent, so we need to use additional libraries. An OBJ file parser is straightforward to write and a good beginner’s programming exercise. If you want anything more complex, you might want to look at Assimp.
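To give a feel for the exercise, here is a minimal sketch of an OBJ reader. It is deliberately incomplete: it assumes triangulated faces with positive indices, reads only `v` and `f` lines, and ignores texture coordinates, normals, materials and everything else. The names `ObjMesh` and `parseObj` are my own, and malformed face indices will make `std::stoi` throw.

```cpp
#include <array>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct ObjMesh {
    std::vector<std::array<float, 3>> positions;  // from "v x y z" lines
    std::vector<std::array<int, 3>> triangles;    // zero-based, from "f a b c" lines
};

// Reads vertex positions and triangular faces from OBJ text.
ObjMesh parseObj(std::istream& in) {
    ObjMesh mesh;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string tag;
        if (!(fields >> tag)) continue;  // blank line

        if (tag == "v") {
            std::array<float, 3> p{};
            if (fields >> p[0] >> p[1] >> p[2]) mesh.positions.push_back(p);
        } else if (tag == "f") {
            std::array<int, 3> tri{};
            std::string corner;
            int count = 0;
            while (count < 3 && fields >> corner) {
                // A corner may look like "7", "7/2" or "7/2/5"; std::stoi
                // stops at the first '/', leaving the position index.
                // OBJ indices start at 1, so subtract 1.
                tri[count++] = std::stoi(corner) - 1;
            }
            if (count == 3) mesh.triangles.push_back(tri);
        }
        // Comments (#), vt, vn and all other tags are ignored.
    }
    return mesh;
}
```

Because it takes a `std::istream`, the same function works on a `std::ifstream` opened on a file or on a `std::istringstream` holding test data.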
For image loading you could use SDL, which is quite convenient if you’re already using it to manage windowing and user input. Alternatives include DevIL and SOIL, and OpenEXR if you’re into HDR.
Early versions of OpenGL included support for matrix manipulation and a full viewing matrix stack. While these were useful for beginners, my understanding is that no real production company was using them in a live system so they were removed to simplify the API. The result is that we need to use our own vector and matrix solutions, for which I’d recommend GLM.
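To show what “handling the matrices ourselves” involves, here is a small standalone sketch of a 4x4 matrix type with translation, multiplication and point transformation. It is not GLM and the names (`Mat4`, `Vec4`, `identity`, `translation`, `multiply`, `transform`) are my own; the one deliberate similarity is the column-major layout, which is what OpenGL and GLM use.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
// Column-major storage: m[column][row].
using Mat4 = std::array<Vec4, 4>;

Mat4 identity() {
    Mat4 m{};  // all zeros
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

// Builds a matrix that moves points by (x, y, z).
Mat4 translation(float x, float y, float z) {
    Mat4 m = identity();
    m[3] = { x, y, z, 1.0f };  // the fourth column holds the offset
    return m;
}

// Returns a * b, so that b is applied to a point first and a second.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[col][row] += a[k][row] * b[col][k];
    return r;
}

// Returns m * v, treating v as a column vector.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col][row] * v[col];
    return r;
}
```

With GLM the equivalent of `translation` would be `glm::translate(glm::mat4(1.0f), glm::vec3(x, y, z))`, and multiplication is just `a * b`, which is a good reason to use the library rather than maintain code like this yourself.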
The definitive documentation for OpenGL is the official programming guide, or ‘red book’.
Make sure you get the latest edition: if you search you’ll find various versions, and some of the older ones are available online. The red book has improved over the years, but at times it reads more like a technical manual. While it’s an important reference point, I’ve found its emphasis on the API can make it tricky for a beginner to grasp the underlying pipeline concepts.
There are also a variety of other introduction-level books I’d recommend:
Conclusion and How to Learn OpenGL
Programming with OpenGL is a lot of fun. Just like making games or other graphics, it's fairly straightforward to get something going but it takes hard work to get it looking great.
Over the years I’ve guided hundreds of students through OpenGL and the approach I normally recommend is a mixture of theory and practice. To learn OpenGL properly we need to look at the underlying maths, its application through physics in lighting and shading, and the ever-changing graphics hardware that OpenGL gives us access to and that produces our final renders.
At the end of the day OpenGL is just a tool, perhaps like an advanced paintbrush. Knowing how to use the tool is important, but it’s what you do with it that counts.
OpenGL® and the oval logo are trademarks or registered trademarks of Hewlett Packard Enterprise in the United States and/or other countries worldwide.