Tutorial #6 - Optimizations - Introduction

To show some examples of optimizing a program, I ran some tests to measure what common operations cost. Each test was run in a loop 100 million times, and the time shown is the average of 5 runs. After each set of tests, I'll give an overview of what may be happening and how to get that extra speed in the games you create. Even though the tests are pretty basic, they should help you understand why and how to optimize.
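The original test harness isn't shown, so the following is only a minimal sketch of how such a measurement could be set up. The names average_seconds and time_if_test are hypothetical, and std::chrono::steady_clock is an assumed choice of timer, not necessarily what was used for the numbers in this tutorial.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not the original harness): calls `body` `iterations`
// times per run, repeats that for `runs` runs, and returns the average
// number of seconds one run took.
template <typename Body>
double average_seconds(Body body, long iterations, int runs) {
    assert(iterations > 0 && runs > 0);
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        const auto start = clock::now();
        for (long i = 0; i < iterations; ++i) {
            body();
        }
        const auto stop = clock::now();
        total += std::chrono::duration<double>(stop - start).count();
    }
    return total / runs;
}

// Example use: time an `if (x == y)` test. The variables are volatile so
// the compiler cannot optimize the comparison away.
double time_if_test(long iterations, int runs) {
    volatile int x = 3, y = 3, hits = 0;
    return average_seconds([&] { if (x == y) hits = hits + 1; },
                           iterations, runs);
}
```

Calling time_if_test(100000000L, 5) would mirror the setup described above: 100 million iterations, averaged over 5 runs. The absolute numbers depend heavily on the compiler, its optimization flags and the machine, so only compare results measured the same way.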

OpenGL calls

glRotatef(74.38, 28.48, 82.74, 129.9)    2.116780 sec
glColor3f(74.38, 28.48, 82.74)           1.245937 sec
glTranslatef(74.38, 28.48, 82.74)        1.233421 sec
0.988515 sec
0.986884 sec
0.923384 sec
0.156373 sec

The rotate function takes the most processing power, as you would expect, since it's the only function listed that does any heavy calculations. The next two do some work, but they are only shifting or initializing values, so not as much time is used. glLoadIdentity loads a matrix, so it takes time, and the glEnd/glBegin functions pretty much start/end the rendering state. glBlendFunc just sets the blending state, so it does not use up as much time.

Even though calls such as glRotatef may seem slow, it's usually faster to use the hardware-accelerated functions on the graphics card than to write your own. Since the card's specific goal is graphics, it can spit out the values quicker than a general-purpose processor like the CPU. The same goes for buffering, clipping, culling, etc.


for loop         0.432281 sec
while loop       0.311844 sec
if (x == y)      0.316098 sec

Here the while loop and the two if statements take around the same time. The for loop takes longer, which is likely due to the setup of the loop itself (assigning the initial values and incrementing the counter) on top of the comparisons made on every pass. To optimize, the best advice is just to make sure that whatever is in the loop really needs to be there.
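As an illustration of keeping unneeded work out of a loop, here is a small sketch. The functions scale_slow and scale_fast are hypothetical and were not part of the timed tests; they only show the idea of computing a value once instead of on every pass.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Recomputes the same scale factor on every pass through the loop.
void scale_slow(std::vector<float>& values, float angle_degrees) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        values[i] *= std::cos(angle_degrees * 3.14159265f / 180.0f);
    }
}

// Computes the factor once, so the loop body only does a multiply.
void scale_fast(std::vector<float>& values, float angle_degrees) {
    const float factor = std::cos(angle_degrees * 3.14159265f / 180.0f);
    for (std::size_t i = 0; i < values.size(); ++i) {
        values[i] *= factor;
    }
}
```

Both functions produce the same values. An optimizing compiler may well hoist the calculation out of the loop on its own, but writing it that way yourself doesn't rely on that and makes it clear what the loop really needs.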

Function calls

Test_1(float, float, ..., float)    0.339573 sec
Test_2(&float3, &float3)            0.153631 sec

Test_1 has 6 float arguments passed, whereas Test_2 has only the addresses of two float3 structures (a float3 is just .x .y .z, three floats). Test_1 takes longer since it has to create a copy of each argument when called and then delete it when returning. Creating a copy may be good if you want to make sure nothing is overwritten, since any value changed in Test_2 will remain changed. As shown above, though, copying takes longer, which can be further shown by this class with a constructor, a destructor and a copy constructor which copies the passed variables.

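#include <cstdio>   // printf

class Test_Copy {
public:
    Test_Copy() {}                                     // default constructor
    ~Test_Copy() { printf("Exiting\n"); }              // destructor
    Test_Copy(const Test_Copy& rhs) : data(rhs.data) { printf("Copying\n"); }  // copy constructor
    float3 data;                                       // three floats: .x .y .z
};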

Then calling the function Test_3(Test_Copy tmp_1, Test_Copy tmp_2) {} prints the following to the screen:

Copying
Copying
Exiting
Exiting


Even though nothing is done in the function, it still has to create the copies and then destroy them. Had we called Test_4(Test_Copy *tmp_1, Test_Copy *tmp_2) {} and just passed the pointers, no copies would have to be made and later destroyed. Also take into account the structure itself: the address of any structure is still only an address, but if a whole animation class is copied, every variable within the class is brought along and copied too.
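To make the difference easy to check, here is a self-contained sketch of the two calls. The float3 layout and the copies/destructions counters are assumptions added for illustration; they are not part of the original test code.

```cpp
#include <cstdio>

struct float3 { float x, y, z; };  // assumed layout: just three floats

// Hypothetical counters, added only so the effect can be checked in code.
static int copies = 0;
static int destructions = 0;

class Test_Copy {
public:
    Test_Copy() : data{0.0f, 0.0f, 0.0f} {}
    Test_Copy(const Test_Copy& rhs) : data(rhs.data) {
        ++copies;
        printf("Copying\n");
    }
    ~Test_Copy() {
        ++destructions;
        printf("Exiting\n");
    }
    float3 data;
};

// Passed by value: each argument is copied on the way in and the copy is
// destroyed again once the call is done.
void Test_3(Test_Copy tmp_1, Test_Copy tmp_2) {}

// Passed by pointer: only two addresses are handed over, nothing is copied.
void Test_4(Test_Copy* tmp_1, Test_Copy* tmp_2) {}
```

With two existing objects a and b, Test_3(a, b) raises both counters by two, while Test_4(&a, &b) leaves them untouched. The trade-off is the one described above: Test_4 can change a and b, Test_3 can only change its own copies.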


Basically, the above information should give some sense of function cost and at the same time show that these functions don't take nearly as much time now as they did even a few years ago. Although a really well-functioning game should keep resource use down, a few missed optimizations on a newer system won't cause that much trouble. Throw in a glLoadIdentity call for every triangle drawn and you get a slideshow, but knowing what goes where and what certain functions cost can help a lot when trying to speed things up.

Copyright © 2018 Luigi Pino. All rights reserved.