Ten Little Endians
Monday, 27 July 2009
I like to think that I support the open source community. I find open source to be very beneficial; I like the ability to review code and see how things work. Although I use open source tools (everything from gcc and nmap to Gimp and Open Office), I seldom incorporate open source code into my own code, and rarely contribute my own code to the open source community.
I have a few big reasons for not incorporating free and open source software (FOSS) in my closed source world. First, FOSS usually needs so much tweaking to fit into my code that it is easier to implement it from scratch. In many cases, my code has much more demanding requirements -- every assumption and error condition must be identified and handled correctly. In my experience, most FOSS is designed to get the job done, and not test for corner-case conditions.
And then there is licensing... I hate GPL. Just as I don't like door-to-door solicitors who try to push their religion on me, I also don't like licenses that try to push their philosophies onto me. GPL is a virus. If you use GPL code in your code, then your code becomes GPL. Give me the MIT, BSD, or WTFPL licenses -- these allow anyone to freely use the code. (BSD and MIT have the additional requirement to cite their work, but that does not impede my own license.) GPL requires that your own code becomes GPL. It may be an open source license, but it comes with strings; GPL isn't "free".
On one project, I worked with some pretty big names in the open source world. However, I felt the need to constantly remind them that "Open source is a fad" -- just like the pet rock and parachute pants. (But like the mini-skirt, not all fads are bad. I also believe that the Internet is a fad based on a bad joke, yet everyone takes it so seriously.)
I actually have a list of reasons why open source is bad. This isn't to say that closed source is good -- closed source is bad for other reasons.
One Little...
One of my FOSS dislikes comes from the community. Engineers are stereotyped as geeky people with no social graces. Some are prima donnas, others are outright rude. In the closed-source corporate world, there are layers upon layers of support staff for keeping the engineers away from the customers. The last thing any project needs is an engineer on the phone with a customer saying, "That's stupid! Only a moron would do that!" (Business 101: Never call a customer stupid, regardless of how stupid they are.)
With the open source community, you get to meet these people head-on.
Let's say you have a question about an open source project -- anything from usage to technical programming queries. Most mid- to large-size projects want you to post your message to a forum or send it to a mailing list. In my experience, you have a 50% chance of getting the posting answered, and an independent 50% chance of getting insulted or a rude reply.
There are some notable exceptions to this. For example, the Linux kernel mailing list seems very professional and polite. However, most crypto forums and truly open security mailing lists are crawling with trolls and rudeness.
Two Little...
About two months ago I needed to use FFmpeg on a Mac. FFmpeg uses an LGPL license. (I'm not a lawyer, but...) This means that you can link to the shared library, but you cannot incorporate or statically link your code without altering your own license.
Since my Mac code is compiled as a universal binary (runs on both Intel and PowerPC platforms), I needed FFmpeg compiled as a library with universal binaries. The universal libraries are only needed during compile-time, not run-time. At run-time, the "correct" platform libraries are used.
In order to make a universal library out of FFmpeg, I needed to make some code changes. One of the big problems with supporting multiple platforms comes from byte order -- the endian issue. The FFmpeg library hard-codes the endian conditions during the configure stage (before compiling). This is a problem since a universal binary must support both endian conditions.
Abiding by their license (it's part of the LGPL), I shared my list of changes with the FFmpeg community. Boy, this was a mistake. Talk about a rude bunch of people. Some of the responses:
- "Please don't misspell FFmpeg." Wow... the first reply corrected my spelling. I wrote "FFMpeg" rather than "FFmpeg". I wonder if he would have dropped the "Please" if I wrote "FFMPEG".
- "Holy shit." Need I quote more? Not very professional.
- "I am not sure at all why one would like to build ffmpeg (or indeed any other thing) as a universal binary." Ah... a short-sighted developer. I actually find Mac's universal binary option to be an elegant solution to a very complicated problem. I cannot assume that my users know which platform they are using -- they only need to know that they are using a Mac. As Ramiro replied in a rather rude fashion: "Remember that there are people who buy Apple products because they're shiny."
My posting was not an isolated incident. Many threads -- initiated by people other than the FFmpeg developers -- received multiple rude or hostile comments. If you don't like hostile replies, then you can follow Dark Shikari's advice and "Get the hell out."
Three Little Endians!
One of the replies to my endian suggestion really irked me. Vitor wrote:
    This is not possible for speed reasons. FFmpeg is full of very speed
    critical code like
    for (i=0; i < frame_size; i++)
        out[i] = table[bytestream_read_le32(&in)];
    where the implementation of bytestream_read_le32 is endianness dependent
    (and we cannot afford an extra if()). There is much more code like this
    (all bytestream readers, bitstream readers, scalers, etc).
There are two issues here. First, I could not find this code example in the actual code. ('find ffmpeg-0.5 -type f | xargs grep -i bytestream_read_le32' returns nothing.)
More importantly... my background includes many years as a C code optimizer. My job was to identify areas in code written by non-computer science people and make it run faster. (Compiler optimization flags can only do so much. If you write high-performance code, then you can usually do better than any tweaks that the -O2 or -O3 flags do to slower code.)
Since Vitor's sample code uses three indirections within the loop, I wanted to see where it was used. Removing even one indirection would speed up the loop. And if this code was really critical, then perhaps inline ASM would be faster...
Four Little, Five Little, Six Little Endians...
What I did notice is that functions like "get_le32()" are called in critical loops. This function is defined in libavformat/aviobuf.c and calls get_le16() twice. And what does this do? It gets (get) 32 bits (32) in little endian (le) format. The family of FFmpeg endian functions (get_le16, get_le32, get_be16, etc.) can be seriously optimized -- much more than any "-O2" flag can do.
Moreover, these functions can be combined to take advantage of the local architecture. For example, if you are on a big endian architecture and calling get_be16(), then you currently run:
unsigned int get_be16(ByteIOContext *s)
{
    unsigned int val;
    val = get_byte(s) << 8;
    val |= get_byte(s);
    return val;
}
Here's some faster code for big endian architectures:
unsigned int get_be16(ByteIOContext *s)
{
    return( ((uint16_t *)(s))[0] ); // look ma! No bit shifting!
}
Now, if you define this as a macro in libavformat/avio.h instead of a library function in libavformat/aviobuf.c, then the function call is effectively inlined. (With gcc -O2, functions are not inlined across separately compiled source files. Since FFmpeg compiles each .c file independently, a function defined in aviobuf.c is never inlined into its callers elsewhere.)
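As a rough, self-contained sketch of the two approaches compared in this section: taken literally, the cast above reads the first bytes of the context structure rather than the buffered data, so a working version needs a pointer into the buffer. The ByteReader type and the read_be16_shift() and read_be16_native() names below are hypothetical stand-ins for illustration, not FFmpeg's ByteIOContext or its API, and buffer refilling is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a read context; NOT FFmpeg's ByteIOContext. */
typedef struct {
    const uint8_t *pos;  /* next unread byte */
    const uint8_t *end;  /* one past the last valid byte */
} ByteReader;

/* Portable version: assemble the value with a shift and an OR. */
static inline uint16_t read_be16_shift(ByteReader *r)
{
    uint16_t val;
    assert(r->end - r->pos >= 2);
    val = (uint16_t)((r->pos[0] << 8) | r->pos[1]);
    r->pos += 2;
    return val;
}

/* Big-endian hosts only: the bytes are already in native order, so they
   can be copied as-is. memcpy sidesteps the alignment and aliasing
   problems of casting the byte pointer to uint16_t *. */
static inline uint16_t read_be16_native(ByteReader *r)
{
    uint16_t val;
    assert(r->end - r->pos >= 2);
    memcpy(&val, r->pos, sizeof val);
    r->pos += 2;
    return val;
}
```

Because both functions are static inline, they can live in a header and be inlined into every caller. Compilers typically reduce a two-byte memcpy like this to a single load, so the native version keeps the "no bit shifting" property without the pointer cast.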
Seven Little, Eight Little, Nine Little Endians...
So Vitor's whole argument about "FFmpeg is full of very speed critical code" is bogus. If they were worried about speed, then they would optimize their endian conversion system. With any kind of video format, endian conversion happens everywhere.
Of course, my code suggestion was to not even hard-code the endian-ness. Instead, determine the system's endian, then choose the right conversion method. The determination is only made once, when the program first starts (so there is no ongoing performance loss), and it is very fast:
#include <stdint.h>
#include <string.h>

#ifndef BIG_ENDIAN
#define BIG_ENDIAN 4321 // values are compatible with GCC's definitions
#endif
#ifndef LITTLE_ENDIAN
#define LITTLE_ENDIAN 1234
#endif
int EndianHost = LITTLE_ENDIAN; // global for tracking the host system
/*****
 EndianSet(): Compute the endian for the host system.
 This must be called before running any of the other endian functions.
 *****/
void EndianSet (void)
{
    uint8_t Test[2] = { 1,0 };
    uint16_t Num;
    memcpy(&Num, Test, sizeof(Num)); // copy instead of casting the byte pointer
    if (Num == 1) EndianHost=LITTLE_ENDIAN; // Num is 0x0001
    else EndianHost=BIG_ENDIAN; // Num is 0x0100
} /* EndianSet() */
Now, each of the get-functions can test the endian. We can even pass big/little as a flag:
get_e16( (ByteIOContext *)s, (int) endian) \
{ \
if (endian == EndianHost) \
{ \
/* no conversion needed */ \
return( ((uint16_t *)(s))[0] ); \
} \
/* else swap the order */ \
uint8_t val[2]; \
val[0] = s[1]; val[1] = s[0]; \
return( ((uint16_t *)(val))[0] ); \
}
Granted, this code does introduce an "if" condition. However, if this is defined as a macro in a header file rather than a C function (notice the backslashes at the end of each line), then you drop two push calls (parameters for the function), a function call, and associated stack pops. The one additional "if" is much faster.
In the worst case, there is an "if", two assignments, and a return (which gets optimized to a single assignment by the compiler) -- much faster than the original two gets, two assignments, a bit shift, a logical OR, and a real return that cannot be optimized. All in all, this is a very good performance improvement. And since the endian code is called all over the place -- including in critical loops -- this should be a significant performance increase.
The original FFmpeg code actually does something a little more complicated for getting a single byte (the get_byte() function). However, even this can be optimized. The FFmpeg source code actually includes the comment "XXX: put an inline version".
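For readers who want to try the run-time approach, here is a minimal sketch under a few assumptions. It uses a static inline function rather than a macro (a function defined in a header gives the compiler the same chance to inline), and it takes a plain byte pointer instead of a ByteIOContext. The names ORDER_LITTLE, ORDER_BIG, host_order(), and read_e16() are hypothetical and do not exist in FFmpeg.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants; the values mirror the 1234/4321 convention. */
enum { ORDER_LITTLE = 1234, ORDER_BIG = 4321 };

/* Detect the host byte order by checking which byte of the 16-bit
   value 1 is stored first in memory. */
static inline int host_order(void)
{
    const uint16_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1 ? ORDER_LITTLE : ORDER_BIG;
}

/* Read a 16-bit value stored at p in the byte order given by `order`.
   `host` is the result of host_order(), computed once by the caller. */
static inline uint16_t read_e16(const uint8_t *p, int order, int host)
{
    uint8_t tmp[2];
    uint16_t val;
    if (order == host) {
        memcpy(&val, p, sizeof val);   /* no conversion needed */
    } else {
        tmp[0] = p[1];                 /* swap the order */
        tmp[1] = p[0];
        memcpy(&val, tmp, sizeof val);
    }
    return val;
}
```

A caller would compute host_order() once at startup and pass the result to every read. When the order argument is a compile-time constant at an inlined call site, the compiler has the opportunity to simplify the comparison, though whether it does depends on the compiler and flags.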
... Ten Little Endian Boys
FFmpeg is one of the best libraries for playing video formats. Regardless of the format (AVI, WMV, MP3, etc.), this library supports it. While they have taken some good steps to make their code fast, their code is not as efficient as it could be. Moreover, they are stereotypical of the open source community: while a few people are helpful and friendly, a very vocal group are rude and hostile. This does not make me want to contribute.
PS. For people who don't know the children's song, "Ten Little Indians", you can find the lyrics at Wikipedia.


I am now curious to hear a response from the FFmpeg guys to see why those functions aren't inlined already.
Remember when the net was a friendly place where people in newsgroups actually tried to help you answer a question or learn something or fix a problem? I do, it was nice. I wonder what ever happened to that network?
Oh, and thanks for putting that song back in my head!