Tuesday, August 30, 2005

From 12 to 24 layers and other gimmicks in Flash Player 8

Here are a few quickies from Flash Player 8. These are mostly bugs I fixed and remember being important, but there were hundreds more overall:

  • Flash Player 8 now supports up to 32 sound channels (Flash 7 or older files are still limited to 8 though so things stay compatible).

  • We upped the maximum number of allowed overlapping layers using alpha transparency from a total of 12 to 24. This will probably mostly be welcome if you use blend modes.

  • Doing alpha only color transformations on bitmaps is now about twice as fast as in Flash Player 7.

  • Rendering bitmaps is faster and/or consumes less CPU on a Mac when using Safari compared to an equivalent clocked PC. This is mostly due to the OpenGL support but also the MUCH improved bitmap drawing routines which are specifically designed to run well on PowerPC.

  • Flash Player 8 will now clip overlarge vectors. In Flash Player 7 or lower scaling objects too large would cause corrupted display.

  • Flash Player 8 can now render bitmaps much, much larger than the stage. Try loading a PNG which is 8000 x 256 pixels large: it will still render, unlike in Flash Player 7.

  • Text metrics in ActionScript are now returned as numbers instead of integers making them more precise.

  • Text leading and indent can now be set to negative numbers in the TextFormat API.

  • Various artifacts when overlaying vectors on top of video have been eliminated.

  • Alpha values in the Drawing API are now converted to numbers internally instead of being cast to integers. The result is that you'll get the whole range available internally which is 256 different alpha values. Yes I know we should have used the 0..255 scale in the first place in ActionScript, but we needed to stay compatible with the old APIs.

  • Gradients can now have up to 16 colors instead of just 8 when using ActionScript.

  • Setting NetStream.onStatus in the Mac will not freeze or crash the browser anymore when reaching the end of playback of a progressive FLV.

  • The NetStream object now sends "NetStream.Seek.Notify" messages even for progressive FLVs. This message tells you when a seek has successfully completed.

  • The NetStream object now sends "NetStream.Seek.InvalidTime" status messages when you try to seek to an invalid time for progressive FLVs. Invalid time means that either the data is not downloaded yet or your seek time is past the end of the file.

  • The NetStream object now sends "NetStream.Buffer.Flush" status messages when the buffer is being emptied, f.ex. when you reach the end of a file. This is especially helpful if videos are shorter than the buffer time you selected. In this case the "NetStream.Buffer.Full" message was never sent, making it difficult to build UIs around it.

  • On Mac you can now load a local progressive FLV from something else than the boot drive.
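The Drawing API item in the list above (alpha converted to a number instead of being cast to an integer) can be illustrated with a small sketch. It assumes the ActionScript alpha range of 0..100 and an 8-bit internal alpha. The helper names are made up for this example and are not Flash Player functions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map an alpha given as a number in 0..100
   (fractions allowed) to an 8-bit alpha, rounding to nearest. */
static uint8_t alpha_from_number(double percent) {
    if (percent < 0.0)   percent = 0.0;
    if (percent > 100.0) percent = 100.0;
    return (uint8_t)(percent * 255.0 / 100.0 + 0.5);
}

/* The older behaviour as the post describes it: the value is
   truncated to an integer before it is converted. */
static uint8_t alpha_from_integer(double percent) {
    return alpha_from_number((double)(int)percent);
}

/* Count how many distinct 8-bit alphas are reachable when the input
   is stepped from 0 to 100 in increments of `step`. */
static int count_distinct(uint8_t (*convert)(double), double step) {
    unsigned char seen[256] = {0};
    int count = 0;
    assert(step > 0.0);
    for (double p = 0.0; p <= 100.0; p += step) {
        uint8_t a = convert(p);
        if (!seen[a]) { seen[a] = 1; count++; }
    }
    return count;
}
```

With a step of 0.125 the integer path can only ever reach 101 of the 256 possible alpha values, while the number path reaches all of them.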

I am sure you'll find more as time goes on. But, as is usual when writing software, we needed to stay realistic. We were not able to fix every issue which was reported and had to select the ones which we thought had the greatest impact for users.

Enjoy Flash Player 8!

Monday, August 29, 2005

Porting the Flash Player to 'alternative' platforms

Update (10/31/06): I still see tons of people reading this old entry. Please read this for up to date information. A beta for Linux is now available.

Now that Flash Player 8 is soon to be released for Windows and OS X, I am sure we'll get plenty of questions about support for Unix, probably mostly for Linux. No, we haven't forgotten that we need to support it. It WILL happen.

But to tell you the truth I am not even happy with the state of Flash Player 7 on Linux. So much so that I use the Windows version along with Wine to get Flash content to display on my Linux boxen. So rather than just do a quick and dirty port for Flash Player 8 I would really like to get it done right. The kicker is this: It's damn hard. Much harder than even supporting OS X. While porting command line applications to Unix is beyond trivial, porting media applications (which the Flash Player essentially is) is a real nightmare. This starts with sound support where we have to support many different sound standards (ALSA, OSS, aRts, ESD etc.), framework support (X11, Qt, GTK for copy&paste support f.ex.), IMEs (is there such a thing?), font support which is almost beyond comprehension and many other quirks and forked 'standards'. And how about support for PowerPC and x86-64? All of this needs to be done to get something acceptable on Linux IMO.

Another problem is that Linux's main compiler is gcc, which means that ALL of our MMX code which is written in Intel notation will not compile. MMX is what makes the Flash Player reasonably fast. Having to use plain C code automatically means that your performance will be cut by at least 50% rendering wise. I am really not kidding here. So why is it so difficult to port performance optimizations? Let me show an example which shows the effort which is required to support a plethora of platforms.

WARNING! I will go into some low level assembly code here, so you might want to skip this ;-)

Disclaimer: The code here in this post was totally made up in a few minutes and is NOT in the Flash Player. I am not even sure any of this code does the right thing.

So, assume a simple generic compositing function which is used if the target platform is unknown:

typedef struct argb {
    unsigned char alpha;
    unsigned char red;
    unsigned char green;
    unsigned char blue;
} argb;

#ifdef GENERIC
// Composites n source pixels over the destination.
// Assumes premultiplied alpha so every sum stays within 8 bits.
void compositeARGB(argb *src, argb *dst, int n) {
    for ( int i=0; i < n; i++ ) {
        int a = 256 - src->alpha;
        dst->alpha= ( (dst->alpha* a ) >> 8 ) + src->alpha;
        dst->red = ( (dst->red * a ) >> 8 ) + src->red ;
        dst->green= ( (dst->green* a ) >> 8 ) + src->green;
        dst->blue = ( (dst->blue * a ) >> 8 ) + src->blue ;
        src++;
        dst++;
    }
}
#endif // GENERIC

Now for 32bit machines we would modify this function the following way to get a drastic performance increase (50-60% faster, on PowerPC even more). Several things to notice:

  • We use array access instead of doing a dst++ and src++ since some compilers generate better code.

  • We do not access components on a byte per byte basis since that is extremely slow.

#ifdef TARGET_32BIT
void compositeARGB(argb *src, argb *dst, int n) {
    for ( int i=0; i < n; i++ ) {
        int a = 256 - src[i].alpha;

        // Split the destination into two words holding two components each.
        unsigned int dr = *((unsigned int *)&dst[i]);
        unsigned int dl = ( dr >> 8 ) & 0x00FF00FF;
        dr &= 0x00FF00FF;

        unsigned int sr = *((unsigned int *)&src[i]);
        unsigned int sl = sr & 0xFF00FF00;
        sr &= 0x00FF00FF;

        // Scale two components per multiply.
        dr = ( ( dr * a ) >> 8 ) & 0x00FF00FF;
        dl = ( dl * a ) & 0xFF00FF00;

        dr += sr;
        dl += sl;

        *((unsigned int *)&dst[i]) = dr | dl;
    }
}
#endif // TARGET_32BIT
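To see that processing two components per multiply gives the same result as the byte-by-byte version, here is a small self-contained sketch. It works on plain 32-bit words with alpha in the lowest byte (an assumption made only for this example) and expects premultiplied source pixels, so that no component can overflow into its neighbour. The function names are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 8-bit components into one word, alpha in the lowest byte. */
static uint32_t pack_argb(uint32_t a, uint32_t r, uint32_t g, uint32_t b) {
    assert(a <= 255 && r <= 255 && g <= 255 && b <= 255);
    return a | (r << 8) | (g << 16) | (b << 24);
}

/* Reference: composite one pixel component by component. */
static uint32_t composite_bytes(uint32_t src, uint32_t dst) {
    uint32_t a = 256 - (src & 0xFF);
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t d = (dst >> shift) & 0xFF;
        uint32_t s = (src >> shift) & 0xFF;
        out |= ((((d * a) >> 8) + s) & 0xFF) << shift;
    }
    return out;
}

/* Same computation, two components per multiply. */
static uint32_t composite_packed(uint32_t src, uint32_t dst) {
    uint32_t a  = 256 - (src & 0xFF);
    uint32_t dl = (dst >> 8) & 0x00FF00FF;   /* components 1 and 3 */
    uint32_t dr = dst & 0x00FF00FF;          /* components 0 and 2 */

    dr = ((dr * a) >> 8) & 0x00FF00FF;
    dl = (dl * a) & 0xFF00FF00;

    dr += src & 0x00FF00FF;
    dl += src & 0xFF00FF00;
    return dr | dl;
}
```

A fully opaque source simply replaces the destination and a fully transparent one leaves it untouched, which makes for two easy sanity checks.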

Let's continue with an MMX version specifically made to compile using Microsoft Visual Studio. Please note that we can't use intrinsics on x86 since they generate extremely slow code (also see the 3/17/2004 entry here which explains the problems). This version here is about 4 times faster than the original generic C code:

void compositeARGB(argb *src, argb *dst, int n) {

    const __int64 a256 = 0x0100010001000100;

    _asm {
        mov edi, dst
        mov esi, src
        mov ecx, n
        pxor mm7, mm7
        movq mm6, a256

    next_pixel:
        // Load one source and one destination pixel, widened to 16 bits.
        movd mm0, [esi]
        punpcklbw mm0, mm7

        movd mm1, [edi]
        punpcklbw mm1, mm7

        // mm3 = 256 - source alpha, in all four words.
        movq mm2, mm0
        pshufw mm2, mm2, 0
        movq mm3, mm6
        psubusw mm3, mm2

        pmullw mm1, mm3
        psrlw mm1, 8
        paddw mm1, mm0
        packuswb mm1, mm7
        movd [edi], mm1

        add esi, 4
        add edi, 4
        sub ecx, 1
        jnz next_pixel

        emms
    }
}
But this won't compile under Linux, which means we would have to revert to the second version, which is much slower. Under Linux we need to rewrite this using AT&T notation. So here we go:

#ifdef TARGET_MMX_GCC
void compositeARGB(unsigned int *src, unsigned int *dst, int n) {
    unsigned int a256[2];
    a256[0] = 0x01000100;
    a256[1] = 0x01000100;
    asm volatile ("pxor %%mm7,%%mm7\n\t"
                  "movq (%3),%%mm6\n\t"
                  "1:\n\t"
                  "movd (%0),%%mm0\n\t"
                  "punpcklbw %%mm7,%%mm0\n\t"

                  "movd (%1),%%mm1\n\t"
                  "punpcklbw %%mm7,%%mm1\n\t"

                  "movq %%mm0,%%mm2\n\t"
                  "pshufw $0,%%mm2,%%mm2\n\t"
                  "movq %%mm6,%%mm3\n\t"
                  "psubusw %%mm2,%%mm3\n\t"

                  "pmullw %%mm3,%%mm1\n\t"
                  "psrlw $8,%%mm1\n\t"
                  "paddw %%mm0,%%mm1\n\t"
                  "packuswb %%mm7,%%mm1\n\t"
                  "movd %%mm1,(%1)\n\t"

                  "addl $4,%0\n\t"
                  "addl $4,%1\n\t"
                  "subl $1,%2\n\t"
                  "jnz 1b\n\t"
                  "emms\n\t"
                  : "+r" (src), "+r" (dst), "+r" (n)
                  : "r" (a256)
                  : "memory", "cc");
}
#endif // TARGET_MMX_GCC

What? All that work just to compile under GCC? This means we need to port thousands of lines of code to support AT&T notation (Yes, I know binutils 2.1 now has Intel notation support, although some changes are still required. In theory we could use the Intel compiler also, but if we want to ship this code as an SDK it will be a problem).

Next, let's look at what is needed to support PowerPC and AltiVec to get decent performance on that platform. Here is the AltiVec version of the compositing routine from above:

void compositeARGB(argb *src, argb *dst, int n) {
    // Scalar loop until dst is 16-byte aligned.
    while (((long)dst & 0xF) && (n > 0)) {
        int a = 256 - src->alpha;
        dst->alpha= ( (dst->alpha* a ) >> 8 ) + src->alpha;
        dst->red = ( (dst->red * a ) >> 8 ) + src->red ;
        dst->green= ( (dst->green* a ) >> 8 ) + src->green;
        dst->blue = ( (dst->blue * a ) >> 8 ) + src->blue ;
        src++;
        dst++;
        n--;
    }

    vector unsigned char s0;
    vector unsigned char s1 = vec_ld (0, (unsigned char *) src);
    vector unsigned char s2;

    const vector unsigned char aPr01 = (vector unsigned char)
        (31, 0, 31, 0, 31, 0, 31, 0, 31, 4, 31, 4, 31, 4, 31, 4);
    const vector unsigned char aPr23 = (vector unsigned char)
        (31, 8, 31, 8, 31, 8, 31, 8, 31, 12, 31, 12, 31, 12, 31, 12);
    const vector unsigned short v256 = (vector unsigned short) (256);
    const vector unsigned char dPr = (vector unsigned char)
        (0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30);
    const vector unsigned char sPr = vec_lvsl (0, (unsigned char *) src);

    // Four pixels per iteration.
    while ( n >= 4 ) {

        // The source may be unaligned: load two vectors and permute.
        s2 = vec_ld (16, (unsigned char *) src);
        s0 = vec_perm (s1, s2, sPr);
        s1 = s2;

        vector unsigned char d0 = vec_ld (0, (unsigned char *) dst);

        // Replicate each pixel's alpha into 16-bit lanes.
        vector unsigned short a01 = (vector unsigned short)
            vec_perm (s0, (vector unsigned char) (0), aPr01);
        vector unsigned short a23 = (vector unsigned short)
            vec_perm (s0, (vector unsigned char) (0), aPr23);

        // Source components end up in the high byte of each lane (s * 256).
        vector unsigned short s01 = (vector unsigned short)
            vec_mergeh (s0, (vector unsigned char) (0));
        vector unsigned short s23 = (vector unsigned short)
            vec_mergel (s0, (vector unsigned char) (0));

        // Destination components are zero-extended to 16 bits.
        vector unsigned short d01 = (vector unsigned short)
            vec_mergeh ((vector unsigned char) (0), d0);
        vector unsigned short d23 = (vector unsigned short)
            vec_mergel ((vector unsigned char) (0), d0);

        a01 = vec_sub (v256, a01);
        a23 = vec_sub (v256, a23);

        // d * a + s * 256; the high byte of each lane is the result.
        d01 = vec_mladd (d01, a01, s01);
        d23 = vec_mladd (d23, a23, s23);

        vec_st ((vector unsigned char)
            vec_perm (d01, d23, dPr), 0, (unsigned char *) dst);

        dst += 4;
        src += 4;
        n -= 4;
    }

    // Scalar loop for the remaining pixels.
    while (n--) {
        int a = 256 - src->alpha;
        dst->alpha= ( (dst->alpha* a ) >> 8 ) + src->alpha;
        dst->red = ( (dst->red * a ) >> 8 ) + src->red ;
        dst->green= ( (dst->green* a ) >> 8 ) + src->green;
        dst->blue = ( (dst->blue * a ) >> 8 ) + src->blue ;
        src++;
        dst++;
    }
}

If you can understand any of this, kudos. Writing AltiVec code is even worse than writing MMX code IMO. They had the best intentions when they designed AltiVec, but code bloat and maintenance are simply a nightmare.

But we are not done yet!!! Let's say we want to port Flash Player 8 to Microsoft Windows XP 64bit edition (which is also on the TODO list for the Flash Player). This version of Windows does not support MMX, only SSE1/SSE2/SSE3 and above. That means we need to write another version specifically for this OS. Also, Microsoft Visual Studio does not support inline assembly here, which means we need to use intrinsics despite the fact that they generate slow code:

#ifdef TARGET_SSE2
#include <emmintrin.h>

void compositeARGB(argb *src, argb *dst, int n) {

    const __m128i zero = _mm_setzero_si128();
    const __m128i a256 = _mm_set1_epi16(0x0100);

    // Four pixels per iteration using full 128-bit loads and stores.
    for ( ; n > 3; n -= 4, src += 4, dst += 4 ) {
        __m128i s = *((__m128i *)src);
        __m128i d = *((__m128i *)dst);

        // Pixels 0 and 1, widened to 16 bits per component.
        __m128i s01 = _mm_unpacklo_epi8(s,zero);
        __m128i d01 = _mm_unpacklo_epi8(d,zero);
        __m128i a01 = _mm_shufflehi_epi16(_mm_shufflelo_epi16(s01,0),0);
        d01 = _mm_mullo_epi16(d01,_mm_subs_epu16(a256,a01));
        d01 = _mm_add_epi16(_mm_srli_epi16(d01,8),s01);

        // Pixels 2 and 3.
        __m128i s23 = _mm_unpackhi_epi8(s,zero);
        __m128i d23 = _mm_unpackhi_epi8(d,zero);
        __m128i a23 = _mm_shufflehi_epi16(_mm_shufflelo_epi16(s23,0),0);
        d23 = _mm_mullo_epi16(d23,_mm_subs_epu16(a256,a23));
        d23 = _mm_add_epi16(_mm_srli_epi16(d23,8),s23);

        *((__m128i *)dst) = _mm_packus_epi16(d01,d23);
    }

    // Scalar loop for the remaining pixels.
    for ( ; n > 0 ; n--, src++, dst++ ) {
        int a = 256 - src->alpha;
        dst->alpha= ( (dst->alpha* a ) >> 8 ) + src->alpha;
        dst->red = ( (dst->red * a ) >> 8 ) + src->red ;
        dst->green= ( (dst->green* a ) >> 8 ) + src->green;
        dst->blue = ( (dst->blue * a ) >> 8 ) + src->blue ;
    }
}
#endif // TARGET_SSE2

(This here is actually not quite complete and will crash hard since the source and destination data pointer need to be aligned for this to work)

Now imagine that the Flash Player has not only this one routine which needs to be optimized, but in fact dozens to hundreds of these and you see why getting a decent port is so complex.

If you bothered to read to this point it means that Macromedia wants you really bad. Really, really bad. :-) We've been looking for Linux gurus for a while (well, it has been for more than a year now without anyone being even close to what we need, I guess they all go to Google...) and for best results they would have to be in house. Collaboration with the team here on a daily basis as a long term full time employee will be key to get the best results. So apply right now for this job. Not only will you be able to do exciting work on Linux, you'll even be paid for it!

Friday, August 26, 2005

FireFox SVG and Flash

I was just glancing over the source code of the upcoming FireFox release which will support SVG and thought I'd express an opinion here which is, granted, a little bit over the top and begging for flames, but nevertheless could be interesting:

Much of the Slashdot crowd does not like Flash; they have some valid but also some highly emotional reasons. I am always amused that anytime there is a post about Flash on Slashdot, SVG is mentioned in the same breath, explaining that it will kill Flash and that Flash is simply evil. (And I do not even want to start on their applauding of AJAX, which has essentially the same drawbacks as Flash when it comes to accessibility. What a bunch of hypocrites.)

The bad reputation these days comes mostly because Flash animations are used for advertisement pretty much everywhere (I hope the 'Skip Intro' days are somewhat over). Well, here is some news for you: If SVG is ever to become dominant in any way, advertisers will use it the same way as Flash right now. Even worse, since SVG is supposed to be tightly integrated into the DOM of the browser it'll be in theory possible to design HTML pages that will simply not display correctly if SVG or JavaScript is turned off. Advertisers don't care about technology itself, they simply pick the thing with the highest reach and will want to make sure that you can not get around not seeing their 'messages'.

I also expect that once FireFox has SVG support, advertisers will use this instead of Flash if the Flash plugin can not be detected. Ironically Flash might actually help you here. FlashBlock tells the browser that the plugin is there, but displays its own static image instead. So installing Flash and FlashBlock will probably allow you to get around this. The last thing I personally want to see is the Flash authoring tool allowing you to export SVG animations; it would be abused for certain. ;-) There are simply still no professional SVG authoring tools out there which support animation in a decent manner. But all of that is really a personal opinion.

That brings me to another fact which is still not understood by the Slashdot crowd: Flash is not simply about vector graphics and media capabilities. What's much more important is the term Macromedia now uses which is the 'Flash Platform'. One of the most important parts of this is the dedication we have to be backwards compatible. Create once, run everywhere and update the platform to gain new capabilities without breaking old content. This is extremely important and one of the reasons Flash is even considered as an alternative for development at all despite all its shortcomings.

Looking at this post really makes this point I think. Every SVG library on this page has different quirks. While Flash is certainly not perfect in that respect, how would you like it if the colors of your content suddenly looked different depending on what machine or browser you are on? Not quite a value proposition IMO. Welcome to the world of open standards without appropriate reference implementations everyone can use. This almost killed SVG.

I think my next blog entry will be about 'alternative' platforms. Although I hate to say 'alternative' since I do not see them that way. A lot of work is waiting for us on this front.

Wednesday, August 24, 2005

Little old school fire effect using Flash Player 8

This mini demo here is using the new BitmapData API. Implementing old school demo effects is suddenly so easy with Flash 8. I feel like I am back in 1992 ;-)

Things have certainly changed in the scene since then, now it's really all about 3D. One of my favorite modern demos is from farbrausch (which along with haujobb is probably the best known group) called candytron.

Anyway, enjoy:

Stroke hinting in Flash Player 8 (A.K.A. better round rects)

I noticed a comment from John Olson asking if we addressed the well known issues with 1pt strokes in the Flash Player. Well I have good news and bad news :-)

First the good news: Yes, we did try to address this in this release and I came up with a solution which should work in most cases. It's called 'Stroke hinting'. This feature does two things:

  • Hint anchor points of curves to full pixels. This means that you will not see disjoints anymore when combining curves and straight lines. This will change the way rounded rectangles look in Flash.

  • Hint the stroke width to full pixels. This means as you scale a stroke inside a symbol it will never blur, snap to full pixels and therefore always look sharp.
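The two hinting steps listed above can be sketched roughly as follows. This is only a guess at the idea, not the Flash Player code: the names are made up, and the rule that a hinted stroke never becomes thinner than one pixel is an assumption of this example.

```c
#include <assert.h>

typedef struct { double x, y; } point;

/* Round to the nearest whole pixel (halves round up), without libm. */
static double round_to_pixel(double v) {
    double shifted = v + 0.5;
    long i = (long)shifted;              /* truncates toward zero */
    if ((double)i > shifted) i -= 1;     /* turn truncation into floor */
    return (double)i;
}

/* Step 1: snap an anchor point to a full pixel position. */
static point hint_anchor(point p) {
    point q = { round_to_pixel(p.x), round_to_pixel(p.y) };
    return q;
}

/* Step 2: snap the scaled stroke width to whole pixels.
   Clamping to a minimum of 1 pixel is an assumption of this sketch. */
static double hint_stroke_width(double width, double scale) {
    assert(width >= 0.0 && scale >= 0.0);
    double w = round_to_pixel(width * scale);
    return w < 1.0 ? 1.0 : w;
}
```

Snapping like this is also where the 'jitter' during animation comes from: a stroke scaled from 1.4 to 1.6 pixels jumps from 1 to 2 pixels wide in a single frame.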

Here is an example (blown up to 200%) which shows the difference between using stroke hinting and not using it:

Since we need to stay compatible with older Flash versions we decided to offer this new method of drawing strokes as an option which is turned off by default:

BTW, ALL of the new stroke options are also accessible through ActionScript, so you devs out there won't have to use the UI if you don't want to. :-)

The bad news about this new feature is that this approach can lead to rendering artifacts in itself. It's also not really suitable for animation as strokes might 'jitter' due to the hinting which is done. Another problem is that a small radius in rounded rectangles might not always look perfect. The underlying problem here is that Flash uses quadratic curves. The only way to really fix this would be to add real arc support to the Flash Player which was simply too much in this release, not to speak of the technical difficulties we would have encountered trying to add this.

Anyway, I still think it was worth adding the stroke hinting feature. It should address 90% of the problem cases.

Tuesday, August 23, 2005

More on line strokes in Flash Player 8

Scott Fegette has a great overview of the enhancements we made to line stroke support in this version. As usual I have a little more insight I can give since I originally suggested and implemented these changes in Flash Player 8 (God, what haven't I done in this release? ;-)

I remember vividly the constant whining of customers about the fact that strokes are so limited in the Flash Player and that in many cases you had to revert to using fills to get the effect you wanted. This even affected the UI components: all of the outlines are drawn using fills instead of strokes. So with this baggage on my back we finally started to bring strokes up to speed in Flash Player 8. It wasn't easy, let me tell you...

First I can talk a little about how much time we spent on details. If you have read Scott's introduction you probably know what miter joints are by now. One of the more obscure but very important settings is the miter limit. The SVG specification defines this property as such:

When two line segments meet at a sharp angle and miter joins have been specified for 'stroke-linejoin', it is possible for the miter to extend far beyond the thickness of the line stroking the path. The 'stroke-miterlimit' imposes a limit on the ratio of the miter length to the 'stroke-width'. When the limit is exceeded, the join is converted from a miter to a bevel.

So it essentially prevents a miter from covering your whole stage in extreme cases. And this is exactly how I implemented it originally. Until an old friend of mine, with whom I had worked on another vector drawing tool quite a few years back (it's still sold btw), complained about this part: When the limit is exceeded, the join is converted from a miter to a bevel. He had a point. When a miter is animated, f.ex. in a skeleton object, it means that the miter will suddenly 'snap' to a bevel joint, which looks quite ugly. Obviously I did not know that the better solution was a pain to implement. But I did it. So now if you reach the miter limit the miter will simply be 'cut off' instead of reverting to a bevel joint. Once you start doing really fine animation you'll hopefully appreciate this little detail.
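To make the miter limit test concrete: the SVG specification gives the ratio of miter length to stroke width as 1 / sin(theta / 2), where theta is the angle between the two segments. The sketch below uses the equivalent squared form 2 / (1 - cos(theta)) so it needs no trigonometry. The names and the enum are made up for this example, and it only decides which kind of join to draw, not how the cut-off miter is actually constructed:

```c
#include <assert.h>

typedef enum { JOIN_MITER, JOIN_BEVEL, JOIN_CUT_OFF_MITER } join_kind;

/* (miter length / stroke width)^2 = 1 / sin^2(theta / 2)
                                   = 2 / (1 - cos(theta))
   cos_theta is the cosine of the angle between the two segments. */
static double miter_ratio_squared(double cos_theta) {
    assert(cos_theta >= -1.0 && cos_theta < 1.0);
    return 2.0 / (1.0 - cos_theta);
}

/* svg_behaviour != 0: fall back to a bevel as the SVG text says.
   svg_behaviour == 0: keep the miter but cut it off at the limit. */
static join_kind choose_join(double cos_theta, double miter_limit,
                             int svg_behaviour) {
    if (miter_ratio_squared(cos_theta) <= miter_limit * miter_limit)
        return JOIN_MITER;
    return svg_behaviour ? JOIN_BEVEL : JOIN_CUT_OFF_MITER;
}
```

A 60 degree corner has a ratio of exactly 2, so with a miter limit of 1.5 it is over the limit: the SVG rule turns it into a bevel, while the cut-off variant keeps a shortened miter and so avoids the sudden 'snap' during animation.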

This brings me to a bug we fixed in Flash Player 8. Flash 8 will now support stroke sizes larger than 10 in the UI. No need anymore to scale up objects to get large strokes. While this is a great change it suddenly exposed an ugly rendering artifact. My first reaction to that artifact was: "Well, this is how Flash works!" During these times I really appreciate that Macromedia does not allow us to bring various instruments of pain into the office (clubs, baseball bats etc.) because from the look of our QA people I really got the message that they would have liked to use them on me in this moment. :-) So what is the problem? Here is a picture:

See the ugliness around the left side of the stroke? This made large strokes essentially useless in Flash according to our QA people. Alright, the fix was actually fairly simple once you know it: We were using an approximation for computing square roots in the Flash Player to obtain some of the coordinates. Switching to floating point and the standard C library square root fixed this. Phew, I am happy I was able to avoid the wrath of QA on this :-)

Now to something totally different and hopefully funny: Like most serious companies we use source control to develop the Flash Player. One of the things you always need to add is a check-in note with detailed descriptions of the changes to the source code. We are really strict about this. At the beginning of this release we were sending a copy of our check-in notes to each other through email using subjects like this: "check-in for bug XXXXX", since we had no automatic system. It's really important to keep other developers in the loop. Being the geeks we are, the word 'check-in' in the email subject quickly morphed to 'checkin', 'chickin' and eventually simply 'chicken'. Something like "Important filter changes chicken, please read!" f.ex. Until one day I made changes to some of the stroke code and my email subject just proudly said: "Stroke chicken". Laughter ensued. Ah, the small things...

Sunday, August 21, 2005

PNG support in Flash Player 8

Yes, you have probably heard about it and maybe already tried dynamically loading PNG images in Flash Player 8 using MovieClip.loadMovie(). But did you know why this was not implemented in previous releases? Sounds like an obvious feature request doesn't it? Two words: Code size. libpng, which is the most frequently used developer library for handling png files, is about 150KB of code. No one was able to find a good reason to give up as much code size for this feature alone.

At the beginning of the development of Flash Player 8 we heard the constant mantra from users: "We want PNG support!". With our answer usually being: "It's too big, won't happen!" The situation became so bad that we had a mailing list thread going on this subject which quickly became the most passionate I had seen. There was no way around it this time; we had to find a way to include it.

After looking at the W3C PNG spec I realized that the key was to ditch libpng and rewrite a PNG loader from scratch. It was much easier than I thought and due to the excellent PNG test suite the QA impact was minimal. Since we already had zlib in the Flash Player I was able to reduce the code size of the PNG loader to less than 4KB including all of the features the standard requires and a couple more. All of that in less than a week of work. Neat, isn't it? It's always great to see that small efforts can pay back big time for users.

And please, keep on using the Feature Request page on the Macromedia web site. It really works and we look at all the requests. Without it the PNG support would not have happened. The feedback from this page, unlike what some of you might think, does not go into the trash or /dev/null but to an email list developers and QA people subscribe to (I am not sure how we can make the Flash Player produce blue pills though, so we run some heavy duty spam filters on it ;-)

Update: If you are familiar with the PNG file format here are the chunk types we currently fully support:

IHDR Image header
IDAT Image data
PLTE Palette information
tRNS Transparency extension
gAMA Image gamma
IEND End of stream

And here are the chunk types we do NOT support (According to the PNG 1.2 specification these are truly optional and not required for displaying PNG files):

cHRM Primary chromaticities
sRGB Standard RGB color space
iCCP Embedded ICC profile
tEXt Textual data
zTXt Compressed textual data
iTXt International textual data
bKGD Background color
pHYs Physical pixel dimensions
sBIT Significant bits
sPLT Suggested palette
hIST Palette histogram
tIME Image last-modification time
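For readers who have not looked at the file format: a PNG file is an 8-byte signature followed by chunks, each made of a 4-byte big-endian length, a 4-byte type, the data and a 4-byte CRC. A loader that handles only the chunk types listed above can skip everything else, as long as the chunk is ancillary (lowercase first letter). The sketch below walks a PNG held in memory along those lines. It is an illustration with made-up function names, not the Flash Player loader, and it does not verify CRCs or decode anything:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t PNG_SIGNATURE[8] =
    { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };

static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 for the chunk types the post lists as supported. */
static int chunk_is_handled(const uint8_t *type) {
    static const char *handled[] =
        { "IHDR", "IDAT", "PLTE", "tRNS", "gAMA", "IEND" };
    for (size_t i = 0; i < sizeof handled / sizeof handled[0]; i++)
        if (memcmp(type, handled[i], 4) == 0) return 1;
    return 0;
}

/* Walks all chunks up to IEND. Returns the number of handled chunks,
   or -1 if the data is malformed, truncated, or contains an unknown
   critical chunk. */
static int count_handled_chunks(const uint8_t *buf, size_t len) {
    size_t pos = 8;
    int handled = 0;
    assert(buf != NULL);
    if (len < 8 || memcmp(buf, PNG_SIGNATURE, 8) != 0) return -1;
    while (len - pos >= 12) {
        uint32_t size = read_be32(buf + pos);
        const uint8_t *type = buf + pos + 4;
        if (size > len - pos - 12) return -1;        /* truncated chunk */
        if (chunk_is_handled(type)) handled++;
        else if (!(type[0] & 0x20)) return -1;       /* unknown critical chunk */
        if (memcmp(type, "IEND", 4) == 0) return handled;
        pos += 12 + (size_t)size;   /* length + type + data + CRC */
    }
    return -1;                      /* ran out of data before IEND */
}
```

A tEXt chunk in the middle of a file is simply stepped over, which is all "not supported" has to mean for the optional chunk types.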

Thursday, August 18, 2005

Performance traps in Flash Player 8

No question, everyone is excited about this release of the Flash Player. The new features definitely offer something which has never been seen before on the web. But, as with any new technology, it can be overused. There are reasons why the Flash Player did not offer these advanced features in earlier releases. One of them was the lack of client machines out there being able to handle the CPU load some of these new features will require. I am scheduled to write two articles about this for DevNet to point out the limitations. You will almost certainly run into these also. Here is a quick summary:

  • The On2 VP6 codec uses roughly twice the resources of the Sorenson Spark codec. This includes CPU usage and memory usage. Better quality simply means more complexity on the decoding side. At least the On2 VP6 codec still runs on slower machines compared to the QuickTime 7 H.264 codec (which really has outrageous hardware requirements).

  • Alpha channel video is very complex. One of the most common mistakes which will be made is that users will make the assumption that fully transparent areas will not impact on CPU usage. This is not the case, the codec has no clue which areas are transparent, still needs to decode them AND compose them. There is a strong need to clip out as much of the transparent areas as possible. Even our marketing team ran into this when they put together the Studio Experience. Some of the alpha channel videos were something like 640x350 pixels large, although it does not look like it. This is way too large for older Macs to handle. There are some tricks you can use to get around some of these problems, but they are difficult to author.

    Overall alpha channel video is about 4 times more complex than decoding video without alpha channel (That also means that alpha channel video is about 6-8 times slower than Sorenson Spark video overall). Why is that? The Flash Player, in addition to decoding the normal video needs to decode a separate alpha channel and then needs to do compositing against the background. Especially the latter step is extremely expensive, despite the fact that we use MMX and AltiVec for this. If there is no alpha channel we can decode directly into the bitmap buffer which is not possible when an alpha channel is present.

  • The blur filter, drop shadow, bevel and glow filters are mostly limited by memory bandwidth as are many other filters. Like I explained in the previous post there is nothing more we can do performance wise in this release. Keep this in mind and make sure you test your content on the slowest machine you want to have your content working on. I usually use an old black PowerBook G3 to test since these are the slowest machines which are most likely to be still in use. Even better, why not author your Flash files on that machine? Cumbersome, but at least you'll see the problems instantly. Too many times I have seen emails from angry customers authoring their content on their latest and fastest Pentium 4 and then wondering why the performance sucks on an old machine. Gimme a break... Doom 3 won't run on a 90 MHz Pentium I either.

  • There is a new feature called 'Show redraw regions' in the Flash Player which will allow you to see which areas of your movie are refreshed every frame. Use it extensively to keep the areas as small as possible.

Be smart when you employ filters, effects and video. It's good to spice up content with them, but usually a more subtle use of them is better than overloading your animations with bling bling. We really do not need another 'Skip Intro' era do we? :-)

Wednesday, August 17, 2005

Implementing the blur filter in Flash Player 8

Like I said I wanted to explain how the blur filter in Flash Player 8 came to be. It was the first filter we tried to implement after the cache as bitmap functionality started working and probably also the last one which needed tweaks just before we went into code lock down.

My first attempt was to faithfully create an SVG compliant gaussian blur filter. So after studying the spec I had a quick and dirty version running for some testing. The results were not very pleasing. First, it ran very slowly and secondly it looked awful when animated since box blur sizes are rounded to integers. I am pretty sure Flash users would have been upset about that. The SVG spec suggests implementing special versions with different convolution matrices for small sizes, but code size and performance were a concern for me. So I scrapped the whole idea of being SVG compliant and started to work on a more flexible implementation.

The idea of using a box blur (also called a mean filter) is a good one since it can be easily optimized in a way that its performance is not really dependent on the blur size itself, compared to a real gaussian blur. Although running a box blur three times on an image gets you a decent approximation of a gaussian blur, I decided not to call it a gaussian blur and to let users run it just once or twice as well. This allows for greater flexibility when performance is critical since running it just once is much faster and can yield interesting effects. Running it just one time on an image f.ex. creates a fake motion blur effect. I also decided to make the blur filter sub pixel precise, to provide better looks when animating and get rid of the requirement for special versions for small sizes. So you can not only select sizes like 2, 3 and 4, but also 1.5, 2.5, 2.05 etc.
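Why is a box blur largely independent of the blur size? Because the sum under the box can be carried along from pixel to pixel: one sample enters, one leaves. Here is a minimal sketch of that idea for a single row of 8-bit samples. It is a textbook version with made-up names, restricted to odd integer sizes and with edge samples repeated (both assumptions of this example), so it has none of the sub pixel precision described above:

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an index into 0..n-1 so samples at the edges are repeated. */
static int clamp_index(int i, int n) {
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* Blur one row with a box of `size` samples using a running sum.
   The work per output sample is one add, one subtract and one
   divide, no matter how large the box is. */
static void box_blur_row(const uint8_t *src, uint8_t *dst, int n, int size) {
    assert(n > 0 && size > 0 && (size & 1));   /* odd sizes stay centred */
    int radius = size / 2;
    unsigned sum = 0;

    /* Prime the window for output sample 0. */
    for (int k = -radius; k <= radius; k++)
        sum += src[clamp_index(k, n)];

    for (int i = 0; i < n; i++) {
        dst[i] = (uint8_t)((sum + (unsigned)size / 2) / (unsigned)size);
        sum += src[clamp_index(i + radius + 1, n)];   /* sample entering */
        sum -= src[clamp_index(i - radius, n)];       /* sample leaving */
    }
}
```

Running this once over the rows and once over the columns gives a two-dimensional box blur, and repeating the whole thing three times gets close to a gaussian, as described above.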

The first step was to write the box blur filter in plain C code, which was done fairly quickly. The performance was decent I thought and I quickly moved on to other projects. Until the first testers tried to run it on an 800x600 image. Doh, the Flash Player would freeze up for several seconds. I guess it was not fast enough. :-) As one engineer quickly figured out, it was the division in the inner loop of the blur filter which was the problem. He replaced it with a multiply, which I boldly reverted right afterwards since animating the blur now caused flickering. Not good. It was not until I remembered from one of the AMD x86 optimization guides that most divisions can indeed be represented with a multiply and shift. After a long weekend using some scrap C code I came up with the solution. Even better, it was now possible to use MMX and AltiVec, which quickly made their way into the Flash Player.
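The multiply and shift trick works like this: instead of dividing by d, multiply by a precomputed reciprocal ceil(2^k / d) and shift right by k. If k is large enough for the range of inputs, the result is exactly the same as the integer division. The sketch below uses k = 24 and 64-bit intermediates, which is exact for box sums of 8-bit samples with divisors up to 64. Those bounds and the names are choices made for this example, not the values used in the Flash Player:

```c
#include <assert.h>
#include <stdint.h>

#define RECIP_SHIFT 24

/* Precompute ceil(2^24 / divisor). Done once per blur size,
   outside the inner loop. */
static uint32_t reciprocal_for(uint32_t divisor) {
    assert(divisor >= 1 && divisor <= 64);
    return (uint32_t)(((1u << RECIP_SHIFT) + divisor - 1) / divisor);
}

/* Same result as sum / divisor for sums up to 255 * 64,
   using only a multiply and a shift. */
static uint32_t divide_by_reciprocal(uint32_t sum, uint32_t reciprocal) {
    assert(sum <= 255u * 64u);
    return (uint32_t)(((uint64_t)sum * reciprocal) >> RECIP_SHIFT);
}
```

With too few bits in the reciprocal the quotient is occasionally off by one, which could plausibly show up as flicker when a blur is animated. Whether that is what happened with the first multiply-based attempt mentioned above is only a guess.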

Way later in the release cycle I continued to work on further optimizations, the last one was a special version if the blur size is 2^n. So if you select a size of 2, 4, 8 or 16 it will run slightly faster. Why only slightly you might ask? Well, at this point further optimizations have essentially no effect since the blur filter is not constrained by the CPU, but by your memory bus speed. That's also the reason that it will almost always run slower on a Mac than a PC since the memory bus architecture and speeds of Macs generally lag way behind those of PCs. So don't tell me we do not care about the Mac. I simply can't magically increase throughput of data on this architecture and I already use every trick in the book like using the vec_dst() instruction. The only way to get more performance will be to use the GPU (meaning graphics card).

Just for kicks here is the inner loop MMX code for our blur filter if the size is 2^n (the other version contains a few tricks I do not really want to reveal :-). This little piece here is really where 90% of the computing time is spent when you apply a blur:

movd mm0, [esi]
punpcklbw mm0, mm7

movd mm2, [esi + ebx]
punpcklbw mm2, mm7

paddusw mm4, mm2
psubusw mm4, mm0

movq mm6, mm4
psrlw mm6, mm3
packuswb mm6, mm7
movd [edi], mm6

paddusw mm4, mm2
psubusw mm4, mm0

add edi, eax
add esi, 4
sub ecx, 1
jnz loop

As you can see everything runs in registers. Most of the time is spent in the movd instructions, the rest does not even show up. In the next version of the Flash Player we'll probably continue to optimize this further, maybe keeping track of two pixels at a time so we can use movq. There are also some tricks on x86 you can use to improve prefetching data into the cache.
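For readers who do not speak MMX, here is a rough scalar reading of what such a loop does, written in C. It is my interpretation of the structure only: a running sum per channel, a shift instead of a divide because the size is 2^n, and one load each for the entering and the leaving pixel. The function name and buffer layout are made up, and it does not mirror the listing instruction by instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar counterpart: horizontal box blur over 4-channel pixels
   with a window of (1 << shift) pixels. src must hold count + (1 << shift)
   pixels, dst must hold count pixels. */
void blur_pow2(const uint8_t *src, uint8_t *dst, size_t count, unsigned shift)
{
    assert(shift <= 8);   /* keeps every channel sum within 16 bits */
    size_t size = (size_t)1 << shift;
    uint16_t sum[4] = {0, 0, 0, 0};

    /* Prime the sums with the first window. */
    for (size_t i = 0; i < size; i++)
        for (int c = 0; c < 4; c++)
            sum[c] += src[4 * i + c];

    for (size_t x = 0; x < count; x++) {
        for (int c = 0; c < 4; c++) {
            dst[4 * x + c] = (uint8_t)(sum[c] >> shift);  /* divide by 2^n */
            sum[c] -= src[4 * x + c];                     /* leaving pixel */
            sum[c] += src[4 * (x + size) + c];            /* entering pixel */
        }
    }
}
```

The four 16-bit sums correspond to what a single 64-bit MMX register can hold, which is why one pixel at a time is the natural unit of work.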

Tuesday, August 16, 2005

Something terrible is about to happen...

... and it's called Microsoft Acrylic. :-) Just kidding. I downloaded it yesterday to see what the fuss is all about and it seems not too many things have changed since the last version. This build still has no timeline enabled, although somehow they forgot to remove the 'Onion Skin' menu, which obviously only makes sense if there were some sort of animation support. So either they want to keep it secret or it's simply not ready yet.

Nevertheless I tried what's there right now and frankly I think they have a long way to go. I've worked on various authoring tools in the past (even designing one from scratch) and know how challenging it can be. I guess my biggest criticism is that they work off a very traditional application design approach using a custom rendering engine. The result is that they will run into a lot of usability issues once they export to XAML. Apple is again showing us where things are going. I mean Microsoft has Direct3D and XAML which are designed to integrate with hardware. Why not build an authoring experience around this, instead of reverting to a cumbersome design and publish approach? While Macromedia Flash is still the leader in vector graphics on the web it has become quite old school in the way it approaches application design. UIs can be fixed, a broken core can't. If I had to start over I would certainly do it very differently. For one thing I would think hard about whether using C++ is really the way to go. At least make it a Unicode application from the start if you want to save countless months of QA in the long term.

Some of the more amusing parts are the way some of the features are implemented. They obviously had the challenge of adding bitmap support to a fully vector based application. Somehow things seem to have fallen short when it comes to engineering quality in certain areas though. Look at this 'Gaussian Blur':

The upper two rectangles have no filter applied. Of the lower ones, one has a 'Gaussian Blur' effect applied with a radius of 0(!), the other one with a radius of 10. Graphics engineers at Apple, Adobe, Macromedia, Alias, NVidia and ATI may now laugh out loud. If you copy the SVG spec at least read it. :-)

That reminds me that I wanted to blog about how I came to implement the blur filter which is now in Flash Player 8. It took me several months to get it just right. Gory details will follow this week...

Saturday, August 13, 2005

The quest for a new video codec in Flash 8

Here and there I sometimes see disappointed comments about the fact that we did not pick H.264/AVC as our next generation codec. This, in their opinion, would have provided the ultimate video quality based on a widely adopted industry standard. Every time I feel compelled to explain the long process we went through to find a better video codec. Quality and standards are just two of the criteria we used, and while they are important, others were much more important. Let me put together an (incomplete) list here. I can't talk about all the gory details obviously, that would get me into trouble :-) (especially how I think each codec did on these points)

  1. Quality. This is the first thing we looked at and our target was to at least cut the bandwidth in half while keeping the same visual quality.

  2. Code size. You wouldn't believe the difference we saw here. Everything from a few kilobytes to a megabyte. Download size is one of the strengths of Flash and we had to keep the code as small as possible.

  3. Portability. We do not only need to support Intel, but also PowerPC, ARM, MIPS and many others. Recompiling for a new platform had to be painless and essentially require no code changes. Optional availability of specialized code was a plus too, although we could have done some of that work ourselves.

  4. Stability. This does not only mean crashers, but an ever changing source code base or file format are real problems when creating SDKs.

  5. Legacy hardware support. It's nice to have a new shiny video codec, but if it does not run on an older Macintosh what's the point? Flash is about ubiquity, not forcing people to upgrade hardware or even requiring specialized hardware. Our target was a Pentium III 500 MHz and a Mac G3 running at 800 MHz.

  6. Hardware support. Looks like it's a conflict with the previous item, but it's not. We were looking for a codec which could benefit from standard graphics hardware in the future, things like iDCT, YUV conversion/display, motion compensation etc. A lot of experimental codecs failed miserably here.

  7. Performance. When your CPU usage doubles during complex scenes I consider a codec to have a serious performance problem. In some cases the video codecs we tested were dropping frames on a 3.4 GHz Pentium 4!

  8. Completeness. This mostly affects standards based codecs. If only half of the specification is implemented why even claim to be compliant? We went this route before with Sorenson Spark which is an incomplete implementation of H.263 and it bit us badly when trying to implement certain solutions. Many codecs failed on this one.

  9. Strong support. We were looking for a codec which had excellent support from the vendor, including the ability to come up with custom solutions very quickly, both on the client and deployment side. They also had to have the ability to support not only us, but any 3rd party interested in Flash Video. A vendor which saw us as just another opportunity to dump their prefabricated closed solution on us was simply not interesting. Our goal is to create a complete ecosystem around Flash Video with as many players as possible.

  10. Good encoding tools. Another lesson we learned is that good encoding tools are essential for customers. If the vendor is able to provide alternatives to ours, even better.

  11. Risks for Macromedia. We had to know exactly what we were getting into. A codec with an open ended license agreement which has to be renegotiated every few years simply bears incalculable risks for a company the size of Macromedia.

  12. Risks for customers. Same as the previous item, but some codecs required making a distinction between the player and the video streams served by customers.

  13. Costs for customers. If you have to pay a fee for streaming your video over the web it can be a real problem. I understand that this model works well for dedicated hardware and I support it. But how do you keep track of this on the web? This is Flash. It would be like asking for money every time you use a certain HTML tag on your web pages.

  14. ROI for customers. This was probably the most important of them all. Flash Video had to be cheaper and easier to deploy than any other solution out there.

All in all the On2 VP6 codec stood out on most of these points. We did not drop H.264/AVC because Macromedia is an evil company who likes everything to be proprietary, on the contrary. We were looking for the best balance on all of the above points. Whether the choice we made was the right one remains to be seen, but overall I am extremely happy, the video quality is really outstanding. Give it a try.

Thursday, August 11, 2005

Alpha masking goodness in Flash Player 8

A while back when we started implementing the cache as bitmap functionality in the Flash Player I was tasked with making masks work with it. The initial implementation was simply doing edge based masking and behaved like you think Flash should behave. What is edge based masking? It means that the vectors of the mask and masquee are intersected before they are drawn to obtain the cutout, which is a vector shape in itself. That's why alpha masking was not easily possible technically in previous Flash Player versions, since there is no pixel level interaction at this level between mask and masquee. Now, as happened a few times during this release, I stared at the code and told myself: Alpha masks are now trivial to implement with the bitmap caching! A couple of hours later it was done.
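To show what pixel level interaction buys you: once mask and masquee both exist as cached bitmaps, an alpha mask boils down to scaling every content pixel by the alpha of the mask pixel on top of it. The C sketch below is only an illustration of that idea. The function names are made up, and the pixel layout (4 bytes per pixel, premultiplied color, alpha in the last byte) is an assumption of mine, not a description of the player's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale an 8-bit value by an 8-bit alpha with rounding: round(v * a / 255). */
static uint8_t mul_alpha(uint8_t v, uint8_t a)
{
    unsigned t = (unsigned)v * a + 128u;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Hypothetical helper: multiply every channel of each cached content pixel
   by the alpha of the corresponding cached mask pixel.
   Assumes 4 bytes per pixel, premultiplied color, alpha in byte 3. */
void apply_alpha_mask(uint8_t *content, const uint8_t *mask, size_t pixels)
{
    assert(content != NULL && mask != NULL);
    for (size_t i = 0; i < pixels; i++) {
        uint8_t a = mask[4 * i + 3];
        for (int c = 0; c < 4; c++)
            content[4 * i + c] = mul_alpha(content[4 * i + c], a);
    }
}
```

With edge based masking there is no equivalent of the mask alpha byte at all, since the cutout is computed from vector outlines before any pixel exists.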

So how do you make it work? Mask and masquee both have to be movie clips and need to have the cacheAsBitmap flag set. Now you can simply do a setMask() and that is it. The only sad part about this is that due to time constraints there was simply no time to add this to the authoring tool for easy access. But the good news is that designers will only need one line of code to make it work. Hence the example I posted here. Softwipe goodness! You can't tell me that this is not beautiful! This movie has only one line of code, the rest are simple tweens on the timeline:


Tweening bitmap fills in Flash 8

The smallest fixes sometimes create new features. Example: The SWF file format specification is supposed to allow for bitmap fill shape tweens the same way it is possible for gradient fills. Well, due to two trivial bugs in the authoring tool and the player this was never possible until now. You had to resort to using a mask to get the effect. Now you can tween bitmap fills directly on the timeline. Here is an example (Flash Player 8 required).

Tuesday, August 09, 2005

Some text features in Flash Player 8

Tuesday, August 02, 2005

Fixing one bug at a time in Flash Player 8: Bitmaps!

So now that Studio 8 is announced I can talk a little more about the 'hidden' improvements we made in Flash Player 8. As things slowly wind down work wise (finally doing Burning Man this year!) I'll try to regularly post new things on a variety of subjects, from bitmaps and vector graphics to sound and text. There is so much new stuff the 'official' sources do not mention.

As you might know, Flash Player 7 still had a myriad of bitmap drawing issues. Some more severe than others. So as I got started on Flash Player 8 my goal was to kill all of these problems. The difficult part here was not to figure out what is wrong but to retrofit essentially broken code while preserving backwards compatibility. Any time you load a SWF which is version 7 or lower we have to guarantee 100% backwards compatibility.

Bitmap wrapping bug: The bug from hell. Essentially any time you rotated, flipped or changed the anchor point of a bitmap it would wrap a column and/or row. Here is an example which shows the difference between Flash Player 7 (above) and Flash Player 8 (below), you'll of course need Flash Player 8 to see the difference:

Snapping bug: When you rotated a bitmap and hit an angle of either 0 or 180 degrees, the bitmap would 'snap'. If you pay close attention you can also see that in the above Flash Player 7 example. This is closely related to the bitmap wrapping problem.

Confusing smoothing rules: In Flash Player 7 there were two ways of enabling bitmap smoothing. Either have a movie with 1 frame or set _quality="BEST". This is not only confusing but also inflexible with today's projects which sometimes have 1 frame only. In Flash Player 8 smoothing is now based on the smoothing flag setting in the library and is generally enabled for medium, high and best quality mode regardless of how many frames there are. What does the best quality mode do now? Well, there is now a high quality median filter when bitmaps are scaled down. While this is very slow it will allow you to create high quality thumbnails from large images directly on the client side. Perfect for applications like flickr.com.

Color banding with smoothing: Flash Player 7 uses an approximation to smooth bitmaps. While this was a sensible thing to do in the Pentium I days, nowadays it will slow down drawing since table lookups need slow memory access. So Flash Player 8 sports a completely new smoothing routine doing a real bilinear interpolation. MMX and AltiVec versions included of course. Try to zoom in and see the glorious quality!

Sub pixel rendering: Along with the new smoothing routines you can now move bitmaps by sub pixels if smoothing is turned on for a particular bitmap. That means the bitmap will correctly move along with the shape instead of the shape being ahead of the bitmap.
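For those who have not met it before, bilinear interpolation blends the four pixels surrounding a fractional sample position, weighted by how close the position is to each of them. That is also what makes sub pixel movement possible, since shifting the sample position by a fraction of a pixel changes the weights smoothly. Below is a minimal single-channel sketch in C using 8-bit fixed-point fractions. The function name, the 24.8 coordinate format and the edge clamping are choices made for this illustration and say nothing about how the player's MMX or AltiVec routines are written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sample a single-channel bitmap at (fx, fy).
   Coordinates are 24.8 fixed point: upper bits are the pixel index,
   the low 8 bits are the fraction between that pixel and the next. */
uint8_t sample_bilinear(const uint8_t *pixels, int width, int height,
                        uint32_t fx, uint32_t fy)
{
    int x0 = (int)(fx >> 8);
    int y0 = (int)(fy >> 8);
    assert(x0 < width && y0 < height);

    int x1 = (x0 + 1 < width)  ? x0 + 1 : x0;   /* clamp at the right edge  */
    int y1 = (y0 + 1 < height) ? y0 + 1 : y0;   /* clamp at the bottom edge */
    uint32_t wx = fx & 0xFF;
    uint32_t wy = fy & 0xFF;

    /* Blend horizontally on the two rows, then vertically between them. */
    uint32_t top    = pixels[y0 * width + x0] * (256 - wx)
                    + pixels[y0 * width + x1] * wx;
    uint32_t bottom = pixels[y1 * width + x0] * (256 - wx)
                    + pixels[y1 * width + x1] * wx;

    return (uint8_t)((top * (256 - wy) + bottom * wy + 32768) >> 16);
}
```

Everything here is multiplies, adds and shifts on values already in registers. There is no lookup table, which fits the point above about table lookups needing slow memory access.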

Overall there are still a couple of bugs in edge cases left that I haven't been able to fix in this release, maybe you can spot them?