Tuesday, June 12, 2007

Multi core support

As I mentioned in Flash Player Update 3, we finally realized that multi-core CPUs are here to stay. So why not follow the times and take advantage of them? As most of you hard core Flash developers know, rendering is a huge bottleneck. I've seen a couple of blog posts complaining that the second core/CPU is not doing anything when the Flash Player runs. Well people, this is about to change in this update.

Flash Player Update 3 takes advantage of multiple CPUs in many different areas. The first one is obvious: the vector rasterizer. The Flash Player will split the rasterization workload by dividing the stage into horizontal stripes. In case you have 2 cores/CPUs, the top half is rendered by CPU 1, the bottom half by CPU 2.
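
The stripe split can be illustrated with a few lines of Python (not ActionScript, and not the player's actual code). This is a minimal sketch that assumes the stage is divided as evenly as possible by rows; the function name split_into_stripes is made up for this example.

```python
def split_into_stripes(stage_height, cpu_count):
    """Return (top, bottom) row ranges, one horizontal stripe per CPU."""
    base, extra = divmod(stage_height, cpu_count)
    stripes = []
    top = 0
    for cpu in range(cpu_count):
        # Hand leftover rows to the first stripes so heights differ by at most 1.
        height = base + (1 if cpu < extra else 0)
        stripes.append((top, top + height))
        top += height
    return stripes

# Two CPUs on a 400 pixel high stage: top half and bottom half.
print(split_into_stripes(400, 2))  # [(0, 200), (200, 400)]
```

Each range is half-open, so the bottom row of one stripe is the top row of the next and no row is rendered twice.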

How much of a performance improvement will you see? That depends. Unlike with a 3D ray tracer, the workload is not totally independent, and there is some additional overhead to handle multiple threads. As a rule of thumb you could say that you get an additional 25-33% performance improvement per CPU. That does not sound like a lot, but keep in mind that the Flash Player still does some stuff which cannot be multi-threaded.

The second area we focused on is bitmap filters. If you deal with large movie clips and use lots of filters on them you will see a nice performance jump. Again the workload is split into horizontal stripes. The following filters are supported:
  • DropShadowFilter

  • GlowFilter

  • GradientGlowFilter

  • BevelFilter

  • GradientBevelFilter

  • BlurFilter

  • ConvolutionFilter

  • ColorMatrixFilter

  • DisplacementMapFilter
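
To show how a filter pass can be handed out in horizontal stripes, here is a minimal Python sketch using a thread pool. It is an illustration only: brighten_rows is a hypothetical stand-in for a real filter, the image is a plain list of rows, and Python threads will not give the speedup native threads do. A filter that reads neighboring pixels (a blur, for instance) would additionally need to read from an untouched source copy across stripe borders, which this per-pixel example sidesteps.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten_rows(image, top, bottom, amount):
    """Hypothetical stand-in for a filter: brighten rows top..bottom-1 in place."""
    for y in range(top, bottom):
        image[y] = [min(255, value + amount) for value in image[y]]

def filter_in_stripes(image, worker_count, amount):
    height = len(image)
    # Stripe boundaries: worker k handles rows bounds[k]..bounds[k+1]-1.
    bounds = [height * k // worker_count for k in range(worker_count + 1)]
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        futures = [
            pool.submit(brighten_rows, image, bounds[k], bounds[k + 1], amount)
            for k in range(worker_count)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return image

image = [[10 * y + x for x in range(4)] for y in range(6)]
filter_in_stripes(image, 2, 100)
print(image[0], image[5])
```

Because every worker writes only to its own rows, the stripes need no locking among themselves.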


Third, compositing and color transformations of bitmap cached objects are also accelerated. The BitmapData APIs are NOT multi-threaded yet; hopefully we will get around to doing this at some point.

Fourth, VP6 video decoding now happens in a separate thread, independent of rendering and post processing of the video. In most cases that means a dramatic improvement in performance, especially if your current content hit a performance limit even though you had a dual core machine. On my Athlon X2 4800 I can now play a true 1920x1080 pixels video at 30 frames/sec without a single dropped frame. Really cool if you ask me. It was a reason for me to buy a new toy: a Canon HV20 which can do 1080p at 24 frames/sec (actual pixel resolution is 1440x1080). Perfect for the web. Now if only these video files would not clog up my drive, they are really huge...

Why is the Sorenson Spark codec not multi-threaded? Because of its design: It does not have a dual YUV buffer setup which allows us to separate blitting and video decoding.
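
Why two buffers make the separation possible can be sketched as a small producer/consumer loop in Python. This is an assumption about the general technique, not the player's code: one thread fills whichever buffer is free while the other thread shows the buffer that is ready. The dictionaries stand in for YUV buffers, and the names decode_thread and play are invented for this example.

```python
import queue
import threading

def decode_thread(frame_count, free_buffers, ready_buffers):
    """Decoder: grab a free buffer, fill it, hand it to the renderer."""
    for frame_number in range(frame_count):
        buffer = free_buffers.get()      # blocks while both buffers are in use
        buffer["frame"] = frame_number   # stand-in for writing decoded YUV data
        ready_buffers.put(buffer)
    ready_buffers.put(None)              # signal end of stream

def play(frame_count):
    free_buffers = queue.Queue()
    ready_buffers = queue.Queue()
    for _ in range(2):                   # the two YUV buffers
        free_buffers.put({"frame": None})
    decoder = threading.Thread(
        target=decode_thread, args=(frame_count, free_buffers, ready_buffers))
    decoder.start()
    shown = []
    while True:
        buffer = ready_buffers.get()
        if buffer is None:
            break
        shown.append(buffer["frame"])    # stand-in for blitting to the screen
        free_buffers.put(buffer)         # give the buffer back to the decoder
    decoder.join()
    return shown

print(play(5))  # [0, 1, 2, 3, 4]
```

With a single buffer the decoder would have to wait until blitting is done before it could write the next frame, so the two steps could never overlap.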

Since I am talking about VP6 already, let me mention some of the more general improvements in VP6:

  • I added a new gaussian noise filter which is turned on when post processing is active. It lessens the effect of blockiness and makes videos look more like movies. If you are turned off by this change, please let me know as soon as possible. In most cases it is seen as an improvement, but you designers should tell us if things still look okay.

  • We fixed a long standing issue with VP6's automatic post processing selection on Athlon X2 class CPUs. It never worked correctly on these machines. This is due to a broken CPU driver on stock installs of Windows which affects the QueryPerformanceCounter Win32 call. Incidentally, the new version of QuickTime and the Real Media Player seem to have problems related to this API: playback stutters heavily on my machine with some content because of it. I usually fix this problem by setting the process affinity to a single CPU in the Task Manager. That's what you get for not testing on AMD CPUs :-)

  • Entropy decoding and some parts of the decoder should be faster. You should see a general improvement of about 10% in CPU usage.

Monday, June 11, 2007

Mip map what?

True to my reputation here at Adobe I added a miniature feature into Flash Player 9 Update 3 (9.0.60.120). It is mip mapping. The Wikipedia entry is nice, but there is an in depth article I highly recommend to anyone interested in the subject: Part 1 and Part 2. As described in the article, we use the fast and simple box filtering as opposed to the more complex methods, which are not really suitable for real-time rendering. So no Kaiser in the Flash Player. Neither do we have trilinear or anisotropic filtering, but then again they make more sense with real 3D graphics. And since we are still stuck in software only land for bitmap rendering I was not interested in a performance loss either.

Why do I call it a miniature feature? Because the whole feature occupies less than 1Kb of compressed code in the binary. This is the kind of change we engineers in the Flash Player team love. Extremely small in code size, large in impact.

If you feel kind of overwhelmed by the descriptions in these articles, let me summarize what the feature does in simpler words: This feature improves the quality and performance of downscaled bitmaps. Not slightly downscaled, but anything which is scaled down by more than 50%. Mip maps are nothing more than precomputed smaller and higher quality versions of an original bitmap. They are used instead of the original bitmap when something is downscaled a lot.

A few words about the Flash Player implementation and its limitations are worth adding here:

  • Mip maps are only created for 'static' bitmaps, e.g. anything like a JPEG, GIF or PNG you display via loadMovie(), a bitmap in the library or a BitmapData object. They do not apply to filtered objects or bitmap cached movie clips.

  • In the case of video things are more tricky. If smoothing is turned on, mip mapping applies since this is overall faster than rendering the original bitmap. For non-smoothed video mip mapping is not used since it would make things much slower.

  • Mip map level generation stops when it encounters an odd width or height. What does that mean? In the most extreme case it means that if you have a bitmap with an odd width or height to begin with, no mip map can be generated at all and therefore you will not see any benefit. This is unfortunately a hard technical limitation. That also means that 'perfect' mip maps are generated from bitmaps which have a width and height which are 2^n, e.g. 256x256, 512x512, 1024x1024 etc. On average though it is enough to have bitmap sizes which are divisible by 8, which guarantees at least three smaller levels. The more often both dimensions can be halved, the more levels you get, f.ex. 640x128 gives 8 levels: 640x128 -> 320x64 -> 160x32 -> 80x16 -> 40x8 -> 20x4 -> 10x2 -> 5x1.

  • The quality improvements are more visible when smoothing is turned on for a particular bitmap.
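
The level generation rule can be tried out with a short Python sketch. It is a simplified illustration, not the player's implementation: it assumes a grayscale bitmap stored as a list of rows, uses a plain 2x2 box filter, and stops as soon as the width or height is odd. The names next_level and build_mip_chain are made up for this example.

```python
def next_level(pixels):
    """Halve a grayscale bitmap by averaging each 2x2 block (box filter)."""
    height, width = len(pixels), len(pixels[0])
    return [
        [
            (pixels[y][x] + pixels[y][x + 1]
             + pixels[y + 1][x] + pixels[y + 1][x + 1]) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]

def build_mip_chain(pixels):
    """Return [original, half, quarter, ...], stopping at an odd width or height."""
    levels = [pixels]
    while len(levels[-1]) % 2 == 0 and len(levels[-1][0]) % 2 == 0:
        levels.append(next_level(levels[-1]))
    return levels

chain = build_mip_chain([[0] * 640 for _ in range(128)])
sizes = [(len(level[0]), len(level)) for level in chain]
print(len(sizes), sizes[-1])  # 8 (5, 1)
```

A 640x128 bitmap yields the 8 sizes listed above, while a bitmap with an odd width or height yields only the original.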

The most immediate effect of this feature can be seen with Papervision3D. The Rhino demo f.ex. shows much less aliasing in the textures when you rotate. Also, a couple of frames per second more are displayed with this demo.

Good and well, but how does 'normal' Flash content benefit? Any time you create something like an image gallery with small thumbnails which are based on larger bitmaps you will see better bitmap quality and slightly higher performance.

Will this feature ever make things look worse? Not unless you do something really dumb like downscaling heavily aliased images and still expecting them to look pixelated.

I made up a quick and dirty demo which shows the reduced aliasing effect: http://www.kaourantin.net/swf/mipmap.html Make sure you have Flash Player Update 3 Beta 1 installed, otherwise the two samples will look exactly the same.

What about control over this feature? Can you select the threshold or supply your own mip map bitmaps? No, right now everything is automatic. That might change in a next major revision of the Flash Player. And the threshold value, well, let me simply tell you that we use the standard OpenGL value which is <= 0.5. If you want to know how this actually works, here is how I would express it in pseudo ActionScript 3; you graphics experts will get it right away:

var bitmapData:Array; // contains the mip map bitmaps, index 0 is the original
var m:Matrix; // the affine matrix used for display

var i:int = 0;
while ( Math.sqrt(m.a*m.a+m.b*m.b) <= 0.5 &&
        (bitmapData[i].width % 2) == 0 &&
        (bitmapData[i].height % 2) == 0) {

    // switch to next mip map level; it is half the size,
    // so the scale doubles to cover the same area on screen
    i++;
    m.a *= 2.0;
    m.b *= 2.0;
    m.c *= 2.0;
    m.d *= 2.0;
    // m.tx and m.ty are in screen space and stay unchanged

}
mc.draw(bitmapData[i],m);
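
For anyone who wants to step through the selection logic, here is a runnable Python sketch of the same idea. It rests on stated assumptions rather than on the player's source: each level is half the size of the previous one, so the scale part of the matrix doubles per level, and the loop also stops when no further level exists. The name select_mip_level and the level_sizes list are invented for this example.

```python
import math

def select_mip_level(a, b, level_sizes):
    """Return (level, a, b) after stepping down mip levels.

    a and b are the first column of the display matrix; level_sizes holds
    one (width, height) pair per level, level 0 being the original bitmap.
    """
    level = 0
    while (math.hypot(a, b) <= 0.5
           and level + 1 < len(level_sizes)
           and level_sizes[level][0] % 2 == 0
           and level_sizes[level][1] % 2 == 0):
        level += 1
        a *= 2.0  # the next level is half the size, so the scale doubles
        b *= 2.0
    return level, a, b

sizes = [(640, 128), (320, 64), (160, 32), (80, 16)]
level, a, b = select_mip_level(0.2, 0.0, sizes)
print(level, a)
```

Drawing at 20% of the original size picks level 2 here, which is then drawn at 80% of its own size.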
Lastly, I should mention that mip maps apply to any SWF version 9 or newer content. SWF version 8 or lower will not use mip maps since we feared that this would impact backwards compatibility.

Sunday, June 10, 2007

Flash Player Update 3 Beta 1

Flash Player Update 3 Beta 1 (build 9.0.60.120) is now ready for download here. This is probably the first time we have done a dot release as big as this one. We've been extremely busy over the past few months since Flash Player Update 2 (build 9.0.r45), so now it is finally time to talk about the improvements we have made. There are tons of them, so I'll use this post to summarize those I know something about and have been working on. Stay tuned for more detailed posts of mine explaining the technical details behind these and how you will be able to leverage them:

  • Mip map support for all bitmaps for Flash 9 or newer content. This improves the quality and rendering performance of downscaled bitmaps. Perfect for thumbnails and such. Even better, Papervision 3D content now automatically looks better and should be slightly faster when large textures are used.

  • Multi-threaded vector renderer. Now we take advantage of up to 4 Cores/CPUs for vector rendering.

  • Multi-threaded bitmap filters. Same as above, but this applies to bitmap functionality specifically rather than to the core vector rasterizer.

  • Multi-threaded video decoding. The VP6 video codec will now run in a separate thread if a multi-core system is detected which leaves the main thread to do rendering and post processing of the video. With this true 1080p video is now possible on most modern dual core machines. Also, the responsiveness is improved with this change. The Sorenson codec on the other hand did not get this change for technical reasons.

  • Full-screen mode with hardware scaling. Probably the biggest new feature in Flash Player Update 3. This leverages DirectX on Windows and OpenGL on OSX. There is a new API to control the behavior, which was required since we could not change the current behavior and we wanted to give the maximum flexibility possible. I know you are probably eager to use this feature; we will post more information on this on labs.adobe.com soon (Update: Link to labs page is active). I'll also give you much more technical detail in an upcoming blog post.

  • Less tearing in the new full screen mode. We now have some code which will try to do VBL syncing. It's still a work in progress but we hope we can fix the remaining issues.

  • Going into full screen mode has a zoom transition effect. The beta does not work perfectly right now, but we want to get feedback on whether this is acceptable to end users. We will not expose an API to access/control this: either we'll leave it in and fix the remaining bugs, or it is out. Also you might notice that this even affects the current full screen mode, something we will remove in the final release.

  • The Linux plugin now uses the XEmbed protocol. This is work in progress. The downside is that Konqueror and Opera do not support this right now, so the Flash plugin will not work in them until these vendors update their plugin support. Also we are seeing decreased performance because GTK is somewhat lacking in the basic graphics API department. I'll explain in a later post.

  • Tons of performance tweaks and bug fixes. Looking at the current bug database statistics, we fixed 371 bugs since 9.0.r45. Fixed really means fixed; it does not include duplicates, unreproducible bugs etc.

  • Much more.
A word of warning, this is a beta version! Do not use this version in a production environment. There are several known issues with the new features and while they might work on your machine they will not on others. We are obviously interested in the others and are looking for any issues you might encounter.