Multi-core support
As I mentioned, in Flash Player Update 3 we finally acknowledged that multi-core CPUs are here to stay. So why not keep up with the times and take advantage of them? As most of you hardcore Flash developers know, rendering is a huge bottleneck. I've seen a couple of blog posts complaining that the second core/CPU is not doing anything while the Flash Player runs. Well people, this is about to change in this update.
Flash Player Update 3 takes advantage of multiple CPUs in several different areas. The first one is obvious: the vector rasterizer. The Flash Player splits the rasterization workload by dividing the stage into horizontal stripes. If you have 2 cores/CPUs, the top half is rendered by CPU 1 and the bottom half by CPU 2.
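To make the stripe idea concrete, here is a minimal Python sketch of how a stage could be cut into horizontal stripes and handed to worker threads. This is not the Flash Player's code: `split_into_stripes`, `rasterize_stripe` and `render` are made-up names, and the "rasterizer" just fills each row with its own index. CPython threads will not actually run this in parallel; the point is only the partitioning.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_stripes(height, workers):
    """Return (start_row, end_row) pairs covering rows 0..height-1, one per worker."""
    base, extra = divmod(height, workers)
    stripes = []
    start = 0
    for i in range(workers):
        # The first `extra` stripes get one additional row so every row is covered.
        end = start + base + (1 if i < extra else 0)
        stripes.append((start, end))
        start = end
    return stripes


def rasterize_stripe(framebuffer, width, rows):
    """Hypothetical stand-in for the rasterizer: fill each row with its own index."""
    start, end = rows
    for y in range(start, end):
        framebuffer[y] = [y] * width


def render(width, height, workers=2):
    framebuffer = [None] * height
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker writes to a disjoint range of rows, so no locking is needed.
        list(pool.map(lambda rows: rasterize_stripe(framebuffer, width, rows),
                      split_into_stripes(height, workers)))
    return framebuffer
```

With a 400-row stage and 2 workers, `split_into_stripes(400, 2)` gives the top half to the first worker and the bottom half to the second.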
How much of a performance improvement will you see? That depends. Unlike with a 3D ray tracer, the workload is not totally independent, and there is some additional overhead for handling multiple threads. As a rule of thumb, you get an additional 25-33% performance improvement per CPU. That does not sound like a lot, but keep in mind that the Flash Player still does some work which cannot be multi-threaded.
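One way to put the rule of thumb in perspective is Amdahl's law. The sketch below assumes, and this is only my reading of the numbers, that "25-33%" means a 1.25x to 1.33x overall speedup on two cores, and it ignores threading overhead. Under that assumption it works out what fraction of the work would have to be parallel. The function names are made up for illustration.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Amdahl's law: overall speedup when only part of the work scales with CPUs."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cpus)


def implied_parallel_fraction(speedup, cpus):
    """Solve Amdahl's law for the parallel fraction, given an observed speedup."""
    return (1.0 - 1.0 / speedup) * cpus / (cpus - 1)


# Assumed reading: 1.25x to 1.33x on 2 CPUs.
low = implied_parallel_fraction(1.25, 2)       # about 0.4
high = implied_parallel_fraction(4.0 / 3.0, 2)  # about 0.5
```

In other words, under this reading roughly 40-50% of the frame time would be spent in code that scales across cores, and the rest would be the work that cannot be multi-threaded.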
The second area we focused on is bitmap filters. If you deal with large movie clips and use lots of filters on them, you will see a nice performance jump. Again, the workload is split into horizontal stripes. The following filters are supported:
- DropShadowFilter
- GlowFilter
- GradientGlowFilter
- BevelFilter
- GradientBevelFilter
- BlurFilter
- ConvolutionFilter
- ColorMatrixFilter
- DisplacementMapFilter
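Splitting a filter into stripes has one wrinkle compared to rasterizing: a blur needs pixels from neighboring rows, including rows that belong to another stripe. A minimal sketch of how that can work is below, using a simple vertical box blur as a stand-in (it is not the actual BlurFilter algorithm, and `blur_rows` and `blur` are made-up names). Each worker writes only its own output rows but reads freely from the shared, read-only source.

```python
from concurrent.futures import ThreadPoolExecutor


def blur_rows(src, dst, start, end, radius):
    """Vertical box blur for output rows start..end-1, clamping at the image edges."""
    height, width = len(src), len(src[0])
    for y in range(start, end):
        # Input rows may lie outside this stripe; src is shared and never modified.
        lo, hi = max(0, y - radius), min(height - 1, y + radius)
        count = hi - lo + 1
        dst[y] = [sum(src[r][x] for r in range(lo, hi + 1)) / count
                  for x in range(width)]


def blur(src, radius=1, workers=2):
    height = len(src)
    dst = [None] * height
    step = max(1, -(-height // workers))  # ceiling division, at least one row
    stripes = [(s, min(s + step, height)) for s in range(0, height, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(blur_rows, src, dst, s, e, radius)
                   for s, e in stripes]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return dst
```

Because the output stripes do not overlap, the result is the same no matter how many workers are used.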
Third, compositing and color transformations of bitmap-cached objects are also accelerated. The BitmapData APIs are NOT multi-threaded yet; hopefully we will get around to doing this at some point.
Fourth, VP6 video decoding now happens in a separate thread, independent of rendering and post-processing of the video. In most cases that means a dramatic improvement in performance, especially if your current content hit a performance limit even on a dual-core machine. On my Athlon X2 4800 I can now play a true 1920x1080 pixel video at 30 frames/sec without a single dropped frame. Really cool if you ask me. It was a reason for me to buy a new toy: a Canon HV20, which can do 1080p at 24 frames/sec (actual pixel resolution is 1440x1080). Perfect for the web. Now if only these video files would not clog up my drive; they are really huge...
Why is the Sorenson Spark codec not multi-threaded? Because of its design: it does not have a dual YUV buffer setup, which is what would allow us to separate blitting from video decoding.
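To show why two buffers matter, here is a minimal sketch of a decoder thread and a render loop sharing exactly two frame buffers. While one buffer is being shown, the decoder can already fill the other; with a single buffer the two steps would have to take turns. Everything here is illustrative: `decode_frames` and `play` are made-up names, and "decoding" just stores a frame number instead of real YUV data.

```python
import queue
import threading


def decode_frames(frame_count, free_buffers, ready_frames):
    """Decoder thread: fill a free buffer, then hand it to the renderer."""
    for n in range(frame_count):
        buf = free_buffers.get()   # blocks until the renderer returns a buffer
        buf["frame"] = n           # hypothetical stand-in for real YUV decoding
        ready_frames.put(buf)
    ready_frames.put(None)         # sentinel: no more frames


def play(frame_count):
    free_buffers = queue.Queue()
    ready_frames = queue.Queue()
    for _ in range(2):             # two buffers: one being decoded, one being shown
        free_buffers.put({"frame": None})

    decoder = threading.Thread(target=decode_frames,
                               args=(frame_count, free_buffers, ready_frames))
    decoder.start()

    shown = []
    while True:
        buf = ready_frames.get()
        if buf is None:
            break
        shown.append(buf["frame"])  # stand-in for post-processing and blitting
        free_buffers.put(buf)       # recycle the buffer for the decoder
    decoder.join()
    return shown
```

The decoder never touches a buffer the renderer is still reading, because a buffer only goes back into `free_buffers` once it has been shown.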
Since I am already talking about VP6, let me mention some of the more general improvements in VP6:
- I added a new Gaussian noise filter which is turned on when post-processing is active. It lessens the effect of blockiness and makes videos look more like movies. If you are turned off by this change, please let me know as soon as possible. In most cases it is seen as an improvement, but you designers should tell us if things still look okay.
- We fixed a long-standing issue with VP6's automatic post-processing selection on Athlon X2 class CPUs. It never worked correctly on these machines. This is due to a broken CPU driver on stock installs of Windows which affects the QueryPerformanceCounter Win32 call. Incidentally, the new versions of QuickTime and the Real Media Player seem to have problems related to this API; playback stutters heavily on my machine with some content because of it. I usually fix this problem by setting the process affinity to a single CPU in the Task Manager. That's what you get for not testing on AMD CPUs :-)
- Entropy decoding and some other parts of the decoder should be faster; you should see a general 10% improvement in CPU usage.
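The Task Manager workaround for the QueryPerformanceCounter issue mentioned above can also be done from code. Here is a minimal sketch, not something the Flash Player does: on Windows it calls the Win32 `SetProcessAffinityMask` function through `ctypes`, and on systems that offer it, it uses Python's `os.sched_setaffinity` instead. The helper names `affinity_mask` and `pin_current_process` are made up for illustration, and I have not tested the Windows branch on an affected machine.

```python
import os
import sys


def affinity_mask(cpus):
    """Build the bitmask form of a CPU set, e.g. [0, 2] -> 0b101."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def pin_current_process(cpus):
    """Restrict the current process to the given CPUs; returns True on success."""
    if sys.platform == "win32":
        import ctypes
        kernel32 = ctypes.windll.kernel32
        kernel32.GetCurrentProcess.restype = ctypes.c_void_p
        kernel32.SetProcessAffinityMask.argtypes = [ctypes.c_void_p,
                                                    ctypes.c_size_t]
        handle = kernel32.GetCurrentProcess()
        return bool(kernel32.SetProcessAffinityMask(handle, affinity_mask(cpus)))
    if hasattr(os, "sched_setaffinity"):
        # Available on Linux; pid 0 means the calling process.
        os.sched_setaffinity(0, set(cpus))
        return True
    return False  # no known affinity API on this platform
```

Calling `pin_current_process([0])` would be the equivalent of ticking only CPU 0 in the Task Manager's affinity dialog.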

