Tuesday, June 12, 2007

Multi core support

As I mentioned, in Flash Player Update 3 we finally realized that multi-core CPUs are here to stay. So why not follow the times and take advantage of them? As most of you hardcore Flash developers know, rendering is a huge bottleneck. I've seen a couple of blog posts complaining that the second core/CPU sits idle while the Flash Player runs. Well, people, this is about to change with this update.

Flash Player Update 3 takes advantage of multiple CPUs in several different areas. The first one is obvious: the vector rasterizer. The Flash Player splits the rasterization workload by dividing the stage into horizontal stripes. If you have 2 cores/CPUs, the top half is rendered by CPU 1 and the bottom half by CPU 2.
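To make the stripe split concrete, here is a minimal sketch. It assumes a frame is just a list of pixel rows and that a hypothetical `shade(x, y)` callback stands in for the real rasterizer (the player's internals are not public). Each worker gets a contiguous band of rows and writes only to its own rows, so no locking is needed. Python threads are used only to show the partitioning; because of CPython's global interpreter lock this sketch will not actually render faster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Frame = List[List[int]]


def stripe_bounds(height: int, workers: int) -> List[Tuple[int, int]]:
    """Split rows [0, height) into `workers` contiguous horizontal stripes.

    Returns half-open (start_row, end_row) pairs. When the height does not
    divide evenly, the first stripes get one extra row each.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    base, extra = divmod(height, workers)
    bounds = []
    start = 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def rasterize_stripe(frame: Frame, start: int, end: int,
                     shade: Callable[[int, int], int]) -> None:
    """Hypothetical stand-in for the rasterizer: fill rows [start, end)."""
    for y in range(start, end):
        row = frame[y]
        for x in range(len(row)):
            row[x] = shade(x, y)


def render_frame(width: int, height: int, workers: int,
                 shade: Callable[[int, int], int]) -> Frame:
    """Render one frame using one horizontal stripe per worker thread."""
    frame = [[0] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(rasterize_stripe, frame, s, e, shade)
                for s, e in stripe_bounds(height, workers)]
        for job in jobs:
            job.result()  # re-raise any exception from a worker
    return frame
```

For a 1080-row stage and two workers, `stripe_bounds(1080, 2)` returns `[(0, 540), (540, 1080)]`, i.e. the top and bottom halves described above.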

How much of a performance improvement will you see? That depends. Unlike with a 3D ray tracer, the workload is not completely independent, and there is some additional overhead for handling multiple threads. As a rule of thumb, you get an additional 25-33% performance improvement per CPU. That does not sound like a lot, but keep in mind that the Flash Player still does some work which cannot be multi-threaded.
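One way to see why the gain per extra CPU is modest is Amdahl's law. This is a general textbook model, not how the numbers above were measured, and it ignores the threading overhead just mentioned. Under that model, a speedup S on n cores implies a parallelizable fraction p = (1 - 1/S) / (1 - 1/n); for two cores, a 25-33% gain corresponds to roughly 40-50% of the frame time being parallel.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def parallel_fraction_for(speedup: float, cores: int) -> float:
    """Invert Amdahl's law: the parallel fraction that yields `speedup` on `cores`."""
    if cores < 2:
        raise ValueError("need at least 2 cores to infer a parallel fraction")
    # From 1/S = (1 - p) + p/n it follows that p = (1 - 1/S) / (1 - 1/n).
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    for s in (1.25, 4 / 3):
        p = parallel_fraction_for(s, 2)
        print(f"{s:.2f}x on 2 cores -> p = {p:.2f}, "
              f"4 cores -> {amdahl_speedup(p, 4):.2f}x")
```

With p = 0.5, going from two to four cores lifts the ideal speedup only from about 1.33x to 1.6x, which is why the serial part of the player matters so much.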

The second area we focused on is bitmap filters. If you deal with large movie clips and apply lots of filters to them, you will see a nice performance jump. Again, the workload is split into horizontal stripes. The following filters are supported:
  • DropShadowFilter

  • GlowFilter

  • GradientGlowFilter

  • BevelFilter

  • GradientBevelFilter

  • BlurFilter

  • ConvolutionFilter

  • ColorMatrixFilter

  • DisplacementMapFilter
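Filters differ from plain rasterization in one respect: an output pixel depends on its neighbours, so each stripe must read a few rows beyond its own boundary. The sketch below uses a simple vertical box blur as a stand-in for the filters listed above; it is not the player's implementation, and the `radius` parameter and grayscale image layout are assumptions. Writing into a separate `dst` is what makes the overlapping reads safe; an in-place filter would have to copy each stripe's border rows first.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Image = List[List[int]]


def blur_rows(src: Image, dst: Image, start: int, end: int, radius: int) -> None:
    """Vertical box blur of output rows [start, end).

    Reads input rows start - radius .. end - 1 + radius (clamped to the image
    edge), so neighbouring stripes overlap on the input side, while each
    worker writes only its own rows of `dst`.
    """
    height, width = len(src), len(src[0])
    size = 2 * radius + 1
    for y in range(start, end):
        for x in range(width):
            total = 0
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), height - 1)  # clamp to the edge
                total += src[yy][x]
            dst[y][x] = total // size


def blur_striped(src: Image, radius: int, workers: int) -> Image:
    """Blur `src` with one horizontal stripe per worker; `src` is not modified."""
    height, width = len(src), len(src[0])
    dst = [[0] * width for _ in range(height)]
    step = max(1, -(-height // workers))  # ceiling division: rows per stripe
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(blur_rows, src, dst, s, min(s + step, height), radius)
                for s in range(0, height, step)]
        for job in jobs:
            job.result()
    return dst
```

Because every stripe reads the same unmodified source, the striped result is identical to a single-threaded pass; only the work is divided.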


Third, compositing and color transformations of bitmap-cached objects are also accelerated. The BitmapData APIs are NOT multi-threaded yet, hopefully we get around doing this at some point.
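Color transformations are the easy case for striping: each output pixel depends only on the same input pixel, so stripes can be processed in place with no overlap at all. A minimal sketch, assuming 8-bit RGB tuples and the multiplier-plus-offset form of ActionScript's `ColorTransform` (new = old * multiplier + offset, clamped to 0..255). `ColorTransformSketch` is a hypothetical Python stand-in, not the player's code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]


@dataclass(frozen=True)
class ColorTransformSketch:
    """Hypothetical per-channel multiplier/offset (new = old * mul + off)."""
    r_mul: float = 1.0
    g_mul: float = 1.0
    b_mul: float = 1.0
    r_off: float = 0.0
    g_off: float = 0.0
    b_off: float = 0.0


def _clamp(v: float) -> int:
    """Truncate to an int and clamp to the 8-bit range."""
    return max(0, min(255, int(v)))


def transform_rows(pixels: List[List[Pixel]], start: int, end: int,
                   ct: ColorTransformSketch) -> None:
    """Apply `ct` in place to rows [start, end); every pixel is independent."""
    for y in range(start, end):
        row = pixels[y]
        for x, (r, g, b) in enumerate(row):
            row[x] = (_clamp(r * ct.r_mul + ct.r_off),
                      _clamp(g * ct.g_mul + ct.g_off),
                      _clamp(b * ct.b_mul + ct.b_off))


def apply_striped(pixels: List[List[Pixel]], ct: ColorTransformSketch,
                  workers: int) -> List[List[Pixel]]:
    """Transform `pixels` in place, one horizontal stripe per worker."""
    height = len(pixels)
    step = max(1, -(-height // workers))  # ceiling division: rows per stripe
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(transform_rows, pixels, s, min(s + step, height), ct)
                for s in range(0, height, step)]
        for job in jobs:
            job.result()
    return pixels
```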

Fourth, VP6 video decoding now happens in a separate thread, independent of rendering and post-processing of the video. In most cases that means a dramatic improvement in performance, especially if your current content hit a performance limit even though you had a dual-core machine. On my Athlon X2 4800 I can now play a true 1920x1080 video at 30 frames/sec without a single dropped frame. Really cool if you ask me. It also gave me a reason to buy a new toy: a Canon HV20, which can do 1080p at 24 frames/sec (the actual pixel resolution is 1440x1080). Perfect for the web. If only these video files did not clog up my drive; they are really huge...
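Running the decoder on its own thread is essentially a producer/consumer pipeline. A minimal sketch, assuming hypothetical `decode` and `present` callbacks: a background thread decodes packets into a small bounded queue while the calling thread post-processes and presents frames, so a slow frame on one side does not immediately stall the other. The queue size of 4 is an arbitrary assumption.

```python
import queue
import threading
from typing import Any, Callable, Iterable

_END = object()  # unique end-of-stream marker, cannot collide with a frame


def _decode_loop(packets: Iterable[Any], frames: "queue.Queue[Any]",
                 decode: Callable[[Any], Any]) -> None:
    """Decoder thread: turn compressed packets into frames."""
    try:
        for packet in packets:
            frames.put(decode(packet))  # blocks while the queue is full
    finally:
        frames.put(_END)  # also sent if decode raises, so play() never hangs


def play(packets: Iterable[Any], decode: Callable[[Any], Any],
         present: Callable[[Any], None], max_buffered: int = 4) -> int:
    """Decode on a background thread; post-process and present on this one."""
    frames: "queue.Queue[Any]" = queue.Queue(maxsize=max_buffered)
    decoder = threading.Thread(target=_decode_loop,
                               args=(packets, frames, decode), daemon=True)
    decoder.start()
    shown = 0
    while True:
        frame = frames.get()
        if frame is _END:
            break
        present(frame)  # e.g. post-processing and blitting
        shown += 1
    decoder.join()
    return shown
```

The bounded queue keeps the decoder from running arbitrarily far ahead of playback and caps memory use.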

Why is the Sorenson Spark codec not multi-threaded? Because of its design: it does not have a dual YUV buffer setup, which is what allows us to separate blitting from video decoding.
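The idea behind a dual buffer setup can be sketched as classic double buffering: the decoder fills a back buffer while the blitter reads the front buffer, and the two are swapped once a frame is complete. `DoubleBuffer` below is a hypothetical illustration, not how VP6 or Spark are actually structured; for simplicity the blitter copies the front buffer under a lock so that a swap can never happen halfway through a read.

```python
import threading


class DoubleBuffer:
    """Two frame buffers: the decoder fills `back` while the blitter reads `front`."""

    def __init__(self, size: int) -> None:
        self._buffers = [bytearray(size), bytearray(size)]
        self._front = 0
        self._lock = threading.Lock()

    def back(self) -> bytearray:
        # Only the decoder thread writes here and only it calls swap(),
        # so it can decode into this buffer without holding the lock.
        return self._buffers[1 - self._front]

    def swap(self) -> None:
        # Called by the decoder once a complete frame is in the back buffer.
        with self._lock:
            self._front = 1 - self._front

    def read_front(self) -> bytes:
        # Called by the blitter; the copy happens under the lock.
        with self._lock:
            return bytes(self._buffers[self._front])
```

A codec with only a single frame buffer cannot do this: decoding the next frame would overwrite the one still being displayed.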

Since I am already talking about VP6, let me mention some of the more general improvements in VP6:

  • I added a new Gaussian noise filter which is turned on when post-processing is active. It lessens the effect of blockiness and makes videos look more like movies. If you are turned off by this change, please let me know as soon as possible. In most cases it is seen as an improvement, but you designers should tell us whether things still look okay.

  • We fixed a long-standing issue with VP6's automatic post-processing selection on Athlon X2 class CPUs. It never worked correctly on these machines because of a broken CPU driver on stock installs of Windows which affects the QueryPerformanceCounter Win32 call. Incidentally, the new versions of QuickTime and the Real Media Player seem to have problems related to this API; playback stutters heavily on my machine with some content because of it. I usually work around this problem by setting the process affinity to a single CPU in the Task Manager. That's what you get for not testing on AMD CPUs :-)

  • Entropy decoding and some other parts of the decoder should be faster; you should see a general 10% improvement in CPU usage.
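The post does not say how the new noise filter is parameterized, so the following only illustrates the general idea: add zero-mean Gaussian noise to the decoded luma (Y) plane and clamp to the 8-bit range, which breaks up the visible edges of flat blocks. The plane layout (list of rows), the `sigma` strength and the `seed` parameter are all assumptions, not the player's settings.

```python
import random
from typing import List, Optional

Plane = List[List[int]]


def add_gaussian_noise(luma: Plane, sigma: float = 2.0,
                       seed: Optional[int] = None) -> Plane:
    """Return a copy of an 8-bit luma plane with zero-mean Gaussian noise added.

    `sigma` is in 8-bit code values; results are rounded and clamped to 0..255.
    """
    rng = random.Random(seed)  # seeded generator for reproducible output
    out = []
    for row in luma:
        out.append([max(0, min(255, round(v + rng.gauss(0.0, sigma))))
                    for v in row])
    return out
```

Keeping sigma small keeps the average brightness unchanged while adding just enough grain to hide block boundaries.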

24 Comments:

Blogger Lincoln said...

This is such great news! My development team has been pushing Flash to the edge with some projects and we're dying to get multicore Flash.

Tuesday, June 12, 2007 2:34:00 PM  
Anonymous John Nack said...

w00t! Great stuff, Tinic.

Tuesday, June 12, 2007 9:57:00 PM  
Blogger Nate Chatellier said...

Ah, the sweetness of Flash taking advantage of multiple cores (even if only partially)...

So I'm assuming single-core Hyper-Threaded CPUs should see an improvement as well?

Wednesday, June 13, 2007 11:45:00 AM  
Anonymous henrique matias said...

:D

niceeeeeeeeeee

Wednesday, June 13, 2007 9:30:00 PM  
Anonymous henrique matias said...

:D

niceeeeeee

Wednesday, June 13, 2007 9:31:00 PM  
Anonymous UnitZeroOne said...

Hi Tinic.

On the rasterizing split: you are telling us that it splits horizontally. Doesn't it make more sense to split based upon the aspect ratio of the displayed movie? Or am I totally off here?

Thursday, June 14, 2007 6:32:00 AM  
Anonymous Anthony said...

Is the Gaussian noise filter you mention automatic on certain processors (I'm not seeing it on an Intel Mac Core 2 Duo)?

Or does it need to be enabled? If so, where do I do this: in the video encoding or in Flash?

I've been manually applying filters & noise to try to achieve this, so if it's now built in, that will be great news!

Thursday, June 14, 2007 3:03:00 PM  
Anonymous dries said...

great job!!

Friday, June 15, 2007 6:04:00 AM  
Blogger Julian said...

Well, 720p and 1080p support sounds great to me; I really have to test it on my Athlon 64 3800+ soon.

My question to Tinic is: will you enhance the audio capabilities of FLV? As far as I can see right now (at least Flix Pro doesn't offer me more), the only audio codecs FLV supports are WAV and MP3.

Since WAV is simply unusable when it comes to file sizes, I am asking whether there are plans to include a third audio format, or whether a third one is already included. I am mainly asking because I'd also be interested in actual 5.1 (or even 6.1 or 7.1) sound in videos. So far this would only be possible using WAV audio, but then you have nearly as much audio data as video data (possible codecs might include, but are not limited to, Vorbis, Dolby Digital (AC3) and AAC).

Thanks!

Monday, June 18, 2007 11:09:00 PM  
Blogger jfbaro said...

Great news!

Any ideas about an ARM version of FP9? I know we can run Flash 7 movies on ARM, but the performance is horrendous!! We ran some benchmarks, and performance on ARM (520 MHz) is around 4 times slower than on x86 (600 MHz). I know ARM has no FPU, but I think 4 times slower is just too much of a difference.

Thanks

Thursday, June 21, 2007 6:44:00 AM  
Anonymous Anonymous said...

I'm having to post this from a Win32 box as there's no Flash Player for my OS of choice/work, FreeBSD.

84% market penetration?
Port the thing, it should be trivial.
You'll make 85% then!

Wednesday, June 27, 2007 11:41:00 AM  
Anonymous Anonymous said...

It almost sounds pathetic to ask for another feature now that you have added so much cool stuff in a short while, but I wondered anyway :)

Would it be possible to add a property to DisplayObjects so one can set the quality (anti-aliasing type) per DisplayObject instead of for the whole SWF?

This would be very handy performance-wise, as one could then, let's say, turn the global quality to low and set it to high only for the few DisplayObjects that need anti-aliasing.

For example, if all your art is made up of raster graphics, you don't need to set quality to high to make it look good. But once you'd like to rotate or scale one graphic, it would look jagged without anti-aliasing, so one could then apply anti-aliasing to that one DisplayObject for as long as needed.

Sunday, July 01, 2007 8:12:00 PM  
Anonymous Zbigniew L. said...

Nice to know multicore CPUs will be supported. However, I'm not sure it will be enough to fix the 99,99% CPU usage on modern processors like an Athlon64 3000+ with 1GB of twinbank memory.

First, I would like to thank you for all the improvements the Flash plugin for Linux has received recently. Flash plugin 9 was a great improvement over Flash 7. However, when I see the Flash plugin (up to and including 9,0,60,120) using 99,99% of CPU (Athlon64 3000+, 2x512MB twinbank DDR400, SB Audigy2, GeForce 6150, distro compiled from sources with only X and Firefox 2), I want to scream.

Flash uses the following AV techniques to play a SWF:
  • 2D graphics made from drawn geometric figures and text
  • Video, e.g. the clips popular on YouTube
  • Audio: one, two or several sounds mixed together

Here I would like to present a few popular, free, open-source technologies used on Linux to make audio/video applications usable.
(Of course I will not mention such powerful techniques as shaders, which are probably hard to access from a web plugin.)

2D graphics (Rasterization):
The best option for the Flash plugin is the cairo library http://www.cairographics.org/, a cross-platform, high-quality drawing/printing library for Windows, Mac and Linux. Cairo implements the PDF 1.4 imaging model for high-quality drawing and printing.
A future Firefox (3?) is going to use it on all these platforms to render pages on screen (display) and on paper (printing). This would allow easier integration of Flash content with page content. Cairo's operations include stroking and filling cubic Bézier splines, transforming and compositing translucent images, and antialiased text rendering. The cairo design is portable and allows pluggable drawing backends such as image, glitz (accelerated OpenGL output), png, ps, PDF, svg, quartz (accelerated on Mac OS), GDI+ (Windows 2K/XP acceleration) and xlib (Linux X11 XRender acceleration).
If the Flash plugin cannot use cairo on Linux, at least provide optional, disabled-by-default XRender acceleration so Flash will not eat 99,99% of CPU on machines where the cairo library is present.

Video playback:
http://labs.adobe.com/technologies/flashplayer9/releasenotes.html#known
"Hardware scaling is not available on Linux."
This is not true. There is the well-known, universal, very popular and widely used X-Video X11 extension, supported by most closed and open source GPU drivers.
X-Video provides high-quality, ultrafast, fully GPU-accelerated scaling and colorspace conversion that takes no CPU time.
Thanks to the X-Video (Xv) extension, video playback on Linux/BSD is smooth, beautiful, fluent, fast and takes almost no CPU resources.
There are no copyright/piracy issues, as X-Video writes the video stream to a video overlay which bypasses all capturing software.
Capture software just captures a blue box where the user sees the video playing. Because it bypasses several software layers and talks directly to the GPU driver, X-Video is the fastest method of displaying video on X11-based screens (Linux/BSD/*nix).
The Flash plugin could provide optional, disabled-by-default X-Video acceleration that could be turned on where X-Video is available, so Flash will not eat 99,99% of CPU. About 80% of Linux/BSD/*nix machines should be Xv capable.

Audio Playback/Mixing:
ALSA offers hardware mixing on capable audio chips and dmix software mixing on ordinary onboard codecs. The Flash plugin could offload the CPU by feeding all of its sound streams to ALSA. This way ALSA will use hardware mixing where available (an SB Audigy2 can mix up to 32 sounds in hardware); otherwise dmix software mixing will be used. This is still faster than Flash's internal mixing because ALSA is usually compiled with CPU-specific optimizations. Could you please provide an option to perform mixing using ALSA's hardware-accelerated mixing? This would improve sound quality and free my CPU from doing this task.

It would be nice to have all these accelerations optional and disabled by default, so nothing changes for ordinary users while power users could get higher-quality audio and video with less CPU use. These accelerations are much better in terms of performance and quality than a multicore design because all of them are performed by a dedicated GPU/DSP.

Saturday, July 14, 2007 6:38:00 PM  
Anonymous Anonymous said...

Can anyone help me reverse the JPEG encoder class?
JPEG binary to bitmap?

Tuesday, July 17, 2007 1:57:00 AM  
Blogger Nick said...

Please, please, please give us hardware acceleration, not just to open the way for real 3D but to help with our 2D bitmap movement projects. Tearing is just not acceptable anymore; it has to go, and full screen everything has to be an option. For our sakes, for your sakes (who wants Silverlight to win?), for the sake of humanity!!!!

Monday, July 30, 2007 4:58:00 AM  
Anonymous Manuel said...

Hi Tinic,
I'm simulating a carpet, and the rendering process involves a post-processing stage where I apply a bloom-like filter. That works fine in Flash Player v9.0.45/47 but shows some glitches in v9.0.60.120, like a vertical bar on the left of the final composite image.
Has this already been reported?
The demo is at http://manuel.bit-fire.com/2007/08/10/the-magic-carpet/

Friday, August 10, 2007 5:55:00 PM  
Anonymous Manuel said...

Hi Tinic,
have you had any chance to look at the bar thing? I haven't had the time to track it down yet, but I released the source code since it could easily be my own fault; I'm just a two-week AS3 novice ;-)

Sunday, August 12, 2007 8:05:00 AM  
Anonymous Anonymous said...

You realize that taking up one side of a modern dual-core CPU to software-render punch-the-monkey ads is really laughable? I get hundreds of FPS on a single core with a GPU in 3D mode all day. To take up both cores, well, that's a DoS attack.

I think Flash is simply ignoring the standard APIs that modern hardware accelerates quite well and software-rendering everything, and that is so old hat.

Tuesday, August 14, 2007 11:04:00 PM  
Anonymous Anonymous said...

"The BitmapData APIs are NOT multi-threaded yet, hopefully we get around doing this at some point."

Arrgh! Performance is one of the main reasons to use the BitmapData APIs! _Please_ someone fix this to support multicore! Or just add hardware acceleration ;)

Also, I'd be interested to know how the horizontal split for vector rendering interacts with a heavy BitmapData workload. It would be nice if one or more CPU cores rendered vectors while the remaining ones crunched the BitmapData work, but I somehow doubt that is how it works.

thanks!

Thursday, December 13, 2007 12:53:00 AM  
Blogger Ben said...

I've searched the internet for days trying to find practical methods for MT in Flash, and there really isn't anything.

I had almost resorted to writing some kind of stub SWF for loading "special" SWFs across various injected embeds, and managing the whole thing with LocalConnections... I'm either giving away the key to the next Ajax-esque webvolution with this or inciting a nightmare; either way, it should never have been necessary.

So perhaps Macrodobe might take a little notice before I'm forced to have my team reverse-engineer Flash 9 and build a real RTE.


Flash *NEEDS* ( and I say needs, because not having these things 5 years ago was unacceptable ):

1) To use hardware. GFX pipelines exist for more than just gamers, and contrary to popular belief, 3D hardware penetration exceeds that of any Flash Player.

2) Threads. It's just that simple. Threading doesn't mean multi-core... threading means what it meant when I wrote MT apps for Socket 7 processors and they completely outperformed their ST equivalents.

...and last but not least...

3) FullScreen, without chastity. The whole keyboard-blocking thing is ridiculous. Why not roll that awful security dialog into a desktop equivalent, remove the ability to go fullscreen from a "click", and have the user confirm in a "The flash file at [domain] would like to enter fullscreen mode... [x] do not ask for this domain..." dialog?


I mean really, do Adobe devs actually use computers, or is it all guesswork?

Sunday, December 16, 2007 1:55:00 PM  
Blogger Slawomir said...

We've created a Flash project that plays up to 4 different video clips side by side. The movies are encoded with On2 VP6 (2k bitrate, 680x768 px). On a quad-core machine (Q9300) it initially uses all four cores evenly and playback is quite smooth. After a few loops of the Flash movie, one of the cores seems to take over all the processing and playback becomes jerky (the remaining three cores do next to nothing at that point).
Has anyone come across this problem?

Monday, June 09, 2008 4:30:00 PM  

Post a Comment

<< Home