Monday, August 29, 2005

Porting the Flash Player to 'alternative' platforms

Update (10/31/06): I still see tons of people reading this old entry. Please read this for up to date information. A beta for Linux is now available.

Now that Flash Player 8 is soon to be released for Windows and OS X, I am sure we'll get plenty of questions about support for Unix, probably mostly for Linux. No, we haven't forgotten that we need to support it. It WILL happen.

But to tell you the truth, I am not even happy with the state of Flash Player 7 on Linux. So much so that I use the Windows version along with Wine to get Flash content to display on my Linux boxen. So rather than just do a quick and dirty port for Flash Player 8, I would really like to get it done right. The kicker is this: It's damn hard. Much harder than even supporting OS X. While porting command line applications to Unix is beyond trivial, porting media applications (which the Flash Player essentially is) is a real nightmare. This starts with sound support, where we have to support many different sound standards (ALSA, OSS, aRts, ESD etc.), framework support (X11, Qt, GTK, for copy & paste support for example), IMEs (is there such a thing?), font support, which is almost beyond comprehension, and many other quirks and forked 'standards'. And how about support for PowerPC and x86-64? All of this needs to be done to get something acceptable on Linux IMO.

Another problem is that Linux's main compiler is gcc, which means that ALL of our MMX code, which is written in Intel notation, will not compile. MMX is what makes the Flash Player reasonably fast. Having to use plain C code automatically means that your rendering performance will be cut by at least 50%. I am really not kidding here. So why is it so difficult to port performance optimizations? Let me walk through an example that shows the effort required to support a plethora of platforms.

WARNING! I will go into some low level assembly code here, so you might want to skip this ;-)

Disclaimer: The code here in this post was totally made up in a few minutes and is NOT in the Flash Player. I am not even sure any of this code does the right thing.

So, assume a simple generic compositing function which is used if the target platform is unknown:

typedef struct argb {
    unsigned char alpha;
    unsigned char red;
    unsigned char green;
    unsigned char blue;
} argb;

#ifdef GENERIC
void compositeARGB(argb *src, argb *dst, int n) {
    for ( int i=0; i < n; i++ ) {
        // Weight kept from the destination: 256 if src is transparent, 1 if opaque.
        int a = 256 - src->alpha;
        dst->alpha= ( (dst->alpha* a ) >> 8 ) + src->alpha;
        dst->red = ( (dst->red * a ) >> 8 ) + src->red ;
        dst->green= ( (dst->green* a ) >> 8 ) + src->green;
        dst->blue = ( (dst->blue * a ) >> 8 ) + src->blue ;
        src++;
        dst++;
    }
}
#endif // GENERIC
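
To make the arithmetic concrete, here is a minimal per-channel sketch of the same formula. It is only an illustration: blend_channel is a made-up name, and it assumes the source color is premultiplied by its alpha (the post does not say so, but the sum only stays within 0..255 under that assumption).

```c
#include <assert.h>
#include <stdint.h>

/* dst' = dst * (256 - src_alpha) / 256 + src, the formula used above.
 * Hypothetical helper; assumes src is premultiplied, i.e. src <= src_alpha. */
uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t src_alpha) {
    assert(src <= src_alpha);        /* premultiplied input */
    unsigned a = 256u - src_alpha;   /* 1..256: weight kept from dst */
    return (uint8_t)(((dst * a) >> 8) + src);
}
```

With src_alpha = 0 the destination passes through unchanged (weight 256/256). With src_alpha = 255 the destination is scaled by 1/256, which truncates to 0, so the result is just the source value. Using 256 instead of 255 is what lets the division become a shift by 8.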

Now for 32-bit machines we would modify this function the following way to get a drastic performance increase (50-60% faster, on PowerPC even more). Several things to notice:

  • We use array access instead of doing a dst++ and src++, since some compilers generate better code for it.

  • We do not access components on a byte-by-byte basis, since that is extremely slow. Instead we load a whole pixel as one 32-bit word and scale two channels with each multiply.

#ifdef TARGET_32BIT
void compositeARGB(argb *src, argb *dst, int n) {
    for ( int i=0; i < n; i++ ) {
        int a = 256 - src[i].alpha;

        // Split the destination into two channel pairs, each with 8 spare bits above it.
        unsigned int dr = *((unsigned int *)&dst[i]);
        unsigned int dl = ( dr >> 8 ) & 0x00FF00FF;
        dr &= 0x00FF00FF;

        unsigned int sr = *((unsigned int *)&src[i]);
        unsigned int sl = sr & 0xFF00FF00;
        sr &= 0x00FF00FF;

        // Scale two channels per multiply.
        dr = ( ( dr * a ) >> 8 ) & 0x00FF00FF;
        dl = ( dl * a ) & 0xFF00FF00;

        dr += sr;
        dl += sl;

        *((unsigned int *)&dst[i]) = dr | dl;
    }
}
#endif // TARGET_32BIT
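
The two-channels-per-multiply trick is easier to check in isolation. The following is a minimal sketch, not Flash Player code: composite_packed is a made-up name, and it assumes a pixel packed as 0xAARRGGBB in a uint32_t with premultiplied alpha, which sidesteps the pointer casts and the byte-order question of the struct version.

```c
#include <stdint.h>

/* Composite one premultiplied 0xAARRGGBB pixel over another (assumed layout). */
uint32_t composite_packed(uint32_t src, uint32_t dst) {
    uint32_t a  = 256u - (src >> 24);          /* 1..256: weight kept from dst */
    uint32_t rb = dst & 0x00FF00FFu;           /* R and B, 8 spare bits above each */
    uint32_t ag = (dst >> 8) & 0x00FF00FFu;    /* A and G moved into the same lanes */

    /* channel * a is at most 255 * 256, so it never spills into the next lane. */
    rb = ((rb * a) >> 8) & 0x00FF00FFu;        /* scale R and B with one multiply */
    ag = (ag * a) & 0xFF00FF00u;               /* scale A and G; already in place */

    /* Premultiplied input keeps every channel sum <= 255, so no carry crosses lanes. */
    return (rb | ag) + src;
}
```

For example, compositing a half-transparent red 0x80640000 over an opaque blue 0xFF0000C8 yields 0xFF640064: the blue channel is halved from 200 to 100, and the source's red 100 is added on top.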

Let's continue with an MMX version specifically made to compile using Microsoft Visual Studio. Please note that we can't use intrinsics on x86 since they generate extremely slow code (also see the 3/17/2004 entry here which explains the problems). This version here is about 4 times faster than the original generic C code:

void compositeARGB(argb *src, argb *dst, int n) {

    const __int64 a256 = 0x0100010001000100;

    // Assumes n > 0.
    _asm {
        mov edi, dst
        mov esi, src
        mov ecx, n
        pxor mm7, mm7           ; mm7 = 0, used to widen bytes to words
        movq mm6, a256          ; mm6 = 256 in each word

    next_pixel:
        movd mm0, [esi]
        punpcklbw mm0, mm7

        movd mm1, [edi]
        punpcklbw mm1, mm7

        movq mm2, mm0
        pshufw mm2, mm2, 0      ; broadcast the source alpha
        movq mm3, mm6
        psubusw mm3, mm2        ; mm3 = 256 - alpha

        pmullw mm1, mm3
        psrlw mm1, 8
        paddw mm1, mm0
        packuswb mm1, mm7
        movd [edi], mm1

        add esi, 4
        add edi, 4
        sub ecx, 1
        jnz next_pixel

        emms                    ; leave MMX state so x87 code works again
    }
}

But this won't compile under Linux, which means we would have to revert to the second version, which is much slower. Under Linux we need to rewrite this using AT&T notation. So here we go:

#ifdef TARGET_MMX_GCC
void compositeARGB(unsigned int *src, unsigned int *dst, int n) {
    unsigned int a256[2];
    a256[0] = 0x01000100;
    a256[1] = 0x01000100;
    // 32-bit x86 only; assumes n > 0.
    asm volatile ("pxor %%mm7,%%mm7\n\t"
                  "movq (%3),%%mm6\n\t"
                  "1:\n\t"
                  "movd (%0),%%mm0\n\t"
                  "punpcklbw %%mm7,%%mm0\n\t"

                  "movd (%1),%%mm1\n\t"
                  "punpcklbw %%mm7,%%mm1\n\t"

                  "movq %%mm0,%%mm2\n\t"
                  "pshufw $0,%%mm2,%%mm2\n\t"
                  "movq %%mm6,%%mm3\n\t"
                  "psubusw %%mm2,%%mm3\n\t"

                  "pmullw %%mm3,%%mm1\n\t"
                  "psrlw $8,%%mm1\n\t"
                  "paddw %%mm0,%%mm1\n\t"
                  "packuswb %%mm7,%%mm1\n\t"
                  "movd %%mm1,(%1)\n\t"

                  "addl $4,%0\n\t"
                  "addl $4,%1\n\t"
                  "subl $1,%2\n\t"
                  "jnz 1b\n\t"
                  "emms\n\t"
                  : "+r" (src), "+r" (dst), "+r" (n)
                  : "r" (a256)
                  : "memory", "cc", "mm0", "mm1", "mm2", "mm3", "mm6", "mm7");
}
#endif // TARGET_MMX_GCC

What? All that work just to compile under GCC? This means we need to port thousands of lines of code to support AT&T notation (Yes, I know binutils 2.1 now has Intel notation support, although some changes are still required. In theory we could use the Intel compiler also, but if we want to ship this code as an SDK it will be a problem).

Next, let's look at what is needed to support PowerPC and AltiVec to get decent performance on that platform. Here is the AltiVec version of the compositing routine from above:

void compositeARGB(argb *src, argb *dst, int n) {
    // Scalar loop until dst is 16-byte aligned.
    while (((long)dst & 0xF) && (n > 0)) {
        int a = 256 - src->alpha;
        dst->alpha= ( (dst->alpha* a ) >> 8 ) + src->alpha;
        dst->red = ( (dst->red * a ) >> 8 ) + src->red ;
        dst->green= ( (dst->green* a ) >> 8 ) + src->green;
        dst->blue = ( (dst->blue * a ) >> 8 ) + src->blue ;
        src++;
        dst++;
        n--;
    }

    vector unsigned char s0;
    vector unsigned char s1 = vec_ld (0, (unsigned char *) src);
    vector unsigned char s2;

    const vector unsigned char aPr01 = (vector unsigned char)
        (31, 0, 31, 0, 31, 0, 31, 0, 31, 4, 31, 4, 31, 4, 31, 4);
    const vector unsigned char aPr23 = (vector unsigned char)
        (31, 8, 31, 8, 31, 8, 31, 8, 31, 12, 31, 12, 31, 12, 31, 12);
    const vector unsigned short v256 = (vector unsigned short) (256);
    const vector unsigned char dPr = (vector unsigned char)
        (0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30);
    const vector unsigned char sPr = vec_lvsl (0, (unsigned char *) src);

    // Four pixels per iteration; src may be unaligned, so it is realigned with sPr.
    while ( n >= 4 ) {

        s2 = vec_ld (16, (unsigned char *) src);
        s0 = vec_perm (s1, s2, sPr);
        s1 = s2;

        vector unsigned char d0 = vec_ld (0, (unsigned char *) dst);

        vector unsigned short a01 = (vector unsigned short)
            vec_perm (s0, (vector unsigned char) (0), aPr01);
        vector unsigned short a23 = (vector unsigned short)
            vec_perm (s0, (vector unsigned char) (0), aPr23);

        vector unsigned short s01 = (vector unsigned short)
            vec_mergeh (s0, (vector unsigned char) (0));
        vector unsigned short s23 = (vector unsigned short)
            vec_mergel (s0, (vector unsigned char) (0));

        vector unsigned short d01 = (vector unsigned short)
            vec_mergeh ((vector unsigned char) (0), d0);
        vector unsigned short d23 = (vector unsigned short)
            vec_mergel ((vector unsigned char) (0), d0);

        a01 = vec_sub (v256, a01);
        a23 = vec_sub (v256, a23);

        // d * a + (s << 8); dPr then keeps only the high byte of each result.
        d01 = vec_mladd (d01, a01, s01);
        d23 = vec_mladd (d23, a23, s23);

        vec_st ((vector unsigned char)
            vec_perm (d01, d23, dPr), 0, (unsigned char *) dst);

        dst += 4;
        src += 4;
        n -= 4;
    }

    // Scalar loop for the remaining pixels.
    while (n-- > 0) {
        int a = 256 - src->alpha;
        dst->alpha= ( (dst->alpha* a ) >> 8 ) + src->alpha;
        dst->red = ( (dst->red * a ) >> 8 ) + src->red ;
        dst->green= ( (dst->green* a ) >> 8 ) + src->green;
        dst->blue = ( (dst->blue * a ) >> 8 ) + src->blue ;
        src++;
        dst++;
    }
}

If you can understand any of this, kudos. Writing AltiVec code is even worse than writing MMX code IMO. They had the best intentions when they designed AltiVec, but code bloat and maintenance are simply a nightmare.

But we are not done yet!!! Let's say we want to port Flash Player 8 to Microsoft Windows XP 64-bit edition (which is also on the TODO list for the Flash Player). This version of Windows does not support MMX, only SSE1/SSE2/SSE3 and above. That means we need to write another version specifically for this OS. Also, Microsoft Visual Studio does not support inline assembly here, which means we need to use intrinsics despite the fact that they generate slow code:

#ifdef TARGET_SSE2
void compositeARGB(argb *src, argb *dst, int n) {

    const __m128i zero = _mm_setzero_si128();
    const __m128i a256 = _mm_set1_epi16(0x0100);

    // Two pixels (8 bytes) per iteration, widened to eight 16-bit words.
    for ( ; n > 1; n -= 2) {
        __m128i m0 = _mm_unpacklo_epi8(*((__m128i *)src),zero);
        __m128i m1 = _mm_unpacklo_epi8(*((__m128i *)dst),zero);
        // Broadcast each pixel's alpha across its own four words.
        __m128i m2 = _mm_shufflehi_epi16(_mm_shufflelo_epi16(m0,0),0);
        __m128i m3 = _mm_subs_epu16(a256,m2);

        m1 = _mm_mullo_epi16(m1,m3);
        m1 = _mm_srli_epi16(m1,8);
        m1 = _mm_add_epi16(m1,m0);
        m1 = _mm_packus_epi16(m1,zero);
        // Store only the two pixels we computed.
        _mm_storel_epi64((__m128i *)dst, m1);

        src += 2;
        dst += 2;
    }

    for ( ; n > 0 ; n-- ) {
        int a = 256 - src->alpha;
        dst->alpha= ( (dst->alpha* a ) >> 8 ) + src->alpha;
        dst->red = ( (dst->red * a ) >> 8 ) + src->red ;
        dst->green= ( (dst->green* a ) >> 8 ) + src->green;
        dst->blue = ( (dst->blue * a ) >> 8 ) + src->blue ;
        src++;
        dst++;
    }
}
#endif // TARGET_SSE2

(This is actually not quite complete and will crash hard: dereferencing a __m128i pointer requires a 16-byte aligned address, and neither the source nor the destination pointer is guaranteed to be aligned.)
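
One way around the alignment requirement mentioned above is to use loads and stores that have no alignment requirement at all. The following is a minimal sketch under stated assumptions, not Flash Player code: the function names are made up, pixels are raw bytes in memory order A, R, G, B with premultiplied alpha, and the SSE2 path is only compiled when the compiler defines __SSE2__ (otherwise everything goes through the scalar loop).

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference; also handles the odd pixel left over by the SSE2 loop. */
static void composite_scalar(const uint8_t *src, uint8_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++, src += 4, dst += 4) {
        unsigned a = 256u - src[0];
        for (int c = 0; c < 4; c++)
            dst[c] = (uint8_t)(((dst[c] * a) >> 8) + src[c]);
    }
}

/* Hypothetical name; works for any src/dst alignment. */
void composite_sse2_unaligned(const uint8_t *src, uint8_t *dst, size_t n) {
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    const __m128i v256 = _mm_set1_epi16(256);
    for (; n >= 2; n -= 2, src += 8, dst += 8) {
        /* _mm_loadl_epi64 reads 8 bytes and does not require alignment. */
        __m128i s = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)src), zero);
        __m128i d = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)dst), zero);
        /* Broadcast each pixel's alpha (words 0 and 4) across its four words. */
        __m128i a = _mm_shufflehi_epi16(_mm_shufflelo_epi16(s, 0), 0);
        a = _mm_sub_epi16(v256, a);
        d = _mm_srli_epi16(_mm_mullo_epi16(d, a), 8);
        d = _mm_add_epi16(d, s);
        /* Pack back to bytes and write only the two pixels we computed. */
        _mm_storel_epi64((__m128i *)dst, _mm_packus_epi16(d, zero));
    }
#endif
    composite_scalar(src, dst, n);
}
```

The price is that unaligned accesses were noticeably slower than aligned ones on the processors of that era, so a tuned version would more likely align dst with a scalar prologue first, as the AltiVec routine does.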

Now imagine that the Flash Player has not only this one routine which needs to be optimized, but in fact dozens to hundreds of these and you see why getting a decent port is so complex.

If you bothered to read to this point it means that Macromedia wants you really bad. Really, really bad. :-) We've been looking for Linux gurus for a while (well, it has been for more than a year now without anyone being even close to what we need, I guess they all go to Google...) and for best results they would have to be in house. Collaboration with the team here on a daily basis as a long term full time employee will be key to get the best results. So apply right now for this job. Not only will you be able to do exciting work on Linux, you'll even be paid for it!


Anonymous John Dowdell said...

...? ;-)


Monday, August 29, 2005 8:10:00 PM  
Anonymous Nickolas Nikolic said...

I, in fact, read thus far as an educational exercise; I haven't stared into the heart of darkness (C+Assembly) for quite some time...

Wish it would earn brownie points for a janitor position in a company as beautiful as Macromedia. Does it? ;-)

Maybe - in between cleaning the bathrooms - I could help tinker with Dreamweaver's PHP templates so that it wasn't a second-class citizen to ColdFusion and its "more direct" competitors like

I'm sorry; have to hope, it's only human...

I wish you luck in recruitment of a Linux pro.

But this method might not fit the FOSS culture. These are developers that are not as likely on the Macromedia site, nor reading this blog.

Probably better would be to proposition the kernel developers directly. This would have the added bonus of being able to see just what their quality of work is, "try before you buy."

You may even be able to propose a solid salary offer in hand with this knowledge of their previous work since the motif of this kind of investigation is that all cards are already on the table...

Indirect recruitment à la Google only works for PhDs. Actually, I'm not totally sure that it works at all. It is probably better at recruiting a personality rather than skills. Pomp has its discriminatory value, but it likely isn't the right value.

Probably *best* would be to proposition a direct coordination with any large vendor that has a vested interest in making Linux the premier platform that it actually is... {IBM, Novell, Red Hat, KDE, Gnome} in much the same manner that Macromedia has tried to bring mobile manufacturers on board.

PS - by the way, charging to license the Flash Lite platform is ludicrous. Granted, it is part of the closed culture of mobile development, but that is reinforcement of the nature of the problem... It may be viewed as the introduction of a disruptive technology without providing fueling momentum...

Just my 2,860,694 cents... I'll stop forcing you to read my comment now; although, I appreciate the attention thus far... ;-)

Monday, August 29, 2005 8:27:00 PM  
Anonymous Nickolas Nikolic said...

On the drive home from where I had originally posted.

To begin, forgive me if what I am about to say is blasphemous:

It occurred to me that likely the best way to motivate activity on the Flash Player for Linux from all that the player benefits, be they individual developers, FOSS organizations (I previously forgot Debian...), or medium to large companies (I previously forgot Linspire...)

Please don't hunt me down for wondering this out-loud: Why not open source the Flash Player for Linux?

It would have clear benefit to all involved: Macromedia would keep an honestly litigable copyright, and gain added mindshare in the FOSS arena - not to mention added manpower with motivation for top-notch craftsmanship.

The FOSS community receives insight to the core technology on which many projects depend. Offhand, I can think of nearly immediate benefit for the following projects: Flash4Linux {I doubt that there will ever be a Linux port of Studio, this is an important project to encourage}, Open Flash Debugger, or Xray {thank God for these last two projects}.

On the other hand, there is a control issue present that is similar, though lesser, to Sun's objections to open sourcing Java. On the other hand, the platform would not be as strong in my eyes without the availability of JBoss, Tomcat, or other environments (I'll admit it, I'm web-centric.) which can be themselves considered forks of J2SEE - actually in this respect maybe even Macromedia's JRun can be considered a distant cousin of these FOSS projects.

And even still: whilst these freely available environments exist, the Earth, planets, and the Sun Microsystems still fail to end...

If this is a foolish thought, then please let it be: it is getting a bit late, and maybe I have missed an important point. But, it isn't an option that would naturally come to mind - even for a company like Macromedia, so I thought to mention it.

Forgive the double-post, but I am unable to edit the original post - spelling issues aside...

Monday, August 29, 2005 11:05:00 PM  
Anonymous mark said...

I understand the complexity to port on linux, but what about the Lite version on all those (different) mobile platforms? Is this (the complexity) also the reason why the video decoder is not ported within Flash Lite? I hope some day the decoder will go to the mobile inside Flash Lite, so there is just one content player needed on the mobile that does it all: the UI, the games, the video ....

Friday, September 02, 2005 12:21:00 PM  
Blogger James "Doc" Livingston said...

Have you looked at liboil? I don't know if it contains everything you need, but it might be a start.

Tuesday, September 06, 2005 10:55:00 PM  
Anonymous Anonymous said...

i seriously doubt the person described by the job profile actually exists. a person with strong assembler and optimization skills will usually not care a bit about the hassle of integrating into the various sound architectures, eg, and, of course, the other way round.

i'd suggest opening 5 positions:
* linux sound integration engineer
* linux video integration engineer
* assembler/optimization guru
* platform integration engineer
* one who organizes those

seeing such a job opening put a brake on my hopes for having fp8 on linux soon. bummer.

Wednesday, September 07, 2005 7:33:00 AM  
Anonymous Anonymous said...

Some remarks:

- as a previous commenter mentioned, there's liboil for "optimized inner loops" for different platforms.
- for the "incomprehensible font situation" look into fontconfig (MIT/X licensed) and pango (LGPL licensed). The last one can deal with font internationalization, layout, and rendering.
- ... rendering, with cairo for instance. It can work with Pango for text rendering, and you can use it for all your 2D graphics output. It will also use hardware acceleration if available (through X or opengl). It tries to provide device independence, so it's also meant to facilitate printing etc.
- IME: check out UIM and SCIM. There really is such a thing :).
- copy/paste: you don't absolutely need an entire toolkit like QT or GTK, you just need to conform to the freedesktop clipboard spec.
- sound: a bit harder. I'd say one option is to use an abstraction like OpenAL (a lot of game companies do this), or Gstreamer - these can target OSS, ALSA, whatever.

Now, practically all these libraries are cross platform, and licensed in a way that they can be used by closed-source apps. (Some of these are actually included with the system, like OpenAL on Apple)

Targeting these might save you a lot of porting grief.

Tuesday, September 20, 2005 7:34:00 PM  
Anonymous Anonymous said...

Just a thought on the compiler complaint: if gcc doesn't support the syntax you want to use, why not just use the intel c compiler?

There's really no law that says you have to use gcc to build a linux executable.

Friday, September 23, 2005 3:14:00 PM  
Blogger eriser said...

> Under Linux we need to rewrite this using AT&T notation

you can:
asm(".intel_syntax noprefix
mov eax, foo
....
);

Saturday, September 24, 2005 2:18:00 AM  
Anonymous JonTHn said...

Hmm and also it would be good to have more than native linux version what about a BSD .. ?

Saturday, September 24, 2005 7:22:00 AM  
Anonymous Anonymous said...

To complete the list of portable (multi-platform, multi-OS) libs that could help so much to have a portable and unified codebase:

- for the sound, you may have a look at SDL: it works well with arts, esd, alsa, oss, ... and even works on win32 and Mac OS X. It can also be used to perform some network, display etc. tasks "portably".

- for the sound, still: libao gives abstraction over alsa, arts, esd, oss, etc.

- if widgets are needed, fltk may be of use: it's tiny enough to be statically linked. As another commenter said, clipboard etc. compat with gtk and qt only needs you to conform to specs, not to actually use those toolkits!

And finally, a question: why isn't gcc (and AT&T asm syntax) considered first (rather than using msvc and porting to so many other compilers)? gcc can output working and optimised code (hey, intrinsics work well, there!) on all the platforms and archs you need (win32, linux, macosx, *bsd, and targeting ppc, x86, x86-64, arm ..., well i'm not sure about windows64 though).

Weren't you using gcc, already, for the Mac OS X version of flash player ?

Saturday, September 24, 2005 12:11:00 PM  
Blogger Tinic Uro said...

- SDL is not an option since it's released under LGPL. We can not link against anything which is released under GPL or LGPL. There is nothing I can do about it, as much as I'd like to; this is a very strict policy we have to follow.
- libao is GPL
- fltk is LGPL
- IMO gcc is the most difficult to support because of all its quirks. I could try to go into details, but when we did the port to Linux we had dozens of compiler bugs to work around. And no, the gcc optimizer is still not on par with Visual Studio or the Intel compiler. When I examined compiled intrinsic code I cried, it was so bad (that was with gcc 3.40 on OS X Tiger).
- We use(d) CodeWarrior on MacOS X since it produced much better and smaller PowerPC code. It is still unmatched when it comes to this. We have to switch soon since Metrowerks dropped support.

Saturday, September 24, 2005 4:21:00 PM  
Blogger Tinic Uro said...

> you can:
> asm(".intel_syntax noprefix
> mov eax, foo
> ....
> );

Yes, if you continue to read my entry you'll see that I mention binutils 2.1, which introduced support for this. But unfortunately it still requires a lot of changes. We might as well go intrinsics all the way.

Saturday, September 24, 2005 4:23:00 PM  
Blogger Tinic Uro said...

- liboil is nice, but not very well optimized. The YUV2RGB conversion routines f.ex. convert one pixel at a time. The functions I wrote convert 8 pixels at a time in the inner loop and are designed for scanline display which the Flash Player needs.
- We'll definitely look into fontconfig. pango is LGPL which is not an option for us.
- Designers will be up in arms if we change the rendering engine. Just one pixel difference is a big deal.

We'll check out anything open source we can use. As you can see we really need someone who knows Linux.

Saturday, September 24, 2005 4:33:00 PM  
Blogger Tinic Uro said...

> i'd suggest opening 5 positions:
> * linux sound integration engineer
> * linux video integration engineer
> * assembler/optimization guru
> * platform integration engineer
> * one who organizes those

We are not IBM or Microsoft, who can throw resources at this. 5 engineers are half the Flash Player engineering team right now. Really. I would love to have these kinds of resources, it would certainly make things easier :-)

Saturday, September 24, 2005 4:38:00 PM  
Anonymous Anonymous said...

For sound access, yet other libs might be considered:

- Allegro, MIT-like licence. Supports OSS, Alsa, aRts, ESD, JACK, windows (direct sound, waveout), mac os x (core audio, sound manager) ... and also provides some graphics routines.

- Portaudio, MIT licensed. Supports OSS, Alsa, jack, win directsound, mac core audio, etc., but doesn't provide access to aRts or ESD yet.

- Fmod, licence can be purchased, supports alsa, oss, win, mac, esd, ...

For a widget toolkit, wxWidgets may be of use, or you can buy a commercial qt licence (and maybe make a static binary so it'll run everywhere on the same platform).

Sunday, September 25, 2005 7:05:00 AM  
Anonymous Anonymous said...

hum.. why not just use good old .asm files for optimized functions? My company develops some audio/video software with highly optimized parts (usually four versions: C, 386, SSE, SSE2) for intel cpus under linux and windows. We just compile our asm files under windows and convert the binary obj files into linux format.

Sunday, September 25, 2005 11:12:00 AM  
Anonymous Anonymous said...

You *can* link against libSDL from a closed-source software. to comply to the LGPL, you can link against it dynamically.
See here:

Monday, September 26, 2005 2:29:00 AM  
Anonymous Anonymous said...

Are you sure your people aren't confused about the LGPL? (you don't link statically with OpenGL, do you? I think there are LGPL and closed implementations of OpenAL for instance)

... but I'll take your word for it.

In any case, for Pango for instance, you could try and see if Owen Taylor and the other contributors would be willing to relicense it, under MIT/X, BSD, or something similar. Worth a little try at least.

I understand your point about people freaking when the renderers are off by a pixel. (so, no Cairo I imagine?) I guess that does make it harder to take advantage of hardware acceleration in some cases; OpenGL is probably low-level enough though?

I'm really unsure why you'd need a toolkit (GTK/QT/FLTK/wxWidgets), all you need is input, and copy/paste, right?

In any case, thanks for the insight. I'm really hoping to see a proper flash plugin on Linux/PPC one day.

Oh, and kick the Flash IDE guys for me once. UI focus issues, MAC vs PC issues (like fonts) etc have been plaguing me and coworkers since Flash 3. The Flash player/plugin however, has always worked as advertised! ;-)

Monday, September 26, 2005 11:58:00 PM  
Anonymous Anonymous said...

Macromedia released Flash Player 7 for Linux some time ago.

Why doesn't Macromedia build 64-bit binaries from this source and release a 64-bit Flash Player 7?

I have an AMD64 and I miss a 64-bit Flash plugin.

Wednesday, September 28, 2005 12:14:00 AM  
Blogger devilsclaw said...

using intel based asm is possible with gasm since 3.x.x i believe; i know it is on 3.4 for sure

This code needs to be added to the top of the c or cpp file, before the code section at least, to make it global, or you can add it to each function

asm(".intel_syntax noprefix");

then you have to pass to the compiler something like this
g++ -Wall -masm=intel -fasm -c addon.cpp
g++ -Wall -shared -o addon.o -ldl &>/dev/null

this will force the compiler to support intel based inline asm code

Wednesday, September 28, 2005 4:52:00 AM  
Anonymous Dmitri said...

You've mentioned that you've used "Windows versions along with Wine to get Flash content to display on my Linux boxen."

I was wondering how fast/responsive/buggy was the flash player 7 acting under this circumstance? Have you tried doing the same with flash player 8?

Friday, September 30, 2005 10:41:00 AM  
Anonymous Anonymous said...

For input methods, the standard and "unix universal" one is standardised for X11, namely XIM:

Friday, September 30, 2005 5:24:00 PM  
Anonymous Anonymous said...

Where is Tinic Uro?

There are no new answers for these questions ...

Friday, September 30, 2005 7:14:00 PM  
Anonymous Anonymous said...

"But to tell you the truth I am not even happy with the state of Flash Player 7 on Linux. So much that I use the Windows versions along with Wine to get Flash content to display on my Linux box"

Soooo that would mean that IF Flash 8 gets ported to GNU/Linux it would be a poor hack and would not be released until like... 2010?

Wednesday, October 12, 2005 4:15:00 PM  
Blogger Christopher Peterson said...

Don't forget porting to the ARM processor family (including StrongARM, XScale, and Thumb). There are probably more ARM mobile phones than Wintel PCs in the world! :-)

Tuesday, October 18, 2005 9:08:00 PM  
Anonymous Neill Corlett said...

Man, that stuff looks fun. I've been wanting to do some Flash rendering-related projects for a long time, and to think there's actually positions open in Macromedia for this is pretty exciting.

But, I'm at Ubisoft for at least the next 6 months, so I guess someone else will get the honor!

Friday, October 21, 2005 11:16:00 AM  
Anonymous Anonymous said...

Having written a couple of cross-platform compositing engines over the last twenty years, my first observation is that if you have reverted to asm you have not spent enough time a) looking at the asm output from the various c/c++ compilers; b) doing real platform profiling.

If you had done a) you would have noticed real correlations between writing simple unwound c/c++ code and very efficient emitted code from the compiler. Make it easier for the compiler and the compiler will usually do a very good job. Apart from device drivers and compiler bug workarounds I have not had to use embedded asm very much in the last 15 years. This observation has so far held true for compilers for 68K, PPC, x86, MIPS, SH-5, ARM, and SPARC; and for platforms as diverse as MacOS, Win32, Solaris, vxWorks, Symbian and 16 and 32 bit consoles. Even gcc (a true pig of a compiler) can be coaxed into emitting code that is close enough to safe assembler in size and speed. Of course you can still write unsafe assembler that might be faster (or might not .. the instruction scheduler makes this question moot) but all you may have gained is 5% - 10% in speed for a codebase that is now basically unmaintainable.

b) really kicks in when you start doing real compositing, movie quality full aperture (2048x1556) 16 bit non-premultiplied components, multiple-colorspaces etc. Pixel op level optimization is not going to get you very far, all real optimization is in understanding the relative cost performance of the memory hierarchy - the relative cost of reg-to-reg, reg-to-L1, reg-to-L2, reg-to-main mem ops. That's how you make the paint ops in a high end rotoscoping app execute more than 3 times faster...

Again, no need for asm, just a very deep understanding of what the platform is actually doing.

If you need to support ambidextrous code I've found txl a really useful tool. I use it to support a c++ / j2me cross platform multi-media app library. Run the script one way, out comes c++ source code, run the script the other way, out comes j2me compatible Java source code. The base codebase is simplified c++.

And as for Linux, there is only one successful porting strategy ... ignore everything above kernel services and the device drivers level. If you consider it just as an embedded OS kernel with a very rich set of device drivers you will save yourself a lot of grief, because as an API platform it is just like Win32, the kernel API's work, everything above is very flaky. So it usually faster to do it yourself.

The linux agp video driver will be your best friend, X-Windows will be your greatest enemy..


Monday, October 24, 2005 5:19:00 AM  
Blogger Tinic Uro said...

jmc: Well, if you think you can do better, why not do it?

Monday, October 24, 2005 7:16:00 AM  
Anonymous Anonymous said...

hi tinic

I have...

Just finishing up the Win32 and MacOS QA, and about to dive into the Symbian and Solaris QA. Fully platform independent, fully scalable, completely modular flat component architecture. Any part of the compositing pipeline can be built up or out as business opportunities arise.

And not a line of asm.

If the target processor has a deep pipeline and multiple ipu's/fpu's you are just wasting your time. You will never beat the compiler's instruction scheduler. And I am saying this as someone who once got 3.5 integer results per clock out of a PPC 604 for a blitter with full alpha channel support. And on x86 (Intel not AMD) the switch overhead for SIMD ops was so high that it was only worth using them for area filters or dsp funcs. On PII/III's anything less than 30 mmx insts was generally a complete waste of time.

Of course if the target is running an ARM or MIPS core, no mmu, no fpu, no L1, single ipu pipeline then asm starts making sense for certain bottlenecks.


Tuesday, October 25, 2005 2:32:00 AM  
Anonymous Anonymous said...

dont forget that you should only use either direct ALSA or something like gstreamer for sound support.
Do not care about esd (support dropped) or arts, or OSS (deprecated)
Logic would be going raw ALSA if you have your own things to decode sound (mp3, whatever), OR gstreamer if you dont (gstreamer will provide decoders)
that's the way to go on linux, as far as sound is concerned. You might consider OpenAL as it could use code from OSX. But that's not really the main choice for Linux.
No need to write code that has no use for esd or what-ever.

as far as the toolkit goes it sounds like you have only two or three boxes and one menu, that'd work with X11 toolkit. Granted, its harder to code for but lower deps and its really a small amount of code.
Alternatively QT and GTK are equally ok , but more deps = bad when unneeded. WxWidget also good, but only if you consider using it for windows and macosx and linux. else it falls in the same dependency issue.

Btw, take care of cairo, its slow.

Sunday, October 30, 2005 4:54:00 AM  
Anonymous Anonymous said...


I'm not an expert, but I think we could find a middle point. I understand that Macromedia doesn't want to release the whole Flash Player source, but I think they could modularize it in two parts:

1) The Flash engine (the swf reader, ActionScript interpreter, etc.)
2) The platform specific code.

From what I've understood, the hardest part to port is the second one (because it's where most optimizations lie).

Then, Macromedia could release the first part as binary-only (for a lot of OSes and platforms).

For the second part they could release some kind of common source code (without the optimizations), that could be compiled easily on most architectures. This could be called "lite" or "community" edition, because it wouldn't have the power of the supported edition, but at least we could be able to have it on new architectures.

Then there could be community efforts for developing the platform-specific lower-layers.

I don't know if it would be so hard to do, but I think this way we could benefit each other.


Saturday, December 03, 2005 1:38:00 PM  
Anonymous Anonymous said...

What about the VectorC compiler?

Saturday, December 03, 2005 7:56:00 PM  
Anonymous Anonymous said...

And what about Flash Player 8 on eComStation - OS/2 Operating System?

You and your team need to do nothing in the code. You only need to modify the License Agreement to allow Flash Player 8 for Windows to run under other operating systems, and Innotek will do the rest. Simple and fast, and you will make a lot of users happy. ;-)

Thursday, December 22, 2005 9:56:00 AM  
Anonymous Anonymous said...

Hi all and thanx for your work.

This is a post to tell you that "wmode=opaque" is not handled correctly by Flash Player 7 on Linux.


Wednesday, January 11, 2006 7:13:00 AM  
Anonymous Anonymous said...

I don't mean to be rude, but your entry here sounds like nothing but a whining apology for your lack of progress on this front. Guess what? Being a software developer who makes a piece of cross-platform software that does vector graphics and the like just isn't easy.

You still have to do it. It's your job. To shove off the large and growing x86-64 user population because you don't feel up to the task of getting the port done is ridiculous. If you can't do it, then for god's sake, open source your code and quit or something. The community would have it up and running, a few glitches and whatnot aside, in under a week.

I could translate the mmx code you were bitching about that doesn't compile in gcc with a small script. You can use your mmx code here, you just have to change the format of it. It's a mechanical conversion...
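To give an idea of what such a "small script" could look like, here is a minimal Python sketch of the mechanical part of an Intel-to-AT&T conversion. It is an illustration only: the function names and the register list are made up for this example, it only understands simple register, immediate and `[base+index*scale+disp]` operands, and it ignores things a real converter must handle (size suffixes, `dword ptr` specifiers, labels, `10h`-style constants).

```python
import re

# Registers this sketch recognises; a real tool would cover the full set.
REGISTERS = {f"mm{i}" for i in range(8)} | {
    "eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp",
}


def convert_memory(operand: str) -> str:
    """Turn Intel [base+index*scale+disp] into AT&T disp(%base,%index,scale)."""
    inner = operand.strip()[1:-1].replace(" ", "").lower()
    base = index = scale = ""
    disp = 0
    # Split into terms, keeping the sign attached to each term.
    for term in re.findall(r"[+-]?[^+-]+", inner):
        sign = -1 if term.startswith("-") else 1
        term = term.lstrip("+-")
        if "*" in term:                      # only the index*scale order is handled
            index, scale = term.split("*")
        elif term in REGISTERS:
            if not base:
                base = term
            else:
                index, scale = term, "1"
        else:                                # anything else is a displacement
            disp += sign * int(term, 0)
    if not base and not index:
        return str(disp)                     # absolute address
    disp_text = str(disp) if disp else ""
    base_text = f"%{base}" if base else ""
    if index:
        return f"{disp_text}({base_text},%{index},{scale})"
    return f"{disp_text}({base_text})"


def convert_operand(operand: str) -> str:
    """Add the AT&T prefix: % for registers, $ for immediates."""
    operand = operand.strip()
    if operand.startswith("["):
        return convert_memory(operand)
    if operand.lower() in REGISTERS:
        return "%" + operand.lower()
    return "$" + operand


def intel_to_att(line: str) -> str:
    """Convert one 'mnemonic dst, src' line to 'mnemonic src, dst'."""
    parts = line.split(None, 1)
    if not parts:
        return ""
    if len(parts) == 1:
        return parts[0]                      # no operands, e.g. emms
    operands = [convert_operand(op) for op in parts[1].split(",")]
    return f"{parts[0]} {', '.join(reversed(operands))}"
```

With this sketch, `movq mm0, mm1` becomes `movq %mm1, %mm0` and `paddw mm0, [esi+8]` becomes `paddw 8(%esi), %mm0`. The operand swap and the prefixes are the easy part; checking that every converted routine still produces the same output is where the real effort goes.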

Friday, March 03, 2006 3:38:00 AM  
Anonymous Anonymous said...

Is this blog a kind of "excuse me, we're too stupid to port Flash to Linux"?
Or do you just want to say "damn it! we're bound to M$"?
Ever heard of "portable/multiplatform" programming?
Please ... show your real "good will", if you want to claim "Flash is really multiplatform and the real! platform for internet content".
Otherwise, I'll stick with plain old HTML and animated pics.

Tuesday, March 07, 2006 5:36:00 AM  
Anonymous Anonymous said...

I know the difficulties of porting an app to Linux, but isn't this already ported, and this is just an upgrade? So adding the new functionality should be more trivial.

Patches and implementations of new features shouldn't have to worry about architectural problems, right?

Sunday, March 12, 2006 12:27:00 PM  
Blogger guilt said...

I suppose the comment on ASM portability wasn't quite justified. It is apparently the usual practice to support MASM syntax rather than a portable assembler syntax.

The few options that come to my mind are:

1. Use GNU as (with MinGW) to compile the same ASM code to both pe and elf

2. Use other portable assemblers like nasm or yasm (which also do pe, elf)

The only problem, as I understand it ... is changing the build process to fit these issues. I'm quite sure that ffdshow and XViD do well on many platforms (Windows, Linux/x86,x64 and ppc)

As for the sound support, you could always go with what the kernel supports. The newer ALSA comes with dmix enabled by default. As far as the 2.6 kernel goes, it's ALSA. Following the kernel interface shouldn't be a problem.

The browser spec itself doesn't define an interface to the sound system. Unless that happens, you'll still be stuck choosing which audio system to use (like JACK vs. GStreamer vs. ALSA vs. OSS, though all of them may be present!)

Tuesday, March 21, 2006 9:43:00 AM  
Anonymous Anonymous said...

AMD64 should be a high priority for the port. The lack of x86-64 (specifically AMD64) support is driving me nuts. I refuse to un-natify this particular install of Linux I currently have, and I am ready to say the same about Wine anyway. I do enjoy the wide range of Flash uses, but without wide support for architectures on non-microshaft OSes it will eventually lose out to something that is GPL AND truly cross-platform.

Sunday, April 02, 2006 9:45:00 PM  
Anonymous Anonymous said...

As a system administrator who operates a group of Linux thin clients for work, this issue has been dear to my heart lately. I'm glad to hear that Macromedia is doing something about it, and I hope the merger doesn't cause too much chaos.

Thanks for supporting Flash on Linux. I expect it costs a significant amount of developer time, so I'm glad you folks find it worth it (presumably mostly due to thin client deployments, kiosks, and embedded devices).

I'm surprised that the main issue you mention is with asm routines, rather than OS integration problems. Can't you use the portable pointer-size-and-byte-order-independent C versions of your asm functions for the initial release? I assume you _do_ have such portable C versions of all your inline asm functions (if nothing else, so the asm build can be tested against the portable build). If not, you have my sympathy, as that'd make life a lot harder for porting, testing, and maintenance. If you don't have them, maybe you should be writing them anyway, porting concerns aside.
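To illustrate what testing an optimized build against a portable one means, here is a small sketch, written in Python for brevity rather than C. The one-pixel function plays the role of the portable reference, and the batched function stands in for an unrolled or SIMD path that handles 8 pixels per step. The fixed-point constants are a commonly used BT.601 approximation, not anything taken from the Flash Player, and all function names are invented for this example.

```python
def clamp8(value: int) -> int:
    """Clamp to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, value))


def yuv_to_rgb_pixel(y: int, u: int, v: int) -> tuple:
    """Portable reference: convert a single pixel with integer math."""
    c, d, e = y - 16, u - 128, v - 128
    r = clamp8((298 * c + 409 * e + 128) >> 8)
    g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8)
    b = clamp8((298 * c + 516 * d + 128) >> 8)
    return r, g, b


def scanline_reference(ys, us, vs):
    """Convert a scanline one pixel at a time."""
    return [yuv_to_rgb_pixel(y, u, v) for y, u, v in zip(ys, us, vs)]


def scanline_batched(ys, us, vs, lanes=8):
    """Stand-in for the optimized path: each step runs on `lanes` pixels."""
    out = []
    full = len(ys) - len(ys) % lanes
    for start in range(0, full, lanes):
        end = start + lanes
        c = [y - 16 for y in ys[start:end]]
        d = [u - 128 for u in us[start:end]]
        e = [v - 128 for v in vs[start:end]]
        r = [clamp8((298 * ci + 409 * ei + 128) >> 8) for ci, ei in zip(c, e)]
        g = [clamp8((298 * ci - 100 * di - 208 * ei + 128) >> 8)
             for ci, di, ei in zip(c, d, e)]
        b = [clamp8((298 * ci + 516 * di + 128) >> 8) for ci, di in zip(c, d)]
        out.extend(zip(r, g, b))
    # Leftover pixels fall back to the reference routine.
    for i in range(full, len(ys)):
        out.append(yuv_to_rgb_pixel(ys[i], us[i], vs[i]))
    return out


def paths_agree(ys, us, vs) -> bool:
    """The actual check: both builds must produce identical pixels."""
    return scanline_reference(ys, us, vs) == scanline_batched(ys, us, vs)
```

Running `paths_agree` over scanlines whose length is not a multiple of 8 also exercises the tail handling, which is a typical place for an optimized routine to drift from its reference.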

It seems odd that it's not possible to carry forward the base of the Flash 7 port (in terms of copy+paste, OS interaction, etc) and then enhance it once the basic Flash 8 functionality is out there and available. Right now, frankly, an imperfect Flash 8 port like the Flash 7 port would be a whole lot better than nothing. Flash 7 works quite well the vast majority of the time.

If you're re-doing the lot, and it sounds like you are, you're in for some fun when it comes to OS interaction and so on. If you can't link to LGPL libraries (something that strikes me as extremely odd - even Adobe do that in Adobe Reader) you're going to find things very hard going, as many of the tools that'd help you out will be LGPL. SDL, for example. I can only assume that restriction arises because you want to statically link all key libraries into the flash player executable. Flash 7 seems to be very conservative - it looks like it might statically link (or perhaps dlopen()??) gtk, freetype, and a lot of other common libraries. No fontconfig though... if you use fontconfig in Flash 8 you're going to find the font situation somewhat less excruciating.

(On a side note, I'm actually reasonably happy with how Linux handles fonts these days. It has a lot of things going for it over the win32/Mac OS way, now that fontconfig and freetype are so widely used. It can still be a bit resource hungry at times, but actually handles huge numbers of installed fonts much better than win32 or Mac OS X, and is a heck of a lot less fragile in the face of problematic fonts. I work for a newspaper, so I'm used to dealing with a LOT of font problems, and I do find Linux a lot less painful there. I also write graphics design software as a hobby, so I've got a bunch of experience tackling the issue from that side too. My main complaint is the lack of universal higher-level toolkit/display-device independent services for things like glyph-combining in international text.)

Unfortunately it's probably not practical to take the Adobe Reader approach of using system libraries where they're ABI stable (eg gtk2+) and bundling libraries in a private lib/ directory for others. Since the flash player is usually loaded in-process as a shared object, you'd run into issues with linking to conflicting versions of a library (ouch!), possible lib path issues (though rpath should handle that), and issues with how applications load the library (just hope they don't use dlopen(..., RTLD_GLOBAL) !) . That said, in all reasonable scenarios it'd probably be OK, even if apps using Flash 8 had to follow some slightly stricter rules.

Let's not even think about toolkits - trying to get Flash's widget set to be consistent with different browsers' widgets (Mozilla's XUL; Qt as used by Konqueror; etc) would be painful. One person here suggested using fltk (GPL + static linking exception), and while it's not a very nice toolkit it might be suitable.

Just to throw another spanner in the works, I hope Flash 8 will work correctly over remote X11. Flash 5 used to choke horribly, which was infuriating, and now that flash 7 actually behaves I'd be very sad to see that take a step backwards.

I'd strongly suggest you check out what some game porters have been doing (eg Bioware, Id), and get in touch with the Adobe Reader Linux porting team. None of them have really had to face quite the same issues as you, but they should have some ideas to offer. Adobe Reader works pretty well on our Linux thin clients, though it does lack some important functionality (esp. in printing) vs the Windows version.

When it comes to sound support in Flash, one annoying wrinkle is that not all systems support ALSA yet. ALSA can emulate the OSS interface (usually, but not always, extremely well), but there's a lot of functionality lost by relying on that. Additionally, ALSA doesn't seem to be smart enough to figure out if software mixing is required (depending on number of HW sound channels, etc), and many users don't know how to configure software mixing. This means ALSA is sometimes still subject to the old "one app only using the sound device" problem that you folks will know and hate so well from Flash 7 on OSS. Any ALSA experts here feel free to correct me. It doesn't matter that much anyway, since not everything supports ALSA yet, so you need to support OSS ... but there's no standard sw mixer for OSS so you need to support ARTS and ESD - it just gets nastier from there. That said, I'm sure you already have a good audio abstraction layer because of the necessity to support Mac OS X and Windows audio, so this is probably not a big deal to you. If your policies didn't forbid your use of some of the more useful tools, like SDL_audio, you'd at least be able to implement only one backend for *nix systems instead of something like five.
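The "one abstraction layer, several backends" idea can be sketched roughly as below, in Python for brevity. Everything here is hypothetical: the `AudioBackend` type, the probe functions and the preference order are assumptions for illustration, and checking for `/dev/snd` or `/dev/dsp` is only a crude heuristic for "ALSA or OSS might be usable", not a real capability test.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class AudioBackend:
    """A named backend plus a cheap probe that says whether it looks usable."""
    name: str
    is_available: Callable[[], bool]


def choose_backend(backends: Sequence[AudioBackend]) -> Optional[AudioBackend]:
    """Return the first backend whose probe succeeds, in preference order."""
    for backend in backends:
        try:
            if backend.is_available():
                return backend
        except OSError:
            # A failing probe is treated the same as "not available".
            continue
    return None


# Assumed preference order: ALSA first, then OSS as a fallback.
DEFAULT_BACKENDS = (
    AudioBackend("alsa", lambda: os.path.isdir("/dev/snd")),
    AudioBackend("oss", lambda: os.path.exists("/dev/dsp")),
)
```

The rest of the player would only ever talk to whatever `choose_backend(DEFAULT_BACKENDS)` returns, so adding aRts or ESD support later means adding an entry to the list rather than touching the callers.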

Flash Player isn't a product I'd want to have to be responsible for getting working on a variety of platforms even without dealing with issues on Linux of closed-source binary-only distribution. I'll be interested to see how it all goes.

Sunday, April 16, 2006 6:18:00 AM  
Anonymous Anonymous said...

Maybe before starting Flash 8.5/9 they should look at how easy this will be to port to multiple platforms?

Sunday, May 14, 2006 9:48:00 AM  
Blogger tux said...

Erm, I know this is a little late, BUT, there are a number of x86 disassemblers, that you could use for automation of the task.

Thursday, May 18, 2006 8:16:00 PM  
Blogger tux said...

Also, GCC is open source. I've hacked it. You can easily add an x86 Intel-mnemonic assembler to it as an add-on, a patch, or a library for inline asm.

It sounds like you just think linux is a pain because you love micro$oft.

Thursday, May 18, 2006 8:23:00 PM  
Anonymous Anonymous said...


Forget OSS, it's dead and being phased out. Likewise, forget ESD. Forget ARTS: it's still in active use, but its creator has told everyone to switch to something else as soon as humanly possible, because he thinks ARTS is a terrible design. :-)

You have the choices of ALSA, JACK, or Gstreamer. In each case, you only need to do one; ALSA and JACK each have shims for the other one, and Gstreamer consists of nothing *but* a set of shims for other sound systems.

Translating assembly from one dialect to another is hard work for you? You poor babies. That's *trivial* work. I do that sort of translation to relax. If you don't, you could always write an automatic translator! (Actually, I think someone already *has* -- try googling.) The GNU assembler can handle Intel notation anyway, so you shouldn't actually have to translate them!

It would be even easier to deal with if you'd put the assembly routines in separate files (altivec.asm, mmx.asm, etc.). Perhaps you absolutely needed them to be inlined into C functions, since I can't see any other reason not to put them in separate files; separate files always end up being easier to handle.

Framework support:
Pick one. They play nice with each other, you know. Copy&Paste should be done with the FreeDesktop standard.

Fonts are confused at the moment. Pick a version of freetype and go with it.

"I could try to go into details, but when we did the port to Linux we had dozens of compiler bugs to work around."

Are you sure?

Since approximately version 3.2, GCC has been quite good about fixing compiler bugs. The majority of reported bugs turn out not to be bugs: they turn out to be people relying on quirks of MSVC which aren't present in GCC.

"And no, the gcc optimizer is still no on par with Visual Studio or the Intel compiler."
YMMV, as always.

But more importantly:

Since you're releasing a version of Flash Player for MacOS X, *which has GCC as its system compiler*, you will most likely have fixed all your GCC problems before you even start making a Linux port.

OS X is a *BSD system with a proprietary graphics and sound layer. Any problems you have with Linux outside the graphics and sound implementations, you will have already had with OS X. Which is why this post sounds so odd.

"- liboil is nice, but not very well optimized. The YUV2RGB conversion routines f.ex. convert one pixel at a time. The functions I wrote convert 8 pixels at a time in the inner loop and are designed for scanline display which the Flash Player needs."

Liboil is insanely new (version 0.3), which accounts for its current not-so-great status. But it's also *free software*. You have some good inner loop routines to convert 8 pixels at a time? You get your employer to license them under the same license as the rest of liboil ("two-clause BSD"), you submit them to the liboil maintainers, and voila, your routines are now part of liboil. Try the open source methodology; you'll like it. :-)

It sounds like you don't have anyone who has the first clue about Linux development. I don't meet the primary requirement for your job listing -- namely, I'm not willing to move to San Francisco. Unfortunately for you, since I have most of the other skills listed. :-) (Now, if that job allowed for telecommuting....)

Monday, May 22, 2006 6:33:00 PM  
Anonymous Anonymous said...

Just a real beginner's know-nothing comment, but D has a portable library which accepts inline assembly. It is also the fastest language out there, and has great syntax.
This could really help your development, and you can isolate platform-specific code (or use versioning (done right - no #ifdef)).

Of course, this does not help with all the standards out there. I am convinced that only the distro maintainers can help, but that would require an open source license.

Tuesday, June 13, 2006 3:26:00 PM  
Anonymous Anonymous said...

Bluestreak Technology has its own multimedia engine capable of rendering Adobe(R) Flash(R) 7 authored content on very low power embedded devices, and it has been ported to a number of Linux implementations already. Operators such as Time Warner Cable (TV set top boxes) and Orange (mobile) are already deploying high profile services with Bluestreak's technology.

Thursday, June 15, 2006 12:22:00 PM  
Anonymous Anonymous said...

If I were a Macromedia programmer, I would have started by inventing some "general" ASM language which can be translated to any of the "real" asm languages used for x86, AMD64, Intel, MMX, etc.

Would it be that hard to write some kind of abstraction layer for the assembler languages? You would first translate your abstracted program into the real assembler code and then let that get compiled on the target machine.

Wednesday, July 26, 2006 3:50:00 PM  
Blogger blue lang said...

anyone who is qualified for that job isn't going to want to spend all day porting asm. :D have you considered getting one linux architect and a team in china?

Thursday, November 01, 2007 7:55:00 PM  
