Monday, August 20, 2007

What just happened to video on the web?

That's a question you should ask with the announcement we made tonight. I think a lot will change. This is probably one of my longest and most information-packed posts ever, but I think it is important that we put all our cards on the table. Let's summarize what new functionality Flash Player 9 Update 3 Beta 2 contains (for the impatient: it will be available on labs.adobe.com this afternoon):
  • A file format parser implementing parts of ISO 14496-12. In terms you might understand, this means support for a very limited subset of MPEG-4, 3GP and QuickTime movie files.

  • Support for the 3GPP timed text specification 3GPP TS 26.245. Essentially this is a standardized subtitle format within 3GP files.

  • Partial parsing support for the 'ilst' atom, which is the ID3 equivalent iTunes uses to store meta data. This is really more of a de facto standard which came about through the ubiquity of iTunes; there is no official documentation on the format. Look here for an incomplete list of supported tags iTunes uses.

  • A software based H.264 codec with the ability to decode Baseline, Main and High profiles. This is also an ISO standard with the identifier being ISO 14496-10.

  • An AAC decoder supporting AAC Main, AAC LC and SBR (also known as HE-AAC). The corresponding ISO specification is ISO 14496-3.

That's pretty much what we say publicly. Truth is that these specifications are so complex that no one supports 100% of them. I realize that it will be important for Adobe to communicate exactly what is and what is not supported. We are working on this and will be trying to help novices and experts alike. For those who scream murder and accuse us of going with incomplete standards support, let me tell you that ISO 14496-12 specifically allows for the definition of subsets. 3GP is one of those. We did not add any proprietary extensions whatsoever to the standards mentioned above; what we support is a pure subset.
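To make the idea of subsets a bit more concrete: files derived from ISO 14496-12 usually announce which derived specification they follow through "brands" in a leading 'ftyp' box. The sketch below reads those brands. It is only an illustration and says nothing about how the Flash Player does it internally; the function name `read_ftyp_brands` and the sample bytes are made up for this example, and it assumes the 'ftyp' box is the very first box in the file, which is typical but not guaranteed (older QuickTime files may have no 'ftyp' box at all).

```python
import struct

def read_ftyp_brands(data: bytes):
    """Return (major_brand, compatible_brands) from a leading 'ftyp' box, or None."""
    if len(data) < 16:
        return None
    # Every box starts with a 32-bit big-endian size and a four-character type.
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp" or size < 16 or size > len(data):
        return None
    major = data[8:12].decode("ascii", "replace")
    # Bytes 12..16 hold the minor version; compatible brands follow, 4 bytes each.
    compatible = [
        data[i:i + 4].decode("ascii", "replace")
        for i in range(16, size - 3, 4)
    ]
    return major, compatible

# Hand-built header for illustration only (not taken from a real file).
sample = struct.pack(">I4s4sI", 24, b"ftyp", b"3gp6", 0) + b"isom" + b"3gp6"
print(read_ftyp_brands(sample))  # ('3gp6', ['isom', '3gp6'])
```

A file that does not start with an 'ftyp' box, such as an FLV file, simply yields None here.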

Why now? Short answer: Because you wanted it. Long answer: We've been working on this for a while and this was planned to be part of the next major revision of the Flash Player. What was unexpected was how impatient a lot of our customers are :-) It seems many are trying to make choices when it comes to video technologies right now. We wanted to make sure that we would offer the best possible choices to them and send a signal that we are willing to embrace industry standards. No one believed that we would make this happen.

Unfortunately, as we realized while working on this, adopting industry standards also brings completely new terminology which seems designed to confuse non-insiders. This makes it difficult to pin down exactly what it is that we did and how you might benefit from it. It took me several months just to understand the basics in the ISO specifications. By now I might have lost the ability to boil it down into simple terms everyone can understand. But I'll try anyway. :-)

Let's talk about actual functionality you can leverage in the Flash Player. Now I am getting really technical:

  • You can load and play .mp4, .m4v, .m4a, .mov and .3gp files using the same NetStream API you use to load FLV files now. We did not add any sort of new API in the Flash Player. All your existing video playback front ends will work as they are, as long as they do not look at the file extension, that is. If they do, renaming the files to use the .flv file extension might help your component. The Flash Player itself does not care about file extensions; you can feed it .txt files for all it matters. The Flash Player always looks inside the file to determine what type of file it is.

  • A new version of FMS is upcoming and will support the new file format. This is powerful stuff. Simply drop video files you might have encoded using one of the countless tools out there onto the server and it'll stream. Even if the moov atom is at the end of the file. Ah, that is something I have to mention as you are 100% likely to fall into this trap:

  • If you use progressive download instead of FMS, make sure that the moov atom (which is the index information in MPEG-4 files) is at the beginning of the file. Otherwise you have to wait until the file is completely downloaded before it is played back. You can use tools like qt-faststart.c, written by our own Mike Melanson, to fix your files so that the index is at the beginning of the file. Unfortunately our tools (Premiere, After Effects etc.) currently place the index at the end of the file, so this tool might become essential for you, at least for now. We are working hard to fix this in our video tools. There is nothing we can do in the Flash Player, and iTunes/QuickTime behave the same way.

  • The Flash Player will display the first supported video and audio track it finds in a file. Subsequent audio and video tracks are ignored and not selectable right now. This covers the majority of files out there on the web; only in rare instances do you have additional audio tracks, for example. But I believe that for the web you would rather create several versions of a file anyway to save bandwidth. One of the next major revisions of the Flash Player will most likely add new APIs to enhance this. Our goal was not to add any new APIs for this release.

  • Video needs to be in H.264 format only. MPEG-4 Part 2 (Xvid, DivX etc.) video is not supported, H.263 video is not supported, Sorenson Video is not supported. Keep in mind that a lot of podcasts are still using MPEG-4 Part 2, so do not be surprised if you do not see any video. We should be close to 100% compliant with the H.264 standard; all Baseline, Main, High and High 10 bit streams should play. Extended, High 4:2:2 and High 4:4:4 profiles are not officially supported at this time. They might or might not work depending on what features are used. We have no artificial lower limit on B-frames or any problems with B-pyramids like other players do. We also decode field coded streams, although this beta displays the images progressively using the weave method. The final release will blend the two fields. There are still a couple of bugs with frame ordering/timing I need to fix in the Flash Player itself for the final release. There is also a problem with files using the loop filter on dual core machines which causes horizontal artifacts along slice boundaries, which is my bad. The fix for this did not make it into this beta. Overall though, and leaving out the bugs I listed here which are my fault, the H.264 decoder is a remarkable piece of engineering; it is provided to us by MainConcept. It weighs in at less than 100KB of compressed code, which is quite an achievement for such a complicated standard.

  • Audio can be either AAC Main, AAC LC or SBR, corresponding to audio object types 0, 1 and 2. We also support the '.mp3' sample type, meaning tracks with mp3 audio. MP3inMP4, which intends to do multi-channel mp3 playback within mp4 files, is not supported. Also, the old QuickTime specific style of embedding AAC and MP3 data is not supported. It is unlikely though that you will run into these kinds of files.

  • 3gp timed text tracks. Any number of text tracks are supported and all the information, including esoteric stuff like karaoke meta data, is dumped in 'onMetaData' and a new 'onTextData' NetStream callback. Language information in the individual tracks is also reported. That means you can have subtitles in several languages. Study the 3GPP TS 26.245 specification to see what information is available. Note that you have to take care of the formatting and placement of the text yourself; the Flash Player will do nothing here. Time for you to start working on one of those components which do that. You can use MP4Box to inject text data into existing files.

  • Meta data stored in the 'ilst' atom. This is usually present in iTunes files. It contains ID3 like information and is reported in the onMetaData callback as key/value pairs in a mixed array with the name 'tags'. ID3V2 is not supported right now. An incomplete list and link to tools which can edit these tags is available here.

  • Since these files contain an index, unlike old FLV files, we can provide a list of safe seek points, i.e. times you can seek to without having the play head jump around. You'll get this information through the onMetaData callback in an array with the name 'seekpoints'. On the downside, some files are missing this information, which also means that these files are not seekable at all! This is very different from the traditional FLV file format, which relies on the notion of key frames to determine the seek points.

  • Unencrypted audio book files contain chapter information. We expose this in the onMetaData callback as an array of objects with name 'chapters'.

  • Image tracks encoded in JPEG, GIF and PNG are accessible. Unfortunately only in AS3, as I pass this information as byte arrays through a new callback 'onImageData'. You can simply take that byte array and use the Loader class to display the images. Most often these images represent cover artwork for audio files. TIFF image tracks are not supported, though you might come across files using them. Also note that we support the 'covr' meta data stored in iTunes files; these are also accessible as byte arrays.

  • Will it be possible to place H.264 streams into the traditional FLV file structure? It will, but we strongly encourage everyone to embrace the new standard file format. There are functional limits with the FLV structure when streaming H.264 which we could not overcome without a redesign of the file format. This is one reason we are moving away from the traditional FLV file structure. Specifically dealing with sequence headers and enders is tricky with FLV streams.

  • Will it be possible to place AAC streams into an FLV file structure? Yes, though the same limitations as for H.264 apply.

  • Will the Flash Player play back multi channel AAC files? It will play them, though the sound is mixed down to two channels and resampled to 44.1 kHz. We are targeting multi channel playback for one of the next major revisions of the Flash Player. This requires a complete redesign of the sound engine in the Flash Player, which dates from circa 1996 and has not been improved since.

  • Will the Flash Player be limited to 11 kHz, 22 kHz and 44.1 kHz sampling rates like for MP3? No, we support all sampling rates from 8 kHz to 96 kHz. I implemented a 32 tap Kaiser Bessel based FIR filter which resamples the sound to 44.1 kHz, retaining high quality. The most common sample rate combinations have a hard coded number of phases. In the case of a 48000 to 44100 Hz conversion, for example, the filter has 147 phases, since 44100/48000 reduces to 147/160. Even better: Flash Player Update 3 Beta 2 now can play back any MP3 sampling rate, leveraging the same code I implemented for AAC. No more chipmunks. Ever. Err, this is actually kind of major as I have seen complaints about this bug for years :-) I had fixed this problem in the AS3 Sound class before, though that fix was using very low quality resampling. The change I made this time will fix it even for AS2 and sound in FLV files while retaining excellent quality.

  • Will it be possible to place On2 VP6 streams into the new file format? Not right now, we are still trying to figure out if it is possible for us to support this.

  • Can you play files protected by FairPlay? No.

  • Do we support MPEG-4 BIFS or other esoteric stuff (scripting, VRML etc.) from the MPEG-4 Systems specification? No. Whatever is not listed above we do not support.

  • Do we support SMIL? No. You can easily write your own SMIL parser in ActionScript though.

  • Can you use the Sound class to play back AAC/.mp4a files? No, you have to use the NetStream class. We are now getting into a situation where there is not much difference between audio and video files anymore. They are essentially the same. Hence we figured we should not add further confusion by allowing you to do things ten different ways, which would also increase the Flash Player binary size. My guess is that we will enhance the Sound class in the future, but it might go in a different direction and will not be dedicated to pure playback of static files anymore.

  • Here is a list of data which is reported in onMetaData:

    duration - Obvious. But unlike for FLV files this field will always be present.

    videocodecid - For H.264 we report 'avc1'.

    audiocodecid - For AAC we report 'mp4a', for MP3 we report '.mp3'.

    avcprofile - 66, 77, 88, 100, 110, 122 or 144, which correspond to the H.264 Baseline, Main, Extended, High, High 10, High 4:2:2 and High 4:4:4 profiles.

    avclevel - A number between 10 and 51. Consult this list to find out more.

    aottype - Either 0, 1 or 2. This corresponds to AAC Main, AAC LC and SBR audio types.

    moovposition - The offset in bytes of the moov atom in a file.

    trackinfo - An array of objects containing various information about all the tracks in a file.

    chapters - As mentioned above information about chapters in audiobooks.

    seekpoints - As mentioned above times you can directly feed into NetStream.seek();

    videoframerate - The frame rate of the video if a monotone frame rate is used. Most videos will have a monotone frame rate.

    audiosamplerate - The original sampling rate of the audio track.

    audiochannels - The original number of channels of the audio track.

    tags - As mentioned above ID3 like tag information.
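
If you want to check ahead of time whether a file will start playing during progressive download, you can look for the moov atom yourself. The following is a minimal sketch, not the Flash Player's or qt-faststart's actual code: it walks the top-level boxes of a file and reports the byte offset of 'moov' and whether it comes before the media data in 'mdat'. The helper names `top_level_boxes`, `moov_before_mdat` and `box` are made up for this example, and the assumption that the reported offset matches what 'moovposition' contains is mine.

```python
import io
import struct

def top_level_boxes(stream):
    """Yield (type, offset, size) for each top-level box in the stream."""
    offset = 0
    while True:
        stream.seek(offset)
        header = stream.read(8)
        if len(header) < 8:
            return
        size, box_type = struct.unpack(">I4s", header)
        min_size = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            ext = stream.read(8)
            if len(ext) < 8:
                return
            size = struct.unpack(">Q", ext)[0]
            min_size = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            size = stream.seek(0, io.SEEK_END) - offset
        if size < min_size:
            return  # malformed box, stop rather than loop forever
        yield box_type.decode("ascii", "replace"), offset, size
        offset += size

def moov_before_mdat(stream):
    """Return (moov_offset, fast_start); moov_offset is None if there is no moov."""
    moov = mdat = None
    for box_type, offset, _size in top_level_boxes(stream):
        if box_type == "moov" and moov is None:
            moov = offset
        elif box_type == "mdat" and mdat is None:
            mdat = offset
    return moov, moov is not None and (mdat is None or moov < mdat)

def box(box_type, payload=b""):
    """Build a tiny synthetic box for the demonstration below."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

ftyp = box(b"ftyp", b"isom" + b"\x00" * 4)
slow = io.BytesIO(ftyp + box(b"mdat", b"\x00" * 32) + box(b"moov"))
fast = io.BytesIO(ftyp + box(b"moov") + box(b"mdat", b"\x00" * 32))
print(moov_before_mdat(slow))  # (56, False): index at the end
print(moov_before_mdat(fast))  # (16, True): index at the beginning
```

With a real file you would pass `open(path, "rb")` instead of the synthetic streams; a result ending in False suggests running the file through a tool like qt-faststart before putting it on a plain web server.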

Here are some good links to get an understanding of what MPEG-4, H.264 and AAC are:

http://forum.doom9.org/showthread.php?s&threadid=62723
http://forum.doom9.org/showthread.php?t=96059
http://en.wikipedia.org/wiki/H264
http://en.wikipedia.org/wiki/Advanced_Audio_Coding
http://daringfireball.net/2007/04/some_facts_about_aac

Let's put together some thought up scenarios I would imagine are important:
  • You created a podcast for iTunes and happily distribute it over this channel. Now you want to add value to it and make it easily accessible over the web without special plug-ins, reaching an audience which does not have QuickTime installed. Well, this new feature will allow you to do this. You can take your existing podcast in .m4a format and present it on any web page through the Flash Player. Add more value by adding interactivity and branding if you want to. The possibilities are endless.

  • Your media company has made or is about to make a significant investment into web video or video archiving. You are wondering what format you should choose. Video for Flash reaches everyone now, but the format is not an 'industry standard' so you have the fear that content you will create will become obsolete and unsupported at some point. Flash Player 9 Update 3 comes to the rescue: MPEG-4 is an extremely well documented ISO standard and completely vendor independent. And by using the Flash Player now you get instant gratification for viewers.

  • You want to get the best possible quality out of your video and do not want to be tied to a particular encoding solution. You also like open source software to do all of the work you need to do to encode video. A combination of libfaad, x264 and MP4Box, which are all licensed under the GPL, will do exactly that, albeit with little usability and requiring lots of expertise. But it will now play just fine through the most widely distributed run time in the world, the Adobe Flash Player.

Those are immediate benefits; there are plenty more when we look ahead. Let me mention a few of them:

  • H.264 will be supported natively by most new graphics cards. NVidia, ATI and Intel have made commitments to have full support for it. This means better than HD video on your PC will become possible in the not so distant future.

  • There are hardware based H.264 encoders which encode at better than real time. This is important if you need to be quick to market, as news organisations do, for example.

  • Digital TV, especially in Europe is quickly adopting H.264. The interoperability with the web will open new doors for a lot of media companies.

  • AAC SBR offers demonstrable advantages over plain MP3; think 5.1 channel surround sound, for example. While the Flash Player only supports two-channel output at this time, there is opportunity to go beyond that.

And last but not least here are some things I will not give a complete answer to since they are begging for controversy:
  • Comparing H.264 against other video codecs, be it on performance or quality. I've looked at the comparisons out there; they are at best subjective, most of the time outright marketing bull and almost always completely biased. My take is: Take a good and well accepted encoder and compare the results yourself. Your mileage will vary. And that is fine. Quality is not the main reason Flash Player 9 Update 3 has H.264 support.

  • Tell you if On2 VP6 is better or worse than H.264. Truth is that they have different strengths, not only performance and quality wise. It totally depends on your individual situation of what fits best. The Adobe Flash Player now offers more choice which is more important than anything else.

  • I am not in a position to explain to you why we will not allow 3rd party streaming servers to stream H.264 video or AAC audio into the Flash Player. What I can tell you is that we do not allow this without proper licensing. Refer to Adobe's friendly Flash Media Server sales staff for more information.

  • I also cannot help you with anything regarding broadcast fees for commercial use of H.264 and AAC streams. Please refer to the FAQ Adobe provides, which usually points to contacts at MPEG-LA and Via Licensing. A summary of licensing terms for H.264 is available here.