pthreads and signals
Mike, the others on the Linux team, and now also some core player developers are still working to get the Linux version into good shape. Now that I am done with the MacIntel version, I could not resist installing Ubuntu on my MacBook in all its glory, including compiz. We have already tried a bunch of machines, so I wanted to make sure that the Flash Player would work on this one too.
Sure enough, the snd-hda-intel driver seems to have bugs. First I had to patch the module to get any sound output at all, as the person who merged in the MacBook kernel patches removed the custom Apple pin configs required for alsamixer to work correctly.
The second problem was that snd_pcm_avail_update() would sometimes return values like 0x3ffffee5 instead of the actual number of frames. That looks similar to a negative value, which would mean the buffer is overrun with data. But why are the two high bits not set then? Anyhow, this was causing havoc with sound playback, and I had to add a hack to get around it temporarily until we figured out why we were shoving in too much data, which we did soon after.
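To make the comparison with a negative value concrete, here is a minimal sketch in C. It is only an illustration: both helper names are hypothetical, the idea that the top two bits went missing is a guess rather than something confirmed, and the plausibility check is not necessarily the hack that went into the player.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: set the two high bits of a 32-bit value and
 * read the result as a two's-complement signed number. */
static int32_t fill_high_bits(uint32_t raw)
{
    uint32_t filled = raw | 0xC0000000u;
    if (filled >= 0x80000000u) {
        /* Portable two's-complement conversion for the negative range. */
        return -(int32_t)(~filled) - 1;
    }
    return (int32_t)filled;
}

/* Hypothetical sanity check: a frame count reported as available can
 * never be negative or larger than the whole buffer. */
static bool avail_is_plausible(long avail, long buffer_frames)
{
    return avail >= 0 && avail <= buffer_frames;
}
```

With the two high bits filled in, 0x3ffffee5 reads as -283, which is why the value looks like a mangled negative number. A check like avail_is_plausible() would at least let the caller skip a write instead of trusting a bogus frame count.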
Still, for some reason the Flash Player would hang after a few minutes. Mike showed me the backtrace: it was hanging in a pthread mutex lock in the progressive FLV playback code. As I had written that code way back in Flash Player 7 and still feel responsible for it, I took a peek. Progressive FLV playback was designed to be able to read from CD-ROMs asynchronously, so it runs in its own thread to avoid blocking the main playback thread. Hence primitives like queues need to be protected with a mutex.
Good lord, it turns out that mixing pthread mutexes and signals (semaphores are probably affected the same way) is a bad idea. Right now we are using ALSA's callback method to notify us when we can add more data to the buffer using snd_pcm_writei(). Looking at the backtrace, it is obvious that the callback is initiated from a signal. The problem is that a thread sitting in a contested, and therefore blocking, pthread mutex lock can be the very thread that gets interrupted to run the next callback. So we are now in the ALSA callback trying to acquire the same mutex that thread was trying to lock in the first place (note the identical pMutex value in frames #12 and #25). A nice deadlock ensues. And while we could probably detect this situation, more bad things are bound to happen with hacks. I spent an afternoon trying to find a good solution until I decided we would use the ALSA callback only to kick off another thread (using a semaphore), which would then do the actual sound mixing. Sure enough, videos now play for hours on end on my machine without crashing or hanging. For those interested, this is what the backtrace looked like in gdb (I changed the method names so things make more sense):
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7e5b2ae in __lll_mutex_lock_wait ()
from /lib/tls/i686/cmov/libpthread.so.0
#2 0xb7e57fc7 in _L_mutex_lock_159 ()
from /lib/tls/i686/cmov/libpthread.so.0
...
#12 0xb06e9b53 in PlatformMutex
(this=0xbfbf48ec, pMutex=0xaf58b57c)
#13 0xb0998d1d in NetStreamQueue::PopPacket
(this=0xaf58b530)
...
#18 0xb071e527 in PlatformSoundMixer::FillAlsaBuffer
(this=0xaf583008)
#19 0xb071e78a in PlatformSoundMixer::AlsaCallback
(ahandler=0x8b32178)
#20 0xb5ab3aba in snd_output_buffer_open ()
from /usr/lib/libasound.so.2
#21 <signal handler called>
#22 0xb7e57f7b in pthread_mutex_lock ()
from /lib/tls/i686/cmov/libpthread.so.0
..
#25 0xb06e9b53 in PlatformMutex
(this=0xbfbf5164, pMutex=0xaf58b57c)
#26 0xb099692e in NetStreamQueue::HandleOnStatus
(this=0xaf58b530)
#27 0xb099c21a in NetStream::Receive (this=0xaf58b430)
#28 0xb09901bc in NetSocket::Receive (this=0xaf5ad140)
#29 0xb09765fa in FlashPlayer::Idle (this=0xaf57c008)
#30 0xb0700ebb in PlatformFlashPlayer::OnTimer
(this=0xaf57c008)
#31 0xb06df318 in gtkTimerCallback (data=0xaf57c008)
#32 0xb784a4a8 in g_main_context_is_owner ()
from /usr/lib/libglib-2.0.so.0
#33 0xb78488d6 in g_main_context_dispatch ()
from /usr/lib/libglib-2.0.so.0
#34 0xb784b996 in g_main_context_check ()
from /usr/lib/libglib-2.0.so.0
#35 0xb784bcb8 in g_main_loop_run ()
from /usr/lib/libglib-2.0.so.0
#36 0xb7c96765 in gtk_main ()
from /usr/lib/libgtk-x11-2.0.so.0
#37 0xb691590a in nsAppShell::Run (this=0x8147828)
at nsAppShell.cpp:139
#38 0xb68433d2 in nsAppStartup::Run (this=0x8151f70)
at nsAppStartup.cpp:150
#39 0x0804f321 in XRE_main (argc=3, argv=0xbfbf59b4,
aAppData=0x80595e0) at nsAppRunner.cpp:2374
#40 0x0804abe4 in main (argc=0, argv=0x0)
at nsBrowserApp.cpp:61
#41 0xb75deea2 in __libc_start_main ()
from /lib/tls/i686/cmov/libc.so.6
#42 0x0804ab31 in _start ()
at ../sysdeps/i386/elf/start.S:119
Now I am sure someone could explain to me why this is expected behavior and by design. My first reaction was that it had to be a bug in the pthreads implementation. Why ALSA is using signals for callbacks is also a mystery to me; I am sure there is a perfectly acceptable explanation for that as well. Anyway, my goal is to make it work, the rest does not really matter...
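For the curious, the workaround described above (the callback only posts a semaphore, and a separate thread does the mixing) can be sketched roughly like this in C. This is not the player's actual code: mixer_t and all the mixer_* names are made up for illustration, the ALSA registration is left out, and a counter stands in for the real mixing and the snd_pcm_writei() call.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mixer state; none of these names come from the player. */
typedef struct {
    sem_t           wake;           /* posted from the callback */
    pthread_mutex_t lock;           /* never taken inside the callback */
    int             buffers_filled; /* stand-in for real mixing work */
    atomic_bool     quit;
    pthread_t       thread;
} mixer_t;

/* Runs in signal context: sem_post() is on the POSIX list of
 * async-signal-safe functions, pthread_mutex_lock() is not. */
void mixer_on_alsa_callback(mixer_t *m)
{
    sem_post(&m->wake);
}

/* Ordinary thread: free to block on mutexes. */
static void *mixer_thread(void *arg)
{
    mixer_t *m = arg;
    for (;;) {
        /* Retry if the wait itself is interrupted by a signal. */
        while (sem_wait(&m->wake) == -1 && errno == EINTR) { }
        if (atomic_load(&m->quit))
            break;
        pthread_mutex_lock(&m->lock);
        m->buffers_filled++;   /* mixing and snd_pcm_writei() would go here */
        pthread_mutex_unlock(&m->lock);
    }
    return NULL;
}

int mixer_start(mixer_t *m)
{
    m->buffers_filled = 0;
    atomic_init(&m->quit, false);
    if (sem_init(&m->wake, 0, 0) != 0)
        return -1;
    if (pthread_mutex_init(&m->lock, NULL) != 0)
        return -1;
    return pthread_create(&m->thread, NULL, mixer_thread, m) == 0 ? 0 : -1;
}

int mixer_buffers_filled(mixer_t *m)
{
    pthread_mutex_lock(&m->lock);
    int n = m->buffers_filled;
    pthread_mutex_unlock(&m->lock);
    return n;
}

/* Wakeups still pending at shutdown are dropped in this sketch. */
void mixer_stop(mixer_t *m)
{
    atomic_store(&m->quit, true);
    sem_post(&m->wake);
    pthread_join(m->thread, NULL);
    pthread_mutex_destroy(&m->lock);
    sem_destroy(&m->wake);
}
```

The point of the design is that nothing reachable from the signal handler ever touches a mutex, so the handler cannot end up waiting for a lock held by the thread it interrupted.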


6 Comments:
You could post this kind of question in the alsa developers mail list:
https://lists.sourceforge.net/lists/listinfo/alsa-devel
Maybe they (with a proper explanation) can avoid some troubles later. :-D
Regards
So you are upset not that signals are used for the callbacks, but that ALSA used the same thread, right? If not, the only other way I know for callbacks is a mainloop, and that doesn't seem timely enough for audio processing.
Of course you could just implement a lock-free queue...
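For reference, a lock-free queue along those lines might look like the following minimal sketch in C11. It assumes exactly one producer and one consumer, and every name in it (spsc_queue_t, spsc_push, spsc_pop, QUEUE_SLOTS) is made up for illustration; whether it would fit the player's actual packet queue is unknown.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 4u   /* one slot stays empty, so this holds 3 items */

typedef struct {
    void         *slots[QUEUE_SLOTS];
    atomic_size_t head;   /* next slot to read, written only by the consumer */
    atomic_size_t tail;   /* next slot to write, written only by the producer */
} spsc_queue_t;

void spsc_init(spsc_queue_t *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Returns false instead of blocking when the queue is full. */
bool spsc_push(spsc_queue_t *q, void *item)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t next = (tail + 1) % QUEUE_SLOTS;
    if (next == atomic_load_explicit(&q->head, memory_order_acquire))
        return false;
    q->slots[tail] = item;
    /* Release: the slot contents become visible before the new tail. */
    atomic_store_explicit(&q->tail, next, memory_order_release);
    return true;
}

/* Consumer side. Returns false instead of blocking when the queue is empty. */
bool spsc_pop(spsc_queue_t *q, void **item)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    if (head == atomic_load_explicit(&q->tail, memory_order_acquire))
        return false;
    *item = q->slots[head];
    atomic_store_explicit(&q->head, (head + 1) % QUEUE_SLOTS,
                          memory_order_release);
    return true;
}
```

Neither side ever waits for the other, so a consumer running inside a signal handler cannot deadlock against the code it interrupted; the price is that both sides have to cope with "full" and "empty" results themselves.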
I'm not sure what else is supposed to happen in this case, since signals are essentially thread agnostic if you don't do anything about them. You can think of them as being separate incompatible thread implementations.
I played around with some toy code, and you can control which threads get signals with pthread_sigmask, and thereby prevent deadlocks.
Take a look here: http://perkinbr.blogspot.com/2006/08/toy-pthread-code.html
POSIX does not define pthread_mutex_lock as an 'async signal safe' function. For example, the manual page on Linux for pthread_mutex_lock says:
ASYNC-SIGNAL SAFETY
The mutex functions are not async-signal safe. What this means is that they should not be called from a signal handler. In particular, calling pthread_mutex_lock or pthread_mutex_unlock from a signal handler may deadlock the calling thread.
It is not just mutex functions which are prohibited in signal handlers - a huge number of POSIX APIs are prohibited. The list of safe functions which can be used in signal handlers can be found here:
http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html#tag_02_04_03
Not that this is a current issue anymore, but the signal behaviour of pthreads has been a known shortcoming of LinuxThreads, which are now being replaced by NPTL. If you are so inclined, you can find plenty of information on lwn.net, for example.
I like articles like this. Thanks!