Getting hardware acceleration into Qt eglfs is tricky. Doing so on a Raspberry Pi is, unfortunately, still tricky after many years. Qt claims to have reimplemented the Qt Multimedia module entirely, and one of their targets was getting hardware acceleration where possible. So I thought I would start with a quick look.
Qt 5
Ever since the Raspberry Pi was born, I have had to solve the problem of hardware accelerated video in Qt. At the beginning, I wrote POT (PiOmxTextures) to solve this problem: https://github.com/carlonluca/pot. It used OpenMAX to stream decoded video into an OpenGL texture, which was then shown through a custom backend of Qt Multimedia in Qt 5. This approach worked fine, but won’t work on Pi 4/Qt 6. On the other hand, there is another component in the same repo that includes a custom Qt Quick item to render video through omxplayer. This is the most performant approach, but it has its limitations.
Qt Multimedia
I quickly tested Qt Multimedia in Qt 6 on the Raspberry Pi. My build from this article should support gstreamer. All I got was a warning on the console. I didn’t investigate further. Maybe I’ll spend more time on this in the future.
POTVL
As the classical POT is no longer usable on Raspberry Pi 4, I started to have a look at POTVL, which is very simple to port to Qt 6. With a small patch, it is possible to build it. You’ll find updates on the repo.
Demo
The video is a 1080p video. As you can see, the framerate is acceptable up to a certain weight of the graphics. It seems that the Qt 6 OpenGL backend is still a bit less performant than Qt 5 in this specific demo, as you can see from this test from a previous article:
so the result may even improve in the future. The benchmark app can be found here: https://github.com/carlonluca/Fall. Unfortunately POTVL is still not future proof, but it is the simplest and most efficient element to port to Qt 6. I may try something better in the near future. Have fun! Bye 😉
At the moment I still don't want to test custom firmwares or 64 bit arch builds, but I started to test a couple of new features: a new compiler from Linaro (the one provided by the foundation keeps giving me headaches), version 4.9.4 instead of 4.8, and optimised compiler flags for the Raspberry Pi 3, which is an armv8.
In this build, Qt, ffmpeg and POT are all built with the Linaro 4.9.4 toolchain and compiler flags optimised for Pi3. This will only work on Pi3.
You probably won't see much difference in GPU intensive apps, but it is a step on the road of optimisation!
Have fun! Bye! ;-)
Download the toolchain here.
Download POT 5.6.0-beta1 for Raspbian Jessie Lite Pi3 here (md5: 0eec41ef02e9369fc7e569030b8ff868).
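Since the packages are published with an md5, it is worth checking the download before unpacking it. This is only a minimal sketch in Python: the archive file name is an assumption (use whatever name your download has), while the md5 is the one quoted above.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large images do not need to fit in RAM.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_md5):
    """Compare a file's digest with the md5 quoted next to the download link."""
    return md5_of_file(path) == published_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical local file name; the md5 is the one quoted in the post.
    archive = Path("pot-5.6.0-beta1-pi3.tar")
    if archive.exists():
        ok = matches_published_md5(archive, "0eec41ef02e9369fc7e569030b8ff868")
        print("checksum ok" if ok else "checksum mismatch")
```

The same helper works for the other packages and images: only the file name and the quoted md5 change.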
A few months back I decided to try to implement hardware decoding in WebKit. Unfortunately this task is always pretty long and complex for many reasons. I found the time to draft an implementation for WebKit1, which is pretty useless as WebKit1 in Qt is only used outside QML and JS is executed in the main thread. Unfortunately I never found the time to implement this in WebKit2, which runs in QML and is suitable for more fluid UIs. This was the result:
This is how YouTube was running with this implementation:
Now Qt has deprecated QtWebKit and is working heavily on QtWebEngine, which is built on Chromium, so I wanted to try this road. Unfortunately this kind of thing always takes a lot of time, and I don't typically have that much, but I was able to start and get something done already.
Writing a complete solution in Chromium to decode and render video takes a lot of time, so I thought of a shortcut: creating a custom VDA (Video Decode Accelerator) that loads the POT library and reuses its entire codebase to implement decoding and rendering with few modifications. This proved to be possible and now I get something on the screen.
So, to summarise: a little patch to Chromium is needed to create a VDA that dynamically loads POT library into memory and uses it with a common interface. Data and calls are translated to POT structures and are sent to POT, which then processes the buffers properly. The result of the decode operation is then sent back to the VDA through the same interface and textures are then sent to Chromium for rendering.
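The flow summarised above can be modelled in a few lines. This is not the Chromium patch nor the POT interface: it is a conceptual sketch in Python, and every class, method and field name in it is hypothetical. It only shows the shape of the idea: the adapter translates the browser's buffers into the library's structures, and the decoded textures come back through a callback on the same interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BitstreamBuffer:
    """Stand-in for the encoded buffer the browser hands to a decoder."""
    buffer_id: int
    payload: bytes


@dataclass
class PotPacket:
    """Hypothetical structure understood by the decoding library."""
    sequence: int
    data: bytes


class FakePotLibrary:
    """Pretend decoder: answers each packet with a texture id via callback."""

    def __init__(self):
        self._on_texture = None

    def set_texture_callback(self, callback):
        self._on_texture = callback

    def push_packet(self, packet):
        # A real library would decode asynchronously; here we reply at once.
        self._on_texture(packet.sequence, 1000 + packet.sequence)


class AdapterDecodeAccelerator:
    """Hypothetical VDA-like adapter sitting between browser and library."""

    def __init__(self, library, picture_ready: Callable[[int, int], None]):
        self._library = library
        self._picture_ready = picture_ready
        library.set_texture_callback(self._on_texture)

    def decode(self, buffer: BitstreamBuffer):
        # Translate the browser-side buffer into the library's own structure.
        self._library.push_packet(PotPacket(buffer.buffer_id, buffer.payload))

    def _on_texture(self, sequence, texture_id):
        # Hand the decoded texture back to the browser side for rendering.
        self._picture_ready(sequence, texture_id)


if __name__ == "__main__":
    vda = AdapterDecodeAccelerator(
        FakePotLibrary(), lambda buf, tex: print("picture", buf, "texture", tex)
    )
    vda.decode(BitstreamBuffer(1, b"\x00\x00\x01"))
```

Keeping the adapter this thin is the point of the shortcut: the decode and render logic stays in the library, and the browser side only sees buffers going in and textures coming out.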
Still many problems remain open, there is much to be done yet as you can see from the video, but something is drawn. Have a look at the demo:
A new version of Qt is out and some interesting changes are being developed. Interesting work was done on WebEngine. This package contains an experimental build of Qt 5.8.0-rc1 with the POT driver to provide hardware acceleration for video decoding and rendering. POT was updated to build on the new plugin architecture of Qt and includes ffmpeg 3.2.2. For info on the content of the package refer to this article. The YouTube player still runs on WebKit 1 and so is still hardly useful.
Please report issues on github. Have fun ;-)
Download POT 5.5.0-beta1 for Raspbian Jessie Lite Pi2 (also tested on Pi3) here (md5: 7ca1b961ab8c70176f6aeb13d0bc4f9a).
The latest version of POT (5.3.0-beta1) implements some kind of buffering which makes it also possible to do some streaming. ffmpeg provides what is needed to work on protocols like HTTP. This video is a little demo showing how simple it is to stream from VLC on HTTP, remuxing to mpeg-ts, and decoding/rendering on Raspberry Pi in an accelerated Qt application using POT.
Such a feature can be useful when dealing with streams coming from many sources, like an IP camera for instance.
Have fun! Bye ;-)
This is an experimental version of POT that includes a few new features. The driver now includes experimental playback of remote content, with proper buffering of the media, and the QtWebKit library is able to play videos using proper hardware acceleration.
Features
The driver can now be used through the regular Qt/QML interface to play content over protocols different from file://. I only tested HTTP but others should be available.
This new feature allowed me to also implement a new player in QtWebKit using the POT driver to stream content to the regular Qt QWebView. Unfortunately I could only find the time to implement this in the WebKit 1 branch, not in WebKit 2, which requires more work. This makes the implementation hardly useful, but it is a start. If anyone wants to contribute, this is where to start. This video shows the new implementation in action:
As QtWebKit now uses the GPU properly, I could also implement a YouTube player app using the same QWebView with just a few lines of regular Qt code:
In the video you can also see an application I wrote to implement a YouTube fullscreen player using the YouTube iframe API. Such a sample app is also included in the package.
Of course the real target would be to implement this in QtWebEngine. I was never able to find a version of QtWebEngine with video acceleration for Pi. Does anyone know of one?
Package
The package now includes:
build_valgrind.tar: a build of valgrind to analyse your code;
libpiomxtextures_qmlutils.so: QML plugin to provide a video probe in QML;
piomxtextures_browser_we: an application that tests the QtWebEngine module (run passing a URL, ./piomxtextures_browser_we http://www.youtube.com);
piomxtextures_browser_wk: an application that tests the QtWebKit module:
you can run on WebKit 1 like ./piomxtextures_browser_wk http://www.youtube.com;
or you can run on WebKit 2 like ./piomxtextures_browser_wk --wk2 http://www.youtube.com;
piomxtextures_pocplayer: a sample QML player;
piomxtextures_pocplayer_yt: a youtube sample player (run passing a video ID like ./piomxtextures_pocplayer_yt 71UvXMzVgx4);
qtdeps.tar: the usual libs needed by Qt and POT;
Qt-rasp2-5.6.0.tar: a build of Qt 5.6.0 stable including:
regular Qt 5.6.0 modules;
untested bluetooth module including BLE support;
untested MySQL plugin;
untested QtFtp module;
untested Qt hat tools module;
QtWebEngine;
QtWebKit.
Please report any bug you find on github. Don't use in production. Sources will be available shortly. Have fun! ;-)
Download POT 5.3.0-beta1 for Raspbian Jessie Lite Pi2 (also tested on Pi3) here (md5: edbe0ad2a552a5c8280e3876d16b237d).
This is a follow-up to the image provided in this article. The concept is identical, but this firmware is smaller, is based on Raspbian Jessie Lite and includes updated software. The previous firmware was 1.9GB, while this one is 759MB. This is the current setup (but I encourage you to update to the latest possible firmware):
pi@raspberrypi:~ $ vcgencmd version
Feb 25 2016 14:25:47
Copyright (c) 2012 Broadcom
version dea971b793dd6cf89133ede5a8362eb77e4f4ade (clean) (release)
pi@raspberrypi:~ $ uname -a
Linux raspberrypi 4.1.18-v7+ #846 SMP Thu Feb 25 14:22:53 GMT 2016 armv7l GNU/Linux
Download the image here (torrent) (md5: 06d75d03f63674350be6cb807850bf09).
(Please be patient while downloading and decompressing)
The image includes:
Qt 5.6.0-rc1: built for armv7 with neon optimisations, including support for touch screens, gstreamer, libinput, tslib, and Bluetooth/Bluetooth BLE using bluez (complete configuration below).
QtWebKit 5.6.0: includes WebKit 1 and WebKit 2.
QtWebEngine 5.6.0-rc1: multimedia only based on ffmpeg.
Dependent libs: evdev, icu (not the one provided by raspbian), mtdev, libts.
Samples: samples from the repo to test POT.
POT: POT version 5.2.1 including ffmpeg 2.8.6.
Samples
To have a quick look at multimedia in Qt/QML, test with these commands:
This is an experimental build of POT 5.1.0 including support for remote streaming. I tested it for HTTP/HTTPS with YouTube but it should work with other protocols supported by ffmpeg as well.
In case someone is interested in testing it please let me know. This is a build for Pi2 on Qt 5.6.0-beta1. To quickly test you can use POCPlayer:
Download PiOmxTextures 5.2.0 for Raspbian Jessie Pi2 here (extraction code is: cd4f).
Please file bug reports in case of issues and have fun! Bye ;-)
NOTE: If you need to test the samples please keep in mind that Qt 5.6.0 seems to have changed the order of the params to provide to qmlscene: the qml file must appear at the end.
This is an experimental build of POT 5.1.0 running on Qt 5.6.0-beta1 built for Raspbian Jessie for armv7 (Raspberry Pi 2). Do not use in production. The package includes:
Qt 5.6.0-beta1 optimised for armv7.
QtConnectivity including BLE support.
QtWebKit including both WebKit 1 and WebKit 2. Video decoding is implemented in WebKit with gstreamer (won't even do 720p properly).
QtWebEngine with unaccelerated video decoding using ffmpeg.
QtFtp.
valgrind-3.12.0.SVN: using it on PiOmxTextures requires disabling libarmmem.so in /etc/ld.so.preload. Also, the default valgrind in Raspbian was not working for me.
POT 5.1.0 for accelerated playback in Qt.
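The valgrind entry in the list above asks for libarmmem.so to be disabled in /etc/ld.so.preload. A minimal sketch of that edit in Python follows. It assumes the usual approach of commenting the entry out with a leading #, and the library path in the sample is an assumption about a stock Raspbian install. The function only transforms text; writing the result back to /etc/ld.so.preload (as root, after a backup) is left to you.

```python
def disable_preload_entries(preload_text, needle="libarmmem"):
    """Comment out every active line that mentions `needle`."""
    result = []
    for line in preload_text.splitlines():
        stripped = line.strip()
        if needle in stripped and not stripped.startswith("#"):
            # Prefix with '#' so the dynamic loader skips this entry.
            result.append("#" + line)
        else:
            result.append(line)
    trailing_newline = "\n" if preload_text.endswith("\n") else ""
    return "\n".join(result) + trailing_newline


if __name__ == "__main__":
    # Assumed content of a stock /etc/ld.so.preload on Raspbian.
    sample = "/usr/lib/arm-linux-gnueabihf/libarmmem.so\n"
    print(disable_preload_entries(sample), end="")
```

Running the function twice gives the same result as running it once, so it is safe to apply again; remember to restore the entry when you are done with valgrind.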
No accelerated video playback is implemented in QtWebKit nor QtWebEngine in this build. CSS animations instead are properly accelerated when OpenGL rendering is used.
Download PiOmxTextures 5.1.0 for Raspbian Jessie Pi2 here (extraction code is: 508d).
I tried to do this a few months ago actually, but I got lost in the huge amount of code and architecture variants of WebKit. Now I convinced myself it would be too interesting to see it working. I don't have much spare time, so I wanted to start with the path of least resistance and then, depending on the time available, continue climbing. This activity is very time consuming, so I do not know how far I'll get.
This is however the first stage: using POT to implement 1080p hardware accelerated zero-copy video playback in the Qt port of WebKit1. No QML here. Of no use, but it somehow works. I read there are far better results out there, but I learned much by doing this, and I had fun :-)
Of course, as you can see, there is much much work being done in the UI thread. JavaScript is mostly run in the main thread, making the framerate pretty bad on heavy websites. That is why WebKit2 is there after all :-)
This is an image of Raspbian Jessie ready to test (and use) the Qt Multimedia Backend POT. This image is optimised for maximum performance (see the demos below). This is the current setup:
pi@raspberrypi ~ $ sudo /opt/vc/bin/vcgencmd version
Nov 11 2015 21:31:07
Copyright (c) 2012 Broadcom
version 54011a8ad59a9ae1c40bd07cddd9bcf90e779b66 (clean) (release)
pi@raspberrypi ~ $ uname -a
Linux raspberrypi 4.1.13-v7+ #826 SMP PREEMPT Fri Nov 13 20:19:03 GMT 2015 armv7l GNU/Linux
pi@raspberrypi ~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 8.0 (jessie)
Release: 8.0
Codename: jessie
Qt 5.5.1 built for armv7 with neon optimisations, including support for touch screens, gstreamer, libinput, X11 (no multimedia), tslib, and Bluetooth/Bluetooth BLE using bluez (complete configuration below). Modules provided include QtWebKit (no multimedia support).
Dependent libs: evdev, icu (not the one provided by raspbian), mtdev, libts.
POCPlayer and QML samples to show the performance of the backend and test bugs.
POT library including ffmpeg 2.7.2.
Everything is built for armv7 including neon optimisations. To quickly test the performance there are a few things you can launch:
This is the result you should expect with a proper overclocking:
It will show you a continuous animation while running two 720p videos concurrently. To improve the frame rate of the animations you can also overclock your Pi2 (see below).
Other samples to test the features of the plugin are stored in /home/pi/samples:
Another interesting test is running the qmlvideofx example included in the Qt sources. This will show you the performance of the effects applied on the video using shaders. You can find an executable in /usr/local/Qt-rasp2-5.5.1/examples/multimedia/video/qmlvideofx/qmlvideofx. This shows approximately what you should expect (the video shows the result on Pi1, Pi2 is just a little better):
Overclocking
The Pi can be overclocked by tuning parameters in /boot/config.txt. I provided a configuration that I find useful:
#force_turbo=1 # Voids Warranty!
#boot_delay=1 # Helps to avoid sdcard corruption when force_turbo is enabled.
#arm_freq=1100
#sdram_freq=450
#core_freq=550
#over_voltage=4
#temp_limit=80 # Will throttle to default clock speed if hit.
In the image this is all disabled. If you intend to enable these settings, please keep in mind that I don't take responsibility for any consequence. Please refer to the documentation for this procedure.
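Before touching the file, it can help to list which of these settings are active and which are still commented out. The sketch below, in Python, parses text in the key=value form shown above; the function names are mine, and the parsing is deliberately naive (a prose comment that happens to contain "word = value" would be picked up too).

```python
import re

# Optional leading '#', then name=value; a trailing '# comment' is ignored.
SETTING = re.compile(r"^\s*(#?)\s*([A-Za-z_]\w*)\s*=\s*([^#\s]+)")

SAMPLE_CONFIG = """\
#force_turbo=1 # Voids Warranty!
#boot_delay=1 # Helps to avoid sdcard corruption when force_turbo is enabled.
#arm_freq=1100
#sdram_freq=450
#core_freq=550
#over_voltage=4
#temp_limit=80 # Will throttle to default clock speed if hit.
"""


def parse_config(text):
    """Return {name: (value, enabled)} for each key=value line."""
    settings = {}
    for line in text.splitlines():
        match = SETTING.match(line)
        if match:
            hashed, name, value = match.groups()
            settings[name] = (value, hashed == "")
    return settings


def enabled_settings(text):
    """Return only the settings that are not commented out."""
    return {
        name: value
        for name, (value, enabled) in parse_config(text).items()
        if enabled
    }


if __name__ == "__main__":
    for name, (value, enabled) in parse_config(SAMPLE_CONFIG).items():
        print(name, value, "enabled" if enabled else "disabled")
```

With the configuration shipped in the image, enabled_settings returns an empty dictionary, which matches the statement that everything is disabled by default.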
Have fun! Bye ;-)
Qt Configuration
Configure summary
Building on: linux-g++ (x86_64, CPU features: mmx sse sse2)
Building for: devices/linux-rasp-pi2-g++ (arm, CPU features: neon)
Platform notes:
- Also available for Linux: linux-clang linux-kcc linux-icc linux-cxx
As someone asked, I tried to port POT to the recently released Raspbian Jessie. Everything works pretty much the same, except for a few changes. Unfortunately I needed to rebuild icu as well: the system build did not seem compatible, as the compiler was updated to 4.9 in jessie. The new icu is provided in qtdeps inside the package. Everything seems to work as expected. Please report problems to github issues. For the moment only a build for Pi2 is provided.
Download PiOmxTextures 5.0.0 for Pi2 (Raspbian Jessie) here (extraction code is: 1e45).
NOTE:
The Qt build includes support for XCB platform but I never tested POT on XCB. This is the configuration of the Qt 5.5.0 build included in the package:
Configure summary Building on: linux-g++ (x86_64, CPU features: mmx sse sse2) Building for: devices/linux-rasp-pi2-g++ (arm, CPU features: neon) Platform notes: - Also available for Linux: linux-kcc linux-icc linux-cxx qmake vars .......... styles += mac fusion windows QT_CFLAGS_GLIB = -pthread -I/opt/rpi/sysroot/usr/include/glib-2.0 -I/opt/rpi/sysroot/usr/lib/arm-linux-gnueabihf/glib-2.0/include QT_LIBS_GLIB = -L/opt/rpi/sysroot/usr/lib/arm-linux-gnueabihf -lgthread-2.0 -pthread -lglib-2.0 QT_CFLAGS_PULSEAUDIO = -D_REENTRANT -I/opt/rpi/sysroot/usr/include/glib-2.0 -I/opt/rpi/sysroot/usr/lib/arm-linux-gnueabihf/glib-2.0/include QT_LIBS_PULSEAUDIO = -L/opt/rpi/sysroot/usr/lib/arm-linux-gnueabihf -lpulse-mainloop-glib -lpulse -lglib-2.0 QMAKE_CFLAGS_FONTCONFIG = -I/opt/rpi/sysroot/usr/include/freetype2 QMAKE_LIBS_FONTCONFIG = -L/opt/rpi/sysroot/usr/lib/arm-linux-gnueabihf -lfontconfig -lfreetype QMAKE_INCDIR_LIBUDEV = QMAKE_LIBS_LIBUDEV = -L/opt/rpi/sysroot/usr/lib/arm-linux-gnueabihf -ludev QMAKE_INCDIR_XKBCOMMON_EVDEV = QMAKE_LIBS_XKBCOMMON_EVDEV = -L/opt/rpi/sysroot/usr/lib/arm-linux-gnueabihf -lxkbcommon QMAKE_LIBINPUT_VERSION_MAJOR = 1 QMAKE_LIBINPUT_VERSION_MINOR = 0 QMAKE_INCDIR_LIBINPUT = /opt/rpi/sysroot/home/pi/qtdeps/include QMAKE_LIBS_LIBINPUT = -L/opt/rpi/sysroot/home/pi/qtdeps/lib -linput QMAKE_LIBXI_VERSION_MAJOR = 1 QMAKE_LIBXI_VERSION_MINOR = 7 QMAKE_LIBXI_VERSION_PATCH = 4 QMAKE_X11_PREFIX = /usr QMAKE_XKB_CONFIG_ROOT = /usr/share/X11/xkb QMAKE_CFLAGS_XCB = QMAKE_LIBS_XCB = -L/opt/rpi/sysroot/usr/lib/arm-linux-gnueabihf -lxcb-sync -lxcb-xfixes -lxcb-render -lxcb-randr -lxcb-image -lxcb-shm -lxcb-keysyms -lxcb-icccm -lxcb-shape -lxcb INCLUDEPATH += "/opt/rpi/sysroot/home/pi/qtdeps/include" LIBS += -L"/opt/rpi/sysroot/home/pi/qtdeps/lib" sql-drivers = sql-plugins = sqlite qmake switches ......... Build options: Configuration .......... 
accessibility accessibility-atspi-bridge alsa audio-backend c++11 clock-gettime clock-monotonic compile_examples concurrent cross_compile cups dbus egl eglfs eglfs_brcm enable_new_dtags evdev eventfd fontconfig full-config getaddrinfo getifaddrs glib gstreamer-1.0 harfbuzz iconv icu inotify ipv6ifname large-config largefile libinput libproxy libudev linuxfb medium-config minimal-config mremap mtdev neon nis opengl opengles2 openssl pcre png posix_fallocate pulseaudio qpa qpa reduce_exports release rpath shared small-config system-freetype system-png system-zlib tslib use_gold_linker xcb xcb-glx xcb-plugin xcb-render xcb-xlib xinput2 xkbcommon-evdev xkbcommon-qt xlib xrender Build parts ............ libs examples Mode ................... release Using sanitizer(s)...... none Using C++11 ............ yes Using gold linker....... yes Using new DTAGS ........ yes Using PCH .............. no Target compiler supports: Neon ................. yes Qt modules and options: Qt D-Bus ............... yes (loading dbus-1 at runtime) Qt Concurrent .......... yes Qt GUI ................. yes Qt Widgets ............. yes Large File ............. yes QML debugging .......... yes Use system proxies ..... no Support enabled for: Accessibility .......... yes ALSA ................... yes CUPS ................... yes Evdev .................. yes FontConfig ............. yes FreeType ............... yes (system library) Glib ................... yes GStreamer .............. yes (1.0) GTK theme .............. no HarfBuzz ............... yes (bundled copy) Iconv .................. yes ICU .................... yes Image formats: GIF .................. yes (plugin, using bundled copy) JPEG ................. yes (plugin, using bundled copy) PNG .................. yes (in QtGui, using system library) journald ............... no libinput................ yes mtdev .................. yes (system library) Networking: getaddrinfo .......... yes getifaddrs ........... yes IPv6 ifname .......... 
yes libproxy.............. yes OpenSSL .............. yes (loading libraries at run-time) NIS .................... yes OpenGL / OpenVG: EGL .................. yes OpenGL ............... yes (OpenGL ES 2.0+) OpenVG ............... no PCRE ................... yes (bundled copy) pkg-config ............. yes PulseAudio ............. yes QPA backends: DirectFB ............. no EGLFS ................ yes EGLFS i.MX6....... . no EGLFS KMS .......... no EGLFS Mali ......... no EGLFS Raspberry Pi . yes EGLFS X11 .......... no LinuxFB .............. yes XCB .................. yes (system library) EGL on X ........... no GLX ................ yes MIT-SHM ............ yes Xcb-Xlib ........... yes Xcursor ............ yes (loaded at runtime) Xfixes ............. yes (loaded at runtime) Xi ................. no Xi2 ................ yes Xinerama ........... yes (loaded at runtime) Xrandr ............. yes (loaded at runtime) Xrender ............ yes XKB ................ yes XShape ............. yes XSync .............. yes XVideo ............. yes Session management ..... yes SQL drivers: DB2 .................. no InterBase ............ no MySQL ................ no OCI .................. no ODBC ................. no PostgreSQL ........... no SQLite 2 ............. no SQLite ............... yes (plugin, using bundled copy) TDS .................. no tslib .................. yes udev ................... yes xkbcommon-x11........... yes (bundled copy, XKB config root: /usr/share/X11/xkb) xkbcommon-evdev......... yes zlib ................... yes (system library)
This package includes mostly bug fixes over beta2. You can consider it stable in the sense that no one complained about severe bugs for a few days. You always have to test before going to production. Please file bugs on github if needed. Refer to previous articles on how to use it.
Download PiOmxTextures 5.0.0 for Pi1 here (extraction code is: 3466). Download PiOmxTextures 5.0.0 for Pi2 here (extraction code is: ddd0).
A few weeks ago I came across the interesting task of adding deinterlacing to the POT rendering pipeline.
I had never looked into deinterlacing with POT, but the code was actually present in the repo. By switching the proper enum in the sources deinterlacing could be enabled. The resulting performance was however pretty bad when applied to 1080i videos.
This demo is, instead, the result of a few quick patches:
After a quick investigation it seemed strangely clear that the reason was too much work being done by the VPU. When running without deinterlacing, this is approximately the situation of the VPUs as reported by vcdbg:
[...] set yrange [-0.1:30.1] set multiplot set title 'Core 1 (VPU0)' set ytics ("Idle" 0, "Int timer0" 1, "Int timer2" 2, "Int codec0" 3, "Int codec2" 4, "Int 3d" 5, "Int msync2" 6, "Int dma3" 7, "Int 0x5e" 8, "Int 0x5f" 9, "Int scaler" 10, "DISPSERVX" 11, "SysTimer" 12, "Dispmanx Ch1 EO" 13, "AUDSRV" 14, "HDMI_HOTPLUG_TM" 15, "HDMI TASK" 16, "KHRN_S" 17, "powerman" 18, "temp_check" 19, "v3d_gfxh16_thre" 20, "VCHIQ-0" 21, "VCHIQr-0" 22, "ILCS_VC" 23, "ILClock" 24, "ILVDecode" 25, "ILADecode" 26, "ILEGLRender" 27, "ILAMixer" 28, "ILVScheduler" 29, "ILARender" 30) plot 'task.dat' index 30 with boxes lt 5 title 'ILARender 3.39%', 'task.dat' index 29 with boxes lt 1 title 'ILVScheduler 0.26%', 'task.dat' index 28 with boxes lt 2 title 'ILAMixer 1.19%', 'task.dat' index 27 with boxes lt 3 title 'ILEGLRender 46.3%', 'task.dat' index 26 with boxes lt 4 title 'ILADecode 0.64%', 'task.dat' index 25 with boxes lt 11 title 'ILVDecode 0.36%', 'task.dat' index 24 with boxes lt 12 title 'ILClock 0.18%', 'task.dat' index 23 with boxes lt 13 title 'ILCS_VC 0.30%', 'task.dat' index 22 with boxes lt 14 title 'VCHIQr-0 0.01%', 'task.dat' index 21 with boxes lt 15 title 'VCHIQ-0 0.30%', 'task.dat' index 20 with boxes lt 10 title 'v3d_gfxh16_thre 0.02%', 'task.dat' index 19 with boxes lt 9 title 'temp_check 0.02%', 'task.dat' index 18 with boxes lt 5 title 'powerman 0.15%', 'task.dat' index 17 with boxes lt 1 title 'KHRN_S 5.13%', 'task.dat' index 16 with boxes lt 2 title 'HDMI TASK 0.04%', 'task.dat' index 15 with boxes lt 3 title 'HDMI_HOTPLUG_TM 0.01%', 'task.dat' index 14 with boxes lt 4 title 'AUDSRV 0.08%', 'task.dat' index 13 with boxes lt 11 title 'Dispmanx Ch1 EO 0.08%', 'task.dat' index 12 with boxes lt 12 title 'SysTimer 0.16%', 'task.dat' index 11 with boxes lt 13 title 'DISPSERVX 0.14%', 'task.dat' using 1:2:($3/2) index 10 with xerrorbars lt 14 title 'Int scaler 0.07%', 'task.dat' using 1:2:($3/2) index 9 with xerrorbars lt 15 title 'Int 0x5f 0.02%', 'task.dat' using 
1:2:($3/2) index 8 with xerrorbars lt 10 title 'Int 0x5e 0.10%', 'task.dat' using 1:2:($3/2) index 7 with xerrorbars lt 9 title 'Int dma3 0.17%', 'task.dat' using 1:2:($3/2) index 6 with xerrorbars lt 5 title 'Int msync2 0.00%', 'task.dat' using 1:2:($3/2) index 5 with xerrorbars lt 1 title 'Int 3d 0.07%', 'task.dat' using 1:2:($3/2) index 4 with xerrorbars lt 2 title 'Int codec2 0.11%', 'task.dat' using 1:2:($3/2) index 3 with xerrorbars lt 3 title 'Int codec0 0.02%', 'task.dat' using 1:2:($3/2) index 2 with xerrorbars lt 4 title 'Int timer2 0.10%', 'task.dat' using 1:2:($3/2) index 1 with xerrorbars lt 11 title 'Int timer0 0.03%', 0 title 'Idle 40.4%' set nomultiplot pause -1 'Hit return to continue' set yrange [-0.1:3.1] set multiplot set title 'Core 2 (VPU1)' set ytics ("Idle" 0, "Int msync3" 1, "khrn_llat_threa" 2, "H264#0 Outer" 3) plot 'task.dat' index 34 with boxes lt 5 title 'H264#0 Outer 1.96%', 'task.dat' index 33 with boxes lt 1 title 'khrn_llat_threa 1.48%', 'task.dat' using 1:2:($3/2) index 32 with xerrorbars lt 2 title 'Int msync3 0.02%', 0 title 'Idle 96.5%' set nomultiplot pause -1 'Hit return to continue' [...]
By forcing deinterlacing we get this:
[...] set yrange [-0.1:31.1] set multiplot set title 'Core 1 (VPU0)' set ytics ("Idle" 0, "Int timer0" 1, "Int timer2" 2, "Int codec0" 3, "Int codec2" 4, "Int 3d" 5, "Int msync2" 6, "Int dma3" 7, "Int 0x5e" 8, "Int 0x5f" 9, "Int scaler" 10, "DISPSERVX" 11, "SysTimer" 12, "Dispmanx Ch1 EO" 13, "AUDSRV" 14, "HDMI_HOTPLUG_TM" 15, "HDMI TASK" 16, "KHRN_S" 17, "mbox_read" 18, "powerman" 19, "temp_check" 20, "v3d_gfxh16_thre" 21, "VCHIQ-0" 22, "VCHIQr-0" 23, "ILCS_VC" 24, "ILClock" 25, "ILVDecode" 26, "ILADecode" 27, "ILAMixer" 28, "ILEGLRender" 29, "ILVScheduler" 30, "ILARender" 31) plot 'task.dat' index 31 with boxes lt 5 title 'ILARender 3.15%', 'task.dat' index 30 with boxes lt 1 title 'ILVScheduler 0.39%', 'task.dat' index 29 with boxes lt 2 title 'ILEGLRender 84.9%', 'task.dat' index 28 with boxes lt 3 title 'ILAMixer 1.42%', 'task.dat' index 27 with boxes lt 4 title 'ILADecode 0.95%', 'task.dat' index 26 with boxes lt 11 title 'ILVDecode 0.35%', 'task.dat' index 25 with boxes lt 12 title 'ILClock 0.36%', 'task.dat' index 24 with boxes lt 13 title 'ILCS_VC 0.36%', 'task.dat' index 23 with boxes lt 14 title 'VCHIQr-0 0.02%', 'task.dat' index 22 with boxes lt 15 title 'VCHIQ-0 0.34%', 'task.dat' index 21 with boxes lt 10 title 'v3d_gfxh16_thre 0.02%', 'task.dat' index 20 with boxes lt 9 title 'temp_check 0.02%', 'task.dat' index 19 with boxes lt 5 title 'powerman 0.14%', 'task.dat' index 18 with boxes lt 1 title 'mbox_read 0.14%', 'task.dat' index 17 with boxes lt 2 title 'KHRN_S 5.10%', 'task.dat' index 16 with boxes lt 3 title 'HDMI TASK 0.03%', 'task.dat' index 15 with boxes lt 4 title 'HDMI_HOTPLUG_TM 0.01%', 'task.dat' index 14 with boxes lt 11 title 'AUDSRV 0.08%', 'task.dat' index 13 with boxes lt 12 title 'Dispmanx Ch1 EO 0.08%', 'task.dat' index 12 with boxes lt 13 title 'SysTimer 0.15%', 'task.dat' index 11 with boxes lt 14 title 'DISPSERVX 0.15%', 'task.dat' using 1:2:($3/2) index 10 with xerrorbars lt 15 title 'Int scaler 0.07%', 'task.dat' using 
1:2:($3/2) index 9 with xerrorbars lt 10 title 'Int 0x5f 0.03%', 'task.dat' using 1:2:($3/2) index 8 with xerrorbars lt 9 title 'Int 0x5e 0.11%', 'task.dat' using 1:2:($3/2) index 7 with xerrorbars lt 5 title 'Int dma3 0.19%', 'task.dat' using 1:2:($3/2) index 6 with xerrorbars lt 1 title 'Int msync2 0.01%', 'task.dat' using 1:2:($3/2) index 5 with xerrorbars lt 2 title 'Int 3d 0.06%', 'task.dat' using 1:2:($3/2) index 4 with xerrorbars lt 3 title 'Int codec2 0.13%', 'task.dat' using 1:2:($3/2) index 3 with xerrorbars lt 4 title 'Int codec0 0.01%', 'task.dat' using 1:2:($3/2) index 2 with xerrorbars lt 11 title 'Int timer2 0.11%', 'task.dat' using 1:2:($3/2) index 1 with xerrorbars lt 12 title 'Int timer0 0.04%', 0 title 'Idle 1.01%' set nomultiplot pause -1 'Hit return to continue' set yrange [-0.1:6.1] set multiplot set title 'Core 2 (VPU1)' set ytics ("Idle" 0, "Int msync3" 1, "SysTimer2" 2, "khrn_llat_threa" 3, "H264#0 Outer" 4, "ILARender" 5, "ILImageFX" 6) plot 'task.dat' index 38 with boxes lt 5 title 'ILImageFX 21.5%', 'task.dat' index 37 with boxes lt 1 title 'ILARender 0.00%', 'task.dat' index 36 with boxes lt 2 title 'H264#0 Outer 1.58%', 'task.dat' index 35 with boxes lt 3 title 'khrn_llat_threa 1.54%', 'task.dat' index 34 with boxes lt 4 title 'SysTimer2 0.02%', 'task.dat' using 1:2:($3/2) index 33 with xerrorbars lt 11 title 'Int msync3 0.02%', 0 title 'Idle 75.2%' set nomultiplot pause -1 'Hit return to continue' [...]
It is clear that one VPU is completely saturated by the EGL component.
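Comparing these dumps by eye is tedious, so here is a small sketch in Python that pulls the per-task load out of the plot titles. It assumes the dump keeps the shape shown above (a "set title" per core, then plot entries whose titles end in a percentage); the function names are mine, and the sample is a shortened excerpt of the deinterlaced dump.

```python
import re

CORE = re.compile(r"set title '([^']+)'")
LOAD = re.compile(r"title '([^']+?) (\d+(?:\.\d+)?)%'")

# Shortened excerpt of the dump taken with deinterlacing forced.
SAMPLE = (
    "set title 'Core 1 (VPU0)' plot "
    "'task.dat' index 29 with boxes lt 2 title 'ILEGLRender 84.9%', "
    "'task.dat' index 17 with boxes lt 2 title 'KHRN_S 5.10%', "
    "'task.dat' using 1:2:($3/2) index 8 with xerrorbars lt 9 title 'Int 0x5e 0.11%', "
    "0 title 'Idle 1.01%' set nomultiplot "
    "set title 'Core 2 (VPU1)' plot "
    "'task.dat' index 38 with boxes lt 5 title 'ILImageFX 21.5%', "
    "0 title 'Idle 75.2%' set nomultiplot"
)


def task_loads(script):
    """Return {core_title: {task_name: percent}} from a gnuplot dump."""
    # With one capture group, split gives [preamble, title1, body1, title2, ...].
    parts = CORE.split(script)
    loads = {}
    for title, body in zip(parts[1::2], parts[2::2]):
        loads[title] = {name: float(pct) for name, pct in LOAD.findall(body)}
    return loads


def busiest(loads, core):
    """Return (task_name, percent) of the heaviest non-idle task on a core."""
    tasks = {name: pct for name, pct in loads[core].items() if name != "Idle"}
    return max(tasks.items(), key=lambda item: item[1])


if __name__ == "__main__":
    loads = task_loads(SAMPLE)
    for core in loads:
        print(core, busiest(loads, core), "idle:", loads[core]["Idle"])
```

Fed with the full dumps, this makes the difference between the runs easy to read: the busiest task on each core and the remaining idle time are all that is needed to see where the saturation comes from.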
After discussing this issue here, I added a couple of patches to POT that make it possible to reduce the frame rate of the deinterlaced stream and to move the deinterlacing algorithm from the VPU to the QPU by setting three environment variables before startup:
[...] set yrange [-0.1:30.1] set multiplot set title 'Core 1 (VPU0)' set ytics ("Idle" 0, "Int timer0" 1, "Int timer2" 2, "Int codec0" 3, "Int codec2" 4, "Int 3d" 5, "Int msync2" 6, "Int dma3" 7, "Int 0x5e" 8, "Int 0x5f" 9, "Int scaler" 10, "DISPSERVX" 11, "SysTimer" 12, "Dispmanx Ch1 EO" 13, "AUDSRV" 14, "HDMI_HOTPLUG_TM" 15, "HDMI TASK" 16, "KHRN_S" 17, "powerman" 18, "temp_check" 19, "v3d_gfxh16_thre" 20, "VCHIQ-0" 21, "VCHIQr-0" 22, "ILCS_VC" 23, "ILClock" 24, "ILVDecode" 25, "ILADecode" 26, "ILAMixer" 27, "ILEGLRender" 28, "ILVScheduler" 29, "ILARender" 30) plot 'task.dat' index 30 with boxes lt 5 title 'ILARender 3.81%', 'task.dat' index 29 with boxes lt 1 title 'ILVScheduler 0.26%', 'task.dat' index 28 with boxes lt 2 title 'ILEGLRender 55.3%', 'task.dat' index 27 with boxes lt 3 title 'ILAMixer 1.49%', 'task.dat' index 26 with boxes lt 4 title 'ILADecode 0.91%', 'task.dat' index 25 with boxes lt 11 title 'ILVDecode 0.39%', 'task.dat' index 24 with boxes lt 12 title 'ILClock 0.19%', 'task.dat' index 23 with boxes lt 13 title 'ILCS_VC 0.37%', 'task.dat' index 22 with boxes lt 14 title 'VCHIQr-0 0.02%', 'task.dat' index 21 with boxes lt 15 title 'VCHIQ-0 0.38%', 'task.dat' index 20 with boxes lt 10 title 'v3d_gfxh16_thre 0.02%', 'task.dat' index 19 with boxes lt 9 title 'temp_check 0.02%', 'task.dat' index 18 with boxes lt 5 title 'powerman 0.15%', 'task.dat' index 17 with boxes lt 1 title 'KHRN_S 4.95%', 'task.dat' index 16 with boxes lt 2 title 'HDMI TASK 0.04%', 'task.dat' index 15 with boxes lt 3 title 'HDMI_HOTPLUG_TM 0.01%', 'task.dat' index 14 with boxes lt 4 title 'AUDSRV 0.08%', 'task.dat' index 13 with boxes lt 11 title 'Dispmanx Ch1 EO 0.08%', 'task.dat' index 12 with boxes lt 12 title 'SysTimer 0.17%', 'task.dat' index 11 with boxes lt 13 title 'DISPSERVX 0.13%', 'task.dat' using 1:2:($3/2) index 10 with xerrorbars lt 14 title 'Int scaler 0.08%', 'task.dat' using 1:2:($3/2) index 9 with xerrorbars lt 15 title 'Int 0x5f 0.03%', 'task.dat' using 
1:2:($3/2) index 8 with xerrorbars lt 10 title 'Int 0x5e 0.12%', 'task.dat' using 1:2:($3/2) index 7 with xerrorbars lt 9 title 'Int dma3 0.18%', 'task.dat' using 1:2:($3/2) index 6 with xerrorbars lt 5 title 'Int msync2 0.01%', 'task.dat' using 1:2:($3/2) index 5 with xerrorbars lt 1 title 'Int 3d 0.07%', 'task.dat' using 1:2:($3/2) index 4 with xerrorbars lt 2 title 'Int codec2 0.13%', 'task.dat' using 1:2:($3/2) index 3 with xerrorbars lt 3 title 'Int codec0 0.02%', 'task.dat' using 1:2:($3/2) index 2 with xerrorbars lt 4 title 'Int timer2 0.10%', 'task.dat' using 1:2:($3/2) index 1 with xerrorbars lt 11 title 'Int timer0 0.04%', 0 title 'Idle 30.3%' set nomultiplot pause -1 'Hit return to continue' set yrange [-0.1:5.1] set multiplot set title 'Core 2 (VPU1)' set ytics ("Idle" 0, "Int msync3" 1, "SysTimer2" 2, "khrn_llat_threa" 3, "H264#0 Outer" 4, "ILImageFX" 5) plot 'task.dat' index 36 with boxes lt 5 title 'ILImageFX 15.7%', 'task.dat' index 35 with boxes lt 1 title 'H264#0 Outer 1.95%', 'task.dat' index 34 with boxes lt 2 title 'khrn_llat_threa 1.43%', 'task.dat' index 33 with boxes lt 3 title 'SysTimer2 0.03%', 'task.dat' using 1:2:($3/2) index 32 with xerrorbars lt 4 title 'Int msync3 0.02%', 0 title 'Idle 80.8%' set nomultiplot pause -1 'Hit return to continue' [...]
And now enabling QPU deinterlacing:
[...]
set yrange [-0.1:31.1]
set multiplot
set title 'Core 1 (VPU0)'
set ytics ("Idle" 0, "Int timer0" 1, "Int timer2" 2, "Int codec0" 3, "Int codec2" 4, "Int 3d" 5, "Int msync2" 6, "Int dma3" 7, "Int 0x5e" 8, "Int 0x5f" 9, "Int scaler" 10, "DISPSERVX" 11, "SysTimer" 12, "Dispmanx Ch1 EO" 13, "AUDSRV" 14, "HDMI_HOTPLUG_TM" 15, "HDMI TASK" 16, "KHRN_S" 17, "mbox_read" 18, "powerman" 19, "temp_check" 20, "v3d_gfxh16_thre" 21, "VCHIQ-0" 22, "VCHIQr-0" 23, "ILCS_VC" 24, "ILClock" 25, "ILVDecode" 26, "ILADecode" 27, "ILEGLRender" 28, "ILAMixer" 29, "ILVScheduler" 30, "ILARender" 31)
plot 'task.dat' index 31 with boxes lt 5 title 'ILARender 3.67%', \
     'task.dat' index 30 with boxes lt 1 title 'ILVScheduler 0.26%', \
     'task.dat' index 29 with boxes lt 2 title 'ILAMixer 1.57%', \
     'task.dat' index 28 with boxes lt 3 title 'ILEGLRender 50.3%', \
     'task.dat' index 27 with boxes lt 4 title 'ILADecode 0.97%', \
     'task.dat' index 26 with boxes lt 11 title 'ILVDecode 0.39%', \
     'task.dat' index 25 with boxes lt 12 title 'ILClock 0.19%', \
     'task.dat' index 24 with boxes lt 13 title 'ILCS_VC 0.37%', \
     'task.dat' index 23 with boxes lt 14 title 'VCHIQr-0 0.02%', \
     'task.dat' index 22 with boxes lt 15 title 'VCHIQ-0 0.34%', \
     'task.dat' index 21 with boxes lt 10 title 'v3d_gfxh16_thre 0.02%', \
     'task.dat' index 20 with boxes lt 9 title 'temp_check 0.02%', \
     'task.dat' index 19 with boxes lt 5 title 'powerman 0.15%', \
     'task.dat' index 18 with boxes lt 1 title 'mbox_read 0.13%', \
     'task.dat' index 17 with boxes lt 2 title 'KHRN_S 4.96%', \
     'task.dat' index 16 with boxes lt 3 title 'HDMI TASK 0.04%', \
     'task.dat' index 15 with boxes lt 4 title 'HDMI_HOTPLUG_TM 0.01%', \
     'task.dat' index 14 with boxes lt 11 title 'AUDSRV 0.08%', \
     'task.dat' index 13 with boxes lt 12 title 'Dispmanx Ch1 EO 0.08%', \
     'task.dat' index 12 with boxes lt 13 title 'SysTimer 0.17%', \
     'task.dat' index 11 with boxes lt 14 title 'DISPSERVX 0.14%', \
     'task.dat' using 1:2:($3/2) index 10 with xerrorbars lt 15 title 'Int scaler 0.08%', \
     'task.dat' using 1:2:($3/2) index 9 with xerrorbars lt 10 title 'Int 0x5f 0.03%', \
     'task.dat' using 1:2:($3/2) index 8 with xerrorbars lt 9 title 'Int 0x5e 0.12%', \
     'task.dat' using 1:2:($3/2) index 7 with xerrorbars lt 5 title 'Int dma3 0.19%', \
     'task.dat' using 1:2:($3/2) index 6 with xerrorbars lt 1 title 'Int msync2 0.01%', \
     'task.dat' using 1:2:($3/2) index 5 with xerrorbars lt 2 title 'Int 3d 0.07%', \
     'task.dat' using 1:2:($3/2) index 4 with xerrorbars lt 3 title 'Int codec2 0.14%', \
     'task.dat' using 1:2:($3/2) index 3 with xerrorbars lt 4 title 'Int codec0 0.02%', \
     'task.dat' using 1:2:($3/2) index 2 with xerrorbars lt 11 title 'Int timer2 0.10%', \
     'task.dat' using 1:2:($3/2) index 1 with xerrorbars lt 12 title 'Int timer0 0.04%', \
     0 title 'Idle 35.3%'
set nomultiplot
pause -1 'Hit return to continue'
set yrange [-0.1:5.1]
set multiplot
set title 'Core 2 (VPU1)'
set ytics ("Idle" 0, "Int msync3" 1, "SysTimer2" 2, "khrn_llat_threa" 3, "H264#0 Outer" 4, "ILImageFX" 5)
plot 'task.dat' index 37 with boxes lt 5 title 'ILImageFX 15.1%', \
     'task.dat' index 36 with boxes lt 1 title 'H264#0 Outer 2.05%', \
     'task.dat' index 35 with boxes lt 2 title 'khrn_llat_threa 1.42%', \
     'task.dat' index 34 with boxes lt 3 title 'SysTimer2 0.03%', \
     'task.dat' using 1:2:($3/2) index 33 with xerrorbars lt 4 title 'Int msync3 0.02%', \
     0 title 'Idle 81.3%'
set nomultiplot
pause -1 'Hit return to continue'
[...]
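So the impact is very small, but still positive: on VPU0 the idle share rises from 30.3% to 35.3% (ILEGLRender drops from 55.3% to 50.3%), and on VPU1 it goes from 80.8% to 81.3%.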
Being able to deinterlace 1080i may be useful when working with DVB-T/S for instance, so someone may be interested in such a feature.
NOTE: Of course, to analyse the results properly we should plot multiple samples and draw conclusions from those, not from a single one as I did. However, as time is a little short, I think I'll close this topic here with these rough and approximate values. If anyone wants to go on with a better analysis, the repository will be updated soon.
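For anyone who wants to take the analysis further, here is a minimal sketch of how the per-task loads could be pulled out of the gnuplot output and compared across samples. It is not part of POT or of any Raspberry Pi tool: the function names are hypothetical, and the regular expression simply assumes the `title 'NAME PCT%'` label format visible in the output above. The two sample strings are short excerpts of the Core 1 (VPU0) labels from the two runs.

```python
import re
from statistics import mean

# Matches the per-task labels in the gnuplot output, e.g. title 'ILEGLRender 55.3%'.
# [^'] keeps a match from running past the closing quote of a label, so lines
# such as "set title 'Core 1 (VPU0)'" are ignored.
LABEL_RE = re.compile(r"title '([^']+?) ([0-9.]+)%'")


def parse_loads(script):
    """Return {task name: load percentage} for every label found in `script`."""
    return {name: float(pct) for name, pct in LABEL_RE.findall(script)}


def average_loads(samples):
    """Average each task over several parsed samples (missing tasks count as 0)."""
    names = set().union(*samples)
    return {name: mean(s.get(name, 0.0) for s in samples) for name in names}


def load_delta(before, after):
    """Per-task change, in percentage points, between two load tables."""
    names = set(before) | set(after)
    return {n: round(after.get(n, 0.0) - before.get(n, 0.0), 2) for n in names}


# Excerpts of the Core 1 (VPU0) plot commands from the two runs shown above.
WITHOUT_QPU = (
    "'task.dat' index 28 with boxes lt 2 title 'ILEGLRender 55.3%', "
    "'task.dat' index 17 with boxes lt 1 title 'KHRN_S 4.95%', "
    "0 title 'Idle 30.3%'"
)
WITH_QPU = (
    "'task.dat' index 28 with boxes lt 3 title 'ILEGLRender 50.3%', "
    "'task.dat' index 17 with boxes lt 2 title 'KHRN_S 4.96%', "
    "0 title 'Idle 35.3%'"
)

if __name__ == "__main__":
    # Only one sample per configuration is available here; with more captures,
    # pass all of them to average_loads() before comparing.
    before = average_loads([parse_loads(WITHOUT_QPU)])
    after = average_loads([parse_loads(WITH_QPU)])
    for task, delta in sorted(load_delta(before, after).items()):
        print(f"{task:12s} {delta:+.2f}")
```

Each core has its own "Idle" label, so the output should be parsed one core at a time, otherwise the second value would overwrite the first.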
This package includes mostly bug fixes and some slight improvements over beta 1. Please file bugs on GitHub if needed. Refer to the previous articles for instructions on how to use it.
Download PiOmxTextures 5.0.0-beta2 for Pi1 here (extraction code is: add4). Download PiOmxTextures 5.0.0-beta2 for Pi2 here (extraction code is: 2ad5).
NOTE: I made a mistake with the above packages. Please download the ones below.
Download PiOmxTextures 5.0.0-beta2 for Pi1 here (extraction code is: 504a).
Download PiOmxTextures 5.0.0-beta2 for Pi2 here (extraction code is: 12e3).
These are two packages containing binaries for PiOmxTextures (POT) for Raspberry Pi 1 and 2 running Raspbian.
The main changes are:
Completely changed the approach to GPU buffer and structure allocations. This should avoid interrupting the renderer thread during the execution of commands and, in the future, remove the need for patching Qt Multimedia.
Textures are now reused when possible instead of being reallocated needlessly.
Fixed a few bugs reported on GitHub.
Sped up start/stop by removing the need for closing and reopening the streams.
Changed the screen update procedure. This seems to improve performance a little further.
Merged changes from omxplayer.
Bumped ffmpeg to 2.7.2.
Added QML samples (these are not contained in the package, some are in the github repo).
Package for Pi1 is supposed to work both on Pi1 and Pi2. Package for Pi2 includes armv7 optimizations in ffmpeg, POT and Qt and is therefore only compatible with Pi2. Qt build was unchanged since last build.
Download PiOmxTextures 5.0.0 for Pi1 here.
Download PiOmxTextures 5.0.0 for Pi2 here.
The information provided for previous builds still applies.
This is a beta version because I do not have much time to test. So do not use it in production, but if you can invest time in testing, please report any bug you find. I, or someone else who is interested, may try to fix it.
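NOTE: If you develop your own C++ app, remember to enable the global shared OpenGL context in Qt (the Qt::AA_ShareOpenGLContexts application attribute, which must be set before the application object is created). qmlscene seems to already enable that by default.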
NOTE: If you need information on how to use this build, please refer to this little procedure.
The first package went through long-running tests and is supposed to guarantee more stability. Tests showed POCPlayer (the sample media player included in the sources) running uninterrupted for well over a month. The second package, instead, did not go through any long-running test. Please let me know if you notice anything wrong.
Both packages contain:
piomxtextures_pocplayer: sample player which includes tests for animations, loops and commands. Have a look at the sources to understand how it works.
Qt-rasp(2)-5.5.0.tar: contains the build of Qt 5.5.0 with proper CPU optimisations. This also includes the host tools like qmake. rpath is set to /usr/local/Qt-rasp(2)-5.5.0, so you had probably better install it there.
qtdeps.tar: this contains libevdev and libinput which are not available in Raspbian repos. Qt will need these in the runtime lib path to work.
Please, let me know if you encounter issues with the packages or if anything goes wrong.
Bye! ;-)
NOTE: This is not available on GitHub yet, but it will be shortly.
*Qt Configuration for Raspberry Pi 1
Configure summary
Building on: linux-g++ (x86_64, CPU features: mmx sse sse2)
Building for: devices/linux-rasp-pi-g++ (arm, CPU features: none detected)
Platform notes:
- Also available for Linux: linux-kcc linux-icc linux-cxx