Wednesday, October 20, 2021

Hardware Accelerated Video with Qt 6 on the Raspberry Pi

Getting hardware acceleration into Qt eglfs is tricky. Doing so on a Raspberry Pi is, unfortunately, still tricky after many years. Qt claimed to have reimplemented the Qt Multimedia module entirely, and one of their target was getting hardware acceleration where possibile. So, I thought I could start with a quick look.

Qt 5

Since Raspberry Pi was born, I had to solve the problem of hardware accelerated video in Qt. At the beginning, I wrote POT (PiOmxTextures) to solve this problem: https://github.com/carlonluca/pot. It used OpenMAX to stream decoded video into an OpenGL texture, which was then showed through a custom backend of Qt Multimedia in Qt 5. This approach worked fine, but won’t work on Pi 4/Qt 6. On the other hand, there is another component in the same repo, that includes a custom Qt Quick item to render video through omxplayer. This is the most performant approach, but has its limitations.

Qt Multimedia

I quickly tested Qt Multimedia in Qt 6 on the rpi. My build from this article should support gstreamer. All I got was a a warning on the console. I didn’t investigate further. Maybe I’ll spend more time on this in the future.

POTVL

As the classical POT is no more usable on Raspberry Pi 4, I started to have a look at POTVL, which is very simple to port to Qt 6. With a small patch, it is possible to build it. You’ll find updates on the repo.

Demo

The video is a 1080p video. As you can see, the framerate is acceptable up to a certain weight of the graphics. It seems that Qt 6 OpenGL backend still is a bit less performant than Qt 5 in this specific demo, as you can see from this test from a previous article:


so the result may even improve in the future.
The benchmark app can be found here: https://github.com/carlonluca/Fall.
Unfortunately POTVL is still not future proof, but it is the simplest and most efficient element to port to Qt 6. I may try something better in the near future.
Have fun! Bye 😉

Friday, October 15, 2021

Isogeometric Analysis: Knot Insertion for NURBS

Introduction

In the classical Finite Element Method (FEM) the domain of the problem is approximated by defining a mesh and basis functions over its elements. More elements for the mesh means more basis functions and therefore larger space where approximated solutions can be found. The procedure of increasing the number of elements in the mesh is known as h-refinement. In Isogeometric Analysis (IGA), no mesh is required as the basis functions used to define the domain are the same basis functions used to define the approximate solution. Nevertheless, in IGA, the solution is still a combination of basis functions, so increasing the number of basis functions is still needed to implement some kind of h-refinement. A good way to do this is through knot insertion into the NURBS. This post also presents demo written in Octave and TypeScript.

You can find the full source code for the algorithms and all the examples in this GitHub repo.

Problem of Knot Insertion

Given the general form of a NURBS curve as described here:

$$\boldsymbol{C}^{w}\left(\xi\right)=\sum_{i=0}^{n}N^{p}_{i}\left(\xi\right)\boldsymbol{P}_i^w$$

on the knot vector:

$$\Xi=\left[\xi_0,...,\xi_m\right]$$

knot insertion is the problem of adding the knot $\bar{\xi}\in\left[\xi_k,\xi_{k+1}\right)$ in the knot vector $\Xi$, so that the following still stands:

$$\bar{\boldsymbol{C}}^w\left(\xi\right)=\sum_{i=0}^{n+1}N_{i}^{p}\left(\xi\right)\bar{\boldsymbol{P}}^{w}_{i}=\boldsymbol{C}^w\left(\xi\right),\forall\xi\in\left[\xi_0,\xi_{m+1}\right]$$

on the knot vector:

$$\Xi=\left[\xi_0,...,\xi_{m+1}\right]$$

where $\bar{\boldsymbol{P}}_i^w$ are the control points of the new curve in the projective space.

Knot insertion is supposed to keep the curve unaltered by adding and modifying control points and knots.

Solutions

The problem can then be solved by finding the $n+1$ values $\bar{\boldsymbol{P}}^w_i$, which can be done solving the linear system in $n+2$ equations:

$$\sum_{i=0}^{n+1}N_{i}^{p}\left(\xi\right)\bar{\boldsymbol{P}}^{w}_{i}=\sum_{i=0}^{n}N^{p}_{i}\left(\xi\right)\boldsymbol{P}^w_{i},\forall\xi\in\left\{\xi_0,...,\xi_{n+1}\right\}$$

There is also a simpler way to insert the knots. It can be shown that:

$$\displaylines{\bar{\boldsymbol{P}}_{i}^{w}=\alpha_{i}\boldsymbol{P}_{i}^{w}+\left(1-\alpha_{i}\right)\boldsymbol{P}_{i-1}^{w},\\
\alpha_{i}=\left\{ \begin{array}{ll} 1, & i\leq k-p\\ \dfrac{\bar{\xi}-\xi_{i}}{\xi_{i+p}-\xi_{i}}, & k-p+1\leq i\leq k\\ 0, & i\geq k+1 \end{array}\right.}$$

This is the technique used in all the implementations in the repo.

Octave Implementation

The Octave implementation includes the computeKnotInsertion script that takes a NURBS as input, together with the knot value to add and returns a new knot vector and control points. This is an example of insertion into a circle:

Knot insertion on a circle with related NURBS basis functions.

The computeSurfKnotInsertion script computes knot insertion for NURBS surfaces instead. With the script, knots can be added to both dimensions. An example is shown below:

Knot insertion on a plate with a hole computed in Octave.

TypeScript Implementation

The TypeScript implementation includes a method named insertKnot in the NurbsCurve class. That method allows to insert multiple knots into the vector, returning a new NurbsCurve, identical to the first, but with a different knot vector and different control points.

With a few lines of code:

let circle = new NurbsCirle()
drawNurbsCurve(circle.controlPoints, circle.knotVector, circle.weights, 2, false, true, plot1, null, true, "Knot Insertion 1")
circle.insertKnot(0.6, 6, 0, 1)
drawNurbsCurve(circle.controlPoints, circle.knotVector, circle.weights, 2, false, true, plot2, null, true, "Knot Insertion 2")
circle.insertKnot(0.3, 4, 0, 1)
drawNurbsCurve(circle.controlPoints, circle.knotVector, circle.weights, 2, false, true, plot3, null, true, "Knot Insertion 3")
circle.insertKnot(0.2, 2, 0, 1)
drawNurbsCurve(circle.controlPoints, circle.knotVector, circle.weights, 2, false, true, plot4, null, true, "Knot Insertion 4")

you can draw this result:

Knot insertion on a circle computed in TypeScript.
Live demo

The NurbsSurf class includes the methods insertKnotXi and insertKnotEta to insert knots in the two dimensions of the parametric space and returns the new object NurbsSurf. This is the result of the algorithm applied to a plate:

Knot insertion on a plate with a hole computed in TypeScript.
Live demo

and this is the result on a toroid (in this case knots were not inserted in the entire vector uniformly, but only in a section):

Knot insertion on a toroid computed in TypeScript.
Live demo

As expected, after each insertion, the NURBS is unaltered. This can also be seen by running the unit tests provided in the repo. For example, one unit test includes this code:

function testNurbs(n1: NurbsSurf, n2: NurbsSurf) { for (let xi = 0; xi <= 1; xi += 0.01) for (let eta = 0; eta <= 1; eta += 0.01) assert(approxEqualPoints(n1.evaluate(xi, eta), n2.evaluate(xi, eta))) } [...] measure("surf_knot_insertion_xi", () => { let n1 = new NurbsPlateHole() let n2 = new NurbsPlateHole() for (let i = 0.1; i <= 1; i += 0.9) { if (n2.Xi[i] != i) { let k = BsplineCurve.findSpan(n2.Xi, i, n2.p, n2.controlPoints.length - 1) testNurbs(n1, n2.insertKnotsXi(i, k, 0, 1)) } } })

Bye! ;-)

Saturday, September 25, 2021

Qt 6 on the Raspberry Pi on eglfs

Read this article on WordPress on a Pi! :-)

It has been some time since Qt 6 was released. Now that Qt 6.2 is almost out, I guess it is time to have a look at it.

Building Qt 6

Qt 6 moved from qmake to cmake. This makes a difference when trying to build it and cross-build it. Instructions on how to build are already available online from many sources, so I won’t repeat the theory here. I just want to note a few key points.

Configuration

I configured this build in the simplest form. To start, I want to use eglfs on OpenGL ES.

Toolchain

Building Qt requires a toolchain for building on the host and a toolchain to build for the target (the Raspberry Pi in this case). The toolchain from the host is typically provided by your distro (I used Manjaro in this case), while the toolchain to cross-build for the Raspberry Pi is officially provided here. In the past years I had a lot of weird problems building with those toolchains, so I started to create my own. Never had any problem anymore. Here you can find the toolchains:

  • Raspberry OS Stretch: here;
  • Raspberry OS Buster: here.

Please note that those toolchains are not always able to target every arch. The one for Buster is the one I used in my build, and the target arch is armv8-a.

Dependencies

Dependencies are already listed elsewhere. I would only like to note one thing here: I had problems linking because of a missing symbol: qt_resourceFeatureZstd. According to the sources, it seems that resources are compiled by the rcc tool. My guess is that the host tools use zstd by default to compress when available, but this is a problem when zstd is available on the host and not available on the target. Therefore, I suspect you have to decide if you want zstd support or not before building Qt 6 for the host, and be coherent with the build for the target. I decided to add zstd.

Build Errors

I had a few build errors building Qt 6.2 beta3. Nothing particularly relevant, but I had to apply a few changes. I did not keep track of those changes, they were all trivial.

Build Qt Test App

Once the Qt builds are ready, it is time to run. For the moment, I only want to do a quick test of QML. In my build, I forgot to set eglfs as the default platform, but it can be set it in the environment. I wrote a simple app to test Qt Quick here. The animations are very simple, but I wrote it like this to see how well many items are handled and how well the uniform movement on the y axis is rendered.

NOTE: Unlike Qt5, Qt6 includes host tools and tools built for the target. This means you can build the test app on the target directly or cross-build it from your host. For the package below anyway, you'd need to have a system compatible with those binaries.

First App

This video shows a comparison of this Qt 6 build against Qt 5. It also shows the same benchmark app on Intel UHD and nVidia.

Download

If you want to test the build of Qt 6 on your rpi, you can download it from here:

Download Qt 6.2.0-rc1 for Raspberry OS Buster armv8-a

The package does not include the toolchain, that you can download from the other links, but includes the host build. I don’t think it will be of help, as it is built for my Manjaro context, but I placed it there anyway.

The positions of the directories are relevant:
  • /opt/rpi/rpi-8.6.3: toolchain;
  • /opt/Qt-x64-6.2.0-beta3: host build;
  • /usr/local/Qt-rasp-6.2.0-beta3: position in the target;
  • /opt/rpi/sysroot: sysroot for cross-building.

Runtime Context

The is the 3D environment as seen by Qt 6:

qt.rhi.general: Created OpenGL context QSurfaceFormat(version 2.0, options QFlags<QSurfaceFormat::FormatOption>(), depthBufferSize 24, redBufferSize 8, greenBufferSize 8, blueBufferSize 8, alphaBufferSize 0, stencilBufferSize 8, samples 0, swapBehavior QSurfaceFormat::DefaultSwapBehavior, swapInterval 1, colorSpace QColorSpace(), profile  QSurfaceFormat::NoProfile)
qt.rhi.general: OpenGL VENDOR: Broadcom RENDERER: VC4 V3D 2.1 VERSION: OpenGL ES 2.0 Mesa 19.3.2
qt.scenegraph.general: MSAA sample count for the swapchain is 1. Alpha channel requested = no.
qt.scenegraph.general: rhi texture atlas dimensions: 2048x2048

Bye 😉

Experimental Implementation of the Blog on a Raspberry Pi! 😉

I recently started to add many useful open source services to my Raspberry Pi, like Jenkins, GitLab etc… I wrote some articles about running some services on a arm64 device here:

After these, I also wanted to try WordPress: it is already available as a comfortable docker image for arm64. How could I put it to work? I thought I could reimplement the Bugfree Blog in WordPress!

Unfortunately it was a bit harder than I thought: first of all this bug was making the system completely unstable. I fortunately found a workaround soon by bisecting the kernel, and now there is also an official fix. The fix is not published yet on Ubuntu, but with this docker image I can apply the patches and cross-build the Ubuntu kernel quickly.

A second problem was related to disconnections of USB devices when USB hubs are connected. This is still unresolved.

A third problem is related to the SD card: I never had problems with the SD card, but I recently broke a few. The Raspberry Pi 4 is hotter that I thought, even when load is minimum. The SD cards are guaranteed to work properly up to 85°C, and the Raspberry Pi is set to that max temp, but I tried to reduce it by using a proper case and adding a little (and very professional) Lego castle to encourage heat dissipation. Let’s hope SD cards can live longer.

New URL

The new URL for the experimental Bugfree Blog on Raspberry Pi 64 bit is:


We’ll see if the experiment succeeds, what the performance is and how much I’ll keep it active.
If you are curious, have a look!
Bye 😉

Thursday, September 23, 2021

Docker Image for the Awesome MLDonkey Service

I really like this excellent piece of software. I found some docker images for it for the arm64 architecture, but they seemed not to be very up to date, so I created my own. Here you can find my version of a dockerized MLDonkey service, built for many archs: armv7, arm64, x86 and x64. The arm versions are perfect for Raspberry Pi variants.

The latest version of the image is taken from this commit, which is the latest at the time of writing. It also includes a couple of patches that I sent to add a dark theme.

The image is built on top of debian:buster. For the moment, it is difficult to use something newer because of this known issue.

  • Image on Docker Hub: link;
  • GitHub repo for the project: link.

To run it, you only need a single command:

$ docker run -i -t -v "`pwd`/data:/var/lib/mldonkey" carlonluca/mldonkey

For more options read the readme in the repo.

Bye! 😉

Sunday, September 19, 2021

Measuring the Frame Rate in Qt/QML

I frequently need to measure the frame rate in a Qt/QML application to have an idea of how heavy the changes I'm making are. In the past, I found two approaches that seemed to answer the question: one is using a simple QML item, measuring the frequency of changes of a property. The other is using a custom QQuickItem, measuring the frequency of the calls received to refresh it. I never used the first approach because it did not feel safe, while I typically used the second.

Instead of copy/pasting code into my projects repeatedly, I thought I could add it to my collection of simple procedures in https://github.com/carlonluca/lqtutils, making it simple to integrate when needed. I therefore asked myself which one to include.

By thinking a bit about it, I thought of a different simpler approach using the signal QQuickWindow::frameSwapped(). This is what Qt doc says about it:

This signal is emitted when a frame has been queued for presenting. With vertical synchronization enabled the signal is emitted at most once per vsync interval in a continuously animating scene. This signal will be emitted from the scene graph rendering thread.

This seemed a reasonable approach, simple to implement, very concise and lightweight. The resulting code is in the repo. Doc is in: https://github.com/carlonluca/lqtutils#lqtutils_uih.

Is this Correct?

I was wondering if my approach was correct, so I thought I could compare with the other approaches. I took the snippets from this SO question. I tried the three techniques separately and also tried to add them together in the same code for a comparison of the values. For the test I used my QML app here: https://github.com/carlonluca/Fall. In the branch https://github.com/carlonluca/Fall/tree/fps_comparison I added all the components to compare. This is the result:


As it can be seen, the method based on the QQuickPaintedItem gives the same result of the one based on the frameSwapped() signal. The other one is very different. The explanation for this is probably based on the fact that the Qt scene graph on Linux renders through the threaded render mode. In fact, forcing a single-threaded render mode makes the three techniques agree on the measurement (second run in the video).

Compared to a custom QQuickPaintedItem, the frameSwapped() technique seems to be more lightweight, so it is the one I implemented in my repo.

Bye! ;-)