
Tuesday, September 6, 2011

CLAM binders, ipyclam and PyQt

I just finished a pair programming session with Xavier Serra. He is going to defend his Final Career Project on Friday and we are preparing some cool demos involving binding ipyclam and PyQt. It has been an exciting session because, as we knocked down the show stoppers one by one, we realized that the potential of this tool mix is wider than we thought.

What we can do now is build a network with ipyclam, build an interface by combining PyQt and UI files, and bind the interface to that network, all of it either live from the interactive shell or from a written script. This formula adds the flexibility of scripting to the CLAM prototyping system, raising the ceiling of what you can do without raising the learning threshold too much.

For example, let's play a file in a loop with ipyclam:

import ipyclam

# Build the network: a looping file reader connected to an audio sink
n = ipyclam.Network(ipyclam.Clam_NetworkProxy())
n.file = n.types.MonoAudioFileReader
n.out = n.types.AudioSink
n.file.Loop = 1
n.file.SourceFile = "jaume-voice.mp3"
n.file > n.out

# Choose an audio backend and start playing
n.backend = "PortAudio"
n.play()

Then you can instantiate an Oscilloscope, bind it and play:

from PyQt4 import QtCore, QtGui
a = QtGui.QApplication([])

# Instantiate the widget and point it to an out port
# by means of a Qt dynamic property
w = n.createWidget("Oscilloscope")
w.show()
w.setProperty("clamOutPort", "file.Samples Read")

# The network must be stopped while binding
n.stop()
n.bindUi(w)
n.play()

You can also load a UI file and bind it, all in one go:

w2 = n.loadUi("myui.ui")
w2.show()
n.stop()
n.bindUi(w2)
n.play()

Adding a tonal analysis and some related views:

n.stop()
n.tonal = n.types.TonalAnalysis
n.file > n.tonal
w3 = n.createWidget("CLAM::VM::KeySpace")
w3.setProperty("clamOutPort", "tonal.Chord Correlation")
n.bindUi(w3)
w3.show()
n.play()

The enabler of the mix has been the new binder architecture in CLAM. Until 1.4, the Prototyper used what we call 'binders' to relate processing elements to user interface elements. Each binder concentrates on a given kind of binding: oscilloscopes, control sliders, transport buttons...

For the upcoming CLAM 1.5, the binder interface (CLAM::QtBinder) has been redefined and moved out of the Prototyper into the qtmonitors module, so now you can use binders in any CLAM application. Bindings are now based on Qt dynamic properties, which provide more flexibility than the former QObject name mangling.

Currently there are several binders implemented:
* Action/Button -> launch a configuration dialog for some processing
* Action/Button -> launch an open-file dialog for a MonoFileReader
* Checkable -> send a bool, or a bistable float control
* Slider -> mapped float/int control
* Any monitor -> out port to monitor
* ControlSurface -> send a pair of controls
* Slider -> ProgressControl, or a MonoFileReader or similar, to control

Now you can extend those binders by means of plugins, just as we extended processings. You can use Qt dynamic properties on the interface elements to specify how the binding is done.

The abstract interface provides the static method QtBinder::bindAllBinders(), which is the one Python's bindUi calls. It traverses every UI element in the QObject hierarchy and, for every registered binder that matches the element, applies the binding.

To implement your own binder you just have to override the bool handles(QObject*) method, which returns true if the binder manages the given object (by type, name, presence of properties...), and the bool bind(QObject * uiElement, Network & network, QStringList & errors) method, which does the actual binding. To register the binder, you just have to instantiate one as a static variable in a cxx file of your library, as in the sketch below.
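
For illustration, here is a minimal sketch of a custom binder. The handles/bind signatures are the ones described above; the widget type, the dynamic property name and the Network accessor are made up, and the register-on-construction behaviour is an assumption:

// MyLabelBinder.cxx - hypothetical binder, for illustration only
#include <QtGui/QLabel>
#include <QtCore/QStringList>
#include "QtBinder.hxx" // assumed header, from the qtmonitors module

class MyLabelBinder : public CLAM::QtBinder
{
	bool handles(QObject * uiElement)
	{
		// Match by widget type and by the presence
		// of a (made-up) dynamic property
		return qobject_cast<QLabel*>(uiElement)
			and uiElement->property("clamShowNetworkName").isValid();
	}
	bool bind(QObject * uiElement, CLAM::Network & network, QStringList & errors)
	{
		QLabel * label = qobject_cast<QLabel*>(uiElement);
		label->setText(network.GetName().c_str()); // GetName assumed
		return true;
	}
};
// A static instance is enough to register the binder
// (the base class constructor is assumed to do the registration)
static MyLabelBinder theMyLabelBinder;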

All that is still building up, but its potential seems clear, so I hope this new way of CLAMing will be useful to you all.

Monday, April 6, 2009

VST plugins with Qt user interface

I recently did a spike on what we need to make VST plugins first-class CLAM citizens. CLAM allows visually building JACK and PortAudio based applications with Qt interfaces, as well as GUI-less VST and LADSPA plugins. The flashiest feature of VST is the user interfaces, which are mostly built using VSTGUI. We use Qt as the interface for JACK and PortAudio based apps because the nice features of the Qt toolkit let us dynamically bind the UI elements to the underlying processing. Moreover, Qt styling features enable shiny designer-made interfaces. Why not reuse the same interface for VST and JACK? That has been a long standing TODO in CLAM, so now is the time to address it.

In summary, we fully solved cross-compiling VSTs from Linux and we even started using Qt interfaces as the VST GUI. On that last point there still is a lot of work to do, but the basic question of whether you can use Qt to edit a VST plugin is now beyond any doubt.

To make the spike simpler, and in order not to collide with other CLAM developers currently working on it, I left aside all the CLAM wrapping part, just addressing VST cross-compiling and Qt with the SDK examples.

Cross-compilation was pretty easy. This time I found a lot more documentation on MinGW and even on SCons. Just by adding the crossmingw SCons tool we are already using for the apps, I managed to get Linux cross-compiled plugins running on Wine.

Adding a regular VSTGUI user interface is just a matter of compiling the VSTGUI sources along with the example editor that comes in the SDK.

Once there, we had to address Qt. VSTGUI is just a full graphical toolkit implementing the 'editor interface', plus some provided widgets and, I guess, a way of automating the binding of controls to processing. So what we need for Qt is to implement the AEffEditor interface using the Qt toolkit instead. The first problem is the graphical loop: you have to create a QApplication and call qApp->processEvents() in the editor's idle method so that Qt widgets stay responsive. The problem then is that, if you don't provide a QWidget as parent to your interface, it becomes a top-level window, ignoring the host-provided window, which still appears as an empty one.

The VST host provides such a window as a native Windows handle. How do you create a widget on an existing window handle? Months ago, the Trolls redirected me to a commercial solution. Not much of a 'solution' for us, a FLOSS project. So I was digging in the Windows Qt source code for a hack when I found the answer right in the public and multiplatform QWidget API. QWidget::create works like a charm. The following simple class is a native window wrapper you can use as a regular QWidget.


class QVstWindow : public QWidget
{
Q_OBJECT
public:
	// Adopts the native window handle provided by the host
	QVstWindow(WId handle) {create(handle);}
	~QVstWindow() {}
};
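
To give an idea of how the pieces fit together, here is a minimal editor sketch. The open/close/idle methods follow my reading of the VST 2.x AEffEditor interface, and everything else (the lazy QApplication creation, the widget content) is just an assumption of one possible arrangement:

// Sketch only: treat signatures and details as assumptions
#include <QtGui/QApplication>
#include <QtGui/QLabel>
#include <QtGui/QVBoxLayout>

class QtEditor : public AEffEditor
{
	QVstWindow * _window;
public:
	QtEditor(AudioEffect * effect)
		: AEffEditor(effect)
		, _window(0)
	{
		// The host runs no Qt event loop, so create
		// a QApplication once per process
		if (not qApp)
		{
			static int argc = 0;
			new QApplication(argc, 0);
		}
	}
	bool open(void * nativeHandle)
	{
		// Wrap the host window and fill it with regular Qt widgets
		_window = new QVstWindow(WId(nativeHandle));
		QVBoxLayout * layout = new QVBoxLayout(_window);
		layout->addWidget(new QLabel("Hello, host!", _window));
		_window->show();
		return true;
	}
	void close()
	{
		delete _window;
		_window = 0;
	}
	void idle()
	{
		// Pump Qt events from the host's idle calls
		if (qApp) qApp->processEvents();
	}
};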

There are still some issues: focus handling, reopening, drag&drop... But basic mouse clicking and resizing work.

Once I got that, loading a Designer ui file was very easy.

As I said, there are still many caveats to solve. It's a matter of playing with it and refining things. Here is a list of TODOs:

  • Communicate controls from and to the interface
  • Handle focus and other events properly
  • Build a CLAM network wrapper which resembles more closely the one for LADSPA
  • Wiki documentation on how to build your own plugin
  • A one-button plugin generator like the one we have for LADSPA ;-)

I feel there are more people around other projects interested in using Qt for VST plugins, so this is also a call for collaborative research on the pending issues, at least the generic ones. Contact us on the CLAM development list or, for a broader audience, on the Linux Audio Developers list.

Friday, November 14, 2008

Managing Audio Back2Back Tests

So long since the last post, and a lot of things to explain (GSoC results, the GSoC Mentor Summit, QtDevDays, the CLAM network refactoring script, typed controls...). But let's start by explaining some work we did on a back-to-back system we recently deployed for CLAM and our 3D acoustics project at Barcelona Media.

Back-to-back testing background



You, extreme programmer, might want to have unit tests (white box testing) for every single line of code you write. But sometimes this is a hard thing to achieve. For example, canonical test cases that exercise a single piece of code are very hard to find for audio processing algorithms. You might also want to take control of a piece of untested code in order to refactor it without introducing new bugs. In all those cases, back-to-back tests are your most powerful tool.

Back-to-back tests (B2B) are black box tests that compare the output of a reference version of an algorithm with the output of an evolved version, given the same set of inputs. When a back-to-back test fails, it means that something changed but normally it doesn't give you any more information than that. If the change was expected to alter the output, you must revalidate the new output again and make it the new reference. But if the alteration was not expected, you should either roll-back the change or fix it.

In back-to-back tests there is no truth to be asserted. You just rely on the fact that the last version was OK. If B2B tests go red because of an expected change of behaviour but you don't validate the new results, you will lose any control over posterior changes. So it is very important to keep them green by validating any new correct result. Because of that, B2B tests are very helpful in combination with a continuous integration system such as TestFarm, which can point you to the guilty commit even if further commits have been done.

CLAM's OfflinePlayer is very convenient for back-to-back testing of CLAM networks. It runs them off-line, taking some input and output wave files. Automate the check by subtracting the output from a reference file and checking the residual level against a threshold, and you have a back-to-back test.
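
Just to make the check concrete, here is one way it could be coded with libsndfile. This is only an illustration of the idea, not what our tooling actually does internally, and the function name and threshold convention are made up:

// Compare two wave files sample by sample; the test passes when
// the largest difference stays below the threshold.
#include <sndfile.h>
#include <algorithm>
#include <cmath>
#include <vector>

bool outputMatchesReference(const char * resultPath, const char * expectedPath, double threshold)
{
	SF_INFO resultInfo = {};
	SF_INFO expectedInfo = {};
	SNDFILE * result = sf_open(resultPath, SFM_READ, &resultInfo);
	SNDFILE * expected = sf_open(expectedPath, SFM_READ, &expectedInfo);
	if (not result or not expected) return false;
	std::vector<float> r(1024), e(1024);
	double maxDiff = 0;
	for (;;)
	{
		sf_count_t nr = sf_read_float(result, &r[0], r.size());
		sf_count_t ne = sf_read_float(expected, &e[0], e.size());
		if (nr != ne) { maxDiff = 1; break; } // lengths differ: fail
		if (nr == 0) break; // both files fully consumed
		for (sf_count_t i = 0; i < nr; i++)
			maxDiff = std::max<double>(maxDiff, std::fabs(r[i] - e[i]));
	}
	sf_close(result);
	sf_close(expected);
	return maxDiff <= threshold;
}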

But maintaining the outputs up to date is still hard. So we have developed a python module named audiob2b.py that makes defining and maintaining B2B tests on audio very easy.

Defining a b2b test suite


A test suite is defined by setting the back-to-back data path and a list of test cases, each one providing a name, a command line, and a set of outputs to be checked:

#!/usr/bin/env python
# back2back.py
import sys
from audiob2b import runBack2BackProgram

data_path = "b2b/mysuite"
back2BackTests = [
	("testCase1",
		"OfflinePlayer mynetwork.clamnetwork b2b/mysuite/inputs/input1.wav -o output1.wav output2.wav"
		, [
		"output1.wav",
		"output2.wav",
		]),
	# any other test cases here
	]
runBack2BackProgram(data_path, sys.argv, back2BackTests)




Notice that this example uses OfflinePlayer but, as you write the full command line, you are not limited to that. Indeed, for the 3D acoustics algorithms we are testing other programs that also generate wave files.

Back-to-back work flow


When you run the test suite for the first time (./back2back.py without parameters) there are no reference files (expectations) yet, so you will get a red. The current outputs will be copied into the data path like this:
b2b/mysuite/testCase1_output1_result.wav
b2b/mysuite/testCase1_output2_result.wav
...

After validating that the outputs are OK, you can accept a test case by issuing:
$ ./back2back.py --validate testCase1
The files will be moved to:
b2b/mysuite/testCase1_output1_expected.wav
b2b/mysuite/testCase1_output2_expected.wav
...

And the next time you run the tests, they will be green. At this point you can add and commit the 'expected' files on the data repository.

Whenever the output is altered in a noticeable way and you get a red, you will again have the '_result' files, and also some '_diff' files so that you can easily check the differences. All those files are cleaned up as soon as you validate them or you get the old results back.
So the main benefit is that managing the expectation files is almost automated, which makes it easier to keep the suite green.

Supporting architecture differences


Often the same algorithm yields slightly different values depending on the architecture you run it on, mostly because of different precision (i.e. 32 vs. 64 bits) or different implementations of the floating point functions.

Having back-to-back tests changing all the time depending on which platform you run them on is not desirable. The audiob2b module generates platform-dependent expectations when you validate them with the --arch flag. Platform-dependent expectations are used instead of the regular ones whenever ones for the current platform are found.
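
If I read the workflow right, on a given platform the validation would look something like the line below; the --arch flag is the documented part, its combination with --validate is my assumption:
$ ./back2back.py --validate testCase1 --arch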

Future


The near future of the tool is simply being used. We should extend the set of controlled networks and processing modules in CLAM, so I would like to invite other CLAM contributors to add more back-to-back tests. Place your suite data in 'clam-test-data/b2b/'. We should decide where the suite definitions themselves should be placed. Maybe somewhere in CLAM/test, but that wouldn't be fair because of the dependencies on NetworkEditor and maybe on plugins.

Also, a feature that would extend the kinds of code we can control with back-to-back tests would be supporting file types other than wave files, such as plain text files or XML files (handled in some way smarter than plain text). Any ideas? Comments?

Friday, July 20, 2007

Simplifying Spectral processing in CLAM

The current CLAM Spectrum implementation has four different internal representations: complex array, polar array, separate magnitude and phase buffers, and separate magnitude and phase break point functions (point-controlled envelopes). Conversion and synchronization among those representations were handled by the Spectrum class itself, and this was intended to be transparent. But processing algorithms had to know which representations were already present in order to trade off between doing a conversion and applying a different version of a given algorithm depending on the representation.

For instance, the spectrum product in polar representation is cheaper by itself than the complex product, but if you have to convert incoming or outgoing spectra it might not be so cheap. What about having two different representations, one for each operand? Which one should be converted? The output format could be a discriminant, but does the module have enough information to know which output representation is convenient? Such decisions made the implementation of processings using spectra really complex.

Besides that, algorithms also had to consider different cases for different scales (linear or dB). So the complexity of processing objects dealing with spectra was very high. A lot of code dealt with querying the presence of a given representation or synchronizing them.

So, we decided to simplify all that by having one kind of spectrum for each representation and making any representation change explicit through a converter processing. We are currently considering just two spectrum classes: MagPhaseSpectrum and ComplexSpectrum. We also added converters to and from the current Spectrum to enable a progressive migration. The implementation of FFT and IFFT was highly simplified, as they no longer need to deal with conversions and prototype configurations, just input or output complex spectra.
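
As an illustration of how explicit such a converter is, here is a sketch. The port idiom follows the usual CLAM processing style, but the spectrum members (bins, magnitudes, phases) and the configuration handling are assumptions:

// Sketch of an explicit converter processing, for illustration only
#include <complex>
#include <vector>

class ComplexToMagPhase : public CLAM::Processing
{
	CLAM::InPort<ComplexSpectrum> _input;
	CLAM::OutPort<MagPhaseSpectrum> _output;
public:
	ComplexToMagPhase()
		: _input("Complex Spectrum", this)
		, _output("MagPhase Spectrum", this)
	{
	}
	const char * GetClassName() const { return "ComplexToMagPhase"; }
	bool Do()
	{
		const ComplexSpectrum & input = _input.GetData();
		MagPhaseSpectrum & output = _output.GetData();
		// Turn each complex bin into a magnitude and a phase
		const unsigned nBins = input.bins.size();
		output.magnitudes.resize(nBins);
		output.phases.resize(nBins);
		for (unsigned i = 0; i < nBins; i++)
		{
			output.magnitudes[i] = std::abs(input.bins[i]);
			output.phases[i] = std::arg(input.bins[i]);
		}
		_input.Consume();
		_output.Produce();
		return true;
	}
};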

This is the setup of a system which performs complex and polar products of two spectra:



The spectral product has been simplified as well. Now we have a product for each kind of spectrum, in contrast with the previous situation, where a single module contained implementations for every combination of inputs.

Code is really simpler to understand and write this way. After having all that, we could profile which option is best: conversions plus the cheap product, or no conversions plus the expensive product. The option without conversions won, but the situation may change in the future depending on the optimizations we do on the conversion. With the processing modules factored that way, we can swap them and redo the benchmark easily.

In the process of taking that decision, we noticed that some elementary CLAM elements, such as AudioSource, AudioSink and especially AudioMixer, were taking too much CPU.



After fixing that, we got a clear profile that highlights the proper spectrum product strategy for our concrete case. Callgrind is a really nice tool:



So I hope we can soon move to this new way of doing spectral processing in CLAM, but a lot of code still must be updated.

Monday, July 2, 2007

CLAM Plugins

One of the most interesting features of the upcoming CLAM release will be a plugin system that allows users to add their own processings as a plugin to any CLAM based application, just by placing a library in a given folder.



We provide some cute tutorials on writing simple processings from scratch, and an SConstruct file that will compile and install everything in a given folder as a plugin.
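
To give a flavour, a minimal plugin processing could look roughly like the sketch below. The registration idiom is how I recall CLAM factories working, so treat the exact header and class names as assumptions and check the tutorials for the real incantation:

// MyGain.cxx - sketch of a minimal plugin processing
#include <CLAM/Processing.hxx>      // assumed header names
#include <CLAM/AudioInPort.hxx>
#include <CLAM/AudioOutPort.hxx>
#include <CLAM/ProcessingFactory.hxx>

class MyGain : public CLAM::Processing
{
	CLAM::AudioInPort _input;
	CLAM::AudioOutPort _output;
public:
	MyGain()
		: _input("Input", this)
		, _output("Output", this)
	{
	}
	const char * GetClassName() const { return "MyGain"; }
	bool Do()
	{
		// Halve the amplitude, sample by sample
		const CLAM::Audio & in = _input.GetAudio();
		CLAM::Audio & out = _output.GetAudio();
		for (int i = 0; i < in.GetSize(); i++)
			out.GetBuffer()[i] = 0.5 * in.GetBuffer()[i];
		_input.Consume();
		_output.Produce();
		return true;
	}
};

// A static registrator is assumed to make the factory aware of the
// new processing when the plugin library gets loaded
static CLAM::FactoryRegistrator<CLAM::ProcessingFactory, MyGain>
	regMyGain("MyGain");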

It has been easier than we thought: we were frightened by the Visual 'mecatxis' so we just dropped it, and a single class did the work. There is still some work to do on documenting the registration of the configuration dialogs for the processings, and on making the Processing Tree of the Network Editor populate automatically from the factory content.

The former can be done very quickly by adding the Qt4 configurator as a CLAM library, but we want to provide a more general solution that doesn't require users to link against Qt4. The latter (processing tree population) is something Andreas Calvo is working on, and he is making nice progress.

Wednesday, April 18, 2007

Enhancing QSynth/Rosegarden/SonicVisualizer knobs for CLAM

I am still expanding the widgets toolbox for CLAM.



PK Widgets, the ones in my previous blog entry, are really nice, but they don't integrate well with other widgets because of their background and fixed size. So you either do a mostly PK interface or none at all. Most standard Qt widgets look nice with a nice theme such as Plastique but, for some reason, QDial is the ugly duckling of the collection... well, according to Chris Cannam, so is QLCD. I agree with him. QDial looks like the old CDE style whichever style you choose:



Besides the look, QDial's behaviour is far from optimal. Clicking on any point makes the pointer jump there, which is a bad thing if you want to control a parameter progressively. Also, when you move the knob beyond one of the extremes, it may switch sharply to the other extreme. Using such a mischievous control in a live performance could be a disaster. So let's look for aesthetic and behavioural alternatives.

Going back to the look, I really liked the knobs in QSynth, a program by Rui Nuno Capela.



The knobs were indeed taken from Rosegarden, and Pedro Lopez-Cabanillas enhanced the look by adding some nice but expensive gradients; those are the ones that attracted me.

I took those widgets, with such a long trail already, and extended them.
I ported them to Qt4 using QGradients, which are faster, and did some visual improvements: shadows, a bumped pointer, configurable colors, angular and linear modes of mouse handling... I added them to the CLAM Widgets plugin and added some configurability, so you can change the colors from Designer. And that's how they look now:


Monday, April 9, 2007

PK Widgets integration in CLAM

And last but not least, the other important change I added to CLAM this Easter has been integrating Patrick Kidd's PkWidgets. This was a really dormant to-do: a working implementation had been sitting in the 'draft' CVS since 2004.

Back in 2004, Patrick amazed the LAD community with his nice looking PKSampler. I must confess that I never got PKSampler sounding, because of a bunch of missing dependencies on Debian/Ubuntu. But, apart from the audio work, the interface was really gorgeous. Something that was not so common on the Linux audio scene.

The nice look was mostly due to a clever combination of free technologies: Qt, Python and POV-Ray. POV-Ray rendered 3D widgets with a realistic look, Qt provided the interface foundation and the resource system, and Python glued it all together.
A clever trick loads the POV-Ray renders as frames of an interactive animation to draw a widget.

At that time I was starting what is now the Prototyper and I was already messing with Designer plugins, so I could not resist, and I ported some of Patrick's widgets to C++ in order to include them in a Designer plugin.

I exchanged some mails with him, but the code was left in the CLAM draft repository... until Saturday. I updated it to Qt4, added some configuration features, merged some of the new pixmaps, and integrated the widgets into the CLAMWidgets plugin. The result: now you can prototype CLAM interfaces such as this one:

Educational Vowel Synth and ControlSurface Widget

Some months ago I had the idea of doing a vowel synthesizer program which could help children understand why the vowel triangle makes sense, and which could also be used to train foreign language vowels.



The core of the idea is that the axes correspond mostly to the positions of the first two formants, F1 and F2. So, by a simple mapping, given a point on the triangle you can synthesize any vowel, even those in the middle. Also, by analyzing incoming voice from a microphone, the system could place a point on the triangle that tells how far one is from the intended vowel.
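
Just to make the mapping idea concrete, here is a sketch. The corner frequencies are rough textbook values for /a/, /i/ and /u/, and the whole function is a made-up illustration, not the prototype's actual code:

// Map a point of the vowel triangle to (F1, F2).
// Corner frequencies are approximate, for illustration only.
struct Formants { double f1, f2; };

// x in [0,1]: front (0) to back (1); y in [0,1]: close (0) to open (1)
Formants vowelPointToFormants(double x, double y)
{
	const Formants i = { 300, 2300 }; // close front vowel /i/
	const Formants u = { 300,  800 }; // close back vowel /u/
	const Formants a = { 800, 1200 }; // open vowel /a/
	// Interpolate along the close edge (/i/ to /u/),
	// then towards the open corner /a/
	Formants close;
	close.f1 = i.f1 + (u.f1 - i.f1) * x;
	close.f2 = i.f2 + (u.f2 - i.f2) * x;
	Formants result;
	result.f1 = close.f1 + (a.f1 - close.f1) * y;
	result.f2 = close.f2 + (a.f2 - close.f2) * y;
	return result;
}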



I did a first prototype which synthesized something far from human, but at least you can identify the vowels when you set the proper frequencies. Since I had a proof of concept, I stopped there.



This Easter I improved the prototype by adding a new control widget: the ControlSurface, which controls two parameters by moving a single point. It is much handier than having two sliders, and it is perfect for parameter exploration, not just for the vowel synthesizer.

It will also force a change in the way the CLAM Prototyper addresses the binding of controls, as it controls two parameters while our current system binds a widget to a single inlet.

Realtime MFCC and LPC analysis

As I said in the previous blog entry, a lot of things got done in CLAM this Easter. We fully dropped the FLTK visualization module from the main libraries. Zach Welch (one of the active newcomers) ported Voice2Midi to Qt4, and we moved a subset of the Qt3 visualization module to SMSTools and dropped the rest. (Zach prepared a set of patches to port it to Qt4, but let's wait until the rewrite.) All those drops mean that CLAM has been reduced to half of its former size: less maintenance and a faster entry for new developers.

A few examples stopped being functional because they used FLTK plots. One of them was the LPC example. It compiled with visualization disabled, but that rendered the example useless, so I decided to convert it into a Network Editor example.

After some hacking I had this piece of network working:



After some Designer and Prototyper hacking I got this interface:



Trying to understand the meaning of the LPC, I also added a new output, the spectral envelope:



So it captures the spectral envelope, which is going to be very useful for the vowel synth.

Then a user asked for realtime MFCC. As we already had the processings, although with a port-less interface, and the LPC visualization could be reused for MFCC, I also took this task. The result:



Mixing it all in a single analysis interface



Vowel formants are even clearer in the MelSpectrum than in the MelCepstrum or in the Spectrum itself. Another idea to be used for the educational vowel synth.

Realtime Voice Gender Change

This Easter has been very productive for the CLAM framework. The holidays, and a bunch of new developers who approached the team during these weeks, have greatly accelerated the development, which had already picked up during the last months. As a consequence, a lot of dormant to-dos have been addressed, and most of them deserve a blog entry (pixmap widgets, realtime LPC/MFCC, Qt3/FLTK drop, Qt4 ports, new visualization widgets, new control widgets...). Just take a look at the CLAM-devel mailing list archives or the development screenshots and you'll see I am not exaggerating. So let's start from the beginning.

One of the nicer things I worked on, and which is going to be the star of CLAM demos for sure, is the voice gender change effect in the NetworkEditor. Since Xavier implemented the offline version, it has been around for a while in SMSTools, turning Elvis' voice into Elvira's. It was also in the NetworkEditor, but crashing or not even sounding.

So, I am a demo whore, let's make people smile. Let's make it realtime. Pau had already done some work, as he ported almost all the SMS transformations. I just had to fix some of the internal processings: the pitch shift and the SpectralShapeShift. Most of the bugs were about detecting non-harmonic parts and figuring out the best strategy in those cases. In some cases the condition was detected, but the action relied on input-to-output copies that were implicit in the offline version but not in the realtime one.

What I was not able to solve is the CombFilter. It added a really strident noise to the residual part. So I disabled it, and it does not sound as perfect, but... tada! I started hearing my sister's voice coming from the speakers instead of mine when I talked into the mic, which is especially frightening; most of you know why.

That's the look the network had after completing the system.



And a first attempt at a Prototyper interface for a standalone application:



As soon as it is available as a nightly snapshot, try it. It is worth listening to. In my first experiments I used it to learn how to do a mixed choir in Ardour with just my own voice.