Bringing NUI to VST

Some of my friends know me to have a near-obsession with the emerging NUI paradigm, known to most people through little things like Apple’s iPhone or Jeff Han’s famous video. When I first saw that video, it struck me, as it struck many others, that a big change was coming. Others have felt similarly, but some fail to grasp the full picture and dismiss this revolutionary development out of hand when they try it with their favorite software and find it less than usable. I too encountered this disappointment years ago, when I first tried the experiment. The touch hardware is only one side of the story, but that takes a while to realize.

I started recording music on a Tascam four-track in high school, and being the nerd that I am, quickly moved into DAW land before that term really had a meaning (it’s still rather nebulous), and have been recording for my own sake ever since. As music and technology advanced (and they both have), and as I grew, I began to tire of recording. I love to hate on Ableton, but their Live software really did shake up the game. Here, finally, was an application which truly transformed the computer into an interactive musical instrument. Recording had finally broken free of the linear time it had been bound to ever since the days of tape (well, Robert Fripp might take exception to that assertion, but he himself is exceptional).

As I began to play more with the software, I of course became frustrated with the mouse interface. I was trying to make music, not check my e-mail, damn it! Taking my hands off the guitar, picking up a mouse, finding the cursor on the screen to know how to move it, clicking a weird button… the ergonomics were just too frustrating for continued use in a studio environment, let alone in performance. Being a highly process-oriented individual, I began to think of alternate solutions. A foot controller like Behringer’s FCB1010 is of great utility, but lacks the dynamic feedback capabilities that make modern screens so beautiful. It became clear that a touch screen would offer the best of all worlds. So I bought one. Only it turns out, all that software really was designed for a mouse and keyboard. The icons were just too small for my fat fingers to press accurately! On top of that, things like rotary faders don’t have a consistent feel that maps to fingers. The mouse pointer can disappear when the button is clicked, but my finger stays put. Do I really want to drag up and down with my finger, well away from the display of the rotary control, in order to rotate it?

Fast forward. We are on the verge of cheap, omnipresent multitouch-capable hardware, with operating system-level support. This will be most radically felt in the multimedia and artistic realms, as intuitive interfaces diminish the barrier to entry posed by software learning curves. Design interfaces to respond to the expressive nature of gesture, and people lose their fear of experimentation. Create interesting parameter mappings between the physical input and the digital result, and configuration choices cease to be overwhelming.

All this is a bit long-winded to get to what I really want to address. Steinberg’s VST audio plug-in architecture is a de facto standard with a mind-boggling assortment of third-party offerings. Most plug-ins also offer custom GUIs for editing their parameters. Unfortunately, the vast majority of these work extremely poorly with touch. How hard would it be to add some extra exports to a VST library to expose a NUI presentation in parallel to the common GUI? Would it be possible to retrofit a layer to fit over existing VSTs? Perhaps some recommendations, if not a formal specification, for the pieces of VSTGUI to avoid in a NUI context: for instance, CCursorType doesn’t really make sense at all, and neither does CMouseWheelAxis. But there are more subtle things: what knob mode works best for touch screens? In my experience, linear click-drag (kLinearMode) doesn’t seem to make much sense for touch, but for kCircularMode, care must be taken that the knob’s representation is sized large enough for fat fingers to manipulate accurately. After all, they don’t get any of the tactile feedback of a real physical knob, so proper placement is pretty much all visual. In any case, it would be great to see some guidance on retrofitting the myriad VSTs out there to transition them from WIMP to OCGM in a timely fashion. I suppose, as always, it’ll just take some time for the adjustment to soak into the collective consciousness. And, as always, I’ll be patient but eager.
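
As a back-of-the-envelope illustration of that sizing concern (nothing to do with VSTGUI itself, and the 9 mm figure below is just a commonly cited touch-target guideline, not anything from Steinberg), converting a finger-sized target into pixels is simple arithmetic:

# Rough touch-target sizing: the 9 mm minimum is a common guideline, not a VSTGUI constant.
MIN_TARGET_MM = 9.0
MM_PER_INCH = 25.4

def min_knob_pixels(screen_dpi):
    """Smallest knob diameter, in pixels, that stays finger-sized at a given DPI."""
    return round(MIN_TARGET_MM * screen_dpi / MM_PER_INCH)

print(min_knob_pixels(96))   # ~34 px on a typical desktop monitor
print(min_knob_pixels(264))  # ~94 px on a high-DPI tablet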

Git’s core.autocrlf: More Trouble Than It’s Worth

I’ve been using Git for some projects lately, and overall I have to say it’s a good tool. One thing that has had me scratching my head for a while is the behavior of line endings. There is a lot of conflicting advice on the Web regarding the ‘best’ usage of the core.autocrlf option across different platforms. After much wasted time trying to figure out why my working tree showed modifications even after a fresh checkout, I’ve concluded the best course of action is to always set the option to false. Pretty much any IDE or text editor these days, as well as any compiler one could consider using, supports either line ending mode. With autocrlf enabled, Git can get confused when some files in the repository have CRLF endings and others have LF, reporting changes when you have made none. In this situation, git diff will list an entire file’s contents as removed, followed by the same exact content added (well, not exactly the same… the line endings are different! But who cares?). A git diff -w, on the other hand, will not list out this absurdity. Just do a git config core.autocrlf false followed by a git reset --hard and things will be where you want them. Let the IDEs deal with line endings; Git is happier just tracking content however it arrives, and I’m happier getting back to real work rather than mucking about with the tool.

ArticulatEd Exhumation

I’ve always found the available tools for hobbyist game developers to be rather rudimentary, and usually full of gaps. A couple years ago, just before ragdolls were absolutely everywhere, I began work on an idea I’d had for a while to make an editor for articulated rigid body systems, specifically targeting character rigs. I had tried other free editors out there, but they all seemed quite cumbersome, and none had any of the more advanced features I wanted.

My first prototype was a simple program made with SDL, OpenGL, and ODE, written quickly and messily just to see if the idea had any merit. The interface design and implementation would have taken too much time in SDL, and its C API contributed heavily to the ugliness of the code, so I decided to start fresh.

The new version uses one of my favorite software libraries, JUCE, instead of SDL, leading to a much more rapidly developed and nicer-looking GUI. Additionally, JUCE’s fairly nice C++ object model led to a much cleaner source base and more easily extensible designs. A quick video:

ArticulatEd’s interface is inspired in equal parts by Pixologic’s ZBrush and the FOSS Wings3D. Indeed, it is meant to complement those two miraculous modeling programs as a counterpart in the toolchain, serving both as a physical rig (or ragdoll) creator for meshes and, eventually, as a physical animation tool. The basic idea comes from my perceived need for a straightforward tool for constructing physical representations of the graphical meshes in a game or simulation project. Creating chains of connected rigid bodies should be as simple as possible, as should manipulating the created systems into different poses. After importing a modeled mesh, it should be easy to bind it so that it deforms with the underlying rigid body skeleton throughout its various poses. Joint constraints and motors to move through the poses should be easy to tweak, or even be generated automatically from the implicit skeletal structure and the movements between the various animation poses. More to come.

Patchbaying .NET, part 2

It’s been some time, but I’ve been continuing with the visual patching system for .NET that I wrote about last time. After playing around with Cairo more and more, I decided that the component model offered by WinForms, while inferior in flexibility and aesthetics, gave enough of an advantage in developing controls that I would go ahead with it. Mono’s support for WinForms seems to be improving all the time as well, so at least for now, I’ve decided to stick with it and focus more on the gritty details of actually getting the system working well.

I still have a long way to go with the infrastructure, but I’ve gotten to the point now where I can realistically start developing targeted packages. My first focus is on getting an OpenGL framework going, for which I’ve found the Tao Framework quite handy. Unfortunately, Tao still leaves a few things to be desired. For instance, OpenGL enumerations are simply wrapped as integers, which, while true to the spirit of the underlying specification, is a royal pain compared to the semantically stronger .NET enumerations. Overall, though, Tao is proving itself to be a quick and easy way to get the functionality I need into this framework. It’s definitely nice not to have to deal with writing the interop code myself.

So far I’ve got a primitive mesh class written, which reads in Wavefront/Alias .OBJ-formatted mesh data and renders it, currently using immediate mode. Obviously this is slow and silly, but it serves its purpose as a prototype. I foresee good things happening soon along this route.
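
For the curious, the ingestion side amounts to very little. Here is a rough sketch of the format in Python (the real class is C# on top of Tao and handles more of the spec); it reads only vertex positions and faces:

# Minimal Wavefront .OBJ reader: 'v' lines are vertex positions, 'f' lines are faces.
# Real files also carry normals, texture coordinates, groups, materials, and more.
def load_obj(path):
    vertices, faces = [], []
    with open(path) as obj:
        for line in obj:
            parts = line.split()
            if not parts or parts[0].startswith('#'):
                continue
            if parts[0] == 'v':
                vertices.append(tuple(float(x) for x in parts[1:4]))
            elif parts[0] == 'f':
                # Entries look like 'i', 'i/t', or 'i/t/n'; indices are 1-based.
                faces.append(tuple(int(p.split('/')[0]) - 1 for p in parts[1:]))
    return vertices, faces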

Of course, the development and brainstorming for all of this has led me to realize how much potential a program like this actually has. The concept of dataflow programming has somehow managed to escape the larger consciousness of the so-called computer science community. Certainly there are many people familiar with the concepts, and even a few working on such languages (many visual languages, in fact, lend themselves to dataflow naturally), but overall it seems much neglected, especially considering the prodigious benefits it offers. Things like parallelism, which are commonly viewed as difficult to handle in traditional languages, reduce to simple graph traversal problems in the dataflow paradigm. Long, independent paths in the dataflow DAG can be parallelized automatically by the computer, and a topological sort of the operations by their dependencies gives it ample opportunity to optimize. This stuff is cool, and it’s a shame it isn’t used more widely in the computing community. My hope is that, by bringing some of these concepts to the .NET world, where interop is so blindingly easy, many problems can be ameliorated. Exporting patches as C# source or even emitting assemblies directly are planned features.
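
To make the parallelism point concrete, here is a small generic sketch (not code from the actual patcher) that sorts a dataflow DAG into dependency levels; every node in a level depends only on earlier levels, so the members of a level could, in principle, be evaluated in parallel:

# Group the nodes of a dataflow DAG into dependency levels.
# 'graph' maps each node to the nodes it feeds into (its downstream consumers).
from collections import defaultdict, deque

def dependency_levels(graph):
    indegree = defaultdict(int)
    for node, outputs in graph.items():
        indegree.setdefault(node, 0)
        for out in outputs:
            indegree[out] += 1
    # Kahn's algorithm, processed a whole "wave" at a time: everything whose
    # inputs are already satisfied forms one level.
    ready = deque(n for n, d in indegree.items() if d == 0)
    levels = []
    while ready:
        level = list(ready)
        ready.clear()
        for node in level:
            for out in graph.get(node, ()):
                indegree[out] -= 1
                if indegree[out] == 0:
                    ready.append(out)
        levels.append(level)
    return levels

# Two source nodes feeding independent chains that merge at 'mix':
patch = {'osc': ['filter'], 'lfo': ['gain'], 'filter': ['mix'], 'gain': ['mix'], 'mix': []}
print(dependency_levels(patch))  # [['osc', 'lfo'], ['filter', 'gain'], ['mix']]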

Patchbaying .NET

As I’ve explored more and more audio and visual software, I’ve had my interest piqued by patchbay-style graphical languages, such as Pd and Max/MSP. Particularly in the VJ community, software such as Salvation and VSXu has made it easy to assemble powerful sound-reactive visuals and tweak them on the fly using the same interface.

My last attempt to explore this subject dealt with visualizing Python object graphs for making visuals. I consider it a qualified success, but Python is a funny language. The dynamic typing and late binding make coding extremely flexible and mutable, but unfortunately also lead to a dearth of useful metadata. Contrasting this with .NET, which has metadata oozing out everywhere, I decided to try a similar experiment on that platform.

Objects in .NET tend to store references to other objects, forming a graph in memory. Connected portions of this graph are traversed at various times, such as during serialization, but being able to see a visual depiction of the graph could be really nice for some high-level control. The best part is that, thanks to the metadata, the visual depiction can automatically deduce which controls and connections of the underlying objects should be displayed.

My prototype uses WinForms, simply because it is convenient. A list displays several .NET types which can be instantiated by dragging them onto the main canvas. In this primitive version, only classes with default constructors can be created, but it’s enough to get a picture of the potential. Objects which implement a custom IModule interface can also have custom WinForms controls added to their visual module.

All the public properties of a particular object are retrieved via reflection and listed on the left edge of the module. These can be connected to other modules, as long as the types are compatible, and these connections will be displayed via arrows. When the connection is made visually, the system calls the property’s setter (again via reflection) with the linked object as the value parameter. A properly designed object should then have a reference to the linked object, which it can internally call for whatever it wishes.

There are a few caveats. Most notably, only reference types will properly ‘connect’ currently, since value types are simply copied. Therefore, for my beginning experiments, I created a class Scalar, which is basically just a referencable holder for a float. Furthermore, Scalar can be subclassed in order to implement IModule in different ways: for instance, a value controlled by an on-screen linear slider, or one controlled by a text box. However, it would still be nicest if regular value types could be used directly. Why is this a problem?

There seem to be two fundamental approaches to module-wire computation systems. The method I’ve used so far in my systems, in which the module objects store their own data, including connections to other modules which they can call at any time, could be termed a data pull system. The method embraced by Pd and others takes a different route, forcefully pushing the data into a module, getting the result, and transferring that to the next module. The difference can also be thought of in terms of where the data is kept: in the former, it’s stored in the modules, while in the latter it’s (conceptually) in the wires themselves.
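
A toy illustration of the two styles, in Python rather than .NET and stripped to the bone:

# Pull style: each module holds references to its inputs and asks them
# for fresh values whenever it needs them.
class PullConstant:
    def __init__(self, v):
        self.v = v
    def value(self):
        return self.v

class PullGain:
    def __init__(self, source, amount):
        self.source = source      # reference stored inside the module
        self.amount = amount
    def value(self):
        return self.source.value() * self.amount

# Push style: an external scheduler copies each module's output forward
# along the wires; the data conceptually lives in the wires, not the modules.
class PushGain:
    def __init__(self, amount):
        self.amount = amount
    def process(self, incoming):
        return incoming * self.amount

pull_chain = PullGain(PullConstant(0.5), amount=2.0)
print(pull_chain.value())                 # pull: 1.0, computed on demand

signal = 0.5
for module in (PushGain(2.0), PushGain(3.0)):
    signal = module.process(signal)       # push: the scheduler drives the copies
print(signal)                             # 3.0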

The push method clearly obviates the problem with value types mentioned above, as they are constantly being copied to their successors in the signal chain. Unfortunately, just when these copies should be made is another unanswered question. One of my big frustrations with Pd was its inherent update rate tied to the audio frequency. This is great for audio, and solves the question of when to push data for that specific domain, but it essentially amounts to a magic number for any other type of system. If I want to create a visualization system that has nothing to do with audio, why should the audio driver rate matter at all? Figuring out an appropriate time to push value types remains an open problem for me.

The WinForms prototype was a fun experiment, and verified .NET’s feasibility for this sort of project. I personally hate coding for WinForms, though. It’s convenient for standard Windows interfaces, but patching connections between modules isn’t such an application. The custom control rendering is hackish, sloppy, and slow, because it doesn’t really fit the traditional application model. Additionally, I love Linux, and Mono’s support for WinForms, while rudimentary, still leaves plenty to be desired. Mono’s underlying drawing layer, though, known as Cairo, holds much promise. After some hacking around I got a nice little test application rendering some vector graphics onto an OpenGL texture. My plan for the next step is to redevelop the patcher using Cairo to draw the main interface on top of an OpenGL window, which can be used to render 3D scenes controlled by the patchbay interface in real time. It should be cool.
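
The gist of the Cairo-to-OpenGL trick, sketched here with pycairo and PyOpenGL rather than the C# bindings I’m actually using, and assuming a GL context has already been created by some windowing toolkit: draw into an ARGB image surface, then upload its bytes as a texture.

# Draw vector graphics with Cairo, then hand the pixels to OpenGL as a texture.
# Assumes an OpenGL context is already current (created by a windowing toolkit).
import math
import cairo
from OpenGL.GL import (glGenTextures, glBindTexture, glTexParameteri, glTexImage2D,
                       GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_TEXTURE_MAG_FILTER,
                       GL_LINEAR, GL_RGBA, GL_BGRA, GL_UNSIGNED_BYTE)

WIDTH, HEIGHT = 256, 256
surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, WIDTH, HEIGHT)
ctx = cairo.Context(surface)
ctx.set_source_rgba(0.2, 0.6, 1.0, 1.0)
ctx.arc(WIDTH / 2, HEIGHT / 2, 80, 0, 2 * math.pi)  # stand-in for a patch module
ctx.fill()
surface.flush()

# Cairo's ARGB32 is BGRA byte order on little-endian machines, hence GL_BGRA below.
texture = glGenTextures(1)
glBindTexture(GL_TEXTURE_2D, texture)
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, WIDTH, HEIGHT, 0,
             GL_BGRA, GL_UNSIGNED_BYTE, bytes(surface.get_data()))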

Unified text-based and visual programming

I’ve tried quite a few graphical programming languages over the years, such as Pure Data (Pd), but, having experience with more traditional text-based languages, I was always left frustrated by the seemingly roundabout means of data entry. The same things that attract me to vim and Dvorak made me long for more convenient methods.

At the same time, there are some things which are much better suited to graphical programming environments. It’s easier to keep track of variables, since they just sit right in front of you. Controls can model traditional interfaces such as knobs for tweaking values, making real-time manipulation much more friendly.

Obviously, both systems have their advantages and disadvantages. While typing ‘x = y + z*2’ is much quicker and more concise than navigating multilevel menus to create discrete operators and operands in a visual system, finding exactly the right shade of indigo is much nicer with a typical palette picker than by guessing and checking with RGB triplets. Both systems are equally capable, but some tasks lend themselves more naturally to one than the other. Being able to pick and choose which to use at any given time led me to attempt a parallel model.

In Canvasthesia I’m using Python to implement much of the higher-level functionality. It’s quite a versatile language, and has the somewhat unusual ability to interpret code from an interactive console. I have a few classes which represent various types of entities, such as a Renderable entity, which, startlingly enough, renders something in a scene. Adding a custom Python descriptor (EntityConnection) to a class derived from one of these base entity classes lets the system know that a particular type of entity can be connected to it.

class Test(vj.Renderable):
    testlink = vj.EntityConnection(vj.EntityType.Renderable)

    def __init__(self):
        vj.Renderable.__init__(self)

    def Render(self):
        pass

This simple renderable entity does nothing of interest, as evidenced by its minimal Render method. However, it does contain a link to another renderable entity via its testlink attribute. Because it is an EntityConnection descriptor, it is automatically added to any instance’s corresponding visual control.
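
I won’t reproduce the real descriptor here, but a simplified, hypothetical version of the EntityConnection idea looks roughly like this (not the actual Canvasthesia code, and using modern Python’s __set_name__ for brevity):

# Simplified stand-in for EntityConnection: it records which attributes are
# connection slots (and their expected entity type) so the patchbay can list
# them, while behaving like an ordinary attribute from the console's point of view.
class EntityConnection:
    def __init__(self, entity_type):
        self.entity_type = entity_type

    def __set_name__(self, owner, name):
        self.name = name
        # Register the slot on the owning class so the GUI can enumerate it later.
        slots = dict(getattr(owner, 'connection_slots', {}))
        slots[name] = self.entity_type
        owner.connection_slots = slots

    def __get__(self, instance, owner):
        return self if instance is None else instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        # Assignments from either frontend land here, giving the patchbay a
        # natural hook for redrawing the wire between the two modules.
        instance.__dict__[self.name] = value

With something along those lines, the visual layer can walk connection_slots to know which sockets to draw, and the __set__ hook tells it when a wire needs updating, whether the assignment came from the console or from the canvas.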

A few lines typed into the console create a couple of objects and add them to the scene:

test = Test()
vj.MainScene.Attach(test)
test.testlink = Test()

Although only a single Test instance was directly attached to the scene, a simple depth-first search reveals that a second actually exists. Additionally, it is evident that there is a connection between them via the first’s testlink attribute. The patchbay shows this intuitively:

[Screenshot: Patchbay test]

Likewise, a new entity could be created in the visual editor and subsequently accessed by the Python console. They are simply parallel frontends manipulating the same backing objects, allowing whichever is convenient to be used at any time. It’s the best of both worlds.

ChucK Composition

Though I first looked at ChucK quite a while back, I recently decided to give it another look and actually try it for more than a few minutes. After playing around with it for an evening, I ended up liking it a lot more than I had previously. The feature set has improved markedly since I last tried it, as has the consistency and thoroughness of the examples. There are still a few things I would definitely like to see added, mostly centered on integration with other audio applications. Being able to load ChucK scripts as a VST/AU/LADSPA plugin would be a nice advantage, although routing via ReWire or JACK would be about as good. If my interest in this language keeps up, I may decide to get involved with the development and implement some of these myself.

I’ve wondered a lot about alternative scores for music performance, and while a ChucK script may not be exactly what I’ve been thinking about, it might make a lot of sense for virtual accompaniment. A properly written score could direct the human performer, playing a specific part such as guitar or voice, via visual cues; at the same time, the input of the performer could be processed by the program to effect changes in the machine’s performance. Louder RMS values on the input could be reflected in more rapid note generation by the computer, more periodic transients could lead from ambient soundscapes to rhythmic sequences, different instruments could noodle about on riffs within the current chord, etc. Of course, all the typical MIDI and OSC control tricks still apply. All this could surely be done with Ableton Live or any other host with sufficient plugins, but setting up some of the more complicated things with that particular model could be roundabout, to say the least. Obviously everything could be done in C++ as well, but it would take forever and be tough to maintain across platforms. The point is that ChucK seems to be a good middle ground. While it isn’t a ready-to-go solution, it offers a direct and quick path to implementing any of the silly audio ideas floating around.

I’ve attached my first little experimental composition script, and a sample performance of it. It uses the Stk instruments, so it sounds a bit similar to many of the examples, but I think it’s a bit nicer. It’s short, but it certainly gave me a nice introduction and greatly increased my comfort with the language. Hopefully I’ll soon be capable of writing some longer and nicer compositions.

Reaction-diffusion experiments

I’ve been playing around with reaction-diffusion morphogenesis simulations lately, to see if I can come up with some cool art. Turing theorized this system over half a century ago, so it’s no surprise that people like Greg Turk have explored the area pretty thoroughly. It’s hard to come up with new ideas, but my recent interest in VJing led me to ponder animated reaction-diffusion patterns.

Reaction-diffusion animation can mean simply displaying the morphogenesis simulation as it progresses, but I wanted to try something a bit different. Given that the patterns form deterministically in the simulation, I hypothesized that subtle changes in its initial state would lead to subtle changes in the stabilized pattern. Thus my idea was that, by slightly varying the random substrate in a coherent manner between frames, the resulting patterns would also change coherently, producing continuously wriggling labyrinths and shifting spots.
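
For reference, the simulation itself is the usual two-chemical affair; a stripped-down update step in NumPy looks something like the following (a Gray-Scott style system stands in here as a representative model, and the diffusion/feed/kill values are just typical examples, not necessarily what I used):

# One update step of a Gray-Scott style reaction-diffusion system on a 2D grid.
# The parameter values (diffusion, feed, kill) are typical examples only.
import numpy as np

def laplacian(z):
    # 5-point stencil with wrap-around (toroidal) boundaries.
    return (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
            np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4 * z)

def step(u, v, du=0.16, dv=0.08, feed=0.035, kill=0.060, dt=1.0):
    uvv = u * v * v
    u_next = u + dt * (du * laplacian(u) - uvv + feed * (1 - u))
    v_next = v + dt * (dv * laplacian(v) + uvv - (feed + kill) * v)
    return u_next, v_next

# A random substrate to start from; nudging it slightly between frames is the
# coherence experiment described above.
rng = np.random.default_rng(0)
u = np.ones((128, 128)) - 0.05 * rng.random((128, 128))
v = 0.05 * rng.random((128, 128))
for _ in range(2000):
    u, v = step(u, v)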

Alas, my first attempts failed to produce anything artistically interesting. While my general hypothesis was correct, the animation is dominated by infrequent, abrupt popping updates rather than the slowly shifting patterns I was hoping for. I may be able to make something interesting out of this by computing differences from frame to frame and interpolating between them over a longer span of the animation. In any case, further experimentation is needed.