October 2003 archives from
Piotr's R&D blog

Shadow Conferencing

Friday, October 24, 2003, 09:47PM - category HCI -
All the way back in 2000, Dan O'Sullivan took a step towards making video conferencing bearable. His approach could even be overlaid as a watermark on top of other applications, so you wouldn't lose screen real-estate, yet still maintain a visual awareness of other participants. Damn shame it looks like it's going to get patented.

Video Bench at the open house

Saturday, October 18, 2003, 05:31PM - category HCI -

The Chisel group exhibited the Video Bench at UVic's Engineering open house. I only helped set up and break down (and debug!), but apparently it was a huge hit. A couple of interesting things I was told (Peggy, where's your own blog?):

  • Kids were sharing seats, so the app had up to four simultaneous independent users with only two distinct identities. Apparently, it held up fine, even though it was most definitely not designed for this eventuality. Moral: always expect your users to do the unexpected, kids doubly so. Code defensively.
  • Kids picked up on the gestures and the "rhythm" required much faster than adults. The gestures, while not overly complex, do have some rather peculiar requirements (e.g. hand spread just so). Moreover, due to the relatively slow data feed, you can't move too fast, but for some gestures to be recognized, you can't move too slowly either. This speed fine-tuning was always the major problem when having adults try out the app at CASCON, but apparently kids just pick it up naturally. Moral: include children in any user tests for Diamond Touch apps.

For that matter, perhaps kids should be the target audience for these apps to begin with? I guess the hands-on interaction style appeals more to children anyway. Moreover, the tabletop's physical configuration promotes social and collaboration skills, two areas in which more traditional computer setups have been found lacking by parents and educators. So, Diamond Touch in kindergarten, anyone?

Spacewarps

Wednesday, October 15, 2003, 10:20PM - category HCI -

I'm getting this idea out of the way because I've recently discovered that it's (mostly) been done.

The problem

The scenario: some kind of application that supports colocated collaboration on a tabletop display. The app involves moving around elements that have a preferred orientation relative to the viewer and can be resized, but should not otherwise be deformed. (Think "Video Bench" if you've seen it, or a collaborative photo album editing app.) You immediately run into two visualization problems:

  1. The users are sitting on different sides of the table, and hence have different preferred orientations for the elements. Some elements are shared between the participants and might need to be reoriented on demand. Some elements are semi-private and will always be facing their owner.
  2. The tabletop surface is small and the display resolution is low. For editing, the elements must be large enough to show detail despite the low resolution. But if all the elements are large, you'll run out of space or have lots of overlap. So you need to manage element sizes.

A solution

The simplest solution to the problem is to give users direct control over the orientation and size of each element. However, unless these manipulations are actually the goal of the application (e.g. laying out a floorplan), they are pure overhead for the users and will distract them from the primary task. We'd like to make orientation and sizing automatic, while still giving the user some control.

Element positioning is probably the only generic manipulation you can't live without in this kind of app; the user simply must be able to reposition the elements as they desire. The obvious idea then is to use position to control orientation and size of the element as well. For example, if we somehow know where each user is, the elements get bigger as they get closer to the user and always rotate to face the person interacting with the element.

Great idea, but as I found out today it's been done. Read this paper before continuing.

So what's left to do? Here's my take on the limitations of the approach reported above:

  1. It's specific to a circular tabletop. ("Well, duh!" mutters the audience.) It's cool, but doesn't match up with the reality that most tables and projectors are decidedly rectangular in shape. I'm not convinced the polar coordinate system would be as useful on a rectangular surface.
  2. It seems to have only two automatic orientation modes: all elements face away from the center, or all elements face the same global direction. This is very limiting. In the first scheme, the orientation of elements "between" users is useless to all participants (i.e. nobody is facing them). In the second scheme, the collaborative aspect is forgotten as one participant's preference effectively takes over the whole table.
  3. The orientation and sizing mode is global: the same rules apply all over the table. There is no way to create local "zones" with different rules that may be changed independently.

A better solution?

Here's my idea for a superset of the circular design. The tabletop surface is partitioned into spaces. Each space has its own local coordinate system and, for every point in the space, determines the desired orientation and scaling factor for that point. One concrete but simple implementation would be to have a "baseline" in each space, which could be any 1D shape (e.g. a straight line segment, a circle, a point, discontinuous lines, etc.). Given a point of interest, find the shortest vector between the point and the baseline. The vector's direction gives you the desired orientation (like gravity), while the scaling factor is inversely proportional to the vector's length. The circular "central focus" mode is equivalent to putting a basedot in the centre; the "black hole" mode is a circle around the table's circumference; the "magnet" mode is a baseline in front of the desired user.

What other interesting properties does this scheme have? For one, each user can have their own space, and arrange its "distortion field" in whatever way suits them. Also, a baseline drawn with a single finger specifies the field for the whole space, a very powerful yet simple interaction. The arbitrary shape of the baseline gives complete freedom in customizing a space, and spaces are no longer specific to a circular display.

Here's a concrete example. Say each user draws a straight baseline at their "bottom" of the display, approximately the width of their body. Then each space would have the following characteristics:

  1. All elements in front of the user face her properly.
  2. Elements to the sides are arranged in quarter-circles centred at the baseline's endpoints, facing towards the user at an angle.
  3. Elements further away (up) from the user or to the sides become smaller with distance.
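The baseline scheme is easy to try out for the straight-segment case. What follows is only a minimal sketch, not a worked-out design: the function names, the `k` constant and the `max_scale` clamp are all my own assumptions for illustration, and the coordinates assume a y-up table with the user's baseline along the bottom edge.

```python
import math

def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b that is closest to p."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return a  # degenerate baseline: a single point
    # Project p onto the line, then clamp to the segment's extent.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def distortion(p, a, b, k=10.0, max_scale=1.0):
    """Return (angle, scale) for point p given baseline segment a-b.

    angle: direction (radians) of the shortest vector from p to the
           baseline, i.e. which way "down" is for an element at p.
    scale: k / distance, clamped to max_scale (k and the clamp are
           hypothetical tuning knobs, not part of the original idea).
    """
    cx, cy = closest_point_on_segment(p, a, b)
    vx, vy = cx - p[0], cy - p[1]
    dist = math.hypot(vx, vy)
    # On the baseline itself the direction is undefined; atan2(0, 0) gives 0.0.
    angle = math.atan2(vy, vx)
    scale = max_scale if dist == 0 else min(max_scale, k / dist)
    return angle, scale

# A baseline along the bottom of the table, roughly body width.
baseline = ((40.0, 0.0), (60.0, 0.0))
front = distortion((50.0, 30.0), *baseline)  # "down" points straight at the user
side = distortion((70.0, 10.0), *baseline)   # "down" tilts towards the right endpoint
```

Points directly above the segment all get the same "down" direction, while points off to the side are pulled towards the nearest endpoint, which is where the quarter-circle arrangement comes from. Other baseline shapes would only need a different closest-point function.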

Control issues

There is a non-obvious problem common to both approaches: since an element is usually larger than a pixel, which point do you use when determining the applicable distortion? I can think of three options, none very inviting:

  1. Use the centre point of the element for both determining and applying the distortion. This is nice, since the same point will always be used for each element. However, if the user doesn't "grab" the centre of the element when starting to drag it, the element could shrink/rotate out from underneath the user's finger, since the transformation will be about the centre. (For example, I grab a large element near a corner, and move it so its centre is in a "small size" area. The element becomes smaller, and my finger is no longer within the element's boundaries.)
  2. Use the centre point to determine the distortion, but apply it at the grab point. This is a terrible idea, since applying the transform will shift the element's centre, thus changing the distortion factors, necessitating another distortion, etc., ad infinitum if you're not lucky. The converse (determine at grab point, apply at centre) is equally bad.
  3. Use the grab point to determine and apply the distortion. This leaves the element under the user's finger at all times, and gives a nice "physical" feeling to the movement. However, if the user drags the element using one corner, drops it, then grabs it by another corner, the applicable distortion changes suddenly and maybe unpredictably. You can also get into situations where two elements are visually located "in the same area", but have very different distortions applied since they were grabbed at different points.
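The difference between options 1 and 3 shows up in a few lines of code. This is a rough sketch that only handles uniform scaling of an axis-aligned rectangle (rotation is left out), and the `Rect` class and `scale_about` helper are hypothetical names made up for the illustration.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    cx: float  # centre x
    cy: float  # centre y
    w: float   # width
    h: float   # height

    def contains(self, x, y):
        """True if (x, y) lies within the rectangle's bounds."""
        return abs(x - self.cx) <= self.w / 2 and abs(y - self.cy) <= self.h / 2

def scale_about(rect, factor, px, py):
    """Scale rect uniformly by factor, keeping the pivot (px, py) fixed."""
    return Rect(
        cx=px + (rect.cx - px) * factor,
        cy=py + (rect.cy - py) * factor,
        w=rect.w * factor,
        h=rect.h * factor,
    )

element = Rect(cx=0.0, cy=0.0, w=100.0, h=100.0)
finger = (40.0, 40.0)  # grabbed near the top-right corner

about_centre = scale_about(element, 0.5, element.cx, element.cy)  # option 1
about_grab = scale_about(element, 0.5, *finger)                   # option 3

lost = not about_centre.contains(*finger)  # element shrank away from the finger
kept = about_grab.contains(*finger)        # finger still on the element
```

Scaling about the grab point preserves the finger's relative position inside the element, so the element can never slip out from under it. The price is the one described in option 3: the centre moves, so two grabs of the same element can land it in different parts of the field.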

I tend towards the last alternative, but I'm not sure how intuitive it would be. I'd be curious to find out what algorithm is used for the circular table.

A last remaining question: how does the table get partitioned into spaces? How are the borders determined, and what happens when a new space gets created? I don't have a good answer to this yet. I think users should be able to control the "strength" of each space, and this would provide the input to some kind of balancing algorithm that would settle the partition. When a new space is created, other spaces must adjust their borders, but it's unclear how to shift elements around. However, this would allow for some cool effects: imagine that you can drop elements into a trashcan. If you want to recover an element, you open the trashcan through some icon, and it "warps in" a space right over top of the icon with all the deleted elements in it, temporarily pushing away the elements in your space. When you close the trashcan, your space recovers its original shape and things return to normal. I think this could be modeled by using a 2D mesh for each space, kind of like "warp" deformations in typical paint tools.
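To make the "strength" idea a bit more tangible, here is one possible balancing rule, purely as a guess: give each space an anchor point and let a point on the table belong to whichever space has the smallest distance divided by strength (a multiplicatively weighted Voronoi partition). The anchors, the `owning_space` name and the rule itself are assumptions for illustration; this does nothing about shifting elements or mesh warping.

```python
import math

def owning_space(point, spaces):
    """Return the name of the space that owns point.

    spaces maps a name to ((anchor_x, anchor_y), strength), strength > 0.
    A stronger space "reaches" further, so its border is pushed outwards.
    """
    def weighted_distance(item):
        (ax, ay), strength = item[1]
        return math.hypot(point[0] - ax, point[1] - ay) / strength

    return min(spaces.items(), key=weighted_distance)[0]

equal = {"alice": ((0.0, 0.0), 1.0), "bob": ((100.0, 0.0), 1.0)}
strong_bob = {"alice": ((0.0, 0.0), 1.0), "bob": ((100.0, 0.0), 3.0)}

a = owning_space((40.0, 0.0), equal)       # border sits halfway, at x = 50
b = owning_space((40.0, 0.0), strong_bob)  # border moves to x = 25
```

With equal strengths the border falls halfway between the anchors; tripling one space's strength pushes the border towards the weaker neighbour. Opening the trashcan could then be modeled as temporarily adding a strong space at the icon.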

Looking forward

As a final crazy idea, what if this scheme were applied to windows on a typical desktop? Granted, it might only make sense with a larger screen, and the orientation changes might be undesirable, but it's an interesting thought experiment. Compare and contrast this with the Mac OS X Exposé feature.

So, do you think there's still some interesting work to be done in this area? Are my ideas feasible and worth going for, or am I out to lunch? Or have I missed more literature and "it's all been done" already?

Video Bench

Wednesday, October 15, 2003, 09:26PM - category HCI -

Early this year James Chisan and I, along with a gang of undergrads (Jeff Cockburn, Reid Garner, Azarin Jazayeri and Jesse Wesson), developed an application I whimsically called the Video Bench. Against expectations, the application has taken on a life of its own and we recently demoed it at CASCON, getting a very warm reception from the audience. This entry explains what the application is about, what I learned from presenting it at CASCON, and what the future might hold for the Video Bench.

Genesis

It all started when Peggy obtained a Diamond Touch tabletop from MERL and was looking for some enterprising students to put it through its paces. Having recently seen Minority Report, I came up with the idea of using the table for gesture-driven video editing. To keep the project grounded, I decided to take my cue from old cut & paste film editing techniques, where strips of film were physically cut with a knife and pasted back together with tape. The design was fully sketched within days, and the Video Bench implemented in under 6 weeks.

The Video Bench is a hands-on collaborative video editing application meant for casual users. To operate it, you sit on a special conductive mat in front of the Diamond Touch surface and use your hands to manipulate strips of video. Multiple people can use the app simultaneously without interfering with each other. The operations permitted on video strips are mostly pretty simple: play, pause, fast forward & rewind, cut & paste, copy & trash, zoom, and spread.

This last operation requires a bit of explanation. Since there's clearly not enough space on the tabletop to show every frame of each strip, the frames are collapsed into "cels". At first, each video clip loaded into the Video Bench is represented by a strip with only one cel. By moving back and forth through the video, the user can locate the desired point and cut the strip in two. However, navigating through video in this manner is tiresome (if familiar); we can do better. We allow the user to spread video by grabbing the edges of two cels in a strip and pulling them apart. As the edges get further apart, the space is filled with more cels that further subdivide the video between these two points. This is a kind of timeline zoom, where the time axis is partially projected onto the X axis, and is one of the few really novel ideas in this project.
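The spread operation boils down to a small calculation. The sketch below is not the Video Bench code; it assumes a fixed on-screen cel width and an even split by frame count, and `spread_cels` and its parameters are names invented for the example.

```python
def spread_cels(start_frame, end_frame, gap_px, cel_width_px):
    """Subdivide frames [start_frame, end_frame) into cels.

    gap_px is the on-screen distance between the two grabbed edges;
    as it grows, more cels of width cel_width_px fit in between, so
    each cel covers a shorter slice of the video.
    Returns a list of (first_frame, end_frame_exclusive) pairs.
    """
    total = end_frame - start_frame
    if total <= 0:
        return []
    # As many cels as fit in the gap, at least one, at most one per frame.
    n = max(1, int(gap_px // cel_width_px))
    n = min(n, total)
    # Integer arithmetic keeps the boundaries exact and contiguous.
    bounds = [start_frame + (total * i) // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

collapsed = spread_cels(0, 300, gap_px=100, cel_width_px=100)
# [(0, 300)]
spread = spread_cels(0, 300, gap_px=400, cel_width_px=100)
# [(0, 75), (75, 150), (150, 225), (225, 300)]
```

Pulling the edges from 100 to 400 pixels apart turns one cel covering 300 frames into four cels of 75 frames each, which is the timeline zoom in miniature.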

For more details on the Video Bench, and copious amounts of screenshots and diagrams, please refer to our group's final report, keeping in mind that a few things have changed since it was written.

The unveiling

Once the report was written and submitted, I thought the project was pretty much over. However, Peggy suggested that we show it off at CASCON. James and I agreed, and we were soon booked into a prime spot of real-estate on the show floor. I'll spare you the tale of the troubles we went through to get all the equipment flown across the country and set up in its new location; suffice it to say, we were ready on D-day.

The exhibit was immensely popular with the public; we had a crowd of people around our booth whenever it was staffed. Most of that can be attributed to the inherent flashiness of the application in the middle of a rather cookie-cutter technology showcase (how exciting can you make a computer and monitor look, anyway, no matter what it's displaying?), but some people were genuinely interested in the concept and had some good suggestions. I also managed to surreptitiously observe one person attempting to operate the Video Bench when the booth was unattended, which produced further insights on the user interface.

The most striking impression was that people were taking the Video Bench seriously. We had one person ask when (and how) we were planning to commercialize the prototype, and a number of people expressed interest in using the app. We knew, by design, that this app would only appeal to casual users, who are not willing to learn complex user interfaces. I was surprised, however, when a semi-professional videographer claimed that some clients are put off just looking at a professional video editing tool, even if they don't have to use it. Using the Video Bench to rough out a production would result in a much less threatening environment.

I also got some ideas about how to improve the user interface. The jogging operations are nearly useless: they are too imprecise to achieve frame-perfect positioning, yet too slow to quickly scan through video. Instead, dragging the cursor directly in the strip's top edge for absolute positioning, and having a relative positioning control in the bottom edge would be a better combo. (The relative positioning control would fast forward or rewind at a speed proportional to the finger's distance from some center point.) Another issue was moving strips around: people don't expect to be able to use full-hand gestures, and even once instructed they can be difficult to get right consistently. With the jog gestures out, we could use a single-finger drag in the cel to move strips. Alternatively, the adventurous user discovered that the dividers were "live", and tried to use them to move strips around. He was very confused when he discovered that the strips only moved horizontally (remember that the dividers are only meant for spread/fold). It might be a good idea to also use dividers as movement handles, as this would also allow for natural strip rotation when using two fingers!
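The two positioning controls proposed above could look roughly like this. It's a sketch under assumptions: the `gain` and `max_speed` values, the clamping, and both function names are hypothetical, and positions are in pixels along the strip's edge.

```python
def absolute_frame(finger_x, strip_x, strip_width, frame_count):
    """Top edge: map the finger's position along the strip to a frame index."""
    fraction = (finger_x - strip_x) / strip_width
    fraction = max(0.0, min(1.0, fraction))  # ignore drags past either end
    return min(frame_count - 1, int(fraction * frame_count))

def jog_speed(finger_x, center_x, gain=0.5, max_speed=60.0):
    """Bottom edge: signed speed in frames/s, proportional to the
    finger's distance from the centre point (positive = fast forward).
    The clamp to max_speed is an added assumption."""
    speed = gain * (finger_x - center_x)
    return max(-max_speed, min(max_speed, speed))

frame = absolute_frame(100.0, strip_x=0.0, strip_width=200.0, frame_count=300)
# 150: halfway along the strip is halfway through the clip
forward = jog_speed(150.0, center_x=100.0)  # 25.0 frames/s forward
rewind = jog_speed(50.0, center_x=100.0)    # -25.0 frames/s, i.e. rewind
```

The absolute control gets you near the right spot in one drag, and the relative control lets you creep towards the exact frame by holding the finger close to the centre point.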

Finally, other people mentioned that there's video-related work going on at the NRC and at UofT. (For DT-related work, see the next blog entry on spacewarps.) Another person thought that this kind of collaborative surface would be great for bioinformatics work; I didn't catch the details, but it involved visualizing protein structures and manipulating them collaboratively. Other people were keen on seeing this used for software engineering. When I brought up the fact that it's difficult to create content with fingers, they suggested that participants could have personal tablets for input to the communal surface.

Overall, people's enthusiasm and the many ideas I gathered make me think this project has some life in it yet.

A nebulous future

What's next for the Video Bench? We'll probably move it to the next generation prototype of the DT, and exhibit it a few more times at local venues. The previous section gave a few ideas for user interface refinements, and my next blog entry explores other perspectives. Brian Corrie at NewMIC did some work on a distributed version of the app, and has some other improvements in the pipeline. Ultimately, though, the Video Bench is looking for a new home: this is not my (or James's) primary area of research, and sooner or later (probably sooner) we'll have to focus on our dissertations. If the Video Bench is to survive and prosper, the project needs new blood. Anyone?