Mutual Telexistence


Concept of Mutel

We have been developing a system called "mutual telexistence," which allows face-to-face communication between remote users. Unlike most conventional teleconferencing systems, which fix each user's viewpoint at a single position, a mutual telexistence system should present images of the other users that change according to the viewer's eye position in a computer-generated three-dimensional space.

Recently, a technique called "image-based rendering" has emerged that can efficiently synthesize photo-realistic images of complex scenes from arbitrary viewpoints. Our system is fundamentally based on this technique; for smooth communication, however, it must synthesize the images in real time. Our prototype therefore has the following features to meet this real-time requirement:

  1. No geometric information such as a "depth map" is required; only a single parameter, the distance to the object, is needed. The system therefore avoids time-consuming processing such as pattern matching (see the sketch after this list).
  2. The data handled by the rendering computer is reduced in advance, at the stage of capturing the source images, so the rendering computer never has to deal with the bulky data of the full "plenoptic function."
  3. Fast rendering is achieved through hardware-accelerated texture mapping; in other words, the system can be assembled from relatively low-cost, general-purpose graphics hardware.
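To illustrate feature 1, the following sketch (hypothetical code, not the project's actual implementation; the 12 cameras and 54 mm spacing are from this page, while the object distance D, the eye position, and the function select_camera are illustrative assumptions) shows how a virtual viewpoint can be mapped to a source camera using only the single object-distance parameter, with no per-pixel depth map or pattern matching:

import numpy as np

# 12 cameras at 54 mm intervals are from the prototype; D is an assumed
# single distance to the object plane (the only geometric parameter used).
N_CAMERAS = 12
SPACING = 0.054          # camera interval along the baseline [m]
D = 2.0                  # assumed distance from baseline to object plane [m]

# Camera x-positions along the baseline, centered at 0.
camera_x = (np.arange(N_CAMERAS) - (N_CAMERAS - 1) / 2) * SPACING

def select_camera(eye_x, eye_z, slit_x):
    """Map one vertical slit of the virtual view to a source camera.

    eye_x, eye_z : virtual eye position (baseline at z = 0, object plane
                   at z = D, eye on the negative-z side)
    slit_x       : x-coordinate where the slit's ray pierces the object plane

    Returns the index of the camera nearest to the point where the
    eye-to-object ray crosses the camera baseline.
    """
    # Intersect the ray from the eye to the object-plane point with z = 0.
    t = -eye_z / (D - eye_z)
    baseline_x = eye_x + t * (slit_x - eye_x)
    return int(np.argmin(np.abs(camera_x - baseline_x)))

# Example: a viewpoint 0.5 m behind the baseline, 0.1 m to the right.
for slit in np.linspace(-0.3, 0.3, 7):
    print(f"slit at x={slit:+.2f} m -> camera {select_camera(0.1, -0.5, slit)}")

Because each output column reduces to a single camera choice computed from one ray-plane intersection, the per-frame cost is trivial compared with depth estimation or pattern matching.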
Prototype System
Figure 1. Prototype System (click the image to view an MPEG movie)
Figure 2. Synthesized Video Sequence (click the image to view an MPEG movie)

The left picture in Figure 1 shows our prototype system, and the right one is its block diagram. The system consists of a camera unit, a rendering PC, and a control PC. In the camera unit, 12 small color CCD cameras are aligned horizontally in a row on a linear actuator at 54 mm intervals. Each camera is mounted on the gantry rotated about its optical axis, so that its scanning lines run vertically. All cameras are synchronized by the same genlock signal, and a video switch is installed between the camera unit and the rendering PC. This design allows the rendering PC to selectively capture a single scanning line from among the 12 video streams, which reduces the data volume and the number of video-capture devices required. The rendering PC synthesizes images of the object from an arbitrary viewpoint by texture-mapping the captured vertically long tile images onto transparent planes in the computer-generated three-dimensional space. The control PC selects the channel of the video switch and controls the motion of the linear actuator.
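The prototype performs this compositing step with hardware texture mapping on the graphics card; as a purely software analogue (a hypothetical sketch with made-up array shapes and helper names such as composite_view), the following code shows the same slit-by-slit assembly of a novel view:

import numpy as np

def composite_view(slits, out_width, camera_for_column):
    """Assemble a novel view by laying captured vertical slit images
    side by side, one slit per output column.

    slits             : list of (H, 1, 3) arrays, one tall slit per camera
    out_width         : number of columns in the synthesized image
    camera_for_column : maps an output column index to a source-camera
                        index (e.g. derived from the geometry sketched
                        earlier)
    """
    height = slits[0].shape[0]
    view = np.zeros((height, out_width, 3), dtype=slits[0].dtype)
    for col in range(out_width):
        # In the real system this step runs on the GPU: each slit is a
        # texture pasted onto a transparent plane in the 3-D scene.
        view[:, col:col + 1, :] = slits[camera_for_column(col)]
    return view

# Tiny demo with 12 synthetic slits of graded brightness.
slits = [np.full((240, 1, 3), i * 20, dtype=np.uint8) for i in range(12)]
img = composite_view(slits, out_width=48,
                     camera_for_column=lambda c: c * 12 // 48)
print(img.shape)  # (240, 48, 3)

Offloading this column-by-column assembly to texture mapping is what lets the system meet its real-time goal on ordinary graphics hardware.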

Click Figure 2 to see a video sequence synthesized by our system. Moving human figures are successfully rendered in real time.

More detailed technical information is available at http://www.star.t.u-tokyo.ac.jp/projects/mutel/.


Contact: Yutaka Kunita <kuni@star.t.u-tokyo.ac.jp>