Processing: Direct mode

VirtualDub allows audio and video streams to be processed in direct mode. In this mode, data is simply copied from input to output. This has the advantage of much faster rendering and no quality loss, while still allowing a limited amount of editing.

Because of the way that audio and video compression works, there are some limitations on the types and locations of edits that can be done in direct mode. However, because the audio and video modes are set independently, it is possible to run only one pipeline in direct mode, so that the other stream does not incur the direct-mode limitations of its own format.

Limitations on editing compressed video streams

Video compression imposes severe restrictions on where edits can occur in the video stream in direct mode. Most compression works by removing redundant data between adjacent frames, which results in a delta frame that is dependent on the previous frame to be decoded properly. The result is that the previous frame can't be removed without making that delta frame undecodable. Frames which aren't dependent on the previous frame are known as key frames and serve as anchor points in the stream for seeking and editing purposes.

The rule that must be heeded when editing a direct mode stream in VirtualDub is that a portion of video to be removed must end on a key frame.

Key frames are denoted by [K] next to the timestamp below the seek bar in VirtualDub; delta frames are denoted by [ ] instead. Because selections in VirtualDub are endpoint exclusive (the frame you end the selection on is not included in the selection), you want to end the selection on a key frame.

As an example, assume that you have a set of frames like this:

This cut is kosher:

This cut, however, is not, because it leaves a delta frame that is missing its predecessor:

When such a cut is made, VirtualDub automatically adjusts the cut ranges until the restrictions of delta frame compression are satisfied. Thus, the above cut would actually give the following:

The rules for such automatic corrections:

VirtualDub will not let you write a video stream with dangling delta frames.
Frames are always added back in, but never removed, duplicated or reordered.

Thus, if you make a mistake, you can always load in the edited file and re-edit in direct mode, making a larger cut that satisfies the rules.
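
The automatic correction can be modeled with a short sketch. This is a simplified illustration in Python, not VirtualDub's actual code: the function name adjust_cut, the one-character encoding of frame types ("K" for key, "-" for delta), and the choice to move only the end of the cut backward are assumptions made for the example, and drop frames are ignored.

```python
KEY, DELTA = "K", "-"  # "-" stands in for the [ ] marker shown for delta frames


def adjust_cut(frames, start, end):
    """Return the (start, end) range that is actually removed, end exclusive.

    Simplified model: the first frame left after the cut must not be a
    delta frame, so frames are added back at the end of the cut until the
    cut ends on a key frame, ends at the end of the stream, or is empty.
    """
    if not 0 <= start <= end <= len(frames):
        raise ValueError("cut range out of bounds")
    # Shrink the cut from its end; frames are only ever added back in.
    while start < end < len(frames) and frames[end] == DELTA:
        end -= 1
    return start, end


frames = "K--K--K--"             # key frames at positions 0, 3 and 6
print(adjust_cut(frames, 1, 6))  # (1, 6): already ends on a key frame
print(adjust_cut(frames, 1, 5))  # (1, 3): end moved back to the key frame at 3
print(adjust_cut(frames, 4, 5))  # (4, 4): nothing can be removed
```

In this model a cut never grows, which matches the rule that frames are added back in but never removed.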

The same restrictions apply to masking frames as to deleting frames; if an unmasked delta frame exists after a masked frame, the masked frame will be converted to unmasked before the operation begins.

A null frame or drop frame, which is a zero-byte frame that simply duplicates the previous frame, has special handling in VirtualDub's pipeline. These are denoted by [D] next to the timestamp indicator and are occasionally produced during video capture. Such frames are dependent upon the previous frame, but can still be removed without affecting decoding. Note, however, that these frames occupy time in the stream, so deleting them will remove the corresponding audio segment as well.

It is sometimes possible to bypass the restrictions on cut positions by using smart rendering.

Video frame decimation/conversion in direct mode

The frame rate decimation and conversion modes resample a video stream by inserting or removing frames. This essentially involves micro-editing of the stream at the frame level and suffers from similar limitations with compressed streams. Here are the frame rate limitations when using direct mode:

Frame rate adjustment simply tweaks the frame rate of the video stream and can be used without limitation.
Conversion to a higher rate works by inserting zero-byte null frames into the output stream, and can also be used without limitation. (This means you can convert a 30fps stream to 120fps with no loss and with almost no size increase.)
Conversion to a lower rate has to delete frames, but suffers from the limitation on dependent frame removal. If used on a compressed stream, the option is only able to remove frames immediately before a key frame, which means that if sequences of delta frames are longer than a few frames, the video will stutter and audio sync will be affected. Conversion to a lower rate is thus only usable with a stream that has few or no delta frames.
Decimation is equivalent to conversion to a 1/N frame rate and has the same issues.

As with edits, null frames also receive special treatment here, so if a video has been upsampled from 30fps to 120fps by inserting null frames, conversion can be used to discard the null frames and drop the stream back to 30fps.
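
The round trip through null frames can be illustrated with a small sketch. It is a toy model in Python rather than VirtualDub's implementation: frames are represented as hypothetical (type, size in bytes) pairs, and the function names upsample_with_nulls and discard_nulls are made up for the example.

```python
KEY, DELTA, DROP = "K", "-", "D"  # "-" stands in for the [ ] delta marker


def upsample_with_nulls(frames, factor):
    """Raise the frame rate by `factor` by following each source frame
    with factor - 1 zero-byte null frames."""
    out = []
    for kind, size in frames:
        out.append((kind, size))
        out.extend([(DROP, 0)] * (factor - 1))
    return out


def discard_nulls(frames, factor):
    """Lower the frame rate by `factor`, keeping every factor-th frame.

    Refuses to run if any discarded frame is not a null frame, since
    removing a frame that carries picture data is subject to the
    delta frame restrictions.
    """
    discarded = [f for i, f in enumerate(frames) if i % factor != 0]
    if any(kind != DROP for kind, _ in discarded):
        raise ValueError("a discarded frame carries picture data")
    return frames[::factor]


source = [(KEY, 9000), (DELTA, 1200), (DELTA, 1100)]    # e.g. 30fps
up = upsample_with_nulls(source, 4)                     # e.g. 120fps
print(len(up))                                          # 12 frames
print(sum(size for _, size in up))                      # 11300 bytes of frame data, unchanged
print(discard_nulls(up, 4) == source)                   # True
```

The frame data itself does not grow in this model; in a real file, the small size increase comes from the bookkeeping needed for the extra frames.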

Video streams that are direct-mode friendly

A video stream using a format that only uses key frames imposes no limitations on the location of cuts in direct mode. Such formats include:

Any uncompressed RGB or paletted format
Any uncompressed YCbCr format (UYVY, YUY2, YV12, I420, etc.)
Video compression that only uses key frames, such as Huffyuv, Motion JPEG, or DV.

These formats are thus very friendly to direct-mode editing and are good choices for capture or intermediate video files.

Limitations on direct-mode imposed by source format

MPEG-1 video streams cannot be copied in direct mode, because MPEG-1 video compression is incompatible with the AVI file format. Also, MPEG-1 audio streams are always decompressed to raw PCM regardless of the audio mode setting.

DV files that use interleaved storage (type-1 DV AVI) may have their audio streams slightly modified when processing the audio stream in direct mode, because VirtualDub has to resample the audio stream in some cases to force a consistent audio sample rate. This is not a problem for AVIs that have the DV data split into traditional audio and video streams (type-2).

Limitations on editing imposed by audio compression

Audio compression works by processing blocks of audio as individual units. In direct mode, VirtualDub copies these blocks as atomic units, so the length of time corresponding to the block sets the minimum granularity for edits, and thus the accuracy of edits that can be performed.[1]

For some formats that simply translate samples 1:1, such as A-law and μ-law, the block size is one sample and no restrictions are necessary. Other formats, such as ADPCM, can have a block size as large as 2048 samples (about 0.19s at 11kHz). The audio compression dialog indicates the block size for each selectable format.

VirtualDub does not attempt to adjust edits to match audio granularity because the audio block size rarely corresponds to an integral number of video frames in time, which would require fractional edits. The difference between the ideal cut point and the cut point imposed by audio compression appears as sync error and thus editing a compressed audio stream should be avoided if possible.
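
The size of the sync error can be estimated with a little arithmetic. The Python sketch below assumes, purely for illustration, that the audio cut lands on the block boundary nearest to the ideal cut time; the function name cut_sync_error and its parameters are hypothetical, and the number of samples per block has to be supplied by the caller because it depends on the audio format.

```python
from fractions import Fraction


def cut_sync_error(frame_index, fps, sample_rate, samples_per_block):
    """Return (block_index, error_in_seconds) for a cut at a video frame.

    Assumption: the audio cut snaps to the nearest block boundary.
    A positive error means the audio cut falls after the video cut.
    """
    ideal = Fraction(frame_index) / Fraction(fps)        # video cut time
    block_len = Fraction(samples_per_block, sample_rate)  # seconds per block
    block_index = round(ideal / block_len)                # nearest boundary
    actual = block_index * block_len                      # audio cut time
    return block_index, float(actual - ideal)


# Cut at frame 100 of 25fps video, with 2048-sample blocks at 11025Hz.
block, error = cut_sync_error(100, 25, 11025, 2048)
print(block, round(error, 4))                 # 22 0.0867

# With a one-sample block size the cut can be placed exactly.
print(cut_sync_error(100, 25, 44100, 1))      # (176400, 0.0)
```

Under this assumption the error is at most half a block, or about 0.09s for the 2048-sample case.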

Some compressed formats, particularly MPEG audio layer III, have additional decoding restrictions that are not described adequately in the audio format structure, such as dependencies on previous frames, or even specify a block size that is blatantly false (namely, one byte). Because VirtualDub is not able to detect or correct for such limitations, editing streams in such formats can result in audible decoding errors due to block fragments at the cut point and is not recommended.

[1] The size of the block is set, in bytes, by the nBlockAlign field in the WAVEFORMATEX structure that describes the audio format.