VirtualDub allows audio and video streams to be processed in direct mode. In this mode, data is simply copied from input to output. This has the advantage of much faster rendering and no quality loss, while still allowing a limited amount of editing.
Because of the way that audio and video compression works, there are some limitations on the types and locations of edits that can be done in direct mode. However, because the audio and video pipelines are independent, it is possible to run only one of them in direct mode and avoid the limitations that the other would impose.
Video compression imposes severe restrictions on where edits can occur in the video stream in direct mode. Most video compression works by removing data that is redundant between adjacent frames, producing a delta frame that depends on the previous frame to be decoded properly. As a result, the previous frame can't be removed without making that delta frame undecodable. Frames that don't depend on the previous frame are known as key frames and serve as anchor points in the stream for seeking and editing.
The rule to follow when editing a direct-mode stream in VirtualDub is that a portion of video to be removed must end on a key frame.
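To make the dependency chain concrete, here is a minimal Python sketch that models a stream as a list of frame types. The labels `"K"` (key frame) and `"d"` (delta frame) and the helper names are hypothetical, chosen only for illustration. The sketch finds the key frame a decoder would have to start from to reconstruct a given frame, which is why key frames act as seek anchors.

```python
# Hypothetical model of a video stream: one label per frame.
#   "K" = key frame (decodable on its own)
#   "d" = delta frame (needs the previous frame to decode)

def decode_start(frames: list[str], n: int) -> int:
    """Return the index of the key frame a decoder must start from
    in order to reconstruct frame n."""
    for i in range(n, -1, -1):
        if frames[i] == "K":
            return i
    raise ValueError(f"no key frame precedes frame {n}")

def decode_chain(frames: list[str], n: int) -> list[int]:
    """All frame indices that must be decoded, in order, to display frame n."""
    return list(range(decode_start(frames, n), n + 1))

frames = ["K", "d", "d", "K", "d", "d", "K"]
print(decode_start(frames, 5))  # 3
print(decode_chain(frames, 2))  # [0, 1, 2]
```

Removing any index in a frame's decode chain (other than the frame itself) leaves that frame undecodable, which is the restriction the rule below encodes.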
Key frames are denoted by [K] next to the timestamp below the seek bar in VirtualDub; delta frames are denoted by [ ] instead. Because selections in VirtualDub are endpoint exclusive (the frame you end the selection on is not included in the selection), you want to end the selection on a key frame.
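The rule can be expressed as a small check. This is a hypothetical helper, not VirtualDub code; it uses illustrative `"K"`/`"d"` frame labels and treats a selection as the half-open range `[start, end)`, matching the endpoint-exclusive behavior described above.

```python
def cut_is_valid(frames: list[str], start: int, end: int) -> bool:
    """Check a direct-mode cut of the selection [start, end).

    The selection is endpoint exclusive: frames start..end-1 are removed,
    and frame `end` is the first frame kept after the cut.
    """
    if not 0 <= start <= end <= len(frames):
        raise ValueError("selection out of range")
    if start == end or end == len(frames):
        return True  # nothing removed, or the cut runs to the end of the stream
    # The first surviving frame must not depend on a removed predecessor.
    return frames[end] == "K"

frames = ["K", "d", "d", "K", "d", "d", "K"]
print(cut_is_valid(frames, 1, 3))  # True: frame 3 is a key frame
print(cut_is_valid(frames, 1, 4))  # False: frame 4 loses frame 3
```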
As an example, assume that you have a set of frames like this:
K | K | K |
This cut is kosher:
K | K | K |
This cut, however, is not, because it leaves a delta frame that is missing its predecessor:
K | K | K |
When such a cut is made, VirtualDub automatically adjusts the cut ranges until the restrictions of delta frame compression are satisfied. Thus, the above cut would actually give the following:
K | K | K |
The rules for such automatic corrections:
Thus, if you make a mistake, you can always load in the edited file and re-edit in direct mode, making a larger cut that satisfies the rules.
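The exact correction rules are not reproduced in this section, so the sketch below simply assumes one plausible policy that satisfies the key-frame rule: leave the start of the selection alone and push its exclusive end forward to the next key frame (or to the end of the stream). Both the policy and the `extend_cut` name are assumptions for illustration, not a description of VirtualDub's internals.

```python
def extend_cut(frames: list[str], start: int, end: int) -> tuple[int, int]:
    """Hypothetical correction policy (an assumption, not VirtualDub's
    actual rules): keep `start` and move the exclusive end of the
    selection forward until the first kept frame is a key frame."""
    if start == end:
        return start, end  # empty selection: nothing to fix
    while end < len(frames) and frames[end] != "K":
        end += 1
    return start, end

frames = ["K", "d", "d", "K", "d", "d", "K"]
print(extend_cut(frames, 1, 4))  # (1, 6)
```

Under this policy the adjusted cut is never smaller than the requested one, which fits the advice to re-edit with a larger cut.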
A null frame or drop frame, which is a zero-byte frame that simply duplicates the previous frame, has special handling in VirtualDub's pipeline. These are denoted by [D] next to the timestamp indicator and are occasionally produced during video capture. Such frames depend on the previous frame, but can still be removed without affecting decoding. Note, however, that these frames occupy time in the stream, so deleting them also removes the corresponding segment of audio.
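Because each null frame still occupies one frame period, the amount of audio that disappears with it is simple arithmetic: frames removed divided by the frame rate. The sketch below assumes a hypothetical list of frame labels in which `"D"` marks a null frame (mirroring the [D] indicator) and converts the removed time into audio samples at a given sample rate.

```python
def null_frame_removal(frames: list[str], fps: float, audio_rate: int):
    """Count null frames ("D") and report how much time, and how many
    audio samples, disappear if they are all deleted."""
    n_null = frames.count("D")
    seconds = n_null / fps                # each null frame lasts 1/fps seconds
    samples = round(seconds * audio_rate)  # audio removed along with them
    return n_null, seconds, samples

frames = ["K", "d", "D", "d", "D", "D", "K"]
print(null_frame_removal(frames, 30.0, 48000))  # (3, 0.1, 4800)
```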
The frame rate decimation and conversion modes resample a video stream by inserting or removing frames. This is essentially micro-editing of the stream at the frame level, so it suffers from the same kind of limitations with compressed streams. Here are the frame rate limitations when using direct mode:
As with edits, null frames also receive special treatment here, so if a video has been upsampled from 30fps to 120fps by inserting null frames, conversion can be used to discard the null frames and drop the stream back to 30fps.
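A minimal sketch of that conversion, assuming the null frames are spread evenly through the stream (as they are when every real frame is followed by the same number of duplicates): discarding them and scaling the frame rate by the fraction of frames kept preserves the total duration. The function name and frame labels are hypothetical.

```python
from fractions import Fraction

def drop_null_frames(frames: list[str], fps):
    """Discard null frames ("D") and return the remaining frames plus the
    frame rate that keeps the stream's total duration unchanged."""
    kept = [f for f in frames if f != "D"]
    new_fps = Fraction(fps) * len(kept) / len(frames)
    return kept, new_fps

# 30 fps upsampled to 120 fps: each real frame is followed by 3 null frames.
frames = ["K", "D", "D", "D", "d", "D", "D", "D"] * 15
kept, new_fps = drop_null_frames(frames, 120)
print(len(kept), new_fps)  # 30 30
```

`Fraction` keeps rates such as NTSC's 30000/1001 exact if you pass `Fraction(30000, 1001)` rather than a float. If the null frames are unevenly spaced, the total duration still matches, but individual frames will drift in time.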
A video stream using a format that only uses key frames imposes no limitations on the location of cuts in direct mode. Such formats include:
These formats are thus very friendly to direct-mode editing and are good choices for capture or intermediate video files.
MPEG-1 video streams cannot be copied in direct mode, because MPEG-1 video compression is incompatible with the AVI file format. Also, MPEG-1 audio streams are always decompressed to raw PCM regardless of the audio mode setting.
DV files that use interleaved storage (type-1 DV AVI) may have their audio streams slightly modified when processing the audio stream in direct mode, because VirtualDub has to resample the audio stream in some cases to force a consistent audio sample rate. This is not a problem for AVIs that have the DV data split into traditional audio and video streams (type-2).
Audio compression works by processing blocks of audio as individual units. In direct mode, VirtualDub copies these blocks as atomic units, so the duration of one block sets the minimum granularity, and thus the accuracy, of edits that can be performed.[1]
For some formats that simply translate samples 1:1, such as A-law and μ-law, the block size is one sample and no restrictions are necessary. Other formats, such as ADPCM, can have a block size as large as 2048 samples (about 0.19 s at 11 kHz). The audio compression dialog indicates the block size for each selectable format.
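The granularity figure quoted above is just the block length in samples divided by the sample rate. A trivial sketch (the function name is hypothetical; "11 kHz" is taken to mean the common 11,025 Hz rate):

```python
def block_duration(samples_per_block: int, sample_rate: int) -> float:
    """Length of one compressed audio block in seconds: the smallest
    step by which a direct-mode audio edit can move."""
    return samples_per_block / sample_rate

print(round(block_duration(2048, 11025), 3))  # 0.186 (large ADPCM block at 11 kHz)
print(block_duration(1, 11025))               # one-sample block, e.g. A-law/mu-law
```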
VirtualDub does not attempt to adjust edits to match audio granularity, because the audio block size rarely corresponds to a whole number of video frames in time, so matching it would require fractional-frame edits. The difference between the ideal cut point and the cut point imposed by audio compression appears as sync error; editing a compressed audio stream should therefore be avoided if possible.
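To get a feel for the size of the resulting sync error, the sketch below takes a cut placed at a video frame boundary and moves it to the nearest audio block boundary. Rounding to the nearest boundary is an assumption made for illustration, and `audio_cut_error` is a hypothetical name; under a different snapping rule the error could approach a full block length.

```python
def audio_cut_error(frame: int, fps: float, samples_per_block: int,
                    sample_rate: int) -> float:
    """Snap a cut at video frame `frame` to the nearest audio block
    boundary (an assumed rule) and return the sync error in seconds,
    positive when the audio cut lands later than the video cut."""
    video_t = frame / fps
    block_t = samples_per_block / sample_rate
    snapped = round(video_t / block_t) * block_t
    return snapped - video_t

err = audio_cut_error(100, 30, 2048, 11025)
print(round(err * 1000, 1))  # 10.3 (milliseconds)
```

At 30 fps with 2048-sample blocks at 11,025 Hz, a cut at frame 100 ends up about 10 ms off, and each further edit can add its own error.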
Some compressed formats, particularly MPEG audio layer III, have additional decoding restrictions that are not described adequately in the audio format structure, such as dependencies on previous frames, or even specify a block size that is blatantly false (namely, one byte). Because VirtualDub is not able to detect or correct for such limitations, editing streams in such formats can result in audible decoding errors due to block fragments at the cut point and is not recommended.
[1] The size of the block is set, in bytes, by the nBlockAlign field in the WAVEFORMATEX structure that describes the audio format.
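The block size can be read directly from the format structure. The sketch below unpacks the 18-byte `WAVEFORMATEX` layout (little-endian: `wFormatTag`, `nChannels`, `nSamplesPerSec`, `nAvgBytesPerSec`, `nBlockAlign`, `wBitsPerSample`, `cbSize`) with Python's `struct` module; the `WaveFormatEx` tuple and `parse_waveformatex` function are illustrative names. For constant-bitrate formats, dividing `nBlockAlign` by `nAvgBytesPerSec` gives the block duration in seconds; as noted for MPEG layer III, this breaks down when `nBlockAlign` is not truthful.

```python
import struct
from typing import NamedTuple

class WaveFormatEx(NamedTuple):
    wFormatTag: int
    nChannels: int
    nSamplesPerSec: int
    nAvgBytesPerSec: int
    nBlockAlign: int
    wBitsPerSample: int
    cbSize: int

# Little-endian, packed: 2+2+4+4+2+2+2 = 18 bytes, no padding.
WAVEFORMATEX_LAYOUT = "<HHIIHHH"

def parse_waveformatex(data: bytes) -> WaveFormatEx:
    """Decode the fixed 18-byte part of a WAVEFORMATEX structure."""
    return WaveFormatEx(*struct.unpack_from(WAVEFORMATEX_LAYOUT, data))

# 16-bit stereo PCM (wFormatTag 1) at 44.1 kHz: one block = one sample frame = 4 bytes.
pcm = struct.pack(WAVEFORMATEX_LAYOUT, 1, 2, 44100, 176400, 4, 16, 0)
fmt = parse_waveformatex(pcm)
print(fmt.nBlockAlign)                        # 4
print(fmt.nBlockAlign / fmt.nAvgBytesPerSec)  # seconds per block (CBR formats only)
```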