Typically the Sony RS-422 protocol. A 9-pin port on a video machine enables the transport and record arming to be controlled by another machine, such as a Digital Audio Workstation. The two machines are connected by a serial cable between their 9-pin ports.
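As a rough sketch of what travels over that serial cable: each command is a short byte frame ending in a checksum. The byte values below come from commonly published third-party notes on the Sony P2 (9-pin) protocol, not from this document, so treat them as illustrative only.

```python
def p2_command(cmd1: int, cmd2: int, data: bytes = b"") -> bytes:
    """Build one Sony 9-pin (P2) command frame.

    First byte: upper nibble is the command group, lower nibble is the
    data byte count. The frame ends with a checksum: the low 8 bits of
    the sum of all preceding bytes.
    """
    first = (cmd1 & 0xF0) | (len(data) & 0x0F)
    body = bytes([first, cmd2]) + data
    return body + bytes([sum(body) & 0xFF])

# PLAY is commonly documented as command pair 20.01:
print(p2_command(0x20, 0x01).hex())  # -> 200121
```

The controlling machine writes such frames to the serial port (typically at 38.4 kbaud) and reads back status responses; error handling is omitted here.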
The mixing of a large project is often simplified by pre-mixing
various components prior to a final mix. For example, dialogue tracks are commonly pre-mixed to a smaller number of tracks while being processed through dynamics and background noise control.
Automatic Dialogue Replacement – a term in common use that describes the re-recording of actors' dialogue after the original film shoot, for either technical or performance reasons. It is done in a studio in sync with picture.
Named after film pioneer Jack Foley, who realised that recording effects like footsteps, body moves and prop handling greatly enhanced the soundtrack. Nowadays Foley is an integral component of a film's sound track, especially in creating a detailed music and effects mix. The Foley team consists of a recordist and at least one Foley artist. Typically, for a feature film, every character has every footstep recorded, including all changes of footwear and surfaces. There would also be a clothing-move track for all major foreground characters. Every prop that is handled, or that can make a sound, is also covered. The Foley team, in liaison with the sound effects editors, also covers many other essential sound effect elements for the film. All these tracks, once recorded, must be edited and fitted in perfect sync to the picture.
The M&E is a music and effects mix without dialogue. This mix
is essential for foreign sales of the film or TV series, as
another language can be dubbed over this existing music and
effects sound track.
After the mix of a project is finished, the sound post-production team (usually the re-recording mixer or their assistant) must deliver the completed sound track in a variety of formats to satisfy the production company and the film's distributors. For a feature film (in Australia) this entails the main English mix, which would be printmastered as a Dolby SR-D sound track to MO disk for the lab to create the sound negative.
The DVD version of the film would require an AC3 file
of the above 5.1 mix delivered as data to the DVD authoring facility.
An M&E mix is required (usually delivered as .wav files or an 8-track digital tape) comprising a 6-channel (left, center, right, left surround, right surround and subwoofer) mix and a Dolby Stereo version on tracks 7 and 8 called an LT RT.
A mix for television, speed changed to match the 25fps television version, is also required. This mix is usually a Dolby Surround LT RT, but will increasingly be delivered as a discrete 5.1 mix in Dolby E format as digital broadcasting takes over.
Other delivery items often include D&M&E versions of the
main mix. These are typically a split of the mix into either mono
or stereo Dialogue, Music and Effects stems. They’re usually
delivered as .wav files or on an 8 track digital tape format.
An airline version is another common delivery requirement. It is a 25fps version with a reduced dynamic range, but most importantly it must be free of expletives so as not to offend airline passengers.
A mix stem is typically a group of channels or tracks that have their speaker positions defined. For example, an atmosphere stem could consist of 5 tracks designated as left, right, left surround, right surround and sub. The same could apply to music stems, with the addition of an extra center track. Dialogue is usually treated differently, as most of it is placed in the center channel, with only special effects located in the other channels. Spot effects are often mixed to three-track stems designated left, center, right, as well as the 6-track stem described above.
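Since stems are just position-labelled track groups, the final print master is conceptually a channel-wise sum of the stems. A minimal sketch, with toy data and an assumed L, C, R, Ls, Rs, Sub channel order (both assumptions for illustration):

```python
def combine_stems(*stems):
    """Sum equal-shaped [channel][sample] stems into one combined mix."""
    n_channels = len(stems[0])
    n_samples = len(stems[0][0])
    return [[sum(stem[ch][s] for stem in stems) for s in range(n_samples)]
            for ch in range(n_channels)]

# Toy 6-channel, 4-sample stems (channel order L, C, R, Ls, Rs, Sub):
dialogue = [[0.0] * 4 for _ in range(6)]
dialogue[1] = [1.0] * 4               # dialogue sits in the center channel
atmos = [[0.1] * 4 for _ in range(6)]  # low-level atmosphere on every channel

mix = combine_stems(dialogue, atmos)
print(mix[1])  # the center channel now carries dialogue plus atmosphere
```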
Stands for left total/right total, and is a Dolby name for the matrix-encoded 2-track Dolby Surround or Dolby Stereo mix. This mix is a 4-channel mix encoded into 2 channels and consists of left, center, right and surround channels. For feature film delivery we create a Dolby Stereo LT RT (with Dolby SR noise reduction), hence the term SR-D, as well as the Dolby Digital mix, so that cinemas that don't have Dolby Digital playback capability can still screen the film. All Dolby films with an SR-D sound track carry both sound tracks on the film for this very reason, and also to enable switching between the two if a problem occurs with the digital sound track.
A Dolby Surround LT/RT is typically for television broadcast and
has no SR noise reduction applied.
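The 4-into-2 matrix fold-down can be sketched as follows. A real Dolby encoder band-limits the surround and applies ±90° phase shifts; this simplified sketch uses plain opposite polarity for the surround, so it only illustrates the level relationships, not the full encoding.

```python
ATT = 10 ** (-3 / 20)  # -3 dB, about 0.707: the usual center/surround attenuation

def lcrs_to_ltrt(left, center, right, surround):
    """Fold one set of L, C, R, S samples down to Lt/Rt (simplified)."""
    lt = left + ATT * center + ATT * surround
    rt = right + ATT * center - ATT * surround
    return lt, rt

# Center-only content lands equally in both channels;
# surround-only content lands with opposite polarity:
print(lcrs_to_ltrt(0.0, 1.0, 0.0, 0.0))
print(lcrs_to_ltrt(0.0, 0.0, 0.0, 1.0))
```

On decode, the sum of Lt and Rt recovers center-weighted content and the difference recovers the surround, which is why the opposite polarity matters.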
This is a term that describes the printing of the final mix of a film. The format most commonly used for printing this master mix is a Magneto Optical disk attached via SCSI to a host computer that forms part of the Dolby DS 10 mastering system. The DS 10 consists of this computer and MO drive interfaced to a coder unit and a studio interface. The DS 10 remains the property of the Dolby corporation and is either on permanent loan or brought in for the occasion. The Dolby consultant for a territory is responsible for delivering and maintaining the DS 10s, and of course for ensuring that any sound track printmastered conforms to the Dolby guidelines. The DS 10 operates like a dubber in that its transport must be bi-phase controlled by a synchroniser locked to timecode.
This is a standard for digital (AES/EBU) audio transmission and connection, in which a pair of digital audio signals is transmitted on shielded twisted-pair cable. The connections at either end are 3-pin XLR connectors. A special type of cable that maintains a constant 110 ohm impedance is recommended.
An AES3 pair can also be distributed via 75 ohm coaxial cable as a sort of unbalanced version of the above. This is common in a broadcast facility, where most digital audio routers are terminated via 75 ohm BNC connectors. It is recommended that a 75/110 ohm transformer be used as an interface between the two.
This is a Dolby standard for multi-channel digital audio sound tracks. AC-3 encoding turns the 5.1 mix of left, center, right, left surround, right surround and sub into a single digital bit stream. It is used both in the creation of a printmaster by the Dolby DS 10 and in the broadcast of Dolby Digital sound tracks for television. The difference between the two is that a Dolby Digital broadcast also contains metadata (data about data).
IMAX is a large format film. Each IMAX film frame is three times
the size of a 70mm frame. The sound track arrangement is quite
different to Dolby Digital. The mix is delivered as a 6 track mix
but the tracks are left, left rear, right rear, right, center and
top. There is no dedicated sub-woofer track; it is derived from all channels in the IMAX cinema. Unlike cinemas designed primarily for Dolby Digital, an IMAX cinema has no surround speakers on the side walls. Its left rear and right rear speakers are located at the rear of the cinema and are capable of reproducing the full audio frequency bandwidth.
Multichannel Audio Digital Interface. These interfaces are usually available as optional boards on digital audio workstations and digital mixing consoles. They enable up to 64 channels of digital audio to be transmitted along a single 75 ohm coaxial cable. It can be an expensive interface, but the low cost of the cabling (compared to 110 ohm AES/EBU cable) makes it a very attractive option.
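As back-of-envelope arithmetic (the exact AES10 framing differs, and the figures here are assumptions rather than taken from this document), 64 channels of 48 kHz audio in 32-bit subframes fit comfortably inside MADI's 125 Mbit/s link rate:

```python
# Rough MADI payload arithmetic (illustrative figures).
channels = 64
sample_rate = 48_000        # Hz
bits_per_subframe = 32      # audio word plus status/user/parity bits

payload_bits_per_second = channels * sample_rate * bits_per_subframe
print(payload_bits_per_second / 1e6)  # -> 98.304 Mbit/s, under the 125 Mbit/s link
```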
SPG/Sync pulse generator. Also known as a black burst or color black generator.
Every audio and video post production house must have a way of
ensuring that all equipment is referenced to a master clock that
defines the beginning and end of each video frame. Since a video frame has two fields, without such a reference machines could not possibly be locked to the same leading frame edge. Black burst
generators are either of the PAL or NTSC standard.
A word clock connection between two machines ensures that the timing of digital audio transmission and reception is identical. In a 48 kHz/24-bit setup it means that the receiving machine is set to receive 48 thousand 24-bit words each second, exactly as delivered. Without a word clock reference the receiving machine would exhibit pops and audible glitches because it is not locked to the incoming digital stream. Word clock provides much more accurate clocking between machines than video sync. However, a mixture of video sync and word clock is the norm in most audio post-production facilities. This is because vision machines like Digital Betacams and SP Betacams only accept video sync, not word clock. A common arrangement would be a Digital Audio Workstation referenced to video sync, a Digital Betacam referenced to the same video sync, and a digital mixing console in the middle referenced to the word clock output of the Digital Audio Workstation.
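The cost of not locking to word clock can be put in numbers. With illustrative figures (a free-running receiver whose clock is off by 50 parts per million, a value assumed for the example, not taken from the text):

```python
sample_rate = 48_000   # Hz
clock_error_ppm = 50   # assumed free-running clock error

# Samples of drift accumulated every minute when the receiver is not
# locked to the sender's word clock:
drift_samples_per_minute = sample_rate * 60 * clock_error_ppm / 1_000_000
print(drift_samples_per_minute)  # -> 144.0
```

Even a few samples of slip forces the receiver to repeat or drop samples, which is the audible pop or glitch described above.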
- We prefer a PAL DV or Apple Motion JPEG A file with a minimum resolution of 720 x 576 pixels, with transparent burnt-in timecode.
- Guide audio should be on track 1, and any temp music and FX separated and placed on track 2.
- If possible we would like a PAL SP Beta as well. No burnt-in timecode thanks.
- For a feature film shot at 24fps, please either telecine a conformed print at 24fps, play out from the edit suite at 24fps to SP Beta, or generate a 24fps vision file.
- We prefer Picture Start for each spool at the corresponding hour, for example Pic Start Spool 1 @ 01:00:00:00, Pic Start Spool 2 @ 02:00:00:00, etc. It is important to get this Pic Start and timecode alignment field accurate, especially when changed spools are telecined.
- Needless to say, all versions of the picture for sound post-production should have an appropriate clock leader (24fps or 25fps) at the head and, for film, an end pip as well.
- Please provide all EDLs in either CMX 3600 or Grass Valley format.
- They can preferably be e-mailed to Audio Loc or delivered on disk.
- The relevant EDLs should be clearly named to avoid confusion when multiple EDLs are present.
- It is most important that all the source rolls be accounted
for and accurately named. Please check.
- There is no need to sort the EDL or to provide a
- Exact specifications for music delivery, of course, depend on the project. However, as a general guideline, the music cues should be edited and fitted in place before delivery to us.
- We prefer OMF files or Bwav files.
- Music for Dolby Surround television programs and Documentaries can simply be delivered as a stereo mix. The mixer can create surround channel information if required. If it’s necessary to keep an option open (a solo instrument or vocal for example) then deliver as OMF or Bwav files. Again, as a general guideline, please keep it simple.
- Music for a Dolby SR/D or IMAX film should be mixed in 5 track stems (left, center, right, left surround and right surround). The mixer can create the subwoofer information if not supplied as a discrete channel. The usual media for delivering these music mixes is OMF or Bwav files on DVD or drive.
- We have 2 HD2 systems and can accept Protools sessions directly.
- When necessary, the best way to file exchange between ProTools and Fairlight
is via OMF.
- In a Protools environment, users have DigiTranslator software, either as a standalone solution or as part of the DV Toolkit, to create and read OMF files.
- We can take either OMF1 or OMF2 files either embedded or linked to source, .aif or .wav.
- Delivery can be via CD, DVD, Firewire/USB drive or FTP
- For OMF Exports from HD systems please set your auto region Fade In/Out length to 0 msecs. This is found under – Setup/Preferences/Operation – auto region Fade In/Out length. This prevents an extra 2 clips being generated for every clip when translating to other digital workstation platforms.
- Because of the frame rate differences between 24fps film and 25fps video, there will be image glitches in a 24fps telecine, as an extra frame is added every second. This is not a problem for sound post-production, as it's better to keep the film speed consistent through to the final delivery of the mix.
- This is however a problem when a video version of the film needs to be made.
- The original 24fps film will have to be sped up to 25fps to get a frame rate match to video. This means that the sound mix for the video version will also have to be speed changed to match. Most sound post-production facilities are equipped for, and familiar with, this speed change requirement. A corresponding pitch drop is usually applied to the speed-increased mix to bring it back to normal.
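The size of that speed change, and the pitch correction it calls for, is simple arithmetic:

```python
import math

# Speeding 24fps film up to 25fps multiplies playback speed by 25/24.
speed_ratio = 25 / 24
percent_faster = (speed_ratio - 1) * 100          # about 4.17% faster

# The accompanying pitch rise in semitones, which is what the
# corrective pitch drop has to undo:
pitch_rise_semitones = 12 * math.log2(speed_ratio)

print(round(percent_faster, 2), round(pitch_rise_semitones, 2))  # -> 4.17 0.71
```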
- One of the greatest areas of confusion we constantly come across is in understanding film and video frame rates vs timecode frame rates. To set things straight, let's say at the outset: they are two very different things.
- The Film standard frame rate is 24fps. The PAL Video frame rate is 25fps. The NTSC Video frame rate is called 30fps but is in reality approximately 29.97fps. These are the rates that frames are displayed each second, and as such, are independent of any timecode rate.
- There are four timecode standards in common use – 24fps for film, 25fps for PAL Video and the two NTSC timecode rates of Drop and Non Drop frame timecode. There is a lot of semantic confusion around the two NTSC standards as they are commonly called a variety of names. Non Drop Timecode is 30fps Timecode. Drop Frame Timecode is 30fps Timecode with timecode numbers (Not Video Frames) dropped to keep timecode in sync with elapsed time (see following explanation in NTSC and PAL).
- 29.97 Timecode in any flavour is not an SMPTE standard though some people refer to 29.97 drop frame time code when they really mean standard Drop Frame Timecode and to 29.97 non drop frame timecode when they really mean standard Non Drop Frame Timecode.
- Camera or film speed is the frame rate speed that the project was shot at.
- When we say film speed at 24fps, we mean that 24 frames of film move past a point in one second, irrespective of what timecode format is running at the time, be it 30 non-drop or 25fps code.
- All our film mixes are delivered at film speed with 25 frame time code.
- The reason our mix is at film speed is that the picture we work to is either an Avid dump or a telecine to SP Beta at 24fps, the Avid pictures having been digitised at 24fps speed. Of course our SP Beta is running at 25fps with 25fps code, but the source was running at 24fps speed relative to it when the dump was made.
- Our mix will always be in sync at film speed because of this.
- If a project is intended to be delivered as a film print for cinema release, then it would have been shot at 24fps (film speed), telecined/datacined at 24fps, edited (hopefully) at 24fps, and played out to SP Beta for us at 24fps – 24fps because that's the speed that cinema projectors play at.
- Please note that this applies only in PAL territories. In NTSC countries like Japan, Korea and the US, the speed of the original film (24fps) is changed on telecine. Please see NTSC and PAL.
- If the project is never intended for cinema release but is shot on film, e.g. a television mini-series, then it would be shot at video speed (25fps), telecined/datacined at 25fps, edited at 25fps, and played out to SP Beta for us at 25fps.
- The timecode format is irrelevant; it is the speed that the film moves at that is important. Having said that, the timecode frame rate in PAL territories like Australia is standardised at 25fps irrespective of film speed.
- Of course, if a project is shot on a video format, e.g. Digital Betacam, it will automatically have a video frame rate of 25fps, because that is the PAL standard.
- Sync sound recorded at the time of shooting is transferred at the appropriate film or video speed. What this means is that, if the picture is played at the speed it was shot, then there is no need to change sound speed to maintain sync. If the speed of the picture is changed, however, then the sound should have the corresponding speed change applied.
- The best and foolproof way for us to work on a film from an NTSC country is to telecine a print at 24fps in a PAL country.
- In PAL territories, like Australia, clock speed of video machines is referenced to the electrical AC frequency of 50Hz. In NTSC territories like the United States the reference of clock speed is to 60 Hz AC.
- In PAL land there is an exact integer number of frames in each second, so that timecode has an exact correspondence with clock time, i.e. 25fps at 50Hz. In NTSC land there is not. Even though everyone refers to NTSC video as running at 30fps, it in fact does not.
- During the transition from monochrome to color television in the USA, it was found that a frame rate of 30fps (2 fields at 60Hz reference) caused interference among the horizontal, sound and color frequencies when both color and black-and-white signals were broadcast simultaneously.
- These problems were resolved by reducing the field rate by a factor of exactly 1000/1001, a reduction of about 0.1%. So NTSC color video always runs at 29.97 frames per second / 59.94Hz.
- This of course created a major problem for the previously used 30fps timecode standard, as it no longer kept pace with a video frame rate running slower than 30fps by 0.1%. One hour of 30fps video contains 108,000 frames. One hour of 29.97 video contains 107,892 frames, 108 frames fewer. Consequently, by the time 30fps timecode reads 01:00:00:00, the real elapsed time is 01:00:03:18 – 108 frames (3.6 seconds) more than the timecode indicates. This lack of correlation with elapsed time was solved by what is now known as Drop Frame Timecode.
- Drop Frame Timecode is 30fps timecode with frame numbers 00 and 01 dropped at the start of minutes 1 to 9 in each 10-minute cycle. By the 10-minute mark, 18 timecode frames have been dropped and timecode is back in sync with elapsed time. Please note that no video frames are dropped, only timecode labels.
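The label-dropping bookkeeping can be captured in a short frame-count-to-timecode conversion (a commonly published algorithm, sketched here rather than broadcast-grade code):

```python
def frames_to_dropframe(frame: int) -> str:
    """Convert a 29.97fps frame count to a drop-frame timecode address.

    Two timecode labels (:00 and :01) are skipped at the start of every
    minute except minutes divisible by ten; no video frames are dropped.
    """
    tens, rem = divmod(frame, 17_982)   # 17,982 frames per 10 real-time minutes
    if rem > 2:
        # 18 labels skipped per full 10-minute block, plus 2 per started
        # drop minute within the current block:
        frame += 18 * tens + 2 * ((rem - 2) // 1_798)
    else:
        frame += 18 * tens
    return (f"{frame // 108_000:02d}:{(frame // 1_800) % 60:02d}:"
            f"{(frame // 30) % 60:02d};{frame % 30:02d}")

print(frames_to_dropframe(107_892))  # one real-time hour -> 01:00:00;00
print(frames_to_dropframe(1_800))    # first dropped labels -> 00:01:00;02
```

One real-time hour (107,892 frames) lands exactly on 01:00:00;00, matching the 108-frame discrepancy discussed in the text.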
- In an NTSC territory, when a film shot at 24fps is to be telecined, it is actually played at 23.976 frames per second, allowing the film frames to divide evenly into 29.97 video frames and eliminating vision jitter. Any sync sound is also slowed down 0.1% at telecine. This speed change is known as a 2-3 pulldown: to make up the extra frames required between 23.976 film frames and 29.97 video frames, 2 video fields represent the first film frame and 3 video fields represent the second, in a continuous sequence – hence 2-3.
- So sound post-production folk in these lands, if they're looking at a telecined image on a video machine, are working to a picture slowed by 0.1%. All their work, while being in sync with the video image, can never be in sync with the film because of this speed change. On final delivery, they have to increase the speed of their soundtrack by exactly 0.1% to bring it back into sync with a print running at 24fps film speed.
- Our original Hi-8 (DA-88) mixes, at film speed, can easily be manipulated to increase or decrease speed by this corresponding 0.1%. Most of these playback machines have a dedicated pull up/down menu.
- To make our mix sync to a telecined NTSC picture, it would be a matter of pulling it down in speed by 0.1% (decreasing the speed by 0.1%). When returned to its normal film speed, our mix would sync to film.
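The "0.1%" used throughout this section is shorthand for the exact factor 1000/1001:

```python
pulldown = 1000 / 1001        # the exact NTSC pull-down factor, about 0.1%

film_rate = 24 * pulldown     # film at telecine: 23.976... fps
video_rate = 30 * pulldown    # NTSC color video: 29.97... fps

# Pulling a pulled-down mix back up restores film speed (to rounding error):
restored = film_rate * 1001 / 1000

print(round(film_rate, 3), round(video_rate, 2), round(restored, 6))
```

Because the factor is an exact ratio rather than a literal 0.1%, applying the inverse ratio on delivery lands the mix exactly back on film speed.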
- For a more in depth examination – look at Editors Guild
- The standard format for HDTV has become a 16:9 aspect ratio
picture (wide screen) with both a 5.1 and Dolby Surround sound
track, selectable by each viewer.
- The change to complete digital broadcasting will take place over the next 7 years, with standard definition transmission occurring simultaneously until then.
- For us, delivering these sound tracks is second nature, as we have been doing so for years in our feature film work. The only real difference is the media that the sound track is delivered on.
- Digital broadcasting in Australia will give the viewer the ability (after buying the necessary boxes, of course) to receive a high quality picture with a Dolby Digital 5.1 channel sound track.
- The Dolby Surround sound track, with its left, center, right and surround channels, will be used for the simultaneous standard definition broadcast until the change to digital transmission is complete. The Dolby Surround sound track is also necessary for international versions of the program, where prospective territories may or may not have digital transmission capability.
- Dolby E is a form of audio coding where up to 8 sound channels can be encoded (Dolby DP571) onto a single AES3 pair and recorded on, for example, 2 channels of a Digital Betacam.
- These 2 channels can be decoded (Dolby DP572) to retrieve the 8 audio channels prior to transmission.
- There are a number of track format options available in the encoding process, but the most common is a 6+2 approach, which has the 6 channels of the 5.1 mix plus a 2-channel LT RT mix.
- Dolby E is designed as a professional production format
meaning that it never reaches the viewer. The Dolby E data
stream, created by us, is delivered to the broadcast facility
where it is decoded prior to being encoded again as an AC-3
(Dolby Digital) signal and then transmitted.
- The beauty of the system is that the sound post facility (us) can encode their surround and 5.1 mixes and record them on 2 Digi Beta tracks. The broadcaster then has a master that has both picture and 8-channel audio. Another benefit of Dolby E is that audio and vision frames are matched exactly, which means the master can be edited free of glitches.
- The Dolby E process does, however, introduce 1 frame of delay during encode and another frame of delay during decode. Because of this, we advance our mixes by 1 frame prior to Dolby E encoding, thus ensuring that the audio is frame-accurately in sync with the picture on the master tape. The responsibility for the further frame of delay on decode lies with the broadcaster.
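That 1-frame advance has a fixed size in samples at the usual broadcast rates (arithmetic only; the rates are the common PAL values, assumed here):

```python
sample_rate = 48_000   # Hz, broadcast audio rate
frame_rate = 25        # fps, PAL video

# Size of the 1-frame advance applied to the mix before Dolby E encoding:
samples_per_frame = sample_rate // frame_rate
print(samples_per_frame)  # -> 1920
```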
- Dolby E also incorporates metadata (data about data) that can be set during the encode phase. This data can control the reproduction of audio at the viewer's location, such as dynamic range control.
- For more information – please look at
Copyright © 2000-2006 Audio Loc