Sox Quick Reference¶
Basic Synopsis¶
sox [global-options] [infile-options] infile1 [[infile-options] infile2] ... [outfile-options] outfile [effect [effect-options]] ...
Highlights:
- Options (e.g. the global option -h, the input-file option -v, and the input/output-file option -b) always have a leading dash ('-' or '--'), while effects (e.g. vol) do not.
- Global options can be specified anywhere before the first effect.
- Effects are specified after the outfile and are applied after the inputs are combined, so they shape what is written to the outfile.
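As a rough illustration of the ordering rules above, the sketch below assembles a command from its parts and prints it. The file names and the particular option values are assumptions chosen for the example, not part of the original reference.

```shell
# Hypothetical file names and option values, for illustration only.
global_opts="-V1"    # global option: only report errors
in1_opts="-v 0.5"    # infile option: scale in1.wav by 0.5
out_opts="-b 16"     # outfile option: write 16-bit samples
effects="vol 0.8"    # effect plus its argument; note there is no leading dash

# Order: global options, [infile options] infile ..., [outfile options] outfile, effects
cmd="sox $global_opts $in1_opts in1.wav in2.wav $out_opts out.wav $effects"
echo "$cmd"
```

Printing the command rather than running it keeps the sketch independent of any particular input files.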
Input File Combining¶
SoX supports the following methods to combine multiple input files:
- concatenate:
  - The default method for SoX; it can also be specified explicitly with the global option --combine concatenate
  - Input files must have the same sampling rate and the same number of channels.
  - \(N\) input files of duration \(T_1, T_2, \cdots, T_N\) will be concatenated to one output of duration \((T_1+T_2+\cdots+T_N)\)
- mix:
  - Specified by global option --combine mix or -m
  - Input files must have the same sampling rate.
  - The \(k\)-th channel at output is the sum of the \(k\)-th channel of all \(N\) input files
  - If the infile option --volume is not specified for individual input files, a normalization factor of \(\frac{1}{N}\) will be applied by default.
- mix-power:
  - Specified by global option --combine mix-power
  - Similar to mix, but the default normalization factor is \(\frac{1}{\sqrt{N}}\)
- merge:
  - Specified by global option --combine merge or -M
  - Input files must have the same sampling rate
  - \(N\) input files of \(K_1, K_2, \cdots, K_N\) channels will be merged to one output of \((K_1 + K_2 + \cdots + K_N)\) channels
- multiply:
  - Specified by global option --combine multiply or -T
  - Input files must have the same sampling rate
  - The \(k\)-th channel at output is the product of the \(k\)-th channel of all \(N\) input files. If the input files do not all have the same number of channels, the missing channels are treated as containing all zeros.
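The default scaling factors and the merged channel count described above can be worked through with a little arithmetic. This is only a sketch: the number of inputs and their channel counts are made-up values.

```shell
n=4   # hypothetical number of input files

# Default per-input scaling when no -v is given for the inputs
mix_factor=$(awk -v n="$n" 'BEGIN { printf "%.4f", 1 / n }')
mix_power_factor=$(awk -v n="$n" 'BEGIN { printf "%.4f", 1 / sqrt(n) }')
echo "mix: $mix_factor, mix-power: $mix_power_factor"

# merge: the output channel count is the sum of the input channel counts
total_channels=0
for k in 1 2 2 1; do   # assumed channel counts K_1..K_4
  total_channels=$((total_channels + k))
done
echo "merge: $total_channels channels"
```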
Scaling Audio File¶
Scaling Input by infile option -v¶
# scale in1 by 2x and scale in2 by 0.5x and then combine them
sox -v 2 in1.wav -v 0.5 in2.wav out.wav
Scaling Output by effect vol¶
# scale out.wav by 2x
sox in.wav out.wav vol 2
# scale out.wav by 3dB
sox in.wav out.wav vol 3dB
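To relate a dB value to a plain multiplier, the usual amplitude conversion is 10^(dB/20). The helper below is a hypothetical convenience function, not something SoX provides; it shows the factor that corresponds to 3 dB, and the same conversion explains why a level of -6 dB is close to a magnitude of 0.5.

```shell
# Convert a gain in dB to a linear amplitude factor: 10^(dB/20)
db_to_amp() {
  awk -v db="$1" 'BEGIN { printf "%.4f\n", 10 ^ (db / 20) }'
}

db_to_amp 3     # prints 1.4125, the factor behind "vol 3dB"
db_to_amp -6    # prints 0.5012, roughly half of full scale
```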
Normalizing Audio with global option --norm¶
# Normalize audio to magnitude +/-1.0
sox --norm in.wav out.wav
# Normalize audio to magnitude +/-0.5, or -6dBFS
sox --norm=-6 in.wav out.wav
Trimming Audio File¶
To trim an audio file so that only the audio between timestamps N1 and N2 (unit: seconds) is kept:
# add an equal sign (=) before N2 to denote that N2 is the end
# timestamp rather than the length
sox in.wav out.wav trim N1 =N2
To trim an audio file to discard the audio before timestamp N1 (unit: seconds) and keep L1 seconds of audio after N1:
sox in.wav out.wav trim N1 L1
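The two forms describe the same cut when L1 = N2 - N1. The sketch below uses assumed values N1=2 and N2=5, and only runs the optional check against a generated 10-second test tone if sox is installed.

```shell
n1=2; n2=5             # assumed timestamps, in seconds
len=$((n2 - n1))       # L1 = N2 - N1
echo "trim $n1 =$n2 keeps the same audio as trim $n1 $len"

# Optional check, run only when sox is available.
if command -v sox >/dev/null 2>&1; then
  tmp=$(mktemp -d)
  sox -n -r 8000 "$tmp/in.wav" synth 10 sine 300
  sox "$tmp/in.wav" "$tmp/a.wav" trim "$n1" "=$n2"   # end-timestamp form
  sox "$tmp/in.wav" "$tmp/b.wav" trim "$n1" "$len"   # length form
  sox --i -D "$tmp/a.wav"   # duration of a.wav in seconds
  sox --i -D "$tmp/b.wav"   # duration of b.wav in seconds
  rm -rf "$tmp"
fi
```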
Remixing the channels¶
The remix effect allows the user to remix the audio channels of one file:
sox in.wav out.wav remix 1-3,5 4 0
The above command creates out.wav with 3 channels, where
- out.wav channel 1 = mix of in.wav channels 1, 2, 3 and 5 (by averaging them)
- out.wav channel 2 = in.wav channel 4
- out.wav channel 3 = silence
To manually control the scaling factor when mixing the channels:
sox in.wav out.wav remix 1v1.0,2v0.5,3v0.5,5v0.2
Now out.wav channel 1 = 1.0 * (in.wav channel 1) + 0.5 * (in.wav channel 2) + 0.5 * (in.wav channel 3) + 0.2 * (in.wav channel 5)
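To make the two mixing rules concrete, the following sketch applies them to one made-up sample per channel. The sample values are assumptions for illustration; only the averaging rule and the weights 1.0, 0.5, 0.5 and 0.2 come from the commands above.

```shell
# Hypothetical sample values of in.wav channels 1..5 at one instant
ch="0.4 0.2 0.6 -0.3 0.5"

# remix 1-3,5 : channels 1, 2, 3 and 5 are averaged
auto_mix=$(echo "$ch" | awk '{ printf "%.3f", ($1 + $2 + $3 + $5) / 4 }')

# remix 1v1.0,2v0.5,3v0.5,5v0.2 : explicit weight per channel
manual_mix=$(echo "$ch" | awk '{ printf "%.3f", 1.0*$1 + 0.5*$2 + 0.5*$3 + 0.2*$5 }')

echo "averaged: $auto_mix, weighted: $manual_mix"
```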
Obtaining the length of the file¶
# Return the length in HH:MM:SS format
sox --i -d in.wav
# Return the length in seconds
sox --i -D in.wav
# Return the length in number of samples
sox --i -s in.wav
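In scripts it is often convenient to capture these values in variables. The sketch below assumes a hypothetical 5-second, 8 kHz test file, computes the sample count one would expect from duration times rate, and queries sox only if it is installed.

```shell
rate=8000; seconds=5                    # assumed format of a hypothetical test file
expected_samples=$((rate * seconds))    # duration in seconds times sampling rate
echo "expected samples: $expected_samples"

# Optional check, run only when sox is available.
if command -v sox >/dev/null 2>&1; then
  tmp=$(mktemp -d)
  sox -n -r "$rate" "$tmp/in.wav" synth "$seconds" sine 300
  dur=$(sox --i -D "$tmp/in.wav")       # length in seconds
  samples=$(sox --i -s "$tmp/in.wav")   # length in samples
  echo "duration: $dur s, samples: $samples"
  rm -rf "$tmp"
fi
```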
Padding silence¶
# Pad 1 second of silence at the beginning and 2 seconds of silence
# at the end of in.wav
sox in.wav out.wav pad 1.0 2.0
# Pad 3 seconds of silence at 4 minutes into in.wav
sox in.wav out.wav pad 3.0@4:00
# Pad 5000 samples of silence at 4 minutes into in.wav
sox in.wav out.wav pad 5000s@4:00
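The effect of padding on the output length follows from simple addition. In the sketch below the duration and sampling rate of in.wav are assumed values, used only to show how a pad given in samples translates to seconds.

```shell
rate=16000     # assumed sampling rate of in.wav, in Hz
in_dur=300     # assumed duration of in.wav, in seconds (5 minutes)

# pad 1.0 2.0 adds 1 s at the start and 2 s at the end
out_dur_a=$(awk -v d="$in_dur" 'BEGIN { printf "%.4f", d + 1.0 + 2.0 }')

# pad 5000s@4:00 adds 5000 samples, i.e. 5000 / rate seconds
pad_seconds=$(awk -v r="$rate" 'BEGIN { printf "%.4f", 5000 / r }')
out_dur_c=$(awk -v d="$in_dur" -v p="$pad_seconds" 'BEGIN { printf "%.4f", d + p }')

echo "pad 1.0 2.0: $out_dur_a s; 5000 samples = $pad_seconds s; padded: $out_dur_c s"
```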
Changing sampling rate¶
sox in.wav -r 16000 out.wav
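Resampling keeps the duration and changes the number of samples in proportion to the rates. The input rate and length below are assumptions for illustration; the optional part generates its own test tone and runs only if sox is installed.

```shell
in_rate=48000; out_rate=16000   # assumed input rate; 16000 matches the command above
in_samples=96000                # assumed length of in.wav (2 seconds at 48 kHz)
out_samples=$((in_samples * out_rate / in_rate))
echo "$out_samples samples at $out_rate Hz"

# Optional check, run only when sox is available.
if command -v sox >/dev/null 2>&1; then
  tmp=$(mktemp -d)
  sox -n -r "$in_rate" "$tmp/in.wav" synth 2 sine 300
  sox "$tmp/in.wav" -r "$out_rate" "$tmp/out.wav"
  sox --i -r "$tmp/out.wav"     # sampling rate of the converted file
  rm -rf "$tmp"
fi
```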
Adding white noise¶
# Generate noise.wav that is of the same format (duration,
# number of channels, etc) as in.wav
sox in.wav noise.wav synth whitenoise vol 0.02
# Generate out.wav by mixing in.wav and noise.wav with their
# original volume
sox -m -v 1.0 in.wav -v 1.0 noise.wav out.wav
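To judge how loud the added noise is, the factor given to vol can be expressed in dB as 20 * log10(factor). The helper name below is hypothetical; it suggests that vol 0.02 places the noise about 34 dB below its unscaled level.

```shell
# Convert a linear amplitude factor to dB: 20 * log10(factor)
amp_to_db() {
  awk -v a="$1" 'BEGIN { printf "%.2f\n", 20 * log(a) / log(10) }'
}

amp_to_db 0.02   # prints -33.98
```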
Generating a silence or a tone¶
# Generate a 5 second, 16kHz, 2-channel, audio file containing silence.
sox -n -r 16000 -c 2 silence.wav trim 0 5
# Generate a 5 second, 8kHz, audio file containing a sine-wave of 300Hz:
sox -n -r 8000 sine.wav synth 5 sine 300
# Generate a 5 second, 8kHz, audio file containing a sine-wave swept
# from 300 to 3300 Hz:
sox -n -r 8000 sine.wav synth 5 sine 300-3300
Note: -n is the "null file" option and is treated as a file containing an infinite amount of silence. It is usually used with finite-length effects such as trim or synth.
Specify WAV format¶
# Generate a 5 second, 8kHz, audio file containing a sine-wave of 300Hz
sox -n -r 8000 sine.wav synth 5 sine 300
# Generate the wave file in 16-bit PCM
sox -n -r 8000 -b 16 -e signed-integer sine.wav synth 5 sine 300
Some other common encodings:
- unsigned-integer: PCM data stored as unsigned integers. Commonly used with an 8-bit encoding size
- floating-point: PCM data stored as IEEE 754 single precision (32-bit) or double precision (64-bit) floating-point ('real') numbers
- a-law: International telephony standard for logarithmic encoding to 8 bits per sample
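The choice of encoding size directly determines how much audio data is written. As a rough sketch (the function name is hypothetical, and the container header is not counted), the data size of the 5-second, 8 kHz, mono examples above can be estimated as follows:

```shell
# Audio data size in bytes: seconds * rate * channels * bits / 8
pcm_bytes() {
  echo $(( $1 * $2 * $3 * $4 / 8 ))
}

pcm_bytes 5 8000 1 16   # 16-bit signed-integer: 80000
pcm_bytes 5 8000 1 8    # 8-bit unsigned-integer or a-law: 40000
pcm_bytes 5 8000 1 32   # 32-bit floating-point: 160000
```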
Convert PCM format to WAV¶
Convert a PCM file of 1 channel, 16-bit, 48kHz sampling rate, signed-integer encoding to WAV
sox -t raw -c 1 -b 16 -r 48000 -e signed in.pcm out.wav
The file type must be specified with -t raw; otherwise sox will fail because .pcm is not a recognized format. See soxformat for the list of supported formats.
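A raw file carries no header, so the channel count, sample size and rate given on the command line are the only description of its contents. Under those parameters, the expected duration can be estimated from the file size. The helper name and the file size used below are assumptions for illustration.

```shell
# Duration in seconds of headerless PCM: bytes / (rate * channels * bytes per sample)
raw_duration() {
  awk -v bytes="$1" -v rate="$2" -v ch="$3" -v bits="$4" \
    'BEGIN { printf "%.3f\n", bytes / (rate * ch * bits / 8) }'
}

# A hypothetical 960000-byte in.pcm: 1 channel, 16-bit, 48 kHz
raw_duration 960000 48000 1 16   # prints 10.000
```

If the converted out.wav reports a very different length, one of the parameters passed to sox probably does not match how the raw data was recorded.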