SoX Quick Reference¶
Basic Synopsis¶
sox [global-options] [infile-options] infile1 [[infile-options] infile2] ... [outfile-options] outfile [effect [effect-options]] ...
Highlights:
- The options (e.g. the global option -h, the input-file option -v, and the input/output-file option -b) always have a leading dash ('-' or '--'), while the effects (e.g. vol) do not.
- The global options can be specified anywhere before the first effect.
- The effects should be specified after the outfile and will be applied to the outfile after the inputs are combined.
Input File Combining¶
SoX supports the following methods to combine multiple input files:
- concatenate:
  - Default method for SoX, or explicitly specified by global option --combine concatenate
  - Input files must have the same sampling rate and the same number of channels.
  - \(N\) input files of duration \(T_1, T_2, \cdots, T_N\) will be concatenated to one output of duration \((T_1+T_2+\cdots+T_N)\)
- mix:
  - Specified by global option --combine mix or -m
  - Input files must have the same sampling rate.
  - The \(k\)-th channel at output is the sum of the \(k\)-th channel of all \(N\) input files
  - If infile option --volume is not specified for individual input files, a normalization factor of \(\frac{1}{N}\) will be applied by default.
- mix-power:
  - Specified by global option --combine mix-power
  - Similar to mix, but the default normalization factor is \(\frac{1}{\sqrt{N}}\)
- merge:
  - Specified by global option --combine merge or -M
  - Input files must have the same sampling rate
  - \(N\) input files of \(K_1, K_2, \cdots, K_N\) channels will be merged to one output of \((K_1 + K_2 + \cdots + K_N)\) channels
- multiply:
  - Specified by global option --combine multiply or -T
  - Input files must have the same sampling rate
  - The \(k\)-th channel at output is the product of the \(k\)-th channel of all \(N\) input files. If the number of channels in the input files is not the same, the missing channels are considered to contain all zeros.
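The arithmetic behind the combining methods can be sketched in a few lines of Python. This is only an illustration of the formulas listed above, not SoX code; the function names `concatenated_duration`, `merged_channels`, and `default_gain` are hypothetical.

```python
import math


def concatenated_duration(durations):
    """Output duration for 'concatenate': T_1 + T_2 + ... + T_N."""
    return sum(durations)


def merged_channels(channel_counts):
    """Output channel count for 'merge': K_1 + K_2 + ... + K_N."""
    return sum(channel_counts)


def default_gain(method, n_inputs):
    """Default per-input scaling when no -v option is given.

    Only the two methods whose default factor is stated above are covered.
    """
    if method == "mix":
        return 1.0 / n_inputs
    if method == "mix-power":
        return 1.0 / math.sqrt(n_inputs)
    raise ValueError("no default factor documented here for: " + method)
```

For example, four inputs are each scaled by 0.25 under mix and by 0.5 under mix-power.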
Scaling Audio File¶
Scaling Input by infile option -v¶
# scale in1 by 2x and scale in2 by 0.5x and then combine them
sox -v 2 in1.wav -v 0.5 in2.wav out.wav
Scaling Output by effect vol¶
# scale out.wav by 2x
sox in.wav out.wav vol 2
# scale out.wav by 3dB
sox in.wav out.wav vol 3dB
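To relate the two forms of the vol argument, a dB value can be converted to a linear factor. The sketch below assumes the dB figure is an amplitude gain (20·log10), which is our understanding of how vol interprets it; the helper name `db_to_amplitude` is hypothetical.

```python
def db_to_amplitude(db):
    """Convert an amplitude gain in dB to a linear scale factor.

    Assumption: gain_dB = 20 * log10(factor), i.e. amplitude rather than power.
    """
    return 10.0 ** (db / 20.0)
```

Under this assumption, 3dB corresponds to a factor of roughly 1.41, and -6dB to roughly 0.5.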
Normalizing Audio with global option --norm¶
# Normalize audio to magnitude +/-1.0
sox --norm in.wav out.wav
# Normalize audio to magnitude +/-0.5, or -6dBFS
sox --norm=-6 in.wav out.wav
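Conceptually, peak normalization finds the largest absolute sample and rescales everything so that this peak lands on the target level. The following is a minimal sketch of that idea on a list of float samples in the range [-1, 1]; it is not how SoX implements --norm internally, and `normalize` is a hypothetical helper.

```python
def normalize(samples, target_dbfs=0.0):
    """Scale samples so the peak magnitude equals the target level in dBFS."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        # Pure silence cannot be normalized; return it unchanged.
        return list(samples)
    target = 10.0 ** (target_dbfs / 20.0)  # 0 dBFS -> 1.0, -6 dBFS -> ~0.5
    gain = target / peak
    return [s * gain for s in samples]
```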
Trimming Audio File¶
To trim an audio file to preserve the original audio between timestamps N1 and N2 (unit: seconds):
# add an equal sign (=) before N2 to denote that N2 is the end
# timestamp rather than the length
sox in.wav out.wav trim N1 =N2
To trim an audio file to discard the audio before timestamp N1 (unit: seconds) and keep L1 seconds of audio after N1:
sox in.wav out.wav trim N1 L1
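The difference between the two trim forms is only in how the second argument is interpreted. A small sketch, using a hypothetical helper `trim_bounds` that returns the kept interval in seconds:

```python
def trim_bounds(start, second, absolute=False):
    """Return (start, end) of the audio kept by 'trim'.

    absolute=True  models 'trim N1 =N2' (second argument is the end timestamp).
    absolute=False models 'trim N1 L1'  (second argument is a length).
    """
    end = second if absolute else start + second
    if end < start:
        raise ValueError("end timestamp precedes start")
    return start, end
```

With N1 = 10, `trim 10 =30` keeps the audio from 10 s to 30 s, while `trim 10 30` keeps the audio from 10 s to 40 s.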
Remixing the channels¶
The remix effect allows the user to remix the audio channels of one file:
sox in.wav out.wav remix 1-3,5 4 0
The above command creates out.wav with 3 channels, where:
- out.wav channel 1 = mix of in.wav channels 1, 2, 3 and 5 (by averaging them)
- out.wav channel 2 = in.wav channel 4
- out.wav channel 3 = silence
To manually control the scaling factor when mixing the channels:
sox in.wav out.wav remix 1v1.0,2v0.5,3v0.5,5v0.2
Now out.wav channel 1 = 1.0(in.wav channel 1) + 0.5(in.wav channel 2) + 0.5(in.wav channel 3) + 0.2(in.wav channel 5)
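The two remix examples can be modelled on a single frame (one sample per input channel). The sketch below handles only the forms shown above (channel lists, ranges, `0` for silence, and `v` volume factors); the real effect accepts more syntax than this. The names `parse_out_spec` and `remix_frame` are hypothetical.

```python
def parse_out_spec(spec):
    """Parse one output-channel spec into a list of (channel, gain_or_None)."""
    if spec == "0":
        return []  # silent output channel
    terms = []
    for part in spec.split(","):
        chans, _, vol = part.partition("v")
        gain = float(vol) if vol else None
        first, _, last = chans.partition("-")
        lo = int(first)
        hi = int(last) if last else lo
        for ch in range(lo, hi + 1):
            terms.append((ch, gain))
    return terms


def remix_frame(frame, specs):
    """Apply remix specs to one frame; channels are 1-indexed as in SoX."""
    out = []
    for spec in specs:
        terms = parse_out_spec(spec)
        manual = any(gain is not None for _, gain in terms)
        total = 0.0
        for ch, gain in terms:
            if manual:
                weight = 1.0 if gain is None else gain
            else:
                weight = 1.0 / len(terms)  # averaging when no factor is given
            total += frame[ch - 1] * weight
        out.append(total)
    return out
```

For a five-channel frame `[1.0, 2.0, 3.0, 4.0, 5.0]`, the specs `["1-3,5", "4", "0"]` give the average of channels 1, 2, 3 and 5, then channel 4 unchanged, then silence.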
Obtaining the length of the file¶
# Return the length in HH:MM:SS format
sox --i -d in.wav
# Return the length in seconds
sox --i -D in.wav
# Return the length in number of samples
sox --i -s in.wav
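The three length formats are related through the sampling rate. A short sketch of the conversions, with hypothetical helper names; the hundredths-of-a-second field in the formatted string is an assumption about the display format:

```python
def samples_to_seconds(n_samples, rate):
    """Length in seconds = number of samples / sampling rate."""
    return n_samples / rate


def format_hms(seconds):
    """Format a length in seconds as HH:MM:SS.hh (hundredths assumed)."""
    total_cs = round(seconds * 100)      # work in whole centiseconds
    hours, rem = divmod(total_cs, 360000)
    minutes, cs = divmod(rem, 6000)
    return "%02d:%02d:%02d.%02d" % (hours, minutes, cs // 100, cs % 100)
```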
Padding silence¶
# Pad 1 second of silence at the beginning and 2 seconds of silence at
# the end of in.wav
sox in.wav out.wav pad 1.0 2.0
# Pad 3 seconds of silence at 4 minutes into in.wav
sox in.wav out.wav pad 3.0@4:00
# Pad 5000 samples of silence at 4 minutes into in.wav
sox in.wav out.wav pad 5000s@4:00
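Wherever the padding is inserted, the output grows by the total padded length. The sketch below computes the resulting sample count for pad lengths given either in seconds (`"3.0"`) or in samples (`"5000s"`); positions such as `@4:00` do not change the total and are left out. The helper names are hypothetical.

```python
def length_to_samples(length, rate):
    """Convert a pad length string to a sample count.

    A trailing 's' means the value is already in samples; otherwise seconds.
    """
    if length.endswith("s"):
        return int(length[:-1])
    return round(float(length) * rate)


def padded_sample_count(n_samples, rate, pad_lengths):
    """Total output samples after adding every pad length to the input."""
    return n_samples + sum(length_to_samples(p, rate) for p in pad_lengths)
```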
Changing sampling rate¶
sox in.wav -r 16000 out.wav
Adding white noise¶
# Generate noise.wav that is of the same format (duration,
# number of channels, etc) as in.wav
sox in.wav noise.wav synth whitenoise vol 0.02
# Generate out.wav by mixing in.wav and noise.wav with their
# original volume
sox -m -v 1.0 in.wav -v 1.0 noise.wav out.wav
Generating a silence or a tone¶
# Generate a 5 second, 16kHz, 2-channel, audio file containing silence.
sox -n -r 16000 -c 2 silence.wav trim 0 5
# Generate a 5 second, 8kHz, audio file containing a sine-wave of 300Hz:
sox -n -r 8000 sine.wav synth 5 sine 300
# Generate a 5 second, 8kHz, audio file containing a sine-wave swept
# from 300 to 3300 Hz:
sox -n -r 8000 sine.wav synth 5 sine 300-3300
Note:
-n is the "null file" option and is treated as a file containing an infinite amount of silence. It is usually used with finite-length effects such as trim or synth.
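For readers who want to see what a fixed-frequency synth sine amounts to, the sketch below builds a comparable mono 16-bit tone with Python's standard library only. It is not SoX's generator; the amplitude of 0.5 and the helper names `sine_samples` and `render_wav_bytes` are assumptions made for illustration.

```python
import io
import math
import struct
import wave


def sine_samples(seconds, rate, freq, amplitude=0.5):
    """Return 16-bit integer samples of a sine tone (amplitude is assumed)."""
    n = int(seconds * rate)
    scale = amplitude * 32767
    return [int(round(scale * math.sin(2 * math.pi * freq * i / rate)))
            for i in range(n)]


def render_wav_bytes(samples, rate):
    """Pack mono 16-bit samples into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)   # 2 bytes = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()
```

A 5 second tone at 8kHz yields 5 × 8000 = 40000 samples, matching the duration and rate used in the commands above.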
Specify WAV format¶
# Generate a 5 second, 8kHz, audio file containing a sine-wave of 300Hz
sox -n -r 8000 sine.wav synth 5 sine 300
# Generate the wave file in 16-bit PCM
sox -n -r 8000 -b 16 -e signed-integer sine.wav synth 5 sine 300
Some other common encodings:
- unsigned-integer: PCM data stored as unsigned integers. Commonly used with an 8-bit encoding size
- floating-point: PCM data stored as IEEE 754 single precision (32-bit) or double precision (64-bit) floating-point ('real') numbers
- a-law: International telephony standard for logarithmic encoding to 8 bits per sample
Convert PCM format to WAV¶
Convert a PCM file of 1 channel, 16-bit, 48kHz sampling rate, signed-integer encoding to WAV
sox -t raw -c 1 -b 16 -r 48000 -e signed in.pcm out.wav
The file type must be specified with -t raw; otherwise sox will fail because .pcm is not a recognized format. See soxformat for the list of supported formats.
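The reason the parameters must be given explicitly is that a raw PCM file has no header: the channel count, sample size, and rate are exactly what a WAV header adds. A minimal sketch of the same wrapping with Python's standard wave module, assuming little-endian signed 16-bit input (the helper `pcm_to_wav_bytes` is hypothetical):

```python
import io
import wave


def pcm_to_wav_bytes(pcm, channels=1, bits=16, rate=48000):
    """Wrap headerless PCM bytes in a WAV container.

    Assumption: the PCM data is little-endian signed integer, as WAV expects
    for 16-bit samples.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(bits // 8)
        w.setframerate(rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

Reading the result back with `wave.open` reports the channel count, sample width, and rate that were supplied, which is the information the raw file could not carry by itself.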