Why align the phase of samples?
Besides the obvious size reduction (for example, keeping only 3 velocity layers instead of 16), phase alignment lets solo instruments be more dynamic and expressive. This is because phase alignment allows smooth timbre transitions when performing a sampling technique called velocity crossfading. Yes, another entry in our jargon that we need to explain a bit.
Let's say we have two recordings of the same note, a soft one (pianissimo) and a loud one (fortissimo). We forgot to also record a middle (piano) version of the note and want to simulate it from the softer and louder versions. Intuitively, the middle version should have a timbre that sounds in between the loud note and the soft note. Hence, to simulate a keyboard note velocity that is in between loud and soft, one could take a weighted average of the loud and soft notes to produce a note with an intermediate sound, i.e., a “crossfade” between the available notes with the lower and higher velocities. Depending on the velocity level, the result would sound more like the soft sample or more like the loud sample.
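To make the weighted average concrete, here is a minimal sketch in Python. It assumes a simple linear crossfade and two sample buffers of equal length stored as lists of floats. The function names and the layer velocities (32 for soft, 112 for loud) are made up for illustration, and real sample players may use other curves.

```python
def crossfade_weights(velocity, soft_velocity, loud_velocity):
    """Return (soft_weight, loud_weight) for a linear crossfade.

    Hypothetical helper: the weights always sum to 1, and the velocity is
    clamped to the range covered by the two recorded layers.
    """
    if loud_velocity <= soft_velocity:
        raise ValueError("loud_velocity must be greater than soft_velocity")
    t = (velocity - soft_velocity) / (loud_velocity - soft_velocity)
    t = min(1.0, max(0.0, t))
    return 1.0 - t, t


def crossfade(soft, loud, velocity, soft_velocity=32, loud_velocity=112):
    """Weighted average of two equally long sample buffers.

    The default layer velocities are illustrative assumptions, not values
    taken from any particular instrument or format.
    """
    if len(soft) != len(loud):
        raise ValueError("buffers must have the same length")
    w_soft, w_loud = crossfade_weights(velocity, soft_velocity, loud_velocity)
    return [w_soft * s + w_loud * l for s, l in zip(soft, loud)]


# A velocity halfway between the two layers weights both samples equally.
soft_note = [0.0, 0.2, 0.0, -0.2]
loud_note = [0.0, 1.0, 0.0, -1.0]
middle_note = crossfade(soft_note, loud_note, velocity=72)
```

At velocity 32 the result is just the soft sample, at 112 it is just the loud one, and anything in between is a blend of the two.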
Remember that velocity crossfading is an approximation: an actual recorded note at the desired velocity might have a different timbre than a weighted average of other notes. Crossfading assumes that there is a smooth, predictable transition between the soft and loud notes. This is a reasonable assumption and seems to “generally” agree with what is observed. In return, velocity crossfading avoids “sudden jumps” in timbre when the available recorded velocity layers are not close enough.
Velocity crossfading itself is often a built-in feature of sample formats such as SFZ, where it is performed by the sample player. However, for velocity crossfading to be successful, the samples need to be phase-aligned beforehand.
As one might immediately realize, averaging two samples is not as simple as it sounds (subtle unintentional pun). In many cases, blindly mixing two samples results in something that sounds like two solo instruments played together, as in a duet. This “chorusing” is pretty much expected, and it is the same effect one gets when simulating an ensemble sound from separate solo samples. But other unexpected audio effects might also happen. What one wants to achieve is for the mix to still sound like a solo instrument.
If one looks at the waveform of a note sample, one can see that it swings between positive and negative values. When we add one waveform on top of another, e.g. when we are averaging them, there’s always a chance that the negative parts of one wave will coincide with the positive parts of the other. These opposite signs partly cancel, so the sum is weaker than intended and may also contain other unintended audio effects. To avoid this, the oscillations of one wave should be aligned to the oscillations of the other before adding. Whether a wave is going up or down at a particular point in time is characterized by its phase.
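The cancellation described above is easy to reproduce with pure sine waves. The sketch below is only an illustration: real instrument samples contain many partials, and the 440 Hz frequency, 48 kHz sample rate, and helper names are all assumptions chosen for the demo.

```python
import math


def sine(freq_hz, phase_rad, n_samples, sample_rate=48000):
    """Generate a unit-amplitude sine wave with a given starting phase."""
    step = 2.0 * math.pi * freq_hz / sample_rate
    return [math.sin(step * n + phase_rad) for n in range(n_samples)]


def average(a, b):
    """Equal-weight mix of two buffers, as in a halfway crossfade."""
    return [0.5 * (x + y) for x, y in zip(a, b)]


def peak(wave):
    """Largest absolute sample value in the buffer."""
    return max(abs(x) for x in wave)


n = 480  # 10 ms at 48 kHz, a bit more than four cycles of 440 Hz
reference = sine(440.0, 0.0, n)

# Same phase: the waves reinforce each other and the mix keeps full level.
peak_aligned = peak(average(reference, sine(440.0, 0.0, n)))

# Opposite phase: positive parts meet negative parts and the mix vanishes.
peak_opposite = peak(average(reference, sine(440.0, math.pi, n)))

# Quarter-cycle offset: partial cancellation, the mix is noticeably weaker.
peak_quarter = peak(average(reference, sine(440.0, math.pi / 2.0, n)))

print(peak_aligned, peak_opposite, peak_quarter)
```

The aligned mix keeps a peak close to 1, the opposite-phase mix is practically silent, and the quarter-cycle case lands in between. That is why the waves should be aligned before they are averaged.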