Why align the phase of samples?
Besides the obvious size reduction (for example, keeping only 3 velocity layers instead of 16), phase alignment lets solo instruments be more dynamic and expressive. This is because phase-aligned samples can be blended with a sampling technique called velocity crossfading, which gives smooth timbre transitions between layers. Yes, that is another entry in our jargon that we need to explain a bit.
Velocity crossfading
Let's say we have two recordings of the same note, a soft one (pianissimo) and a loud one (fortissimo). We forgot to also record a middle (piano) version of the note and want to simulate it from the softer and louder versions. Intuitively, the middle version should have a timbre that sounds in between the loud note and the soft note. Hence, to simulate a keyboard note velocity that is in between loud and soft, one could take a weighted average of the loud and soft notes to produce a note that has an intermediate sound, i.e., a “crossfade” between the available notes with the lower and higher velocities. Weighting the average means that more or less of each available sample's volume ends up in the mix. Depending on the velocity level, the result therefore sounds more like the soft sample or more like the loud sample.
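The weighted average described above can be sketched in a few lines of Python. This is only an illustration, not how any particular sample player does it: the function names, the linear weighting curve, and the default layer velocities (20 and 120) are all assumptions made for the example, and real players may use a different curve.

```python
def crossfade_weights(velocity, soft_velocity, loud_velocity):
    """Return (soft_weight, loud_weight) for a linear crossfade.

    Hypothetical helper: the weights always sum to 1.
    """
    if loud_velocity <= soft_velocity:
        raise ValueError("loud_velocity must be greater than soft_velocity")
    # Clamp so velocities outside the recorded range use a single layer.
    v = min(max(velocity, soft_velocity), loud_velocity)
    loud_weight = (v - soft_velocity) / (loud_velocity - soft_velocity)
    return 1.0 - loud_weight, loud_weight


def crossfade(soft, loud, velocity, soft_velocity=20, loud_velocity=120):
    """Mix two equal-length lists of sample values as a weighted average."""
    w_soft, w_loud = crossfade_weights(velocity, soft_velocity, loud_velocity)
    return [w_soft * s + w_loud * l for s, l in zip(soft, loud)]
```

With these assumed layer velocities, a key struck at velocity 70 sits halfway between the layers, so both samples contribute equally. A velocity of 20 or lower plays only the soft sample.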
Remember that velocity crossfading is an approximation: an actual recording of the note at the desired velocity might have a different timbre than a weighted average of other notes. Crossfading assumes that there is a smooth, predictable transition between the soft and loud notes. This is a reasonable assumption and seems to “generally” agree with what is observed. The benefit is that velocity crossfading avoids “sudden jumps” in timbre when the available recorded velocity layers are far apart.

Samples that are not phase-aligned often have waveform regions of opposite sign. Mixing them together therefore results in sound cancellation as well as other undesired effects. The red and white lines show corresponding regions in the two waveforms.
Velocity crossfading itself is often a feature in sample formats such as SFZ and is performed by the sample player for that format. However, in solo instruments, for velocity crossfading to be successful, the samples need to be phase aligned beforehand. Otherwise, the mix of the louder and softer samples will very likely sound like distinct instruments playing side by side (among other possibilities) as will be explained later.
Phase alignment
As one might immediately realize, averaging two samples is not as simple as it sounds (subtle unintentional pun). In many cases, blindly mixing two samples results in something that sounds like two solo instruments played together, as in a duet. This “chorusing” is to be expected and is the same principle that allows an ensemble sound to be simulated from separate solo samples. Other unexpected audio effects might also happen. What one wants is for the mix to still sound like a solo instrument.
If one looks at the waveform of a note sample, it can be seen that it goes through positive and negative values. When we add one waveform on top of another, e.g. when we are averaging them, there's always a chance that the negative parts of one wave will coincide with the positive parts of the other. These opposite signs sum to a sound that is weaker than intended and may also cause other unintended audio effects. To avoid this, the oscillations of one wave should be aligned to the oscillations of the other before adding. Whether a wave is going up or down at a particular point in time is characterized by its phase.
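The cancellation described above is easy to reproduce with two synthetic sine waves. The sketch below is a toy illustration under strong assumptions: real instrument samples are not pure sines, and proper phase alignment involves more than the single time shift used here. All function names are hypothetical, and the brute-force lag search is just one simple way to find a shift that lines the oscillations up.

```python
import math


def sine(freq_hz, phase_rad, n_samples, sample_rate=48000):
    """Generate a sine wave as a list of floats."""
    step = 2.0 * math.pi * freq_hz / sample_rate
    return [math.sin(step * i + phase_rad) for i in range(n_samples)]


def mix(a, b):
    """Average two waves sample by sample (stops at the shorter one)."""
    return [0.5 * (x + y) for x, y in zip(a, b)]


def peak(wave):
    """Largest absolute sample value."""
    return max(abs(x) for x in wave)


def best_lag(reference, other, max_lag):
    """Number of leading samples to trim from `other` so that it
    correlates best with `reference` (brute-force search)."""
    window = len(reference) - max_lag

    def score(lag):
        return sum(reference[i] * other[i + lag] for i in range(window))

    return max(range(max_lag + 1), key=score)


# 480 Hz at 48 kHz gives a period of exactly 100 samples.
reference = sine(480, 0.0, 1000)
inverted = sine(480, math.pi, 1000)  # same tone, opposite phase

# Opposite signs everywhere: the mix is almost silent.
cancelled = mix(reference, inverted)

# Shift the second wave so its oscillations line up, then mix again.
lag = best_lag(reference, inverted, max_lag=100)
aligned = mix(reference, inverted[lag:])

print(peak(cancelled), lag, peak(aligned))
```

Here the misaligned mix has a peak close to zero, while the mix after shifting by half a period (50 samples) keeps essentially the full amplitude of the original waves.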

In a phase-aligned pair of samples, the positive and negative regions of the respective waveforms line up with each other. Instead of hearing two distinct instruments, a clean “solo” sound is produced when the phase-aligned samples are mixed. The red and white lines show corresponding regions in the two waveforms.