I have no idea how "a 4 or 5 stop ND filter array over some of the sensor pixels to increase the highlight exposure range" could be a practical solution to the dynamic range problem.
However, I am willing to learn more about this idea.
What Determines Dynamic Range
A sensor's effective analog dynamic range is directly proportional to its analog signal-to-noise ratio. More exposure (longer shutter times, wider apertures, and higher scene illuminance) increases the signal level, so dynamic range is maximized by maximizing exposure.
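A minimal Python sketch of how exposure drives signal-to-noise ratio, assuming a simple pixel model in which photon noise equals the square root of the mean signal and read noise adds in quadrature. The constants `READ_NOISE_E` and `BASE_SIGNAL_E` and the helper names are hypothetical illustration values, not measurements from any camera.

```python
import math

# Hypothetical values for illustration only (not from any real camera).
READ_NOISE_E = 3.0     # read noise, electrons
BASE_SIGNAL_E = 100.0  # mean signal at the starting exposure, electrons


def snr(signal_e: float, read_noise_e: float = READ_NOISE_E) -> float:
    """SNR with photon noise sqrt(signal) and read noise combined in quadrature."""
    return signal_e / math.sqrt(signal_e + read_noise_e ** 2)


def snr_table(max_extra_stops: int) -> list[tuple[int, float, float]]:
    """Return (extra stops, signal in e-, SNR) for 0..max_extra_stops of added exposure."""
    rows = []
    for stops in range(max_extra_stops + 1):
        signal = BASE_SIGNAL_E * 2 ** stops  # each extra stop doubles the collected signal
        rows.append((stops, signal, snr(signal)))
    return rows


for stops, signal, ratio in snr_table(5):
    print(f"+{stops} EV  signal={signal:6.0f} e-  SNR={ratio:5.1f}  ({math.log2(ratio):.2f} stops)")
```

Under this model, once photon noise dominates, doubling the signal raises SNR by only about the square root of 2, so each extra stop of exposure buys roughly half a stop of SNR.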
Another factor that affects dynamic range is the sensor's photodiode conversion gain. Designs with a low conversion gain increase the number of photoelectrons that can be stored as electrical charge in each photodiode. This maximizes the signal in bright light, which in turn increases dynamic range. But in low light (when dynamic range is not important), a high conversion gain maximizes photodiode sensitivity. Camera designers must therefore choose a compromise conversion gain, maximizing dynamic range at the expense of sensitivity, or vice versa.
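The conversion-gain trade-off can be sketched with a deliberately simplified model (an assumption, not a description of any real sensor): the sense node has a fixed usable voltage swing, so full-well capacity is that swing divided by the conversion gain, while noise added after the sense node shrinks when referred back to electrons at higher gain. Dynamic range is taken as log2(full well / read noise), the usual engineering definition. All constants and the `pixel_stats` helper are hypothetical.

```python
import math

# Hypothetical constants for a simplified pixel model (not from any real sensor).
V_SWING_UV = 1_000_000.0     # usable sense-node voltage swing, microvolts (1 V)
DOWNSTREAM_NOISE_UV = 150.0  # noise added after the sense node (amplifier/ADC), microvolts
SOURCE_NOISE_E = 1.0         # noise already present at the photodiode, electrons


def pixel_stats(conversion_gain_uv_per_e: float) -> tuple[float, float, float]:
    """Return (full-well capacity in e-, input-referred read noise in e-, DR in stops)."""
    # Lower conversion gain: each electron moves the voltage less, so more electrons fit.
    full_well_e = V_SWING_UV / conversion_gain_uv_per_e
    # Downstream voltage noise referred back to electrons shrinks as gain rises.
    downstream_e = DOWNSTREAM_NOISE_UV / conversion_gain_uv_per_e
    read_noise_e = math.hypot(SOURCE_NOISE_E, downstream_e)  # independent sources add in quadrature
    dr_stops = math.log2(full_well_e / read_noise_e)
    return full_well_e, read_noise_e, dr_stops


for gain in (20.0, 80.0):  # low vs high conversion gain, microvolts per electron
    fw, rn, dr = pixel_stats(gain)
    print(f"{gain:4.0f} uV/e-: full well={fw:7.0f} e-, read noise={rn:4.2f} e-, DR={dr:5.2f} stops")
```

With these made-up numbers, the low-gain setting stores four times as many electrons and has slightly more dynamic range, while the high-gain setting has much lower read noise, which is what helps in dim light.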
Noise has two main sources. Electronic noise (read noise) is common to all circuits. The other source is photon noise (also called shot noise or quantum noise).[1] For most contemporary digital cameras, the read noise level is lower than the photon noise level unless the scene illuminance is extremely low (below EV -3). Photon noise levels are camera independent.[2] This means exposure and conversion gain determine the maximum dynamic range.
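To make the two noise sources concrete, here is a small Monte Carlo sketch. It assumes photoelectron counts follow a Poisson distribution (the standard model for photon noise) and that read noise is Gaussian and independent of the signal. Poisson counts are drawn with Knuth's multiplication method using only Python's standard `random` module; the signal level, read noise, and helper names are hypothetical.

```python
import math
import random
import statistics


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Draw one Poisson count using Knuth's multiplication method (fine for modest means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_pixel(mean_signal_e: float, read_noise_e: float, n_frames: int, seed: int = 0) -> list[float]:
    """Simulate repeated readings of one pixel under a constant exposure."""
    rng = random.Random(seed)
    # Photon noise: the photoelectron count itself is random (Poisson).
    # Read noise: zero-mean Gaussian added by the readout electronics.
    return [poisson_sample(mean_signal_e, rng) + rng.gauss(0.0, read_noise_e) for _ in range(n_frames)]


def predicted_noise(mean_signal_e: float, read_noise_e: float) -> float:
    """Photon noise sqrt(S) and read noise combined in quadrature."""
    return math.sqrt(mean_signal_e + read_noise_e ** 2)


MEAN_SIGNAL_E = 100.0  # hypothetical mean photoelectrons per exposure
READ_NOISE_E = 3.0     # hypothetical read noise, electrons

readings = simulate_pixel(MEAN_SIGNAL_E, READ_NOISE_E, n_frames=20_000, seed=1)
print(f"measured noise:  {statistics.pstdev(readings):.2f} e-")
print(f"predicted noise: {predicted_noise(MEAN_SIGNAL_E, READ_NOISE_E):.2f} e-")
```

At 100 e- the photon noise (10 e-) already exceeds the hypothetical 3 e- read noise. The two are equal only when the signal equals the read noise squared (here 9 e-), which is why read noise matters mainly in very dim light.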
1. Photon noise is inherent to converting light energy into photoelectrons. Its root cause lies in the wave-particle duality of quantum mechanics: the creation of photoelectrons is probabilistic, so the number of photoelectrons produced by a constant exposure varies in a way that is beyond human control.
2. Sensor area and pixel size both serve to maximize an image's total signal level. However, pixel size has a limited effect in contemporary designs (details here).
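As a rough illustration of the sensor-area point, assume equal exposure (the same illuminance and shutter time on every square millimetre). The total light an image collects then scales with sensor area, and the difference in stops is log2 of the area ratio. The APS-C dimensions below are typical values that vary by manufacturer, and the helper name is hypothetical.

```python
import math


def extra_stops_of_light(area_a_mm2: float, area_b_mm2: float) -> float:
    """Stops of additional total light sensor A collects over sensor B at equal exposure."""
    return math.log2(area_a_mm2 / area_b_mm2)


FULL_FRAME_MM2 = 36.0 * 24.0  # 36 x 24 mm full-frame sensor
APS_C_MM2 = 23.5 * 15.6       # typical APS-C size; exact dimensions vary by manufacturer

print(f"{extra_stops_of_light(FULL_FRAME_MM2, APS_C_MM2):.2f} stops")
```

Under these assumptions a full-frame sensor gathers a bit more than one stop more total light than APS-C at the same exposure.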