A typical camera sensor has a 2D array of photosites with a Bayer mosaic of R, G, and B filters over the photosite lenses. Thus the term "demosaic" means to obtain the color mix for each destination pixel in the target 2D RGB space by interpolating each pixels best-match probable RGB mix from the surrounding box of nine photosites. What it's doing, essentially, is creating the full color image by creating three 2D arrays, one in R, one in G, and one in B, where each position (x,y) becomes the sum of (xR,yR), (xG, yG), and (xB, yB).
Flatbed and film scanners have a variety of sensor designs, the simplest might be a single, moving strip of three photosite rows, each row being monochromatic R, G, and B. The scanner driver in that sort of design samples the subject linearly and directly assembles a set of R, G, and B 2D arrays, which add together in the same manner as the Bayer mosaic sensor after demosaicing, to assemble the complete color pixel map.
There are other designs, that's just a one simple type for illustrative purposes. But none use the Bayer mosaic design that's used in camera imager sensors to my knowledge.
G