The microphone's output is fed into the audio input of the IP camera. The link between the microphone and the IP camera carries an analog signal. The audio is digitized inside the IP camera, which then transmits it to the NVR as a digital signal.
For this solution, the number of microphones that can be installed depends only on the number of video channels on the NVR and on whether each IP camera has an audio input. For example, an 8-channel NVR with eight IP cameras installed, each with an audio input, can accommodate up to 8 microphones.
Some IP cameras come with a built-in microphone. The logic is the same as above.
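The capacity rule above can be sketched as a short calculation. This is only an illustration: the `Camera` class, its field names, and the `max_microphones` function are hypothetical, and the sketch assumes one microphone per camera (either an external one on the audio input or a built-in one) and one NVR video channel per camera.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Camera:
    """Hypothetical description of an IP camera's audio capability."""
    name: str
    has_audio_input: bool = False  # port for an external analog microphone
    has_builtin_mic: bool = False  # embedded microphone


def max_microphones(nvr_channels: int, cameras: List[Camera]) -> int:
    """Upper bound on microphones, assuming one microphone per camera."""
    # Only cameras that can capture audio count toward the total.
    audio_capable = sum(
        1 for c in cameras if c.has_audio_input or c.has_builtin_mic
    )
    # The NVR cannot take more cameras than it has video channels.
    return min(nvr_channels, audio_capable)


# The 8-channel example: eight cameras, all with an audio input.
cams = [Camera(f"cam{i}", has_audio_input=True) for i in range(8)]
print(max_microphones(8, cams))
```

If only some of the cameras have an audio input or a built-in microphone, the result drops to the number of those cameras, and it can never exceed the NVR's channel count.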