It takes the time it needs. To properly mimic a real cart and have an instant-on game boot, as with a real cart, the data is programmed into flash memories. These memories need to be erased before writing, and erasing that amount of data requires around 1 minute (you can find the manufacturer datasheets if you search for the part numbers of the cart chips). Also, writing to flash chips is not as simple as placing the data on the bus and enabling the Write signal. There is a series of writes that send commands to the flash to perform the write, and the write itself takes some time to complete (flash is slow at writing).
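To illustrate why a flash write is more than one bus write, here is a minimal Python model of a parallel NOR flash. It assumes the common JEDEC/AMD-style command set (unlock writes of 0xAA to 0x555 and 0x55 to 0x2AA, then a command byte); the actual chips on the cart may use different addresses or commands, so check their datasheets. `FakeFlash`, `program_byte` and `chip_erase` are hypothetical names, and timing is not modeled at all.

```python
# Minimal model of a parallel NOR flash using the common JEDEC/AMD-style
# command set. ASSUMPTION: these addresses/commands are the typical values
# for that command set; the real cart chips may differ.
UNLOCK = [(0x555, 0xAA), (0x2AA, 0x55)]  # two-write unlock prefix
CMD_ADDR = 0x555
CMD_PROGRAM = 0xA0
CMD_ERASE_SETUP = 0x80
CMD_CHIP_ERASE = 0x10
ERASED = 0xFF

PROGRAM_PREFIX = UNLOCK + [(CMD_ADDR, CMD_PROGRAM)]
CHIP_ERASE_SEQ = (UNLOCK + [(CMD_ADDR, CMD_ERASE_SETUP)]
                  + UNLOCK + [(CMD_ADDR, CMD_CHIP_ERASE)])


class FakeFlash:
    """Simulated flash: a bare bus write does not change the stored data."""

    def __init__(self, size):
        self.cells = [ERASED] * size
        self.history = []  # recent bus writes, used to recognise command sequences

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, data):
        # Keep only as many writes as the longest command sequence needs.
        self.history = (self.history + [(addr, data)])[-len(CHIP_ERASE_SEQ):]
        h = self.history
        if h[-4:-1] == PROGRAM_PREFIX:
            # Programming can only clear bits (1 -> 0); setting them needs an erase.
            self.cells[addr] &= data
            self.history = []
        elif h == CHIP_ERASE_SEQ:
            # On real chips this is the slow step (tens of seconds for a large chip).
            self.cells = [ERASED] * len(self.cells)
            self.history = []


def program_byte(flash, addr, value):
    """Hypothetical helper: the 4-write program sequence for one byte."""
    for a, d in PROGRAM_PREFIX + [(addr, value)]:
        flash.write(a, d)
    # Real hardware: poll the status bits (e.g. DQ7 data polling) until done.


def chip_erase(flash):
    """Hypothetical helper: the 6-write chip-erase sequence."""
    for a, d in CHIP_ERASE_SEQ:
        flash.write(a, d)


if __name__ == "__main__":
    flash = FakeFlash(0x1000)
    flash.write(0x100, 0x12)         # a plain write is ignored by the flash
    print(hex(flash.read(0x100)))    # 0xff
    program_byte(flash, 0x100, 0x12)
    print(hex(flash.read(0x100)))    # 0x12
```

The model also shows why erasing first is mandatory: programming 0xF0 over a cell that already holds 0x0F leaves 0x00, not 0xF0.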
Then there is the SD card speed. It runs on an embedded microcontroller with a very limited amount of RAM (128 KB for everything: filesystem, game list, control structures...). The SD bus runs at 25 MHz in 4-bit mode; under ideal conditions that is about 12 MB/s at most. USB is not going to improve it: those MCUs support up to USB 2.0 full speed, which is 12 Mbit/s, or around 1 MB/s in ideal conditions.
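The arithmetic behind those figures, as a small Python sketch. The 64 MiB image size is a hypothetical example, not the size of any particular game, and all rates are ideal peaks with no filesystem or protocol overhead beyond the USB framing limit.

```python
# Back-of-the-envelope transfer rates for the SD and USB figures above.

def sd_bytes_per_sec(clock_hz=25_000_000, data_lines=4):
    """SD bus in 4-bit mode: one bit per data line per clock (ideal)."""
    return clock_hz * data_lines / 8


def usb_full_speed_raw_bytes_per_sec(bit_rate=12_000_000):
    """USB 2.0 full speed signals at 12 Mbit/s."""
    return bit_rate / 8


# Full-speed bulk transfers carry at most 19 packets of 64 bytes per 1 ms
# frame, so even the ideal payload rate is only about 1.2 MB/s.
USB_FS_BULK_MAX = 19 * 64 * 1000  # bytes per second


def seconds_to_transfer(size_bytes, bytes_per_sec):
    return size_bytes / bytes_per_sec


if __name__ == "__main__":
    game = 64 * 1024 * 1024  # HYPOTHETICAL 64 MiB image, for illustration only
    for name, rate in [("SD 4-bit @ 25 MHz", sd_bytes_per_sec()),
                       ("USB FS raw", usb_full_speed_raw_bytes_per_sec()),
                       ("USB FS bulk max", USB_FS_BULK_MAX)]:
        print(f"{name}: {rate / 1e6:.2f} MB/s, "
              f"{seconds_to_transfer(game, rate):.1f} s")
```

These numbers only cover moving the data; the flash erase and program time comes on top of them.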
Of course, you could use RAM chips, but their contents are lost at power-off and have to be rewritten on every boot. So the MVS board first needs to boot bootstrap code, which then loads the game into RAM and reboots into the game: quite different from the instant boot of original carts and the NeoSD. Also, using DRAM adds another source of issues, as the Neo Geo was not designed for DRAM, and you'll need to find the right moments to refresh it without interfering with normal execution.
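One way to picture the refresh problem: abstract the cart bus into equal time slots, mark the ones the console uses, and try to fit a refresh into an idle slot before each deadline. This is a hypothetical sketch; `schedule_refresh` and the slot model are illustrative assumptions, and real deadlines would come from the DRAM datasheet's refresh period and the actual Neo Geo bus timing.

```python
# Hypothetical sketch: hide DRAM refresh in bus slots the console isn't using.
# ASSUMPTION: the bus is modeled as equal "slots"; real timing would come from
# the DRAM datasheet (refresh period) and the Neo Geo bus timing (idle windows).

def schedule_refresh(busy, max_gap):
    """Pick idle slots for refresh so a refresh happens at most `max_gap`
    slots after the previous one (or after the start of the trace).

    busy:    list of bools, True where the console is accessing cart memory.
    max_gap: refresh deadline, in slots since the previous refresh.
    Returns (refresh_slots, ok); ok is False if some window had no idle slot,
    meaning a refresh would have to collide with a console access.
    """
    refreshes = []
    last = -1  # treat the start of the trace as the previous refresh
    while last + max_gap < len(busy):
        deadline = last + max_gap
        # Take the latest idle slot before the deadline; this pushes the next
        # deadline as far out as possible, so it finds a schedule if one exists.
        slot = next((s for s in range(deadline, last, -1) if not busy[s]), None)
        if slot is None:
            return refreshes, False
        refreshes.append(slot)
        last = slot
    return refreshes, True


if __name__ == "__main__":
    trace = [True, False, True, True, False, True, True, True]
    print(schedule_refresh(trace, 3))  # ([1, 4], False): slots 5-7 are all busy
```

A long run of back-to-back console accesses is exactly the case where the refresh cannot be hidden, which is why the timing has to be worked out against the real hardware.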
As for bandwidth, it's hard to give a figure, as there are many buses running in parallel: the graphics bus, the Z80 bus, the audio bus and the 68k bus.
The fastest one is the graphics bus. I think the games use 120 ns ROMs and that works fine (the NeoSD flashes are faster, 100 ns), so that's about 8 MHz, and the bus can read 32 bits at a time, so 32 MB/s. That is the theoretical maximum; the speed at which the graphics hardware actually accesses the ROMs is lower. It can fetch at most 8 bits of sprite data per 12 MHz clock, which gives a maximum of 12 MB/s, but this bus is also shared with the tile data, the zoom data and other internal accesses, so the real figure is much lower.
The 68k bus is 16 bits wide; with the same kind of memories it could reach up to 16 MB/s, but the 68k can't access memory in a single cycle, and it also spends cycles on instruction decoding, so divide that by 3 at least.
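The same peak-bandwidth estimates in code, as a rough sketch. It assumes one full-width read per clock with no wait states, rounds the 120 ns ROMs down to 8 MHz as above, and uses the "divide by 3" rule of thumb for the 68k; the function names are illustrative.

```python
# Rough peak-bandwidth arithmetic for the graphics and 68k buses (ideal figures).

def max_clock_hz(access_ns):
    """Fastest back-to-back read rate a memory with this access time allows."""
    return 1e9 / access_ns


def bus_bytes_per_sec(clock_hz, width_bits):
    """One full-width read per clock, no wait states or other overhead."""
    return clock_hz * width_bits / 8


if __name__ == "__main__":
    # ~8.33 MHz for 120 ns parts, rounded down to 8 MHz below.
    print(f"120 ns ROM: {max_clock_hz(120) / 1e6:.2f} MHz")
    estimates = [
        ("graphics ROM bus, 32-bit", bus_bytes_per_sec(8e6, 32)),
        ("sprite fetch, 8 bits per 12 MHz clock", bus_bytes_per_sec(12e6, 8)),
        ("68k bus, 16-bit", bus_bytes_per_sec(8e6, 16)),
        # Multi-cycle bus accesses plus decoding: "divide by 3 at least".
        ("68k effective (/3)", bus_bytes_per_sec(8e6, 16) / 3),
    ]
    for name, rate in estimates:
        print(f"{name}: {rate / 1e6:.1f} MB/s")
```

These are upper bounds per bus; shared accesses (tiles, zoom data, internal fetches) reduce the graphics figure further.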
As I said, it's difficult to give a single number.