I used the default size, which I think is 4096 bytes. It works perfectly.
Yes, the cluster size should matter, but I haven't done any research on which one is best. Internally, NeoSD reads the SD data in 4KB blocks, so in theory that should be the most efficient cluster size, but the one I normally use is 32KB and it's very fast.
What matters most is that the files are not fragmented and that they are contiguous in directory order, so the MCU doesn't have to read the FAT often, as long as the cluster indices fall within the same sector of the FAT.
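To put a rough number on the FAT point, here is a small Python sketch. It is only a model, not NeoSD's firmware: it assumes FAT32 (4-byte FAT entries), 512-byte sectors, and an MCU that keeps a single FAT sector cached at a time. The function name and parameters are made up for illustration.

```python
def fat_sector_loads(cluster_chain, entry_size=4, sector_size=512):
    """Count how many times a new FAT sector must be read while following
    a cluster chain, assuming only one FAT sector is cached at a time.

    entry_size=4 and sector_size=512 assume FAT32 on a typical SD card.
    """
    entries_per_sector = sector_size // entry_size  # 128 with the defaults
    loads = 0
    cached_sector = None
    for cluster in cluster_chain:
        sector = cluster // entries_per_sector
        if sector != cached_sector:
            # The entry for this cluster lives in a different FAT sector.
            loads += 1
            cached_sector = sector
    return loads


# A contiguous file of 256 clusters starting at cluster 2.
contiguous = list(range(2, 258))
# A fragmented file that alternates between two distant regions.
fragmented = [2, 1000, 3, 1001]

print(fat_sector_loads(contiguous))  # 3
print(fat_sector_loads(fragmented))  # 4
```

Under these assumptions, 256 contiguous clusters cost 3 FAT sector reads, while just 4 badly scattered clusters already cost 4.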
A small story: the SD card I use usually showed the "READING SD CARD" message for about 4 seconds while scanning my neo files. While doing some tests, I moved all my neo files to a directory and, when that finished, moved them back to the root, and now the message is only shown for less than 1 second. So if you want a faster NeoSD menu entry time, try moving all the .neo files to a directory and then back to the root. This probably puts the directory table in the same order the roms are stored on the SD, which minimizes the number of times the MCU has to read a new sector because the next directory entry is not right at the end of the previous one. I also think moving them sorted by name improves this time.
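If you'd rather script the move-out-and-back trick than do it by hand, something like the Python below should work on a PC with the card mounted. It is a sketch under assumptions: the function name and the temporary directory name `_neo_tmp` are made up, and whether the OS actually writes the directory entries back in this order depends on the filesystem driver, so I can't promise the same speedup.

```python
from pathlib import Path


def rewrite_directory_order(root, pattern="*.neo", temp_name="_neo_tmp"):
    """Move matching files into a temporary subdirectory and back again,
    in name order, so their directory entries are recreated in that order.

    temp_name is a hypothetical scratch directory; it must not exist yet.
    Returns the list of file names that were moved.
    """
    root = Path(root)
    temp = root / temp_name
    temp.mkdir()  # raises FileExistsError if the scratch dir already exists

    names = sorted(p.name for p in root.glob(pattern) if p.is_file())

    # Move everything out of the root, sorted by name...
    for name in names:
        (root / name).rename(temp / name)
    # ...and back in the same order.
    for name in names:
        (temp / name).rename(root / name)

    temp.rmdir()
    return names
```

Call it with the card's mount point as `root`. A rename within the same volume only touches directory entries, so the rom data itself is not copied.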
For the actual rom loading time, I don't think the cluster size or fragmentation matter too much, as the actual bottleneck is the flash writing time. The SD reading is double buffered, so while waiting for a block to be written to flash, a new one is requested from the SD card, and the SD reading time is usually less than the flash writing time.
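A simple timing model shows why the SD read speed mostly disappears behind the flash write. This is an idealized Python sketch, not measured data: the block counts and times are arbitrary units I picked for illustration, and it assumes every block takes the same time to read and to write.

```python
def load_time(n_blocks, t_read, t_write, double_buffered=True):
    """Estimate total loading time for n_blocks blocks.

    Without double buffering, each block is read and then written in turn.
    With double buffering, the read of the next block overlaps the write of
    the current one, so each step costs only the slower of the two.
    """
    if n_blocks == 0:
        return 0.0
    if not double_buffered:
        return n_blocks * (t_read + t_write)
    # First read cannot overlap anything, and neither can the last write.
    return t_read + (n_blocks - 1) * max(t_read, t_write) + t_write


# Hypothetical numbers: 100 blocks, write takes 3 units per block.
print(load_time(100, 1.0, 3.0))                         # 301.0
print(load_time(100, 2.0, 3.0))                         # 302.0
print(load_time(100, 1.0, 3.0, double_buffered=False))  # 400.0
```

In this model, doubling the SD read time from 1 to 2 units changes the total by only 1 unit out of about 300, as long as reading stays faster than writing.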