The NES has what is called a PSG, or programmable sound generator, which is built into the CPU chip rather than being a separate part. Sound is produced by loading its registers with data and holding those values for a period of time. There are registers for things like sweep, frequency, enable, sound length, sweep length, attack, decay, etc. Through very time-precise software routines, the sound registers are loaded, set, reset and loaded again, even to produce a simple Do Re Mi, if you will. What I mean to say is that the NES does not contain sound files; it contains something along the lines of:
LD A, %10010101 ; set up a few sound registers
LDH 0xFF14, A
LD A, %11111111
LDH 0xFF13, A
LD A, %10010001
LDH 0xFF10, A
LD A, %10010001 ; first tone
LDH 0xFF11, A
CALL WAIT ; delay routine, about 1 second
LD A, %10100000 ; second tone
LDH 0xFF11, A
CALL WAIT
LD A, %10101000 ; third tone
LDH 0xFF11, A
CALL WAIT
LD A, %00000000
LDH 0xFF10, A ; write zero to shut the sound off
RETI
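Let's say that this sets up a few registers, plays three tones for 1 second each, and then shuts the sound off. That is not something you can simply extract, because there is no recorded audio in there, only instructions that drive the sound hardware in real time.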
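
To put the same idea another way, here is a rough sketch in Python (not NES code) that treats the routine above as what it really is: a list of timed register writes. The step format, the names SONG and play, and the 1.0 second wait are my own assumptions for illustration, and the addresses are the same hypothetical ones used in the listing.

```python
# A "song" as a list of steps, not as recorded audio.
# ("write", address, value) stores a byte in a (hypothetical) sound register.
# ("wait", seconds) stands in for CALL WAIT in the listing above.
SONG = [
    ("write", 0xFF14, 0b10010101),  # set up a few registers
    ("write", 0xFF13, 0b11111111),
    ("write", 0xFF10, 0b10010001),
    ("write", 0xFF11, 0b10010001),  # first tone
    ("wait", 1.0),
    ("write", 0xFF11, 0b10100000),  # second tone
    ("wait", 1.0),
    ("write", 0xFF11, 0b10101000),  # third tone
    ("wait", 1.0),
    ("write", 0xFF10, 0b00000000),  # sound off
]


def play(song):
    """Replay the steps against a simulated register file.

    Returns the final register contents, a timeline of
    (time, address, value) writes, and the total duration in seconds.
    No audio is produced; this only models the bookkeeping.
    """
    registers = {}
    timeline = []
    clock = 0.0
    for step in song:
        kind = step[0]
        if kind == "write":
            _, address, value = step
            registers[address] = value
            timeline.append((clock, address, value))
        elif kind == "wait":
            clock += step[1]
        else:
            raise ValueError(f"unknown step: {kind!r}")
    return registers, timeline, clock


if __name__ == "__main__":
    registers, timeline, total = play(SONG)
    for when, address, value in timeline:
        print(f"t={when:.1f}s  {address:#06x} <- {value:#010b}")
    print(f"total length: {total:.1f}s")
```

Everything you would want to "extract" is in that list of writes and waits. To hear it you need something that behaves like the sound hardware and reacts to those writes at the right moments.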

*All memory addresses above are hypothetical.