
I am writing a Direction of Arrival (DOA) estimator using the MUSIC algorithm in Python and am having trouble getting the correct spectrogram output.

Specifically, my resulting graph comes out aliased as such:

On the left is the SYNTHESIZED signal, using the same source azimuth and elevation as the real dataset. On the right is the real-world signal.

(image: spectrograms)

These graphs look like this when I am looking for 2 signals. If I am looking for only 1, they look like this:

(image: spectrograms, 1-signal case)

My guess is that this is happening because I am using a Hermitian (conjugate) transpose, which only makes a difference for complex numbers, on signals that are represented with real values only, with no imaginary part. The real-world dataset is presented as an NxM .wav, where N is the number of channels (8) and M is the length of the signals (174800 or so), sampled at 44.1 kHz. The data arrives in int16 format, which I then convert to floating point.
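
Roughly, the loading and conversion look like this (a minimal sketch of the int16-to-float step only; scipy.io.wavfile and the filename are placeholders for illustration, not necessarily my exact code):

    import numpy as np
    from scipy.io import wavfile

    fs, data = wavfile.read("dev_static_speech.wav")         # int16, shape (samples, channels)
    x = data.T.astype(np.float64) / np.iinfo(np.int16).max   # shape (8, ~176k), floats in [-1, 1]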

The eigenvalues I obtain from the covariance matrix look realistic: 2 dominant ones, one twice as high as the other. The synthesized signal has only 1 dominant eigenvalue. As seen in the graphs, if I build the noise subspace from all but the single strongest eigenvector, the spectrogram doesn't really show anything, while keeping M-2 eigenvectors does produce peaks, but they're aliased. I am certain this is aliasing and not reverberation of some sort, because setting an extremely low wavelength variable for the steering matrix makes the real-world signal's spectrogram perfectly symmetrical. If I shuffle microphone positions, I also never get more or fewer than 2 peaks.
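
Concretely, by the eigenvalue "peaks" I mean this kind of check (a minimal sketch; x is the 8 x M channel data and R is the same sample covariance that music() below computes):

    R = (x @ x.conj().T) / x.shape[1]   # 8x8 sample covariance of the channels
    Ev, E = np.linalg.eigh(R)           # eigh returns eigenvalues in ascending order
    print(Ev[::-1] / Ev.max())          # largest first; the real data shows two dominant
                                        # values, the second roughly half the first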

The following are some code snippets to get an idea of how I implemented the algorithm:

The synthesized signal is created using these two functions as the core:

    import numpy as np

    def emit(shape, C):
        # Circularly-symmetric complex Gaussian with power C
        # (np.random.randn takes separate dimension arguments, hence the unpacking)
        x = np.random.randn(*shape)
        x = x + 1j * np.random.randn(*shape)
        x *= np.sqrt(0.5)
        x *= np.sqrt(C)
        return x

    def generate(sources):
        A = array.steering_matrix(sources, wavelength)
        S = source_signal.emit((channel_count, n_snapshots), 1.0)

        Y = A @ S
        N = noise_signal.emit((channel_count, n_snapshots), 1.0)
        Y += N
        return Y

Once Y is returned, I do not tamper with it in any other way.
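
In other words, the model behind generate() is the usual narrowband one,

    Y = A S + N

where A is the channels x sources steering matrix, S holds one source signal per row, and N is noise with the same shape as Y. Y is what gets fed to the MUSIC function below.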

The full music function is as follows:

    def music(A, x, k):
        # Sample covariance of the snapshots (x: channels x snapshots)
        R = (x @ x.conj().T) / x.shape[1]
        # eigh returns eigenvalues in ascending order, so dropping the last k
        # eigenvectors leaves the noise subspace
        Ev, E = np.linalg.eigh(R)
        En = E[:, :-k]
        # MUSIC pseudospectrum: 1 / ||En^H a||^2 for every steering vector (column) in A
        v = En.T.conj() @ A
        return np.reciprocal(np.sum(v * v.conj(), axis=0).real), En

Steering matrix:

    def steering_matrix(sensors, sources, wavelength):
        # sensors: (M, 3) microphone x, y, z positions
        # sources: (K, 2) pairs of (azimuth, elevation) in degrees
        sources = np.deg2rad(sources)

        s = 2 * np.pi / wavelength   # wavenumber
        cos_el = np.cos(sources[:, 1])
        cc = cos_el * np.cos(sources[:, 0])
        cs = cos_el * np.sin(sources[:, 0])

        # Phase of each source direction at each sensor position
        D = s * (np.outer(sensors[:, 0], cc) +
                 np.outer(sensors[:, 1], cs) +
                 np.outer(sensors[:, 2], np.sin(sources[:, 1])))

        A = np.exp(1j * D)

        return A
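
The graphs are produced by scanning the steering matrix over a dense azimuth/elevation grid and evaluating the MUSIC spectrum at every grid point with the two functions above, roughly like this (a sketch; the grid resolution and mic_positions are placeholders, the real microphone coordinates come from the dataset):

    az = np.arange(-180, 180, 2)                  # azimuth grid, degrees
    el = np.arange(-90, 90, 2)                    # elevation grid, degrees
    grid = np.array([[a, e] for e in el for a in az])

    A_scan = steering_matrix(mic_positions, grid, wavelength)   # mic_positions: 8x3
    P, En = music(A_scan, x, k=2)                 # x: the 8 x M real-world data
    P = P.reshape(len(el), len(az))               # 2D map that I plot as the "spectrogram"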

I am nearly certain that this is related to complex numbers. If I synthesize a signal using the steering matrix, I get a signal with values in both the real and the imaginary parts, but my real-world data has no imaginary part. So if the implementation of MUSIC I am using requires some of the information to be carried in the imaginary part, everything would fall apart. I am specifically questioning my use of conj().
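
To illustrate the concern with a toy example: on purely real data conj() is a no-op, so the sample covariance stays purely real, whereas a complex steering vector puts the inter-channel phase into the imaginary part (toy numbers using emit() from above, not my actual data):

    x_real = np.random.randn(8, 1000)             # stand-in for the real-valued .wav channels
    R_real = (x_real @ x_real.conj().T) / 1000    # conj() does nothing here
    print(np.allclose(R_real.imag, 0))            # True: the covariance is purely real

    a = np.exp(1j * np.linspace(0, np.pi, 8))     # toy complex steering vector
    s = emit((1, 1000), 1.0)                      # one synthesized complex source
    x_cplx = a[:, None] @ s
    R_cplx = (x_cplx @ x_cplx.conj().T) / 1000
    print(np.allclose(R_cplx.imag, 0))            # False: the phase ends up in the imaginary part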

Felkin
  • Not sure why you're comparing "complex numbers" and "floats". A float is simply a data type, which consequently, can be used to create complex numbers. Are you sure that your data is not a stream of complex numbers with interleaved real and imaginary parts? – Envidia Dec 08 '20 at 21:38
  • [link](https://imgur.com/Lhy79Yx) they definitely look like floats to me. The original project from which I fetched the dataset also parses them as such. – Felkin Dec 08 '20 at 21:44
  • So what data set did you use where it was working? This data set might be an interleaved set, where two of those numbers make one complex number. – Envidia Dec 08 '20 at 21:49
  • The algorithm works with a synthesized dataset. Where I use the steering matrix for the initial signal generation. That steering matrix is complex, so the output signal is complex as well. I'm not even sure how to really generate an input that would be made of only real numbers while retaining the angular information. – Felkin Dec 08 '20 at 22:11
  • The 'real world' dataset which I have is used in an example application by the authors of said dataset, there I see no complex number use anywhere, but they aren't using MUSIC in the first place in their own design, but MVDR – Felkin Dec 08 '20 at 22:13
  • You should ask whoever (or whatever) is providing this input data to provide you its format. Only then will you know how to parse it correctly. Mind you, the steering matrix and the sampled return data per channel should be complex. Is the data you linked the image of the sampled data per channel? – Envidia Dec 08 '20 at 22:14
  • Yes it is, 8 channels for 8 microphones, 176400 elements total per channel. If I plot the individual signal channels in both the time and frequency domains, they look exactly like I would expect a real sound signal to look. I really feel like the format is correct. – Felkin Dec 08 '20 at 22:17
  • Would you mind explaining what the synthesized data set looks like? – Envidia Dec 08 '20 at 22:24
  • Sure, A = array.steering_matrix(sources, wavelength) S = signal.emit(n_snapshots) Y = A @ S N = signal.emit(n_snapshots) Y += N Where Y is the output. signal.emit is a function generating circularly-symmetric normal distribution array. SNR is 0. – Felkin Dec 08 '20 at 22:33
  • Ah formatting is ruined in the comment section. The essence is that I simply generate a gaussian and multiply with the steering matrix, add noise and return. – Felkin Dec 08 '20 at 22:36
  • Perhaps updating your question to provide a formatted mathematical overview of the synthesized data will help! It's really important that we know what synthesized data you used, because it's already starting to appear that you may be missing some operations needed to yield the correct eigenvalue decomposition of the correlation matrix. Perhaps the synthesized data was already pre-processed. – Envidia Dec 08 '20 at 22:53
  • Ok, will update some extra code and graphs in a moment. – Felkin Dec 08 '20 at 22:54
  • updated the original post – Felkin Dec 08 '20 at 23:09
  • I still don't understand what you mean by casting your synthesized data set to "floats". Do you mean that you take the real part? If so, you should try taking the real part again and produce the signal plot to see if it looks anything like the actual signals coming in. – Envidia Dec 08 '20 at 23:19
  • Yes, I mean just taking the real part. The signal doesn't 'really' look much like the real-world dataset; it just looks like white noise. My concern is that the signal I generated using the steering matrix carries part of the information in the imaginary part, while the real-world one only has the real part. I think that when I try to extract the spectral graph using the eigenvectors and the steering matrix, the missing imaginary part somehow causes the aliasing effect. – Felkin Dec 08 '20 at 23:30
  • I should note that the synthesized dataset produces a single high eigenvalue, while the real-world one has 1 high and a 2nd that is half that. But that would show up in the graph as a peak half the size, no? Currently it looks aliased. – Felkin Dec 08 '20 at 23:32
  • How long is your time signal? 1 second? 2 seconds? – Envidia Dec 08 '20 at 23:55
  • 4 seconds total – Felkin Dec 08 '20 at 23:56
  • To answer your question, yes, the second eigenvalue should appear as a peak. It looks like you are very close. I'll see what I can try, but without having your actual data set as input it will be difficult to reproduce your problem exactly. Also, since your signal is 4 seconds long I'm guessing it's sampled at 44.1 kHz, yeah? – Envidia Dec 08 '20 at 23:59
  • It should actually be pretty easy to reproduce if you want: http://dregon.inria.fr/datasets/signal-processing-cup-2019/ I used dregon's dev_static_speech.zip .wavs. 3 in total, 44.1KHz. The mic array positions are also listed in both the website and the main competition codes, could send them here if you would like to try. It IS possible that there are some weird things with the dataset, because the microphone array is not a perfect cuboid, but a sort of 3D rhombus. I considered that maybe my steering matrix is bad. But I do use all 3 coordinates, not distances. – Felkin Dec 09 '20 at 00:03
  • i am interested in this question, but lack knowledge of some acronyms. is *"DOA"* delay of arrival? what is *"AoA"*? i used to think i knew something about estimating delay between a correlated signal received by two microphones. but i am having trouble just understanding what is going on here and what the essential problems are. – robert bristow-johnson Dec 09 '20 at 01:34
  • 1
    DoA = Direction of Arrival. AoA = Angle of Arrival. – Felkin Dec 09 '20 at 02:05
  • so DoA is the azimuth angle and AoA is the zenith? – robert bristow-johnson Dec 09 '20 at 04:16
  • Practically, they're the same thing and mean the combination of azimuth and elevation for 2D DoA, and just azimuth for 1D DoA. I shouldn't really have used both in my question, but half of the literature uses one and half the other; I got the sense that AoA is used more for the quantifiable angle of arrival, while DoA is used more for the task itself. Regardless, I meant azimuth + elevation in both cases. – Felkin Dec 09 '20 at 12:45

0 Answers