The paper concludes with a short summary and plans for the future.

The ARIANNA experiment is an array of autonomous radio stations located in Antarctica. Stations have operated at sea level on the Ross Ice Shelf in Moore's Bay, about 110 km from McMurdo Station, the largest research base on the continent. In addition, two stations have operated at the South Pole, which is colder and higher in elevation than Moore's Bay. Several architectures were implemented in the prototype array at Moore's Bay. Most stations consisted of four downward-facing log-periodic dipole antennas (LPDAs) to search specifically for neutrino events, as shown in figure 1. Two other stations at Moore's Bay and two at the South Pole were configured with eight antennas, a mixture of LPDAs and dipoles; these stations were simultaneously sensitive to neutrinos and to cosmic rays that interact in the atmosphere. The radio signals are digitized and captured using a custom-made chip design known as the SST. The analog trigger system of ARIANNA imposes requirements on individual waveforms: a high and a low threshold crossing must occur within 5 ns, and multiple antenna channels must meet the high-low threshold condition within a 30 ns coincidence window.
These criteria are based on the expectation that thermal noise fluctuations are approximately independent, whereas neutrino signals produce correlated high-low fluctuations in a given antenna and comparable signals in multiple antenna channels. These requirements reduce the rate of thermal noise triggers at a given trigger threshold while maintaining sensitivity to Askaryan pulses from high-energy neutrinos. Once a station has triggered, the digitized waveform of every antenna channel contains 256 samples with a voltage accuracy of 12 bits; the event size in an eight-channel station is 132 kbits. The waveform data from all channels are piped into a Xilinx Spartan 4 FPGA and then further processed and stored to an internal 32 GB memory card by an MBED LPC 1768 microcontroller. Each board has up to eight channels, one processing the radio signal from each antenna. Once a triggered event is saved to local storage, it is transferred to UC Irvine through a long-range WiFi link during a specified communication window. The ARIANNA stations also use the Iridium satellite network as a backup system. Satellite communication is relatively slow, with a typical transfer rate of one event every 2–3 minutes. For both communication methods, the current hardware is limited to either communicating or collecting data; therefore, neutrino search operations are disabled during data transfer.
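To make the trigger logic concrete, the following is a minimal numpy sketch of such a high-low coincidence check. The 2 GHz sampling rate (so 5 ns is 10 samples and 30 ns is 60 samples), the function names, and the array layout are illustrative assumptions, not the ARIANNA firmware.

```python
import numpy as np

def channel_high_low(trace, threshold, window_samples=10):
    """Sample indices where a high and a low threshold crossing occur
    within window_samples of each other (5 ns at the assumed 2 GHz)."""
    high = np.flatnonzero(trace > threshold)
    low = np.flatnonzero(trace < -threshold)
    if len(high) == 0 or len(low) == 0:
        return np.array([], dtype=int)
    return np.array([h for h in high if np.min(np.abs(low - h)) <= window_samples])

def station_trigger(traces, threshold, coincidence_samples=60, n_required=2):
    """True if at least n_required channels show a high-low crossing and
    those crossings fall within the coincidence window (30 ns at 2 GHz)."""
    first_hits = []
    for trace in traces:  # traces: one waveform per antenna channel
        hits = channel_high_low(trace, threshold)
        if len(hits) > 0:
            first_hits.append(hits[0])
    if len(first_hits) < n_required:
        return False
    first_hits = np.sort(np.array(first_hits))
    spans = first_hits[n_required - 1:] - first_hits[:len(first_hits) - (n_required - 1)]
    return bool(np.any(spans <= coincidence_samples))
```

For noise with RMS amplitude vrms, `station_trigger(traces, 4.4 * vrms)` would mimic the pilot-station threshold quoted below.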
As radio neutrino technologies move beyond the prototype stage, the relatively expensive and power-hungry AFAR system will be eliminated. It may be replaced by a better wireless system, such as LTE, for sites relatively close to scientific research bases, but for more remote locations only satellite communication such as Iridium is feasible. Given the current limitation of 0.3 events/min imposed by Iridium communication, and the fact that neutrino operations cease during data transfer, which generates unwanted deadtime, stations that rely solely on Iridium communication are expected to operate at trigger rates of ∼0.3 mHz to keep losses due to data transfer below 3%. The trigger thresholds of ARIANNA are adjusted to a certain multiple of the signal-to-noise ratio (SNR), defined here as the ratio of the maximum absolute value of the waveform amplitude to the RMS noise. Currently, the pilot stations are set to trigger above an SNR of 4.4 to stay within the constrained trigger rate of order 1 mHz. In the next section, the expected gain in sensitivity is studied for a lower threshold of 3.6 SNR, which corresponds to 100 Hz, the maximum operation rate of the stations. For more information on the ARIANNA detector, see.

The real-time rejection of thermal noise presented in this article would enable the trigger threshold to be lowered significantly, thus increasing the detection rate of UHE neutrinos, while keeping a low event rate of a few mHz.
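As a rough cross-check of these numbers, the allowed trigger rate follows directly from the deadtime budget; the two-minute transfer time per event below is an assumption taken from the Iridium rate quoted above.

```python
# Deadtime fraction = trigger rate * transfer time per event.
transfer_time = 120.0   # seconds of deadtime per transferred event (assumed)
max_loss = 0.03         # allowed deadtime fraction (3%)
max_rate = max_loss / transfer_time
print(f"maximum trigger rate: {max_rate * 1e3:.2f} mHz")  # 0.25 mHz, i.e. ~0.3 mHz
```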
To estimate the increase in sensitivity, the effective volume of an ARIANNA station is simulated for the two trigger thresholds corresponding to a thermal noise trigger rate of 10 mHz and a four orders-of-magnitude higher trigger rate. We use the known relationship between trigger threshold and trigger rate to calculate the thresholds. NuRadioMC is used to simulate the sensitivity of the ARIANNA detector at Moore's Bay. The expected radio signals are simulated in the ARIANNA detector on the Ross Ice Shelf, i.e., an ice shelf with a thickness of 576 m and an average attenuation length of approximately 500 m, where the ice-water interface at the bottom of the shelf reflects radio signals back up with high efficiency. The generated neutrino interactions are distributed uniformly in the ice around the detector with random incoming directions. The simulation is performed for discrete neutrino energies and includes the full detector response and the trigger algorithm described above. The resulting gain in sensitivity is shown in figure 2: it reaches almost a factor of two at energies of 10¹⁷ eV. The improvement decreases towards higher energies because fewer of the recorded events are close to the trigger threshold, but at 10¹⁸ eV there is still an increase in sensitivity of 40%.

To implement a deep learning filter, the general network structure needs to be optimized for fast and accurate classification. For accuracy, the two metrics are the neutrino signal efficiency and the noise rejection factor, where the noise rejection factor is the ratio of correctly identified noise events to the total number of noise events. The goal is to reject several orders of magnitude of thermal noise fluctuations while retaining most of the neutrino signals. In the following, the target is 5 orders-of-magnitude thermal noise rejection while providing a signal efficiency at or above 95%. Typically, a more complex network structure yields more accurate results, but it also makes the network slower. These two constraints need to be balanced as the deep learning architecture is developed. In the following two sections, deep learning techniques are used to train models, whose efficiency and processing time are then studied. In section 5.1, template matching, a commonly used method, is investigated for comparison with the deep learning approach.

NuRadioMC is used to simulate a representative set of the expected neutrino events for the ARIANNA detector, following the same setup as described in section 2.2 but for randomly distributed neutrino energies that follow the energy spectrum expected for an astrophysical and cosmogenic neutrino flux; the astrophysical flux measurement by IceCube with a spectral index of 2.19 is combined with a model for a GZK neutrino flux based on Auger data for a 10% proton fraction. The resulting radio signals are simulated in the four LPDA antennas of the ARIANNA station by convolving the electric-field pulses with the antenna response; the rest of the signal chain is approximated with an 80 MHz to 800 MHz band-pass filter. An event is recorded if the signal pulse crosses a high and a low threshold of 3.6 times the RMS noise within 5 ns in at least two LPDAs within 30 ns. At such a low trigger threshold, noise fluctuations can fulfill the trigger condition at a non-negligible rate. Therefore, the signal amplitude (before adding noise) is required to be at least 2.8 times the RMS noise, so that recorded events are triggered by the neutrino signal rather than by a thermal noise fluctuation.
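A minimal sketch of this recording condition is given below, reusing the station_trigger function from the earlier sketch. The fourth-order Butterworth filter and the 2 GHz sampling rate are illustrative stand-ins for the true signal-chain response and digitizer settings, not the NuRadioMC implementation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 2e9  # assumed sampling rate in Hz (keeps the 800 MHz edge below Nyquist)
sos = butter(4, [80e6, 800e6], btype="bandpass", fs=fs, output="sos")

def record_event(traces, vrms):
    """Approximate the signal chain with the band-pass filter, then apply
    the 3.6*RMS high-low trigger requiring 2 of 4 LPDAs in coincidence."""
    filtered = sosfiltfilt(sos, traces, axis=-1)
    return station_trigger(filtered, threshold=3.6 * vrms,
                           coincidence_samples=60, n_required=2)
```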
In total, 121,597 events that trigger the detector are generated; this is called the signal data set in the following. The training data set for thermal noise fluctuations is obtained by simulating thermal noise in the four LPDA antennas and saving only those events in which a thermal noise fluctuation fulfills the trigger condition described above (a sketch of this rejection-sampling procedure is given below). In total, 1.1 million such events are generated; this is called the noise data set in the following. The limitations of the simulations and their impact on the obtained results are discussed at the end of this article.

All of the networks are created with Keras, a high-level interface to the machine-learning library TensorFlow. Our primary motivation is to develop a thermal noise rejection method that operates on the existing ARIANNA hardware with an evaluation rate of at least 50 Hz, a factor of 10⁴ above our current trigger rate. To increase the execution rate of the neural network, upgrading the hardware is one option; however, any alteration to the hardware is constrained by two main factors: the power consumption of the components and their reliability in the cold climate.
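As referenced above, building the noise data set amounts to rejection sampling: generate pure-noise traces and keep the rare ones that trigger. The sketch below uses white Gaussian noise as a stand-in for the full NuRadioMC thermal-noise simulation and reuses station_trigger from the earlier sketch; at a 3.6 × RMS threshold only a tiny fraction of traces trigger, so producing the 1.1 million events used in the paper this way is a large computation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_samples, vrms = 4, 256, 1.0

def collect_triggered_noise(n_events, batch_size=100_000):
    """Keep only pure-noise events that fulfill the trigger condition."""
    kept = []
    while len(kept) < n_events:
        batch = rng.normal(0.0, vrms, size=(batch_size, n_channels, n_samples))
        kept.extend(t for t in batch if station_trigger(t, 3.6 * vrms))
    return kept[:n_events]
```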
Given these hardware constraints, this study focuses primarily on optimizing the execution rate by identifying the smallest network that reaches our objective. While the number of trainable parameters can give an indication of network size, the number of Floating Point Operations (FLOPs) is the metric for network size chosen in this paper. The number of FLOPs can be approximated by multiplying the number of floating-point operations performed per loop iteration by the number of nested-loop iterations required to classify the incoming data.

Besides shrinking the network itself, another way to improve the network speed is to reduce the size of the input data. Instead of feeding the signal traces from all four antennas into the network, one option is to use only the two antennas that caused the trigger. As each signal trace consists of 256 samples, the total input size to the network is then 512 samples. In addition, a further reduced input data set is studied by selecting the antenna with the highest signal amplitude and using only a window of samples around the maximum absolute value. The window size was not fully optimized, but a good balance between input data size and efficiency is 100 samples around the maximum value; the dominant neutrino signal does not span the whole record length and typically extends over fewer than 50 samples.

The two network architectures studied in the following are a fully connected neural network (FCNN) and a convolutional neural network (CNN), depicted in figure 3. The FCNN used in this baseline test is a fully connected network with a single hidden layer of 64 nodes for the 100-sample input and 128 nodes for the 512-sample input, a ReLU activation, and a sigmoid activation in the output layer. The CNN consists of 5 filters with 10×1 kernels each, a ReLU activation, a dropout of 0.5, max pooling of size 10×1, a flattening step to reshape the data, and a sigmoid activation in the output layer. Both the CNN and the FCNN are trained using the Adam optimizer with learning rates between 0.0005 and 0.001, depending on which value works best for each individual model. The training data set contains a total of 100,000 signal events and 600,000 noise events, of which 80% are used for training and 20% to validate the model during training. Once the network is trained, the test data set is used, which contains 21,597 signal events and 500,000 noise events. With the sigmoid activation in the output layer, the classification output falls between 0 and 1, where values close to 0 are noise-like and values close to 1 are signal-like. Training the 100-sample-input CNN mentioned above yields the distribution shown in figure 4. From this distribution, the trade-off between signal efficiency and noise rejection can be adjusted by choosing different network output cut values. Training and testing these networks for each input data size yields the signal efficiency vs. noise rejection plot in figure 5. Each data point corresponds to a different network output cut value, and the final cut value is chosen by optimizing the noise rejection for the desired signal efficiency. All of these input data sizes produce signal efficiencies above the required 95%, and all reach at least 5 orders-of-magnitude noise rejection.
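For concreteness, the two baseline architectures described above can be written in Keras roughly as follows, here for the 100-sample input. The layer hyperparameters follow the text; the single-unit dense output layer and the binary cross-entropy loss are assumptions, since the text specifies only the sigmoid output activation.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_input = 100  # samples around the maximum absolute amplitude

# CNN: 5 filters with 10x1 kernels, ReLU, dropout 0.5, max pooling 10x1,
# flatten, sigmoid output.
cnn = keras.Sequential([
    keras.Input(shape=(n_input, 1)),
    layers.Conv1D(filters=5, kernel_size=10, activation="relu"),
    layers.Dropout(0.5),
    layers.MaxPooling1D(pool_size=10),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),  # output layer (assumed dense)
])

# FCNN: a single fully connected hidden layer, 64 nodes for the
# 100-sample input (128 for the 512-sample input), sigmoid output.
fcnn = keras.Sequential([
    keras.Input(shape=(n_input,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

for model in (cnn, fcnn):
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=5e-4),
                  loss="binary_crossentropy",  # assumed loss
                  metrics=["accuracy"])
```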
Since all of the networks reach efficiencies above our target of 95% for signal at 10⁵ noise rejection, the main consideration is the number of FLOPs required by each network, because this directly impacts the processing time. Typically, CNNs have fewer parameters overall due to their convolutional nature, which focuses on smaller features within a waveform; by comparison, the FCNN considers the whole waveform to make its prediction and therefore requires more node connections.
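The gap is easy to see from the sketches above; the parameter counts are exact for those models, while true FLOP counts also depend on how often each weight is reused (convolutional weights are applied across the whole trace, so their FLOPs exceed their parameter count).

```python
print("CNN parameters: ", cnn.count_params())   # ~1e2 for the sketch above
print("FCNN parameters:", fcnn.count_params())  # ~6.5e3 for the sketch above
```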