Deep Learning Methods for Video Smoke Detection
Video Smoke Detection is a promising solution for detecting fires in buildings with high ceilings (e.g., factories, warehouses, train stations, and tunnels) or in outdoor areas (e.g., landing strips, harbors, and pedestrian areas). It can detect smoke very quickly and thereby limit harm to people and property. These benefits are the reason why an increasing number of research groups and companies aim to develop reliable algorithms for Video Smoke Detection. In classical approaches, physical or visual characteristics of smoke are identified and extracted by conventional Computer Vision algorithms to distinguish smoke from non-smoke events. These approaches impose substantial restrictions on the field of application to ensure that smoke behaves as expected. Furthermore, Video Smoke Detection suffers from high false alarm rates, so that fully automatic smoke detection is not possible and alarm candidates have to be verified by human operators. Owing to the success of artificial intelligence in object detection, research in Video Smoke Detection is increasingly shifting towards Deep Learning methods. In this thesis, it is shown that Deep Learning methods outperform classical Computer Vision algorithms by far and can enable fully automatic Video Smoke Detection systems. Several state-of-the-art Deep Learning methods are investigated with respect to detection performance and computational complexity. This analysis includes single-frame approaches based on convolutional neural networks and temporal approaches utilizing 3D convolutions or recurrent networks. It turns out that temporal information is crucial for Deep Learning methods in Video Smoke Detection. Temporal input, such as the difference or optical flow of two consecutive images, also improves the results. Among all investigated methods, the i3D, a network using 3D convolutions, combined with difference images performs best. It detects smoke very quickly, in many situations even faster than a human observer. Furthermore, a custom approach reduces the computational complexity to 1% while maintaining 92% of the i3D's performance. This is valuable when hardware restrictions on the target platform have to be met.
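As an illustration of the difference-image temporal input mentioned above, the following minimal Python sketch derives difference images from consecutive video frames using OpenCV and NumPy. The file name, frame size, and clip length are illustrative assumptions, not details taken from the thesis, and the sketch is not the thesis' implementation.

# Minimal sketch: difference images of consecutive frames as temporal input.
# Assumes OpenCV and NumPy; "video.mp4", the frame size, and the clip length
# of 16 frames are placeholders, not values from the thesis.
import cv2
import numpy as np

def difference_frames(path, size=(224, 224)):
    """Yield grayscale difference images of consecutive frames, scaled to [0, 1]."""
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    if not ok:
        return
    prev = cv2.cvtColor(cv2.resize(prev, size), cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(cv2.resize(frame, size), cv2.COLOR_BGR2GRAY)
        # The absolute difference highlights moving structures such as drifting smoke.
        yield cv2.absdiff(gray, prev).astype(np.float32) / 255.0
        prev = gray
    cap.release()

# Usage: stack a short clip of difference images into a (T, H, W) array that a
# 3D-convolutional network such as i3D could consume as one input stream.
clip = np.stack(list(difference_frames("video.mp4"))[:16])

A clip tensor assembled this way can be fed to a temporal network alongside, or instead of, the raw RGB frames, which is one plausible reading of how difference images serve as an additional input modality.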