Safety compliance checking of construction behaviors using visual question answering
摘要
Unsafe construction behavior, one of the leading factors of accidents and casualties, can be reduced by strengthening construction inspection. However, current methods use either manual inspection or inefficient cross-modal models based on multiple backbone networks. To alleviate the problems, a "rule-question" trans-formation and annotation system is formulated, and the unsafe behavior detection is turned into a visual reasoning task: visual question answering (VQA). The VQA model is developed based on a vision-and-language Transformer, and the unsafe behavior could be identified based on the output answers. A dataset containing 16 safety rules and 2386 related construction images is used to fine-tune and validate the VQA model. The results show that the developed VQA model achieves an average recall of 0.81 at a faster reasoning speed. Finally, an applet for safety report generation is implemented to demonstrate the feasibility and practicability of the safety compliance checking based on VQA.
