ShapFuzz: Efficient Fuzzing via Shapley-Guided Byte Selection
Mutation-based fuzzing is a popular and effective method for bug exposure and discovery of unseen code in programs. However, only a few studies have focused on quantifying the importance of input bytes. The importance of each input byte is determined by its contribution degree in discovering new code. Previous work often focused on obtaining the relationship between input bytes and path constraints, ignoring the fact that not all constraint-related bytes can discover new code. In this paper, we conduct Shapley analysis to understand the effect of byte positions on fuzzing performance, and find that some byte positions contribute more than others and this property often holds across different seeds. Based on this observation, we propose a novel solution, called SHAPFUZZ, to guide byte selection and mutation in fuzzing processes. Specifically, SHAPFUZZ updates Shapley values (importance) of bytes when each input is tested during fuzzing with a low overhead. It utilizes contextual multiarmed bandit algorithm to make a trade off between mutating high Shapley value bytes and low-frequently chosen bytes. We implement a prototype of this solution based on AFL++, i.e., SHAPFUZZ, and evaluate it against ten state-of-the-art fuzzers, including five byte-scheduling fuzzers and five commonly used fuzzers. Compared to byte-scheduling fuzzers, SHAPFUZZ discovers more edges. It also exposes more bugs than the best baseline on three different sets of initial seeds. SHAPFUZZ exposes 20 more bugs than the best commonly used fuzzers, and discovers 6 more CVEs than the baseline on MAGMA. Furthermore, SHAPFUZZ discovers 11 new bugs on the latest versions of 6 widely used programs, and 3 bugs of them are confirmed by vendors.