Conventional topical crawlers generally analyse either the page content or the link structure alone, without addressing computational complexity or their tendency toward "myopia", so the recall and precision of retrieved pages remain low. This paper introduces a hybrid topic-decision strategy that considers both the text content and the link structure of a page. By introducing a knowledge graph database and an entity database, the strategy reduces computational complexity and improves judgment accuracy. Experiments show that recall and precision are greatly improved. © Springer Nature Switzerland AG 2020.
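
As a rough illustration of how a content signal and a link signal might be combined into one topic decision, the following minimal sketch scores a candidate page. It is not the paper's method: the function names, the set-based stand-in for an entity database, the equal weighting inside `link_score`, and the mixing weight `alpha` are all assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def content_score(text, topic_entities):
    """Fraction of tokens found in the topic entity set.

    `topic_entities` is a hypothetical stand-in for an entity database lookup.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in topic_entities)
    return hits / len(tokens)

def link_score(anchor_text, parent_score, topic_entities):
    """Average of anchor-text relevance and the linking page's relevance.

    The equal weighting is an assumption chosen for illustration.
    """
    anchor_relevance = content_score(anchor_text, topic_entities)
    return 0.5 * anchor_relevance + 0.5 * parent_score

def hybrid_score(page_text, anchor_text, parent_score, topic_entities, alpha=0.6):
    """Linear mix of the content and link signals; `alpha` is an assumed weight."""
    content = content_score(page_text, topic_entities)
    link = link_score(anchor_text, parent_score, topic_entities)
    return alpha * content + (1 - alpha) * link

# Hypothetical topic vocabulary and candidate page.
topic = {"crawler", "web", "index"}
score = hybrid_score("Web crawler basics", "crawler", 0.8, topic)
print(round(score, 3))
```

In this sketch a page whose own text is weakly on topic can still receive a non-zero score through the relevance of the page that links to it, which is one simple way to soften the short-sightedness of a content-only decision.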