Faster and more efficient phishing detection in M92

July 20th, 2021


Keeping Chrome users safe as they browse the web is crucially important to Chrome; in fact, security has always been one of our four core principles. In some cases, security can come at the expense of performance. In our next post in The Fast and the Curious series, we are excited to share how improvements to our phishing detection algorithms keeps users safe online. With these improvements, phishing detection is now 50 times faster and drains less battery.

Phishing detection

Every time you navigate to a new page, Chrome evaluates a collection of signals about the page to see if it matches those of phishing sites. To do that, we compare the color profile of the visited page – that’s the range and frequency of the colors present on the page – with the color profiles of common pages. For example in the image below, we can see that the colors are mostly orange, followed by green and then a touch of purple.


If the site matches a known phishing site, Chrome warns you to protect your personal information and prevent you from exposing your credentials.


What you will see if a phishing attempt is detected

To preserve your privacy, by default Chrome’s Safe Browsing mode never sends any images outside the browser. While this is great for privacy, it means that your machine has to do all the work to analyze the image.

Image processing can often generate heavy workloads because analyzing the image requires an evaluation of each pixel in what is commonly known as a “pixel loop.” Some modern monitors display upwards of 14 million pixels, so even simple operations on each of those pixels can add up to a lot of CPU use! For phishing detection, the operation that takes place on each pixel is the counting of its basic colors.

Here is what this looks like. The counts are stored in an associative data structure called a hashmap. For each pixel, we extract its RGB color values and store the counts in one of 3 different hashmaps — one for each color.


Making it more efficient

Adding one item to a hashmap is fast, but we have to do this for millions of pixels. We try to avoid reducing the number of pixels to avoid compromising the quality of the analysis. However, the computation itself can be improved.

Our improvements to the pipeline look like this:

  • The code now avoids keeping track of RGB channels in three different hashmaps and instead uses only one to index by color. Three times less counting!
  • Consecutive pixels are summed before being counted in the hashmap. For a site with a uniform background color, this can reduce the hashmap overhead to almost nothing.

Here is what the counting of the colors looks like now. Notice how there are significantly fewer operations on the hashmap:


How much faster did this get?

Starting with M92, Chrome now executes image-based phishing classification up to 50 times faster at the 50th percentile and 2.5 times faster at the 99th percentile. On average, users will get their phishing classification results after 100 milliseconds, instead of 1.8 seconds.

This benefits you in two ways as you use Chrome. First and foremost, using less CPU time to achieve the same work improves general performance. Less CPU time means less battery drain and less time with spinning fans.

Second, getting the results faster means Chrome can warn you earlier. The optimization brought the percentage of requests that took more than 5 seconds to process from 16.25% to less than 1.6%. This speed improvement makes a real difference in security – especially when it comes to stopping you from entering your password on a malicious site!

Overall, these changes achieve a reduction of almost 1.2% of the total CPU time used by all Chrome renderer processes and utility processes.

At Chrome’s scale, even minor algorithm improvements can result in major energy efficiency gains in aggregate. Here’s to many more centuries of CPU time saved!

Stay tuned for many more performance improvements to come!

Posted by Olivier Li Shing Tat-Dupuis, Chrome Developer

Data source for all statistics: Real-world data anonymously aggregated from Chrome clients.