
Researchers can now detect backdoor attacks

As machine learning models are increasingly used to make decisions in many areas, their safety and trustworthiness have become a major concern. Because ML models are trained on data from varied and potentially untrustworthy sources, adversaries can manipulate them by inserting carefully crafted samples into the training set. This is known as a poisoning attack, and it allows the attacker to plant a backdoor in the resulting model.
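
To make the attack concrete, here is a minimal sketch of a BadNets-style poisoning step, in which a small patch is stamped onto a fraction of the training images and those images are relabelled to an attacker-chosen class. All names, shapes and the 5% poisoning rate are illustrative assumptions; the competition's actual attack is not described here.

```python
# Illustrative sketch of a poisoning attack, assuming PyTorch image
# tensors of shape (C, H, W) with values in [0, 1].
import torch

def poison_sample(image: torch.Tensor, trigger: torch.Tensor,
                  mask: torch.Tensor) -> torch.Tensor:
    """Stamp a trigger patch onto an image.

    mask is 1 where the trigger overwrites the image, 0 elsewhere.
    """
    return (1 - mask) * image + mask * trigger

def poison_dataset(images, labels, trigger, mask, target_class, rate=0.05):
    """Relabel a small fraction of stamped images to the attacker's class."""
    poisoned_images, poisoned_labels = [], []
    n_poison = int(rate * len(images))
    for i, (img, lbl) in enumerate(zip(images, labels)):
        if i < n_poison:
            poisoned_images.append(poison_sample(img, trigger, mask))
            poisoned_labels.append(target_class)  # attacker-chosen label
        else:
            poisoned_images.append(img)
            poisoned_labels.append(lbl)
    return poisoned_images, poisoned_labels
```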

A backdoor lets the attacker invoke malicious behaviour on demand with an external trigger, and detecting these backdoors has been a major challenge because the trigger is known only to the adversary.

Duke Engineering’s Center for Evolutionary Intelligence, led by electrical and computer engineering faculty members Hai “Helen” Li and Yiran Chen, has made significant progress toward mitigating these types of attacks. Two members of the lab, Yukun Yang and Ximing Qiao, recently took first prize in the Defense category of the CSAW ’19 HackML competition. 

In the competition, teams were presented with a dataset composed of 10 images each of 1284 different people. Each set of 10 images is referred to as a “class.” Teams were asked to locate the trigger hidden in a few of these classes.

“To identify a backdoor trigger, you must essentially find out three unknown variables: which class the trigger was injected into, where the attacker placed the trigger and what the trigger looks like,” said Qiao. 
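
The article does not spell out how the team solves for these three unknowns, but a common approach in the published literature (e.g., Neural Cleanse) treats the trigger as an optimization variable: for each candidate class, learn a mask and a pattern that push any input into that class while keeping the mask as small as possible. A hedged sketch in PyTorch, assuming a trained classifier `model` and a batch of clean `images`:

```python
import torch

def reverse_engineer_trigger(model, images, candidate_class,
                             steps=500, lam=1e-3, lr=0.1):
    """Optimize a mask and pattern that push any image into candidate_class.

    Returns the recovered (mask, pattern); a genuinely backdoored class
    tends to admit a much smaller mask than a clean one.
    """
    model.eval()  # only the trigger parameters are being trained
    _, c, h, w = images.shape
    # Unconstrained parameters, squashed through sigmoid into [0, 1].
    mask_param = torch.zeros(1, 1, h, w, requires_grad=True)
    pattern_param = torch.zeros(1, c, h, w, requires_grad=True)
    opt = torch.optim.Adam([mask_param, pattern_param], lr=lr)
    target = torch.full((images.shape[0],), candidate_class, dtype=torch.long)

    for _ in range(steps):
        mask = torch.sigmoid(mask_param)
        pattern = torch.sigmoid(pattern_param)
        stamped = (1 - mask) * images + mask * pattern
        loss = torch.nn.functional.cross_entropy(model(stamped), target)
        loss = loss + lam * mask.sum()  # penalty prefers small triggers
        opt.zero_grad()
        loss.backward()
        opt.step()

    return (torch.sigmoid(mask_param).detach(),
            torch.sigmoid(pattern_param).detach())
```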

“Our software scans all the classes and flags those that show strong responses, indicating the high possibility that these classes have been hacked,” explained Li. “Then the software finds the region where the hackers laid the trigger.”
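
Li's "scans all the classes and flags those that show strong responses" maps naturally onto an outlier test over the recovered triggers: a genuinely backdoored class admits an anomalously small mask. A minimal sketch, assuming `mask_norms` holds one L1 mask norm per class (for example, from the sketch above); the median-absolute-deviation test and the 2.0 threshold are illustrative choices, not necessarily the team's procedure:

```python
import numpy as np

def flag_suspect_classes(mask_norms, threshold=2.0):
    """Flag classes whose recovered trigger is anomalously small.

    Uses the median absolute deviation (MAD), scaled by 1.4826 so it is
    comparable to a standard deviation under normality.
    """
    norms = np.asarray(mask_norms, dtype=float)
    median = np.median(norms)
    mad = 1.4826 * np.median(np.abs(norms - median))
    # Only unusually *small* triggers indicate a backdoor.
    scores = (median - norms) / (mad + 1e-12)
    return np.flatnonzero(scores > threshold)
```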

The next step, said Li, is to identify what form the trigger takes. It is usually a real, unassuming item such as a hat, glasses or earrings. Because the tool can recover the likely pattern of the trigger, including its shape and colour, the team could compare the recovered shape, for example two connected ovals in front of the eyes, against the original images, revealing a pair of sunglasses as the trigger.
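
Rendering the recovered mask and pattern is what lets a human recognize the trigger's shape and colour (the two connected ovals of a pair of sunglasses, in Li's example). A small illustrative sketch, reusing the tensors returned by the earlier sketch:

```python
import matplotlib.pyplot as plt

def show_trigger(mask, pattern):
    """Render the recovered trigger: the pattern where the mask is active."""
    trigger = (mask * pattern).squeeze(0)         # (C, H, W)
    plt.imshow(trigger.permute(1, 2, 0).numpy())  # to (H, W, C) for display
    plt.title("Recovered trigger")
    plt.axis("off")
    plt.show()
```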

Neutralizing the trigger was not within the scope of the challenge, but according to Qiao, existing research suggests the process should be straightforward once the trigger is identified: the model can be retrained to ignore it.
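
One simple form that retraining could take, consistent with the research Qiao points to, is brief fine-tuning on images stamped with the recovered trigger but kept at their correct labels, so the model unlearns the trigger-to-class shortcut. The function below is an assumption-laden sketch, not the team's procedure; it presumes a PyTorch `model`, a `loader` of clean batches, and the `mask` and `pattern` recovered earlier:

```python
import torch

def unlearn_trigger(model, loader, mask, pattern, epochs=1, lr=1e-4):
    """Fine-tune on trigger-stamped images with their true labels."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            # Train on clean and stamped copies so clean accuracy is kept
            # while the trigger-to-class association is overwritten.
            stamped = (1 - mask) * images + mask * pattern
            logits = model(torch.cat([images, stamped]))
            targets = torch.cat([labels, labels])
            loss = torch.nn.functional.cross_entropy(logits, targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```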
