Trojan Source, a newly disclosed attack technique, allows threat actors to inject malicious logic into source code that remains syntactically valid for the compiler...
University of Cambridge researchers Nicholas Boucher and Ross Anderson published a paper on Trojan Source attacks, a technique for injecting invisible vulnerabilities into source code that remains semantically valid. The attack exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed.
Two CVEs were assigned to Trojan Source attacks: CVE-2021-42574 tracks the Bidi attacks, and CVE-2021-42694 tracks the homoglyph attacks. The attacks could pose a serious threat to first-party software and enable supply-chain compromise.
The fundamental idea behind exploiting adversarial encoding is to use Bidi overrides to reorder how source code is displayed while keeping it syntactically valid. The Unicode Bidirectional (Bidi) algorithm supports both left-to-right and right-to-left languages, and its control characters allow text of one reading direction to be embedded within text of the other.
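To make the control characters concrete, the Python sketch below scans text for the Bidi embedding, override, and isolate code points defined by Unicode. The function name find_bidi_controls is hypothetical and chosen only for this illustration; it is a minimal check, not a complete defense.

```python
import unicodedata

# Unicode Bidi control characters: embeddings, overrides, isolates,
# and the characters that terminate them.
BIDI_CONTROLS = {
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # POP DIRECTIONAL FORMATTING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(text):
    """Return (line, column, name) for each Bidi control character in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    # The override is written as an escape so that it stays visible here.
    sample = "x = 1  # note \u202e hidden\nprint(x)\n"
    for hit in find_bidi_controls(sample):
        print(hit)
```

Running a scan like this over a code base before review is one simple way to surface characters that an editor would otherwise render invisibly.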
1. Early Returns
In this technique, a genuine return statement is disguised so that it appears to be part of a comment or a string literal. Because the statement is actually executed, the function returns earlier than a reviewer would expect.
2. Commenting-Out
Text that appears to be valid code actually sits inside a comment, so it is never executed. A human reviewer may believe the code will run and perform its function, but the compiler or interpreter ignores it.
3. Stretched Strings
Text that appears to be outside a string literal is actually located within it. This lets an adversary manipulate string comparisons, for example by making two strings that look identical compare as unequal.
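The early-return technique described above can be sketched in Python by asking the parser what it sees. The snippet is an illustration under stated assumptions: the function subtract_funds and the helper statement_kinds are hypothetical names, and the Bidi character is written as an escape sequence so that it stays visible. A Bidi-aware editor may display the second source line as if "return" were part of the docstring.

```python
import ast

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE, written as an escape so it stays visible

# Logical (as-stored) order of the source. The docstring closes before
# ";return", so the return statement is real code, not docstring text.
SOURCE = (
    "def subtract_funds(bank, account, amount):\n"
    "    ''' Subtract funds from bank account then " + RLI + "''' ;return\n"
    "    bank[account] -= amount\n"
    "    return\n"
)


def statement_kinds(source):
    """Return the AST node type of each statement in the first function's body."""
    func = ast.parse(source).body[0]
    return [type(stmt).__name__ for stmt in func.body]


if __name__ == "__main__":
    print(statement_kinds(SOURCE))
```

The parser reports a Return node immediately after the docstring expression, so the balance update that follows can never run, whatever the rendered text suggests.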
In a homoglyph attack, an attacker writes a function name in which a single letter is replaced with a visually similar character from another script. The same trick works on variables, class names, and other identifiers.
In the researchers' example, two functions that both appear to be named sayHello were defined, one declared with a Latin H and the other with a Cyrillic H.
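A short Python sketch shows why the two names are different identifiers even though they look the same. The helpers scripts_in and is_mixed_script are hypothetical, and using the first word of each character's Unicode name as a stand-in for its script is a rough heuristic assumed for this example, not the official Unicode Script property.

```python
import unicodedata

LATIN_NAME = "sayHello"
# U+041D CYRILLIC CAPITAL LETTER EN renders like a Latin capital H.
CYRILLIC_NAME = "say\u041dello"


def scripts_in(identifier):
    """Approximate the scripts used: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def is_mixed_script(identifier):
    """Flag identifiers whose letters come from more than one script."""
    return len(scripts_in(identifier)) > 1


if __name__ == "__main__":
    print(LATIN_NAME == CYRILLIC_NAME)
    print(sorted(scripts_in(CYRILLIC_NAME)))
    print(is_mixed_script(LATIN_NAME), is_mixed_script(CYRILLIC_NAME))
```

A check along these lines flags the Cyrillic variant as mixed-script, which is one signal a reviewer or linter could use to spot a suspicious identifier.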
The researchers simulated the attacks on simple programs written in C, C++, C#, JavaScript, Java, Rust, Go, and Python to produce proofs of concept. In one example, the rendered source code shows logic that should produce no output, yet the compiled program prints 'You are admin'. For the attack vector to work, the compiler must accept some form of Unicode input, such as UTF-8.
"As powerful supply-chain attacks can be launched easily using these techniques, it is essential for organizations that participate in a software supply chain to implement defenses."
"It is not sufficient for a compiler to be verified; it must also be safely used. Compilers that are trivially vulnerable to adversarial text encoding cannot reasonably be described as secure," stated the paper.