Trojan

Compiler

Source Code

Trojan Source: a new attack vector that leverages Unicode to manipulate compilers

Trojan Source, a newly evolved attack technique, allows threat actors to inject malicious logic into source code that remains syntactically valid for the compiler...

03-Nov-2021
3 min read

Cambridge University researchers Nicholas Boucher and Ross Anderson published a paper on Trojan Source attacks, a technique for injecting malicious logic that is invisible to human reviewers yet remains valid code for the compiler. The attack exploits subtleties of text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed.

Two CVEs were issued for Trojan Source attacks: CVE-2021-42574 tracks the Bidi attacks and CVE-2021-42694 tracks the homoglyph attacks. The attacks pose a potential threat both to first-party software and, through supply-chain compromise, to everything built on top of it.

Bidi Override

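The core idea behind this adversarial encoding is to use Unicode Bidi (bidirectional) override characters to visually reorder source code while keeping it syntactically valid. The Bidi algorithm exists so that left-to-right and right-to-left scripts can be mixed in one text. Its control characters let an author embed a run of text with a different reading direction, which means the order in which characters are displayed can differ from the order in which they are stored and compiled.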
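One way to make these characters visible is to scan source text for them. The following is a minimal sketch, not a complete defense: the table lists the Unicode Bidi embedding, override, and isolate controls, and the helper name `find_bidi_controls` is an assumption made for this illustration.

```python
# Unicode Bidi control characters: embeddings, overrides, isolates,
# and the characters that terminate them.
BIDI_CONTROLS = {
    "\u202A": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202B": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202C": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202D": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202E": "RLO",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066": "LRI",  # LEFT-TO-RIGHT ISOLATE
    "\u2067": "RLI",  # RIGHT-TO-LEFT ISOLATE
    "\u2068": "FSI",  # FIRST STRONG ISOLATE
    "\u2069": "PDI",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(source: str):
    """Return (line, column, abbreviation) for each Bidi control found.

    Hypothetical helper for illustration; line and column are 1-based.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, BIDI_CONTROLS[ch]))
    return hits


clean_hits = find_bidi_controls("int x = 1;\n")
suspicious_hits = find_bidi_controls("x\u202Ey")
print(clean_hits, suspicious_hits)
```

A check like this could run in code review tooling or a pre-commit hook. Legitimate right-to-left text in comments or strings would also be flagged, so the results need human judgment.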

1. Early Returns

In this technique, a genuine return statement is disguised as a comment or a string literal, so the function returns earlier than it appears to.
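A minimal sketch of the idea in Python, loosely modeled on the kind of example the paper describes. The `bank` dictionary, the function name, and the account name are illustrative assumptions. The source is built from escape sequences so the hidden character stays visible here: a right-to-left isolate (U+2067) sits inside the docstring, just before its closing quotes, so the `;return` that follows can be rendered as if it were part of the docstring text.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE

# The docstring really ends before ";return", so the return is live code.
# In a Bidi-aware editor, the text after RLI may be displayed reordered
# and appear to belong to the docstring.
source = (
    "bank = {'alice': 100}\n"
    "def subtract_funds(account, amount):\n"
    "    ''' Subtract funds from bank account then " + RLI + "''' ;return\n"
    "    bank[account] -= amount\n"
    "    return\n"
    "subtract_funds('alice', 50)\n"
)

namespace = {}
exec(compile(source, "<trojan-demo>", "exec"), namespace)

balance = namespace["bank"]["alice"]
print(balance)  # 100: the function returned before subtracting
```

How the third line of the function is displayed depends on the editor or review tool, but the interpreter always sees the early `return`.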

2. Commenting-Out

Text that looks like live code actually sits inside a comment, so it is never executed. A human reviewer assumes the code runs and does its job, but the compiler or interpreter ignores it.
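The sketch below illustrates this with a single line modeled on the paper's C example. The exact character sequence is reconstructed for illustration and should be treated as an assumption. A naive regular-expression comment stripper (not a real C lexer) stands in for the compiler and shows that the apparent `if (isAdmin)` guard lies entirely inside the comment.

```python
import re

RLO, LRI, PDI = "\u202E", "\u2066", "\u2069"

# Logical (stored) order of the line. A Bidi-aware renderer may display it
# roughly as:  /* begin admins only */ if (isAdmin) {
line = (
    "/*" + RLO + " } " + LRI + "if (isAdmin)" + PDI + " " + LRI
    + " begin admins only */"
)


def strip_block_comments(code: str) -> str:
    """Remove C-style /* ... */ comments (naive, for illustration only)."""
    return re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)


remaining = strip_block_comments(line)
print(repr(remaining))  # '': nothing on this line is live code
```

Because the guard is never compiled, whatever statement follows on the next line runs unconditionally.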

3. Stretched Strings

Text that appears to be outside a string literal is actually located inside it. This lets an adversary manipulate string comparisons, for example by making two strings that look identical compare as not equal.
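A short sketch of the effect, assuming an access-level check against the literal `"user"`. The variable names are hypothetical. The second string contains Bidi controls and what looks like a trailing comment, so a renderer may show it as `"user"` followed by a comment, while the comparison sees a much longer string.

```python
RLO, LRI, PDI = "\u202E", "\u2066", "\u2069"

visible = "user"
# What the compiler or interpreter actually receives as the string literal.
stretched = "user" + RLO + " " + LRI + "// Check if admin" + PDI + " " + LRI


def reveal(text: str) -> str:
    """Show non-ASCII characters as escape sequences."""
    return text.encode("unicode_escape").decode("ascii")


strings_match = (visible == stretched)
print(strings_match)      # False
print(reveal(stretched))  # user\u202e \u2066// Check if admin\u2069 \u2066
```

Printing escaped forms like this is one simple way for a reviewer to confirm what a suspicious literal really contains.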

Homoglyphs

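An attacker can define a function whose name differs from an existing one by a single letter that has been replaced with a visually similar character (a homoglyph). The same trick works on variables, class names, and other identifiers. In the paper's example, two functions that both appear to be named sayHello are defined: one is written with a Latin H and the other with a Cyrillic Н, so a call that looks like the original can resolve to the attacker's version.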
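A rough way to catch such identifiers is to flag names that mix scripts. The sketch below is a heuristic under a stated assumption: it treats the first word of each character's Unicode name (for example `LATIN` or `CYRILLIC`) as its script, which approximates but does not equal the real Unicode Script property. The helper names are hypothetical.

```python
import unicodedata


def scripts_in(identifier: str) -> set:
    """Collect an approximate script label for each letter in the name."""
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            # First word of the character name, e.g. "LATIN" or "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(identifier: str) -> bool:
    return len(scripts_in(identifier)) > 1


latin_name = "sayHello"
lookalike_name = "say\u041dello"  # U+041D CYRILLIC CAPITAL LETTER EN

print(latin_name == lookalike_name)     # False: two distinct identifiers
print(is_mixed_script(latin_name))      # False
print(is_mixed_script(lookalike_name))  # True
```

Identifiers written entirely in one non-Latin script pass this check, which is intended: the goal is to surface suspicious mixtures, not to forbid non-ASCII names.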

PoC

The researchers built proof-of-concept attacks against simple programs in C, C++, C#, JavaScript, Java, Rust, Go, and Python. When the source code is rendered, its logic appears to produce no output, but the compiled program prints 'You are admin'. For the attack to work, the compiler must accept some form of Unicode input, such as UTF-8.

"As powerful supply-chain attacks can be launched easily using these techniques, it is essential for organizations that participate in a software supply chain to implement defenses."

"It is not sufficient for a compiler to be verified; it must also be safely used. Compilers that are trivially vulnerable to adversarial text encoding cannot reasonably be described as secure," the paper states.