Authenticode Stuffing Tricks

I recently started a project called Authenticode Lint. The tool has two purposes. The first is to answer, “Am I digitally signing my binaries correctly?” and the second, “Are other people signing their binaries correctly?”

To back up a bit, Authenticode is the scheme that Microsoft uses to digitally sign DLLs, EXEs, etc. It’s not a difficult thing to do, but it does offer enough flexibility that it can be done in a suboptimal way. The linter is made up of a series of checks that either pass or fail.

When you sign a binary, the signature is embedded inside of it (usually, there are exceptions). The goal of the signature is to ensure the binary hasn’t been tampered with, and that it comes from a trusted source. The former presents a problem.

If I were to take a binary, compute a signature over it to make sure it hasn’t changed, and then embed that signature in the binary, I would have just changed the contents of the binary and invalidated the signature I had just computed.

To work around this problem, there are some places inside of EXEs that the digital signature process ignores. The notable one is the place where signatures go. That region is completely ignored, as is the checksum of the file in the optional header.
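To make the ignored regions concrete, here is a minimal sketch that locates them in a PE file. The function name is hypothetical, and the offsets are my reading of Microsoft's PE format documentation: the checksum sits 64 bytes into the optional header, and the Certificate Table is data directory index 4. As I understand it, the 8-byte directory entry that points at the signature is skipped as well, which the text above doesn't mention.

```python
import struct

def authenticode_skipped_ranges(pe: bytes):
    """Return (offset, length) pairs that the Authenticode hash skips.

    Hypothetical helper for illustration; it does no bounds checking.
    """
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file")

    # The optional header follows the 4-byte signature and 20-byte COFF header.
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", pe, opt)
    if magic == 0x10B:      # PE32
        data_dirs = opt + 96
    elif magic == 0x20B:    # PE32+
        data_dirs = opt + 112
    else:
        raise ValueError("unknown optional header magic")

    checksum = (opt + 64, 4)
    # Each data directory entry is 8 bytes; index 4 is the Certificate Table.
    cert_entry = data_dirs + 4 * 8
    cert_offset, cert_size = struct.unpack_from("<II", pe, cert_entry)
    return [checksum, (cert_entry, 8), (cert_offset, cert_size)]
```

Everything outside these ranges is covered by the signature, which is why the tricks that follow all live inside the last range.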

Now we have tamper-proof binaries that prevent changing the executable after it’s been signed, right?

Ideally, yes, but unfortunately, no. There are some legitimate reasons to change a binary after it’s been signed. Some applications might want to embed a per-user configuration. Re-signing the executable on a per-user basis is too costly in terms of time and security. Signing is relatively fast, but not fast enough to scale reasonably. It would also mean that to perform the re-sign, the signing keys would need to be available to an automated system. That’s generally not a good idea, as a signing key should be kept on an HSM or smart card, and signing should always be performed manually by one person (or more, if using an m-of-n scheme).

It turns out it is possible to slightly modify an executable after it’s been signed. There are a few ways to do this, and I’ll cover as many as I know.

Padding

The first trick came from some clever people who noticed that the location where signatures are stored (which is not covered by the signature itself) can be abused to store things other than signatures. What ultimately ends up in the signature location of a binary is a structure called WIN_CERTIFICATE. This structure includes a field called “dwLength”, indicating the length of the structure, and “bCertificate”, which is an array of whatever data type is specified. The structure is poorly named. It can contain things other than certificates, like signatures. In almost every case it will contain a signature.

The data that does end up going in WIN_CERTIFICATE is PKCS#7 SignedData (1.2.840.113549.1.7.2) that contains the signature. This is in a structured format called ASN.1. ASN.1 ultimately has its own “length” for its data. The length is actually a little tricky to calculate, but by following the data structure you know where it begins and ends. Let’s imagine this whole thing like this:

WIN_CERTIFICATE
    --> dwLength = 16
    --> bCertificate
           --> PKCS#7
                  --> Length = 10
                  --> {0,1,2,3,4,5,6,7,8,9}

The WIN_CERTIFICATE has a length of 16, but the signature data only has a length of 10. What are the other 6 bytes? Under normal circumstances, those other 6 bytes are all zeros. However, it turns out that Windows doesn’t care what those bytes are. They can be anything. The whole WIN_CERTIFICATE structure is ignored during the signing and verification process, so changing those extra bytes doesn’t impact it. The extra bytes don’t appear as garbage in the signature either, because Windows uses the PKCS#7 structure to determine the signature’s length. It stops processing the signature once it reaches the end of the PKCS#7 data.

For historic reasons, the place where the signatures in the file go must always have a length that is a multiple of 8. That accounts for why there are two lengths. This problem wouldn’t exist if WIN_CERTIFICATE didn’t have a length at all and was always assumed to be the length of the PKCS#7 data.
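The two lengths can be compared with a short sketch. It assumes the usual 8-byte WIN_CERTIFICATE header (dwLength, wRevision, wCertificateType) and that the PKCS#7 blob starts with a single-byte DER tag; the function names are hypothetical, and this is an illustration rather than how Windows or Authenticode Lint actually does it.

```python
import struct

def der_total_length(data: bytes) -> int:
    """Size of the first DER element: tag byte + length bytes + content.

    Assumes a single-byte tag, which holds for a top-level SEQUENCE.
    """
    first = data[1]
    if first < 0x80:            # short form: the byte is the content length
        return 2 + first
    count = first & 0x7F        # long form: the next `count` bytes hold the length
    if count == 0:
        raise ValueError("indefinite length is not valid DER")
    return 2 + count + int.from_bytes(data[2:2 + count], "big")

def trailing_bytes(win_certificate: bytes) -> bytes:
    """Bytes between the end of the PKCS#7 data and dwLength."""
    dw_length, _revision, _cert_type = struct.unpack_from("<IHH", win_certificate, 0)
    body = win_certificate[8:dw_length]
    return body[der_total_length(body):]

def has_stuffed_padding(win_certificate: bytes) -> bool:
    """True if the padding contains anything other than zeros."""
    return any(trailing_bytes(win_certificate))
```

For a normally signed file, `trailing_bytes` should return at most seven zero bytes, which is just the alignment padding.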

Keep in mind that this extra data doesn’t change the behavior of the program. The code (.text section) isn’t changing. It’s vestigial data, but the program can be written to read the data from itself and do something with it. Hopefully none of those programs actually try to use this extra data as actual code.

Microsoft actually tried to fix this by checking that the extra data is all zeros. The check was opt-in through a registry value called EnableCertPaddingCheck, with the intention of enabling it by default after a period of time to give people a chance to fix their software. Later on, they withdrew their plans to make it the default, but left the registry value as an option.

I can only guess as to why they withdrew their plans, but I would say it was for two reasons. The first is that it broke software, probably more than they expected. The second is that it was an incomplete fix. As we’ll see, there are other ways to embed data post-signing. Visibly breaking software without getting any benefit from it doesn’t make sense.

As of April 15th 2016, Dropbox makes use of this technique.

Unauthenticated Attributes

You can sign a binary with a signature, but you can also sign a signature. That’s called a counter signature, and the most common use of them in Authenticode is timestamps. This has the same problem that binaries do. There must be a place in the signature to put other signatures without invalidating the signature by signing it. So, like binaries, there are places inside of signatures that aren’t used when computing a signature. These are called unauthenticated attributes.

These attributes are mutable post-signing without changing the signature.

I haven’t run across any binaries that use this trick. According to Eric Lawrence, Dropbox was at one point using this, possibly to work around the WIN_CERTIFICATE padding being disabled. They may have switched back to WIN_CERTIFICATE padding when Microsoft changed their minds about enforcing no padding. This technique doesn’t offer any advantage over WIN_CERTIFICATE padding and it’s a little bit harder to pull off.

Certificates

This one is interesting and is what prompted me to write this post in the first place.

A digital signature is allowed to include additional certificates within itself to help the signature verifier build a chain back to a trusted certificate (that in itself is worth another post). Much like TLS, it can include as many certificates as it wants, regardless of whether they participate in the chain or not. There may also be more than one chain that could be built if a certificate is cross-signed.

So if your computer trusts a certificate called “Fabrikam Root”, but your certificate was issued by an intermediate certificate called “Fabrikam Reseller Intermediate”, without knowledge of the intermediate the verifier could not follow the chain of certificates back to the trusted root.

Like the unauthenticated attributes, the certificates that are part of the signature are not covered by the signature. This allows someone to inject or replace a certificate. This appears to be what Chrome’s latest installer is doing.

Chrome’s signature includes a certificate called “Dummy Certificate”. This certificate is a bit odd, as it has an extension that is a few kilobytes in size. That’s not impossible or bad, but it stands out. This extension has an OID of “1.3.6.1.4.1.11129.2.1.9999”. Looking at its contents, it appears to contain additional information very similar to Dropbox’s. Most of the content of the extension is nulls, presumably to leave plenty of space for more data if they ever need it.

This is a rather interesting technique, and I’m not sure what value it provides over the Unauthenticated Attributes other than it’s a bit harder to spot. It does require creating a new certificate every time, since you cannot change an existing one.

The similarity of the data to Dropbox’s made me think that this is likely being done by a tool. Indeed, after a bit of digging, it appears that Google’s Omaha project contains Go code for doing this. Dropbox is probably just using an older version of the tool that uses WIN_CERTIFICATE padding as the README seems to imply it once did.

Is this Bad?

I’m not sure. Clearly many security-conscious organizations are doing it, so they are at least willing to accept that the result is worth any risks. I would recommend not doing it if it can be helped. While doable, getting these things right can be hard.

You might also consider the privacy implications of this for installers. These watermarked installers can tell a lot about you if you’re using a unique installer. They know how much time has passed between downloading and installing, how many times the installer is run, and if you were signed in to an account when you downloaded the installer, it’s very likely that’s tied to the installer, too.

Authenticode Lint will attempt to flag all of these scenarios.

xchg rax, rax – 0x04

Moving along onto page 0x04, we have something different from our last two. It’s also quite short:

xor      al,0x20

That’s it, in its entirety. The al register is the lower 8 bits of the eax/rax register. Let’s demonstrate with LLDB:

register write rax 0x123456789abcdef0

rax = 0x123456789abcdef0
eax = 0x9abcdef0
ax = 0xdef0
ah = 0xde
al = 0xf0
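The same relationship can be written as masks and shifts. This is a small Python sketch that mirrors the LLDB output above; the variable names simply follow the register names.

```python
# Sub-registers are views onto the low bits of rax.
rax = 0x123456789abcdef0

eax = rax & 0xFFFFFFFF      # low 32 bits
ax = rax & 0xFFFF           # low 16 bits
ah = (rax >> 8) & 0xFF      # bits 8..15
al = rax & 0xFF             # low 8 bits

print(hex(eax), hex(ax), hex(ah), hex(al))
```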

OK, so now we know what the al register is. Now it’s a matter of trying to figure out what the purpose of xor’ing it with 0x20 might be. Let’s see how 0x20 might be special. It helps to look at it in a few different base representations. 0x20 is base-16; in base-10 it’s 32, and in binary it’s b00100000. Exactly one bit is set. So what the xor is doing is toggling the 6th bit, counting from the least significant end.

That information alone is enough to Google what the intention is. Before you do though, here’s a hint. Take a look at an ASCII table, and look at the letters in binary form.

Letter Binary
A b01000001
a b01100001
B b01000010
b b01100010
C b01000011
c b01100011
etc…

Thanks to the handy layout of the ASCII table, we can see that the xor toggles whether a letter is uppercase or lowercase.
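Here is the same trick as a tiny Python sketch. The function name is made up for illustration, and like the assembly it only makes sense for ASCII letters; anything else just gets a different, unrelated character.

```python
def toggle_case(ch: str) -> str:
    """Flip bit 0x20 of an ASCII letter, like `xor al, 0x20`."""
    return chr(ord(ch) ^ 0x20)

print(toggle_case("A"), toggle_case("b"))
print("".join(toggle_case(c) for c in "Hello"))
```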

xchg rax, rax – 0x03

Now we are on to page 0x03. This one has a little bit more going on, but the previous post prepares us for it.

Here is our code:

sub      rdx,rax
sbb      rcx,rcx
and      rcx,rdx
add      rax,rcx

Our first instruction is sub, which is subtract. It subtracts the second operand from the first operand, and stores the result in the first operand. It affects quite a number of flags, too, including CF. We know from the previous post that the following instruction, sbb, pays attention to the CF flag.

x86 uses CF as a borrow flag. Meaning, if you do a-b and a is less than b, then CF is set.
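As a rough model of that rule, here is a Python sketch of a 64-bit sub that returns both the wrapped result and the borrow. `sub64` and `MASK` are names I made up for illustration, and only CF is modelled, not the other flags.

```python
MASK = (1 << 64) - 1  # keep results within 64 bits, like a register

def sub64(a: int, b: int):
    """Return (a - b) wrapped to 64 bits, and CF as the borrow."""
    result = (a - b) & MASK
    cf = int(a < b)   # a borrow happens when a is less than b (unsigned)
    return result, cf

print(sub64(13, 5))
print(sub64(5, 13))
```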

Bringing this back around to our original snippet, we can assume that the second instruction will behave differently depending on whether rdx is less than rax.

Again borrowing knowledge from the last page, we know what sbb will do when both operands are the same and if the carry flag is set or not. Let’s start with the first two instructions.

We have two test cases, so we’ll run through it twice.

Here is a case where rdx is greater than rax:

Initial:
rax = 0x0000000000000005
rcx = 0x8080808080808080
rdx = 0x000000000000000d
rflags = 0x0000000000000202 (CF = 0)

Step (sub):
rax = 0x0000000000000005
rcx = 0x8080808080808080
rdx = 0x0000000000000008
rflags = 0x0000000000000202 (CF = 0)

Step (sbb):
rax = 0x0000000000000005
rcx = 0x0000000000000000
rdx = 0x0000000000000008
rflags = 0x0000000000000246 (CF = 0)

So, to recap this, when rdx was 13, we subtracted 5 from it, putting 8 into the destination, rdx. Since this didn’t borrow, CF is 0. Next is sbb on rcx. Since CF is zero, nothing gets added to rcx. Then we subtract rcx from rcx, and store it in rcx, effectively zeroing the register.

Let’s try it with rax as 13 and rdx as 5:

Initial:
rax = 0x000000000000000d
rcx = 0x8080808080808080
rdx = 0x0000000000000005
rflags = 0x0000000000000202 (CF = 0)

Step (sub):
rax = 0x000000000000000d
rcx = 0x8080808080808080
rdx = 0xfffffffffffffff8
rflags = 0x0000000000000293 (CF = 1)

Step (sbb):
rax = 0x000000000000000d
rcx = 0xffffffffffffffff
rdx = 0xfffffffffffffff8
rflags = 0x0000000000000297 (CF = 1)

We start by subtracting 13 from 5, which puts -8 into rdx. This also sets CF to 1. Moving on to sbb, it takes our value, 0x8080808080808080, and adds 1, resulting in 0x8080808080808081. It then subtracts 0x8080808080808081 from 0x8080808080808080, which is -1, so -1 is put in the rcx register.

Note that in this problem, it doesn’t matter what rcx is set to. After the sbb, rcx will either contain -1 (all bits set) or 0 (no bits set) depending on the previous instruction. This brings us into the next instruction, and.

The instruction does bitwise and between rdx and rcx, and stores the result in rcx. As we just figured out, rcx will either have no bits set or all of its bits set. So rcx will either be zero, or it will be what rdx is.
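To check the claim that the starting value of rcx doesn’t matter, here is a small Python model of sbb followed by the and. The helper name `sbb64` is hypothetical and only CF is tracked.

```python
MASK = (1 << 64) - 1  # 64-bit register width

def sbb64(dst: int, src: int, cf: int):
    """Model `sbb dst, src`: dst - (src + CF), wrapped to 64 bits."""
    result = (dst - (src + cf)) & MASK
    borrow = int(dst < src + cf)
    return result, borrow

rdx = 0xfffffffffffffff8  # -8, the value left by the earlier sub

for start in (0, 0x8080808080808080, MASK):
    no_borrow, _ = sbb64(start, start, 0)   # CF = 0 -> all bits clear
    borrowed, _ = sbb64(start, start, 1)    # CF = 1 -> all bits set
    # Using the result as a mask either drops rdx or keeps it whole.
    print(hex(no_borrow & rdx), hex(borrowed & rdx))
```

Every line printed is the same, whatever rcx started as.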

Finally, we add rax and rcx, and store the result in rax.

In the first case, rcx is zero, so adding rcx and rax together does nothing to rax.

In the second case, rcx is -8, so we add -8 to 13, which is 5, which was the value of rdx in the first place.

This appears to be a conditional assignment that doesn’t use an actual conditional instruction. In pseudo code, it would look something like this:

if (rdx < rax) then
    rax = rdx

Let's see the whole thing in action.

When rdx is 13 and rax is 5:

Initial:
rax = 0x0000000000000005
rcx = 0x8080808080808080
rdx = 0x000000000000000d
rflags = 0x0000000000000202 (CF = 0)

Step (sub):
rax = 0x0000000000000005
rcx = 0x8080808080808080
rdx = 0x0000000000000008
rflags = 0x0000000000000202 (CF = 0)

Step (sbb):
rax = 0x0000000000000005
rcx = 0x0000000000000000
rdx = 0x0000000000000008
rflags = 0x0000000000000246 (CF = 0)

Step (and):
rax = 0x0000000000000005
rcx = 0x0000000000000000
rdx = 0x0000000000000008
rflags = 0x0000000000000246 (CF = 0)


Step (add):
rax = 0x0000000000000005
rcx = 0x0000000000000000
rdx = 0x0000000000000008
rflags = 0x0000000000000206 (CF = 0)

When rax is 13 and rdx is 5:

Initial:
rax = 0x000000000000000d
rcx = 0x8080808080808080
rdx = 0x0000000000000005
rflags = 0x0000000000000202 (CF = 0)

Step (sub):
rax = 0x000000000000000d
rcx = 0x8080808080808080
rdx = 0xfffffffffffffff8
rflags = 0x0000000000000293 (CF = 1)

Step (sbb):
rax = 0x000000000000000d
rcx = 0xffffffffffffffff
rdx = 0xfffffffffffffff8
rflags = 0x0000000000000297 (CF = 1)

Step (and):
rax = 0x000000000000000d
rcx = 0xfffffffffffffff8
rdx = 0xfffffffffffffff8
rflags = 0x0000000000000282 (CF = 0)


Step (add):
rax = 0x0000000000000005
rcx = 0xfffffffffffffff8
rdx = 0xfffffffffffffff8
rflags = 0x0000000000000217 (CF = 1)

xchg rax, rax – 0x02

Moving on to 0x02, we have another short but more subtle program:

neg      rax
sbb      rax,rax
neg      rax

We have two unique instructions, both dealing directly with the rax register.

Starting with neg, this is a two’s complement negation. It’s functionally equivalent to subtracting the value from zero. It also affects the carry flag (CF): if the operand is zero, CF is set to zero; otherwise CF is set to one.
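A quick Python model of neg makes the flag behavior easy to poke at. `neg64` is a made-up helper name, and only CF is modelled.

```python
MASK = (1 << 64) - 1  # 64-bit register width

def neg64(value: int):
    """Model `neg`: two's complement negation plus the resulting CF."""
    result = (-value) & MASK   # same as 0 - value, wrapped to 64 bits
    cf = int(value != 0)       # CF is clear only when the operand is zero
    return result, cf

for value in (0, 1, 89):
    result, cf = neg64(value)
    print(hex(result), cf)
```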

sbb is the next instruction, or subtraction with borrow. The Intel Instruction Reference has a good description of this: “Adds the source operand (second operand) and the carry (CF) flag, and subtracts the result from the destination operand (first operand). The result of the subtraction is stored in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) The state of the CF flag represents a borrow from a previous subtraction.”

Let’s start by observing the effect of just the first two instructions and see how they work. We know from the description of neg that we can expect different behavior depending on whether rax is zero, so let’s try it with one and zero and see what the results are.

Initial:
rax = 0x0000000000000000
rflags = 0x0000000000000202 (CF = 0)

Step (neg):
rax = 0x0000000000000000
rflags = 0x0000000000000246 (CF = 0)

Step (sbb):
rax = 0x0000000000000000
rflags = 0x0000000000000246 (CF = 0)

And let’s try it when rax is 1:

Initial:
rax = 0x0000000000000001
rflags = 0x0000000000000202 (CF = 0)

Step (neg):
rax = 0xffffffffffffffff
rflags = 0x0000000000000297 (CF = 1)

Step (sbb):
rax = 0xffffffffffffffff
rflags = 0x0000000000000297 (CF = 1)

Let’s walk through the first example.

rax is zero, and negating zero is zero again, so zero gets set in the destination. We can also see that CF is set to zero. Next is sbb. It adds the source (rax) and the carry flag. Zero plus zero is zero, then zero is subtracted from zero and stored in rax, which is zero.

A whole lot of zeros.

Now for the second example. We start with 1 and negate it, so -1. In two’s complement that’s 0xffffffffffffffff. We also see that the CF is set to 1. Next for sbb, we add -1 and the CF flag, so, back to zero. But we haven’t done the subtraction yet. So subtract zero from -1, and we are still left with -1, which is what we see in rax.

OK, but we still have a final neg left. We can easily determine that the negative of zero is zero for the first example. For the second example, -1 negated is back to 1.

In both cases, we end up right back to where we started. Doesn’t seem spectacularly interesting. Let’s try a random-ish value for rax, like 89.

Initial:
rax = 0x0000000000000059
rflags = 0x0000000000000202 (CF = 0)

Step (neg):
rax = 0xffffffffffffffa7
rflags = 0x0000000000000293 (CF = 1)

Step (sbb):
rax = 0xffffffffffffffff
rflags = 0x0000000000000297 (CF = 1)

This one is more interesting. We take 89 and negate it, giving us -89. CF gets set to 1. Next for sbb, we take -89 and add CF, which gives us -88. The destination operand is -89 still, so -89 - (-88) is -1.

The final instruction negates that back to 1.

Turns out it behaves that way for all values other than zero because of the way neg behaves with regard to the carry flag.

So what is the purpose of this then? It’s a branchless, simple way of setting something to 1 for any value other than zero. It doesn’t seem like much, but it’s clever. In pseudo code, it might look something like this:

if (value != 0) then
    value = 1
else
    value = 0

But as the code shows, it does this all without branching. Very clever.
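For anyone who wants to try more values without a debugger, the three instructions can be modelled in Python like this. It’s a sketch under the same assumptions as the walkthrough: `page_0x02` is a hypothetical name, values are unsigned 64-bit integers, and only CF is tracked.

```python
MASK = (1 << 64) - 1  # 64-bit register width

def page_0x02(rax: int) -> int:
    """Model neg / sbb rax,rax / neg; returns the final value of rax."""
    cf = int(rax != 0)                 # neg sets CF unless rax is zero
    rax = (-rax) & MASK                # neg rax
    rax = (rax - (rax + cf)) & MASK    # sbb rax, rax -> 0 or all ones
    rax = (-rax) & MASK                # neg rax -> 0 or 1
    return rax

for value in (0, 1, 89, MASK):
    print(value, "->", page_0x02(value))
```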

xchg rax, rax – 0x01

On the next page of xchg rax, rax, we’re given a very simple program:

.loop:
    xadd     rax,rdx
    loop     .loop

We know from the previous post how loop works. Each time it loops it decrements the rcx register. So we know that we need to set the register to something other than zero if we want to tinker with it, so I set it to 10.

The xadd is the instruction of interest, and that is entirely the body of the loop. xadd is exchange and add.

The Intel x86-64 reference manual describes it as “Exchanges the first operand (destination operand) with the second operand (source operand), then loads the sum of the two values into the destination operand.”

So all we are doing in the loop is adding and exchanging the values in the rax and rdx register. The book offers no hints on what the code is supposed to do, so the best we can do here is tinker with the value of the registers and see if the results are anything clever. We can make some guesses though. If the registers are both zero, we can figure that nothing interesting will ever happen. The loop will keep adding zeros until the loop counter reaches zero.

You might recognize what this does just by looking at the assembly. As a hint, set rax to 1, and rdx to 1, and watch the value of rax. Here are the values of rax after each iteration of the loop:

Initial:
rax = 0x0000000000000001
rdx = 0x0000000000000001
rcx = 0x000000000000000a

Next:
rax = 0x0000000000000002
rdx = 0x0000000000000001
rcx = 0x0000000000000009

Next: 
rax = 0x0000000000000003
rdx = 0x0000000000000002
rcx = 0x0000000000000008

Next: 
rax = 0x0000000000000005
rdx = 0x0000000000000003
rcx = 0x0000000000000007

Next: 
rax = 0x0000000000000008
rdx = 0x0000000000000005
rcx = 0x0000000000000006

Next: 
rax = 0x000000000000000d
rdx = 0x0000000000000008
rcx = 0x0000000000000005

And so on until rcx reaches zero.

You might recognize this as the Fibonacci sequence. Just about every developer has at some point tried implementing the Fibonacci sequence, either to learn a new language, for fun, or for school.

I find it impressive that using assembly you can accomplish this with a single instruction and a loop.
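For comparison, here is roughly what the loop does, written as a Python sketch. `xadd_loop` is a made-up name; the tuple assignment stands in for xadd, and the decrement and test stand in for loop.

```python
MASK = (1 << 64) - 1  # 64-bit register width

def xadd_loop(rax: int, rdx: int, rcx: int):
    """Model the xadd/loop pair; returns rax after each iteration.

    Like the real loop instruction, starting with rcx = 0 would wrap
    around and run 2**64 times, so don't pass zero here.
    """
    history = []
    while True:
        # xadd rax, rdx: rdx gets the old rax, rax gets the sum
        rax, rdx = (rax + rdx) & MASK, rax
        history.append(rax)
        rcx = (rcx - 1) & MASK   # loop decrements rcx...
        if rcx == 0:             # ...and falls through once it hits zero
            break
    return history

print(xadd_loop(1, 1, 10))
```

With rax and rdx both starting at 1 and rcx at 10, this prints the ten values from 2 through 144.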