• Moving to Static Content

    If my site is looking a little different today, that’s because I’ve redone it from scratch. Gone is WordPress, gone is PHP.

    Like many others, I’ve started using a static site generator, in this case Jekyll. Static content makes a lot more sense, and a lot of things I wanted to play around with on my previous blog I didn’t get to do because WordPress fought me most of the way.

    Things are simpler here. There is no JavaScript. I’ve abandoned any kind of in-page analytics because I don’t value it more than I value other people’s privacy. Here, all we have is static HTML and CSS.

    No JavaScript, dynamic content, or assets from other domains means I can have a plain and simple Content Security Policy, which I effectively couldn’t do with WordPress due to the mess of inline CSS and JavaScript that were thrown around.

    It also means I can enable brotli on everything.

    Finally, there is a real deploy process for this. No more manually crushing images and creating WebP variants of the image by hand. This all happens automatically, behind the scenes.

    Making it Work

    The site’s content is now on GitHub. On commit, GitHub notifies AWS CodeDeploy, which pulls down the repository to the EC2 instance and kicks off the build. It starts as a gulp task, which runs Jekyll, then compresses images and creates WebP copies. The repository also contains the NGINX configuration, which CodeDeploy copies to the correct location and then reloads NGINX.

    AWS CodeDeploy works pretty well for this. It’s a tad difficult to get started with, which was a bit discouraging, but after reading the documentation through a few times it eventually clicked and I was able to get it working correctly.

    The migration has left some things missing, for now, such as comments, but eventually I’ll bring those back.

  • Authenticode Stuffing Tricks

    I recently started a project called Authenticode Lint. The tool has two purposes. The primary one is to answer, “Am I digitally signing my binaries correctly?” and the second is, “Are other people signing their binaries correctly?”

    To back up a bit, Authenticode is the scheme that Microsoft uses to digitally sign DLLs, EXEs, etc. It’s not a difficult thing to do, but it does offer enough flexibility that it can be done in a suboptimal way. The linter is made up of a series of checks that either pass or fail.

    When you sign a binary, the signature is embedded inside of it (usually, there are exceptions). The goal of the signature is to ensure the binary hasn’t been tampered with, and that it comes from a trusted source. The former presents a problem.

    If I were to take a binary, compute a signature on it to make sure it hasn’t changed, and then embed the signature in the binary, I would have just changed the contents of the binary and invalidated the signature I computed by embedding it.

    To work around this problem, there are some places inside of EXEs that the digital signature process ignores. The notable one is the place where signatures go. So the section that holds the signatures is completely ignored, as is the checksum of the file in the optional header.
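    As a rough sketch of where those ignored regions live, the Python below locates them in a PE file. The offsets come from the published PE layout as I understand it; the function name is made up for illustration, and this is not necessarily how Authenticode Lint does it.

```python
import struct

def authenticode_skipped_ranges(pe: bytes):
    """Return (start, end) byte ranges that the signature digest skips.

    Hypothetical helper: checksum field, certificate table directory
    entry, and the certificate table itself (if present).
    """
    # e_lfanew at offset 0x3C of the DOS header points at "PE\0\0".
    (pe_offset,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file")

    opt = pe_offset + 4 + 20  # PE signature + 20-byte COFF header
    (magic,) = struct.unpack_from("<H", pe, opt)
    if magic == 0x10B:        # PE32
        dirs = opt + 96
    elif magic == 0x20B:      # PE32+
        dirs = opt + 112
    else:
        raise ValueError("unknown optional header magic")

    checksum = (opt + 64, opt + 68)          # 4-byte CheckSum field
    entry = dirs + 4 * 8                     # Certificate Table is directory index 4
    cert_offset, cert_size = struct.unpack_from("<II", pe, entry)

    ranges = [checksum, (entry, entry + 8)]
    if cert_size:
        # For this directory the "address" is a plain file offset.
        ranges.append((cert_offset, cert_offset + cert_size))
    return ranges
```

    Feeding it the bytes of a signed EXE should list three ranges; an unsigned one should list only the first two.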

    Now we have tamper-proof binaries that prevent changing the executable after it’s been signed, right?

    Ideally, yes, but unfortunately, no. There are some legitimate reasons to change a binary after it’s been signed. Some applications might want to embed a per-user configuration. Re-signing the executable on a per-user basis is too costly in terms of time and security. Signing is relatively fast, but not fast enough to scale reasonably. It would also mean that to perform the re-sign, the signing keys would need to be available to an automated system. That’s generally not a good idea, as a signing key should be on an HSM or SmartCard, and signing should always be done manually by one person (or more if using m/n).

    It turns out it is possible to slightly modify an executable after it’s been signed. There are a few ways to do this, and I’ll cover as many as I know.

  • xchg rax, rax – 0x04

    Moving along onto page 0x04, we have something different from our last two. It’s also quite short:

    xor      al,0x20

    That’s it, in its entirety. The al register is the lower 8 bits of the eax/rax register. Let’s demonstrate with LLDB:

    register write rax 0x123456789abcdef0
    rax = 0x123456789abcdef0
    eax = 0x9abcdef0
    ax = 0xdef0
    ah = 0xde
    al = 0xf0
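    The same relationship can be sketched with masks and shifts. This is only a Python model of the register aliasing, with a helper name I made up:

```python
MASK64 = (1 << 64) - 1

def sub_registers(rax: int) -> dict:
    """Return the narrower views that alias the low bits of rax."""
    rax &= MASK64
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,   # low 32 bits
        "ax": rax & 0xFFFF,        # low 16 bits
        "ah": (rax >> 8) & 0xFF,   # bits 8 through 15
        "al": rax & 0xFF,          # low 8 bits
    }

for name, value in sub_registers(0x123456789ABCDEF0).items():
    print(f"{name} = {value:#x}")
```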

    OK, so now we know what the al register is. Now it’s a matter of trying to figure out what the purpose of xor’ing it with 0x20 might be. Let’s see how 0x20 might be special. It helps to look at it in a few different base representations. 0x20 is base-16, and in base-10 it’s 32, and in binary it’s b00100000. Exactly one bit is set. So what the xor is doing is toggling the 6th bit, counting from the right.

    That information alone is enough to Google what the intention is. Before you do though, here’s a hint. Take a look at an ASCII table, and look at the letters in binary form.

    Letter Binary
    A b01000001
    a b01100001
    B b01000010
    b b01100010
    C b01000011
    c b01100011

    Thanks to the handy layout of the ASCII table, we can see that the xor toggles whether a letter is uppercase or lowercase.
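    Here is a small Python sketch of the same trick, assuming a single ASCII letter as input. The function name is mine, and the guard is there because the pairing only holds for letters:

```python
def toggle_case(ch: str) -> str:
    """Flip the 0x20 bit of one ASCII letter, as `xor al,0x20` does."""
    if len(ch) != 1 or not (ch.isascii() and ch.isalpha()):
        raise ValueError("only single ASCII letters have the upper/lower pairing")
    return chr(ord(ch) ^ 0x20)

print(toggle_case("A"), toggle_case("a"))  # prints: a A
```

    Anything outside A-Z and a-z does not follow the pattern, so the assembly version assumes the caller already knows al holds a letter.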

  • xchg rax, rax – 0x03

    Now we are on to page 0x03. This one has a little bit more going on, but the previous post prepares us for it.

    Here is our code:

    sub      rdx,rax
    sbb      rcx,rcx
    and      rcx,rdx
    add      rax,rcx

    Our first instruction is sub, which is subtract. It subtracts the second operand from the first operand, and stores the result in the first operand. It affects quite a number of flags, too, including CF. We know from the previous post that the following instruction, sbb, pays attention to the CF flag.

    x86 uses CF as a borrow flag. Meaning, if you do a-b and a is less than b, then CF is set.

    Bringing this back around to our original snippet, we can assume that the second instruction will behave differently depending on whether rdx is less than rax.

    Again borrowing knowledge from the last page, we know what sbb will do when both operands are the same and if the carry flag is set or not. Let’s start with the first two instructions.
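    To make those first two instructions concrete, here is a Python model of them. It is a sketch that treats registers as unsigned 64-bit values and tracks only CF; the helper names are made up.

```python
MASK64 = (1 << 64) - 1

def sub(dst: int, src: int):
    """Model `sub dst,src`: return (result, CF). CF is the unsigned borrow."""
    return (dst - src) & MASK64, int(dst < src)

def sbb(dst: int, src: int, cf: int):
    """Model `sbb dst,src`: dst - (src + CF); return (result, CF)."""
    return (dst - src - cf) & MASK64, int(dst < src + cf)

def first_two(rax: int, rdx: int, rcx: int = 0):
    rdx, cf = sub(rdx, rax)        # sub rdx,rax
    rcx, cf = sbb(rcx, rcx, cf)    # sbb rcx,rcx
    return rdx, rcx, cf

for rax, rdx in ((5, 9), (9, 5)):
    new_rdx, rcx, cf = first_two(rax, rdx)
    print(f"rax={rax} rdx={rdx} -> rdx={new_rdx:#x} rcx={rcx:#x} CF={cf}")
```

    The starting value of rcx does not matter, since it is subtracted from itself; only the borrow decides whether it ends up as zero or all ones.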

    We have two test cases, so we’ll run through it twice.

    Here is a case where rdx is greater than rax:

  • xchg rax, rax – 0x02

    Moving on to 0x02, we have another short but more subtle program:

    neg      rax
    sbb      rax,rax
    neg      rax

    We have two unique instructions, both dealing directly with the rax register.

    Starting with neg, this is a two’s complement negation. It’s functionally equivalent to subtracting the value from zero. It also sets the CF flag (carry flag) to zero if the source is zero; otherwise it sets CF to one.

    sbb is the next instruction, or subtraction with borrow. The Intel Instruction Reference has a good description of this, “Adds the source operand (second operand) and the carry (CF) flag, and subtracts the result from the destination operand (first operand). The result of the subtraction is stored in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) The state of the CF flag represents a borrow from a previous subtraction.”

    Let’s start by observing the effect of just the first two instructions and see how they work. We know from the description of neg that we can expect different behaviors depending on whether or not rax is zero, so let’s try it with one and zero and see what the results are.

    rax = 0x0000000000000000
    rflags = 0x0000000000000202 (CF = 0)
    Step (neg):
    rax = 0x0000000000000000
    rflags = 0x0000000000000246 (CF = 0)
    Step (sbb):
    rax = 0x0000000000000000
    rflags = 0x0000000000000246 (CF = 0)

    And let’s try it when rax is 1:

    rax = 0x0000000000000001
    rflags = 0x0000000000000202 (CF = 0)
    Step (neg):
    rax = 0xffffffffffffffff
    rflags = 0x0000000000000297 (CF = 1)
    Step (sbb):
    rax = 0xffffffffffffffff
    rflags = 0x0000000000000297 (CF = 1)

    Tip: You can use print/t $rflags to see the individual flags in LLDB. We know that the Carry Flag is bit zero.
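    If reading the bits by eye gets tedious, a few lines of Python can name them. This is a sketch covering only the common status and control flags, using the bit positions from the Intel manual; the function name is mine.

```python
FLAG_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF",
}

def decode_rflags(value: int) -> list:
    """Return the names of the flags that are set in an rflags value."""
    return [name for bit, name in FLAG_BITS.items() if (value >> bit) & 1]

for rflags in (0x202, 0x246, 0x297):
    print(f"{rflags:#x}: {decode_rflags(rflags)}")
```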

    In the first example, rax is zero, and negating zero is zero again, so zero gets stored in the destination. We can also see that CF is set to zero. Next is sbb. It adds the source (rax) and the carry flag. Zero plus zero is zero, then zero is subtracted from zero and stored in rax, which is zero.

    A whole lot of zeros.

    Now for the second example. We start with 1 and negate it, so -1. In two’s complement that’s 0xffffffffffffffff. We also see that the CF is set to 1. Next for sbb, we add -1 and the CF flag, so, back to zero. But we haven’t done the subtraction yet. So subtract zero from -1, and we are still left with -1, which is what we see in rax.

    OK, but we still have a final neg left. We can easily determine that, for the first example, the negative of zero is zero. For the second example, -1 negated is back to 1.

    In both cases, we end up right back to where we started. Doesn’t seem spectacularly interesting. Let’s try a random-ish value for rax, like 89.
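    For experimenting with other inputs without a debugger, here is a Python sketch of the three instructions. It models only rax and CF as unsigned 64-bit values, and the function names are made up.

```python
MASK64 = (1 << 64) - 1

def neg(value: int):
    """Model `neg`: two's complement negation; CF is 0 only when the source is 0."""
    return (-value) & MASK64, int(value != 0)

def sbb(dst: int, src: int, cf: int):
    """Model `sbb dst,src`: dst - (src + CF); return (result, CF)."""
    return (dst - src - cf) & MASK64, int(dst < src + cf)

def page_0x02(rax: int) -> int:
    rax, cf = neg(rax)            # neg rax
    rax, cf = sbb(rax, rax, cf)   # sbb rax,rax
    rax, cf = neg(rax)            # neg rax
    return rax

for start in (0, 1):
    print(f"{start:#x} -> {page_0x02(start):#x}")
```

    Calling page_0x02(89) is a quick way to check the next experiment.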

  • xchg rax, rax – 0x01

    On the next page of xchg rax, rax, we’re given a very simple program:

    .loop:
        xadd     rax,rdx
        loop     .loop

    We know from the previous post how loop works. Each time it loops it decrements the rcx register. So we know that we need to set the register to something other than zero if we want to tinker with it, so I set it to 10.

    The xadd is the instruction of interest, and that is entirely the body of the loop. xadd is exchange and add.

    The Intel x86-64 reference manual describes it as “Exchanges the first operand (destination operand) with the second operand (source operand), then loads the sum of the two values into the destination operand.”

    So all we are doing in the loop is adding and exchanging the values in the rax and rdx register. The book offers no hints on what the code is supposed to do, so the best we can do here is tinker with the value of the registers and see if the results are anything clever. We can make some guesses though. If the registers are both zero, we can figure that nothing interesting will ever happen. The loop will keep adding zeros until the loop counter reaches zero.

    You might recognize what this does just by looking at the assembly. As a hint, set rax to 1, and rdx to 1, and watch the value of rax. Here are the register values, starting with the initial state and then after each iteration of the loop:

    rax = 0x0000000000000001
    rdx = 0x0000000000000001
    rcx = 0x000000000000000a
    rax = 0x0000000000000002
    rdx = 0x0000000000000001
    rcx = 0x0000000000000009
    rax = 0x0000000000000003
    rdx = 0x0000000000000002
    rcx = 0x0000000000000008
    rax = 0x0000000000000005
    rdx = 0x0000000000000003
    rcx = 0x0000000000000007
    rax = 0x0000000000000008
    rdx = 0x0000000000000005
    rcx = 0x0000000000000006
    rax = 0x000000000000000d
    rdx = 0x0000000000000008
    rcx = 0x0000000000000005

    And so on until rcx reaches zero.

    You might recognize this as the Fibonacci sequence. Just about any developer at one point has tried implementing the Fibonacci sequence, either to learn a new language, for fun, or for school.

    I find it impressive that using assembly you can accomplish this with a single instruction and a loop.
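    For comparison, here is a Python sketch of the same loop. It models only the three registers involved as unsigned 64-bit values, and the generator name is made up.

```python
MASK64 = (1 << 64) - 1

def xadd_loop(rax: int, rdx: int, rcx: int):
    """Model `xadd rax,rdx` followed by `loop`, yielding registers after each pass."""
    while True:
        # xadd: rdx receives the old rax, rax receives the sum.
        rax, rdx = (rax + rdx) & MASK64, rax
        # loop: decrement rcx, then branch back only while it is non-zero.
        rcx = (rcx - 1) & MASK64
        yield rax, rdx, rcx
        if rcx == 0:
            break

for rax, rdx, rcx in xadd_loop(1, 1, 10):
    print(f"rax = {rax:#018x}  rdx = {rdx:#018x}  rcx = {rcx:#018x}")
```

    Even in a high-level language the body is a single tuple assignment, which is about as close to xadd as Python gets.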

  • xchg rax, rax – 0x00

    I recently picked up the book xchg rax,rax. This book is fascinating to me, so I thought I would blog about my interpretations of it one page at a time. I’m not an assembly expert, but I want to get better at it. I’m not going to go over how to run the assembly; there are a lot of posts out there to get started.

    A little background on the book, in case this blog series doesn’t make sense: neither does the book. The book is 63 pages of x86-64 assembly snippets. Other than the requisite copyright notices, there are no words. There is no context to each snippet; it’s up to the reader to interpret them. Fun! The book is also freely available online.

    The first page, 0x00 is fairly simple. It demonstrates different ways to zero a register.

    The first instruction zeros the eax register:

    xor      eax,eax

    This is the most common way I see to zero a register. XORing any number with itself will produce zero. It offers a very compact encoding size. Almost every function prolog is zeroing registers, so it’s a task that needs to be done quite often.

    The second instruction zeros the ebx/rbx register:

    lea      rbx,[0]

    lea, or load effective address, simply computes the address zero and stores it in the destination operand, rbx. There isn’t anything better about this approach, but it’s a way you can do it.

    The next one is a bit more interesting:

    loop     $

    The loop instruction does what it implies: it loops. The $ in this case means the current address counter. So, we’re looping to the same place, over and over again. However, each time the loop executes, it decrements the ecx/rcx register. When the register reaches zero, the execution continues with the instruction after the loop instruction. It’s a very inefficient way to zero the ecx/rcx register.
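    A quick Python sketch shows just how inefficient. It counts how many times the instruction runs before falling through; the bits parameter is my own addition so that the wrap-around case can be tried at a small width instead of 64 bits.

```python
def loop_self(rcx: int, bits: int = 64) -> int:
    """Count how many times `loop $` executes before falling through."""
    mask = (1 << bits) - 1
    executions = 0
    while True:
        rcx = (rcx - 1) & mask   # loop decrements the counter first ...
        executions += 1
        if rcx == 0:             # ... and only branches back while it is non-zero
            return executions

print(loop_self(10))           # prints: 10
print(loop_self(0, bits=8))    # prints: 256
```

    The second call shows the awkward case: a counter that starts at zero wraps around and goes through its entire range before the loop ends.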

    The next one is a bit more obvious:

    mov      rdx,0

    This moves the value zero into the edx/rdx register. This is a much quicker way to zero a register than the loop, but it has a larger encoding than xor. Depending on the assembler, the zero ends up encoded as a 32-bit immediate or even a full 64-bit one. That’s several bytes just to zero a register.

    The next is similar, too:

    and      esi,0

    This does a bitwise AND of the esi/rsi register and zero, and stores the result back in esi/rsi. ANDing any number with zero will always produce zero.

    The next one uses subtraction:

    sub      edi,edi

    It subtracts the edi register from itself and stores the result in the first operand, the edi register.

    And finally, it ends with this:

    push     0
    pop      rbp

    The first instruction pushes zero onto the stack, and the second pops the value off the stack into the rbp register. This uses two whole instructions for zeroing a register, so it isn’t exactly efficient.

    That wraps up the first page.

  • Parsing and modifying HTML in a Fiddler Extension

    Continuing my “do everything in Fiddler” approach to web debugging, I ran into a situation where I wanted to parse and modify the response of the server before the browser received the response using Fiddler.

    It’s definitely doable, but there wasn’t a clear-cut example of how to do that, so here we go.

    The best place to start is Telerik’s documentation on building an extension. This covers the ins and outs of getting started with developing an extension. Once you have a “hello world” extension working, you’re ready to start parsing HTML.

    The Fiddler interface of choice here is IAutoTamper2, specifically its AutoTamperResponseBefore method. AutoTamperResponseBefore is where we want to modify the HTML. This method is called after Fiddler has received the response from the server, but before it has pushed it to the browser. Modifications to the response body here will be reflected in what the browser renders.

    There are a few guard checks we want to make first. Since we want to modify HTML, we should check that the response is actually HTML. We can partially accomplish this by examining the Content-Type header. If it contains “text/html”, then there is a good chance the content is HTML. Consult the IANA registry for other content types you may want to handle.
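    The check itself is simple enough to sketch independently of Fiddler. The Python below is not Fiddler’s API, only the header logic; the function name and the set of accepted types are my own assumptions, and it compares the media type exactly instead of using a substring match.

```python
# Assumed set of media types to treat as HTML; extend as needed.
HTML_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}

def looks_like_html(content_type) -> bool:
    """Guess from a Content-Type header value whether the body is HTML."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in HTML_MEDIA_TYPES

print(looks_like_html("text/html; charset=utf-8"))  # prints: True
print(looks_like_html("application/json"))          # prints: False
```

    A matching header is still only a hint, since a server can label a response however it likes.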

  • AWS Lambda Gets Useful with VPC support

    OK, so this post’s title is a bit harsh, but AWS Lambda has added something really great.

    To back up, Lambda is a service offered by AWS as a means of running code without jumping head first into full-blown EC2 instances or containers. Lambda functions can do some very interesting things, such as acting as the responder behind an AWS API Gateway.

    Previously, there was one big hurdle to using Lambda for us. You couldn’t place them inside of a VPC. This means that whatever Lambda is accessing had to be publicly accessible. Most of our infrastructure is private within the VPC, and you couldn’t access it from the outside. Moreover, we didn’t want to make it accessible from the outside.

    There was a thread on the AWS Forums about this, and AWS listened. You can now place a Lambda function inside of a VPC. More importantly, you can assign them to security groups.

    This is very interesting to us because now we can use Lambda without exposing things to the outside that we didn’t want to. One interesting case might be to act as a cron job. If you want something to run periodically, but don’t want to worry about where that cron job lives, Lambda is a good place to start.

    As an example, we may want to periodically run optimize on our SOLR cluster. Well, with Lambda, we can now do that.

    We have a simple node.js script that hits our SolrCloud cluster with a GET request to http://internal-solr-cluster:8983/solr/ourcollection/update?optimize=true.

    Previously, as a Lambda function, it would not have been able to access the internal-solr-cluster Elastic Load Balancer. Once we assigned it to a VPC, placed it in the right security groups, and specified a CloudWatch Event to run on a schedule of once a week, we now have our SOLR collection getting optimized once a week without having to worry where the optimization runs from.
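    Our script is node.js, but the idea fits in a few lines of any language. Here is a Python sketch of an equivalent function; the helper name, the timeout, and the return value are assumptions for illustration, and the host and collection are the placeholder names used above.

```python
import urllib.request

def optimize_url(host: str, collection: str, port: int = 8983) -> str:
    """Build the Solr update URL that triggers an optimize."""
    return f"http://{host}:{port}/solr/{collection}/update?optimize=true"

def handler(event, context):
    """Lambda entry point: issue the GET and report the HTTP status."""
    url = optimize_url("internal-solr-cluster", "ourcollection")
    # The timeout is an assumed value; the function's own Lambda timeout
    # must be at least as long for a slow optimize to finish.
    with urllib.request.urlopen(url, timeout=60) as response:
        return {"status": response.status}
```

    Nothing here is specific to Solr. Any internal HTTP endpoint that needs a periodic nudge could be driven the same way.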

  • Regaining Access to OS X after a lost Yubikey

    The Yubikey by Yubico has an interesting use beyond just OTP. It can do a myriad of things, including storing certificates, OATH, and, more interestingly, HMAC-SHA1 challenge response. The last of which is interesting because it can be used with a PAM module.

    OS X supports PAM modules, and one of Yubico’s touted features is that you can install a PAM module on OS X, and you now have two factor authentication into your OS X account. In addition to the password, the Yubikey must also be plugged in.

    I set that up a while ago and it had been working fine, but I ran into a situation where I needed to turn it off, temporarily, because I couldn’t actually log in. Say, because I didn’t have my Yubikey with me.

    Turns out this is really trivial. Just boot the Mac into recovery mode by holding Command+R during boot. This let me edit the /etc/pam.d/authorization file and comment out the Yubico PAM module. Once saved, a quick reboot command later, I was back into my account, two factor turned off. The only thing to note is that you want to edit the one on your Macintosh HD volume under /Volumes, not the authorization file that the recovery partition uses.

    This made my life easier, but it also led me to believe the Yubikey PAM module on local OS X accounts had diminished value (the story is different for remote authentication). If I can just turn it off with very little effort, no authentication required, that’s worrying.

    There is a way to partially fix it – which is FileVault2. When you boot into the Recovery console with FileVault2 enabled, you cannot edit /etc/pam.d/authorization without knowing the password to the volume since it is encrypted with your password. This, however, still reduces authorization to a single factor. If I have your password and no Yubikey, even with FileVault2 enabled I can get into the account since I have physical access.

    This takes a few seconds of extra work. First, you need the UUID of the volume that you need to decrypt (like “Macintosh HD”).

    diskutil coreStorage list

    and grab the UUID of the logical volume. From there, it’s just one more command:

    diskutil coreStorage unlockVolume <UUID> -stdinpassphrase

    Enter your password, and then the volume will be mounted in /Volumes/.

    In an ideal world, the Yubikey would play a role in unlocking the FileVault2 volume. This is easy enough to do with BitLocker and certificates since the Yubikey can act like a PIV card. However, I have not found a way to do this with FileVault2. Even in the case of BitLocker, it’s difficult to accomplish this without the help of being on an Active Directory Domain Joined machine and using an Active Directory account.

    My advice would be, take the value that the Yubikey PAM module gives with a grain of salt for local account protection. At least on OS X (I have yet to bother trying on Windows) it’s quite easy to turn it off just by having access to the physical machine.

    A lot of people will be quick to point out, “If you have physical access to the hardware, then it’s game over.” However, that doesn’t mean physical security should just be completely ignored. Each little improvement has value.