Live Fast, Code Hard, Die Young

Archive for the ‘Technology’ Category

Gitly bug hunting

I wrote this article back in 2012, when I worked a lot on Gitly, but somehow never got around to publishing it. I thought it was too technical. Maybe someone will find it useful, though, so I’m posting it now for fun…

Just the other day I was chugging along nicely with the development of Gitly (my own Git client) when suddenly I stumbled across a peculiar issue. Gitly claimed that a file in the working directory was modified even though I knew it wasn’t. No other Git client reported it as modified either, so clearly it must be a bug in my code, right?

Obviously that was my initial thought as well, so I started bug hunting. It must be a problem with my code for calculating the file’s SHA-1 hash, I thought. Well, it wasn’t. After debugging for a while I got even more puzzled. It seemed that my code was right but the values inside the Git repository were wrong! How could that be possible?!

The mystery

  • My program thinks that a file has been modified.
  • I know that the file was not modified.

Seems like a bug that should be easy to fix, right? Well, it took me deep down the rabbit hole, which is why I wanted to write about it…

(Bear with me if this is a little too technical…)

First, let’s take a step back and look at the whole picture here…

To find out what has changed in the working directory, I compare the files of the latest commit in the repository with the files in the working directory. Each file has a SHA-1 hash that identifies it, so to find out whether a file is modified all I have to do is compare the hashes. The file in question is a C# file called AppBootstrapper.cs that I knew for a fact hadn’t changed, yet Gitly concludes that the two versions have different hashes and thus the file must have changed. My code calculates the hash of the file in the working directory as 2bdb764a6d2a8c7d92dc3f194f8a612c1f524795, but AppBootstrapper.cs is stored in the Git repository under a different hash, eb72ad7dc91b71eba1fe6bf06b7186ed4c94a65b. Clearly something must be wrong here. Probably I am calculating something the wrong way, right?
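
To make the comparison concrete: Git identifies a blob by the SHA-1 of a short header (the word blob, a space, the size in bytes and a NUL byte) followed by the file’s bytes. A minimal Ruby sketch of that calculation; the helper name git_blob_sha1 is made up for illustration and is not Gitly’s actual code:

```ruby
require "digest/sha1"

# Computes a Git blob id: SHA-1 over "blob <size>\0" followed by the content.
# The size is the length in bytes, so the content is treated as binary.
def git_blob_sha1(content)
  data = content.b
  Digest::SHA1.hexdigest("blob #{data.bytesize}\0" + data)
end

puts git_blob_sha1("hello\n") # ce013625030ba8dba906f756967f9e9ca394464a
```

In a Unix-style shell such as the msysgit bash, echo hello | git hash-object --stdin should print the same id.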

Let’s set the stage here. We have two hash codes for the same file. One of them must be wrong! Since hash codes are pretty long and scary, let’s refer to them as hash A and hash B, like this:

  • Hash A: 2bdb764a6d2a8c7d92dc3f194f8a612c1f524795
  • Hash B: eb72ad7dc91b71eba1fe6bf06b7186ed4c94a65b

The theory I have now is that my code calculates hash A for the file but must be doing it wrong, and that the correct hash is hash B. Let’s examine the repository to see if I am right.

To dig into the repository I use the “real” Git command line client for Windows (msysgit). What does it say about this file?

$ git status AppBootstrapper.cs
# On branch master
nothing to commit (working directory clean)

Alright, this seems fine. The file is not modified according to msysgit. Let’s see if we can find out more. To calculate the hash for a file you can use the command git hash-object <filename>. Here is the result:

$ git hash-object AppBootstrapper.cs
2bdb764a6d2a8c7d92dc3f194f8a612c1f524795

What?! This is not what I expected. This is exactly the same hash that my code produces. Something is fishy here.

Next step, dear Git – what is it that you have stored in the repository? It sure does not seem to be the same file that we have on disk. Let’s dig even deeper…

We can dump the contents of a Git object using git cat-file -p <hash> like this:

$ git cat-file -p 2bdb764a6d2a8c7d92dc3f194f8a612c1f524795
error: unable to find 2bdb764a6d2a8c7d92dc3f194f8a612c1f524795
fatal: Not a valid object name 2bdb764a6d2a8c7d92dc3f194f8a612c1f524795

Uh oh, there is no object in the Git repository with that hash code! What about the other hash, then?

$ git cat-file -p eb72ad7dc91b71eba1fe6bf06b7186ed4c94a65b
´╗┐using System;
using System.ComponentModel;
using System.Linq;
using System.Reflection;
...

Okay, this is clearly the file we are looking for. However, there is some garbage at the start of the file. Could it possibly be the UTF-8 byte order mark? Maybe that is what is causing problems? Let’s dig a little deeper…

Git stores everything quite logically, so looking up an object on disk is straightforward (as long as the repository is not in a packed format). I found the file stored under .git/objects/eb/72ad7dc91b71eba1fe6bf06b7186ed4c94a65b.
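
The layout is simple: the first two hex digits of the hash name a directory under .git/objects and the remaining 38 name the file. A small Ruby sketch of that mapping (loose_object_path is a hypothetical helper); it only applies to loose objects, not to objects stored inside pack files:

```ruby
# Maps an object id to its loose object file: the first two hex digits
# name the directory, the remaining 38 name the file inside it.
def loose_object_path(git_dir, sha)
  raise ArgumentError, "expected 40 hex digits" unless sha.match?(/\A\h{40}\z/)

  File.join(git_dir, "objects", sha[0, 2], sha[2..])
end

puts loose_object_path(".git", "eb72ad7dc91b71eba1fe6bf06b7186ed4c94a65b")
# .git/objects/eb/72ad7dc91b71eba1fe6bf06b7186ed4c94a65b
```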

Okay, what can we do? I found that git has another command that dumps the contents of a blob with special characters escaped. You simply write git show <hash> to use it. Let’s see what it gives us:

$ git show eb72ad7dc91b71eba1fe6bf06b7186ed4c94a65b
<EF><BB><BF>using System;
using System.ComponentModel;
using System.Linq;
using System.Reflection;
using Autofac;

Nifty! The special characters are escaped, and now we can clearly see that it is indeed the UTF-8 BOM, which consists of the byte sequence 0xEF, 0xBB, 0xBF.
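
The same <XX> rendering is easy to reproduce when inspecting bytes yourself. A minimal Ruby sketch (escape_bytes and starts_with_bom? are hypothetical helpers) that escapes everything except printable ASCII, tabs and newlines, and checks for the UTF-8 BOM:

```ruby
UTF8_BOM = "\xEF\xBB\xBF".b

# Escapes a byte string for display: printable ASCII, tabs and newlines are
# kept, every other byte is shown as <XX> (two uppercase hex digits).
def escape_bytes(data)
  escaped = data.b.each_byte.map do |byte|
    if byte == 0x09 || byte == 0x0A || (0x20..0x7E).cover?(byte)
      byte.chr
    else
      format("<%02X>", byte)
    end
  end
  escaped.join
end

# True when the content begins with the UTF-8 byte order mark EF BB BF.
def starts_with_bom?(data)
  data.b.start_with?(UTF8_BOM)
end

sample = "\xEF\xBB\xBFusing System;\n"
puts escape_bytes(sample)     # <EF><BB><BF>using System;
puts starts_with_bom?(sample) # true
```

Note that a carriage return (0x0D) would also show up as <0D> with this sketch.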

The problem is that the file on disk also has this exact same byte order mark. No luck there…

With a little help from Ruby we can unpack the object from the Git repository and take a look inside:

>ruby -rzlib -e 'print Zlib::Inflate.new.inflate(STDIN.read)' < ./eb/72ad7dc91b71eba1fe6bf06b7186ed4c94a65b
-e:1:in `inflate': invalid distance too far back (Zlib::DataError)
 from -e:1:in `<main>'

What the heck? There seems to be an error in the compressed file?
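
One possible culprit (an assumption, not verified against the original setup) is the redirect itself: Ruby on Windows reads STDIN in text mode by default, which translates CR LF pairs and so can mangle binary data such as a zlib stream. A minimal sketch that avoids this by reading the object file with File.binread and then splitting the inflated data into its "<type> <size>\0" header and content (read_loose_object is a hypothetical helper):

```ruby
require "zlib"
require "digest/sha1"

# Reads a loose Git object. The file is a zlib stream whose inflated form is
# "<type> <size>\0<content>"; File.binread avoids any newline translation.
def read_loose_object(path)
  raw = Zlib::Inflate.inflate(File.binread(path))
  header, content = raw.split("\0", 2)
  type, size = header.split(" ", 2)
  raise "size mismatch in #{path}" unless size.to_i == content.bytesize

  # The object id is the SHA-1 of the inflated bytes, header included.
  { type: type, size: size.to_i, content: content, sha1: Digest::SHA1.hexdigest(raw) }
end

if ARGV[0]
  obj = read_loose_object(ARGV[0])
  puts "#{obj[:type]} #{obj[:size]} #{obj[:sha1]}"
end
```

Saved as, say, read_object.rb, it would be run as ruby read_object.rb .git/objects/eb/72ad7dc91b71eba1fe6bf06b7186ed4c94a65b.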

$ git fsck
Checking object directories: 100% (256/256), done.
dangling blob 4e81d9100e10ececbb12d8375710047d0a8a7b25
dangling blob 4b7a370cf24b0a9aec69950ffbcb51c5920437e0

So git fsck finds nothing wrong with the repository, apart from a couple of dangling blobs.

Okay, this calls for bigger guns! Let’s see what libgit2 does with this file…

I wrote a little program that calls libgit2 to get the hash code for the file, like this:

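#include <git2.h>
#include <cstdlib>
#include <iostream>

int main() {
    // Hash the file's raw bytes as a blob; no line-ending conversion is applied
    git_oid hash;
    if (git_odb_hashfile(&hash,
                         "d:\\projects\\current\\twitch\\source\\twitch\\AppBootstrapper.cs",
                         GIT_OBJ_BLOB) != GIT_SUCCESS)
        return -1;
    char *hex = git_oid_allocfmt(&hash); // heap-allocated; the caller must free it
    std::cout << "Hash: " << hex << std::endl;
    free(hex);
    return 0;
}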

Now guess what! This little bugger gives me hash code B! Very interesting! Here is a screenshot of my debugging session showing the hash code I got:

Digging into git_odb_hashfile a bit more, I learned that it simply hashes the raw data of the file without considering the core.autocrlf setting. Aha! It’s starting to make some sense now. It’s a bug in libgit2!

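In this memory dump we can clearly see a lot of similar highlighted sequences, 0x0D 0x0A, which are nothing but carriage return and line feed characters, i.e. standard Windows line endings. In other words, the file was committed to the repository without stripping the CR characters, probably because it was committed before I happened to turn on the autocrlf flag. That explains both hashes: Gitly and git hash-object take autocrlf into account and hash the content with the CRs stripped (hash A), while libgit2 hashes the raw bytes on disk, which still match the stored blob (hash B).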
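
The mechanism behind the two hashes can be reproduced in a few lines of Ruby. The two-line content below is a made-up stand-in for AppBootstrapper.cs, so the ids it produces are not hashes A and B themselves; git_blob_sha1 and normalize_eol are hypothetical helpers, and normalize_eol only approximates what autocrlf does on the way into the repository:

```ruby
require "digest/sha1"

# Git blob id: SHA-1 over "blob <size>\0" followed by the raw bytes.
def git_blob_sha1(content)
  data = content.b
  Digest::SHA1.hexdigest("blob #{data.bytesize}\0" + data)
end

# Approximation of the autocrlf conversion on the way into the repository.
def normalize_eol(content)
  content.b.gsub("\r\n", "\n")
end

on_disk = "using System;\r\nusing System.Linq;\r\n" # CRLF, as in the stored blob

raw_id = git_blob_sha1(on_disk)                       # raw bytes: plays the role of hash B
normalized_id = git_blob_sha1(normalize_eol(on_disk)) # CRs stripped: plays the role of hash A

puts raw_id == normalized_id # false
```

The same file yields two different ids depending on whether the CR characters are stripped before hashing.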

Of course it’s pretty obvious once you have all the facts, but these things can really drive you crazy before you solve them.

Line endings are the bane of git, and I don’t know why this mess exists. In my opinion end users should not have to worry about this at all; it should all just work. Unfortunately, I don’t have a solution for this.

So there you go, mystery solved! It was the bane of git – the line endings…

If you want to learn more about this problem you should check out this blog post: http://timclem.wordpress.com/2012/03/01/mind-the-end-of-your-line/

In this case I’m guessing that msysgit handles this by doing an actual diff on the contents. The diff ignores line-ending differences, so the file appears unchanged even though the hash codes don’t match. There you go, some git internals revealed!

Fixing the repository…

First I added a .gitattributes file with the following line:

# Ensure text files are normalized
* text=auto

This ensures that everyone working with the repository uses the same setting, which is a good way to avoid running into more problems like this in the future.
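
Roughly speaking, and leaving out the special cases Git handles, * text=auto tells Git to decide per file whether it is text, and to store text files with LF line endings in the repository. A simplified Ruby sketch of that idea; the NUL-byte check over the first 8000 bytes is an assumption that only approximates Git’s real text detection, and the helper names are hypothetical:

```ruby
FIRST_FEW_BYTES = 8000

# Simplified text detection: treat content as binary if a NUL byte appears
# near the start. Git's real heuristic looks at more than this.
def looks_like_text?(content)
  !content.b.byteslice(0, FIRST_FEW_BYTES).include?("\0")
end

# Sketch of the "clean" side of text=auto: text content is stored with LF
# line endings, anything that looks binary is stored untouched.
def clean_for_commit(content)
  looks_like_text?(content) ? content.b.gsub("\r\n", "\n") : content.b
end

puts clean_for_commit("using System;\r\n").inspect # "using System;\n"
```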

$ rm .git/index
$ git reset
Unstaged changes after reset:
M Source/Twitch/AppBootstrapper.cs

Nifty! Git has now realized that this file needs to be updated. I just committed this changed file and that was it…


My 64K intro from Icing ’95

Just recently the demo group Farbrausch decided to release the source code of their demo tools on GitHub. A few days later we saw the release of the original Prince of Persia source code on GitHub. These two events inspired me to seek out my own roots and look for some of the stuff from my demo scene past. I made a few demos and intros for both the Amiga and the PC during the 90s, and I thought, what the heck, maybe it could be fun to put some of it up!

So I took some time to dig out an old intro that I had actually kept the source code for, and I’m posting it on GitHub if anyone is interested. It is a 64K intro that took 3rd place at Icing in 1995. I got it running in Dosbox and made a video of it, if you want to check it out.

Now remember that this was made back in ’95. There were no 3D cards, no DirectX, no MP3 audio, etc. Everything you see was coded in x86 assembler and ran on a 486DX4, and it all fits in a 64K file.

Now go check out the source code: https://github.com/pontusm/FatalBugs

Cheers!

Windows 8 is here!

Feeling pretty excited after watching the keynote announcing Windows 8 today!

First and foremost, it erases all doubts people have had about Silverlight/XAML going away. It’s not going away; instead it is becoming a primary and native technology for developing Windows apps. Another cool thing is that XAML is no longer restricted to .NET; it can also be used from C++ if you need to squeeze out that extra performance in your app. In addition to this, you can also write apps using HTML5 and JavaScript that run natively on Windows using the same new WinRT APIs. This is great because all those different technologies are good in different situations. Being able to choose the one you like is excellent. To me, it’s like Christmas!

Not only is this a great platform to build stuff for, but it also feels great that you can use the awesome tools Visual Studio and Blend to do it!

All in all, I just wanted to make a quick post about this because it’s such great news. Windows 8 has a lot of new cool features that I recommend you check out.

The preview version of Windows 8 will be available later tonight from http://dev.windows.com so make sure you try it out!

My impressions from MIX10

I’m finally back home after a crazy and wonderful week visiting MIX10 in Las Vegas. It was a great conference and I really had a good time watching the sessions and meeting people. I got the chance to talk to Brad Abrams, Nikhil Kothari and Colin Blair about WCF RIA Services, chatted with the devs working on Bing, talked to Pete Brown about his cool C64 emulator built in Silverlight, met Roland Weigelt, who created the wonderful GhostDoc add-in for Visual Studio, and got a tip about Sonic File Finder from his friend. All in all the atmosphere was great, and a lot of the speakers and people from Microsoft were hanging around chatting with people all the time, which I really appreciated.

So what impact did MIX10 have on me personally? Well, after doing some thinking I want to present the five most important things that made an impression on me:

1 – Windows Phone 7


Microsoft is really pushing the new version of their mobile phone OS. Yeah, they have stolen a lot of ideas from Apple, but it’s not the first time Microsoft has done that. Copying a successful concept is also a way of doing business. However, they’ve introduced some really great innovations as well, improved on some of the shortcomings of the competitors, and I must say that the phone UI feels really slick and modern. I could actually see myself using this instead of my iPhone! 🙂

What made me really interested in the phone is how simple it seems to get started with developing for it. Since it runs Silverlight, it is really easy to get an application up and running if you’re familiar with .NET development. For me, this is radically different from developing for the iPhone or Android. It also supports writing applications in XNA, which is a great framework for developing more advanced applications like games.

Additionally Microsoft has released the tools for free so you can build apps with no up-front investment at all. That’s a pretty nice deal for such great tools as Visual Studio and Blend.

http://developer.windowsphone.com/

2 – OData

The Open Data Protocol (or OData for short) is a pretty awesome concept that Microsoft presented at MIX10.

Many successful internet businesses today offer their users a rich web API that is used to access their service in many different ways with various clients. The web browser is only one type of client in a larger ecosystem of mobile phones, desktop applications and other devices that want to work with data over the web. Twitter is an example of this: they only provide a rudimentary interface on the website, and the experience is greatly enhanced by a large variety of clients available for different platforms.

OData is a REST-based API that aims to be a standard way of accessing services over the web. Microsoft makes it easy to expose an OData interface from your ASP.NET application and provides client libraries for consuming OData from various environments. They obviously provide .NET and WP7 client libraries, but surprisingly they have also developed libraries for accessing OData from PHP, Java and the iPhone. That’s really neat!
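
OData expresses queries as standard system query options in the URL, such as $filter, $orderby and $top. A minimal Ruby sketch that builds such a URL; the service root and the Games entity set below are made-up placeholders, not a real service, and odata_query_url is a hypothetical helper:

```ruby
require "erb"

# Builds an OData-style query URL from a few standard system query options.
# Option values are percent-encoded; the $-prefixed option names stay literal.
def odata_query_url(resource, filter: nil, orderby: nil, top: nil)
  options = []
  options << "$filter=#{ERB::Util.url_encode(filter)}" if filter
  options << "$orderby=#{ERB::Util.url_encode(orderby)}" if orderby
  options << "$top=#{Integer(top)}" if top
  options.empty? ? resource : "#{resource}?#{options.join('&')}"
end

puts odata_query_url("http://example.com/Baseball.svc/Games",
                     filter: "Year ge 1990", orderby: "Year desc", top: 10)
# http://example.com/Baseball.svc/Games?$filter=Year%20ge%201990&$orderby=Year%20desc&$top=10
```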

Another cool thing is a service that goes by the name Codename “Dallas”. This is Microsoft’s solution for people to expose and make money off their data. Even if you’re a small one-man business you can take advantage of this and make money if you have something interesting to share. One guy I met in the “RIA Services suite” had a lot of historical baseball data that he wanted to share, and I think he could make some money with that data.

Check out the keynote from day 2 at about 59:00 for a demo of this stuff:

http://live.visitmix.com/MIX10/Sessions/KEY02

3 – Azure

I saw some demos of Azure at MIX10 and it is starting to look really smooth. I haven’t really had the time to try it out yet, but it is looking more and more compelling. Previously I’ve heard that it’s been quite unstable and buggy, but it looks like that has been straightened out. Maybe it is finally time to start building stuff for the cloud now! 🙂

I really like that you can test your stuff out in the “DevFabric” which is a simulated cloud environment on your dev machine. Then you can for example start by moving up your data storage to the cloud while still running the code on your dev machine and continue testing your solution, and when you’re ready you move it all up into the cloud. It all seemed really simple and all you need to do to get started is to install the Azure SDK which is available from here:

http://dev.windowsazure.com/

4 – IE9

Microsoft is really taking a leap forward with version 9 of Internet Explorer. IE has lost a lot of users over the past years, and Microsoft is really committed to improving its performance and standards compliance. It was interesting to see a demonstration of cases where the new IE version really shines in a head-to-head comparison with other browsers, especially Google Chrome (which is one of the fastest browsers at the moment). Competition like this is really good for the browser market. At the same time Microsoft is trying to help out its competitors by setting up a website with test cases that they can run to check their performance.

Some notable things about IE9 are the GPU-accelerated HTML5 and better standards compliance, for example with rounded and dotted CSS borders. Examples can be found at this address: http://www.ietestdrive.com/

Also check out the keynote from day 2 where they start off talking about the new IE9 engine: http://live.visitmix.com/MIX10/Sessions/KEY02

5 – MVVM

The Model-View-ViewModel design pattern has been around for a while, but it is now finally gaining more and more attention among Silverlight developers and, most importantly, in the tools coming from Microsoft. The new version of Blend will have some really useful features for working in an MVVM-oriented way, for example being able to generate sample data based on a ViewModel class in your project.

I watched a couple of great sessions about MVVM at MIX. If you want to learn more you can check out the introductory session by Laurent Bugnion:

http://live.visitmix.com/MIX10/Sessions/EX14

I also saw the session by Rob Eisenberg and was deeply impressed by his elegant solutions and ideas:

http://live.visitmix.com/MIX10/Sessions/EX15

Conclusion

Okay, so that’s a list of things that I found most important at this conference. Of course, there was a lot more going on so I recommend that you take a look at all the videos that have been released and find the things of interest to you:

http://live.visitmix.com/Videos

Feel free to comment if you think I’ve missed something significant!