Live Fast, Code Hard, Die Young

Google App Engine SDK bug?

I encountered a problem with the Google App Engine development server today and it seemed like a good thing to blog about how I solved it.

I got this error when posting binary data to my locally running development server (GAE 1.8.9 and Python 2.7.2):

ProgrammingError('You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.')

I think the problem was due to some limitation in SQLite, but I didn’t dig very deep into it. So how do you fix it?

I was able to get around the error by simply patching a source file that comes with the Google App Engine SDK. In the file logservice_stub.py you can insert the following line:

(Screenshot: the line added to logservice_stub.py.)

(This file is usually located at /usr/local/google_appengine/google/appengine/api/logservice/logservice_stub.py)
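The screenshot with the exact line has not survived, but the error message itself names the remedy: a text_factory that can handle 8-bit bytestrings. As a rough sketch (plain Python and its standard sqlite3 module, not the SDK’s own code), this is what that setting looks like on a connection. The helper store_and_load is made up for illustration, and that the patched line sets the same attribute on the connection in logservice_stub.py is an assumption:

```python
import sqlite3


def store_and_load(payload):
    """Hypothetical helper (not part of the SDK): store bytes in a TEXT column and read them back."""
    conn = sqlite3.connect(':memory:')
    # On Python 2.7 `bytes` is just an alias for `str`, so this is the
    # `text_factory = str` setting the error message recommends. Without it,
    # Python 2.7 rejects non-ASCII bytestrings with the ProgrammingError above.
    # (On Python 3, bytes are stored as BLOBs and come back as bytes anyway.)
    conn.text_factory = bytes
    try:
        conn.execute('CREATE TABLE logs (message TEXT)')
        conn.execute('INSERT INTO logs VALUES (?)', (payload,))
        (value,) = conn.execute('SELECT message FROM logs').fetchone()
    finally:
        conn.close()
    return value
```

Calling store_and_load(b'\x89PNG\r\n\x1a\n') should hand back the same binary bytes on both Python 2.7 and Python 3.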

Hope this helps!

 

Today I had a really annoying problem on my Mac at work: Finder kept crashing every minute… Highly annoying!

It turned out that the problem for me was Google Drive. The crashes went away when I turned off the option “Show file sync status icons and right click menu”. See screenshot below.

(Screenshot: the Google Drive preferences with the sync status option turned off.)

Hopefully Google will fix this issue soon.

Gitly bug hunting

I wrote this article when I worked a lot on Gitly back in 2012 but somehow never got around to publishing it. I thought it was too technical. Maybe someone will find it useful though so I’m posting it now for fun…

Just the other day I was chugging along nicely with the development of Gitly (my own Git client) when suddenly I stumbled across a peculiar issue. Gitly claimed that a file in the working directory was modified even though I knew it wasn’t. No other Git client reported it as modified either, so clearly it must be a bug in my code, right?

Obviously that was my initial thought as well, so I started bug hunting. It must be a problem with my code for calculating the file’s SHA-1 hash, I thought. Well, it wasn’t. After debugging it for a while I got even more puzzled. It seemed that my code was right but that the values inside the Git repository were wrong! How could that be possible?!

The mystery

  • My program thinks that a file has been modified.
  • I know that the file was not modified.

Seems like a bug that should be easy to fix, right? Well, it took me deep down the rabbit hole, which is why I wanted to write about it…

(Bear with me if this is a little too technical…)

First, let’s take a step back and look at the whole picture here…

To find out what has changed in the working directory, I compare the files of the latest commit in the repository with the files in the working directory. Each file has a SHA-1 hash that identifies it, so to find out whether a file is modified all I have to do is compare the hashes. The file in question is a C# file called AppBootstrapper.cs that I knew for a fact hadn’t changed, yet Gitly concluded that the two versions have different hashes and must therefore differ. My code calculated the hash of the file in the working directory to be 2bdb764a6d2a8c7d92dc3f194f8a612c1f524795, but AppBootstrapper.cs is stored in the Git repository under a different hash, eb72ad7dc91b71eba1fe6bf06b7186ed4c94a65b. Clearly something must be wrong here. Probably I am calculating something the wrong way, right?
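For reference, Git does not hash a file’s raw bytes on their own: a blob’s ID is the SHA-1 of a short header (the word blob, a space, the content length in bytes and a NUL byte) followed by the content. A minimal Python sketch of that calculation, equivalent to git hash-object --no-filters, i.e. without any line-ending conversion (the function names are made up for this example):

```python
import hashlib


def git_blob_hash(content):
    """Return the SHA-1 that Git uses as the ID of a blob holding `content` (bytes)."""
    header = b'blob ' + str(len(content)).encode('ascii') + b'\x00'
    return hashlib.sha1(header + content).hexdigest()


def file_blob_hash(path):
    """Hash a working-tree file exactly as stored on disk (no CRLF conversion)."""
    with open(path, 'rb') as fh:
        return git_blob_hash(fh.read())
```

By default, git hash-object <filename> first applies any configured line-ending conversion (such as core.autocrlf) before hashing; this sketch does not.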

Let’s set the stage here. We have two hash codes for the same file, and one of them must be wrong! Since hash codes are pretty long and scary, let’s refer to them as hash A and hash B:

  • Hash A: 2bdb764a6d2a8c7d92dc3f194f8a612c1f524795
  • Hash B: eb72ad7dc91b71eba1fe6bf06b7186ed4c94a65b

The theory I have now is that my code calculates hash A for the file but it must be doing it wrong. The correct hash should be hash B. Let’s examine the repository to see if I am right.

To dig into the repository I use the “real” Git command line client for Windows (msysgit). What does it say about this file?

$ git status AppBootstrapper.cs
# On branch master
nothing to commit (working directory clean)

Alright, this seems fine. The file is not modified according to msysgit. Let’s see if we can find out more. To calculate the hash for a file you can use the command git hash-object <filename>. Here is the result:

$ git hash-object AppBootstrapper.cs
2bdb764a6d2a8c7d92dc3f194f8a612c1f524795

What?! This is not what I expected. This hash is exactly the same as the one my code produces. Something is fishy here.

Next step, dear Git – what is it that you have stored in the repository? It sure does not seem to be the same file that we have on disk. Let’s dig even deeper…

We can dump the contents of a Git object using git cat-file -p <hash> like this:

$ git cat-file -p 2bdb764a6d2a8c7d92dc3f194f8a612c1f524795
error: unable to find 2bdb764a6d2a8c7d92dc3f194f8a612c1f524795
fatal: Not a valid object name 2bdb764a6d2a8c7d92dc3f194f8a612c1f524795

Uh oh, there is no object in the Git repository with that hash code. What about the other hash, then?

$ git cat-file -p eb72ad7dc91b71eba1fe6bf06b7186ed4c94a65b
´╗┐using System;
using System.ComponentModel;
using System.Linq;
using System.Reflection;
...

Okay, this is clearly the file we are looking for. However, there is some garbage at the start of the file. Could it possibly be the UTF-8 byte order mark? Maybe that is what is causing problems? Let’s dig a little deeper…

Git stores everything quite logically, so looking up an object on disk is straightforward (as long as the repository is not in a packed format). I found the file stored under .git/objects/eb/72ad7dc91b71eba1fe6bf06b7186ed4c94a65b.

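Okay, what can we do? I found that Git has another command for dumping the contents of a blob, where special characters show up escaped (strictly speaking, the escaping is done by the pager, less, that git show sends its output through). You simply write git show <hash> to use it. Let’s see what it gives us: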

$ git show eb72ad7dc91b71eba1fe6bf06b7186ed4c94a65b
<EF><BB><BF>using System;
using System.ComponentModel;
using System.Linq;
using System.Reflection;
using Autofac;

Nifty! The characters are escaped for us, and now we can clearly see that it is indeed the UTF-8 BOM, which consists of the byte sequence 0xEF, 0xBB, 0xBF.

The problem is that the file on disk also has this exact same byte order mark. No luck there…
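Checking for the BOM by hand is simple, since it is just those three bytes at the very start of the file. A small sketch using the BOM_UTF8 constant from Python’s codecs module (the helper names are made up for this example):

```python
import codecs


def has_utf8_bom(data):
    """Return True if `data` (bytes) starts with the UTF-8 byte order mark."""
    # codecs.BOM_UTF8 is the three-byte sequence b'\xef\xbb\xbf'
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path):
    """Check a file on disk; reading in binary mode keeps the BOM intact."""
    with open(path, 'rb') as fh:
        return has_utf8_bom(fh.read(len(codecs.BOM_UTF8)))
```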

With a little help from some Ruby, we can unpack the object from the Git repository and take a look inside:

>ruby -rzlib -e 'print Zlib::Inflate.new.inflate(STDIN.read)' < ./eb/72ad7dc91b71eba1fe6bf06b7186ed4c94a65b
-e:1:in `inflate': invalid distance too far back (Zlib::DataError)
 from -e:1:in `<main>'

What the heck? There seems to be an error in the compressed file?

$ git fsck
Checking object directories: 100% (256/256), done.
dangling blob 4e81d9100e10ececbb12d8375710047d0a8a7b25
dangling blob 4b7a370cf24b0a9aec69950ffbcb51c5920437e0

No problem with the repository there.
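Since git fsck is happy, the object itself is most likely fine and the problem lies in how it was read. One plausible cause, an assumption not verified here, is that on Windows Ruby reads STDIN in text mode, where CRLF translation can corrupt binary zlib data. A loose object is simply the zlib-compressed header plus content, stored at .git/objects/<first two hex digits>/<remaining 38>. A Python sketch that reads one in binary mode (function names made up for this example):

```python
import os
import zlib


def loose_object_path(git_dir, sha):
    """Loose objects live at <git_dir>/objects/<first 2 hex digits>/<remaining 38>."""
    return os.path.join(git_dir, 'objects', sha[:2], sha[2:])


def read_loose_object(git_dir, sha):
    """Inflate a loose object and return (type, content)."""
    with open(loose_object_path(git_dir, sha), 'rb') as fh:  # binary: no newline translation
        raw = zlib.decompress(fh.read())
    header, _, content = raw.partition(b'\x00')  # header looks like b'blob 1234'
    obj_type, size = header.split(b' ')
    if int(size) != len(content):
        raise ValueError('object size in header does not match content')
    return obj_type.decode('ascii'), content
```

For the object in this story, that would be read_loose_object('.git', 'eb72ad7dc91b71eba1fe6bf06b7186ed4c94a65b').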

OK, this calls for bigger guns! Let’s see what libgit2 does with this file…

I wrote a little program that calls libgit2 to get the hash code for the file, like this:

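#include <iostream>
#include <git2.h>  // early (2012) libgit2 API: newer releases drop GIT_SUCCESS (success is 0) and rename GIT_OBJ_BLOB to GIT_OBJECT_BLOB

int main()
{
    // Hash the raw bytes of the working-tree file as a blob object
    git_oid hash;
    if (git_odb_hashfile(&hash,
                         "d:\\projects\\current\\twitch\\source\\twitch\\AppBootstrapper.cs",
                         GIT_OBJ_BLOB) != GIT_SUCCESS)
        return -1;

    char hex[GIT_OID_HEXSZ + 1] = {0};  // git_oid_fmt writes 40 hex digits without a terminating NUL
    git_oid_fmt(hex, &hash);
    std::cout << "Hash: " << hex << std::endl;
}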

Now guess what! This little bugger gives me hash code B! Very interesting! Here is a screenshot of my debugging session showing the hash code I got:

Digging into the git_odb_hashfile function a bit more, I learned that it simply hashes the raw data of the file without considering the core.autocrlf flag. Aha! It’s starting to make some sense now. It’s a bug in libgit2!

In this memory dump we can clearly see a lot of similar sequences, which I have highlighted; they are nothing but Carriage Return and Line Feed characters, i.e. standard Windows line endings. In other words, the file was committed to the repository without stripping the CR characters, probably because it was committed before I happened to turn on the autocrlf flag.
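The effect is easy to reproduce: the same text hashes to two different blob IDs depending on whether the CR characters are still there. A minimal sketch, assuming the simplest form of autocrlf conversion on commit (every CRLF pair replaced by LF); Git’s real conversion has more special cases, such as skipping files it detects as binary:

```python
import hashlib


def git_blob_hash(content):
    """SHA-1 of a Git blob: the header b'blob <length>' plus a NUL byte, then the content."""
    header = b'blob ' + str(len(content)).encode('ascii') + b'\x00'
    return hashlib.sha1(header + content).hexdigest()


def normalize_eol(content):
    """Simplified stand-in for core.autocrlf on commit: CRLF -> LF."""
    return content.replace(b'\r\n', b'\n')


windows_text = b'using System;\r\nusing System.Linq;\r\n'
print('as stored on disk :', git_blob_hash(windows_text))
print('after CRLF -> LF  :', git_blob_hash(normalize_eol(windows_text)))
```

The first value corresponds to what libgit2’s raw hashing sees, the second to what a client that honours autocrlf computes.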

Of course it’s pretty obvious once you have all the facts but these things can really make you go crazy before you solve them.

Line endings are the bane of Git and I don’t know why this mess exists. In my opinion end users should not have to worry about this at all; it should all just work. Unfortunately, I don’t have a solution for this.

So there you go, mystery solved! It was the bane of git – the line endings…

If you want to learn more about this problem you should check out this blog post: http://timclem.wordpress.com/2012/03/01/mind-the-end-of-your-line/

In this case I’m guessing that msysgit handles this by doing an actual diff on the contents. The diff will strip out line ending differences so the file will appear unchanged even if the hash codes mismatch. There you go, some git internals revealed!

Fixing the repository…

First I added a .gitattributes file with the following line:

# Ensure text files are normalized
* text=auto

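This ensures that everyone working with the repository uses the same line-ending setting, which should keep us from running into more problems like this in the future. Then I deleted the index and rebuilt it with git reset, which makes Git re-examine every file in the working directory: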

$ rm .git/index
$ git reset
Unstaged changes after reset:
M Source/Twitch/AppBootstrapper.cs

Nifty! Git has now realized that this file needs to be updated. I just committed this changed file and that was it…

Changing the Git editor

I am somewhat annoyed by the fact that the command line version of git uses ‘vim’ as the default text editor. Yes, Vim may be cool and all but it is definitely not friendly to new users. It is an advanced tool and it should not be exposed to poor innocent souls who are just getting started with git. I want git to be lovely and cuddly to newbies – not hostile!

So, the best thing to do is to simply change the default editor that git uses when it wants you to fill in some stuff. You can do this with the following command:

On a Mac:

$ git config --global core.editor "nano"

On a PC:

C:\> git config --global core.editor "notepad"

You can replace “notepad” or “nano” with your editor of choice.

Of course, you could also use a decent GUI client instead, if you feel intimidated by command line tools. :)

I always thought of Python as a language mostly useful for doing stuff on Linux machines. You know, like a pretty nice little language for scripting tasks but not useful for doing advanced stuff.

Well, I have changed my mind…This past year I have had the pleasure of working a lot in Python and I have come to like it very much. Python is the bomb!

So what have I done with it? Well…

  • I have built massively scalable cloud services running on Google App Engine.
  • I have built command line tools for intelligently packing sprite sheets, extracting font information, performing custom encryption, modifying Xcode projects and more…
  • I even started working on a cross platform Git client application…

…and I must say that I have really enjoyed it very much.

Python has some really nice benefits:

  • The code is easy to read – very clean and compact
  • It is cross platform
  • Can be packaged as an exe (with tools such as py2exe or PyInstaller)
  • Has lots and lots of third party libraries to do anything
  • Good integration with C
  • With Cython you can transform Python to fast performing C code
  • Interactive console (easy to try things out)
  • Large community
  • It is wrist friendly – no curly bracket typing :)

But is it really useful for any serious development? Well, you tell me! Many well-known companies and projects rely on Python, such as Google, Dropbox, Spotify and Mercurial, to name a few…

But… can I really use it on Windows? Python originated in the Unix world, so it can’t work properly on Windows, right? Hmm, I think this is the biggest hurdle for Python acceptance. Many people think it is hard to use on Windows, and yes… that was my notion too until I tried it. As a matter of fact, it is easy to install and works really well. Let’s take a look!

How to get started – Python essentials

Before getting started, make sure you have Chocolatey – the lovely package manager for Windows.

I believe there are three essential things you need to have a basic Python setup on your system:

  1. Python itself
  2. A Python package installer
  3. Virtualenv for Python

The first two are the most important, so I will leave the last one for a future post.

Installing Python
With Chocolatey it is very easy to install the latest version of Python:

C:\> cinst python.x86

This will install the 32-bit version, which I recommend for now.

UPDATE: This command will install the latest version which is now Python 3.4. If you want to run an older version you can supply a version argument like this:

C:\> cinst python.x86 -Version 2.7.6

This installs a specific version of Python.

Once this is done you can start writing code in any text editor you like. Notepad works, but it is nice to have syntax highlighting, and I recommend either Sublime Text or Notepad++. Even better is a full-featured IDE such as PyCharm, which I use. It has more advanced features that make your life easier!

Installing PIP
Once you start working with Python you realize that you sometimes need 3rd party packages. To install them you need the Python package manager PIP. You can install PIP using easy_install and easy_install can be installed with Chocolatey. Isn’t that lovely? :)

C:\> cinst easy.install

After you have run this, close the command prompt window and open a new one to ensure that the environment variables are reloaded. Then run this:

C:\> easy_install pip

When this is complete you are all set to go. You can now run Python scripts directly from the command prompt, and you can even use pip to install handy command-line tools written in Python.

What about virtualenv?
I told you there are three essentials you need on your Python system. The last one is virtualenv. All packages you install with PIP will be installed globally unless you use virtualenv. For now, that is OK. In a future post I will explain how to use virtualenv to set up isolated development environments.

Using Python

To show a little example of how you can use Python, open up a text editor and paste in the following Python code. This short snippet will calculate the SHA-1 hashes of all files in the directory that you are in.

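from __future__ import print_function  # lets print() work the same on Python 2.7 and 3.x
import hashlib, os
# Print the SHA-1 hash and the name of every file in the current directory
for f in os.listdir('.'):
    if os.path.isfile(f):
        print(hashlib.sha1(open(f, 'rb').read()).hexdigest(), f)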

Save it as hash.py and run it from the command prompt (python hash.py). It will show something like this:

27286076704f65d178baa46e66232a9e20d7e3dc diskstation-dir.bat
b8f7f52219520b64cac9c8ad49f533ed40cbbac8 fix-vs.py
6ebe48a8f124447600cb2bcf4135b5586933d0d2 import-itunes.bat
200b0b05edd6638aa43f1281f7a5f77bb2517303 killsteam.cmd
c86b8442e1e0561ca1715874a9d8fddda4a0d965 md5hash.py

 

That’s quite nice for only four lines of code. Python is often like that – few lines of code and high productivity. I like that.
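One caveat: read() loads each file into memory in one go, which is fine for small files but wasteful for large ones. A variant that feeds each file to hashlib in chunks keeps memory use flat; the function name and the 64 KiB chunk size are arbitrary choices for this sketch:

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=64 * 1024):
    """Return the hex SHA-1 of a file, reading it chunk_size bytes at a time."""
    digest = hashlib.sha1()
    with open(path, 'rb') as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b''
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == '__main__':
    for name in sorted(os.listdir('.')):
        if os.path.isfile(name):
            print(sha1_of_file(name), name)
```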

For some reason Python is not very popular on Windows. I hope this will change in the future. If you have never tried it – give it a spin. You may like it!

 

As a long-time Windows user, I have always found UIs nice and easy to use for various tasks. However, some things are actually quite a bit nicer to do from the command line. One such thing is installing programs. Installation is usually a tedious procedure: you have to find an installer to download, run it and carefully opt out of any crap bundled with the software, etc. It is a series of steps that differ from program to program and they are not especially fun, just something you have to do.

So what can we do about this? Well, we can use Chocolatey!

With Chocolatey, installing any program becomes a single command-line operation. This means you have to think less about what to do: just look up the package you need and type it in. Package names are also easy to remember, or you can keep them in a text file so you have them at hand when you need to reinstall or upgrade apps. It’s very nice.
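Keeping the package names in a text file pays off when setting up a new machine. As a sketch, a small Python script could read such a file and run cinst for each entry. The script, its helper names and the file format (one package per line, # for comments) are all made up for this example:

```python
import subprocess
import sys


def read_packages(text):
    """Parse one package name per line, skipping blank lines and # comments."""
    packages = []
    for line in text.splitlines():
        name = line.split('#', 1)[0].strip()
        if name:
            packages.append(name)
    return packages


def install_all(packages):
    """Run `cinst <name>` for each package; stop at the first failure."""
    for name in packages:
        # shell=True lets cmd.exe find cinst even if it is a .bat shim (assumes Windows)
        subprocess.check_call(['cinst', name], shell=True)


if __name__ == '__main__':
    # Hypothetical usage: python install_packages.py packages.txt
    with open(sys.argv[1]) as fh:
        install_all(read_packages(fh.read()))
```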

Chocolatey has been around for a few years and I actually tried it when it was released, but found it a bit rough around the edges back then. Now it works much better! The number of packages available has also gone up considerably.

Installing Chocolatey

It is super easy to install Chocolatey. Just open up a command prompt (hit Windows+R, type cmd and press Enter) and then paste in the following:

C:\> @powershell -NoProfile -ExecutionPolicy unrestricted -Command "iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1'))" && SET PATH=%PATH%;%systemdrive%\chocolatey\bin

 

Yeah, this is a mouthful. It runs a PowerShell command that downloads and executes a script that installs Chocolatey, and then adds Chocolatey’s bin folder to the PATH for the current command prompt session. This could have been done with a regular installer program, but this way promotes installation via the command prompt and makes it easy to get going.

Using Chocolatey

Once you have installed Chocolatey you can go ahead and install programs from the command prompt by simply typing cinst:

C:\> cinst vlc

 

This command will install the latest VLC for you in one simple step (you need to click Yes in the UAC prompt though). This works even if you already have VLC installed. Go ahead and try it now!

Just for fun, I recorded a video of the installation procedure to show how easy it is to set up.

 

So, there you have it – package management for Windows. With my Mac I use Homebrew and it is lovely and I’m really happy to see the same type of tool coming to Windows too. If you haven’t tried it yet – now is the time! :)

Here is a quick tip if you find that the autocomplete popup in AppCode takes a while to appear. I discovered that there is a setting that, by default, delays the autocomplete popup by one second.

To change this just open up preferences and navigate to Editor->Code completion and adjust the delay as shown in the screenshot below:

(Screenshot: the code completion delay setting in AppCode’s preferences.)

Hope this helps!
