Issues with Using UTF-8 Encoded Feature Files in Cuke4Nuke

20. January 2010

UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is also the default setting in some text editors, such as Notepad++ on Windows and TextMate on the Mac; according to its manual, TextMate is heavily biased toward UTF-8.

I do most of my text editing in Notepad++, and only today, while working at a client, discovered that its UTF-8 default was causing a couple of strange things to happen when Cuke4Nuke processed our feature files.

Saving Cuke4Nuke feature files as UTF-8 will likely show up as odd characters printed before the word Feature on the first line of the output. They are always the same three characters, which points to the header of the UTF-8 file, most likely the byte order mark (BOM): the three bytes EF BB BF that some editors write at the start of a UTF-8 file. This is a minor issue and should not affect how the tests run. You can see it clearly in the Google example that comes with Cuke4Nuke.
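To see where the three characters come from, the short Python sketch below (Python is used only for illustration) prepends the UTF-8 BOM to a hypothetical feature file and decodes the first line with two legacy Windows code pages, as a BOM-unaware tool might. Which garbage you see depends on the code page assumed: Windows-1252 ("ANSI") gives ï»¿, while code page 437, used by many English-language Windows consoles, gives ∩╗┐. The feature text and the function name are made up for the example.

```python
import codecs

# The UTF-8 byte order mark (BOM): U+FEFF encoded as the three bytes EF BB BF.
BOM = codecs.BOM_UTF8


def first_line_misread(raw: bytes, codepage: str) -> str:
    """Decode the first line of a file with a legacy single-byte code page,
    the way a BOM-unaware tool might, so the BOM shows up as visible junk."""
    return raw.split(b"\n", 1)[0].decode(codepage)


# Hypothetical feature file as saved by an editor in "UTF-8 with BOM" mode.
feature = BOM + "Feature: Search\n  Scenario: Find a page\n".encode("utf-8")

if __name__ == "__main__":
    print(feature[:3].hex())                      # efbbbf
    print(first_line_misread(feature, "cp1252"))  # ï»¿Feature: Search
    print(first_line_misread(feature, "cp437"))   # ∩╗┐Feature: Search
```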


The other, more serious, issue was that when I placed a tag such as @wip (work in progress) on the first line of the feature file, before the Feature line, the --tags option did not filter on that tag. For example, with the following entry in our NAnt build file, the feature we were working on (tagged with @wip) should have been skipped in the CruiseControl.NET build, but it was not.

<exec program="${cuke4nuke}" workingdir="${ui.source.dir}Test">
  <arg value="${ui.source.dir}Test\bin\Release\Test.dll"/>
  <arg value="--tags ~@wip"/>
</exec>

Cuke4Nuke still executed all the scenarios in the @wip-tagged feature, even though we had not yet completed any of them.
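Why would a BOM break tag filtering? One plausible explanation, and it is only an assumption not confirmed against the Cuke4Nuke or Cucumber source, is that the BOM decodes to the invisible character U+FEFF and stays glued to the start of the first line, so the parser sees "\ufeff@wip" instead of "@wip". The hypothetical, heavily simplified tag parser below (not the real Cuke4Nuke or Cucumber code; leading_tags and skipped_by are made-up names) shows how that one invisible character makes a ~@wip exclusion silently miss.

```python
from typing import Set


def leading_tags(feature_text: str) -> Set[str]:
    """Hypothetical, simplified parser (not Cuke4Nuke's or Cucumber's code):
    collect @tags from the lines before the 'Feature:' line."""
    tags: Set[str] = set()
    for line in feature_text.splitlines():
        stripped = line.strip()  # note: strip() does NOT remove U+FEFF
        if stripped.startswith("Feature:"):
            break
        # Only whitespace-separated tokens that begin with '@' count as tags.
        tags.update(tok for tok in stripped.split() if tok.startswith("@"))
    return tags


def skipped_by(tags: Set[str], excluded_tag: str = "@wip") -> bool:
    """True if a '--tags ~@wip' style exclusion would skip the feature."""
    return excluded_tag in tags


plain = "@wip\nFeature: Search\n"
with_bom = "\ufeff" + plain  # what a BOM-unaware UTF-8 decode would produce

if __name__ == "__main__":
    print(skipped_by(leading_tags(plain)))     # True: feature is skipped
    print(skipped_by(leading_tags(with_bom)))  # False: the BOM hides @wip
```

Because U+FEFF is not whitespace, neither strip() nor split() removes it, so the first token no longer starts with "@" and the tag simply disappears instead of raising an error.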

Richard Lawrence suggested I change the encoding of the files to ANSI (in Notepad++ just select Format->Encode in ANSI), and that did the trick. So it would seem that Cuke4Nuke is not currently handling UTF-8 encoded feature files correctly, and ANSI-encoded feature files are the way to go until this issue is resolved.


NOTE: Here is an update from Aslak Hellesøy:

Try to save as UTF-8 *without* BOM. I think that's possible with Notepad++. It's a good thing that TextMate defaults to UTF-8. Every time I try to use my name, Aslak Hellesøy, in an application and it's rendered as garbage, it's always because the developers have used a different encoding.
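Following Aslak's suggestion, existing feature files can be re-saved as UTF-8 without a BOM in bulk rather than one at a time in the editor. The Python sketch below is one hypothetical way to do it (the script name strip_bom.py, the function names, and the *.feature glob are all assumptions): it drops the three BOM bytes and leaves every other byte, including line endings and non-ASCII characters such as ø, unchanged.

```python
import codecs
import sys
from pathlib import Path

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def without_bom(data: bytes) -> bytes:
    """Return data minus a leading UTF-8 BOM; all other bytes are untouched."""
    return data[len(BOM):] if data.startswith(BOM) else data


def strip_bom_in_place(path: Path) -> bool:
    """Rewrite path as UTF-8 without BOM. Returns True if the file changed."""
    data = path.read_bytes()
    cleaned = without_bom(data)
    if cleaned == data:
        return False
    path.write_bytes(cleaned)
    return True


def main(argv: list) -> None:
    if len(argv) != 2:
        print("usage: strip_bom.py <features-dir>")
        return
    # Hypothetical layout: every *.feature file below the given directory.
    for feature in sorted(Path(argv[1]).rglob("*.feature")):
        if strip_bom_in_place(feature):
            print(f"removed BOM: {feature}")


if __name__ == "__main__":
    main(sys.argv)
```

For example, running python strip_bom.py features would rewrite every BOM-prefixed .feature file under a features directory. Unlike converting to ANSI, this keeps any characters that the ANSI code page cannot represent.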
