
OEM and Ansi codepages vs Unicode

NOTE: The explanation below is not up to date with version 2.0 and above, which now supports Unicode. Most of the page was written with version 1.x of ZTree in mind.

General issue

Windows GUI programs typically use different character sets from DOS/console programs. DOS/console character sets are known as OEM character sets (or IBM PC (OEM) code pages), and include various line drawing characters and miscellaneous characters not in the Windows/ANSI sets. Each OEM set contains 256 characters. The most common one is OEM 437. The Windows/ANSI character sets introduced by Microsoft include many accented characters NOT in the DOS/OEM character sets. Windows ANSI sets are loosely based on the ISO 8859-1 standard, but there are (as usual) many criticisms of the way Microsoft tweaked the standard to fit its needs. The most common set is Windows-1252, which contains 256 code points but only 218 displayable characters.

ZTreeWin can use both character sets, but not both at the same time. Because ZTreeWin is a console application, it natively uses the OEM character set, which is how it displays borders with line drawing characters. The Windows GUI natively uses the ANSI character set. To avoid differences in filenames from a user's point of view, ZTreeWin by default translates all ANSI characters to their OEM equivalents and displays filenames the same way other Windows applications do. This works in the vast majority of cases because many ANSI characters have an OEM equivalent, even if it has a different binary representation.

There are shortcomings in the translation process, however: as the table below shows, some ANSI characters can't be represented in OEM (and vice versa). For most English speakers this is not a frequent problem, as all common ANSI symbols have an equivalent in OEM.
ANSI char   ANSI code   OEM char   OEM code
A           65          A          65
é           233         é          130
É           201         É          144
Ê           202         (no equivalent)
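The lookups in this table can be reproduced with a minimal Python sketch. It assumes Windows-1252 as the ANSI code page and OEM 437 as the OEM code page, and uses Python's built-in codecs rather than whatever translation ZTreeWin performs internally; the function name is only illustrative.

```python
def ansi_to_oem(ch, ansi="cp1252", oem="cp437"):
    """Return (ANSI code, OEM code) for one character; OEM code is None if missing."""
    ansi_code = ch.encode(ansi)[0]
    try:
        oem_code = ch.encode(oem)[0]
    except UnicodeEncodeError:
        oem_code = None  # the character does not exist in the OEM code page
    return ansi_code, oem_code


for ch in "AéÉÊ":
    print(ch, *ansi_to_oem(ch))
```

A result of None for the OEM code marks a character that has no OEM equivalent under these two code pages.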
When a filename contains a character with no OEM equivalent, ZTreeWin isn't able to handle the file as expected. Typically, it will issue an Error 02: File not found for the mistranslated file, and no operation can be carried out on that file. For such cases, ZTree has the /O command line switch to DISABLE the translation. When ZTree is launched with that switch, filenames are displayed using OEM characters without translation. This means the ANSI binary values are taken as they are, without trying to find an OEM equivalent. Some characters may be displayed as the wrong character, but ZTree is at least capable of handling all directories and files.
ANSI char   ANSI code   OEM char   OEM code
A           65          A          65
é           233         Θ          233
É           201         ╔          201
Ê           202         ╩          202
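The untranslated display can be sketched the same way: the ANSI byte value is kept and simply drawn with the OEM glyph at that position. This is an illustration with Python's cp1252 and cp437 codecs, not ZTree's own code, and the function name is hypothetical.

```python
def show_untranslated(ch, ansi="cp1252", oem="cp437"):
    """Keep the ANSI byte value and look up the glyph the OEM code page draws for it."""
    code = ch.encode(ansi)[0]
    return code, bytes([code]).decode(oem)


for ch in "AéÉÊ":
    code, glyph = show_untranslated(ch)
    print(ch, code, glyph)
```

Characters below 128 are identical in both sets, which is why plain English filenames look the same in either mode.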
Again, for most English speakers this is usually not a big issue, as the untranslated characters are seldom-used symbols. But it may become quite annoying for a French or German speaker to see the many accented characters common in those languages displayed incorrectly. It is even more annoying that under /O mode it is not possible to type filenames with accented characters. Now come two complications.

Multiple codepages

Unfortunately, there is no single OEM character set. On the contrary, there are many of them! Each language or region typically has its own character set, which contains the special symbols and letters required in that language. These variations are called codepages. To make things worse, the order of some characters differs from one set to another. Finally, there are sometimes several variants for the same language, usually competing codepages developed at different times and under different operating systems. This makes translation from one set to another even more difficult.

In the examples above, translation is only between ANSI-1252 and OEM-437. You can actually change the codepage in the Windows shell with the chcp command (see the Windows documentation). When you change the codepage, the OEM characters won't be displayed the same way. But wait! There are different ANSI codepages too! The number of different character sets is amazing. See the third tab of your Regional Settings configuration panel to get an idea of the potential mess, and of the inherent impossibility of designing a perfect translation scheme between OEM and ANSI.

If you want to see what your current codepages are, and what characters can be displayed, download OEM_ANSI2UTF.exe (45.06 Kb). This works best if you create a shortcut and set the font to Courier New or Lucida Console.
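To see how much the codepage matters, the following Python sketch decodes one and the same byte value under several codepages. The three codepages listed are just examples chosen for illustration; any codec name Python knows can be added to the list.

```python
# Example codepages: two OEM sets (437, 850) and one ANSI set (1252).
CODE_PAGES = ["cp437", "cp850", "cp1252"]


def decode_everywhere(byte_value, code_pages=CODE_PAGES):
    """Return the character each codepage assigns to a single byte value."""
    return {cp: bytes([byte_value]).decode(cp) for cp in code_pages}


# Byte 231 is "ç" in Windows-1252 but something else in the OEM sets.
for cp, ch in decode_everywhere(231).items():
    print(cp, ch)
```

The same stored byte therefore shows up as a different letter depending on which codepage the console happens to be using.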

Unicode

Both OEM and ANSI systems are limited in size to 256 characters, which does not leave room for many characters. This is exactly why different code pages were created. But juggling between different code pages is not easy. Moreover, many languages, especially Asian languages, require more characters than the 256 positions allowed by the ANSI and OEM sets. As a result, there are languages that can't be handled easily with these systems, even with different code pages. So Windows adopted Unicode, a gigantic, cross-platform character set that has room for more than a million characters (again, Microsoft has been criticized for a proprietary implementation of an international standard, but this is another issue...). Unicode allows Greek, Hebrew, Japanese and many other non-Western scripts to be written within the SAME set. Windows Explorer supports a limited version of Unicode for most of its components, and thus allows filenames and directories to be written with Unicode characters.

Unfortunately, ZTree does not support Unicode, and obviously the vast majority of Unicode characters can't be translated to either OEM or ANSI. As a result, ZTree is literally a dead duck when it comes to handling Unicode filenames. It's either gibberish or Error 123: Invalid filename. There is no "magic switch" this time to handle these filenames with ZTreeWin. You must use Windows Explorer for these special cases. The UnicodeAdapter utility can help with accessing the offending file from the ZTreeWin Application Menu (F9).
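A quick way to check whether a given filename survives in a 256-character set is to try encoding it. The Python sketch below does this with the cp1252 and cp437 codecs; the sample filenames are made up for illustration, and the check says nothing about how ZTree itself reacts.

```python
def representable(name, code_page):
    """True if every character of the name exists in the given code page."""
    try:
        name.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative names: plain ASCII, accented Latin, and Greek.
names = ["Normal filename.doc", "Lettre à François.doc", "αρχείο.txt"]

for name in names:
    print(name, representable(name, "cp1252"), representable(name, "cp437"))

# Forcing the conversion replaces every unknown character with "?".
print("αρχείο.txt".encode("cp1252", errors="replace"))
```

Once characters have been replaced by "?", the original name cannot be recovered, and "?" is not a legal character in a Windows filename anyway.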

Illustrations

Image Filenames under Windows Explorer
  • First filename uses ANSI with accented letters
  • Second filename uses Unicode (Greek letters)
  • Third filename also uses ANSI but only with common characters
Image Filenames under ZTreeWin (ANSI translated to OEM = default)
  • The Normal filename.doc has been translated correctly from ANSI to OEM with no visible change. ZTree will handle this document without any problem.
  • The Weird filename, however, is not translated correctly. In Windows Explorer, the two apostrophes are different, one being straight, the other being a curly typographic apostrophe. The ANSI->OEM translation, however, turns both of them into straight apostrophes. As a result, ZTree won't be able to handle this file. This is unfortunately a pretty common error with some foreign languages. If I write a new bestseller in French and save my manuscript with Word as "L'été était chaud" (lit. "The summer was hot"), I'll end up with a typographic apostrophe, and consequently with problems under ZTree. This happens daily. Another source of errors is *.url bookmarks from websites with symbols unique to ANSI / ISO 8859-1 in the page title.
  • As for the filename with Greek characters, it's just gibberish and ZTree can't handle it.
Image Filenames under ZTreeWin (OEM without translation = /O switch)
  • In this mode, the Weird filename is now displayed with OEM characters without translation. You'll see that there is no logical correspondence between ANSI and OEM (the copyright symbol is now a registered symbol, other accented characters are gibberish), but at least ZTree will be able to handle the file. You can delete it, move it, and even rename it (usually to remove the offending characters). It may however be very difficult to rename it using accented characters. If the file has to be renamed as Lettre à François.doc, how can somebody 1) remember that "ç" is replaced by an "Icelandic theta" and 2) enter that Icelandic theta on the keyboard? This counterintuitive situation limits the use of the /O switch mainly to error-fixing operations.
  • Unicode characters are still gibberish, however, and ZTree still can't handle them. There is nothing ZTree can do about this file. You're doomed to Windows Explorer.
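The curly-apostrophe problem described for the Weird filename can be sketched in Python. The one-entry substitution table below is hypothetical and only mimics the single replacement mentioned above (typographic apostrophe to straight apostrophe); it is not the real Windows or ZTree translation table, and OEM 437 is assumed as the console codepage.

```python
# Hypothetical "best fit" table: only the substitution described in the text.
BEST_FIT = {"\u2019": "'"}  # curly apostrophe -> straight apostrophe


def to_oem(name, oem="cp437"):
    """Translate a name to OEM bytes, applying the illustrative table first."""
    return "".join(BEST_FIT.get(c, c) for c in name).encode(oem)


def round_trip(name, oem="cp437"):
    """Name as the console program would hand it back to the file system."""
    return to_oem(name, oem).decode(oem)


original = "L\u2019été était chaud.doc"
seen_by_console = round_trip(original)

# The names differ, so a lookup by the translated name finds no such file.
print(seen_by_console == original)
print(round_trip("Normal filename.doc") == "Normal filename.doc")
```

Because the translation is lossy in one direction, the name the console program asks for no longer matches the name stored on disk, which is consistent with the "File not found" behaviour described earlier.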

How to deal with these issues

One way around the problem is to open Explorer, and rename or delete the file. EXecute the following to open Explorer positioned at the file (does not work properly in Win98 due to an OS bug). These will not work if ZTree cannot handle the file name, but if you select a file next to the offending file, you will be close to the target. (Make sure ZTree is sorted by Name first.)

NT, 2K, XP:
Explorer /n, /root, %2:%3, /select, %4.%5

9x, ME:
Explorer /e, /select, %1
Better still, include these in the Application Menu, as they can be handy. More complete versions of these are in the Sample ZTree Application Menu F9.

If you come across files that neither ZTree nor any other Windows program can delete, try the following:

For FAT32 drives: Simply reboot using Win98's DOS, then run XTree to find and delete the files. I have all my Win2k machines set up so that this "Win98 DOS, then XTree" option is offered on my OS selection menu every time I boot. That same option automatically loads the CuteMouse driver, then XTree, so I can more easily navigate through XTree and any other mouse-aware DOS programs I might load. The functional equivalent of this "Win98 DOS, then XTree" option can, of course, be achieved by creating a "Win98 DOS, then XTree" boot CD or floppy disk and setting your BIOS to boot from it.

For NTFS drives under WinXP or Win2k: Boot with BartPE. (See its documentation for the proper command line switches.) Then run XTree to find and delete the files. From their web page: "The PE Builder program (pebuilder.exe) runs on Windows 2000/XP/2003/BartPE. It does not run on Windows NT4/ME/9x."

(Thank you, "Ryan," whose 7/15/06 ZTree Forum post I added to, to create the above.)

Contributors to this page: admin , vor0nwe , laurent , khenkel , ian and ftwrks .
Page last modified on Sunday 08 of November, 2009 14:26:01 EST by admin.