Wednesday, January 13, 2010

Encodings and Files in .NET

  1. File contents can be stored in various formats: ASCII, Unicode (UTF-8, UTF-16, UTF-32), Base64, raw binary, and others.
  2. Unicode content may start with an optional byte order mark (BOM). The BOM indicates the byte order and identifies which Unicode encoding form (UTF-8, UTF-16 or UTF-32) the content uses, so a consumer of the file can use it to detect the format.
  3. In a .NET program, file contents can be represented as text (string, char[]) or as binary data (byte[]). Different IO classes give you different representations.
  4. The StreamReader and StreamWriter classes work with text, so they need a way to convert text to and from the bytes of the underlying stream. That is the job of the Encoding object associated with the reader or writer. For StreamWriter, the encoding's constructor controls whether a BOM is written (for example, new UTF8Encoding(true) emits one and new UTF8Encoding(false) does not). For StreamReader, the detectEncodingFromByteOrderMarks constructor parameter controls whether a BOM is auto-detected; if one is found, the Unicode encoding it indicates is used instead of the encoding you passed in.
  5. A byte array can be read from a file with, for example, File.ReadAllBytes. If you know which encoding the file was stored in, the GetString method of the matching Encoding class converts the bytes into text. Conversely, the GetBytes method converts text into an encoded byte array.
  6. There are many ways to work with files, but an encoding almost always plays a role, either behind the scenes or explicitly in your code.
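The BOM detection that StreamReader performs can be illustrated with a small Python sketch. This is not the .NET implementation: detect_bom and read_text are hypothetical helper names, and the fallback parameter stands in for the encoding you would pass to the StreamReader constructor.

```python
import codecs
from typing import Tuple

# BOMs are checked longest-first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so UTF-32 must be tested before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes, fallback: str = "utf-8") -> Tuple[str, int]:
    """Return (encoding name, BOM length in bytes).

    If no BOM is found, return the caller-supplied fallback encoding and 0.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return fallback, 0


def read_text(data: bytes, fallback: str = "utf-8") -> str:
    """Decode data using the encoding its BOM indicates, skipping the BOM."""
    encoding, skip = detect_bom(data, fallback)
    return data[skip:].decode(encoding)
```

As with any BOM sniffing, the check is heuristic: UTF-16 LE text whose first character is U+0000 would be taken for UTF-32 LE.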
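On the writing side, the choice of whether a BOM is emitted can be sketched in Python, where the built-in "utf-8-sig" codec writes a UTF-8 BOM and plain "utf-8" does not. This is only an analogy to passing new UTF8Encoding(true) or new UTF8Encoding(false) to a StreamWriter; write_text and first_bytes are hypothetical helper names.

```python
import os
import tempfile


def write_text(path: str, text: str, emit_bom: bool) -> None:
    """Write text as UTF-8, optionally preceded by the BOM EF BB BF."""
    # "utf-8-sig" writes the BOM before the first character; "utf-8" writes none.
    encoding = "utf-8-sig" if emit_bom else "utf-8"
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)


def first_bytes(path: str, n: int = 3) -> bytes:
    """Return the first n raw bytes of the file."""
    with open(path, "rb") as f:
        return f.read(n)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with_bom = os.path.join(tmp, "with_bom.txt")
        without_bom = os.path.join(tmp, "without_bom.txt")
        write_text(with_bom, "abc", emit_bom=True)
        write_text(without_bom, "abc", emit_bom=False)
        print(first_bytes(with_bom))     # b'\xef\xbb\xbf'
        print(first_bytes(without_bom))  # b'abc'
```

Reading either file back with encoding="utf-8-sig" returns "abc", because that codec strips a leading BOM when present and accepts input without one.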
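The GetBytes/GetString pair corresponds to Python's str.encode and bytes.decode. The sketch below, written in Python rather than C#, shows a round trip, what happens when the wrong encoding is used, and how a BOM survives a plain decode. The last point mirrors .NET, where Encoding.UTF8.GetString also returns a leading U+FEFF character instead of skipping the BOM.

```python
import codecs

text = "Grüße"

# Text -> bytes: the role Encoding.GetBytes plays in .NET.
utf8_bytes = text.encode("utf-8")       # 7 bytes: "ü" and "ß" take 2 bytes each
utf16_bytes = text.encode("utf-16-le")  # 10 bytes: 2 per character, no BOM

# Bytes -> text: the role Encoding.GetString plays. It only round-trips
# when you decode with the same encoding that produced the bytes.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16-le") == text

# Decoding with the wrong encoding does not raise here (latin-1 accepts any
# byte); it silently produces different text.
assert utf8_bytes.decode("latin-1") != text

# A BOM in front of the bytes is not removed by a plain "utf-8" decode;
# it comes back as the character U+FEFF. "utf-8-sig" strips it.
with_bom = codecs.BOM_UTF8 + utf8_bytes
assert with_bom.decode("utf-8") == "\ufeff" + text
assert with_bom.decode("utf-8-sig") == text

print(len(utf8_bytes), len(utf16_bytes))  # 7 10
```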
