Binary Serialization To / From String and Encoding
Recently somebody posted on the C# language newsgroup that they couldn't figure out how to convert an object to a string (and the reverse) since all the examples only showed how to write / read to a file.
I chimed in that I thought what the OP really meant was "how to convert a stream to a string" (as in using the BinaryFormatter for serialization), and so I posted the following sample:
Stream to string:
byte[] b = MyMemoryStream.ToArray();
string s = System.Text.Encoding.UTF8.GetString(b);
String to stream:
string s = "whatever";
byte[] b = System.Text.Encoding.UTF8.GetBytes(s);
MemoryStream ms = new MemoryStream(b);
Friend and fellow MVP Jon Skeet, who is pedantic to a fault, responded with this:
"That's a way which is almost guaranteed to lose data. Serialization with BinaryFormatter produces opaque binary data, which may very well not be a valid UTF-8 encoded string.
To convert arbitrary binary data to a string and back, I'd use Convert.ToBase64String and Convert.FromBase64String."
Jon is absolutely correct. I suspect many developers assume that choosing what one would "think" is a broad encoding, such as UTF-8, guarantees data integrity. Well, it does not: bytes that do not form a valid UTF-8 sequence are replaced during decoding, so the original data cannot be recovered.
The correct way (MSDN documentation links first):
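To make the data loss concrete, here is a minimal sketch. The byte values are an arbitrary stand-in for serializer output (an assumption chosen for illustration, not actual BinaryFormatter output), and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text;

// Arbitrary binary data; 0xFF and 0xFE never occur in well-formed UTF-8.
byte[] original = { 0x00, 0xFF, 0xFE, 0x41 };

// Decoding substitutes U+FFFD (the replacement character) for each invalid byte.
string asText = Encoding.UTF8.GetString(original);

// Encoding again writes U+FFFD as three bytes, not the byte we started with.
byte[] roundTripped = Encoding.UTF8.GetBytes(asText);

bool intact = original.SequenceEqual(roundTripped);
Console.WriteLine("Original:      " + BitConverter.ToString(original));
Console.WriteLine("Round-tripped: " + BitConverter.ToString(roundTripped));
Console.WriteLine("Data intact:   " + intact);
```

The round-tripped array should come back longer than the original, which is exactly the kind of silent corruption that breaks deserialization later.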
[MSDN] Convert.ToBase64String:
[MSDN] Convert.FromBase64String:
And, revised code sample:
Stream to string:
byte[] b = MyMemoryStream.ToArray();
string s = Convert.ToBase64String(b);
String to stream:
string s = "whatever"; // must hold Base64 text, such as the output of Convert.ToBase64String
byte[] b = Convert.FromBase64String(s);
MemoryStream ms = new MemoryStream(b);
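Putting both directions together, a round trip might look like the sketch below. The payload bytes are hypothetical stand-ins for whatever a serializer wrote to the stream, and the variable names are illustrative only. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.IO;
using System.Linq;

// Stand-in for serializer output: bytes that are not valid UTF-8.
byte[] payload = { 0x00, 0xFF, 0xFE, 0x41 };

// Stream to string: Base64 represents every byte value using plain ASCII.
string encoded;
using (var source = new MemoryStream(payload))
{
    encoded = Convert.ToBase64String(source.ToArray());
}

// String to stream: decoding yields exactly the bytes that went in.
byte[] decoded;
using (var restored = new MemoryStream(Convert.FromBase64String(encoded)))
{
    decoded = restored.ToArray();
}

Console.WriteLine(encoded);
Console.WriteLine(payload.SequenceEqual(decoded));
```

The trade-off is size: Base64 output is roughly a third larger than the binary input, which is usually an acceptable price for a string that survives any text channel.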
People should implement their classes against TextReader (an abstract class, not an interface) instead of StreamReader... And then instantiate a StringReader if they want to work with a string.
Erm,
picky, picky! Does this really add value to the subject at hand?
But what to do if your string contains characters like $ or £?
UTF8Encoding can lose data, whereas Convert.FromBase64String raises the exception "Invalid character in Base-64 string".
Didn't mess it up for me:
// Requires: using System; using System.IO; using System.Runtime.Serialization.Formatters.Binary;
string s = "$ or £?";
// Text -> UTF-8 bytes -> Base64, so the serialized payload is plain ASCII.
byte[] b = System.Text.Encoding.UTF8.GetBytes(s);
string s64 = Convert.ToBase64String(b);
BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
bf.Serialize(ms, s64);
// Rewind the stream before reading it back.
ms.Seek(0, SeekOrigin.Begin);
object o = bf.Deserialize(ms);
string s8 = (string)o;
// Base64 -> UTF-8 bytes -> text.
byte[] b4 = Convert.FromBase64String(s8);
string s5 = System.Text.Encoding.UTF8.GetString(b4);
Console.WriteLine(s5);
Console.ReadLine();
Thanks. It was the solution I was looking for.
Just what I needed, thanks.
Thank you, valuable article for years to come :)
Perfect example. As far as I know, a lot of people have this possible "bug" in their code.
Great.
Really, thanks!
Good job.
It worked for me. Thanks a lot!