Value Types, Reference Types, and writing with clarity!
Recently I served as a technical review editor for a book on C# and .NET. Among other issues, I noticed that the explanatory "bullet points" describing "pass by value" and "pass by reference" semantics were not only unclear; they appeared to contradict each other. Quite annoyed, I wrote up a particularly scathing review comment, and the authors took my advice (plus, I hope, similar advice from other tech reviewers).
Unfortunately, when I read the section in the final published copy of the book, I suspect the authors may have jumped from the frying pan into the fire - they added more content, which instead of clarifying the issue and the major points, served to muck it up even more, in my opinion. A clear and unequivocal understanding of value types vs reference types in .NET is of the utmost importance.
Therefore, I present my own attempt. Einstein said that a theory should be as simple as possible, but no simpler. With that in mind (and to their credit, much of this relies on the MS Patterns and Practices whitepaper):
All .NET Framework data types are either value types or reference types.
Value Types
Memory for a value type is allocated on the current thread's stack. A value type's data is maintained completely within this memory allocation. The memory for a value type is maintained only for the lifetime of the stack frame in which it is created. The data in value types can outlive their stack frames when a copy is created by passing the data as a method parameter or by assigning the value type to a reference type. Value types are passed by value by default. "By Value" is when an argument is passed into a function by passing a copy of the value. In this case, changing the copy doesn't affect the original value.
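To make the copy semantics concrete, here is a minimal sketch. The Point struct and both method names are hypothetical, invented only for this illustration; the second method uses the ref keyword to show what changes when you opt out of the default.

```csharp
using System;

// Hypothetical value type used only for this illustration.
struct Point
{
    public int X;
}

static class PassByValueDemo
{
    // Receives a copy of the struct; the caller's variable is untouched.
    public static void MoveCopy(Point p) { p.X = 99; }

    // Receives a reference to the caller's variable; the change is visible.
    public static void MoveOriginal(ref Point p) { p.X = 99; }

    static void Main()
    {
        Point a = new Point();
        a.X = 1;

        MoveCopy(a);
        Console.WriteLine(a.X); // prints 1

        MoveOriginal(ref a);
        Console.WriteLine(a.X); // prints 99
    }
}
```

The first call changes only the method's private copy, which is the default behavior described above.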
If a value type is passed to a parameter of reference type, a wrapper object is created (the value type is boxed), and the value type's data is copied into the wrapper object. For example, passing an integer to a method that expects an object results in a wrapper object being created.
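A short sketch of boxing as just described. The Describe helper is a hypothetical method, written only so that an int has an object parameter to be passed to.

```csharp
using System;

static class BoxingDemo
{
    // Hypothetical helper: the parameter is of type object,
    // so passing an int here causes the int to be boxed.
    public static string Describe(object o) { return o.GetType().Name; }

    static void Main()
    {
        int i = 42;
        object boxed = i;          // boxing: the value is copied into a wrapper object
        i = 100;                   // changing the original does not touch the box
        int unboxed = (int)boxed;  // unboxing copies the value back out

        Console.WriteLine(unboxed);     // prints 42
        Console.WriteLine(Describe(i)); // prints Int32
    }
}
```

Because the box holds its own copy, the later assignment to i has no effect on the boxed value.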
Reference Types
The data for reference type objects is always stored on the managed heap. Variables that are reference types consist of only the pointer to that data. The memory for reference types such as classes, delegates, and exceptions is reclaimed by the garbage collector when they are no longer referenced. It is important to know that reference types are always passed by reference. "By Reference" is when an argument is passed to a function by passing a reference to the actual value. In this case, if you change the argument in the function, you also change the original.
If you specify that a reference type should be passed by value, a copy of the reference is made and the reference to the copy is passed *.
Additional Notes on VB.NET:
Boxing in Visual Basic .NET tends to occur more frequently than in C# due to the language’s pass-by-value semantics and extra calls to GetObjectValue. Use the DirectCast operator to cast up and down an inheritance hierarchy instead of using CType. DirectCast offers superior performance because it compiles directly to MSIL. Also, note that DirectCast throws an InvalidCastException if there is no inheritance relationship between two types.
Further, it should be noted that in the .NET Framework 2.0, Generics provide for a much more efficient mechanism to avoid the overhead of boxing, particularly with Collections.
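As a rough illustration of the point about Generics and Collections, the sketch below sums the same integers from an ArrayList (which stores object, so every int is boxed on the way in) and from a List<int> (which stores the ints directly). The class and method names are hypothetical, and no timing claims are made here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class GenericsDemo
{
    // ArrayList holds object: each int was boxed by Add and is unboxed by the cast.
    public static int SumBoxed(ArrayList items)
    {
        int total = 0;
        foreach (object o in items) total += (int)o;
        return total;
    }

    // List<int> holds ints directly: no boxing or unboxing is involved.
    public static int SumGeneric(List<int> items)
    {
        int total = 0;
        foreach (int n in items) total += n;
        return total;
    }

    static void Main()
    {
        ArrayList boxed = new ArrayList();
        List<int> typed = new List<int>();
        for (int i = 1; i <= 3; i++)
        {
            boxed.Add(i); // boxes i
            typed.Add(i); // stores i as an int
        }

        Console.WriteLine(SumBoxed(boxed));   // prints 6
        Console.WriteLine(SumGeneric(typed)); // prints 6
    }
}
```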
I think the above is both simple and elegant. It has sufficient information to cover the most important points, but not "too much information". It is presented clearly, and it does not assume that the reader already knows the definitions of key terms that are used. I can understand what I wrote, and I suspect most others can.
Why can't book authors learn to do this? Developers buy and read technical books in the hopes of receiving clarity, not muck.
* On the difference between passing a value object by reference and a reference object by value, noted by Bruce Wood in his comment below: MVP Jon Skeet (whose writing I much admire because he understands the word "clarity" as it applies to writing) illustrates it in his article on parameter passing. The finer point is, as Jon describes, "This difference is absolutely crucial to understanding parameter passing in C#, and is why I believe it is highly confusing to say that objects are passed by reference by default instead of the correct statement that object references are passed by value by default."
Bruce's other comment clarifying the finer distinction of where memory is allocated for value types based on whether they are class fields vs. local variables or method arguments should also be noted.
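Safe C# cannot show directly where a value lives, but the following sketch shows the observable consequence of Bruce's distinction: a value-type field is part of its containing object, so it is shared by every reference to that object, whereas a value-type local is an independent copy. Counter, Holder, and the method names are hypothetical, made up for this illustration.

```csharp
using System;

// Hypothetical value type.
struct Counter
{
    public int Count;
}

// Hypothetical reference type with a value-type field.
class Holder
{
    public Counter Field;
}

static class FieldStorageDemo
{
    // The struct is a field of one Holder object, so both references see the write.
    public static int SharedFieldCount()
    {
        Holder h1 = new Holder();
        Holder h2 = h1;        // copies the reference, not the object
        h2.Field.Count = 5;    // writes into the single Holder instance
        return h1.Field.Count;
    }

    // The struct is a local variable, so assignment makes an independent copy.
    public static int LocalCopyCount()
    {
        Counter c1 = new Counter();
        Counter c2 = c1;       // copies the struct itself
        c2.Count = 5;
        return c1.Count;
    }

    static void Main()
    {
        Console.WriteLine(SharedFieldCount()); // prints 5
        Console.WriteLine(LocalCopyCount());   // prints 0
    }
}
```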
I agree with your comments on value types. I question your comments on the reference type assuming I read the comment correctly. This may be due to a difference in C# and VB. Using by value (ByVal in VB) with a reference type, passes the pointer to the reference type on the heap, and using by Reference (ByRef) passes a pointer to the pointer for the reference type on the heap. In either case, changes to the object can be seen by the client code. The difference between the two methods of passing a reference type: with byval, if the argument is changed to a different instance of the class, the client code does not see the change; with byref, the client code sees the different instance of the class, and the original instance's pointer is lost. IMO, for reference types, the only difference between byval and byref is if the called method changes the instance of the class passed.
Please correct me if I am wrong, or misunderstood your comment.
The post was specifically targeted at the C# language since the book I was "complaining" about was specifically about C#.
Besides some semantic differences in VB.NET, such as being able to force parameters to be passed by value, regardless of how they are declared by enclosing the parameters in extra parentheses, I'm not aware of any differences. But then, VB.NET has so much baggage inherited from years back that it might be hard to tell!
I am really trying to get a handle on this subject and everything I read seems to be different. Based on my testing this statement is not true: "If you specify that a reference type should be passed by value, a copy of the reference is made and the reference to the copy is passed."
Passing by either byvalue or byref allows the original object to be changed in the called method. The difference is:
void ChangeInstance(Person p)
{
    p = new Person();
}
does not change the original instance of p, whereas,
void ChangeInstance(ref Person p)
{
    p = new Person();
}
changes the original instance of p to the new instance.
These methods,
void ChangeProperty(Person p)
{
    p.FirstName = "John";
}
void ChangeProperty(ref on p)
{
    p.FirstName = "John";
}
produce the same result.
It is the same for both vb and c#. At least that's the way I understand it.
In addition, I see no need to pass a reference type byref. If I am going to change the instance, I prefer to use a function and return the new object rather than change an argument.
Your comments explaining where I have this wrong would be greatly appreciated.
void ChangeProperty(ref on p)
should be
void ChangeProperty(ref Person p)
Re: "I am really trying". Let's clean that up a bit (results are inline in the code as comments):
using System;

namespace PersonTest
{
    class Class1
    {
        [STAThread]
        static void Main(string[] args)
        {
            Person p = new Person();
            p.FirstName = "George";
            ChangeInstance(p);
            Console.WriteLine("ChangeInstance: " + p.FirstName);

            Person p2 = new Person();
            p2.FirstName = "George";
            ChangeInstance(ref p2);
            Console.WriteLine("ChangeInstance [ref]: " + p2.FirstName);

            Person p3 = new Person();
            p3.FirstName = "George";
            ChangeProperty(p3);
            Console.WriteLine("ChangeProperty: " + p3.FirstName);

            Person p4 = new Person();
            p4.FirstName = "George";
            ChangeProperty(ref p4);
            Console.WriteLine("ChangeProperty: " + p4.FirstName);

            Console.ReadLine();

            // Results are:
            // ChangeInstance: George
            // ChangeInstance [ref]: Yoda
            // ChangeProperty: John Change property
            // ChangeProperty: John Change property (ref)
        }

        static void ChangeInstance(Person p)
        {
            p = new Person();
        }

        static void ChangeInstance(ref Person p)
        {
            p = new Person();
            p.FirstName = "Yoda";
        }

        static void ChangeProperty(Person p)
        {
            p.FirstName = "John Change property";
        }

        static void ChangeProperty(ref Person p)
        {
            p.FirstName = "John Change property (ref)";
        }
    }

    class Person
    {
        public string FirstName;
    }
}
The results are what I expected. Thanks for the clear example. This is the statement that was confusing me from the original "unblog".
"If you specify that a reference type should be passed by value, a copy of the reference is made and the reference to the copy is passed."
I was reading that as: a copy of the reference (as in a deep clone) is made, instead of a copy of the reference variable (the pointer) is made. Anyway, it's crystal clear now. Thanks.
First, I have to agree with anonymous. The statement
"If you specify that a reference type should be passed by value, a copy of the reference is made and the reference to the copy is passed."
is incorrect. I think that Jon Skeet does an excellent job of explaining this on his page on parameter passing:
http://www.yoda.arachsys.com/csharp/parameters.html
In particular, he takes pains to note that reference types are not, by default, "passed by reference" but instead references, by default, are passed by value. He goes into detail by what he means by that, but I think that the distinction is worth it.
The other statement that isn't correct is this:
"Memory for a value type is allocated on the current thread's stack."
Yes and no. Memory for a local variable or argument that is a value type is stored on the stack. Memory for a class field that is a value type is stored on the heap, in the memory allocated for the class instance. So, your statement is true for local variables and method arguments, but not for fields in reference types.
Bruce,
Good point, and yes, I have not only read Jon's piece, I actually Teleport-ed his whole section onto my HD and "FarHTML"-ed it into a searchable CHM, because the whole thing is so well written.
Unfortunately, BOTH of the short statements you have corrected (and I do not dispute the corrections) were lifted verbatim from the MS Patterns and Practices whitepaper!