Non-nullable in C#

Started by namida, June 01, 2021, 07:30:06 AM


namida

Edit Simon: Split off Simon blogs
See also: D class references may be null




Here's something I just discovered that I'd be interested to hear Simon's thoughts on.

In C# 8.0 and above, support for explicitly nullable reference types was introduced (it is opt-in, via #nullable enable or the Nullable project setting). If a variable of a non-nullable reference type is not initialized, or you attempt to assign null to it, a compiler warning is generated (it still ultimately compiles, so this is only loosely enforced). If a variable of a nullable reference type is dereferenced without a null check of some kind (or another reason it can't possibly be null, e.g. it was assigned earlier in the same block), this also generates a compiler warning. In cases where the compiler is wrong, you can suffix the expression with a ! (the null-forgiving operator) to indicate "this will not be null here".

Here is a sort of example. Some of the warnings may not occur exactly as commented, because the compiler tracks what was assigned to a local variable within the same code block, so take it with a grain of salt and try class variables instead of local ones if the expected results don't occur. (C# should be similar enough to C / C++ for Simon to figure out what's going on here.)

private void Blah()
{
  SomeClass var1 = null; // will generate a warning
  SomeClass? var2 = null; // this doesn't, because var2 is a nullable type
 
  var1.DoThing(); // no warning for a non-nullable field or parameter - it is assumed it cannot be null (a local that was just assigned null is tracked and does warn)
  var2.DoThing(); // will generate a "possible dereference of null variable" warning
  if (var2 != null)
    var2.DoThing(); // won't generate a warning due to the null check
  var2!.DoThing(); // won't generate a warning as ! indicates "don't check for possible null dereference"

  var1 = var2; // will generate a warning - var2 may be null, and var1 "cannot" accept null
  if (var2 != null)
    var1 = var2; // this is fine, because of the null check
  var1 = var2!; // won't generate a warning as ! indicates "don't check for possible null dereference"
}
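
To sidestep the same-block caveat, here is a minimal sketch of the same checks using parameters instead of locals. The DoThing body, the Calls counter and the ParameterDemo class are made up for illustration; the lines that would warn are left commented out so the sketch compiles cleanly.

```csharp
#nullable enable

public class SomeClass
{
    // Counter added only so the effect of DoThing can be observed.
    public int Calls { get; private set; }
    public void DoThing() => Calls++;
}

public static class ParameterDemo
{
    // Hypothetical helper: var1 is non-nullable, var2 is nullable.
    public static void Blah(SomeClass var1, SomeClass? var2)
    {
        var1.DoThing();        // no warning: the parameter type promises non-null
        // var2.DoThing();     // would warn: dereference of a possibly null reference
        if (var2 != null)
            var2.DoThing();    // no warning: the null check narrows var2 here
        // var1 = var2;        // would warn: var2 may be null
        var1 = var2 ?? var1;   // no warning: falls back to var1 when var2 is null
        var1.DoThing();
    }
}
```

Nothing stops a caller from passing null for var1 at runtime; the annotations only drive the warnings.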


The amusing part is that the following is accepted as valid code, and does not generate a warning or error of any kind:

private void Blah()
{
  SomeClass var1; // not nullable type
  var1 = null!; // does not generate a warning - because even though null is always null, the ! specifies it won't be null here?
}
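
A small sketch of what that means at runtime: the ! only silences the warning, so the variable really does hold null and a dereference still throws. NullForgivingDemo and ThrowsAtRuntime are hypothetical names used only for this illustration.

```csharp
#nullable enable
using System;

public static class NullForgivingDemo
{
    // Returns true if dereferencing a "non-nullable" variable that was
    // assigned null! throws at runtime.
    public static bool ThrowsAtRuntime()
    {
        string text = null!;            // compiles without a warning
        try
        {
            return text.Length < 0;     // dereferences an actual null
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```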
My projects
2D Lemmings: NeoLemmix (engine) | Lemmings Plus Series (level packs) | Doomsday Lemmings (level pack)
3D Lemmings: Loap (engine) | L3DEdit (level / graphics editor) | L3DUtils (replay / etc utility) | Lemmings Plus 3D (level pack)
Non-Lemmings: Commander Keen: Galaxy Reimagined (a Commander Keen fangame)

Simon

#1
Yes, I've heard about this shift in modern C# that the typename by itself declares a non-null reference. This is ideal.

Declare stuff non-null wherever possible, it's the most common case.

Functions should take non-null parameters wherever possible. You can assume that the caller is also working mainly with non-nullable types. If he doesn't, it should be his burden to cast to non-nullable before he calls you, especially if your function would immediately return on null. The caller shouldn't even call you then, and the solution is to take only non-null.

Some class member might be occasionally null during normal operation. Declare it nullable then, and then check for null when it appears in a method for the first time. C# has good code-flow analysis for this, this is excellent.

x?.foo() is more idiomatic than if (x != null) x.foo() if you call only this single method. I don't know if the nullable C# type also behaves like a collection that has either 0 or 1 elements; if it does: Idiomatic Scala: Your Options Do Not Match

Interesting that null violation is merely a warning in C# at compile time, I assume to keep back-compat with the heaps of earlier code. Treat this as an error in any code you touch, at least.

Avoid !, it's the cast from nullable to non-null. It's okay at system boundaries (things from the network, interfacing with older libraries). ! is a hack to quickly work with legacy code that you didn't yet refactor to proper null/non-null types.

It's surprising, but at least consistent that null! is allowed. I'd have to better understand the C# culture and history of the shift to non-nullable to argue how much the compiler should track stuff near !. Looks like the compiler tracks only (whether stuff can be null), not (whether it can be non-null)? Or maybe it's not worth it to have a special case for exactly null!.

Can you get the compiler to print the type of an expression? null often has a special type in such languages that rarely appears in code, but it would be interesting what the compiler thinks is the type of null!. Probably null is of one of the types
typeof(null)
typeof(null!)?
and, consistently, null! is of the type
typeof(null!)
that apparently is then not empty (null! belongs to this type) but it promises things that it can't eventually hold.

Really, it looks like T x = null! is perfectly fine for the compiler because T can accept null after all, the warnings (without !) are merely to help you at compile time, not to make the types work any different in the bytecode. In the bytecode, T and T? will probably be the same?
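
On that last question, one way to check is reflection. As far as I know, the nullability of a reference type is recorded only as metadata for the compiler, so T and T? should be the same type at runtime. The sketch below compares the parameter types of two otherwise identical methods; ErasureDemo and its members are hypothetical names.

```csharp
#nullable enable
using System;
using System.Reflection;

public static class ErasureDemo
{
    public static void TakesNonNull(string s) { }
    public static void TakesNullable(string? s) { }

    // Compares the runtime parameter types of the two methods above.
    public static bool SameRuntimeType()
    {
        Type a = typeof(ErasureDemo).GetMethod(nameof(TakesNonNull))!
            .GetParameters()[0].ParameterType;
        Type b = typeof(ErasureDemo).GetMethod(nameof(TakesNullable))!
            .GetParameters()[0].ParameterType;
        return a == b && a == typeof(string);
    }
}
```

This fits the observation that T x = null! is fine for the compiler: at runtime there is just one reference type, and it can hold null.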



Very nice and fun quirks, thanks for looking into it!

-- Simon

namida

Quote
Avoid !, it's the cast from nullable to non-null. It's okay at system boundaries (things from the network, interfacing with older libraries). ! is a hack to quickly work with legacy code that you didn't yet refactor to proper null/non-null types.

Also for cases the compiler does not properly handle, or perhaps cannot. See the following examples:

class SomeClass
{
  private SomeOtherClass? _Blah; // nullable reference type

  public SomeClass()
  {
    SetupBlah();
    _Blah!.DoThing(); // this will otherwise generate the "possible null dereference" warning, though as humans we know SetupBlah will have assigned a value
  }

  private void SetupBlah()
  {
    _Blah = new SomeOtherClass();
  }

  private void AnotherExample()
  {
    SomeOtherClass?[] arrayOfOtherClass = new SomeOtherClass[5]; // you get a compiler error if you write "new SomeOtherClass?[5]"
    for (int i = 0; i < 5; i++)
      arrayOfOtherClass[i] = new SomeOtherClass();
    arrayOfOtherClass[2]!.DoThing(); // warning would otherwise occur here. Array support is bad.
    if (arrayOfOtherClass[3] != null)
      arrayOfOtherClass[3]!.DoThing(); // warning would otherwise occur here. Array support is REALLY bad.
  }
}


Lists / etc all have the same issue as arrays, it's not particular to arrays only.

Simon

#3
With your example of lazy initialization in SetupBlah, you know an invariant of the class that the compiler doesn't know, yeah.

Nonetheless, my hunch is to refactor this so that the compiler can see it. You get to avoid ! and the compiler can check the invariant. Your example is easy enough so that I can let SetupBlah() return the SomeOtherClass instead of setting the private member:


class X { ... }
class Y {
    private X? _Blah;

    public Y()
    {
        _Blah = SetupBlah();
        _Blah.DoThing();
    }

    private X SetupBlah()
    {
        return new X();
    }
}
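
If the original shape has to stay (SetupBlah assigns the field and returns nothing), another option may be the MemberNotNull attribute from System.Diagnostics.CodeAnalysis. This assumes .NET 5 or later, where that attribute is available. The sketch reuses the names from the earlier example; the Calls counter and the BlahCalls property are added only so the result can be observed.

```csharp
#nullable enable
using System.Diagnostics.CodeAnalysis;

public class SomeOtherClass
{
    public int Calls { get; private set; }
    public void DoThing() => Calls++;
}

public class SomeClass
{
    private SomeOtherClass? _Blah; // still a nullable field

    public SomeClass()
    {
        SetupBlah();
        _Blah.DoThing(); // no ! needed: the attribute below vouches for _Blah
    }

    // Illustration only: exposes how often DoThing ran.
    public int BlahCalls => _Blah?.Calls ?? 0;

    // Tells the compiler that _Blah is non-null once this method returns.
    [MemberNotNull(nameof(_Blah))]
    private void SetupBlah()
    {
        _Blah = new SomeOtherClass();
    }
}
```

The compiler also checks the promise from the inside: if SetupBlah could return without assigning _Blah, it warns there.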


For more complex cases, I can imagine some tasks that such a refactoring must address:

  • You have several fields that must be lazily initialized, but it's okay to initialize them all at the same (later) time. Consider grouping these fields in a new class XGroup that eagerly initializes the fields, and then use the same SetupBlah that now initializes an XGroup instead of a single X.
  • You have several fields that must be lazily initialized, and they will be initialized at different times. No good quick idea. Maybe you can split the original class because it's doing too much. But even then, it's still likely that somebody else must solve the problem.
  • There are several methods that each require prior initialization. Then one idea may be:

class X { ... }
class Y {
    private X? _Blah;

    public Y() {}

    public void operation1()
    {
        _Blah = SetupBlah();
        _Blah.DoThing();
    }

    public void operation2()
    {
        _Blah = SetupBlah();
        _Blah.DoThing();
        _Blah.DoThing();
    }

    private X SetupBlah()
    {
        if (_Blah == null) { // C# has no implicit conversion from a reference to bool
            _Blah = new X();
        }
        return _Blah;
    }
}
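
A more compact variant of the same idea, as a sketch: hide the lazy initialization behind a private property that uses the ??= operator (available since C# 8.0), so each operation only ever touches a non-nullable X. The Calls counter is made up so the behaviour can be observed.

```csharp
#nullable enable

public class X
{
    public int Calls { get; private set; }
    public void DoThing() => Calls++;
}

public class Y
{
    private X? _blah;

    // Creates the X on first access; every caller sees a non-nullable X.
    private X Blah => _blah ??= new X();

    public void Operation1()
    {
        Blah.DoThing();
    }

    public void Operation2()
    {
        Blah.DoThing();
        Blah.DoThing();
    }

    // Illustration only: 0 until one of the operations has run.
    public int Calls => _blah?.Calls ?? 0;
}
```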





With the array: You start with this code: X?[] arr = new X[5]; and then overwrite the 5 nulls.

Can you avoid creating the entire array before you fill it? Can you instead have a list of X (not of X?) that is initially empty, then append 5 times the new X, then call ToArray on the list? You'll avoid ! and you also avoid an overwriting assignment; your example overwrites the null references with references to new X later.

In a pinch, can you capture all necessary ! usage inside a new function that returns the finished X?[], casting the array to X[] as it returns?
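
The list-first idea could look roughly like this. X is a stand-in class, and ArrayBuilder and CreateFive are made-up names.

```csharp
#nullable enable
using System.Collections.Generic;

public class X
{
    public void DoThing() { }
}

public static class ArrayBuilder
{
    // Builds the array from a List<X>, so no element is ever null
    // and the element type never needs to be X?.
    public static X[] CreateFive()
    {
        var list = new List<X>();
        for (int i = 0; i < 5; i++)
            list.Add(new X());
        return list.ToArray();
    }
}
```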

-- Simon

namida

Quote
Can you avoid creating the entire array before you fill it? Can you instead have a list of X (not of X?) that is initially empty, then append 5 times the new X, then call ToArray on the list? You'll avoid ! and you also avoid an overwriting assignment; your example overwrites the null references with references to new X later.

Array support for this feature is very bad. Even with an explicit null check (as in the last example above), it will still return the possible null dereference warning. How the array is created / filled does not matter (though it would certainly be possible, just in general, to do what you're saying - it just wouldn't make any difference). The only way to get correct behavior would be as follows:

private void BlahyMcBlahface()
{
  SomeOtherClass?[] theArray = CreateArraySomehow();
 
  if (theArray[2] != null)
    theArray[2].DoThing(); // gives warning - this is just here to make it very clear which situation I am talking about
 
  SomeOtherClass? notAnArray = theArray[2];
  if (notAnArray != null)
    notAnArray.DoThing(); // this one is handled correctly and does not generate a warning
}


Again, all of the above only generates compiler warnings. C# does not enforce the nullability stuff.

Simon

#5
Quote from: namida on June 03, 2021, 05:31:53 AM
private void BlahyMcBlahface()
{
    X?[] theArray = CreateArraySomehow();
    // ...


Can you change the return type of CreateArraySomehow() to X[] instead of X?[], and then declare
X[] theArray = ...
instead of
X?[] theArray = ...
?

Quote
if (theArray[2] != null)
    theArray[2].DoThing(); // gives warning


Looks like the compiler tracks non-nullability of the reference variables, but not of the expressions that result in references. We know that theArray[2] will produce the same reference both times, but I feel this is already a stretch to demand this knowledge from the compiler. E.g., you might have an index variable instead of 2, or a function that produces the index. The compiler would have to analyze the flow much more deeply to guarantee you the non-nullness.

Quote
SomeOtherClass? notAnArray = theArray[2];
if (notAnArray != null)
    notAnArray.DoThing(); // this one is handled correctly and does not generate a warning


It's a stretch to write this when the index is really the literal 2.

But the more complex the indexing is, the more I do this for clarity that, yes, I mean the same single notAnArray every time.
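
A shorter way to write the same bind-then-test step is a type pattern, which declares the local and does the null check in one expression. This is only a sketch; ElementAccess and TryDoThing are hypothetical names, and the Calls counter is there so the effect is observable.

```csharp
#nullable enable

public class SomeOtherClass
{
    public int Calls { get; private set; }
    public void DoThing() => Calls++;
}

public static class ElementAccess
{
    // Calls DoThing on the element if it is non-null; reports whether it did.
    public static bool TryDoThing(SomeOtherClass?[] theArray, int index)
    {
        // "is SomeOtherClass element" only matches a non-null reference,
        // so element is a non-nullable local inside the if.
        if (theArray[index] is SomeOtherClass element)
        {
            element.DoThing();
            return true;
        }
        return false;
    }
}
```

The array is read exactly once, so it no longer matters whether the index is a literal, a variable or a function call.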

-- Simon

namida

Quote
Can you change the return type of CreateArraySomehow() to X[] instead of X?[], and then declare
X[] theArray = ...
instead of
X?[] theArray = ...
?

What if I specifically need to allow for some elements to be null?

Simon

#7
Nullable elements weren't a requirement in reply #2. I assumed you wanted non-nullable elements in reply #4, too.

! seems spurious even here. With nullable elements, if the usage is really only 1-2 calls, call through ?. instead of adding an extra if.

For longer usages, first bind the array element to a nullable local, test that local explicitly for non-null, and then call it !-free with the compiler's blessing.

Look for standard library support for functional chains, e.g., LINQ. A first rewrite is:
Beginning with the array of X?,
filter for non-null,
map from X? to X by (e) => e!,
for each unconditionally call DoThing.

This isn't !-free because filtering doesn't change the type. Maybe the library has a special filter just for this, that filters for non-null and also hides the cast via ! for convenience? The reason is that in such functional chains, (filtering for non-null and then casting the remaining to the non-null type) goes hand-in-hand. It would be a candidate for a library function.
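
A sketch of both variants in LINQ. As far as I know, the closest thing to that special filter is Enumerable.OfType<T>(), which skips null elements and yields the non-nullable type, so no ! is needed. X, its Calls counter, NonNullFilter and its methods are made up for illustration.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

public class X
{
    public int Calls { get; private set; }
    public void DoThing() => Calls++;
}

public static class NonNullFilter
{
    // Filter, then cast with !: Where alone leaves the element type as X?.
    public static IEnumerable<X> WithBang(IEnumerable<X?> items) =>
        items.Where(e => e != null).Select(e => e!);

    // OfType<X>() drops the nulls and produces IEnumerable<X> directly.
    public static IEnumerable<X> WithOfType(IEnumerable<X?> items) =>
        items.OfType<X>();

    // Calls DoThing on every non-null element and returns how many there were.
    public static int CallAll(IEnumerable<X?> items)
    {
        int called = 0;
        foreach (X x in WithOfType(items))
        {
            x.DoThing();
            called++;
        }
        return called;
    }
}
```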

-- Simon

Simon

#8
Hmm, in the end, if you want nullable elements in the array, you can't avoid testing for null. Someone has to do it, either you do it explicitly, or you use an abstraction that decides for you. Maybe the C# ecosystem and the compiler make ! less of an anti-pattern than frequent casting typically is. You're in deeper than I am, and you know more about the ecosystem and its shortcomings.

Instead of nulls in the array, another alternative is the Null Object Pattern: Write a subclass that does nothing in the methods that you want to call, and put objects of this class into the array whenever you had the nulls originally.
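
A minimal sketch of that pattern, with hypothetical names throughout (Worker, NullWorker, CreateSlots): the array type is Worker[] rather than Worker?[], and empty slots hold a shared do-nothing instance.

```csharp
#nullable enable

public class Worker
{
    public int Calls { get; private set; }
    public virtual void DoThing() => Calls++;
}

// The null object: a Worker whose DoThing deliberately does nothing.
public sealed class NullWorker : Worker
{
    public static readonly NullWorker Instance = new NullWorker();
    public override void DoThing() { }
}

public static class NullObjectDemo
{
    // Fills every slot with the null object instead of leaving it null.
    public static Worker[] CreateSlots(int count)
    {
        var slots = new Worker[count];
        for (int i = 0; i < count; i++)
            slots[i] = NullWorker.Instance;
        return slots;
    }
}
```

Callers can then write slots[i].DoThing() unconditionally, with no null check and no !.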

-- Simon

namida

Quote
Hmm, in the end, if you want nullable elements in the array, you can't avoid testing for null. Someone has to do it, either you do it explicitly, or you use an abstraction that decides for you. Maybe the C# ecosystem and the compiler make ! less of an anti-pattern than frequent casting typically is. You're in deeper than I am, and you know more about the ecosystem and its shortcomings.

I get that a check is necessary, and don't disagree at all - I just find it amusing that in cases where it's using the built-in array functionality (not a custom array class), and constant indexes, the compiler can't keep track of this and thus generates a warning when it shouldn't. Example:

class SomeClass
{
  private SomeOtherClass? _SingleField;
  private SomeOtherClass?[] _ArrayField;

  public SomeClass()
  {
    _SingleField = <we do not care what's here. May be null or non-null>;

    _ArrayField = new SomeOtherClass[5]; // the array itself is initialized
    for (int i = 0; i < 5; i++)
      _ArrayField[i] = <we do not care what's here. May be null or non-null>;
  }

  public void Blah()
  {
    _SingleField.DoStuff(); // gives warning, as it should
    _ArrayField[0].DoStuff(); // gives warning, as it should

    if (_SingleField != null)
      _SingleField.DoStuff(); // does not give warning, as we've checked for null
    if (_ArrayField[0] != null)
      _ArrayField[0].DoStuff(); // does give warning, even though we've checked for null
  }
}