Understanding Streams in .NET

C++ developers have many hobbies and one of their most beloved is to make fun of Visual Basic. They are usually very wrong and miss the point completely, but where there is smoke, you can usually find a small brush fire. Visual Basic 6 and earlier versions attempted to help software developers by releasing them from the burden of “code management” so they can focus on what really matters, the business rules and interface. Efficiency in software development translates into real money for organizations and Visual Basic is explicitly designed for the rapid application development environment. Still, not every aspect of Visual Basic succeeded in meeting this goal.

File management was notoriously cumbersome and, by extension, memory buffer management was virtually non-existent. Visual Basic was not designed for low-level data manipulation, but the reality of most programming projects is that data often must be transformed or otherwise manipulated while traveling from point A to point B.

Stream-based objects are very convenient as the data moving through the stream can be manipulated. In fact data can move from stream to stream, transformed in every case until the final destination is reached. This abstract stream concept is useful for rapid application development as the developer is isolated from complex byte manipulation. Visual Basic 6 did not include native stream support but the .NET Framework depends upon streams as a core technology and third-party vendors are using streams to extend the capabilities of .NET. Every .NET developer should have a complete understanding of how streams work and how those streams can be leveraged to maximize software development efficiency.

The Stream class in .NET is an abstract class located in the System.IO namespace. Because the class is declared with the "abstract" modifier, it is considered incomplete: it is intended to be used as a base class and cannot be instantiated on its own. In fact, if the "new" operator is used on the Stream class, the compiler will report an error. Even if the Stream class were not marked as abstract, its purpose is only to provide a generic view of a sequence of bytes, so its usefulness on its own would be limited.

Microsoft has created a number of Stream-derived classes in System.IO, such as FileStream, MemoryStream, and BufferedStream, while other namespaces contain classes such as NetworkStream and CryptoStream. System.IO also provides helper classes such as BinaryReader and BinaryWriter, which wrap a Stream rather than derive from it. Together, these classes are designed for data buffer management, file I/O, networking, and cryptography.

As Microsoft describes in its documentation, .NET Streams involve three fundamental operations (of course, Stream-based classes can be extended to do whatever a user requires). First, a Stream can be read from. Reading is defined as transferring data from the Stream to another location such as a byte array or any other construct that can hold data. The second operation is writing. This operation describes the movement of data from some data source, such as a byte array, to the Stream. The last fundamental operation is the ability to seek in the Stream. Seeking is the ability to set (or retrieve) the current position in the Stream.

The ability to seek depends on whether the Stream has a backing store (sometimes loosely called a backup store). A backing store is some type of storage mechanism, such as a file or memory, where you know data may be present and a certain portion of that data can be requested. While most Stream-based classes are designed around certain types of backing stores, there are exceptions. For example, a NetworkStream does not have a backing store. Network data is unknown until it has been received and as such cannot be directly queried. As a side note, the most common approach to working with network data is to buffer the data in another Stream as it is received and then search that buffered data for what you need.
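The three operations can be sketched with a MemoryStream, which keeps its data in memory and therefore supports all of them. This is only an illustration: the class name StreamBasics and the method RoundTrip are made up for this example and are not part of the Framework. Note that the variable is typed as the abstract Stream while the object itself is a concrete MemoryStream.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class StreamBasics
{
    // Writes text to a stream, seeks back to the start, and reads it again.
    public static string RoundTrip(string text)
    {
        byte[] source = Encoding.ASCII.GetBytes(text);
        using (Stream stream = new MemoryStream())
        {
            stream.Write(source, 0, source.Length);   // write: byte array -> stream
            stream.Seek(0, SeekOrigin.Begin);         // seek: move the current position

            byte[] target = new byte[source.Length];
            // read: stream -> byte array; the return value is the count actually read
            int bytesRead = stream.Read(target, 0, target.Length);
            return Encoding.ASCII.GetString(target, 0, bytesRead);
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("hello stream"));
    }
}
```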

Before moving to a discussion of Microsoft's Stream-derived classes, the base class members need to be explained, because any class derived from the base Stream class must override the base members that are themselves marked "abstract" (a requirement of an "abstract" class). If a Stream-based class does not support a member of the base class, the member must still be overridden, but it will typically throw a NotSupportedException when a user attempts to use it. To avoid having to catch those exceptions, a user can either check the documentation for the class or check a few properties for supported functionality.

The best example for this scenario is the case of a Stream-based class that does not use a backing store, such as a NetworkStream. The Stream class has several properties that are used to specify the capabilities of a Stream: CanRead, CanSeek, and CanWrite. The CanRead property is a Boolean identifying whether the current Stream supports reading. If the property is false, methods such as Read and BeginRead will throw a NotSupportedException. The CanWrite property is a Boolean identifying whether the current Stream supports writing. If the property is false, methods such as Write and BeginWrite will throw a NotSupportedException. The CanSeek property falls into the same category. When CanSeek is false, the Seek method is not supported, and neither are the Position and Length properties or the SetLength method. All of these members are related to seeking in a Stream, which is unavailable without a backing store. Thus, a NetworkStream will typically return true for everything but CanSeek.
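As a rough sketch of checking capabilities before acting, the following uses a read-only MemoryStream (created with the constructor overload that takes a "writable" flag) so that CanWrite is false without needing a network connection. The class CapabilityCheck and its methods are hypothetical helpers written for this article, not Framework members.

```csharp
using System;
using System.IO;

// Hypothetical helper class; the names are illustrative only.
public static class CapabilityCheck
{
    // Summarizes what a stream reports about itself.
    public static string Describe(Stream s)
    {
        return "CanRead=" + s.CanRead + ", CanWrite=" + s.CanWrite + ", CanSeek=" + s.CanSeek;
    }

    // Writes only if the stream supports it, instead of catching NotSupportedException.
    public static bool TryWrite(Stream s, byte[] data)
    {
        if (!s.CanWrite) return false;
        s.Write(data, 0, data.Length);
        return true;
    }

    public static void Main()
    {
        byte[] data = { 1, 2, 3 };
        using (Stream readOnly = new MemoryStream(data, false)) // writable = false
        using (Stream readWrite = new MemoryStream())
        {
            Console.WriteLine(Describe(readOnly));
            Console.WriteLine(TryWrite(readOnly, data));   // False: write is skipped
            Console.WriteLine(TryWrite(readWrite, data));  // True
        }
    }
}
```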

The constructor in the Stream class cannot be called directly, as the base class cannot be instantiated. Derived classes, on the other hand, make extensive use of constructors depending on their specific functionality. Often, the constructor parameters reflect the backing store that the class wraps, as we will later explore with the FileStream class.

The true meat of the Stream class is the Read and Write methods. These are the mechanisms that move data in and out of the Stream. In fact, if the asynchronous versions, BeginRead and BeginWrite, are not overridden in a derived class, they will call the derived class's overridden Read and Write members. The Write method takes three parameters in the base class: the actual data, represented as an array of bytes (buffer); an offset value specifying the point in the array of bytes from which to begin taking data (offset); and the amount of data to be written (count). Assuming that a byte array called anyData already exists and holds some data, and that myStream is an instance of a Stream-based class that uses the Write method exactly as declared in the base Stream, the code will look like this:

[Visual Basic]
myStream.Write (anyData, 0, anyData.Length)

[C#]
myStream.Write (anyData, 0, anyData.Length);

The first parameter is the byte array. The second parameter is 0, as that is the position in the byte array from which I want to start copying, and the last parameter tells Write that I want to copy everything from anyData into the Stream. The Write method is often overloaded with several options, but under the hood the implementer is writing the data as specified above. Don't forget to check the CanWrite property to see whether the Stream will allow the use of the Write method.
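To make the offset and count parameters concrete, here is a small sketch that writes only part of a byte array. WriteSlice and WritePart are hypothetical names chosen for this illustration; a MemoryStream is assumed as the target so the result can be inspected with ToArray.

```csharp
using System;
using System.IO;

// Hypothetical demo class; the names are illustrative only.
public static class WriteSlice
{
    // Writes buffer[offset] through buffer[offset + count - 1] and returns
    // the bytes that ended up in the stream.
    public static byte[] WritePart(byte[] buffer, int offset, int count)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            stream.Write(buffer, offset, count);
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        byte[] anyData = { 10, 20, 30, 40, 50 };
        byte[] part = WritePart(anyData, 1, 3); // skips the first byte, writes three
        Console.WriteLine(string.Join(",", Array.ConvertAll(part, b => b.ToString())));
    }
}
```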

The Read method is similar but is used for reading data from the Stream into a buffer such as a byte array. Just like Write, the method takes three parameters: a byte array buffer for storing the incoming bytes (buffer), an offset value for that byte array specifying where in the array to begin storing the data (offset), and a value specifying the maximum amount of data to read from the Stream into the array (count). Unlike Write, the method returns an integer specifying how many bytes were actually read from the Stream. The return value may be less than the amount requested in the count parameter, as the Stream may not have that much data available. A value of 0 indicates that the end of the Stream has been reached. The CanRead property is used to determine whether the Stream supports the Read method.

When using the Read method, two other properties come into play: Position and Length. The value of Position moves as data is read (this also applies when data is written), always pointing to the current location in the Stream. When Position equals the value of Length, which is the length of the data in the Stream, the end of the Stream has been reached. If Position is set to 0, the next byte to be read will be the first one. As you may recall, a Stream without a backing store does not provide a value for Position or Length, and in that circumstance the return value of Read must be used. Be aware that when working with a closed Stream (when the Close method of a Stream has been called), an ObjectDisposedException will be thrown if a Read or a Write is attempted.

A simple example of using Read with myStream (myStream is a Stream-based class):

[Visual Basic]
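'Visual Basic array bounds are upper indexes: 0 To 499 gives 500 bytes.
Dim anyData(499) As Byte
'Starting at the beginning of the anyData buffer and at the beginning
'of the Stream, read up to 500 bytes from the Stream into the buffer.
myStream.Position = 0
'bytesRead holds the number of bytes actually read (0 at end of Stream).
Dim bytesRead As Integer = myStream.Read(anyData, 0, anyData.Length)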

[C#]
byte[] anyData = new byte[500];
//Starting at the beginning of the anyData buffer and at the beginning
//of the Stream, read up to 500 bytes from the Stream into the buffer.
myStream.Position = 0;
//bytesRead holds the number of bytes actually read (0 at end of Stream).
int bytesRead = myStream.Read(anyData, 0, anyData.Length);
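Because Read may return fewer bytes than requested, a common pattern is to loop until the buffer is full or Read returns 0. The following is a minimal sketch of that pattern; ReadLoop and ReadFully are hypothetical names, and a MemoryStream holding 300 bytes stands in for any readable Stream.

```csharp
using System;
using System.IO;

// Hypothetical helper class; the names are illustrative only.
public static class ReadLoop
{
    // Fills the buffer as far as the stream allows and returns the byte count read.
    public static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;   // 0 means the end of the stream was reached
            total += n;
        }
        return total;
    }

    public static void Main()
    {
        byte[] anyData = new byte[500];
        using (Stream myStream = new MemoryStream(new byte[300]))
        {
            // Only 300 bytes are available, so the buffer is not filled.
            Console.WriteLine(ReadFully(myStream, anyData)); // 300
        }
    }
}
```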

Microsoft .NET Streams also have the ability to act asynchronously. An asynchronous method call is non-blocking. What this means is that the method returns immediately and the caller continues with the next line of code, even though the operation may not be complete. The completion of the operation is signaled by using a delegate, and an asynchronous method will also return an IAsyncResult object that can be used to identify the status of the asynchronous operation. Asynchronous operations often run on separate threads and are used when a particular operation may be time consuming. Often you do not want an application waiting for a long operation to complete before processing other code. When an application waits for a method to complete, the operation is called blocking, meaning that all other activities on that thread are blocked until the method returns. In Visual Basic, long-running blocking code often calls DoEvents to make sure that user-interface events continue to be processed while the operation is underway.

The Stream base class includes the BeginRead and BeginWrite methods for asynchronous functionality. Asynchronous usage is often considered "advanced" functionality. While an in-depth discussion of asynchronous programming and the use of delegates is beyond the scope of this article, a basic description of asynchronous usage is appropriate.

While both methods are similar to their synchronous counterparts, they take two extra parameters. The first is callback, a delegate that will be invoked when the asynchronous method call is complete. Callbacks are one of the more complex aspects of traditional C++, but are very similar to Events in usage. This parameter is optional, as the Stream provides other notification mechanisms, so a null value can be used, although using a callback is the preferred method of notification. The second extra parameter is state, which serves as a way to associate information with an asynchronous method call. For example, if four asynchronous calls are made, there is no guarantee that those four calls will complete in the same order. By passing some type of identifier, such as a unique ID, each completed call can be identified and "state" maintained.

As both BeginRead and BeginWrite are asynchronous, identifying when an operation completes is crucial. The callback approach was mentioned earlier, but the Begin methods also have matching End methods, EndRead and EndWrite, that are used to complete the asynchronous operation. When the EndRead method is called, the IAsyncResult object returned from BeginRead is passed as a parameter. The method will block until the operation completes and will return an integer that reports the number of bytes read from the Stream. Likewise, every BeginWrite call is paired with a call to EndWrite. The IAsyncResult object returned from the BeginWrite call is passed to the EndWrite method to identify which asynchronous call is being completed; EndWrite blocks until that write has finished. Calling EndRead or EndWrite inside the callback routine is the optimal approach for building an asynchronous architecture with the Stream.
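The callback pattern described above might look roughly like the following. This is a sketch under some assumptions: AsyncReadDemo and ReadWithCallback are hypothetical names, a MemoryStream stands in for a slower Stream, the Stream itself is passed as the state object, and a ManualResetEvent is used only so that the demo can wait for the callback before returning.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical demo class; the names are illustrative only.
public static class AsyncReadDemo
{
    // Starts an asynchronous read and waits for the callback to report the result.
    public static int ReadWithCallback(Stream stream, byte[] buffer)
    {
        int bytesRead = -1;
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            AsyncCallback callback = delegate (IAsyncResult ar)
            {
                // The state object passed to BeginRead comes back as AsyncState.
                Stream s = (Stream)ar.AsyncState;
                bytesRead = s.EndRead(ar);   // completes the operation
                done.Set();                  // tell the caller we are finished
            };

            stream.BeginRead(buffer, 0, buffer.Length, callback, stream);
            // A real application would do other work here instead of waiting.
            done.WaitOne();
        }
        return bytesRead;
    }

    public static void Main()
    {
        using (Stream stream = new MemoryStream(new byte[] { 1, 2, 3, 4, 5 }))
        {
            Console.WriteLine(ReadWithCallback(stream, new byte[16])); // 5
        }
    }
}
```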

A Stream has one other read and write mechanism, designed to read or write a single byte. When the ReadByte method is called, the byte value at the current position is returned and the position of the Stream is advanced by one. The byte value is returned as an Integer. If the return value is -1, the current position is at the end of the Stream. WriteByte works in a similar fashion. When calling the method, a Byte value is passed as a parameter and that value is written to the Stream at the current position. After the operation, the Position value is incremented by one. In the base implementation, a byte array with one element is created internally and then the Read or Write method is called, depending on the action.
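A short sketch of the single-byte methods follows, copying one Stream to another a byte at a time until ReadByte returns -1. ByteAtATime and Copy are hypothetical names; this is meant to show the return-value convention rather than to be an efficient way to copy data.

```csharp
using System;
using System.IO;

// Hypothetical demo class; the names are illustrative only.
public static class ByteAtATime
{
    // Copies input to output one byte at a time and returns the number of bytes copied.
    public static int Copy(Stream input, Stream output)
    {
        int count = 0;
        int value;
        // ReadByte returns an int so that -1 can signal the end of the stream.
        while ((value = input.ReadByte()) != -1)
        {
            output.WriteByte((byte)value);
            count++;
        }
        return count;
    }

    public static void Main()
    {
        using (MemoryStream input = new MemoryStream(new byte[] { 7, 8, 9 }))
        using (MemoryStream output = new MemoryStream())
        {
            Console.WriteLine(Copy(input, output)); // 3
            Console.WriteLine(output.Position);     // 3: advanced once per WriteByte
        }
    }
}
```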

If a Stream is writable and supports seeking, the SetLength method is used to set the length of the Stream in bytes. If the new length value is shorter than the current Stream size, the data will be truncated. If the value is longer than the current size, the Stream size is expanded, but the contents of the extra space depend on the implementation. Different types of Streams may react differently based on how the SetLength method is implemented. For example, a BufferedStream will internally call the Flush method (discussed below) before changing the length. When used with a MemoryStream, if the SetLength method is used to truncate a Stream and the Position value is greater than the resulting length of the Stream, the Position will be set to the end; ReadByte will then return -1 (and Read will return 0), indicating that the current position is at the end of the Stream. In most circumstances, when the required length of a Stream is known beforehand, using SetLength is good practice, but the determining factor should be the underlying purpose of the Stream.
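The MemoryStream truncation behavior described above can be observed with a few lines of code. SetLengthDemo and TruncateAndProbe are hypothetical names used only for this sketch.

```csharp
using System;
using System.IO;

// Hypothetical demo class; the names are illustrative only.
public static class SetLengthDemo
{
    // Writes initialSize bytes, truncates to newLength, and reports
    // { Length, Position, result of ReadByte } after the truncation.
    public static long[] TruncateAndProbe(int initialSize, int newLength)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            stream.Write(new byte[initialSize], 0, initialSize); // Position is now at the end
            stream.SetLength(newLength);                          // truncate the stream
            long length = stream.Length;
            long position = stream.Position;                      // pulled back to the new end
            int next = stream.ReadByte();                         // -1: nothing left to read
            return new long[] { length, position, next };
        }
    }

    public static void Main()
    {
        long[] result = TruncateAndProbe(10, 4);
        Console.WriteLine("Length=" + result[0] + ", Position=" + result[1] + ", ReadByte=" + result[2]);
    }
}
```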

Finally, there are two methods that are used to indicate that activity on the Stream is at some "end process" and that action should be taken based on the situation. The Flush method, while specific to the type of Stream being created, is used to clear the buffers and write any buffered data to a backing store or "end point". One example is a Stream that is currently holding data as a result of a write operation. When Flush is called, the data is copied to a file (as an example) and the internal buffer is then emptied so that it can be filled again from its beginning. At a minimum, Flush will clear the buffers and finish writing, but when overridden in a derived class the method can be used for a number of operations. Close is another "ending" operation, but with the intention of shutting down the read-write capabilities of a Stream. Unlike Flush, Close is final on a Stream, preventing further reading or writing and releasing any read-write related resources. A closed Stream will throw an ObjectDisposedException when any read or write related operation is attempted. While a closed Stream is usually not very useful, a Stream-derived class may still hold information from the previous read-write operations that is needed by the application. Some implementations of Close will call Flush internally.
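As an illustration of both points, the sketch below closes a MemoryStream, shows that a further write fails with an ObjectDisposedException, and then retrieves the data written earlier through ToArray, which MemoryStream still allows after Close. ClosedStreamDemo and its method names are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical demo class; the names are illustrative only.
public static class ClosedStreamDemo
{
    // Returns true if writing to a closed stream throws ObjectDisposedException.
    public static bool WriteAfterCloseThrows(MemoryStream stream)
    {
        stream.Close();
        try
        {
            stream.WriteByte(43);
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }

    public static void Main()
    {
        MemoryStream stream = new MemoryStream();
        stream.WriteByte(42);
        stream.Flush(); // nothing to push out for a MemoryStream, but harmless

        Console.WriteLine(WriteAfterCloseThrows(stream)); // True
        // The bytes written before Close are still available.
        Console.WriteLine(stream.ToArray().Length);       // 1
    }
}
```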

Of all the Stream-derived classes in the .NET Framework, the FileStream is the most likely to be used and is thus the perfect candidate for an example of a Stream-based implementation. Unlike the infamous file handling of previous Visual Basic versions, .NET wraps a Stream around a file using the FileStream class. Presenting a file as a Stream is a perfect fit, as you would want to read a certain amount of data from a file, write a certain amount of data to a file, and be able to access any part of the file at will. The FileStream is intended as a generic file wrapper and acts as the programmatic representation of a file, allowing for file access control and buffering. Still, as the nature of FileStream is generic, read and write operations are conducted using byte arrays. Always accessing a file at the byte level is not ideal, and for that reason other classes are available that take the FileStream object in their constructor and expose the actual FileStream through their own BaseStream property. For example, the StreamWriter class will take a FileStream in its constructor and has an overload that takes an Encoding to specify the encoding mechanism to use (such as System.Text.Encoding.UTF8). Methods such as WriteLine write data to the Stream followed by a line terminator.

An example:

[Visual Basic]
Dim fs As New FileStream("c:\temp\testfile.txt", FileMode.CreateNew, _
    FileAccess.Write, FileShare.None)

Dim sw As New StreamWriter(fs)

sw.WriteLine("This is a test")
sw.WriteLine("This is line 2")
sw.Close()
fs.Close()

[C#]
FileStream fs = new FileStream("c:\\temp\\testfile.txt",
FileMode.CreateNew, FileAccess.Write, FileShare.None);
            
StreamWriter sw = new StreamWriter (fs);

sw.WriteLine("This is a test");
sw.WriteLine("This is line 2");
sw.Close();
fs.Close();

This code creates a file called testfile.txt in the c:\temp directory. Because FileMode.CreateNew is used, an IOException will be thrown if the file already exists. The file is opened with write access and no file sharing is allowed until the file is closed. Two lines of text are written to the file, and each line includes termination characters. A total of 32 bytes are written to testfile.txt: each line is 14 characters plus a two-byte carriage return/line feed terminator. If the StreamWriter were not used, this data would have to be written as a byte array and the line termination would have to be added manually. A StreamReader would be used to read the data back, with methods such as ReadLine reading in a line as a string. For raw data reading and writing, the BinaryWriter and BinaryReader classes can be used; they operate on the same principle as the StreamWriter and StreamReader.
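Reading the file back with a StreamReader might look something like the following. This is a sketch under a few assumptions: TextFileRoundTrip and WriteThenRead are hypothetical names, a temporary file from Path.GetTempFileName is used instead of c:\temp so the example can run anywhere, and FileMode.Create is used so that the existing temporary file is overwritten rather than causing an exception.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical demo class; the names are illustrative only.
public static class TextFileRoundTrip
{
    // Writes two lines through a StreamWriter, then reads them back with a StreamReader.
    public static string[] WriteThenRead(string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        using (StreamWriter sw = new StreamWriter(fs))
        {
            sw.WriteLine("This is a test");
            sw.WriteLine("This is line 2");
        } // disposing the writer flushes it and closes the underlying FileStream

        List<string> lines = new List<string>();
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (StreamReader sr = new StreamReader(fs))
        {
            string line;
            // ReadLine returns null at the end of the stream and strips the terminator.
            while ((line = sr.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines.ToArray();
    }

    public static void Main()
    {
        string path = Path.GetTempFileName(); // assumption: a throwaway temp file
        try
        {
            foreach (string line in WriteThenRead(path))
            {
                Console.WriteLine(line);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```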

The FileStream has a number of constructor overloads (nine at the time of writing) for specifying the file to wrap and how the file should be treated. Along with the constructors, the Lock method controls access to the file by other processes and the Name property returns the name of the file passed in the constructor. Otherwise, the FileStream resembles the base Stream class.

Data management has come a long way since Visual Basic 6. Through the use of Streams, data can be moved not only with complete flexibility but, more importantly, with only a few lines of code. Diverse data can now be standardized across a single interface and used in a cohesive manner. Understanding Stream usage is one of the key elements of being a successful .NET developer, both for designing your own code and for integrating third-party solutions.