A bytestream is a sequence of bytes. Typically, each byte is from a range of 256 distinct values (octets), so the term octet stream is sometimes used to refer to the same thing. An octet may be encoded as a sequence of 8 bits in several different ways (see endianness), so there is no unique, direct translation between bytestreams and bitstreams.
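A minimal Python sketch of why the translation is not unique: the same two bytes yield different bitstreams depending on whether each octet is serialized most-significant bit first or least-significant bit first. The function names are hypothetical, chosen for this illustration.

```python
def byte_to_bits(byte, msb_first=True):
    """Serialize one octet as a list of 8 bits in the chosen order."""
    order = range(7, -1, -1) if msb_first else range(8)
    return [(byte >> i) & 1 for i in order]


def bytes_to_bitstream(data, msb_first=True):
    """Concatenate the bits of every byte into a string of '0' and '1'."""
    return "".join(
        str(bit) for b in data for bit in byte_to_bits(b, msb_first)
    )


data = bytes([0x41, 0x01])  # 'A' followed by 0x01
print(bytes_to_bitstream(data, msb_first=True))   # 0100000100000001
print(bytes_to_bitstream(data, msb_first=False))  # 1000001010000000
```

Both bitstreams carry the same two bytes; a receiver must know which bit order the sender used to recover them.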
Bitstreams and bytestreams are used extensively in telecommunications and computing. For example, Synchronous Digital Hierarchy transports synchronous bitstreams, and Transmission Control Protocol transports an asynchronous bytestream.
Definition of bytestream
Formally, a bytestream is a certain abstraction, a communication channel down which one entity can send a sequence of bytes to the entity on the other end. Such a channel is often bidirectional, but sometimes unidirectional. In almost all instances the channel is reliable; that is, exactly the same bytes emerge, in exactly the same order, at the other end.
Less formally, one can think of it as a conduit between the two entities; one entity can insert bytes into the conduit, and the other entity then receives them. This conduit can be transient or persistent.
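As a minimal sketch of this abstraction, assuming Python's standard `socket` module, `socket.socketpair()` creates a connected, bidirectional stream channel inside one process. The helper `recv_exactly` is hypothetical, written for this example; it is needed because a single read may return fewer bytes than were written.

```python
import socket

# Two connected stream sockets stand in for the entities at either end.
left, right = socket.socketpair()

# The sender writes in three separate chunks...
for chunk in (b"hel", b"lo, ", b"world"):
    left.sendall(chunk)


def recv_exactly(sock, n):
    """Read until exactly n bytes have arrived; one recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("channel closed early")
        buf.extend(part)
    return bytes(buf)


# ...but the receiver sees one ordered sequence of bytes, with no record
# of where one write ended and the next began.
received = recv_exactly(right, 12)
print(received)  # b'hello, world'

# The same channel also carries bytes in the opposite direction.
right.sendall(b"ack")
reply = recv_exactly(left, 3)
print(reply)  # b'ack'

left.close()
right.close()
```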
In practice, bitstreams are not used directly to encode bytestreams; a communication channel may use a signalling method that does not directly translate to bits (for instance, by transmitting signals of multiple frequencies) and typically also encodes other information such as framing and error correction together with its data.
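To illustrate how framing adds bits that are not part of the bytestream itself, the following Python sketch uses the common asynchronous serial framing often called 8N1 (used by many UARTs): each byte becomes a start bit (0), eight data bits sent least-significant bit first, and a stop bit (1). It is one simple scheme chosen for illustration; it omits error correction and the physical signalling, and the function names are hypothetical.

```python
def frame_8n1(data):
    """Return the line-level bit sequence for data using 8N1 framing."""
    bits = []
    for byte in data:
        bits.append(0)                                   # start bit
        bits.extend((byte >> i) & 1 for i in range(8))   # data, LSB first
        bits.append(1)                                   # stop bit
    return bits


def deframe_8n1(bits):
    """Recover the bytes from an 8N1 bit sequence, checking framing bits."""
    if len(bits) % 10:
        raise ValueError("bit count is not a multiple of 10")
    out = bytearray()
    for k in range(0, len(bits), 10):
        frame = bits[k:k + 10]
        if frame[0] != 0 or frame[9] != 1:
            raise ValueError(f"framing error in frame {k // 10}")
        # frame[1:9] holds the data bits, least-significant first
        out.append(sum(bit << i for i, bit in enumerate(frame[1:9])))
    return bytes(out)


line = frame_8n1(b"Hi")
print(len(line))          # 20: 16 data bits plus 4 framing bits
print(deframe_8n1(line))  # b'Hi'
```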
The term bitstream is frequently used to describe the configuration data to be loaded into a field-programmable gate array (FPGA). This usage may have originated from the common method of configuring an FPGA from a serial bit stream, typically from a serial PROM or flash memory chip, although most FPGAs also support byte-parallel loading. The detailed format of the bitstream for a particular FPGA chip is usually considered proprietary to the FPGA vendor.
In mathematics, several specific infinite sequences of bits have been studied for their mathematical properties; these include the Baum–Sweet sequence, Ehrenfeucht–Mycielski sequence, Fibonacci word, Kolakoski sequence, regular paperfolding sequence, Rudin–Shapiro sequence, and Thue–Morse sequence.
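As one example, the Thue–Morse sequence can be generated directly from its definition: bit n is the parity of the number of 1 bits in the binary representation of n. A minimal Python sketch (the generator name is illustrative):

```python
from itertools import islice


def thue_morse():
    """Yield the Thue-Morse sequence: bit n is the parity of popcount(n)."""
    n = 0
    while True:
        yield bin(n).count("1") % 2
        n += 1


first16 = "".join(str(b) for b in islice(thue_morse(), 16))
print(first16)  # 0110100110010110
```

Because the generator never terminates, it models an infinite bitstream; `islice` takes only a finite prefix.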
On most operating systems, including Unix-like systems and Windows, standard I/O libraries convert lower-level paged or buffered file access to a bytestream paradigm. In Unix-like operating systems in particular, each process has three standard streams, which are examples of unidirectional bytestreams. The Unix pipe mechanism provides bytestream communication between different processes.
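The following Python sketch, assuming the standard `subprocess` module, treats a child process's standard input and standard output as two unidirectional bytestreams connected to the parent by pipes. The child program (which reverses its input) is a made-up example.

```python
import subprocess
import sys

# Hypothetical child program: read all of stdin as raw bytes and write
# them back to stdout in reverse order.
CHILD = (
    "import sys; "
    "data = sys.stdin.buffer.read(); "
    "sys.stdout.buffer.write(data[::-1])"
)

# subprocess.run connects the child's stdin and stdout to pipes: the parent
# writes bytes into one pipe and reads the child's output from the other.
result = subprocess.run(
    [sys.executable, "-c", CHILD],
    input=b"bytestream",
    stdout=subprocess.PIPE,
    check=True,
)
print(result.stdout)  # b'maertsetyb'
```

Using `sys.stdin.buffer` and `sys.stdout.buffer` keeps the child working with raw bytes rather than decoded text.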
Compression algorithms often code in bitstreams, because storing every value in a whole 8-bit byte (the smallest addressable unit of memory) may waste space. Although bitstream I/O is typically implemented in low-level languages, some high-level languages such as Python and Java offer native interfaces for it. Such reduced-depth bitstreams may also be accessed by reading several bytes at a time, which may simplify access in some languages (for example, four 6-bit integers fit exactly in three 8-bit bytes, since both total 24 bits).
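The packing described above can be sketched in Python with a pair of hypothetical `BitWriter` and `BitReader` classes (not a standard library API). They write and read values most-significant bit first; that bit order is an assumption made for this example, and a real format would specify its own.

```python
class BitWriter:
    """Pack unsigned integers of arbitrary bit width into bytes, MSB first."""

    def __init__(self):
        self.buffer = bytearray()
        self.acc = 0    # bits not yet flushed into a whole byte
        self.nbits = 0  # number of bits currently held in acc

    def write(self, value, width):
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in the given width")
        self.acc = (self.acc << width) | value
        self.nbits += width
        while self.nbits >= 8:
            self.nbits -= 8
            self.buffer.append((self.acc >> self.nbits) & 0xFF)
        self.acc &= (1 << self.nbits) - 1  # keep only the unflushed bits

    def getvalue(self):
        """Return the packed bytes, zero-padding a final partial byte."""
        if self.nbits:
            return bytes(self.buffer) + bytes([self.acc << (8 - self.nbits)])
        return bytes(self.buffer)


class BitReader:
    """Read unsigned integers of arbitrary bit width from bytes, MSB first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, width):
        value = 0
        for _ in range(width):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


values = [1, 2, 3, 63]  # four 6-bit integers (each in 0..63)
w = BitWriter()
for v in values:
    w.write(v, 6)
packed = w.getvalue()
print(len(packed))  # 3: 24 bits fit exactly in three bytes

r = BitReader(packed)
decoded = [r.read(6) for _ in values]
print(decoded)  # [1, 2, 3, 63]
```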
One well-known example of a communication protocol that provides a bytestream service to its clients is the Transmission Control Protocol (TCP) of the Internet protocol suite, which provides a bidirectional bytestream.
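A minimal Python sketch of TCP's bidirectional bytestream over the loopback interface, assuming Python 3.8 or later (for `socket.create_server` and the `:=` operator). A one-shot echo server, made up for this example, uppercases whatever it receives; the client closes only its sending direction with `shutdown(SHUT_WR)` and keeps reading the other direction, showing that the two directions of the stream are independent.

```python
import socket
import threading

# Listen on loopback; port 0 lets the operating system pick a free port.
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]


def serve_once():
    """Accept one connection and echo bytes back until the client stops sending."""
    conn, _ = server.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: client shut down its sending side
                break
            conn.sendall(data.upper())


worker = threading.Thread(target=serve_once)
worker.start()

with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"one bytestream, ")
    client.sendall(b"two directions")
    # Close only the client-to-server direction; replies can still arrive.
    client.shutdown(socket.SHUT_WR)
    reply = bytearray()
    while chunk := client.recv(4096):
        reply.extend(chunk)

worker.join()
server.close()
print(bytes(reply))  # b'ONE BYTESTREAM, TWO DIRECTIONS'
```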
The Internet media type for an arbitrary bytestream is . Other media types are defined for bytestreams in well-known formats.
Often the contents of a bytestream are dynamically created, such as the data from the keyboard and other peripherals (/dev/tty), data from the pseudorandom number generator /dev/urandom, etc. In those cases, when the destination of a bytestream (the consumer) uses bytes faster than they can be generated, the system uses process synchronization to make the destination wait until the next byte is available. When bytes are generated faster than the destination can use them, there are several techniques to deal with the situation:
- When the producer is a software algorithm, the system pauses the producer with the same process synchronization techniques.
- When the producer supports flow control, the system sends the "ready" signal only when the consumer is ready for the next byte.
- When the producer cannot be paused (for example, a keyboard or other hardware that does not support flow control), the system typically stores the data temporarily until the consumer is ready for it, usually in a double buffer or a queue. Often the receiver can empty the buffer before it fills completely. A producer that keeps producing data faster than it can be consumed, even after the buffer is full, leads to unwanted buffer overflow, packet loss, and network congestion.
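The first and third cases above can be sketched in Python with the standard `queue.Queue`, a thread-safe bounded buffer. In this minimal example (the buffer size and data are arbitrary), a software producer is paused by a blocking `put()` and the consumer waits in a blocking `get()`; a producer that cannot be paused instead uses `put_nowait()`, and bytes that arrive while the buffer is full are lost.

```python
import queue
import threading

# Bounded buffer between producer and consumer; maxsize is arbitrary.
buf = queue.Queue(maxsize=4)
received = bytearray()


def pausable_producer(data):
    """Software producer: put() blocks while the buffer is full."""
    for byte in data:
        buf.put(byte)
    buf.put(None)  # sentinel marking the end of the stream


def consumer():
    """get() blocks while the buffer is empty, so the consumer waits."""
    while (byte := buf.get()) is not None:
        received.append(byte)


t_prod = threading.Thread(target=pausable_producer, args=(b"no bytes lost",))
t_cons = threading.Thread(target=consumer)
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
print(bytes(received))  # b'no bytes lost'

# A producer that cannot be paused must not block: while nobody consumes,
# bytes that arrive after the buffer is full are dropped (overflow).
dropped = 0
for byte in b"burst of input":
    try:
        buf.put_nowait(byte)
    except queue.Full:
        dropped += 1
print(buf.qsize(), dropped)  # 4 10
```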