Remote Streaming

The first classes most users will encounter in RMIIO are the remote streams, RemoteInputStream and RemoteOutputStream. A RemoteInputStream moves data across the network using a pull-based approach: the consumer of the stream pulls the data from the producer. A RemoteOutputStream, by contrast, uses a push-based approach: the producer of the data pushes it to the consumer.
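The difference between the two directions can be sketched without any networking at all. The following is a minimal illustration, not RMIIO code: PullSource, PushSink, and PullPushSketch are hypothetical names invented for this example, and the chunk size is arbitrary. In the pull case the consumer owns the loop; in the push case the producer does.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical pull-style source: the consumer asks for the next chunk. */
interface PullSource {
  /** Returns the next chunk, or null when the data is exhausted. */
  byte[] nextChunk();
}

/** Hypothetical push-style sink: the producer hands over each chunk. */
interface PushSink {
  void accept(byte[] chunk);
}

final class PullPushSketch {
  static final int CHUNK_SIZE = 4; // arbitrary size chosen for the example

  /** Pull: the consumer drives the transfer by repeatedly calling the source. */
  static byte[] consumeByPulling(PullSource source) {
    ByteArrayOutputStream received = new ByteArrayOutputStream();
    byte[] chunk;
    while ((chunk = source.nextChunk()) != null) {
      received.write(chunk, 0, chunk.length);
    }
    return received.toByteArray();
  }

  /** Push: the producer drives the transfer by repeatedly calling the sink. */
  static void produceByPushing(byte[] data, PushSink sink) {
    for (int off = 0; off < data.length; off += CHUNK_SIZE) {
      int end = Math.min(data.length, off + CHUNK_SIZE);
      sink.accept(Arrays.copyOfRange(data, off, end));
    }
  }

  /** Builds an in-memory PullSource that serves the data chunk by chunk. */
  static PullSource sourceOf(byte[] data) {
    return new PullSource() {
      private int pos = 0;

      @Override
      public byte[] nextChunk() {
        if (pos >= data.length) {
          return null;
        }
        int end = Math.min(data.length, pos + CHUNK_SIZE);
        byte[] chunk = Arrays.copyOfRange(data, pos, end);
        pos = end;
        return chunk;
      }
    };
  }
}
```

In a real deployment each nextChunk or accept call would be a remote method invocation, which is why the choice of direction also determines which side must be reachable over the network.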

The actual remote stream implementations are:

  • Input
    • SimpleRemoteInputStream - RemoteInputStream implementation which does not use any compression on the wire. This may be useful if the client and server are on the same box, or if network bandwidth is not an issue.
    • GZIPRemoteInputStream - RemoteInputStream implementation which uses GZIP compression over the wire. For moving data of any non-trivial size, this is probably the preferred implementation. Note that it does trade off some extra CPU usage on the client and server for the reduction in network bandwidth.
    • DirectRemoteInputStream - RemoteInputStream implementation which can be used as a last-ditch alternative for certain problematic scenarios. This implementation is RMI specific and not recommended for general use. However, it will not have any problems with firewalls, and therefore may be useful when no other alternatives are viable. Note that the sequence diagrams below do not apply to this class (there are no additional RMI invocations other than the initial call). Please see the class documentation for a complete list of pros and cons before using this implementation.
  • Output
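To get a feel for the CPU-versus-bandwidth trade-off that the GZIP variants make, the compression step can be tried in isolation with the standard java.util.zip classes. This is only a sketch of the general technique; GzipTradeoffSketch is a hypothetical helper, and it makes no claim about how RMIIO frames or buffers its compressed packets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipTradeoffSketch {
  /** Sender side: spends CPU to shrink the bytes that go over the wire. */
  static byte[] compress(byte[] raw) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
      gzip.write(raw);
    }
    return bytes.toByteArray();
  }

  /** Receiver side: spends CPU to restore the original bytes. */
  static byte[] decompress(byte[] compressed) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPInputStream gzip =
        new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buf = new byte[1024];
      int n;
      while ((n = gzip.read(buf)) != -1) {
        bytes.write(buf, 0, n);
      }
    }
    return bytes.toByteArray();
  }
}
```

Comparing compress(data).length with data.length for a representative payload gives a quick indication of whether the compressed implementation is worth the extra CPU. Data that is already compressed (images, archives) typically gains little.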

The receiver of a remote stream will generally want to interact with the stream as a normal Java InputStream or OutputStream. This is done by wrapping the remote stream, and the resulting wrappers include many useful features like automatic retries and buffering. The transformation can be accomplished by using static methods to wrap the remote streams:
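As a rough illustration of what such a wrapper has to do, the sketch below presents a packet-at-a-time source as an ordinary InputStream. It deliberately does not use the RMIIO API: PacketSource and PacketInputStream are hypothetical names, the packet source stands in for a remote stub, and the retry behavior mentioned above is left out.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a remote stream stub: each call returns one packet. */
interface PacketSource {
  /** Returns the next packet, or null once the stream is exhausted. */
  byte[] readPacket() throws IOException;
}

/** Presents a PacketSource as a normal InputStream, buffering one packet at a time. */
final class PacketInputStream extends InputStream {
  private final PacketSource source;
  private byte[] current = new byte[0];
  private int pos = 0;
  private boolean eof = false;

  PacketInputStream(PacketSource source) {
    this.source = source;
  }

  /** Fetches packets until one has unread bytes; returns false at end of stream. */
  private boolean fill() throws IOException {
    while (!eof && pos >= current.length) {
      byte[] next = source.readPacket();
      if (next == null) {
        eof = true;
      } else {
        current = next;
        pos = 0;
      }
    }
    return pos < current.length;
  }

  @Override
  public int read() throws IOException {
    return fill() ? (current[pos++] & 0xFF) : -1;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    if (len == 0) {
      return 0;
    }
    if (!fill()) {
      return -1;
    }
    // Serve as much as possible from the buffered packet without another call.
    int n = Math.min(len, current.length - pos);
    System.arraycopy(current, pos, buf, off, n);
    pos += n;
    return n;
  }
}
```

Code that receives the wrapped stream can then pass it to any API that expects an InputStream, without knowing that each buffer refill may be a remote call.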

RemoteInputStream Sequence Diagrams

These sequence diagrams show the major interactions between a client and server when the client initiates a call to the server involving a RemoteInputStream. Notice that the "Client" and "Server" objects are named purely based on who initiates the RPC interaction, but that the actual RPC calls may go both ways. "Figure 2" may be the desired scenario when dealing with firewall issues.

RemoteInputStream Sequence Diagram
Figure 1. Client instantiated RemoteInputStream

RemoteInputStream Sequence Diagram
Figure 2. Server instantiated RemoteInputStream

RemoteOutputStream Sequence Diagrams

These are the corresponding sequence diagrams for the RemoteOutputStream. Notice again that the "Client" and "Server" objects are named purely based on who initiates the RPC interaction, but that the actual RPC calls may go both ways. "Figure 3" may be the desired scenario when dealing with firewall issues.

RemoteOutputStream Sequence Diagram
Figure 3. Server instantiated RemoteOutputStream

RemoteOutputStream Sequence Diagram
Figure 4. Client instantiated RemoteOutputStream

Serializable InputStream and OutputStream

It is also possible to send something remotely which already is an InputStream or an OutputStream. While this is not usually necessary and adds a slight overhead on the receiving end, there may be times when this is desired (such as a reflective framework which makes any method call work remotely).

Remote Iteration

The RemoteIterator classes are built on top of the RemoteInputStream. They provide functionality for iterating over large collections of objects which may not fit into memory all at once. One example usage could be streaming a table from one database to another, where the actual table has millions of rows.

While the RemoteIterator implementations allow for custom object serialization, most developers will be able to simply use the concrete implementations which utilize standard Java Serialization. Because of this added flexibility, however, RemoteIterators are slightly more complicated to use than remote streams. The developer needs to instantiate both a server object (which serializes objects and pushes them into the underlying RemoteInputStream) and a client object (which deserializes the objects on the other end).

The server/client implementations are:

  • SerialRemoteIteratorServer - serializes objects using standard Java Serialization.
  • SerialRemoteIteratorClient - deserializes objects using standard Java Serialization.
  • SimpleRemoteIterator - wraps a simple Java collection as a RemoteIterator. This is useful for passing a small collection of objects to a method which expects a RemoteIterator. Instead of streaming the data, the entire collection is merely serialized as part of the "client" object using normal Java Serialization.
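The division of labor between the server side (serialize into a byte stream) and the client side (deserialize from it) can be sketched with plain Java Serialization. This is an illustration of the idea only: SerialIterationSketch and its wire format (a boolean "more elements follow" marker before each object) are assumptions made for this example, not the format RMIIO uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Iterator;
import java.util.function.Consumer;

final class SerialIterationSketch {
  /** "Server" side: serializes elements one at a time into a byte stream. */
  static byte[] serializeAll(Iterator<? extends Serializable> items)
      throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      while (items.hasNext()) {
        out.writeBoolean(true); // marker: another element follows
        out.writeObject(items.next());
        out.reset(); // forget written objects so memory use stays flat
      }
      out.writeBoolean(false); // marker: end of iteration
    }
    return bytes.toByteArray();
  }

  /** "Client" side: deserializes elements one at a time and hands each to the consumer. */
  static <T> int deserializeEach(
      InputStream wire, Class<T> type, Consumer<? super T> consumer)
      throws IOException, ClassNotFoundException {
    int count = 0;
    try (ObjectInputStream in = new ObjectInputStream(wire)) {
      while (in.readBoolean()) {
        consumer.accept(type.cast(in.readObject()));
        count++;
      }
    }
    return count;
  }
}
```

Because each element is written and read individually, neither side needs to hold the whole collection in memory. The byte array here merely stands in for the underlying remote stream.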

RemoteIterator Sequence Diagrams

These sequence diagrams show the major interactions between a client and server when the client initiates a call to the server involving a RemoteIterator. Notice that the "Client" and "Server" objects are named purely based on who initiates the RPC interaction, but that the actual RPC calls may go both ways.

RemoteIterator Sequence Diagram
Figure 5. Client instantiated RemoteIterator

RemoteIterator Sequence Diagram
Figure 6. Server instantiated RemoteIterator

Local Iterators

The local source for a SerialRemoteIteratorServer is always an IOIterator. This is similar to an Iterator, except that the methods throw IOException. There are a variety of useful methods and classes for creating IOIterators:

  • RmiioUtil.adapt will turn any Iterator into an IOIterator.
  • AbstractCloseableIOIterator - Any non-trivial IOIterator implementation should generally extend this class. This abstract class manages the complexity around ensuring that local resources are closed in a timely manner.
  • IOIteratorPipe - In more advanced applications, it may become necessary to generate the data for an IOIterator in a separate thread or threads. An IOIteratorPipe takes much of the pain out of moving the data from the threads producing the data to the thread consuming the data via an IOIterator.
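The shape of an iterator whose methods may throw IOException, and the idea behind adapting a plain Iterator to it, can be sketched as follows. SketchIOIterator and IOIteratorSketch are hypothetical stand-ins written for this example from the description above; they are not the RMIIO types, and the real interfaces may differ in detail.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in: like Iterator, but the methods may throw IOException. */
interface SketchIOIterator<T> {
  boolean hasNext() throws IOException;

  T next() throws IOException;
}

final class IOIteratorSketch {
  /** Adapts a plain Iterator; the adapted methods simply never throw. */
  static <T> SketchIOIterator<T> adapt(Iterator<? extends T> iter) {
    return new SketchIOIterator<T>() {
      @Override
      public boolean hasNext() {
        return iter.hasNext();
      }

      @Override
      public T next() {
        return iter.next();
      }
    };
  }

  /** An iterator backed by real I/O: yields the lines of a reader. */
  static SketchIOIterator<String> lines(BufferedReader reader) {
    return new SketchIOIterator<String>() {
      private String nextLine;
      private boolean loaded; // true when nextLine holds the upcoming result

      @Override
      public boolean hasNext() throws IOException {
        if (!loaded) {
          nextLine = reader.readLine();
          loaded = true;
        }
        return nextLine != null;
      }

      @Override
      public String next() throws IOException {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        loaded = false;
        return nextLine;
      }
    };
  }
}
```

The lines example shows why the checked exception matters: a source that reads from a file, socket, or database cursor can fail mid-iteration, and a plain Iterator has no clean way to report that.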

No-Delay Iteration

By default, RemoteIterators batch objects together to reduce network overhead. However, certain scenarios may call for lower latency at the expense of added network overhead. One example is a logging facility where the RemoteIterator objects are log messages. The receiver may want to receive the log messages reasonably close to when they were generated. This can be achieved by disabling compression and enabling the "noDelay" flag on the RemoteIterator. Note that this means every object will be sent from the server to the client in a separate remote method call.
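The latency-versus-overhead trade-off can be made concrete with a small counting experiment. BatchingSender below is a hypothetical class written for this example; it batches by object count purely for simplicity, and each invocation of its transport callback stands in for one remote method call. It is not how RMIIO implements batching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sender: every transport invocation stands in for one remote call. */
final class BatchingSender<T> {
  private final int batchSize;
  private final boolean noDelay;
  private final Consumer<List<T>> transport;
  private final List<T> pending = new ArrayList<>();

  BatchingSender(int batchSize, boolean noDelay, Consumer<List<T>> transport) {
    this.batchSize = batchSize;
    this.noDelay = noDelay;
    this.transport = transport;
  }

  /** Queues the item; sends immediately when noDelay is set or the batch is full. */
  void send(T item) {
    pending.add(item);
    if (noDelay || pending.size() >= batchSize) {
      flush();
    }
  }

  /** Sends whatever is queued as a single call. */
  void flush() {
    if (!pending.isEmpty()) {
      transport.accept(new ArrayList<>(pending));
      pending.clear();
    }
  }
}
```

Sending ten messages with a batch size of four results in three calls (after a final flush), whereas the same ten messages with noDelay enabled result in ten calls, each delivered as soon as the message is produced.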

Custom Serialization

The base RemoteIterator implementations are built in such a way as to allow the developer to implement custom serialization of the data being sent over the wire.

The most likely server/client choices for extension are:

Monitoring

The RemoteStreamMonitor provides hooks for monitoring most of the interesting events that occur in the life of a remote stream. One example use of a RemoteStreamMonitor could be monitoring the progress of a remote stream in order to provide feedback in a UI. In general, implementors of a RemoteStreamMonitor will want to extend either RemoteInputStreamMonitor or RemoteOutputStreamMonitor depending on the type of stream being monitored (note that a RemoteInputStreamMonitor can also be used to monitor a RemoteIterator).

Closing Additional Server Resources

RemoteStreamMonitors also provide a means for cleaning up additional server resources when a remote stream is finished. This is very useful for handling things like database resources or transactions which may need to be kept open for the life of the remote stream.
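The general pattern behind both uses (progress feedback and end-of-stream cleanup) is a callback interface invoked by the code that moves the bytes. The sketch below shows that pattern with hypothetical names: TransferMonitor, its onBytesMoved and onClosed methods, and MonitoredCopy are invented for this example and are not the RemoteStreamMonitor API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical monitor: callbacks for progress and for the end of the transfer. */
interface TransferMonitor {
  void onBytesMoved(int numBytes);

  /** Called exactly once; clean is false if the transfer ended with an error. */
  void onClosed(boolean clean);
}

final class MonitoredCopy {
  /** Copies in to out, reporting progress and the final outcome to the monitor. */
  static long copy(InputStream in, OutputStream out, TransferMonitor monitor)
      throws IOException {
    long total = 0;
    boolean clean = false;
    try {
      byte[] buf = new byte[8]; // tiny buffer so the example reports several events
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
        total += n;
        monitor.onBytesMoved(n);
      }
      clean = true;
      return total;
    } finally {
      // Runs on success and on failure, so cleanup placed here cannot be skipped.
      monitor.onClosed(clean);
    }
  }
}
```

A UI would update a progress bar from the progress callback, while the close callback is the natural place to release resources such as a database connection or transaction that had to stay open for the duration of the stream.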

Custom Server Exporting

The remote stream servers utilize an instance of a RemoteStreamExporter in order to export themselves for use remotely. "Exporting" a remote stream server is the act of making it available for use externally via an RPC framework and generating a stub object which can be used remotely to access the server. The DefaultRemoteStreamExporter manages this process for standard Java RMI. However, there may be times when it is necessary to change how the remote stream servers are exported.

One reason for customizing remote stream exporting may be when dealing with firewall or security issues. It may be necessary to control on which port or transport layer (e.g. SSL) the remote streams are exported.

Additionally, it may be necessary to customize remote stream exporting in order to use an entirely different RPC framework. While the RMIIO package was designed for use with standard Java RMI, it can be utilized on other similar RPC frameworks simply by replacing the RemoteStreamExporter. There is example code in this project showing Java to C++ streaming using CORBA. Other developers have written custom RemoteStreamExporters in order to use remote streams on Application Servers with alternate RMI implementations.

Please note, however, that while the remote streams may be used on different RPC frameworks, the remote iterators are restricted to RMI compatible frameworks only, as they involve Java serialization in addition to the standard RPC interactions.

RPC Robustness Utilities

Programming using distributed processing is difficult. "Anyone telling you differently is selling you something." While the RMI framework removes some of the unnecessary pain of distributed programming, it sometimes lulls developers into thinking distributed programming is easy. Remote method calls which look like local method calls may be treated similarly by the programmer when they really should not be. But enough of my ranting. Long story short, one facet of writing a robust distributed system is dealing with the inevitable remote method call failure. The best way to do this is to write idempotent remote method calls and utilize automatic retry policies.

Enter RemoteRetry, a key, "assembly level" distributed programming utility. The RemoteRetry class is an extensible base class for building custom retry policies including custom exception handling and backoff strategies. Additionally, there are a variety of methods in the class for applying a RemoteRetry strategy to a remote method call. Finally, there are a few simple implementations to cover the common cases.

All of the remote stream server implementations in the RMIIO package are built on top of the RemoteRetry facility, and they all allow a developer to plug in custom policies. Tailoring the RemoteRetry policies for each application as well as applying them to all remote method calls is the first step in writing a robust distributed system.
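The core idea of a retry policy with backoff can be sketched in a few lines of standard Java. RetrySketch and callWithRetry are hypothetical names for this example and do not reflect the RemoteRetry API; the policy shown (retry only on java.rmi.RemoteException, double the pause after each failure) is just one plausible choice.

```java
import java.rmi.RemoteException;
import java.util.concurrent.Callable;

final class RetrySketch {
  /**
   * Calls the task up to maxAttempts times. Only RemoteException triggers a
   * retry; any other exception propagates immediately. The pause between
   * attempts doubles after each failure.
   */
  static <T> T callWithRetry(
      Callable<T> task, int maxAttempts, long initialBackoffMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    long backoff = initialBackoffMillis;
    RemoteException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (RemoteException e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(backoff);
          backoff *= 2;
        }
      }
    }
    throw last; // every attempt failed; surface the most recent failure
  }
}
```

Retrying like this is only safe when the remote method is idempotent: if a call fails after the server has already processed it, the retry executes the operation a second time.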

Pseudo Socket over RMI

The RMISocket utility class can be used to simulate a socket-like connection over RMI. A single RMISocket instance enables only a single direction of communication (from the remote system to this system), in which case the instantiator of the RMISocket is an RMI server (essentially serving a RemoteOutputStream). In order for bi-directional communication to take place, each system must have an instance of RMISocket and the relevant Sources should be exchanged, in which case both systems will be acting as RMI servers.

In general, simulating a socket connection over RMI is probably not the best idea, and should not be pursued for a new project. However, when revamping an existing project, it may be desirable to layer an existing socket-based protocol over a separately established RMI connection. In such a situation, this utility could be very useful.