Using DataCaster

This section provides a high-level overview of how to use DataCaster in your applications. DataCaster is a distributed database, but it can also be used like a standalone database. Because it supports the standard JDBC interface and SQL, many Java applications can easily use DataCaster as their database. However, DataCaster is not intended to replace existing databases. Instead, it aims to enable a new class of applications that share data with other applications or with other instances of the same application. The discussion here therefore focuses on data sharing.

DataCaster connects different applications or application instances so that they can share data. These instances can publish data, access data published by others, and use that data like any other local data in the database. Both the approach and the current product have limitations that need to be understood, even though some of them may be reduced or eliminated as the product is enhanced. The following sections give an overview of where DataCaster may be applicable and where it may not.

Sharing Data Across Applications

For any data to be available on the network, an application or user must first publish it on a DataCaster server. Typically, one or more applications publish the data to be shared, and other applications access it from their local database by setting up views of the remote data or by running distributed queries.

While this is a new approach to sharing data across applications on the web, it relies on a familiar way of interfacing distributed data with the applications that use it. For applications that generate and consume relational data, the DataCaster approach may make it easier to use remote data. Furthermore, when the quantity of data is relatively large, materialized views make it easier to cache data locally, as many applications must do to deliver good user response times.

This approach has a number of limitations. DataCaster's focus, at least initially, is on a loosely coupled approach to integrating data across web sites and applications. This means that data is not synchronized across sites, nor is data in different tables from the same site. Partial data is therefore very likely at different times, and applications must be able to handle it gracefully. Another issue is the delay in updating views with changes, which can be significant for some applications even with frequent refresh.

In addition, the current state of the product imposes a number of limitations. This is a preview version, with limitations on queries, view support, scalability and other aspects that will be a hindrance for many applications. These will be improved over time, but for now applications need to be tested for each specific new use before one can be sure of the outcome.

Publishing Data

Publishing data for other users and applications requires the following three steps:

  • Add remote users with passwords (or remote client hostnames if using server-level access) to allow remote users to access the data publishing server.
  • Start the database on the server.
  • Use GRANT statements to allow public, specific-user (authenticated) or anonymous access to the particular data being shared.

Optionally, change the usage controls on the publishing server for the specific user or type of user, to prevent users from running expensive queries or otherwise overusing data server resources.

As with most databases, access to data in the database is controlled by using GRANT statements. In this case GRANT can be used to provide remote users access at three levels:

  • ANONYMOUS access
  • PUBLIC access
  • Specific user access by username
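
As a rough illustration of the three levels, the statements below assume that DataCaster accepts the standard SQL form GRANT privilege ON object TO grantee. The table names (products, price_list, orders) and the username alice are hypothetical, and the set of privileges actually supported should be checked against the product's SQL reference.

```sql
-- Sketch only: assumes standard SQL GRANT syntax.
-- All table names and the user name are made up for illustration.

-- Anonymous level: remote users who connect without authenticating
GRANT SELECT ON products TO ANONYMOUS

-- Public level: authenticated users in general
GRANT SELECT ON price_list TO PUBLIC

-- User level: one named, authenticated user
GRANT SELECT ON orders TO alice
```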

Accessing Published Data

On the client side, other applications would perform the following steps to access the data from a DataCaster client:

  • Add the remote access username and password (or hostname and password for server-level access) to be used to access the publishing server.
  • Run the DataCaster client (or a server acting as a client).
  • Use distributed queries, or create materialized views, to access the published data.

To be more specific on the last step, remote applications access the published data as DataCaster clients, in two ways:

  • Run ad-hoc queries that include the published data in the query, and return a query result that includes the desired results from the published data.
  • Create materialized views of the published data on the remote client. These views are then used by the application, just as it would use other local database data.

Anonymous Access

For some applications, anonymous access may be the best way to provide data access to remote users. In such cases, there is no remote setup required, and users can automatically access the data on the server remotely by just using the correct URL in their distributed queries and views.

Usage control can be applied to remote anonymous users to minimize the damage caused by specific queries. However, at this time there is no control over too many requests from a single remote host. Granting anonymous access therefore opens the door to resource hogging through excessive requests, denial-of-service attacks and other ways of disrupting the server. If that is a concern, anonymous access should be used with caution.

DataCaster will soon be enhanced to allow better control of access from any individual remote host, or gateway in case of NAT or other proxy servers.

Anonymous access is most useful where client applications and users need limited access but do not want to be bothered with setting up access to each remote server.

To allow anonymous access to data, you need to GRANT access to the user ANONYMOUS; this differs from the case of authenticated remote users. A GRANT of access to PUBLIC will not allow anonymous users to access the table or other database object. Use a specific GRANT to ANONYMOUS on the specific object to allow this kind of anonymous access.
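
Because a grant to PUBLIC does not cover anonymous users, an object meant to be readable by both groups would need two grants. A minimal sketch, assuming standard SQL GRANT syntax and a hypothetical table named products:

```sql
-- Hypothetical table name; assumes standard SQL GRANT syntax.

-- Covers authenticated users only
GRANT SELECT ON products TO PUBLIC

-- Needed separately before anonymous users can read the table
GRANT SELECT ON products TO ANONYMOUS
```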

Setup Access for Remote Clients

As a distributed database, DataCaster allows users on remote DataCaster servers to access tables and views on this server. For example, this allows users or applications on remote web sites to access data on your site, and vice versa.

As a first step, you need to set up permissions for remote users to access your server. Access can be set up for remote users at the user level or the server level. With user-level access, an individual user can access this server with the specified username and password. With server-level access, any authenticated user on the remote server can access this server.

To set up access for remote users to data on your server, use the WebAdmin utility: the Remote Users page under the Remote Clients section of WebAdmin lets you allow remote users to access this server.

Before remote users can access tables and views from this server, they will need to set up their clients to use this username and password for user-level access to this server. Access to specific tables and views on this server is subject to the necessary SQL authorization with GRANT statements. Remote users will have access to specific tables or views if access is granted to the specific user, or when public access is granted.

In addition, remote access can be set up at the server level, so that all users on a specified remote client system have access to this server. Use the Remote Clients page under the Remote Clients section of WebAdmin to allow access to this server for all users on a remote server. Before remote users can access tables and views from this server, they will need to set up their clients to use the server name and password for server-level access to this server. Access to specific tables and views on this server is subject to the necessary SQL authorization with GRANT statements. Remote users will have access to specific tables or views if access is granted to the specific user, or when public access is granted.

Setup Access to Remote Servers

With the DataCaster distributed database, you can access remote tables and views in SQL statements, scripts, applications or User API programs. Sometimes anonymous access will be permitted and used for this purpose. In other cases, you will set up authentication beforehand so that it is used when accessing data on remote servers.

DataCaster allows users on this server to access tables and views on remote DataCaster servers with either server-level or user-level access. With user-level access, the specified local user on this server can access tables and views on the remote server. With server-level access, any authenticated user on this server can access the specified remote server.

To set up user-level access, go to the WebAdmin page User-level Access under the Access Remote Servers section. Use this page to set up access to a specified remote server for a given user on this server.

Before local users can access tables and views from the remote server, access will need to be set up on the remote server, with the same username and password authorized for user-level access. In addition, access to specific tables and views on the remote server is subject to the necessary SQL authorization with GRANT statements. Users will have access to specific tables or views if access is granted to the specific user, or when public access is granted.

In addition, remote access can be set up at the server level, so that all users on this server have access to a specified remote server. Use the WebAdmin page for server-level access to allow access to a specified remote server for all users on this server. Before local users can access tables and views from the remote server, access will need to be set up on the remote server, with the same server name and password authorized for server-level access.

Distributed Queries

Once remote access has been set up, both on your server and on the remote server, you will be able to execute distributed queries. Instead of simple table names, you will use URLs to refer to remote tables and views. For example, the following query uses a products table in a database named storedb on a remote server named acmeserver, where DataCaster is running on the default Tomcat HTTP port 8080.

SELECT * FROM http://acmeserver:8080/Table/storedb/products

This query works in much the same way as a query on a local table. URLs can be used in this fashion for multiple tables on multiple remote servers.
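
To illustrate a query that spans more than one server, the sketch below joins the products table from the example above with a table on a second server. The second server (warehouse), its database and table (stockdb, inventory), the column names, and the use of table aliases after a URL are all assumptions made for illustration; they are not confirmed DataCaster syntax.

```sql
-- Hypothetical: the second server, the column names and the alias syntax are assumed.
SELECT p.product_id, p.name, i.quantity
FROM http://acmeserver:8080/Table/storedb/products p,
     http://warehouse:8080/Table/stockdb/inventory i
WHERE p.product_id = i.product_id
```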

Distributed queries are largely unoptimized in DataCaster, so performance will vary greatly, and some queries may fail because of timeouts and other issues. It is often advisable to break up queries so that results are processed efficiently on the individual servers, and to use views to control query processing behavior.

Materialized Views

For many applications on the web, it seems likely that materialized distributed views have greater value in building applications with distributed data. It is usually necessary to provide users good response times, even when using remote data in the application. Materialized views essentially cache the data from the remote server(s) in a view that can be used just like a local table by the application.

The goal with distributed views is to keep the required data local as far as possible, so that applications can use remote data without paying any serious performance penalty. Furthermore, the remote sites or servers may be temporarily unavailable. At such times, it is desirable to have the application continue working while they are unavailable.

To use distributed materialized views, create the views with SQL statements or the User API. For example, the following SQL statement creates a materialized SQL view. Like the earlier query example, it uses a products table in a database named storedb on a remote server named acmeserver, where DataCaster is running on the default Tomcat HTTP port 8080.

CREATE MATERIALIZED PERIODIC REFRESH VIEW acme_products AS SELECT * FROM http://acmeserver:8080/Table/storedb/products

Now you can use acme_products in queries and other programs, and it will perform much like a local table. This CREATE statement specifies that the view should be periodically refreshed. The refresh is performed by polling the remote server at fixed intervals to get all the changes from the remote server.
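
For instance, an application might join the view with one of its own tables. In this sketch the local orders table and all column names are hypothetical; only acme_products comes from the statement above.

```sql
-- acme_products is the materialized view created above.
-- The orders table and every column name are assumed for illustration.
SELECT o.order_id, p.name, p.price
FROM orders o, acme_products p
WHERE o.product_id = p.product_id
```

Because the view is cached locally, a query like this reads the local copy and reflects the remote data as of the most recent refresh.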

Usage Control

For anonymous or authenticated remote access, particularly across administrative domains, it is essential to control what a database user is allowed to do. Since SQL queries can easily be written to hog resources on the server among other things, there is a need to limit the resources used by any given remote user.

DataCaster supports multiple types of resource usage control for both local and remote users, including anonymous users. Four kinds of usage control are provided:

  • Number of transactions per day
  • Index pages read per hour
  • Tuples processed per minute
  • Data transferred in bytes per second

Using these attributes, usage can be controlled for any user as required, to minimize the damage from intentional or inadvertent excess resource consumption by local or remote users.