Introduction
Applibase DataCaster is a Java-based distributed database server for applications that need to directly share and synchronize data across the Internet. It includes a complete database engine with SQL support, JDBC and a direct API into the database internal objects. Java applications use DataCaster just as they would any database supporting JDBC, with some additional features specific to DataCaster. The purpose of using DataCaster would be to share application data more easily with other applications on the web, or other instances of the same application. These other applications or instances may belong to different administrative entities and may be accessed over the Internet.
Unlike typical databases, DataCaster is designed to function as a distributed database that cuts across administrative boundaries on the web, allowing a database on one site to query and synchronize data with a database on another site. Applications use DataCaster just like any other database engine, but now have SQL access to data from all sites using DataCaster for which the required permissions have been granted. And unlike many other distributed database technologies, DataCaster takes a simpler, loosely coupled approach to integrating data across sites, minimizing complex synchronization and distributed transactions. This makes DataCaster better suited to simpler integration applications that need to exchange a higher volume of data on the web.
Applibase DataCaster is designed for simple, data-driven, Java applications built around the relational database that is the core of DataCaster. This Java-based relational database, which supports the SQL and JDBC standards, provides applications a complete set of database services including data access and modification, persistence, synchronization and transactions. It also allows application instances to share, query and integrate data over the Internet with other applications or instances of the same application. Therefore DataCaster is most suited to data-driven applications that can benefit from built-in support for data exchange with other applications.
Web Applications
Since DataCaster is targeted at web data integration, web applications are likely to be the most common kind of application using DataCaster. Web applications can offer increased data integration with other remote web applications and other remote data sources, to provide users a richer application experience. For data-oriented applications, this ability to integrate data across sites can significantly enhance user experience and add value.
The DataCaster server supports building web applications using standard tools like servlets and JSP. It is bundled with the Apache Tomcat web server and other software to make it easy to build web applications that use Tomcat and DataCaster. Although DataCaster can be used for any application that uses a database, the server is targeted at simple web applications built around the database and, in addition, at web applications that want to share data across the Internet. For such applications, DataCaster makes the data in one application's database more easily available to other applications and instances of the same application.
Architecture
DataCaster can be downloaded and used as either a server or a client package. The server version is a complete package that includes the Apache Tomcat web server, and a number of other tools necessary for publishing relational data on the web. A Web Administration console (WebAdmin) is provided with the server package as a comprehensive remote administration tool for managing and using the DataCaster server. The server supports data-driven web applications using Tomcat integrated with the database, and provides a utility to run server applications on the server as needed.
The client is primarily a standalone database, with access to data from remote DataCaster servers. The DataCaster client may be embedded within applications that need to access data from remote servers. This allows data to be cached in local database tables on the client, and provides a full set of typical database functions. Applibase offers this client database as a free product, with no fee for redistribution with other applications. We see databases as a commodity with many good established databases available, and we are not targeting the database market. The client is recommended only when your applications can benefit from the ability to access and use remote data in the local database.
The DataCaster server package serves up data on the web, and includes complete client functionality that enables it to access data from other DataCaster servers. The core SQL database engine in DataCaster is embedded in both server and client, and enables applications to use DataCaster just as they would another database even when accessing remote data. The server needs to be licensed from Applibase for use and distribution. The server package is being made available free of charge to any web site interested in publishing data for free public access.
Distributed Data
Two primary methods are used to integrate data across applications.
- Distributed Queries: Here applications can directly reference and query the tables, views, etc. of remote application databases over the Internet.
- Distributed Views: In this case applications materialize views of remote application databases, and this allows the local applications to work with refreshed local copies of data from remote applications.
Distributed Queries
Applications using DataCaster can execute distributed queries that span two or more databases, using the URL syntax to identify individual tables and views on specific servers. This allows any remote table or view on any server on the net to be used in distributed queries, provided the security permissions allow such access.
DataCaster distributed query optimization is very limited and only takes care of pushing complete sub-queries it identifies to the appropriate remote server. In the future some progress in this area can be anticipated, including attempts at more sophisticated query rewrites based on remote server information. However, this is a large and complex area and solutions are likely to be limited. Therefore applications will need to work with a set of simpler distributed queries to produce complex query results, rather than rely on the query processor to optimize more complex distributed queries.
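One way to read this advice is to run a simple query against each server and combine the results in the application. The following is a minimal sketch of that idea, not DataCaster code: it uses no DataCaster API, the rows are in-memory stand-ins for two query results, and the names ClientSideJoin and hashJoin and the sample customer and order data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: combines the rows of two simple queries in the application,
// instead of relying on the query processor to optimize one complex distributed join.
class ClientSideJoin {

    // Joins left and right rows on the values found at the given column positions.
    // Each result row is the left row followed by the matching right row.
    static List<List<Object>> hashJoin(List<List<Object>> left, int leftKey,
                                       List<List<Object>> right, int rightKey) {
        // Build phase: index the right-hand rows by their join key.
        Map<Object, List<List<Object>>> index = new HashMap<>();
        for (List<Object> row : right) {
            index.computeIfAbsent(row.get(rightKey), k -> new ArrayList<>()).add(row);
        }
        // Probe phase: look up each left-hand row's key in the index.
        List<List<Object>> joined = new ArrayList<>();
        for (List<Object> l : left) {
            List<List<Object>> matches =
                    index.getOrDefault(l.get(leftKey), Collections.emptyList());
            for (List<Object> r : matches) {
                List<Object> out = new ArrayList<>(l);
                out.addAll(r);
                joined.add(out);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the results of two simple queries,
        // each sent to a different server.
        List<List<Object>> customers = Arrays.asList(
                Arrays.<Object>asList(1, "Ada"),
                Arrays.<Object>asList(2, "Grace"));
        List<List<Object>> orders = Arrays.asList(
                Arrays.<Object>asList(10, 1, "disk"),
                Arrays.<Object>asList(11, 3, "tape"));
        // Join customers.id (column 0) with orders.customer_id (column 1).
        System.out.println(hashJoin(customers, 0, orders, 1));
    }
}
```

The join runs entirely on the client, so each server only has to answer a simple query that it can execute on its own.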
Distributed Materialized Views
Possibly the more useful aspect of distributed data access with DataCaster is the ability to create materialized views of remote data that can be automatically refreshed. In many application scenarios it is not possible to get data from remote servers at remote locations and still have good user response times. It is then necessary to cache the required data locally, and DataCaster provides a convenient mechanism to cache remote tables, or views of remote tables. This allows applications to easily use data from other sources in providing a responsive and integrated user experience.
A loosely coupled approach is almost required for integration across web sites. For remote servers outside your administrative control, it is often hard to synchronize activities and ensure consistently reliable communication between servers. Additionally, HTTP or HTTPS access needs to be used, which is not well suited to high synchronization needs. The RSS approach seems to be a suitable loosely coupled approach across sites.
DataCaster has built-in support for such loosely coupled view refresh across web sites. Using special indexes on tables and views, it keeps track of changes and refreshes remote views with just those changes.
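The general idea of refreshing a cached copy with only the changes made since the last refresh can be sketched as follows. This is an illustration under stated assumptions, not a description of DataCaster's index mechanism: the change log with sequence numbers and the names IncrementalRefresh, SourceTable, CachedView and changesSince are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a source that numbers its changes and a cached copy that pulls
// just the changes made since its last refresh. Not DataCaster's actual mechanism.
class IncrementalRefresh {

    // One change to a row; a null value means the row was deleted.
    static final class Change {
        final long seq;
        final String key;
        final String value;

        Change(long seq, String key, String value) {
            this.seq = seq;
            this.key = key;
            this.value = value;
        }
    }

    // Source table that records every change in sequence order.
    static class SourceTable {
        private final List<Change> log = new ArrayList<>();
        private long nextSeq = 1;

        void put(String key, String value) { log.add(new Change(nextSeq++, key, value)); }

        void delete(String key) { log.add(new Change(nextSeq++, key, null)); }

        // Returns only the changes with a sequence number greater than afterSeq.
        List<Change> changesSince(long afterSeq) {
            List<Change> out = new ArrayList<>();
            for (Change c : log) {
                if (c.seq > afterSeq) out.add(c);
            }
            return out;
        }
    }

    // Local cached copy that remembers the last change it has applied.
    static class CachedView {
        final Map<String, String> rows = new LinkedHashMap<>();
        private long lastSeq = 0;

        // Applies the new changes and returns how many were transferred.
        int refresh(SourceTable source) {
            List<Change> changes = source.changesSince(lastSeq);
            for (Change c : changes) {
                if (c.value == null) rows.remove(c.key);
                else rows.put(c.key, c.value);
                lastSeq = c.seq;
            }
            return changes.size();
        }
    }

    public static void main(String[] args) {
        SourceTable source = new SourceTable();
        CachedView view = new CachedView();
        source.put("a", "1");
        source.put("b", "2");
        System.out.println(view.refresh(source) + " changes, rows = " + view.rows);
        source.delete("a");
        System.out.println(view.refresh(source) + " changes, rows = " + view.rows);
    }
}
```

Because the cached copy asks only for changes after the last sequence number it applied, a refresh that finds nothing new transfers nothing, and a missed refresh is simply caught up on the next one.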
Security is clearly important when dealing with data to be shared across applications. Often a way is needed to allow unauthenticated users to access the data, just as with web pages and anonymous FTP. In other cases, only authenticated users should be allowed access. Remote users typically need to be prevented from running complex or long-running queries that hog resources on the database server, affecting all other users and applications. DataCaster provides authentication and authorization for remote access, including anonymous access when required. It provides controls on usage for any remote or local user, allowing you to set limits on the adverse impact of misguided or intentionally disruptive remote requests.
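As a rough illustration of how anonymous access and per-user limits can fit together, the sketch below caps the number of rows served to each user and denies users who have no entry. It is not DataCaster's API or configuration: the class UsageLimits, its methods, the user names and the choice of a row cap as the limit are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: per-user limits checked before a remote request is served.
// The class, method, and user names are hypothetical, not DataCaster's API.
class UsageLimits {

    // Name under which unauthenticated requests are looked up.
    static final String ANONYMOUS = "anonymous";

    // Maximum rows a user may fetch per request; users without an entry are denied.
    private final Map<String, Integer> maxRowsByUser = new HashMap<>();

    void allow(String user, int maxRows) {
        maxRowsByUser.put(user, maxRows);
    }

    // Returns the number of rows to serve: 0 if the user has no access,
    // otherwise the requested count capped at the user's limit.
    int rowsToServe(String user, int requestedRows) {
        String effective = (user == null) ? ANONYMOUS : user;
        Integer limit = maxRowsByUser.get(effective);
        if (limit == null) return 0;
        return Math.min(Math.max(requestedRows, 0), limit);
    }

    public static void main(String[] args) {
        UsageLimits limits = new UsageLimits();
        limits.allow(ANONYMOUS, 100);   // public data, small requests only
        limits.allow("alice", 10000);   // authenticated user with a higher cap
        System.out.println(limits.rowsToServe(null, 500));
        System.out.println(limits.rowsToServe("alice", 500));
        System.out.println(limits.rowsToServe("bob", 5));
    }
}
```

A real server would also need to bound query running time and complexity; a row cap is only the simplest limit to show.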
Other Features
In addition to the remote access focus, DataCaster provides another unique feature not usually found in databases. This is a direct Java API, for users or applications, to the internal database objects, allowing applications to access and modify the database objects directly without the need for JDBC or SQL access. This is not a separate driver package or module; rather, it works directly on the internal database objects. Most applications will still want to use the standard JDBC and SQL interfaces for portability across databases. However, for some database applications this direct interface can be used to bypass the limitations and overhead of the query processor, and perform more direct lookups and other operations on database indexes, tables, etc.
DataCaster provides a comprehensive set of features and tools, as users would expect with a database. These include good support for JDBC and SQL, for which complete entry-level SQL-92 support is provided. A number of tools are provided, including data backup and export, database replication, and a web console to manage and use the server remotely.
Limitations
We would be remiss if we did not qualify the above with a dose of reality, i.e. where we are at the moment with the product. This is still work in progress, with a lot of the distributed features supported only for limited cases. For example, distributed materialized refreshed views are currently only supported for projections and selections. For more complex remote views, users will need to create the materialized views locally using these simple materialized projection and selection views as base tables.
A number of features have seen very limited testing, as we use them for specific applications. It is simply not possible, particularly for a tiny company like ours, to test all the different use cases for such a product. This will be an ongoing activity as we increase the number of test cases (we now run a few thousand test cases), build more applications, and see wider use of the product. We have a number of major enhancements planned, which are in addition to the improvements we will need to make in existing features. Therefore this product can by no means be considered stable and completely tested for all potential applications. You will need to test each type of application to ensure it is properly supported by the product. However, we stand ready to respond to and support a wide range of applications that take advantage of these new ways of distributing data directly in the database.