Several concurrency issues arise whenever a database is accessed simultaniously be multiple processes or threads (lightweight processes sharing all their system ressources like memory and open files, including file positions). * multiprocess (MP) environments MP environments are distinguished according to - whether all processes are readonly or there are one or more writers - whether processes are single-shot (i.e. open, work, exit like CGI scripts, including PHP in CGI mode) or resident (like PHP module living in Apache 1.x/Unix childs). Note that PHP module in a multithreaded server is not multiprocess. Within a readonly environment, there is not much of a problem. Each process may read and cache file contents independent of each other. So the rest of this section discusses read/write access. In the presence of writers, there are some problems: - at least the actual writing accesses must be *strictly* mutually exclusive - it must be ensured that readers do not use old cached data (or at least use it in a well controlled manner) - it must be ensured that changed data is written and read in a consistent way These problems are addressed in reverse order: - the data structures used in OpenIsis are designed so that a consistent way of reading and writing can be defined. For example, the XRF pointer to a new or changed record is written after the record, so readers will not see an invalid pointer. However, this will work only where the operating system guarantees such semantics for file reads/writes. This again may depend on the filesystem and will not hold for most network file systems. - learning which cached blocks are outdated is not possible with reasonable effort. One possible approach is to not cache at all, i.e. resort to the operating system cache, which hopefully is properly synchronized (see above). - A simple and well supported means of mutual exclusion is the use of exclusive file locks. flock-(BSD-)style locks are sufficient; we do not need locking of file regions nor locking over NFS, which is not reliable anyway (and it's not much better with SMB). The most easy and reliable solution is to completely encapsulate any access --from database open to close-- within an exclusive lock. That way there clearly is no inconsistent cache. - for single-shot processes, this solution is reliable and does not incurr too much cost: exit will release any lock and the processes may not benefit from caching anyway. - for resident processes, the need to open and close on any request is more of a disadvantage, and it is a problem how to guarantee that any lock is released after processing. Possible perfomance enhancements that might be implemented one day: - readers could use shared locks. however, this gives a risk of writer starvation. - if all data access methods are carefully checked and a reasonable local file system is used, non-caching readers could get by without locking at all. - another, quite complex approach is to share cache memory between processes, similar to ORACLE's SGA. This would also help in guaranteeing consistent read-write-sequences. To summarize the multiprocess issues: - readonly access is fine - DO NOT TRY TO WRITE ON A NETWORK DRIVE (or at least make sure it is accessed only by one host at a time) - the best solution for multiple processes is to contact a server for writing instead of doing it themselves - for PHP as module in read/write mode, we have to rely on register_shutdown_function to close any db * multithreaded environments OpenIsis is designed to run multithreaded. Multithreading is used only within some sort of server (like database, web or servlet engine) in order to run multiple requests from multiple clients in parallel. MT environments are distinguished according to - whether they support active dispatching of requests to threads - whether they support parallel IO. Besides the basic calls for parallel IO (like pread,pwrite, or ReadFileEx "overlapped" IO in Win speak, which is missing on Win 9x/Me), this also requires condition variables (like pthread_cond_wait/broadcast, which are rather difficult to emulate on Win 9x/Me in the absence of SignalObjectAndWait) and should include memory mapping (like mmap,msync, which is working poorly on Win 9x/Me). All threads of a single process share the same cache, so dirty caches are not an issue here. Synchronization is cheaper and more easy to use. However, this great performance benefit comes at a price: While there are a few utilities without any side effects (i.e. proper FORTRAN functions), not only access to the database and it's cache, but any access to system ressources like files or the memory heap must be carefully checked for possible collisions and, when in doubt, must be synchronized -- even in a readonly environment. * Session synchronization Our strategy is to share as little as possible between threads and to protect all that must be shared (basically the database) by a single lock. The means to give each thread it's own, unshared environment is the SESSION. A session represents a single client accessing the database. (At least this is the idea, but depends on the dispatcher's abilities, see below). The session may hold result sets from previous queries, some authentication info from the client and other temporary data. In a standalone environment like the Tk GUI not connected to a server, there is only one session, the "default session" (session id 0). In a database or web-server, however, there may exist several sessions on behalf of several users at the same time. Requests from each session are serialized by some dispatcher, so that each session is accessed by at most one thread at a time. Consequently, in an environment with one session only, there also is only one thread used to access the database. To summarize, from a session's point of view, the world is single threaded. Each session has a private memory heap and even it's own IO stream buffers stdin, stdout and stderr (as streams 0,1,2) and need not care about how it is connected and to whom. Since the dispatcher guarantees that no session is accessed by more than one thread at a time, dynamic memory, streaming IO and other session ressources can be used without further interlocking. * dispatching requests and locking sessions Due to the dual nature of a session as both representing a user and serving as object of synchronization, dispatching requests has two tasks: - ensuring serialized (single-threaded) access - finding the session bound to a given user While the former is crucial in MT environments, the latter is used only if - the environment identifies a user session in the first place - the session object's ability to keep state (like result sets) is used We distinguish two cases of when and how dispatching is done: - passive/late dispatching: In most environments we have to get the session from within a thread dedicated to that request. The dispatcher is implemented as a call, accessing a session pool protected by some mutex. - active/early dispatching: Within the database server, the proper session can be looked up before a thread is allocated for a request. Here, the dispatcher is an active component, probably running in a thread on it's own (thus not requiring a mutex on the session pool). That way several requests on the same session may be queued (or discarded) without consuming any thread ressources. This should yield better performance under high load and somewhat better protection against denial of service. There are also two different situations with regard to the scope of synchronization: - per request: The session is "locked" (somehow marked as busy) until processing the request has finished. Locking is done by the dispatcher, and unlocking must be performed on exit, e.g. using register_shutdown_function in PHP. For the passive dispatcher, if some user session id is used to locate an existing session and there is already a request executing in this session, the current thread has to wait. - per use: In a high level language, i.e. Java, basic synchronization is achieved by having a Java object representing the session and marking the appropriate methods as synchronized. Note that unless we promise that sessions actually will remember some state, a simple dispatcher may decide to operate on a session pool of size 1 (one), containing only the default session, thus ruling out any parallel operation. * Configuration synchronization Operations that change the overall system state like opening a database are allowed for session 0 only. Consequently, IO (logging) and memory associated with such operations is bound to the default session. Databases may be marked for exclusive use by session 0 for example during a lengthy batch index update or in order to perform structural changes like modifying the FDT. On the other hand, the worker sessions need some confidence that configuration is not going to change while they are in the midth of processing a request. Therefore, any database that is somehow accessed by a session, is marked as used by the session and marked as unused when the session is released. This protects the database from being closed or put in exclusive ("single-user") mode and thus also configuration from being changed. Note that a request for the database need not be the same as the original user request. For a database server, the request for a database operation is all that is known, thus clients issuing several remote requests won't get no guarantee that the DB is unchanged between database accesses (regardless of the environment they are running in). When accessing a local database, the scope of locking depends on the environment as described above. An explicit lock on a local database might be provided for Java (to be unlocked in a finally clause). However, the situation is not as bad as it might look, since there are complex database accesses, bundling several operations into one. A standard example is to perform a query and not only obtain a result set, but also the contents of the first n records, like with a Z39.50 piggybacked "present". For remote databasse access, this is the most efficient operation mode anyway. * Database synchronization All database ressources like master file and index have associated in memory structures like a cache. These structures must not be accessed by more than one thread at once and are therefore protected by a mutex (some "mutual exclusion" object like a critical section). Again, there are two modes to distinguish: - basic mutex The database (actually all databases) are locked when starting an access like reading or writing a record or searching an index, and unlocked when done. Since there is not very much and especially no IO happening outside the database access, it doesn't make much sense to allow parallel access in the first place and we will rather resort to a one-session environment. - parallel IO This is the interesting case to be discussed now Parallel IO aims at using the time one thread has to wait for an IO operation to complete in order to let another thread use the CPU and possibly start additional IOs. Therefore, the mutex is released during IO. In certain situations like thread A wishing to access a cache page being read by another thread B, A has to wait on a condition which will be signaled by B after returning from the IO. The mutex and condition are implemented by an OpenIsisLockFunc, which may map it to a pthread mutex and associated condition variable. This is very similar to the concept of a monitor as implemented by Java's synchronized blocks. The mutual exclusion could be made even more finegrained by using one mutex per database and another one for global structures. With parallel IO, however, the mutex is locked only during CPU use and released during IO, so this, while adding overhead, would hardly increase concurrency on a single CPU system. On a Windoze box capable of basic mutex only, on the other hand, you would probably not access multiple databases anyway. * Summary by environments The following gives an overview of simple approaches to be used in basic implementations: - PHP/Apache1.x/Unix, any CGI: Multiple processes use mutual exclusion based on file locking. Database must be closed after request. Actually, file locking is performed always on database open/close, without asking whether there might be other processes. - PHP/MT/windoze: Uses trivial dispatcher, requests fully synchronized on default session. - PHP/MT/Apache2.0: May use real dispatcher, once the MT-Apache is stable. - Java: May use non-trivial dispatcher, if it provides LockFunc. - OpenIsis server: Uses active dispatcher / > Server multiplexer * Notes on PHP For various PHP run modes, see > http://www.php.net/manual/en/features.persistent-connections.php As of Feb.03, several extensions are > http://www.php.net/manual/en/faq.obtaining.php listed as being NOT thread-safe! --- $Id: Concurrency.txt,v 1.6 2003/02/18 18:10:20 kripke Exp $