Several concurrency issues arise whenever a database is accessed
simultaniously be multiple processes or threads
(lightweight processes sharing all their system
ressources like memory and open files, including file positions).


*	multiprocess (MP) environments

MP environments are distinguished according to
-	whether all processes are readonly or there are one or more writers
-	whether processes are single-shot
	(i.e. open, work, exit like CGI scripts, including PHP in CGI mode)
	or resident (like PHP module living in Apache 1.x/Unix childs).
	Note that PHP module in a multithreaded server is not multiprocess.

Within a readonly environment, there is not much of a problem.
Each process may read and cache file contents independent of each other.
So the rest of this section discusses read/write access.

In the presence of writers, there are some problems:
-	at least the actual writing accesses must be *strictly* mutually exclusive
-	it must be ensured that readers do not use old cached data
	(or at least use it in a well controlled manner)
-	it must be ensured that changed data is written and read in a consistent way

These problems are addressed in reverse order:
-	the data structures used in OpenIsis are designed so
	that a consistent way of reading and writing can be defined.
	For example, the XRF pointer to a new or changed record is written
	after the record, so readers will not see an invalid pointer.
	However, this will work only where the operating system guarantees
	such semantics for file reads/writes. This again may depend on the
	filesystem and will not hold for most network file systems.
-	learning which cached blocks are outdated is not possible
	with reasonable effort. One possible approach is to not cache at all,
	i.e. resort to the operating system cache, which hopefully is
	properly synchronized (see above).
-	A simple and well supported means of mutual exclusion is the use
	of exclusive file locks. flock-(BSD-)style locks are sufficient;
	we do not need locking of file regions nor locking over NFS,
	which is not reliable anyway (and it's not much better with SMB).

The most easy and reliable solution is to completely encapsulate
any access --from database open to close-- within an exclusive lock.
That way there clearly is no inconsistent cache.

-	for single-shot processes, this solution is reliable and does
	not incurr too much cost: exit will release any lock and the
	processes may not benefit from caching anyway.
-	for resident processes, the need to open and close on any request
	is more of a disadvantage, and it is a problem how to guarantee
	that any lock is released after processing.

Possible perfomance enhancements that might be implemented one day:
-	readers could use shared locks.
	however, this gives a risk of writer starvation.
-	if all data access methods are carefully checked and a reasonable
	local file system is used, non-caching readers could get by without
	locking at all.
-	another, quite complex approach is to share cache memory between processes,
	similar to ORACLE's SGA. This would also help in guaranteeing consistent
	read-write-sequences.


To summarize the multiprocess issues:
-	readonly access is fine
-	DO NOT TRY TO WRITE ON A NETWORK DRIVE
	(or at least make sure it is accessed only by one host at a time)
-	the best solution for multiple processes is to contact
	a server for writing instead of doing it themselves
-	for PHP as module in read/write mode,
	we have to rely on register_shutdown_function to close any db


*	multithreaded environments

OpenIsis is designed to run multithreaded.
Multithreading is used only within some sort of server
(like database, web or servlet engine) in order to run multiple
requests from multiple clients in parallel.

MT environments are distinguished according to
-	whether they support active dispatching of requests to threads
-	whether they support parallel IO.
	Besides the basic calls for parallel IO (like pread,pwrite,
	or ReadFileEx "overlapped" IO in Win speak, which is missing on Win 9x/Me),
	this also requires condition variables (like pthread_cond_wait/broadcast,
	which are rather difficult to emulate on Win 9x/Me in the absence
	of SignalObjectAndWait) and should include memory mapping
	(like mmap,msync, which is working poorly on Win 9x/Me).

All threads of a single process share the same cache,
so dirty caches are not an issue here.
Synchronization is cheaper and more easy to use.

However, this great performance benefit comes at a price:
While there are a few utilities without any side effects
(i.e. proper FORTRAN functions),
not only access to the database and it's cache,
but any access to system ressources like files or the memory
heap must be carefully checked for possible collisions and,
when in doubt, must be synchronized -- even in a readonly environment.


*	Session synchronization

Our strategy is to share as little as possible between threads
and to protect all that must be shared (basically the database)
by a single lock. The means to give each thread it's own,
unshared environment is the SESSION.


A session represents a single client accessing the database.
(At least this is the idea, but depends on the dispatcher's abilities,
see below).  The session may hold result sets from previous queries,
some authentication info from the client and other temporary data.
In a standalone environment like the Tk GUI not connected to a server,
there is only one session, the "default session" (session id 0).
In a database or web-server, however, there may exist several sessions
on behalf of several users at the same time.

Requests from each session are serialized by some dispatcher,
so that each session is accessed by at most one thread at a time.
Consequently, in an environment with one session only,
there also is only one thread used to access the database.


To summarize, from a session's point of view, the world is single threaded.
Each session has a private memory heap and even it's own IO stream buffers
stdin, stdout and stderr (as streams 0,1,2)
and need not care about how it is connected and to whom.
Since the dispatcher guarantees that no session is accessed
by more than one thread at a time, dynamic memory, streaming
IO and other session ressources can be used without further interlocking.


*	dispatching requests and locking sessions

Due to the dual nature of a session as both representing a user and serving
as object of synchronization, dispatching requests has two tasks:
-	ensuring serialized (single-threaded) access
-	finding the session bound to a given user

While the former is crucial in MT environments, the latter is used only if
-	the environment identifies a user session in the first place
-	the session object's ability to keep state (like result sets) is used

We distinguish two cases of when and how dispatching is done:
-	passive/late dispatching:
	In most environments we have to get the session from within a thread
	dedicated to that request. The dispatcher is implemented as a call,
	accessing a session pool protected by some mutex.
-	active/early dispatching:
	Within the database server, the proper session can be looked up
	before a thread is allocated for a request.
	Here, the dispatcher is an active component, probably running in a
	thread on it's own (thus not requiring a mutex on the session pool).
	That way several requests on the same session may be queued
	(or discarded) without consuming any thread ressources.
	This should yield better performance under high load and somewhat
	better protection against denial of service.

There are also two different situations with regard to the scope
of synchronization:
-	per request:
	The session is "locked" (somehow marked as busy) until processing
	the request has finished. Locking is done by the dispatcher,
	and unlocking must be performed on exit,
	e.g. using register_shutdown_function in PHP.
	For the passive dispatcher, if some user session id is used to locate
	an existing session and there is already a request executing in this session,
	the current thread has to wait.
-	per use:
	In a high level language, i.e. Java, basic synchronization is achieved
	by having a Java object representing the session and marking the
	appropriate methods as synchronized.

Note that unless we promise that sessions actually will remember some state,
a simple dispatcher may decide to operate on a session pool of size 1 (one),
containing only the default session, thus ruling out any parallel operation.


*	Configuration synchronization

Operations that change the overall system state like opening a database
are allowed for session 0 only. Consequently, IO (logging) and memory
associated with such operations is bound to the default session.
Databases may be marked for exclusive use by session 0
for example during a lengthy batch index update
or in order to perform structural changes like modifying the FDT.


On the other hand, the worker sessions need some confidence that
configuration is not going to change while they are in the midth of
processing a request. Therefore, any database that is somehow accessed
by a session, is marked as used by the session and marked as unused
when the session is released. This protects the database from being
closed or put in exclusive ("single-user") mode and thus also
configuration from being changed.


Note that a request for the database need not be the same as
the original user request. For a database server, the request for
a database operation is all that is known, thus clients issuing
several remote requests won't get no guarantee that the DB is unchanged
between database accesses (regardless of the environment they are running in).
When accessing a local database, the scope of locking depends on
the environment as described above. An explicit lock on a local
database might be provided for Java (to be unlocked in a finally clause).


However, the situation is not as bad as it might look, since there are
complex database accesses, bundling several operations into one.
A standard example is to perform a query and not only obtain a result set,
but also the contents of the first n records, like with a Z39.50
piggybacked "present". For remote databasse access,
this is the most efficient operation mode anyway.


*	Database synchronization

All database ressources like master file and index have associated
in memory structures like a cache. These structures must not be
accessed by more than one thread at once and are therefore protected
by a mutex (some "mutual exclusion" object like a critical section).

Again, there are two modes to distinguish:
-	basic mutex
	The database (actually all databases) are locked when starting an
	access like reading or writing a record or searching an index,
	and unlocked when done.
	Since there is not very much and especially no IO happening outside
	the database access, it doesn't make much sense to allow parallel
	access in the first place and we will rather resort to a
	one-session environment. 
-	parallel IO
	This is the interesting case to be discussed now

Parallel IO aims at using the time one thread has to wait for
an IO operation to complete in order to let another thread
use the CPU and possibly start additional IOs.
Therefore, the mutex is released during IO.

In certain situations like thread A wishing to access a cache page
being read by another thread B, A has to wait on a condition
which will be signaled by B after returning from the IO.


The mutex and condition are implemented by an OpenIsisLockFunc,
which may map it to a pthread mutex and associated condition variable.
This is very similar to the concept of a monitor as implemented
by Java's synchronized blocks.


The mutual exclusion could be made even more finegrained by using
one mutex per database and another one for global structures.
With parallel IO, however, the mutex is locked only during CPU use and
released during IO, so this, while adding overhead,
would hardly increase concurrency on a single CPU system.
On a Windoze box capable of basic mutex only, on the other hand,
you would probably not access multiple databases anyway.


*	Summary by environments

The following gives an overview of simple approaches
to be used in basic implementations:
-	PHP/Apache1.x/Unix, any CGI:
	Multiple processes use mutual exclusion based on file locking.
	Database must be closed after request.
	Actually, file locking is performed always on database open/close,
	without asking whether there might be other processes.
-	PHP/MT/windoze:
	Uses trivial dispatcher, requests fully synchronized on default session.
-	PHP/MT/Apache2.0:
	May use real dispatcher, once the MT-Apache is stable.
-	Java:
	May use non-trivial dispatcher, if it provides LockFunc.
-	OpenIsis server:
	Uses active dispatcher /
>	Server	multiplexer


*	Notes on PHP

For various PHP run modes, see
>	http://www.php.net/manual/en/features.persistent-connections.php

As of Feb.03, several extensions are
>	http://www.php.net/manual/en/faq.obtaining.php	listed
as being NOT thread-safe!


---
	$Id: Concurrency.txt,v 1.6 2003/02/18 18:10:20 kripke Exp $