openisis/doc/Concurrency.txt

Several concurrency issues arise whenever a database is accessed
simultaneously by multiple processes or threads
(lightweight processes sharing all their system
resources like memory and open files, including file positions).


*       multiprocess (MP) environments

MP environments are distinguished according to
-       whether all processes are readonly or there are one or more writers
-       whether processes are single-shot
        (i.e. open, work, exit like CGI scripts, including PHP in CGI mode)
        or resident (like the PHP module living in Apache 1.x/Unix children).
        Note that a PHP module in a multithreaded server is not multiprocess.

Within a readonly environment, there is not much of a problem.
Each process may read and cache file contents independently of the others.
So the rest of this section discusses read/write access.

In the presence of writers, there are some problems:
-       at least the actual writing accesses must be *strictly* mutually exclusive
-       it must be ensured that readers do not use old cached data
        (or at least use it in a well controlled manner)
-       it must be ensured that changed data is written and read in a consistent way

These problems are addressed in reverse order:
-       the data structures used in OpenIsis are designed so
        that a consistent way of reading and writing can be defined.
        For example, the XRF pointer to a new or changed record is written
        after the record, so readers will not see an invalid pointer.
        However, this will work only where the operating system guarantees
        such semantics for file reads/writes. This again may depend on the
        filesystem and will not hold for most network file systems.
-       learning which cached blocks are outdated is not possible
        with reasonable effort. One possible approach is to not cache at all,
        i.e. resort to the operating system cache, which hopefully is
        properly synchronized (see above).
-       A simple and well supported means of mutual exclusion is the use
        of exclusive file locks. flock-(BSD-)style locks are sufficient;
        we do not need locking of file regions nor locking over NFS,
        which is not reliable anyway (and it's not much better with SMB).
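
The write ordering from the first item (record first, pointer second) can be
sketched as follows. This is only an illustration in Python, not OpenIsis code:
the file layout (8-byte pointer slots, a 4-byte length prefix) and the names
append_record and read_record are assumptions made for the example.

```python
import os
import struct

PTR = struct.Struct("<q")   # hypothetical 8-byte pointer slot; 0 means "no record"
LEN = struct.Struct("<I")   # hypothetical 4-byte length prefix of a record

def append_record(mst_fd, xrf_fd, slot, payload):
    """Write the record first, then publish the pointer to it."""
    offset = os.lseek(mst_fd, 0, os.SEEK_END)
    os.pwrite(mst_fd, LEN.pack(len(payload)) + payload, offset)
    os.fsync(mst_fd)        # the record is flushed before any pointer refers to it
    # store offset + 1 so that an all-zero slot can mean "empty"
    os.pwrite(xrf_fd, PTR.pack(offset + 1), slot * PTR.size)
    os.fsync(xrf_fd)

def read_record(mst_fd, xrf_fd, slot):
    """Follow the pointer; a reader sees either no pointer or a complete record."""
    raw = os.pread(xrf_fd, PTR.size, slot * PTR.size)
    if len(raw) < PTR.size:
        return None
    (ptr,) = PTR.unpack(raw)
    if ptr == 0:
        return None
    offset = ptr - 1
    (length,) = LEN.unpack(os.pread(mst_fd, LEN.size, offset))
    return os.pread(mst_fd, length, offset + LEN.size)
```

Whether a reader in another process really observes the two writes in this
order still depends on the operating system and filesystem, as noted above.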

The easiest and most reliable solution is to completely encapsulate
any access --from database open to close-- within an exclusive lock.
That way there clearly is no inconsistent cache.
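
A minimal sketch of such an open-to-close lock, using Python's fcntl.flock on
Unix. The helper name locked_database and the idea of locking the file itself
(rather than a separate lock file) are assumptions for illustration only.

```python
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def locked_database(path):
    """Hold an exclusive flock from open to close (hypothetical helper)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until no other holder remains
        yield fd
    finally:
        os.close(fd)                     # closing the descriptor drops the lock
```

All reads and writes then happen inside "with locked_database(path) as fd:",
so nothing is cached across the point where another process may write.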

-       for single-shot processes, this solution is reliable and does
        not incur too much cost: exit will release any lock and the
        processes may not benefit from caching anyway.
-       for resident processes, the need to open and close on every request
        is more of a disadvantage, and it is hard to guarantee
        that every lock is released after processing.

Possible performance enhancements that might be implemented one day:
-       readers could use shared locks.
        However, this carries a risk of writer starvation.
-       if all data access methods are carefully checked and a reasonable
        local file system is used, non-caching readers could get by without
        locking at all.
-       another, quite complex approach is to share cache memory between processes,
        similar to ORACLE's SGA. This would also help in guaranteeing consistent
        read-write-sequences.
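
The shared-lock variant from the first item could look roughly like this on
Unix. try_lock is a hypothetical helper; it uses the non-blocking flag only so
that the conflict between readers and writers becomes visible.

```python
import fcntl
import os

def try_lock(path, exclusive):
    """Try to take a shared (reader) or exclusive (writer) flock without waiting.

    Returns the open file descriptor on success, or None if the lock is
    currently held in a conflicting mode.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

Any number of readers may hold the shared lock at once, while a writer gets in
only when no reader holds it. With a steady stream of overlapping readers the
writer may never find such a moment, which is the starvation risk noted above.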


To summarize the multiprocess issues:
-       readonly access is fine
-       DO NOT TRY TO WRITE ON A NETWORK DRIVE
        (or at least make sure it is accessed only by one host at a time)
-       the best solution for multiple processes is to contact
        a server for writing instead of doing it themselves
-       for PHP as module in read/write mode,
        we have to rely on register_shutdown_function to close any db


*       multithreaded environments

OpenIsis is designed to run multithreaded.
Multithreading is used only within some sort of server
(like database, web or servlet engine) in order to run multiple
requests from multiple clients in parallel.

MT environments are distinguished according to
-       whether they support active dispatching of requests to threads
-       whether they support parallel IO.
        Besides the basic calls for parallel IO (like pread,pwrite,
        or ReadFileEx "overlapped" IO in Win speak, which is missing on Win 9x/Me),
        this also requires condition variables (like pthread_cond_wait/broadcast,
        which are rather difficult to emulate on Win 9x/Me in the absence
        of SignalObjectAndWait) and should include memory mapping
        (like mmap,msync, which is working poorly on Win 9x/Me).
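
To illustrate why positional reads matter for parallel IO, here is a small
Python sketch using os.pread (Unix). The block size and the function names are
assumptions for the example; the point is only that no thread touches the
shared file position.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 512   # hypothetical block size for the example

def read_block(fd, blockno):
    # os.pread takes an explicit offset and leaves the shared file position alone
    return os.pread(fd, BLOCK, blockno * BLOCK)

def read_blocks_parallel(fd, blocknos, workers=4):
    """Read several blocks from one shared descriptor using a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: read_block(fd, n), blocknos))
```

With lseek followed by read, two threads sharing the descriptor could move the
file position under each other; pread avoids that without any extra lock.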

All threads of a single process share the same cache,
so dirty caches are not an issue here.
Synchronization is cheaper and easier to use.

However, this great performance benefit comes at a price:
While there are a few utilities without any side effects
(i.e. proper FORTRAN functions),
not only access to the database and its cache,
but any access to system resources like files or the memory
heap must be carefully checked for possible collisions and,
when in doubt, must be synchronized -- even in a readonly environment.


*       Session synchronization

Our strategy is to share as little as possible between threads
and to protect all that must be shared (basically the database)
by a single lock. The means to give each thread its own,
unshared environment is the SESSION.


A session represents a single client accessing the database.
(At least this is the idea, but depends on the dispatcher's abilities,
see below).  The session may hold result sets from previous queries,
some authentication info from the client and other temporary data.
In a standalone environment like the Tk GUI not connected to a server,
there is only one session, the "default session" (session id 0).
In a database or web-server, however, there may exist several sessions
on behalf of several users at the same time.

Requests from each session are serialized by some dispatcher,
so that each session is accessed by at most one thread at a time.
Consequently, in an environment with one session only,
there also is only one thread used to access the database.


To summarize, from a session's point of view, the world is single threaded.
Each session has a private memory heap and even its own IO stream buffers
stdin, stdout and stderr (as streams 0,1,2)
and need not care about how it is connected and to whom.
Since the dispatcher guarantees that no session is accessed
by more than one thread at a time, dynamic memory, streaming
IO and other session resources can be used without further interlocking.
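
As a rough picture of what a session carries, consider the following Python
sketch. The class and attribute names are assumptions for illustration and do
not mirror the OpenIsis data structures.

```python
import io

class Session:
    """Per-client state that only one thread touches at a time (a sketch)."""

    def __init__(self, sid):
        self.sid = sid                     # 0 is the default session
        # private stream buffers, indexed like stdin, stdout, stderr
        self.streams = [io.StringIO(), io.StringIO(), io.StringIO()]
        self.results = {}                  # result sets kept from earlier queries

    def write(self, stream, text):
        # no lock needed: the dispatcher serializes access to this session
        self.streams[stream].write(text)

    def output(self, stream=1):
        return self.streams[stream].getvalue()
```

Nothing in the class is synchronized; that is safe only because of the
dispatcher's guarantee described above.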


*       dispatching requests and locking sessions

Due to the dual nature of a session as both representing a user and serving
as object of synchronization, dispatching requests has two tasks:
-       ensuring serialized (single-threaded) access
-       finding the session bound to a given user

While the former is crucial in MT environments, the latter is used only if
-       the environment identifies a user session in the first place
-       the session object's ability to keep state (like result sets) is used

We distinguish two cases of when and how dispatching is done:
-       passive/late dispatching:
        In most environments we have to get the session from within a thread
        dedicated to that request. The dispatcher is implemented as a call,
        accessing a session pool protected by some mutex.
-       active/early dispatching:
        Within the database server, the proper session can be looked up
        before a thread is allocated for a request.
        Here, the dispatcher is an active component, probably running in a
        thread of its own (thus not requiring a mutex on the session pool).
        That way several requests on the same session may be queued
        (or discarded) without consuming any thread resources.
        This should yield better performance under high load and somewhat
        better protection against denial of service.
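
The active variant might be sketched like this in Python. ActiveDispatcher and
its methods are hypothetical names; the sketch only shows that a single
dispatcher thread owns the session table, so the table itself needs no mutex,
and that queued requests occupy no worker thread.

```python
import queue
import threading
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor

class ActiveDispatcher:
    """One dispatcher thread owns the session table; workers only run requests."""

    def __init__(self, workers=4):
        self._events = queue.Queue()          # (kind, session id, callable)
        self._pending = defaultdict(deque)    # requests waiting per busy session
        self._busy = set()                    # sessions with a running request
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, sid, fn):
        self._events.put(("request", sid, fn))

    def close(self):
        """Stop after all requests submitted so far have finished."""
        self._events.put(("stop", None, None))
        self._thread.join()
        self._pool.shutdown(wait=True)

    def _start(self, sid, fn):
        self._busy.add(sid)
        def job():
            try:
                fn()
            finally:
                self._events.put(("done", sid, None))   # report back to dispatcher
        self._pool.submit(job)

    def _run(self):
        stopping = False
        while True:
            kind, sid, fn = self._events.get()
            if kind == "request":
                if sid in self._busy:
                    self._pending[sid].append(fn)       # queued, no thread used
                else:
                    self._start(sid, fn)
            elif kind == "done":
                if self._pending[sid]:
                    self._start(sid, self._pending[sid].popleft())
                else:
                    self._busy.discard(sid)
            else:
                stopping = True
            if stopping and not self._busy:
                return
```

Requests for different sessions run in parallel on the pool, while requests
for one session run strictly one after another in submission order.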

There are also two different situations with regard to the scope
of synchronization:
-       per request:
        The session is "locked" (somehow marked as busy) until processing
        the request has finished. Locking is done by the dispatcher,
        and unlocking must be performed on exit,
        e.g. using register_shutdown_function in PHP.
        For the passive dispatcher, if some user session id is used to locate
        an existing session and there is already a request executing in this session,
        the current thread has to wait.
-       per use:
        In a high level language, e.g. Java, basic synchronization is achieved
        by having a Java object representing the session and marking the
        appropriate methods as synchronized.
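
Per-request locking with a passive dispatcher can be sketched as below. This is
an assumption-laden Python illustration (SessionPool and request are made-up
names): the request thread itself calls into the pool, waits while the session
is busy, and always unlocks on exit.

```python
import threading
from contextlib import contextmanager

class SessionPool:
    """Passive dispatcher: called from the thread that handles the request."""

    def __init__(self):
        self._cond = threading.Condition()   # mutex plus condition guarding the pool
        self._sessions = {}                  # session id -> session state (a dict here)
        self._busy = set()                   # sessions currently locked

    @contextmanager
    def request(self, sid):
        with self._cond:
            while sid in self._busy:         # another request runs in this session
                self._cond.wait()
            self._busy.add(sid)
            session = self._sessions.setdefault(sid, {})
        try:
            yield session
        finally:
            with self._cond:                 # unlock on exit, even after an error
                self._busy.discard(sid)
                self._cond.notify_all()
```

The finally clause plays the role that register_shutdown_function plays in
PHP: the session is released however the request ends.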

Note that unless we promise that sessions actually will remember some state,
a simple dispatcher may decide to operate on a session pool of size 1 (one),
containing only the default session, thus ruling out any parallel operation.


*       Configuration synchronization

Operations that change the overall system state, like opening a database,
are allowed for session 0 only. Consequently, IO (logging) and memory
associated with such operations are bound to the default session.
Databases may be marked for exclusive use by session 0,
for example during a lengthy batch index update
or in order to perform structural changes like modifying the FDT.


On the other hand, the worker sessions need some confidence that
configuration is not going to change while they are in the midst of
processing a request. Therefore, any database that is somehow accessed
by a session is marked as used by the session and marked as unused
when the session is released. This protects the database from being
closed or put in exclusive ("single-user") mode and thus also
configuration from being changed.
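
The used/unused marking could be modelled roughly as follows. DatabaseEntry
and its methods are hypothetical names for this Python sketch; how OpenIsis
actually records users of a database is not shown here.

```python
import threading

class DatabaseEntry:
    """Tracks which sessions use a database and whether it is exclusive."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._users = set()        # ids of sessions currently using the database
        self._exclusive = False

    def mark_used(self, sid):
        """Called when a session first touches the database in a request."""
        with self._lock:
            if self._exclusive and sid != 0:
                return False       # reserved for session 0
            self._users.add(sid)
            return True

    def release(self, sid):
        """Called when the session is released."""
        with self._lock:
            self._users.discard(sid)

    def set_exclusive(self, sid):
        """Only session 0 may do this, and only while no worker uses the db."""
        with self._lock:
            if sid != 0 or (self._users - {0}):
                return False
            self._exclusive = True
            return True

    def clear_exclusive(self):
        with self._lock:
            self._exclusive = False
```

A request by session 0 to close the database would be refused under the same
condition as set_exclusive, namely while any worker session is marked as user.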


Note that a request for the database need not be the same as
the original user request. For a database server, the request for
a database operation is all that is known, thus clients issuing
several remote requests get no guarantee that the DB is unchanged
between database accesses (regardless of the environment they are running in).
When accessing a local database, the scope of locking depends on
the environment as described above. An explicit lock on a local
database might be provided for Java (to be unlocked in a finally clause).


However, the situation is not as bad as it might look, since there are
complex database accesses, bundling several operations into one.
A standard example is to perform a query and not only obtain a result set,
but also the contents of the first n records, like with a Z39.50
piggybacked "present". For remote database access,
this is the most efficient operation mode anyway.


*       Database synchronization

All database resources like master file and index have associated
in-memory structures like a cache. These structures must not be
accessed by more than one thread at once and are therefore protected
by a mutex (some "mutual exclusion" object like a critical section).

Again, there are two modes to distinguish:
-       basic mutex
        The database (actually all databases) is locked when starting an
        access like reading or writing a record or searching an index,
        and unlocked when done.
        Since there is not very much and especially no IO happening outside
        the database access, it doesn't make much sense to allow parallel
        access in the first place and we will rather resort to a
        one-session environment.
-       parallel IO
        This is the interesting case to be discussed now.

Parallel IO aims at using the time one thread has to wait for
an IO operation to complete in order to let another thread
use the CPU and possibly start additional IOs.
Therefore, the mutex is released during IO.

In certain situations like thread A wishing to access a cache page
being read by another thread B, A has to wait on a condition
which will be signaled by B after returning from the IO.


The mutex and condition are implemented by an OpenIsisLockFunc,
which may map them to a pthread mutex and associated condition variable.
This is very similar to the concept of a monitor as implemented
by Java's synchronized blocks.
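
The monitor pattern described in this section can be sketched in Python with
threading.Condition, which bundles a mutex with a condition variable. PageCache
and load_page are hypothetical names; the sketch does not show the
OpenIsisLockFunc interface itself, only the locking discipline.

```python
import threading

_LOADING = object()   # marker for a page whose IO is still in progress

class PageCache:
    def __init__(self, load_page):
        self._load = load_page               # hypothetical IO function: pageno -> bytes
        self._mon = threading.Condition()    # mutex plus condition, i.e. a monitor
        self._pages = {}                     # pageno -> bytes, or _LOADING

    def get(self, pageno):
        with self._mon:
            while self._pages.get(pageno) is _LOADING:
                self._mon.wait()             # thread A waits for thread B's IO
            if pageno in self._pages:
                return self._pages[pageno]
            self._pages[pageno] = _LOADING   # this thread becomes the reader (B)
        try:
            data = self._load(pageno)        # the IO runs with the mutex released
        except BaseException:
            with self._mon:
                del self._pages[pageno]      # let a waiter retry the read
                self._mon.notify_all()
            raise
        with self._mon:
            self._pages[pageno] = data
            self._mon.notify_all()           # signal the waiters after the IO
        return data
```

While one thread sits in load_page, other threads can enter get for other
pages, which is the overlap of CPU use and IO that parallel IO is after.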


The mutual exclusion could be made even more fine-grained by using
one mutex per database and another one for global structures.
With parallel IO, however, the mutex is locked only during CPU use and
released during IO, so this, while adding overhead,
would hardly increase concurrency on a single CPU system.
On a Windows box capable of basic mutex only, on the other hand,
you would probably not access multiple databases anyway.


*       Summary by environments

The following gives an overview of simple approaches
to be used in basic implementations:
-       PHP/Apache1.x/Unix, any CGI:
        Multiple processes use mutual exclusion based on file locking.
        Database must be closed after request.
        Actually, file locking is performed always on database open/close,
        without asking whether there might be other processes.
-       PHP/MT/Windows:
        Uses trivial dispatcher, requests fully synchronized on default session.
-       PHP/MT/Apache2.0:
        May use real dispatcher, once the MT-Apache is stable.
-       Java:
        May use non-trivial dispatcher, if it provides LockFunc.
-       OpenIsis server:
        Uses active dispatcher /
>       Server  multiplexer


*       Notes on PHP

For various PHP run modes, see
>       http://www.php.net/manual/en/features.persistent-connections.php

As of Feb.03, several extensions are
>       http://www.php.net/manual/en/faq.obtaining.php  listed
as being NOT thread-safe!


---
        $Id: Concurrency.txt,v 1.6 2003/02/18 18:10:20 kripke Exp $