P2P Guide of Hyper Estraier Version 1

--- trunk/doc/nguide-en.html 2005/07/29 21:57:20 3 +++ trunk/doc/nguide-en.html 2005/07/30 10:45:45 4 @@ -16,7 +16,7 @@ -P2P Guide of Hyper Estraier Version 1 +P2P Guide of Hyper Estraier Version 2 @@ -45,32 +45,39 @@

Introduction

This document describes how to use P2P mechanism of Hyper Estraier. If you have never read the user's guide, please read it beforehand.


estseek.cgi is not efficient because it connects to the database per execution. And, it is impossible to perform search during database updating, because estcmd locks the database. To solve the problem, Hyper Estraier provides a server program of C/S (client/server) architecture. There are a resident process keeping connection to the database and it serves some operations via network. The C/S architecture has the following advantages.

This document describes Hyper Estraier's client/server (C/S) and peer-to-peer (P2P) architecture. If you haven't read the user's guide yet, now is a good time to do so.

Several problems motivated the C/S architecture. estseek.cgi is inefficient because it has to reconnect to the database for each search query. Database updates using estcmd prevent searches on the same database, because estcmd locks the database while updating. To solve these problems, the estmaster server process is implemented. This process is resident in memory, has control over the databases, and provides services over the network. This approach also has the following advantages:

The server and its clients work on separate machines.
Plural clients of one server can work in parallel.
The database is not broken even when some clients crush.
Clients can be implemented without dependency of programming languages or APIs.
the server and its clients can be distributed across different machines
multiple clients and servers can work in parallel
a client crash doesn't leave the database in an inconsistent state
client implementation isn't tied to any programming language

Because the protocol between C/S is based on HTTP, some popular web browsers can be used as clients. Of course, clients can be implemented on your own way. It is also good idea to use such technologies around web browser as JavaScript, Flash, and so on.

The protocol between clients and servers is based on HTTP, so ordinary web browsers can be used as simple clients. Clients can be implemented in any language that supports HTTP, including browser-side technologies such as JavaScript or Flash.


Distributed processing is based on a peer-to-peer (P2P) architecture. For example, if you use 10 servers, each with one million documents, you can search 10 million documents without much additional effort. Since all servers are equal, the search service remains available even if some of the servers are unavailable. Each server also has a notion of relevance, which can improve search results (for example, if some parts of the index are more important than others).

Distributed processing based on P2P (Peer to Peer) architecture is supported. If you use 10 servers handling one million of documents, you can search 10 millions of documents. Because servers are equivalent, whole of the network service works successively even if a server crushes. Moreover, calculating reliability between servers is supported and it can improve search precision.

Relevance was called reliability in the previous version of this document.
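The distributed search described above can be illustrated with a small sketch. It is not Hyper Estraier's implementation: the function name `merged_search`, the `(doc_id, score)` result shape, and the choice to weight each score by multiplying it with the server's relevance are all assumptions made for illustration. The sketch only shows the two ideas from the text: results from several equal servers are combined, and an unavailable server is skipped rather than failing the whole search.

```python
def merged_search(servers, query, limit=10):
    """Query every server and merge the hits into one ranked list.

    servers: list of (search_fn, relevance) pairs (hypothetical shape).
             search_fn(query) returns a list of (doc_id, score) tuples
             or raises ConnectionError when the server is unavailable.
    """
    hits = []
    for search_fn, relevance in servers:
        try:
            results = search_fn(query)
        except ConnectionError:
            # All servers are equal, so a missing one only shrinks the result set.
            continue
        for doc_id, score in results:
            # Assumed weighting: scale each score by the server's relevance.
            hits.append((score * relevance, doc_id))
    # Highest weighted score first; doc_id breaks ties deterministically.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [(doc_id, weighted) for weighted, doc_id in hits[:limit]]
```

With real servers, each `search_fn` would wrap an HTTP request to one server; here it can be any callable, which keeps the sketch testable without a network.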

The node API is provided to hide the protocol between C/S. Using the node API, you can implement client applications without closeup know-hows about network. This document describes how to use the node API also.

This document also describes the node API, which client applications can use to implement search capabilities without dealing directly with the network protocol between client and server.

Architecture

This section describes the P2P arhcitecture of Hyper Estraier.

This section describes the P2P architecture of Hyper Estraier.

Node Master and Node Server

If you uses many indexes, it is inefficient to run a server per index. So, a program called node master is provided. While it works as one process and uses one network port, it can handle several indexes. Because each index performs its own service, we can regard a "node master" as aggregation of several index servers. On the viewpoint, each virtual server handling an index is called "node server". Each node server has an own URL. A client application knows URL of a node server but does not know in which node master the node server works.

When using multiple indexes, it is inefficient to run one server per index. estmaster is a server program that runs as a single process, uses one network port, and can provide search services for multiple indexes. Since each index can be searched individually, we can think of a "node master" as a collection of indexes served by a single estmaster process. Each index is called a "node server" (you can think of it as a virtual server) and has a unique URL. A client application knows the URL of a node server but doesn't need to know which node master serves that index.
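The relationship between a node master and its node servers can be sketched as follows. This is a toy model, not estmaster itself: the class names, the `/node/<name>` URL layout, and the port number in the usage example are assumptions made for illustration. It shows one object standing in for the single process, holding several indexes, and dispatching a request to the right one based only on the URL the client supplies.

```python
from urllib.parse import urlparse


class NodeServer:
    """Stands in for one index with its own URL (hypothetical)."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # a list of strings acting as a toy index

    def search(self, word):
        return [doc for doc in self.documents if word in doc]


class NodeMaster:
    """One process and one base URL serving several node servers (hypothetical)."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.nodes = {}

    def add_node(self, node):
        self.nodes[node.name] = node

    def node_url(self, name):
        # Assumed URL layout: <base>/node/<name>
        return f"{self.base_url}/node/{name}"

    def dispatch(self, url, word):
        # The client only supplies a node server URL; the master picks the index.
        parts = urlparse(url).path.strip("/").split("/")
        if len(parts) != 2 or parts[0] != "node" or parts[1] not in self.nodes:
            raise KeyError(f"no node server at {url}")
        return self.nodes[parts[1]].search(word)
```

For example, after `master = NodeMaster("http://localhost:1978")` and `master.add_node(NodeServer("docs", ["apple pie"]))`, a client holding only `master.node_url("docs")` can search that index through `master.dispatch(...)` without knowing which other indexes the same master serves.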