Software Architecture of Turtle NodePetr Matejka
7th March, 2004
This document gives brief overview of software architecture of Turtle node. Turtle nodes operate in peer-to-peer network described in , however several modifications of communication protocol were accepted. Most important changes are described in sections 1.1 and 1.2.
1.1. Anonymous addresses
First change is related to node addressing. In original protocol all nodes of query broadcast tree must remember part of routing information(immediate neighbour) for the query. Routing information is used in data retrieval phase to route retrieve request and response. In modified protocol the routing information is not spread in the network, but it is propagated back to query initiator. Each network node remembers only routing information of its own queries. Although the information is bigger, nodes can easily decide what to remember and how long to remember it, because they know what they need it for.
Following example illustrates how the
modified protocol works:
Returned routing information allows node 0 to contact node 3, although nodes 1, 2 and 3 don't store any state. Because each node understands only its own part of routing information, it is anonymous and nodes know only their immediate neighbours on the path. We call this routing information anonymous address. Number of hops between nodes might be masked by other techniques.
1.2. Virtual circuits
All communication between Turtle nodes uses virtual circuits tunneled through TCP connection. It allows data of arbitrary length to be transferred over the network. When a node wants to establish a virtual circuit, it first needs address of target node. In our case the address is acquired by query protocol as described in section 1.1. Target address is propagated along with connect request. Each node on the path unpacks (decrypts) its part of the address, so the circuit is established step-by-step to target node. Neither source nor target node know to whom they are connected. The data sent to the circuit by any of them is propagated step-by-step over the network.
Virtual circuits are also used for sending queries and query hits. They can be theoretically used for any type of application (not only querying and retrieving files) just like TCP/IP.
In this document only communication infrastructure is described. GUI part of Turtle node is omitted. All main components of communication infrastructure are depicted in Figure 2: Router, TCP Channel Manager, Service Address Resolver, Query Service, Download Server and Client.
These components are connected by
There is also a method for discovering services running on neighbour node. More detailes about this topic are given in section 2.3.
Router is the central component of Turtle node. All data going through the node go through Router. All other components are registered at Router and Router does circuit switching between them. We call these components channels. When virtual circuit connect request comes from one channel, Router decodes(decrypts) routing information and passes connect request further to another channel. When the circuit is established, Router forwards data between these two circuit segments until the circuit is closed. There can be many virtual circuits coming from/to one channel.
Each channel represents either local application or neighbour node. Router does not distinguish between these two types of channels. It simply forwards calls from one channel to another. Besides this Router provides the following services:
Internally Router creates one Router
Channel object per registered channel. Router Channel provides
2.2. TCP Channel Manager
TCP Channel Manager is responsible for communication with neighbour nodes. It listens for incoming TCP connections from neighbours and also periodically tries to connect them. If TCP connection is established and authenticated, Communication Channel object is created and registered at Router. From that moment it is possible to create virtual circuits through newly created channel.
All TCP commmunication
is done from communication thread. It receives data and passes them to
Communication Channels. Communication Channel builds command
from the data and passes it to Router (using
Communication Channel is also responsible for controlling the flow of virtual circuits. It supplies each circuit with send buffer. It does not allow sender to send more data than what can fit into the send buffer.
Because there can be many virtual circuits on one channel, data multiplexing is needed to guarantee fair division of TCP connection bandwidth. This is again up to Communication Channel. It cyclically sends data of circuits and assigns higher priority only to commands(connect, flow control, disconnect, etc.).
2.3. Service Address Resolver
So far no mechanism for discovering services running on neighbour nodes was introduced. It is neccessary to provide such mechanism, so that neighbour nodes can establish virtual circuits for basic services (currently only query service).
Each node has one special component
called Service Address Resolver(SAR) that is registered at Router as
channel with special flag. Router knows that it is SAR and provides
its address through
SAR acts as register of local basic services. They can register at SAR with name and address. Same service running on neighbour node can ask its router for address of our SAR. Then it contacts our SAR and asks (by name) it for address of basic service. After extending acquired address by local routing information neighbour service can contact our service using the address.
2.4. Query Service
Query Service component implements query mechanism described in . Query Receiver subcomponent waits for incoming circuits from neighbour nodes. It is registered at SAR, so that neighbour nodes can find it. It receives queries on established circuits and passes them to Query Manager. When query hit is received from Query Manager, Query Receiver sends hit packet to circuit from which corresponding query came.
Query Manager receives queries from Query Receiver and from GUI. It keeps hash table of recent queries to be able to detect cycles in query graph and to know from where each query came. Queries not found in hash table are stored to it and are passed to Local Query Evaluator and Query Sender. Query hits received from Local Query Evaluator and Query Sender are passed back to Query Receiver or GUI.
Query Sender asks Router for list of neighbours and their SAR addresses. It contacts neighbour SARs to find Query Receivers and connects to them. Established circuits are used to send queries to Query Receivers and to receive query hits from them. When a query hit is received from neighbour node, anonymous address contained in hit packet is extended and then the packet is passed to Query Manager.
Local Query Evaluator evaluates queries on local filesystem and passes hits to Query Manager. Address contained in hit packet is set to address of local Download Server.
2.5. Download Server
Download Server is simple component that has only one purpose – to send requested file to virtual circuit. When contacted it waits for request consisting of file name and possibly also file position (to support continuing of broken downloads). It finds requested file on local filesystem and starts sending until the file is completely sent.
2.6. Download Client
Download Client communicates with Download Server running on remote node. GUI can ask Download Client to download a file. It must provide address of Download Server and name of the file, both can be obtained by querying. Download Client then contacts Download Server and sends request to download the file. When the file is received and stored to local filesystem, GUI is notified.
2.7. Other Components
There are a few other components not mentioned yet. They are subcomponents of some above-mentioned component.
2.7.1. Application Channel
Application Channel is used by local
application components (Download Server/Client, Query Service, SAR)
to bind to router using
Application Channel supplies each virtual circuit with receive buffer and optionally also with send buffer. Using these buffers Application Channel controls flow on virtual circuits.
2.7.2. Key Repository
Key Repository stores encryption keys and provides them to TCP Channel Manager. TCP Channel Manager uses them as master keys in TCP communication with neighbour nodes. Turtle relies on symmetric cryptography, so keys must be exchanged in advance and put into Key Repository before nodes are able to communicate.
2.7.3. Addressing Utility
This is a helper component for Router that can manipulate with anonymous addresses by the following operations:
Different addressing schemes can be used in Turtle network by extending this component, e.g. addressing scheme that masks hop count.
2.7.4. Query Parser
When user enters a query, it is first parsed to binary form by Query Parser. Query in binary form is packed into query packet that is passed to Query Manager and distributed to the network. Distributing queries in parsed form saves CPU time of nodes, because parsing is done only once and not on each node.
3. Initialization and Configuration
Initialization procedure of Turtle node has the following steps:
From this moment GUI can send queries to Query Manager and expect query hits from it. Query hits can be then browsed by user and selected files are downloaded using Download Client. Download Client notifies GUI about events as interrupted download, finished download or download progress.
GUI is responsible for storing the configuration in its configuration files (or elsewhere). It passes the configuration to communication infrastructure components via their interfaces during initialization or when user changes the configuration.
 Bogdan C. Popescu, Bruno Crispo, Andrew S. Tanenbaum: Safe and Private Data Sharing with Turtle: Friends Team-Up and Beat the System
Last modified: 5th of August, 2004 by Petr Matejka