Peer-to-Peer Systems
An Introduction
Prof. David Bernstein
James Madison University
Computer Science Department
bernstdh@jmu.edu
Motivation
The Concept:
Share storage and bandwidth of a large number of (small) computers
The Traditional Argument for Peer-to-Peer Systems:
The time required to distribute a file is smaller than in client-server systems
Distributing a File: Notation
\(N\) denotes the number of machines that want the file
\(F\) denotes the size of the file (in bits)
\(u_S\) denotes the source's upload rate
\(u_i\) denotes the upload rate of machine \(i\)
\(d_i\) denotes the download rate of machine \(i\)
\(d_{\mbox{min}} = \min\{d_1, d_2, \ldots, d_N\}\)
\(D\) denotes the total time
Distributing a File: Client Server
Observations:
The server must transmit \(N \cdot F\) bits in total which must take at least \(N \cdot F / u_{S}\) seconds
The slowest peer cannot download the file in less than \(F / d_{\mbox{min}}\) seconds
Results:
Obvious: \(D \geq \max \left\{ \frac{NF}{u_S}, \frac{F}{d_{\mbox{min}}} \right\}\)
With Some Thought: \(D = \max \left\{ \frac{NF}{u_S}, \frac{F}{d_{\mbox{min}}} \right\}\)
Distributing a File: Peer-to-Peer
Observations:
The minimum time is at least \(F/u_S\)
The slowest peer cannot download the file in less than \(F / d_{\mbox{min}}\) seconds
The system must deliver a total of \(N \cdot F\) bits and the total upload rate is \(u_{\mbox{total}} = u_S + u_1 + \cdots + u_N\)
Results:
Obvious: \(D \geq \max \left\{ \frac{F}{u_S}, \frac{F}{d_{\mbox{min}}}, \frac{NF}{u_\mbox{total}} \right\}\)
With Some Thought: \(D \approx \max \left\{ \frac{F}{u_S}, \frac{F}{d_{\mbox{min}}}, \frac{NF}{u_\mbox{total}} \right\}\)
The Coordination Challenges
No central administration
Heterogeneity
No trust
Changing participants (i.e., churn)
Unreliable participants
Services:
"Required":
Locate a provider of content
Retrieve content
Optional:
Search for content
Central Directory Systems
Example:
Napster
Process:
Provider registers availability of file with directory
Requester queries directory for a provider of file
Directory responds with a provider
Requester requests file from provider
Provider provides file
Central Directory Systems (cont.)
Advantages:
Simple
Facilitates search
Disadvantages:
Not robust (i.e., single point of failure)
Centralized accountability
Flooding Systems
Example:
Gnutella
Process:
Requester looks for a file by flooding neighbors who flood their neighbors (if necessary)
One or more nodes responds to requester
Requester selects a provider
Requester requests file from provider
Provider provides file to requester
Flooding Systems (cont.)
Advantages:
Decentralized
Disadvantages:
Hard to find "unpopular" content
No guarantees
Flooding causes scaling problems
Flooding Systems with Super Peers
Example:
KaZaA
The Concept:
A hybrid of directory based and flooding-based
Some nodes are super peers that know about the nodes "under" them
Chunked Systems
Example:
BitTorrent
Storage Process:
Content is split into chunks
Chunks are stored on different nodes
Location Process:
Content has an associated tracker node
Retrieval Process:
Chunks are received from different nodes
The last chunks are downloaded from multiple nodes (to avoid halting)
Incentivized Participation
Objective:
Ensure that nodes are both providers and requesters
Approaches:
Only provide to others who provide
Order/prioritize nodes by amount provided
There's Always More to Learn