When I talked about torrents some time ago, I was asked what exactly that is and how it works. So I decided to write a post about it, as I realized that there are a lot of people who still don’t know BitTorrent: either they have never heard of it, or they do not know how it works and don’t know where to look for information.
To follow this article, you should have a basic understanding of what client/server architecture means. If you don’t, you could have a look at the Jabber Guide I wrote some years ago (unfortunately in German only). Besides this you do not really need a technical background (at least I hope so, as my intention is to reach those of you who don’t have one; all the others won’t need a post like this anyway 😉 ).
The name BitTorrent is made up of the words bit and torrent, and the name fits, as this protocol is designed for transferring large amounts of data quickly.
The idea behind BitTorrent is that not only the server coordinates and distributes the download, but the clients can also do it by themselves. This reduces the traffic of the server and distributes the network load over the group of clients that are actually downloading a file. So as soon as a client starts downloading a file in a BitTorrent network, it also uploads the parts it already has to others. As this leads to a better overall bit rate, the clients profit from this as well (see below for an explanation with an example).
BitTorrent introduces quite a lot of new vocabulary, which I would like to begin with, so you know what other people are talking about when this topic comes up.
- In the BitTorrent world the servers are called trackers (see below to understand why).
- Each file that is distributed over the BitTorrent net is split into parts. These parts are called chunks.
- The group of clients that is actually involved with a downloadable file is called the swarm. This group splits up into the following two subgroups:
  - All clients that just have some chunks of the file, but not the whole file itself, are called peers.
  - All clients that have downloaded the whole file (i.e. they possess all chunks of the file) are called seeders.
- All clients that only download without uploading are called leechers.
- Torrent file
  - The torrent file is one of the key components of the BitTorrent network. It is a file that usually has the extension .torrent, and sometimes you may even find .tor, to fulfill the old Windows/DOS-style filename requirements.
- Torrent: colloquial term for the BitTorrent network, or for the torrent file.
Now that you know the vocabulary, let’s see how these components work together.
First of all you have someone setting up a BitTorrent tracker – this is how every BitTorrent network starts.
This tracker now has some files that it should share. To share a file among people, a torrent file has to be created. This file only contains metadata: the address of the tracker, as well as some information on the file that is to be shared, such as the hash values of the chunks, its size, its filename, etc.
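To make the idea of this metadata a bit more concrete, here is a minimal sketch in Python. It is not the real torrent file format (real torrent files use their own encoding); the function name, the dictionary keys, the tiny chunk size and the tracker address are all made up for illustration. It only shows the principle: split the data into chunks and store one hash value per chunk next to the tracker address.

```python
import hashlib

def build_metadata(data: bytes, filename: str, tracker_address: str,
                   chunk_size: int = 4) -> dict:
    """Split data into fixed-size chunks and record one SHA-1 hash per chunk.

    The layout of the returned dict is hypothetical and only illustrates
    which kind of information a torrent file carries.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return {
        "tracker": tracker_address,   # where clients ask for the swarm
        "filename": filename,
        "size": len(data),
        "chunk_size": chunk_size,
        "chunk_hashes": [hashlib.sha1(c).hexdigest() for c in chunks],
    }

# 13 bytes with a chunk size of 4 give four chunks (4 + 4 + 4 + 1 bytes).
meta = build_metadata(b"hello torrent", "greeting.txt", "tracker.example.org:6969")
print(meta["size"], len(meta["chunk_hashes"]))
```

Real chunks are of course much larger than four bytes; the small size just keeps the example readable.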
A client now wants to download this file. To do so it needs to get the torrent file (normally this is distributed by websites). The client then knows the source where it can get the data, and starts downloading. As soon as a chunk has been downloaded, the client verifies that the chunk is complete and does not contain any errors. If so, this is reported to the tracker.
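The verification step can be sketched like this, assuming the hash values in the torrent file are SHA-1 hashes stored as hex strings (the function name and the sample data are my own, just for illustration): the client hashes the chunk it received and compares the result with the expected value.

```python
import hashlib

def chunk_is_valid(chunk: bytes, expected_sha1_hex: str) -> bool:
    """Return True if the downloaded chunk matches the hash from the torrent file."""
    return hashlib.sha1(chunk).hexdigest() == expected_sha1_hex

# The expected hash would normally come from the torrent file.
expected = hashlib.sha1(b"some chunk data").hexdigest()

good = chunk_is_valid(b"some chunk data", expected)   # intact chunk
bad = chunk_is_valid(b"some chunk dat4", expected)    # one byte got corrupted
print(good, bad)
```

A chunk that fails this check is simply thrown away and requested again.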
What the tracker does now is add this client to the list of peers it keeps for that torrent; the torrent file itself stays as it is.
If another client now enters the swarm with the same torrent file, it contacts the tracker and so learns not only about the tracker but also about the other peers. So it could now download the second chunk from the tracker and the first chunk from the other peer.
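The bookkeeping part of this can be sketched as a toy tracker. This is only an illustration under my own assumptions: the class, the method name `announce` and the addresses are made up, and a real tracker talks to clients over the network instead of through method calls. The point is just that the tracker remembers who takes part in a swarm and tells every newcomer about the others.

```python
from collections import defaultdict
from typing import List

class ToyTracker:
    """Keeps, per shared file, the set of clients that are part of its swarm."""

    def __init__(self) -> None:
        # torrent id -> set of client addresses (hypothetical "ip:port" strings)
        self.swarms = defaultdict(set)

    def announce(self, torrent_id: str, client_address: str) -> List[str]:
        """Register a client and return the other clients known for this file."""
        others = sorted(self.swarms[torrent_id] - {client_address})
        self.swarms[torrent_id].add(client_address)
        return others

tracker = ToyTracker()
first = tracker.announce("linux.iso", "10.0.0.1:6881")
second = tracker.announce("linux.iso", "10.0.0.2:6881")
print(first)   # the first client is alone in the swarm
print(second)  # the second client learns about the first one
```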
Why is this done? Just imagine the tracker and the peers each have an upload bit rate of 10 KB/s, and download bit rates that are much higher. As the upload rate is the smaller one, it is the limiting factor for our download speed.
When the first peer downloads from the tracker, it gets a speed of 10 KB/s. When the second peer joins, this speed is of course split between both of them. If they only downloaded from the tracker, each of them would get a speed of 5 KB/s.
But as the first peer is not using its upload for anything else, those 10 KB/s are free for use, and this is what BitTorrent takes advantage of. The second peer can now download from the tracker as well as from the first peer, and therefore manages to reach a speed of 15 KB/s (5 KB/s from the tracker plus 10 KB/s from the first peer).
This is just a small example. As you can imagine, the bigger the swarm for a file grows, the faster the download rates get. Just imagine 10 peers. That would make a maximum upload rate of 10 * 10 KB/s = 100 KB/s that a new peer could download with.
Of course in practical use there are different factors that reduce this theoretical download rate, which are not taken into consideration here. But I guess you can see that this network offers a significant speed advantage.
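If you want to play with the numbers from this example yourself, here is a small sketch of the arithmetic. The function names are my own, and it uses the same simplification as the text: every participant uploads with 10 KB/s and the download side is never the bottleneck.

```python
UPLOAD_RATE_KB_S = 10  # upload rate of the tracker and of every peer, as in the example

def tracker_only_rate(downloaders: int) -> float:
    """Rate per client when everyone downloads from the tracker alone."""
    return UPLOAD_RATE_KB_S / downloaders

def swarm_upload_rate(uploading_peers: int) -> float:
    """Combined upload rate of the peers that share what they already have."""
    return uploading_peers * UPLOAD_RATE_KB_S

# The second peer gets its share of the tracker plus the first peer's upload.
second_peer_rate = tracker_only_rate(2) + swarm_upload_rate(1)

print(tracker_only_rate(2))   # 5.0
print(second_peer_rate)       # 15.0
print(swarm_upload_rate(10))  # 100
```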
The other interesting things you might notice are:
- The torrent file is one of the key factors. If for example the tracker administrator decides to delete it, then there is no way for new clients to get that file anymore, even if it is still on the tracker and there are peers and seeders still willing to share.
Anyway, this does not mean that the current peers have a problem. Even if the tracker deletes the torrent file, the clients still have it, and so know where to find the chunks they still need. Deleting the torrent file only closes the gate for new peers.
- The system is totally decentralized. A swarm grows around a tracker, and different trackers do not have a connection to each other. Clients can download different files from different trackers at the same time (this is a difference to the eD2K network, which has a similar peer-to-peer connection technique, but where you need to be connected to a server, which distributes the metadata to all other clients).
This also makes it possible to have coteries, that is, closed groups of users.
- The weak point of the system is the torrent file. This has to be published somewhere, and anyone who has it can ask the tracker for the addresses of the other participants in the current swarm (and sometimes even gets outdated data of clients which used to be part of that swarm). So you are not really anonymous. This is one of the reasons why there are also coteries, where everybody knows each other anyway. In these communities the torrent files are distributed in a restricted area only.
- The tracker does not have to have the file which the clients are sharing. It could also be in a totally different place. And in fact it is common practice to delete the files from the server (or not even to have them there in the first place) if there are enough seeders.
- Like any other file sharing network, BitTorrent depends on its users. This applies especially to BitTorrent, as you only gain an advantage over other ways of distribution if the swarm is big enough. Generally speaking this is only the case for current files. E.g. if a Linux distribution releases a new version, the download rates are excellent around the release date, and then slowly go down. For previous releases it will probably make no sense to use BitTorrent; you should consider using other ways, such as HTTP or FTP, here.
- Leechers are a special threat to the BitTorrent system. Take the following situation:
A file is seeded by the tracker, and after a while there are enough seeders, so the tracker’s administrator decides to take the file off the server. Now, if all the seeders disappear at once, the swarm is pretty screwed: the distribution of the chunks goes on, but the swarm is not in possession of all chunks anymore, so no one in the swarm will ever be able to finish their download. This is in fact a serious problem, which has led to developments such as Anti-Leech-Trackers, ALTs for short, that calculate a ratio comparing your upload and download behavior. This reveals whether you are an unfair user or even a leecher, and if that is the case, the tracker bans you from the swarm.
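How such a ratio check could look is sketched below. I do not know how any particular Anti-Leech-Tracker actually implements this, so treat the function names and especially the threshold of 0.5 as pure assumptions; the sketch only shows the idea of comparing uploaded against downloaded data.

```python
def share_ratio(uploaded_bytes: int, downloaded_bytes: int) -> float:
    """Uploaded data divided by downloaded data; a higher value means a fairer user."""
    if downloaded_bytes == 0:
        # Nothing downloaded yet, so the client cannot have leeched anything.
        return float("inf")
    return uploaded_bytes / downloaded_bytes

def should_ban(uploaded_bytes: int, downloaded_bytes: int,
               min_ratio: float = 0.5) -> bool:
    """min_ratio is a made-up threshold; a real tracker would choose its own."""
    return share_ratio(uploaded_bytes, downloaded_bytes) < min_ratio

print(should_ban(50, 1000))    # uploads almost nothing: treated as a leecher
print(should_ban(800, 1000))   # gives back most of what it took: fine
print(should_ban(0, 0))        # just joined the swarm: fine
```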
So. Having given you an overview of the techniques, let’s talk about how you can get involved with this technology. The first step is to install and use a client.
There is a really big bunch of clients out there, and Wikipedia has a nice comparison of the important ones, so I will not give you information on each of the different clients. Instead I want to ease your decision by giving a recommendation.
The client that I would recommend is Azureus. Azureus is written in Java, which makes it compatible with different platforms. Azureus will work just fine on Windows, Mac, Linux, and any other platform that has a virtual machine to run Java bytecode on.
Besides this it has a nice and easy-to-use GUI, which makes it easy for beginners to join the BitTorrent network. It should not be seen as a beginners-only client, though, as Azureus contains a lot of advanced features, such as tracker-less file sharing and the ability to work as a tracker as well. And if that is not enough for you, there is a bunch of plugins available for Azureus.
This of course is just a small overview of how BitTorrent generally works. There is much more to explore, as you may have noticed while reading (ALTs, tracker-less file sharing, how to set up a tracker, how to create torrent files, etc.). But I guess with this information you can get started, and as soon as you are interested in more advanced topics, you will know how to help yourself (and quite honestly, nearly 2000 words are enough)!
A nice start into the world of BitTorrent file sharing can be found at LegalTorrents; another one is this list by Janko Roettgers. Also, a lot of software distributors make use of BitTorrent, e.g. the Gentoo torrent site.
Please be aware that there are a lot of other torrent sites on the internet, but most of them are breaking copyright laws and are therefore illegal. I do not want to encourage you in any way to deal with such stuff ❗
So stay clean, and have fun 🙂