Distributed Media Storage/Internship report/Evaluation
Sometime in July 2005, I gave myself the ambitious assignment of solving Wikimedia's image storage problems within four months. By designing and implementing a relatively simple but powerful and scalable distributed storage system, I would eliminate the current scalability and manageability issues and give Wikimedia something to build on for the future. Today, I must conclude that I have only partially succeeded in these goals.
Internship in overview
It all started nicely in the first week. I wrote a task description, drew up a schedule, set up my workstation at Kennisnet, and went to the university library. I started reading about distributed algorithms, techniques and strategies.
The following week was Wikimania, a Wikimedia event that happened to coincide nicely with my internship. I went there and met fellow Wikimedia developers and system administrators, and we had a great time. During the hacking days preceding the actual event, we had some discussion about image storage and possible models, but not much.
Afterwards, it was time for some serious work. I gathered statistics and performance characteristics of the current setup. As discussed earlier with my mentor, I was going to explore a few different models and create prototypes for them, to gain some extra insight and investigate feasibility. I wanted to jump straight to a nicely distributed, bottleneck-free model, but during two weeks of thinking about it I kept seeing pitfalls everywhere, and I tried to attack the problem from many angles with only moderate success.
Because the time I had set aside for implementing the prototypes was very limited, I really wanted to start working on an implementation, and I decided to try a relatively easy model first, one that would not need much design beforehand. Implementing it would have the advantage of making some progress towards an end result while buying extra time to think about the more complex model.
In the next few weeks, I implemented a reasonably complete first prototype built around the concept of a central SQL server. Progress was good, and implementing it was interesting. After about three weeks the prototype supported the basic features: reliably storing and fetching objects between multiple hosts. Its performance wasn't bad for a prototype, and all in all this seemed like a viable option for the final system. The prototype also yielded some reusable code components for the next prototype, some of which were nicely designed and some of which needed more work.
It was time to move on to the second prototype, and this one would have to be rather different from the first. After thinking about it for another week, I decided to go with the concept of a distributed hash table with dynamic bucket assignments, plus a dynamically chosen coordinator that would handle basic coordination tasks while still allowing quick failover. This simplified the problem somewhat, and I started implementing my ideas. Some synchronisation problems showed up in the process, but they could usually be resolved using standard techniques. This prototype was very interesting and promising, but also very hard to get right. In the end I had the second prototype working at about the same level as the first one, partly reusing code but with much more sophisticated index handling.
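To make the idea behind the second prototype more concrete, here is a minimal sketch of a hash table with dynamic bucket assignments and a dynamically chosen coordinator. It is not the prototype's actual code: the bucket count, the node names, the "lowest name wins" election rule and the round-robin assignment are all assumptions made purely for illustration.

```python
import hashlib

NUM_BUCKETS = 16  # assumed fixed number of buckets, chosen for illustration


def bucket_for(key: str) -> int:
    """Map an object key to a bucket number using a stable hash."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def elect_coordinator(live_nodes):
    """Hypothetical election rule: the live node with the lowest name wins."""
    return min(live_nodes)


def assign_buckets(live_nodes):
    """Spread the buckets over the live nodes in round-robin order."""
    nodes = sorted(live_nodes)
    return {bucket: nodes[bucket % len(nodes)] for bucket in range(NUM_BUCKETS)}


class Cluster:
    """Toy in-memory model of the cluster state kept by the coordinator."""

    def __init__(self, nodes):
        self.live = set(nodes)
        self.rebalance()

    def rebalance(self):
        """Re-elect the coordinator and recompute the bucket assignment."""
        if not self.live:
            raise RuntimeError("no live nodes left")
        self.coordinator = elect_coordinator(self.live)
        self.assignment = assign_buckets(self.live)

    def fail(self, node):
        """Simulate a node failure, followed by failover and reassignment."""
        self.live.discard(node)
        self.rebalance()

    def node_for(self, key: str) -> str:
        """Return the node currently responsible for the given key."""
        return self.assignment[bucket_for(key)]


cluster = Cluster(["node-a", "node-b", "node-c"])
owner_before = cluster.node_for("Example.jpg")
cluster.fail("node-a")  # the coordinator fails; a new one is chosen
owner_after = cluster.node_for("Example.jpg")
```

In this sketch a failure simply triggers a new election and a full reassignment. A real system would also have to move or re-replicate the affected objects and keep all nodes in agreement about the current assignment, which is where synchronisation problems of the kind mentioned above tend to appear.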
Meanwhile, the time for working on the prototypes had run out, even with only two prototypes instead of the three initially planned. I was behind schedule, more than halfway through the total project time, and had already used some of the extra time. Quite clearly I had to start working on the final design and the deliverables, most notably this report, which had been somewhat neglected during the prototype phase.
Unfortunately, progress in the following two weeks or so was not good. I had a hard time deciding which way to go for the final design, and there was a lot of external distraction as well. Because time pressure was rising and there was no chance of extending the deadline, I had to choose a design that I could actually make work in the remaining time. At this stage it was evident that no time would remain for starting the implementation of a final design, and that even finishing the detailed final design would be tough. I was pretty much forced to take the safe route and go with the model of the first prototype, even if that might not be optimal performance-wise. The fact that the distributed model still had some unsolved, and possibly not yet discovered, pitfalls supported this decision.
In the final weeks, most of the internship report was written and the final design was made. Like everything during this internship, this again took considerably more time than expected, resulting in a design that is not as good, complete and thorough as I would like it to be.
One of the things that surprised me most is how hard it is to maintain full productivity when working full-time on a single, relatively small project. Because most of my previous projects were in a voluntary open source context, and most of my working experience was in environments that were very flexible about working hours and deadlines, I wasn't really used to that. I guess that when one can freely choose when and how much to work on something, for example in one's spare time, productivity is automatically high, and it is hard or even impossible to attain that same level in a non-voluntary environment.
The fact that I worked at home 75% of the time may not always have helped either. While it saves a lot of travel time and other overhead, working in one's home environment makes it harder to concentrate on work, because the distinction between work and free time is less clear. I should add, though, that an office environment like Kennisnet's, while very pleasant and comfortable to work in, can also be a source of distraction at times...
Internship of compromises
I think I can say that this internship project was full of compromises. Because of the perhaps overly ambitious goal set for this project (which admittedly was my own fault) and the very fixed deadline, compromises had to be made several times in order to finish within the available time. It quickly became obvious that implementing the final system, or even starting to, would not be feasible within the internship, and the end goal was changed to producing a final design. The focus shifted multiple times; it turned out, for example, that there would be no time to really analyze the prototypes' performance. In the end, the remaining available time started to affect decisions, including design decisions, more and more, often to the detriment of the final design's properties. The result is that no single deliverable is as complete and thorough as I would have liked it to be.
All in all, I think I can say that it was a great learning experience, though. Even if the current results aren't fully complete or immediately useful, a foundation has been laid for further work. Much has been learned from the prototypes, including the one that didn't make it into the final design. This experience can be used for a final implementation, based on this or possibly a rather different design, and of course in any of my future work in this area.
Working with and in a non-profit organisation like Kennisnet has been very educational and fun as well. It was good to be in a very flexible environment with lots of friendly and enthusiastic people, where much attention is paid to creating a nice atmosphere while still maintaining good productivity.
I would like to thank...
- Gerard Meijssen, for his initial idea for this internship related to Wikimedia, and for helping to realise it later
- Jan-Bart de Vreede, for making things happen within Kennisnet, and providing me with all the facilities I needed
- Arjan van Krimpen, for allocating time in his very busy schedule to guide me with useful insights and comments
- all the other people at Kennisnet who made sure I had a great time and helped me in any way
- Ad Aerts (mentor), for pointing me in the right direction, and providing me with useful references
I started in mid-summer, today we have snow... What happened? :) -- Mark Bergsma