Friday, June 17, 2011

Architecture overview, scalability and future plans

A brief overview of the architecture was described in this blog post two years ago. At the time GSS was mostly blueprints, skeleton server code and a prototype web client. Let me recap the key design guidelines of that time:
  1. stateless back-end
  2. clients communicate with the back-end via a REST API
  3. don't use an SSL/TLS transport layer for the client-server interaction
These three decisions were meant to maximize cost-effective scalability, i.e. to serve more concurrent client requests on any given hardware.

The stateless back-end (point 1) leads to improved performance and scalability compared to the costly cluster-replicated HTTP session, but it forces clients to maintain state on their own. This approach also required an alternative mechanism for identifying the user in each (stateless) request; the authentication token discussed here provided such a mechanism, which lacks the rich functionality of the HTTP session but is much more efficient and scalable. Further, having a single API for all clients (point 2) increases the expandability of the platform and pushes most of the front-end logic to the client, taking load off the servers and minimizing traffic. The GSS web, mobile and desktop client implementations provide quite efficient examples of this approach. The "no SSL" decision (point 3) was based on an over-estimation of the load SSL imposes on the server and the clients. We replaced SSL encryption with a request header signing mechanism that uses the authentication token as the key, and added additional mechanisms for protection against replay attacks. The first two points proved to be excellent design decisions in terms of performance and scalability; however, we eventually added SSL since modern hardware can easily handle huge numbers of concurrent SSL connections. Version 2 of the API will remove the signing mechanism (among other things) and slightly improve performance, but this is material for another post :)
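To give a flavour of how such header signing can work, here is a minimal sketch in Java; the class, header layout and string-to-sign are assumptions made for illustration, not the actual GSS wire format (the API description documents the real one).

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    // Illustrative request signer: an HMAC over a few request attributes,
    // keyed with the per-user authentication token issued at login.
    public class RequestSigner {

        public static String sign(String token, String method, String path, String date)
                throws Exception {
            // The string-to-sign below is a made-up example.
            String stringToSign = method + "\n" + path + "\n" + date;
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(token.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] raw = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(raw);
        }

        public static void main(String[] args) throws Exception {
            // Signing the request date lets the server reject stale requests,
            // one simple defence against replay attacks.
            String date = "Fri, 17 Jun 2011 10:00:00 GMT";
            String sig = sign("token-issued-at-login", "GET", "/rest/files/report.pdf", date);
            System.out.println("Authorization: user@example.org " + sig);
        }
    }

The server can recompute the same HMAC with the token it has on record and compare; since it needs no conversational state to do so, the scheme fits the stateless design.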

The REST API and the stateless nature are still key decisions in the GSS architecture. The diagram below sketches the overall architecture and its core building blocks.


The GSS server business functions are implemented by stateless EJBs deployed on multiple JBoss application servers within a worker pool that is managed by a (redundant) Apache web server. These business functions are exposed through the GSS REST API and focus on manipulating the resources of the file system abstraction and its semantics (see the API description). The user interface is handled by the client application that lives in user space: a rich JavaScript application in the web browser, a desktop application, or a mobile application on the user's phone or tablet. As discussed earlier in this post, we use no HTTP sessions and no cluster configuration to replicate them, so we achieve more efficient horizontal scalability. It also makes live application updates (no downtime) much easier and more efficient. This design takes care of scalability at the presentation and middle tiers. As seen in the diagram, the back-end tier relies on two (logically) separate storage systems:
  • storage for meta-data
  • storage for the actual file content (file bodies)
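Returning to the stateless business tier, a minimal sketch of what such an endpoint can look like is shown below; the class, path and payload are hypothetical and only illustrate the pattern of a stateless session bean exposed through a REST resource.

    import javax.ejb.Stateless;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    // Hypothetical stateless EJB exposed as a JAX-RS resource. No HTTP session
    // is touched; the caller is identified per request via the signed headers
    // discussed above.
    @Stateless
    @Path("/folders")
    public class FolderResource {

        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public String getFolder(@PathParam("id") long id) {
            // A real implementation would load folder meta-data from the
            // meta-data store; a canned response keeps the sketch self-contained.
            return "{\"id\": " + id + ", \"name\": \"example folder\"}";
        }
    }

Because no conversational state lives on the server, any worker in the Apache-managed pool can serve any request, and workers can be added, removed or upgraded live without invalidating anything on the client side.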
The architecture is generic in order to allow GSS to use different solutions for both of these back-end storage systems. The file body storage could be anything, like a SAN or another RESTful service (e.g. Amazon S3), which by design is handled by GSS as an external scalable storage service. On the other hand, meta-data storage is currently based on an RDBMS (PostgreSQL via an ORM/JPA layer). We use extensive multi-layer caching and minimize the load on the database as much as possible, but this implementation has obvious scalability limits and cannot match the massive scalability potential of the business logic tier. We have achieved good performance results on low-end hardware configurations (three VMs on a single physical machine with 8 GB RAM and 6 cores handling more than 8,000 users), but the RDBMS approach will eventually hit a brick wall as the user population grows. Below are various meta-data storage alternatives that we have already investigated, are working on, or plan to test in the near future:
  • Add a distributed second-level caching layer. We currently deploy Ehcache on every worker and have confirmed the performance improvement. A distributed second-level cache will provide larger performance gains and more efficient use of resources, and will also allow write-through caching. This is a scheduled improvement that will be added to GSS independently of the storage back-end that we eventually use. We investigated various solutions and are currently working with JBoss's Infinispan for implementing this cache (a minimal entity-level sketch follows this list).
  • Use PostgreSQL v9.x clustering / hot replication. An obvious improvement that can quickly enter production: have a single writer and multiple reader PostgreSQL instances. We consider it a short-term improvement.
  • Use data partitioning / sharding. It is still under investigation, but it is not a very fitting solution since our data model has a graph form and cannot easily be reshaped into a tree structure.
  • Use a NoSQL data store, i.e. drop container managed transactions and handle everything in an eventual consistency framework that will offer massive scalability.
    • Towards that end we implemented a mongoDB-based prototype (gss2 branch in the source tree), briefly described in this post. The result is promising, but this version never reached production for various reasons. We plan to work further on this implementation and fully test it in a production environment. We are currently re-evaluating various design decisions, the use of Morphia being the most prominent one. The main problem with Morphia is the inefficient handling of entities on the server side: we retrieve JSON objects from the object store, have Morphia transform them into Java POJOs, and then change them back to JSON at the API layer just before sending them to the client.
    • Use Solr as a NoSQL back-end. Solr is a key component of GSS (it provides the full text search functionality) and it supports proven horizontal scalability. We recently started moving data into its index beyond the scope of searching files, effectively using it as a fast DB cache. The performance gains were phenomenal, and it seems that using Solr/Lucene as a back-end object store is a viable approach that can replace other NoSQL data stores. We are currently investigating it further.
    • Use a hybrid approach: move as much of our data model as possible into scalable NoSQL storage and keep in the RDBMS whatever data should live in a transactional, consistent environment.
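To give a taste of the caching item above, the entity-level part of second-level caching is little more than an annotation on the JPA entities. The entity below is a made-up example; the cache provider (Ehcache per worker today, a distributed Infinispan cache later) is selected by the region factory configured in persistence.xml, not in the code.

    import javax.persistence.Cacheable;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    // Hypothetical meta-data entity marked for the second-level cache, so
    // repeated reads are served from the cache instead of the database.
    @Entity
    @Cacheable
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    public class FileHeader {

        @Id
        private Long id;

        private String name;

        public Long getId() { return id; }

        public String getName() { return name; }
    }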
Upgrading to a massively scalable meta-data storage is a top priority, but we also keep at the back of our minds the eventual addition of a more tightly coupled, scalable file storage solution to GSS. There are NoSQL stores that can also handle the file bodies and would allow us to merge the two back-ends into one. This is also part of our future plans. If you have ideas that can help, and especially if your open source project seems ideal for providing a solution, please don't hesitate to drop us a line.



Wednesday, June 15, 2011

What is GSS?

Online file storage is a very hot buzzword; everybody does it or wants to do it. If you add the 'cloud' keyword to the description, the hotness goes way up. We describe the GSS project as the open source cloud file storage platform - it can't get much hotter than that - but it seems that this description is perceived in many different ways by different people.


So what exactly is GSS?

Let's start with what GSS is not. It is not a distributed file system and it is not a block level storage solution.

A single-sentence definition could be: GSS is a distributed application that implements a file system abstraction with full text search and rich semantics, and supports access via a REST API.

In more than one sentence, GSS:

  • is an application level layer that can scale to multiple servers (for handling fluctuating loads),
  • handles users with unique file namespaces that contain folders and files, a trash bin, tags, versions, permissions for sharing with users & groups of users, and a powerful full text search service,
  • stores meta-data in a back-end database (we run an RDBMS in production installations and we are implementing NoSQL architectures for truly massive scalability - the source repo already contains a prototype mongoDB branch),
  • stores file bodies in third-party file storage systems, which are treated by GSS as a black box; a sketch of this abstraction follows below. For example, (a) in the Pithos service installation, file storage is based on a large-scale, redundant, hardware-based SAN which is accessed by the GSS workers as a mounted file system, while (b) MyNetworkFolders uses Amazon S3 via the AWS API.
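To make the "black box" treatment of file bodies concrete, here is a minimal sketch of what such a storage abstraction could look like; the interface and class names are made up for illustration and are not the actual GSS code.

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    // Hypothetical abstraction over the file body store: the server only needs
    // to put and get opaque blobs, so the backing system can vary per installation.
    public interface FileBodyStore {

        void put(String id, InputStream content) throws IOException;

        InputStream get(String id) throws IOException;
    }

    // Backing the store with a mounted file system (e.g. a SAN volume, as in
    // the Pithos installation). An S3-backed variant would wrap the AWS client
    // behind the same interface, as in the MyNetworkFolders setup.
    class MountedFileSystemStore implements FileBodyStore {

        private final Path root;

        MountedFileSystemStore(Path root) {
            this.root = root;
        }

        @Override
        public void put(String id, InputStream content) throws IOException {
            Files.copy(content, root.resolve(id), StandardCopyOption.REPLACE_EXISTING);
        }

        @Override
        public InputStream get(String id) throws IOException {
            return Files.newInputStream(root.resolve(id));
        }
    }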

I see, but how is it useful to me?

GSS can be useful in various scenarios:
  1. Out of the box, you can easily set up a service that offers an online "file manager" interface to groups of users (in your enterprise / organization, at a customer's, or across enterprises). Users can access the service via the GSS web client, the Android / iPhone client apps, WebDAV, the Firefox plugin / XUL application, or your own custom clients (via the REST API).
  2. You can use the GSS server as-is, as the server-side platform supporting the file storage & handling requirements of your custom mobile, desktop or web applications (via the REST API).
  3. You can extend the GSS back-end, or just one of the clients, to build a new application that implements your own specific requirements.
We are keeping the GSS server generic in order to be able to use it as a core building block in at least the three cases discussed above, although we plan to add new functionality, mostly related to file-based collaboration. Along these lines, our high-level roadmap priorities are (priority decreases from top to bottom):
  • Improve performance, resource utilization efficiency and scalability. Plans to move to a NoSQL back-end and to make efficient use of distributed second-level caching (e.g. Infinispan) fall into this category.
  • Simplify installation and out-of-the-box support in different environments. Currently the code base requires a Shibboleth infrastructure for user authentication. It is a priority to add LDAP-based and simple DB-based login mechanisms and a user-admin back-office application.
  • Implement v2 of the API, aiming for simplicity, better usability and performance.
  • Add features and new functionality.