So what exactly is GSS?
Let's start with what GSS is not. It is neither a distributed file system nor a block-level storage solution.
A single-sentence definition could be: GSS is a distributed application that implements a file system abstraction with full-text search and rich semantics, and supports access via a REST API.
In more than one sentence, GSS:
- is an application-level layer that can scale out to multiple servers to handle fluctuating loads,
- manages users, each with their own file namespace containing folders and files, a trash bin, tags, versions and permissions for sharing with users and groups of users, and offers a powerful full-text search service,
- stores metadata in a back-end database (we run an RDBMS in production installations and are implementing NoSQL architectures for truly massive scalability; the source repo already contains a prototype MongoDB branch),
- stores file bodies in third-party file storage systems, which GSS treats as a black box. For example, (a) in the Pithos service installation, file storage is based on a large-scale, redundant, hardware-based SAN that the GSS workers access as a mounted file system, while (b) MyNetworkFolders uses Amazon S3 via the AWS API.
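The split described in the last two points (metadata in a database, file bodies in an opaque store) can be sketched in Java. This is a minimal illustration, not GSS code: the names `FileBodyStore`, `InMemoryBodyStore`, `FileMetadata` and `FileService` are hypothetical, and in-memory maps stand in for both the metadata database and the storage back-end. A real deployment would put a mounted SAN path or an S3 client behind the same interface.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical black-box storage contract: callers only see opaque keys and bytes,
// so the backing system (mounted file system, object store, ...) can be swapped.
interface FileBodyStore {
    String put(byte[] body) throws IOException;      // returns an opaque key
    InputStream get(String key) throws IOException;
    void delete(String key) throws IOException;
}

// Illustrative stand-in for a real storage back-end.
class InMemoryBodyStore implements FileBodyStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    @Override
    public String put(byte[] body) {
        String key = UUID.randomUUID().toString();
        blobs.put(key, body.clone());
        return key;
    }

    @Override
    public InputStream get(String key) throws IOException {
        byte[] body = blobs.get(key);
        if (body == null) {
            throw new IOException("No body stored under key " + key);
        }
        return new ByteArrayInputStream(body);
    }

    @Override
    public void delete(String key) {
        blobs.remove(key);
    }
}

// Metadata record: it knows who owns the file and where the body lives, not the bytes.
class FileMetadata {
    final String owner;
    final String path;
    final long size;
    final String bodyKey;

    FileMetadata(String owner, String path, long size, String bodyKey) {
        this.owner = owner;
        this.path = path;
        this.size = size;
        this.bodyKey = bodyKey;
    }
}

// Application layer: per-user namespace kept in a map (stand-in for the database),
// bodies delegated to whatever FileBodyStore it was given.
class FileService {
    private final Map<String, FileMetadata> metadata = new HashMap<>();
    private final FileBodyStore store;

    FileService(FileBodyStore store) {
        this.store = store;
    }

    FileMetadata upload(String owner, String path, byte[] body) throws IOException {
        String key = store.put(body);
        FileMetadata meta = new FileMetadata(owner, path, body.length, key);
        FileMetadata previous = metadata.put(owner + ":" + path, meta);
        if (previous != null) {
            store.delete(previous.bodyKey); // this sketch keeps no versions
        }
        return meta;
    }

    byte[] download(String owner, String path) throws IOException {
        FileMetadata meta = metadata.get(owner + ":" + path);
        if (meta == null) {
            throw new IOException("No such file: " + owner + ":" + path);
        }
        try (InputStream in = store.get(meta.bodyKey)) {
            return in.readAllBytes();
        }
    }
}
```

Because `FileService` only handles opaque keys, changing the storage back-end in this sketch means supplying another `FileBodyStore` implementation; the metadata side stays the same.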
I see, but how is it useful to me?
GSS can be useful in various scenarios:
- Out of the box, you can easily set up a service that offers an online "file manager" interface to groups of users (within your enterprise / organization, for your customers, or across enterprises). Users can access the service via the GSS web client, the Android / iPhone client apps, WebDAV, the Firefox plugin / XUL application, or your own custom clients (via the REST API).
- You can use the GSS server as is to provide the server-side platform for the file storage and handling requirements of your custom mobile, desktop or web applications (via the REST API).
- You can extend the GSS back-end, or just one of the clients, to build a new application that implements your own specific requirements.
We are keeping the GSS server generic so that it can serve as a core building block in at least the three cases discussed above, although we plan to add new functionality, mostly related to file-based collaboration. Along these lines, our high-level roadmap priorities are (priority decreases from top to bottom):
- Improve performance, resource utilization efficiency and scalability. Plans to move to a NoSQL back-end and to make efficient use of distributed second-level caching (e.g. Infinispan) fall into this category.
- Simplify installation and improve out-of-the-box support for different environments. The code base currently requires a Shibboleth infrastructure for user authentication, so adding LDAP-based and simple DB-based login mechanisms, along with a user administration back-office application, is a priority.
- Implement v2 of the API, aiming for simplicity, better usability and better performance.
- Add features and new functionality.