
Introduction

Many collaborative tagging systems, such as del.icio.us, Flickr, CiteULike and Technorati, continuously accumulate human-generated annotations in the form of tags. Unfortunately, those annotations remain locked inside individual systems and are rarely used elsewhere. Many systems (e.g., CiteULike) do not even provide a public API that would let other applications use the collected tagging metadata. Others (e.g., Technorati) provide limited APIs that make it hard or even impossible to connect tags with the other elements of the tagging context (i.e., users, resources and sources of origin). The systems that do provide a usable API (like del.icio.us and Flickr) make no effort to integrate with, or even make use of, other available APIs. Since integrating tags from various systems opens up possibilities for richer use of tagging metadata, our goal is to develop an approach for integrating tags from different tagging systems. To encourage wide adoption of this approach by today's tagging systems, we have also formulated strategies for attracting these systems and motivating them to actually start integrating their data.

The second problem we aim to address is that currently popular tagging systems support only human contributions. They make little or no effort to integrate automatic annotators (e.g., Yahoo Term Extraction Service), nor do they leave room for the involvement of intelligent agents capable of annotation.

TagFusion system

TagFusion is a system which captures, integrates and provides access to the tagging context data from various Web sites and applications. It is based on a common repository of tagging context data and an extensible set of services leveraging the tagging data to provide different kinds of functionalities. These services are organized in a layered architecture.

Before going into the specifics of the system, we need to define its target users. Since we want to integrate the annotation efforts of both humans and autonomous annotators (e.g., intelligent agents), our target users are all applications that enable annotation of digital resources, regardless of the nature of the entities performing the annotation. We refer to those applications as user systems. This implies that humans are not meant to use our system directly, but through Social Web (i.e., Web 2.0) sites, Web applications, etc. (all of which are user systems in our terminology).

Multi-level architecture

Since our target users are defined at a higher level of abstraction that in fact covers many different categories of users, the system had to be flexible enough to satisfy such a heterogeneous set of needs. We approached this problem by introducing a multi-level architecture of services, in which higher-level services rely on more basic ones.

The basic-level service, called the TagFusion Core Service, is the core component of this multi-level architecture. It supports common data manipulation operations over the repository of tagging context data. In particular, it allows for storage and retrieval of tags, together with information about the resources that were annotated with those tags, the users who applied them, and the source system(s) from which the tags originate.
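To make the shape of the tagging context concrete, here is a minimal in-memory sketch of what the Core Service's storage and retrieval operations might look like. The class names (`Annotation`, `CoreService`), the field names and the query-by-criteria method are assumptions for illustration only, not TagFusion's actual interface; a real deployment would sit on a persistent repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """One element of the tagging context: who tagged what, with which tag, and where."""
    tag: str
    resource: str  # e.g. the URL of the annotated resource
    user: str      # identifier of the annotating user (human or automatic)
    source: str    # system the annotation originates from, e.g. "del.icio.us"


class CoreService:
    """Hypothetical in-memory stand-in for the TagFusion Core Service repository."""

    def __init__(self) -> None:
        self._annotations: List[Annotation] = []

    def store(self, annotation: Annotation) -> None:
        # Skip exact duplicates so that re-importing the same data is harmless.
        if annotation not in self._annotations:
            self._annotations.append(annotation)

    def retrieve(self, tag: Optional[str] = None, resource: Optional[str] = None,
                 user: Optional[str] = None, source: Optional[str] = None) -> List[Annotation]:
        # A criterion left as None matches every annotation.
        criteria = {"tag": tag, "resource": resource, "user": user, "source": source}
        return [a for a in self._annotations
                if all(v is None or getattr(a, k) == v for k, v in criteria.items())]


core = CoreService()
core.store(Annotation("semweb", "http://example.org/paper", "alice", "del.icio.us"))
core.store(Annotation("semweb", "http://example.org/paper", "bob", "CiteULike"))
print([a.source for a in core.retrieve(tag="semweb")])  # ['del.icio.us', 'CiteULike']
```

Keeping the source system on every annotation is what lets the same tag, applied to the same resource on two different sites, be stored as two distinct pieces of context rather than collapsed into one.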

Higher levels of the architecture consist of services that provide advanced functionality to end users or to other services. This advanced functionality is meant to motivate user systems to contribute annotations until the repository reaches the critical mass that would turn TagFusion into a highly useful source of tagging metadata for all user systems.

It is worth pointing out that higher-level services can themselves be considered user systems from the perspective of the basic TagFusion Core Service. Thus, a user system can in turn be used by another user system.
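The layering can be illustrated with a small sketch, assuming a hypothetical related-tags service built on top of a minimal core. `RelatedTagsService` talks to the core only through its retrieval method, so from the core's point of view it is just another user system, while a blog or other Web application could in turn call `related()`. All names here are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# (tag, resource, user, source)
Annotation = Tuple[str, str, str, str]


class CoreService:
    """Minimal hypothetical basic-level service: stores and returns annotations."""

    def __init__(self, annotations: List[Annotation]) -> None:
        self._annotations = list(annotations)

    def retrieve_all(self) -> List[Annotation]:
        return list(self._annotations)


class RelatedTagsService:
    """Hypothetical higher-level service; to the core it is just another user system."""

    def __init__(self, core: CoreService) -> None:
        self._core = core

    def related(self, tag: str) -> List[Tuple[str, int]]:
        # Group tags by the resource they were applied to.
        tags_by_resource: Dict[str, Set[str]] = {}
        for t, resource, _user, _source in self._core.retrieve_all():
            tags_by_resource.setdefault(resource, set()).add(t)
        # Count how often each other tag co-occurs with `tag` on the same resource.
        counts: Counter = Counter()
        for tags in tags_by_resource.values():
            if tag in tags:
                counts.update(tags - {tag})
        return counts.most_common()


core = CoreService([
    ("semweb", "r1", "alice", "del.icio.us"),
    ("rdf", "r1", "alice", "del.icio.us"),
    ("semweb", "r2", "bob", "Flickr"),
    ("rdf", "r2", "bob", "Flickr"),
    ("owl", "r2", "bob", "Flickr"),
])
# A blog (another user system) could now call the higher-level service.
print(RelatedTagsService(core).related("semweb"))  # [('rdf', 2), ('owl', 1)]
```

Because co-occurrence is computed across resources regardless of their source system, a service like this benefits directly from tags being integrated from several sites.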

Modalities of use and attracting users

Websites send metadata to the TagFusion system
The first scenario assumes that (Social) Web sites send tagging metadata related to their content to the TagFusion system, motivated by the desire to promote that content to potentially interested parties. Social Web sites that collect tagging metadata about their content have a clear interest in exporting it to TagFusion, thereby advertising their content and making it more visible to potential users (i.e., other users of TagFusion). For example, a blogging site can share the tagging metadata of its posts and articles and in turn benefit from better visibility of its content.
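As a rough illustration of this export scenario, the sketch below serializes a site's tags into JSON and prepares an HTTP POST request with Python's standard `urllib.request`. The payload layout and the endpoint URL are hypothetical; TagFusion's actual submission format is not specified here. The request is only built, not sent.

```python
import json
import urllib.request
from typing import Dict, List


def build_export_payload(site: str, posts: List[Dict]) -> str:
    """Serialize a site's tagging metadata into a hypothetical TagFusion export message."""
    annotations = []
    for post in posts:
        for user, tags in post["tags_by_user"].items():
            for tag in tags:
                annotations.append({
                    "tag": tag.strip().lower(),  # light normalization before export
                    "resource": post["url"],
                    "user": user,
                    "source": site,              # records the system of origin
                })
    return json.dumps({"source": site, "annotations": annotations})


def build_export_request(endpoint: str, payload: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying the payload to a hypothetical endpoint."""
    return urllib.request.Request(
        endpoint,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


posts = [{"url": "http://blog.example/post-1",
          "tags_by_user": {"alice": ["SemWeb ", "rdf"], "bob": ["tagging"]}}]
request = build_export_request("http://tagfusion.example/api/annotations",
                               build_export_payload("blog.example", posts))
# urllib.request.urlopen(request) would perform the actual upload.
```

Including the site name with every annotation is what would allow other TagFusion users to find their way back to the exporting blog, which is the visibility benefit described above.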

Harvesting data from Social Web sites
Some Social Web sites provide interfaces (APIs) that let other applications retrieve and reuse their accumulated metadata. Within our multi-level architecture, we rely on first-level services to harvest annotations from the available sources and import them into TagFusion. In this scenario, the motivation for using TagFusion comes from users' desire to get an integrated view of their annotations from various Social Web sites, e.g., in the form of a tag cloud. While providing this functionality, a first-level service takes the opportunity to harvest metadata from those Social Web sites and make it reusable through TagFusion.

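The harvesting scenario and its tag cloud incentive might look roughly like the following sketch. The per-site harvesters are stubs standing in for real API clients (no actual del.icio.us or Flickr calls are made), and the logarithmic font-size scaling is one common tag cloud heuristic chosen for illustration, not necessarily the one a TagFusion service would use.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# A harvester returns (tag, resource) pairs for a given user on one site.
Harvester = Callable[[str], Iterable[Tuple[str, str]]]
Record = Tuple[str, str, str, str]  # (tag, resource, user, source)


def harvest(user: str, harvesters: Dict[str, Harvester]) -> List[Record]:
    """Collect a user's annotations from every site, keeping track of their origin."""
    records: List[Record] = []
    for source, fetch in harvesters.items():
        for tag, resource in fetch(user):
            records.append((tag.lower(), resource, user, source))
    return records


def tag_cloud(records: List[Record], min_size: int = 10, max_size: int = 24) -> Dict[str, int]:
    """Map each tag to a font size by scaling its log-frequency into [min_size, max_size]."""
    counts = Counter(tag for tag, _resource, _user, _source in records)
    if not counts:
        return {}
    lo, hi = math.log(min(counts.values())), math.log(max(counts.values()))
    span = (hi - lo) or 1.0  # all tags equally frequent: avoid dividing by zero
    return {tag: round(min_size + (math.log(n) - lo) / span * (max_size - min_size))
            for tag, n in counts.items()}


# Stub harvesters standing in for real API clients.
def stub_delicious(user: str) -> List[Tuple[str, str]]:
    return [("SemWeb", "http://a.example"), ("rdf", "http://a.example")]


def stub_flickr(user: str) -> List[Tuple[str, str]]:
    return [("semweb", "http://b.example/photo")]


records = harvest("alice", {"del.icio.us": stub_delicious, "Flickr": stub_flickr})
print(tag_cloud(records))  # {'semweb': 24, 'rdf': 10}
```

The user sees only the integrated cloud, while the harvested `records`, each still labeled with its source, are what the first-level service would pass on to the Core Service.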
Leveraging automatic annotators
We have already mentioned the possibility of integrating metadata generated by automatic annotators and keyword extractors into TagFusion. This is made possible by distinguishing between human and automatic users and by keeping track of the author of every annotation in the database. With this in place, it is easy to call available third-party services to annotate resources and merge their annotations with the others in the TagFusion database, creating a richer repository of diverse metadata. We also hope that this repository could serve as a basis for future automatic annotators, which could use it to generate new metadata and contribute it back to the system.
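One way to record whether each annotation comes from a human or an automatic annotator, so that both kinds can be stored side by side and filtered later, is sketched below. The `naive_keywords` function is a toy frequency-based stand-in for a third-party extraction service such as Yahoo Term Extraction; it does not call that service, and all names are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotator:
    name: str
    automatic: bool  # True for services and agents, False for human users


@dataclass(frozen=True)
class Annotation:
    tag: str
    resource: str
    annotator: Annotator  # every annotation keeps track of its author


STOPWORDS = {"the", "and", "for", "with", "from", "that", "this"}


def naive_keywords(text: str, k: int = 3) -> List[str]:
    """Toy stand-in for a term extraction service: the k most frequent non-stopwords."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if len(w) > 2 and w not in STOPWORDS]
    return [w for w, _count in Counter(words).most_common(k)]


def annotate_automatically(resource: str, text: str, extractor: Annotator) -> List[Annotation]:
    """Turn extracted keywords into annotations attributed to the automatic annotator."""
    return [Annotation(tag, resource, extractor) for tag in naive_keywords(text)]


def by_kind(annotations: List[Annotation], automatic: bool) -> List[Annotation]:
    """Filter annotations by whether a human or an automatic annotator produced them."""
    return [a for a in annotations if a.annotator.automatic == automatic]


alice = Annotator("alice", automatic=False)
extractor = Annotator("naive-extractor", automatic=True)
doc = "http://example.org/doc"
text = "Tagging systems collect tags. Integrating tags across tagging systems helps reuse."
annotations = [Annotation("folksonomy", doc, alice)]
annotations += annotate_automatically(doc, text, extractor)
print(sorted(a.tag for a in by_kind(annotations, automatic=True)))  # ['systems', 'tagging', 'tags']
```

Swapping `naive_keywords` for a call to a real extraction service would leave the rest unchanged, since only the `Annotator` record marks the difference between human and automatic contributions.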

Future work

Future work will focus on building more first-level services that explore further ways of using the accumulated metadata, possibly involving trust mechanisms based on users' connections. The results of the FOAFRealm project could prove helpful for representing and exploiting those connections. We also plan to create an interface that would enable SPARQL queries over the collected annotations.

Feedback

Please feel free to leave your comments and suggestions on my blog post, or send them to me by email. Thank you.