Ahmet Topcu CGL Report: November 2006

November 15 report

I finished the bug fixes for the semantic grid research software tool.
I started writing the technical report for the semantic research project.

Bug report

The bugs and their current status:
Administrative Tools/Admin menu change: fixed
Search Tool/Microsoft Academic Live Search order problem: fixed
Search Tool/Google Scholar Search order problem: fixed
Search Tool/Google Scholar Advanced Search order problem: fixed

The number of search results is capped at 500 for all Google Scholar searches because of the processing time limit. We may need a new approach for this issue.

I am currently running more tests on these fixes to make sure that they do not cause any other bugs.


Semantic Research Grid General Report

Prepared by Ahmet Topcu

09/24/2006

Using Google Scholar, Microsoft Academic Live Search to Process Metadata

General Architecture

Query Handler

Parser (Document Handler)

XML Handler

Item Handler

How Interface Works

Google Scholar/Microsoft Academic Live Data Handler

Google Scholar Advanced Search Handler

MyResearch Database Search

My Research Database Search

My Research Database Advanced Search

Event Model

Dataset Model for Event Generation

Authorization Module

Basic Functionality

Super Admin

Group Admin

Figure 1: Google Scholar/Microsoft Academic Live Search Handler Service General Structure

Figure 2: Search Handler Service Structure

Figure 3: MyResearch Database Search Architecture

Figure 4: Event Model

Figure 5: Dataset Model

Using Google Scholar, Microsoft Academic Live Search to Process Metadata

Google Scholar provides a service that makes it easier to reach paper and book content. The actual content is sometimes not reachable, but the metadata is still available to clients.

Google Scholar does not store the content itself. Clients can search for and find papers and books. The problem arises when clients want to store this information and compare it with other similar content. There might be multiple copies of the same information on the web, and there might be different versions of the same content. How can a user get the information he/she really wants and store it in his/her repository?

General Architecture

Figure 1: Google Scholar/Microsoft Academic Live Search Handler Service General Structure

There are three types of client for this system: the Google Scholar Search web client, the Google Scholar Advanced Search web client, and the Microsoft Academic Live Search web client. The Google Scholar and Microsoft Academic Live clients accept any text query to start a search operation. The Google Scholar Advanced Search client provides more options for building a query: parameters such as Author, Title, and Date can be supplied to refine the search. All services follow the same sequence when a client initiates them:

The client provides the search parameter(s).
The Search Handler Service is initiated by the web service client in the JSP pages.
The Search Handler Service receives the SOAP request message and builds the search query in the required format. The service then issues the HTTP request to the specific web content (Google Scholar or Microsoft Academic Live).
The search query is posted to the web page.
The HTTP results from the web pages are received using the HTTP GET method.
The received information is processed in the services to build metadata for the searched query. (This step is explained in the next section.)
The search results are sent back to the client using HTTP/SOAP.
The client interface presents the results like RSS item objects.
The client can use this metadata for her/his metadata collection.

There are obvious differences between Google Scholar and Microsoft Academic Live Search. The Google Scholar search result metadata consists of the author, title, link, and citation count of the document. Microsoft Academic Live provides rich metadata: title, journal name, authors, publisher, DOI, and publication year.

The Search Handler Service structure is shown in Figure 2 below. There are four major sections in this service handler.

Query Handler

The Query Handler receives the query messages and prepares the final query. The final query depends on the service type: Google Scholar and Microsoft Academic Live use different final query formats. In this part, the POST message parameters are prepared and the request is sent to the service web pages.
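The preparation of a service-specific query can be sketched as follows. This is a minimal illustration, not the project's code: the class name, the enum, and the parameter names (`q`, `num`, `query`, `count`) are assumptions chosen for the example, and no real endpoint is contacted.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a query handler; all names are illustrative.
final class QueryHandlerSketch {
    enum Service { GOOGLE_SCHOLAR, MICROSOFT_ACADEMIC_LIVE }

    // Chooses the parameter names according to the target service.
    // The names used here are assumptions, not the services' documented APIs.
    static Map<String, String> buildParameters(Service service, String text, int maxResults) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (service == Service.GOOGLE_SCHOLAR) {
            params.put("q", text);
            params.put("num", Integer.toString(maxResults));
        } else {
            params.put("query", text);
            params.put("count", Integer.toString(maxResults));
        }
        return params;
    }

    // Encodes the parameters as an application/x-www-form-urlencoded body.
    static String encodeForm(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> entry : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(entry.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(entry.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }
}
```

Keeping the parameter selection separate from the encoding means a third service type would only need a new branch in `buildParameters`.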

Parser (Document Handler)

The Parser service processes the HTTP search response from the service web pages. The metadata is embedded in this HTML object. The parser extracts the metadata for each item found. The metadata tags are constructed and sent to the XML Handler.
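The extraction step can be illustrated with a small sketch. The HTML layout matched here (a `result` paragraph containing a link and an `authors` span) is invented for the example and does not reflect the real markup of either service. A regular expression is used only to keep the sketch short; a real result page would likely need a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the parser; the HTML layout is an assumption.
final class ResultParserSketch {
    private static final Pattern ITEM = Pattern.compile(
        "<p class=\"result\">\\s*<a href=\"([^\"]*)\">([^<]*)</a>"
        + "\\s*<span class=\"authors\">([^<]*)</span>");

    // Returns one metadata map (link, title, authors) per result item found.
    static List<Map<String, String>> parse(String html) {
        List<Map<String, String>> items = new ArrayList<Map<String, String>>();
        Matcher matcher = ITEM.matcher(html);
        while (matcher.find()) {
            Map<String, String> metadata = new LinkedHashMap<String, String>();
            metadata.put("link", matcher.group(1).trim());
            metadata.put("title", matcher.group(2).trim());
            metadata.put("authors", matcher.group(3).trim());
            items.add(metadata);
        }
        return items;
    }
}
```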

XML Handler

The XML Handler processes the metadata tags and generates an XML object for each document. Castor, which provides Java-to-XML binding, is used to generate the XML document. The generated XML object is sent to the Item Handler.
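To show the kind of output this step produces, the sketch below writes one metadata map as an XML element. It deliberately does not use Castor: the JDK's `XMLStreamWriter` is swapped in so that the example runs without extra libraries. The element names and the class name are assumptions.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical stand-in for the XML handler; the project itself uses Castor.
final class XmlHandlerSketch {
    // Writes each metadata field as a child element of <document>.
    // Field names must be valid XML element names.
    static String toXml(Map<String, String> metadata) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("document");
            for (Map.Entry<String, String> field : metadata.entrySet()) {
                xml.writeStartElement(field.getKey());
                xml.writeCharacters(field.getValue()); // escapes &, < and >
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not generate XML", e);
        }
        return out.toString();
    }
}
```

With a binding framework such as Castor, the same result would come from marshalling a typed Java object instead of writing elements by hand.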

Item Handler

The Item Handler receives the generated metadata XML objects and combines them into a whole item object (a collection of metadata). It passes the metadata objects to the client using HTTP/SOAP.

Figure 2: Search Handler Service Structure

How Interface Works

Google Scholar/Microsoft Academic Live Data Handler

This part lets you search for any keyword. The keyword can be typed into the search text field, or it can be selected from the authors and titles stored in the My Research Database and added to the search query. Stored authors and titles are added to the query using the "add to query" checkbox and the operator selector. There are two web sites to which the query can be sent: Google Scholar or Microsoft Academic Live. Choose one from the web site drop-down menu.

The search results are populated in the same web page. If you want to store the results in the My Research Database, select the desired items using the check boxes, then click the insert button. If you want more information about the query results, click the "more info" icon; the detailed information appears on the right-hand side of the page. If you see the edit icon after the insertion operation, it means that the selected item already exists in the database. You can click the edit icon to modify/insert data into the database.

Google Scholar Advanced Search Handler

This part mirrors the Google Scholar advanced search fields. You can fill in any of the text fields on this page as search query parameters. If you want to store the results in the My Research Database, select the desired items using the check boxes, then click the insert button. If you want more information about the query results, click the "more info" icon; the detailed information appears on the right-hand side of the page. If you see the edit icon after the insertion operation, it means that the item already exists in the database. You can click the edit icon to modify the data and insert it into the database.

MyResearch Database Search

There are two parts to MyResearch Database Search: Basic and Advanced Search. The architecture is the same for both parts. The advanced search has more search parameters, which makes more refined searches possible.

Figure 3: MyResearch Database Search Architecture

My Research Database Search

This part searches the My Research Database using an author and title query. For the author, you can use the "AND" and "OR" operators to build a query. You may enclose an author's full name in double quotes (") and combine it with the operators to refine your search. If you enter both a title and an author, the two are combined with an "AND" operation.
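The query rules above can be sketched as a small matcher. Several details are assumptions made for the example: operators are evaluated left to right without precedence, terms are matched as case-insensitive substrings, and two terms with no operator between them are treated as "AND". The real search runs against the database, not against in-memory strings.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the author/title query rules; semantics are assumed.
final class AuthorQuerySketch {
    // A token is either a double-quoted phrase or a run of non-space characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Evaluates an author query such as: "Ahmet Topcu" OR Fox
    static boolean matchesAuthors(String query, String authors) {
        String haystack = authors.toLowerCase(Locale.ROOT);
        Matcher matcher = TOKEN.matcher(query);
        boolean result = false;
        boolean first = true;
        String pendingOperator = "AND";
        while (matcher.find()) {
            boolean quoted = matcher.group(1) != null;
            String token = quoted ? matcher.group(1) : matcher.group(2);
            if (!quoted && (token.equals("AND") || token.equals("OR"))) {
                pendingOperator = token;
                continue;
            }
            boolean hit = haystack.contains(token.toLowerCase(Locale.ROOT));
            if (first) {
                result = hit;
                first = false;
            } else if (pendingOperator.equals("OR")) {
                result = result || hit;
            } else {
                result = result && hit;
            }
            pendingOperator = "AND";
        }
        return result;
    }

    // Author and title conditions are combined with AND; empty fields are ignored.
    static boolean matches(String authorQuery, String titleQuery, String authors, String title) {
        boolean authorOk = authorQuery.trim().isEmpty() || matchesAuthors(authorQuery, authors);
        boolean titleOk = titleQuery.trim().isEmpty()
            || title.toLowerCase(Locale.ROOT).contains(titleQuery.trim().toLowerCase(Locale.ROOT));
        return authorOk && titleOk;
    }
}
```

Quoting a full name keeps it as one term, so `"Ahmet Topcu"` does not match a record whose authors merely include someone named Ahmet and someone else named Topcu.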

My Research Database Advanced Search

This part lets users refine the search further using publication name, date, and URL in addition to title and author. For the author, you can use the "AND" and "OR" operators to build a query. You may enclose an author's full name in double quotes (") and combine it with the operators to refine your search. If you enter search parameters in more than one category, they are combined with an "AND" operation.

Event Model

The event model is shown in Figure 4 below. Events are triggered by update/insert operations. There are multiple ways of performing such an operation:

Using Google Scholar/Microsoft Academic Live search and inserting document.
Using Bookmark Tools.
Updating the citation.
Using dataset event generation.

Figure 4: Event Model

The event handler receives the event trigger action and prepares the event request. It passes the request to the event generator. The event generator generates events and event data using a web form. There are two types of event: Major Event and Minor Event. In the control section, events are selected based on the event type. The Major Event or Minor Event action sends its results to the ResultSet. The ResultSet service receives the request and puts it into the distributed event database.
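The flow from trigger to stored event can be sketched as below. The mapping of insert operations to major events is an assumption made for the example; the report only states that updates are minor events. An in-memory list stands in for the distributed event database, and all class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the event flow; names and the INSERT-to-MAJOR mapping are assumed.
final class EventModelSketch {
    enum Operation { INSERT, UPDATE }
    enum EventType { MAJOR, MINOR }

    static final class Event {
        final EventType type;
        final String citationId;
        final String data;

        Event(EventType type, String citationId, String data) {
            this.type = type;
            this.citationId = citationId;
            this.data = data;
        }
    }

    // Stand-in for the distributed event database.
    private final List<Event> eventStore = new ArrayList<Event>();

    // Handler + generator + control section in one step:
    // classify the trigger, build the event, and store it.
    Event handle(Operation operation, String citationId, String data) {
        EventType type = (operation == Operation.INSERT) ? EventType.MAJOR : EventType.MINOR;
        Event event = new Event(type, citationId, data);
        eventStore.add(event);
        return event;
    }

    // Returns the stored events for one citation, in insertion order.
    List<Event> eventsFor(String citationId) {
        List<Event> matches = new ArrayList<Event>();
        for (Event event : eventStore) {
            if (event.citationId.equals(citationId)) {
                matches.add(event);
            }
        }
        return matches;
    }
}
```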

Dataset Model for Event Generation

The DataSet Mechanism Service is initiated by the web client using a web service. Figure 5 below shows the steps of event generation for the dataset module. These events are minor events (updates). The service is called after a search returns results and the desired citation has been selected. The service connects to the distributed storage to pull the related collection of events using the citation index. For each citation index there might be multiple events. The service gets the collection of events and passes it to the DataSet Handler. This handler is responsible for generating the new dataset. The handler can use only the selected collection of events, or a new event can be generated and added to the event pool. The new dataset is generated and stored in the database. Only a new event is added to storage, because the other events are already stored there. So, if there is no dataset transfer or new event generation, only the index of events is generated and stored in the event database.
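The idea that a dataset stores only an index of events, and that only a genuinely new event is written to storage, can be sketched like this. The in-memory maps stand in for the distributed storage and the citation index; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dataset generation; storage is simulated in memory.
final class DatasetSketch {
    // Stand-in for the distributed event storage: event id -> event payload.
    private final Map<Integer, String> eventStorage = new LinkedHashMap<Integer, String>();
    // Stand-in for the citation index: citation id -> ids of its events.
    private final Map<String, List<Integer>> citationIndex = new LinkedHashMap<String, List<Integer>>();
    private int nextEventId = 1;

    // Stores a new event and registers it under its citation.
    int addEvent(String citationId, String payload) {
        int id = nextEventId++;
        eventStorage.put(id, payload);
        List<Integer> ids = citationIndex.get(citationId);
        if (ids == null) {
            ids = new ArrayList<Integer>();
            citationIndex.put(citationId, ids);
        }
        ids.add(id);
        return id;
    }

    // Pulls the collection of event ids for one citation.
    List<Integer> eventsFor(String citationId) {
        List<Integer> ids = citationIndex.get(citationId);
        return ids == null ? new ArrayList<Integer>() : new ArrayList<Integer>(ids);
    }

    // Builds a dataset as a list of event ids. Existing events are referenced,
    // not copied; storage grows only when newEventPayload is not null.
    List<Integer> createDataset(String citationId, List<Integer> selectedEventIds, String newEventPayload) {
        List<Integer> dataset = new ArrayList<Integer>(selectedEventIds);
        if (newEventPayload != null) {
            dataset.add(addEvent(citationId, newEventPayload));
        }
        return dataset;
    }

    int storedEventCount() {
        return eventStorage.size();
    }
}
```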

Figure 5: Dataset Model

Authorization Module

The Semantic Research Grid Toolbox can handle different types of citation metadata and store them in distributed databases. However, the system needs a control mechanism that allows multiple users and protects their metadata resources from other users. An authorization module was therefore defined and implemented for data protection and for sharing documents in a protected environment. The system also provides control for both group and private data. For example, you can use your own group for important documents that you do not want to share with other groups or users.

Each metadata record has read, write, and delete permissions. The model is user/group/other: one owner, access for multiple groups, and access for multiple other users. The owner of the file controls the contents.
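A minimal sketch of this owner/group/other model is given below. It assumes that the owner always holds every permission and that access is granted if any of the user's groups, or a direct per-user grant, includes the requested permission. The class and method names are hypothetical.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of per-record permissions; the rules are assumptions.
final class MetadataRecordSketch {
    enum Permission { READ, WRITE, DELETE }

    private final String owner;
    private final Map<String, EnumSet<Permission>> groupAccess = new HashMap<>();
    private final Map<String, EnumSet<Permission>> userAccess = new HashMap<>();

    MetadataRecordSketch(String owner) {
        this.owner = owner;
    }

    // Grants a set of permissions to every member of a group.
    void grantToGroup(String group, EnumSet<Permission> permissions) {
        groupAccess.put(group, EnumSet.copyOf(permissions));
    }

    // Grants a set of permissions to one other user directly.
    void grantToUser(String user, EnumSet<Permission> permissions) {
        userAccess.put(user, EnumSet.copyOf(permissions));
    }

    // Owner first, then direct user grants, then grants through groups.
    boolean isAllowed(String user, Set<String> userGroups, Permission permission) {
        if (user.equals(owner)) {
            return true;
        }
        EnumSet<Permission> direct = userAccess.get(user);
        if (direct != null && direct.contains(permission)) {
            return true;
        }
        for (String group : userGroups) {
            EnumSet<Permission> granted = groupAccess.get(group);
            if (granted != null && granted.contains(permission)) {
                return true;
            }
        }
        return false;
    }
}
```

The super admin's ability to delete any content is not modeled here; it would be an extra check before the owner test.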

Basic Functionality

-giving permission to groups or other users

-modifying owner/groups/others access rights

-model for read, write, delete privileges

-owner and super admin can delete the contents

-there are two data types: private and public.

Private data can be controlled by the owner of the content. Public data should be assigned to at least one group. Content assigned to a group can be accessed by users who have access to that group. There are three types of user: user, group admin, and super admin. Each group has at least one admin. Users can make a request to join a group.

The group administrator confirms or denies the request; moreover, group administrators can assign users to the groups they control. Group administrators are assigned by the system admin, who controls the whole content repository.

1-Users can request to join a specific group.

2-Users can be granted access to other users' content.

3-Users can be removed from a group.

4-One user can be part of one or more groups.

5-Users can have private (own) data.

6-Users can give permission to groups or other users for their own content.

7-If a user creates any content in the repository, he/she becomes the owner of that content.

8-Users can modify the access rights of specific groups and users for selected collections.
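The join-request workflow between users and group administrators can be sketched as follows. This is an illustration under assumptions: the class and method names are hypothetical, a group is created with exactly one initial admin, and any action attempted by a non-admin is rejected with an exception.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of group membership handling; names are illustrative.
final class GroupSketch {
    private final Set<String> admins = new LinkedHashSet<String>();
    private final Set<String> members = new LinkedHashSet<String>();
    private final Set<String> pendingRequests = new LinkedHashSet<String>();

    // Each group has at least one admin, so one is required at creation.
    GroupSketch(String firstAdmin) {
        admins.add(firstAdmin);
    }

    // A user asks to join; the request waits for an admin decision.
    void requestToJoin(String user) {
        if (!members.contains(user)) {
            pendingRequests.add(user);
        }
    }

    // An admin confirms or denies a pending request.
    // Returns false when there was no pending request for the user.
    boolean decide(String admin, String user, boolean approve) {
        requireAdmin(admin);
        if (!pendingRequests.remove(user)) {
            return false;
        }
        if (approve) {
            members.add(user);
        }
        return true;
    }

    // An admin assigns a user to the group directly, without a request.
    void assign(String admin, String user) {
        requireAdmin(admin);
        pendingRequests.remove(user);
        members.add(user);
    }

    // An admin removes a user from the group.
    void remove(String admin, String user) {
        requireAdmin(admin);
        members.remove(user);
    }

    boolean isMember(String user) {
        return members.contains(user);
    }

    private void requireAdmin(String admin) {
        if (!admins.contains(admin)) {
            throw new IllegalArgumentException(admin + " is not an admin of this group");
        }
    }
}
```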

Super Admin

The super admin can use these features. The super admin can make any user an admin of any group defined in the system, and can remove any admin user from a specified group.

1-Super Administrator can create a new group.

2-Super Administrator can delete any content.

3-Super Administrator can assign a group administrator from among the users.

Group Admin

The group admin approves/denies any user's request to join a specific group. The current group admin list is also displayed in this section. The group admin can use these features in the Manage Admin section.

1-Group Administrator can assign users to group

2-Group Administrator can remove users from the group

3-Group Administrator can modify the group access rights.

4-There might be more than one administrator for each group.

5-Private content is not controlled by the group administrator. Only users can control their own content.

Ahmet Topcu CGL Report

Wednesday, November 15, 2006