NRI scientist develops a search engine that ranks tables
by title, document content and text references

 

NRI scientist Prasenjit Mitra, Assistant Professor in the School of Information Sciences and Technology at The Pennsylvania State University, has developed a search engine that identifies and extracts tables from PDF documents, indexes them, and ranks the search results using factors that include the table's title, text references to the table and the date of its publication.

ChemXSeer: In this project, he helped construct an integrated database and digital library for chemical kinetics data. He said:

  • We have developed a chemical name and formula search engine. We are investigating novel information extraction, document segmentation, and indexing schemes.
  • We have also developed a table search engine, TableSeer, which uses a novel ranking function, TableRank, to rank tables extracted automatically from digital documents.
  • Other topics of interest are web crawling (especially focused crawling), query expansion, and analysis of blogs and social networks.
  • In tests with documents from the Royal Society of Chemistry, TableSeer correctly identified and retrieved 93.5 percent of tables created in text-based formats.
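
To make the idea of ranking tables by several factors concrete, here is a minimal sketch in Python. It is not TableRank itself, whose details are not given here. The `TableRecord` fields, the weights, and the recency formula are all assumptions chosen for illustration; the sketch only shows how title matches, matches in text that refers to the table, and publication date could be folded into one score.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class TableRecord:
    """Hypothetical record for one extracted table (illustrative fields only)."""
    title: str
    reference_text: str  # sentences in the document that refer to the table
    published: date


def term_hits(query: str, text: str) -> int:
    """Count how many query terms occur as whole words in the text."""
    words = set(text.lower().split())
    return sum(1 for term in query.lower().split() if term in words)


def score(query: str, table: TableRecord, today: date) -> float:
    """Toy score with assumed weights: title hits count most, reference
    hits count less, and newer documents get a small boost."""
    age_years = max((today - table.published).days / 365.25, 0.0)
    recency = 1.0 / (1.0 + age_years)
    return (2.0 * term_hits(query, table.title)
            + 1.0 * term_hits(query, table.reference_text)
            + 0.5 * recency)


def rank(query: str, tables: List[TableRecord], today: date) -> List[TableRecord]:
    """Return the tables sorted from highest to lowest score."""
    return sorted(tables, key=lambda t: score(query, t, today), reverse=True)


tables = [
    TableRecord("Sample preparation", "see Table 1 for rate details", date(2006, 1, 1)),
    TableRecord("Rate constants for ozone reactions",
                "Table 2 lists the rate constants", date(2005, 1, 1)),
]
ranked = rank("rate constants", tables, today=date(2007, 6, 1))
print([t.title for t in ranked])
```

A real system would need stemming, normalised scores and tuned weights; the point of the sketch is only that each factor contributes separately to the final ordering.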

In a search of 10,000 documents from conferences, Prof. Mitra and his team found that more than 70 percent of papers in chemistry, biology and computer science included tables. Furthermore, most of those documents had multiple tables.

TableSeer automates table extraction and captures data not only within the table but also in the table's title and footnotes. In addition, it enables column-name-based search, so that a user can search for a particular column in a table.
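
As a rough illustration of column-name-based search, the following Python sketch stores a title, column names and footnotes for each table and filters on the column names. The `ExtractedTable` class and `find_by_column` function are hypothetical names invented for this example; they do not describe TableSeer's actual data model or interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractedTable:
    """Hypothetical metadata kept for one table (illustrative only)."""
    title: str
    columns: List[str]
    footnotes: List[str] = field(default_factory=list)


def find_by_column(tables: List[ExtractedTable], column_query: str) -> List[ExtractedTable]:
    """Return tables with at least one column name containing the query,
    ignoring case and surrounding whitespace."""
    needle = column_query.strip().lower()
    return [t for t in tables if any(needle in c.lower() for c in t.columns)]


library = [
    ExtractedTable("Reaction rates", ["Reaction", "Temperature (K)", "Rate constant"]),
    ExtractedTable("Instrument settings", ["Parameter", "Value"], ["Values are nominal."]),
]
matches = find_by_column(library, "temperature")
print([t.title for t in matches])
```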

TableSeer can be tested online at http://chemxseer.ist.psu.edu, and the source code will be made available near the completion of the project.

The study, "TableSeer: Automatic Table Metadata Extraction and Searching in Digital Libraries," was presented at the recent 16th International World Wide Web Conference in Alberta, Canada.

Prasenjit Mitra

Assistant Professor, School of Information Sciences and Technology, The Pennsylvania State University

  • Prasenjit Mitra received his Doctor of Philosophy degree in Electrical Engineering at Stanford University in 2004.
  • Prior to that, he received a Master of Science degree in Computer Science from The University of Texas at Austin in December 1994.
  • His Bachelor of Technology (with Honours) degree in Computer Science and Engineering was from the Indian Institute of Technology, Kharagpur, in May 1993.
  • From 1995, he worked for five years at Oracle Corporation in Redwood Shores, CA as a senior member of the technical staff at the Server Technologies division developing database software.
  • He also worked part-time as a senior engineer at Narus and DBWizards.

His research interests include social text stream analysis and XML views, as well as:

Database Systems, Digital Libraries, Data Security, Semantic Web, Information Retrieval, Artificial Intelligence.