YongHong Tian
Joint Research & Development Lab for Advanced Computer and Communication Technologies
Institute of Computing Technology
Chinese Academy of Sciences
Email: yhtian@jdl.ac.cn


 

 


Research

 

 

 

 

Research Interests

 

Context-Based Statistical Relational Learning

Most work in statistical machine learning has focused on “flat” data, that is, data consisting of identically structured entities that are typically assumed to be independent and identically distributed (IID). However, many real-world datasets are innately relational: hypertext, web pages or sites, web images, scientific papers, e-books, educational resources and more. Such semi-structured relational data consist of entities of different types, where each entity is characterized by a different set of attributes and generally has complex internal structure. Entities are related to each other via different types of relations. This relational structure is an important source of semantic information that traditional statistical learning methods often ignore. We therefore focus on how to explicitly exploit such relational information in statistical learning tasks so as to build more effective and more robust models.

The main methodology used in my research stems from context-based modeling and analysis. Here, the context of an object is defined as the collection of relevant objects and surrounding influences that make the semantics of that object unique and comprehensible. Accordingly, a contextual dependency can be regarded as a special relationship among related objects that conveys an explicit semantic correlation. Our studies include the following aspects:

Firstly, we extend the dependency network (DN) model to the relational domain and propose a contextual dependency network (CDN) model. Links among objects contain rich semantics that can be very helpful in classifying the objects. However, real-world link data such as Web pages contain many noisy and irrelevant links that provide no useful predictive information for categorization. It is thus important to automatically identify which links are most relevant for categorization. To this end, we present a CDN model for categorization in the presence of noisy and irrelevant links. The CDN model uses a dependency function that characterizes the contextual dependencies among objects and differentiates the impact of each related object on the classification. Using this model, the context of a given object can be identified as its most relevant neighbors in the link graph, from which the semantic meaning of that object can be determined. We show how to learn the CDN model effectively, and how to use the Gibbs inference framework over the learned model for collective classification of multiple linked objects.
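To make the idea of weighted, collective classification concrete, here is a minimal Python sketch of Gibbs sampling over a small link graph. It is not the CDN model itself: the logistic form of the conditional, the function name gibbs_collective_classify, and the hand-set link weights are all assumptions made for illustration. In the CDN model the weights would come from a learned dependency function.

```python
import math
import random


def gibbs_collective_classify(local_prob, weighted_links,
                              n_sweeps=2000, burn_in=200, seed=0):
    """Estimate P(label = 1) for each object by Gibbs sampling.

    local_prob[i]     : P(label_i = 1) from the object's own attributes;
                        must lie strictly between 0 and 1.
    weighted_links[i] : list of (j, w_ij) pairs. w_ij >= 0 is a hypothetical
                        dependency weight; 0 means the link is ignored.
    """
    rng = random.Random(seed)
    n = len(local_prob)
    labels = [1 if p >= 0.5 else 0 for p in local_prob]
    counts = [0] * n
    for sweep in range(n_sweeps):
        for i in range(n):
            # Log-odds contributed by the object's own attributes.
            logit = math.log(local_prob[i] / (1.0 - local_prob[i]))
            # Each neighbor votes for its current label, scaled by its weight.
            for j, w in weighted_links[i]:
                logit += w if labels[j] == 1 else -w
            p_one = 1.0 / (1.0 + math.exp(-logit))
            labels[i] = 1 if rng.random() < p_one else 0
        if sweep >= burn_in:
            for i in range(n):
                counts[i] += labels[i]
    kept = n_sweeps - burn_in
    return [c / kept for c in counts]


# Objects 0 and 1 look like class 1, object 3 looks like class 0,
# and object 2 is ambiguous on its own attributes.
local_prob = [0.95, 0.95, 0.5, 0.05]
weighted_links = [
    [(2, 2.0)],
    [(2, 2.0)],
    [(0, 2.0), (1, 2.0), (3, 0.0)],  # the link to object 3 is treated as noise
    [(2, 0.0)],
]
marginals = gibbs_collective_classify(local_prob, weighted_links)
print(marginals)
```

In this toy graph the ambiguous object is pulled toward the class of its two strongly weighted neighbors, while the zero-weight link contributes nothing, which is the behavior the dependency function is meant to provide.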

Secondly, we propose linkage semantic kernels to capture the latent semantic relations among linked objects that are induced by the local and global structure of the link graph. Specifically, we assume that higher-order correlations between indirectly connected objects affect their semantic relations through a diffusion process on the link graph, and on this basis we propose a semantic diffusion kernel. Moreover, the eigen-decomposition is performed directly in the kernel-induced space so as to obtain the kernels corresponding to the latent semantic space. Based on the linkage semantic kernels, we also present a kernelized contextual dependency network (KCDN) model that exploits the dependencies in a network of objects for collective classification, and describe a relevant-page finding algorithm, KernelRank. For computational efficiency on large datasets, we also develop a block-based algorithm, called BlockKernel, for LLSK kernels by exploiting the block structure of link data.
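As an illustration of how diffusion on a link graph relates indirectly connected objects, the sketch below computes the standard graph diffusion kernel K = exp(beta * H), with H = A - D (adjacency matrix minus degree matrix), using a truncated Taylor series. This is an assumption-laden stand-in: the semantic diffusion kernel proposed in this work may be defined differently, and the function names and the value of beta are chosen here only for the example.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(adj, beta=0.5, terms=30):
    """Approximate exp(beta * (A - D)) with the first `terms` Taylor terms.

    adj is a symmetric 0/1 adjacency matrix of an undirected link graph.
    The truncated series is only suitable for small graphs and moderate beta.
    """
    n = len(adj)
    # H = A - D, the negative graph Laplacian.
    h = [[adj[i][j] - (sum(adj[i]) if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]
    for k in range(1, terms):
        # term now holds (beta * H)^k / k!
        term = [[beta * x / k for x in row] for row in mat_mul(term, h)]
        for i in range(n):
            for j in range(n):
                kernel[i][j] += term[i][j]
    return kernel


# Path graph 0 - 1 - 2: objects 0 and 2 are only indirectly connected.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
K = diffusion_kernel(adjacency)
for row in K:
    print([round(x, 4) for x in row])
```

Although objects 0 and 2 share no direct link, the kernel assigns them a positive similarity that is smaller than the similarity between directly linked objects, which is the higher-order effect described above.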

Thirdly, we propose an influence model of online social networks together with an incremental learning algorithm. In this model, the sequential states of each actor and the corresponding observable behaviors are modeled as a Hidden Markov Model (HMM), and the dynamic mutual influence among actors is characterized by the Influence Model. To learn the model incrementally from time-series interaction data, a gradient-based algorithm is also derived. The influence model of online social networks can be applied in a wide variety of domains, such as collaborative information filtering and recommendation, collective decision-making, and viral marketing planning.
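The coupling between actors can be sketched with the usual influence-model assumption that an actor's next-state distribution is a convex combination of pairwise transition distributions, weighted by influence strengths. The Python sketch below shows only this one-step prediction; the observation layer of the HMM and the gradient-based learning are omitted, and all names and numbers are hypothetical.

```python
def next_state_distribution(actor, prev_states, influence, transitions):
    """Return P(state of `actor` at t+1 | states of all actors at t).

    influence[i][j]            : weight of actor j's influence on actor i;
                                 each row is assumed to sum to 1.
    transitions[(j, i)][s]     : distribution over actor i's next state,
                                 given that actor j was in state s.
    """
    n_states = len(transitions[(actor, actor)][0])
    dist = [0.0] * n_states
    for j, state_j in enumerate(prev_states):
        row = transitions[(j, actor)][state_j]
        for s in range(n_states):
            dist[s] += influence[actor][j] * row[s]
    return dist


# Two actors, two hidden states each (for example, 0 = passive, 1 = active).
self_transition = [[0.9, 0.1], [0.2, 0.8]]
cross_transition = [[0.6, 0.4], [0.3, 0.7]]
transitions = {
    (0, 0): self_transition, (1, 1): self_transition,
    (0, 1): cross_transition, (1, 0): cross_transition,
}
influence = [[0.7, 0.3],
             [0.4, 0.6]]

# Actor 0 was passive and actor 1 was active at time t.
dist_actor0 = next_state_distribution(0, [0, 1], influence, transitions)
print(dist_actor0)
```

Here actor 0 mostly follows its own dynamics, but 30 percent of its next-state distribution is driven by actor 1's previous state.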

Fourthly, based on the well-known support vector machines (SVMs) and the linkage semantic kernels, we propose a new collective classification model called the relational support vector classifier (RSVC). More details about RSVCs will appear in a forthcoming paper.

Note: This research was supported by NSFC from 2007 to 2009.

Cross-media Semantic Analysis and Multimedia Retrieval

Although content-based image retrieval (CBIR) techniques based on low-level features such as color, texture, and shape have been widely explored, their effectiveness and efficiency are not satisfactory. The ultimate goal of image retrieval is to let users manage large image databases in an automatic, flexible and efficient way. Therefore, image retrieval systems should be equipped to support high-level (semantics-based) querying and browsing of images. Combining visual and textual features can improve the performance of content-based search. For a given image, we model the relevant textual information as its multi-modal context, and regard the related images connected by hyperlinks as its link context. Two kinds of context analysis models, i.e., cross-modal correlation analysis and a link-based correlation model, are used to capture the correlation among different feature modalities and the topical dependency among images that is induced by the link structure.

We have implemented a context-based web image classification system, ConWic. A snapshot of the ConWic system is shown below:

 

Web Mining and Semantic Web

The Web has become one of the most important information sources and knowledge bases for scientific, educational and research purposes. For the Web to reach its full potential, we must improve its services, make it more comprehensible, and increase its usability. Data mining techniques play an increasingly important role in meeting the challenges of developing the intelligent Web.

This study proposes a new multiscale representation model of web sites and investigates the corresponding web site mining algorithms, including classification, denoising and sampling. Several context models are proposed to exploit all correlated semantic clues for site categorization and denoising. The ultimate goal is to develop an intelligent topic-specific web resource analysis tool, iExpert, for the Chinese Science Digital Library Project.

 

Data Mining and Decision Support Technology

During my M.S. studies, I introduced a Decision Support System (DSS) over a data warehouse into mobile computing environments and proposed the Mobile Decision Support System (MDSS) architecture. The user access agent and query dispatching mechanism were implemented for mobile decision-making tasks. The system was applied to customer analysis and services at telecom companies.

 

 

Project Experience

 

1. The China-US Million Book Digital Library Project, supported by the U.S. and the Ministry of Science & Technology of P.R. China (01/2001- ) (Research on Knowledge-Based Services): Assistant Researcher and Team Leader. For more about this project, see http://www.ulib.org.cn.

2. The Key Technologies and Demonstration Projects on Network Education, supported by China's Tenth Five-Year Plan; Network Education Project for Graduate School of Chinese Academy of Sciences (09/2002-01/2005) (Research on educational resource management technologies): Assistant Researcher and Team Leader

3. The 4C Convergence-Oriented Digital Media Processing and Retrieval System, a key project supported by the “Knowledge Innovation Initiative” of Chinese Academy of Sciences (10/2000-06/2003) (Research on text-image mining & retrieval): Assistant Researcher

4. The Intelligent Information Service Project for Chinese Science Digital Library (01/2002-01/2003) (Research on Semantic-Based Information and Resource Mining Technology): Chief Researcher and Team Leader

5. Context-Driven Semantic Extraction & Retrieval for Sports Video (CSVR), supported by NEC China Lab (12/2003-05/2004) (Research on context-based multimedia semantic extraction techniques): Team Leader and Major Researcher

6. Network Education Project for Graduate School of Chinese Academy of Sciences (09/2000- ) (Research on Student Modeling, Virtual Campus and Educational Programs Personalizing): Assistant Researcher

7. Telecom Management System for Sichuan Province (Big 97 Engineering) (04/1998-07/1998) (Sybase 10 + PowerBuilder + IBM AIX): Kernel Programmer

8. New Telecom Billing & Accounting Software System for Sichuan Province (08/1998-08/1999) (Oracle 7 + Visual C + IBM AIX) (Research and Development of Data Mining & Decision Support Components): Kernel Programmer & Database System Analyst

9. Other projects I have participated in: National Data Exchange Engineering Center Project for Ministry of Construction; Multimedia Data Broadcast Project for Graduate School of Chinese Academy of Sciences

 

 

 

 

Patent & Software Copyright

 

Patent: A Context-Based Approach for Semantic Extraction of Semi-structured Relational Data. YongHong Tian, TieJun Huang, Wen Gao. Process No. 200410086746.5. In process.

Patent: An Educational Resource Meta-Data Management Method and its Implementation. TieJun Huang, YongHong Tian, PingBo Kang, et al. Process No. 200410086745.0. In process.

Software Copyright: A Knowledge-Based Intelligent Web Resource Analysis Tool. YongHong Tian, TieJun Huang, et al. No. 2003SR5216. A snapshot of this tool, named iExpert, is shown below:

Software Copyright: iMedia-NERMS: Network-Based Educational Resource Management System. YongHong Tian, TieJun Huang, et al. No. 2004SR00799. A snapshot of iMedia-NERMS is shown below:

For more research information, please see my publications, visit the JDL Lab site, or contact me.