YongHong Tian
Joint Research & Development Lab for Advanced Computer and Communication Technologies
Institute of
Email: yhtian@jdl.ac.cn
Research Interests
Context-Based Statistical Relational Learning

The vast majority of work in statistical machine learning has focused on “flat” data: data consisting of identically structured entities, typically assumed to be independent and identically distributed (IID). However, many real-world datasets are innately relational: hypertext, web pages or sites, web images, scientific papers, e-books, educational resources, and more. Such semi-structured relational data consist of entities of different types, where each entity is characterized by a different set of attributes and generally has a complex internal structure. Entities are related to each other through different types of relations. The relational structure is an important source of semantic information that traditional statistical learning methods often ignore. We therefore focus on how to explicitly exploit such relational information in statistical learning tasks in order to build more effective and more robust models.

The main methodology in my research stems from context-based modeling and analysis. Here, the context of an object is defined as the collection of relevant objects and surrounding influences that make its semantics unique and comprehensible. Accordingly, contextual dependency can be regarded as a special relationship among related objects that conveys explicit semantic correlation. Our studies cover the following aspects.

Firstly, we extend the dependency network (DN) model to the relational domain and propose a contextual dependency network model
(CDN). Links among objects carry rich semantics that can be very helpful in classifying the objects. However, real-world link data such as Web pages contain many irrelevant links. Such noisy and irrelevant links often provide no useful or predictive information for categorization, so it is important to automatically identify which links are most relevant. To this end, we present a CDN model for categorization in the presence of noisy and irrelevant links. The CDN model uses a dependency function that characterizes the contextual dependencies among objects and attempts to differentiate the influence of each related object on the classification. Using this model, the context of a given object can be identified as its most relevant neighbors in the link graph, from which the semantic meaning of that object can be determined. We show how to learn the CDN model effectively and how to apply Gibbs inference over the learned model for collective classification of multiple linked objects.

Secondly, we propose the linkage semantic kernels to capture
the latent semantic relations among linked objects that are induced by the local and global structure of the link graph. Specifically, we assume that higher-order correlations between indirectly connected objects affect their semantic relations through a diffusion process on the link graph, and accordingly propose a semantic diffusion kernel. Moreover, eigen-decomposition is performed directly in the kernel-induced space to obtain kernels corresponding to the latent semantic space. Based on the linkage semantic kernels, we also present a kernelized contextual dependency network model (KCDN) that exploits the dependencies in a network of objects for collective classification, and describe a relevant-page finding algorithm, KernelRank.
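The diffusion idea can be illustrated with a minimal NumPy sketch. It assumes the standard exponential diffusion kernel K = exp(-βL) on an undirected link graph with Laplacian L = D - A (the construction of Kondor and Lafferty), which may differ in detail from the semantic diffusion kernel proposed here, and it approximates the latent semantic space by keeping only the k leading eigenvectors of K. The function name `diffusion_kernel`, the parameters `beta` and `k`, and the toy graph are all hypothetical.

```python
from typing import Optional

import numpy as np


def diffusion_kernel(adj: np.ndarray, beta: float = 0.5, k: Optional[int] = None) -> np.ndarray:
    """Exponential diffusion kernel K = exp(-beta * L) on an undirected link graph.

    Hypothetical sketch: `adj` is a symmetric adjacency matrix, `beta` controls
    how far similarity diffuses along multi-hop paths, and `k` (if given) keeps
    only the k leading eigen-directions of K as a latent semantic space.
    """
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj  # combinatorial Laplacian L = D - A

    # L is symmetric, so L = U diag(lam) U^T with real eigenvalues in ascending order.
    lam, U = np.linalg.eigh(laplacian)
    weights = np.exp(-beta * lam)  # eigenvalues of K, therefore in descending order

    if k is not None:
        # The smallest Laplacian eigenvalues give the largest kernel eigenvalues,
        # so the first k columns span the dominant (latent) subspace of K.
        U, weights = U[:, :k], weights[:k]

    # U diag(weights) U^T, written with broadcasting to scale each column of U.
    return (U * weights) @ U.T


# Toy link graph: pages 0 - 1 - 2 form a path; page 3 has no links.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])
K = diffusion_kernel(A, beta=0.5)

# Pages 0 and 2 are not linked directly, yet diffusion through page 1 gives
# them a positive similarity; the isolated page 3 stays unrelated to page 0.
print(K[0, 2] > 0, np.isclose(K[0, 3], 0.0))
```

A larger `beta` spreads similarity further along indirect paths. A full eigen-decomposition costs O(n³) in the number of pages, which is why large link graphs call for more efficient schemes.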
For computational efficiency on large datasets, we also develop a block-based algorithm, BlockKernel, for the LLSK kernels that exploits the block structure of link data.

Thirdly, we propose the influence model of online social networks together with its incremental learning algorithm. In this model, the sequence of states of each actor and the corresponding observable behaviors are modeled as a Hidden Markov Model (HMM), and the dynamic inter-influence among actors is characterized by the Influence Model. To learn the model incrementally from time-series interaction data, we also derive a gradient-based algorithm. The influence model of online social networks can be applied in a wide variety of domains, such as collaborative information filtering and recommendation, collective decision-making, and viral marketing.

Fourthly, based on the well-known
support vector machines (SVMs) and the linkage semantic kernels, we propose a new collective classification model, the relational support vector classifier (RSVC). More details on RSVCs will be given in a forthcoming paper.

Note: This research was supported by NSFC from 2007 to 2009.

Cross-media Semantic Analysis and Multimedia Retrieval

Although content-based image
retrieval (CBIR) techniques based on low-level features such as color, texture, and shape have been widely explored, their effectiveness and efficiency remain unsatisfactory. The ultimate goal of image retrieval is to give users the means to manage large image databases in an automatic, flexible, and efficient way. Image retrieval systems should therefore support high-level (semantics-based) querying and browsing of images. Combining visual and textual features can improve the performance of content-based search. For a given image, we model the relevant textual information as its multi-modal context, and regard the related images connected to it by hyperlinks as its link context. Two kinds of context analysis models, namely cross-modal correlation analysis and a link-based correlation model, are used to capture the correlation among different feature modalities and the topical dependency among images induced by the link structure. We have implemented a context-based web image classification system, ConWic.

[Figure: snapshot of the ConWic system]
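Cross-modal correlation between visual and textual features is commonly measured with canonical correlation analysis (CCA). The sketch below is plain linear CCA in NumPy, offered as an assumed stand-in for the cross-modal correlation analysis described above rather than the model actually used in ConWic; the function name `cca`, the regularizer `reg`, and the synthetic "topic" features are hypothetical.

```python
import numpy as np


def cca(X: np.ndarray, Y: np.ndarray, n_components: int = 2, reg: float = 1e-6):
    """Linear CCA between two views of the same images (hypothetical sketch).

    X: (n_images, d_visual) visual features; Y: (n_images, d_text) textual features.
    Returns projection matrices Wx, Wy and the leading canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Covariance blocks; `reg` keeps the within-view covariances invertible.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # C^{-1/2} via eigen-decomposition of a symmetric positive-definite matrix.
        vals, vecs = np.linalg.eigh(C)
        return (vecs / np.sqrt(vals)) @ vecs.T

    Cxx_is, Cyy_is = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Cxx_is @ Cxy @ Cyy_is)
    Wx = Cxx_is @ U[:, :n_components]
    Wy = Cyy_is @ Vt.T[:, :n_components]
    return Wx, Wy, s[:n_components]


# Synthetic data: both views depend on a shared latent "topic" per image.
rng = np.random.default_rng(0)
topic = rng.normal(size=(200, 1))
visual = np.hstack([topic, rng.normal(size=(200, 4))]) + 0.1 * rng.normal(size=(200, 5))
text = np.hstack([-2.0 * topic, rng.normal(size=(200, 2))]) + 0.1 * rng.normal(size=(200, 3))

Wx, Wy, corr = cca(visual, text, n_components=2)
# The first correlation should be near 1 because both views share `topic`.
print(corr)
```

Projecting an image's visual and textual features with `Wx` and `Wy` places both modalities in a shared space where they can be compared directly.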
Web Mining and Semantic Web

The Web has become one of the most important information sources and knowledge bases for scientific, educational, and research purposes. For the Web to reach its full potential, we must improve its services, make it more comprehensible, and increase its usability. Data mining techniques play an increasingly important role in meeting the challenges of developing the intelligent Web. This study aims to propose a new multiscale representation model of web sites and to investigate the corresponding web site mining algorithms, including classification, denoising, and sampling. Several context models are proposed to exploit all correlated semantic clues for site categorization and denoising. The ultimate goal is to develop an intelligent topic-specific web resource analysis tool, iExpert,
for the Chinese Science Digital Library Project.

Data Mining and Decision Support Technology

During my M.S. studies, I introduced data-warehouse-based Decision Support Systems (DSS) into mobile computing environments and proposed the Mobile Decision Support System (MDSS) architecture. A user access agent and a query dispatching mechanism were implemented for mobile decision-making tasks. The system has been applied to customer analysis and services at telecom companies.
For more research information, please see my publications, visit the JDL Lab site, or contact me.