Our group's research focuses on human identification and activity recognition. We are also interested in developing large-scale machine learning methods for processing large-scale data.



[Person Re-identification] [Action & Activity Recognition] [Face Recognition] [Large-scale Machine Learning]


Person Re-identification



Person re-identification is the task of matching images of the same person captured at different locations and times across non-overlapping camera views in a visual surveillance system.


Relative Distance Comparison: At an early stage, we proposed a relative distance comparison model, a soft discriminant model that alleviates the over-fitting problem caused by large intra-class appearance variations across non-overlapping camera views.
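One common way to express a soft relative distance comparison is a logistic penalty on the gap between two distances. The sketch below is only an illustration under that assumption, not the formulation of the cited papers: `soft_comparison_loss`, `d_same`, and `d_diff` are hypothetical names, and the distances are taken as given rather than learned.

```python
import math


def soft_comparison_loss(d_same, d_diff):
    """Soft penalty log(1 + exp(d_same - d_diff)), computed stably.

    d_same: distance between two images of the same person.
    d_diff: distance between images of two different people.
    The penalty shrinks smoothly as d_same falls below d_diff.
    """
    z = d_same - d_diff
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def total_loss(comparisons):
    """Sum the soft penalty over (d_same, d_diff) pairs."""
    return sum(soft_comparison_loss(ds, dd) for ds, dd in comparisons)


good_order = soft_comparison_loss(0.5, 3.0)   # same-person pair is closer
bad_order = soft_comparison_loss(3.0, 0.5)    # different-person pair is closer
```

Unlike a hard constraint, every comparison contributes a nonzero but bounded-gradient penalty, which is one reason a soft model of this kind can be less prone to over-fitting.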

Wei-Shi Zheng, S. Gong, and T. Xiang, "Person Re-identification by Probabilistic Relative Distance Comparison", IEEE Conf. on Computer Vision and Pattern Recognition, 2011. [PDF]

Wei-Shi Zheng, Shaogang Gong, and Tao Xiang, "Re-identification by Relative Distance Comparison," IEEE Trans. on Pattern Analysis and Machine Intelligence, 2013. [PDF]


Open-world RE-ID: Based on the relative comparison model, we further extended it to open-world person re-identification. In this work, we assume that only a short list (i.e., a watch list) of people is of concern for tracking in a camera network, while all others are imposters to the re-id system. We model this by developing a transfer local relative distance comparison, so that our model can use a source dataset to assist re-id on the limited target data available for the people on the watch list.

Wei-Shi Zheng, Shaogang Gong, and Tao Xiang. Towards Open-World Person Re-Identification by One-Shot Group-based Verification. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), DOI: 10.1109/TPAMI.2015.2453984, 2015.

Asymmetric Re-ID: The challenge of person re-identification (re-id) is to match individual images of the same person captured by different non-overlapping camera views against significant and unknown cross-view feature distortion. While a large number of distance metric/subspace learning models have been developed for re-id, the cross-view transformations they learn are view-generic and thus potentially less effective in quantifying the feature distortion inherent to each camera view. Learning view-specific feature transformations for re-id (i.e., view-specific re-id), an under-studied approach, offers an alternative for this problem. In our TCSVT work, we presented an asymmetric distance model for different views. In TPAMI 2017, we formulated a novel view-specific person re-identification framework from the feature augmentation point of view, called Camera coRrelation Aware Feature augmenTation (CRAFT). We perform cross-view adaptation by automatically measuring camera correlation from the cross-view visual data distribution and adaptively conducting feature augmentation to transform the original features into a new adaptive space. Through our augmentation framework, view-generic learning algorithms can be readily generalized to learn and optimize view-specific sub-models whilst simultaneously modelling view-generic discrimination information.


Ying-Cong Chen (student), Xiatian Zhu, Wei-Shi Zheng*, and Jian-Huang Lai. Person Re-Identification by Camera Correlation Aware Feature Augmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2017. (DOI: 10.1109/TPAMI.2017.2666805) [Project Page (including codes)]

Ying-Cong Chen (student), Wei-Shi Zheng*, Jian-Huang Lai, Pong C. Yuen. An Asymmetric Distance Model for Cross-view Feature Mapping in Person Re-identification. IEEE Transactions on Circuits and Systems for Video Technology, 2016.[code]



Partial RE-ID: Recently, we have also considered more challenging re-id problems, including the partial re-id problem and the low-resolution problem. In particular, partial re-id addresses the partial observation of a person in real-world crowded scenarios.

Wei-Shi Zheng, Xiang Li (student), Tao Xiang, Shengcai Liao, JianHuang Lai, Shaogang Gong. Partial Person Re-identification. IEEE Int. Conf. on Computer Vision (ICCV), 2015 (oral).

Low resolution RE-ID: We have also considered the low-resolution problem, another of the more challenging re-id problems, which we approach through multi-scale learning.

Xiang Li (student), Wei-Shi Zheng*, Xiaojuan Wang (undergraduate student), Tao Xiang, Shaogang Gong. Multi-scale Learning for Low-resolution Person Re-identification. IEEE Int. Conf. on Computer Vision (ICCV), 2015.


Depth RE-ID: When people appear under extreme illumination or change clothes, RGB appearance-based re-id methods tend to fail. To overcome this problem, we propose to exploit depth information, which provides body shape and skeleton information that is more invariant to illumination and color change. More specifically, we exploit a depth voxel covariance descriptor and further propose a locally rotation-invariant depth shape descriptor, called the Eigen-depth feature, to describe pedestrian body shape. We prove that the distance between any two covariance matrices on the Riemannian manifold is equivalent to the Euclidean distance between the corresponding Eigen-depth features. Furthermore, we propose a kernelized implicit feature transfer scheme to estimate the Eigen-depth feature implicitly from RGB images when depth information is not available. We find that combining the estimated depth features with RGB-based appearance features can sometimes help reduce the visual ambiguities of appearance features caused by illumination and similar clothes.
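The stated equivalence can be made concrete in a simplified special case. The derivation below is only an illustration, not the paper's proof: it assumes the affine-invariant Riemannian metric $d_R$, assumes two covariance matrices $A$ and $B$ that share an eigenbasis $Q$, and uses a hypothetical feature $f(\cdot)$ made of log-eigenvalues as a stand-in for the Eigen-depth feature.

```latex
\begin{align*}
A &= Q\,\mathrm{diag}(a_1,\dots,a_n)\,Q^{\top}, \qquad
B = Q\,\mathrm{diag}(b_1,\dots,b_n)\,Q^{\top}, \qquad a_i, b_i > 0,\\
d_R(A,B)^2 &= \bigl\lVert \log\bigl(A^{-1/2} B A^{-1/2}\bigr) \bigr\rVert_F^2
 = \bigl\lVert Q\,\mathrm{diag}\bigl(\log\tfrac{b_i}{a_i}\bigr)\,Q^{\top} \bigr\rVert_F^2
 = \sum_{i=1}^{n} (\log b_i - \log a_i)^2,\\
f(A) &= (\log a_1,\dots,\log a_n)^{\top}
\;\Longrightarrow\;
d_R(A,B) = \lVert f(A) - f(B) \rVert_2 .
\end{align*}
```

The middle step uses the fact that the Frobenius norm is unchanged by conjugation with an orthogonal matrix, so in this special case a manifold distance reduces to a plain Euclidean distance between feature vectors.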


Ancong Wu (student), Wei-Shi Zheng*, Jian-Huang Lai. Robust Depth-based Person Re-identification. IEEE Transactions on Image Processing, 2017 (DOI: 10.1109/TIP.2017.2675201).

Cross-Scenario RE-ID: To obtain a reliable similarity measurement between images, manually annotating a large number of pairwise cross-camera-view person images is deemed necessary. However, such annotation is both costly and impractical for efficiently deploying a re-identification system in a completely new scenario, i.e., a new setting of non-overlapping camera views between which person images are to be matched. To solve this problem, we consider using existing person images captured in other scenarios to help the re-identification system in a target (new) scenario, provided that a few samples are captured under the new scenario. More specifically, we tackle this problem by jointly learning the similarity measurements for re-identification in different scenarios in an asymmetric way. To model the joint learning, we assume that the re-identification models share a certain component across tasks. A distinctive aspect of our multi-task modeling is that it extracts a discriminant shared component that reduces the cross-task data overlap in the shared latent space during joint learning, so as to enhance the target inter-class separation in that space.


Xiaojuan Wang (undergraduate student), Wei-Shi Zheng*, Xiang Li (student), and Jianguo Zhang. Cross-scenario Transfer Person Re-identification. IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 8, pp. 1447-1460, 2016. [Supplementary]

Deep RE-ID: In our WACV work, we proposed a deep fusion neural network that makes deep neural networks learn features complementary to hand-crafted features.


In TPAMI 2017, we presented a domain-generic deep person appearance representation that is designed specifically to be view invariant, in order to facilitate cross-view adaptation.

[Figure: HIPHOP features]

Shangxuan Wu (undergraduate student), Ying-Cong Chen (student), Xiang Li (student), An-Cong Wu (student), Jin-Jie You (student), and Wei-Shi Zheng*. An Enhanced Deep Feature Representation for Person Re-identification. WACV 2016.

Ying-Cong Chen (student), Xiatian Zhu, Wei-Shi Zheng*, and Jian-Huang Lai. Person Re-Identification by Camera Correlation Aware Feature Augmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2017. (DOI: 10.1109/TPAMI.2017.2666805) [Project Page (including codes)]


Video RE-ID: Most existing person re-identification (re-id) models focus on matching still person images across disjoint camera views. Since only limited information can be exploited from still images, it is hard (if not impossible) to overcome occlusion, pose and camera-view change, and lighting variation. In comparison, video-based re-id methods can use extra space-time information, which contains much richer cues for matching and helps overcome these problems. However, we find that with a video-based representation, some inter-class differences can be much more obscure than with a still-image-based representation, because different people can have not only similar appearance but also similar motions and actions, which are hard to align. To solve this problem, we propose a top-push distance learning model (TDL), in which we integrate a top-push constraint for matching video features of persons. The top-push constraint enforces the optimization of top-rank matching in re-id, so that the matching model becomes more effective at selecting discriminative features to distinguish different persons.
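The top-push idea can be sketched as a hinge loss that compares each same-person pair with the closest different-person sample. This is a minimal illustration under stated assumptions, not the TDL model itself: it evaluates the loss under plain Euclidean distance instead of a learned metric, and `top_push_loss` and `margin` are hypothetical names.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def top_push_loss(features, labels, margin=1.0):
    """Penalize positive pairs that are not closer than the nearest negative.

    For each sample i, only the closest sample with a different label
    (the top-ranked wrong match) is compared against every same-label pair.
    """
    total = 0.0
    for i, (x_i, y_i) in enumerate(zip(features, labels)):
        negatives = [euclidean(x_i, x_k)
                     for x_k, y_k in zip(features, labels) if y_k != y_i]
        if not negatives:
            continue  # no different-label sample to push away
        nearest_negative = min(negatives)
        for j, (x_j, y_j) in enumerate(zip(features, labels)):
            if j != i and y_j == y_i:
                total += max(0.0, euclidean(x_i, x_j) - nearest_negative + margin)
    return total


separated = top_push_loss([[0, 0], [0, 1], [5, 5], [5, 6]], [0, 0, 1, 1])
mixed = top_push_loss([[0, 0], [3, 0], [1, 0]], [0, 0, 1])
```

Because only the nearest negative enters each term, the loss focuses on the top of the ranking list rather than on all negative pairs.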


Jinjie You (student), Ancong Wu (student), Xiang Li (student), Wei-Shi Zheng*. Top-push Video-based Person Re-identification. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016.[code]


People Detection via Context: We have also developed transfer context learning for object and human detection. Context is critical for minimising ambiguity in object detection. In this work, a novel context modelling framework is proposed that needs no prior scene segmentation or context annotation. This is achieved by exploring a new polar geometric histogram descriptor for context representation. To quantify context, we formulate a new context risk function and a maximum margin context (MMC) model to solve the minimization problem of the risk function. Crucially, the usefulness and goodness of contextual information are evaluated directly and explicitly through a discriminant context inference method and a context confidence function, so that only reliable contextual information that is relevant to object detection is utilised.

Wei-Shi Zheng, S. Gong, and T. Xiang, "," ICCV 2009. []
Wei-Shi Zheng, Shaogang Gong, and Tao Xiang, "Quantifying and Transferring Contextual Information in Object Detection," accepted by IEEE Trans. on Pattern Analysis and Machine Intelligence, 2011.


Interestingly, we have also explored group context for assisting person re-identification. In a crowded public space, people often walk in groups, either with people they know or with strangers. Associating a group of people over space and time can assist in understanding an individual's behaviour, as it provides vital visual context for matching individuals within the group. Seemingly an `easier' task compared with person matching, this problem is in fact very challenging because a group of people can be highly non-rigid, with changing relative positions of people within the group and severe self-occlusions. We address, for the first time, the problem of matching/associating groups of people over large space and time captured in multiple non-overlapping camera views. Specifically, a novel people group representation and a group matching algorithm are proposed. The former addresses changes in the relative positions of people in a group and the latter deals with variations in illumination and viewpoint across camera views. We also demonstrate a notable enhancement in individual person matching by utilising the group description as visual context.

 W.-S. Zheng, S. Gong, and T. Xiang, "Associating Groups of People," BMVC 2009. []





Action & Activity Recognition



For this topic, we are interested in interaction recognition, either between humans and objects or between humans.

Human-Object-Interaction (HOI): Our first work presented an exemplar-based HOI model that makes the recognition system tolerant to inaccurate object detection.

Jian-Fang Hu (student), Wei-Shi Zheng*, Jian-Huang Lai, Shaogang Gong, and Tao Xiang. Exemplar-based Recognition of Human-Object Interactions. IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 4, pp. 647-660, 2016.[Project Page. SYSU-ACTION Dataset]


Later, in order to alleviate the impact of lighting and use heterogeneous features for more robust recognition, we developed an RGB-D based HOI method built on joint learning of heterogeneous features. We find that features from different channels (RGB, depth) can share some similar hidden structures, and we propose a joint learning model that simultaneously explores the shared and feature-specific components as an instance of heterogeneous multi-task learning. The proposed model, formulated in a unified framework, is capable of: 1) jointly mining a set of subspaces with the same dimensionality to exploit latent shared features across different feature channels, 2) meanwhile, quantifying the shared and feature-specific components of features in the subspaces, and 3) transferring feature-specific intermediate transforms (i-transforms) for learning the fusion of heterogeneous features across datasets.

JianFang Hu (student), Wei-Shi Zheng*, Jian-Huang Lai, and Jianguo Zhang. Jointly Learning Heterogeneous Features for RGB-D Activity Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2017 (DOI:10.1109/TPAMI.2016.2640292). [Project Page (including codes)]

Jian-Fang Hu (student), Wei-Shi Zheng*, Jian-huang Lai, and Jianguo Zhang, "Jointly Learning Heterogeneous Features for RGB-D Activity Recognition," IEEE Conf. on Computer Vision and Pattern Recognition, June 2015.

Early Action Prediction: We propose a novel approach for predicting on-going activities captured by a low-cost depth camera. Our approach avoids the usual assumption in existing activity prediction systems that the progress level of the on-going sequence is given. We overcome this limitation by learning a soft label for each subsequence, and we develop a soft regression framework for activity prediction that learns both the predictor and the soft labels jointly. To make activity prediction work in real time, we introduce a new RGB-D feature called "local accumulative frame feature (LAFF)", which can be computed efficiently by constructing an integral feature map.
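The efficiency gain from an integral feature map can be sketched with prefix sums over per-frame features. This is an illustration of the general idea only: the actual definition of LAFF is not given here, so the frame features are placeholders and `build_integral_map` and `accumulated_feature` are hypothetical names.

```python
def build_integral_map(frame_features):
    """Prefix sums: integral[t] is the sum of the first t frame features."""
    dim = len(frame_features[0])
    integral = [[0.0] * dim]
    for feat in frame_features:
        integral.append([acc + f for acc, f in zip(integral[-1], feat)])
    return integral


def accumulated_feature(integral, start, end):
    """Sum of frame features over frames start..end-1.

    Costs one subtraction per dimension, independent of the
    subsequence length.
    """
    return [hi - lo for hi, lo in zip(integral[end], integral[start])]


frames = [[1, 2], [3, 4], [5, 6], [7, 8]]
integral = build_integral_map(frames)
middle = accumulated_feature(integral, 1, 3)   # frames 1 and 2
whole = accumulated_feature(integral, 0, 4)    # all four frames
```

Once the map is built in a single pass, the accumulated feature of any subsequence of an on-going video is available without revisiting its frames, which is what makes evaluating many subsequences cheap.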

Jian-Fang Hu(student), Wei-Shi Zheng*, Liangyang Ma, Gang Wang, and Jianhuang Lai. Real-time RGB-D Activity Prediction by Soft Regression. In European Conference on Computer Vision (ECCV), 2016.


Collective Activity Recognition: For learning interaction between people in a group, we presented a graph-based interaction learning model.

Xiaobin Chang (student), Wei-Shi Zheng*, and Jianguo Zhang. Learning Person-Person Interaction in Collective Activity Recognition. IEEE Transactions on Image Processing, vol. 24, no. 6, pp. 1905-1918, 2015.



Face Recognition




Discriminant subspace learning: There is some debate about principal component selection in PCA+LDA. This work shows that small principal components (corresponding to small eigenvalues) are useful and should be carefully selected in PCA+LDA. A foundation for principal component selection in LDA is established, and a new GA technique is used for implementation.

Wei-Shi Zheng, J. H. Lai, and Pong C. Yuen. . IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 35, no. 5, pp. 1065-1078, 2005. []


From 2003 to 2008, many works showed that algorithms with (2D) matrix-based representation perform better than the traditional (1D) vector-based ones. In particular, 2D-LDA was widely reported to outperform 1D-LDA. However, is matrix-based linear discriminant analysis always superior, and when would 1D-LDA be better? This work gives a theoretical analysis and an experimental comparison between 1D-LDA and 2D-LDA. Contrary to existing views, we find no convincing evidence that 2D-LDA always outperforms 1D-LDA when the number of training samples for each class is small or when the number of discriminant features used is small.

Wei-Shi Zheng, J. H. Lai, and Stan Z. Li. . Pattern Recognition, vol. 41, no. 7, pp. 2156-2172, 2008. []


In deriving Fisher's LDA formulation, there is an assumption that the class empirical mean is equal to its expectation. However, this may not be valid in practice, and the problem has rarely been discussed before. From the "perturbation" perspective, we develop a new algorithm, called perturbation LDA (P-LDA), in which perturbation random vectors are introduced to learn the effect of the difference between the class empirical mean and its expectation in the Fisher criterion.

Wei-Shi Zheng, J. H. Lai, Pong C. Yuen, and Stan Z. Li, "," Pattern Recognition, vol. 42, no. 5, pp. 764-779, 2009. [] []



Sparse Feature Learning:

NMF, a matrix factorization with two-sided non-negativity constraints, is popular for extracting sparse features. However, why should non-negativity be imposed on both components and coefficients? What happens if one constraint is relaxed? In this work, we find that relaxing the non-negativity constraint on the coefficient term in NMF helps extract equally sparse or much sparser, and more reconstructive, components/features compared with two-sided non-negative matrix factorization techniques. The exact 17 local components of the Swimmer data set are successfully extracted for the first time (to the best of our knowledge).

Wei-Shi Zheng, Stan Z. Li, J. H. Lai, and Shengcai Liao. . 11th IEEE International Conference on Computer Vision (ICCV), 2007. []

Wei-Shi Zheng, JianHuang Lai, Shengcai Liao, and Ran He. Extracting Non-negative Basis Images Using Pixel Dispersion Penalty. Pattern Recognition, vol. 45, no. 8, pp. 2912-2926, 2012.[PDF][CODE]


We present a sparse correntropy framework for computing robust sparse representations of face images for recognition. Compared with the state-of-the-art l1-norm-based sparse representation classifier (SRC), which assumes that noise also has a sparse representation, our sparse algorithm is based on the maximum correntropy criterion, which is much less sensitive to outliers. Within the proposed correntropy framework, several new methods have been developed for face recognition and object recognition.
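The robustness of the maximum correntropy criterion can be illustrated on a toy problem. The sketch below is not the sparse representation algorithm of the cited papers: it assumes a Gaussian kernel with a hypothetical bandwidth `sigma` and estimates only a scalar location, using a half-quadratic style alternation between sample weights and a weighted average.

```python
import math


def correntropy_location(values, sigma=1.0, iterations=20):
    """Estimate a location by maximizing Gaussian-kernel correntropy.

    Each iteration recomputes weights exp(-r^2 / (2 sigma^2)) from the
    current residuals r, then updates the estimate as a weighted mean,
    so samples far from the estimate get weights close to zero.
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    # Start from the median so the first weights are not dominated by outliers.
    if len(ordered) % 2:
        estimate = ordered[mid]
    else:
        estimate = 0.5 * (ordered[mid - 1] + ordered[mid])
    for _ in range(iterations):
        weights = [math.exp(-((v - estimate) ** 2) / (2.0 * sigma ** 2))
                   for v in values]
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate


samples = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]   # last value is an outlier
robust = correntropy_location(samples)
plain_mean = sum(samples) / len(samples)
```

The plain mean is pulled far toward the outlier, while the correntropy estimate stays near the bulk of the data, because the outlier's kernel weight is effectively zero.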

Ran He, Wei-Shi Zheng, Tieniu Tan, and Zhenan Sun, "Half-quadratic based Iterative Minimization for Robust Sparse Representation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 36, no. 2, pp. 261-275, 2014. [PDF]

Ran He, Wei-Shi Zheng, and BaoGang Hu, "Maximum Correntropy Criterion for Robust Face Recognition," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1561 - 1576, 2011.

Ran He, Wei-Shi Zheng, BaoGang Hu, and Xiang-Wei Kong, "A Regularized Correntropy Framework for Robust Pattern Recognition," Neural Computation, vol. 23, no. 8, pp. 2074-2100, 2011.




VIS-NIR Face Recognition: Visual versus near infrared (VIS-NIR) face image matching uses a NIR face image as the probe and conventional VIS face images as enrollment. Existing VIS-NIR techniques assume that, during classifier learning, the VIS images of each target person have NIR counterparts. However, corresponding VIS-NIR image pairs of the same person are not always available. To address this problem, we propose a transductive method, named transductive heterogeneous face matching (THFM), to adapt the VIS-NIR matching learned from training with available image pairs to all people in the target set. In addition, we propose a simple feature representation for effective VIS-NIR matching, which is computed in three steps, namely Log-DoG filtering, local encoding, and uniform feature normalization, to reduce heterogeneities between VIS and NIR images. The transduction approach can reduce the domain difference due to heterogeneous data and learn the discriminative model for target people simultaneously.

Jun-Yong Zhu (student), Wei-Shi Zheng*, Jian-Huang Lai, Stan Z. Li, "Matching NIR Face to VIS Face using Transduction," IEEE Transactions on Information Forensics and Security, vol. 9, no. 3, pp. 501-514, 2014.[PDF]

Jun-Yong Zhu (student), Wei-Shi Zheng*, JianHuang Lai, "Logarithm Gradient Histogram: A General Illumination Invariant Descriptor for Face Recognition," IEEE Conference on Automatic Face and Gesture Recognition, 2013 (oral). [PDF]



Face Normalization:


KPCA is a promising technique for nonlinear processing of images. A main problem in this approach is how to learn the pre-image of a kernel feature point in the input image space, which is always ill-posed. We present a regularized method and introduce weakly supervised learning in order to alleviate this ill-posed estimation problem.



In solving the illumination problem for face recognition, most (if not all) existing methods either use only extracted small-scale features while discarding large-scale features, or perform normalization on the whole image. In the latter case, small-scale features may be distorted when the large-scale features are modified. In this work, we argue that the large-scale features of a face image are important and contain useful information for face recognition as well as for the visual quality of the normalized image. We suggest that illumination normalization should mainly be performed on the large-scale features of a face image rather than on the whole face image. A new framework is therefore developed.

Xiaohua Xie, Wei-Shi Zheng, J. H. Lai, and Pong C. Yuen. . CVPR 2008. []

Xiaohua Xie, Wei-Shi Zheng, JianHuang Lai, Pong C. Yuen, and Ching Y. Suen, "Normalization of Face Illumination Based on Large- and Small- Scale Features," IEEE Trans. on Image Processing, vol. 20, no. 7, pp. 1807 - 1821, 2011.


Micro-Expression Recognition: Due to the short duration and low intensity of micro-expressions, their recognition is still a challenging problem. In this work, we develop a novel multi-task mid-level feature learning method that enhances the discrimination ability of extracted low-level features by learning a set of class-specific feature mappings, which are used to generate our mid-level feature representation. Moreover, two weighting schemes are employed to concatenate different mid-level features. We also construct a new mobile micro-expression set to evaluate the performance of the proposed mid-level feature learning framework. Experimental results on two widely used non-mobile micro-expression datasets and one mobile micro-expression set demonstrate that the proposed method can generally improve the performance of the low-level features and achieve results comparable with state-of-the-art methods.


Jiazhi He (student), Jianfang Hu, Xi Lu, Wei-Shi Zheng*. Multi-Task Mid-Level Feature Learning for Micro-Expression Recognition. Pattern Recognition, vol. 66, pp. 44-52, 2017.



Large-scale Machine Learning



With ever more data to process, our group has recently been working on 1) online classifiers; 2) fast search; 3) large-scale clustering.


Online Classifier: Online learning is very important for processing sequential data and also helps alleviate the computational burden on large-scale data. In particular, one-pass online learning predicts the label of each incoming sample and updates the model based on the prediction, where each sample is used only once and never stored. So far, existing one-pass online learning methods are globally modeled and do not take the local structure of the data distribution into consideration, which is a significant factor in handling nonlinearly separable data. In this work, we propose a local online learning (LOL) method, a multiple-hyperplane Passive Aggressive algorithm integrated with online clustering, so that all local hyperplanes are learned jointly and work cooperatively. This is achieved by formulating a common component as information traffic among the multiple hyperplanes in LOL. A joint optimization algorithm is proposed, and a theoretical analysis of the cumulative error is also provided.
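The one-pass protocol described above can be sketched with the standard single-hyperplane Passive Aggressive update, which is the building block that LOL extends. This is not the LOL method: the multiple local hyperplanes, the online clustering, and the shared component are all omitted, and `pa_one_pass` is a hypothetical name.

```python
def pa_one_pass(stream, dim):
    """One-pass binary Passive Aggressive learner with labels in {-1, +1}.

    Each sample is predicted first, then used for a single update,
    and is never stored.
    """
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        predicted = 1 if score >= 0 else -1
        if predicted != y:
            mistakes += 1
        # Hinge loss; the update is "passive" when the margin is satisfied.
        loss = max(0.0, 1.0 - y * score)
        norm_sq = sum(xi * xi for xi in x)
        if loss > 0.0 and norm_sq > 0.0:
            tau = loss / norm_sq  # smallest step that restores the margin
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return w, mistakes


stream = [([1.0, 0.0], 1), ([-1.0, 0.0], -1),
          ([2.0, 1.0], 1), ([-2.0, -1.0], -1)]
weights, mistakes = pa_one_pass(stream, dim=2)
```

A single global hyperplane like this cannot separate nonlinearly distributed data, which is the limitation that motivates learning several local hyperplanes jointly.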

Zhaoze Zhou (student), Wei-Shi Zheng*, JianHuang Hu(student), Yong Xu, Jane You. One-pass Online Learning: A Local Approach. Pattern Recognition, 2016. [code]


Fast Search: We focus on developing hash models, which search for similar items in Hamming space. Our research ranges from single-modal hashing to cross-modal hashing, and from a single modality to multiple modalities.
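Search in Hamming space can be sketched with binary codes stored as integers, so that a distance is one XOR followed by a bit count. The hash functions that produce the codes are the actual research contribution and are not shown: the codes below are made-up placeholders, and `hamming_search` is a hypothetical name.

```python
def hamming_distance(a, b):
    """Number of bit positions where two binary codes differ."""
    return bin(a ^ b).count("1")


def hamming_search(query, database, k=1):
    """Indices of the k database codes closest to the query code.

    Ties keep their database order because sorted() is stable.
    """
    ranked = sorted(range(len(database)),
                    key=lambda i: hamming_distance(query, database[i]))
    return ranked[:k]


database = [0b1010, 0b1111, 0b0000, 0b1011]   # placeholder 4-bit codes
nearest = hamming_search(0b1011, database, k=2)
```

Compact codes and bitwise comparison are what make this kind of search fast, since each comparison touches a few machine words instead of a full real-valued feature vector.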

Long-Kai Huang (undergraduate student), Qiang Yang(student), Wei-Shi Zheng*. Online Hashing. IEEE Transactions on Neural Networks and Learning Systems, 2017 (DOI: 10.1109/TNNLS.2017.2689242).[code]

Chenghao Zhang (undergraduate student), and Wei-Shi Zheng*. Semi-supervised Multi-view Discrete Hashing for Fast Image Search. IEEE Transactions on Image Processing, 2017 (DOI: 10.1109/TIP.2017.2675205).

Botong Wu (undergraduate student), Qiang Yang (student), Wei-Shi Zheng*, Yizhou Wang, and Jingdong Wang. "Quantized Correlation Hashing for Fast Cross-modal Search," International Joint Conference on Artificial Intelligence (IJCAI), 2015.

Longkai Huang(undergraduate student), Qiang Yang(student), Wei-Shi Zheng*, "Online Hashing," International Joint Conference on Artificial Intelligence (IJCAI), 2013. [(including labelme dataset)]

Qiang Yang(student), Longkai Huang(undergraduate student), Wei-Shi Zheng*, Yingbiao Ling, "Smart Hashing Update for Fast Response," International Joint Conference on Artificial Intelligence (IJCAI), 2013.


We are also interested in large-scale clustering, for which we have developed Euler clustering and fast competitive learning.

Jian-sheng Wu (student), Wei-Shi Zheng*, Jian-huang Lai, "Euler Clustering," International Joint Conference on Artificial Intelligence (IJCAI), 2013.

Jiansheng Wu (student), Wei-Shi Zheng*, Jian-Huang Lai. Approximate Kernel Competitive Learning. Neural Networks, pp. 117-132, 2015 [CODE]

NOTE: In the above publications, "*" indicates the PI (principal investigator) of the research work, who is also the supervisor of the Sun Yat-sen University students on the paper.

