Research in Data Mining

Synopsis of the Research Area

Data Mining is an active research area at CDM with several internationally recognized research groups working in areas such as Web mining, recommender systems, social computing, and medical informatics. Faculty and students in these groups have developed novel data mining algorithms that have been applied in a variety of domains including e-commerce, marketing, medicine, requirements engineering, and information retrieval. The research projects of the data mining faculty have been funded by the National Science Foundation, the MacArthur Foundation, and various industry partners, with more than $2M in total funding over the past five years. A number of algorithms and applications developed in these research groups have been adopted widely by industry, including companies such as Yahoo and Microsoft. For more information, please visit the DePaul Data Mining and Predictive Analytics Center.

Current Research Projects

Medical Imaging Informatics

NSF REU Program in Medical Informatics:
The main objectives of the Medical Informatics (MedIX) program are to encourage talented undergraduates to pursue graduate education and to expose students to interdisciplinary research, especially at the border of information technology and medicine. The Program has been sponsored by the National Science Foundation since 2005 and is hosted by two interdisciplinary laboratories: the Medical Informatics Laboratory at DePaul University and the Imaging Research Institute at the University of Chicago. For more information, please visit the Program’s website.

Computer-aided lung nodule interpretation:
Early diagnosis and treatment of lung cancer offer hope for improving the outcomes of patients with this most common cause of cancer death. Our main projects are in the areas of 1) computer-aided detection (CADe), 2) computer-aided diagnosis (CADx), and 3) computer-aided diagnostic characterization (CADc). For more information on these projects, please visit the Intelligent Multimedia Processing (IMP) Lab website.

Bridging the gap between human and computer interpretation of similarity in the medical domain:
Content-Based Image Retrieval (CBIR) aims to retrieve images relevant to a query image and has the potential to be used as a decision support tool for evidence-based medicine and case-based reasoning. Our work focuses on reducing the semantic gap between human and computer interpretation of similarity by investigating computer-based similarity measures and image features that come close to the human perception of similarity and that encode the visual content of an image much as human vision does. For more information on these projects, please visit the IMP Lab website.

Urban Studies and Data Mining

Predictions of urban changes in the Chicago community area:
The goal of the project is to use data mining techniques to predict changes in urban communities that lead to gentrification or abandonment. The study analyzes data on the socioeconomic and housing characteristics of Chicago community areas, and employs multivariate statistical methods and sequence analysis techniques to create a typology representing the social diversity of Chicago neighborhoods and to understand the factors that affect mobility and home ownership.

Web Data Mining, Web Personalization, and Recommender Systems

Ontology-based user modeling for web personalization and recommendation:
The goal of this project is to develop a framework for ontological user modeling and study how it can be used in a variety of Web personalization tasks such as search and recommendation. For more information on this and other projects in the Center for Web Intelligence, please contact Professor Bamshad Mobasher or visit:

Recommender Systems for the Social Web:
Recommender systems that assist users in their information seeking and resource sharing activities can play an essential role in the evolution of the social Web. Our goal in this project is to develop a framework for the construction of effective recommender systems for social Web environments, particularly social annotation systems. We are conducting empirical analyses across several dimensions, namely the recommendation tasks, the recommendation algorithms, and the type of social annotation system. For more information on this and other projects in the Center for Web Intelligence, please contact Professor Bamshad Mobasher or visit:

Trustworthy and Secure Recommender Systems for the Web:
In this research project, we study the security properties of open, user-adaptive social Web applications. Through the study of the underlying data mining and machine learning algorithms, attack modeling, and empirical evaluation, we will develop attack detection approaches that substantially eliminate the threat posed by attackers seeking to bias the system output in their favor. This work is an extension of our prior research on the security of recommender systems. For more information on this and other projects in the Center for Web Intelligence, please contact Professor Bamshad Mobasher or visit:

Using data mining and recommender systems to facilitate large-scale requirements processes:
The goal of this project is to develop a robust requirements elicitation framework and an associated library of tools which can be used to augment the functionality of wikis, forums, and specialized management tools used in the requirements domain. Specifically, we will enhance requirements clustering techniques by incorporating prior knowledge and user-derived constraints, and we will develop a contextualized recommender system designed to facilitate appropriate placement of stakeholders into requirements discussion forums generated in the clustering phase. For more information on this and other projects in the Center for Web Intelligence, please contact Professor Bamshad Mobasher or visit:

Current Research Students

Students affiliated with the Medical Informatics Lab and the Intelligent Multimedia Processing Lab. Acronyms used in the table: AI = Artificial Intelligence, DA = Data Analysis, DM = Data Mining, VC = Visual Computing.

Name                Degree Program   Research Area(s)   Research Topic
Laura Christianson  PhD              DM
Zahra Ferdowsi      PhD              DA & DM            Spatial and sequential data analysis for urban studies
Jonathan Gemmell    PhD              DM
John Halter         PhD              DA & DM            High Frequency Analysis of SPY data
Mary Ramezani       PhD              DM
Tom Schimoler       PhD              DM
Ahu Sieg            PhD              DM
Joseph Wantroba     UG               DA & DM            Statistical Analysis of the relationships between human and computer-based lesion boundary delineation
Dmitriy Zinovev     PhD              AI & DM & VC       Active learning approaches for multi-label multi-instance classification problems

Sample Publications

Journal Papers:

  • Zinovev D., Raicu D., Furst J., Armato S., “Predicting Radiological Panel Opinions using a Panel of Machine Learning Classifiers”, Algorithms Journal, 2010.
  • P. Opulencia, D.S. Channin, D.S. Raicu, J.D. Furst, “Mapping LIDC, RadLex, and Lung Nodule Image Features”, Journal of Digital Imaging (JDI), 2010.
  • Zou, X., Settimi, R., Cleland-Huang, J. (2010). “Improving Automated Requirements Trace Retrieval: A Study of Term-based Enhancement Methods.” International Journal of Empirical Software Engineering, vol. 15, no. 2 (ISSN 1382-3256).

Conference Papers:

  • Kim R., Dasovich G., Bhaumik R., Brock R., Furst J.D., Raicu D.S., "An Investigation into the Relationship between Semantic and Content Based Similarity using LIDC", ACM International Conference on Multimedia Information Retrieval (MIR) 2010, Philadelphia, Pennsylvania, March 29-31, 2010.
  • Ferdowsi, Z., Settimi, R. and Raicu, D. (2010). “An application of clustering techniques to Urban Studies”, Proc. of the 6th International Conference in Data Mining, July 12-15, 2010, Las Vegas.
  • M. Ramezani, J.J. Sandvig, T. Schimoler, J. Gemmell, B. Mobasher, R. Burke: “Evaluating the Impact of Attacks in Collaborative Tagging Environments”, IEEE International Conference on Social Computing, Vancouver, Canada, August 2009. CSE (4) 2009: 136-143.
  • J. Gemmell, A. Shepitsen, B. Mobasher, R. Burke. “Personalizing Navigation in Folksonomies Using Hierarchical Tag Clustering.” Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery (DaWaK'08), Turin, Italy, September, 2008. Lecture Notes in Computer Science 5182, pp. 196-205, Springer.
  • B. Mobasher, “Data Mining for Personalization”, In The Adaptive Web: Methods and Strategies of Web Personalization, Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.). Lecture Notes in Computer Science. Vol. 4321. Springer, Berlin Heidelberg, 2007.
  • B. Mobasher, R. Burke, C. Williams, R. Bhaumik. “Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithms”, ACM Transactions on Internet Technology, Vol. 7, No. 4, 2007.
  • A. Sieg, B. Mobasher, R. Burke. “Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search”, IEEE Intelligent Informatics Bulletin, Vol. 8, No. 1, pp 7-18, 2007.
  • B. Mobasher, B. Liu, B. Masand, and O. Nasraoui. “Advances in Web Mining and Web Usage Analysis”, Lecture Notes in Computer Science, Volume 3932, Springer, 2006. ISBN: 978-3-540-47127-1.