Tutorials - SIGIR 2024

July 14, 2024 - Washington DC, USA

Robust Information Retrieval

Yu-An Liu (University of Chinese Academy of Sciences), Ruqing Zhang (University of Chinese Academy of Sciences), Jiafeng Guo (University of Chinese Academy of Sciences), Maarten de Rijke (University of Amsterdam)

Beyond effectiveness, the robustness of an information retrieval (IR) system is increasingly attracting attention. When deployed, a critical technology such as IR should not only deliver strong performance on average but also have the ability to handle a variety of exceptional situations. In recent years, research into the robustness of IR has seen significant growth, with numerous researchers offering extensive analyses and proposing myriad strategies to address robustness challenges. In this tutorial, we first provide background information covering the basics and a taxonomy of robustness in IR. Then, we delve into adversarial robustness and out-of-distribution (OOD) robustness within IR-specific contexts, extensively reviewing recent progress in methods to enhance robustness. The tutorial concludes with a discussion on the robustness of IR in the context of large language models, highlighting ongoing challenges and promising directions for future research. This tutorial aims to generate broader attention to robustness issues in IR, facilitate an understanding of the relevant literature, and lower the barrier to entry for interested researchers and practitioners.

Large Language Models for Recommendation: Past, Present, and Future

Keqin Bao (University of Science and Technology of China), Jizhi Zhang (University of Science and Technology of China), Xinyu Lin (National University of Singapore), Yang Zhang (University of Science and Technology of China), Wenjie Wang (National University of Singapore), Fuli Feng (University of Science and Technology of China)

Large language models (LLMs) have significantly influenced recommender systems, spurring interest across academia and industry in leveraging LLMs for recommendation tasks. This includes using LLMs for generative item retrieval and ranking, and developing versatile LLMs for various recommendation tasks, potentially leading to a paradigm shift in the field of recommender systems. This tutorial aims to demystify the Large Language Model for Recommendation (LLM4Rec) by reviewing its evolution and delving into cutting-edge research. We will explore how LLMs enhance recommender systems in terms of architecture, learning paradigms, and functionalities such as conversational abilities, generalization, planning, and content generation. The tutorial will shed light on the challenges and open problems in this burgeoning field, including trustworthiness, efficiency, online training, and evaluation of LLM4Rec. We will conclude by summarizing key learnings from existing studies and outlining potential avenues for future research, with the goal of equipping the audience with a comprehensive understanding of LLM4Rec and inspiring further exploration in this transformative domain.
Tutorial website: https://generative-rec.github.io/tutorial/

Large Language Model Powered Agents for Information Retrieval

An Zhang (National University of Singapore), Yang Deng (National University of Singapore), Yankai Lin (Renmin University of China), Xu Chen (Renmin University of China), Ji-Rong Wen (Renmin University of China), Tat-Seng Chua (National University of Singapore)

The vital goal of information retrieval today extends beyond merely connecting users with the relevant information they search for. It also aims to enrich the diversity, personalization, and interactivity of that connection, ensuring the information retrieval process is as seamless, beneficial, and supportive as possible in the global digital era. Current information retrieval systems often encounter challenges like a constrained understanding of queries, static and inflexible responses, limited personalization, and restricted interactivity. With the advent of large language models (LLMs), a transformative paradigm shift is underway as LLM-powered agents are integrated into these systems. These agents bring crucial human-like capabilities such as memory and planning, allowing them to complete a variety of tasks much as humans would, effectively enhancing user engagement and offering tailored interactions. In this tutorial, we delve into the cutting-edge techniques of LLM-powered agents across various information retrieval fields, such as search engines, social networks, recommender systems, and conversational assistants. We will also explore the prevailing challenges in seamlessly incorporating these agents and hint at prospective research avenues that can revolutionize the way information is retrieved.
Tutorial website: https://llmagenttutorial.github.io/sigir2024

Recent Advances in Generative Information Retrieval

Yubao Tang (University of Chinese Academy of Sciences), Ruqing Zhang (University of Chinese Academy of Sciences), Zhaochun Ren (Leiden University), Jiafeng Guo (University of Chinese Academy of Sciences), Maarten de Rijke (University of Amsterdam)

Generative retrieval (GR) has witnessed significant growth recently in the area of information retrieval. Compared to the traditional "index-retrieve-then-rank" pipeline, the GR paradigm aims to consolidate all information within a corpus into a single model. Typically, a sequence-to-sequence model is trained to directly map a query to its relevant document identifiers (i.e., docids). This tutorial offers an introduction to the core concepts of the GR paradigm and a comprehensive overview of recent advances in its foundations and applications. We start by providing preliminary information covering foundational aspects and problem formulations of GR. Then, our focus shifts towards recent progress in docid design, training approaches, inference strategies, and applications of GR. We end by outlining challenges and issuing a call for future GR research. Throughout the tutorial we highlight the availability of relevant resources so as to enable a broad audience to contribute to this topic. This tutorial is intended to be beneficial to both researchers and industry practitioners interested in developing novel GR solutions or applying them in real-world scenarios.
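To make the query-to-docid formulation concrete, here is a minimal sketch of how generation over docids might look with an off-the-shelf sequence-to-sequence model from Hugging Face transformers; the checkpoint, query, and docid format are illustrative assumptions rather than details from the tutorial.

```python
# Minimal sketch of generative retrieval: a seq2seq model maps a query
# string directly to candidate document identifiers (docids).
# The checkpoint below is the generic t5-base; in practice the model would
# be fine-tuned on (query, docid) pairs so that it emits docid strings.
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")  # assume fine-tuned for docids

query = "effects of caffeine on sleep quality"
inputs = tokenizer(query, return_tensors="pt")

# Beam search produces a ranked list of docid strings; real GR systems often
# add constrained decoding so that only valid docids can be generated.
outputs = model.generate(
    **inputs,
    max_new_tokens=16,
    num_beams=10,
    num_return_sequences=10,
)
docids = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(docids)  # e.g. ["doc-4821", "doc-0073", ...] after fine-tuning
```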

Preventing and Detecting Misinformation Generated by Large Language Models

Aiwei Liu (Tsinghua University), Qiang Sheng (Chinese Academy of Sciences), Xuming Hu (The Hong Kong University of Science and Technology)

As large language models (LLMs) become increasingly capable and widely deployed, the risk of them generating misinformation poses a critical challenge. Misinformation from LLMs can take various forms, from factual errors due to hallucination to intentionally deceptive content, and can have severe consequences in high-stakes domains. This tutorial covers comprehensive strategies to prevent and detect misinformation generated by LLMs. We first introduce the types of misinformation LLMs can produce and their root causes. We then explore two broad categories. (1) Preventing misinformation generation: (a) AI alignment techniques that, during model training, reduce LLMs' propensity for misinformation and teach them to refuse malicious instructions; and (b) training-free mitigation methods such as prompt guardrails, retrieval-augmented generation (RAG), and decoding strategies that curb misinformation at inference time. (2) Detecting misinformation after generation: (a) using LLMs themselves to detect misinformation through embedded knowledge or retrieval-enhanced judgments; and (b) distinguishing LLM-generated text from human-written text through black-box approaches (e.g., classifiers, probability analysis) and white-box approaches (e.g., watermarking). We also discuss the challenges and limitations of detecting LLM-generated misinformation.
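As a concrete illustration of the black-box "probability analysis" family mentioned above, the following is a minimal sketch that scores a passage by its average token log-likelihood under a small open language model and flags unusually predictable text; the model choice and the decision threshold are illustrative assumptions, not methods endorsed by the tutorial.

```python
# Sketch of a simple probability-analysis detector: LLM-generated text often
# has a higher average log-likelihood (lower perplexity) under a language
# model than human-written text. The threshold below is purely illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids the model returns the mean cross-entropy
        # over tokens; negating it gives the average token log-likelihood.
        out = model(**enc, labels=enc["input_ids"])
    return -out.loss.item()

def looks_machine_generated(text: str, threshold: float = -3.0) -> bool:
    # Higher (less negative) average log-likelihood -> more "predictable" text.
    return avg_log_likelihood(text) > threshold

print(looks_machine_generated("The Eiffel Tower is located in Paris, France."))
```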

Using and Evaluating Quantum Computing for Information Retrieval and Recommender Systems

Maurizio Ferrari Dacrema (Politecnico di Milano), Andrea Pasin (Università degli Studi di Padova), Paolo Cremonesi (Politecnico di Milano), Nicola Ferro (Università degli Studi di Padova)

The field of Quantum Computing (QC) has gained significant popularity in recent years, due to its potential to provide benefits in terms of efficiency and effectiveness when employed to solve certain computationally intensive tasks. Both Information Retrieval (IR) and Recommender Systems (RS) require methods that can apply complex processing to large and heterogeneous datasets; it is therefore natural to wonder whether QC could also be applied to boost their performance. The tutorial aims first to provide an introduction to QC for an audience that is not familiar with the technology, and then to show how to apply the QC paradigm of Quantum Annealing (QA) to solve practical problems currently faced by IR and RS systems. During the tutorial, participants will be provided with the fundamentals required to understand QC and to apply it in practice by using a real D-Wave quantum annealer through APIs.
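To give a flavour of the hands-on part, the sketch below encodes a toy item-selection problem as a QUBO and solves it with D-Wave's Ocean SDK (dimod); the problem, scores, and penalty strength are illustrative assumptions, and running on real hardware would use dwave.system.DWaveSampler (with an API token) instead of the local exact solver.

```python
# Sketch: formulate a toy "select exactly 2 of 3 items, maximizing relevance"
# problem as a QUBO and solve it with D-Wave's Ocean SDK (dimod).
# Relevance scores and the penalty strength are illustrative only.
import dimod

relevance = {"x0": 0.9, "x1": 0.6, "x2": 0.4}
k = 2          # number of items to select
P = 2.0        # penalty strength for the cardinality constraint

# Objective: minimize  -sum_i r_i x_i + P * (sum_i x_i - k)^2
# Expanding the penalty (with x_i^2 = x_i for binary variables) gives
# linear terms  P * (1 - 2k) * x_i  and quadratic terms  2P * x_i x_j.
Q = {}
items = list(relevance)
for i, u in enumerate(items):
    Q[(u, u)] = -relevance[u] + P * (1 - 2 * k)
    for v in items[i + 1:]:
        Q[(u, v)] = 2 * P

# ExactSolver enumerates all assignments locally; on real hardware one would
# submit the same QUBO to dwave.system.DWaveSampler instead.
sampleset = dimod.ExactSolver().sample_qubo(Q)
best = sampleset.first.sample
print({var: val for var, val in best.items() if val == 1})  # {'x0': 1, 'x1': 1}
```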

Large Language Models for Tabular Data: Progresses and Future Directions

Haoyu Dong (Microsoft AI), Zhiruo Wang (Carnegie Mellon University), Yue Hu (University of Chinese Academy of Sciences)

This tutorial provides a comprehensive study of the advancements, challenges, and opportunities in leveraging cutting-edge LLMs for tabular data. By exploring state-of-the-art methods for table interpretation, processing, reasoning, analytics, and generation with LLMs, we aim to equip researchers and practitioners with the knowledge and tools needed to build a shared overview across the ML, NLP, and DB communities and to unlock the full potential of LLMs for tabular data in their domains.

Empowering Large Language Models: Tool Learning for Real-World Interaction

Hongru Wang (The Chinese University of Hong Kong), Yujia Qin (Tsinghua University), Yankai Lin (Renmin University of China), Jeff Z. Pan (University of Edinburgh), Kam-Fai Wong (The Chinese University of Hong Kong)

Since the advent of large language models (LLMs), the field of tool learning has been very active in solving various practical tasks, including but not limited to information retrieval. This half-day tutorial introduces the basic concepts of the field and provides an overview of recent advancements together with several applications. Specifically, we start with the foundational components and architecture of tool learning (i.e., cognitive tools and physical tools), then categorize existing studies into tool-augmented learning and tool-oriented learning, and introduce various learning methods that equip LLMs with this capability. Furthermore, we present several case studies on when, what, and how to use tools in different applications. We end with open challenges and several potential research directions for future studies. We believe this tutorial is suited to researchers at different stages (introductory, intermediate, and advanced) as well as industry practitioners interested in LLMs and tool learning.
Tutorial website: https://rulegreen.github.io/services/tools-meet-llm/
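As a rough illustration of the tool-augmented control flow discussed above, the sketch below lets a (hypothetical, stubbed-out) LLM decide whether to call a toy search tool before answering; call_llm, the prompt format, and the tool registry are placeholders, not an API from the tutorial.

```python
# Minimal sketch of tool-augmented generation: the LLM decides whether to
# call an external tool (here, a toy search function) before answering.
import json

def search(query: str) -> str:
    """Toy 'physical tool'; in practice this would call a real search API."""
    return f"[top results for '{query}']"

TOOLS = {"search": search}

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-completion API; it fakes the two
    # behaviours needed here (emit a tool call, then answer from the result).
    if "Tool result:" in prompt:
        return "Based on the retrieved results, ..."
    return json.dumps({"tool": "search", "arguments": {"query": "SIGIR 2024 tutorials"}})

def answer(user_question: str) -> str:
    prompt = (
        f"Question: {user_question}\n"
        'If a tool is needed, reply with JSON {"tool": ..., "arguments": ...}.'
    )
    plan = json.loads(call_llm(prompt))
    if plan.get("tool") in TOOLS:
        # Execute the requested tool and feed the observation back to the LLM.
        observation = TOOLS[plan["tool"]](**plan["arguments"])
        return call_llm(f"Question: {user_question}\nTool result: {observation}\nAnswer:")
    return plan.get("answer", "")

print(answer("What tutorials are offered at SIGIR 2024?"))
```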

Search under Uncertainty: Cognitive Biases and Heuristics

Jiqun Liu (The University of Oklahoma), Leif Azzopardi (University of Strathclyde)

Understanding how people interact with search interfaces is core to the field of Interactive Information Retrieval (IIR). While various models have been proposed (e.g., Belkin's ASK, berrypicking, everyday-life information seeking, information foraging theory, economic theory), they have largely ignored the impact of cognitive biases on search behaviour and performance. A growing body of empirical work exploring how people's cognitive biases influence search and judgments has led to the development of new models of search that draw upon Behavioural Economics and Psychology. This full-day tutorial will provide a starting point for researchers seeking to learn more about information seeking, search, and retrieval under uncertainty. The tutorial will be structured into three parts. First, we will provide an introduction to the biases and heuristics program put forward by Tversky and Kahneman (1974), which assumes that people are not always rational. The second part of the tutorial will provide an overview of the types and space of biases in search, before taking a deep dive into several specific examples and the impact of biases on different types of decisions (e.g., health/medical, financial). The third part will focus on a discussion of the practical implications of cognitive biases for the design and evaluation of human-centered IR systems, and participants will undertake some hands-on exercises.

High Recall Retrieval Via Technology-Assisted Review

Lenora Gray (Redgrave Data), David D. Lewis (Redgrave Data), Jeremy Pickens (Redgrave Data), Eugene Yang (Johns Hopkins University)

High Recall Retrieval (HRR) tasks, including eDiscovery in the law, systematic literature reviews, and sunshine law requests, focus on efficiently prioritizing relevant documents for human review. Technology-assisted review (TAR) refers to iterative human-in-the-loop workflows that combine human review with IR and AI techniques to minimize both time and manual effort while maximizing recall. The tutorial provides a comprehensive introduction to TAR. The morning session will provide an introduction to TAR, an overview of the key technologies and workflow designs used, the basics of practical evaluation methods, and the social and ethical implications of TAR deployment. It is intended to appeal both to traditional SIGIR attendees and to a wide range of TAR-interested professionals in the Washington, DC area. The afternoon session will go into more technical depth on the implications of TAR workflows for supervised learning algorithm design, how generative AI is beginning to be applied in TAR, more sophisticated statistical evaluation techniques, and a wide range of open research questions.
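To illustrate the kind of iterative workflow TAR builds on, here is a minimal sketch of a relevance-feedback loop with a simple classifier, where "human review" is simulated by pre-defined labels; the data, batch size, and stopping condition are illustrative assumptions rather than a recommended TAR protocol.

```python
# Sketch of a basic TAR-style active learning loop: train a classifier on the
# documents reviewed so far, surface the highest-scoring unreviewed documents
# for human review, and repeat. Dataset and batch size are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["contract breach email", "lunch plans", "merger agreement draft",
        "holiday party invite", "litigation hold notice", "cat pictures"]
true_labels = [1, 0, 1, 0, 1, 0]           # stands in for human judgments

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

reviewed = {0, 1}                           # seed set already judged
batch_size = 2

while len(reviewed) < len(docs):
    clf = LogisticRegression().fit(
        X[sorted(reviewed)], [true_labels[i] for i in sorted(reviewed)])
    # Rank unreviewed documents by predicted relevance (relevance feedback).
    scores = clf.predict_proba(X)[:, 1]
    candidates = sorted((i for i in range(len(docs)) if i not in reviewed),
                        key=lambda i: scores[i], reverse=True)[:batch_size]
    for i in candidates:                    # "human review" of the batch
        reviewed.add(i)
    found = sum(true_labels[i] for i in reviewed)
    print(f"reviewed {len(reviewed)} docs, found {found} relevant")
```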