Companies with a web presence rely on web usage analysis to obtain insights into customer behavior, associations among products, and the impact of advertisement banners, web marketing campaigns and product promotions. The validity of these results depends heavily on the accurate
reconstruction of the visitors' activities in the web site. To this end, many sites employ cookies that distinguish among different users coming from the same proxy server or anonymizer. However, the set of activities thus grouped together refers to the whole lifetime of a cookie at the user's host. The activities performed during each visit to the web site, the "sessions", are not grouped properly, which prevents monitoring changes in the user's behavior and in her interaction with the site during each session. The reconstruction of user sessions, the so-called "sessionizing", is blurred by client caches and multiple instantiations of the user's browser. Sessionizing tools exploit information on the site's topology and statistics on its usage in order to assess the correct contents of a user session. These tools are based on heuristic rules and on assumptions about the site's usage, and are therefore prone to error. In this study, we provide a formal framework for the evaluation of the accuracy of sessionizing tools. We introduce a set of measures that compute the extent to which real sessions are successfully reconstructed by different sessionizers. The wide range of measures proposed reflects the fact that some web usage analysis applications require exact reconstruction of a session, while for others ordering and page revisits are not important. On the basis of these measures, we evaluate a number of sessionizing tools using the log data of a real web site.
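To make the distinction between exact and order-insensitive reconstruction concrete, the following is a minimal sketch and not the measures defined in this study. It assumes that a session is represented as an ordered list of page identifiers; the function names `exact_match_rate` and `content_match_rate`, and the sample data, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Assumption: a session is an ordered sequence of page identifiers.
Session = Sequence[str]


def exact_match_rate(real: List[Session], reconstructed: List[Session]) -> float:
    """Fraction of real sessions reproduced verbatim (same pages, same order,
    same revisits) by at least one reconstructed session."""
    if not real:
        return 0.0
    candidates = {tuple(s) for s in reconstructed}
    hits = sum(1 for s in real if tuple(s) in candidates)
    return hits / len(real)


def content_match_rate(real: List[Session], reconstructed: List[Session]) -> float:
    """Fraction of real sessions whose set of pages equals the set of pages of
    at least one reconstructed session (ordering and revisits are ignored)."""
    if not real:
        return 0.0
    candidates = {frozenset(s) for s in reconstructed}
    hits = sum(1 for s in real if frozenset(s) in candidates)
    return hits / len(real)


# Hypothetical data: three real sessions and the output of some sessionizer.
real_sessions = [["A", "B", "C"], ["A", "D"], ["B", "A", "B"]]
reconstructed_sessions = [["A", "B", "C"], ["D", "A"], ["A", "B"], ["E"]]

print(exact_match_rate(real_sessions, reconstructed_sessions))
print(content_match_rate(real_sessions, reconstructed_sessions))
```

In this toy example only the first real session is recovered exactly, whereas all three are recovered once ordering and page revisits are disregarded, which illustrates why an application's requirements determine which kind of measure is appropriate.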