I received my Laurea in Mathematics from the University of Milano in 1991, and my Ph.D. in Computer Science from the University of Milano in 1996.
Presently I am an associate professor at the Computer Science Department of the University of Milano, and a member of the LAW. You can mail me if you want. Information related to the DSI (e.g., phone numbers) can be found on my DSI web page.
Current Interests
- Web-graph compression;
- Web-graph analysis;
- Efficient data structures for large datasets;
- Indexing of large document collections;
- Automatic generation of software from conceptual models;
- High-performance parallel web crawling;
- Computability on the reals;
- Theory of anonymous networks (topological properties and computability problems);
- Self-stabilizing systems;
- ...and, last but not least, any programming activity.
More in detail...
Web-graph compression
Snapshots
of the web generated very large graphs—you need a compact
representation. WebGraph is
a framework built to this purpose. Among other things, WebGraph uses new instantaneous codes for the integers
and new aggressive algorithmic compression
techniques.
Nonetheless, compression algorithms need good orderings of the node—orderings that expose the inner structure of the graph. We have made some interesting experiments to find new, efficient orderings.
Web-graph analysis
Web graphs have special properties whose study requires a sizeable amount of mathematics, but also a careful study of actual web graphs. We have studied, for instance, the paradoxical way PageRank evolves during a crawl, and the way PageRank changes depending on the damping factor.
If you're interested in PageRank and similar scores for the web, please have a look at this survey about the history of spectral ranking methods. You might be surprised 8^).
The development of UbiCrawler, WebGraph, and of the software necessary for our large-scale experiments spun a number of other satellite Java projects whose code is available as free software, including fastutil and Managing Gigabytes for Java.
Efficient data structures for large datasets
More generally, very large data sets requires efficient compressed data structures. An exciting recent development in this direction is that of succinct data structures, which implement standard data structures using the minimum possible space without slowing down more than by a constant factor. I proposed very efficient implementations of several basic building blocks used in succinct data structures by means of broadword programming. The building blocks are used throughout the code of the Sux and Sux4J projects, which aim at providing off-the-shelf useful basic succinct data structures such as the Elias–Fano representation of monotone sequences, Majewski-Wormald-Havas-Czech functions, and more.
In the same spirit, we devised and implemented (almost) optimal monotone minimal perfect hash functions and other low-memory, high-speed data structures for very large sets of strings.
Indexing of large document collections
MG4J is a
framework for building indices of large document collection based on the classical inverted-index approach.
The kind of index constructed is very configurable (e.g., you can choose your preferred coding method), and
moreover some new research has gone into providing efficient skips and minimal-interval semantics. Its direct
counterpart is Archive4J, which makes it possible to access
quickly and in a highly compressed form summaries of information about documents in a collection.
Automatic generation of software from conceptual models
ERW is an innovative system for
handling complex databases using a web browser. It uses the most recent
standards endorsed by the W3C to offer to the user a sophisticated
environment, similar to a dedicated client. Moreover, the user interface is
generated in a completely automatic way starting from a conceptual
description of the database by means of an XML-based language for
entity-relationship schemata. Customisation of the resulting application
is governed by specification percolation.
High-performance parallel web crawling
I collaborate with Paolo Boldi, Bruno Codenotti and Massimo Santini to the development of UbiCrawler, a
scalable, fault-tolerant and fully distributed web crawler. The first
report on the design of UbiCrawler won the Best
Poster Award at the Tenth World Wide Web
Conference. You can also read a more in-depth report.
Computability on the reals
I proved, in collaboration with Paolo Boldi, the
first results comparing
the Blum–Shub–Smale model (i.e., computation with exact reals)
and Type 2 Theory of Effectivity (i.e., feasible discrete approximations).
We gave a precise quantification of the gap between the two models using
degree theory: in a slogan, equality is a
jump. We also introduced a notion of Turing closure for Archimedean
field, which was used to characterize those fields in which Type 2-decidable sets exist.
Computability problems in anonymous networks
Deciding whether a problem can be solved with a certain knowledge in a
network where all processors are identically programmed (an
anonymous network) is a problem dating about 20 years. The concept
of graph fibration gave rise to an effective characterization of the tasks
solvable under a wide range of models and knowledge assumptions.
Topological properties of distributed systems
Faster solutions to classical distributed problems are possible if the way
processors label the link interconnecting them respects a global coherence
property called sense of direction, and formalized by Flocchini,
Mans and Santoro (for instance, a mesh with the standard
North/South/Est/West labelling has this property). We showed that weak sense of direction can be decided in
logarithmic parallel time, and this implies that the same is true of
Cayley colour graphs; this is the first complexity result on Cayley graph
(the non-coloured case is not known to be polynomial or NP-complete).
We also proved the first lower bounds for weak sense of direction, using noncostructive techniques based on random graph theory, and we extended these results to regular graphs by proving some new properties of random regular graphs of high degree.
Self-stabilizing systems
Self-stabilizing distributed system must converge to a desired behaviour in spite of an arbitrary initial state. By making self-stabilizing the classical view construction algorithm, we showed that the effective characterization for anonymous network can be extended to self-stabilizing systems. As a (surprising) consequence, there is no difference in distributed computational power between an arbitrary initial state and a chosen initial state if the latter is forced to be the same for all processors.
