ACM SIG Multimedia

ACM SIGMM eNewsletter

July 2008


Dear Member of the SIGMM Community,

We are proud to present the third issue of the SIG Multimedia newsletter. It features the announcement of a SIGMM Outstanding Technical Contributions award winner, a highly insightful feature about multimedia information retrieval, reports about the just-concluded PhD work by several young researchers, event reports, and several announcements that may be of use to the community.

The SIGMM eNews editorial team wants to congratulate the winner of the ACM SIGMM Outstanding Technical Contributions award, but read on to discover his identity. The award will be handed out at this year's ACM Multimedia Conference, which takes place in Vancouver from October 27 to October 31. We hope that you have planned for our SIG's most important conference and that we can meet you there. You can find more information about it on its web page at

The feature paper in this edition is not just a paper review as in previous issues. We are glad to include here the answer of Prof. Theo Pavlidis to a question that was asked by members of our community. Having seen the question asked by SIG members in spring, "The Holy Grail of Multimedia Information Retrieval: So Close or Yet So Far Away?", he was inspired to answer, and hopes to start a discussion with you. We hope that you will consider sharing this discussion with other members of the community by carrying it to our mailing list. A good way of posting to the list is through the web form.

Staying with the SIG's services for a moment, you should know that the volunteers maintaining SIGMM's web site announce a redesign that brings you two new features. The webmaster, Prof. Prabhakaran Balakrishnan, explains the new features:

The weblog that we are providing is powered by WordPress, one of the most popular weblogging tools. Through the SIGMM weblog, members can provide information, news or commentary on a particular subject in the multimedia research area. Readers can leave comments on a specific article or use the trackback functionality to link to the post. Existing members of the SIGMM website can log in to the weblog without registering and write posts, which will be published by the administrator. Please click the SIGMM Weblog in the left-side menu of the SIGMM website. Or follow the link:
The SIGMM forum site is powered by phpBB, a popular open source forum tool which millions of people use for on-line social networking. Members can hold discussions in a particular categorized forum section. The forum site helps to find research papers, software tools and other resources. Existing members of the SIGMM website can log in to the forum without registering, write new topics, reply to posts, and search for specific topics. Please click the SIGMM Forum in the left-side menu of the SIGMM website. Or follow the link:

Another major news item in this edition is the report from NOSSDAV 2008, which we hope will give you the pointers to some interesting research. For your interest, the web version of the newsletter is also reporting from CVDB 2007. Furthermore, summaries of four PhD theses are included in this issue. The candidates are well distributed geographically just as their topics represent a mix of our research topics, as befits our community.

We can also refer you to freely available material; a GPU-accelerated image processing and computer vision library, and the FAU salient image database are featured in this issue. The newsletter concludes with a series of brief announcements, where more announcements and more details can be found on the web.

We hope that you enjoy this third newsletter, and hope to see you all in Vancouver at ACM Multimedia 2008!

With kind regards,
Carsten Griwodz
(for the Editors of the SIGMM electronic newsletter)

Table of Contents

  1. Editorial
  2. Awards for SIGMM members
  3. Featured Paper
  4. Event Reports
  5. PhD thesis abstracts
  6. Freely available source code, traces and test content
  7. Calls for Papers
  8. Award Opportunities
  9. Job Opportunities
  10. Impressum

Awards for SIGMM members

ACM SIGMM Award for Outstanding Technical Contributions to Ralf Steinmetz

The 2008 winner of the prestigious ACM Special Interest Group on Multimedia Award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications is Professor Ralf Steinmetz from the Technical University Darmstadt in Germany. Professor Steinmetz receives this award "for pioneering work in multimedia communications and the fundamentals of multimedia synchronization". The SIGMM award will be given to Professor Steinmetz at the ACM International Conference on Multimedia 2008 in October.

Featured Paper

The Holy Grail of Multimedia Information Retrieval: So Close or Yet So Far Away? - An Answer


Venue: online


Featured paper by Theo Pavlidis, © Theo Pavlidis 2008

Theo Pavlidis received a Ph.D. in Electrical Engineering from the University of California at Berkeley in 1964. He was with Princeton University during 1964-80, at AT&T Bell Labs during 1980-86 and at Stony Brook University during 1986-2001. During 2001-2002 he was chief computer scientist of Symbol Technologies. He is a Life Fellow of IEEE and a Fellow of IAPR. In 2000 he was awarded the King-Sun Fu prize by IAPR for "fundamental contributions to the theory and methodology of structural pattern recognition."

He has authored more than 150 technical papers, five books, co-edited three books and received fifteen patents on various aspects of bar coding and document analysis. He is the co-inventor of the two-dimensional bar code PDF417. He was the editor-in-chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) from 1982 to 1986.

The lead editorial of a recent (April 2008) special issue of the IEEE Proceedings on multimedia retrieval was titled "The Holy Grail of Multimedia Information Retrieval: So Close or Yet So Far Away?" (Hanjalic et al. [HLMS08]). In this note I try to provide an answer. Several authors have already pointed out that the state of the art of Content-Based Image Retrieval (CBIR) is currently unsatisfactory: Yanai et al. [YSGB05] state "current image retrieval methods are well off the required mark" and Huston et al. [HSHZ04] state "the semantic gap between the user's needs and the capability of CBIR algorithms remains significant." Deserno et al. [DAL08] state "Research as reported in the scientific literature, however, has not made significant inroads as medical CBIR applications incorporated into routine clinical medicine or medical research." I have posted on the web a lengthy discussion of the issues, and this document is a summary with frequent references to the posting for documentation.

Several authors point to the existence of a semantic gap (for example, Smeulders et al. [SWSGJ00], Rasiwasia et al. [RMV07], and Hanjalic et al. [HLMS08]), defined as the gap between the pixel values and the interpretation of the image. Deserno et al. [DAL08] point out that the semantic gap is only one of several gaps in CBIR efforts and present an interesting ontology of such gaps. I prefer to address the issue in a different way, based on my experience of both theoretical and applied research in pattern recognition. Certainly talking about a semantic gap is a major understatement. The semantic interpretation of an image has very little to do with statistics of the values of the pixels. One of the most popular sets of features used in the CBIR literature is color histograms. However, it is well known that histogram equalization is a method that rarely, if ever, affects the semantics while it significantly distorts the histograms. Click here if you need to be convinced about that fact. One can repeat similar experiments with edge strength or texture histograms. Such global statistics can distinguish only extreme cases, for example, images with a single dominant color.
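The histogram equalization argument can be tried out on a toy example. The following is a minimal sketch, not part of the original posting: it assumes a tiny 8x8 grayscale "image" stored as a flat list, uses the simple textbook form of equalization (mapping each level through the cumulative histogram), and the function names are made up for illustration. The mapping keeps the brightness ordering of the pixels, so the picture content is unchanged, while the histogram moves almost entirely.

```python
def histogram(pixels, levels=256):
    """Count how many pixels fall on each gray level."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def equalize(pixels, levels=256):
    """Textbook histogram equalization: map each level through the CDF."""
    counts = histogram(pixels, levels)
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running)
    n = len(pixels)
    return [round((levels - 1) * cdf[p] / n) for p in pixels]


def histogram_distance(h1, h2):
    """Total variation distance between two histograms (0 = same, 1 = disjoint)."""
    n = sum(h1)
    return sum(abs(a - b) for a, b in zip(h1, h2)) / (2 * n)


# Assumed test image: a low-contrast diagonal gradient, gray levels 100..114.
image = [100 + (i % 8) + (i // 8) for i in range(64)]
equalized = equalize(image)

distance = histogram_distance(histogram(image), histogram(equalized))
# The brightness ordering of every pixel pair is preserved by the mapping.
order_preserved = all(
    (a < b) == (ea < eb)
    for a, ea in zip(image, equalized)
    for b, eb in zip(image, equalized)
)
print(f"histogram distance: {distance:.2f}, ordering preserved: {order_preserved}")
```

A retrieval method that compares the two histograms would treat the original and the equalized image as nearly unrelated, although a viewer sees the same gradient in both.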

The reliance on color appears particularly puzzling because it is well known that information that appears to be conveyed by color is conveyed by luminance: two-color displays where both colors have the same luminance are hard to discern. This phenomenon is called isoluminance (see the site of Michael Bach for an impressive demonstration) and its existence implies that an image search based only on luminance is not going to be significantly inferior to a search that includes colors. For the same reason the JPEG standard uses twice the resolution for luminance as it does for chrominance. The ability of color blind people to deal quite well with most tasks of daily life as well as the ability of everybody to obtain information from "black-and-white" images offer additional support to this view.

The representation of an image by simple features usually results in loss of information so that different pictures may map onto the same set of features. If an image has area A and each pixel has L possible levels, then there are L^A such images possible. On the other hand the number of possible histograms is less than A^L. Usually the perceptually significant L is much smaller than A so that different images may have similar color histograms. Another way of expressing that phenomenon is to think of the mapping of an image into a set of features as a hashing process. The way such mappings are done results in too many hashing collisions. Images of quite different nature are mapped into the same features.
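The size of this mismatch can be made concrete with a small count. The sketch below is an illustration under stated assumptions: the exact number of histograms is computed with the standard "stars and bars" formula C(A+L-1, L-1), which is not given in the text, and the values A = 16 and L = 4 are arbitrary toy choices.

```python
from math import comb


def count_images(area, levels):
    """Number of distinct images with `area` pixels and `levels` gray levels."""
    return levels ** area


def count_histograms(area, levels):
    """Number of ways to distribute `area` pixels over `levels` histogram bins."""
    return comb(area + levels - 1, levels - 1)


area, levels = 16, 4  # a 4x4 image with 4 gray levels (toy values)
images = count_images(area, levels)
histograms = count_histograms(area, levels)

print(f"images: {images}, histograms: {histograms}")
# By the pigeonhole principle, some histogram is shared by at least this many images.
print(f"images per histogram on average: {images // histograms}")
```

Even for this tiny image the number of images exceeds the number of histograms by more than six orders of magnitude, and the ratio grows rapidly with A.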

When a method is tested on a relatively small set of images such "hashing collisions" may not happen, so the results from such sets appear impressive. But the method is likely to fail on larger sets. The key parameter is not so much the absolute size of the set of images tested as its relation to the number of features used. This relationship has been studied in Pattern Classification and it is usually called the "curse of dimensionality". Duda and Hart [DH73 p. 95] present the following analysis. Let d be the number of degrees of freedom of the classifier. If the classifier is linear, d equals the number of features F; otherwise it is greater than F. If n is the number of training samples, then for n equal to 2(d+1) there is a 50% probability that successful classification can be obtained with a random assignment of two classes on the sample points. Such a success can be called spurious classification and the classifier is likely to fail on new samples. This issue is discussed at great length in Chapter 3.8 ("Problems of Dimensionality") and other parts of [DH73] as well as in Chapter 4.3 ("Parzen Windows") of the new edition of the text [DHS01]. For this reason it has been the practice to select the number of samples per class to be considerably greater than the number of features, usually five or even ten times as many.
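The 50% figure can be checked numerically. The sketch below assumes that the analysis referred to is the classical function-counting result for linear classifiers (usually attributed to Cover): of the 2^n ways to label n points in general position with two classes, 2 * sum_{k=0..d} C(n-1, k) can be realized by a hyperplane in d dimensions. The function name is made up for illustration.

```python
from fractions import Fraction
from math import comb


def separable_fraction(n, d):
    """Fraction of the 2**n two-class labelings of n points in general
    position that a linear classifier in d dimensions can realize."""
    realizable = 2 * sum(comb(n - 1, k) for k in range(d + 1))
    return Fraction(realizable, 2 ** n)


d = 10  # number of features of a linear classifier (toy value)
for n in (d + 1, 2 * (d + 1), 5 * (d + 1), 10 * (d + 1)):
    fraction = separable_fraction(n, d)
    print(f"n = {n:3d} samples: P(random labels separable) = {float(fraction):.3g}")
```

With n = d+1 samples every random labeling is separable, with n = 2(d+1) exactly half are, and the fraction drops quickly for five or ten times as many samples, which is consistent with the rule of thumb quoted above.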

A well-known observation from research in vision is that humans rely a lot on broad context and are likely to ignore local details. Bela Julesz [Ju91] stated that "in real-life situations, bottom-up and top-down processes are interwoven in intricate ways," and that "progress in psychobiology is ... hampered ... by our inability to find the proper levels of complexity for describing mental phenomena". According to V. S. Ramachandran, "Perceptions emerge as a result of reverberations of signals between different levels of the sensory hierarchy, indeed across different senses" ([RB98], p. 56). Ramachandran is explicitly critical of the view that "sensory processing involves a one-way cascade of information (processing)" [ibid]. Context has been a major challenge to Artificial Intelligence research overall and the Nobel laureate Arno Penzias has put it very succinctly: "Human intelligence almost always thrives on context while computers work on abstract numbers alone. Independence from context is in fact a great strength of mathematics." [Pe89, p. 49]

The human visual system is able to ignore local cues once a high level interpretation has been achieved. This is illustrated by the following two sentences:

New York State lacks proper facilities for the mentally III.

The New York Jets won Superbowl III.

The last word in each sentence is interpreted differently, even though they are identical. Such occurrences are even more likely in pictures. (See the main paper for more examples.)

Once we recognize the importance of context, there is at least one simple requirement for techniques for image retrieval: It is not enough to look at parts of an image, we must also look at the overall composition. Fischer et al. [FSGD08] offer an example of improved retrieval by including object relationships in a medical application.

I believe that in order to construct a CBIR system we must make sure that the set of images is such that the following properties hold. The two "starred" properties (No. 2 and No. 3) may be considered optional. One may attempt to build a CBIR system without them, but when they hold, it is far more likely that a mechanical system will match or exceed human performance. Also, the presence of the properties listed below does not negate the need to follow other good CBIR practices such as feedback from the user [IA03].

  1. The mapping between semantics and image features is well defined. This way the issue of "semantic gap" is put to rest. The representation by features can be made without loss of information, in such a way that only images with similar interpretation and no others are mapped into the same set of features. (Examples abound in the pattern recognition literature, including OCR and fingerprint recognition.)
  2. The context is known. For example, we may know that the object of interest occupies a particular area in the image or is close to some other well defined feature. The context can be taken advantage of in the design and selection of the features.
  3. The matching of images requires careful scrutiny. This is a process that humans are not very good at and it is likely that machines can match or even exceed human performance. Medical, industrial, forensic, security, as well as other applications seem to offer problems of this type.
  4. The accuracy requirements are well defined. Knowing the relative significance of false matches versus omitted true matches is particularly important.
  5. The computational requirements are well defined. In many cases instant response is not needed. For example, in a medical application it may take well over an hour to produce an image, so waiting another hour to find matches in a database is not particularly onerous. In security applications a fast response is often necessary but in that case the database is proprietary (not a public collection) and it may be organized accordingly for fast retrieval.

I do not see how we can have a mathematical representation without loss of information for the general category of "all images" but we have a good chance of achieving that by confining the problem to retrieving images from a particular category only (for example, matching an iris scan to a collection of iris scans). Biometrics and medical applications are areas that seem particularly promising. Probably the most successful instance of CBIR is automatic fingerprint identification (retrieving a match to a fingerprint from a database), which is used in practice by the FBI, and there are several commercial products for fingerprint verification (checking whether two fingerprints are the same), for example by Zvetco. (The mention of such systems attests only to their existence, rather than to their specific abilities.) In such applications the mapping of images to features was well understood and the main challenge has been the reliable extraction of such features from the image (for example, the minutiae in the case of fingerprints). Research is now continuing in more challenging soft biometrics such as the matching of tattoos and scars (Lee et al. [LJJ08]).

I think that the first property is critical because the results can be scalable only if the images and objects of interest can be characterized formally. The use of a standard set of images may be fine for the initial stages of the research but the algorithm must soon be tested on any image that satisfies appropriate criteria. Let me digress here for a moment. I have worked in industry where an imaging product is tested at the customer's site with blue collar workers (as opposed to graduate students) handling the equipment and, while hard, it is quite possible to design systems that perform well on an unlimited testing set (for example in the patent by Pavlidis et al. [PJHHL06]).

Evaluation of methodologies solely on the basis of statistical results of their performance on "standard sets" is fraught with risks because of the difficulty of ensuring that the membership of the set and the rules of evaluation are representative of the real world where the methodologies will be applied. See, for example, the study by Ioannidis [Io05]. Even though [Io05] addresses mainly medical studies, several of the issues it raises are also relevant to pattern recognition. It is important to pay close attention to what is considered a satisfactory performance.

In conclusion, the answer to the question of the title is that past CBIR research has relied excessively on the mapping of images onto unreliable features and the use of statistical tests on relatively small sets of images. While such tests may produce impressive results in publications, they are not scalable and therefore not applicable to practical situations.

Retrieval of images without restrictions on their nature from large databases such as those found on the web is well beyond the current state of the art, and it will have to wait for advances both in image analysis and in computer architectures.

Sources Cited

[DAL08] T. M. Deserno, S. Antani, and R. Long "Ontology of Gaps in Content-Based Image Retrieval", Journal of Digital Imaging, 2008.
[DH73] R. O. Duda and P. E. Hart Pattern Classification and Scene Analysis, Wiley, 1973.
[DHS01] R. O. Duda, P. E. Hart, and D. G. Stork Pattern Classification, 2nd edition, Wiley, 2001.
[FSGD08] B. Fischer, M. Sauren, M. O. Güld, and T. M. Deserno "Scene Analysis with Structural Prototypes for Content-Based Image Retrieval in Medicine", Proc. SPIE Conf. No. 6914, 2008.
[HLMS08] A. Hanjalic, R. Lienhart, W-Y Ma, and J. R. Smith "The Holy Grail of Multimedia Information Retrieval: So Close or Yet So Far Away?" IEEE Proceedings, Special Issue on Multimedia Information Retrieval, 96 (4), 2008, pp. 541-547.
[HSHZ04] L. Huston, R. Sukthankar, D. Hoiem, and J. Zhang "SnapFind: Brute Force Interactive Image Retrieval", Proceedings of International Conference on Image Processing and Graphics, 2004.
[IA03] Q. Iqbal, and J. K. Aggarwal "Feature Integration, Multi-image Queries and Relevance Feedback in Image Retrieval", Proc. VISUAL 2003, pp. 467-474.
[Io05] John P. A. Ioannidis "Why Most Published Research Findings Are False" Public Library of Science - Medicine 2(8): e124, 2005.
[Ju91] B. Julesz "Early vision and focal attention", Reviews of Modern Physics, vol. 63, (July 1991), pp. 735-772.
[LJJ08] J-E. Lee, A. K. Jain, R. Jin "Scars, Marks, and Tattoos (SMT): Soft Biometric for Suspect and Victim Identification" under review
[PJHHL06] T. Pavlidis, E. Joseph, D. He, E. Hatton, and K. Lu "Measurement of dimensions of solid objects from two-dimensional image(s)" U. S. Patent 6,995,762, February 7, 2006. Discussion of the methodology and links to sources on line.
[Pe89] Arno Penzias Ideas and Information, Norton, 1989.
[RB98] V. S. Ramachandran and S. Blakeslee Phantoms in the Brain, William Morrow and Company Inc., New York, 1998
[RMV07] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos "Bridging the Gap: Query by Semantic Example", IEEE Trans. Multimedia, vol. 9, 2007, pp. 923-938.
[SWSGJ00] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain "Content-Based Image Retrieval at the End of the Early Years", IEEE Trans. PAMI, vol. 22, 2000, pp 1349-1380.
[YSGB05] K. Yanai, N. V. Shirahatti, P. Gabbur, and K. Barnard "Evaluation Strategies for Image Understanding and Retrieval", Proc. ACM MIR, Nov. 2005.

Reports from SIGMM Sponsored and Co-sponsored Events

Workshop on Network and Operating System Support for Digital Audio and Video

Conference Chairs: Lars Wolf, Carsten Griwodz

Event location: Braunschweig, Germany

Event date: May 29 and 30, 2008


Sponsored by ACM SIG Multimedia

Report by Carsten Griwodz and Lars Wolf

NOSSDAV 2008 was opened with a reception on May 28, 2008, and was held at the Institute of Operating Systems and Computer Networks (IBR) at Technische Universität Braunschweig (TU Braunschweig) on May 29 and 30, 2008.

The motto of NOSSDAV 2008 was "interactivity". The motto underlined the workshop feeling of NOSSDAV and inspired attendees to interact with each other, but it also covered the scientific program comprising a keynote speech, an invited talk and 17 peer-reviewed papers (by authors from 11 different countries), where support for interactive multimedia played a major role. Also under this motto, we included an extended demonstration session in the workshop that allowed attendees to interact with their colleagues' experimental multimedia systems. 12 of these demos are also represented by short papers in the proceedings.

Attendees expressed that they liked the workshop program and that they were very satisfied with the IBR staff, who guaranteed the smooth running of the workshop and took immediate care of all attendees' needs. The demonstration session was introduced following discussions about the future direction of NOSSDAV at ACM Multimedia 2007, and the feedback that we received from participants on including such a session was very positive as well. The program also included a keynote speech by Jörg Liebeherr of the University of Toronto, who gave a talk titled "Overlays can do more ... if not everything", and an invited talk by Paulo Mendes of INESC, Porto, on "Cooperative Networking as Boosting Tool for Internet Interactivity". The two talks presented quite different views of one of the currently hottest topics in our research, the potential of distributed systems with end-user involvement, and generated a lot of discussion.

The accepted peer-reviewed papers were arranged into six sessions, entitled "networking for virtual worlds", "networking and operating system support", "digital audio and video", "video streaming in wireless environments", "analyses and conclusions", and "streaming with P2P support".

In the "networking for virtual worlds" session, our colleagues demonstrated that the system support for virtual worlds in our field is quickly growing out of the shoes that SIGGRAPH is filling. The first paper demonstrated how autonomous systems can be used to reduce the time required to find a reasonable set of nearby servers that provide an entrance to a virtual world. Although applied to game servers in this case, the results are probably valid in other distributed systems as well. The other two papers of the session were concerned with 3D streaming, the first one proposing a new means for network-conscious refinement of high-quality 3D models, the other a peer selection strategy for distributed 3D environments that combines peer relations with areas-of-interest.

The "network and operating system support" session started with a technique for maintaining QoS for a single VoIP connection whose bottleneck is the traffic generated by other applications in its own home network. This was followed by a deeper investigation of the group unicast concept, a technique that emulates multicast for an application while maintaining unicast semantics in the kernel underneath, and a study that proposes to use two-level schedulers to schedule media coding workloads on heterogeneous multicore processors.

The "digital audio and video" session took a look at video services in Web 2.0, in particular examining the arrangement of video sources of the large video hosting Web 2.0 sites, proposed a technique for encoding video depending on the workload available for decoding it, and presented new ideas for tangible user interfaces that made use of object recognition to make interaction with digital content more intuitive.

"Video streaming in wireless environments" was a session comprising a presentation of experimental results of a video quality investigation in wireless mesh networks based on IEEE 802.11e and an implementation of a wireless router that scales H.264 SVC video streams network-consciously by acting as a transparent RTSP/RTP proxy.

The session on "analysis and conclusions" comprised three workload studies of existing commercial systems, one aimed at massive multiplayer games, one aimed at on-demand video streaming, and one for IP telephony. These are meant to provide other researchers with a means of modelling the needs and effects of today's major distributed applications.

"Streaming with P2P support" examined overlay generation for a multi-source streaming system, compared the hit rates of P2P streaming networks under different replication strategies, and presented a DHT-based P2P system that scores peers by their capabilities, which gives the P2P system a considerable load balancing improvement.

The demonstration session was scheduled in the morning of May 30 and started with one-slide, two-minute introductions of each demo. This was a hard piece of work for the session chair, since researchers feel the need to explain their work in more detail to the plenum.

The social event on May 29 started out with a river tour on the river Oker through the city, which ended at the Oker Terrassen, where the conference dinner was held outdoors on the terraces. The dinner buffet was good and plentiful, and people did stay for long discussions. Most attendees' hotels were in easy reach from the restaurant, so the event faded out slowly.

The workshop was sponsored by ACM SIG Multimedia, with co-sponsorship by TU Braunschweig and Simula Research Laboratory AS (Simula). NEC and Ericsson were further sponsors of the workshop. TU Braunschweig and Simula took the financial risk of the workshop, which allowed a much more optimistic cost-estimate than would otherwise have been possible.

Reports from other ACM Events

Computer Vision meets Databases

Conference Chairs: Laurent Amsaleg (IRISA), Björn Þor Jónsson (Reykjavik University), Vincent Oria (New Jersey Institute of Technology)

Event location: Beijing, China

Event date: June 10, 2007


Report by Björn Þor Jónsson

The goal of the International Workshop Series on Computer Vision meets Databases, or CVDB, is to foster inter-disciplinary work between the areas of computer vision and databases. The Third International Workshop on Computer Vision meets Databases, or CVDB 2007, was held in Beijing, China, on June 10, 2007. The workshop was co-located with the 2007 ACM SIGMOD/PODS conferences and was attended by twenty-five participants from all over the world. A report of the workshop was recently published in the ACM SIGMOD Record journal, and is available at

PhD thesis abstracts

Bo Gong

Photo Stream Segmentation Using Context Information

This dissertation defines the problem of photo stream segmentation and proposes solutions to two technical challenges that it poses. The first problem is hierarchical photo segmentation. Photo stream segmentation groups photos into clusters, each of which corresponds to an event. Existing work has focused on flat partitions of the photo stream of a single user. However, a hierarchical structure is preferable. One of the objectives of photo stream segmentation is to help understand and recognize the event that is captured by a photo stream. Thus it is ideal to segment the photo stream into a structure that matches the event's structure. Events have a hierarchical structure, so it is natural to segment photo streams hierarchically. The second problem is photo stream segmentation for multiple users. When photos taken by different people show the same event, another interesting research problem arises: how to segment photos from multiple users according to events. As on-line photo sharing becomes popular, the need to segment photo streams from multiple users becomes very common. However, this problem has also been ignored.

This dissertation addresses the two aforementioned problems. There are two types of information that can be used for photo stream segmentation: content information and context information. This dissertation focuses on the use of context information in photo stream segmentation. Based on context information, a hierarchical mixture-of-Gaussians model is used as a unified framework for photo stream segmentation. It is assumed that all the photos from a photo stream are i.i.d. samples from a mixture of underlying Gaussian distributions where each component represents an event. The EM algorithm is used to learn the model parameters and the Bayesian Information Criterion is used to determine the number of components. We compare the performance obtained with different kinds of context information, which include not only time but also location and camera parameters.
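The combination of EM and the Bayesian Information Criterion can be illustrated with a deliberately simplified sketch. It is not the dissertation's method: it fits a flat (non-hierarchical) one-dimensional mixture to photo timestamps only, and the timestamps, the variance floor, the initialization and all function names are assumptions made for this example.

```python
import math


def fit_gmm_1d(xs, k, iters=200, var_floor=0.01):
    """Fit a k-component 1-D Gaussian mixture with EM.
    Returns (log_likelihood, sorted component means)."""
    xs = sorted(xs)
    n = len(xs)
    # Deterministic start: split the sorted values into k contiguous chunks.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), var_floor)
                 for c, m in zip(chunks, means)]

    def weighted_densities(x):
        return [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)]

    for _ in range(iters):
        # E-step: responsibility of each component for each photo.
        resp = []
        for x in xs:
            dens = weighted_densities(x)
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, var_floor)

    log_lik = sum(math.log(max(sum(weighted_densities(x)), 1e-300)) for x in xs)
    return log_lik, sorted(means)


def bic(log_lik, k, n):
    """Bayesian Information Criterion; a 1-D mixture has 3k - 1 free parameters."""
    return -2 * log_lik + (3 * k - 1) * math.log(n)


# Assumed capture times in hours: three bursts of photos, i.e. three events.
timestamps = [9.8, 10.0, 10.1, 10.3, 10.4,
              49.7, 50.0, 50.2, 50.5,
              119.6, 120.0, 120.3, 120.5, 120.9]

scores = {k: bic(fit_gmm_1d(timestamps, k)[0], k, len(timestamps)) for k in (1, 2, 3)}
for k, score in scores.items():
    print(f"k = {k}: BIC = {score:.1f}")
```

The number of events would be chosen as the k with the lowest BIC. Location and camera parameters, which the dissertation also considers, would add dimensions to each sample.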

Advisor(s): Ramesh Jain

SIG MM member(s): Ramesh Jain


Experiential Systems Laboratory, UC, Irvine

Liam M. Mayron

Image Retrieval Using Visual Attention

The retrieval of digital images is hindered by the semantic gap. The semantic gap is the disparity between a user's high-level interpretation of an image and the information that can be extracted from an image's physical properties. Content-based image retrieval systems are particularly vulnerable to the semantic gap due to their reliance on low-level visual features for describing image content. The semantic gap can be narrowed by including high-level, user-generated information. High-level descriptions of images are more capable of capturing the semantic meaning of image content, but it is not always practical to collect this information. Thus, both content-based and human-generated information is considered in this work.

A content-based method of retrieving images using a computational model of visual attention was proposed, implemented, and evaluated. This work is based on a study of contemporary research in the field of vision science, particularly computational models of bottom-up visual attention. The use of computational models of visual attention to detect salient by design regions of interest in images is investigated. The method is then refined to detect objects of interest in broad image databases that are not necessarily salient by design.

An interface for image retrieval, organization, and annotation that is compatible with the attention-based retrieval method has also been implemented. It incorporates the ability to simultaneously execute querying by image content, keyword, and collaborative filtering. The user is central to the design and evaluation of the system. A game was developed to evaluate the entire system, which includes the user, the user interface, and retrieval methods.

Advisor(s): Dr. Oge Marques -- Dissertation advisor

SIG MM member(s): Liam M. Mayron, Oge Marques



MLAB is Florida Atlantic University's Multimedia Laboratory. It is part of Florida Atlantic University's Department of Computer Science. Since its foundation in 1995, the laboratory's continuous mission has been to produce valuable research in the fields of multimedia processing and multimedia communications. In particular, research has focused on the areas of image and video analysis, video coding, and multimedia security.

Lin Ma

Techniques and Protocols for Distributed Media Streaming

Distributed media streaming employs multiple senders to cooperatively and simultaneously transmit a media stream to a receiver over the Internet, exploiting both sender diversity and path diversity to improve robustness in the system. This new model for media streaming has raised many challenging and interesting research issues. Previous research has studied many of the application-level issues, such as sender selection and packet scheduling. In this dissertation, we continue along the same line of research, but focus on the fundamental issues of error control, congestion control, and transport protocol design.

We first study how streaming quality can be improved through distributed retransmission -- retransmission from alternate senders other than the origin of the lost packet. We explore the question of whether distributed retransmission recovers more packet losses than non-distributed retransmission by comparing two naive distributed retransmission schemes with the traditional non-distributed scheme. Through analysis, simulations, and PlanetLab experiments, we found that distributed retransmission leads to fewer lost packets and shorter loss burst length. To address the practical issue of who to retransmit from, we propose a distributed retransmission scheme that selects a sender with the lowest packet loss rate to retransmit from. Results show that our proposed scheme effectively recovers packet losses and improves playback quality.
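The selection rule described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the `Sender` record, its counters, and the function name `pick_retransmission_sender` are hypothetical, and the way loss rates are measured is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    # Hypothetical per-sender statistics kept by the receiver.
    name: str
    packets_sent: int
    packets_lost: int

    @property
    def loss_rate(self) -> float:
        # Assumption: a sender with no history is treated as fully lossy,
        # so it is only chosen when no measured sender is available.
        if self.packets_sent == 0:
            return 1.0
        return self.packets_lost / self.packets_sent


def pick_retransmission_sender(senders):
    """Return the sender with the lowest observed packet loss rate."""
    if not senders:
        raise ValueError("no senders available")
    return min(senders, key=lambda s: s.loss_rate)
```

With three senders whose observed loss rates are 10%, 2%, and unknown, the sketch would request the retransmission from the second sender.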

Second, we investigate the issue of TCP friendliness in distributed media streaming. The traditional notion of TCP friendliness is not suitable for multi-flow applications, such as distributed media streaming, as it is unfair to other single-flow applications. We therefore introduce the notion of task-level TCP friendliness for distributed media streaming, where we require the total throughput for a set of flows belonging to the same task to be friendly to a TCP flow. To this end, we design a congestion control protocol to regulate the throughput of the flows in an aggregated manner. The regulation is done in two steps. First, we identify the bottlenecks and the subset of flows on the bottlenecks. Then, we adjust the congestion control parameter such that the total throughput of the subset is no more than that of a TCP flow on each bottleneck. Network simulation using multiple congestion scenarios shows the efficacy of our approach.
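The two-step regulation described above might be sketched as follows. This is only an illustration under stated assumptions: the bottleneck-to-flows mapping and the TCP-fair rate per bottleneck are taken as given inputs, the equal split among flows is a simplification chosen here, and the function name `task_level_rate_caps` is hypothetical. The dissertation adjusts a congestion control parameter, which this sketch does not model.

```python
def task_level_rate_caps(bottleneck_flows, tcp_fair_rate):
    """Cap each flow so that the flows of one task sharing a bottleneck
    together send no more than a single TCP flow would on that bottleneck.

    bottleneck_flows: dict mapping a bottleneck id to the list of flow ids
                      of the task that cross it (step 1, assumed given).
    tcp_fair_rate:    dict mapping a bottleneck id to the estimated
                      throughput of one TCP flow on that bottleneck.
    """
    caps = {}
    for bottleneck, flows in bottleneck_flows.items():
        if not flows:
            continue
        # Assumption: split the TCP-fair rate equally among the flows.
        share = tcp_fair_rate[bottleneck] / len(flows)
        for flow in flows:
            # A flow crossing several bottlenecks keeps its tightest cap.
            caps[flow] = min(caps.get(flow, share), share)
    return caps
```

Because every flow's cap is at most its equal share on each bottleneck it crosses, the sum of the caps on any bottleneck stays at or below the TCP-fair rate for that bottleneck.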

The combined requirements of flexible retransmission and TCP-friendliness above lead to the need for an unreliable, congestion-controlled transport protocol. An option is to use DCCP. In this dissertation, however, we propose an alternative to DCCP which is much simpler to implement. Our proposed protocol is based on TCP, and is called TCP Urel (for UnRELiable). TCP Urel sends fresh data during retransmissions, and therefore keeps the congestion control mechanism of TCP intact. TCP Urel is simple to implement. We realized TCP Urel based on the existing TCP stack in FreeBSD 5.4, with less than 750 lines of extra code. Our experiments over a LAN testbed show that TCP Urel is friendly to different TCP versions and introduces little CPU overhead. TCP Urel can easily evolve with TCP so that it remains friendly to future versions of TCP.

Advisor(s): Wei Tsang Ooi (Thesis Supervisor)

SIG MM member(s): Wei Tsang Ooi


NEMESYS: Networked and Embedded Media Systems Research Group

The Networked and Embedded Media Systems (NEMESYS) research group at the School of Computing, National University of Singapore, conducts theoretical and systems research with a special focus on multimedia applications. Our interest spans distributed systems, operating systems, embedded systems, and programming systems. In particular, we are studying how to provide systems support for multimedia data types (video, audio, graphics) in the context of applications such as video on demand, tele-conferencing, webcast productions, computer games, and video surveillance, running on personal computers and mobile devices.

Stian Johansen

Rate-Distortion Optimization for Video Communication in Resource Constrained IP Networks

Resource limitations lead to a number of challenges in all communication systems. This is also the case for video communication over networks based on the Internet Protocol (IP), where limited rates are shared among competing, heterogeneous users. Packet losses, delays and connectivity losses will be experienced to a varying degree depending on network loads, physical properties and mobility-related issues. These factors, as well as source coding characteristics, influence the visual quality experienced by the users.

One of the main contributions of the work presented in this thesis is that performance gains can be attained when considering characteristics of source coding, networks and congestion control jointly. Throughout the presented work, optimization of visual quality is at the centre of attention. The thesis is divided into three main parts, with contributions as follows.

Part A deals with rate-distortion optimized packet loss protection when communicating video over multiple channels simultaneously. Source coder characteristics and characteristics of the (logically or physically) different channels are taken into account in order to yield an optimized packet loss protection of the video. This part presents different optimization algorithms, which are in turn compared in terms of both performance and complexity.

Part B uses the algorithms of part A in the context of congestion control. Specifically, the potential problem of misbehaving receivers is considered. In current systems, there exists an incentive for non-conformant congestion control by video receivers in that an improved video quality can be achieved through obtaining an unfairly high bandwidth share. Since this has unfortunate effects on the connection characteristics of competing users, it poses a potential problem for mass deployment of UDP (User Datagram Protocol) based video services. In this work, a joint source-channel coding based framework which removes the incentive for bandwidth `greediness' is introduced. Specifically, the framework attempts to reverse the situation and provide an incentive in terms of visual quality for adhering to congestion control guidelines for fair bandwidth sharing. The framework is developed for both unicast and multicast cases, and is presented along with optimization algorithms and simulation results.

Part C considers real-time video delivery in mobile ad-hoc networks. As this type of network exhibits rather harsh characteristics in terms of throughput, packet losses and mobility-induced route losses, new solutions are required. The approach taken in this work is based on a distributed rate-distortion optimization framework, where multiple sources are used concurrently. The system uses scalable video coding and rateless channel codes in order to allow for uncoordinated sources and distributed optimization. The complete system is implemented in a network simulator, and is shown to exhibit considerable performance gains compared to previously proposed systems.

Advisor(s): Professor Andrew Perkis

SIG MM member(s): Andrew Perkis

ISBN number: 978-82-471-7300-8


Centre for Quantifiable Quality of Service in

The main focus of the research being conducted at the Centre is, as the name implies, quantifiable aspects of Quality of Service (QoS). Here the term QoS should be understood as having a perhaps unusually broad scope.

Indeed, much of the research conducted at the Centre is based on the insight that QoS must involve as diverse factors as user experience, perception, expectation, dependability and security - in addition to the more classical packet forwarding statistics.

In terms of expertise, the Centre is comprised of experts within the fields of information security, dependability, networking, performance, and audiovisual signal processing. While these topics traditionally are treated separately, research at the Centre is inherently cross-disciplinary. Certainly, when QoS is defined as above, the integration of these relatively diverse fields of competence is a necessity.

An important part in ongoing research within the assessment of QoS is case studies, measurements and tests. To this end, the Centre uses its own lab infrastructure as well as labs located at the Department of Electronics and Telecommunications, Department of Telematics (both at NTNU) and UNINETT.

Whereas the mentioned lab facilities are of prime importance in the assessment of QoS, basic research is being conducted along more theoretical strands as well. Within the development of QoS, mechanisms for dynamic networks, resource allocation, dependability and security are major research topics. Considering security, important subtasks include the modelling of human behavior, risk assessment and cryptographic services. Research into dependability primarily concentrates on the development of technologies that allow for service differentiation, as well as accompanying methods of tailoring the QoS parameters of each service to its requirements. Work within resource allocation considers technologies for multipath environments, including monitoring, estimation and path selection.

An important prerequisite for being able to quantify and provide QoS is found in the processing done at endpoints and in gateways. Within network media handling, the provisioning of functionality within media representation is of prime interest. Research is being conducted within media compression and representation, transcoding and transmoding, loss resilience and monitoring of perceptual quality. Interactivity is an important keyword here, both between user and system and between the different technical components in the media delivery chain.

Freely available source code, traces and test content

GpuCV: open-source GPU-accelerated image processing and Computer Vision library


GpuCV is an open-source GPU-accelerated image processing and Computer Vision library. It offers a programming interface similar to that of Intel's OpenCV, which makes it easy to port existing OpenCV applications while taking advantage of the high level of parallelism and computing power available from recent graphics processing units (GPUs). GpuCV supplies a set of GPU-accelerated OpenCV-like operators that are ready to be used in OpenCV applications, as well as a framework for creating custom GPU-accelerated operators based on the OpenGL+GLSL or NVIDIA CUDA APIs. It has been developed by Institut TELECOM since 2005 and is distributed as free software under the CeCILL-B license.

FAU Salient Image Database


The FAU Salient image database consists of 1789 images organized into 14 categories. Each image is designed to have one or more salient objects in it. The photos were taken in a variety of locations (both indoors and outdoors) around the Boca Raton campus of Florida Atlantic University. Examples of salient object categories include stop signs, warning signs, tennis balls, and basketballs.

Calls for Contributions

SIGMM Sponsored and Co-sponsored Events

19th International Workshop on Network and Operating System Support for Digital Audio and Video

Full paper Deadline: February 9, 2009
Event location: Williamsburg, Virginia, USA
Event date: June 3-5, 2009
Sponsored by ACM SIG Multimedia

NOSSDAV 2009 will continue the workshop's long tradition of focusing on emerging topics, controversial ideas, and future research directions in the area of multimedia systems research. The workshop will be held in a setting that stimulates lively discussions among the senior and junior participants. It is an established practice for NOSSDAV to encourage experimental research based on real systems and data sets. Further, public availability of source code and data sets is highly encouraged. For NOSSDAV 2009, we would like to especially highlight two new topics of interest: novel use of GPUs for multimedia and multi-core processor support for multimedia.

Events held in cooperation with SIGMM

2nd International Conference on IMMERSIVE TELECOMMUNICATIONS

Full paper Deadline: November 1, 2008
Event location: University of California, Berkeley, USA
Event date: May 27-29, 2009
In cooperation with ACM SIG Multimedia

The aim of IMMERSCOM is to promote multi- and cross-disciplinary research on capturing, processing, analyzing, coding, communication and rendering of rich audio-visual content in order to enable remote immersive experiences of people, objects and environments. The body of technologies that enable such immersive experiences is collectively referred to as Immersive Telecommunications Technologies. Applications of these technologies are varied, and include tele-presence, industrial automation, health care, education, and entertainment. Many of these are beginning to be viewed as green technologies.

The Tenth Workshop on Mobile Computing, Systems, and Applications

Full paper Deadline: October 13, 2008
Event location: Santa Cruz, CA, USA
Event date: February 23-24, 2009
In cooperation with ACM SIG Multimedia

ACM HotMobile 2009, the Tenth Workshop on Mobile Computing Systems and Applications, continues the series of highly selective, interactive workshops focused on mobile applications, systems, and environments, as well as their underlying state-of-the-art technologies. HotMobile's small workshop format makes it ideal for presenting and discussing new directions or controversial approaches.

The 15th International MultiMedia Modeling Conference (MMM2009) - DEMO AND INDUSTRY EXHIBITION

Full paper Deadline: October 10, 2008
Event location: EURECOM, Sophia Antipolis, France
Event date: 7-9 January 2009
In cooperation with ACM SIG Multimedia

The International MultiMedia Modeling (MMM) Conference is a leading international conference for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all MMM related areas. The MMM Conference is inviting enterprises from different sectors to present their latest products and demos in the areas of multimedia and signal processing. The industry exhibition will take place during the conference, and companies interested in showcasing their products and/or demos will be granted some space on the conference demo premises at no cost.

Other multimedia-related Events

EURASIP Journal on Image and Video Processing, Special Issue on Dependable Semantic Inference

Full paper Deadline: November 1, 2008

Guest Editors: Alan Hanjalic (Delft University of Technology), Tat-Seng Chua (National University of Singapore), Edward Chang (Google Inc./ UC Santa Barbara), Ramesh Jain (UC Irvine)

The goal of this Special Issue is to inspire and gather new, refreshing ideas in the field of content-based multimedia information indexing, search and retrieval that would make the solutions in the field mature enough to get adopted in practical systems and realistic application scenarios. In particular, the Special Issue focuses on high-quality and original contributions that reach beyond conventional ideas and approaches and make substantial steps towards dependable, practically deployable semantic inference theories and algorithms.

Publication date: May 1, 2009

International Journal of Advanced Media and Communications (IJAMC), Special Issue on High-Quality Multimedia Streaming in P2P Environments

Full paper Deadline: December 1, 2008

Guest Editor: Mohamed Hefeeda (Simon Fraser University)

Peer-to-peer (P2P) multimedia streaming systems have received significant attention from academia and industry in the past few years. Because of the limited capacity and unreliability of peers, mechanisms are needed to efficiently manage the resources contributed by peers and to adapt to the dynamic nature of the network. This special issue is dedicated to addressing all research challenges related to enabling the streaming of high-quality multimedia content in dynamic P2P systems. Authors are invited to submit papers that have significant research contributions to this special issue.

Publication date: 2nd or 3rd quarter of 2009

The 7th ACM/USENIX International Conference on Mobile Systems, Applications, and Services (MobiSys 2009)

Full paper Deadline: November 26, 2008
Event location: Krakow, Poland
Event date: June, 2009

MobiSys 2009 seeks to present innovative and significant research on the design, implementation, usage, and evaluation of mobile computing and wireless systems, applications, and services. This conference, jointly sponsored by ACM SIGMOBILE and USENIX, builds on the success of the previous six conferences.

The 8th ACM/IEEE International Conference on Information Processing in Sensor Networks

Full paper Deadline: October 15, 2008 (Abstract submission)
Event location: San Francisco, USA
Event date: April 13-16, 2009

The International Conference on Information Processing in Sensor Networks (IPSN) is a leading, single-track, annual forum that brings together researchers from academia, industry, and government to present and discuss recent advances in sensor network research and applications. The conference covers both theoretical and experimental research, as it pertains to sensor networks, in a broad range of disciplines including signal and image processing, information and coding theory, databases and information management, distributed algorithms, networks and protocols, wireless communications, machine learning, and embedded systems design.

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), Special Issue on Knowledge Media

Full paper Deadline: December 1, 2008

Guest Editors: Christoph Rensing (University of Darmstadt), Abdulmotaleb El Saddik (University of Ottawa), Edward A. Fox (Virginia Tech / USA), Chua Tat-Seng (National University of Singapore)

The present and next decade are characterized by two trends: the ubiquitous presence of multimedia in everyone's everyday life, and the increasing importance of knowledge in our work and social lives. We are part of the Knowledge Society and we live in the Internet and Multimedia Age. Both trends will merge. Multimedia is currently facing many opportunities to record, store and deliver knowledge. Knowledge Management and Technology enhanced learning are facing the need to cope with the opportunities that multimedia presents. Knowledge Media is becoming one of the most important application domains for Multimedia Technologies, besides end consumer usage. At this stage, educational scenarios push the development of multimedia applications and systems in many cases.

Publication date: November 2009

18th International World Wide Web Conference (WWW2009)

Full paper Deadline: November 3, 2008
Event location: Madrid, Spain
Event date: April 20 - 24, 2009

Topics of discussion will include Browsers and User Interfaces, Data Mining, Industrial Practice and Experience, Internet Monetization, Mobility, Performance and Scalability, Rich Media, Search, Security and Privacy, Semantic / Data Web, Social Networks and Web 2.0, Web Engineering, XML and Web Data. The conference will also feature plenary speeches by renowned speakers, and tracks devoted to developers and to recent W3C activities that are of interest to the community.

Special Issue on MMOG Systems and Applications

Full paper Deadline: January 15, 2009

Guest Editors: Shervin Shirmohammadi (University of Ottawa), Marc Claypool (Worcester Polytechnic Institute)

This is a special issue on MMOG Systems and Applications, in Springer's Multimedia Tools and Applications journal. It will cover enabling technologies and systems for MMOGs, as well as applications of MMOGs outside of gaming. The focus will be mostly on the "massive" aspect of games.

Tentative publication date: late 2009

Award Opportunities

ERC Starting Grant

More info:

European Research Council Starting Independent Researcher Grants (ERC Starting Grants) aim to support up-and-coming research leaders who are about to establish or consolidate a proper research team and to start conducting independent research in Europe. The scheme targets promising researchers who have the proven potential of becoming independent research leaders. It will support the creation of excellent new research teams and will strengthen others that have been recently created.

GEM Fellowship

More info:

The National GEM Consortium's primary focus is to administer and award full fellowships with paid internships to highly qualified under-represented students who wish to pursue graduate studies in engineering or science. GEM's program activities, however, go beyond financial support by engendering student success in academic and professional environments. GEM has a solid success record in implementing effective programs to increase the recruitment, retention, and graduation of minority students.

NSF Graduate Fellowship Program

More info:

The National Science Foundation Graduate Research Fellowship Program recognizes and supports future leaders in the science, technology, engineering, and mathematics fields, and awards over 1,000 fellowships to outstanding students in these fields. Fellowships provide awardees with the freedom to pursue research projects of their own design.

Applicants must be US citizens, nationals or permanent resident aliens.

Job Opportunities

Assistant Professor

Employer: EURECOM, Sophia Antipolis (France)
Valid until: 31 December 2008
More info:

The new faculty will undertake research in close collaboration with the existing activities (Video analysis and information filtering, 3D Face Cloning, watermarking and biometrics, Speech and sound processing) and will participate in the teaching program for Master students. A strong commitment to excellence in research and teaching is essential.


Carsten Griwodz, Simula Research Laboratory
Stephan Kopf, University of Mannheim
Jun Wang, University College London
Yi Cui, Vanderbilt University