|ACM SIG MM eNewsletter||ACM SIG MM webpage|
Dear Member of the SIGMM Community,
We are proud to present the second issue of the SIG Multimedia newsletter. This time around you will find information about exciting new research being performed by young researchers of the multimedia community and a variety of information from members of the community, with pointers to further information regarding this.
Since the first issue of this newsletter we have received a considerable amount of feedback from members of the community. This is greatly appreciated, and also highly motivating, as it shows that there is an interest in the community for continuing this work, and that there is enough dedication to warrant time spent on giving feedback. We wish to encourage this activity, as the advice and suggestions we receive from you are invaluable and can only help improve the quality of forthcoming issues. Naturally, we cannot be any less dedicated than the community, and as you might have noticed we have already made some changes, which we hope you will consider improvements. Among the most notable changes are the inclusion of only brief abstracts in the newsletter itself, with pointers to further reading on the newsletter web page, and an improved layout for the web page itself. We hope these changes are satisfactory, and of course, we are open to criticism.
One major news item in this issue is a report from the ACM Multimedia Conference 2007, by the conference chairs. It provides a detailed overview of the events that occurred during the conference, and as such, will provide those that did not have the opportunity to attend the SIG Multimedia's flagship conference with the information they might wish to obtain. It also provides a nice foundation for those who are curious about the conference and might want to get a better understanding of what to expect from future events. You can also read a summary of NOSSDAV 2007.
In addition to being strongly represented in this issue of the newsletter, the choice of this issue's featured paper has fallen to a paper presented at the ACM Multimedia Conference 2007. The paper is entitled "Rate Allocation for Multi-User Video Streaming over Heterogeneous Access Networks" and comes highly recommended to anyone that might have some interest in video streaming to heterogeneous access networks.
In related news, we are happy to refer you to a freely available online collection of audiovisual material provided by the European Broadcasting Union (EBU). This material is primarily intended for use in studies concerned with automatic information extraction, but it is freely available for any kind of research. As such, there is a high probability that this collection might be of use to you, either now or in the near future.
Finally, we have made available a number of summaries of the theses of recently graduated researchers. These theses represent the accumulated work of several years of research and might contain information relevant to you or your peers, and as such, might be worth taking a closer look at.
With kind regards,
Table of Contents
Rate Allocation for Multi-User Video Streaming over Heterogeneous Access Networks
The paper "Rate Allocation for Multi-User Video Streaming over Heterogeneous Access Networks" by Xiaoqing Zhu et al. was selected as one of the best papers of ACM Multimedia '07 and presented in September of the same year at the conference held at the University of Augsburg in Augsburg, Germany. The paper can currently be found in the ACM Digital Library.
This paper targets the challenging problem of optimizing rate allocation for multiple competing video sessions over a multitude of access networks. Recent years have witnessed the expansion of the Internet to a wide variety of access networks, such as cellular, WiFi, and WiMax. Yet these networks are highly heterogeneous in terms of available bit rate (ABR), round trip time (RTT), and other vital network attributes. Media streaming applications, in turn, depending on their purpose and the nature of the media content being transferred, often have different QoS requirements, such as delay bound and distortion rate. The authors formulate the rate allocation problem in a convex optimization framework whose goal is to minimize the sum of expected distortions of all participating video streams. The rate-distortion function is based on a parametric model, which includes parameters specifying allocated rate, coding scheme, and the content of the video. A distributed approximation to the optimization is presented to enable autonomous rate allocation at each device in a media- and network-aware fashion. In addition, a scheme based on H-infinity optimal control is proposed to address the scenario where media-specific information is not available. Through NS-2 simulation, these two schemes are compared against two heuristic schemes employing TCP additive-increase-multiplicative-decrease (AIMD) principles, and are shown to reduce the average packet loss ratio from 27% to below 2%, while improving the average received video quality by 3.3 to 4.5 dB in PSNR.
This paper is a timely work. "Internet in a pocket" and "video everywhere" are among the most promising and talked-about technology advancements of the near future. This paper sets out to address a key challenge in turning this vision into reality. As such, we highly recommend this paper to all our readers, and in particular to those with an interest in providing high-quality multimedia streaming in heterogeneous access networks. For these reasons, we have selected it as the featured paper of this newsletter.
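As a rough illustration of the kind of optimization summarized above, the following Python sketch minimizes the sum of distortions for several streams sharing one bottleneck. The distortion model D(R) = d0 + theta / (R - r0), the single shared capacity, and all function names are assumptions made for this illustration; they are not taken from the paper, which considers multiple access networks and a distributed solution.

```python
import math

def allocate_rates(streams, capacity):
    """Split `capacity` among streams to minimize the sum of distortions.

    Each stream is a tuple (d0, theta, r0) for the assumed model
    D(R) = d0 + theta / (R - r0), valid for R > r0.
    """
    spare = capacity - sum(r0 for _, _, r0 in streams)
    if spare <= 0:
        raise ValueError("capacity too small for the assumed model")
    # Equal marginal distortion theta / (R - r0)^2 across streams gives each
    # stream a share of the spare rate proportional to sqrt(theta).
    root_sum = sum(math.sqrt(theta) for _, theta, _ in streams)
    return [r0 + spare * math.sqrt(theta) / root_sum for _, theta, r0 in streams]

def total_distortion(streams, rates):
    """Sum of the assumed per-stream distortions at the given rates."""
    return sum(d0 + theta / (r - r0) for (d0, theta, r0), r in zip(streams, rates))
```

Under this assumed model, a stream whose content is harder to encode (larger theta) receives a larger share of the spare rate, which is the media-aware behaviour the summary describes.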
ACM Multimedia 2007
ACM Multimedia 2007 was held at the campus facilities of the University of Augsburg in Augsburg, Germany, from September 24 to 29, 2007. It received a record number of paper submissions, attracted a very high number of attendees, and offered a varied, high-quality program. The main conference was attended by 388 participants, and the overall event (tutorials, main conference, or workshops) by 429 participants. 210 participants attended workshops, 20 participants attended one half-day tutorial, and 61 participants attended two half-day tutorials.
The conference included 7 half-day tutorials, for which we received excellent feedback from the attendees:
The main conference had a rich 3-day program. All activities started at 8:30am and ended at 5pm or 5:30pm. Over the past 15 years, the concept of multimedia has evolved, and the tutorials, main conference technical program, and workshops reflected this change. The conference covered a wide range of topics, from foundations of multimedia, through multimedia systems, networks, and multimedia interactions, to multimedia content and applications. In addition, the interactive art program was held at the University of Applied Sciences in Augsburg, with very exciting and outstanding exhibitions. The program included conference events such as four keynote addresses, long papers, short papers, brave new emerging topics, panels, doctoral symposium, technical demonstrations, open source competition, video demonstrations, research papers in interactive art and a very interesting multimedia art exhibition.
We had two research keynote speakers on Tuesday and Thursday morning as well as two application keynote speakers Wednesday morning. On Tuesday morning, Prof. Dr. Wahlster (DFKI) presented a very interesting mosaic of his research and views on "Smart Web - Multimodal Web Services on the Road". A day-long demo with a BMW motor-bike as well as a Mercedes-Benz R-Class vividly demonstrated the concepts of his keynote. The Thursday keynote speaker was Dr. Minoru Etoh from NTT DoCoMo who discussed some of the very interesting mobile multimedia challenges as well as shared his insights into future mobile multimedia applications. Dr. Fageth of CeWe Color, Europe's largest photo finisher, presented the latest production pipeline and ideas from consumers' digital files to tangible image products. The talk was full of interesting facts most of us are not aware of. Finally, Prof. Dr. Lutz Heuser from SAP shared his view on the workplace of the future.
There were 298 submissions of long papers (113 in Content, 90 in Applications, 64 in Systems, and 30 in Multimedia Interactions). After a rigorous review process 57 long papers were accepted: 19 in the Content track, 18 in the Applications track, 13 in the Systems track, and 7 in the newly introduced Multimedia Interactions track. This represents an acceptance rate of 19 percent. The papers were organized in 20 sessions, three papers per session.
Short Papers (Poster Session)
We received a record 255 short papers (a 43% increase over the previous year), of which 70 were selected. The acceptance rate was a record-low 27%, bringing the short paper acceptance rate close to that of the long papers. The short papers were presented in three poster sessions.
Brave New Topics (BNTs)
Based on the criteria that topics selected as BNTs must be new, brave, and likely to be of high interest and impact, the BNT committee selected only one topic from the submitted proposals:
The facilitators for the BNT topic were responsible for (a) inviting the potential authors as listed in the proposal to submit papers in appropriate subjects, (b) overseeing the review of papers, each with 2-3 reviewers, (c) selecting the papers in consultation with the BNT chairs, and (d) ensuring the authors submit the camera-ready papers on time. The BNT session was scheduled on Wednesday afternoon in parallel with the main paper sessions. As in previous years, the BNT session was well attended and received.
This year, the panel chairs proposed one panel session:
Panelists were: Alan Hanjalic (Delft University of Technology, The Netherlands), Alan Smeaton (Dublin University, Ireland), John R. Smith (IBM T.J. Watson Research Center, USA), Ramesh Jain (UC Irvine, USA), Svetha Venkatesh (Curtin University, Australia), Wei-Ying Ma (Microsoft Research Asia)
The panel session, with its interaction with the audience, was very successful and covered areas of concern. The session was very informative and interactive. Many diverse opinions were expressed, which made for interesting discussion among the panelists and between panelists and audience.
The doctoral symposium received 8 submissions, out of which 3 were selected for presentation at the conference. The review committee was Thomas Plagemann and Vera Goebel. The symposium was held on Thursday early afternoon (right after lunch) to ensure a strong attendance of the event.
We had a very strong program of technical demonstrations, and it was one of the highlights of the conference. The demonstrations were split into a Demo 1 session on Tuesday afternoon and a Demo 2 session on Wednesday morning, with a total of 26 demonstrations. In addition to these demos, our keynote speaker Prof. Dr. Wahlster brought in a BMW motor-bike as well as a Mercedes-Benz R-Class to demonstrate the concepts of his keynote "Multimodal Web Services on the Road".
Open Source Competition
We followed the initiative started at ACM Multimedia 2004 and organized the Open Source Competition. We had 10 very high quality entries, out of which 4 were accepted for presentation. A committee selected one as the winner. The winner gave a feature presentation and demonstration of his software during the conference. The winner was the "Programming Web Multimedia Applications with Hop" system by Manuel Serrano, which is a library for programming multimedia web applications. The audience very much enjoyed the features and the demo of this best Open Source Software as well as the presentation of the other accepted submissions:
The arts track was run as a mini-conference within the conference, with its own array of long track papers, short track papers and exhibition. The reviews of the papers followed more or less the same guidelines as the main conference. At the end of the review process, we accepted 9 art short papers (out of 20) and 7 art long papers (out of 15). There were multiple excellent interactive art program exhibits shown at the new science building of the University of Applied Sciences in Augsburg. The competition was high and only 16% of the submissions made it into the exhibition, i.e., 10 out of a total of 64 submissions were accepted. The tram ran every 10 minutes between the main conference site and the arts exhibition.
The long papers were divided into 3 sessions under the themes: (a) Pieces of Art, (b) Art Pieces, and (c) Fluid Art. The short art papers were presented in one poster session together with other technical posters. Like in previous years, the interactive art program was an integral part of the overall main conference program and the integration of the arts program into the main conference program brought the two groups very much together - the multimedia arts people on one hand and the multimedia technology people on the other. The synergy has been very good.
The conference continued with a strong awards program for best full paper, best short paper, best art contribution, best demonstration, and open source winner. The following awards were presented during the conference banquet under various categories using self-generated conference funds.
We included a presentation session at the conference where the four best paper nominees competed for the best full technical paper award. This year we did not differentiate between student and non-student papers for the best paper award. In the case that the overall best paper came from a student, the rule was that the second best student paper would win the best student paper award. An awards committee made up of senior researchers in the field met afterwards to select the winner, who was announced at the banquet.
The award winners this year were:
We were also very fortunate this year to receive from IBM 10 student travel grants, covering the full advanced registration fee ($300) as well as a conference banquet ticket ($100).
Workshops have always been an important part of the conference. This year the following six workshops were organized as part of the conference:
The list included one standing workshop that has been running for many years (MIR), one repeated workshop (HCM) and four new workshops. MIR pulled in the largest crowd of 63, while HCM had 25, MV 46, TVS 33, MS 21 and EMME 22 participants. The workshops were well-attended with the smaller workshops being highly interactive.
We had incredible support from the following sponsors: IBM, Yahoo! Research, Google, Microsoft Research (both locations Redmond and Beijing donated separately), Philips, SAP, FXPAL, CeWe, NEC, MAN, Ricoh, DoCoMo EuroLabs, the University of Applied Sciences in Augsburg and the University of Augsburg.
We are all looking forward to seeing you at ACM Multimedia 2008, which will be held in Vancouver, BC, Canada, Oct 27 - Nov 1, 2008. The General Co-chairs are Prof. Abdulmotaleb El Saddik and Prof. Son Vuong.
International Workshop on Network and Operating Systems Support for Digital Audio and Video
Over the years, NOSSDAV has become known as an excellent workshop due to the unique atmosphere it provides for discussion between students and senior researchers as well as the high caliber of its committee members.
NOSSDAV'07 was held at the University of Illinois, Urbana-Champaign campus. Building on its reputation, NOSSDAV'07 attracted excellent papers on a range of topics related to multimedia systems and multimedia networking. NOSSDAV'07 received 52 submissions with authors from 19 countries. Each submission was reviewed by three program committee members and occasionally by an external reviewer. To maintain a workshop atmosphere and provide sufficient time for discussion, the program committee selected only 18 papers for publication. The selected papers cover a range of interesting topics such as streaming and display, gaming, coding, measurement, IPTV and mobility. Prof. Ralf Steinmetz delivered a keynote address on "QoS in Wireless Mesh Networks: A Challenging Endeavor", and industry met academia in a panel discussion on "Large Scale Peer-to-Peer Streaming & IPTV Technologies".
PhD thesis abstracts
Adaptive media streaming over multipath networks
With the latest developments in video coding technology and fast deployment of end-user broadband internet connections, real-time media applications become increasingly interesting for both private users and businesses. However, the internet remains a best-effort service network unable to guarantee the stringent requirements of the media application, in terms of high, constant bandwidth, low packet loss rate and transmission delay. Therefore, efficient adaptation mechanisms must be derived in order to bridge the application requirements with the transport medium characteristics.
Lately, different network architectures, e.g., peer-to-peer networks, content distribution networks, and parallel wireless services, have emerged as potential solutions for reducing the cost of communication or infrastructure, and possibly improving application performance. In this thesis, we start from the path diversity characteristic of these architectures in order to build a new framework specific to media streaming in multipath networks. Within this framework we address important issues related to an efficient streaming process, namely path selection and rate allocation, forward error correction, and packet scheduling over multiple transmission paths.
First we consider a network graph between the streaming server and the client, offering multiple possible transmission paths to the media application. We are interested in finding the optimal subset of paths employed for data transmission, and the optimal rate allocation on these paths, in order to optimize a video distortion metric. Our in-depth analysis of the proposed scenario eventually leads to the derivation of three important theorems, which, in turn represent the basis for an optimal, linear time algorithm that finds the solution to our optimization problem. At the same time, we provide distributed protocols which compute the optimal solution in a distributed way, suitable for large scale network graphs, where a centralized solution is too expensive.
Next, we address the problem of forward error correction for scalable media streaming over multiple network paths. We propose various algorithms for error protection in a multipath scenario, and we assess the opportunity of in-network error correction. Our analysis stresses the advantage of being flexible in the scheduling and error correction process on multiple network paths, and emphasizes the limitations of possible real systems implementations, where application choices are limited. Finally, we observe the improvements brought by in-network processing of transmitted media flows, in the case of heterogeneous networks, when link parameters vary greatly.
Once the rate allocation and error correction issues are addressed, we discuss the packet scheduling problem over multiple transmission paths. We rely on a scalable bitstream packet model inspired by the media coding process, where media packets have different priorities and dependencies. Based on the concept of data pre-fetch, and on a strict time analysis of the transmission process, we propose fast algorithms for efficient packet scheduling over multiple paths. We ensure graceful degradation of the media at the client in adverse network conditions by careful load balancing among transmission paths, and by conservative scheduling which transparently absorbs undetected network variations or network estimation errors.
The final part of this thesis presents a possible system for media streaming where our proposed mechanisms and protocols can be straightforwardly implemented. We describe a wireless setup where clients can access various applications over possibly multiple wireless services. In this setup, we solve the rate allocation problem with the final goal of maximizing the overall system performance. To this end, we propose a unifying quality metric which maps the individual performance of each application (including streaming) to a common value, later used in the optimization process. We propose a fast algorithm for computing a close-to-optimal solution to this problem, and we show that, compared to other traditional methods, we achieve fairer performance that adapts better to changing network environments.
Balanced multiple description coding in image communications
The latest developments in video coding technology on one side, and a continuous growth in size and bandwidth of lossy networks on the other side, have undoubtedly created a whole new world of multimedia communications. However, today's networks, which are best-effort in nature, are unable to guarantee the stringent delay constraints and bandwidth requirements imposed by many of these applications. Therefore, the main challenge remains to find efficient coding techniques which do not require retransmission and which can ensure a good reconstruction quality if pieces of information are missing.
Multiple description coding (MDC) offers an elegant and competitive solution for data transmission over lossy packet-based networks, with a graceful degradation in quality as losses increase. In MDC, two or more representations of a source are generated in such a way that an acceptable quality is ensured even if only one description is received, while this quality further improves as more of them are combined. In this thesis, we address some important issues in MDC. One of them is how to generate an arbitrary number of descriptions, as it has been suggested by many researchers that having a scheme which adapts the number of descriptions to different lossy scenarios can be of great benefit. Another interesting problem is how to combine the principles of multiple description coding and increasingly popular redundant signal expansions, since they represent a natural candidate for MDC. Finally, our goal is to address the problem of designing a simple and efficient multiple description video coding scheme, which utilizes error resilience tools offered by the latest video coding standard, H.264/AVC.
We first address the generation of an arbitrary number of descriptions with the multiple description scalar quantization technique. Unlike the existing solutions, whose complexity drastically increases as the number of descriptions grows, our solution remains very simple and easily extendable to any number of descriptions. We show how the tradeoff between distortions can be easily controlled with very few parameters in our scheme. Finally, given the probability of losing a description and the total bitrate, we find the optimal number of descriptions which minimizes the average distortion, taken as a sum of distortions weighted by the corresponding probabilities.
Next, we address the multiple description coding problem with redundant dictionaries of functions, called atoms. Such dictionaries contain inherent redundancy, which can be efficiently exploited for MDC purposes. To do so, we cluster similar atoms together and represent each group by a molecule, taken as a weighted sum of the atoms in its cluster. Once a molecule is chosen as a good candidate in the signal representation, its children are distributed to different descriptions. To generate a description, we project a signal onto the sets of chosen atoms. This gives us the sets of coefficients, which have to be quantized before transmission. To do so, we propose an adaptive quantization strategy which takes into account the importance of each atom, the properties of the dictionary and the expected loss probability. We apply these principles to an image communication scenario, where we use a modified version of the matching pursuit algorithm to extract the most important information about an image on the level of molecules, and the less important candidates on the level of atoms. The redundancy in our scheme is controlled by the number of descriptions and the number of elements taken from the level of molecules.
Finally, we propose a standard-compatible two-description video coding scheme which uses redundant pictures, an error resilience tool included in H.264/AVC, to improve the robustness to losses. In our implementation, redundant pictures are coarse versions of primary pictures and they are used to replace their possibly lost parts. If a primary picture is correctly received, its redundant version is normally discarded by the decoder. We propose a distortion model which, given the total bitrate and the network loss rate, tells us how to split the total rate between primary and redundant pictures such that the average distortion at the receiver is minimized. We show that at low loss rates it does not make much sense to spend bits on redundant pictures, since the probability they will be used as a replacement for primary pictures is low. On the other hand, as the loss rate increases, having a good quality of redundant pictures becomes more beneficial. Finally, we show how the reconstructed quality can be further improved if we combine the reconstructions from both primary and redundant pictures.
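The selection of the number of descriptions described above can be sketched as a small search over a probability-weighted average distortion. In this Python sketch, the assumption that descriptions are lost independently with the same probability, the caller-supplied `distortion(k, n)` model, and the function names are all hypothetical; the thesis's own distortion model is not given in this abstract.

```python
from math import comb

def expected_distortion(n, loss_prob, distortion):
    """Average distortion when each of n descriptions is lost independently.

    `distortion(k, n)` is a caller-supplied (hypothetical) model returning
    the distortion when k of the n descriptions arrive.
    """
    return sum(
        comb(n, k) * (1 - loss_prob) ** k * loss_prob ** (n - k) * distortion(k, n)
        for k in range(n + 1)
    )

def best_number_of_descriptions(max_n, loss_prob, distortion):
    """Return the n in 1..max_n with the smallest expected distortion."""
    return min(range(1, max_n + 1),
               key=lambda n: expected_distortion(n, loss_prob, distortion))
```

Any model of how a fixed total bitrate is shared among n descriptions can be plugged in through `distortion`; the search itself only weights each outcome by its binomial probability.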
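For readers unfamiliar with matching pursuit, the following Python sketch shows the plain greedy algorithm over a dictionary of unit-norm atoms. It is the textbook version, not the modified molecule-level variant used in the thesis, and the function names are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition of `signal` over unit-norm `atoms`.

    Returns a list of (atom_index, coefficient) pairs and the final residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Pick the atom most correlated with what is still unexplained.
        idx = max(range(len(atoms)), key=lambda i: abs(dot(residual, atoms[i])))
        coeff = dot(residual, atoms[idx])
        picks.append((idx, coeff))
        # Remove that atom's contribution from the residual.
        residual = [r - coeff * a for r, a in zip(residual, atoms[idx])]
    return picks, residual
```

Because a redundant dictionary usually offers several atoms that explain a given signal component almost equally well, near-equivalent atoms can be assigned to different descriptions, which is the redundancy the abstract refers to.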
Relevance Models for Collaborative Filtering
Collaborative filtering is the common technique of predicting the interests of a user by collecting preference information from many users. Although it is generally regarded as a key information retrieval technique, its relation to the existing information retrieval theory is unclear. This thesis shows how the development of collaborative filtering can gain many benefits from information retrieval theories and models. It brings the notion of relevance into collaborative filtering and develops several relevance models for collaborative filtering. Besides dealing with user profiles that are obtained by explicitly asking users to rate information items, the relevance models can also cope with the situations where user profiles are implicitly supplied by observing user interactions with a system. Experimental results complement the theoretical insights with improved recommendation accuracy for both item relevance ranking and user rating prediction. Furthermore, the approaches are more than just analogy: our derivations of the unified relevance model show that popular user-based and item-based approaches represent only a partial view of the problem, whereas a unified view that brings these partial views together gives better insights into their relative importance and how retrieval can benefit from their combination.
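To make the two partial views mentioned above concrete, the following Python sketch predicts a rating once from similar users and once from similar items. It is a plain memory-based illustration using cosine similarity, not the relevance models developed in the thesis; the data layout (a dict mapping users to their item ratings) and the function names are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    common = set(a) & set(b)
    num = sum(a[k] * b[k] for k in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    pairs = [(cosine(ratings[user], r), r[item])
             for u, r in ratings.items() if u != user and item in r]
    total = sum(s for s, _ in pairs)
    return sum(s * v for s, v in pairs) / total if total else None

def predict_item_based(ratings, user, item):
    """Similarity-weighted average of the user's ratings for other items."""
    def column(i):
        return {u: r[i] for u, r in ratings.items() if i in r}
    target = column(item)
    pairs = [(cosine(target, column(i)), v)
             for i, v in ratings[user].items() if i != item]
    total = sum(s for s, _ in pairs)
    return sum(s * v for s, v in pairs) / total if total else None
```

The two functions look at the same rating matrix along different axes, which is why each captures only part of the available evidence.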
A Biometrics System based on Haptics for User Authentication in Virtual Environments
Haptics is the discipline that deals with the study of the complex sense of touch as an interface between human beings and machines. Consequently, haptic technology facilitates the exploration of and the interaction with the virtual world through touch. This can be done by utilizing special electro-mechanical interface devices equipped with specialized sensors. Haptic technology is being applied in many fields, such as scientific visualization, surgical procedures, education and arts.
This research investigates the issues related to the usage of haptics as a mechanism to extract behavioral features that characterize a biometric identifier system. In order to test this possibility, we designed a haptic-biometric system in which position, velocity, force, angular orientation of the end-effector and torque data, among other attributes extracted from given haptic interfaces, are continuously measured and stored as users perform a specific task. We first proved the applicability of haptic technology to analyzing the characteristics of human hand movement patterns while users are interacting with a particular haptic device. Then, we analyzed the information content of the haptic data generated directly from the haptic interface in order to select those attributes of the greatest user-classificatory worth. For example, physical attributes such as force, angular orientation of the end-effector, and torque are presumed to provide valuable information content pertaining to a user's unique behavior when he/she interacts with haptics in a Virtual Environment (VE). To analyze the information content, a new concept called entropic signature has been introduced, which characterizes both the way in which an individual is unique and the magnitude of that uniqueness when using haptics in a VE. Consequently, through a series of experiments based on the entropic signature approach, the suitability of haptics for authentication based on behavioral haptic biometrics was shown.
Moreover, we provided evidence that our proposed haptic-based biometric system is better suited to biometric verification mode than to biometric identification. In addition, we conducted further testing in order to discover individuals' abilities and to evaluate their psychomotor patterns through selected tasks under different conditions such as stress, repeatability and concentration. Finally, an important and novel advantage of this research versus current authentication modes (biometric and non-biometric) is the fact that authentication can be carried out at any given moment during the multimodal human-computer interaction process.
Automatic Classification of TV Genres (submitted)
Over the last decades, media have become deeply ingrained in our lives, due to the rapid evolution of the Information Technology (IT) industry. To efficiently access and retrieve desired information, tools for high-level multimedia documentation are indispensable. In this context, automatic genre classification provides a simple and effective solution to describe video contents in a structured and well understandable way. In this thesis, a method for classifying and characterising the genre of television programmes is presented.
All classification problems require a formal metadata representation. Nowadays, the Multimedia Content Description Interface (MPEG-7) represents the state of the art in multimedia content description. Its major drawback is that it does not include formal semantics. Consequently, we have developed an abstract taxonomy, which provides a technology-independent representation of multimedia content and allows the use of knowledge-based techniques based on Semantic Web technologies and ontologies. The taxonomy for broadcast TV programmes is modelled by a UML class diagram, which shows the properties of each information data channel and their interactions with each other.
In our experimental work four groups of features, which include both low-level perceptual descriptors and higher level audiovisual semantic information, are considered. The discriminatory power of these features for the classification of seven genres (i.e. News, Commercials, Football, Talk-Shows, Weather Forecasts, Cartoons and Music) is analysed using two different classification strategies.
In the first strategy a crisp classifier based on Artificial Neural Networks (ANNs)is considered. ANNs are successfully used in pattern recognition and machine learning due to their ability of producing fast and more accurate (w.r.t. classical statistical pattern recognition methods)evaluation of unknown data. In addition, ANNs do not require any a priori assumption on the statistical distribution of the recognised genres and are robust to noisy data.
In the second strategy the Fuzzy C-Means (FCM) clustering algorithm is employed to build a Fuzzy Embedding Classifier (FEC), which allows to model the intrinsic uncertainty underlying human judgements about genres (e.g. entertainment). The FEC system has been conceived not only to determine the membership degree of a TV programme to each of the considered genres but, and above all, to obtain a characterisation of multimedia contents, possibly also making explicit any possible ambiguity in the classification process (i.e. when a TV programme could be categorised as more than one genre, such as infotainment or docudrama) or, conversely, with the capability of rejecting attributions which are not sufficiently strong.
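The membership degrees and the rejection option mentioned above can be illustrated with the standard Fuzzy C-Means membership formula. This Python sketch assumes the cluster centers are already known; the fuzzifier m = 2, the rejection threshold, and the function names are illustrative assumptions rather than details of the FEC system.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Membership degree of `point` in each cluster, given fixed centers.

    Uses the standard FCM formula u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)).
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]

def label_or_reject(memberships, threshold=0.6):
    """Return the index of the strongest cluster, or None if it is too weak."""
    best = max(range(len(memberships)), key=lambda j: memberships[j])
    return best if memberships[best] >= threshold else None
```

A programme lying between two genre centers receives comparable memberships in both, so the ambiguity stays visible instead of being hidden behind a single hard label.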
The experimental evaluations showed that both approaches are very effective when applied to a real-world scenario of more than 100 hours of complete TV programmes, achieving a classification accuracy of about 95% in the main classification task.
Automatic Summarization of Narrative Video
Due to advances in video compression, digital storage, and connectivity, the amount of video content available to every user is increasing rapidly. Automatically generated summaries can support users in browsing large video archives and in making decisions more efficiently about choosing, purchasing, sharing, or deleting content. We present a model and a method for the automatic generation of video previews, based on formulating the task as a constrained optimization problem. The method consists of two main parts. The first part comprises a set of multimedia content analysis algorithms used to analyze and characterize the content to be included in a video preview. The second part provides a solution to the optimization problem by means of local search. To validate our approach we have conducted a user evaluation study. The results show that previews generated by our method are not as good as manually made previews, but have higher quality than previews created without taking content properties into consideration.
The amount of digital video content available to users is rapidly increasing. Developments in computer, digital network, and storage technologies all contribute to broadening the supply of digital video. Only users' attention and time remain scarce resources. Users face the problem of choosing the right content to watch among hundreds of potentially interesting offers.
Video and audio have a dynamic nature: they cannot be properly perceived without considering their temporal dimension. This property makes it difficult to get a good idea of what a video item is about without watching it. Video previews aim to solve this issue by providing compact representations of video items that can help users make choices in massive content collections. This thesis is concerned with solving the problem of automatic creation of video previews.
To allow fast and convenient content selection, a video preview should meet more than thirty requirements that we have collected by analyzing related literature on video summarization and film production. The list has been completed with additional requirements elicited by interviewing end-users, experts and practitioners in the field of video editing and multimedia. This list represents our collection of user needs with respect to video previews.
The requirements, presented from the point of view of the end-users, can be divided into seven categories: duration, continuity, priority, uniqueness, exclusion, structural, and temporal order. Duration requirements deal with the durations of the preview and its subparts. Continuity requirements request video previews to be as continuous as possible. Priority requirements indicate which content should be included in the preview to convey as much information as possible in the shortest time. Uniqueness requirements aim at maximizing the efficiency of the preview by minimizing redundancy. Exclusion requirements indicate which content should not be included in the preview. Structural requirements are concerned with the structural properties of video, while temporal order requirements set the order of the sequences included in the preview.
Based on these requirements, we have introduced a formal model of video summarization specialized for the generation of video previews. The basic idea is to translate the requirements into score functions. Each score function is defined to have a non-positive value if a requirement is not met, and to increase depending on the degree of fulfillment of the requirement. A global objective function is then defined that combines all the score functions and the problem of generating a preview is translated into the problem of finding the parts of the initial content that maximize the objective function.
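As a minimal sketch of this idea in Python, the example below turns two requirements into score functions and combines them into a global objective. The abstract does not state how the score functions are combined or how they are defined, so the weighted sum, the two example scores, the `Segment` fields, and all parameter values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float     # seconds
    end: float       # seconds
    priority: float  # hypothetical content-importance descriptor in [0, 1]

    @property
    def duration(self):
        return self.end - self.start


def duration_score(preview, target=30.0, tolerance=10.0):
    """1.0 at the target length, decreasing linearly; non-positive
    once the total length misses the target by `tolerance` or more."""
    total = sum(s.duration for s in preview)
    return 1.0 - abs(total - target) / tolerance


def priority_score(preview):
    """Duration-weighted mean priority; 0.0 (not met) for an empty preview."""
    total = sum(s.duration for s in preview)
    if total == 0:
        return 0.0
    return sum(s.priority * s.duration for s in preview) / total


def objective(preview, weights=(1.0, 1.0)):
    """Assumed combination rule: weighted sum of the score functions."""
    scores = (duration_score(preview), priority_score(preview))
    return sum(w * s for w, s in zip(weights, scores))


good = [Segment(0, 15, 0.8), Segment(40, 55, 0.4)]
too_long = [Segment(0, 60, 0.5)]
print(objective(good), objective(too_long))
```

Generating a preview then amounts to searching for the set of segments with the highest objective value; a preview that violates the duration requirement is penalised through its non-positive score.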
Our solution approach is based on two main steps: preparation and selection. In the preparation step, the raw audiovisual data is analyzed and segmented into basic elements that are suitable for inclusion in a preview. The segmentation of the raw data is based on a shot cut detection algorithm. In the selection step, various content analysis algorithms are used to perform scene segmentation and advertisement detection, and to extract numerical descriptors of the content that, when introduced into the objective function, make it possible to estimate the quality of a video preview. The core part of the selection step is the optimization step, which consists of searching the space of all possible previews for the set of segments that maximizes the objective function. Instead of solving the optimization problem exactly, an approximate solution is found by means of a local search algorithm using simulated annealing.
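The selection step can be sketched in Python as a simulated annealing search over subsets of segments. This is only an illustration of the general technique: the toy objective, the neighbourhood (toggling one segment in or out), the geometric cooling schedule, and all parameter values are assumptions, not the settings used in the thesis.

```python
import math
import random


def objective(selected, segments, target=30.0):
    """Toy objective: reward included value, penalise deviation from target length.
    Each segment is a (duration_seconds, value) pair; both are hypothetical."""
    total = sum(segments[i][0] for i in selected)
    value = sum(segments[i][1] for i in selected)
    return value - abs(total - target) / 10.0


def anneal(segments, steps=5000, t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    current = frozenset()
    current_score = objective(current, segments)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Neighbour: toggle one randomly chosen segment in or out of the preview.
        i = rng.randrange(len(segments))
        candidate = current ^ {i}
        candidate_score = objective(candidate, segments)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return sorted(best), best_score


segments = [(10, 0.9), (10, 0.2), (10, 0.8), (10, 0.7), (10, 0.1)]
chosen, score = anneal(segments)
print(chosen, score)
```

Accepting some worsening moves while the temperature is high lets the search escape local optima that a purely greedy local search would get stuck in; the best preview seen so far is kept and returned.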
We have performed a numerical evaluation of the quality of the solutions generated by our algorithm with respect to previews generated randomly or by selecting segments uniformly in time. The results on thirty content items have shown that the local search approach outperforms the other methods. However, based on this evaluation, we cannot conclude that the degree of fulfillment of the requirements achieved by our method satisfies the end-user needs completely. To validate our approach and assess end-user satisfaction, we conducted a user evaluation study in which we compared six aspects of previews generated using our algorithm to human-made previews and to previews generated by subsampling. The results have shown that previews generated using our optimization-based approach are not as good as manually made previews, but have higher quality than previews created by subsampling. The differences between the previews are statistically significant.
Awards for SIGMM members
Alexandre Francois and Elaine Chew granted Fellowship at the Radcliffe Institute
Elaine Chew and Alexandre Francois (ACM members, University of Southern California) are 2007-2008 Fellows at the Radcliffe Institute for Advanced Study, where they form a research cluster on Analytical Listening through Interactive Visualization -- http://www-rcf.usc.edu/~mucoaco/Radcliffe . Selected from a pool of more than 775 applicants, the 52 fellows are a diverse group of distinguished and emerging scholars and artists.
Francois specializes in software architectures for interactive systems; he created the Software Architecture for Immersipresence -- open source middleware available from links at http://pollux.usc.edu/~afrancoi . Chew is an operations researcher and musician (pianist), who builds mathematical/computational models of music and its performance -- http://www-rcf.usc.edu/~echew .
As part of their cluster activities, the duo has organized:
A half-day symposium featuring state-of-the-art, award-winning systems for computer-assisted composition, human-machine improvisation, and automated accompaniment, juxtaposing four unique viewpoints on intelligent, interactive music systems:
A lecture-recital showcasing their MuSA.RT real-time music analysis and visualization system, and highlighting the mathematical techniques employed in contemporary compositions:
Freely available source code, traces and test content
EBU P/SCAIE Test Audiovisual Material
In the context of its technical activities, the EBU (European Broadcasting Union) makes available a data set of several digitised television programmes from the archives of some European broadcasters, downloadable for free and usable for any kind of research purpose. Researchers interested in testing their technologies in a real-life scenario can ask the organisers of the AIEMPro08 workshop, an EBU-sponsored event held in conjunction with DEXA 2008, for the download details. To do so, click on the "Material" link in the menu on the AIEMPro08 site. Stay tuned for the availability of new programmes!
Calls for Contributions
ACM Multimedia (Video Program)
Full paper Deadline: June 6, 2008 (Video Demo)
ACM Multimedia 2008 invites your participation in the premier annual multimedia conference, covering all aspects of multimedia computing: from underlying technologies to applications, theory to practice, and servers to networks to devices.
Workshop on Network and Systems Support for Games (NetGames 2008)
Full paper Deadline: June 15, 2008
The NetGames workshop brings together researchers and developers from academia and industry to present new research in understanding current networked games and in enabling the next generation of networked games. Submissions are sought in any area related to networked games.
Multimedia Computing and Networking
Full paper Deadline: June 16, 2008
The multimedia computing and networking conference brings together researchers, practitioners and developers to contribute new ideas in all facets of multimedia systems, networking, applications, and other related areas of computing. Traditionally the conference features presentations of full and short papers, a keynote talk, and a panel of experts. Presenters are encouraged to make multimedia presentations and demonstrate their proposed solutions.
The 15th International MultiMedia Modeling Conference (MMM 2009)
Full paper Deadline: July 6, 2008
The International MultiMedia Modeling (MMM) Conference is a leading international conference for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all MMM related areas.
3rd Pacific-Rim Symposium on Image and Video Technology (PSIVT)
Full paper Deadline: August 11, 2008
This symposium will provide a forum for presenting and exploring the newest research and development in image and video technology. The symposium will bring together technical issues, theory and practice, artistic and consumer innovations, and invites researchers, artists, developers, educators, performers, and practitioners of image and video technology.
3rd International Conference on Semantics And digital Media Technology (SAMT 2008)
Full paper Deadline: June 13, 2008
The conference aims to attract authors of, and an audience for, scientifically valuable research work tackling the semantic gap between the low-level signal data representation of multimedia material and the high-level meaning that providers, consumers and prosumers associate with multimedia content.
Workshop on Embedded Middleware for Smart Camera and Visual Sensor Networks (eMCAM)
Full paper Deadline: June 20, 2008
The aim of this workshop is to stimulate research in the area of middleware for smart camera and visual sensor networks. Papers addressing all theoretical and practical aspects such as design methods, modelling, verification, tools and services are solicited. Submissions describing prototypes, applications, case studies or deployments are particularly encouraged.