CS696: Computer Science Perspectives Fall 2008 Project Proposal Guidelines


The CS696 course on Computer Science Perspectives requires a project proposal from each student. The proposal should be based on a prospective project with a member of the faculty. The project does not need to be implemented by the end of the semester; only the proposal must be written. Indeed, most projects should not be implemented before it is clear how the project will be completed.

The proposal must be turned in three times, at the beginning of October, November, and December, respectively, as specified by the instructor. A template for the proposals is below. This is not the only format for a scientific article, but it is a very common one and is encouraged for this class. If you think your project does not fit this template, you may use templates that you find in existing published work.

Each draft should include:
1. A single paragraph for each of the bullet points listed below
2. Real citations of real work
3. Figures and images explaining the approach section. For example, provide an architectural diagram. Although you will not have implemented your system, write the proposal as if it had been implemented.
4. Real graphs for your results. Although you will not have performed your experiment, draw graphs that show the data you would expect. Remember that each graph should have an x-axis, a y-axis, and at least two curves (because you are comparing at least two systems).

Example paragraphs for the bulleted items in the template are shown after the template. They are included to show the level of detail expected in the project proposals. The template example draws from the Geoblogging research project that we discussed in detail in class. Project proposals should include diagrams and graphs not shown in the template example, and should fill in details where the example uses an ellipsis or "et cetera".

Template for Course Project Proposals (...one of many possible templates)

I Introduction
- Provide background and the context of the problem, e.g. current trends
- Clearly state the problem, including to whom and why it is important
- Identify concrete goals and quantitative metrics
- Describe the obvious / baseline solution, and why it is not good enough
- State a hypothesis

II Background and Related Work
- Define all terms and notation
- Identify all related problems, and their existing solutions
- Describe all previous solutions to this problem, and why each does not address your goals

III Approach
- Provide a summary of the approach
- Describe each technical subcomponent in detail

IV Experimental Design
- Describe the hardware, testbed, or simulation to be used
- Identify all experimental parameters
- Describe how the baseline or comparison system is implemented
- Identify independent variables and how they will be manipulated
- Identify dependent variables and how they will be measured
- Describe how ground truth will be measured

V Experimental Results
- Summarize trends and results
- Explain each result individually, along with graphs and statistical analysis

VI Conclusions
- Explain how the results prove/disprove your hypothesis
- Identify the implications of this experiment for the world

Common mistakes
- Did not identify all important metrics
- Experiment did not actually test all metrics
- Goals or metrics are not clearly stated until the Experimental Design section
- Existing solutions are described, but not differentiated from the proposed solution
- No baseline, or an inappropriate baseline, is used for comparison
- Experimenter does not plan ahead how to collect ground truth
- Hypothesis does not quantitatively relate the proposed solution to the baseline in terms of the metrics

Examples of Template Bullet Items

I Introduction

Blogging is the process of publishing short notes on the Internet. It has recently become extremely popular; some experts estimate that blogging constitutes XX% of new daily content on the Internet, and it is currently the fastest way of disseminating new information [citation]. Current technological trends are making it possible for anybody to collect audio, video, text, images, etc. anywhere on the planet using a cell phone. This will enable what we call geoblogging: writing short notes about physical objects or locations. Geoblogging promises to be a vehicle for a new wealth of user-generated information about the physical world.

However, it is extremely difficult to manage such huge quantities of geographically tagged data. Bloggers cannot easily produce geoblogs and readers cannot easily find the content they desire. This is an important problem that will significantly hinder the popularity of geoblogging; studies have shown that blogging grew XX% after technological advances such as RSS feeds made it easier to find new content.

The goal of this project is to design a cell phone-based geoblogging system with which users can easily record notes about a specific location in any media format, and which automatically disseminates the data to interested parties. We will design the system to optimize for two goals: (i) minimize the user effort required to create or read geoblogs and (ii) maximize the average desirability of the geoblogs that are disseminated to the user.

The best way to share geoblogs with current technology is to post them on web servers and to find them through a search engine. The search terms could include key terms such as the object and/or street address, and better search terms may produce better results. However, this approach does not scale with the number of geoblogs; as more entries are added to the system, it becomes more difficult to find desirable entries.
Our system will stream geoblogs to each user based on statistical models of the content, the listener, and the provider. The listener will have a single button with which to fast forward past unwanted geoblogs. We expect this single button to be enough to build our statistical models while requiring minimal effort from the user. Furthermore, because these models improve as the system is used, we hypothesize that this approach will automatically scale to a large number of entries as long as the number of users grows with the number of producers, producing more desirable entries on average than a web search.

II Background and Related Work

A blog is a list of short notes or references that is organized by the order in which the owner of the blog posted them. Typically, older entries are archived. Etc...

Finding blogs is different from finding geoblogs. Most blogs are found through the BlogFinder service or Bloogle. However, these services can only find blogs on particular topics; they do not allow the user to search for individual blog entries, as we would like to do with geoblogging. In that sense, finding geoblogs is more similar to finding entries in a chatroom or newsgroup, where users search for individual entries or threads. Unlike geoblog search, however, that problem can be solved through standard search engines such as Google because the entries are text-based and are not indexed by geolocation.

Several solutions have recently been proposed for searching multimedia objects, and for searching by geographic location. For example, here we describe technologies behind Google Local, image search, etc. in detail. However, most of these technologies present a set of results, and do not address the geoblogging problem of finding the single best entry to provide to a user based on their personal interests. This problem is addressed in part by collaborative filtering techniques such as those used by Amazon, Netflix, etc. Our approach builds upon these previous techniques by combining collaborative filtering with technologies for geographic indexing and multimedia search.

III Approach

Our geoblogging system will allow people to easily record geoblogs through an iPhone-based geoblogging interface. Users will then be able to query for geoblogs based on their location, their profile, a set of query parameters such as radius, and multimedia query objects such as text, images, or sounds. The system will return one geoblog at a time, which the user can view or skip. Based on the user's view/skip behavior, the system will automatically infer parameters about the user, the geoblog, and the geoblogger that represent content, quality, style, genre, etc. As the system improves these parameters, it will use them to improve future search results.

The geoblogging interfaces: iPhone libraries and implementation, user interface issues, etc.

The geoblog viewer: iPhone libraries and implementation, user interface issues, etc.

The parameter inference system works as follows: Bayesian inference, graphical models, multimedia search, geographic indexing, etc.

IV Experimental Design

To test our system, we built a geoblogging simulator so that the true parameters of each viewer, geoblog, and geoblogger can be directly observed. In our simulator, each geoblogger is assigned a set of parameters in the range [0,1] that indicate interest, style, quality, and frequency of geoblogging. Each geoblogger then moves through space using a random waypoint model and creates geoblogs with the specified frequency. The parameters of each geoblog produced are derived by adding Gaussian noise to the parameters of the geoblogger who produced it. Geoblog viewers are similarly given a set of parameters and move through space requesting geoblogs.
They skip any geoblog given to them whose parameters have a Euclidean distance greater than 0.5 from their own personal parameters and context. In our simulation, the world is 100km x 100km, and the users move at a rate of 3km/hr on average. The frequency of geoblog generation ranges from 1 geoblog per hour to 1 geoblog per day. The frequency of viewing is.. etc.

We compared our system with standard techniques for geographically indexed multimedia search by running each simulation twice, where the second simulation does not infer the viewer, geoblog, or geoblogger parameters. Thus, the second simulation is a baseline implementation of a geoblogging system that uses simple multimedia search and geographic indexing. When this approach produces more than one geoblog in response to a query, it randomly chooses one to provide to the user.

We ran each simulation multiple times, changing both the average frequency of geoblog generation and the average frequency of geoblog viewing. Thus, both of these variables were modified independently to vary the ratio of geoblogs to viewing instances.

In each simulation, we measured the average Euclidean distance from the parameters of the viewer to the parameters of the geoblogs that the viewer was given by the system. The Euclidean distance is calculated using the formula \sqrt{ \sum_{i=0}^n (v_i - b_i)^2 }, where v_i and b_i are the i-th parameters of the viewer and of the geoblog, respectively. We compared our results to the average Euclidean distance produced by an oracle geoblog indexer, which always produced the optimal geoblog in response to any query.

V Experimental Results

The results of our simulation show that our system works better than the baseline system and approaches the optimal solution as the number of viewers increases. However, the results approach those of the baseline system when the number of geoblogs far outnumbers the number of viewers.

The first graph below shows the results when the number of geoblogs produced is high. In this graph, the number of viewers per geoblog is increased from 1 to 10, and the average Euclidean distance produced by our system decreases from values similar to the baseline solution and approaches the values of the optimal solution.

[Graph: average Euclidean distance (y-axis) versus number of viewers per geoblog (x-axis, 1 to 10); curves: baseline, our system, optimal]

In the second graph below, the number of geoblogs is increased while the number of viewers is held constant. These results show that our system approaches the baseline solution.

[Graph: average Euclidean distance (y-axis) versus number of geoblogs per viewer (x-axis, 1 to 10); curves: baseline, our system, optimal]

VI Conclusions

Our results indicate that this approach automatically scales to a large number of entries as long as the number of users grows with the number of producers, producing more desirable entries on average than a web search. Etc...

This system will make it possible for people to collect, index, and distribute geoblogs, providing a new mechanism for distribution of valuable information about the physical world. This will improve the quality of life for tourists and residents alike, will open new channels for advertisements and therefore improve market efficiency, and is expected to produce $7M in revenue for broadband telecommunications providers.
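
The measurement described in the Experimental Design example can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions, not the project's simulator: all function names, the noise level `sigma`, the clamping of noisy parameters to [0,1], and the use of four parameters per entity are hypothetical choices. It covers the Euclidean distance metric, the viewer's skip rule (distance greater than 0.5), the baseline that picks a random candidate, and the oracle that always picks the closest one. The proposed system's parameter inference is not modeled.

```python
import math
import random


def euclidean_distance(v, b):
    """Distance between viewer parameters v and geoblog parameters b."""
    return math.sqrt(sum((vi - bi) ** 2 for vi, bi in zip(v, b)))


def make_geoblog(blogger_params, sigma, rng):
    """Derive geoblog parameters by adding Gaussian noise to the geoblogger's.

    Assumption: sigma is a hypothetical noise level, and results are
    clamped back into [0, 1]; the example text specifies neither.
    """
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in blogger_params]


def viewer_skips(viewer, geoblog, threshold=0.5):
    """A viewer skips any geoblog farther than the threshold from their parameters."""
    return euclidean_distance(viewer, geoblog) > threshold


def baseline_pick(candidates, rng):
    """Baseline: choose one matching geoblog uniformly at random."""
    return rng.choice(candidates)


def oracle_pick(viewer, candidates):
    """Oracle: always return the geoblog closest to the viewer's parameters."""
    return min(candidates, key=lambda g: euclidean_distance(viewer, g))


def average_distance(viewer, picks):
    """Dependent variable: mean distance from the viewer to the geoblogs served."""
    return sum(euclidean_distance(viewer, g) for g in picks) / len(picks)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    bloggers = [[rng.random() for _ in range(4)] for _ in range(20)]
    geoblogs = [make_geoblog(b, 0.1, rng) for b in bloggers]
    viewer = [rng.random() for _ in range(4)]

    baseline = [baseline_pick(geoblogs, rng) for _ in range(100)]
    oracle = [oracle_pick(viewer, geoblogs) for _ in range(100)]

    print("baseline average distance:", average_distance(viewer, baseline))
    print("oracle average distance:  ", average_distance(viewer, oracle))
    print("skipped by viewer:", sum(viewer_skips(viewer, g) for g in baseline))
```

A proposed system would be evaluated by adding a third selection function and plotting its average distance between the baseline and oracle curves, which gives each graph the two or more curves the guidelines ask for.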