0:13
I am Eugene Agichtein and I'm a
professor at Emory University
0:18
and, without further ado, this is the
first lecture of five
0:22
on modeling searcher
behavior
0:25
so the overview of the course is gonna
0:29
look something like this: today we're
gonna talk about
0:32
in general, modeling behavior, and I'll
tell you exactly what this means
0:36
then the next lecture will be on
interpretation
0:39
of behavior, mostly to try to
infer relevance
0:44
and lecture 3 will be on using behavioral
data for improving web search ranking
0:50
and lecture 4 will be looking into personalization
using behavioral information
0:54
and finally I'll talk about the search
user interfaces because
0:59
frankly behavior is determined very
strongly by the interfaces for the user
1:04
so let's get into the details of today's
lecture
1:08
first I'll talk about a general
framework for understanding behavior at
1:12
three different levels
1:13
micro, meso, and macro, and I'll talk
about the details of that
1:17
I'll introduce a couple of common
1:20
or sort of classical theoretical models
of information-seeking
1:23
and, by the way, I should say this:
1:27
that the handouts that you have do have
most of the information but
1:31
they're not completely in sync with the
slides I'm going to be showing
1:35
the slides I'm presenting
have much more detail
1:39
and they'll eventually get posted or
taped or something
1:42
so, in short, you might
see
1:46
that there are some discrepancies
okay, so back to this
1:50
so then I'll talk about web search
behavior
1:54
in detail, namely the details of
1:57
search intent (what the user wants to do),
variations in behavior,
2:01
and spend some time on click models
2:05
so what do I mean by the three
different levels of understanding?
2:10
one you could think of as the
micro level,
2:13
which is what the user actually looks at
in very, very fine detail,
2:16
with timing in, say, milliseconds or
microseconds
2:20
so here is a heat map,
basically the
2:23
eye position aggregated over multiple
users, how they look at the Google search
2:28
results, right; this is the micro level
2:31
we'll talk a little bit about that today
the meso level is more like a field study:
2:35
a researcher goes and sits next to the
searcher
2:38
and just observes, or asks what the
searcher is doing,
2:42
to try to understand, again, what's
the cause of the user's behavior
2:46
and the macro level, which is most commonly
done,
2:50
looks at millions or billions of
observations of
2:53
effectively a search session;
think of this as a
2:57
click log, right: a query log with clicks
3:00
and so a researcher can go through this
log and try to guess what the user is
3:05
trying to do, to try to mine this
data, which is what we do,
3:08
or try to find some patterns, or
relevance, or some other information
3:11
so a lot of the analysis is done on this
level simply because there's so much data
3:15
collected at this level, of
queries and clicks, okay
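As a rough illustration of what working with a query and click log at the macro level can look like, here is a minimal sketch that splits a log into sessions. The record layout (user, timestamp, action, value), the sample rows, and the 30-minute inactivity cutoff are all assumptions made for this example; none of them come from the lecture.

```python
from collections import defaultdict

# Assumed cutoff: a gap longer than this starts a new session (hypothetical value).
SESSION_GAP_SECONDS = 30 * 60

def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp_seconds, action, value) records into sessions."""
    by_user = defaultdict(list)
    for event in events:
        by_user[event[0]].append(event)

    sessions = []
    for user_id in sorted(by_user):
        ordered = sorted(by_user[user_id], key=lambda e: e[1])
        current = [ordered[0]]
        for event in ordered[1:]:
            # A long pause since the previous event closes the current session.
            if event[1] - current[-1][1] > gap:
                sessions.append(current)
                current = [event]
            else:
                current.append(event)
        sessions.append(current)
    return sessions

# Hypothetical log rows, invented for illustration.
LOG = [
    ("u1", 0, "query", "paris hotels"),
    ("u1", 40, "click", "result 2"),
    ("u1", 95, "query", "cheap four star hotel paris"),
    ("u2", 10, "query", "windows update"),
    ("u1", 7200, "query", "em algorithm"),
]

sessions = sessionize(LOG)
queries_per_session = [sum(1 for e in s if e[2] == "query") for s in sessions]
print(queries_per_session)
```

Once the log is cut into sessions like this, per-session counts (queries, clicks, reformulations) are the kind of pattern one could start looking for.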
3:19
so those are the three different levels
and we'll try to discuss each one of them
3:23
just basically, you know,
how you can study them
3:27
so before we do so here is a classical
model of information-seeking
3:32
where a user might have some task in mind
3:35
then they formalize, or sort
of think of, an information need
3:39
then they try to verbalize it to
generate the query
3:42
the query is submitted to a search engine that
searches the corpus and returns some results
3:47
the user looks at the results and
refines the query, potentially
3:50
so I've stated it in text, right, you can read
it as well as I can
3:55
information seeking is recognizing a
need,
3:58
establishing a plan to search for it, so
think of this as a plan,
4:01
evaluating the results, looking at the
results,
4:04
and then iterating through the process,
right, so the steps are query formulation,
4:09
action, right, when you submit a query or
click on a result,
4:12
looking at the results, and then
potentially refining the query
4:15
until you've found what you wanted to find
4:18
it so stay for a while so
4:23
underlying all this is the idea that the
user has some notion of relevance, right
4:29
they want to detect documents, so
they can somehow identify documents
4:33
that are relevant to them
4:34
and it's actually really
difficult to define relevance, believe it
4:39
or not, because people think
of relevance on an intuitive level, often:
4:42
right, I'll recognize it, I see a document, I
can tell it's relevant, right, so you know
4:47
it seems relevant, right, but
4:49
you can state it as a
relation between objects P and Q,
4:52
right, so P for example is the document, Q is
the query, along some property
4:56
R, right, so the property could be, for
example, topical relevance, right, so a
5:00
document could be relevant to the query
because it's on the same topic
5:04
and that in fact is one of the
most common ways to define relevance
5:08
but that's not the only one, right
5:12
so for example, other than topical
relevance, people use
5:15
other clues to detect whether a document is
relevant or not
5:19
there are other sources or metrics, like
task- or goal-specific relevance, that are
5:23
not necessarily correlated with topicality,
so topicality is sort of a given:
5:28
if a document has nothing to do with the
topic of your query then yes, it's probably not relevant,
5:32
but once the document is on the same
topic it may or may not be useful to you
5:38
so if I want to figure out how to
implement the EM algorithm,
5:41
having a Wikipedia sort of summary of
what the EM algorithm is is not very useful to
5:46
me even though it's on the same topic
5:47
right, so that's the idea: topicality
is the most common
5:50
way to define relevance, but it's not
the only one, and perhaps
5:53
it may not be sufficient
5:57
so there is a whole field in
information science,
6:00
I think it's called clues research or
some other name like that,
6:04
where they try to identify attributes
or criteria
6:07
used by people to make
decisions about relevance
6:11
right, it's actually a very
interesting and rich topic, but
6:15
most traditional IR models in fact
use just a much more
6:18
simple model of relevance, which is again
6:20
topicality, which is what you probably saw
in earlier lectures
6:24
so there have been a variety of
integrative models, but they're much more complex
6:29
now there is research in trying
to integrate some notion of task or use
6:34
into IR evaluation and
implementation, but again it gets much harder
6:39
so far no this just you know began focus
though assume the relevant
6:43
minister-elect Oporto but it could get a
fixed rate
6:47
that's enough to make progress on sort
of better ranking
6:50
okay, so now that we've talked about relevance
and just tried to
6:53
define the terms, here
are various
6:57
pictures of the search process; so first,
here's a very simple model,
7:01
the classical one from a
long time ago, I forget the exact date
7:05
so the information need is assumed to
be static: the user has
7:09
an information need, they're looking for
something, they have a goal, they execute
7:12
an action, like a search,
7:14
against the world, right, it could be
searching a search engine, it
7:17
could be asking a friend
7:19
you get back some results, you evaluate,
and based on whether your goal is met
7:22
or not, you formulate an updated query
7:24
and go through the loop again, right, so the
problem is of course that
7:28
you're assuming that the information
need is static; the thing is, it's not
7:34
so in fact it's common that
relevance criteria might actually change
7:39
so here's one example: a user progresses
through different stages of a task of
7:42
implementing an algorithm
7:44
so first perhaps I might look for source
code or something, or a description
7:48
then I might look for documentation, so
my need changes, right, so even though
7:52
it's all about the EM algorithm,
7:54
my actual need state changes
throughout the stages of the task
7:58
so this kind of simple model doesn't
really capture the change in goals
8:05
so because of this, in '89
there was a very influential model
8:11
called the berry-picking model,
in which it's recognized that the
8:15
information need can change during
interactions
8:18
so the user starts with some need, submits a
query, gets back some documents
8:22
right, based on the documents their need
may have changed
8:25
they submit another query and they keep
doing this, getting what they need
8:29
from each group of documents
8:31
until they finish the task, right,
so throughout this, the information
8:35
need is
potentially changing because they're
8:39
hopefully learning something, picking
something from the documents
8:42
right, so that's a different model
from the static need
8:45
and goal execution: the need is actually
changing as you're wandering through,
8:49
picking the important stuff
8:53
this has been extended; I mean
it's very nice but it doesn't tell you
8:56
how you decide what to do next
8:58
so there was a very
interesting and elegant model
9:02
called information foraging theory,
9:05
introduced by Pirolli and
colleagues
9:09
so the idea is your goal is to maximize
the rate of information gain, right
9:13
so think of an animal
foraging for food, right
9:18
so the idea is that there are
clumps of food around, say,
9:22
there may be different
flocks of sheep or something,
9:26
or if you're a bird, there are different
9:28
bushes, and so on, so
the basic problem is
9:32
should I continue with the current
patch, right,
9:34
say, chase the same group of
animals,
9:38
or do I go to another patch, and the idea is
that I'm trying to estimate the expected
9:42
gain from continuing in the current
patch
9:44
and of course deciding how long I'm
gonna continue in this patch compared to
9:49
right, so here is a person trying to decide:
do I just search
9:52
within this site, or do I switch
to other web sites, and
9:56
keep doing this, right, so how do
you decide: do I keep looking
9:59
or do I switch, right, so here, to make it
more concrete,
10:03
let's consider an example I want to
search for hotels
10:06
I wanna find the cheapest four-star hotel
in Paris
10:10
right, so I wanna stay only in four-star
hotels and I wanna go to Paris for
10:14
so the first step is I'm gonna pick
a hotel search site, there's a few:
10:18
Expedia, Hotels.com, you
probably know more than I do
10:23
so I pick a site; I'm sorry for the
quality, this is actually copied from
10:28
Pirolli's book, so anyway,
here is a
10:31
list of hotels, you know, they're
all very expensive, right, or not so
10:35
expensive, some are, some are not,
but they're all four-star hotels
10:39
so what I'm gonna do is scan the list,
right,
10:42
try to scan the list down, you know, from
top to bottom,
10:45
looking for the cheapest hotel, right, so
so far so good
10:49
so we examine this website, which is,
say, Hotels.com, and I'm
10:53
scanning through the list of items
10:54
and trying to pick the cheapest
one that matches my
10:58
criteria, just four stars, right, so
eventually I'm gonna run out of hotels
11:02
on this list, and eventually
there may not be any cheaper
11:05
ones than I've already found here,
which is a hundred and thirty dollars
11:08
so when I've, for example,
finished the list, or got tired, or decided
11:12
there's no more to find, I go to one
11:14
of the other hotel sites, Expedia or
Priceline,
11:17
and I keep doing this, okay, so that's
again the
11:20
more concrete instantiation of information
foraging theory
11:25
so what's happening then is the
following: here are the hotels
11:28
in list order, right, so I'm
scanning the first ten, then
11:33
the next twenty, then the next thirty, and
here is the price
11:37
for each hotel, and as you can see,
I've already found a
11:41
minimal price so far,
11:42
whatever this is, seventy-five
dollars, after looking at five results
11:47
and even though I keep looking for 20
more results I still haven't found any
11:51
cheaper hotels, right, so I'm
spending more and more effort going
11:56
through this list of results
11:57
but I still haven't found what I need,
right, so my
12:01
expected gain, there's no
gain
12:04
until I get to, like, further down the
list, to result 26 or so, right
12:09
so the question is, do I try to continue
scanning through the list,
12:12
or do I submit a query to another site
and hope that
12:16
it'll rank what I want
higher, right
12:19
so here's what's happening in terms
of observed
12:22
savings and expected savings, so
again I'm going through the list in order
12:27
so after seeing five results I save
like 30 bucks, right, so I started from a
12:31
hundred and I saved a
12:32
eighty right and then for a long time
there's no savings until there's not a
12:37
so actually, all this
is very nice, but the point is that
12:41
actually the savings, or
information gains,
12:45
follow a diminishing-returns
curve
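The flattening savings described here can be sketched with a running minimum over the prices seen so far. This is only an illustration: the price list is invented (chosen so that the best price after five results is 75 dollars, starting from 130), and `savings_curve` is a hypothetical helper name.

```python
from itertools import accumulate

# Hypothetical prices in list order (dollars), invented for illustration.
PRICES = [130, 120, 110, 90, 75, 95, 140, 80, 99, 125, 85, 110]

def savings_curve(prices):
    """Savings relative to the first price after scanning each result."""
    best_so_far = accumulate(prices, min)  # running minimum price
    return [prices[0] - best for best in best_so_far]

curve = savings_curve(PRICES)
print(curve)
```

With these numbers the curve rises over the first five results and then stays flat, which is the diminishing-returns shape: more scanning effort, no additional savings.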
12:49
right, that's where it's coming from, so
here is where, you know, that's where
12:53
it gets interesting
12:54
so then Pirolli said, well, I'm gonna
apply
12:58
Charnov's marginal value theorem from
biology,
13:02
saying that, based on the shape of my
diminishing-returns curve,
13:06
I know how long to keep scanning the list
before I quit,
13:09
I know what's the optimal
point to stop scanning the list
13:13
and go on to the next site, right, so what
I'm doing here is
13:17
arm you know the the formula or today
just finished third
13:21
or so far call a way that I was actually
for me know there are some early going
13:25
but it's actually not that
difficult, so the point is I'm going to
13:29
compute the derivative of my
13:31
information gain curve and say:
when the derivative is equal to the average rate
13:36
from zero to that point, that's where I should
stop
13:39
and from there on it just makes
sense to go to the next patch, so what
13:43
are those variables, right, so
13:44
this is the length of time it takes me
to go from one hotel site to another
13:48
hotel site, so enter a query and go to the
site,
13:50
and this is the length of time I spend
foraging, right,
13:55
going through the current list, right,
and R, of course,
13:58
is the overall rate of gain:
14:02
the information I've gotten so far
over the
14:05
total time spent going between sites
and spending
14:08
looking in the site itself,
right, so that's just a direct
14:12
unfortunately, the hotel
price one is very easy to formalize in
14:16
terms of information gain because
you're just counting dollars
14:19
of course it's not so simple to
quantify how much information
14:24
so that's why information foraging
theory is a very nice theoretical construct
14:27
it's a very elegant model but it's
very difficult to apply in practice
14:31
right, but it's still worth thinking about
the same,
14:33
you know, the effort needed to go from
site to site versus the effort needed to go
14:37
through the site itself, right
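The stopping rule from the marginal value theorem can be sketched numerically. This is a minimal illustration under stated assumptions: the gain curve is a made-up exponential saturation (its parameters `g_max` and `k` are invented), time is measured in arbitrary minutes, and the function names are hypothetical. The overall rate is R = gain / (time between sites + time within site), and the best stopping point is where the slope of the gain curve equals R.

```python
import math

# Hypothetical diminishing-returns curve: dollars saved after t minutes on one site.
# g_max and k are invented for illustration, not taken from the lecture.
def gain(t_within, g_max=55.0, k=0.5):
    return g_max * (1.0 - math.exp(-k * t_within))

def rate(t_within, t_between):
    """Overall rate of gain: R = g(t_within) / (t_between + t_within)."""
    return gain(t_within) / (t_between + t_within)

def best_stopping_time(t_between, step=0.001, t_max=60.0):
    """Grid search for the within-site time that maximizes the overall rate."""
    best_t, best_r = step, rate(step, t_between)
    for i in range(2, int(t_max / step) + 1):
        t = i * step
        r = rate(t, t_between)
        if r > best_r:
            best_t, best_r = t, r
    return best_t, best_r

def slope(t, h=1e-6):
    """Numerical derivative of the gain curve at t."""
    return (gain(t + h) - gain(t - h)) / (2.0 * h)

t_star, r_star = best_stopping_time(t_between=2.0)
print(round(t_star, 2), round(r_star, 2), round(slope(t_star), 2))
```

At the stopping time found by the search, the slope of the gain curve and the overall rate agree up to the grid resolution. Raising the between-site time makes it worthwhile to stay longer on the current site.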
14:41
okay, so that actually is a good place
to talk about browsing versus search
14:45
because what's happening here
is that the user knows: I wanna go to the
14:49
Hotels.com site
14:50
and I'm going to search there, so it
actually turns out that,
14:54
and it's not
surprising, that for people
14:57
it's much easier to recognize something
they're looking for
15:00
instead of trying to formulate it, to
state it in terms of a query
15:06
so it's actually, for example,
browsing hierarchies is more effective
15:10
and we'll talk about search engine
interfaces in lecture five
15:15
so this is a behavior that's very common
in web search,
15:18
this is called orienteering: the searcher
doesn't try to
15:22
formulate the query to get directly to the
right result,
15:25
rather the searcher just
knows approximately the right
15:29
information region, right, for example
15:31
if I want to download an update
for my Windows,
15:35
I don't need to type in
exactly the
15:38
update I need, rather
15:41
all I need to know is: I go to the Microsoft
site and
15:44
download it from there, so all I need to
know is just how to get to the Microsoft
15:48
website, so that's what I mean by
orienteering
15:51
and the interesting thing is
that once people find information this way
15:55
they keep doing it the same way over and
over and over again, right
15:59
so it's not like they say, oh, you know, I
could have issued this query to get the
16:03
rather they remember: I issued the query
Microsoft and I just went through
16:07
the downloads and eventually I found it,
right, so that's sort of what people
16:10
do, and that's one of the
strategies that actually helps people
16:15
when search is not perfect, and now
that Google and other search engines,
16:19
Microsoft's, whatever, are getting better, perhaps
people are
16:22
starting to rely on just
typing the query, but
16:26
there's still this orienteering behavior
that you should be aware of
16:29
and again, as people learn to
search better they start to issue longer
16:32
queries, and we'll talk about statistics
on queries
16:38
so why do people do this
16:41
and again, people in
information science, their whole
16:44
game is about trying to understand what
people actually do and why they do this,
16:48
what they came up with is
this notion of information scent:
16:52
people have some idea, or look for clues, of
where they would find relevant, useful information
16:57
so here is a site of the Bureau of Labor
Statistics, from the
17:01
US government, right, and this is an old
site, that's what it used to be,
17:05
you know, say ten years ago, right,
and people had a horrible time finding
17:08
information on the site because
17:10
you know, you had no idea what you would
get if you clicked on
17:13
Data, right, if you're looking
for, I don't know, statistics
17:17
on unemployment, do you click here
on Data, do you click on Surveys
17:22
and Programs, or maybe
17:23
Publications, or maybe Regional
Information, you don't know, right, so this
17:27
site is actually
a negative example,
17:30
something that doesn't support well
the navigation, this orienteering
17:36
so eventually
even the US government realized that
17:40
the site wasn't working and
they've come up with a much better site
17:43
selection now see host servers to be the
one
17:46
with the at-a-glance tables, right, so if you
want unemployment, well, you just look
17:49
and you actually see, okay, well,
17:51
unemployment tables are likely to be
right there, that's where it is exactly
17:54
so the point is to be providing many
more clues about what happens if you click,
17:59
right, so supporting orienteering
18:04
so that's a running theme of
18:07
the last part of this course: that
search results have to be informative
18:11
they have to tell the user
18:13
you know what what they will get once
they click on a result again that's what
18:16
lecture 5 is gonna be all about
18:19
so let me summarize; again, a very shallow
sort of survey
18:23
of search models, there have been quite
a few cognitive models proposed, I just
18:28
skipped most because it sort of gets dry and a little
boring
18:33
classical IR systems mainly use a
notion of topicality, but not perhaps
18:38
task-oriented or other
18:42
notions of relevance, and of course there are
open questions, like how people recognize
18:46
other kinds of relevance, and we don't
really know yet, even though there's all this research,
18:50
I mean, how to incorporate this kind of
relevance into IR
18:53
systems, right, how to do better ranking
if you're trying to support tasks as
18:57
opposed to just retrieving relevant
documents on topic
19:00
so now let's talk about
understanding and modeling search behavior,
19:05
specifically in
19:06
web search, right, so we've talked about
theoretical models, now let's get into
19:10
and remember again, there are
three different levels we can look at
19:15
in web search: there is query
19:16
intent, and session characteristics, how
searchers interact with the search engine,
19:20
and patterns, trends, and interactions
on the session level
19:25
so you may have seen a slide like
this, or a very similar one, before
19:29
this is an overall architecture of
a web search engine
19:32
I liked it, so I took it from the
tutorial
19:36
so as you can imagine there's crawling
involved, and
19:39
the crawled pages are indexed, and then
the ranking is done over the
19:43
indexed pages, and you're already learning
about ranking now
19:47
so we're going to focus more on this
part, right, so once the ranking output
19:52
is produced, it goes into some sort of
visual interface,
19:55
the user interacts with the results by issuing
queries or clicking on results
20:00
right, so we're going to focus on this
part of
20:03
web search, right, so we're gonna effectively
ignore all this other stuff, but I just
20:06
wanted you to see where our work fits
20:08
right, so here is
20:13
my favorite slide of the
usual process from the point of view of
20:18
so the user selects a source, like a hotel
site, then they
20:21
formulate the query, you know, based
on their
20:25
information need, the search engine
returns a ranked list of results,
20:29
the user selects those results, or some
results, potentially refines
20:34
their information need, is satisfied
or not, and then they
20:38
formulate another query to get hopefully
better results, right
20:41
and then this process iterates,
or they actually might want to, say,
20:46
buy a book or something, so the point is
this process iterates through these
20:49
steps, right, with the user in the middle
20:52
and the key point of all this of
course is to understand the user,
20:56
what the user wants to do,
20:58
but there are other challenges,
like how to do better ranking, how do you
21:00
evaluate, how to present the results,
21:02
and we'll get to some of
this later
21:06
so, intents, the classical
21:09
here is the top level
of intents, when we talk about user intent, it's
21:14
especially about understanding
what the query is about
21:18
so here is the top level only, right,
there's many more levels in this
21:22
kind of taxonomy, so the
classic one was proposed by Broder in
21:27
based on a bunch of query logs from
AltaVista
21:30
so there are three kinds of queries:
informational, navigational, and
21:34
transactional; according to Broder
21:36
they're split approximately:
between 40 and 65 percent
21:40
are informational, there's an example,
history know your food
21:44
another
21:47
kind is navigational, right, I wanna go
to a particular site, like Singapore
21:52
Airlines, right, and those, according
to Broder, are between 45 and 50 percent
21:56
system its absence change
21:58
transactional is if I wanna do
something web-
22:01
mediated, like access a service or buy a
22:04
book or download music, right, so here's
some examples: check the weather
22:09
or army satellite images, or I wanna buy a
Nikon FinePix
22:13
right and there are some gray areas
where it's not exactly clear what I want
22:16
to do right so it could be both
22:18
informational: I want to learn about
car rentals,
22:21
or it could be navigational: I wanna go to
a car rental company's site,
22:25
or it could be transactional, right, I want
to just rent a car, so it's not clear
22:29
and there are also things like
exploratory search, when I want to just
22:32
learn what's available, right, I
don't actually do any of those
22:35
things, I wanna see if there is some
information or something, right, so it's
22:38
not clear, there's some gray area,
so it's not that clear-cut
22:42
even at the top level; of course there have been
extensions, I think I have a reference to the
22:47
follow-up paper from Rose et al.
in
22:51
2004, and there have been others, but
the top level is actually
22:55
useful enough; okay, so now that we have
22:59
some intents, what do the
queries look like that reflect those
23:03
they're actually quite diverse, especially
for engines like Google, they come from
23:07
you know, all kinds of cultures and
languages
23:09
it turns out that queries tend to be
very short, about 2.2 to
23:13
2.5 words per query; users are impatient,
23:17
they only look at the first results,
I mean, we'll go into details,
23:21
and reformulations are quite common, and
so
23:25
just one point to keep in mind, and I
think this is very important for us,
23:29
especially researchers doing work in
this area: users are not experts,
23:33
they, like, don't even know which box
to type in, right, when my
23:37
mom calls me to ask, you know, how do I
find a particular pharmacy,
23:40
the first thing I have to say is, you know,
you type it not in the address box,
23:43
you type it in this box
23:44
so the point is that our
perception of what users understand is
23:49
actually very skewed
23:50
right, so you have to be aware that
people really don't know what they're
23:53
doing, they're just typing
23:54
things, hoping for magic, right, so
that's actually very important because
23:59
what should be understood or not that's
it's light gave results get
24:05
alright, so here is, again, a
slide from the tutorial, I just
24:09
want to show you some
statistics
24:12
again, in their study they
found that
24:16
the majority of queries were informational,
the rest were not,
24:19
and there were some ambiguous
queries
24:22
the same, but distribution by topics, I
think you have the slide in your handouts
24:26
so news tends to be, not surprisingly,
mostly informational
24:29
whereas things like games or
24:32
others, or, you know, things about
24:36
recreation could be even less
so; it depends on topic, you know,
24:39
what kinds of goals the users have
24:42
right and like I said people are lazy
and they only look at very few results
24:47
right so they tend to um
24:50
the only look at the so a lot of them I
think record as a 25 percent or so
24:55
I'll look only so 16 look at only the
first few entries
24:58
another 25 look at only the first
page
25:02
and another 27 percent
look at the first two pages
25:06
so that covers about seventy-five
percent or so of
25:09
all the users, who never look past the
second page of the results, right, so the point
25:13
is you have to be right in the first
20 results, right,
25:16
otherwise most users will give up,
okay, very few even go past
25:21
the first two pages; okay, so
that's very different from the behavior of
25:27
TREC IR-type judges who actually
go through
25:31
all, or quite a few, of the
results
25:35
so in fact, snippet viewing also, so
not even,
25:38
we're not talking about the result pages
anymore, just looking at the
25:42
so here it seems that on average, right,
25:45
most people only view about three
25:49
or four of the results, not click,
but just look at the first results
25:53
and if they don't see a relevant thing
they'll just give up, right
25:57
and the median is actually even
lower, right, it's about two, and the reason
26:02
for that of course is that there's a few
users who go quite deep into the
26:04
list of results, but the majority, right,
I mean, the mean is actually higher,
26:08
but the median is only two, right, so most
people only look at the first two
26:15
right, so basically not
only do you have to be right in the top
26:19
twenty, you have to be right in the top two
26:20
for most people to
not give up
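The gap between the mean and the median here is what a long-tailed distribution produces: a few users who scan deep into the list pull the mean up while the median stays low. A minimal sketch with invented counts (the `VIEWED` list is hypothetical, not data from the study):

```python
from statistics import mean, median

# Hypothetical number of results viewed per query, invented for illustration.
# Most users view one to three results; two users scan much deeper.
VIEWED = [1, 1, 2, 2, 2, 2, 3, 3, 10, 14]

avg_viewed = mean(VIEWED)
mid_viewed = median(VIEWED)
print(avg_viewed, mid_viewed)
```

With these made-up counts the mean is 4 and the median is 2, so "most people look at two results" and "people look at about four results on average" can both be true of the same data.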
26:28
not surprisingly, and this is
from a different study, this is from
26:32
Joachims et al. from SIGIR 2005,
26:35
not surprisingly, when you
don't look at an
26:39
abstract or a snippet
26:40
you never click on it either, right, so if
you look at the percent of
26:44
fixations, basically the percent of time
that users look at the results, and
26:48
clicks, right, that's very highly skewed
towards the
26:51
top, right, and you can draw kinds of
interesting conclusions from it, and we'll
26:56
talk in the next lecture about how to interpret
this information
26:59
but for now just take away
the intuition
27:02
that again most people spend most of
their time looking at the top few
27:07
and of course most of the clicks
also land in the top few
27:12
results; so how do we know this? well,
27:15
we know this by doing eye-tracking
studies; here's an example eye tracker, this
27:20
is a Tobii, made in Sweden, they're
some of the best ones, so it's an integrated
27:24
eye tracker, here's the camera,
here are two infrared
27:28
emitters that bounce IR, infrared light,
off of the person's retina
27:33
and the camera catches, you know, where
the
27:36
reflection is, and then it can
project, you know, where the person looks
27:39
on the screen, right, so here is what
reading behavior looks like
27:43
so the camera knows where the person
was looking, at what time, and for how
27:48
so you can, for example, plot it on the
search result page
27:52
and this is where the user is reading
this page about research on eye movement,
27:56
here they spend a lot of time reading
this part; here is what it looks like
27:59
when the user is searching through the
results, right, they're sort of looking
28:03
at the heading and then they skip, right,
and the kinds of things we can track are
28:07
of course eye position, pupil diameter, and
the difference between fixations,
28:11
right, the circles, and the jumps,
saccades
28:17
so why do we care? well, there's actually
a lot of reasons, like for example, you
28:20
know, you wanna know which results were
viewed or not
28:23
but you can also look at, for
example, remember I showed this
28:27
reading behavior, jumping up
and down, and that's actually very
28:31
sorry about the cut-off, but what
they see in the lower results, right,
28:36
might actually influence what they think
about this result
28:39
for example, here the query is about
children on unicycles
28:43
right, and they look at this, then
they look at the picture,
28:47
look at this, and then they see this,
right,
28:51
that unicycles are not
suitable for children
28:55
right, so the person goes, oh, they're
not suitable for children, maybe I
28:59
should reexamine this result, and I'm
not going to click on this anymore
29:02
because I'm not gonna buy it, right
29:03
so so the point is spread so what they
see in the lower level results might
29:07
actually change how the judge
29:09
what are the top result is relevant or
not by so getting more interesting that
29:14
judge each result isolationist or scan
from top to bottom
29:18
then they potentially go back up you
know reexamine and then keep going right
29:22
So again, what can be learned? You
can learn quite a bit from this sort of data.
29:26
But one thing that we see over
and over again
29:29
is that users really, really trust
the ranking, right? So they
29:33
always start reading from the top, and
they trust the search engine to bring
29:37
the most relevant results to the top.
29:39
And so their behavior, again, is influenced
by this implicit trust.
29:43
Right, I think this is the last slide on
the micro level.
29:47
So there is another interesting
thing here, which is
29:51
that users not only use their eyes
to examine the results; some users use the
29:55
mouse to help focus their attention. There
was a nice study by Rodden et al.,
29:59
they're now at Google, and what
they've shown was that
30:04
if you look at the histogram of the
difference between
30:07
eye position and mouse position, it will
be within about,
30:11
right, minus 200 to 200, about a 400
by 400 window,
30:15
right? So the eye is somewhat followed by the
mouse, and they are sort of
30:20
correlated in interesting
ways. In particular, they found that some
30:24
people, when the document is complicated
or the task is difficult,
30:27
use the mouse to focus their
attention. If you think
30:31
about this, the mouse
30:32
position: they just use the mouse
to read the result, to try to
30:35
make sure they don't miss an important
word.
30:37
They can use it to mark, to bookmark,
promising results, or they may not use it at all.
30:42
So the behavior of course varies, but
the point is there is another signal
30:46
there, which is just mouse movement, and,
you know, I'm particularly interested in this.
30:50
But the point is that you can
examine behavior on the micro level,
30:54
right, at the microsecond level,
30:55
and get interesting things out.
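
As a rough illustration of the eye-versus-mouse comparison described above, the sketch below computes the share of samples in which the mouse sits within a window of plus or minus 200 around the gaze point on both axes. The function name, the sample coordinates, and the assumption that gaze and cursor are sampled at the same instants are all hypothetical; this is not the analysis code from the study.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) screen coordinates


def fraction_within_window(eye: List[Point], mouse: List[Point],
                           half_width: float = 200.0) -> float:
    """Share of samples where the mouse is within +/- half_width of the eye
    position on both the x and the y axis."""
    if len(eye) != len(mouse) or not eye:
        raise ValueError("need two non-empty sample lists of equal length")
    inside = sum(
        1
        for (ex, ey), (mx, my) in zip(eye, mouse)
        if abs(ex - mx) <= half_width and abs(ey - my) <= half_width
    )
    return inside / len(eye)


# Made-up samples, assumed to be taken at the same instants.
eye_samples = [(100, 100), (300, 120), (500, 400), (520, 410)]
mouse_samples = [(150, 130), (320, 100), (90, 60), (600, 500)]
share = fraction_within_window(eye_samples, mouse_samples)
```

A per-axis histogram of the same differences would show how tightly the cursor tracks the gaze for a given user or task.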
30:58
So, okay, that was the micro. Now, I talked
about the meso, right? The
31:03
micro is when you look at the results,
and the macro level is when you look at the whole session.
31:08
So why would you care? Well, for one thing,
you can actually look at
31:11
session-level behavior and try to ask,
you know, are the users
31:15
trying to do orienteering,
31:16
right? Can you guess from the session-level
data whether they try to get to a site
31:20
and then sort of scan and, you know,
get the interesting stuff? Are they foraging,
31:25
right? So there are theoretical
models that try to say
31:29
that users do or do not multitask when
they search, that they go off on a tangent.
31:33
Of course they do, you can see it from the
logs, as I'll show you.
31:36
And is the search an optimal or a
two-step process, right? So you can
31:41
sort of see all this in the real
data, and try to mine it,
31:45
you know, try to refute or to support
these sorts of hypotheses.
31:49
So let me
31:53
go back and talk about what
can be learned from session-level
31:56
data. So, for example, you could see
when users are multitasking, right?
32:00
So here they're looking for a game,
right, that's an
32:03
online game, and they look at various
pages. Some of them, at some
32:07
time, you know,
32:08
locked them out. But then,
you know,
32:12
for some reason, one of those online
games asks them to install a Shockwave
32:15
plugin or something, right? So then
the user has to go off on a tangent,
32:18
so it's multitasking: they start
32:20
downloading Shockwave, finding the
Shockwave plugin, et cetera,
32:23
and then after that they go back to
searching for the game, right?
32:27
So this is an example of multitasking, where
the user does one task, then goes off
32:30
on the side to do another thing, and then goes
back, right? So this is very useful, of
32:34
course, because now you can get
this information
32:36
almost for free, right, just by looking at
the session-level query logs.
32:43
And, you know, they look very
carefully at those things,
32:47
all the session people who do this stuff. So
the other thing is you can visualize what
32:52
the sessions look like. So, for
example, here's a short navigational
32:55
session, right? If you just visualize it:
a query, and what happened? A click on one
32:59
result, and that's it, right?
33:01
Here's what exploration looks like: the
person issues a query and then sort of
33:04
deeply goes through the
33:06
list. You know, they might go to the first
page, then click on some link going
33:10
from the first page, and then
33:11
they keep exploring. Here's what a
topic switch looks like, right? So, for
33:15
example, you could visualize, you know,
33:17
this Shockwave business, you know,
once you have... well,
33:20
it's not a good example.
33:22
Think of it this way: you were
first searching about unicycles, and now,
33:26
you know, something caught my eye and I am
going to switch topic to something else.
33:29
Then there is a more methodical
result exploration, which,
33:34
you know, sometimes you have to do, right?
You issue a query, you look at the first
33:37
result, you go back to the search results,
33:39
you go to the next result, right,
you read it, you go back, and so on. If I want to
33:44
read all the papers relevant to,
say, sessions, right, this is
33:48
the behavior that I would do, right? I
will read all the papers in my, say,
33:51
top-10 list of results in Google Scholar.
Here is the case when I
33:55
sort of cannot get any interesting
results, right? You can immediately see
33:59
what's happening: it's query reformulation.
The results are no good, and the user
34:02
keeps reformulating queries,
34:04
right, and still hasn't found
anything useful to click on.
34:07
Right. Here is an example of
multitasking, which I described before, right?
34:11
So I wanted to
34:12
play an online game, and the
game requires me to install
34:16
Shockwave or Flash or whatever. I go
off on a tangent,
34:21
go through all the difficulties of installing
my, you know, my plugin, and then finally I
34:25
can go back again. You can see it here; you can
visualize that.
34:28
And then there are things like this one
here: my task is complex, so I might
34:32
sort of go off, explore something, go back,
explore, go back, before I actually
34:37
go back to the list of search results,
right? So again, the point is that you can
34:41
analyze the sessions in some very
interesting ways and visualize them.
34:45
So all this is very nice,
except that behavior is very different for different people.
34:50
Some people are more expert at searching
than others, right? That's not domain
34:53
expertise: you know, you could have an
34:54
expert doctor, right, who doesn't know how
to search
34:57
using Google; they'll behave just as naively,
perhaps, as, say,
35:01
my mom, right? So some people
just know how to search and some don't.
35:05
So what does the search look like, right?
As people learn how to search better, how
35:10
does the behavior change?
35:11
Well, White and Morris did an
interesting study in 2007
35:16
where they said: okay, we
can identify
35:20
expert searchers in a very rough way. We're
going to say it's anybody who knows how to
35:24
use search operators like plus,
35:25
minus, quotes, and "site:", right?
35:29
Does everybody know what "site:" is
for in Google? Almost everybody.
35:33
For those who don't:
35:34
you can tell Google
to only return results
35:39
from that domain, right? So that's a
very useful tool.
35:43
And so only about one percent of the
queries contain those operators,
35:46
and only about nine percent of the
users, right, would be considered
35:50
expert searchers according to this
definition, right, that they just use operators.
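
The "rough" definition of an advanced searcher above (anyone who has used an operator) can be sketched as a simple pass over a query log. This is only an illustration: the regular expression is an assumed approximation of the plus, minus, quotes, and "site:" syntax, and the log entries, user IDs, and the `advanced_users` helper are hypothetical rather than taken from the study.

```python
import re
from typing import Iterable, Set, Tuple

# Assumed operator syntax: +term, -term, "quoted phrase", site:domain
OPERATOR_RE = re.compile(r'(^|\s)[+-]\w|"[^"]+"|\bsite:\S+', re.IGNORECASE)


def uses_operator(query: str) -> bool:
    """True if the query contains at least one advanced operator."""
    return OPERATOR_RE.search(query) is not None


def advanced_users(log: Iterable[Tuple[str, str]]) -> Set[str]:
    """Users who issued at least one query containing an operator.
    `log` yields (user_id, query) pairs."""
    return {user for user, query in log if uses_operator(query)}


# Made-up log entries for illustration.
log = [
    ("u1", "digital cameras"),
    ("u1", "site:emory.edu search behavior"),
    ("u2", "e-mail providers"),           # hyphen inside a word, not an operator
    ("u3", 'jaguar -car "big cat"'),
]
experts = advanced_users(log)
```

Dividing the number of operator queries by all queries, and `len(experts)` by all users, would give the two percentages quoted in the lecture for a real log.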
35:54
So the idea is that they tried to
identify,
35:58
you know, some number of advanced users,
about thirty thousand of those users, and
36:02
they also found, of course,
36:03
quite a few non-advanced users, 250,000,
in the Microsoft search logs.
36:07
And then they started looking: is
the behavior different, right? What's the
36:11
difference in behavior?
36:13
Specifically, they looked at factors
like,
36:16
you know, what the queries are, whether the queries are
different, and also how users click on
36:20
the results, based on
36:21
whether they know what they're doing or
not, right?
36:25
Specifically, they looked at what
they call session trails,
36:28
right, which is very similar to what I've
just shown in the previous
36:32
slide, right, just in more detail. So here
the user searches for digital cameras;
36:36
this is a session trail for that
result. Again, the user clicks on the page from the
36:40
search result, goes back, goes back,
36:44
right, maybe goes to another, you know,
link on that page, right? So this is
36:48
basically tracking the links
36:49
from this search result, from
this page, right?
36:53
And then they go back to the search
results and click on another result.
36:56
So this is, again, just another
presentation of the same thing, right? So
36:59
then they can issue another query,
37:01
et cetera, right?
37:04
So again, what they want to see is
what the
37:07
behavior looks like for advanced users
versus non-advanced users, right? So
37:11
the non-advanced users tend to have,
37:14
you know, about the same length of
session, about 700 seconds on average.
37:19
The trails, however, are much shorter
for the advanced users, right?
37:23
Advanced users terminate faster, right?
They realize: I'm not going to find
37:27
something here. Remember foraging theory,
right? They can
37:29
estimate: this site is not useful to
me, I'm going to leave and go back,
37:34
right? So they can better estimate
when it's time for, you know, terminating the trail.
37:39
They tend, of course, to read the results
faster,
37:42
right? On average they only look at the
results for 32 seconds versus 37 seconds for
37:46
a non-advanced user, which is reasonable.
And
37:50
the rest looks pretty
similar. I'll skip the numbers; I'll
37:53
talk about that later
37:55
in more detail. Another thing is
that these users don't have as many trails,
37:59
remember? So they actually know how to
issue better queries, so as
38:03
not to have all those trails.
38:07
What you would also like to see, right, is whether, as
users get more advanced, their search
38:11
sessions get slower, or faster and faster,
38:13
right? So, you know, in the top 25
percent of advanced users, they can find
38:18
what they need in just 200 seconds
versus seven hundred.
38:21
The trails again get shorter
and shorter, they terminate faster,
38:26
I mean, everything. So the trend is clear,
right? You sort of see this change from,
38:30
you know, orienteering behavior, where you don't
know how to issue a query and you just go to
38:34
a site and start exploring,
38:35
to more effective,
38:39
longer queries, sort of better
queries.
38:44
Okay, so what other interesting things did
they find?
38:49
I guess you can read the
conclusions on your own, but I think
38:53
I've mentioned already the most
interesting stuff.
38:56
So I think this is a good place
38:59
to summarize what we did. We talked
about micro-, meso-, and macro-level
39:04
browsing for information,
39:06
and we've talked about levels of detail,
search intent, and variations in web search.
39:10
And we talked about user behavior
39:13
to find information. Now let's talk about
one other interesting side point, which is
39:18
how people actually keep track of things
they find
39:22
so they can access them again. This
was identified by Teevan et al.
39:28
back in 2004, and the definitive
study came out in 2007
39:32
on what's called re-finding behavior. The
studies
39:36
they've done show that about forty
percent of the queries
39:39
are actually re-finding queries, in the
sense that the users want to get to the
39:44
same page they've seen before,
39:46
right, because it's an important
reference page or whatever, right? So, for
39:50
example, does anybody actually
know the URL for this year's
39:54
RuSSIR? I don't, right? And who cares,
it doesn't really matter, because
39:58
it's much faster to type in "RuSSIR
2009"
40:01
into Google or Yandex or your favorite
search engine and just click on the
40:05
first or second, I hope, returned result.
40:08
Okay, and I'm sure a lot of you have
done this recently, right?
40:12
So this is what they mean by re-finding
behavior. There's
40:15
actually about forty percent of the
queries just like that, where the users
40:19
don't want to remember or bookmark the URL,
40:21
but they trust the search engine ranking
to be stable enough,
40:24
implicitly, right? They don't say that out
loud, but they know that if they
40:29
issue the same query, they're going to
get roughly similar results and be able to find the page.
40:34
And because this is forty percent
of the queries, that's obviously a very
40:37
important phenomenon.
40:39
And you could think of it as a kind
of navigational query, but it's a very particular kind.
40:44
So re-finding recent topics of interest is
in fact a common case.
40:48
So what do we know about this re-finding?
Well, one thing is, of course, that
40:52
it tends to be temporally clustered.
People don't remember the query,
40:57
right, but it's roughly
recent, within, you know, a month
41:01
or within a week, right, that people
do the re-finding.
41:06
And they sort of use search
engines just for that, for example.
41:12
And recently, as I
mentioned,
41:15
you can look at the session-level
information,
41:18
I guess what I called the
41:22
macro-level information, to analyze the
query sessions to identify this sort of
41:27
behavior, right? Look at the temporal
aspect, you know, how temporally grouped
41:31
the query sessions look, et
cetera, et cetera.
41:34
So where do they get this
forty percent from?
41:38
Well, the easy case is that the same
query has been issued before by the same user,
41:44
and they clicked on exactly the same
result, and they clicked on only one result,
41:49
right? So that covers about
24 percent.
41:53
Then there are cases where they did
more than one click
41:57
on the same result, which is very
uncommon, so that doesn't contribute much.
42:01
But also they clicked on the same result
and perhaps on more results, right?
42:05
So in some sense this is also re-finding
behavior: you click the same result and
42:08
maybe happen upon another good one,
42:10
but you're looking for the same
sort of topic, right? There are also
42:14
cases where the users click on exactly
the same result as before after
42:18
a new query, right, except you'll see that
it's not really a new query, it's just a variant
42:22
of the same query they issued
before. So together, you add it up,
42:28
you know, 5 percent,
5 percent, 5 percent, 25 percent, and you get
42:32
about forty percent; that's where the
forty percent comes from,
42:34
right? So it's basically issuing exactly the
same query and clicking on the same result,
42:39
or perhaps the same one and
more, and issuing a slight variant of the
42:43
query and clicking on the same result, or
perhaps the same one and more.
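
The cases just listed can be written down as a small labeling function over two searches by the same user. This is a minimal sketch under assumed inputs: the `Search` tuple, the label strings, and the example URLs are hypothetical, and the real study's categories may be drawn more finely.

```python
from typing import FrozenSet, Tuple

Search = Tuple[str, FrozenSet[str]]  # (query text, URLs clicked)


def refinding_label(previous: Search, current: Search) -> str:
    """Label the current search relative to an earlier one by the same user."""
    prev_query, prev_clicks = previous
    curr_query, curr_clicks = current
    if not (prev_clicks & curr_clicks):
        return "not re-finding"              # no result was clicked both times
    if prev_query == curr_query:
        if len(prev_clicks) == 1 and prev_clicks == curr_clicks:
            return "same query, same single click"
        return "same query, overlapping clicks"
    return "different query, overlapping clicks"


# Made-up examples.
first = ("russir 2009", frozenset({"https://example.org/russir"}))
again = ("russir 2009", frozenset({"https://example.org/russir"}))
wider = ("russir 2009", frozenset({"https://example.org/russir",
                                   "https://example.org/program"}))
variant = ("RuSSIR summer school", frozenset({"https://example.org/russir"}))

labels = [refinding_label(first, s) for s in (again, wider, variant)]
```

Counting how often each label occurs over all repeat searches in a log would give the kind of breakdown the lecture adds up to about forty percent.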
42:49
So you can view this kind as navigational,
right: you clicked only
42:53
on one result for this query. This is
42:56
sort of like informational, right, and
it's really a little bit difficult
43:00
to judge, but the point is they are also
repeating the same information need.
43:04
So Teevan et al. did some
very,
43:07
you know, tedious work, but an
important one, where they identified the
43:12
kinds of query changes
43:14
that happen from
the old query to the new query,
43:17
you know, things like capitalization
changes. Well, I believe this is the same
43:21
query as far as I'm concerned, right?
43:22
Word swaps, like "Britney Spears" and
"Spears Britney";
43:26
word merges, "Wal Mart" and "Walmart", right? They still
click on the same result.
43:29
Or word removal, right: they may not
remember, or they don't need, all
43:33
of the words: "Orange County venues", maybe
it was "Orange County music venues".
43:37
So this was the original and maybe this is the
new query, something like that. So they
43:40
looked at, you can imagine, all
the edit-distance kinds of things,
43:44
and did different combinations
to try to figure out,
43:47
you know, if you can
normalize enough to get to the point
43:51
where the old query looks just like the
new one, and they
43:53
had some results there.
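
The query changes mentioned above (capitalization, word order, merged words, dropped words) suggest a simple normalization check. The sketch below is an assumed, simplified version of that idea; the helper names are hypothetical and the actual study considered more kinds of change than these.

```python
def canonical(query: str) -> str:
    """Lowercase the query and sort its words (ignores case and word order)."""
    return " ".join(sorted(query.lower().split()))


def squashed(query: str) -> str:
    """Lowercase the query and drop all whitespace (ignores word merges/splits)."""
    return "".join(query.lower().split())


def is_variant(old: str, new: str) -> bool:
    """True if `new` looks like a light rewrite of `old`."""
    if canonical(old) == canonical(new):
        return True                      # same words, any case or order
    if squashed(old) == squashed(new):
        return True                      # "Wal Mart" vs "Walmart"
    old_words = set(old.lower().split())
    new_words = set(new.lower().split())
    # One query is the other with some words added or removed.
    return old_words < new_words or new_words < old_words


checks = [
    is_variant("Britney Spears", "spears britney"),
    is_variant("Wal Mart", "Walmart"),
    is_variant("Orange County music venues", "Orange County venues"),
    is_variant("unicycles", "breast cancer treatment"),
]
```

A fuller version could fall back to an edit-distance threshold for misspellings, which is the kind of combination the lecture alludes to.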
43:57
One other thing they identified is
that if the search engine got better
44:00
in between and changed the ranking
substantially, users have a hard time
44:05
re-finding the same page. For example,
44:08
in the case where there was no rank
change and the user issued the same query
44:13
the time before and the time after,
44:16
almost 90 percent of the time they'll
click on the same result.
44:20
If the ranking has changed, then, you
know, only
44:23
about half of the time
they'll click on the same result.
44:27
So, you know, you can start thinking,
you know, why is this happening,
44:30
right? Does anybody have a hypothesis?
44:34
I mean, there are a couple of obvious reasons
why this may be the case. Why is it that
44:37
when the ranking changes, you know,
they don't click on the same result?
44:42
Okay, one... okay.
44:46
Okay, there could be no same result. Okay,
any others?
44:52
Okay, maybe they remember the position.
Okay, and...
44:56
Okay, good, all right, so I think we're
getting to it.
44:59
Great. So this is one, I think somebody
said that: the result is not there.
45:03
Oh yes, you have the handouts. Okay, okay,
okay.
45:06
The results are better, that's
another thing: the new result is
45:09
potentially better. And the other is that
they didn't see it, because they only look
45:14
at the first couple of results.
45:15
So here's an example. This was
the first: for example, the old result,
45:19
it has disappeared, right? It has
disappeared, it's no longer there.
45:22
The query was "breast cancer
treatment", right?
45:26
Or a "treatment for breast cancer" result has
moved up, right, so they didn't expect it to be so
45:31
highly ranked. Or, for example, this result
has changed: maybe the summary has
45:36
changed or something like that, right?
45:37
So they didn't recognize it. So there are
all those cases where the search engine is
45:40
trying to do a better job at ranking,
but it's also breaking the behavior, and
45:44
it's actually breaking the habit
45:45
of the people trying to re-find this
result, right? So they can't find it;
45:49
they just remember where it is
approximately. So change slows re-finding,
45:52
so, you know, a rank change makes it slower,
et cetera, et cetera.
45:57
So the point here is basically
that changes interfere and stability
46:03
helps. So what do you do?
46:04
So this is a real tough challenge for
search engines: they have to both try to
46:08
improve rankings and also try to
maintain some stability
46:11
in the search results for those who
rely
46:14
on the results to re-find the
46:17
stuff they found before. So does anybody
have an idea of
46:21
an approach that search engines might
use to support both,
46:24
you know, better ranking and also
re-finding at the same time?
46:29
Yes? What about using history?
46:37
Okay, so something, right. So the idea
would be that if you can identify that
46:41
this is re-finding behavior from the
46:42
user's history, that you know that they've
issued this query before,
46:45
so for that user you could try
to preserve the ranking, but
46:49
for a new user you could try to give the
best ranking possible, right, the one that
46:52
hasn't issued this query.
46:53
So you could think of it as a simple form
of personalization; again, we'll talk
46:56
about personalization in detail in lecture 4.
46:59
But of course the challenge is: can we
automatically detect whether the query is
47:03
intended to re-find, or you just, you know,
want to find more information, right?
47:08
So specifically, what you
want to do is:
47:12
given the user history and a simple
navigational query,
47:15
can you predict which URL will be clicked?
And
47:18
for complex queries that are not
navigational, can we
47:22
figure out if a new site will be
clicked,
47:26
or whether the old site will be,
you know, clicked again?
47:32
So first, you know, a very
simple way of predicting a navigational
47:36
query is: you've seen the query
issued before, say once or twice,
47:40
or, you know, twice is probably even
more accurate, right,
47:43
and the query had the
same result clicked before. Well,
47:48
that actually is very effective,
right, a very simple rule, and it gives you
47:51
about 96 percent accuracy, right?
47:53
So if you issued the query twice
before and clicked on one and only
47:57
one result before, you're going to
click the same result, you know,
48:00
the third time around, right? Not
surprisingly, right?
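
The simple rule described above (same query seen twice before, one and the same result clicked each time) might be sketched as follows. The `History` layout, the function name, and the example URLs are assumptions for illustration; the lecture does not specify how the original study implemented the rule.

```python
from typing import List, Optional, Tuple

History = List[Tuple[str, List[str]]]  # (query, clicked URLs), oldest first


def predict_refind_click(history: History, query: str,
                         min_repeats: int = 2) -> Optional[str]:
    """Return the URL the user is expected to click again, or None if the
    rule does not fire."""
    past_clicks = [clicks for past_query, clicks in history if past_query == query]
    if len(past_clicks) < min_repeats:
        return None                       # query not seen often enough
    recent = past_clicks[-min_repeats:]
    single_click_each_time = all(len(clicks) == 1 for clicks in recent)
    if single_click_each_time and len({clicks[0] for clicks in recent}) == 1:
        return recent[0][0]               # same single result every time
    return None


# Made-up history for one user.
history = [
    ("russir 2009", ["https://example.org/russir"]),
    ("unicycles", ["https://example.org/shop", "https://example.org/safety"]),
    ("russir 2009", ["https://example.org/russir"]),
]
prediction = predict_refind_click(history, "russir 2009")
```

When the function returns a URL, a search engine could keep that result where the user last saw it, which is the stability idea raised in the discussion.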
48:03
So basically, with 94 to 96 percent accuracy
for navigational queries, you
48:07
can predict if the user is just
trying to re-find the same result, okay?
48:10
This is done automatically using...
48:12
I forget exactly which classifier they used,
but you can imagine
48:16
any of them working for you on this.
48:19
So for more complex queries, things are
getting more difficult.
48:23
They trained an SVM to try to
identify when a new result will be clicked
48:27
or when the old result will be clicked.
48:29
And the features that are effective are
things like the number of previous searches for the query, not
48:34
all that surprising; whether any of the
results were clicked more than one time;
48:37
and the number of clicks each time
the query was issued, right?
48:42
So the accuracy, using reasonable
features and a reasonable classifier,
48:46
is about eighty percent.
48:50
So that's probably reasonable, but
it's not that great, of course.
48:54
It's not clear, right, if you can
rely on it in production,
48:57
but that's the result they got.
49:01
So in summary,
49:04
you know, in the paper they
didn't just look at the logs, they
49:07
actually asked the users, to try to
get a clearer picture.
49:10
They found, of course, that re-finding
behavior is common.
49:13
Navigational queries are particularly
common, like the RuSSIR example.
49:18
But they could also
49:21
go beyond navigation and have more complex
re-finding behavior. And they've shown
49:25
that, you know, the result rank
impacts the ability to re-find, and that you can
49:30
identify those re-finding queries with
eighty to ninety percent accuracy.
49:33
So, the lecture plan, right? Again, we talked
about, we're still on web search,
49:38
right, just trying to understand what
people are doing, and now we're going
49:42
to get into click models, right? That's
the last topic of the day,
49:45
and I hope you're awake and still have
energy.
49:49
So click models are
49:53
really important because you want to
predict which result users will click on,
49:56
and that's determined by many factors,
49:58
like whether the result is relevant to the
query; whether the summary
50:02
looks visually appealing, right;
50:05
of course, the order of presentation,
because we looked at how users look at the results;
50:10
and there are all kinds of other issues,
like context, history, et cetera. If you remember
50:13
the result was bad from a previous session,
50:15
you're not going to click on it, right? So the
point is there are a lot of interesting
50:18
factors that can influence
50:20
clicking behavior here.
50:23
So there was an interesting study by
Craswell et al. in WSDM 2008
50:27
that empirically compared four sort of
main
50:30
models, or ways of modeling click
behavior.
50:33
So initially, this is the no-bias
50:37
baseline, which basically says that users
click on relevant results
50:40
no matter where they appear, right?
The users
50:43
carefully scan all the results,
and once they find the relevant one, they
50:47
click on it, right? The position
50:49
doesn't matter; the only thing that
matters is that it's relevant.
50:53
So that's the model. So why is this
baseline okay? Because
50:57
relevance, right, the
document relevance, of course
51:01
should be important for clicking, and
it actually may be the case that for the
51:06
results down the list, you know, nine and
10, maybe it's the main explanation.
51:09
However, of course, as we know,
51:10
it's not the case that users
look the same amount, pay the
51:14
same amount of attention to the result at position 1
51:16
versus results lower down the
list.
51:19
So in a sense this is sort of
saying the following: that the summary
51:23
attractiveness, the summary is clicked
according to the relevance of the result, no
51:27
matter the position, right?
51:28
Again, this is a very simple model that
basically says: if you just count clicks
51:32
on the page, you'll know which result is
most relevant, okay?
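
Under the no-bias baseline described above, relevance can be read directly off click counts. A minimal sketch, assuming each result was shown a known number of times; the counts, the single-letter document names, and the function name are hypothetical.

```python
from typing import Dict


def baseline_relevance(impressions: Dict[str, int],
                       clicks: Dict[str, int]) -> Dict[str, float]:
    """Estimate relevance as clicks divided by impressions, ignoring rank."""
    return {
        url: clicks.get(url, 0) / shown
        for url, shown in impressions.items()
        if shown > 0
    }


# Made-up counts for three results shown for the same query.
impressions = {"a": 500, "b": 500, "c": 500}
clicks = {"a": 200, "b": 100, "c": 50}
estimates = baseline_relevance(impressions, clicks)
ranked = sorted(estimates, key=estimates.get, reverse=True)
```

Because rank is ignored, a result that simply sat at the top would look more relevant than it is, which is the weakness the later models try to address.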
51:35
So again, there are many, many problems
with this simple model,
51:39
right? So let's go on to something
more reasonable.
51:42
We're going to model the
51:45
user interactions as if there are
two kinds of clicks. One is based on relevance,
51:50
right, so this is this one, right,
that the click
51:54
depends on, you know, the user will
click or not based on the document's relevance.
51:58
Or sometimes they click blindly, by
trusting the search engine, right? This is
52:03
based on the result's position, right? Most
people click on result 1,
52:06
then, say, half of the rest
click on result 2,
52:10
and so on, it keeps going down, right? So
the point is that you can sort of think
52:14
that there is a mixture model here, that
there's a mixture, right, that
52:17
with some probability lambda times r_d, plus
52:20
one minus lambda times the blind, sort
of position-based
52:24
trust, you're going to estimate the number
of clicks, right?
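
The mixture just described can be written as lambda times r_d plus (1 - lambda) times a position term. The sketch below assumes a made-up position-bias table in which each rank gets half the blind clicks of the one above it, echoing the "half of the rest" remark; the table, the names, and the lambda values are illustrative assumptions, not fitted parameters.

```python
from typing import Sequence

# Hypothetical blind-click probabilities for ranks 1, 2, 3, ...
POSITION_BIAS = [0.5, 0.25, 0.125]


def mixture_click_prob(relevance: float, rank: int, lam: float,
                       position_bias: Sequence[float] = POSITION_BIAS) -> float:
    """P(click) = lam * relevance + (1 - lam) * bias at this rank (1-based)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    return lam * relevance + (1.0 - lam) * position_bias[rank - 1]


# A fairly relevant document (0.8) shown at rank 1 versus rank 3.
p_top = mixture_click_prob(0.8, rank=1, lam=0.5)
p_low = mixture_click_prob(0.8, rank=3, lam=0.5)
```

With `lam=1.0` the function reduces to the no-bias baseline, and with `lam=0.0` clicks depend on position alone.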
52:27
So is this model clear, right? Basically,
the probability of a click on a result
52:30
depends on relevance
52:31
and depends on the position, right? And
they're additive, right? There is no...
52:39
Why do they need to be additive? Okay, it's additive
because it's a mixture, right:
52:44
it's sort of a mix of me being lazy and
52:47
a mix of relevance, right? So
it's just
52:50
a way of saying it; it's the most simple one
that you'd want, to generate a
52:54
mixture.
52:57
But thank you for the question, because the
next model is in fact
53:01
saying that it shouldn't be additive,
right?
53:05
So this was, previously, the additive
model.
53:08
So perhaps something else is going on:
that the users do scan from top to
53:12
bottom, right, and clicks arise from
relevance.
53:16
They examine the results, and based
on the examination they're going to click or not.
53:21
So they look at the first result: if it
looks relevant, they're going to click;
53:25
if it's not relevant, they're going to go
on to the next result, right? The
53:28
probability that I'm going to get to the
next result,
53:30
right, depends on how far down the list I
am and
53:34
the relevance of the results above me.
53:41
Oh, sorry, I think I'm skipping ahead. So
essentially, here I'm just saying
53:45
that they have something very simple,
right:
53:47
that the lower down the list it is,
the more likely I skipped the result,
53:50
right? So that's the
process: the probability of examining this result, and its relevance.
53:54
And so the one thing I just
said that was incorrect, right,
53:58
about all the results above not being
relevant, actually is part of the next model;
54:02
this one does not depend on,
you know, whether the documents above are relevant.
54:07
So here we get to the really
interesting model, which is the cascade model.
54:11
Here I'm going to do the following for
each URL,
54:14
right? If I see a relevant
result, with probability
54:19
r_d I'm going to click on it, right? And
54:23
with probability one minus r_d
54:26
I'm going to go on and examine
the next result, right?
54:30
So I'm going to basically keep going
through the results one at a time, right?
54:32
That's consistent with the eye-tracking
studies and this idea of scanning down.
54:37
But at each document I'm going to make a
decision, right, an independent decision.
54:41
You can think of it as an automaton of
some kind, right:
54:44
you know, once I'm in the
state where I'm looking at a result, with
54:49
probability r_d I'm going to click on it, right,
and with probability
54:53
one minus r_d I'm going to go
on and look at the next result, right? So
54:57
the probability that I'm going to end up
here,
54:59
or the click that
I'm looking at here,
55:02
depends not only on the relevance of this
result, but also on the relevance of
55:05
the previous results that I've decided
to skip.
55:11
And more precisely, right,
you can just formalize it as
55:15
a simple formula like this, right:
the
55:19
probability that I'm going to click on
result i is
55:23
the relevance, right, of the current
document d,
55:26
times the probability that I got to that
point, right,
55:29
which is basically one minus
r_1, times one minus r_2,
55:33
and so on, that is, that I have
ended up skipping all those results.
55:36
So that's what the second term is:
the product, you know, over all the results
55:40
above this one. And
remember that most of the time, and I
55:45
guess I should have said it,
55:46
users only click on one result, right? The
majority, remember,
55:50
the majority of non-advanced users
click on a result
55:54
and then keep exploring, and once they're done
exploring maybe they'll come back, right?
55:58
But most of the time they just
explore deep, you know, into the result's
56:02
next pages, go on, and you just never
see them again,
56:05
right? But, you know, there is some small
percentage who come back, and that this
56:08
model doesn't capture.
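
The cascade formula above, the relevance of the current document times the product of (1 - r) over the results ranked above it, takes only a few lines to compute. This is a sketch with made-up relevance values; the function name is hypothetical.

```python
from typing import List, Sequence


def cascade_click_probs(relevances: Sequence[float]) -> List[float]:
    """Click probability at each rank under the cascade model.

    relevances[i] is the probability of clicking the result at rank i + 1
    once it is examined; the user scans top-down and stops at the first click.
    """
    probs = []
    reach = 1.0                      # probability of getting this far down
    for r in relevances:
        probs.append(reach * r)      # examined, then clicked
        reach *= 1.0 - r             # examined, skipped, moved on
    return probs


# Three equally relevant results: clicks still decay with rank.
probs = cascade_click_probs([0.5, 0.5, 0.5])
```

Because the user stops at the first click, the probabilities sum to at most one, which matches the remark that the model does not capture users who come back and click again.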
56:11
So here's an example, again, you know, for
those who like it concrete.
56:16
Here's an example. Remember that
there's also some smoothing you have to
56:19
do, but without smoothing:
56:20
say the query was issued 500 times,
56:24
and, that's it,
56:27
so 500 times the query was issued, zero clicks on
result 1, 100 clicks on result 2, one hundred on result 3.
56:33
So you can now start,
sort of,
56:36
modeling this behavior as the fact that
they, again, skipped result 1, skipped,
56:40
and finally clicked on result 3.
56:44
Okay, so it's actually a very similar
formulation as before; it's just based on
56:48
frequency, and it's not the,
basically, automaton, but it's the same thing,
56:51
right? Because they're saying that all
users,
56:53
the big assumption here is that all users
act exactly the same, you know, each user
56:57
is equal to any other user, right? So
just by counting the frequency of clicks,
57:01
right, you can evaluate whether this
model is accurate or not, right?
57:04
Because, again, the
assumption here is that every user is the same.
57:08
So if you just see, you know, five hundred
times the query issued, and a hundred
57:11
times they clicked on result 2,
57:13
and a hundred times they clicked on result
3, it's the same thing,
57:16
the same as if one user
57:17
had issued the query five hundred times,
okay? So it's just a frequency
57:22
formulation of the same idea.
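
The frequency formulation can be turned into an estimate of each result's relevance: a result is assumed to be examined only in sessions where nothing above it was clicked. The sketch below does this without smoothing, under the cascade assumption of at most one click per session; the counts (500 sessions, clicks of 0, 100, and 100 by rank) are used here as an illustrative assumption.

```python
from typing import List, Sequence


def estimate_cascade_relevance(n_sessions: int,
                               clicks_by_rank: Sequence[int]) -> List[float]:
    """Unsmoothed cascade estimate of relevance for each rank.

    Relevance at a rank = clicks at that rank divided by the sessions that
    reached it, i.e. sessions with no click on any higher-ranked result.
    """
    remaining = n_sessions
    estimates = []
    for clicks in clicks_by_rank:
        estimates.append(clicks / remaining if remaining > 0 else 0.0)
        remaining -= clicks          # these sessions stopped at this rank
    return estimates


# Assumed counts: 500 sessions, clicks on ranks 1..3.
relevance = estimate_cascade_relevance(500, [0, 100, 100])
```

Here result 3 comes out as more relevant than result 2 (100 of 400 versus 100 of 500) even though both received the same number of clicks, because fewer users ever reached it.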
57:26
so which one is
closer to reality, right
57:29
well, sort of not surprisingly, the
cascade model is, and it gets close to
57:34
the best possible, as when you know,
57:36
right, where the clicks are; and the fact that
it's not zero, of course, is that
57:40
not all clicks are predictable, because
people do come back and click a result,
57:43
and all kinds of other things that are
not
57:45
easily predictable, or users are sometimes
not rational, but
57:49
the point is that the cascade model has the
lowest cross-entropy, so basically
57:53
the difference from
the distribution predicted by the cascade
57:58
is the smallest
of all the other models,
58:02
like examination, baseline, and, I didn't
go into it, the logistic model, because
58:06
it's not really a model, it's just
logistic regression trying to guess
58:09
where the clicks will land, right,
58:10
there's no model here; so the point is
that the cascade is actually a
58:14
pretty reasonable form of modeling
for predicting the amount of clicks,
58:19
given that you know the relevance of the
documents
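To make the comparison concrete, here is a minimal Python sketch of how a cascade-style prediction could be scored against observed clicks with cross-entropy. It assumes the usual statement of the cascade model (the user scans top-down and clicks the first result judged relevant, so the click probability at rank i is r_i times the product of (1 - r_j) over the ranks above it). The relevance values and the observed distribution are made-up numbers, not data from the lecture.

```python
import math

def cascade_click_probs(relevance):
    """Cascade model: P(first click at rank i) = r_i * prod_{j<i} (1 - r_j).

    Returns the per-rank click probabilities and the probability
    that the user clicks nothing at all.
    """
    probs, p_reach = [], 1.0
    for r in relevance:
        probs.append(p_reach * r)   # reached this rank and clicked
        p_reach *= (1.0 - r)        # reached this rank and skipped
    return probs, p_reach

def cross_entropy(observed, predicted, eps=1e-12):
    """H(p, q) = -sum p * log2(q); lower means a closer prediction."""
    return -sum(p * math.log2(max(q, eps)) for p, q in zip(observed, predicted))

# Hypothetical per-rank relevance (probability a result looks worth clicking).
relevance = [0.4, 0.2, 0.1]
clicks, no_click = cascade_click_probs(relevance)
predicted = clicks + [no_click]

# Hypothetical observed frequencies: ranks 1-3, then "no click".
observed = [0.45, 0.20, 0.05, 0.30]
print(predicted, cross_entropy(observed, predicted))
```

The cross-entropy can never drop below the entropy of the observed distribution itself, which is one way to read the "best possible" line mentioned above.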
58:22
but of course the question is how do we
know the relevant documents; well, that's
58:25
going to be the next lecture: we're going to try
to figure out, given the clicks, whether
58:29
these documents are relevant, right, but
58:32
remember I said the following: that
clicks are actually not determined just by
58:36
examination and relevance; they're determined
by
58:40
people looking at the result
snippets, right; for those who
58:43
don't know the terminology in English,
right, a snippet is this part, it's
58:47
sort of the summary given by the
search engine;
58:51
of course this is the URL, this is the
title, right; so the users don't really
58:56
see the document, they see the snippet, and based on this
they decide whether to click or not
58:59
so the problem is that, you know, snippets
may not be perfect
59:04
so here's the kind of click-through I
want to show you; what do you expect, right
59:07
you expect that, again, most people click
on the first result,
59:11
so in this case it's about 40 percent, and
20 percent click on the next one,
59:15
et cetera, right, so fewer and fewer
people click; so this is what you would
59:18
expect in the normal case, when the search
engine does a good job and the user
59:21
behaves as predicted
59:23
so what's going on when you see
something like this, for the query kids
59:28
you see a big sort of jump
59:31
here at result four, right
59:35
and in fact there are even more clicks at
result four than at results one
59:39
to three, right, so something
is going on
59:43
so the immediate hypothesis, of course,
that you would suggest is that
59:47
this is caused by relevance, right, that
document four is perhaps much more
59:51
relevant than document one
59:54
but that's not necessarily the case; in
fact, if you try to pick it apart, so the
59:58
third, so here we're going to,
60:00
let me just say this again: so we were
looking to identify and analyze
60:04
those kinds of queries where the
60:06
click-through distribution is what they
call inverted, right
60:09
so I was one of the co-authors of this
paper, so
60:13
what we call inverted is that, again, you
get more click-through at lower-ranked
60:17
results than at higher-ranked ones
60:20
so the hypothesis is that the relevance
60:23
of the second document is higher than
the relevance of A; so A is the one on
60:27
top, B is the one below; B gets more
clicks than A
60:30
so in about 33 percent, or about a third,
of the time, indeed the relevance of B
60:34
is higher, but in about sixty-six percent
of the time
60:39
relevance is about the same or even
lower
60:43
right, so relevance is not the dominant
factor here
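A minimal Python sketch of how such inverted click-through patterns could be spotted in aggregated click counts. It assumes, for illustration only, that an inversion is an adjacent pair where the lower result B gets more clicks than the result A directly above it; the function name and both count lists are hypothetical.

```python
def find_inversions(clicks_by_rank):
    """Return (rank_a, rank_b) pairs where B, directly below A, has more clicks.

    Ranks are 1-based. Only adjacent pairs are compared, which is an
    assumption of this sketch.
    """
    pairs = []
    for i in range(len(clicks_by_rank) - 1):
        a, b = clicks_by_rank[i], clicks_by_rank[i + 1]
        if b > a:
            pairs.append((i + 1, i + 2))
    return pairs

# Hypothetical click counts per rank for two queries.
normal = [40, 20, 12, 8, 5]     # clicks decay with rank, as expected
inverted = [25, 10, 8, 35, 5]   # a jump at result four

print(find_inversions(normal))
print(find_inversions(inverted))
```

Once the inverted pairs are identified, the relevance judgments for A and B can be compared to see how often relevance actually explains the inversion.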
60:48
so something else is, right
60:51
and I think this looks very interesting:
if the relevance was
60:54
equal, you would expect that people would still
choose result A, right, because of the
60:59
idea that they examine from top to bottom,
right; if A and B are equally relevant,
61:02
well, they should have just clicked A, but
something weird is happening
61:07
does everybody understand the puzzle?
okay
61:12
so we tried to understand what are the
features of the snippets
61:17
that cause this sort of
inversion, right, that cause this
61:21
behavior change; well, there are some
obvious things, right
61:24
first, if the snippet is missing in the first
result, caption A, but is present in B,
61:29
people probably will not click
on a result that has no summary,
61:34
or if the snippet is too short,
right, for example if A
61:37
has a very short summary and B has a long
summary, and you can imagine,
61:42
you have this in your notes
61:48
yes to the dress sought after it okay so
you can look at it yourself, but you can
61:52
sort of imagine that we tried to come up
with reasonable features
61:55
to represent the summaries, or
snippets,
61:59
and, you know, to try to
figure out which of those features
62:03
are actually the most,
62:04
you know, responsible for these
inversions
62:09
so just going through and doing,
62:12
you know, for example, computing
chi-square, or other ways of
62:15
measuring what's happening:
62:17
for example, if the snippet is missing, not
surprisingly, or short, right,
62:21
most of the time this does
cause inversions
62:25
right, and this is sort of the chi-square
statistic,
62:28
right, it's a statistically
significant correlation between, again,
62:32
short snippets and missing snippets
and inversions
62:35
okay not surprising um things like the
62:38
that the a from it said
62:41
but, again, you have this table,
so let me just point out some,
62:45
like the appearance of
the term "official",
62:49
right, is actually a very good indicator, right;
if somebody
62:52
claims that this is the official
homepage of Microsoft or
62:56
whatever, right, then people believe it and
do actually click on that snippet,
63:00
right, so about sixty percent
of the time it will actually cause
63:04
them to prefer a lower-ranked result
63:06
and again, you can read through this
yourself; not surprisingly, having
63:10
images is useful, right; people are affected
by an image
63:16
one other interesting thing is we
started looking at whether,
63:19
for example, the snippet is readable, right,
whether the
63:22
summary is readable, right; so there
are
63:26
fairly simple ways of detecting
whether text is readable or not,
63:29
by looking at, say, whether the
text has very big words, et cetera,
63:34
and so there's Flesch and other
readability metrics,
63:38
and they actually correlate less
than expected, right; people don't seem to care
63:42
whether the snippet is easy to
read
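As an illustration of the kind of simple readability metric mentioned here, this is a minimal Python sketch of the Flesch reading ease score, 206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word). The syllable counter is a crude vowel-group heuristic chosen for this sketch, so the scores are approximate, and the two sample snippets are invented.

```python
import re

def count_syllables(word):
    """Crude heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch reading ease: higher scores mean easier text."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))

# Two made-up snippets: short common words versus long, "big" words.
easy = "The cat sat on the mat."
hard = "Comprehensive encyclopedic documentation facilitates institutional understanding."
print(flesch_reading_ease(easy), flesch_reading_ease(hard))
```

A score like this can be computed per snippet and then correlated with click-through in the same way as the other caption features.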
63:46
but what we did find is that there are
some important words in the snippet that
63:50
strongly correlate, that attract or repel
clicks
63:54
right, so terms like "encyclopedia",
"Wikipedia",
63:58
you know, occurring in the
summary,
64:01
tend to repel people, right; they actually have
a very strong negative correlation
64:06
with click-through; so if somebody sees a
Wikipedia result,
64:09
a Wikipedia result, even
though they tend to come up very high up,
64:12
right, because they have
64:13
good PageRank, et cetera, people tend to
actually avoid those
64:17
so why do you think that is
64:22
so let us focus on the first two, right
64:25
okay so they are needed because
64:36
right, I think, I'm sure you're saying
the same thing, but let me say it in my words:
64:40
if it's a navigational query for
Microsoft, I don't need to read the
64:43
Microsoft Wikipedia page, right
64:45
I just want to go to the Microsoft site,
right, so,
64:48
for example; but because, again,
Wikipedia has such a high PageRank,
64:53
you know, it will be there and
rank high no matter what, right
64:58
so it's the same with "encyclopedia"; "official",
like I mentioned,
65:02
people trust the search engine, so when
it says it's the official page, well,
65:05
okay, so they click on it; this is a really
interesting one
65:09
why does the word "and" attract clicks
65:14
okay go on more information
65:25
yes, that's correct; and did you read
the paper, or is it just that you guessed?
65:28
that's a very good guess,
65:29
very good guess; in fact that's exactly
the case; so a lot of search engines
65:33
have got this, have realized this, and
65:36
have generated summaries not
automatically but by using
65:39
places like Wikipedia or,
65:43
you know, right, the Open Directory,
which has descriptions, very nice
65:46
human-written summaries
65:48
of web sites, right; so based on that
65:51
a search engine can, instead of
generating summaries by mining, and we'll
65:54
talk about summary generation,
65:56
but in short, actually pull out a real,
nice, human-written description for
66:01
the site; of course people prefer those
66:02
by their beloved right instead of sort
of you know
66:05
term that term that term that K so
tourism attractions not surprising
66:11
interestingly, the word "free"
repels people, because
66:16
people learned that spammers tend to, you know,
like to include those,
66:19
right; and so we see these really
interesting things, like "medical",
66:23
not surprisingly, so basically
"MedlinePlus"
66:27
and "information", whatever; so if
you wanted to really attract
66:32
clicks, right, you would
generate a snippet that has, like,
66:35
official sexy information
66:37
about recycler right then everybody will
click on it
66:44
so on that note, let me summarize
66:47
so today I talked about the click models
proposed to simulate search clicks
66:52
they are increasingly sophisticated,
mathematically sophisticated;
66:56
what I showed only went
up until 2008; since then, and this year,
67:00
there are even more sophisticated
mathematical models of clicks
67:04
the issue is that the current click
models assume that searchers make rational
67:10
decisions; unfortunately they're not,
they're not rational or careful,
67:14
they skip results, they don't read them
carefully,
67:17
they're very influenced by the quality
of summaries, like the word "and"
67:20
or the words, you know,
67:22
whatever, right; so therefore, since
users are irrational, right,
67:25
unfortunately it is difficult to
perfectly predict
67:28
their click behavior
67:32
so the next lecture, of course, this
was the whole setup for it,
67:36
is that now that we understand a
little bit more about behavior, we are
67:41
going to now try, in the next lecture,
to extract
67:44
relevance information from this behavior,
so we can, for example, improve ranking,
67:48
do all kinds of useful things with it;
okay, so that's going to be the next
67:51
lecture: how to extract relevance
information
67:54
so I guess we have, I don't know, do we have,
let me just summarize, or
67:58
I guess you can read it yourself, right:
we talked about the theoretical models, also
68:01
talked about different levels of
behavior, and click models