Research in Ranking
With the considerable advances in technology over the past few years,
information overload has become a serious problem in our lives. In a world that
continually asks for more space to keep more information, an efficient method
for recalling information is an everyday necessity for all computer users.
Activation Based Ranking fulfils that need. It is not simply another attempt
at a desktop search engine, but a flexible personal ranking technology that can
be used in a variety of information management tasks.
Hebbian develops a search and retrieval system that applies the
latest advances in the study of human-computer interaction and human memory
processes to various aspects of computer information management. This technology
centres on Activation Based Ranking, which ranks search results based on how
well the information is "remembered".
Human thought processes are extremely complex and are still far from being
understood. We do not claim that Activation ranking exactly matches the
functions of the human brain. However, by applying the same general principles
that are hypothesized to govern human thought processes, the technology used by
Hebbian Recall treats the usage of documents on a computer as a parallel to
human memory processes such as practice and forgetting. In effect, users
naturally rank their own documents simply through the everyday use of the
computer.
About Activation Based Ranking
Traditional ranking algorithms compare the text in the query with the text in
the document, and then rank documents solely on similarity or dissimilarity
between the two. However, this is a difficult task because of the ambiguity of natural
language texts. Another common way of ranking documents is arranging by
attributes (or metadata). A typical example is sorting files by name or date.
This method is efficient only when the volume of information is fairly small or
the user is well organized.
However, if you provide people with keywords and ask them to recall documents
based on those keywords, their results will not be based on the frequency of
those words occurring in a particular document. Instead, they will rank
documents based on how useful a document was to them, how much effort they spent
on it, and when they last worked on it. All of these factors merge together to
form what human memory researchers refer to as activation.
Activation is well known in the field of neuropsychology. Activation of a
particular piece of information reflects the degree of the user's past
experience with this information in association with a current context. It
indicates how useful this information is at the current moment. In simple terms,
things that you recall off the "top of your head" in a particular situation are
the most activated information items associated with the context of that
situation.
When information becomes easier to access within the brain as a result of having
been used recently, it is more activated. Activation is tightly coupled with
remembering (probability of recall) and forgetting.
There are several components of activation:
- base level activation = depends on the strength of practicing the
information (how much we have used it) and the recency of the information (how
long since we have used it)
- partial matching activation = depends on how well the information we are
trying to recall matches the clues that we have
- context OR distributed activation = items which are activated because of the
current context or elements of the goal can "spread" their activation to related
items
For example, if someone were to imagine a tree, the tree that appears in
their mind will either be a tree that they have seen recently in their backyard,
or a tree that carries certain significance in their life. If you have just read
an article about maple syrup, however, a maple tree may override other trees in
your memory because the information item for "maple" has already been partially
activated.
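The three components and the tree example above can be sketched in Python.
This is only an illustration, not Hebbian's implementation: the Item fields,
the simple additive combination, the unit spreading weight and every numeric
value are assumptions chosen to make the example concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    base_level: float                            # practice and recency (given, not computed here)
    keywords: set = field(default_factory=set)   # what the item can be matched on
    related: set = field(default_factory=set)    # context elements associated with the item


def partial_match(item, cues):
    """Fraction of the recall cues that the item matches (0.0 to 1.0)."""
    if not cues:
        return 0.0
    return len(item.keywords & cues) / len(cues)


def context_activation(item, context, spread=1.0):
    """Activation spread to the item by each related element of the context."""
    return spread * len(item.related & context)


def total_activation(item, cues, context):
    # Assumption: the three components simply add up.
    return item.base_level + partial_match(item, cues) + context_activation(item, context)


def recall(items, cues, context):
    """Name of the most activated item for the given cues and context."""
    return max(items, key=lambda it: total_activation(it, cues, context)).name


# Hypothetical values: the backyard oak was seen recently, so its base level is higher.
oak = Item("backyard oak", base_level=1.2, keywords={"tree", "oak"}, related={"backyard"})
maple = Item("maple tree", base_level=0.6, keywords={"tree", "maple"}, related={"maple syrup"})

print(recall([oak, maple], {"tree"}, set()))            # backyard oak
print(recall([oak, maple], {"tree"}, {"maple syrup"}))  # maple tree
```

With no context, the recently seen oak wins on base level alone. Once "maple
syrup" is in the context, the activation it spreads lifts the maple tree above
the oak.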
ACT-R and SOAR are just two of a number of well-developed academic theories
that model activation as a part of human cognition and thought processes.
On a graph of an item's activation over time, the spikes correspond to the
moments when the user "practiced" or "used" the particular item of information.
The size of the spike is
proportional to the depth of processing experienced by the user, as well as the
effort they exerted. After each practice point, activation decays due to time or
other interference.
When the activation of an item drops below a certain point, the item is
considered "forgotten". This decay is very fast at first and then slows down
with time to a plateau (hence the label "negatively accelerating curve"). That
is why we "forget" a lot of information quickly (the steep part of the curve)
but can still recall the information if given the proper cues. The information
is not
truly forgotten; rather it is simply lying dormant with a low level of
activation. Once retrieval or contextual cues are given, the activation of the
item increases, and the item can thus be recalled. It is important to note also
that practicing has a cumulative effect. Frequent practicing leads to a less
steep decay curve and higher residual activation.
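The decay behaviour described above can be sketched with the base-level
learning equation from ACT-R, B = ln(sum of t_j^-d), where t_j is the time
since the j-th practice and d is the decay rate (0.5 is ACT-R's conventional
default). The source does not say that Hebbian uses this exact formula. The
per-practice weight (standing in for the size of a spike), the time units and
the "forgotten" threshold are assumptions made for illustration.

```python
import math

DECAY = 0.5  # ACT-R's conventional default decay rate


def base_level(practices, now, decay=DECAY):
    """Base-level activation at time `now`.

    practices: list of (time, weight) pairs. The weight is an assumed stand-in
    for spike size (depth of processing and effort); use 1.0 for a plain practice.
    """
    total = sum(w * (now - t) ** -decay for t, w in practices if now > t)
    return math.log(total) if total > 0 else float("-inf")


def is_forgotten(practices, now, threshold=-1.5):
    """True when activation has dropped below a (hypothetical) threshold."""
    return base_level(practices, now) < threshold


single = [(0, 1.0)]                           # practiced once at time 0
frequent = [(0, 1.0), (10, 1.0), (20, 1.0)]   # practiced three times

for now in (1, 4, 25, 100):
    print(now, round(base_level(single, now), 3))

print(is_forgotten(single, 100), is_forgotten(frequent, 100))
```

For the single practice, activation falls steeply at first and then flattens
out. The item practiced three times keeps a higher residual activation at the
same point in time, which is the cumulative effect of practice.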
Application of Activation Principles to Enterprise Search
Models for human memory have been based on empirical results, which have been
observed and documented over several decades and with thousands of human
participants.
One of the best-known methods for testing this pattern of human memory
is the Word Recall Task. In this task, researchers present participants with
test sentences, followed by separate words from the sentence. For each word, the
participant is asked to recall the original sentence, and activation is measured
by how difficult it was to remember the sentence.
The presentation of the sentences is the practice of information, recall based
on the words given is the partial matching component, and previously recalled
words and sentences form the activation context.
This same technique is used in Hebbian products to rank and search a user's
documents. A document is treated similarly to the sentence in the above
example, with words from a query used for the partial match and other activated
documents or specific task goals used as a context. In order to build a complete
activation model, depth and strength of processing have to be estimated. This
estimation is based on one of the following stages in human memorization:
Attention -> Encoding -> Storage -> Retrieval
The processes involved in encoding, storage and retrieval are functions
internal to the brain and are therefore difficult to quantify. However, user
attention to a particular information item can be measured and approximated.
Users can pay attention to a particular item of information by passively viewing
or hearing it, by actively interacting with it (editing a document, for
example), or by simply thinking about it without having it in front of them. The
last case is not measurable and is not essential for our approximation.
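One way to turn observed attention into a practice strength is to weight each
observed event by how active it was. The sketch below is an assumption, not the
model used in Hebbian products: the event kinds, the weights and the use of
duration in seconds are all hypothetical.

```python
# Hypothetical weights: active interaction counts more than passive exposure.
ATTENTION_WEIGHTS = {"view": 1.0, "listen": 1.0, "edit": 3.0}


def attention_estimate(events):
    """Estimate attention paid to one item.

    events: iterable of (kind, seconds) pairs observed for the item.
    Kinds that cannot be observed (such as thinking about the item away
    from the computer) have no weight and contribute nothing.
    """
    return sum(ATTENTION_WEIGHTS.get(kind, 0.0) * seconds for kind, seconds in events)


# One minute of reading plus thirty seconds of editing.
print(attention_estimate([("view", 60), ("edit", 30)]))
```

The resulting number could serve as the size of a practice "spike" for the
item at the time of the activity.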
The ranking method used is universal and can be applied to any kind of
document (or more generally any information presentable to the user, such as a
user interface element, fragments of documents, individual words, MP3 songs or
movie characters). It does not depend on the document content, type of user
activity, or user working habits and does not require specific actions from the
user.
While our ranking is a simplified approximation of the processes that may
happen in a human brain, it is based on tried-and-tested principles of
practicing and forgetting. Given the wide range of information contained in a
document (from texts to landscape photos) and the fact that computers are not
telepathic or intelligent beings that share the user's values, we do not see any
better way of modelling document activation that is sufficiently generic.
Activation calculations can be improved for specific types of documents,
specialized texts or tasks by classifying or understanding the document
content, context or user intentions, but none of these methods is universal or
accurate enough in the current state of computer science.
Research in Attention Tracking
Human factors research traditionally deals with human-computer interaction.
There are well-established laws that describe how easy or difficult it is to
perform certain actions at a computer, and these principles are used
successfully in user interface design. In our research we reversed this: by
observing user behavior, we estimate how much effort the user had to spend to
perform certain actions. We have also built a number of probabilistic models
that estimate the amount of attention the user paid to particular information
items on the screen. These estimates incorporate findings from eye-tracking
studies and human attention research.
Applications of Activation Based Ranking Technology
Search and retrieval, as implemented in Hebbian Recall, is an obvious use of
activation data and technology. However, activation-based ranking can also be
applied to a wide range of other applications, which we are currently developing.
These include:
User Context
Given a fixed number of a user's most activated
documents (the ranking determined by their individual activation), a context can
be created to accurately predict a user's probable focus of interest. Because
the activation of the documents is constantly changing, the context will change
as necessary, and will be completely maintenance-free. This context information
could be further used to personalize web searches, adjust user interfaces,
organize information, and for other applications.
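Building such a context amounts to taking the N most activated documents at the
current moment. A minimal Python sketch follows. The document names and
activation values are made up, and the function name is hypothetical.

```python
import heapq


def user_context(activations, n=3):
    """Return the names of the n most activated documents, highest first.

    activations: dict mapping document name -> current activation.
    """
    return heapq.nlargest(n, activations, key=activations.get)


# Hypothetical snapshot of current activations.
activations = {"budget.xls": 1.5, "notes.txt": 0.9, "old_report.doc": -0.4, "memo.doc": 0.2}
print(user_context(activations, n=2))  # ['budget.xls', 'notes.txt']
```

Because the function is evaluated against whatever the activations are at call
time, the context follows the user's work without any manual upkeep.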
Collaboration
Activation data and user context can be exchanged,
collected and mined to improve collaboration over intranets and the Internet.
Possible uses include:
- popularity feedback and ranking of online resources
- enterprise search, ranking of informational assets
- discovery of people and communities with similar interests
- capturing and sharing the best resources used by experts in the
enterprise
Storage Management
Backing up and moving data to maximize storage
capacity is a common practice throughout the information technology field.
Activation ranking provides a superior mechanism for storage management. It can
be used for both individual users and shared storage, and also provides an
effective algorithm for cache management.
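As an illustration of the cache management idea, the sketch below evicts the
least activated entries first. It is an assumed policy written for this
example, not a description of a Hebbian product. The function name and the
treatment of entries with unknown activation (evicted first) are hypothetical
choices.

```python
def evict_least_activated(cache, activations, max_items):
    """Remove the least activated keys until the cache holds at most max_items.

    cache: dict of key -> cached data (modified in place).
    activations: dict of key -> current activation; missing keys go first.
    Returns the evicted keys in eviction order.
    """
    evicted = []
    while len(cache) > max_items:
        victim = min(cache, key=lambda k: activations.get(k, float("-inf")))
        evicted.append(victim)
        del cache[victim]
    return evicted


cache = {"a": b"...", "b": b"...", "c": b"..."}
print(evict_least_activated(cache, {"a": 0.5, "b": -1.0, "c": 2.0}, max_items=2))  # ['b']
```

The same ordering could decide which files are moved to slower or off-line
storage first.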
Device Synchronization and Priority Information Transmission
Activation data will also allow much smarter data and file synchronization. For
example, a user can specify a particular query or a few key attributes as clues
for essential data to be synchronized to his or her PDA. Using base
activation, partial matching, associative activation and the total size of the PDA
memory, this technology can create a set of the most relevant documents that the
user can then review and transfer to the PDA.
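The selection step can be sketched as a greedy pass over documents ordered by
activation, adding each one that still fits in the device memory. This is an
assumption about how such a set might be built, not the product's algorithm.
The activation values are taken as already combining base, partial matching
and associative activation, and all names and sizes are hypothetical.

```python
def select_for_sync(docs, capacity):
    """Pick documents to synchronize, most activated first, within capacity.

    docs: list of (name, activation, size) tuples; size and capacity share a unit.
    Returns the chosen names in order of decreasing activation.
    """
    chosen, used = [], 0
    for name, _activation, size in sorted(docs, key=lambda d: d[1], reverse=True):
        if used + size <= capacity:   # skip anything that no longer fits
            chosen.append(name)
            used += size
    return chosen


docs = [("report", 2.0, 400), ("video", 1.8, 900), ("notes", 1.0, 100), ("old", -1.0, 50)]
print(select_for_sync(docs, capacity=600))  # ['report', 'notes', 'old']
```

A greedy pass is simple and predictable, though it is not guaranteed to find
the best possible set for a given capacity. The user would still review the
proposed set before transfer, as described above.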