This is my master's thesis topic, with the full title "Extending the Artificial Psychosocial Framework with an emotional memory system simulating semantic knowledge, episodic memory and short-term memory". The codebase is on GitLab, and the defense video is on YouTube.
AI is a critical part of games, though implementations vary from game to game. Most game AIs are functional and hard-coded with emotions. One exception is The Sims, where sims actually simulate their emotions according to game events. The Artificial Psychosocial Framework (APF) proposes a framework for simulating AIs with emotions, personalities, and relationships between NPCs and objects, NPCs and events, and NPCs and NPCs. Based on this simulation method and APF, I propose this memory system to enable game AIs to remember past game events and states, generate semantic knowledge, retrieve memories with fallibility, and generate future prospects of current events based on memories, which feed back into APF to generate certain emotions.
This extended memory system, combined with APF, is really a starting point for NPCs to reflect on and understand their experiences and the game world. They could reason about current scenarios, re-evaluate memories to evolve their feelings and characteristics, and eventually build a deeper connection with the player.
The memory system has self-update logic and is composed of two parts: short-term memory (STM) and long-term memory (LTM). The STM is a looped list with limited space, mimicking people's short-term memory. It could forget and mutate as time elapses, and enhance memories when similar memory traces happen repeatedly. The STM could consolidate a memory trace into the LTM based on custom criteria, like emotion saliency. The LTM contains two parts: the semantic graph and episodic memory. The semantic graph is a directed graph where the nodes are game states and the edges are game events. It could forget and mutate with time, and combine similar nodes or edges to generate semantic knowledge about the cause-and-effect relationships between them. Episodic memory stores the actual sequences of states and events from the past. It could forget and mutate with time, update the semantic graph, and be retrieved through reconstruction in LTM.
Some basic functions are also supported inside the memory system, like customizable function pointers for portability and query functions to search for any combination of memory traces in both STM and LTM.
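As a rough structural sketch, the system could be composed as follows. All type and member names here are hypothetical illustrations, not the actual thesis API:

```cpp
// A minimal structural sketch of the memory system described above; names are
// hypothetical and the real implementation differs.
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Memory { std::string description; };   // owner states + sensed states + mental states
struct Action { std::string name; };          // a game event

// A memory trace alternates memory, action, memory, ..., memory.
struct MemoryTrace {
    std::vector<Memory> memories;             // size n
    std::vector<Action> actions;              // size n - 1
};

// Customizable hooks keep the system portable across games.
using SimilarityFn    = std::function<float(const Memory&, const Memory&)>;
using ConsolidationFn = std::function<bool(const MemoryTrace&)>;  // e.g. emotion saliency check

struct SemanticGraph { /* nodes = game states, edges = game events */ };

struct ShortTermMemory {
    std::vector<MemoryTrace> slots;            // looped list with limited space
    std::size_t capacity = 8;
};

struct LongTermMemory {
    SemanticGraph graph;                       // semantic knowledge
    std::vector<MemoryTrace> episodes;         // episodic memory
};

struct MemorySystem {
    ShortTermMemory stm;
    LongTermMemory  ltm;
    SimilarityFn    memorySimilarity;          // injected by the game
    ConsolidationFn shouldConsolidate;         // custom consolidation criterion
};
```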
There are two types of entities in the artifact: NPCs (the player is a special kind of NPC) and objects (non-NPCs). An NPC could have attitudes toward objects, praise toward actions, and social relations with NPCs or players. Objects and NPCs could have states describing their positions, emotions, etc. For the memory system, memories and actions are the only two forms of information. The artifact is currently mostly string-based and could easily be extended with game-specific information. Some of these definitions are derived from APF.
State: a state currently only includes a string and an intensity, and is used to describe NPCs and objects; for example, a cricket could be contained or moving. The combination of all states of all objects and NPCs forms the game states.
Object: every object has its own states; currently they are just multiple strings with intensities, but they could be extended to include other game information like positions, factions, etc.
NPC: NPC is derived from Object, but its states include its mental states and current action as well. Each NPC has one memory system and also keeps tracking the game states it senses, which could possibly bring in the Theory of Mind.
Action: in addition to the five parameters required by APF (owner, actor, action, patient, and certainty), I introduced the concepts of adjectives and tags.
Action adjectives: with string-based action definitions, the intent of actions couldn't be conveyed. Take the action Kill: Kill for money may be evil for most NPCs, but Kill for saving people could be acceptable for some. To resolve this, an action could have multiple adjectives, each with a delta valence value: for money would have a negative delta and for saving people a positive one, making Kill less or more acceptable correspondingly. (In this artifact, no action adjectives are actually used yet.) Adjectives could also solve some other issues of string-based definitions. For example, an "A sends a gift to B" action would generate exactly the same emotions for NPC A and B if their profiles are the same; adding initiator and subject adjectives could solve this problem.
Action tags: tags are used to compare the similarity between two actions. For example, eat and drink should be similar to some extent, as NPCs could both drink energy drinks and eat food to recover or heal, so both actions would own the same tag Food. For now, Jaccard similarity is used for tag comparison; a small sketch follows these definitions.
Memory: one memory includes the states of the memory's owning NPC, its sensed game states, and its mental states. The similarity comparison in the artifact is derived from Jaccard similarity, taking states' intensities into consideration.
Memory trace: a memory trace refers to an ordered list in the form memory, action, memory, action, ..., memory. This structure is commonly used for storing retrieval results from the memory system.
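Here is a minimal sketch of these similarity measures. The intensity-weighted variant of Jaccard shown here is one plausible interpretation, not the exact formula used in the thesis, and all names are illustrative:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Plain Jaccard similarity over action tags, e.g. Eat {Food, Recover} vs
// Drink {Food, Liquid} share 1 of 3 distinct tags -> similarity 1/3.
float tagJaccard(const std::set<std::string>& a, const std::set<std::string>& b) {
    if (a.empty() && b.empty()) return 1.0f;
    std::size_t intersection = 0;
    for (const auto& tag : a)
        if (b.count(tag)) ++intersection;
    return static_cast<float>(intersection) /
           static_cast<float>(a.size() + b.size() - intersection);
}

// A memory holds three groups of states, each a map from state label to intensity.
struct Memory {
    std::map<std::string, float> ownerStates;
    std::map<std::string, float> sensedGameStates;
    std::map<std::string, float> mentalStates;
};

// Intensity-weighted Jaccard over one state map:
// sum of min intensities over the label union / sum of max intensities.
float weightedJaccard(const std::map<std::string, float>& a,
                      const std::map<std::string, float>& b) {
    float minSum = 0.0f, maxSum = 0.0f;
    auto ai = a.begin();
    auto bi = b.begin();
    while (ai != a.end() || bi != b.end()) {
        if (bi == b.end() || (ai != a.end() && ai->first < bi->first)) {
            maxSum += ai->second; ++ai;                 // label only in a
        } else if (ai == a.end() || bi->first < ai->first) {
            maxSum += bi->second; ++bi;                 // label only in b
        } else {
            minSum += std::min(ai->second, bi->second); // shared label
            maxSum += std::max(ai->second, bi->second);
            ++ai; ++bi;
        }
    }
    return maxSum > 0.0f ? minSum / maxSum : 1.0f;
}

// Memory similarity could then combine the three state groups, e.g. an average.
float memorySimilarity(const Memory& a, const Memory& b) {
    return (weightedJaccard(a.ownerStates,      b.ownerStates) +
            weightedJaccard(a.sensedGameStates, b.sensedGameStates) +
            weightedJaccard(a.mentalStates,     b.mentalStates)) / 3.0f;
}
```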
The picture above shows the flow of a game event inside the memory system. When a new game event happens or should happen, the memory system retrieves memory traces of similar game states and events based on the current game states and this new event. Future prospects are generated from these traces, which depend on the AI's personality and current moods. The AI's actual emotion is the combination of the prospects and the raw emotion generated from APF. Alongside emotion generation, the AI could select the best action from all possible actions in the retrieved traces based on its unique settings. The actual new event, with this selected action, and the game states, with the actual emotion, are then remembered by the memory system. Lastly, this actual event is fired by the NPC.
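To make the flow concrete, here is a compact, self-contained sketch. The names, the one-dimensional emotion, and the fixed prospect weight are simplifying assumptions, and action selection is omitted:

```cpp
#include <string>
#include <vector>

struct Emotion   { float valence = 0.0f; };   // simplified to a single value
struct GameEvent { std::string action; };
struct MemoryTrace {
    std::vector<GameEvent> events;
    Emotion felt;                              // emotion stored with the trace
};

struct MemorySystem {
    std::vector<MemoryTrace> traces;

    // Retrieve traces containing an event similar to the incoming one
    // (string equality stands in for the real similarity functions).
    std::vector<MemoryTrace> query(const GameEvent& e) const {
        std::vector<MemoryTrace> result;
        for (const auto& t : traces)
            for (const auto& past : t.events)
                if (past.action == e.action) { result.push_back(t); break; }
        return result;
    }
    void remember(const GameEvent& e, const Emotion& felt) {
        traces.push_back({{e}, felt});
    }
};

// 1) retrieve similar traces, 2) build a prospect from them, 3) blend the
// prospect with the raw APF appraisal, 4) remember the actual outcome.
Emotion handleNewEvent(MemorySystem& memory, const GameEvent& incoming,
                       const Emotion& rawFromApf, float prospectWeight = 0.5f) {
    std::vector<MemoryTrace> similar = memory.query(incoming);

    Emotion prospect;
    for (const auto& t : similar) prospect.valence += t.felt.valence;
    if (!similar.empty()) prospect.valence /= static_cast<float>(similar.size());

    Emotion actual;
    actual.valence = prospectWeight * prospect.valence
                   + (1.0f - prospectWeight) * rawFromApf.valence;

    memory.remember(incoming, actual);
    return actual;
}
```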
Fig. 2 Hierarchy of emotions in OCC model (Steunebrink et al.)
Emotions and memories actually affect each other. Some emotions, like hope and fear, depend on prospects of the future, which are generated from memories of similar pasts. Memories, on the other hand, are heavily influenced by emotions; for example, extremely happy or sad memories are remembered much longer than memories with neutral emotions. More theoretical discussion on emotions and memories could be found in Lim and Juzheng.
The memory-dependent emotions are under the CONSEQUENCE (OF EVENT) branch in the left graph. Also, this artifact only includes one NPC, so implementation of emotions related to other NPCs or players is omitted. Therefore, only positive, negative, pleased, displeased, hope, fear, joy, distress, satisfaction, fears-confirmed, relief, disappointment, gratification and remorse are the focus of the artifact. Whenever a new event happens for the NPC, a future prospect of game states and emotions is generated in the memory system, and the actual emotions are the combination of the prospect emotions and the direct emotions appraised by APF.
For emotions affecting memories, emotions usually take effect in the form of moods. Moods are more general and long-lasting than feelings, like the second layer in the left picture: pleased, displeased, approving, disapproving, liking and disliking. For example, the mood-congruent and mood-dependent effects could make memories with moods similar to the current mood more reliably recalled. In the artifact, high mood saliency helps memories in STM and LTM last longer and be retrieved more easily and accurately. Memory traces with moods similar to the NPC's current moods are prioritized and valued more for prospect generation.
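One possible way to express this mood-congruent prioritization, shown purely for illustration and not as the thesis' exact formula, is to weight each retrieved trace by how close its stored mood is to the NPC's current mood:

```cpp
#include <cmath>

// Moods simplified to a single valence axis in [-1, 1].
struct Mood { float valence = 0.0f; };

// Weight in [0, 1]: 1 when the trace's mood matches the current mood exactly,
// falling off linearly as the moods diverge.
float moodCongruenceWeight(const Mood& traceMood, const Mood& currentMood) {
    const float distance = std::fabs(traceMood.valence - currentMood.valence); // in [0, 2]
    return 1.0f - 0.5f * distance;
}
```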
The memory system is based on the APF library, which is responsible for game event appraisal and for recording all attitudes, praises, social relations, mental states, objects, NPCs and events.
The query mechanism for the memory system is based on the Command pattern: a query list is constructed, and all similar memory traces found in the STM, the semantic graph and the episodic memory are returned. A query list is a list of query nodes placed in the chronological order of the memory trace, and there are two kinds of nodes: memory nodes and event nodes. Each node includes the function pointer used to calculate similarity and the target to compare against.
A standard query list is M1 -> E1 -> M2 -> E2 -> ..., where M stands for a memory node and E for an event node. When searching inside the memory system, a returned trace has to satisfy the similarity requirement of every node: for every Mi and Ei, the similarity calculated by the function pointer against the corresponding target should meet the requirement.
Small jumps are also supported in the artifact. In other words, query lists of the form M1 -> M2 -> ... and E1 -> E2 -> ... are valid. The M2 node directly checks the next memory and regards any connecting actions as valid; the E2 node directly checks the next action and regards any memory in between E1 and E2 as valid.
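A sketch of this query structure follows. The names are hypothetical, and the real search also handles the small jumps above, while this simplified check just aligns query nodes with trace positions one to one:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

struct Memory { std::string description; };
struct Event  { std::string action; };

// Each query node carries a target, a similarity function pointer and a threshold.
template <typename T>
struct QueryNode {
    T target;
    std::function<float(const T&, const T&)> similarity;
    float threshold = 0.5f;
};

// A query list is an ordered list of memory and event nodes, e.g.
// M1 -> E1 -> M2 -> E2 -> ...; small jumps simply leave out the nodes in between.
using QueryItem = std::variant<QueryNode<Memory>, QueryNode<Event>>;
using QueryList = std::vector<QueryItem>;
using TraceItem = std::variant<Memory, Event>;

// A candidate trace satisfies the query when every node's similarity against
// its target meets that node's threshold.
bool satisfies(const QueryList& query, const std::vector<TraceItem>& trace) {
    if (trace.size() < query.size()) return false;
    for (std::size_t i = 0; i < query.size(); ++i) {
        const bool ok = std::visit(
            [&](const auto& node) -> bool {
                using T = std::decay_t<decltype(node.target)>;
                const T* candidate = std::get_if<T>(&trace[i]);
                return candidate != nullptr &&
                       node.similarity(*candidate, node.target) >= node.threshold;
            },
            query[i]);
        if (!ok) return false;
    }
    return true;
}
```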
Long-term memory could be divided into implicit memory and explicit memory. Explicit memory (declarative memory) could be divided into episodic memory, semantic memory and autobiographical memory. Episodic memory is for specific events that happened at specific times, while semantic memory refers to knowledge about factual information. For example, "when hungry, eating a sandwich makes the hunger disappear" is a semantic memory, but "on Feb. 1st I ate a sandwich from Shake Shack and was full afterwards" is an episodic memory.
For the memory system I propose, only semantic memory and episodic memory are implemented for LTM.
The semantic graph in LTM is a directed graph where the nodes are game states (or memories) and the edges are game events (or actions), derived from Li et al. (2013). The picture on the right shows the structure. It could forget and mutate as time elapses, which means the certainty of events and memories and the emotion saliency decay or mutate as the NPC lives longer. It could also combine similar nodes or edges to generate semantic knowledge about the cause-and-effect relationships between them, which works like deductive learning for humans. Currently in the artifact, searching based on BFS and a QueryList is supported, with multiple versions of generating mutated MemoryTraces.
The nodes contain game states, the states of objects, NPCs and players in the game, and the emotions of the NPC owning this memory system. Each node's forgetting and mutation are based on its own age inside the memory system. The emotions are the latest 14 emotions the NPC has for this node's game states, and a rolling average is used for any emotion calculation in LTM. Nodes could be blank (shown as the greyed M5): the NPC couldn't recall the exact content of blank nodes, but could infer from similar nodes in the graph and make up a reasonable guess. Nodes could also be saturated (shown as the red M2): the NPC will never forget the content of such a node from then on, even if the confidence in its content drops very low.
The edges contain a probability and the happening times, keeping track of how many times this specific game event happened between its cause and effect nodes in LTM. In the picture above, thicker edges indicate a higher probability.
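A sketch of the node and edge data described above, with field names and layout as assumptions:

```cpp
#include <map>
#include <string>
#include <vector>

constexpr int kEmotionCount = 14;  // the 14 OCC emotions tracked per node

struct SemanticNode {
    std::map<std::string, float> gameStates;   // object/NPC states, label -> intensity
    float emotions[kEmotionCount] = {};        // rolling averages of the latest emotions
    float confidence = 1.0f;                   // decays with the node's age
    float age        = 0.0f;                   // time the node has lived in LTM
    bool  blank      = false;                  // content forgotten, only guessable from neighbours
    bool  saturated  = false;                  // never forgotten, even at low confidence
};

struct SemanticEdge {
    int   from = -1, to = -1;                  // indices of cause and effect nodes
    std::string action;                        // the game event on this edge
    int   happenedTimes = 0;                   // how often this exact transition occurred
    float probability   = 0.0f;                // relative to the cause node's outgoing edges
};

struct SemanticGraph {
    std::vector<SemanticNode> nodes;
    std::vector<SemanticEdge> edges;
};
```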
One episodic memory is actually a finite memory trace, where specific events and states are stored in the exact chronological order in which they happened. It could mutate and decay with time, even decaying until it is totally faded and deleted from the memory system. It has two major features: reconstructing retrieval and updating the semantic graph.
For reconstructing retrieval, it picks a similar route in the semantic graph and replaces the nodes or edges that satisfy the replacement requirements. For example, if the similar route is M1 -A1-> M2 -A3-> M3 (as indicated in Fig. 3) and the M1 node meets the replacement requirements, then the retrieved memory trace is M1' -A1-> M2 -A3-> M3. This replacement could be further expanded, so that a positive NPC would tend to remember past episodes as happy and bright, replacing the pessimistic game states.
For updating the semantic graph, it picks a similar route in the semantic graph and updates the game states or mental states according to game-specific needs, including re-appraisal and re-experiencing. This updating could be triggered automatically when the episodic memory is retrieved and activated, according to time elapsing, or whenever the game needs it.
Fig. 4 Episodic Memory Structure
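A minimal sketch of the reconstructing retrieval, assuming a replacement predicate and a replacement generator supplied by the game (all names are illustrative):

```cpp
#include <functional>
#include <string>
#include <vector>

struct EpisodeNode {
    std::string gameState;   // e.g. "cricket_contained"
    float confidence = 1.0f; // low confidence makes a node replaceable
};

using Route = std::vector<EpisodeNode>;

// Walk a route similar to the stored episode and swap in replacement nodes
// wherever the replacement requirement is met (e.g. M1 becomes M1').
Route reconstruct(const Route& similarRoute,
                  const std::function<bool(const EpisodeNode&)>& shouldReplace,
                  const std::function<EpisodeNode(const EpisodeNode&)>& replacement) {
    Route result = similarRoute;
    for (EpisodeNode& node : result)
        if (shouldReplace(node))
            node = replacement(node);
    return result;
}
```

With a predicate like "confidence below some threshold" and a replacement that picks a brighter similar node, a positive NPC would recall sunnier versions of its past, as described above.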
Short-term memory is meant for holding a limited quantity of information for a short period of time. Therefore, the STM in the memory system is a looped list with limited space, so that the latest memories and actions automatically overwrite the oldest ones. STM is stored in strict memory, action, memory, action... order, keeping the cause-and-effect relations between actions and memories.
STM could mutate and decay with time, and its memories and actions could decay until they are totally invalid.
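A sketch of the looped list, assuming a fixed-capacity ring buffer of alternating memory/action entries (names and the decay rule are illustrative):

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct StmEntry {
    std::string content;     // a memory or an action, kept string-based here
    bool  isAction = false;
    float strength = 1.0f;   // decays over time; may decay to invalid
};

class ShortTermMemory {
public:
    explicit ShortTermMemory(std::size_t capacity) : slots_(capacity) {}

    void push(const StmEntry& entry) {
        slots_[next_] = entry;                   // overwrite the oldest slot
        next_ = (next_ + 1) % slots_.size();     // loop back when full
        if (count_ < slots_.size()) ++count_;
    }

    // Time-based decay; entries below the threshold become invalid (empty).
    void decay(float amount, float threshold = 0.1f) {
        for (auto& slot : slots_) {
            if (!slot) continue;
            slot->strength -= amount;
            if (slot->strength < threshold) slot.reset();
        }
    }

private:
    std::vector<std::optional<StmEntry>> slots_;
    std::size_t next_  = 0;
    std::size_t count_ = 0;
};
```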
Two critical parts of STM are consolidating memories from STM to LTM and enhancing memories and actions upon repetition.
Fig. 5 Short-term Memory Structure
The process of memory transfer from STM to LTM is called consolidation. A memory trace from STM could be consolidated automatically if it meets certain criteria, like the generated emotions being salient enough; in other words, an extremely traumatic memory would always be consolidated into LTM. Consolidation could also be triggered manually according to game needs; for example, the artifact always consolidates STM after each session.
For the artifact, I also update the praises, attitudes and delta valences of states inside the APF library during this consolidation phase, so that the NPC's attitudes toward objects and praises for actions are actually dynamic in-game. As a result, the attitude toward a disgusting bug could become better when the general impression of the experiences dealing with this bug is happy and good, producing effects like brainwashing.
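A sketch of the consolidation step, assuming a saliency threshold and a simple attitude nudge; the criteria, numbers and clamping are assumptions, not the exact APF update:

```cpp
#include <algorithm>
#include <map>
#include <string>

struct TraceSummary {
    float emotionSaliency = 0.0f;  // e.g. max absolute emotion intensity in the trace
    float averageValence  = 0.0f;  // overall pleasantness of the trace
    std::map<std::string, int> involvedObjects;  // object name -> occurrences
};

bool shouldConsolidate(const TraceSummary& trace, float saliencyThreshold = 0.8f) {
    // Extremely salient (e.g. traumatic) traces are always consolidated.
    return trace.emotionSaliency >= saliencyThreshold;
}

// Nudge the stored attitude toward each involved object in the direction of the
// trace's valence, clamped so the NPC cannot flip its character completely.
void updateAttitudes(std::map<std::string, float>& attitudes,
                     const TraceSummary& trace, float learningRate = 0.1f) {
    for (const auto& [object, occurrences] : trace.involvedObjects) {
        float& attitude = attitudes[object];
        attitude += learningRate * static_cast<float>(occurrences) * trace.averageValence;
        attitude = std::clamp(attitude, -1.0f, 1.0f);
    }
}
```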
The same sequence of memories and actions happening repeatedly enhances the memory trace's strength and makes it more likely to be consolidated into LTM. There are two commonly accepted repetition theories for human memory: the multiple trace hypothesis and the cumulative strength hypothesis. For this memory system, I use the multiple trace hypothesis: the more repetitions happen, the more traces are logged into the system, making this kind of trace easier to recall. For NPCs, this repetition enhancement could be based on time elapsing or other game-specific requirements. Special behaviors like spaced repetition could easily be extended from the current implementation.
For the current repetition-enhancement mechanism, the memory system queries both STM and LTM for the latest experienced memory trace in STM, and if the returned result satisfies the requirements, e.g. there are sufficiently many similar traces in the memory system, an enhancing process is performed on this latest memory trace. In the artifact, the enhancement polarizes the mental states, increases the certainty of the action, and increases the repetition count for this action, which is added to the happening times of the corresponding edges in the semantic graph if consolidated into LTM. This way, the specific trace is more likely to be recalled when action probability is the main focus.
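A sketch of this repetition check under the multiple trace hypothesis; the threshold and the polarization factor are assumptions:

```cpp
#include <algorithm>
#include <cstddef>

struct TraceRecord {
    float mentalStateValence = 0.0f;  // polarized toward -1 or +1 on enhancement
    float actionCertainty    = 0.5f;  // increased on enhancement
    int   repetitionTimes    = 1;     // added to edge happening times on consolidation
};

void enhanceIfRepeated(TraceRecord& latest,
                       std::size_t similarTracesFound,
                       std::size_t requiredRepetitions = 3) {
    if (similarTracesFound < requiredRepetitions) return;

    // Polarize the mental state: push it further toward whichever pole it leans to.
    latest.mentalStateValence =
        std::clamp(latest.mentalStateValence * 1.5f, -1.0f, 1.0f);

    // Make the action more certain and count one more repetition.
    latest.actionCertainty = std::min(1.0f, latest.actionCertainty + 0.1f);
    latest.repetitionTimes += 1;
}
```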
The artifact is a simulation of an insect phobia cured by graduated exposure, inspired by Jones and Friman (1999). The player acts as the boy Bob, who has a cricket phobia that severely interferes with his school performance, holding the implanted false belief that crickets will always hurt him however he interacts with them. The player needs to select exposure events of different fear levels, together with the resulting reinforcement events, and try to make Bob gradually get used to crickets as soon as possible.
The player could select five combinations of exposures and resulting reinforcements during one session, while monitoring Bob's emotions and his relationships with objects and actions. Consolidation happens only between sessions, making Bob re-appraise and adjust his attitudes toward objects and states and his praises for cricket-related memories and actions.
All the predefined data is saved in XML format and read into the artifact at startup. Some debug functions are also available, such as drawing the current long-term memory graph with Graphviz. A demo video is shown below.
Fig. 6 Example part of LTM graph after fully combined
Exposure events are events that require Bob to interact with crickets and get used to them. The events are listed from top to bottom in increasing order of severity for Bob (see Fig. 7), so that holding a cricket in both hands would make Bob really distressed, while holding a jar of crickets would make him less distressed.
Fig. 7 Exposure events
Reinforcement events happen after exposure events and include three kinds of events: receiving an actual reward, receiving no reward, and actually being hurt by crickets. The actual-reward events are listed from top to bottom in increasing order of liking for Bob (see Fig. 8), so that receiving a Lego would make Bob much happier than receiving a candy.
When no reward is given to Bob, he feels quite relieved and less fearful: he had the stereotype that crickets would always hurt him, and it turns out the crickets didn't, showing that pure exposure could sometimes work as a phobia cure. But when Bob is actually hurt by crickets after the exposure events, this stereotype is strengthened, and he hates crickets even more.
Fig. 8 Reinforcement events
There are many possible combinations of exposure events and reinforcement events, and many possible orders for arranging these combinations into sessions. Three representative cases for the artifact are listed below. The exposure event selections are the same, but the resulting reinforcement events differ between treatments. The vertical blue dividing lines indicate the different periods of exposure events: in sessions 0-5, the exposure is "Hold a jar of crickets"; in sessions 6-10, "Touch crickets with foot"; in sessions 11-15, "Close eyes and stand in a room with crickets"; in sessions 16-19, "Pick up cricket with a sheet of paper"; in sessions 20-22, "Pick up cricket with gloved hand".
Fig. 9 Different Treatments with Different Outcomes
The best graduated exposure with reinforcement treatment is to gradually increase the severity of exposure events while gradually decreasing how well-liked the reinforcement events are, shown as the dark blue line in Fig. 9. Because Bob consolidates his STM into LTM after each session and updates his attitudes and praises accordingly, keeping Bob in positive emotions for the whole session improves his impression of crickets and of these actions. And while the cricket-related impression improves, the reinforcement's impression gets worse, as an exchange between the feelings toward rewards and crickets. In other words, there is a transfer effect between the liking of objects that co-exist in the same period of time. Bob would be able to get rid of his fear of crickets and cricket-related events after 11 sessions. One problem is that Bob could come to like the crickets after the game. Some special clamping or updating methods could resolve such an issue, so that NPCs cannot be turned totally against what they were at the start of the game, keeping their characteristics to some degree.
It is also possible to have Bob get rid of the cricket phobia by purely graduated exposure without reinforcement, shown as the light blue line in Fig. 10: nothing happens, no rewards or hurting. Bob is afraid of crickets mainly because of the unrealistic, imaginary fear that crickets would hurt him badly. When Bob interacts with crickets safely and these fearful prospects don't come true, he feels quite relieved and much less fearful. This decrease in negative emotions and increase in positive emotions could make his attitude toward crickets turn better than before, though this improvement is much slower than when combined with reinforcements. There are also noticeably larger improvements in liking toward crickets at the start of each period of exposure events, because Bob was expecting far worse things to happen after exposure, given his implanted false belief that "crickets would always hurt him however he interacts with them", and it turns out that nothing happens, bringing him a lot of relief. In the following sessions with the same exposures, he would already be expecting nothing to happen, thus generating less relief.
Actually being hurt by crickets is also an option for reinforcement events, shown as the green line in Fig. 10. The general feeling for these experiences is quite negative, because both the prospect generated from memories and the direct feelings appraised by APF are negative. Actually being hurt by crickets resonates with Bob's initial impression and prospects, making this memory trace more significant and easier to recall. After consolidation, the probability that interacting with crickets leads to Bob getting hurt would increase. Therefore, Bob would fear crickets more, and it would be even harder to revert his fear of crickets in the future.
Fig. 10 With and Without Imagined Fear
The current pre-setting of Bob's imagined fear is the critical factor in his recovery from the cricket phobia, as he gradually disconfirms his imagined fear by finding out that nothing would happen eventually. But actually, this pre-defined imagined fear, or false belief, didn't exist until I found out that the no-reward cases weren't actually curing Bob.
There are periods of decline in the without-imagined-fear cases, and the general improvement of Bob's phobia is quite small compared to the with-imagined-fear cases, not reaching half of the curing effect. To achieve even this result, I actually made some calculation tweaks to maximize the effects of feelings like relief. In the without-imagined-fear cases, Bob doesn't have as much relief, as he doesn't expect negative things to happen, and these weaker positive feelings make Bob alter his attitudes toward crickets much more slowly. This also shows the difference between "merely not liking something" and "not liking something because of imagining bad things happening": Bob is implanted with the fear of getting hurt in the light blue case.
To better observe and test the artifact's features, two major debugging tools are used: text-based logging and graph-based drawing.
Text-based logging:
For in-game text logging, ImGui is used, printing the exact text information of the AI's emotions, personality and actions taken, as well as drawing line charts for emotions, praises and attitudes.
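For illustration, a minimal Dear ImGui panel of this kind could look like the following (the window name, emotion fields and history buffer are illustrative, not the thesis UI):

```cpp
#include <vector>
#include "imgui.h"

void drawEmotionDebugPanel(const char* npcName,
                           float joy, float distress, float hope, float fear,
                           const std::vector<float>& joyHistory) {
    ImGui::Begin("Memory System Debug");
    ImGui::Text("NPC: %s", npcName);
    ImGui::Text("joy %.2f  distress %.2f  hope %.2f  fear %.2f",
                joy, distress, hope, fear);
    if (!joyHistory.empty()) {
        // Line chart of joy over time, scaled to [0, 1].
        ImGui::PlotLines("joy over time", joyHistory.data(),
                         static_cast<int>(joyHistory.size()),
                         0, nullptr, 0.0f, 1.0f, ImVec2(0, 80));
    }
    ImGui::End();
}
```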
For logs recorded in game, I wrote a script for processing them into .csv files for post-analysis in Excel.
Graph-based drawing: the LTM semantic graph is drawn into .png files using Graphviz. One example is shown in Fig. 6.