Configuring Policies

The rasa.core.policies.Policy class decides which action to take at every step in the conversation.

There are different policies to choose from, and you can include multiple policies in a single rasa.core.agent.Agent.


By default, the agent can predict at most 10 next actions after every user message. To change this limit, set the environment variable MAX_NUMBER_OF_PREDICTIONS to the desired maximum number of predictions.
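As a minimal sketch of how such a limit could be read in Python, the snippet below parses the variable as an integer and falls back to 10 when it is unset. The helper name max_predictions is hypothetical and is not part of Rasa's API.

```python
import os

DEFAULT_MAX_PREDICTIONS = 10  # default stated in the text above


def max_predictions(environ=os.environ) -> int:
    """Return the prediction limit, falling back to the default when unset.

    `max_predictions` is a hypothetical helper used only for illustration.
    """
    return int(environ.get("MAX_NUMBER_OF_PREDICTIONS", DEFAULT_MAX_PREDICTIONS))


# Plain dictionaries stand in for the process environment here.
print(max_predictions({}))
print(max_predictions({"MAX_NUMBER_OF_PREDICTIONS": "20"}))
```

In a real deployment the variable would be exported in the shell that starts the assistant.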

Your project’s config.yml file takes a policies key which you can use to customize the policies your assistant uses. In the example below, the last two lines show how to use a custom policy class and pass arguments to it.

policies:
  - name: "KerasPolicy"
    featurizer:
    - name: MaxHistoryTrackerFeaturizer
      max_history: 5
      state_featurizer:
        - name: BinarySingleStateFeaturizer
  - name: "MemoizationPolicy"
    max_history: 5
  - name: "FallbackPolicy"
    nlu_threshold: 0.4
    core_threshold: 0.3
    fallback_action_name: "my_fallback_action"
  - name: "path.to.your.policy.class"
    arg1: "..."

Max History

One important hyperparameter for Rasa Core policies is the max_history. This controls how much dialogue history the model looks at to decide which action to take next.

You can set max_history by passing it to your policy’s featurizer in the policy configuration YAML file.


Only the MaxHistoryTrackerFeaturizer uses a max history, whereas the FullDialogueTrackerFeaturizer always looks at the full conversation history. See Featurization of Conversations for details.

As an example, let’s say you have an out_of_scope intent which describes off-topic user messages. If your bot sees this intent multiple times in a row, you might want to tell the user what you can help them with. So your story might look like this:

* out_of_scope
   - utter_default
* out_of_scope
   - utter_default
* out_of_scope
   - utter_help_message

For Rasa Core to learn this pattern, the max_history has to be at least 4.
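To see why a window of 4 is needed, the following sketch replays the story above with a deliberately simplified state representation: each state is a pair of previous action and latest intent. This representation, the STORY list, and the ambiguous_windows helper are assumptions made for illustration and do not reproduce Rasa's featurizers.

```python
from collections import defaultdict

# Each entry pairs a simplified state (previous action, latest intent)
# with the action that should follow it in the story.
STORY = [
    ((None, None), "action_listen"),
    (("action_listen", "out_of_scope"), "utter_default"),
    (("utter_default", "out_of_scope"), "action_listen"),
    (("action_listen", "out_of_scope"), "utter_default"),
    (("utter_default", "out_of_scope"), "action_listen"),
    (("action_listen", "out_of_scope"), "utter_help_message"),
]


def ambiguous_windows(story, max_history):
    """Return the windows of `max_history` states that map to several actions."""
    states = [state for state, _ in story]
    seen = defaultdict(set)
    for i, (_, action) in enumerate(story):
        window = states[max(0, i - max_history + 1): i + 1]
        # Left-pad short windows so every key has the same length.
        padded = (None,) * (max_history - len(window)) + tuple(window)
        seen[padded].add(action)
    return {w: a for w, a in seen.items() if len(a) > 1}


for k in (3, 4):
    print(k, "ambiguous" if ambiguous_windows(STORY, k) else "learnable")
```

With a window of 3, the states before the second utter_default and before utter_help_message look identical, so the two actions cannot be told apart; with a window of 4 every window is unique.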

If you increase your max_history, your model will become bigger and training will take longer. If you have some information that should affect the dialogue very far into the future, you should store it as a slot. Slot information is always available for every featurizer.

Data Augmentation

When you train a model, by default Rasa Core will create longer stories by randomly gluing together the ones in your stories files. This is because if you have stories like:

# thanks
* thankyou
   - utter_youarewelcome

# bye
* goodbye
   - utter_goodbye

You actually want to teach your policy to ignore the dialogue history when it isn’t relevant and just respond with the same action no matter what happened before.

You can alter this behavior with the --augmentation flag, which allows you to set the augmentation_factor. The augmentation_factor determines how many augmented stories are kept for training. The augmented stories are subsampled before training because their number can quickly become very large, and we want to limit it. The number of sampled stories is augmentation_factor x 10. By default, augmentation is set to 20, resulting in a maximum of 200 augmented stories.
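The arithmetic above can be sketched as follows. The function subsample_augmented and the generated story names are hypothetical; only the factor-times-ten limit comes from the text.

```python
import random


def subsample_augmented(augmented_stories, augmentation_factor, seed=None):
    """Keep at most augmentation_factor * 10 of the glued stories."""
    limit = augmentation_factor * 10
    if len(augmented_stories) <= limit:
        return list(augmented_stories)
    # Draw a random subset without replacement.
    return random.Random(seed).sample(augmented_stories, limit)


glued = [f"story_{i}" for i in range(1000)]
print(len(subsample_augmented(glued, 20)))  # default factor of 20
print(len(subsample_augmented(glued, 0)))   # augmentation disabled
```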

--augmentation 0 disables all augmentation behavior. The memoization-based policies are not affected by augmentation (independent of the augmentation_factor) and will automatically ignore all augmented stories.

Action Selection

At every turn, each policy defined in your configuration will predict a next action with a certain confidence level. For more information about how each policy makes its decision, read the policy’s description below. The bot’s next action is then decided by the policy that predicts with the highest confidence.

In the case that two policies predict with equal confidence (for example, the Memoization and Mapping Policies always predict with confidence of either 0 or 1), the priority of the policies is considered. Rasa policies have default priorities that are set to ensure the expected outcome in the case of a tie. They look like this, where higher numbers have higher priority:

5. FormPolicy
4. FallbackPolicy and TwoStageFallbackPolicy
3. MemoizationPolicy and AugmentedMemoizationPolicy
2. MappingPolicy
1. TEDPolicy, EmbeddingPolicy, KerasPolicy, and SklearnPolicy

This priority hierarchy ensures that, for example, if there is an intent with a mapped action, but the NLU confidence is not above the nlu_threshold, the bot will still fall back. In general, it is not recommended to have more than one policy per priority level, and some policies on the same priority level, such as the two fallback policies, strictly cannot be used in tandem.
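The selection rule can be sketched in a few lines, assuming each policy reports an action, a confidence, and a priority. The Prediction tuple and select_action are illustrative names, not Rasa classes.

```python
from typing import List, NamedTuple, Optional


class Prediction(NamedTuple):
    policy: str
    action: Optional[str]
    confidence: float
    priority: int


def select_action(predictions: List[Prediction]) -> Prediction:
    """Pick the highest confidence; break ties with the higher priority."""
    return max(predictions, key=lambda p: (p.confidence, p.priority))


predictions = [
    Prediction("MappingPolicy", "action_is_bot", 1.0, 2),
    Prediction("FallbackPolicy", "action_default_fallback", 1.0, 4),
    Prediction("TEDPolicy", "utter_greet", 0.7, 1),
]
print(select_action(predictions).action)
```

Here the mapping and fallback predictions tie at confidence 1.0, so the fallback action wins because of its higher priority.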

If you create your own policy, use these priorities as a guide for figuring out the priority of your policy. If your policy is a machine learning policy, it should most likely have priority 1, the same as the Rasa machine learning policies.


All policy priorities are configurable via the priority: parameter in the configuration, but we do not recommend changing them outside of specific cases such as custom policies. Doing so can lead to unexpected and undesired bot behavior.

Keras Policy

The KerasPolicy uses a neural network implemented in Keras to select the next action. The default architecture is based on an LSTM, but you can override the KerasPolicy.model_architecture method to implement your own architecture.

def model_architecture(
    self, input_shape: Tuple[int, int], output_shape: Tuple[int, Optional[int]]
) -> tf.keras.models.Sequential:
    """Build a keras model and return a compiled model."""

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (
        Masking,
        LSTM,
        Dense,
        TimeDistributed,
        Activation,
    )

    # Build Model
    model = Sequential()

    # the shape of the y vector of the labels,
    # determines which output from rnn will be used
    # to calculate the loss
    if len(output_shape) == 1:
        # y is (num examples, num features) so
        # only the last output from the rnn is used to
        # calculate the loss
        model.add(Masking(mask_value=-1, input_shape=input_shape))
        model.add(LSTM(self.rnn_size, dropout=0.2))
        model.add(Dense(input_dim=self.rnn_size, units=output_shape[-1]))
    elif len(output_shape) == 2:
        # y is (num examples, max_dialogue_len, num features) so
        # all the outputs from the rnn are used to
        # calculate the loss, therefore a sequence is returned and
        # time distributed layer is used

        # the first value in input_shape is max dialogue_len,
        # it is set to None, to allow dynamic_rnn creation
        # during prediction
        model.add(Masking(mask_value=-1, input_shape=(None, input_shape[1])))
        model.add(LSTM(self.rnn_size, return_sequences=True, dropout=0.2))
        model.add(TimeDistributed(Dense(units=output_shape[-1])))
    else:
        raise ValueError(
            "Cannot construct the model because"
            "length of output_shape = {} "
            "should be 1 or 2."
            "".format(len(output_shape))
        )

    model.add(Activation("softmax"))

    model.compile(
        loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"]
    )

    if common_utils.obtain_verbosity() > 0:
        model.summary()

    return model

and the training is run here:

def train(
    self,
    training_trackers: List[DialogueStateTracker],
    domain: Domain,
    **kwargs: Any,
) -> None:

    # seed numpy and tensorflow for reproducible results
    np.random.seed(self.random_seed)
    tf.random.set_seed(self.random_seed)

    training_data = self.featurize_for_training(training_trackers, domain, **kwargs)
    # noinspection PyPep8Naming
    shuffled_X, shuffled_y = training_data.shuffled_X_y()

    if self.model is None:
        self.model = self.model_architecture(
            shuffled_X.shape[1:], shuffled_y.shape[1:]
        )

    logger.debug(
        f"Fitting model with {training_data.num_examples()} total samples and a "
        f"validation split of {self.validation_split}."
    )

    # filter out kwargs that cannot be passed to fit
    self._train_params = self._get_valid_params(
        self.model.fit, **self._train_params
    )

    self.model.fit(
        shuffled_X,
        shuffled_y,
        epochs=self.epochs,
        batch_size=self.batch_size,
        shuffle=False,
        verbose=common_utils.obtain_verbosity(),
        **self._train_params,
    )
    self.current_epoch = self.epochs

    logger.debug("Done fitting Keras Policy model.")

You can implement the model of your choice by overriding these methods, or initialize KerasPolicy with a pre-defined Keras model.

In order to get reproducible training results for the same inputs, you can set the random_seed attribute of the KerasPolicy to any integer.

Embedding Policy


EmbeddingPolicy was renamed to TEDPolicy. Please use TED Policy instead of EmbeddingPolicy in your policy configuration. The functionality of the policy stayed the same.

TED Policy

The Transformer Embedding Dialogue (TED) Policy is described in our paper.

This policy has a pre-defined architecture, which comprises the following steps:

  • concatenate user input (user intent and entities), previous system actions, slots and active forms for each time step into an input vector to the pre-transformer embedding layer;

  • feed it to the transformer;

  • apply a dense layer to the output of the transformer to get embeddings of a dialogue for each time step;

  • apply a dense layer to create embeddings for system actions for each time step;

  • calculate the similarity between the dialogue embedding and embedded system actions. This step is based on the StarSpace idea.
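The last step of the list above can be illustrated with plain Python. This is only a sketch of ranking actions by similarity to a dialogue embedding: the vectors are made up, and the helper names inner, cosine, and rank_actions are assumptions rather than parts of the TED implementation.

```python
import math
from typing import Callable, Dict, List


def inner(a: List[float], b: List[float]) -> float:
    """Inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; both vectors must be non-zero."""
    return inner(a, b) / (math.sqrt(inner(a, a)) * math.sqrt(inner(b, b)))


def rank_actions(
    dialogue_embedding: List[float],
    action_embeddings: Dict[str, List[float]],
    similarity: Callable[[List[float], List[float]], float] = cosine,
) -> List[str]:
    """Return action names sorted from most to least similar."""
    return sorted(
        action_embeddings,
        key=lambda name: similarity(dialogue_embedding, action_embeddings[name]),
        reverse=True,
    )


# Made-up two-dimensional embeddings for illustration.
actions = {
    "utter_greet": [1.0, 0.0],
    "utter_goodbye": [0.0, 1.0],
    "action_listen": [-1.0, 0.0],
}
print(rank_actions([0.9, 0.1], actions))
```

Passing similarity=inner switches to the inner product, which corresponds to the two measures named by the similarity_type parameter.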

It is recommended to use state_featurizer=LabelTokenizerSingleStateFeaturizer(...) (see Featurization of Conversations for details).


Configuration parameters can be passed as parameters to the TEDPolicy within the configuration file. If you want to adapt your model, start by modifying the following parameters:

  • epochs: This parameter sets the number of times the algorithm will see the training data (default: 1). One epoch equals one forward pass and one backward pass of all the training examples. Sometimes the model needs more epochs to learn properly. Sometimes more epochs do not influence the performance. The lower the number of epochs, the faster the model is trained.

  • hidden_layers_sizes: This parameter allows you to define the number of feed forward layers and their output dimensions for dialogues and labels (default: dialogue: [], label: []). Every entry in the list corresponds to a feed forward layer. For example, if you set dialogue: [256, 128], we will add two feed forward layers in front of the transformer. The vectors of the input tokens (coming from the dialogue) will be passed on to those layers. The first layer will have an output dimension of 256 and the second layer will have an output dimension of 128. If an empty list is used (default behavior), no feed forward layer will be added. Make sure to use only positive integer values. Usually, powers of two are used. It is also common practice to use decreasing values in the list, so that each value is smaller than or equal to the one before it.

  • number_of_transformer_layers: This parameter sets the number of transformer layers to use (default: 1). The number of transformer layers corresponds to the transformer blocks to use for the model.

  • transformer_size: This parameter sets the number of units in the transformer (default: 128). The vectors coming out of the transformers will have the given transformer_size.

  • weight_sparsity: This parameter defines the fraction of kernel weights that are set to 0 for all feed forward layers in the model (default: 0.8). The value should be between 0 and 1. If you set weight_sparsity to 0, no kernel weights will be set to 0 and the layer acts as a standard feed forward layer. You should not set weight_sparsity to 1, as this would result in all kernel weights being 0, i.e. the model would not be able to learn.


Pass an appropriate number of epochs, for example 50, to the TEDPolicy; otherwise the policy will be trained for only 1 epoch.


The default max_history for this policy is None, which means it will use the FullDialogueTrackerFeaturizer. We recommend setting max_history to a finite value in order to use the MaxHistoryTrackerFeaturizer for faster training. See Featurization of Conversations for details. We also recommend increasing batch_size for the MaxHistoryTrackerFeaturizer (e.g. "batch_size": [32, 64]).

The above configuration parameters are the ones you should configure to fit your model to your data. However, additional parameters exist that can be adapted.

| Parameter                       | Default Value    | Description                                                  |
| hidden_layers_sizes             | dialogue: []     | Hidden layer sizes for layers before the embedding layers    |
|                                 | label: []        | for dialogue and labels. The number of hidden layers is      |
|                                 |                  | equal to the length of the corresponding list.               |
| transformer_size                | 128              | Number of units in transformer.                              |
| number_of_transformer_layers    | 1                | Number of transformer layers.                                |
| number_of_attention_heads       | 4                | Number of attention heads in transformer.                    |
| use_key_relative_attention      | False            | If 'True' use key relative embeddings in attention.          |
| use_value_relative_attention    | False            | If 'True' use value relative embeddings in attention.        |
| max_relative_position           | None             | Maximum position for relative embeddings.                    |
| batch_size                      | [8, 32]          | Initial and final value for batch sizes.                     |
|                                 |                  | Batch size will be linearly increased for each epoch.        |
| batch_strategy                  | "balanced"       | Strategy used when creating batches.                         |
|                                 |                  | Can be either 'sequence' or 'balanced'.                      |
| epochs                          | 1                | Number of epochs to train.                                   |
| random_seed                     | None             | Set random seed to any 'int' to get reproducible results.    |
| embedding_dimension             | 20               | Dimension size of embedding vectors.                         |
| number_of_negative_examples     | 20               | The number of incorrect labels. The algorithm will minimize  |
|                                 |                  | their similarity to the user input during training.          |
| similarity_type                 | "auto"           | Type of similarity measure to use, either 'auto' or 'cosine' |
|                                 |                  | or 'inner'.                                                  |
| loss_type                       | "softmax"        | The type of the loss function, either 'softmax' or 'margin'. |
| ranking_length                  | 10               | Number of top actions to normalize scores for loss type      |
|                                 |                  | 'softmax'. Set to 0 to turn off normalization.               |
| maximum_positive_similarity     | 0.8              | Indicates how similar the algorithm should try to make       |
|                                 |                  | embedding vectors for correct labels.                        |
|                                 |                  | Should be 0.0 < ... < 1.0 for 'cosine' similarity type.      |
| maximum_negative_similarity     | -0.2             | Maximum negative similarity for incorrect labels.            |
|                                 |                  | Should be -1.0 < ... < 1.0 for 'cosine' similarity type.     |
| use_maximum_negative_similarity | True             | If 'True' the algorithm only minimizes maximum similarity    |
|                                 |                  | over incorrect intent labels, used only if 'loss_type' is    |
|                                 |                  | set to 'margin'.                                             |
| scale_loss                      | True             | Scale loss inverse proportionally to confidence of correct   |
|                                 |                  | prediction.                                                  |
| regularization_constant         | 0.001            | The scale of regularization.                                 |
| negative_margin_scale           | 0.8              | The scale of how important it is to minimize the maximum     |
|                                 |                  | similarity between embeddings of different labels.           |
| drop_rate_dialogue              | 0.1              | Dropout rate for embedding layers of dialogue features.      |
|                                 |                  | Value should be between 0 and 1.                             |
|                                 |                  | The higher the value the higher the regularization effect.   |
| drop_rate_label                 | 0.0              | Dropout rate for embedding layers of label features.         |
|                                 |                  | Value should be between 0 and 1.                             |
|                                 |                  | The higher the value the higher the regularization effect.   |
| drop_rate_attention             | 0.0              | Dropout rate for attention. Value should be between 0 and 1. |
|                                 |                  | The higher the value the higher the regularization effect.   |
| weight_sparsity                 | 0.8              | Sparsity of the weights in dense layers.                     |
|                                 |                  | Value should be between 0 and 1.                             |
| evaluate_every_number_of_epochs | 20               | How often to calculate validation accuracy.                  |
|                                 |                  | Set to '-1' to evaluate just once at the end of training.    |
| evaluate_on_number_of_examples  | 0                | How many examples to use for hold out validation set.        |
|                                 |                  | Large values may hurt performance, e.g. model accuracy.      |
| tensorboard_log_directory       | None             | If you want to use tensorboard to visualize training         |
|                                 |                  | metrics, set this option to a valid output directory. You    |
|                                 |                  | can view the training metrics after training in tensorboard  |
|                                 |                  | via 'tensorboard --logdir <path-to-given-directory>'.        |
| tensorboard_log_level           | "epoch"          | Define when training metrics for tensorboard should be       |
|                                 |                  | logged. Either after every epoch ('epoch') or for every      |
|                                 |                  | training step ('minibatch').                                 |


If evaluate_on_number_of_examples is non-zero, random examples will be picked by stratified split and used as a hold-out validation set, so they will be excluded from the training data. We suggest setting it to zero if the data set contains a lot of unique examples of dialogue turns.


For cosine similarity maximum_positive_similarity and maximum_negative_similarity should be between -1 and 1.


There is an option to use linearly increasing batch size. The idea comes from https://arxiv.org/abs/1711.00489. In order to do it pass a list to batch_size, e.g. "batch_size": [8, 32] (default behavior). If constant batch_size is required, pass an int, e.g. "batch_size": 8.
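As a sketch of what a linearly increasing batch size means, the helper below interpolates between the two list values over the training epochs. The function batch_size_for_epoch is written for illustration under that assumption and is not taken from the Rasa code base.

```python
from typing import List, Union


def batch_size_for_epoch(
    epoch: int, batch_size: Union[int, List[int]], epochs: int
) -> int:
    """Interpolate linearly from batch_size[0] to batch_size[1] over the epochs."""
    if isinstance(batch_size, int):
        return batch_size  # constant batch size
    first, last = batch_size
    if epochs <= 1:
        return first
    return int(first + epoch * (last - first) / (epochs - 1))


# Five epochs with the default range of [8, 32].
print([batch_size_for_epoch(e, [8, 32], epochs=5) for e in range(5)])
```

For five epochs this yields 8, 14, 20, 26 and 32, while an integer argument keeps the batch size constant.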


The parameter maximum_negative_similarity is set to a negative value to mimic the original StarSpace algorithm in the case maximum_negative_similarity = maximum_positive_similarity and use_maximum_negative_similarity = False. See the StarSpace paper for details.

Mapping Policy

The MappingPolicy can be used to directly map intents to actions. The mappings are assigned by giving an intent the property triggers, e.g.:

intents:
 - ask_is_bot:
     triggers: action_is_bot

An intent can only be mapped to at most one action. The bot will run the mapped action once it receives a message of the triggering intent. Afterwards, it will listen for the next message. With the next user message, normal prediction will resume.
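The behavior described above can be sketched as a small lookup, assuming the previous action and the latest intent are known. The TRIGGERS dictionary and mapping_prediction are hypothetical names used only for this illustration.

```python
from typing import Dict, Optional

# Hypothetical domain excerpt: intent name -> mapped action (at most one each).
TRIGGERS: Dict[str, str] = {"ask_is_bot": "action_is_bot"}


def mapping_prediction(latest_intent: str, previous_action: str) -> Optional[str]:
    """Return the action suggested by the mapping, or None to defer to other policies."""
    mapped = TRIGGERS.get(latest_intent)
    if mapped is None:
        return None
    if previous_action == "action_listen":
        return mapped  # run the mapped action right after the user message
    if previous_action == mapped:
        return "action_listen"  # then wait for the next user message
    return None


print(mapping_prediction("ask_is_bot", "action_listen"))
print(mapping_prediction("ask_is_bot", "action_is_bot"))
print(mapping_prediction("greet", "action_listen"))
```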

If you do not want your intent-action mapping to affect the dialogue history, the mapped action must return a UserUtteranceReverted() event. This will delete the user’s latest message, along with any events that happened after it, from the dialogue history. This means you should not include the intent-action interaction in your stories.

For example, if a user asks “Are you a bot?” off-topic in the middle of the flow, you probably want to answer without that interaction affecting the next action prediction. A triggered custom action can do anything, but here’s a simple example that dispatches a bot utterance and then reverts the interaction:

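from rasa_sdk import Action
from rasa_sdk.events import UserUtteranceReverted


class ActionIsBot(Action):
    """Revertible mapped action for utter_is_bot"""

    def name(self):
        return "action_is_bot"

    def run(self, dispatcher, tracker, domain):
        # Send the bot utterance, then remove the interaction from the history.
        dispatcher.utter_message(template="utter_is_bot")
        return [UserUtteranceReverted()]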


If you use the MappingPolicy to predict bot utterance actions directly (e.g. triggers: utter_{}), these interactions must go in your stories, as in this case there is no UserUtteranceReverted() and the intent and the mapped response action will appear in the dialogue history.


The MappingPolicy is also responsible for executing the default actions action_back and action_restart in response to /back and /restart. If it is not included in your policy configuration, these intents will not work.

Memoization Policy

The MemoizationPolicy just memorizes the conversations in your training data. It predicts the next action with confidence 1.0 if this exact conversation exists in the training data, otherwise it predicts None with confidence 0.0.
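A toy version of this lookup, assuming a conversation is represented as a flat list of event strings in which user intents start with "*", might look like this. TinyMemoizer is an illustrative class and not the Rasa implementation.

```python
from typing import Dict, List, Optional, Tuple

History = Tuple[str, ...]


class TinyMemoizer:
    """Toy lookup table from an exact event history to the next action."""

    def __init__(self) -> None:
        self.lookup: Dict[History, str] = {}

    def train(self, stories: List[List[str]]) -> None:
        for story in stories:
            for i, event in enumerate(story):
                if not event.startswith("*"):  # a bot action
                    self.lookup[tuple(story[:i])] = event

    def predict(self, history: List[str]) -> Tuple[Optional[str], float]:
        action = self.lookup.get(tuple(history))
        return (action, 1.0) if action is not None else (None, 0.0)


memo = TinyMemoizer()
memo.train([["* greet", "utter_greet", "* goodbye", "utter_goodbye"]])
print(memo.predict(["* greet"]))
print(memo.predict(["* thankyou"]))
```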

Augmented Memoization Policy

The AugmentedMemoizationPolicy remembers examples from training stories for up to max_history turns, just like the MemoizationPolicy. Additionally, it has a forgetting mechanism that will forget a certain number of steps in the conversation history and try to find a match in your stories with the reduced history. It predicts the next action with confidence 1.0 if a match is found, otherwise it predicts None with confidence 0.0.
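The forgetting mechanism can be sketched as a retry loop over shortened histories. Dropping the oldest events first is an assumption of this sketch, as are the function name recall_with_forgetting and the event-string representation.

```python
from typing import Dict, List, Optional, Tuple


def recall_with_forgetting(
    lookup: Dict[Tuple[str, ...], str], history: List[str]
) -> Tuple[Optional[str], float]:
    """Try the full history first, then drop the oldest events one at a time."""
    for start in range(len(history)):
        action = lookup.get(tuple(history[start:]))
        if action is not None:
            return action, 1.0
    return None, 0.0


# Hypothetical memorized example: a lone goodbye is answered with utter_goodbye.
lookup = {("* goodbye",): "utter_goodbye"}
print(recall_with_forgetting(lookup, ["* greet", "utter_greet", "* goodbye"]))
```

The full history has no match here, but after forgetting the first two events the reduced history matches a memorized example.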


If you have dialogues where some slots that are set during prediction time might not be set in training stories (e.g. in training stories starting with a reminder not all previous slots are set), make sure to add the relevant stories without slots to your training data as well.

Fallback Policy

The FallbackPolicy invokes a fallback action if at least one of the following occurs:

  1. The intent recognition has a confidence below nlu_threshold.

  2. The highest ranked intent differs in confidence with the second highest ranked intent by less than ambiguity_threshold.

  3. None of the dialogue policies predict an action with confidence higher than core_threshold.
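The three conditions above can be combined into a single check. The sketch below is an illustration under the assumption that intent confidences arrive as a list of floats; should_fallback is not a Rasa function.

```python
from typing import List


def should_fallback(
    intent_confidences: List[float],
    best_core_confidence: float,
    nlu_threshold: float = 0.3,
    ambiguity_threshold: float = 0.1,
    core_threshold: float = 0.3,
) -> bool:
    """Return True if at least one of the three fallback conditions holds."""
    ranked = sorted(intent_confidences, reverse=True)
    top = ranked[0] if ranked else 0.0
    second = ranked[1] if len(ranked) > 1 else 0.0
    return (
        top < nlu_threshold  # 1. low intent confidence
        or top - second < ambiguity_threshold  # 2. ambiguous top intents
        or best_core_confidence <= core_threshold  # 3. no confident policy
    )


print(should_fallback([0.9, 0.05], best_core_confidence=0.8))
print(should_fallback([0.5, 0.45], best_core_confidence=0.8))
```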


The thresholds and fallback action can be adjusted in the policy configuration file as parameters of the FallbackPolicy:

  - name: "FallbackPolicy"
    nlu_threshold: 0.3
    ambiguity_threshold: 0.1
    core_threshold: 0.3
    fallback_action_name: 'action_default_fallback'


  • nlu_threshold: Min confidence needed to accept an NLU prediction.

  • ambiguity_threshold: Min amount by which the confidence of the top intent must exceed that of the second highest ranked intent.

  • core_threshold: Min confidence needed to accept an action prediction from Rasa Core.

  • fallback_action_name: Name of the fallback action to be called if the confidence of intent or action is below the respective threshold.

You can also configure the FallbackPolicy in your python code:

from rasa.core.policies.fallback import FallbackPolicy
from rasa.core.policies.keras_policy import KerasPolicy
from rasa.core.agent import Agent

fallback = FallbackPolicy(fallback_action_name="action_default_fallback",
                          core_threshold=0.3,
                          nlu_threshold=0.3)

agent = Agent("domain.yml", policies=[KerasPolicy(), fallback])


You can include either the FallbackPolicy or the TwoStageFallbackPolicy in your configuration, but not both.

Two-Stage Fallback Policy

The TwoStageFallbackPolicy handles low NLU confidence in multiple stages by trying to disambiguate the user input.

  • If an NLU prediction has a low confidence score or is not significantly higher than the second highest ranked prediction, the user is asked to affirm the classification of the intent.

    • If they affirm, the story continues as if the intent was classified with high confidence from the beginning.

    • If they deny, the user is asked to rephrase their message.

  • Rephrasing

    • If the classification of the rephrased intent was confident, the story continues as if the user had this intent from the beginning.

    • If the rephrased intent was not classified with high confidence, the user is asked to affirm the classified intent.

  • Second affirmation

    • If the user affirms the intent, the story continues as if the user had this intent from the beginning.

    • If the user denies, the original intent is classified as the specified deny_suggestion_intent_name, and an ultimate fallback action is triggered (e.g. a handoff to a human).
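The stages above can be traced with a small function that returns how a turn is resolved. The boolean parameters and the name two_stage_outcome are assumptions made for this sketch; in the real policy these decisions come from NLU confidences and user replies.

```python
def two_stage_outcome(
    confident: bool,
    affirms_first: bool = False,
    rephrase_confident: bool = False,
    affirms_second: bool = False,
) -> str:
    """Walk through the stages and report how the turn is resolved."""
    # Confident classification, or the user affirms the suggested intent.
    if confident or affirms_first:
        return "continue_story"
    # The user denied and was asked to rephrase.
    if rephrase_confident or affirms_second:
        return "continue_story"
    # Second denial: the ultimate fallback action is triggered.
    return "ultimate_fallback"


print(two_stage_outcome(confident=False, affirms_first=True))
print(two_stage_outcome(confident=False, rephrase_confident=False, affirms_second=False))
```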


To use the TwoStageFallbackPolicy, include the following in your policy configuration.

  - name: TwoStageFallbackPolicy
    nlu_threshold: 0.3
    ambiguity_threshold: 0.1
    core_threshold: 0.3
    fallback_core_action_name: "action_default_fallback"
    fallback_nlu_action_name: "action_default_fallback"
    deny_suggestion_intent_name: "out_of_scope"


  • nlu_threshold: Min confidence needed to accept an NLU prediction.

  • ambiguity_threshold: Min amount by which the confidence of the top intent must exceed that of the second highest ranked intent.

  • core_threshold: Min confidence needed to accept an action prediction from Rasa Core.

  • fallback_core_action_name: Name of the fallback action to be called if the confidence of Rasa Core action prediction is below the core_threshold. This action is to propose the recognized intents.

  • fallback_nlu_action_name: Name of the fallback action to be called if the confidence of Rasa NLU intent classification is below the nlu_threshold. This action is called when the user denies the second time.

  • deny_suggestion_intent_name: The name of the intent which is used to detect that the user denies the suggested intents.


You can include either the FallbackPolicy or the TwoStageFallbackPolicy in your configuration, but not both.

Form Policy

The FormPolicy is an extension of the MemoizationPolicy which handles the filling of forms. Once a FormAction is called, the FormPolicy will continually predict the FormAction until all required slots in the form are filled. For more information, see Forms.
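As a simplified sketch of this behavior, the function below keeps returning the form's name while any required slot is still empty. The names next_form_action and restaurant_form, as well as the slot names, are hypothetical, and the sketch ignores how the real policy interleaves listening for user input.

```python
from typing import Dict, List, Optional


def next_form_action(
    form_name: str, required_slots: List[str], slots: Dict[str, Optional[str]]
) -> Optional[str]:
    """Keep predicting the form while any required slot is still empty."""
    missing = [name for name in required_slots if slots.get(name) is None]
    return form_name if missing else None


required = ["cuisine", "num_people"]
print(next_form_action("restaurant_form", required, {"cuisine": "thai"}))
print(next_form_action("restaurant_form", required, {"cuisine": "thai", "num_people": "4"}))
```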