Overview

🔥News: A new implementation of this package with better documentation can be found in ULTRA.

This is an implementation of the inverse propensity weighting algorithm (IPW_rank) and the Dual Learning Algorithm (DLA) for unbiased learning to rank <1>. Please cite the paper <1> (listed under Reference below) if you plan to use this code in your project.

The Dual Learning Algorithm is an online learning framework that directly learns unbiased ranking models from click data. Here we implement both the ranking model and the propensity estimator as multi-layer feed-forward neural networks. Please refer to the paper for more details.

Requirements:

1. To run DLA in ./DLA/ and the python scripts in ./scripts/, Python 2.7+ and TensorFlow v1.4+ are needed.

Data Preparation for the Initial Ranker

For simplicity, here we show the instructions for preparing data for SVMrank (the initial ranker) on Yahoo letor (the simulation experiment in the paper); the corresponding scripts are attached in /scripts/. You can extend the scripts or write your own code to prepare the data for other letor datasets and learning algorithms.

1. Download Yahoo Letor dataset 1 from http://webscope.sandbox.yahoo.com.

2. Decompress the files and put the data into a single directory. The directory should look like the following:
    <letor_data_path>: # the directory of letor data
        /set1.train.txt # the data used for training the initial ranker
        /set1.valid.txt # the data used for validation
        /set1.test.txt # the data used for testing

3. Randomly sample 1% of the data (lines) from set1.train.txt and use the sample to replace the original file. We only use 1% of the training data to train the initial ranker.
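
    A minimal sampling sketch in Python (an illustration, not one of the released scripts; the path is a placeholder):

        import random

        train_file = '<letor_data_path>/set1.train.txt'  # placeholder path

        with open(train_file) as fin:
            lines = fin.readlines()

        random.seed(0)  # fix the seed so the sample is reproducible
        sample = [line for line in lines if random.random() < 0.01]  # keep ~1%

        with open(train_file, 'w') as fout:
            fout.writelines(sample)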

4. Train SVMrank with the data and output the model. For detailed training instructions, please refer to https://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html.
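
    For instance, using the svm_rank_learn binary from that page (the value of -c is illustrative and should be tuned):

        ./svm_rank_learn -c 20.0 <letor_data_path>/set1.train.txt svm_model.dat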

5. Run the SVMrank model on the train/valid/test data and output the corresponding scores. Then you should have a directory with the output scores like the following:
    <initial_rank_path>: # the directory for SVMrank outputs
        /train.predict # the SVMrank output for documents in the training data
        /valid.predict # the SVMrank output for documents in the validation data
        /test.predict # the SVMrank output for documents in the test data
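
    For instance, using svm_rank_classify from the same page:

        ./svm_rank_classify <letor_data_path>/set1.train.txt svm_model.dat <initial_rank_path>/train.predict
        ./svm_rank_classify <letor_data_path>/set1.valid.txt svm_model.dat <initial_rank_path>/valid.predict
        ./svm_rank_classify <letor_data_path>/set1.test.txt svm_model.dat <initial_rank_path>/test.predict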

6. Generate the rank lists of the initial retrieval process using the SVMrank outputs and prepare the data for the models:
    python ./scripts/Prepare_yahoo_letor_data_set1.py <letor_data_path> <initial_rank_path> <input_data_path> <rank_cut>
        <letor_data_path>: the directory of letor data.
        <initial_rank_path>: the directory for SVMrank outputs.
        <input_data_path>: the directory for the inputs of DLA and IPW_rank.
        <rank_cut>: the number of top documents kept for each query. It is 10 in the paper.

After the data preparation, we will have the following files in <input_data_path>:
    <input_data_path>/settings.json:
        The settings we used to prepare the data

    <input_data_path>/train/:
        1. <input_data_path>/train/train.feature:
            The feature data.

            <doc_id> <feature_id>:<feature_val> <feature_id>:<feature_val> ... <feature_id>:<feature_val>

                <doc_id> = the identifier of the document. For example, "test_2_5" means the 5th document for the query with identifier "2" in the original test set of the Yahoo letor data.
                <feature_id> = an integer identifier for each feature from 0 to 699
                <feature_val> = the real feature value

            Each line represents a different document. 

        2. <input_data_path>/train/train.init_list:
            The initial rank lists for each query:

            <query_id> <feature_line_number_for_the_1st_doc> <feature_line_number_for_the_2nd_doc> ...  <feature_line_number_for_the_Nth_doc>

                <query_id> = the integer identifier for each query.
                <feature_line_number_for_the_Nth_doc> = the line number (starting from 0) in the feature file (train.feature) where the features of the Nth document for this query are stored.

            Each line represents a rank list generated by SVMrank for the query. Documents are represented by their feature line numbers in the feature file and are sorted in descending order of the ranking scores produced by SVMrank.

        3. <input_data_path>/train/train.gold_list:
            The golden rank lists for each query:

            <query_id> <doc_idx_in_initial_list> <doc_idx_in_initial_list> ...  <doc_idx_in_initial_list>

                <query_id> = the integer identifier for each query.
                <doc_idx_in_initial_list> = the index (starting from 0) of the document in the initial rank list (stored in train.init_list) for the query. For example, <doc_idx_in_initial_list> = 1 means the 2nd document in the initial list of the query.

            Each line represents a golden rank list generated by reranking the initial rank list according to the document annotations for the query. Documents are represented by their indices in the initial list of the corresponding query in train.init_list and are sorted in descending order of the human relevance annotations.

        4. <input_data_path>/train/train.weights:
            The annotated relevance value for documents in the initial list of each query.

            <query_id> <relevance_value_for_the_1st_doc> <relevance_value_for_the_2nd_doc> ...  <relevance_value_for_the_Nth_doc>

                <query_id> = the integer identifier for each query.
                <relevance_value_for_the_Nth_doc> = the human annotated relevance value of the Nth document in the initial list of the corresponding query. For 5-level relevance judgments, it should be one of the values in {0,1,2,3,4}.

        5. <input_data_path>/train/train.initial_scores:
            The ranking scores produced by SVMrank for documents in the initial list of each query.

            <query_id> <ranking_scores_for_the_1st_doc> <ranking_scores_for_the_2nd_doc> ...  <ranking_scores_for_the_Nth_doc>

                <query_id> = the integer identifier for each query.
                <ranking_scores_for_the_Nth_doc> = the ranking score produced by SVMrank for the Nth document in the initial list of the corresponding query.

        6. <input_data_path>/train/train.qrels:
            The relevance judgment file used for evaluation.

            <query_id> 0 <doc_id> <relevance_value>

                <query_id> = the integer identifier for each query.
                <doc_id> = the identifier of the document. For example, "test_2_5" means the 5th document for the query with identifier "2" in the original test set of the Yahoo letor data.
                <relevance_value> = the human annotated relevance value for the corresponding query-document pair. For 5-level relevance judgments, it should be one of the values in {0,1,2,3,4}.

        7. <input_data_path>/train/train.trec.gold_list:
            The golden rank lists in TREC format.

            <query_id> Q0 <doc_id> <rank> <relevance_value> Gold

                <query_id> = the integer identifier for each query.
                <doc_id> = the identifier of the document. For example, "test_2_5" means the 5th document for the query with identifier "2" in the original test set of the Yahoo letor data.
                <rank> = the rank (starting from 1) of the document in the ranked list of the query.
                <relevance_value> = the human annotated relevance value for the corresponding query-document pair. For 5-level relevance judgments, it should be one of the values in {0,1,2,3,4}.

        8. <input_data_path>/train/train.trec.init_list:
            The initial rank lists in TREC format.

            <query_id> Q0 <doc_id> <rank> <ranking_scores> RankSVM

                <query_id> = the integer identifier for each query.
                <doc_id> = the identifier of the document. For example, "test_2_5" means the 5th document for the query with identifier "2" in the original test set of the Yahoo letor data.
                <rank> = the rank (starting from 1) of the document in the ranked list of the query.
                <ranking_scores> = the ranking score produced by SVMrank for the corresponding query-document pair.

        * Please note that the query order in train.init_list, train.gold_list, train.weights and train.initial_scores must be the same. A short Python sketch showing how these files fit together is given after this section.

    <input_data_path>/valid/:
        Similar to <input_data_path>/train/ except that this directory is built for the validation data.

    <input_data_path>/test/:
        Similar to <input_data_path>/train/ except that this directory is built for the test data.
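
To make the formats above concrete, here is a minimal Python sketch (an illustration, not part of the released scripts) that loads train.feature and train.init_list and reconstructs the feature vectors of one query's initial rank list:

    input_data_path = '<input_data_path>'  # placeholder

    # Read all feature lines; line i holds the features of document i.
    doc_ids, features = [], []
    with open(input_data_path + '/train/train.feature') as fin:
        for line in fin:
            arr = line.strip().split()
            doc_ids.append(arr[0])
            feature = [0.0] * 700  # feature ids range from 0 to 699
            for x in arr[1:]:
                fid, fval = x.split(':')
                feature[int(fid)] = float(fval)
            features.append(feature)

    # Read the initial lists; each entry is a line number in train.feature.
    init_lists = {}
    with open(input_data_path + '/train/train.init_list') as fin:
        for line in fin:
            arr = line.strip().split()
            init_lists[arr[0]] = [int(x) for x in arr[1:]]

    # Feature vectors of the initial rank list of one query.
    qid = list(init_lists.keys())[0]
    query_features = [features[i] for i in init_lists[qid]]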

Data Preparation for DLA/IPW_rank

1. Create a click model for the generation of simulated clicks:
    python ./Unbiased_LTR/click_model.py <click_model_name> <neg_click_prob> <pos_click_prob> <rel_grad_num> <eta>
        <click_model_name> = the name of the click model; it can be "position_biased_model" or "user_browsing_model".
        <neg_click_prob> = the click probability (from 0 to 1) of an irrelevant result after a user has examined it.
        <pos_click_prob> = the click probability (from 0 to 1) of a relevant result after a user has examined it. It must be greater than or equal to <neg_click_prob>.
        <rel_grad_num> = the highest relevance level (an integer greater than 0). For example, the highest relevance level for a 5-level relevance judgment is 4.
        <eta> = A hyper-parameter that controls the severity of presentation biases. Please check Section 5.1 in the paper for more information.

    After running the program, a new json file for the created click model will be stored in the current directory. An example click model json file is shown in ./Example_files/ClickModel/pbm_0.1_1.0_4_1.0.json (<click_model_name>="position_biased_model", <neg_click_prob>=0.1, <pos_click_prob>=1.0, <rel_grad_num>=4, <eta>=1.0).
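
    To illustrate how such a click model produces clicks, here is a minimal Python sketch of a position-biased model. The (1/rank)^eta examination decay and the relevance-based interpolation follow the simulation setup described in Section 5.1 of the paper; this is an assumption-based illustration, not a copy of ./Unbiased_LTR/click_model.py:

        import random

        def examination_prob(rank, eta):
            # Probability that a user examines the result at `rank` (from 1).
            return (1.0 / rank) ** eta

        def click_prob_given_exam(rel, rel_grad_num, neg_click_prob, pos_click_prob):
            # Interpolate between the click probabilities of an irrelevant and
            # a perfectly relevant result (assumption based on the paper).
            gain = (2.0 ** rel - 1) / (2.0 ** rel_grad_num - 1)
            return neg_click_prob + (pos_click_prob - neg_click_prob) * gain

        def sample_clicks(rel_levels, eta=1.0, neg_click_prob=0.1,
                          pos_click_prob=1.0, rel_grad_num=4):
            # rel_levels: relevance labels of a ranked list, top result first.
            clicks = []
            for i, rel in enumerate(rel_levels):
                p = examination_prob(i + 1, eta) * click_prob_given_exam(
                    rel, rel_grad_num, neg_click_prob, pos_click_prob)
                clicks.append(1 if random.random() < p else 0)
            return clicks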

2. For IPW_rank, estimate the position propensity based on simulated clicks and create a propensity estimator:
    python ./Unbiased_LTR/propensity_estimator.py <click_model_json_file> <input_data_path> <estimator_json_file>
        <click_model_json_file> = The path of the json file of the click model (the output of the previous step).
        <input_data_path> = The path of the processed model input data.
        <estimator_json_file> = The file path of the program outputs. It is a json file that stores the propensity parameters estimated in the randomization experiment.

    After running the program, a new json file for the randomized propensity estimator will be stored in <estimator_json_file>. An example propensity estimator json file is shown in ./Example_files/PropensityEstimator/randomized_pbm_0.1_1.0_4_1.0.json
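
    The idea behind the randomized estimator can be sketched as follows (an illustration reusing sample_clicks from the sketch above, not the actual logic of ./Unbiased_LTR/propensity_estimator.py). After result randomization every rank shows a result of the same expected relevance, so the relative click rates across ranks estimate the relative examination propensities:

        import random

        def estimate_propensity(rel_lists, n_sessions=100000, rank_cut=10):
            # rel_lists: the relevance labels of each query's initial list.
            click_counts = [0.0] * rank_cut
            for _ in range(n_sessions):
                rel_levels = list(random.choice(rel_lists))[:rank_cut]
                random.shuffle(rel_levels)  # result randomization
                for pos, c in enumerate(sample_clicks(rel_levels)):
                    click_counts[pos] += c
            # Normalize by position 1 to get relative propensities.
            return [c / click_counts[0] for c in click_counts]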

Training/Testing DLA

1. python ./Unbiased_LTR/DLA/main.py --<parameter_name> <parameter_value> --<parameter_name> <parameter_value> … 

    1. batch_size: Batch size used in training. Default 256
    2. train_list_cutoff: The number of documents to consider in each list during training. Default 10.
    3. max_train_iteration: Limit on the iterations of training (0: no limit).
    4. steps_per_checkpoint: How many training steps to do per checkpoint. Default 200.
    5. use_non_clicked_data: Set to True for estimating propensity weights for non-clicked data. Default False.
    6. decode: Set to "False" for training on the training data and "True" for testing on the test data. Default "False".
    7. decode_train: Set to "True" for testing on the training data. Default "False".
    8. data_dir: The data directory, which should be the <input_data_path>.
    9. click_model_json: The json file for the click model used to generate clicks (e.g. ./Example_files/ClickModel/pbm_0.1_1.0_4_1.0.json).
    10. train_dir: Model directory & output directory.
    11. test_dir: The directory for output test results.
    12. hparams: Hyper-parameters for models (a string in the name=value format required by TensorFlow), which include:

        1. learning_rate:  The learning rate in training. Default 0.5.
        2. hidden_layer_sizes: The number of neurons in each layer of a feed-forward neural network. 
        3. max_gradient_norm: Clip gradients to this norm. Default 5.0
        4. loss_func: The loss function for DLA. It could be 
            "click_weighted_softmax_cross_entropy": The IPW based softmax loss function.
            "click_weighted_log_loss": The IPW based sigmoid loss function.
            "softmax_loss": The softmax loss without inverse propensity weighting.
        5. logits_to_prob: The function used to convert logits to probability distributions. It could be
            "softmax": The softmax function.
            "sigmoid": The sigmoid function.
        6. l2_loss: The lambda for L2 regularization. Default 0.0
        7. ranker_learning_rate: The learning rate for ranker (-1 means same with learning_rate). Default -1.
        8. ranker_loss_weight: Set the weight of the unbiased ranking loss. Default 1.0.
        9. grad_strategy: Select gradient strategy. It could be:
            "ada": Adagrad.
            "sgd": Stochastic gradient descent.
        10. relevance_category_num: The number of relevance categories. Default 5.
        11. use_previous_rel_prob: Set to True to use previously estimated relevance probabilities in the denoising model. Default False.
        12. use_previous_clicks: Set to True to use previous clicks in the denoising model. Default False.
        13. split_gradients_for_denoise: Set to True to split the gradient computation in the denoising model. Default True.

2. Evaluation

    1. After training with "--decode False", generate test rank lists with "--decode True".
    2. TREC-format rank lists for the test data will be stored in <train_dir> under the name "test.ranklist".
    3. Evaluate the test rank lists against the ground truth <input_data_path>/test/test.qrels using the trec_eval or galago eval tool, as shown below.
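
    For example, with the standard trec_eval tool (the exact invocation may vary with your trec_eval version):

        trec_eval <input_data_path>/test/test.qrels <train_dir>/test.ranklist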

Training/Testing IPW_rank

1. python ./Unbiased_LTR/IPW_LTR/main.py --<parameter_name> <parameter_value> --<parameter_name> <parameter_value> … 

    1. batch_size: Batch size used in training. Default 256
    2. train_list_cutoff: The number of documents to consider in each list during training. Default 10.
    3. max_train_iteration: Limit on the iterations of training (0: no limit).
    4. steps_per_checkpoint: How many training steps to do per checkpoint. Default 200.
    5. use_non_clicked_data: Set to True for estimating propensity weights for non-clicked data. Default False.
    6. decode: Set to "False" for training on the training data and "True" for testing on the test data. Default "False".
    7. decode_train: Set to "True" for testing on the training data. Default "False".
    8. data_dir: The data directory, which should be the <input_data_path>.
    9. click_model_json: The json file for the click model used to generate clicks (e.g. ./Example_files/ClickModel/pbm_0.1_1.0_4_1.0.json).
    10. train_dir: Model directory & output directory.
    11. test_dir: The directory for output test results.
    12. estimator_json: The json file for the propensity estimator used to train unbiased models (e.g. ./Example_files/PropensityEstimator/randomized_pbm_0.1_1.0_4_1.0.json).
    13. hparams: Hyper-parameters for models (a string in the name=value format required by TensorFlow), which include:

        1. learning_rate:  The learning rate in training. Default 0.5.
        2. hidden_layer_sizes: The number of neurons in each layer of a feed-forward neural network. 
        3. max_gradient_norm: Clip gradients to this norm. Default 5.0
        4. loss_func: The loss function for IPW_rank. It could be 
            "click_weighted_softmax_cross_entropy": The IPW based softmax loss function.
            "click_weighted_log_loss": The IPW based sigmoid loss function.
            "softmax_loss": The softmax loss without inverse propensity weighting.
        5. l2_loss: The lambda for L2 regularization. Default 0.0

2. Evaluation

    1. After training with "--decode False", generate test rank lists with "--decode True".
    2. TREC-format rank lists for the test data will be stored in <train_dir> under the name "test.ranklist".
    3. Evaluate the test rank lists against the ground truth <input_data_path>/test/test.qrels using the trec_eval or galago eval tool, as in the DLA evaluation above.

Example Parameter DLA/IPW_rank Settings

learning_rate --> 0.05
steps_per_checkpoint --> 500
max_train_iteration --> 10000
loss_func --> 'click_weighted_softmax_cross_entropy'
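
Put together, a training invocation with these settings might look like the following (the directory arguments are placeholders, and the hparams string uses TensorFlow's name=value format):

    python ./Unbiased_LTR/DLA/main.py --data_dir <input_data_path> --train_dir <train_dir> --click_model_json ./Example_files/ClickModel/pbm_0.1_1.0_4_1.0.json --steps_per_checkpoint 500 --max_train_iteration 10000 --hparams "learning_rate=0.05,loss_func=click_weighted_softmax_cross_entropy"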

Reference:

<1> Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, and W. Bruce Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In Proceedings of SIGIR '18.