Solve the constrained assignement problem.

Parameters:

Parameters:
The constrained assignment problem is a very classical problem that occurs very often in direct marketing. It’s also very common in retail. The typical usage is the following:
The classical methodology used to optimally assign coupons to your customers is:
This procedure assigns a different set of products (or “coupons”) to each customer. Potentially, there are as many different folders as there are customers (To print these folders you need a “digital printer” because this is the only technology that allows you to print a different folder for each customer).
There exist two types of assignment problems:
Mono-product assignment problems: You need to select ONE product per customer, taking into account the constraints.
These type of problems can easily be solved with any number of highlighted products and any number of customers: For example:
For each different customer, propose 1 product amongst 9 different products.
The Mono-product solver uses the concept of "meta-customer". A "meta-customer" is a mini-segment of customers computed using Stardust. Typically, we’ll have 300 to 1500 differents segments. This type of assignment problem is easily solved using any LP solver (ETL has a very high-quality multi-threaded LP solver).
Multi-product assignment problems: You need to select n>1 products per customer, taking into account the constraints (n is the number of coupons inside a folder).
These type of problems are extremely difficult to solve because they belong to the class of problem known as "NP-hard" problems. For example:
For each different customer, propose 3 products amongst 9 different products.
For each different customer, propose 8 products amongst 450 different products.
Because these problems are “NP-hard”, it’s not possible to solve them exactly. The
AssignmentSolver Action uses advanced meta-heuristics to find a “good-enough” solution in a reasonable amount of time.
About “NP-Hard” problems
Typically, to solve a “normal” numerical problem (i.e. not a NP-hard problem), a good idea is to “split” the problem into simpler problems, then solve each simpler problem separately and thereafter combine all these intermediate results into the “final” solution.This type of approach is not possible with “NP-Hard” problems: They are not decomposable. The only way to exactly solve “NP-Hard” problems is to enumerate all possible solutions and to give, as the “final” solution, the best admissible solution found during the enumeration.
Let’s now evaluate how much time is required to enumerate all possible
solutions to the multi-product assignment problem. Let’s assume that, inside our
assignment problem, there are c customers and p products. One solution of the
assignment problem is a matrix X. Each element X ij of the solution-matrix X is,
either:
“X ij =1”: the customer i received a coupon about product j.
“X ij =0”: the customer i did not receive a coupon about product j.
We need to enumerate all the different solution-matrices X.
There are 2 c x p different solution-matrices X.
For a normal-size problem, we have: c=1 million customers, p=100 products,
the number of different solutions: 2 100 million = a very very large number.
Even on the best possible hardware, it will take several times the age of the universe to enumerate all the solutions.
Since it’s not possible to solve exactly the multi-product assignement problem (because it’s not possible to enumerate all different solutions), we’ll only try to find a “good enough” solution: We’ll enumerate only a (very) limited number of good “candidate solutions”. These “candidate solutions” are generated using a meta-heuristic named “tabu-search”. The solution that is returned by the AssignmentSolver Action is the best admissible solution found during this “limited” enumeration.
“Admissible solution” means a solution that respects the constraints (i.e. The constraints are: We have some limitations on the quantity of each coupon/product).
“Best solution” means a solution that has the highest ROI/the highest ER.
The quality of the returned solution mainly depends on the running time: If you run the “tabu-search” procedure for a very-long time, you’ll be able to test a large number of different “candidate solutions” and you’ll be more likely to find a very good solution (To summarize: the longer you compute, the better the “returned” solution).
The different parameters of the AssignmentSolver Action control are:
The solution returned by the AssignmentSolver Action is a table that contains the primary keys of each customers and the solution-matrix X. Each element X ij of the solution-matrix X is, either:
“X ij =1”: the customer i received a coupon about product j.
“X ij =0”: the customer i did not receive a coupon about product j.
More precisely, inside the solution table, we have:
In mathematical terms, the best solution is the solution that cumulates the greatest purchase probabilities for all the assigned products to all the customers: It’s:
[TODO!!!]
With i=1,…,c : all the customers (there are c different customers in total)
j=1,…,p : all the “highlighted products of the week” (there are p different highlighted productsin total).
PurchaseProbability : this is typically computed using 100 predictive models (one model for each different highlighted product).
Furthermore, this solution must respect these 2 constraints:
Let’s now assume that you get a small margin each time a customer uses a coupon. The “expected profit” when a customer i purchases a product j is thus:
The table returned by the AssignmentSolver action is the solution X ij to the following optimization problem:
To use the AssignmentSolver action, you need to provide these 2 input tables:
Here is an example:
The AssignmentSolver action uses a “tabu-search” procedure to generate many good “candidates solutions”. At the end, we’ll return the best, admissible candidate solution (i.e.: The one with the highest ER=“Expected Return” that satisfies all the contraints). The “tabu-search” procedure
is an iterative procedure: it starts from a known candidate-solution and “changes” it slightly to create a new candidate solution. For example, this small “change” can be: Swapping 2 products between 2 different customers. The “tabu-search” procedure always tries to make a change that improves the “quality of the candidate solution”. To avoid entering inside the “infeasible space” (i.e. to avoid entering the space composed of solutions that do not satify the constraints), the “quality of the candidate solution” is defined as:
[TODO]
If the parameter is very small, the “tabu-search” procedure will allow “entering” the infeasible space.
If the parameter is very large, the “tabu-search” procedure will be “pushed” towards the feasible space.
If there are many violated contraints inside the current solution, the assignement solver automatically increases the parameter to “go back” towards the feasible space. Allowing the assignement solver to enter slightly inside the infeasible space is an important feature to avoid getting stuck “inside a local optimum” along the way toward the “best” solution.
It can happen that the next best candidate solution is a solution that we already tested previously. This is annoying because it can cause “infinite loops” inside the assignement solver (e.g. the solver always “circle around” testing always the same set of candidate solutions). To avoid such phenomenon, the solver keeps a list of “tabu” solutions (thus the name: “tabu-search”). A candidate solution is marked
as “tabu” when it has been already “explored” at a previous iteration of the solver. The solver is then forbidden to “move to” a “tabu” solution. The “tabu-list” of forbidden solutions prevents the solver to “circle around”. The tabu-list is also a very effective mechanism to “get out” of local optimum (to be able to find the “global” optimum).
The size of the “tabu-list” is limited using a FIFO methodology (FIFO=first in, first out). A large tabu-list allows you to be less sensitive to local optimums (and this is good) but it also increases significantly the running time.
The quality of the solution depends on the quality of the candidates generated by the tabu-search algorithm. To generate these candidates, at each iteration of the “tabu-search” procedure, we apply a small “change” to the current solution. These "incremental changes” must have several properties:
Unfortunatly the 2 properties hereabove are antagonist objectives. More precisely: you can have:
Here are the differents steps that are executed inside the multiproduct assignement solver to compute the (near) optimal assignement:
The “tabu-search” algorithms used during the steps 2&3 are carefully tuned C++ implementation. The extremely high speed of the C++ solver included inside ETL allows us to explore a larger number of different candidate solutions than any other implementation, leading to a better solution (more ROI) at the end.
The parameters that control the solver are:
As usual with all meta-heuristics, the more time we explore the search space, the better the quality of the final, returned solution. Thus, it’s important to avoid losing time exploring un-interesting regions of the search space.
Parameter P1: “Update rho each X iterations. X=?”.
During step 3.a), the parameter is dynamically adjusted every X iterations.
Parameter P2: “nIterDiversivication”
During step 3.b), we scramble the best know solution to obtain a new starting point for further optimization. The parameter P2 controls the amount of “scrambling” that is applied.
Parameter P3: “stoppingCriteriaNIterWithoutImprovements”
During step 3.a), we’ll run a slow “tabu-search” procedure that stops when the ExpectedReturn (ER) of the solution does not improve for P3 iterations.
Parameter P4: “highLevelIteration”
The parameter P4 is the number of iterations inside the “high-level” loop (that is composed of steps 3.a) and 3.b)). The objective of this “high-level” loop is to “test” different “starting points”.
If the parameter P4 is zero, then only the “fast solver” is used.
Proceed with caution when increasing P4: This increases a lot the running time of the
multiproduct assignement solver.
Parameter P5: “tabu-list size”
A large tabu-list allows you to be less sensitive to local optimums (and this is good) but it also increases significantly the running time.
Parameter P6: “rho start percentile”
The parameter P6 controls the initial value of the parameter inside the (slow) “tabu-search” procedure that runs in step 3.a).
Parameter P7: “Seed for random number generator”
The “tabu-search” procedure (in steps 2.a) and 3.a)) has a strong random component.
The “diversification” procedure (in step 3.b)) has an even greater random component.
The parameter P7 defines the “seed” used inside the pseudo-random number generator that is used inside the multiproduct assignement solver. For more information about pseudo- random number generator, see:
http://en.wikipedia.org/wiki/Pseudorandom_number_generator
Parameter P8: “Stop inner loop when a feasible solution is found and after N
iterations.N=?”
Ideally, the slow “tabu-search” procedure of step 3.a) uses the parameter P3 as the only stopping criteria. However, it can happen that the slow “tabu-search” procedure manages to continuously improve (i.e. at nearly each iteration) the ER of the “best found solution so far”. These improvements are totally negligible but they still prevent the stopping criteria based on parameter P3 to work properly. The parameter P8 is useful when such rare circumstance occurs. The parameter P8 places a hard upper bound on the maximum number of iterations performed inside the slow “tabu-search” procedure. The effects of a very small (or very large) value for the parameter P8 are the exact same effects as for the parameter P3.
The multiproduct assignement solver has also 2 buttons that are only active when the solver is running:
Large Assignment Problems
To solve large assignment problems, set to zero the parameter P4 “highLevelIteration”):
This will only run the “fast solver” (and completely skip the “In-depth solver”). The “fast solver” has also a very small RAM memory consumption (especially compared to the RAM memory consumption of the “In-depth solver”). This allows you to solve really large assignment problems with a computer that has a very limited RAM memory.
