Heuristics’ Accuracy and Adjustments - Domain-independent Planning

5. Experiments

In this chapter, we present our main experimental results. We give an overview of the data we’ve collected, compare the performance of the learned heuristic with a baseline, and analyze the results. We also present various statistics on the results of the training, as well as the performance of the heuristic adjustments.

We performed various statistical analyses and ML experiments using data generated from the state spaces of planning problems. Let’s first take a look at the data we’ve collected.

5.1 Data

Our main goal is to perform the heuristic learning as described in chapter 3, so the data were collected to serve this purpose. We used the technique described in section 3.1.1 to obtain a set of states {s_i^{P_j}}. We then used our ad-hoc solvers to calculate estimates of h^∗(s_i^{P_j}) for these states.

We perform the experiments on two domains: Zenotravel and Blocks. See attachments A.1.1 and A.1.2 for details about the domains. We used all 20 problems available for Zenotravel and the first 27 problems from Blocks.

Figure 5.1: Count of samples from individual problems in blocks domain.

The experiments were designed as follows: for every state in our data set, we compute the values of the two heuristics. We already have the h^∗ values, so this gives us a set of tuples {⟨s_i, h_GC(s_i), h_{FF}(s_i), h^∗(s_i)⟩}_i.

All the data are included in the supplementary materials.

5.2.1 Heuristics’ accuracy

Figure 5.7 shows the correlation between heuristic estimates and real goal distances for both heuristics and both domains.

The blue area represents the data as a scatter plot. In each subgraph, the X-axis is the goal distance, the Y-axis is the heuristic estimate, and the red line represents a perfect match. The closer the points lie to the red line, the more informed the heuristic is.

Data points above the red line indicate overestimation. As expected, h_{FF} is much more informed but not admissible.

Figures 5.8 and 5.9 provide a more detailed view of h_{FF}. For every x ∈ N, we find all states s_i such that h_{FF}(s_i) = x and calculate the minimum, average and maximum of their true goal distances. Formally, let T_x = {s_i | h_{FF}(s_i) = x}, min^x = min{h^∗(s_i) | s_i ∈ T_x}, max^x = max{h^∗(s_i) | s_i ∈ T_x} and avg^x = avg{h^∗(s_i) | s_i ∈ T_x}. For each x on the X-axis, the vertical line represents the interval [min^x, max^x] and the blue dot shows avg^x. The statistics include all collected data from Blocks (figure 5.8) and all collected data from Zenotravel (figure 5.9).
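The per-value statistics min^x, avg^x and max^x can be computed with a simple grouping pass. The following is a minimal sketch, not the code used for the experiments: the function name `goal_distance_stats` and the toy samples are hypothetical, and each sample is assumed to be a pair (heuristic value, h^∗).

```python
from collections import defaultdict

def goal_distance_stats(samples):
    """Group samples by heuristic value x and return
    {x: (min, avg, max)} of the true goal distances h* in each group T_x."""
    groups = defaultdict(list)
    for h_value, h_star in samples:
        groups[h_value].append(h_star)
    return {x: (min(v), sum(v) / len(v), max(v)) for x, v in groups.items()}

# Toy data invented for illustration only: (h(s), h*(s)) pairs
samples = [(2, 2), (2, 3), (2, 4), (5, 4), (5, 6)]
stats = goal_distance_stats(samples)
print(stats[2])  # min, avg and max of h* over states with h(s) = 2
```

In this toy data, the group for x = 5 has a minimum below 5, which is the situation of a non-admissible heuristic overestimating some states.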

5.2.2 Performance of adjusted heuristics

Based on the data, we’ve constructed the three adjusted heuristics h^min, h^avg and h^{shift} as defined in section 2.3 and tested them on a set of available problems. For every combination of h ∈ {h_GC, h_{FF}} and domain ∈ {blocks, zenotravel}, we run an A^∗ search with heuristics h, h^min, h^avg and h^{shift} on all problems in the domain.

Figure 5.2: Distribution of h^∗ by problem in the blocks domain. Red columns show the average of h^∗, blue ones show the maximum. The minimum is always 0.

Figure 5.3: Count of samples from individual problems in zenotravel domain.

Time was capped at 30 minutes per problem and there was a memory limit of 5 million nodes per problem (sum of sizes of the open list and the closed list).

When solving a problem P, we use only data from other problems (all problems from the domain except P) to construct the adjusted heuristics. This should resemble our use-case scenario, where we are interested in transferring knowledge across problems.
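The hold-out scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: it assumes that the min adjustment maps a heuristic value x to min^x (the exact definitions are in section 2.3), and falling back to the unadjusted value for an x never seen in the training data is our own assumption. The names `build_min_table`, `adjusted_min` and the toy data are hypothetical.

```python
def build_min_table(samples_by_problem, held_out):
    """Map each heuristic value x to the smallest h* observed among samples
    with h(s) == x, using every problem except `held_out`."""
    table = {}
    for problem, samples in samples_by_problem.items():
        if problem == held_out:
            continue  # never learn from the problem we are about to solve
        for h_value, h_star in samples:
            table[h_value] = min(h_star, table.get(h_value, h_star))
    return table

def adjusted_min(h_value, table):
    # Assumption: keep the original estimate when x was never observed
    return table.get(h_value, h_value)

# Toy data invented for illustration: problem name -> (h(s), h*(s)) pairs
data = {
    "p01": [(1, 1), (2, 3)],
    "p02": [(2, 4), (3, 5)],
    "p03": [(2, 2), (3, 7)],
}
table = build_min_table(data, held_out="p03")
print(adjusted_min(2, table))
```

Note that the sample (2, 2) from the held-out problem does not influence the table, so the adjusted estimate for x = 2 comes only from the other two problems.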

Figure 5.10 shows the results for blocks and figure 5.11 for zenotravel. We measure TotalNodes (the sum of the number of expanded nodes and the number of nodes present in the open list at the end of the search), SolutionCost (which in this case is the number of actions in the plan), the number of problems solved, and the search time. Blank spaces in SolutionCost mean that the problem was not solved within the time or memory limit. We’ve only included problems that were solved by at least one of the heuristics.
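To make the measured quantities concrete, here is a minimal A^∗ sketch that reports SolutionCost and TotalNodes and stops when the node limit is exceeded. It is not the planner used in the experiments: it assumes unit-cost actions, hashable states, and a caller-supplied `successors` function; the time limit is omitted, and the open list may contain stale duplicate entries that a real implementation might count differently. All names are hypothetical.

```python
import heapq
from itertools import count

def astar(start, is_goal, successors, h, node_limit=5_000_000):
    """Return (solution_cost, total_nodes). solution_cost is None when the
    node limit is exceeded or no plan exists."""
    tie = count()  # tie-breaker so the heap never compares states directly
    open_list = [(h(start), next(tie), 0, start)]
    best_g = {start: 0}
    closed = set()
    expanded = 0
    while open_list:
        _, _, g, state = heapq.heappop(open_list)
        if g > best_g[state]:
            continue  # stale entry superseded by a cheaper path
        if is_goal(state):
            # TotalNodes = expanded nodes + nodes left in the open list
            return g, expanded + len(open_list)
        closed.add(state)
        expanded += 1
        for nxt in successors(state):
            new_g = g + 1  # unit-cost actions (assumption)
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                closed.discard(nxt)  # reopen; needed for inadmissible h
                heapq.heappush(open_list, (new_g + h(nxt), next(tie), new_g, nxt))
        if len(open_list) + len(closed) > node_limit:
            return None, expanded + len(open_list)
    return None, expanded

# Toy problem invented for illustration: walk along the integers 0..5
def neighbours(n):
    return [m for m in (n - 1, n + 1) if 0 <= m <= 5]

cost, total_nodes = astar(0, lambda n: n == 5, neighbours, h=lambda n: 5 - n)
print(cost, total_nodes)
```

Reopening closed states when a cheaper path is found matters here because some of the adjusted heuristics are not admissible.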

GoalCount is an admissible but very weak heuristic, hence the adjusted heuristics outperform it in all criteria except plan length on all problems in blocks. Because the heuristic is admissible, the min adjustment and the shift adjustment yield the same heuristic, i.e. ∀s : h^min_GC(s) = h^{shift}_GC(s).

Figure 5.4: Distribution of h^∗ by problem in the zenotravel domain. Red columns show the average of h^∗, blue ones show the maximum. The minimum is always 0.

Figure 5.5: Count of samples by their goal-distances. X-axis: goal distance, Y-axis: relative number of samples in blocks domain. (All problems combined.)

In blocks, the h_GC heuristic systematically and unnecessarily underestimates most states. The heuristic counts the number of blocks that are misplaced. Moving a block, however, requires two actions, lift and put-down, hence

∀s : h_GC(s) = k ⇒ h^∗(s) ≥ 2k − 1

We subtract 1 because one of the k blocks could already be lifted in the state.
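The bound can be spelled out as a short case analysis. This is only a sketch of the reasoning above, assuming (as in the standard Blocks domain) that at most one block can be held at a time; k is the number of misplaced blocks, so h_GC(s) = k.

```latex
\begin{align*}
\text{none of the } k \text{ blocks is held:} \quad & h^*(s) \ge 2k \\
\text{one of the } k \text{ blocks is held:} \quad & h^*(s) \ge 2(k-1) + 1 = 2k - 1 \\
\text{hence in both cases:} \quad & h^*(s) \ge 2k - 1
\end{align*}
```

For example, with k = 3 misplaced blocks, any plan needs at least 5 actions, while h_GC reports only 3.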

When using the estimate 2k − 1 instead of k, the heuristic is much more informed and still admissible. This is exactly what h^min_GC and h^{shift}_GC are doing.

In the zenotravel domain, the h_GC heuristic counts the number of passengers that are not yet at their destinations. For every k there exists a state s such that h_GC(s) = k = h^∗(s). The state looks as follows: there are k passengers travelling to the same destination, and they are on board a plane that is already located at that destination. All other passengers have already disembarked at their respective destinations. In this state, performing the disembark action k times leads to the goal.

Due to this, the heuristic estimate is actually tight and can’t be improved without losing admissibility. Hence in this case h^min_GC = h^{shift}_GC = h_GC.

h^avg, on the other hand, is more informed than the others, solves more problems and requires less time and memory. It is not admissible, hence it doesn’t guarantee optimality of solutions.

Figure 5.6: Count of samples by their goal-distances. X-axis: goal distance, Y-axis: relative number of samples in the zenotravel domain. (All problems combined.)

The FF heuristic is much more sophisticated, and it is therefore unlikely that these simple adjustments will significantly improve it. h^{shift}_{FF} seems to work exactly like h_{FF}, which suggests that there are no obvious errors in h_{FF} that could be repaired by shifting, or at least that they don’t occur in either of the two domains.

None of the avg and min adjustments outperforms h_{FF} in all criteria. h^min_{FF} is admissible, unlike h_{FF}, hence it can improve the solution quality and provides an optimality guarantee. h^avg_{FF} can sometimes be faster, especially in the blocks domain, at the cost of increasing plan length.

5.2.3 Summary

The min and shift adjustments are able to automatically identify and repair the underestimation that occurs with h_GC on blocks. h^avg_GC outperforms h_GC in search time, memory consumption and number of problems solved, at the cost of losing optimality.

Improvements over h_{FF} are negligible, but we should point out that developing h_{FF} took several years and many man-days of work by top-tier researchers, while the adjusted heuristic can be constructed automatically within minutes, assuming that we already have the data.

The proposed modifications are applicable to any existing heuristic and can be used to adjust the informedness vs. admissibility tradeoff of the heuristic.
