
Heuristics’ Accuracy and Adjustments

From the document Domain-independent Planning (pages 64-68)

5. Experiments

In this chapter, we present our main experimental results. We give an overview of the data we collected, compare the performance of the learned heuristic with a baseline, and analyze the results. We also present various statistics on the results of the training, as well as on the performance of the heuristic adjustments.

We performed various statistical analyses and ML experiments using data generated from the state spaces of planning problems. Let's first take a look at the data we collected.

5.1 Data

Our main goal is to perform heuristic learning as described in chapter 3, so the data were collected to serve this purpose. We used the technique described in section 3.1.1 to obtain a set of states $\{s^{P_j}_i\}$. We then used our ad-hoc solvers to calculate estimates of $h(s^{P_j}_i)$ for these states.

We performed the experiments on two domains: Zenotravel and Blocks. See attachments A.1.1 and A.1.2 for details about the domains. We used all 20 problems available for Zenotravel and the first 27 problems from Blocks.

Figure 5.1: Count of samples from individual problems in blocks domain.

The experiments were designed as follows: for every state in our data set, we compute the values of the two heuristics. We already have the $h$ values, so this gives us a set of tuples $\{\langle s_i, h_{GC}(s_i), h_{FF}(s_i), h(s_i) \rangle\}_i$.

All the data are included in the supplementary materials.

5.2.1 Heuristics’ accuracy

Figure 5.7 shows the correlation between heuristic estimates and real goal-distances for both heuristics and both domains.

The blue area represents the data as a scatter plot. In each subgraph, the X-axis is the goal distance, the Y-axis is the heuristic estimate, and the red line represents a perfect match. The closer the points lie to the red line, the more informed the heuristic is.

Data points above the red line indicate overestimation. As expected, $h_{FF}$ is much more informed but not admissible.

Figures 5.8 and 5.9 provide a more detailed view of $h_{FF}$. For every $x \in \mathbb{N}$, we find all states $s_i$ such that $h_{FF}(s_i) = x$ and calculate the minimum, average and maximum of their true goal-distances. Formally, let $T_x = \{s_i \mid h_{FF}(s_i) = x\}$, $min_x = \min\{h(s_i) \mid s_i \in T_x\}$, $max_x = \max\{h(s_i) \mid s_i \in T_x\}$ and $avg_x = \mathrm{avg}\{h(s_i) \mid s_i \in T_x\}$. For each $x$ on the X-axis, the vertical line represents the interval $[min_x, max_x]$ and the blue dot shows $avg_x$. The statistics include all collected data from Blocks (figure 5.8) and all collected data from Zenotravel (figure 5.9).
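The per-estimate statistics described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the experiments: the function name `stats_by_estimate` and the sample values are made up, and each sample is assumed to be a pair (heuristic estimate, true goal-distance).

```python
from collections import defaultdict

def stats_by_estimate(samples):
    """Group true goal-distances by heuristic estimate x and return
    {x: (min_x, avg_x, max_x)}. `samples` holds (estimate, true_distance) pairs."""
    groups = defaultdict(list)
    for estimate, true_distance in samples:
        groups[estimate].append(true_distance)
    return {
        x: (min(ds), sum(ds) / len(ds), max(ds))
        for x, ds in sorted(groups.items())
    }

# Hypothetical samples: (heuristic estimate, true goal-distance)
samples = [(2, 3), (2, 5), (2, 4), (3, 3), (3, 7)]
table = stats_by_estimate(samples)
```

Each entry of `table` corresponds to one vertical line in the plots: its endpoints are the minimum and maximum, and the dot is the average.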

5.2.2 Performance of adjusted heuristics

Based on the data, we constructed the three adjusted heuristics $h^{min}$, $h^{avg}$ and $h^{shift}$ as defined in section 2.3 and tested them on a set of available problems. For every combination of $h \in \{h_{GC}, h_{FF}\}$ and domain $\in$ {blocks, zenotravel}, we run an A* search with heuristics $h$, $h^{min}$, $h^{avg}$ and $h^{shift}$ on all problems in the domain.

Figure 5.2: Distribution of $h$ by problem in blocks domain. Red columns show average of $h$, blue ones show maximum. Minimum is always 0.

Figure 5.3: Count of samples from individual problems in zenotravel domain.

Time was capped at 30 minutes per problem and there was a memory limit of 5 million nodes per problem (sum of sizes of the open list and the closed list).

When solving a problem P, we use only data from the other problems of the same domain (all of them except P) to construct the adjusted heuristics. This resembles our intended use case, where we are interested in transferring knowledge across problems.
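A minimal sketch of this leave-one-problem-out construction follows. It rests on assumptions that are not confirmed by this chapter: the min and avg adjustments are taken to be simple lookups of $min_x$ and $avg_x$ for $x$ equal to the raw estimate, and an estimate never seen in the data falls back to the raw value. The shift adjustment is left out because its definition lives in section 2.3. All names (`build_adjustments`, `adjusted`, the problem labels) are hypothetical.

```python
from collections import defaultdict

def build_adjustments(samples_by_problem, held_out):
    """Build min/avg lookup tables from every problem except `held_out`.
    `samples_by_problem` maps a problem name to (estimate, true_distance) pairs."""
    groups = defaultdict(list)
    for problem, samples in samples_by_problem.items():
        if problem == held_out:
            continue  # never use data from the problem being solved
        for estimate, true_distance in samples:
            groups[estimate].append(true_distance)
    min_table = {x: min(ds) for x, ds in groups.items()}
    avg_table = {x: sum(ds) / len(ds) for x, ds in groups.items()}
    return min_table, avg_table

def adjusted(table, estimate):
    """Look up the adjusted value; unseen estimates fall back to the raw
    estimate (an assumption of this sketch)."""
    return table.get(estimate, estimate)

# Hypothetical data for three problems of one domain
data = {
    "p01": [(1, 1), (2, 3)],
    "p02": [(2, 5), (3, 5)],
    "p03": [(2, 9)],
}
min_table, avg_table = build_adjustments(data, held_out="p03")
```

Here the sample from `p03` does not influence the tables used when solving `p03`, which mirrors the transfer setting described above.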

Figure 5.10 shows the results for blocks and figure 5.11 for zenotravel. We measure TotalNodes (the sum of the number of expanded nodes and the number of nodes present in the open list at the end of the search), SolutionCost (which in this case is the number of actions in the plan), the number of problems solved, and the search time. Blank spaces in SolutionCost mean that the problem was not solved within the time or memory limit. We have only included problems that were solved by at least one of the heuristics.

GoalCount is an admissible but very weak heuristic, hence the adjusted heuristics outperform it in all criteria except plan length on all problems in blocks. The heuristic is admissible, so the min adjustment and the shift adjustment yield the same heuristic, i.e. $\forall s : h^{min}_{GC}(s) = h^{shift}_{GC}(s)$.

Figure 5.4: Distribution of $h$ by problem in zenotravel domain. Red columns show average of $h$, blue ones show maximum. Minimum is always 0.
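To make the TotalNodes and SolutionCost measurements concrete, here is a small A* sketch that reports both. It is an illustration under stated assumptions, not the planner used in the experiments: actions have unit cost, the node limit is checked against expanded nodes plus open-list entries, and the function names and the toy line-shaped state space are invented for the example.

```python
import heapq
from itertools import count

def astar(start, goal, neighbors, h, node_limit=5_000_000):
    """Unit-cost A*. Returns (solution_cost, total_nodes), where total_nodes is
    expanded nodes plus open-list entries at the end; cost is None on failure."""
    tie = count()  # tie-breaker so the heap never compares states directly
    open_heap = [(h(start), next(tie), 0, start)]
    best_g = {start: 0}
    expanded = 0
    while open_heap:
        _, _, g, state = heapq.heappop(open_heap)
        if g > best_g[state]:
            continue  # stale entry: a cheaper path to this state was found later
        if state == goal:
            return g, expanded + len(open_heap)
        expanded += 1
        for nxt in neighbors(state):
            new_g = g + 1
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(open_heap, (new_g + h(nxt), next(tie), new_g, nxt))
        if expanded + len(open_heap) > node_limit:
            return None, expanded + len(open_heap)  # memory limit reached
    return None, expanded

def line_neighbors(n):
    """Toy state space: integers from -3 to 4 connected in a line."""
    return [m for m in (n - 1, n + 1) if -3 <= m <= 4]

blind = astar(0, 4, line_neighbors, lambda n: 0)
informed = astar(0, 4, line_neighbors, lambda n: abs(4 - n))
```

On this toy line both searches return the same SolutionCost, while the informed heuristic produces a smaller TotalNodes than the blind one, which is the kind of difference the result tables report.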

Figure 5.5: Count of samples by their goal-distances. X-axis: goal distance, Y-axis: relative number of samples in blocks domain. (All problems combined.)

In blocks, the $h_{GC}$ heuristic systematically and unnecessarily underestimates most states. The heuristic counts the number of blocks that are misplaced. Moving a block, however, requires two actions, lift and put-down, hence

$$\forall s : h_{GC}(s) = k \Rightarrow h(s) \geq 2k - 1$$

We subtract 1 because one of the $k$ blocks could already be lifted in the state.
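The bound can be checked by counting actions directly. The short worked instance below assumes that at most one block is held at a time and that every misplaced block not currently held needs one lift and one put-down:

```latex
h(s) \;\geq\; \underbrace{1}_{\text{put down the held block}}
      \;+\; \underbrace{2\,(k-1)}_{\text{lift and put down the rest}}
      \;=\; 2k - 1,
\qquad \text{e.g. } k = 3:\quad 1 + 2 \cdot 2 = 5 .
```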

When using the estimate $2k - 1$ instead of $k$, the heuristic is much more informed and still admissible. This is exactly what $h^{min}_{GC}$ and $h^{shift}_{GC}$ do.

In the zenotravel domain, the $h_{GC}$ heuristic counts the number of passengers that are not yet at their destinations. For every $k$ there exists a state $s$ such that $h_{GC}(s) = k = h(s)$. The state looks as follows: there are $k$ passengers travelling to the same destination, and they are on board a plane that is already located at that destination. All other passengers have already disembarked at their respective destinations. In this state, performing the disembark action $k$ times leads to the goal.

Due to this, the heuristic estimate is actually tight and cannot be improved without losing admissibility. Hence in this case $h^{min}_{GC} = h^{shift}_{GC} = h_{GC}$.

$h^{avg}$, on the other hand, is more informed than the others, solves more problems, and requires less time and memory. It is not admissible, hence it does not guarantee optimality of solutions.

Figure 5.6: Count of samples by their goal-distances. X-axis: goal distance, Y-axis: relative number of samples in zenotravel domain. (All problems combined.)

The FF heuristic is much more sophisticated, and it is therefore unlikely that these simple adjustments will significantly improve it. $h^{shift}_{FF}$ seems to work exactly like $h_{FF}$, which suggests that there are no obvious errors in $h_{FF}$ that could be repaired by shifting, or at least that they do not occur in either of the two domains.

Neither the avg nor the min adjustment outperforms $h_{FF}$ in all criteria. $h^{min}_{FF}$ is admissible, unlike $h_{FF}$, hence it can improve solution quality and provides an optimality guarantee. $h^{avg}_{FF}$ can sometimes be faster, especially in the blocks domain, at the cost of increasing plan length.

5.2.3 Summary

The min and shift adjustments are able to automatically identify and repair the underestimation that occurs with $h_{GC}$ on blocks. $h^{avg}_{GC}$ outperforms $h_{GC}$ in search time, memory consumption and number of problems solved, at the cost of losing optimality.

Improvements over $h_{FF}$ are negligible, but we should point out that developing $h_{FF}$ took several years and many man-days of work by top-tier researchers, while the adjusted heuristic can be constructed automatically within minutes, assuming that we already have the data.

The proposed modifications are applicable to any existing heuristic and can be used to adjust the informedness vs. admissibility tradeoff of the heuristic.
