Proceedings of CogSci89

Structural Evaluation of Analogies: What Counts?

Kenneth D. Forbus                          Dedre Gentner
Qualitative Reasoning Group                Psychology Department
Beckman Institute, University of Illinois

Abstract: Judgments of similarity and soundness are important aspects of human analogical processing. This paper explores how these judgments can be modeled using SME, a simulation of Gentner's structure-mapping theory. We focus on structural evaluation, explicating several principles which psychologically plausible algorithms should follow. We introduce the Specificity Conjecture, which claims that naturalistic representations include a preponderance of appearance and low-order information. We demonstrate via computational experiments that this conjecture affects how structural evaluation should be performed, including the choice of normalization technique and how the systematicity preference is implemented.

1 Introduction

Judging soundness and structural similarity are important aspects of human analogical processing. While other criteria (such as factual correctness and relevance to current goals) are also important, they cannot replace structural evaluation. For example, neither factual correctness nor relevance is enough when an analogy is used to make an argument; the claimed consequences must legitimately follow from the analogy or the argument will be rejected. The importance of structural evaluation is even clearer when one considers the use of analogy to discover new ideas: the learner must have some means of judging the comparison without knowing in advance whether its implications are correct or relevant.

We have suggested that human structural evaluation of analogies depends largely on the degree to which the analogs share systematic relational structure (i.e., share systems of relations governed by common higher-order relations) [8]. There is psychological evidence supporting this position as a descriptive account [10]. SME [5,6], our simulation of Gentner's structure-mapping theory [7,8], includes a structural evaluator which appears to match psychological data on analogical soundness judgments reasonably well [17]. In this paper we use a combination of theoretical argument and sensitivity analyses to probe more deeply into the issues surrounding structural evaluation.

Section 2 begins with a brief overview of SME and outlines some constraints on psychologically plausible algorithms for structural evaluation. Section 3 summarizes psychological results concerning analogical soundness, and shows how our prior simulation experiment provides a framework for sensitivity analyses. Section 4 proposes that the representations used in AI and cognitive simulation tend to be unrealistically sparse (the Specificity Conjecture). The next two sections demonstrate how this conjecture constrains structural evaluation algorithms. Two design dimensions are considered. Section 5 compares alternate normalization strategies (i.e., how evidence is combined). Section 6 compares our original cascade-like technique for implementing the systematicity preference (trickle-down) with another technique, order-scoring. We conclude that trickle-down with result normalization provides the best fit to human data. We close by considering the broader implications of the Specificity Conjecture.

2 The Structure-Mapping Engine

SME was designed to provide an accountable simulation of Gentner's structure-mapping theory. By accountable, we mean that processing choices not explicitly constrained by the theory must be easily changeable, so that dependence on alternate choices can be explored.
To achieve accountability, SME's input includes two sets of rules which construct and evaluate local matches. By varying these rules SME can be programmed to emulate all the comparisons of structure-mapping, as well as other matchers consistent with its assumptions [4]. Here we use this programmability to perform sensitivity analyses to rule out certain processing choices as being unable to account for human data.

Given base and target descriptions to match, SME produces a set of Gmaps, representing the possible interpretations of the comparison. Each Gmap includes a set of correspondences between the items (objects and propositions) in the base and target, the set of candidate inferences sanctioned by the match (i.e., knowledge about the base conjectured to hold in the target by virtue of the correspondences), and a structural evaluation score (SES) indicating the "quality" of the match.

SME begins by computing local match hypotheses (MH's) involving pairs of items from base and target. The construction rules guide this process¹. At this stage the match is incoherent, in that the set of match hypotheses collectively can contain many-to-one mappings. Local constraints, such as one-to-one mappings and structural consistency (see [6] for details), are enforced next. These constraints rule out match hypotheses which cannot be part of any legal interpretation, and note which pairs of match hypotheses cannot consistently be part of the same interpretation. Gmaps are built by finding the maximal structurally consistent collections of local matches, and using the computed overlap to determine what non-overlapping aspects of the base can be postulated to hold in the target (i.e., the candidate inferences). The structural evaluation score is computed last. First, the evaluation rules are run to provide a score for each match hypothesis. The SES of each Gmap is then computed by adding the scores of its match hypotheses.

SME provides a process model for structure-mapping. The goal is to achieve sophisticated results using computationally simple techniques. We believe that combining local match hypotheses into coherent global interpretations is a psychologically plausible aspect of SME [9]. However, not every aspect of SME is equally plausible psychologically. For example, we do not believe people necessarily compute all interpretations, although for experimental purposes we generally have SME compute the complete set of Gmaps to gain more insight into the match. A second limitation is that SME models only the structural component of match quality. Contextual and pragmatic factors can also play a role in match evaluation. However, understanding those factors involves simulating larger pieces of the overall processing system, with a consequent increase in the number of free parameters. By understanding structural evaluation in isolation we hope to tightly constrain that aspect of the system.

Structure-mapping postulates that systematicity is preferred in structural evaluations [8]; i.e., a Gmap involving a larger connected system of relations, particularly higher-order relations², should have a higher SES than one involving a smaller, or disconnected, system of relations. The systematicity constraint is stated at the information processing level (as defined by Marr [14]); additional principles are needed to provide constraint at the algorithm and implementation levels. This paper focuses on the algorithm level, importing only the most general constraints from the prospect of highly parallel, neural-like implementations. Our current implementation is serial, but that is an accident of technology; the SME algorithm lends itself naturally to a variety of parallel implementations [6]³.
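As a rough illustration of the first stage, the sketch below (Python, with invented data structures; SME itself is rule-based, and its rules also allow, for example, non-identical functions to match) pairs base and target expressions that share an identical functor and then pairs their corresponding arguments, without yet enforcing one-to-one or structural-consistency constraints.

    from dataclasses import dataclass
    from itertools import product

    @dataclass(frozen=True)
    class Expr:
        functor: str       # predicate, function, or attribute name
        args: tuple = ()   # sub-expressions (Expr) or entity names (str)

    def match_hypotheses(base_exprs, target_exprs):
        """Create local match hypotheses: one for each base/target pair of
        expressions with identical functors, plus one for each pair of
        corresponding arguments.  Many-to-one matches are allowed here;
        consistency filtering and Gmap construction come later."""
        mhs = set()
        def pair(b, t):
            mhs.add((b, t))
            if isinstance(b, Expr) and isinstance(t, Expr):
                for b_arg, t_arg in zip(b.args, t.args):
                    pair(b_arg, t_arg)
        for b, t in product(base_exprs, target_exprs):
            if b.functor == t.functor:
                pair(b, t)
        return mhs

    # e.g. CAUSE[GREATER-THAN(a,b), BREAK(a)] vs. CAUSE[GREATER-THAN(x,y), BREAK(x)]
    base = [Expr("CAUSE", (Expr("GREATER-THAN", ("a", "b")), Expr("BREAK", ("a",))))]
    targ = [Expr("CAUSE", (Expr("GREATER-THAN", ("x", "y")), Expr("BREAK", ("x",))))]
    print(len(match_hypotheses(base, targ)))   # 5: CAUSE, GREATER-THAN, BREAK, a/x, b/y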
The score associated with a match hypothesis indicates how strongly the correspondence between the base and target items it connects is preferred on structural grounds. We restrict evaluation rules to use only local, structural properties in assigning scores. For example, MH's receive some initial score based on the kinds of items matched (relation, function, or attribute). Under structure-mapping only propositions involving identical relations or attributes match⁴, so the same initial score is used for all relations and attributes (i.e., matches involving relations such as CAUSE are given the same score as matches involving relations such as IMPLIES or LEFT-OF). This parameter is called Same-Predicate. The parameter Same-Function is used for identical functions⁵. This part of the structural evaluation can proceed in parallel with match hypothesis construction.

¹ Which pairs of items are hypothesized to match and the structural constraints defining consistent global interpretations are fixed by structure-mapping theory.
² Structure-mapping defines the order of an item in a representation as follows: objects and constants are order 0; the order of a predicate is one plus the maximum of the orders of its arguments. Thus GREATER-THAN(x,y) is first-order if x and y are objects, and CAUSE[GREATER-THAN(x,y), BREAK(x)] is second-order. Examples of higher-order relations include CAUSE and IMPLIES.
³ We view Holyoak and Thagard's ACME [11] as evidence that SME could be implemented in at least a localist connectionist framework, since there is substantial overlap in the information processing and algorithm levels between SME and ACME.
⁴ We assume a decompositional semantics, so that synonyms are translated into some common form (c.f. [1,3]). This allows similarity to be reduced to partial identity. The alternative course of allowing similar predicates to match requires one to define similarity by invoking it.

At first glance systematicity might appear to be an inherently global concept, requiring difficult computations to enforce. We implemented it locally via trickle-down, a cascade-like model [12]. In trickle-down, a match hypothesis MH adds its score, scaled by the parameter Trickle-Down, to the match hypotheses linking the arguments of the items matched by MH. Thus in a deep system of relations the scores will cascade down, providing high scores for the object correspondences supporting the system (and thus for the system as a whole). This computation, too, can proceed in parallel, taking O(log N) for a match hypothesis set of size N [6].

The structural evaluation system thus has three parameters: Same-Predicate, Same-Function, and Trickle-Down⁶. Once the propagation of local scores for all match hypotheses is complete, the SES of an interpretation (Gmap) is computed by summing the scores of its constituent match hypotheses. In the original version of SME, scores were represented and combined using the Dempster-Shafer formalism [16,2]. We do not normalize Gmap scores, since doing so would only introduce further parameters without theoretical motivation. We also wish to avoid arbitrary assumptions concerning the scaling of human soundness judgments. Consequently, our conclusions will be based completely on ordinal comparisons between scores, never on the actual magnitudes of the scores themselves.
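To make the scoring machinery of this section concrete, here is a minimal sketch of trickle-down and of the SES computation. It is illustrative only: the tree representation, the parameter values, and the use of a simple additive cap on scores (an AddMax-style rule of the kind examined in Section 5, rather than the original Dempster-Shafer combination) are assumptions made for exposition.

    # Illustrative trickle-down scoring over a tree of match hypotheses.
    SAME_PREDICATE = 0.005    # assumed values, within the ranges discussed below
    SAME_FUNCTION  = 0.002
    TRICKLE_DOWN   = 8.0

    class MH:
        """A local match hypothesis; children are the match hypotheses
        pairing the arguments of the two matched items."""
        def __init__(self, kind, children=()):
            self.kind = kind              # 'relation', 'attribute', 'function', or 'entity'
            self.children = list(children)
            self.score = {'relation': SAME_PREDICATE,
                          'attribute': SAME_PREDICATE,
                          'function': SAME_FUNCTION}.get(kind, 0.0)

    def trickle_down(mhs):
        """Each MH passes its score, scaled by Trickle-Down, to the MHs for its
        arguments; visiting parents before children lets scores cascade down a
        deep relational system.  A cap keeps every score within a finite range."""
        for mh in mhs:
            for child in mh.children:
                child.score = min(1.0, child.score + TRICKLE_DOWN * mh.score)
            trickle_down(mh.children)

    def ses(gmap_mhs):
        """Structural evaluation score of an interpretation: the sum of the
        scores of its constituent match hypotheses."""
        return sum(mh.score for mh in gmap_mhs)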
3 Modeling Soundness Judgments

To perform a sensitivity analysis one must have a standard for comparison. We use the cognitive simulation experiment described in [17], which showed that SME could replicate aspects of human soundness judgments demonstrated empirically [10,15]. In the psychological studies, subjects first read a large set of stories. In a subsequent session, they were shown similar stories and tried to retrieve the corresponding original stories (an access measure). Afterwards, subjects were asked to judge the inferential soundness of pairs of stories. What was varied was the kind of similarity between pairs of stories: some cases shared only relational structure (i.e., were analogous), some shared only object similarities (i.e., were appearance matches), and some shared both (i.e., were literally similar). Subjects rated literal similarity and analogy pairs as significantly more sound than appearance matches.

In the original simulation study, five triads of stories were encoded, each consisting of the base story (Base), an analogous story with different surface structure but similar relational structure (AN), and a story with surface similarities but different relational structure (MA). We asked whether SME's structural evaluation system could model these judgments. That is, if we interpret the SES as an indication of the soundness rating a subject would give, then to match the human data the score computed by SME for the Base/AN match should be higher than the score for the Base/MA match. As predicted, SES(Base/AN) > SES(Base/MA).

This experiment provides a useful framework for carrying out sensitivity analyses. Suppose we have N triads of stories. For any particular collection of parameters (numerical, symbolic, or algorithmic) we can define the fit with human performance to be the number of triads for which SES(Base/AN) > SES(Base/MA). By analyzing how the fit varies we can determine how sensitive the results are to each choice. Figure 1 depicts how this design can be viewed as an experimental apparatus. In the analyses which follow, three triads of stories were used each time. For each manipulation, this ESENSE apparatus was run over a sample of the numerical parameter space to estimate what fraction of the space provides results which fit the psychological data. The desirable outcome is that some portion, but not all, of the space provides such a fit to the human data. If the whole space fits, then the parameters are irrelevant. If none of the space fits, then clearly that combination cannot account for human soundness judgments.

⁵ Functions are treated differently from predicates, since derived matches between non-identical functions are allowed if the structure above them matches (e.g., Temperature to Pressure).
⁶ The original structural evaluation rules [6] had eight parameters; the reduction to three required some theoretical analysis and about two Symbolics-days of numerical sensitivity analyses. Although our initial use of eight parameters may seem large, it should be noted that simulations often have far more: ACME, for instance, relies on a numerical "similarity score" being available for each pair of predicates, hence the number of parameters is at least as large as the square of the number of predicates in the underlying representation language. Ascertaining the dependence of ACME's performance on its parameters via sensitivity analysis would appear to be a rather formidable task.

Figure 1: ESENSE experimental setup for sensitivity analyses. The previous simulation experiment can be viewed as an apparatus which, for any particular combination of parameters, representations, and algorithm, provides an estimate of fit to psychological data (here, an integer ranging between 0 and 3). Running this apparatus over alternate choices provides insight about how each aspect of the system accounts for the fit.
[Figure: SME compares Base with AN and with MA for story triads #1, #2, and #3; the outcomes are combined into an estimate of fit with the psychological data.]
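The fit measure computed by the ESENSE apparatus can be stated in a few lines. In the sketch below, ses() is a stand-in for a full SME run on a pair of descriptions, and the names are ours, chosen for exposition.

    def fit(triads, ses, params):
        """Number of story triads for which SES(Base/AN) > SES(Base/MA).
        triads: a list of (base, analog, appearance_match) descriptions.
        ses(a, b, params): SES of comparing a with b under the given parameters."""
        return sum(1 for base, an, ma in triads
                   if ses(base, an, params) > ses(base, ma, params))

    # With three triads per story set, a perfect fit to the human data is
    # fit(triads, ses, params) == 3; the sensitivity analyses below ask what
    # fraction of the sampled parameter space achieves this.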
4 The Specificity Conjecture

Representational choices are often the most difficult issue in cognitive simulation. Rarely does a theory completely constrain the representational format, and while many choices are logically equivalent, even small changes can yield very different performance for a particular algorithm. Often there is no agreement (and sometimes intense disagreement) on what representations are reasonable. The typical solution is to test programs on a variety of examples to ensure generality. We believe that content variations alone are not always enough. Varying more global representational assumptions, such as the amount of perceptual information, can also be crucial.

The representations used in cognitive simulation tend to have much in common with those used in AI. They focus on the important aspects of what is to be represented, leaving out "irrelevant" information. Consequently they tend to be rather sparse. Such representations are fine if the only purpose is to compute a particular kind of answer (such as how to fix a broken car), and surely some human representations are like that. But is it reasonable to assume that most are? We suspect not. A person solving a problem or reading a story builds an internal representation from a variety of sources. This can include rich visual and auditory information about appearances (possibly including mental imagery) from which the relevant factors must be extracted. In fact, the more realistic the problem-solving scenario, the more irrelevant information there tends to be. While an expert may have an intricate theory of the situation, it is far from clear that the theory, as a percentage of the total number of propositions in the representation, dominates. And a novice faced with the same domain may have no applicable abstract knowledge, and thus can only encode observable properties.

Let us use top-heavy to refer to descriptions where most of the information is abstract, with very little information about appearances or basic object properties, and bottom-heavy for descriptions in which appearance information dominates (there may be just as much relational structure as in top-heavy descriptions, as long as there is even more appearance information). Based on the observation that we can see far more than we can explain, we make the Specificity Conjecture: bottom-heavy descriptions are very common in human memory, perhaps outnumbering top-heavy descriptions. If this conjecture is correct, it is important to test simulations on bottom-heavy descriptions as well as the top-heavy descriptions which have been the favorites of experimenters.

How does the Specificity Conjecture affect structural evaluation of analogy? Structurally, top-heavy descriptions have a preponderance of higher-order relations, while bottom-heavy descriptions have many more attributes and first-order relations (e.g., LEFT-OF, BELOW). Consider the relative number of match hypotheses in the Base/AN and Base/MA comparisons described above. All else being equal, given a top-heavy representation the Base/AN comparison will have more match hypotheses than the Base/MA comparison, since there is more higher-order structure than appearance information. Conversely, in a bottom-heavy representation the Base/MA comparison will have more match hypotheses than the Base/AN comparison, since there is more appearance information to match than higher-order structure. Thus in top-heavy representations SES(Base/AN) > SES(Base/MA) will tend to be true even with uniform MH scores, assuming that the higher-order structures do in fact match. But in bottom-heavy representations the tendency is towards SES(Base/AN) < SES(Base/MA), due to the predominance of appearance information. In this case trickle-down plays a crucial role in preventing the inferentially important comparison from being "swamped" by the surface comparison. People apparently have the ability to find structural commonalities even when they have bottom-heavy representations⁷. By looking for swamping over a space of numerical parameters and representation choices (i.e., by varying the amount of appearance information), we have a more subtle probe for exploring structural evaluation.

⁷ For example, in the experiment described above people gave higher structural evaluations to analogies than to appearance matches. Yet we can infer that they must have stored the stories with a great deal of low-order information, because their memory access was better for appearance matches than for analogical matches.
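A back-of-the-envelope illustration of swamping, using purely hypothetical match-hypothesis counts (not taken from the story sets used in our experiments):

    # Hypothetical counts: the Base/AN comparison shares a connected relational
    # system (12 MHs); the Base/MA comparison shares mostly isolated attribute
    # matches (30 MHs).  All MHs start with the same small score.
    uniform, trickle = 0.01, 8.0
    an_mhs, ma_mhs = 12, 30

    print(an_mhs * uniform < ma_mhs * uniform)    # True: with uniform scores, MA swamps AN

    # Trickle-down boosts the MHs sitting under higher-order matches; even a
    # single cascaded contribution per MH is enough to reverse the ordering here.
    boosted = uniform * (1 + trickle)
    print(an_mhs * boosted > ma_mhs * uniform)    # True: the relational interpretation wins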
5 Analyzing normalization strategies

Any physically realizable computing scheme must include elements of finite dynamic range, and hence there will always be some normalization scheme which ensures that scores stay within that range. The ability of trickle-down to prevent swamping depends in part on the normalization strategy used in computing scores. We can divide such strategies into two broad classes: result normalization and contribution normalization. Connectionist models tend to use result normalization; a unit's inputs are multiplied by a set of coefficients, added, and then scaled by some non-linear function [13]. Formalisms for probabilistic reasoning tend to use contribution normalization; MYCIN's certainty factors, for instance, scale every contribution to belief in a proposition by the percentage of uncertainty remaining for that belief. Which kind of strategy, when plugged into the ESENSE apparatus, provides a better fit to the data?

To answer this question we set up the following experiment. First, we modified the encoded stories of the original simulation experiment to produce three sets of stories: one consisting of top-heavy descriptions, one consisting of bottom-heavy descriptions (i.e., twice as many match hypotheses for the Base/MA comparison as for the Base/AN comparison), and one "neutral" set, where the numbers of match hypotheses for the Base/MA and Base/AN comparisons were exactly equal. Then, we implemented a representative algorithm for each type of normalization. For the result normalization case we used the following rule:

    AddMax:  W_{i+1} = Min(1.0, W_i + C_i)

where W_i and W_{i+1} are the MH's score before and after the contribution, C_i is the amount contributed, and W_0 = 0.0. For the contribution normalization case we used the Dempster/Shafer code from the original SME structural evaluator. We then ran the ESENSE apparatus over every set of stories using each normalization strategy, varying the numerical parameters over a broad range, to see how these choices interacted to affect the fit with human performance.
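As a concrete, simplified rendering of the two strategies: AddMax adds contributions and clips the result, while a contribution-normalization scheme in the spirit of Dempster-Shafer combination (or MYCIN's certainty factors) scales each contribution by the uncertainty remaining. The code and numbers below are illustrative assumptions, not the original evaluator.

    def add_max(score, contribution):
        """Result normalization (AddMax): add, then clip to the unit interval."""
        return min(1.0, score + contribution)

    def contribution_norm(score, contribution):
        """Contribution normalization: each contribution is scaled by the
        uncertainty remaining, as in combining two simple support functions
        for the same hypothesis (or MYCIN's certainty-factor update)."""
        return score + contribution * (1.0 - score)

    # Successive contributions count for less and less under contribution
    # normalization, which damps the cascade that trickle-down relies on;
    # AddMax lets a deep relational system accumulate score up to the cap.
    score_a = score_c = 0.0
    for c in [0.01] + [0.08] * 5:   # an initial score plus five trickled-down contributions
        score_a = add_max(score_a, c)
        score_c = contribution_norm(score_c, c)
    print(round(score_a, 3), round(score_c, 3))   # 0.41 0.348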
Table 1: Summary of fit as a function of representation and normalization. This table shows, for each combination of representation type and normalization algorithm, how much of the sampled parameter space can completely account for the data. That is, a value of X% indicates that, given any parameter setting in that fraction of the space, SME's performance will exactly match the original human data. Dempster/Shafer cannot account for the data unless top-heavy representations are assumed.
[Table columns: Top-heavy, Neutral, Bottom-heavy; rows: Dempster/Shafer and AddMax. Cell values are not reproduced here.]

One complication in setting up the experiment is that these strategies differ in the ranges of parameters they allow. In Dempster/Shafer all parameters must be between zero and one. In AddMax, allowing Same-Predicate or Same-Function to be one or greater is equivalent to just counting match hypotheses, so we restrict these to be less than one. Trickle-Down, on the other hand, can be greater than one, since the other parameters could be substantially less than one. (Even if the product is larger than one it makes no difference for AddMax, although it would violate the fundamental assumptions of Dempster/Shafer.) For AddMax we varied the three numerical parameters over the following ranges: Same-Predicate and Same-Function over (0.0, 10⁻⁴, 10⁻³, 0.01, 0.1, 0.3, 0.9) and Trickle-Down over (0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0). For Dempster/Shafer we varied all three parameters (Same-Predicate, Same-Function, and Trickle-Down) over the same set of values: (0.0, 10⁻⁴, 10⁻³, 0.01, 0.1, 0.3, 0.9). The number of samples for each algorithm is thus 7³, or 343 points. Computing whether or not a point fits requires running each structural evaluator six times for each story set (i.e., to do the Base/AN and Base/MA comparisons for each of the three story triads in a set). Thus with three story sets 6,174 structural evaluations were required.

Table 1 summarizes the results by showing what percentage of the sampled parameter space yields a perfect fit to the data, as measured by the ESENSE apparatus. Dempster/Shafer clearly allows swamping as the number of attribute matches is increased. Thus it cannot explain the data, unless attention is restricted to top-heavy representations. AddMax, by contrast, can be tuned to fit the data for each type of representation. Can a single setting of parameters suffice? That is, are there subsets of the sample space in which AddMax fits the data for all three types, or are the subsets which fit the data for each type of representation disjoint? Yes, there is a single subset which fits the data for all three. The boundary of this region appears complicated, and the coarseness of our sampling precludes a detailed description of it. However, it is reasonably large, indicating that the algorithm is not overly sensitive to particular choices of parameters. For example, within the ranges Same-Predicate ∈ [10⁻³, 0.01], Same-Function ∈ [10⁻⁴, 0.01], and Trickle-Down ∈ [4, 16], every point fits perfectly. The regions which are clearly outside are interesting: Trickle-Down values of 1.0 or less, values of Same-Predicate of 0.3 or more, and values of Same-Function of 0.9 or higher. Intuitively, what seems to be happening is this: unless Trickle-Down is sufficiently high, not enough score cascades down to overcome the swamping effect of the large number of attribute matches. For the same reason, the "baseline activation" for each MH must be kept small; otherwise the cascade effect will be blocked by normalization.
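The sweep itself is straightforward to express. The sketch below is schematic (fit() is a placeholder for the ESENSE runs described above), and it also answers the single-region question by intersecting the perfectly-fitting points across story sets.

    from itertools import product

    SAME_PREDICATE = SAME_FUNCTION = (0.0, 1e-4, 1e-3, 0.01, 0.1, 0.3, 0.9)
    TRICKLE_DOWN = (0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0)

    def sweep(story_sets, fit):
        """story_sets: {name: list of three (Base, AN, MA) triads}.
        fit(triads, params): number of triads with SES(Base/AN) > SES(Base/MA).
        Returns the perfectly-fitting points per story set, and the points
        that fit every story set at once."""
        grid = list(product(SAME_PREDICATE, SAME_FUNCTION, TRICKLE_DOWN))  # 7**3 = 343 points
        fits = {name: {p for p in grid if fit(triads, p) == len(triads)}
                for name, triads in story_sets.items()}
        common = set.intersection(*fits.values()) if fits else set()
        return fits, common

    # Fraction of the sampled space that fits each representation type:
    #   {name: len(points) / 343 for name, points in fits.items()}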
This experiment suggests an interesting possibility. Since any physical computation scheme incorporates elements of limited dynamic range, for any set of parameters there will be some maximum depth beyond which additional systematicity cannot be distinguished, since the processing elements will have reached their maximum scores. This limit may be so high as to be irrelevant for human representations, or it may show up as an "order cutoff": a failure to distinguish one comparison as more sound than another when both are extremely intricate.

6 Trickle-Down versus Order-Scoring

An interesting alternative to trickle-down for implementing systematicity is order-scoring. Consider a large relational structure which is shared by both base and target in some interpretation of an analogy. This structure will have a number of match hypotheses involving relational items of high order. To satisfy structural consistency, the arguments of each such item must themselves have correspondences in the interpretation. Hence its mere presence in the interpretation indicates the existence of matches "all the way down" to object matches. Order-scoring simply scales the score given to each match hypothesis by the order of the items involved, instead of passing scores downward as in trickle-down.

On computational grounds, we find order-scoring less attractive than trickle-down. First, to satisfy our constraints order must itself be computed locally. This is not difficult, if one allows information to propagate "upwards" from match hypotheses between entities (which have order zero) to match hypotheses which include them as arguments, and so on. However, this explicit computation of order seems inelegant, since, to paraphrase [12], it requires a "more complex currency" than simply propagating local scores. A second difference is that order-scoring directly signals the existence of higher-order relations, and only indirectly signals the connectivity of a system of relational matches. Trickle-down, on the other hand, directly signals connectivity, leaving order implicit. Intuitively, connectivity seems a better structural reflection of coherence and inferential power than the mere existence of higher-order relations. Thus trickle-down has greater theoretical appeal as a way of deriving a structural evaluation.

But intuitions can be misleading. To see whether order-scoring could account for the data, we implemented a set of evaluation rules using this strategy. The contribution of order was defined by the function O3:

    O3(MH) = Min(1.0, C × [1 + Order(MH) × Order-Bias])

where C is either Same-Predicate or Same-Function as appropriate. We sampled this parameter space in the same way as in the earlier analysis: i.e., Same-Predicate and Same-Function ranged over (0.0, 10⁻⁴, 10⁻³, 0.01, 0.1, 0.3, 0.9) and Order-Bias over (0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0).
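A minimal sketch of the order-scoring rule (illustrative; the order computation follows the definition in footnote 2, and expressions are represented here as nested tuples for brevity):

    def order(item):
        """Order of an item (footnote 2): entities and constants are order 0;
        a predicate's order is one plus the maximum order of its arguments.
        Expressions are nested tuples, e.g. ('CAUSE', ('GREATER-THAN','x','y'), ('BREAK','x'))."""
        if not isinstance(item, tuple):
            return 0
        return 1 + max((order(arg) for arg in item[1:]), default=0)

    def o3(matched_item, c, order_bias):
        """O3(MH) = Min(1.0, C * [1 + Order(MH) * Order-Bias]), where C is
        Same-Predicate or Same-Function as appropriate for the matched item."""
        return min(1.0, c * (1 + order(matched_item) * order_bias))

    print(round(o3(('CAUSE', ('GREATER-THAN', 'x', 'y'), ('BREAK', 'x')), 0.01, 16.0), 3))  # 0.33

Note that even at the largest sampled Order-Bias, a single second-order match contributes at most a few tenths, and nothing co-opts the low-order matches beneath it.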
Table 2: Results of order-scoring on the story sets. This table gives the percentage of the sampled points which perfectly fit the data for the three story sets described above. We repeat the trickle-down AddMax data for easy comparison. Order-scoring fails to account for the human data, assuming the Specificity Conjecture holds.
[Table columns: Top-heavy, Neutral, Bottom-heavy; rows: order-scoring and trickle-down/AddMax. Cell values are not reproduced here.]

Table 2 summarizes the results. Clearly, O3 is not a viable candidate for implementing the systematicity preference, since it is swamped on bottom-heavy representations. We suspect that this result will hold for all order-scoring algorithms. Even when Order-Bias is high, local normalization prevents the score of any particular MH from becoming too high. Trickle-down avoids this limitation by co-opting all the lower-order structure matches under the high-order match, thus providing better resistance to swamping.

7 Discussion

Previous work demonstrated that structural criteria are important in judging the relative soundness of analogical comparisons. This paper explores the relationship between the data and the simulation in detail, making explicit the principles which constrain the space of algorithms we allow, and using the previous experiments to provide a framework for sensitivity analyses (the ESENSE apparatus) that helps provide a deeper account of structural evaluation. In particular, we introduced the Specificity Conjecture, which suggests that appearance and other low-order information is likely to dominate in mental representations. If it is true, our experiments indicate that (a) normalization of scores for local matches should occur by a result normalization strategy rather than by contribution normalization, and (b) trickle-down provides a better implementation of the systematicity preference than order-scoring.

We believe the Specificity Conjecture has important general ramifications for cognitive simulation. The aesthetic for good representations in AI is driven by the desire to solve particular kinds of problems. Since AI workers tend to do more explicit formal representation than workers in other areas of Cognitive Science, their aesthetic tends also to be inherited by other areas, even when it may not be appropriate. There is very little direct evidence about the format and statistical properties of mental representations (i.e., whether they tend to be top-heavy or bottom-heavy). Still, the fact that humans have powerful perceptual systems which deliver a rich assortment of information, regardless of whether they know much else about what they are seeing, argues for the importance of testing simulations with bottom-heavy representations. Showing that a simulation works in different content areas is now common. This is clearly important, but we now believe that it is not enough. One must explore how well a simulation performs with a range of representations that captures plausible intuitions about what the range of mental representations might be like.

8 Acknowledgements

Brian Falkenhainer provided valuable advice and algorithms for carrying out parts of the sensitivity analyses. This paper benefited from discussions with John Collins, Brian Falkenhainer, Rob Goldstone, Art Markman, Doug Medin, Janice Skorstad, and Ed Smith. This research was supported by the Office of Naval Research, Contract No. N00014-85-K-0559, an NSF Presidential Young Investigator award, and an equipment grant from IBM.

References

[1] Burstein, M. H., A model of learning by analogical reasoning and debugging, in: Proceedings of the National Conference on Artificial Intelligence, Washington, D.C., 1983.
[2] Falkenhainer, B., Towards a general-purpose belief maintenance system, in: J.F. Lemmer (Ed.), Uncertainty in Artificial Intelligence, Volume II, 1987. Also Technical Report UIUCDCS-R-87-1717, Department of Computer Science, University of Illinois, 1987.
[3] Falkenhainer, B., An examination of the third stage in the analogy process: Verification-based analogical learning, Technical Report UIUCDCS-R-86-1302, Department of Computer Science, University of Illinois, October 1986. A summary appears in Proceedings of the Tenth International Joint Conference on Artificial Intelligence, Milan, Italy, August 1987.
[4] Falkenhainer, B., The SME user's manual, Technical Report UIUCDCS-R-88-1421, Department of Computer Science, University of Illinois, 1988.
[5] Falkenhainer, B., Forbus, K.D., and Gentner, D., The Structure-Mapping Engine, in: Proceedings of the Fifth National Conference on Artificial Intelligence, August 1986.
[6] Falkenhainer, B., Forbus, K., and Gentner, D., The Structure-Mapping Engine: Algorithm and examples, Artificial Intelligence, to appear, 1989.
[7] Gentner, D., The structure of analogical models in science, BBN Technical Report No. 4451, Cambridge, MA: Bolt Beranek and Newman Inc., 1980.
[8] Gentner, D., Structure-mapping: A theoretical framework for analogy, Cognitive Science 7(2), 1983.
[9] Gentner, D., Mechanisms of analogical learning, to appear in: S. Vosniadou and A. Ortony (Eds.), Similarity and Analogical Reasoning. Presented in June 1986.
[10] Gentner, D., and Landers, R., Analogical reminding: A good match is hard to find, in: Proceedings of the International Conference on Systems, Man and Cybernetics, Tucson, Arizona, 1985.
[11] Holyoak, K., and Thagard, P., Analogical mapping by constraint satisfaction, to appear in Cognitive Science.
[12] McClelland, J. L., and Rumelhart, D. E., An interactive activation model of context effects in letter perception: Part 1. An account of basic findings, Psychological Review 88(5), 375-407, 1981.
[13] Rumelhart, D., and McClelland, J., Parallel Distributed Processing, Volumes 1 & 2, The MIT Press, 1986.
[14] Marr, D., Vision, W.H. Freeman and Company, San Francisco, 1982.
[15] Rattermann, M.J., and Gentner, D., Analogy and similarity: Determinants of accessibility and inferential soundness, in: Proceedings of the Cognitive Science Society, July 1987.
[16] Shafer, G., A Mathematical Theory of Evidence, Princeton University Press, Princeton, New Jersey, 1976.
[17] Skorstad, J., Falkenhainer, B., and Gentner, D., Analogical processing: A simulation and empirical corroboration, in: Proceedings of the Sixth National Conference on Artificial Intelligence, Seattle, WA, August 1987.