Vol.4 No.2 2011

Research paper: ARGUS: Adaptive Recognition for General Use System (N. Otsu et al.)
Synthesiology - English edition Vol.4 No.2 (2011)

have supplemented the explanation as much as possible within the given space.

3 Requirements for vision systems
Question (Motoyuki Akamatsu)
As basic conditions required in a vision system, you listed “R1: shift-invariance, R2: frame-additivity, R3: adaptive trainability,” but the grounds for citing these were not clearly written. Could you please write the scenario in which these theoretical developments were selected? Also, for geometrical invariance, I believe that other choices could be considered, such as invariance to size, invariance to inclination, and relative position invariance between features. Moreover, for invariant feature extraction, it is mentioned that a functional is investigated which gives feature values invariant under geometrical transformations. Is it correct to understand that since this targets a vision system, geometrical invariance is an essentially important property? Finally, about frame-additivity, I do not think additivity will be satisfied in the case where there is overlapping, so is this a choice made mainly from the viewpoint of processing time?
Answer (Nobuyuki Otsu)
The shift-invariance refers to invariance under a parallel shift. This does not mean that “the distance between camera and physical object hardly changes,” but rather that, due to changes in the camera direction, the physical object undergoes a geometrical transformation which is a parallel shift within the screen frame, so that its position changes, and that features which are invariant to this kind of basic translation are essential in recognition. Of course, as you pointed out, other transformations such as size (scale) changes and rotations can also be considered as transformations to be invariant to, but what I am saying here is that parallel shift (or position) invariance is the most fundamental. To avoid any misunderstanding, I have made a slight revision.
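As a rough illustration of the two properties raised in the question above (R1: shift-invariance and R2: frame-additivity, including the overlapping case), the following is a minimal sketch rather than the authors' implementation. It assumes binary images, cyclic frame boundaries (so that the shift is exact), and a small hypothetical subset of displacement sets within a 3x3 neighbourhood; the names `DISPLACEMENTS` and `autocorr_features` are invented for this example and are not the HLAC mask set used in the paper.

```python
import numpy as np

# Hypothetical displacement sets (dy, dx) inside a 3x3 neighbourhood.
# A small illustrative subset up to second order, NOT the full HLAC mask set.
DISPLACEMENTS = [
    (),                      # order 0: sum of f(r)
    ((0, 1),),               # order 1
    ((1, 0),),
    ((1, 1),),
    ((0, 1), (0, -1)),       # order 2
    ((1, 0), (-1, 0)),
    ((1, 1), (-1, -1)),
]


def autocorr_features(img):
    """Return sum_r f(r) * f(r + a1) * ... * f(r + aN) for each displacement set."""
    img = np.asarray(img, dtype=np.float64)
    feats = []
    for disp in DISPLACEMENTS:
        prod = img.copy()
        for dy, dx in disp:
            # Cyclic boundary (assumption of this sketch): rolled[r] == img[r + (dy, dx)].
            prod = prod * np.roll(img, shift=(-dy, -dx), axis=(0, 1))
        feats.append(prod.sum())
    return np.array(feats)


# Two binary objects in a 16x16 frame, far enough apart that no 3x3 window touches both.
obj_a = np.zeros((16, 16))
obj_a[2:5, 2:5] = 1.0          # 3x3 square
obj_b = np.zeros((16, 16))
obj_b[9:11, 8:12] = 1.0        # 2x4 rectangle

# R1: a parallel shift of the object inside the frame leaves the features unchanged.
shifted_a = np.roll(obj_a, shift=(3, 5), axis=(0, 1))
shift_invariant = np.array_equal(autocorr_features(obj_a),
                                 autocorr_features(shifted_a))

# R2: for well-separated objects, features of the whole frame are the sum of the parts.
both = obj_a + obj_b
additive = np.array_equal(autocorr_features(both),
                          autocorr_features(obj_a) + autocorr_features(obj_b))

# Overlap: move the rectangle onto the square; the union no longer gives the sum.
obj_b_moved = np.roll(obj_b, shift=(-7, -7), axis=(0, 1))
overlapped = np.maximum(obj_a, obj_b_moved)
additive_with_overlap = np.array_equal(
    autocorr_features(overlapped),
    autocorr_features(obj_a) + autocorr_features(obj_b_moved))

print(shift_invariant, additive, additive_with_overlap)
```

In this sketch the first two checks hold exactly, while the third fails because overlapping pixels are counted only once in the union. The features depend only on relative displacements between pixels, which is why moving the object within the frame does not change them.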
The invariant feature extraction theory, which seeks features (functionals) invariant under geometrical transformations, is not restricted to vision; it also covers audio signals, and can be regarded as a generally universal theory.
Frame-additivity, as you mentioned, does not strictly hold true for cases of overlapping, but I will risk asserting that it is important to leave the requirement as it is even in those cases. This, as you have pointed out, has implications from a processing time viewpoint, but it also makes the feature representation a convenient (linear) one in terms of recognition (especially enumeration), and it is the condition required to keep the subsequent processing simple. I have supplemented the explanation.

4 The meaning of adaptive learning
Question (Kanji Ueda)
There could be several ways of using the term “adaptive learning,” but could you clarify its meaning in the context of this paper?
Answer (Nobuyuki Otsu)
To start, the prerequisite information in pattern recognition is not perfect. Based only on a finite number of examples given as learning samples, recognition is conducted on unknown test samples (an infinite number if possible). As you pointed out, there is certainly some ambiguity in the term “adaptive learning.” First, even when pattern recognition is limited to a particular recognition object, there is adaptation to variations in the pattern. This is related to feature extraction and the learning process. Also, the adaptive learning in this paper is used in a meta-sense, in that it is adaptive learning to a given recognition task.
In the case of model-based learning, the model needs to be replaced when the task changes, whereas this method adapts to the task, with no model required at all and the components as they

Discussions with Reviewers

1 Expansion of the theory and application to the industrial world
Question (Motoyuki Akamatsu, Human Technology Research Institute, AIST)
I understand that, because ARGUS is a robust method backed by theory, the technique could be widely applied. As a Synthesiology paper about research based on such a theory, could you perhaps write about the essential points or difficulties regarding the research on this theoretical basis? Also, after trying a variety of applications, could you possibly record whether they generally went according to theory, or if not, whether you experienced difficulties? If it is the former case, it would greatly help the readers if you could explain why the theory worked well.
Answer (Nobuyuki Otsu)
I have responded to the extent possible.

2 Selection of elemental technologies
Question (Kanji Ueda, AIST)
This paper, being Type 2 Basic Research for a theoretically based technique which applies to real problems, is of a type that had not yet appeared in Synthesiology. I would like to ask about the selection of elemental technologies for this theoretically based constitutive research. How did you choose components to achieve a practical target? Please explain whether they are just components derived by deduction from existing states, or if there are hypothetical components.
Question (Motoyuki Akamatsu)
In subchapter 4.1, you discuss how you developed an adaptive general-use image recognition system as a system satisfying the required conditions R1 to R3, and how you adopted HLAC and CHLAC as a technique to extract feature values satisfying shift-invariance. In the process of their adoption, I believe there were other techniques considered as candidates.
Could you write the rationale for how you came to the conclusion that, compared to those other techniques, HLAC is superior? Also, the reason given here for adopting HLAC is that patterns are localized and that their localized relative relationship is essential. I am afraid readers not specialized in this field may not immediately understand the relationship between focusing solely on local features and shift-invariance. It would be helpful if you could add a short postscript.
Answer (Nobuyuki Otsu)
In a recognition system, the feature extraction from the object pattern is an important component in determining the performance. In contrast to choosing a variety of features in an ad hoc way (hypothetically, so to speak, or by trial and error), as has been the case up to now, higher-order local auto-correlation and multivariate data analysis were adopted as concrete components. On a theoretical basis, these give a two-stage framework comprising geometrical invariant feature extraction and statistical discriminant feature extraction, which satisfies the three basic required conditions for achieving practical implementation objectives. In that sense, one can think of them as components derived in a deductive manner from theory, and also as hypothetical yet essential components that satisfy both theory and requirements. There are actually not many alternatives for features that simultaneously satisfy the basic required conditions (especially R1 and R2), and yet are generic features not based on any model. As you pointed out, simply examining local features does not imply shift-invariance. Rather, because “relative” relationships are extracted as autocorrelation, this implies shift-invariance. I

