In behavioral investigations of the mechanisms and functions of inter- and intraspecific communication, the ability to manipulate stimulus properties is critical. If a real animal could be replaced with an artificial model, research would advance greatly. Here we tested whether male zebra finches (Taeniopygia guttata castanotis) and male Bengalese finches (Lonchura striata var. domestica) direct their natural singing behavior to video images of conspecific females. When a conventional cathode ray tube (CRT) monitor was used, the birds showed few behavioral responses. However, when a thin film transistor (TFT) liquid crystal display was used, several behavioral responses to the images, mostly sexual displays, were observed. The amount of directed singing emitted towards the TFT-projected images was comparable to that emitted to live female birds in both species. The TFT monitor may be much more powerful than the CRT monitor in eliciting natural behavior from these birds because the TFT monitor is flickerless, whereas the CRT monitor might produce flicker visible to the eyes of birds, which have a high critical flicker frequency. TFT monitors should therefore be better substitutes for real objects than CRT monitors in behavioral investigations. This technique, combined with modern image processing techniques, should be useful for neuroethological studies of bird behavior.
Song behavior of estrildid finches such as zebra finches and Bengalese finches has been the subject of various ethological and neuroethological investigations (Immelman, 1969; Brenowitz, et al., 1997). Zebra finches and Bengalese finches both have two types of songs that are used in different behavioral contexts (Sossinka and Böhner, 1980). Directed songs (DS) are emitted in a courtship context (Morris, 1954), while undirected songs (US) are sung when the male is alone (Immelman, 1969). Undirected songs are therefore usually recorded under less noisy conditions, whereas directed songs can be contaminated by noise sources such as female vocalizations and wing flapping, and are easily interrupted or terminated by the female's behavior. However, in order to examine behavioral and physiological correlates of the two types of songs (e.g., Doupe et al., 1998), we need to obtain noise-free, uninterrupted directed songs. Furthermore, live birds inevitably interact with subject birds, and the nature of the interactions changes from trial to trial. We thus wanted to develop a technique with which we can reliably induce finches to sing directed songs without showing them live females.
The use of video images instead of live animals may be a promising way to solve this problem. Video images displayed on a cathode ray tube (CRT) display have been used successfully in several operant discrimination tasks with birds. For example, Bengalese finches can be trained to discriminate between two conspecific images (Watanabe et al., 1993). Hens can be trained to discriminate between video images of a brown hen and of no hen (Patterson-Kane, et al., 1997).
However, operant discrimination does not guarantee that animals are in fact recognizing video images as real objects. One can infer that animals are recognizing video images as real objects only when operant discrimination transfers to real objects. Patterson-Kane et al. (1997) examined object-video transfer in domestic hens and concluded that video images are not equivalent to the real stimulus, although some aspects of them, such as certain colors, may be equivalent.
In conducting such experiments with birds, two limitations of conventional CRT displays are apparent. First, since birds may possess higher flicker fusion frequencies (pigeons, 65 Hz, Hendricks, 1966; pigeons, 60 ∼ 145 Hz, Powell, 1967; pigeons, above 100 Hz, Emmerton, 1983; chicks, above 60 Hz, Nuboer et al., 1992) than humans (about 50 Hz; Ohkawa and Ishida, 1987), a conventional CRT monitor might appear to flicker to a bird's eye. Second, since bird color vision is at least tetrachromatic or even pentachromatic (Emmerton, 1983), equipment made to satisfy trichromatic vision does not satisfactorily reproduce colors for tetrachromatic vision. In particular, the ultraviolet (UV) range is missing. In zebra finches, UV vision is indispensable in assessing mate quality (Bennett et al., 1996). To overcome at least the first problem, we used, in addition to a conventional CRT display, video images presented on a thin film transistor (TFT) liquid crystal display. A TFT display actively holds the color information of each pixel until the next scanning information arrives. Thus, a TFT display is flickerless.
In this study we tested whether male zebra finches and male Bengalese finches emit directed songs towards video images of conspecific females. We used both a TFT display and a CRT display, as well as actual female birds. We found that images projected onto the TFT display could satisfactorily induce natural behavior, including directed songs, from male birds.
MATERIALS AND METHODS
The birds used in this experiment were obtained from local pet suppliers and maintained in the aviary at Chiba University under controlled room temperature (around 25°C) and humidity (60%), with room lights on from 7 a.m. to 9 p.m. Each zebra finch was kept in an individual cage, while Bengalese finches were kept in communal cages, either mixed-sex or single-sex. Prior to the experiment, birds were assessed for their basal reactivity to female conspecifics. The male bird was brought into a small cage and placed in front of a cage that contained a female conspecific. If the bird did not begin singing female-directed song within 1 min., it was not used for this experiment. We screened eight adult male zebra finches and seven adult male Bengalese finches. Of these birds, two zebra finches and one Bengalese finch were excluded from the experiment by the criterion mentioned above. As a result, six birds of each species were selected. All six zebra finches were wild type, of the gray morph.
Each of three adult wild type female zebra finches and three adult female Bengalese finches was videotaped in an individual plastic cage with a perch (Matsuyama Plastics, Model No. 10, dimensions 15.0 × 22.0 × 30.5 cm) for ∼20 min. The three female Bengalese finches had different plumage patterns, since this strain of birds has natural variation in plumage pattern. The three zebra finches were all wild type, of the gray morph. A charge-coupled device (CCD) camera with 250,920 (510 × 492) pixels (VDC mail service, CV-95) and an electret microphone (SONY ECM-MS957) were connected to an 8 mm video recorder (SONY CVD-500) with a horizontal resolution of 240 lines to record the stimulus. The CCD camera had a resolution of 350 horizontal and 300 vertical lines. The video recordings were viewed and, for each bird, a 1 min. sequence was selected. We selected the part of the recording in which the female stayed on the perch longest, because that was the position in which she was most visible. The same plastic cage was used to show a live female as a stimulus. Time schedules for stimulus presentations were controlled by an IBM-compatible computer through a VISCA (Video system control architecture) interface.
Physical characteristics of the auditory stimuli produced by each monitor were measured by placing a sound level meter (TGK Model SL-1250) at the position usually occupied by the subject bird. The background noise level and the peak sound pressure level produced by a female zebra finch and a female Bengalese finch were measured for each monitor (Table 1). In general, the sound pressure level was comparable for both monitors but was slightly higher for the TFT monitor when distance calls were broadcast. These levels were within the range of natural distance calls.
Physical measurements of the stimuli
For measurement of the visual stimuli, the chromaticity and luminance of the background and of the white abdomen of a female zebra finch and a female Bengalese finch used in the experiment were measured with a tri-filter colorimeter equipped with a close-up filter (Minolta CS-100) placed 100 cm from the stimulus. This device gives direct readings of luminance and chromaticity [Commission Internationale de l'Eclairage (CIE) 1931] (Wyszecki and Stiles, 1982). Each measurement was taken 5 times, and the averages are reported in Table 1. The chromaticities and luminance levels of the CRT and TFT video images were comparable. Data were also taken from the actual live bird. The video images were generally brighter and more bluish than the actual bird and the background. In all conditions, the color of the abdomen of the birds was within the range that is perceived as “white” by the human eye (Mollon and Sharp, 1983), and the luminance levels of all conditions appeared similar to the human eye.
Testing was conducted in a sound proof chamber (internal dimensions, 151 × 117 × 187 cm; Music Cabin, Co.). A small desk (90 × 60 × 73 cm) was placed inside the chamber, and the experimental setups were placed on the desk. A 10-inch CRT display television (SONY, KV-10DS1; 720 pixels × 512 horizontal lines at 60 Hz interlace scanning), an 8.4-inch TFT display television (Sharp, LC-84TV1; 640 × 480 pixels), or a cage containing a live female bird was placed next to the cage that contained the subject bird (Fig. 1). Although the pixel resolution of both monitors was high, the CCD camera used to take the video images limited the actual resolution of the system (350 × 330). These conditions (presentation methods) were denoted as CRT, TFT, and LIVE, respectively. Although the CRT monitor and the TFT monitor had different diagonal sizes, the effective diagonal sizes of the two were almost the same (about 8 inches). When the displays were placed next to the subject's cage, the images on them were about life size. The microphone and the CCD camera were placed behind the subject's cage, and their outputs were fed into an 8 mm video recorder (SONY, EV-PR2) and a CRT display for real-time observation and video recording. These video recordings were used for off-line examination of the experiment.
Each subject was tested once every seven days. Three sessions (CRT, TFT, and LIVE) were run for each bird. The order of testing for TFT and CRT was randomized for each bird. Testing with live birds was conducted as the final session. Prior to the session, the bird was taken from a communal aviary and isolated inside a small sound proof box (Music Cabin, Co., internal dimensions, 40 × 58 × 38 cm) for 20 min. The session began with a habituation period (200–400 sec) during which the subject was placed in the experimental chamber without a stimulus. Following that, one of the three stimuli was randomly selected and presented for 1 min. An inter-trial interval (ITI) of 200–500 sec then followed. Songs emitted during the ITI were counted and treated as the “NONE” condition. During one session, 3 stimuli and 2 ITIs were presented. All sessions were run when the birds were most active, between 10 a.m. and 2 p.m. During the session, the subject bird was closely observed, and its behavior and reactions to the stimuli, such as directed singing, undirected singing, and tail quivering, were monitored.
All experimental sessions were videotaped. The tape was viewed after the experiment, and the durations of directed and undirected songs were counted separately for each stimulus and for each ITI. The duration of singing was added up over the three presentations, which totaled 3 min. Since the duration of the ITI varied, the duration of singing during the ITI (the NONE condition) was standardized to 3 min.
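The standardization of the NONE condition can be illustrated with a minimal sketch. It assumes simple proportional scaling of the singing time observed over the ITIs to a 180 sec window; the text does not spell out the exact procedure, and the function name and the example durations below are hypothetical.

```python
def standardize_to_window(song_sec, observed_sec, window_sec=180.0):
    """Scale a singing duration observed over `observed_sec` to `window_sec`.

    Assumption: proportional scaling (seconds of song per second observed).
    """
    if observed_sec <= 0:
        raise ValueError("observed_sec must be positive")
    return song_sec * window_sec / observed_sec


# Hypothetical session: two ITIs lasting 300 s and 420 s,
# containing 40 s and 80 s of song respectively (made-up values).
iti_durations = [300.0, 420.0]
iti_song = [40.0, 80.0]

none_standardized = standardize_to_window(sum(iti_song), sum(iti_durations))
print(round(none_standardized, 1))  # seconds of song per 180 s
```

Scaling in this way puts the NONE condition on the same 3 min footing as the summed stimulus presentations, so the two can be plotted side by side.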
Except for the NONE condition for Bengalese finches, few undirected songs were emitted during the entire experiment. Therefore, statistical analyses were not conducted for undirected songs.
For each species, the effect of the presentation methods was examined by the Friedman test (Sokal and Rohlf, 1995). In this test, the NONE condition was excluded because no directed songs were emitted during NONE periods in either species. When the effect turned out to be significant, Wilcoxon matched pairs tests were run to find which condition differed from which (Sokal and Rohlf, 1995). With three presentation conditions, three comparisons were available. These comparisons were treated as a family of comparisons.
For each of the TFT and LIVE conditions, stimulus specificity in eliciting directed songs was examined by the Friedman test. When the stimulus effect turned out to be significant, Wilcoxon matched pairs tests were run to find which stimulus differed from which. With three stimuli, three comparisons were available. These comparisons were treated as a family of comparisons.
For Friedman tests, p < 0.05 was required for significance. For Wilcoxon tests, the family-wise alpha was set at 0.10. Therefore, with the most conservative Bonferroni method (Sokal and Rohlf, 1995), p < 0.033 (0.10/3) was required for each of the comparison-wise tests.
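For readers unfamiliar with the Friedman test, the following is a minimal sketch of the statistic for n subjects measured under k conditions. It uses the standard formula without a tie correction, and for df = 2 it uses the closed form exp(-x/2) for the upper tail of the chi-square distribution. The singing durations are hypothetical values invented for illustration; they are not the data of this study.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values receive their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2(rows):
    """Friedman statistic (no tie correction) and per-condition rank sums."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi2, rank_sums


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with df = 2."""
    return math.exp(-x / 2.0)


# Hypothetical directed-song durations (sec per 180 sec); columns: CRT, TFT, LIVE.
durations = [
    [5, 40, 70],
    [0, 35, 80],
    [10, 60, 50],
    [8, 30, 65],
    [3, 55, 45],
    [12, 20, 90],
]
chi2, rank_sums = friedman_chi2(durations)
p = chi2_upper_tail_df2(chi2)
print(rank_sums, round(chi2, 3), round(p, 3))
```

With six subjects and three conditions, the statistic depends only on the three rank sums, so very different raw durations can yield the same test result.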
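The Wilcoxon matched pairs test can be sketched in the same way. The version below assumes the large-sample normal approximation without continuity or tie correction; the text does not state which variant was used, so this is an assumption, and the paired differences are hypothetical. With six pairs that all differ in the same direction, this approximation gives a two-sided p of about 0.028.

```python
import math


def wilcoxon_normal_p(differences):
    """Wilcoxon signed-rank test via the normal approximation.

    Assumptions: no continuity correction, no tie correction, and no tied
    absolute differences. Zero differences are dropped.
    Returns (T, z, two-sided p).
    """
    d = [x for x in differences if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if d[i] > 0)
    w_minus = n * (n + 1) / 2 - w_plus
    t = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return t, z, p


# Hypothetical paired differences (TFT minus CRT, sec) for six subjects.
t, z, p = wilcoxon_normal_p([35, 28, 50, 22, 52, 8])
print(t, round(z, 3), round(p, 3))
```

For comparison, the exact two-sided probability of the most extreme outcome with six pairs is 2/64 = 0.031, so the approximation is slightly more liberal at this sample size.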
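The comparison-wise criterion follows from dividing the family-wise alpha by the number of comparisons. A minimal sketch is given below; the helper name is hypothetical, and the two p values are those reported in the results for zebra finches (LIVE versus CRT and TFT versus CRT).

```python
def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-comparison alpha under the Bonferroni method."""
    return family_alpha / n_comparisons


per_test_alpha = bonferroni_alpha(0.10, 3)

# p values reported for the zebra finch presentation-method comparisons.
reported = {"LIVE vs CRT": 0.028, "TFT vs CRT": 0.028}
significant = {name: p < per_test_alpha for name, p in reported.items()}

print(round(per_test_alpha, 3), significant)
```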
For directed songs, the Friedman test detected a significant effect of the presentation methods (χ2 = 9.333, df = 2, p = 0.009). Wilcoxon tests showed that LIVE (p = 0.028) and TFT (p = 0.028) both elicited more directed songs than CRT. LIVE and TFT did not differ (Fig. 2A). Thus, video images projected onto the TFT monitor were as effective as the real female bird. On the other hand, CRT was no different from not showing anything at all (NONE). Video images projected onto the CRT were apparently not recognized as conspecific birds by male zebra finches.
For zebra finches, hardly any undirected song was emitted during the experiment (average 1.0 sec/180 sec) (Fig. 2A), and further analyses were limited to the directed songs.
The Friedman test detected a significant effect of stimuli for the LIVE condition (χ2 = 7.000, df = 2, p = 0.030) but not for the TFT condition (Fig. 2B). For the LIVE condition, ZF2 elicited a significantly higher level of singing than ZF1 (p = 0.028), but the other comparisons were not significant. Thus, for the LIVE condition, the order of attractiveness was ZF2 ≧ ZF3 ≧ ZF1 and ZF2 > ZF1. For the TFT condition, the order was ZF1 ≧ ZF3 ≧ ZF2.
Within the total of 180 sec during which the stimuli were presented, our subject zebra finches sang, on average, about 70 sec to the real females and about 40 sec to the TFT images. On the other hand, only 7 sec of directed song was sung during the 180 sec presentation of stimuli on the CRT display, a significant reduction in singing compared to the LIVE and TFT conditions. Since the CRT and TFT conditions used identical video clips with identical sounds played together, the difference between the two conditions should be attributed to the difference in stimulus projection methods.
In our CRT results, the only instances in which directed songs occurred were when the stimulus was a particular female that emitted distance calls. On hearing the distance calls, 4 out of 6 subjects immediately answered back with distance calls and began to sing. The other two stimuli did not emit any distance calls, and these stimuli did not elicit any directed songs from any of the subjects.
When directed songs were sung, the posture of the subjects towards the TFT-projected images was very similar to that taken when singing to the live female birds. However, when real birds were used as stimuli, on two occasions one of the stimulus females showed a copulation solicitation display to the subject male. These reactions on the female's side might have increased the duration of directed singing. On one occasion, one of the male subjects showed a tail quivering display to the stimulus female after singing directed song. These interactions were not observed when videos were used.
For directed songs, the Friedman test detected a significant effect of the presentation methods (χ2 = 12.000, df = 2, p = 0.003). Wilcoxon tests showed that the LIVE condition (p = 0.028) induced more directed songs than either of the other conditions (Fig. 3A). TFT induced fewer directed songs than LIVE, but the TFT condition induced more directed songs than the CRT condition (p = 0.028). Thus, video images projected onto the TFT monitor were quite effective, but not as effective as the real female birds. On the other hand, CRT was no different from not showing anything at all (NONE). Apparently, video images projected onto the CRT were not recognized as conspecific birds by male Bengalese finches. For undirected songs, the NONE condition elicited more undirected songs (average 34.6 sec/180 sec) than any other condition (CRT = 0.7, LIVE = TFT = 0.0) (Fig. 3A).
As in the zebra finches, when directed songs were sung towards TFT projected images, the posture of the subjects was very similar to that taken when singing to the live female birds.
In the LIVE condition, stimuli BF3 and BF1 elicited about the same amount of directed song, and BF2 elicited the least. Thus, the order of attractiveness as measured by the total duration of directed songs sung was BF3 ≧ BF1 ≧ BF2 (Fig. 3B). For the TFT condition as well, the Friedman test detected a significant stimulus effect (χ2 = 6.583, df = 2, p = 0.037), and the Wilcoxon test showed BF3 > BF2 (p = 0.028), but the other comparisons were not significant (Fig. 3B). Thus, the order of attractiveness was BF3 ≧ BF1 ≧ BF2. For Bengalese finches, the order of preference was the same when TFT was used and when live females were presented (LIVE condition).
Several studies have tried to establish object-video equivalence using the natural behavior of animals. If animals behave towards video images as they do towards real objects, we can substitute real objects with video images. But demonstrating this has not been easy. For example, although pigeons show agonistic responses toward unfamiliar conspecifics, life-size moving video images of pigeons did not elicit any social responses from subject pigeons (Ryan and Lea, 1994). On the other hand, Shimizu (1998) and Frost et al. (1998) reported that they were successful in inducing courtship behavior from male pigeons with CRT-projected video images of female conspecifics. These results suggest that although CRT-projected video images may or may not look exactly like conspecifics, at least some releasing stimuli for sexual display should be perceivable by pigeons in such images.
Adret (1997) is another of the few reports in which video images successfully induced natural behavior from birds. Unfortunately, the birds used in that study were reared in social isolation and sang more directed songs toward the video image of a male zebra finch than toward that of a female zebra finch. In our casual observations and in other published work (Zann, 1996), adult male zebra finches do not emit directed song toward adult male conspecifics. This discrepancy between natural observations and the experimental results raises doubt as to whether the video images were treated as conspecific birds in that experiment.
Studies on chickens seem to have been successful, but further caution is required. Male domestic fowl produce appropriate alarm calls to videos of ground and aerial predators (Evans et al., 1993). Alarm calling elicited in response to a real aerial predator silhouette is facilitated by video images of hens, but such a response is also facilitated by the presence of a quail and by the sound of a hen (Evans and Marler, 1991). Alarm calling might be facilitated by simple cues, and these results do not make clear whether or not video images are perceived as real objects by the birds.
In this study we showed that it is possible to use video images instead of live female birds to induce directed songs from zebra finches and Bengalese finches when the TFT monitor was used. We showed that the TFT monitor and CRT monitor may be quite different to the eye of the birds. Furthermore, in the experiment using Bengalese finches, birds were clearly responding differently to the different video stimuli, suggesting that birds were capable of discriminating among different individuals projected onto the TFT monitor.
Our results suggest that the TFT images and the real females were comparably attractive in inducing directed songs. On the other hand, the CRT display elicited few directed songs. In our CRT results, the only instances in which directed songs occurred were when the stimulus was a particular female that emitted distance calls. The other two stimuli did not emit any distance calls, and these stimuli did not elicit any directed songs from any of the subjects. Distance calls could be eliciting stimuli for singing. Considering these points, some of the songs elicited during our experiment might have been elicited in response to distance calls rather than to the CRT images. When directed songs were sung, the posture of the subjects towards the TFT-projected images was very similar to that taken when singing to the live female birds, suggesting that the TFT images were perceived as being similar to conspecific birds.
Our results differ from those of Adret (1997). In that study, which used a CRT display, Adret reported that his subjects (male zebra finches) sang most often to the video images of a gray male and a gray female zebra finch and, to a lesser extent, to a white female zebra finch and to a lovebird. In our experiment, male zebra finches sang at quite a high rate to the video images of female zebra finches when they were projected onto the TFT display, but not when they were projected onto the CRT display (Fig. 2A). Also, in our casual observations we never observed a male zebra finch singing toward an adult male conspecific.
Since Adret used male zebra finches that were reared in individually isolated sound proof chambers between day 35 and day 120, direct comparison with our data requires caution. Furthermore, 4 of his 9 subjects were exposed only to the video image of singing male zebra finches between days 35 and 65. The rest of the birds were individually isolated in a Skinner box placed in a sound proof chamber, in which they were trained to peck for the reward of conspecific songs (Adret, 1993) during that period. Visual cues for species recognition must be learned in the zebra finch (ten Cate, 1984). Zebra finches first reared by Bengalese finch parents could later be easily imprinted upon zebra finches, but not vice versa (Immelman, 1972). Thus, imprinting with irrelevant stimuli results in an extended sensitive period (Bateson, 1979). Since these subjects were deprived of normal visual experience, the birds could still have been in the sensitive period for visual learning of conspecific images. In that case, some visual cues, even if they did not look like a conspecific, might trigger directed singing. In fact, in Adret's study, some zebra finches emitted directed song to such stimuli as the head or body of a lovebird. Lovebirds have a reddish face and green-gray plumage, somewhat similar to the coloration of zebra finches. This raises the possibility that the finches were responding to particular aspects of the video images, such as a combination of colors.
When directed songs were sung towards TFT images, the posture of the subjects was very similar to that taken when singing to live female birds. However, in our results there were significantly fewer responses toward the TFT display than toward live birds. One of the reasons for this could be the lack of interaction. We observed in the video recordings that when the live female was attending more to the subject, the subject tended to emit more songs. Another reason may be that in the LIVE condition, the female and the subject male sometimes began calling to each other just before the experimental session started, while we were carrying the subject to the experimental chamber. This could have put the subject in a higher motivational state for directed song.
Our results also suggest that Bengalese finches may be discriminating among the three individual females since the order of the preference of each stimulus was the same in TFT and in LIVE conditions. This inference is possible only with an assay involving natural responses as in our study.
Watanabe et al. (1993) asked whether Bengalese finches could discriminate between video images of conspecific birds projected onto CRT display. Since their procedure was appetitive operant conditioning, whether or not the subjects were regarding the stimuli as conspecifics was not asked. In their experiment, only 3 out of 8 birds were successfully trained after 30 sessions. Unfortunately, they did not report biological reactions of the subjects to the video images.
Why does the TFT monitor work?
The image-production methods of TFT displays and CRT displays are radically different. In CRT displays, an electron beam is projected as a point onto the luminescent screen. When driven by an NTSC (National Television System Committee) signal that scans at 60 Hz interlace (renewing alternate horizontal lines every 1/60 sec), the display takes 1/30 sec to complete one frame of the picture. While the beam scans, the luminance of each pixel decays before the electron beam next hits the same pixel. In TFT displays, on the other hand, the color information projected onto one pixel is held by an active circuit until the next scanning information arrives. Thus, TFT displays are flickerless (Ohshima, 1998).
When the temporal resolution of the viewer is very high, the CRT display should look like a quickly moving dot with a trailing line. Since our visual system has a flicker fusion frequency of about 50 Hz (Ohkawa and Ishida, 1987), CRT images refreshed at 30–60 Hz look like a continuous image to us. Unfortunately, no data are available for the flicker fusion frequency of finches, but we at least know that pigeons and chicks have higher flicker fusion thresholds (60 ∼ 145 Hz; see introduction for detail) than humans. To the eye of a finch, only a part of the entire screen might be visible on the CRT display at any moment. On the other hand, the images projected onto the TFT display should be complete at any one instant, even to the eye of a finch. Fig. 4 clearly demonstrates this point. It shows pictures of the CRT (upper) and the TFT (lower) displays, both taken at a shutter speed of 1/125 sec under identical conditions and on the same roll of film. Only about a half of the entire screen is visible on the CRT display, but on the TFT display the image is complete.
As for spatial resolution, the two monitors were almost comparable. The TFT monitor we used had 640 × 480 pixels, and the CRT monitor we used had 512 horizontal lines, each consisting of 720 pixels. Although the pixel resolution of both monitors was high, the CCD camera (350 × 330) and the 8 mm video recorder (240 horizontal lines) used to take the video images limited the effective resolution. Thus, the practical resolution was the same for the CRT and the TFT monitors.
Even with a TFT display, though, some visual properties of live birds may be missing. One of them is apparently ultraviolet light. Most birds have photoreceptors that are sensitive to ultraviolet light (Varela et al., 1993). Ultraviolet vision is important when assessing male quality in zebra finches (Bennett et al., 1996), though we do not yet know the degree to which males assess female quality by ultraviolet vision.
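The photographic observation is consistent with simple timing arithmetic, sketched below. The sketch assumes that the beam sweeps the full height of the screen once per 1/60 sec field and that only the portion swept during the exposure is visible; phosphor persistence and vertical blanking are ignored, and the function name is hypothetical.

```python
FIELD_RATE_HZ = 60.0  # NTSC interlaced field rate given in the text


def fraction_of_field_drawn(exposure_sec, field_rate_hz=FIELD_RATE_HZ):
    """Fraction of one field the beam sweeps during an exposure.

    Assumptions: constant vertical sweep speed, no phosphor persistence,
    no vertical blanking interval. Capped at 1.0 (a complete field).
    """
    field_period = 1.0 / field_rate_hz
    return min(1.0, exposure_sec / field_period)


shutter_fraction = fraction_of_field_drawn(1 / 125)
frame_fraction = fraction_of_field_drawn(1 / 30)
print(round(shutter_fraction, 2), frame_fraction)
```

Under these assumptions an exposure of 1/125 sec covers a little under half of one field, which agrees with the roughly half-visible CRT image described for Fig. 4.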
Nevertheless, TFT displays should be quite useful in behavioral investigations. For example, with this system we should be able to investigate the relationship between visual and auditory stimuli in evoking directed songs in finches. Since CRT displays are not free from flicker that might be resolvable to the bird eye, flickerless TFT displays probably look much more natural to organisms with higher motion sensitivity.
In this study we were able to induce species-specific behavior from male finches towards images of conspecific females projected onto the TFT display. Our data also hinted, at least for Bengalese finches, that the male birds might be discriminating among individuals projected onto the TFT display and, furthermore, recognizing each female and her image as identical (or at least similar) objects. With this technology, we are now able to ask unique questions of animals. One application of the technique would be to examine the differences in neural responses of a song control nucleus during directed and undirected singing without interference from the female's behavior (Doupe et al., 1998). Another application would be to examine the physiological mechanisms by which visual and auditory information are integrated to induce directed songs. Combined with advances in image processing technology, the use of flickerless devices such as TFT displays will enhance the design of behavioral and neuroethological experiments.
We thank Drs. Eiji Kimura and Yosie Kiritane of Chiba University for technical comments concerning luminance and color measurements. Dr. Eiji Kimura kindly lent us the colorimeter. We also thank Dr. Hidetoshi Ikeno of Himeji Institute of Technology and Dr. Tetsumi Moriyama of Tokiwa University for suggesting literature. We are grateful to Dr. Koichi Mori of Research Institute of National Rehabilitation Center for the Disabled and Dr. Shigeru Watanabe of Keio University for critical comments on an earlier version of this paper. M.I. was supported by a Research Fellowship of the Japan Society for the Promotion of Science for Young Scientists (No. DC6152). This research was supported by the “Intelligence and Synthesis” research program sponsored by PRESTO, Japan Science and Technology Corporation. Financial support was also provided by a Grant-in-Aid for Scientific Research on Priority Areas, “Evolution and Development of the Mind”, sponsored by the Ministry of Education, Science, Sports and Culture.