output pulse obtained for a 100 Hz repetition rate and a high Q period of 5 μs. A peak power of approximately 8.8 W was achieved in a pulse of 200 ns duration. This pulse duration is in good agreement with the calculated value. The fibre laser operated in Q-switched mode for a day or more continuously with negligible long-term variation of the pulse shape.

We also investigated the effect of replacing the modulator in the cavity by an ordinary optical chopper wheel. Remarkably, we found that, in spite of the slow switch-on time of the chopper, which was 10 μs or a few hundred cavity round-trip times, peak powers approaching half those given by the acousto-optic modulator, whose switch-on time was a few nanoseconds, could be achieved. Fig. 3 shows a typical pulse of 4 W peak power and 450 ns duration obtained in this way. Multiple subsidiary pulses followed the initial pulse and could not readily be eliminated, but the experimental simplicity of this technique, which has no associated problems of alignment and cavity loss, makes it attractive.
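As a rough check on the figures quoted above, the pulse energies can be estimated from peak power and duration. The sketch below assumes idealised rectangular pulses, which is our simplification and not a statement from the measurements; the function names are illustrative only.

```python
def pulse_energy_joules(peak_power_w, duration_s):
    """Rectangular-pulse estimate (an assumption): energy = peak power x duration."""
    return peak_power_w * duration_s


def average_power_watts(energy_j, repetition_rate_hz):
    """Mean output power for a train of identical pulses."""
    return energy_j * repetition_rate_hz


# Acousto-optic modulator: 8.8 W peak, 200 ns duration, 100 Hz repetition rate.
aom_energy = pulse_energy_joules(8.8, 200e-9)
aom_average = average_power_watts(aom_energy, 100.0)

# Chopper wheel: 4 W peak, 450 ns duration (repetition rate not quoted).
chopper_energy = pulse_energy_joules(4.0, 450e-9)

print(f"AOM pulse energy      ~ {aom_energy * 1e6:.2f} uJ")
print(f"AOM average power     ~ {aom_average * 1e6:.0f} uW")
print(f"Chopper pulse energy  ~ {chopper_energy * 1e6:.2f} uJ")
```

Under this idealisation the two pulses carry similar energy, close to 1.8 μJ each, although the real pulse shapes of Figs. 2 and 3 would alter the exact values.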

Conclusion: We have demonstrated that Q-switching allows an Nd³⁺-doped monomode fibre laser to produce high peak powers. If the dye laser pump source were replaced by a semiconductor diode laser as in Reference 1, a compact high-power source could be envisaged. The Q-switching techniques described here should also prove suitable for fibre lasers based on other dopant ions. In particular, the ability to use a simple mechanical chopper, with its advantages of low cost and ease of use, looks encouraging.

Acknowledgments: This work has been supported by a grant from the UK SERC and also in part under a JOERS program. The authors are grateful to S. Poole and other members of the Fibre Optics Group in the Department of Electronics & Information Engineering, Southampton University, for kindly providing the doped fibre. One of us (I. P. Alcock) acknowledges the support of an SERC studentship.

I. P. ALCOCK
A. C. TROPPER
A. I. FERGUSON
D. C. HANNA
Department of Physics
University of Southampton
Southampton SO9 5NH, United Kingdom

12th November 1985

Fig. 2 Profile of 1.08 μm pulse obtained in Q-switched operation

Fig. 3 Profile of 1.08 μm pulse obtained by Q-switching with optical chopper

References
2 Mears, R. J., Reekie, L., Poole, S. B., Payne, D. N.: ‘Neodymium-doped silica single-mode fibre lasers’, ibid., 1985, 21, pp. 738–740

IMPROVED SYSTOLIC ARRAY FOR LINEAR DISCRIMINANT FUNCTION CLASSIFIER

Indexing term: Signal processing

A word-level systolic array with 100% efficiency is described for the linear discriminant function classifier. When compared with two previous word-level linear classifier arrays, it not only saves about C inner product step cells, where C is the number of weighted vectors used, but also simplifies the chip's I/O design.

Introduction: The linear discriminant function classifier is widely used in statistical pattern recognition, for example in Euclidean minimum-distance classification¹ and voiced/unvoiced classification.² Let X = [x₁ x₂ ... xₙ] be a feature vector and Wᵢ = [wᵢ₁ wᵢ₂ ... wᵢₙ] be the ith weighted vector, where i = 1, 2, ..., C. Then the ith class linear discriminant function is defined as

\[
g_i(X) = X'W_i^T = \sum_{j=1}^{n} x_j w_{ij} \tag{1}
\]

where

\[
X' = [x_1 \; x_2 \; \cdots \; x_n] \quad \text{and} \quad W_i = [w_{i1} \; w_{i2} \; \cdots \; w_{in}] \tag{2}
\]

Following the method used by Urquhart,³ operations of the linear discriminant function classifier can be partitioned into: (i) computing gᵢ(X) for i = 1, 2, ..., C, and (ii) finding the class label j for which gⱼ(X) is maximum.
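The two-step partition can be illustrated in software. The following is a minimal sequential sketch, not the systolic implementation; the function names and the example data are hypothetical, and ties are resolved in favour of the lowest class label, which is our choice.

```python
def discriminant(x, w):
    """g_i(X) = sum_j x_j * w_ij for one weighted vector."""
    if len(x) != len(w):
        raise ValueError("feature and weighted vectors must have equal length")
    return sum(xj * wj for xj, wj in zip(x, w))


def classify(x, weights):
    """Return (1-based class label, list of discriminant values)."""
    # Step (i): evaluate every discriminant function.
    scores = [discriminant(x, w) for w in weights]
    # Step (ii): pick the label of the largest one (first maximum on ties).
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index + 1, scores


# Hypothetical data: n = 2 features, C = 3 classes.
x = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
label, scores = classify(x, weights)
print(label, scores)
```

For this example the discriminant values are 1, 2 and -3, so class 2 is selected.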

To perform the above operations quickly, several linear classifier arrays whose computations are based on eqn. 2 have been described in References 3 and 4. Those with contraflow achieve only 50% efficiency, whereas those with static weighted vectors achieve 100% efficiency. However, updating the weighted vectors is much easier in the former than in the latter. Moreover, using eqn. 2, which is an (n + 1)-dimensional inner product computation, to compute the linear discriminant function is inefficient in both hardware and computation time, because the last multiplication in eqn. 2 is unnecessary. In this letter an improved word-level systolic array with both contraflow and 100% efficiency is proposed for the linear discriminant function classifier, removing both this redundancy and the tradeoff between efficiency and ease of weight updating. In addition, to ease both the construction of a large array from smaller unit chips and the interfacing with memory, a byte-serial grouped I/O scheme is described for the chip.

Systolic system: In the proposed array shown in Fig. 1, which is in fact a module for constructing larger arrays, each linear discriminant function is computed from eqn. 3 instead of eqn. 2, i.e. each gᵢ(X) is obtained by performing one n-dimensional inner product initialised to an appropriate value. The leftmost column of C delay elements supplies the class label for each current discriminant function. Compared with the two word-level arrays in Reference 3, the proposed array requires an additional 2C delay elements but saves one column of C inner product step cells, as shown by the broken lines in Fig. 1. The silicon area added by the delay elements is much smaller than that saved by removing the step cells.
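The saving of one multiplication per discriminant function can be sketched as follows. This assumes the conventional augmented form, in which the feature vector is extended by a constant 1 and the weighted vector by a threshold term; the letter says only that the inner product is initialised by "some proper value", so taking that value to be the threshold weight is our assumption. All names are illustrative.

```python
def augmented_inner_product(x, w_aug):
    """(n + 1)-dimensional form: append a constant 1 to x (assumed augmentation)."""
    x_aug = list(x) + [1.0]
    acc, multiplications = 0.0, 0
    for xj, wj in zip(x_aug, w_aug):
        acc += xj * wj
        multiplications += 1
    return acc, multiplications


def initialised_inner_product(x, w_aug):
    """n-dimensional form: start the accumulator at the threshold weight."""
    acc, multiplications = w_aug[-1], 0  # assumed initial value
    for xj, wj in zip(x, w_aug[:-1]):
        acc += xj * wj
        multiplications += 1
    return acc, multiplications


# Hypothetical data: n = 2, threshold weight 4.0.
x = [2.0, 3.0]
w_aug = [0.5, -1.0, 4.0]
full = augmented_inner_product(x, w_aug)
short = initialised_inner_product(x, w_aug)
print(full, short)
```

Both forms give the same value, but the initialised form uses n rather than n + 1 multiplications, which corresponds to dropping one inner product step cell per row.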

In Fig. 1, the augmented feature vector \(\{x_1, x_2, \ldots, x_n, y\}^T\) streams and the modified augmented weighted vector \(\{w_1^*, w_2^*, \ldots, w_n^*\}\) streams move through the array in opposite directions. The former are delayed by C cycle periods (shown by asterisks) so that the 1st feature vector meets the 1st weighted vector at the 1st row. The latter, in which each component occupies two cycle periods, must recirculate continuously in order to process streams of feature vectors. With this arrangement of data flow, it is easy to check that the weighted vectors are used in the order 1, 2, ..., C for the 1st and 2nd feature vectors, in the order 2, 3, ..., C, 1 for the 3rd and 4th, and so on. Since all operations are pipelined, the classification results emerge at a rate of one per cycle after some initial delay. Evidently, the proposed array module is 100% efficient, and its latency is \(O(C + n)\) rather than \(O(C + n + 1)\) because computations are based on eqn. 3 rather than eqn. 2. It also simplifies the chip's I/O design, owing to the ease of updating weighted vectors and the symmetry of data flow.
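The rotating order in which the recirculating weighted vectors meet successive feature vectors can be written down compactly. The sketch below reproduces the two cases stated in the text and extrapolates "and so on" as one further rotation for every two feature vectors, which is our reading; `weight_order` is a hypothetical helper name.

```python
def weight_order(k, num_classes):
    """Order in which weighted vectors 1..C are used with the k-th feature vector.

    k is 1-based. The rotation advances by one position for every two
    feature vectors (extrapolated from the two cases given in the text).
    """
    if k < 1 or num_classes < 1:
        raise ValueError("k and num_classes must be positive")
    shift = ((k - 1) // 2) % num_classes
    labels = list(range(1, num_classes + 1))
    return labels[shift:] + labels[:shift]


# Example with C = 4 classes.
for k in range(1, 6):
    print(k, weight_order(k, 4))
```

Because every feature vector still meets all C weighted vectors, the rotation changes only the order in which the discriminant values are produced, not the set of values compared.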

The proposed byte-serial grouped I/O scheme is much easier to implement than the byte-serial grouped scheme described by Chern and Murata,\(^5\) owing to the use of byte-serial/word-parallel and word-parallel/byte-serial shift registers rather than demultiplexers, multiplexers and latches to perform the data collection and the data distribution operations. With this I/O scheme, both chip interconnection (for constructing larger arrays) and memory interfacing problems can be solved easily.
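The role of the two kinds of shift register can be modelled behaviourally. The sketch below is only an illustration of byte-serial/word-parallel and word-parallel/byte-serial conversion; the word width (two bytes), the most-significant-byte-first ordering and all names are assumptions, since the letter does not specify them here.

```python
class ByteSerialToWordParallel:
    """Collects bytes one per step and emits a whole word when full."""

    def __init__(self, bytes_per_word=2):  # word width is an assumption
        self.bytes_per_word = bytes_per_word
        self._stages = []

    def shift_in(self, byte):
        """Shift in one byte; return the assembled word, or None if not yet full."""
        if not 0 <= byte <= 0xFF:
            raise ValueError("byte out of range")
        self._stages.append(byte)
        if len(self._stages) < self.bytes_per_word:
            return None
        word = 0
        for b in self._stages:  # most significant byte first (assumed)
            word = (word << 8) | b
        self._stages = []
        return word


def word_to_bytes(word, bytes_per_word=2):
    """Word-parallel to byte-serial: split a word into bytes, MSB first."""
    return [(word >> (8 * i)) & 0xFF for i in reversed(range(bytes_per_word))]


register = ByteSerialToWordParallel()
outputs = [register.shift_in(b) for b in [0x12, 0x34, 0xAB, 0xCD]]
words = [w for w in outputs if w is not None]
print([hex(w) for w in words], word_to_bytes(words[0]))
```

Each conversion is a plain shift-and-load operation, which is consistent with the claim that no demultiplexers, multiplexers or latches are needed for data collection and distribution.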


**CHIN-LIANG WANG**
**CHE-HO WEI**
**SIN-HORNG CHEN\***

* Institute of Electronics
* Institute of Communication Engineering
National Chiao Tung University
Hsin-Chu, Taiwan, Republic of China

Conclusions: A word-level linear classifier array with both contraflow and 100% efficiency has been described. When compared with the word-level ones in Reference 3, this array saves about C inner product step cells and has \(O(C + n)\) instead of \(O(C + n + 1)\) latency, owing to the fact that computations are based on eqn. 3 rather than eqn. 2. It also simplifies the chip's I/O design, owing to the ease of updating weighted vectors and the symmetry of data flow.

**Fig. 3 Complete linear classifier system**