標題: 蟲洞路徑網格型網路中一些集體通訊之高效率演算法
Efficient Algorithms for Some Collective Communications on Wormhole-Routed Mesh-Like Networks
作者: 侯 猷 □
Yomin Hou
徐力行
王建民
Lih-Hsing Hsu
Chien-Min Wang
資訊科學與工程研究所
關鍵字: 廣播;深度無互競;超立方體網路;k元n維立方體網路;線性常數通訊;網格網路;環形網路;蟲洞路徑;broadcast;depth contention-free;hypercube network;k-ary n-cube network;linear-constant communication;mesh network;torus network;wormhole routing
公開日期: 2000
摘要:   在本論文中,我們提出了幾個高效率的演算法,用來在全埠蟲洞路徑網路中進行通訊。我們所考慮的第一個問題,是對蟲洞路徑超立方體網路中,執行線性互補通訊時所遭受的路徑互競予以最小化。根據我們的研究,傳統的選徑演算法,用於執行線性互補通訊時所遭受的路徑互競情形相當嚴重。為了解決此一間題,我們考慮另一種方法,在程式編譯時進行處理器映對。也就是說,在程式編譯時,對處理器編號予以重新排列,使得所要執行的線性互補通訊可以有效率地在立方體網路完成。我們可以證明,對任一個線性互補通訊,必存在一個重排映對,可以使新的線性互補通訊有最少的路徑互競。在一個n維的超立方體網路中,尋找這樣的重排映對所需的時間是O(n^3)。對於一組線性互補通訊,我們也依據動態程式的觀念提出了一個演算法,可以找到一個最好的重排映對。 接著,我們將超立方體網路中執行線性互補通訊的討論,擴增到在k元n維立方體網路中執行線性常數通訊。同時證明對任一組m個線性常數通訊,m<=k-1,必存在一個線性映對,可以使新的線性常數通訊有最少的路徑互競。由模擬實驗的結果可以看到在處理器映對的方法對系統效率的重大改善。 在本論文中,另一個我們所考慮的問題,是在蟲洞路徑網路中進行廣播。網路硬體假設僅提供一對一的通訊,其選徑演算法是維度順序選徑。 經由利用全埠的特性,以及蟲洞路徑架構中距離的低影響性,我們提出的廣播演算法不僅效率高,而且可以做到深度無互競。當在一個(2^d)x(2^d) 的環形網路中廣播時,只需d步即可。除此之外,我們還利用每一點在每一步適當地重疊訊息的方法,來降低整個廣播所需的軟體時間。當在一個(2^d)x(2^d)x(z) 的環形網路中廣播時,則只需(d+k)步即可,其中k是一個自然數,且 T(d, k-1)< z<= T(d, k), T(d,i)= summation((3^i)x(2^(i-j))x(C(d-1+i-j,d-1))), 0<= j<= i。 我們也針對在任意大小的全埠蟲洞路徑二維網格及環形網路,提出新的深度無互競廣播演算法。新方法可以減少廣播所需的步數。其想法是重複地將一個網路切分成數個較小的網路,在切分的過程中,資料亦同時傳到每一個較小的網路中,直到每一個點都收到為止。 對論文中的每一個演算法,其效果均經過仔細的研究,並與相關研究比較分析。其結果可以清楚地顯示出新方法的優點。
In this thesis, we proposed several efficient algorithms for communicating on all-port wormhole-routed interconnection networks. The first problem considered is to minimize channel contention of linear-complement communication on wormhole-routed hypercubes. Our research reveals that, for traditional routing algorithms, the degree of channel contention of a linear-complement communication can be quite large. To solve this problem, we propose an alternative approach, which applies processor mapping at compile time. In this compiler approach, processors are logically reordered according to the given communication(s) so that the new communication(s) can be efficiently realized on the hypercube network. It is proved that, for any linear-complement communication, there exists a reordering mapping such that the new communication has minimum channel contention. An O(n^3) algorithm is proposed to find such a mapping for an n-dimensional hypercube. An algorithm based on dynamic programming is also proposed to find an optimal reordering mapping for a set of linear-complement communications. Then, the discussion of linear-complement communication on hypercube is extended to linear-constant communications on k-ary n-cube network. It is proved that, for any set of m linear-constant communications, m<=k-1, there exists a processor mapping such that the new communications have minimum channel contention. Simulation results clearly show significant performance improvement provided by the processor mapping approach. Another problem considered in this thesis is the broadcasting on all-port wormhole-routed interconnection networks. The underlying networks are assumed to support only the dimension-ordered unicast routing. By taking the advantage of the all-port model and the distance insensitivity of the wormhole routing, the proposed broadcast algorithms are not only efficient but also depth contention-free. For broad-casting on a (2^d)x(2^d) torus, only d steps are required. Furthermore, the software latency is also minimized by appropriately overlaying the messages sent out by the same node in each step. For broadcasting on a (2^d)x(2^d)x(z) torus, d>=1, only (d+k) steps are required, where k is an integer such that T(d, k-1)< z<= T(d, k), T(d,i)= summation((3^i)x(2^(i-j))x(C(d-1+i-j,d-1))), 0<= j<= i. We also propose new depth contention-free algorithms for broadcasting on all-port wormhole-routed two-dimensional meshes and tori with arbitrary size, which can reduces the number of message-passing steps. The concept of recursive-decomposition is adopted in those algorithms such that an interconnection network can be decomposed into smaller blocks recursively. Following the decomposition, the broadcast message can be forwarded to every node. For all the algorithms proposed in this thesis, the performance are studied carefully and compared with related works. The result clearly shows the advantage of the proposed algorithms.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT890394007
http://hdl.handle.net/11536/66906
Appears in Collections:Thesis