
■ This thread has been archived.

Judge Yoichi Kikuchi's misjudgment abetting murder and robbery is acknowledged

1 :オーバーテクナナシー:2018/07/18(水) 13:29:15.97 ID:mYJ3GfQJ.net
Judge Takayuki Sato has acknowledged that Judge Yoichi Kikuchi (who was sued and has since been transferred to serve as President of the Hiroshima High Court) took money from the victim through a misjudgment.

[The victim's case]
Tokyo High Court, 2016 (ne) No. 5619
Presiding Judge Yoichi Kikuchi, Judge Masakazu Sakuma, Judge Tadashi Kudo
The victim, who was nearly killed and had money taken through a system, sued the company that built the system over damage caused by its specifications, claiming damages and compensation for emotional distress. The perpetrator, who had given false testimony, fled after the suit was filed.

[Defendant Yoichi Kikuchi's crime]
Tokyo District Court, 2017 (wa) No. 13960
Judge Takayuki Sato
Defendant Judge Yoichi Kikuchi, in the trial where the victim sued over the system through which he was nearly killed and had money taken, conducted a corrupt trial that let the perpetrator take the victim's money.
It is improper in the first place for a judge who does not understand the system to rule on it, and it is even less excusable that the higher the difficulty, the more victims there are.
1. Conducted a corrupt trial that judged on grounds other than the system, even though no grounds other than the system existed
2. Miscalculated the system's figures
3. Took money from the victim through the error in the system's figures
4. Took money from the victim without ruling on whether the system was complete
5. Also took from the victim embezzled money beyond the technical claim
6. Imposed on the victim the burden of proof in the system's specialist field
7. Also took more than one million yen in litigation costs from the victim

For the sake of the victims of corrupt trials, and in the public interest, we provide information on the judge's crimes, based on the principle of open trials, so that ways of sanctioning judges can be considered.

We are distributing a CD of system evidence of the lethal-dose robbery figures.
[Contact]
kikuchiyouichi@outlook.jp
We deeply thank everyone who has cooperated.

Defendant Yoichi Kikuchi's written answer, offering no technical rebuttal
https://i.imgur.com/y6o6qse.jpg
https://i.imgur.com/ohS2e0i.jpg

2 :オーバーテクナナシー:2018/07/19(木) 17:44:36.19 ID:vDT4LCT2.net
[The victim's logic for countering judicial misconduct]

Filed suit framing the issue as the calculation and completion of the technical figures
       ↓
The judge got the calculation/completion judgment wrong
       ↓
Sued the judge for technical misjudgment
       ↓
The judge could not offer a technical rebuttal

3 :オーバーテクナナシー:2018/07/22(日) 15:59:25.91 ID:y6xT6nYG.net
How many people are judges going to get killed,
and how many hundreds of millions stolen,
by not doing the calculations properly?

4 :オーバーテクナナシー:2018/07/23(月) 19:00:52.21 ID:xxAFCXxG.net
Calculating lethal doses and stolen money
is scientific proof,
so witness examination
doesn't cut it!

5 :オーバーテクナナシー:2018/07/26(木) 18:36:45.56 ID:dK+1g+CZ.net
If you can't judge the technical figures
for lethal doses and stolen money,
don't hand down a judgment!

6 :オーバーテクナナシー:2018/07/28(土) 13:46:59.80 ID:PPGM1j95.net
You can't even calculate the lethal-dose
and stolen-money figures,
yet you went along with an idiotic,
hugely harmful witness examination
that dodged the technology!

7 :オーバーテクナナシー:2018/07/29(日) 13:04:45.37 ID:DHMpjaZY.net
The trial is only about lethal doses and stolen money,
so why is the technical judgment
of those very figures
the one thing missing?!

8 :オーバーテクナナシー:2018/07/30(月) 19:19:19.70 ID:GzPOBKXM.net
Things reduce falsehood,
but people increase it!

9 :オーバーテクナナシー:2018/07/31(火) 18:51:46.38 ID:6fpmEdJe.net
If you judge by interpretation
instead of by the answer,
it's open season for murder!

10 :オーバーテクナナシー:2018/08/02(木) 11:59:48.81 ID:3Soa9AQ1.net
Because they mistakenly get people killed,
criticism of judges from ordinary people is fierce (T0T)

At last, on accuracy about the facts,
ordinary people > judges
(T-T)

Not only 5ch: even idols criticize corrupt judges (☆o☆)
A drama in which Jun Matsumoto of Arashi chastens corrupt judges:
99.9 -Criminal Lawyer-
https://ja.m.wikipedia.org/wiki/99.9_-%E5%88%91%E4%BA%8B%E5%B0%82%E9%96%80%E5%BC%81%E8%AD%B7%E5%A3%AB-

11 :オーバーテクナナシー:2018/08/03(金) 16:27:40.49 ID:l3giHV8E.net
If you judge by interpretation
instead of by the answer,
it's open season for robbery!

12 :オーバーテクナナシー:2018/08/06(月) 18:27:37.97 ID:PtVDo0yN.net
The system exists to calculate lethal doses!
The system exists to calculate stolen money!

"Besides physicians and architects, there are expert commissioners in such diverse fields as real-estate appraisal, tax accounting, system development, financial transactions, traffic engineering, and fracture mechanics, so the courts are equipped to handle a wide range of specialized litigation. Many cases have been resolved with the involvement of expert commissioners."
www.courts.go.jp/vcms_lf/20901004.pdf

13 :オーバーテクナナシー:2018/08/07(火) 18:45:11.58 ID:qH2cjsvg.net
The trial is about nothing but the technology,
yet you ran from nothing but the technology!


15 :オーバーテクナナシー:2018/08/09(木) 19:02:42.52 ID:Y141TPoj.net
To this day, not a single person has offered a technical rebuttal.

16 :YAMAGUTIseisei:2019/11/10(日) 16:29:49.88 ID:2xdpBNeP2
Google Translate: http://webcache.googleusercontent.com/search?q=cache:cFXKfQwoUVMJ:www.iccs-meeting.org/archive/iccs2018/papers/108620619.pdf


  A Parallel Quicksort Algorithm on Manycore Processors in Sunway TaihuLight


Siyuan Ren, Shizhen Xu, and Guangwen Yang
Tsinghua University, China

Abstract.


In this paper we present a highly efficient parallel quicksort algorithm on SW26010, a heterogeneous manycore processor that makes Sunway TaihuLight the Top-One supercomputer in the world.
Motivated by the software-cache and on-chip communication design of SW26010, we propose a two-phase quicksort algorithm, with the first counting elements and the second moving elements.
To make the best of such many-core architecture, we design a decentralized workflow, further optimize the memory access and balance the workload.
Experiments show that our algorithm scales efficiently to 64 cores of SW26010, achieving more than 32X speedup for int32 elements on all kinds of data distributions.

This result outperforms the strong-scaling result of the Intel TBB (Threading Building Blocks) version of quicksort on the x86-64 architecture.

17 :YAMAGUTIseisei:2019/11/10(日) 16:30:58.02 ID:2xdpBNeP2
1 Introduction

This paper presents our design of parallel quicksort algorithm on SW26010, the heterogeneous manycore processor making the Sunway TaihuLight supercomputer currently Top-One in the world [4].
SW26010 features a cache-less design with two methods of memory access: DMA (transfer between scratchpad memory (SPM) and main memory) and Gload (transfer between register and main memory).
The aggressive design of SW26010 results in an impressive performance of 3.06 TFlops, while also complicating programming design and performance optimizations.

Sorting has always been an extensively studied topic [6].
On heterogeneous architectures, prior works focus on GPGPUs.
For instance, Satish et al.[9] compared several sorting algorithms on NVIDIA GPUs, including radix sort, normal quicksort, sample sort, bitonic sort and merge sort.
GPU-quicksort [2] and its improvement CUDA-quicksort [8] used a double pass algorithm for parallel partition to minimize the need for communication.
Leischner et al.[7] ported samplesort (a version of parallel quicksort) to GPUs, claiming significant speed improvement over GPU quicksort.

Prior works give us insights on parallel sorting algorithm, but cannot directly satisfy our need for two reasons.
First, the Gload overhead is extremely high, so all accessed memory has to be prefetched to the SPM via DMA.
At the same time, the capacity of SPM is highly limited (64KiB).
Second, SW26010 provides a customized on-chip communication mechanism, which opens new opportunities for optimization.

18 :YAMAGUTIseisei:2019/11/10(日) 16:31:57.99 ID:2xdpBNeP2
ICCS Camera Ready Version 2018. To cite this paper please use the final published version: DOI: 10.1007/978-3-319-93713-7_61

Based on these observations, we design and implement a new quicksort algorithm for SW26010.
It alternates between parallel partitioning phase and parallel sorting phase.
During first phase, the cores participate in a double-pass algorithm for parallel partitioning, where in the first pass cores count elements, and in the second cores move elements.
During the second phase, the cores sort their assigned pieces in parallel.

To make the best of SW26010, we dispense with a central manager common in parallel algorithms.
Instead we duplicate the metadata on SPM of all worker cores and employ a decentralized design.
The tiny size of the SPM warrants special measures to maximize its utilization.
Furthermore, we take advantage of the architecture by replacing memory access of value counts with register communication, and improving load balance with a simple counting scheme.

Experiments show that our algorithm performs best with int32 values, achieving more than 32x speedup (50% parallel efficiency) for sufficient array sizes and all kinds of data distributions.
For double values, the lowest speedup is 20 (31% efficiency).
We also compare against Intel TBB’s parallel quicksort on x86-64 machines, and find that our algorithm on Sunway scales far better.

19 :YAMAGUTIseisei:2019/11/10(日) 16:32:27.11 ID:2xdpBNeP2
2 Architecture of SW26010

SW26010 [4] is composed of four core-groups (CGs).
Each CG has one management processing element (MPE), also referred to as the manager core, and 64 computing processing elements (CPEs), also referred to as worker cores.
The MPE is a complete 64-bit RISC core, which can run in both user and kernel modes.
The CPE is also a tailored 64-bit RISC core, but it can only run in user mode.
The CPE cluster is organized as an 8x8 mesh on-chip network.
CPEs in the same row or column can directly communicate via register, at most 128 bits at a time.
In addition, each CPE has a user-controlled scratch pad memory (SPM), of which the size is 64KiB.

SW26010 processors provide two methods of memory access.
The first is DMA, which transfers data between main memory and SPM.
The second is Gload, which transfers data between main memory and register, akin to normal load/store instructions.
The Gload overhead is extremely high, so it should be avoided as much as possible.

Virtual memory on one CG is usually only mapped to its own physical memory.
In other words, four CGs can be regarded as four independent processors when we design algorithms.
This work focuses on one core group, but we will also briefly discuss how to extend to more core groups.

20 :YAMAGUTIseisei:2019/11/10(日) 16:33:49.33 ID:2xdpBNeP2
3 Algorithm

As in the original quicksort, the basic idea is to recursively partition the sequence into subsequences separated by a pivot value.
Values smaller than the pivot shall be moved to the left, larger to the right.
Our algorithm is divided into two phases to reduce overhead.
The first phase is parallel partitioning with a two-pass algorithm.
When the pieces become too numerous or small enough, we enter the second phase, in which each core independently sorts its pieces.
Both phases are carried out by repeated partitioning with slightly different algorithms.

3.1 Parallel Partitioning
Parallel partitioning is the core of our algorithm.
We employ a two-pass algorithm similar to [2,1,10] in order to avoid concurrent writes.
In the first pass, each core counts the total number of elements strictly smaller than and strictly larger than the pivot in its assigned subsequence.
It does so by loading consecutively the values from main memory into its SPM and accumulating the count.
The cores then exchange their counts, from which each core can calculate, by cumulative sum, the position it should write to in the next pass.

In the second pass, each core performs its partitioning again, this time transferring the partitioned result directly into its own positions in the result array.
This step can be done in parallel since all of the reads and writes are disjoint.
After all cores commit their result, the result array is left with a middle gap to be filled by the pivot values.
The cores then fill the gap in parallel with DMA writes.
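The two passes above can be sketched as a single-threaded simulation of P worker cores; this is a minimal illustrative sketch (function and variable names are ours, not the paper's implementation):

```cpp
#include <cstdint>
#include <cstddef>
#include <utility>
#include <vector>

// Minimal single-threaded sketch of the two-pass partition: P "cores"
// each handle one contiguous chunk. Pass 1 counts elements strictly
// smaller/larger than the pivot per chunk; cumulative sums give each
// chunk disjoint write ranges; pass 2 moves elements into place. The
// middle gap is pre-filled with the pivot value.
std::vector<int32_t> two_pass_partition(const std::vector<int32_t>& in,
                                        int32_t pivot, int P) {
    const size_t n = in.size();
    auto chunk = [&](int p) {
        return std::pair<size_t, size_t>(n * p / P, n * (p + 1) / P);
    };
    // Pass 1: per-core counts of strictly smaller / strictly larger.
    std::vector<size_t> lo(P, 0), hi(P, 0);
    for (int p = 0; p < P; ++p) {
        auto [b, e] = chunk(p);
        for (size_t i = b; i < e; ++i) {
            if (in[i] < pivot) ++lo[p];
            else if (in[i] > pivot) ++hi[p];
        }
    }
    // Cumulative sums -> disjoint target positions for every core.
    std::vector<size_t> loPos(P), hiPos(P);
    size_t loSum = 0, hiSum = 0;
    for (int p = 0; p < P; ++p) { loPos[p] = loSum; loSum += lo[p]; }
    for (int p = 0; p < P; ++p) { hiSum += hi[p]; }
    size_t hiStart = n - hiSum;
    for (int p = 0; p < P; ++p) { hiPos[p] = hiStart; hiStart += hi[p]; }
    // Pass 2: each core re-partitions its chunk straight into place.
    std::vector<int32_t> out(n, pivot);  // gap in the middle = pivots
    for (int p = 0; p < P; ++p) {
        auto [b, e] = chunk(p);
        size_t lp = loPos[p], hp = hiPos[p];
        for (size_t i = b; i < e; ++i) {
            if (in[i] < pivot) out[lp++] = in[i];
            else if (in[i] > pivot) out[hp++] = in[i];
        }
    }
    return out;
}
```

Since each core's read chunk and write ranges are disjoint, pass 2 needs no locking; on SW26010 the per-chunk loops would run on CPEs with DMA-staged buffers.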

21 :YAMAGUTIseisei:2019/11/10(日) 16:34:46.74 ID:2xdpBNeP2
The synchronization needed by the two pass algorithm is hence limited to only these places: a barrier at the end of counting pass, the communication of a small number of integers, and the barrier after the filling with pivots.

3.2 Communication of Value Counts
Because the value counts needed for calculation of target location are small in number, exchanging them through main memory among worker cores, either via DMA or Gload, would result in a great overhead.
We instead let the worker cores exchange the counts via register communication, through which worker cores can transfer at most 128 bits at a time.
The smaller and larger counts are both 32-bit, so they can be concatenated into one 64-bit value and communicated in one go.

Each worker core needs only two combined values: one is the cumulative sum of counts for cores ordered before it, another is the total sum of all counts.
The information flow is arranged in a zigzag fashion to deal with the restriction that cores can only communicate with one another in the same row or column.
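As a concrete illustration, the two 32-bit counts might be packed into one 64-bit word as below (a sketch; the pack/unpack names are ours, not from the paper):

```cpp
#include <cstdint>

// Sketch of combining the "smaller" and "larger" 32-bit counts into one
// 64-bit word, so a pair travels in a single register-communication
// transfer (the channel carries at most 128 bits per transfer).
inline uint64_t pack_counts(uint32_t smaller, uint32_t larger) {
    return (static_cast<uint64_t>(smaller) << 32) | larger;
}
inline uint32_t unpack_smaller(uint64_t w) {
    return static_cast<uint32_t>(w >> 32);
}
inline uint32_t unpack_larger(uint64_t w) {
    return static_cast<uint32_t>(w & 0xFFFFFFFFu);
}
```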

22 :YAMAGUTIseisei:2019/11/10(日) 16:35:47.26 ID:2xdpBNeP2
3.3 Load Balancing
Since Sunway has 64 cores, load imbalance is a serious problem in phase II.
If not all the cores finish their sorting at the same time, those that finish early will have to sit idle, wasting cycles.
To reduce the imbalance, we employ a simple dynamic scheme based on an atomic counter.

To elaborate, we dedicate a small fraction of each SPM to hold the metadata of the array segments that the cores are going to sort independently in parallel.

When the storage for metadata is full, each core enters phase II and chooses one segment to sort.
When any core finishes, it atomically increments a counter in main memory to obtain the index of the next segment, until the counter exceeds the storage capacity and the algorithm either returns to phase I or finishes.
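The scheme can be approximated with ordinary threads and a shared atomic counter standing in for SW26010's atomic increment in main memory (a sketch under those assumptions, not the paper's code):

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Sketch of the dynamic load-balancing scheme: each worker repeatedly
// claims the index of the next segment via an atomic fetch-and-increment
// and sorts it, until all segments are taken. std::thread / std::atomic
// stand in for CPEs and the atomic counter in main memory.
void sort_segments_balanced(std::vector<std::vector<int>>& segments,
                            int workers) {
    std::atomic<size_t> next{0};
    auto worker = [&segments, &next] {
        for (;;) {
            size_t i = next.fetch_add(1);       // claim the next segment
            if (i >= segments.size()) return;   // no segments left
            std::sort(segments[i].begin(), segments[i].end());
        }
    };
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```

Because a finished worker immediately claims the next segment instead of waiting at a barrier, fast workers absorb the slack left by slow ones.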

3.4 Memory Optimization
As SPM is very small (64KiB), any memory overhead will reduce the number of elements it can buffer at a time, thereby increasing the rounds of DMAs.

Memory optimization is therefore critical to the overall performance.
We employ the following tricks to further reduce memory overhead of control structures.

For one, we use an explicit stack, and at every level of the partitioning recursion we descend into the smaller subarray first.
This bounds the memory usage of the stack to O(log2 N), however the pivot is chosen [5].
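The smaller-subarray-first rule can be sketched as follows (our illustrative code using a three-way partition; the larger side is always pushed and the smaller side processed next, so each stack entry is at most half the size of the one below it):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of quicksort with an explicit stack that always continues into
// the smaller subarray and defers the larger one, bounding the stack to
// O(log2 N) pending ranges no matter how the pivot is chosen.
void quicksort_explicit_stack(std::vector<int>& a) {
    std::vector<std::pair<size_t, size_t>> stack;  // pending [b, e) ranges
    if (!a.empty()) stack.push_back({0, a.size()});
    while (!stack.empty()) {
        auto [b, e] = stack.back();
        stack.pop_back();
        while (e - b > 1) {
            int pivot = a[b + (e - b) / 2];
            // Three-way partition: [b, m1) < pivot, [m1, m2) == pivot.
            auto m1 = std::partition(a.begin() + b, a.begin() + e,
                                     [&](int x) { return x < pivot; });
            auto m2 = std::partition(m1, a.begin() + e,
                                     [&](int x) { return x == pivot; });
            size_t loEnd = m1 - a.begin(), hiBeg = m2 - a.begin();
            if (loEnd - b < e - hiBeg) {       // left side is smaller
                stack.push_back({hiBeg, e});   // defer the larger side
                e = loEnd;                     // keep splitting the smaller
            } else {
                stack.push_back({b, loEnd});
                b = hiBeg;
            }
        }
    }
}
```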

For another, we compress the representation of subarrays by converting 64-bit pointers to 32-bit offsets, and by reusing the sign bit to denote the base of the offset (either the original or the auxiliary array).
The compression reduces the number of bytes needed for each subarray representation from 16 to 8, a 50% saving.
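One way such a compressed representation might look (a sketch; the paper does not spell out its exact encoding, so this field layout is our assumption):

```cpp
#include <cstdint>

// Illustrative encoding of a subarray bound: a 64-bit pointer becomes a
// 32-bit signed offset whose sign bit says which base array (original
// vs. auxiliary) the offset is relative to. A [begin, end) pair then
// costs 8 bytes instead of 16. Offsets must fit in 31 bits.
struct PackedRange {
    int32_t begin;
    int32_t end;
};

inline int32_t pack_offset(uint32_t offset, bool auxiliary) {
    return auxiliary ? ~static_cast<int32_t>(offset)   // aux: negative
                     : static_cast<int32_t>(offset);   // original: as-is
}
inline bool is_auxiliary(int32_t v) { return v < 0; }
inline uint32_t offset_of(int32_t v) {
    return v < 0 ? static_cast<uint32_t>(~v) : static_cast<uint32_t>(v);
}
```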

23 :YAMAGUTIseisei:2019/11/10(日) 16:41:04.14 ID:2xdpBNeP2
3.5 Multiple Core Groups
To apply our algorithm to multiple core groups, we may combine the single core group algorithm with a variety of conventional parallel sorting algorithms, such as samplesort.
Samplesort on n processors is composed of three steps [3]: partition the array with n - 1 splitters into n disjoint buckets, distribute the buckets onto the n processors so that the i-th processor holds the i-th bucket, and finally sort the buckets in parallel.
To adapt our algorithm to multiple core groups, we simply regard each core group as a single processor in the sense of samplesort and perform the first step with our parallel partitioning algorithm (Sect. 3.1), slightly modified to maintain n counts and do multi-way partitioning.
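The three samplesort steps, with each core group treated as one sequential processor, might be sketched as follows (illustrative code; splitters are assumed sorted):

```cpp
#include <algorithm>
#include <vector>

// Sketch of the samplesort outline: n-1 sorted splitters define n
// buckets; a multi-way partition distributes elements, then each bucket
// is sorted (here with std::sort; on Sunway each core group would run
// the single-CG algorithm on its own bucket).
std::vector<int> samplesort_sketch(const std::vector<int>& in,
                                   const std::vector<int>& splitters) {
    std::vector<std::vector<int>> buckets(splitters.size() + 1);
    for (int x : in) {
        // index of the first splitter greater than x = bucket index
        size_t b = std::upper_bound(splitters.begin(), splitters.end(), x)
                   - splitters.begin();
        buckets[b].push_back(x);
    }
    std::vector<int> out;
    out.reserve(in.size());
    for (auto& bucket : buckets) {
        std::sort(bucket.begin(), bucket.end());           // per-bucket sort
        out.insert(out.end(), bucket.begin(), bucket.end());
    }
    return out;
}
```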

4 Experiments

To evaluate the performance of our algorithm, we test it on arrays of different sizes, different distributions, and different element types.
We also test the multiple CG version against single CG version.
To evaluate how our algorithm scales, we experiment with different numbers of worker cores active.
Since there is no previous work on Sunway or similar machines to benchmark against, we instead compare our results with Intel TBB on x86-64 machines.

Sorting speed is affected by data distributions, especially for quicksort since its partitioning may be imbalanced.
We test our algorithm on five different distributions of data.
See Fig.1 for the visualizations of the types of distributions.

For x86-64 we test on an AWS dedicated instance with 72 CPUs (Intel Xeon Platinum 8124M, the latest generation of server CPUs in 2017).
The Intel TBB library is versioned 2018U1.


24 :YAMAGUTIseisei:2019/11/10(日) 16:43:14.15 ID:2xdpBNeP2


Fig.1: Visualizations of data distributions (uniform, shuffle, increment, decrement, staggered). The horizontal axis represents the index of the element in the array, and the vertical axis its value.


Both the library and our test source are compiled with -O3 -march=native so that compiler optimizations are fully on.

4.1 Results on Sunway TaihuLight
We compare the running time of our algorithm on Sunway TaihuLight against single threaded sorting on the MPE with std::sort.
The STL sort, as implemented on libstdc++, is a variant of quicksort called introsort.

Fig.2 shows the runtime results for sorting 32-bit integers.
From the graph we can see that the distribution matters only a little.
Fig.3 shows sorting different types of elements with the size fixed.
The reason for the reduced efficiency with 64-bit types (int64 and double) is evident: the number of elements buffered in SPM each time is halved, and more round trips between main memory and SPM are needed.
The reason for reduced efficiency of float32 values is unknown.
Fig.4 shows the timings and speedups of multiple CG algorithm (adapted samplesort).

25 :YAMAGUTIseisei:2019/11/10(日) 16:45:34.14 ID:2xdpBNeP2
Fig.2: Results for int32 values. Panels: (a) STL, (b) Ours, (c) Speedup. Horizontal axis: array size (up to 8E+08); vertical axes: time in seconds and speedup. Distributions: decrement, increment, shuffle, staggered, uniform.

26 :YAMAGUTIseisei:2019/11/10(日) 16:48:20.69 ID:2xdpBNeP2
4.2 Comparison against Intel TBB on x86-64
We compare our implementation against Intel TBB on an Intel CPU.
TBB is a C++ template library of generic parallel algorithms, developed by Intel and optimized primarily for its own processors.




[Fig.3, panels (a) STL and (b) Ours: time in seconds for int32, int64, float, and double across the data distributions (uniform, staggered, shuffle, increment, decrement).]

27 :YAMAGUTIseisei:2019/11/10(日) 16:49:01.73 ID:2xdpBNeP2
[Fig.3, panel (c): speedup, 0-40.]

Fig.3: Results for different element types


Fig.4: Results for different numbers of core groups (2-16 core groups; 128-1024 worker cores in total). Panels: (a) timings, (b) speedups, (c) parallel efficiency.

28 :YAMAGUTIseisei:2019/11/10(日) 16:52:56.69 ID:2xdpBNeP2
For a fairer comparison, we choose a machine with one of the most powerful Intel processors available to date.

The result is illustrated in Fig.5.
We can see that an individual x86-64 core is about six times as fast as one SW26010 worker core, but our algorithm scales much better with the number of cores.
The performance of TBB's algorithm saturates after about 20 cores are in use, whereas, judging from the graph, our algorithm could probably scale beyond 64 cores.
Even though the comparison isn't direct since the architectures differ, it is evident that our algorithm on Sunway TaihuLight is much more efficient than traditional parallel sorting algorithms implemented on more common architectures.

5 Conclusion

In this paper, we present a customized parallel quicksort on SW26010 with significant speedup relative to single-core performance.
It is built around a two-pass parallel partitioning algorithm, with the first pass counting elements and the second moving them.
This design is able to leverage the on-chip communication mechanism to reduce synchronization overhead, and fast on-chip SPM to minimize the data movement overhead.
Further, we design a cooperative scheduling scheme, and optimize memory usage as well as load balancing.

Experiments show that for int32 values, our algorithm achieves a speedup of more than 32 on 64 CPEs and a strong-scaling efficiency of 50% for all distributions.


30 :YAMAGUTIseisei:2019/11/10(日) 16:55:22.30 ID:2xdpBNeP2


Fig.5: Results for different numbers of cores on SW26010 (our algorithm) vs on x86-64 (TBB). Panels: (a) sorting time, (b) speedup, (c) parallel efficiency.


Compared with Intel TBB's implementation of parallel quicksort on the x86-64 architecture, our design scales well even when using all 64 CPEs, while TBB's implementation hardly benefits from more than 20 cores.

31 :YAMAGUTIseisei:2019/11/10(日) 16:56:39.13 ID:2xdpBNeP2
References

1. Blelloch, G.E.: Prefix sums and their applications. Tech. rep., Synthesis of Parallel Algorithms (1990), https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf
2. Cederman, D., Tsigas, P.: GPU-Quicksort: A practical quicksort algorithm for graphics processors. Journal of Experimental Algorithmics 14, 4 (2009)
3. Frazer, W.D., McKellar, A.C.: Samplesort: A sampling approach to minimal storage tree sorting. J. ACM 17(3), 496-507 (Jul 1970)
4. Fu, H., Liao, J., Yang, J., Wang, L., Song, Z., Huang, X., Yang, C., Xue, W., Liu, F., Qiao, F., Zhao, W., Yin, X., Hou, C., Zhang, C., Ge, W., Zhang, J., Wang, Y., Zhou, C., Yang, G.: The Sunway TaihuLight supercomputer: system and applications. Science China Information Sciences 59(7), 072001 (Jun 2016)
5. Hoare, C.A.R.: Quicksort. The Computer Journal 5(1), 10-16 (1962)
6. Knuth, D.E.: The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Ed.). Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA (1998)

32 :YAMAGUTIseisei:2019/11/10(日) 16:57:30.90 ID:2xdpBNeP2
7. Leischner, N., Osipov, V., Sanders, P.: GPU sample sort. In: 2010 IEEE International Symposium on Parallel & Distributed Processing, pp. 1-10 (April 2010)
8. Manca, E., Manconi, A., Orro, A., Armano, G., Milanesi, L.: CUDA-quicksort: an improved GPU-based implementation of quicksort. Concurrency and Computation: Practice and Experience 28(1), 21-43 (2016)
9. Satish, N., Harris, M., Garland, M.: Designing efficient sorting algorithms for manycore GPUs. In: 2009 IEEE International Symposium on Parallel & Distributed Processing, pp. 1-10 (May 2009)
10. Sengupta, S., Harris, M., Zhang, Y., Owens, J.D.: Scan primitives for GPU computing. In: Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, pp. 97-106. GH '07, Eurographics Association, Aire-la-Ville, Switzerland (2007)




34 :YAMAGUTIseisei:2020/04/12(日) 15:36:43.01
| 40 YAMAGUTIseisei 200411 2215 ecqWmY3o? \| 26 YAMAGUTIseisei 200411 2201 ecqWmY3o?
||>861 Anonymous 200409 0829 FIkPK0gU
||>If the ●● administration is "prioritizing the economy", the reasons it should make immediate, unconditional cash payments | Harbor Business Online
||>http://hbol.jp/216364# \>
||>COVID-19: Spain's job losses in two weeks exceeded the Lehman shock | Harbor Business
||>http://hbol.jp/216402# \>
||>Why the mighty ●● administration has been this ●ncompetent at containing the spread | Harbor
||>http://hbol.jp/216383
|| :
||
||>221 — 200410 1916 KdYOxDNW
||>OECD: economic damage from the new virus outbreak "will last for years"
||>http://bbc.com/japanese/52000856## ?SThisFB&fbclid=IwAR1CQQY2LSaD6-6tI3kWTaVF-rfUb4ko064vbCf8_i2DG8RfLTUyiHNPwO8#
|||"A V-shaped recovery is impossible; even at best it will be U-shaped, that is, a long slump.
||>If the right decision is made now, an 'L-shaped' outcome should be avoidable"
||
||>768 — 200403 1214 o/1xBAUK
||| >766
||>ttp://blog.livedoor.jp/its●ku/archives/56601961.html## http://rio2016.2ch.net/test/read.cgi/future/1583830700/768
||| 7: Anonymous 200402 0822 sNarN44Kp
||| ¥100,000 → ¥50,000 → meat vouchers → fish vouchers → two masks ← we are here
||| 12: Anonymous 200402 0823 irtePzlUa
||| COVID: each government's response
||>South Korea: cash ¥86,000
||>USA: cash ¥130,000
||>Hong Kong: cash ¥140,000
||>Italy: cash ¥300,000+
||>UK: business-closure compensation (80%)
||>France: business-closure compensation (full)
||>Spain: business-closure compensation (full)
||>Japan: cloth
|| :

35 :YAMAGUTIseisei:2020/04/12(日) 15:37:16.92
| 41 YAMAGUTIseisei 200411 2216 ecqWmY3o? \| 27 Name: YAMAGUTIseisei Email:sagezon.jp/dp/B085VNQX8Q/okyuryo-22 Date: 2020/04/11 (Sat) 22:02:29.64 ID:ecqWmY3o?
|||If the ●● administration is "prioritizing the economy", the reasons it should make immediate, unconditional cash payments | Harbor Business Online
||>http://hbol.jp/216364/3##
||
|| ▼Major countries' COVID-19 economic measures (as of April 3)
|| - South Korea
|| Living support: ¥85,000. For 14 million households with monthly income of up to 7.12 million won (¥632,000); 400,000 won (¥35,000) for single-person households, 1 million won (¥85,000) for households of four or more
|| - USA
|| Living support: ¥130,000. Up to $1,200 (¥130,000) per adult with annual income of up to $75,000 (¥8.18 million), plus $500 (¥54,000) per child
|| - Hong Kong
|| Living support: ¥140,000. HK$10,000 (¥140,000) to every permanent resident aged 18 or over
|| - Singapore
|| Living support: ¥68,000. Up to S$900 (¥68,000) for citizens aged 21 or over, depending on income, plus S$300 (¥23,000) per child
|| - Italy
|| Business-closure compensation: ¥70,000. €600 (¥70,000) per month for up to three months to the self-employed, seasonal tourism workers, theater-related workers, agricultural workers, and others
|| - UK
|| Business-closure compensation: 80% of income. For 3.8 million sole proprietors forced to close, 80% of income up to £2,500 (¥334,000) per month
|| - Canada
|| Business-closure compensation: ¥150,000. C$2,000 (¥152,000) per month for up to four months to everyone who lost work or income
|| - Germany
|| Business-closure compensation: up to ¥1,050,000. Up to €9,000 (¥1,050,000) for roughly 3 million sole proprietors and individual artists
|| - Japan
|| Living support? ¥300,000? ¥300,000 limited to households whose income has fallen sharply and whose livelihood is affected; subsidies on the order of several trillion yen for sole proprietors under consideration
||
|| ※ Each country's measures are wide-ranging; cash-payment policies for individuals etc. were picked out (compiled by SPA!)
|| [Tomohiro Inoue] Associate professor, Faculty of Economics, Komazawa University. Specializes in macroeconomics and monetary theory. Author of "Helicopter Money" (Nikkei Publishing) and many others
|| [小野盛司] Chairman of the Society for the Revival of the Japanese Economy. Doctor of Science. Books include "From a 'Capitalist Society' to a 'Liberation Society'" (Soeisha / Sanseido Shoten)
|| <Reporting and text: 福田晃広 and 野中ツトム (Seidansha); photos: Jiji Press, PIXTA> From the April 7 issue of Weekly SPA!
