<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4825121741853908400</id><updated>2011-11-28T08:44:04.986+09:00</updated><title type='text'>MUDA development blog</title><subtitle type='html'>This is a blog for MUDA development.
MUDA is a (short) vector language for CPUs.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://mudadev.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4825121741853908400/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://mudadev.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>syoyo</name><uri>http://www.blogger.com/profile/15167076070732369617</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>5</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4825121741853908400.post-278597389230257908</id><published>2008-04-10T23:42:00.010+09:00</published><updated>2008-04-29T22:47:17.520+09:00</updated><title type='text'>Initial tryout on double x 4 in MUDA.</title><content type='html'>To prepare for 256-bit SIMD (double x 4) in Intel's AVX (a future Intel CPU feature), I'm trying to implement a double x 4 feature in MUDA.&lt;br /&gt;&lt;br /&gt;A new type, &lt;span style="FONT-WEIGHT: bold"&gt;dvec&lt;/span&gt;, is introduced in MUDA; it represents double x 4.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;// input.mu&lt;br /&gt;dvec bora_func(dvec a)&lt;br /&gt;{&lt;br /&gt;  return a * a * a;&lt;br /&gt;}&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;First, I wrote an SSE backend that translates dvec-typed expressions in almost the same manner as for the vec (float x 4) type.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;$ mudah input.mu &gt; 
bora.c&lt;br /&gt;dvec bora (const double * a)&lt;br /&gt;{&lt;br /&gt;  const __muda_m256 t_dvec2 = (*((__muda_m256 *)(a))) ;&lt;br /&gt;  const __muda_m256 t_dvec1 = (*((__muda_m256 *)(a))) ;&lt;br /&gt;  const __muda_m256 t_dvec3  =  _muda_mul_4d( t_dvec2 ,  t_dvec1  ) ;&lt;br /&gt;  const __muda_m256 t_dvec4 = (*((__muda_m256 *)(a))) ;&lt;br /&gt;  const __muda_m256 t_dvec5  =  _muda_mul_4d( t_dvec3 ,  t_dvec4  ) ;&lt;br /&gt;  return t_dvec5 ;&lt;br /&gt;}&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Here &lt;span style="FONT-WEIGHT: bold"&gt;__muda_m256&lt;/span&gt; and &lt;span style="FONT-WEIGHT: bold"&gt;_muda_mul_4d&lt;/span&gt; are a simple wrapper type and C function that emulate 256-bit SIMD on current 128-bit SIMD machines, defined as follows.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;typedef union {&lt;br /&gt;  struct { __m128d v[2]; };&lt;br /&gt;  double f[4];&lt;br /&gt;} __muda_m256 __attribute__((aligned(16)));&lt;br /&gt;&lt;br /&gt;static inline __muda_m256 _muda_mul_4d(__muda_m256 a, __muda_m256 b)&lt;br /&gt;{&lt;br /&gt;  __muda_m256 ret;&lt;br /&gt;&lt;br /&gt;  ret.v[0] = _mm_mul_pd(a.v[0], b.v[0]);&lt;br /&gt;  ret.v[1] = _mm_mul_pd(a.v[1], b.v[1]);&lt;br /&gt;&lt;br /&gt;  return ret;&lt;br /&gt;}&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;But gcc translates this code into the following unoptimized assembly.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;$ gcc -msse2 -O3 -c bora.c&lt;br /&gt;$ otool -v -t bora.o&lt;br /&gt;_bora:&lt;br /&gt;00000000        pushl   %ebp&lt;br /&gt;00000001        movl    %esp,%ebp&lt;br /&gt;00000003        pushl   %edi&lt;br /&gt;00000004        pushl   %esi&lt;br /&gt;00000005        subl    $0x00000150,%esp&lt;br /&gt;0000000b        movl    0x0c(%ebp),%eax&lt;br /&gt;0000000e        movl    (%eax),%edx&lt;br /&gt;00000010        movl    
%edx,0xfffffed4(%ebp)&lt;br /&gt;...&lt;br /&gt;00000111        movl    0xfffffec4(%ebp),%eax&lt;br /&gt;00000117        mulpd   0xffffff38(%ebp),%xmm0&lt;br /&gt;0000011f        movapd  %xmm0,0xffffff18(%ebp)&lt;br /&gt;00000127        movapd  0xffffff08(%ebp),%xmm0&lt;br /&gt;...&lt;br /&gt;(total 170 instructions)&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Doh! Lots of &lt;span style="FONT-WEIGHT: bold"&gt;mov&lt;/span&gt;*!&lt;br /&gt;&lt;br /&gt;Even with the latest llvm-gcc (llvm-gcc4.2-2.2-x86-darwin8), some redundant mov instructions still remain in the output.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;_bora:&lt;br /&gt;00000000        pushl   %ebp&lt;br /&gt;00000001        movl    %esp,%ebp&lt;br /&gt;00000003        subl    $0x000000e8,%esp&lt;br /&gt;00000009        movl    0x0c(%ebp),%eax&lt;br /&gt;0000000c        movapd  0x10(%eax),%xmm0&lt;br /&gt;00000011        movapd  (%eax),%xmm1&lt;br /&gt;00000015        movapd  %xmm0,0xffffff68(%ebp)&lt;br /&gt;0000001d        movapd  %xmm1,0xffffff58(%ebp)&lt;br /&gt;00000025        movapd  %xmm0,0xffffff48(%ebp)&lt;br /&gt;0000002d        movapd  %xmm1,0xffffff38(%ebp)&lt;br /&gt;00000035        movapd  0xffffff58(%ebp),%xmm2&lt;br /&gt;0000003d        mulpd   0xffffff38(%ebp),%xmm2&lt;br /&gt;00000045        movapd  %xmm2,0xffffff78(%ebp)&lt;br /&gt;0000004d        movapd  0xffffff68(%ebp),%xmm2&lt;br /&gt;00000055        mulpd   0xffffff48(%ebp),%xmm2&lt;br /&gt;0000005d        movapd  %xmm2,0x88(%ebp)&lt;br /&gt;00000062        movapd  0xffffff78(%ebp),%xmm3&lt;br /&gt;0000006a        movapd  %xmm2,0xc8(%ebp)&lt;br /&gt;0000006f        movapd  %xmm3,0xb8(%ebp)&lt;br /&gt;00000074        movapd  %xmm0,0xa8(%ebp)&lt;br /&gt;00000079        movapd  %xmm1,0x98(%ebp)&lt;br /&gt;0000007e        movapd  0xb8(%ebp),%xmm0&lt;br /&gt;00000083        mulpd   0x98(%ebp),%xmm0&lt;br /&gt;00000088        movapd  
%xmm0,0xd8(%ebp)&lt;br /&gt;0000008d        movapd  0xc8(%ebp),%xmm0&lt;br /&gt;00000092        mulpd   0xa8(%ebp),%xmm0&lt;br /&gt;00000097        movapd  %xmm0,0xe8(%ebp)&lt;br /&gt;0000009c        movapd  0xd8(%ebp),%xmm1&lt;br /&gt;000000a1        movapd  %xmm0,0xffffff28(%ebp)&lt;br /&gt;000000a9        movapd  %xmm1,0xffffff18(%ebp)&lt;br /&gt;000000b1        movl    0x08(%ebp),%eax&lt;br /&gt;000000b4        movapd  0xffffff28(%ebp),%xmm0&lt;br /&gt;000000bc        movapd  0xffffff18(%ebp),%xmm1&lt;br /&gt;000000c4        movapd  %xmm1,(%eax)&lt;br /&gt;000000c8        movapd  %xmm0,0x10(%eax)&lt;br /&gt;000000cd        addl    $0x000000e8,%esp&lt;br /&gt;000000d3        popl    %ebp&lt;br /&gt;000000d4        ret     $0x0004&lt;br /&gt;(38 instructions)&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I got almost the same result from Intel's icc compiler.&lt;br /&gt;It seems that this code is difficult for C compilers to optimize.&lt;br /&gt;I think I have to translate MUDA code into much flatter C code, without using any macros or inlined wrapper functions&lt;br /&gt;(directly emitting two _mm_mul_pd() calls for a dvec-typed multiply).&lt;br /&gt;&lt;br /&gt;&lt;span style="FONT-WEIGHT: bold"&gt;How about LLVM IR?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Then, I also added initial double x 4 support to the LLVM backend of MUDA.&lt;br /&gt;&lt;br /&gt;LLVM IR version.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;$ mudah --llvm  input.mu&lt;br /&gt;define &lt;4xdouble&gt; @bora (&lt;4xdouble&gt; %a)&lt;br /&gt;{&lt;br /&gt;  %a.addr = alloca &lt;4xdouble&gt; ;&lt;br /&gt;  store &lt;4xdouble&gt; %a, &lt;4xdouble&gt;* %a.addr ;&lt;br /&gt;  %t_dvec2 = load &lt;4xdouble&gt;* %a.addr ;&lt;br /&gt;&lt;br /&gt;  %t_dvec1 = load &lt;4xdouble&gt;* %a.addr ;&lt;br /&gt;&lt;br /&gt;  %t_dvec3  =  mul &lt;4xdouble&gt; %t_dvec2 ,  %t_dvec1   ;&lt;br /&gt;  %t_dvec4 = load &lt;4xdouble&gt;* 
%a.addr ;&lt;br /&gt;&lt;br /&gt;  %t_dvec5  =  mul &lt;4xdouble&gt; %t_dvec3 ,  %t_dvec4   ;&lt;br /&gt;  ret &lt;4xdouble&gt; %t_dvec5 ;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;$ llvm-as bora.ll -f&lt;br /&gt;$ llc bora.bc -f&lt;br /&gt;$ cat bora.s&lt;br /&gt;&lt;br /&gt;_bora:&lt;br /&gt;Leh_func_begin3:&lt;br /&gt;Llabel3:&lt;br /&gt;subl    $44, %esp&lt;br /&gt;movapd    %xmm0, (%esp)&lt;br /&gt;movapd    %xmm1, 16(%esp)&lt;br /&gt;movaps    %xmm1, %xmm2&lt;br /&gt;mulpd    %xmm2, %xmm2&lt;br /&gt;mulpd    %xmm2, %xmm1&lt;br /&gt;movaps    %xmm0, %xmm2&lt;br /&gt;mulpd    %xmm2, %xmm2&lt;br /&gt;mulpd    %xmm2, %xmm0&lt;br /&gt;addl    $44, %esp&lt;br /&gt;ret&lt;br /&gt;(11 instructions)&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The output assembly is almost optimal!&lt;br /&gt;The LLVM infrastructure does a good job when we use vector expressions!</content><link rel='replies' type='application/atom+xml' href='http://mudadev.blogspot.com/feeds/278597389230257908/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4825121741853908400&amp;postID=278597389230257908' title='23 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4825121741853908400/posts/default/278597389230257908'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4825121741853908400/posts/default/278597389230257908'/><link rel='alternate' type='text/html' href='http://mudadev.blogspot.com/2008/04/initial-tryout-on-double-x-4-in-muda.html' title='Initial tryout on double x 4 in MUDA.'/><author><name>syoyo</name><uri>http://www.blogger.com/profile/15167076070732369617</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>23</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4825121741853908400.post-7878299254317424074</id><published>2008-03-04T00:20:00.004+09:00</published><updated>2008-03-04T00:40:07.184+09:00</updated><title type='text'>Unoptimized "select" instruction handling in LLVM x86 backend</title><content type='html'>I wrote portable LLVM IR code that implements a vector max() function (this should be provided as an intrinsic function for MUDA), as below.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;;; max.ll&lt;br /&gt;define &lt;4xfloat&gt; @muda_maxf4(&lt;4xfloat&gt; %a, &lt;4xfloat&gt; %b)&lt;br /&gt;{&lt;br /&gt;&lt;br /&gt;;; extract&lt;br /&gt;%a0 = extractelement &lt;4xfloat&gt; %a, i32 0&lt;br /&gt;%a1 = extractelement &lt;4xfloat&gt; %a, i32 1&lt;br /&gt;%a2 = extractelement &lt;4xfloat&gt; %a, i32 2&lt;br /&gt;%a3 = extractelement &lt;4xfloat&gt; %a, i32 3&lt;br /&gt;&lt;br /&gt;%b0 = extractelement &lt;4xfloat&gt; %b, i32 0&lt;br /&gt;%b1 = extractelement &lt;4xfloat&gt; %b, i32 1&lt;br /&gt;%b2 = extractelement &lt;4xfloat&gt; %b, i32 2&lt;br /&gt;%b3 = extractelement &lt;4xfloat&gt; %b, i32 3&lt;br /&gt;&lt;br /&gt;;; c[N] = a[N] &gt; b[N]&lt;br /&gt;%c0 = fcmp ogt float %a0, %b0&lt;br /&gt;%c1 = fcmp ogt float %a1, %b1&lt;br /&gt;%c2 = fcmp ogt float %a2, %b2&lt;br /&gt;%c3 = fcmp ogt float %a3, %b3&lt;br /&gt;&lt;br /&gt;;; if %c[N] == 1 then %a[N] else %b[N]&lt;br /&gt;&lt;br /&gt;%r0 = select i1 %c0, float %a0, float %b0&lt;br /&gt;%r1 = select i1 %c1, float %a1, float %b1&lt;br /&gt;%r2 = select i1 %c2, float %a2, float %b2&lt;br /&gt;%r3 = select i1 %c3, float %a3, float %b3&lt;br /&gt;&lt;br /&gt;;; pack&lt;br /&gt;&lt;br /&gt;%tmp0 = insertelement &lt;4xfloat&gt; undef, float %r0, i32 0&lt;br /&gt;%tmp1 = insertelement &lt;4xfloat&gt; %tmp0, float %r1, i32 1&lt;br /&gt;%tmp2 = insertelement 
&lt;4xfloat&gt; %tmp1, float %r2, i32 2&lt;br /&gt;%r    = insertelement &lt;4xfloat&gt; %tmp2, float %r3, i32 3&lt;br /&gt;&lt;br /&gt;ret &lt;4xfloat&gt; %r&lt;br /&gt;}&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;Since LLVM IR doesn't accept vector types for the &lt;span style="font-weight: bold;"&gt;compare&lt;/span&gt; and &lt;span style="font-weight: bold;"&gt;select&lt;/span&gt; instructions,&lt;br /&gt;I first do a vector -&gt; scalar conversion, then do the compare/select, and finally pack the scalar results back into a vector.&lt;br /&gt;&lt;br /&gt;But for this LLVM IR input, the LLVM x86/SSE backend emits the following unoptimized assembly.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;$ llvm-as max.ll; opt -std-compile-opts max.bc -f | llc -march=x86 -mcpu=penryn -f&lt;br /&gt;&lt;br /&gt;.text&lt;br /&gt;.align 16&lt;br /&gt;.globl muda_maxf4&lt;br /&gt;.type muda_maxf4,@function&lt;br /&gt;muda_maxf4:&lt;br /&gt;extractps $3, %xmm1, %xmm2&lt;br /&gt;extractps $3, %xmm0, %xmm3&lt;br /&gt;ucomiss %xmm2, %xmm3&lt;br /&gt;ja .LBB1_2 #&lt;br /&gt;.LBB1_1: #&lt;br /&gt;movaps %xmm2, %xmm3&lt;br /&gt;.LBB1_2: #&lt;br /&gt;extractps $1, %xmm1, %xmm2&lt;br /&gt;extractps $1, %xmm0, %xmm4&lt;br /&gt;ucomiss %xmm2, %xmm4&lt;br /&gt;ja .LBB1_4 #&lt;br /&gt;.LBB1_3: #&lt;br /&gt;movaps %xmm2, %xmm4&lt;br /&gt;.LBB1_4: #&lt;br /&gt;movss %xmm4, %xmm2&lt;br /&gt;unpcklps %xmm3, %xmm2&lt;br /&gt;extractps $2, %xmm1, %xmm3&lt;br /&gt;extractps $2, %xmm0, %xmm4&lt;br /&gt;ucomiss %xmm3, %xmm4&lt;br /&gt;ja .LBB1_6 #&lt;br /&gt;.LBB1_5: #&lt;br /&gt;movaps %xmm3, %xmm4&lt;br /&gt;.LBB1_6: #&lt;br /&gt;ucomiss %xmm1, %xmm0&lt;br /&gt;ja .LBB1_8 #&lt;br /&gt;.LBB1_7: #&lt;br /&gt;movaps %xmm1, %xmm0&lt;br /&gt;.LBB1_8: #&lt;br /&gt;unpcklps %xmm4, %xmm0&lt;br /&gt;unpcklps %xmm2, %xmm0&lt;br /&gt;ret&lt;br /&gt;.size muda_maxf4, .-muda_maxf4&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;^^), it consists of lots of jumps, whereas I expected to get one of the following:&lt;br 
/&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;cmpps + andps/andnps/orps&lt;/li&gt;&lt;li&gt;cmpps + blendvps (in SSE4.1)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;maxps&lt;/li&gt;&lt;/ul&gt;Anyway, it's not so strange that LLVM's x86/SSE backend emits the unoptimized code above.&lt;br /&gt;&lt;br /&gt;According to the document ( $(llvm)/lib/Target/X86/README-SSE.txt ),&lt;br /&gt;the &lt;span style="font-weight: bold;"&gt;select&lt;/span&gt; instruction (conditional move) is currently mapped to a branch,&lt;br /&gt;not to an and*/andn*/or* triple or a blend* x86 instruction.&lt;br /&gt;&lt;br /&gt;This is the reason the LLVM x86/SSE backend emits unexpected and unoptimized assembly.&lt;br /&gt;&lt;br /&gt;If I had enough time, I'd like to write a patch for LLVM's x86/SSE backend to emit an and*/andn*/or* triple instead of branching.</content><link rel='replies' type='application/atom+xml' href='http://mudadev.blogspot.com/feeds/7878299254317424074/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4825121741853908400&amp;postID=7878299254317424074' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4825121741853908400/posts/default/7878299254317424074'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4825121741853908400/posts/default/7878299254317424074'/><link rel='alternate' type='text/html' href='http://mudadev.blogspot.com/2008/03/unoptimized-select-instruction-handling.html' title='Unoptimized &quot;select&quot; instruction handling in LLVM x86 backend'/><author><name>syoyo</name><uri>http://www.blogger.com/profile/15167076070732369617</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4825121741853908400.post-1262691856426783219</id><published>2008-02-28T00:46:00.000+09:00</published><updated>2008-02-28T00:46:37.834+09:00</updated><title type='text'>Logical op in LLVM</title><content type='html'>In MUDA, logical ops on floating point types are possible.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;vec and_func(vec a, vec b)&lt;br /&gt;{&lt;br /&gt;    return a &amp;amp; b;&lt;br /&gt;}&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;For example, this computes as follows.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;// a = (0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000) = (1.0f, 1.0f, 1.0f, 1.0f)&lt;br /&gt;// b = (0xffffffff, 0x00000000, 0x00000000, 0xf0f00000)&lt;br /&gt;// a &amp;amp; b = (0x3f800000, 0x00000000, 0x00000000, 0x30800000)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;But in LLVM IR, logical instructions do not accept floating point operands, so if we want to apply a logical op to floating point values,&lt;br /&gt;we first have to change the type of the variable from float to integer without changing its contents.&lt;br /&gt;&lt;br /&gt;For this purpose, LLVM IR provides the &lt;span class="Apple-style-span" style="font-weight: bold;"&gt;bitcast&lt;/span&gt; op.&lt;br /&gt;&lt;br /&gt;MUDA's LLVM IR backend emits the following code for the MUDA input above.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;define &lt;4xfloat&gt; @and_func (&lt;4xfloat&gt; %a, &lt;4xfloat&gt; %b)&lt;br /&gt;{&lt;br /&gt;  %a.addr = alloca &lt;4xfloat&gt; ;&lt;br /&gt;  store &lt;4xfloat&gt; %a, &lt;4xfloat&gt;* %a.addr ;&lt;br /&gt;  %b.addr = alloca &lt;4xfloat&gt; ;&lt;br /&gt;  store &lt;4xfloat&gt; %b, &lt;4xfloat&gt;* %b.addr ;&lt;br /&gt;  %t_vec2 = load &lt;4xfloat&gt;* %a.addr ;&lt;br /&gt;&lt;br /&gt;  %t_vec3 = load 
&lt;4xfloat&gt;* %b.addr ;&lt;br /&gt;&lt;br /&gt;  %t_ivec4 = bitcast &lt;4xfloat&gt; %t_vec2 to &lt;4xi32&gt; ;&lt;br /&gt;  %t_ivec5 = bitcast &lt;4xfloat&gt; %t_vec3 to &lt;4xi32&gt; ;&lt;br /&gt;  %t_ivec6 = and &lt;4xi32&gt; %t_ivec4 ,  %t_ivec5 ;&lt;br /&gt;  %t_vec1 = bitcast &lt;4xi32&gt; %t_ivec6 to &lt;4xfloat&gt; ;&lt;br /&gt;  ret &lt;4xfloat&gt; %t_vec1 ;&lt;br /&gt;}&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;The generated LLVM IR code is somewhat redundant, but the LLVM optimizer and x86 backend emit exactly what I expected.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;$ llvm-as tmp.ll -f; opt -std-compile-opts tmp.bc -f | llc -march=x86&lt;br /&gt;    .text&lt;br /&gt;    .align 4,0x90&lt;br /&gt;    .globl _and_func&lt;br /&gt;_and_func:&lt;br /&gt;    andps %xmm1, %xmm0&lt;br /&gt;    ret&lt;br /&gt;&lt;br /&gt;    .subsections_via_symbols&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The LLVM IR is mapped to just &lt;span style="font-weight: bold;"&gt;one&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;andps&lt;/span&gt; instruction. 
LLVM is so nice!</content><link rel='replies' type='application/atom+xml' href='http://mudadev.blogspot.com/feeds/1262691856426783219/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4825121741853908400&amp;postID=1262691856426783219' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4825121741853908400/posts/default/1262691856426783219'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4825121741853908400/posts/default/1262691856426783219'/><link rel='alternate' type='text/html' href='http://mudadev.blogspot.com/2008/02/logical-op-in-llvm.html' title='Logical op in LLVM'/><author><name>syoyo</name><uri>http://www.blogger.com/profile/15167076070732369617</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4825121741853908400.post-7679988051987177227</id><published>2008-02-08T02:23:00.000+09:00</published><updated>2008-02-11T23:41:17.989+09:00</updated><title type='text'>Work in progress | MUDA -&gt; LLVM backend</title><content type='html'>&lt;span class="sans"&gt;I've started to implement an LLVM IR backend for MUDA.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;a href="http://lucille.svn.sourceforge.net/viewvc/lucille/angelina/haskellmuda/CodeGenLLVM.hs?view=markup"&gt;http://lucille.svn.sourceforge.net/viewvc/lucille/angelina/haskellmuda/CodeGenLLVM.hs?view=markup&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;MUDA's LLVM IR backend is still a work in progress.&lt;br /&gt;First, I've succeeded on a simple case.&lt;br 
/&gt;&lt;br /&gt;Here's the MUDA input code:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;// input.mu&lt;br /&gt;vec&lt;br /&gt;func( vec a, vec b ) {&lt;br /&gt;&lt;br /&gt; return a + b;&lt;br /&gt;&lt;br /&gt;}&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;MUDA's current LLVM backend emits:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;$ mudah --llvm input.mu &gt; tmp.ll&lt;br /&gt;$ cat tmp.ll&lt;br /&gt;&lt;br /&gt;;;&lt;br /&gt;;; The following code was generated by MUDA compiler&lt;br /&gt;;;&lt;br /&gt;target datalayout = "i32:128:128-f32:128:128"&lt;br /&gt;&lt;br /&gt;define &lt;4xfloat&gt; @func (&lt;4xfloat&gt; %a, &lt;4xfloat&gt; %b)&lt;br /&gt;{&lt;br /&gt;  %a.addr = alloca &lt;4xfloat&gt; ;&lt;br /&gt;  store &lt;4xfloat&gt; %a, &lt;4xfloat&gt;* %a.addr ;&lt;br /&gt;  %b.addr = alloca &lt;4xfloat&gt; ;&lt;br /&gt;  store &lt;4xfloat&gt; %b, &lt;4xfloat&gt;* %b.addr ;&lt;br /&gt;  %t_vec1 = load &lt;4xfloat&gt;* %a.addr ;&lt;br /&gt;&lt;br /&gt;  %t_vec2 = load &lt;4xfloat&gt;* %b.addr ;&lt;br /&gt;&lt;br /&gt;  %t_vec3  =  add &lt;4xfloat&gt; %t_vec1 ,  %t_vec2   ;&lt;br /&gt;  ret &lt;4xfloat&gt; %t_vec3 ;&lt;br /&gt;}&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The LLVM code generated by the MUDA -&gt; LLVM IR backend is straightforward and somewhat redundant.&lt;br /&gt;&lt;br /&gt;Now let's get optimized native code from the LLVM midend and backend:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;$ llvm-as tmp.ll -f&lt;br /&gt;$ opt -std-compile-opts -f tmp.bc -o tmp.opt.bc&lt;br /&gt;$ llc tmp.opt.bc -f&lt;br /&gt;$ cat tmp.opt.s&lt;br /&gt;&lt;br /&gt; .text&lt;br /&gt; .align 4,0x90&lt;br /&gt; .globl _func&lt;br /&gt;_func:&lt;br /&gt; addps %xmm1, %xmm0&lt;br /&gt; ret&lt;br /&gt;&lt;br /&gt; .subsections_via_symbols&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;This is exactly what I wanted to get in x86 assembly (just one addps 
instruction),&lt;br /&gt;LLVM (and its x86 backend) rocks!</content><link rel='replies' type='application/atom+xml' href='http://mudadev.blogspot.com/feeds/7679988051987177227/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4825121741853908400&amp;postID=7679988051987177227' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4825121741853908400/posts/default/7679988051987177227'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4825121741853908400/posts/default/7679988051987177227'/><link rel='alternate' type='text/html' href='http://mudadev.blogspot.com/2008/02/work-in-progress-muda-llvm-backend.html' title='Work in progress | MUDA -&gt; LLVM backend'/><author><name>syoyo</name><uri>http://www.blogger.com/profile/15167076070732369617</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4825121741853908400.post-2891448259440319408</id><published>2008-02-03T14:04:00.000+09:00</published><updated>2008-02-04T00:26:19.862+09:00</updated><title type='text'>MUDA blog launched.</title><content type='html'>I've decided to launch a separate MUDA blog site.&lt;br /&gt;&lt;br /&gt;I've also almost finished the basic implementation of the MUDA language.&lt;br /&gt;If you want to play with MUDA, go to&lt;br /&gt;&lt;a href="http://lucille.sourceforge.net/muda/"&gt;&lt;br /&gt;http://lucille.sourceforge.net/muda/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;and check out the current svn tree.&lt;br /&gt;&lt;br /&gt;Examples &amp;amp; documentation will 
be updated soon.&lt;br /&gt;&lt;br /&gt;Here are the TODOs:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;LLVM IR backend (near future)&lt;/li&gt;&lt;li&gt;Math library for MUDA (near future)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Automatic optimization (medium term)&lt;/li&gt;&lt;li&gt;Formal verification of computation code by gappa (medium term)&lt;/li&gt;&lt;/ul&gt;</content><link rel='replies' type='application/atom+xml' href='http://mudadev.blogspot.com/feeds/2891448259440319408/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4825121741853908400&amp;postID=2891448259440319408' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4825121741853908400/posts/default/2891448259440319408'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4825121741853908400/posts/default/2891448259440319408'/><link rel='alternate' type='text/html' href='http://mudadev.blogspot.com/2008/02/muda-blog-lanuched.html' title='MUDA blog launched.'/><author><name>syoyo</name><uri>http://www.blogger.com/profile/15167076070732369617</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry></feed>
