Commit message | Author | Age | Files | Lines
...
* Reduce x86 register context boilerplate.Pavel Labath2017-12-185-82/+71
  Summary:
  The x86 FPR struct was defined as a struct containing a union between two members: XSAVE and FXSAVE. This patch makes FPR a union directly, removing one layer of indirection when accessing the members.

  The initial layout of these two structs is identical, which is recognised by the fact that XSAVE has FXSAVE as its first member. We also considered removing one more layer and leaving FPR identical to the XSAVE struct, but stopped short of doing that, as FPR may be used to store different layouts in the future (e.g., ones generated by the FSAVE instruction).

  Reviewers: clayborg, krytarowski
  Subscribers: emaste, lldb-commits
  Differential Revision: https://reviews.llvm.org/D41245
  llvm-svn: 320966
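A minimal sketch of the layout idea described in this commit, modelled with Python's ctypes rather than the actual lldb C++ headers. The field names (`fctrl`, `fstat`, `i387`, `extra`) and the tiny sizes are illustrative assumptions, not the real register layout; the only point is that a union whose larger member starts with the smaller one lets both views alias the same leading bytes.

```python
import ctypes

# Hypothetical, heavily shortened stand-in for the FXSAVE area.
class FXSAVE(ctypes.Structure):
    _fields_ = [("fctrl", ctypes.c_uint16),
                ("fstat", ctypes.c_uint16)]

# Hypothetical stand-in for XSAVE: FXSAVE is its first member,
# followed by extra state (sizes here are made up).
class XSAVE(ctypes.Structure):
    _fields_ = [("i387", FXSAVE),
                ("extra", ctypes.c_uint8 * 8)]

# FPR as a union directly, instead of a struct wrapping a union.
class FPR(ctypes.Union):
    _fields_ = [("fxsave", FXSAVE),
                ("xsave", XSAVE)]

fpr = FPR()
fpr.fxsave.fctrl = 0x037F          # write through the FXSAVE view
aliased = fpr.xsave.i387.fctrl     # read back through the XSAVE view
union_size = ctypes.sizeof(FPR)    # as large as the largest member
```

With the union at the top level, an access is one member name shorter than it would be with a wrapping struct, which is the boilerplate reduction the commit describes.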
* AArch64: work around how Cyclone handles "movi.2d vD, #0".Tim Northover2017-12-188-23/+69
  For Cyclone, the instruction "movi.2d vD, #0" is executed incorrectly in some rare circumstances. Work around the issue conservatively by avoiding the instruction entirely.

  This patch changes CodeGen so that problematic instructions are never generated, and the AsmParser so that an equivalent instruction is used (with a warning).

  llvm-svn: 320965
* [TargetLibraryInfo] Discard library functions with incorrectly sized integersIgor Laevsky2017-12-182-3/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D41184 llvm-svn: 320964
* [ARM] Adjust test checksSam Parker2017-12-181-2/+2
| | | | | | Correct the CHECK-LABELS of a couple of dag combine tests. llvm-svn: 320963
* [DAGCombine] Move AND nodes to multiple load leavesSam Parker2017-12-182-378/+472
  Search from AND nodes to find whether they can be propagated back to loads, so that the AND and load can be combined into a narrow load. We search through OR, XOR and other AND nodes, and all bar one of the leaves are required to be loads or constants. The exception node then needs to be masked off, meaning that the 'and' isn't removed, but the load(s) are still narrowed.

  Differential Revision: https://reviews.llvm.org/D41177
  llvm-svn: 320962
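The arithmetic behind this combine can be checked with a small sketch. It is not the DAGCombiner code; it only assumes little-endian, zero-extending loads and a mask that covers the low byte, and the helper `load`, the buffer `mem` and the value `y` are made up for illustration.

```python
def load(mem, addr, width):
    """Zero-extending little-endian load of `width` bytes from `mem`."""
    return int.from_bytes(mem[addr:addr + width], "little")

mem = bytes([0x11, 0x22, 0x33, 0x44, 0xAA, 0xBB, 0xCC, 0xDD])
MASK = 0xFF          # the AND constant; selects the low byte
y = 0x12345678       # a non-load leaf (the single permitted exception)

# All leaves are loads: the AND disappears and both loads become 1-byte loads.
wide_all_loads = (load(mem, 0, 4) | load(mem, 4, 4)) & MASK
narrow_all_loads = load(mem, 0, 1) | load(mem, 4, 1)

# One leaf is not a load: it keeps an explicit mask, but the load still narrows.
wide_mixed = (load(mem, 0, 4) ^ y) & MASK
narrow_mixed = load(mem, 0, 1) ^ (y & MASK)
```

Both pairs compute the same value because AND distributes over OR and XOR, which is what allows the mask to be pushed down to the leaves.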
* NPL: Clean up handling of inferior exitPavel Labath2017-12-183-47/+31
  Summary:
  lldb-server was sending the "exit" packet (W??) twice. This happened because it was handling both the pre-exit (PTRACE_EVENT_EXIT) and post-exit (WIFEXITED) events as exit events. We had some code which tried to detect when we had already sent the exit packet, but it stopped working quite a while ago.

  This never really caused any problems in practice because the client automatically closes the connection after receiving the first packet, so the only effect was some warning messages about extra packets from the lldb-server test suite, which were ignored because they didn't fail the test.

  The new test suite will be stricter about this, so I fix this issue by ignoring the first event. I think this is the correct behavior, as the inferior is not really dead at that point, so it's premature to send the exit packet.

  There isn't an actual test yet which would verify the exit behavior, but in my next patch I will add a test which will also test this functionality.

  Reviewers: eugene
  Subscribers: lldb-commits
  Differential Revision: https://reviews.llvm.org/D41069
  llvm-svn: 320961
* [NFC][CodeGen][ExpandMemCmp] Fix documentation.Clement Courbet2017-12-181-3/+2
| | | | llvm-svn: 320960
* [X86] Use mattr instead of mcpu in some of the cost model tests.Craig Topper2017-12-184-39/+33
  Based on the names of the check lines, features seem more appropriate than cpu.

  Spotted while prototyping my patch to make 512-bit vectors illegal on SKX sometimes.

  llvm-svn: 320959
* [SROA] Disable non-whole-alloca splits by defaultHiroshi Inoue2017-12-184-60/+23
  This patch introduces a switch to control splitting of non-whole-alloca slices, off by default. The switch will be on by default again after fixing an issue reported in PR35657.

  llvm-svn: 320958
* [X86] Fix mistake that I made when splitting up the setOperationAction calls recently.Craig Topper2017-12-181-2/+2
  The block into which I moved things that need BWI and 512-bit or VLX is incorrectly qualified with just hasBWI || hasVLX. Here I've qualified it with hasBWI && (hasAVX512 || hasVLX), where the hasAVX512 will be replaced with allowing 512-bit vectors in an upcoming patch.

  llvm-svn: 320957
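To see how the two guards in this commit differ, the sketch below enumerates them over plain booleans. The function names `old_guard` and `new_guard` are made up, and the inputs are not real subtarget queries; no claim is made about which feature combinations an actual X86 subtarget can report.

```python
from itertools import product

def old_guard(has_bwi, has_avx512, has_vlx):
    # The condition the commit calls incorrect.
    return has_bwi or has_vlx

def new_guard(has_bwi, has_avx512, has_vlx):
    # The condition the commit introduces.
    return has_bwi and (has_avx512 or has_vlx)

# Every (has_bwi, has_avx512, has_vlx) combination where the guards disagree.
differing = [combo for combo in product([False, True], repeat=3)
             if old_guard(*combo) != new_guard(*combo)]

# The new guard never fires where the old one did not: it is strictly tighter.
only_tightens = all(old_guard(*c) for c in product([False, True], repeat=3)
                    if new_guard(*c))
```

In this model the disagreements are exactly the cases with VLX but no BWI, plus BWI with neither of the other two flags.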
* [CGP] Fix the handling of select inst in complex addressing modeSerguei Katkov2017-12-182-2/+23
  When we put the value in the select placeholder, we must pass the value through the simplification tracker, because the value might already have been simplified and erased.

  This is a fix for PR35658.

  Reviewers: john.brawn, uabelho
  Reviewed By: john.brawn
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41251
  llvm-svn: 320956
* [x86] add tests for finite libcall lowering (PR35672); NFCSanjay Patel2017-12-181-0/+52
| | | | llvm-svn: 320955
* Refactor overridden methods iteration to avoid double lookups.Benjamin Kramer2017-12-1714-88/+52
| | | | | | Convert most uses to range-for loops. No functionality change intended. llvm-svn: 320954
* Re-commit "Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes()"Bjorn Steinbrink2017-12-173-5/+24
  llvm-clang-x86_64-expensive-checks-win is still broken, so the failure seems unrelated.

  llvm-svn: 320953
* [testsuite] Un-XFAIL the global variables tests.Davide Italiano2017-12-171-6/+0
| | | | | | | | <rdar://problem/28725399> Differential Revision: https://reviews.llvm.org/D41312 llvm-svn: 320952
* [X86] Add test cases that show cases where buildvector of extract and inserts should be turned into fmsubadd.Craig Topper2017-12-171-0/+676
  This is a follow up to the fmaddsub support added in r320950. Hopefully in the future we can fix lowering to handle this fmsubadd too.

  llvm-svn: 320951
* [X86] Make the code that creates fmaddsub from build_vector of extracts and inserts functional and add tests.Craig Topper2017-12-172-6/+284
  Summary:
  We had no tests for this and we couldn't do the optimization because of a bad use count check. We need to know how many non-undef pieces of the build vector were filled in and ensure our use count is equal to that. But on the shuffle combine version we need the use count to be 2.

  The missing coverage was noticed during the review of D40335.

  Reviewers: RKSimon, zvi, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41133
  llvm-svn: 320950
* [X86] Regenerate truncated rotation tests + add missing 32-bit checksSimon Pilgrim2017-12-171-8/+41
| | | | llvm-svn: 320949
* [WebAssembly] Move code for copying of data segment relocation. NFC.Sam Clegg2017-12-172-8/+15
  This is a preparatory change for function gc, which also requires relocations to be copied in ranges like this.

  Differential Revision: https://reviews.llvm.org/D41313
  llvm-svn: 320948
* use uint32_tSam Clegg2017-12-171-2/+2
| | | | llvm-svn: 320947
* [WebAssembly] Export some more info on wasm functionsSam Clegg2017-12-173-6/+9
| | | | | | | | | | | | | Summary: These fields are useful for lld's gc-sections support Also remove an unused field. Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish Differential Revision: https://reviews.llvm.org/D41320 llvm-svn: 320946
* Revert "Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes()"Bjorn Steinbrink2017-12-173-24/+5
  This reverts commit 217067d5179882de9deb60d2e866befea4c126e7.

  Fails on llvm-clang-x86_64-expensive-checks-win

  llvm-svn: 320945
* Revert "Treat sret arguments as being dereferenceable in getPointerDereferenceableBytes()"Bjorn Steinbrink2017-12-171-3/+2
  This reverts commit 8b7a7660a3904b2088bc594311bcea2c651def08.

  I didn't mean to commit this.

  llvm-svn: 320944
* Treat sret arguments as being dereferenceable in getPointerDereferenceableBytes()Bjorn Steinbrink2017-12-171-2/+3
  llvm-svn: 320943
* [ASTImporter] Support importing FunctionTemplateDecl and CXXDependentScopeMemberExprAleksei Sidorin2017-12-172-0/+155
  * Also introduces ImportTemplateArgumentListInfo facility (A. Sidorin)

  Patch by Peter Szecsi!

  Differential Revision: https://reviews.llvm.org/D38692
  llvm-svn: 320942
* Remove superfluous break after a return. NFCI.Simon Pilgrim2017-12-171-1/+0
| | | | llvm-svn: 320941
* [X86DomainReassignment] Store legal domains in a std::bitset instead of using a SmallVector that really only ever has one element as a set.Craig Topper2017-12-171-16/+18
  llvm-svn: 320940
* Properly handle byval arguments in getPointerDereferenceableBytes()Bjorn Steinbrink2017-12-172-4/+18
| | | | | | | | | | | | | | Summary: For byval arguments, the number of dereferenceable bytes is equal to the size of the pointee, not the pointer. Reviewers: hfinkel, rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41305 llvm-svn: 320939
* Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes()Bjorn Steinbrink2017-12-173-5/+24
  Reviewers: hfinkel, rnk
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41288
  llvm-svn: 320938
* [X86] Use extract_vector_elt instead of X86ISD::VEXTRACT for isel of vXi1 extractions.Craig Topper2017-12-175-18/+19
  llvm-svn: 320937
* [X86] Canonicalize extract_vector_elt from vXi1 to always return MVT::i32.Craig Topper2017-12-175-3692/+3693
| | | | | | This allows us to remove some isel patterns that allowed MVT::i8 result type. llvm-svn: 320936
* [X86] Don't create X86ISD::VEXTRACT nodes directly. Use EXTRACT_VECTOR_ELT and allow that to be legalized to VEXTRACT.Craig Topper2017-12-171-5/+6
  I think we can remove the VEXTRACT node completely and use a canonicalized EXTRACT_VECTOR_ELT instead. This is a first step.

  llvm-svn: 320935
* Fix unused variable warning.Simon Pilgrim2017-12-161-3/+2
| | | | llvm-svn: 320934
* [X86][AVX] lowerVectorShuffleAsBroadcast - aggressively peek through BITCASTsSimon Pilgrim2017-12-163-115/+88
| | | | | | | | Assuming we can safely adjust the broadcast index for the new type to keep it suitably aligned, then peek through BITCASTs when looking for the broadcast source. Fixes PR32007 llvm-svn: 320933
* [X86][AVX] Use extract128BitVector helper. NFCI.Simon Pilgrim2017-12-161-4/+1
| | | | llvm-svn: 320932
* [sanitizer] Define __sanitizer_clockid_t on FreeBSDKostya Kortchinsky2017-12-161-1/+1
| | | | | | | | | | | | | | | | | Summary: https://reviews.llvm.org/D41121 broke the FreeBSD build due to that type not being defined on FreeBSD. As far as I can tell, it is an int, but I do not have a way to test the change. Reviewers: alekseyshl, kparzysz Reviewed By: kparzysz Subscribers: kparzysz, emaste, kubamracek, krytarowski, #sanitizers, llvm-commits Differential Revision: https://reviews.llvm.org/D41325 llvm-svn: 320931
* [X86][AVX] Fix failed broadcast foldSimon Pilgrim2017-12-162-19/+11
| | | | | | Strip excess BITCASTs from EXTRACT_SUBVECTOR input llvm-svn: 320930
* [Memcpy Loop Lowering] Only calculate residual size/bytes copied when needed.Sean Fertile2017-12-162-14/+14
  If the loop operand type is int8 then there will be no residual loop for the unknown size expansion. Don't create the residual-size and bytes-copied values when they are not needed.

  llvm-svn: 320929
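A rough model of the expansion this commit refers to, written as ordinary Python rather than the IR the pass emits. The function name `lowered_memcpy` and its return value are made up for illustration; the assumption is simply that the main loop copies `op_size` bytes per iteration and a byte-wise residual loop handles the remainder.

```python
def lowered_memcpy(dst, src, size, op_size):
    """Copy `size` bytes in `op_size` chunks; return the residual byte count."""
    loop_count = size // op_size
    for i in range(loop_count):
        lo = i * op_size
        dst[lo:lo + op_size] = src[lo:lo + op_size]

    residual_copied = 0
    if op_size != 1:
        # Only needed when the chunk is wider than one byte: with 1-byte
        # chunks the main loop has already copied everything.
        bytes_copied = loop_count * op_size
        residual_size = size - bytes_copied
        for i in range(bytes_copied, bytes_copied + residual_size):
            dst[i] = src[i]
            residual_copied += 1
    return residual_copied

src = bytes(range(11))
dst_wide = bytearray(11)
residual_wide = lowered_memcpy(dst_wide, src, 11, 4)   # 2 chunks + 3 residual bytes
dst_byte = bytearray(11)
residual_byte = lowered_memcpy(dst_byte, src, 11, 1)   # no residual work at all
```

In the byte-sized case neither `bytes_copied` nor `residual_size` is ever computed, which mirrors the values the commit stops creating.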
* [X86] Don't pass a zero input to the passthru operand of getVectorMaskingNode/getScalarMaskingNode when it's going to emit an ISD::OR/ISD::AND. NFCICraig Topper2017-12-161-9/+6
  In those cases, the pass thru operand of the methods isn't used. The calls to the scalar version were passing a MVT::i1 zero, which is an illegal type at the stage this code runs.

  llvm-svn: 320928
* [X86] Have getVectorMaskingNode return an ISD::AND for X86ISD::VPSHUFBITQMB instead of creating a select with one input being 0.Craig Topper2017-12-161-0/+1
  llvm-svn: 320927
* [X86] When using vpopcntdq for ctpop of v8i16 vectors, only promote to v8i32.Craig Topper2017-12-163-26/+28
| | | | | | Previously we promoted to v8i64, but we don't need to go all the way to 512-bits. If we have VLX we can use the 256-bit instruction. And even if we don't have VLX we can widen v8i32 to v16i32 and drop the upper half. llvm-svn: 320926
* [libcxx] Add WebAssembly supportSam Clegg2017-12-162-1/+3
| | | | | | | | | | | | It turns out that this is the only change required in libcxx for it to compile with the new `wasm32-unknown-unknown-wasm` target recently added to Clang. Patch by Nicholas Wilson! Differential Revision: https://reviews.llvm.org/D41073 llvm-svn: 320925
* [X86] Combine some more scheduler model entries using regular expressions.Craig Topper2017-12-164-200/+100
| | | | | | We had a lot of separate 32 and 64 instructions that had the same scheduling data. This merges them into the same regular expression. This is pretty consistent with a lot of other instructions. llvm-svn: 320924
* [X86] Use instrs instead of instregex for gather/scatter instructions in the scheduler models. Combine into single InstrRW entries.Craig Topper2017-12-164-130/+116
  This reduces the number of scheduler groups in subtarget info.

  llvm-svn: 320923
* [InstCombine] Regenerate FMUL/FMA combine tests with update_test_checks.pySimon Pilgrim2017-12-162-100/+186
| | | | llvm-svn: 320922
* [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+selSanjay Patel2017-12-162-12/+32
  We want to do this for 2 reasons:
  1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
  2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.

  More detail about what happens in the backend:
  1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs into the shift variant. That is the opposite of this IR canonicalization.
  2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
  3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2 into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node when that's legal/custom.
  4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a variety of ways.
     a. For #2, the vector path is missing the case for setlt with a '1' constant.
     b. For #3, we are missing a match for commuted versions of the shift variants.
  5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the shift sequence when not.
  6. In the following examples with this patch applied, we may get conditional moves rather than the shift produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.

  define i32 @abs_shifty(i32 %x) {
    %signbit = ashr i32 %x, 31
    %add = add i32 %signbit, %x
    %abs = xor i32 %signbit, %add
    ret i32 %abs
  }

  define i32 @abs_cmpsubsel(i32 %x) {
    %cmp = icmp slt i32 %x, zeroinitializer
    %sub = sub i32 zeroinitializer, %x
    %abs = select i1 %cmp, i32 %sub, i32 %x
    ret i32 %abs
  }

  define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
    %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
    %add = add <4 x i32> %signbit, %x
    %abs = xor <4 x i32> %signbit, %add
    ret <4 x i32> %abs
  }

  define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
    %cmp = icmp slt <4 x i32> %x, zeroinitializer
    %sub = sub <4 x i32> zeroinitializer, %x
    %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
    ret <4 x i32> %abs
  }

  > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx
  > abs_shifty:
  >   movl %edi, %eax
  >   negl %eax
  >   cmovll %edi, %eax
  >   retq
  >
  > abs_cmpsubsel:
  >   movl %edi, %eax
  >   negl %eax
  >   cmovll %edi, %eax
  >   retq
  >
  > abs_shifty_vec:
  >   vpabsd %xmm0, %xmm0
  >   retq
  >
  > abs_cmpsubsel_vec:
  >   vpabsd %xmm0, %xmm0
  >   retq
  >
  > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
  > abs_shifty:
  >   cmp w0, #0 // =0
  >   cneg w0, w0, mi
  >   ret
  >
  > abs_cmpsubsel:
  >   cmp w0, #0 // =0
  >   cneg w0, w0, mi
  >   ret
  >
  > abs_shifty_vec:
  >   abs v0.4s, v0.4s
  >   ret
  >
  > abs_cmpsubsel_vec:
  >   abs v0.4s, v0.4s
  >   ret
  >
  > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le
  > abs_shifty:
  >   srawi 4, 3, 31
  >   add 3, 3, 4
  >   xor 3, 3, 4
  >   blr
  >
  > abs_cmpsubsel:
  >   srawi 4, 3, 31
  >   add 3, 3, 4
  >   xor 3, 3, 4
  >   blr
  >
  > abs_shifty_vec:
  >   vspltisw 3, -16
  >   vspltisw 4, 15
  >   vsubuwm 3, 4, 3
  >   vsraw 3, 2, 3
  >   vadduwm 2, 2, 3
  >   xxlxor 34, 34, 35
  >   blr
  >
  > abs_cmpsubsel_vec:
  >   vspltisw 3, -16
  >   vspltisw 4, 15
  >   vsubuwm 3, 4, 3
  >   vsraw 3, 2, 3
  >   vadduwm 2, 2, 3
  >   xxlxor 34, 34, 35
  >   blr
  >

  Differential Revision: https://reviews.llvm.org/D40984
  llvm-svn: 320921
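The two scalar forms of abs in this commit message compute the same value for every 32-bit input, which can be spot-checked with a small model. This is an illustrative sketch, not part of the patch: Python integers are unbounded, so the helper `to_signed32` (a made-up name) wraps results to emulate i32 arithmetic.

```python
def to_signed32(v):
    """Wrap an unbounded Python int to a signed 32-bit value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def abs_shifty(x):
    # ashr + add + xor, mirroring @abs_shifty.
    signbit = x >> 31                    # arithmetic shift: 0 or -1 for i32 inputs
    add = to_signed32(signbit + x)
    return to_signed32(signbit ^ add)

def abs_cmpsubsel(x):
    # cmp + neg + select, mirroring @abs_cmpsubsel.
    sub = to_signed32(0 - x)
    return sub if x < 0 else x

INT_MIN, INT_MAX = -(1 << 31), (1 << 31) - 1
samples = [INT_MIN, INT_MIN + 1, -5, -1, 0, 1, 5, INT_MAX]
results = [(abs_shifty(x), abs_cmpsubsel(x)) for x in samples]
```

Both forms also agree on INT_MIN, where the result wraps back to INT_MIN, so the canonicalization does not change behaviour on that edge case in this model.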
* [Driver, CodeGen] pass through and apply -fassociative-mathSanjay Patel2017-12-167-13/+62
  There are 2 parts to getting the -fassociative-math command-line flag translated to LLVM FMF:

  1. In the driver/frontend, we accept the flag and its 'no' inverse and deal with the interactions with other flags like -ffast-math -fno-signed-zeros -fno-trapping-math. This was mostly already done - we just need to translate the flag as a codegen option. The test file is complicated because there are many potential combinations of flags here. Note that we are matching gcc's behavior that requires 'nsz' and no-trapping-math.
  2. In codegen, we map the codegen option to FMF in the IR builder. This is simple code and corresponding test.

  For the motivating example from PR27372:

  float foo(float a, float x) { return ((a + x) - x); }

  $ ./clang -O2 27372.c -S -o - -ffast-math -fno-associative-math -emit-llvm | egrep 'fadd|fsub'
    %add = fadd nnan ninf nsz arcp contract float %0, %1
    %sub = fsub nnan ninf nsz arcp contract float %add, %2

  So 'reassoc' is off as expected (and so is the new 'afn' but that's a different patch). This case now works as expected end-to-end although the underlying logic is still wrong:

  $ ./clang -O2 27372.c -S -o - -ffast-math -fno-associative-math | grep xmm
    addss %xmm1, %xmm0
    subss %xmm1, %xmm0

  We're not done because the case where 'reassoc' is set is ignored by optimizer passes. Example:

  $ ./clang -O2 27372.c -S -o - -fassociative-math -fno-signed-zeros -fno-trapping-math -emit-llvm | grep fadd
    %add = fadd reassoc float %0, %1

  $ ./clang -O2 27372.c -S -o - -fassociative-math -fno-signed-zeros -fno-trapping-math | grep xmm
    addss %xmm1, %xmm0
    subss %xmm1, %xmm0

  Differential Revision: https://reviews.llvm.org/D39812
  llvm-svn: 320920
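Why reassociation has to be opt-in can be seen from the motivating expression itself. The sketch below is only an illustration: it uses Python floats, which are IEEE doubles rather than the C `float` in the example, and the inputs are chosen by hand so that rounding is visible.

```python
def foo_as_written(a, x):
    # Evaluated left to right, as the source spells it.
    return (a + x) - x

def foo_reassociated(a, x):
    # What a compiler may produce once reassociation is permitted:
    # (a + x) - x  ->  a + (x - x)  ->  a
    return a + (x - x)

a, x = 1.0, 1e17       # a is far below the rounding granularity of x
strict = foo_as_written(a, x)
relaxed = foo_reassociated(a, x)
```

The two results differ because `a + x` rounds back to `x`, so the strict form loses `a` entirely while the reassociated form returns it. A flag such as 'reassoc' permits this kind of rewrite; it does not oblige any pass to perform it.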
* [X86] Implement kand/kandn/kor/kxor/kxnor/knot intrinsics using native IR.Craig Topper2017-12-162-16/+74
| | | | llvm-svn: 320919
* [X86] Remove GCCBuiltin from kand/kandn/kor/kxor/kxnor/knot intrinsics so clang can implement with native IR.Craig Topper2017-12-161-6/+6
  llvm-svn: 320918
* [X86] Remove unneeded code for handling the old kunpck intrinsics.Craig Topper2017-12-162-13/+1
| | | | llvm-svn: 320917