path: root/llvm/test/CodeGen
* [CodeGen] Fix assert in SelectionDAG::computeKnownBits
  Scott Linder, 2018-08-13 (1 file, -0/+30)
  Fix SelectionDAG::computeKnownBits asserting while handling EXTRACT_SUBVECTOR:
  it zero-extended the demanded-elements mask even when the mask was already as
  long as the source vector.
  Differential Revision: https://reviews.llvm.org/D49574
  llvm-svn: 339600
* Revert "[Sparc] Add support for the cycle counter available in GR740"Daniel Cederman2018-08-131-11/+0
| | | | | | | It breaks when using EXPENSIVE_CHECKS with the error message "Bad machine code: Using an undefined physical register". llvm-svn: 339570
* [X86] Add tests showing missing div/rem 0, X -> 0 combines
  Simon Pilgrim, 2018-08-13 (4 files, -0/+270)
  llvm-svn: 339562
* [CGP] Fix GEP issue with out of range APInt constant values not fitting in int64_t
  Simon Pilgrim, 2018-08-13 (1 file, -0/+11)
  Test case reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7173
  llvm-svn: 339556
* [Sparc] Add support for the cycle counter available in GR740
  Daniel Cederman, 2018-08-13 (1 file, -0/+11)
  Summary: The GR740 provides an up-counting cycle counter in the registers
  ASR22 and ASR23. As these registers cannot be read together atomically, we use
  only the value of ASR23 for llvm.readcyclecounter(). The ASR23 register holds
  the 32 LSBs of the up-counter.
  Reviewers: jyknight, venkatra
  Reviewed By: jyknight
  Subscribers: fedor.sergeev, jrtc27, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48638
  llvm-svn: 339551
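  For reference, the intrinsic being lowered is llvm.readcyclecounter(); a
  minimal IR sketch of its use (the function name is illustrative):
  ```
  declare i64 @llvm.readcyclecounter()

  define i64 @read_cycles() {
    ; With this change, GR740 lowers the call to a read of ASR23;
    ; only the low 32 bits of the result carry the up-counter value.
    %t = call i64 @llvm.readcyclecounter()
    ret i64 %t
  }
  ```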
* [ARM] Added FP16 VREV Vector Intrinsic CodeGen support
  Luke Geeson, 2018-08-13 (1 file, -13/+11)
  llvm-svn: 339546
* [SelectionDAG] In PromoteFloatOp_BITCAST, insert a bitcast after the fp_to_fp16 in case the result type isn't a scalar integer
  Craig Topper, 2018-08-13 (1 file, -12/+47)
  This is another variation of PR38533. In this case, the result type of the
  bitcast is legal and 16-bits wide, but not a scalar integer. So we need to
  emit the convert to i16 and then bitcast it to the true result type. This new
  bitcast will be further type legalized if necessary.
  llvm-svn: 339536
* [SelectionDAG] In PromoteIntRes_BITCAST, when the input is TypePromoteFloat, make sure the output type is scalar. For vectors, use a store and load of a temporary.
  Craig Topper, 2018-08-13 (1 file, -0/+19)
  Previously if the result type was a vector, we emitted a FP_TO_FP16 with a
  vector result type, which isn't valid. This is basically the opposite case of
  the root cause of PR38533.
  llvm-svn: 339535
* Restore correct x86_64 EH encodings in kernel code model
  Lei Liu, 2018-08-13 (1 file, -0/+79)
  Fixes PR37524. The exception handling encodings for x86_64 in the kernel code
  model were changed by r309884. Restore them to the correct ones. These
  encodings include PersonalityEncoding, LSDAEncoding and TTypeEncoding.
  Differential Revision: https://reviews.llvm.org/D50490
  llvm-svn: 339534
* [SelectionDAG] In PromoteFloatRes_BITCAST, insert a bitcast before the fp16_to_fp in case the input type isn't an i16
  Craig Topper, 2018-08-13 (1 file, -0/+11)
  The bitcast can be further legalized as needed. Fixes PR38533.
  llvm-svn: 339533
* AMDGPU: Cleanup min/max legacy tests
  Matt Arsenault, 2018-08-12 (4 files, -124/+664)
  Also add some more tests in preparation for a future patch.
  llvm-svn: 339526
* DAG: Check no-signed-zeros instead of unsafe-fp-math
  Matt Arsenault, 2018-08-12 (4 files, -10/+4)
  Addresses a FIXME, although this should still be checking individual operand
  flags.
  llvm-svn: 339525
* AMDGPU: Check NSZ MI flag when folding omod
  Matt Arsenault, 2018-08-12 (1 file, -0/+71)
  I'm not sure of the exact nsz flag combination that is OK; I think it is OK as
  long as the flag is on either instruction. For now, just check it on the omod
  multiply.
  llvm-svn: 339513
* AMDGPU: Use splat vectors for undefs when folding canonicalize
  Matt Arsenault, 2018-08-12 (2 files, -8/+91)
  If one of the elements is undef, use the canonicalized constant from the other
  element instead of 0. Splat vectors are more useful for other optimizations,
  such as matching vector clamps. This was breaking on clamps of half3 from the
  undef 4th component.
  llvm-svn: 339512
* AMDGPU: Fix packing undef parts of build_vector
  Matt Arsenault, 2018-08-12 (3 files, -4/+386)
  llvm-svn: 339511
* [X86] Change the MOV32ri64 pseudo instruction to def a GR64 directly instead of wrapping it in a SUBREG_TO_REG
  Craig Topper, 2018-08-11 (9 files, -34/+31)
  Now we switch to the subregister in expandPostRAPseudos, where we already
  switched the opcode. This simplifies a few isel patterns that used the pseudo
  directly. And magically seems to have improved our ability to CSE it in the
  undef-label.ll test.
  llvm-svn: 339496
* AMDGPU/GlobalISel: Define instruction mapping for G_INSERT
  Tom Stellard, 2018-08-11 (1 file, -0/+83)
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls,
  dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49625
  llvm-svn: 339491
* [WebAssembly] Added default stack-only instruction mode for MC.
  Wouter van Oortmerssen, 2018-08-10 (81 files, -107/+303)
  Summary:
  - Moved the Explicit Locals pass to last and made it obligatory.
  - Made it convert from register-based to stack-based instructions, and
    removed the registers.
  - Fixed related code that was expecting register-based instructions.
  - Added the correct testing flag to all tests, depending on which format they
    were expecting so far.
  - Translated one test to stack format as an example: reg-stackify-stack.ll
  Tested: llvm-lit -v `find test -name WebAssembly`, unittests/MC/*
  Reviewers: dschuff, sunfish
  Subscribers: jfb, llvm-commits, aheejin, eraman, jgravelle-google, sbc100
  Differential Revision: https://reviews.llvm.org/D50568
  llvm-svn: 339474
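  To illustrate the entry above, the same IR maps to two MC forms; the exact
  syntax in the comments is indicative only (sketched from memory, not taken
  from the tests):
  ```
  define i32 @add(i32 %a, i32 %b) {
    ; Register-based MC form (old):       i32.add $2=, $0, $1
    ; Stack-based MC form (new default):
    ;   get_local 0
    ;   get_local 1
    ;   i32.add
    %r = add i32 %a, %b
    ret i32 %r
  }
  ```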
* [ARM] Adjust AND immediates to make them cheaper to select.
  Eli Friedman, 2018-08-10 (7 files, -28/+87)
  LLVM normally prefers to minimize the number of bits set in an AND immediate,
  but that doesn't always match the available ARM instructions. In Thumb1 mode,
  prefer uxtb or uxth where possible; otherwise, prefer a two-instruction
  sequence movs+ands or movs+bics.
  Some potential improvements are outlined in
  ARMTargetLowering::targetShrinkDemandedConstant, but it seems to work pretty
  well already.
  The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX instruction
  due to a larger-than-expected mask. (It's orthogonal, in some sense, but as
  far as I can tell it's either impossible or nearly impossible to reproduce
  the bug without this change.)
  According to my testing, this seems to consistently improve codesize by a
  small amount by forming bic more often for ISD::AND with an immediate.
  Differential Revision: https://reviews.llvm.org/D50030
  llvm-svn: 339472
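  A rough sketch of the selections the entry above prefers (function names and
  register choices are illustrative):
  ```
  ; Masking with 0xff: prefer a single uxtb over materializing the mask.
  define i32 @mask_low_byte(i32 %x) {
    %r = and i32 %x, 255        ; Thumb1: uxtb r0, r0
    ret i32 %r
  }

  ; Clearing the low byte (mask 0xffffff00): a movs+bics pair only needs the
  ; inverted mask, which fits in an 8-bit immediate.
  define i32 @clear_low_byte(i32 %x) {
    %r = and i32 %x, -256       ; Thumb1: movs r1, #255 ; bics r0, r1
    ret i32 %r
  }
  ```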
* AMDGPU: More canonicalized operations
  Matt Arsenault, 2018-08-10 (1 file, -0/+22)
  llvm-svn: 339464
* AMDGPU: Combine and of seto/setuo and fp_class
  Matt Arsenault, 2018-08-10 (2 files, -6/+81)
  Clear the nan (or non-nan) test bits from the mask.
  llvm-svn: 339462
* AMDGPU: Match isfinite pattern to class instructions
  Matt Arsenault, 2018-08-10 (1 file, -4/+22)
  llvm-svn: 339460
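  For context, one common form of the isfinite idiom in IR is an ordered
  less-than compare of |x| against +infinity; the sketch below assumes this is
  the shape of pattern that now maps to a class instruction:
  ```
  declare float @llvm.fabs.f32(float)

  define i1 @is_finite(float %x) {
    ; |x| < +inf, ordered: false for NaN and infinities, true otherwise.
    ; On AMDGPU this kind of check can be matched to v_cmp_class with a
    ; class mask covering the finite value classes.
    %abs = call float @llvm.fabs.f32(float %x)
    %r = fcmp olt float %abs, 0x7FF0000000000000
    ret i1 %r
  }
  ```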
* [ARM] Disallow zexts in ARMCodeGenPrepare
  Sam Parker, 2018-08-10 (4 files, -281/+343)
  Enabling ARMCodeGenPrepare by default caused a whole load of failures. This
  is due to zexts and truncs not being handled properly. ZExts are messy, so
  it's just easier to disable them for now; truncs are allowed only as 'sinks'.
  I still need to figure out why allowing them as 'sources' causes so many
  failures.
  The other main changes are that we are now explicit about the types we are
  converting to - it's now always 'TypeSize' - and that type support is now
  checked while checking for valid opcodes, as it was unnecessarily complicated
  to have the checks at different stages.
  I've moved the tests around too, so that the zexts and truncs are in their
  own file, as well as the overflowing opcode tests.
  Differential Revision: https://reviews.llvm.org/D50518
  llvm-svn: 339432
* Rename the cfguard module flag to cfguardtable
  Hans Wennborg, 2018-08-10 (1 file, -1/+1)
  The previous name sounds like it inserts the cfguard implementation, but it
  really just emits the table of address-taken functions. Change the name to
  better reflect that. Clang will be updated in the next commit.
  llvm-svn: 339419
* [WebAssembly] Gate i64x2 and f64x2 on -wasm-enable-unimplemented
  Heejin Ahn, 2018-08-09 (1 file, -2/+11)
  Summary: i64x2 and f64x2 operations are not implemented in V8, so we normally
  do not want to emit them. However, they are in the SIMD spec proposal, so we
  still want to be able to test them in the toolchain. This patch adds a flag
  to enable their emission.
  Reviewers: aheejin, dschuff
  Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits
  Differential Revision: https://reviews.llvm.org/D50423
  Patch by Thomas Lively (tlively)
  llvm-svn: 339407
* [X86] Qualify one of the heuristics in combineMul to only apply to positive multiply amounts
  Craig Topper, 2018-08-09 (1 file, -2/+2)
  This seems to slightly help the performance of one of our internal
  benchmarks. We probably need better heuristics here.
  llvm-svn: 339406
* [Hexagon] Map ISD::TRAP to J2_trap0(#0)
  Krzysztof Parzyszek, 2018-08-09 (1 file, -1/+1)
  llvm-svn: 339365
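  For context, ISD::TRAP is what the llvm.trap intrinsic becomes in the DAG; a
  minimal sketch (function name illustrative):
  ```
  declare void @llvm.trap()

  define void @die() {
    ; On Hexagon this call now selects to the trap0(#0) instruction.
    call void @llvm.trap()
    unreachable
  }
  ```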
* [SelectionDAG] try harder to convert funnel shift to rotate
  Sanjay Patel, 2018-08-09 (2 files, -7/+3)
  Similar to rL337966 - if the DAGCombiner's rotate matching was working as
  expected, I don't think we'd see any test diffs here. AArch only goes right,
  and PPC only goes left. x86 has both, so no diffs there.
  Differential Revision: https://reviews.llvm.org/D50091
  llvm-svn: 339359
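  A funnel shift whose two value operands are the same is a rotate, which is
  the shape this combine targets; a sketch:
  ```
  declare i32 @llvm.fshl.i32(i32, i32, i32)

  define i32 @rotl(i32 %x, i32 %n) {
    ; fshl(x, x, n) == rotl(x, n); targets with only a rotate-right (or only a
    ; rotate-left) instruction still want this recognized as a rotate.
    %r = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %n)
    ret i32 %r
  }
  ```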
* extend folding fsub/fadd to fneg for FMF
  Michael Berg, 2018-08-09 (1 file, -72/+24)
  Summary: This change provides a common optimization path for both Unsafe and
  FMF-driven optimization for this fsub fold, adding reassociation, as it is
  the flag that most closely represents the translation.
  Reviewers: spatel, wristow, arsenm
  Reviewed By: spatel
  Subscribers: wdng
  Differential Revision: https://reviews.llvm.org/D50195
  llvm-svn: 339357
* [ARM] Replace processor check with feature
  Evandro Menezes, 2018-08-09 (1 file, -10/+13)
  Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a
  processor check. Otherwise, NFC.
  llvm-svn: 339354
* [ARM] FP16: codegen support for VTRN
  Sjoerd Meijer, 2018-08-09 (1 file, -19/+23)
  Differential Revision: https://reviews.llvm.org/D50454
  llvm-svn: 339340
* [X86][SSE] Remove PMULDQ/PMULUDQ by zero
  Simon Pilgrim, 2018-08-09 (3 files, -69/+42)
  Exposed by D50328
  Differential Revision: https://reviews.llvm.org/D50328
  llvm-svn: 339337
* [X86][SSE] Combine (some) target shuffles with multiple uses
  Simon Pilgrim, 2018-08-09 (27 files, -1067/+948)
  As discussed on D41794, we have many cases where we fail to combine shuffles
  as the input operands have other uses.
  This patch permits these shuffles to be combined as long as they don't
  introduce additional variable shuffle masks, which should reduce instruction
  dependencies and allow the total number of shuffles to still drop without
  increasing the constant pool. However, this may mean that some memory folds
  may no longer occur, and on pre-AVX require the occasional extra register
  move.
  This also exposes some poor PMULDQ/PMULUDQ codegen which was doing
  unnecessary upper/lower calculations which will in fact fold to zero/undef -
  the fix will be added in a followup commit.
  Differential Revision: https://reviews.llvm.org/D50328
  llvm-svn: 339335
* [NVPTX] Select atomic loads and stores
  Jonas Hahnfeld, 2018-08-09 (1 file, -0/+88)
  According to the PTX ISA, .volatile has the same memory synchronization
  semantics as .relaxed.sys, so it can be used to implement monotonic atomic
  loads and stores. This is important for OpenMP's atomic construct, where
  - 'read's and 'write's are lowered to atomic loads and stores, and
  - an update of float or double types is lowered into a cmpxchg loop.
  (Note that PTX could do better because it has atom.add.f{32,64}, but LLVM's
  atomicrmw instruction only allows integer types.)
  Higher levels of atomicity (like acquire and release) need additional
  synchronization properties, which were added with PTX ISA 6.0 / sm_70. So
  using these instructions still results in an error.
  Differential Revision: https://reviews.llvm.org/D50391
  llvm-svn: 339316
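  A sketch of the monotonic atomics this change selects (the PTX mnemonics in
  the comments are indicative only):
  ```
  define i32 @atomic_read(i32* %p) {
    ; monotonic atomic load -> a volatile PTX load, e.g. ld.volatile.u32
    %v = load atomic i32, i32* %p monotonic, align 4
    ret i32 %v
  }

  define void @atomic_write(i32* %p, i32 %v) {
    ; monotonic atomic store -> a volatile PTX store, e.g. st.volatile.u32
    store atomic i32 %v, i32* %p monotonic, align 4
    ret void
  }
  ```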
* [x86] add test for commuted variant for fsub fold; NFC
  Sanjay Patel, 2018-08-08 (1 file, -2/+21)
  llvm-svn: 339300
* [DAGCombiner] loosen constraints for fsub+fadd fold
  Sanjay Patel, 2018-08-08 (1 file, -24/+44)
  isNegatibleForFree() should not matter here (as the test diffs show) because
  it's always a win to replace an fsub+fadd with fneg. The problem in D50195
  persists because either (1) we are doing these folds in the wrong order or
  (2) we're missing another fold for fadd.
  llvm-svn: 339299
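  One representative member of this fold family (not necessarily the exact DAG
  pattern relaxed here): since fsub -0.0, y is an exact fneg, the fadd can be
  rewritten as an fsub at no cost:
  ```
  define float @fold(float %x, float %y) {
    %neg = fsub float -0.000000e+00, %y   ; exact fneg of %y
    %r = fadd float %x, %neg              ; folds to: fsub float %x, %y
    ret float %r
  }
  ```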
* [ADT] Normalize empty triple components
  Petr Hosek, 2018-08-08 (1 file, -8/+8)
  LLVM triple normalization is handling "unknown" and empty components
  differently; for example, given "x86_64-unknown-linux-gnu" and
  "x86_64-linux-gnu", which should be equivalent, triple normalization returns
  "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's config.sub
  returns "x86_64-unknown-linux-gnu" for both "x86_64-linux-gnu" and
  "x86_64-unknown-linux-gnu". This changes the triple normalization to behave
  the same way, replacing empty triple components with "unknown".
  This addresses PR37129.
  Differential Revision: https://reviews.llvm.org/D50219
  llvm-svn: 339294
* [x86] add tests for fsub+fadd with FMF; NFC
  Sanjay Patel, 2018-08-08 (1 file, -3/+69)
  These are related to the block of code under review in D50195.
  llvm-svn: 339293
* [DWARF] Unclamp line table version on Darwin for v5 and later.
  Jonas Devlieghere, 2018-08-08 (2 files, -6/+6)
  On Darwin we pin the DWARF line tables to version 2. Stop doing so for DWARF
  v5 and later.
  Differential revision: https://reviews.llvm.org/D49381
  llvm-svn: 339288
* [ARM] Avoid spilling lr with Thumb1 tail calls.
  Eli Friedman, 2018-08-08 (1 file, -30/+137)
  Normally, if any registers are spilled, we prefer to spill lr on Thumb1 so we
  can fold the "bx lr" into the "pop". However, if there are tail calls
  involved, restoring lr is expensive, so skip the optimization in that case.
  The spill of r7 in the new test also isn't necessary, but that's mostly
  orthogonal to this patch. (It's the same code in ARMFrameLowering, but it's
  not related to tail calls.)
  Differential Revision: https://reviews.llvm.org/D49459
  llvm-svn: 339283
* revert tests of '[CodeGen] emit inline asm clobber list warnings for reserved'
  Ties Stuij, 2018-08-08 (3 files, -43/+0)
  llvm-svn: 339276
* [Hexagon] Diagnose misaligned absolute loads and stores
  Krzysztof Parzyszek, 2018-08-08 (2 files, -0/+76)
  Differential Revision: https://reviews.llvm.org/D50405
  llvm-svn: 339272
* AMDGPU: Error more gracefully on libcalls
  Matt Arsenault, 2018-08-08 (1 file, -0/+7)
  I think this is the only situation where the callsite will have a null
  instruction.
  llvm-svn: 339271
* AMDGPU: Fix shifts for i128
  Matt Arsenault, 2018-08-08 (1 file, -0/+1047)
  llvm-svn: 339270
* [PowerPC] Improve codegen for vector loads using scalar_to_vector
  Zaara Syeda, 2018-08-08 (11 files, -219/+1444)
  This patch aims to improve the codegen for vector loads involving the
  scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used
  for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X)
  to utilize:
  - LXSD and LXSDX for i64 and f64
  - LXSIWAX for i32 (sign extension to i64)
  - LXSIWZX for i32 and f64
  Committing on behalf of Amy Kwan.
  Differential Revision: https://reviews.llvm.org/D48950
  llvm-svn: 339260
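  The IR shape that produces scalar_to_vector (load X) in the DAG is a scalar
  load placed into lane 0 of a vector; a sketch (function name illustrative):
  ```
  define <2 x double> @load_into_lane0(double* %p) {
    %s = load double, double* %p
    ; With this change, the load+insert pair can select to a single lxsd
    ; instead of a scalar load followed by a register move.
    %v = insertelement <2 x double> undef, double %s, i32 0
    ret <2 x double> %v
  }
  ```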
* [CodeGen] emit inline asm clobber list warnings for reserved
  Ties Stuij, 2018-08-08 (3 files, -0/+43)
  Summary: Currently, in line with GCC, when specifying reserved registers like
  sp or pc on an inline asm() clobber list, we don't always preserve the
  original value across the statement. And in general, overwriting reserved
  registers can have surprising results.
  For example:
  ```
  extern int bar(int[]);

  int foo(int i) {
    int a[i]; // VLA
    asm volatile(
      "mov r7, #1"
      :
      :
      : "r7"
    );
    return 1 + bar(a);
  }
  ```
  Compiled for thumb, this gives:
  ```
  $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
  ...
  foo:
          .fnstart
  @ %bb.0:                                @ %entry
          .save   {r4, r5, r6, r7, lr}
          push    {r4, r5, r6, r7, lr}
          .setfp  r7, sp, #12
          add     r7, sp, #12
          .pad    #4
          sub     sp, #4
          movs    r1, #7
          add.w   r0, r1, r0, lsl #2
          bic     r0, r0, #7
          sub.w   r0, sp, r0
          mov     sp, r0
          @APP
          mov.w   r7, #1
          @NO_APP
          bl      bar
          adds    r0, #1
          sub.w   r4, r7, #12
          mov     sp, r4
          pop     {r4, r5, r6, r7, pc}
  ...
  ```
  r7 is used as the frame pointer for thumb targets, and this function needs to
  restore the SP from the FP because of the variable-length stack allocation a.
  r7 is clobbered by the inline assembly (and r7 is included in the clobber
  list), but LLVM does not preserve the value of the frame pointer across the
  assembly block.
  This type of behavior is similar to GCC's and has been discussed on the
  bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus
  seemed to have been reached on the way forward. Clang behavior has briefly
  been discussed on the CFE mailing list (starting here:
  http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted to
  follow Eli Friedman's advice to print warnings when there are reserved
  registers on the clobber list, so as not to diverge from GCC behavior for
  now.
  The patch uses MachineRegisterInfo's target-specific knowledge of reserved
  registers, just before we convert the inline asm string in the AsmPrinter.
  If we find a reserved register, we print a warning:
  ```
  repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
        "mov r7, #1"
        ^
  ```
  Reviewers: eli.friedman, olista01, javed.absar, efriedma
  Reviewed By: efriedma
  Subscribers: efriedma, eraman, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49727
  llvm-svn: 339257
* [TargetLowering] BuildUDIV - Add support for divide by one (PR38477)
  Simon Pilgrim, 2018-08-08 (1 file, -85/+19)
  Provide a pass-through of the numerator for divide by one cases - this is the
  same approach we take in DAGCombiner::visitSDIVLike. I investigated whether
  we could achieve this by magic MULHU/SRL values, but nothing appeared to
  work, as we don't have a way to get MULHU(x, c) -> x.
  llvm-svn: 339254
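  A sketch of the kind of input this handles - a non-uniform vector divisor
  containing a 1 (the other constants are arbitrary):
  ```
  define <4 x i32> @div(<4 x i32> %x) {
    ; Lane 0 divides by 1: there is no magic constant c with MULHU(x, c) == x,
    ; so that lane simply passes the numerator through.
    %r = udiv <4 x i32> %x, <i32 1, i32 3, i32 5, i32 7>
    ret <4 x i32> %r
  }
  ```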
* [ARM][NFC] Replaced tab-characters in test file vtrn.ll
  Sjoerd Meijer, 2018-08-08 (1 file, -100/+100)
  llvm-svn: 339251
* [X86][SSE] PR38477 test is more cleanly tested with udiv instead of urem
  Simon Pilgrim, 2018-08-08 (1 file, -110/+78)
  Making the test use urem relies on it calling udiv-like combines, but the
  real issue is with the udiv, so we're better off using that directly.
  llvm-svn: 339247
* [ARM] FP16: codegen support for VEXT
  Sjoerd Meijer, 2018-08-08 (1 file, -12/+18)
  Differential Revision: https://reviews.llvm.org/D50427
  llvm-svn: 339241