summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* Regenerate test results for and-su.ll . NFCAmaury Sechet2018-01-271-7/+38
| | | | llvm-svn: 323588
* [X86][SSE] Add broadcast from v2i32 memory tests (PR34394)Simon Pilgrim2018-01-271-0/+98
| | | | llvm-svn: 323587
* [TargetLowering] Teach TargetLowering::SimplifySetCC to simplify setcc of ↵Craig Topper2018-01-273-82/+74
| | | | | | | | vXi1 vectors into logic ops. This transform was already being done for setcc of scalar i1. This extends it to vectors. llvm-svn: 323585
* [SelectionDAG] Make DAGTypeLegalizer::PromoteSetCCOperands handle ↵Craig Topper2018-01-271-18/+4
| | | | | | | | SETEQ/SETNE correctly for vector types. The code was using getValueSizeInBits and combining with the result of a call to DAG.ComputeNumSignBits. But for vector types getValueSizeInBits returns the width of the full vector while ComputeNumSignBits is going to give a number no larger than the width of a single element. So we should be using getScalarValueSizeInBits to get the element width. llvm-svn: 323583
* [GlobalISel][Legalizer] Convert the FP constants to the right APFloat type ↵Amara Emerson2018-01-271-1/+1
| | | | | | | | | | | for G_FCONSTANT. We weren't converting the immediate ConstantFP during legalization, which caused the wrong bit patterns to be emitted for half type FP constants. Fixes PR36106. llvm-svn: 323582
* [X86] Use vpternlog to implement vector not under AVX512.Craig Topper2018-01-266-246/+480
| | | | | | | | Previously we had to materialize all 1s in a register using vpternlog or pcmpeq and then xor with that. By using vpternlog directly we can do it in one operation. This is implemented using isel patterns, but we should maybe consider creating a generalized vpternlog combiner. llvm-svn: 323572
* [Hexagon] Generate constant splats instead of loads from constant poolKrzysztof Parzyszek2018-01-261-0/+12
| | | | llvm-svn: 323568
* [Hexagon] Make sure that offset on globals matches alignment requirementsKrzysztof Parzyszek2018-01-261-0/+32
| | | | | | | | | | | | | A correctly aligned address may happen to be separated into a variable part and a constant part, where the constant part does not match the alignment needed in a load/store that uses this address. Such a constant cannot be used as an immediate offset in an indexed instruction. When lowering a global address, make sure that if there is an offset folded into the global, the offset is valid for all uses in load/store instructions. llvm-svn: 323562
* [Hexagon] Replace multiple vector extracts with store-load combinationsKrzysztof Parzyszek2018-01-261-0/+26
| | | | llvm-svn: 323561
* [LivePhysRegs] Preserve pristine regs in blocks with no successors.Eli Friedman2018-01-261-0/+47
| | | | | | | | | | | | | | | One common source of blocks with no successors is calls to noreturn functions; we want to preserve pristine registers in case they throw an exception. The whole pristine register thing is messy (we should really prefer to explicitly model registers), but this fills a hole in the model for now. Fixes https://bugs.llvm.org/show_bug.cgi?id=36073. Differential Revision: https://reviews.llvm.org/D42509 llvm-svn: 323559
* [X86] Allow any_extend to be combined with setcc on VLX targets.Craig Topper2018-01-261-22/+10
| | | | | | For VLX target getSetccResultType returns vXi1 which prevents the target independent DAG combine from doing this tranform itself. llvm-svn: 323555
* [X86][AVX512] Add combining support for X86ISD::VTRUNCSSimon Pilgrim2018-01-261-33/+11
| | | | | | | | Similar to the existing support for X86ISD::VTRUNCUS. Differential Revision: https://reviews.llvm.org/D42544 llvm-svn: 323553
* [Hexagon] Fix an incorrect assertion in HexagonConstExtendersKrzysztof Parzyszek2018-01-261-0/+54
| | | | llvm-svn: 323548
* [X86][AVX] LowerBUILD_VECTORAsVariablePermute - add support for VPERMILPV to ↵Simon Pilgrim2018-01-261-26/+2
| | | | | | | | v4i32/v4f32 Extension to D42431, adding support for v4i32/v4f32 as well as v2i64/v2f64 now that D42308 has landed llvm-svn: 323542
* [X86][SSE] Don't colaesce v4i32 extractsSimon Pilgrim2018-01-266-331/+253
| | | | | | | | | | We currently coalesce v4i32 extracts from all 4 elements to 2 v2i64 extracts + shifts/sign-extends. This seems to have been added back in the days when we tended to spill vectors and reload scalars, or ended up with repeated shuffles moving everything down to 0'th index. I don't think either of these are likely these days as we have better EXTRACT_VECTOR_ELT and VECTOR_SHUFFLE handling, and the existing code tends to make it very difficult for various vector and load combines. Differential Revision: https://reviews.llvm.org/D42308 llvm-svn: 323541
* [DAG] Teach findBaseOffset to interpret indexes of indexed memory operationsNirav Dave2018-01-262-4/+2
| | | | | | Indexed outputs are addition / subtractions and can be interpreted as such. llvm-svn: 323539
* [MIPS] Don't crash on unsized extern types with -mgpoptAlexander Richardson2018-01-261-0/+22
| | | | | | | | | | | | | | Summary: This fixes an assertion when building the FreeBSD MIPS64 kernel. Reviewers: atanasyan, sdardis, emaste Reviewed By: sdardis Subscribers: krytarowski, llvm-commits Differential Revision: https://reviews.llvm.org/D42571 llvm-svn: 323536
* [DAGCombine] reduceBuildVecToShuffle - ensure EXTRACT_VECTOR_ELT index is in ↵Simon Pilgrim2018-01-261-0/+15
| | | | | | | | range From OSS Fuzz Test Case #5688 llvm-svn: 323535
* [X86][SSE] Add tests for vector truncation with PACKUS style signed saturationSimon Pilgrim2018-01-261-0/+3315
| | | | | | PACKUS - truncates signed value, saturating to [0,unsigned_max_trunc] llvm-svn: 323531
* [MIR] Add support for addrspace in MIRFrancis Visoiu Mistrih2018-01-265-9/+40
| | | | | | | | | | Add support for printing / parsing the addrspace of a MachineMemOperand. Fixes PR35970. Differential Revision: https://reviews.llvm.org/D42502 llvm-svn: 323521
* [AMDGPU] fix LDS f32 intrinsicsDaniil Fukalov2018-01-261-18/+18
| | | | | | | | | | | | - using qualified pointer addrspace in intrinsics class to avoid .f32 mangling - changed too common atomic mangling to ds - added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic Reviewed by: b-sumner Differential Revision: https://reviews.llvm.org/D42383 llvm-svn: 323516
* [ARM] Accept a subset of Thumb GPR register class when emitting an SP-relativeMomchil Velikov2018-01-261-0/+25
| | | | | | | | | | | | | load instruction The function `Thumb1InstrInfo::loadRegFromStackSlot` accepts only the `tGPR` register class. The function serves to emit a `tLDRspi` instruction and certainly any subset of the `tGPR` register class is a valid destination of the load. Differential revision: https://reviews.llvm.org/D42535 llvm-svn: 323514
* [X86FixupBWInsts] Prefer positive checks in the test. NFCAndrei Elovikov2018-01-261-1/+1
| | | | | | | | | | | | Reviewers: andrew.w.kaylor, craig.topper, MatzeB Reviewed By: andrew.w.kaylor Subscribers: aivchenk, llvm-commits Differential Revision: https://reviews.llvm.org/D42531 llvm-svn: 323513
* [ARM] Armv8.2-A FP16 code generation (part 1/3)Sjoerd Meijer2018-01-262-1/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the groundwork for Armv8.2-A FP16 code generation . Clang passes and returns _Float16 values as floats, together with the required bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318. We will implement half-precision argument passing/returning lowering in the ARM backend soon, but for now this means that this: _Float16 sub(_Float16 a, _Float16 b) { return a + b; } gets lowered to this: define float @sub(float %a.coerce, float %b.coerce) { entry: %0 = bitcast float %a.coerce to i32 %tmp.0.extract.trunc = trunc i32 %0 to i16 %1 = bitcast i16 %tmp.0.extract.trunc to half <SNIP> %add = fadd half %1, %3 <SNIP> } When FullFP16 is *not* supported, we don't make f16 a legal type, and we get legalization for "free", i.e. nothing changes and everything works as before. And also f16 argument passing/returning is handled. When FullFP16 is supported, we do make f16 a legal type, and have 2 places that we need to patch up: f16 argument passing and returning, which involves minor tweaks to avoid unnecessary code generation for some bitcasts. As a "demonstrator" that this works for the different FP16, FullFP16, softfp modes, etc., I've added match rules to the VSUB instruction description showing that we can codegen this instruction from IR, but more importantly, also to some conversion instructions. These conversions were causing issue before in the FP16 and FullFP16 cases. I've also added match rules to the VLDRH and VSTRH desriptions, so that we can actually compile the entire half-precision sub code example above. This showed that these loads and stores had the wrong addressing mode specified: AddrMode5 instead of AddrMode5FP16, which turned out not be implemented at all, so that has also been added. This is the minimal patch that shows all the different moving parts. In patch 2/3 I will add some efficient lowering of bitcasts, and in 2/3 I will add the remaining Armv8.2-A FP16 instruction descriptions. Thanks to Sam Parker and Oliver Stannard for their help and reviews! Differential Revision: https://reviews.llvm.org/D38315 llvm-svn: 323512
* [NFC] fix trivial typos in comments and documentsHiroshi Inoue2018-01-261-1/+1
| | | | | | "in in" -> "in", "on on" -> "on" etc. llvm-svn: 323508
* [CGP] Re-enable Select in complex addressing mode.Serguei Katkov2018-01-261-1/+1
| | | | | | Switch Select handling on after fixing two bugs: rL323192 and rL323497. llvm-svn: 323498
* [X86] Fix killed flag handling in X86FixupLea passSerguei Katkov2018-01-262-2/+2
| | | | | | | | | | | | | | | | When pass creates a MOV instruction for lea (%base,%index,1), %dst => mov %base,%dst; add %index,%dst modification it should clean the killed flag for base if base is equal to index. Otherwise verifier complains about usage of killed register in add instruction. Reviewers: lsaba, zvi, zansari, aaboud Reviewed By: lsaba Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42522 llvm-svn: 323497
* [CodeGen] Ignore private symbols in llvm.used for COFFShoaib Meenai2018-01-261-1/+4
| | | | | | | Similar to the existing handling for internal symbols, private symbols are also not visible to the linker and should be ignored. llvm-svn: 323483
* [Hexagon] SETEQ and SETNE are valid integer condition codesKrzysztof Parzyszek2018-01-251-0/+21
| | | | llvm-svn: 323452
* [X86][SSE] Add tests for vector truncation with signed saturationSimon Pilgrim2018-01-251-0/+3520
| | | | | | AVX512 isn't using X86ISD::VTRUNCS and SSE/AVX isn't using PACKSS/PACKUS llvm-svn: 323428
* [X86][SSE] Add tests for vector truncation with unsigned saturationSimon Pilgrim2018-01-251-0/+2352
| | | | | | AVX512 tends to do a good job, but there are some missed opportunities with SSE/AVX llvm-svn: 323422
* X86 Tests: Add AVX+XOP config to SDIV combine testsZvi Rackover2018-01-251-0/+622
| | | | | | | As pointed out in D42479, XOP also needs to be covered as it supports vector shifts with variable shift amount. llvm-svn: 323418
* [X86] Expand IMUL/MUL instregexs in Intel scheduler models. Add load latency ↵Craig Topper2018-01-251-29/+29
| | | | | | | | | | | | to some of them in SkylakeClient model. The regular expressions and the imul names caused some instructions to be matched by multiple regexs creating unpredictable results. This changes them all to use explicit instrs instead. While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too. llvm-svn: 323406
* [GlobalISel] Add a requires: asserts to a test.Amara Emerson2018-01-241-0/+1
| | | | llvm-svn: 323384
* [AArch64][GlobalISel] Fall back during AArch64 isel if we have a volatile load.Amara Emerson2018-01-241-0/+14
| | | | | | | | | | | | | | | The tablegen imported patterns for sext(load(a)) don't check for single uses of the load or delete the original after matching. As a result two loads are left in the generated code. This particular issue will be fixed by adding support for a G_SEXTLOAD opcode in future. There are however other potential issues around this that wouldn't be fixed by a G_SEXTLOAD, so until we have a proper solution we don't try to handle volatile loads at all in the AArch64 selector. Fixes/works around PR36018. llvm-svn: 323371
* [GlobalISel] Don't fall back to FastISel.Amara Emerson2018-01-241-0/+10
| | | | | | | Apparently checking the pass structure isn't enough to ensure that we don't fall back to FastISel, as it's set up as part of the SelectionDAGISel. llvm-svn: 323369
* [X86][SSE] Aggressively use PMADDWD for v4i32 multiplies with 17 or more ↵Simon Pilgrim2018-01-243-170/+374
| | | | | | | | | | | | leading zeros As discussed in D41484, PMADDWD for 'zero extended' vXi32 is nearly always a better option than PMULLD: On SNB it will result in code that isn't any faster, but not any slower so we may as well keep it. On KNL it only has half the throughput, so I've disabled it on there - ideally there'd be a better way than this. Differential Revision: https://reviews.llvm.org/D42258 llvm-svn: 323367
* [X86][SSE] Add slow-pmulld attribute (silvermont-style) testSimon Pilgrim2018-01-241-247/+505
| | | | | | Requested by @zvi on D42258 llvm-svn: 323364
* Revert r321751, "StructurizeCFG: Fix broken backedge detection"Nicolai Haehnle2018-01-242-88/+42
| | | | | | | | | | | It causes regressions in various OpenGL test suites. Keep the test cases introduced by r321751 as XFAIL, and add a test case for the regression. Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514 Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015 llvm-svn: 323355
* [ARM] Expand long shifts for Thumb1 to __aeabi_ callsWeiming Zhao2018-01-241-0/+15
| | | | | | | | | | | | | | Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1. Reviewers: samparker Reviewed By: samparker Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42401 llvm-svn: 323354
* [X86] Fix some inconsistencies in the itineraries and Sched for ↵Craig Topper2018-01-242-2/+2
| | | | | | | | (V)PEXTRW/(V)PINSRW The weirdest being that PEXTRWrr was tagged as a memory operation. llvm-svn: 323353
* [X86] Adjust names of PINSRW/PEXTRW intructions between MMX/SSE/AVX/AVX512 ↵Craig Topper2018-01-241-3/+3
| | | | | | for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI llvm-svn: 323352
* [Hexagon] Run late copy propagation and dead code elimination passesKrzysztof Parzyszek2018-01-2410-37/+43
| | | | llvm-svn: 323346
* X86 Tests: Add more sdiv combine cases. NFCZvi Rackover2018-01-241-9/+3160
| | | | | | Add cases with vector non-splat pow2 contant divider. llvm-svn: 323329
* [AArch64] Avoid unnecessary vector byte-swapping in big-endianPablo Barrio2018-01-241-55/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Loads/stores of some NEON vector types are promoted to other vector types with different lane sizes but same vector size. This is not a problem in little-endian but, when in big-endian, it requires additional byte reversals required to preserve the lane ordering while keeping the right endianness of the data inside each lane. For example: %1 = load <4 x half>, <4 x half>* %p results in the following assembly: ld1 { v0.2s }, [x1] rev32 v0.4h, v0.4h This patch changes the promotion of these loads/stores so that the actual vector load/store (LD1/ST1) takes care of the endianness correctly and there is no need for further byte reversals. The previous code now results in the following assembly: ld1 { v0.4h }, [x1] Reviewers: olista01, SjoerdMeijer, efriedma Reviewed By: efriedma Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D42235 llvm-svn: 323325
* [DAGCombiner] Bail out if vector size is not a multipleSven van Haastregt2018-01-241-0/+12
| | | | | | | | | | | | For the included test case, the DAG transformation concat_vectors(scalar, undef) -> scalar_to_vector(sclr) would attempt to create a v2i32 vector for a v9i8 concat_vector. Bail out to avoid creating a bitcast with mismatching sizes later on. Differential Revision: https://reviews.llvm.org/D42379 llvm-svn: 323312
* [ARM] Call __chkstk for dynamic stack allocation in all windows environmentsMartin Storsjo2018-01-242-4/+3
| | | | | | | | | | | | | | | This matches what MSVC does for alloca() function calls on ARM. Even if MSVC doesn't support VLAs at the language level, it does support the alloca function. On the clang level, both the _alloca() (when emulating MSVC, which is what the alloca() function expands to) and __builtin_alloca() builtin functions, and VLAs, map to the same LLVM IR "alloca" function - so within LLVM they're not distinguishable from each other. Differential Revision: https://reviews.llvm.org/D42292 llvm-svn: 323308
* [GlobalMerge] Don't merge dllexport globalsMartin Storsjo2018-01-242-0/+30
| | | | | | | | | Merging such globals loses the dllexport attribute. Add a test to check that normal globals still are merged. Differential Revision: https://reviews.llvm.org/D42127 llvm-svn: 323307
* [NFC] fix trivial typos in commentsHiroshi Inoue2018-01-244-5/+5
| | | | | | "the the" -> "the" llvm-svn: 323302
* Don't assume a null GV is local for ELF and MachO.Rafael Espindola2018-01-246-20/+20
| | | | | | | | | | | This is already a simplification, and should help with avoiding a plt reference when calling an intrinsic with -fno-plt. With this change we return false for null GVs, so the caller only needs to check the new metadata to decide if it should use foo@plt or *foo@got. llvm-svn: 323297
OpenPOWER on IntegriCloud