path: root/llvm/lib
Commit message | Author | Age | Files | Lines
...
* Recommitting unsigned saturation with a bugfix. | Elena Demikhovsky | 2017-01-19 | 1 | -0/+100
  A test case that crashed is added to avx512-trunc.ll. (PR31589)
  llvm-svn: 292479
* Re-commit: [globalisel] Tablegen-erate current Register Bank Information | Daniel Sanders | 2017-01-19 | 9 | -191/+88
  Summary:
  Adds a RegisterBank tablegen class that can be used to declare the register banks and an associated tablegen pass to generate the necessary code.
  Changes since first commit attempt:
  * Added missing guards
  * Added more missing guards
  * Found and fixed a use-after-free bug involving Twine locals
  Reviewers: t.p.northover, ab, rovka, qcolombet
  Reviewed By: qcolombet
  Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka
  Differential Revision: https://reviews.llvm.org/D27338
  llvm-svn: 292478
* GlobalISel: Implement widening for shifts | Justin Bogner | 2017-01-19 | 1 | -5/+9
  llvm-svn: 292476
* [AVX-512] Support ADD/SUB/MUL of mask vectors | Craig Topper | 2017-01-19 | 1 | -18/+19
  Summary:
  Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND. We already do this for scalar i1 operations so I just extended it to vectors of i1.
  Reviewers: zvi, delena
  Reviewed By: delena
  Subscribers: guyblank, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28888
  llvm-svn: 292474
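  The equivalence this commit relies on can be checked with a small sketch. It assumes each bit of a 16-bit integer stands for one i1 lane, so lane arithmetic is arithmetic modulo 2; the helper names (maskAdd, maskSub, maskMul, checkAllLanes) are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Each bit of a 16-bit mask models one i1 lane (an assumption of this sketch).
using Mask16 = std::uint16_t;

// ADD and SUB modulo 2 both reduce to XOR.
Mask16 maskAdd(Mask16 a, Mask16 b) { return static_cast<Mask16>(a ^ b); }
Mask16 maskSub(Mask16 a, Mask16 b) { return static_cast<Mask16>(a ^ b); }

// MUL modulo 2 reduces to AND.
Mask16 maskMul(Mask16 a, Mask16 b) { return static_cast<Mask16>(a & b); }

// Exhaustively compare a single lane against ordinary arithmetic reduced modulo 2.
void checkAllLanes() {
  for (int x = 0; x <= 1; ++x)
    for (int y = 0; y <= 1; ++y) {
      assert(maskAdd(x, y) == ((x + y) & 1));
      assert(maskSub(x, y) == ((x - y) & 1));
      assert(maskMul(x, y) == ((x * y) & 1));
    }
}
```

  Because every lane is independent, checking the four single-lane input pairs is enough to cover any mask width.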
* AMDGPU: Disable some fneg combines unless nsz | Matt Arsenault | 2017-01-19 | 2 | -0/+16
  For -(x + y) -> (-x) + (-y), if x == -y, this would change the result from -0.0 to 0.0. Since the fma/fmad combine is an extension of this problem it also applies there.
  fmul should be fine, and I don't think any of the unary operators or conversions should be a problem either.
  llvm-svn: 292473
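  The signed-zero difference described above can be reproduced on the host with plain IEEE-754 doubles. This is only an illustration, assuming default round-to-nearest and no fast-math flags; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Original form: negate the sum.
double negOfSum(double x, double y) { return -(x + y); }

// Combined form: sum of the negations.
double sumOfNegs(double x, double y) { return (-x) + (-y); }

// True when both forms yield zero but with different signs.
bool signOfZeroDiffers(double x, double y) {
  double a = negOfSum(x, y);
  double b = sumOfNegs(x, y);
  return a == 0.0 && b == 0.0 && std::signbit(a) != std::signbit(b);
}
```

  With x = 1.0 and y = -1.0 the first form yields -0.0 and the second +0.0, which is why the combine is only safe when signed zeros may be ignored (nsz).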
* AMDGPU: Remove modifiers from v_div_scale_* | Matt Arsenault | 2017-01-19 | 2 | -9/+7
  They seem to produce nonsense results when used.
  This should be applied to the release branch.
  llvm-svn: 292472
* [X86] Merge LowerADD and LowerSUB into a single LowerADD_SUB since they are identical. | Craig Topper | 2017-01-19 | 1 | -13/+3
  llvm-svn: 292469
* [AVX-512] Use VSHUF instructions instead of two inserts as fallback for subvector broadcasts that can't fold the load. | Craig Topper | 2017-01-19 | 1 | -78/+33
  llvm-svn: 292466
* [PM] Add LoopVectorize to the default module pipeline | Michael Kuperstein | 2017-01-19 | 1 | -4/+0
  LV no longer "requires" LCSSA and LoopSimplify, and instead forms them internally as required. So, there's nothing preventing it from being enabled.
  llvm-svn: 292464
* LowerTypeTests: Implement exporting of type identifiers. | Peter Collingbourne | 2017-01-19 | 1 | -31/+103
  Type identifiers are exported by:
  - Adding coarse-grained information about how to test the type identifier to the summary.
  - Creating symbols in the object file (aliases and absolute symbols) containing fine-grained information about the type identifier.
  Differential Revision: https://reviews.llvm.org/D28424
  llvm-svn: 292462
* GlobalISel: Implement narrowing for G_LOAD | Justin Bogner | 2017-01-19 | 1 | -0/+26
  llvm-svn: 292461
* GlobalISel: Fix text wrapping in a comment. NFC | Justin Bogner | 2017-01-19 | 1 | -2/+1
  llvm-svn: 292460
* Add -debug-info-for-profiling to emit more debug info for sample pgo profile collection | Dehao Chen | 2017-01-19 | 4 | -8/+21
  Summary:
  SamplePGO binaries are built with -gmlt to collect profiles. The current -gmlt debug info is limited, and we need some additional info:
  * start line of all subprograms
  * linkage name of all subprograms
  * standalone subprograms (functions that have neither inlined nor been inlined)
  This patch adds this information to the -gmlt binary. The impact on speccpu2006 binary size (size increase compared with the -g0 binary; also includes data for the -g binary, which does not change with this patch):
                   -gmlt(orig)  -gmlt(patched)  -g
  433.milc         4.68%        5.40%           19.73%
  444.namd         8.45%        8.93%           45.99%
  447.dealII       97.43%       115.21%         374.89%
  450.soplex       27.75%       31.88%          126.04%
  453.povray       21.81%       26.16%          92.03%
  470.lbm          0.60%        0.67%           1.96%
  482.sphinx3      5.77%        6.47%           26.17%
  400.perlbench    17.81%       19.43%          73.08%
  401.bzip2        3.73%        3.92%           12.18%
  403.gcc          31.75%       34.48%          122.75%
  429.mcf          0.78%        0.88%           3.89%
  445.gobmk        6.08%        7.92%           42.27%
  456.hmmer        10.36%       11.25%          35.23%
  458.sjeng        5.08%        5.42%           14.36%
  462.libquantum   1.71%        1.96%           6.36%
  464.h264ref      15.61%       16.56%          43.92%
  471.omnetpp      11.93%       15.84%          60.09%
  473.astar        3.11%        3.69%           14.18%
  483.xalancbmk    56.29%       81.63%          353.22%
  geomean          15.60%       18.30%          57.81%
  Debug info size change for the -gmlt binary with this patch:
  433.milc         13.46%
  444.namd         5.35%
  447.dealII       18.21%
  450.soplex       14.68%
  453.povray       19.65%
  470.lbm          6.03%
  482.sphinx3      11.21%
  400.perlbench    8.91%
  401.bzip2        4.41%
  403.gcc          8.56%
  429.mcf          8.24%
  445.gobmk        29.47%
  456.hmmer        8.19%
  458.sjeng        6.05%
  462.libquantum   11.23%
  464.h264ref      5.93%
  471.omnetpp      31.89%
  473.astar        16.20%
  483.xalancbmk    44.62%
  geomean          16.83%
  Reviewers: davidxl, echristo, dblaikie
  Reviewed By: echristo, dblaikie
  Subscribers: aprantl, probinson, llvm-commits, mehdi_amini
  Differential Revision: https://reviews.llvm.org/D25434
  llvm-svn: 292457
* [LV] Run loop-simplify and LCSSA explicitly instead of "requiring" them | Michael Kuperstein | 2017-01-19 | 1 | -5/+13
  This changes the vectorizer to explicitly use the loopsimplify and lcssa utils, instead of "requiring" the transformations as if they were analyses.
  This is not NFC, since it changes the LCSSA behavior - we no longer run LCSSA for all loops, but rather only for the loops we expect to modify.
  Differential Revision: https://reviews.llvm.org/D28868
  llvm-svn: 292456
* LiveIntervalAnalysis: Cleanup; NFC | Matthias Braun | 2017-01-19 | 1 | -81/+64
  - Fix doxygen comments: Do not repeat name, remove duplicated doxygen comment (on declaration + implementation), etc.
  - Use more range based for
  llvm-svn: 292455
* [NVPTX] Fix lowering of fp16 ISD::FNEG. | Artem Belevich | 2017-01-19 | 1 | -0/+2
  There's no neg.f16 instruction, so negation has to be done via subtraction from zero.
  Differential Revision: https://reviews.llvm.org/D28876
  llvm-svn: 292452
* [SCEV] Make getUDivExactExpr handle non-nuw multiplies correctly. | Eli Friedman | 2017-01-18 | 1 | -16/+21
  To avoid regressions, make ScalarEvolution::createSCEV a bit more clever.
  Also get rid of some useless code in ScalarEvolution::howFarToZero which was hiding this bug.
  No new testcase because it's impossible to actually expose this bug: we don't have any in-tree users of getUDivExactExpr besides the two functions I just mentioned, and they both dodged the problem. I'll try to add some interesting users in a followup.
  Differential Revision: https://reviews.llvm.org/D28587
  llvm-svn: 292449
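  Why the nuw (no unsigned wrap) flag matters when dividing out a multiply can be shown with 8-bit arithmetic. This is a general illustration, not the SCEV code from the commit, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Multiply in 8-bit unsigned arithmetic, wrapping modulo 256 like a mul without nuw.
std::uint8_t mulWrap(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a * b);
}

// True if a * b does not fit in 8 bits, i.e. the multiply could not carry nuw.
bool mulOverflows(std::uint8_t a, std::uint8_t b) {
  return static_cast<unsigned>(a) * b > 0xFFu;
}

// The naive fold (x * c) /u c -> x holds only when x * c did not wrap.
bool naiveFoldIsValid(std::uint8_t x, std::uint8_t c) {
  assert(c != 0 && "division by zero");
  return static_cast<std::uint8_t>(mulWrap(x, c) / c) == x;
}
```

  For x = 200 and c = 2 the product wraps to 144, and 144 / 2 = 72, so cancelling the factor would be wrong; for x = 100 the product fits and the fold is valid.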
* Preserve domtree and loop-simplify for runtime unrolling. | Eli Friedman | 2017-01-18 | 3 | -21/+82
  Mostly straightforward changes; we just didn't do the computation before. One sort of interesting change in LoopUnroll.cpp: we weren't handling dominance for children of the loop latch correctly, but foldBlockIntoPredecessor hid the problem for complete unrolling.
  Currently punting on loop peeling; made some minor changes to isolate that problem to LoopUnrollPeel.cpp.
  Adds a flag -unroll-verify-domtree; it verifies the domtree immediately after we finish updating it. This is on by default for +Asserts builds.
  Differential Revision: https://reviews.llvm.org/D28073
  llvm-svn: 292447
* Treat segment [B, E) as not overlapping block with boundaries [A, B) | Krzysztof Parzyszek | 2017-01-18 | 1 | -1/+6
  llvm-svn: 292446
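  The half-open interval rule named in this commit title can be sketched as follows. The Range type and overlaps function are hypothetical and only model the boundary case; they are not the data structures touched by the commit.

```cpp
#include <cassert>

// Half-open range [start, end) over some linear index space (hypothetical type).
struct Range {
  unsigned start;
  unsigned end;
};

// Two half-open ranges overlap only if each starts strictly before the other ends.
bool overlaps(Range a, Range b) {
  assert(a.start <= a.end && b.start <= b.end);
  return a.start < b.end && b.start < a.end;
}
```

  With a block [10, 20) and a segment [20, 30), the shared boundary 20 belongs only to the segment, so the two do not overlap.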
* [Hexagon] Remove dead defs from the live set when expanding wstores | Krzysztof Parzyszek | 2017-01-18 | 1 | -1/+8
  llvm-svn: 292445
* Revert r291670 because it introduces a crash. | Michael Kuperstein | 2017-01-18 | 1 | -97/+0
  r291670 doesn't crash on the original testcase from PR31589, but it crashes on a slightly more complex one.
  PR31589 has the new reproducer.
  llvm-svn: 292444
* Improve the `-filter-print-funcs` option to skip the banner for CGSCC pass when nothing is to be printed | Mehdi Amini | 2017-01-18 | 1 | -3/+13
  Before, it would print a sequence of:
    *** IR Dump After Function Integration/Inlining ******
    *** IR Dump After Function Integration/Inlining ******
    *** IR Dump After Function Integration/Inlining ******
    ...
  for every single function in the module.
  llvm-svn: 292442
* [InstCombine] add an assert to make a shl+icmp transform assumption explicit; NFCI | Sanjay Patel | 2017-01-18 | 1 | -1/+9
  llvm-svn: 292440
* [CodeGenPrepare] Fix a typo in the comment. NFC. | Haicheng Wu | 2017-01-18 | 1 | -1/+1
  encode => endcode.
  Differential Revision: https://reviews.llvm.org/D28866
  llvm-svn: 292438
* [InstCombine] remove a redundant check; NFCI | Sanjay Patel | 2017-01-18 | 1 | -2/+0
  I missed deleting this check when I refactored this chunk in: https://reviews.llvm.org/rL292260
  llvm-svn: 292433
* ThinLTOBitcodeWriter: Clear comdats on filtered globals. | Peter Collingbourne | 2017-01-18 | 1 | -0/+2
  Differential Revision: https://reviews.llvm.org/D28839
  llvm-svn: 292431
* Cloning: Copy comdats when cloning globals. | Peter Collingbourne | 2017-01-18 | 1 | -0/+13
  Differential Revision: https://reviews.llvm.org/D28838
  llvm-svn: 292430
* Fix up a comment. NFC. | Michael Kuperstein | 2017-01-18 | 1 | -1/+0
  llvm-svn: 292425
* [LV] Allow reductions that have several uses outside the loop | Michael Kuperstein | 2017-01-18 | 2 | -10/+14
  We currently check whether a reduction has a single outside user. We don't really need to require that - we just need to make sure a single value is used externally. The number of external users of that value shouldn't actually matter.
  Differential Revision: https://reviews.llvm.org/D28830
  llvm-svn: 292424
* [AArch64] Generate literals by the little end | Evandro Menezes | 2017-01-18 | 3 | -26/+26
  ARM seems to prefer that long literals be formed from their little end in order to promote the fusion of the instruction pairs MOV/MOVK and MOVK/MOVK on Cortex A57 and others (v. "Cortex A57 Software Optimisation Guide", section 4.14).
  Differential revision: https://reviews.llvm.org/D28697
  llvm-svn: 292422
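  A much simplified model of building a 64-bit literal from 16-bit pieces, lowest piece first, is sketched below. It is an assumption-laden illustration: the Chunk type and both functions are hypothetical, all-zero pieces are simply skipped, and the real AArch64 expansion handles many more cases.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One 16-bit piece of a 64-bit literal and the bit position it is placed at.
struct Chunk {
  std::uint16_t imm;
  unsigned shift;
};

// Split a literal into MOV/MOVK-style pieces, lowest 16 bits first.
// Simplified: all-zero pieces are skipped, except that zero itself yields one piece.
std::vector<Chunk> chunksFromLittleEnd(std::uint64_t value) {
  std::vector<Chunk> out;
  for (unsigned shift = 0; shift < 64; shift += 16) {
    auto imm = static_cast<std::uint16_t>(value >> shift);
    if (imm != 0)
      out.push_back({imm, shift});
  }
  if (out.empty())
    out.push_back({0, 0});
  return out;
}

// Rebuild the literal by inserting each piece into its 16-bit field in order.
std::uint64_t materialize(const std::vector<Chunk> &chunks) {
  std::uint64_t reg = 0;
  for (const Chunk &c : chunks) {
    reg &= ~(std::uint64_t{0xFFFF} << c.shift);
    reg |= std::uint64_t{c.imm} << c.shift;
  }
  return reg;
}
```

  For 0x123456789ABCDEF0 the pieces come out as 0xDEF0, 0x9ABC, 0x5678, 0x1234 at shifts 0, 16, 32 and 48, which is the little-end-first order the commit title refers to.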
* [NewGVN] We don't use postdom info anymore. Update. | Davide Italiano | 2017-01-18 | 1 | -1/+0
  Differential Revision: https://reviews.llvm.org/D28842
  llvm-svn: 292421
* [ThinLTO] Add a recursive step in Metadata lazy-loading | Mehdi Amini | 2017-01-18 | 1 | -4/+17
  Summary:
  Without this, we're stressing the RAUW of unique nodes, which is a costly operation. This is intended to limit the number of RAUW, and is very effective on the total link-time of opt with ThinLTO.
  before:
    real 4m4.587s
    user 15m3.401s
    sys 0m23.616s
  after:
    real 3m25.261s
    user 12m22.132s
    sys 0m24.152s
  Reviewers: tejohnson, pcc
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D28751
  llvm-svn: 292420
* [AMDGPU] Do not allow register coalescer to create big superregs | Stanislav Mekhanoshin | 2017-01-18 | 2 | -0/+27
  Limit register coalescer by not allowing it to artificially increase size of registers beyond dword. Such super-registers are in fact register sequences and not distinct HW registers.
  With more super-regs we would need to allocate adjacent registers and constrain regalloc more than needed. Moreover, our super registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2, VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates register allocation even more, resulting in excessive spilling.
  Differential Revision: https://reviews.llvm.org/D28782
  llvm-svn: 292413
* GlobalISel: Implement narrowing for G_STORE | Justin Bogner | 2017-01-18 | 1 | -2/+23
  Legalize stores of types that are too wide by breaking them up into sequences of smaller stores.
  llvm-svn: 292412
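  The idea of splitting a too-wide store into smaller ones can be pictured with ordinary memory writes. The sketch below assumes a little-endian layout and a 64-bit store narrowed into two 32-bit stores; the function names are hypothetical and do not correspond to GlobalISel code.

```cpp
#include <cassert>
#include <cstdint>

// Store one 32-bit part byte by byte, least significant byte first
// (little-endian layout is an assumption of this sketch).
void store32(std::uint8_t *dst, std::uint32_t part) {
  for (unsigned i = 0; i < 4; ++i)
    dst[i] = static_cast<std::uint8_t>(part >> (8 * i));
}

// A 64-bit store that is "too wide" becomes two 32-bit stores at offsets 0 and 4.
void narrowedStore64(std::uint8_t *dst, std::uint64_t value) {
  store32(dst, static_cast<std::uint32_t>(value));           // low half
  store32(dst + 4, static_cast<std::uint32_t>(value >> 32)); // high half
}

// Reference: the same value written as a single little-endian 64-bit store.
void wideStore64(std::uint8_t *dst, std::uint64_t value) {
  for (unsigned i = 0; i < 8; ++i)
    dst[i] = static_cast<std::uint8_t>(value >> (8 * i));
}
```

  Both routines leave identical bytes in memory, which is the property a narrowing legalization has to preserve.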
* Don't create a comdat group for a dropped def with initializer | Teresa Johnson | 2017-01-18 | 1 | -2/+5
  Non-prevailing weak/linkonce odr symbols will be dropped by ThinLTO to available_externally when possible. If they had an initializer in the global_ctors list, a comdat group was being created. This code already had logic to skip available_externally defs, but now the EliminateAvailableExternally pass will drop these symbols to declarations earlier. Change the check to skip all declarations for linker (which includes available_externally along with declarations).
  Reviewers: mehdi_amini
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D28737
  llvm-svn: 292408
* Revert 292404 due to buildbot failures. | Kirill Bobyrev | 2017-01-18 | 2 | -5/+5
  llvm-svn: 292407
* [X86] Minor code cleanup to fix several clang-tidy warnings. NFC | Kirill Bobyrev | 2017-01-18 | 2 | -5/+5
  llvm-svn: 292404
* [ARM] Create SubtargetFeatures from build attrs | Sam Parker | 2017-01-18 | 1 | -42/+153
  An ELFObjectFile can now create SubtargetFeatures from the available ARM build attributes, in a similar manner to MIPS. I've moved the MIPS code into its own function and the ARM handler also has a separate function.
  Differential Revision: https://reviews.llvm.org/D28291
  llvm-svn: 292403
* raw_fd_ostream: Make file handles non-inheritable by default | Pavel Labath | 2017-01-18 | 1 | -2/+2
  Summary:
  This makes the file descriptors on Unix platforms non-inheritable (O_CLOEXEC). There is no change in behavior on Windows, as the handles were already non-inheritable there.
  Reviewers: rnk, rafael
  Subscribers: llvm-commits, mgorny
  Differential Revision: https://reviews.llvm.org/D28854
  llvm-svn: 292401
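  The effect of O_CLOEXEC can be observed directly with POSIX calls. The sketch below is not the raw_fd_ostream implementation; it only assumes a POSIX system, and the two helper names are hypothetical.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Open a file read-only with the close-on-exec flag; returns -1 on failure.
int openNonInheritable(const char *path) {
  return ::open(path, O_RDONLY | O_CLOEXEC);
}

// Query whether a descriptor would be closed automatically across exec().
bool isCloseOnExec(int fd) {
  int flags = ::fcntl(fd, F_GETFD);
  return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

  A descriptor opened without O_CLOEXEC reports the flag as clear, so a child process created with fork and exec would inherit it.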
* [Assembler] Fix crash when assembling .quad for AArch32. | Chad Rosier | 2017-01-18 | 1 | -1/+2
  A 64-bit relocation does not exist in 32-bit ARMELF. Report an error instead of crashing.
  PR23870
  Patch by Sanne Wouda (sanwou01).
  Differential Revision: https://reviews.llvm.org/D28851
  llvm-svn: 292373
* [thumb,framelowering] Reset NoVRegs in Thumb1FrameLowering::emitPrologue. | Florian Hahn | 2017-01-18 | 2 | -0/+6
  Summary:
  In this function, virtual registers can be introduced (for example through calls to emitThumbRegPlusImmInReg). doScavengeFrameVirtualRegs will replace those virtual registers with concrete registers later on in PrologEpilogInserter, which sets NoVRegs again.
  This patch fixes the Codegen/Thumb/segmented-stacks.ll test case which failed with expensive checks.
  https://llvm.org/bugs/show_bug.cgi?id=27484
  Reviewers: rnk, bkramer, olista01
  Reviewed By: olista01
  Subscribers: llvm-commits, rengolin
  Differential Revision: https://reviews.llvm.org/D28829
  llvm-svn: 292372
* [InstCombine][AVX2] Add DemandedElts support for VPERMD/VPERMPS shuffles | Simon Pilgrim | 2017-01-18 | 1 | -1/+4
  Simplify a vpermv shuffle mask based on the elements of the mask that are actually demanded.
  llvm-svn: 292371
* Re-revert: [globalisel] Tablegen-erate current Register Bank Information | Daniel Sanders | 2017-01-18 | 8 | -86/+191
  More missing guards. My build didn't notice it due to a stale file left over from a Global ISel build.
  llvm-svn: 292369
* Re-commit: [globalisel] Tablegen-erate current Register Bank Information | Daniel Sanders | 2017-01-18 | 8 | -191/+86
  Summary:
  Adds a RegisterBank tablegen class that can be used to declare the register banks and an associated tablegen pass to generate the necessary code.
  Changes since last commit:
  The new tablegen pass is now correctly guarded by LLVM_BUILD_GLOBAL_ISEL and this should fix the buildbots however it may not be the whole fix. The previous buildbot failures suggest there may be a memory bug lurking that I'm unable to reproduce (including when using asan) or spot in the source. If they re-occur on this commit then I'll need assistance from the bot owners to track it down.
  Reviewers: t.p.northover, ab, rovka, qcolombet
  Reviewed By: qcolombet
  Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka
  Differential Revision: https://reviews.llvm.org/D27338
  llvm-svn: 292367
* [ARM] Create objdump subtarget from build attrs | Sam Parker | 2017-01-18 | 3 | -51/+144
  Enable an ELFObjectFile to read its ARM build attributes to produce a target triple with a specific ARM architecture. llvm-objdump now uses this functionality to automatically produce a more accurate target.
  Differential Revision: https://reviews.llvm.org/D28769
  llvm-svn: 292366
* [InstCombine] Remove unnecessary intrinsics demanded elts handling | Simon Pilgrim | 2017-01-18 | 1 | -22/+2
  As discussed on D28777 - we don't need to handle 'all element' shuffles inside InstCombiner::visitCallInst as InstCombiner::SimplifyDemandedVectorElts will do everything we need.
  llvm-svn: 292365
* [X86] Improve mul combine for negative multiplier (2^c - 1) | Michael Zuckerman | 2017-01-18 | 1 | -16/+31
  This patch improves the mul instruction combine function (combineMul) by adding a new layer of logic. In this patch, we are adding the ability to fold (mul x, -((1 << c) - 1)) or (mul x, -((1 << c) + 1)) into (neg((x << c) - x)) or (neg((x << c) + x)) respectively.
  Differential Revision: https://reviews.llvm.org/D28232
  llvm-svn: 292358
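  The algebra behind this fold can be checked with wrapping 32-bit arithmetic. The sketch reads the two folds as neg((x << c) - x) and neg((x << c) + x), which is an interpretation of the commit text; the function names are hypothetical and this is not the combineMul code.

```cpp
#include <cassert>
#include <cstdint>

// x * -((1 << c) - 1), written as neg((x << c) - x).
std::uint32_t mulNegPow2Minus1(std::uint32_t x, unsigned c) {
  assert(c < 32);
  return 0u - ((x << c) - x);
}

// x * -((1 << c) + 1), written as neg((x << c) + x).
std::uint32_t mulNegPow2Plus1(std::uint32_t x, unsigned c) {
  assert(c < 32);
  return 0u - ((x << c) + x);
}

// Reference results using an actual multiply (wraps modulo 2^32).
std::uint32_t refMinus1(std::uint32_t x, unsigned c) {
  return x * (0u - ((std::uint32_t{1} << c) - 1u));
}
std::uint32_t refPlus1(std::uint32_t x, unsigned c) {
  return x * (0u - ((std::uint32_t{1} << c) + 1u));
}
```

  For example, with x = 5 and c = 3 the first form gives -(40 - 5) = -35, which equals 5 * -7.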
* Revert "[XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlier" | Renato Golin | 2017-01-18 | 1 | -3/+0
  This reverts commit r292210, as it broke the Thumb buildbot with:
  clang-5.0: error: the clang compiler does not support '-fxray-instrument on thumbv7-unknown-linux-gnueabihf'.
  llvm-svn: 292357
* [SystemZ] Proper handling of undef flag while expanding pseudo. | Jonas Paulsson | 2017-01-18 | 2 | -7/+11
  During post-RA pseudo expansion, an 'undef' flag of the source operand should be propagated by emitGRX32Move().
  Review: Ulrich Weigand
  llvm-svn: 292353
* [X86] Fix for bugzilla 31576 - add support for "data32" instruction prefix | Marina Yatsina | 2017-01-18 | 3 | -2/+18
  This patch fixes bugzilla 31576 (https://llvm.org/bugs/show_bug.cgi?id=31576).
  The "data32" instruction prefix was not defined in LLVM. An exception had to be added to the X86 tablegen and AsmPrinter because both "data16" and "data32" are encoded to 0x66 (but in different modes).
  Differential Revision: https://reviews.llvm.org/D28468
  llvm-svn: 292352