summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [libFuzzer] Marking exported symbols as visible. Patch by Mike AizatskyKostya Serebryany2015-09-301-1/+2
| | | | llvm-svn: 248954
* [SLP] Don't vectorize loads of non-packed types (like i1, i2).Michael Zolotukhin2015-09-301-1/+18
| | | | | | | | | | | | | | | | | | Summary: Given an array of i2 elements, 4 consecutive scalar loads will be lowered to i8-sized loads and thus will access 4 consecutive bytes in memory. If we vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in memory. Hence, we should prohibit vectorization in such cases. PS: Initial patch was proposed by Arnold. Reviewers: aschwaighofer, nadav, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13277 llvm-svn: 248943
* Fix -Wsign-compare warningDavid Blaikie2015-09-301-1/+1
| | | | llvm-svn: 248942
* Fix debug info with SafeStack.Evgeniy Stepanov2015-09-305-14/+27
| | | | llvm-svn: 248933
* [AArch64] Remove an unnecessary restriction on pre-index instructions.Chad Rosier2015-09-301-2/+1
| | | | | | | | Previously, the index was constrained to the size of the memory operation for no apparent reason. This change removes that constraint so that we can form pre-index instructions with any valid offset. llvm-svn: 248931
* DeadCodeElimination: rewrite to be fasterFiona Glaser2015-09-301-31/+45
| | | | | | | | | | Same strategy as simplifyInstructionsInBlock. ~1/3 less time on my test suite. This pass doesn't have many in-tree users, but getting rid of an O(N^2) worst case and making it cleaner should at least make it a viable alternative to ADCE, since it's now consistently somewhat faster. llvm-svn: 248927
* [PowerPC] Disable shrink wrappingHal Finkel2015-09-301-2/+2
| | | | | | | Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for now until the problem can be fixed. llvm-svn: 248924
* [ARM] Support for ARMv6-Z / ARMv6-ZK missingArtyom Skrobov2015-09-301-2/+1
| | | | | | | | | | | | | | | As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121 TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK. In particular, there were no tests for TrustZone being supported in these architectures. The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes his test case which hadn't really been testing what it was claiming to test. Differential Revision: http://reviews.llvm.org/D13236 llvm-svn: 248921
* SLPVectorizer: limit the scheduling region size per basic block.Erik Eckstein2015-09-301-9/+56
| | | | | | | | | | | Usually large blocks are not a problem. But if a large block (> 10k instructions) contains many (potential) chains of vector instructions, and those chains are spread over a wide range of instructions, then scheduling becomes a compile time problem. This change introduces a limit for the accumulate scheduling region size of a block. For real-world functions this limit will never be exceeded (it's about 10x larger than the maximum value seen in the test-suite and external test suite). llvm-svn: 248917
* [AArch64] Use helper function to improve readability. NFC.Chad Rosier2015-09-301-2/+1
| | | | llvm-svn: 248914
* [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin ↵Andrea Di Biagio2015-09-301-0/+41
| | | | | | | | | | | | | | | | shuffles if the shuffle mask is constant. This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a builtin shuffle if the mask is constant. Converting byte shuffle intrinsic calls to builtin shuffles can help finding more opportunities for combining shuffles later on in selection dag. We may end up with byte shuffles with constant masks as the result of inlining. Differential Revision: http://reviews.llvm.org/D13252 llvm-svn: 248913
* Refactor computeKnownBits alignment handling codeArtur Pilipenko2015-09-301-53/+38
| | | | | | | | Reviewed By: reames, hfinkel Differential Revision: http://reviews.llvm.org/D12958 llvm-svn: 248892
* [ARM][NEON] Use address space in vld([1234]|[234]lane) and ↵Jeroen Ketema2015-09-302-6/+62
| | | | | | | | | | | | | | | | | | | | | vst([1234]|[234]lane) instructions This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234], vst[234]lane ARM neon intrinsics and associates an address space with the pointer that these intrinsics take. This changes, e.g., <2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32) to <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32) This change ensures that address spaces are fully taken into account in the ARM target during lowering of interleaved loads and stores. Differential Revision: http://reviews.llvm.org/D12985 llvm-svn: 248887
* [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP ↵Simon Pilgrim2015-09-306-44/+157
| | | | | | | | | | | | shift instructions The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 llvm-svn: 248878
* InstrProf: Don't call std::unique twice hereJustin Bogner2015-09-301-1/+0
| | | | llvm-svn: 248872
* http://reviews.llvm.org/D13145Dehao Chen2015-09-302-66/+293
| | | | | | Support hierarachical sample profile format. llvm-svn: 248865
* [safestack] Fix a stupid mix-up in the direct-tls code path.Evgeniy Stepanov2015-09-301-1/+1
| | | | llvm-svn: 248863
* AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is setMarek Olsak2015-09-294-8/+18
| | | | | | | | | | to prevent setting a huge stride, because DATA_FORMAT has a different meaning if ADD_TID_ENABLE is set. This is a candidate for stable llvm 3.7. Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com> llvm-svn: 248858
* [WinEH] Setup RBP correctly in Win64 funclet prologuesReid Kleckner2015-09-291-52/+44
| | | | | | | Previously local variable captures just didn't work in 64-bit. Now we can access local variables more or less correctly. llvm-svn: 248857
* [WinEH] Ensure that funclets obey the x64 ABIDavid Majnemer2015-09-292-38/+32
| | | | | | | | | | The x64 ABI requires that epilogues do not contain code other than stack adjustments and some limited control flow. However, we'd insert code to initialize the return address after stack adjustments. Instead, insert EAX/RAX with the current value before we create the stack adjustments in the epilogue. llvm-svn: 248839
* InstrProf: Support for value profiling in the indexed profile formatJustin Bogner2015-09-294-54/+208
| | | | | | | | | Add support to the indexed instrprof reader and writer for the format that will be used for value profiling. Patch by Betul Buyukkurt, with minor modifications. llvm-svn: 248833
* HHVM calling conventions.Maksim Panchenko2015-09-299-22/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | HHVM calling convention, hhvmcc, is used by HHVM JIT for functions in translated cache. We currently support LLVM back end to generate code for X86-64 and may support other architectures in the future. In HHVM calling convention any GP register could be used to pass and return values, with the exception of R12 which is reserved for thread-local area and is callee-saved. Other than R12, we always pass RBX and RBP as args, which are our virtual machine's stack pointer and frame pointer respectively. When we enter translation cache via hhvmcc function, we expect the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed to standard ABI alignment. This affects stack object alignment and stack adjustments for function calls. One extra calling convention, hhvm_ccc, is used to call C++ helpers from HHVM's translation cache. It is almost identical to standard C calling convention with an exception of first argument which is passed in RBP (before we use RDI, RSI, etc.) Differential Revision: http://reviews.llvm.org/D12681 llvm-svn: 248832
* [AArch64] Add support for pre- and post-index LDPSWs.Chad Rosier2015-09-291-5/+7
| | | | llvm-svn: 248825
* [WinEH] Teach AsmPrinter about funcletsDavid Majnemer2015-09-296-32/+145
| | | | | | | | | | | Summary: Funclets have been turned into functions by the time they hit the object file. Make sure that they have decent names for the symbol table and CFI directives explaining how to reason about their prologues. Differential Revision: http://reviews.llvm.org/D13261 llvm-svn: 248824
* Rename some function arguments in MachineBasicBlock.cpp/h by turning the ↵Cong Hou2015-09-291-55/+55
| | | | | | first letter into upper case. NFC. llvm-svn: 248821
* http://reviews.llvm.org/D13231Dehao Chen2015-09-291-46/+54
| | | | | | Change lookup functions to const functions. llvm-svn: 248818
* [AArch64] Add integer pre- and post-index halfword/byte loads and stores.Chad Rosier2015-09-291-1/+27
| | | | llvm-svn: 248817
* Revert r248810 which breaks tests.Dehao Chen2015-09-291-3/+2
| | | | llvm-svn: 248814
* http://reviews.llvm.org/D13231Dehao Chen2015-09-291-2/+3
| | | | | | Change lookup functions to const functions. llvm-svn: 248810
* Addition of interfaces the BE to conform to Table A-2 of ELF V2 ABI V1.1Nemanja Ivanovic2015-09-292-31/+43
| | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D13191 Back end portion of the fifth round of additions to altivec.h. llvm-svn: 248809
* [AArch64] Scale offsets by the size of the memory operation. NFC.Chad Rosier2015-09-291-17/+21
| | | | | | | | | The immediate in the load/store should be scaled by the size of the memory operation, not the size of the register being loaded/stored. This change gets us one step closer to forming LDPSW instructions. This change also enables pre- and post-indexing for halfword and byte loads and stores. llvm-svn: 248804
* [ValueTracking] Lower dom-conditions-dom-blocks and dom-conditions-max-uses ↵Igor Laevsky2015-09-291-2/+2
| | | | | | | | | | thresholds On some of our benchmarks this change shows about 50% compile time improvement without any noticeable performance difference. Differential Revision: http://reviews.llvm.org/D13248 llvm-svn: 248801
* [AArch64] Remove some redundant cases. NFC.Chad Rosier2015-09-291-23/+15
| | | | llvm-svn: 248800
* [ValueTracking] Teach isKnownNonZero about monotonically increasing PHIsJames Molloy2015-09-291-0/+20
| | | | | | | | If a PHI starts at a non-negative constant, monotonically increases (only adds of a constant are supported at the moment) and that add does not wrap, then the PHI is known never to be zero. llvm-svn: 248796
* Arguments spilled on the stack before a function call may haveJeroen Ketema2015-09-293-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | alignment requirements, for example in the case of vectors. These requirements are exploited by the code generator by using move instructions that have similar alignment requirements, e.g., movaps on x86. Although the code generator properly aligns the arguments with respect to the displacement of the stack pointer it computes, the displacement itself may cause misalignment. For example if we have %3 = load <16 x float>, <16 x float>* %1, align 64 call void @bar(<16 x float> %3, i32 0) the x86 back-end emits: movaps 32(%ecx), %xmm2 movaps (%ecx), %xmm0 movaps 16(%ecx), %xmm1 movaps 48(%ecx), %xmm3 subl $20, %esp <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards movaps %xmm3, (%esp) <-- movaps requires 16-byte alignment, while %esp is not aligned as such. movl $0, 16(%esp) calll __bar To solve this, we need to make sure that the computed value with which the stack pointer is changed is a multiple af the maximal alignment seen during its computation. With this change we get proper alignment: subl $32, %esp movaps %xmm3, (%esp) Differential Revision: http://reviews.llvm.org/D12337 llvm-svn: 248786
* [InstCombine] Improve Vector Demanded Bits Through BitcastsSimon Pilgrim2015-09-291-35/+34
| | | | | | | | | | | | Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements. This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector). This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts. Differential Revision: http://reviews.llvm.org/D12935 llvm-svn: 248784
* [LoopUnswitch] Add block frequency analysis to recognize hot/cold regionsChen Li2015-09-291-0/+48
| | | | | | | | | | | | Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default. Reviewers: broune, silvas, dnovillo, reames Subscribers: davidxl, llvm-commits Differential Revision: http://reviews.llvm.org/D11605 llvm-svn: 248777
* [CMake] X86AsmParser: Prune redundant LINK_LIBS.NAKAMURA Takumi2015-09-291-3/+0
| | | | | | It is described in LLVMBuild.txt. llvm-svn: 248771
* Move dbg.declare intrinsics when merging and replacing allocas.Evgeniy Stepanov2015-09-292-4/+12
| | | | | | | | | | | | | | Place new and update dbg.declare calls immediately after the corresponding alloca. Current code in replaceDbgDeclareForAlloca puts the new dbg.declare at the end of the basic block. LLVM codegen has problems emitting debug info in a situation when dbg.declare appears after all uses of the variable. This usually kinda works for inlining and ASan (two users of this function) but not for SafeStack (see the pending change in http://reviews.llvm.org/D13178). llvm-svn: 248769
* RegisterPressure: LiveRegSet tracks register units not physregsMatthias Braun2015-09-291-1/+1
| | | | | | | There are always more physical registers and register units so the previous behaviour was correct but we can do with less memory. llvm-svn: 248767
* [WinEH] Fix ip2state table emission with funcletsReid Kleckner2015-09-285-56/+88
| | | | | | | Previously we were hijacking the old LandingPadInfo data structures to communicate our state numbers. Now we don't need that anymore. llvm-svn: 248763
* Fix unused variable warning in non-debug builds.Richard Trieu2015-09-281-1/+2
| | | | llvm-svn: 248754
* tidy up comments; NFCSanjay Patel2015-09-281-7/+7
| | | | llvm-svn: 248750
* add a FIXME for a CPU model check that should have an attribute insteadSanjay Patel2015-09-281-1/+2
| | | | llvm-svn: 248746
* move one-use check under the comment that describes it; NFCISanjay Patel2015-09-281-3/+2
| | | | llvm-svn: 248745
* [SCEV] Don't crash on pointer comparisonsSanjoy Das2015-09-281-2/+1
| | | | | | | | | | | | `ScalarEvolution::isImpliedCondOperandsViaNoOverflow` tries to cast the operand type of the comparison it is given to an `IntegerType`. This is incorrect because it could actually be simplifying a comparison between two pointers. Switch it to using `getTypeSizeInBits` instead, which does the right thing for both pointers and integers. Fixed PR24956. llvm-svn: 248743
* AMDGPU: Factor switch into separate functionMatt Arsenault2015-09-282-21/+30
| | | | llvm-svn: 248742
* AMDGPU: Fix splitting x16 SMRD loadsMatt Arsenault2015-09-281-2/+2
| | | | | | | | When used recursively, this would set the kill flag on the intermediate step from first splitting x16 to x8. llvm-svn: 248741
* AMDGPU: Fix moving SMRD loads with literal offsets on CIMatt Arsenault2015-09-281-3/+9
| | | | llvm-svn: 248740
* AMDGPU: Fix splitting SMRD with large offsetMatt Arsenault2015-09-281-1/+1
| | | | | | | | | | | | | The splitting of > 4 dword SMRD instructions if using an offset in an SGPR instead of an immediate was not setting the destination register, resulting an an instruction missing an operand which would assert later. Test will be included in a following commit which fixes a related issue. llvm-svn: 248739
OpenPOWER on IntegriCloud