summaryrefslogtreecommitdiffstats
path: root/llvm
Commit message (Collapse)AuthorAgeFilesLines
* [WinEH] Emit __C_specific_handler tables for the new IRReid Kleckner2015-10-016-65/+439
| | | | | | | | | | | | We emit denormalized tables, where every range of invokes in the same state gets a complete list of EH action entries. This is significantly simpler than trying to infer the correct nested scoping structure from the MI. Fortunately, for SEH, the nesting structure is really just a size optimization. With this, some basic __try / __except examples work. llvm-svn: 249078
* [Hexagon] XFAILing test while diagnosing backend error.Colin LeMahieu2015-10-011-0/+1
| | | | llvm-svn: 249075
* AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering passTom Stellard2015-10-012-2/+30
| | | | | | | | | | | | | | | Summary: Instead of asserting when the kernel metadata is different than we expect, we should just skip lowering that function. This fixes assertion failures with OpenCL argument metadata from older LLVM releases. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13356 llvm-svn: 249073
* [WinEH] Stop BranchFolding from merging across funcletsDavid Majnemer2015-10-013-15/+58
| | | | | | | BranchFolding would merge two funclets together, this is not OK. Disable this and strengthen the assertion in FuncletLayout. llvm-svn: 249069
* Kill another reference to in-source buildsJonathan Roelofs2015-10-011-8/+5
| | | | llvm-svn: 249067
* [WinEH] Make FuncletLayout more robust against catchretDavid Majnemer2015-10-0113-50/+249
| | | | | | | | | Catchret transfers control from a catch funclet to an earlier funclet. However, it is not completely clear which funclet the catchret target is part of. Make this clear by stapling the catchret target's funclet membership onto the CATCHRET SDAG node. llvm-svn: 249052
* [AArch64] Deprecate a command-line option used for testing.Chad Rosier2015-10-011-12/+4
| | | | | | | Support for pairing unscaled loads and stores has been enabled since the original ARM64 port. This feature is no longer experimental, AFAICT. llvm-svn: 249049
* [SystemZ] Add some generic (floating point support) load instructions.Jonas Paulsson2015-10-0112-31/+132
| | | | | | | | | | | | | | | | | | | | | | Add generic instructions for load complement, load negative and load positive for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so give scheduler more freedom. SystemZElimCompare pass will convert them when it can to the CC-setting variants. Regression tests updated to expect the new opcodes in places where the old ones where used. New test case SystemZ/fp-cmp-05.ll checks that SystemZCompareElim.cpp can handle the new opcodes. README.txt updated (bullet removed). Note that fp128 is not yet handled, because it is relatively rare, and is a bit trickier, because of the fact that l.dfr would operate on the sign bit of one of the subregisters of a fp128, but we would not want to copy the other sub-reg in case src and dst regs are not the same. Reviewed by Ulrich Weigand. llvm-svn: 249046
* Fix printing of 64 bit values and make test more strict.Rafael Espindola2015-10-012-17/+30
| | | | llvm-svn: 249043
* AMDGPU: Add MEM_RAT STORE_TYPED.Tom Stellard2015-10-015-0/+55
| | | | | | | | | | | | v2: Add test (Matt). Fix capitalization of isEOP (Matt). Move pattern to class parameter (Matt). Make the instruction available to Cayman (Matt). Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED. Patch by: Zoltan Gilian llvm-svn: 249042
* AMDGPU: Factor out EOP query.Tom Stellard2015-10-011-4/+6
| | | | | | | | v2: Fix brace placement and capitalization (Matt). Patch by: Zoltan Gilian llvm-svn: 249041
* Reformat.NAKAMURA Takumi2015-10-011-1/+2
| | | | llvm-svn: 249033
* Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"NAKAMURA Takumi2015-10-018-134/+24
| | | | | | It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll llvm-svn: 249032
* Use more strict types. NFC.Rafael Espindola2015-10-011-4/+8
| | | | | | On 32 bit ELF these are 32 bit values. llvm-svn: 249022
* [InstCombine] Remove trivially empty lifetime start/end ranges.Arnaud A. de Grandmaison2015-10-012-0/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Some passes may open up opportunities for optimizations, leaving empty lifetime start/end ranges. For example, with the following code: void foo(char *, char *); void bar(int Size, bool flag) { for (int i = 0; i < Size; ++i) { char text[1]; char buff[1]; if (flag) foo(text, buff); // BBFoo } } the loop unswitch pass will create 2 versions of the loop, one with flag==true, and the other one with flag==false, but always leaving the BBFoo basic block, with lifetime ranges covering the scope of the for loop. Simplify CFG will then remove BBFoo in the case where flag==false, but will leave the lifetime markers. This patch teaches InstCombine to remove trivially empty lifetime marker ranges, that is ranges ending right after they were started (ignoring debug info or other lifetime markers in the range). This fixes PR24598: excessive compile time after r234581. Reviewers: reames, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13305 llvm-svn: 249018
* [SystemZ] Add assembly instructions for obtaining clock values as well as ↵Ulrich Weigand2015-10-013-1/+149
| | | | | | | | | | | CPU features Provide assembler support for STCK, STCKF, STCKE, and STFLE. Author: joncmu Differential Revision: http://reviews.llvm.org/D13299 llvm-svn: 249015
* [AArch64] Hoist commonly failing check. NFC.Chad Rosier2015-10-011-6/+6
| | | | llvm-svn: 249011
* [AArch64] Rename variable to improve readability. NFC.Chad Rosier2015-10-011-10/+10
| | | | llvm-svn: 249008
* [AArch64] Update comment to reflect reality.Chad Rosier2015-10-011-2/+2
| | | | llvm-svn: 249007
* [mips][microMIPS] Implement CACHEE, WRPGPR and WSBH instructionsZoran Jovanovic2015-10-019-5/+62
| | | | | | Differential Revision: http://reviews.llvm.org/D10337 llvm-svn: 249004
* [ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizerScott Douglass2015-10-012-3/+34
| | | | | | Differential Revision: http://reviews.llvm.org/D13240 llvm-svn: 249002
* [NaryReassociate] SeenExprs records WeakVHJingyue Wu2015-10-012-6/+26
| | | | | | | | | | | | | | | | Summary: The instructions SeenExprs records may be deleted during rewriting. FindClosestMatchingDominator should ignore these deleted instructions. Fixes PR24301. Reviewers: grosser Subscribers: grosser, llvm-commits Differential Revision: http://reviews.llvm.org/D13315 llvm-svn: 248983
* Fix performance problem in long-running SectionMemoryManagersKeno Fischer2015-10-012-5/+31
| | | | | | | | | | | | | | | | | | | | | Summary: Without this patch, the memory manager would call `mprotect` on every memory region it ever allocated whenever it wanted to finalize memory (i.e. not just the ones it just allocated). This caused terrible performance problems for long running memory managers. In one particular compile heavy julia benchmark, we were spending 50% of time in `mprotect` if running under MCJIT. Fix this by splitting allocated memory blocks into those on which memory permissions have been set and those on which they haven't and only running `mprotect` on the latter. Reviewers: lhames Subscribers: reames, llvm-commits Differential Revision: http://reviews.llvm.org/D13156 llvm-svn: 248981
* AMDGPU/SI: Re-order PreloadedValue enum and number entries based on init orderTom Stellard2015-10-011-9/+12
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12451 llvm-svn: 248978
* [llvm-objdump] Fix time of check to time of use bug.Davide Italiano2015-10-011-3/+0
| | | | | | | There's already a test that covers this situation, so we should be fine. llvm-svn: 248976
* Revert "Enable -Wdeprecated in the cmake build now that LLVM (& Clang, ↵David Blaikie2015-10-011-1/+1
| | | | | | | | | | | Polly, and LLD) are -Wdeprecated clean" This reverts commit r248963. Seems there's some standard libraries (and libcxxabi implementations) that aren't -Wdeprecated clean... hrm. llvm-svn: 248972
* Update sample profile propagation algorithm.Dehao Chen2015-10-016-139/+221
| | | | | | http://reviews.llvm.org/D13218 llvm-svn: 248968
* [X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.Ahmed Bougacha2015-10-012-0/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The custom code produces incorrect results if later reassociated. Since r221657, on x86, vNi32 uitofp is lowered using an optimized sequence: movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...] pand %xmm0, %xmm1 por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...] psrld $16, %xmm0 por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...] addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...] addps %xmm1, %xmm0 Since r240361, the machine combiner opportunistically reassociates 2-instruction sequences (with -ffast-math). In the new code sequence, the ADDPS' are eligible. In isolation, for simple examples (without reassociable users), this makes no performance difference (the goal being to enable reassociation of longer chains). In the trivial example (just one uitofp), the reassociation doesn't happen, because (I think) it would require the emission of a separate movaps for a constantpool load (instead of folding it into addps). However, when we have multiple uitofp sequences, and the constantpool loads are CSE'd earlier, the machine combiner can do the reassociation. When the ADDPS' are reassociated, the resulting sequence isn't correct anymore, as we'd be adding large (2**39) constants with comparatively smaller values (~2**23). Given that two of the three inputs are powers of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15, the reassociated chain will produce 0 for any input in [0, 2**14[. In my testing, it also produces wrong results for 99.5% of [0, 2**32[. Avoid this by disabling the new lowering when -ffast-math. It does mean that we'll get slower code than without it, but at least we won't get egregiously incorrect code. One might argue that, considering -ffast-math is all but meaningless, uitofp producing wrong results isn't a compiler bug. But it really is. Fixes PR24512. ...though this is really more of a workaround. Ideally, we'd have some sort of Machine FMF, but that's a problem that's not worth tackling until we do more with machine IR. llvm-svn: 248965
* Enable -Wdeprecated in the cmake build now that LLVM (& Clang, Polly, and ↵David Blaikie2015-09-301-1/+1
| | | | | | | | | | | | | | | | | | | LLD) are -Wdeprecated clean This particularly helps enforce the C++ Rule of 5 (for new move ops this is already an error, but for a type only using C++98 features (copy ctor/assign, dtor) it is only deprecated, not invalid) Applying the flag for any GCC compatible compiler - GCC doesn't warn on the Rule of 5 cases that C++11 deprecates, but it doesn't have other false positives so far as I could see (compiling with GCC 4.8 didn't produce any -Wdeprecated warnings I could spot). Reviewers: aaron.ballman Differential Revision: http://reviews.llvm.org/D13314 llvm-svn: 248963
* [WinEH] Emit int3 after noreturn calls on Win64Reid Kleckner2015-09-308-24/+134
| | | | | | | | | | | | | | | | | | | | | | | The Win64 unwinder disassembles forwards from each PC to try to determine if this PC is in an epilogue. If so, it skips calling the EH personality function for that frame. Typically, this means you cannot catch an exception in the same frame that you threw it, because 'throw' calls a noreturn runtime function. Previously we avoided this problem with the TrapUnreachable TargetOption, but that's a much bigger hammer than we need. All we need is a 1 byte non-epilogue instruction right after the call. Instead, what we got was an unconditional branch to a shared block containing the ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which added TrapUnreachable, and replaces it with something better. The new code pattern matches for invoke/call followed by unreachable and inserts an int3 into the DAG. To be 100% watertight, we would need to insert SEH_Epilogue instructions into all basic blocks ending in a call with no terminators or successors, but in practice this is unlikely to come up. llvm-svn: 248959
* [PowerPC] undef Relocation names in PowerPC*.defHal Finkel2015-09-302-0/+151
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | glibc's PowerPC /usr/include/asm/sigcontext.h, has this: #ifdef __powerpc64__ #include <asm/elf.h> #endif and that contains defines of all of the relocation symbols, like this: #define R_PPC_NONE 0 and if that file is included prior to including include/llvm/Support/ELFRelocs/PowerPC*.def, which we cannot in general prevent, the result will fail. As it turns out, this happens when compiling lld/unittests/DriverTests/GnuLdDriverTest.cpp under PPC64/Linux, because: lld/include/lld/ReaderWriter/ELFLinkingContext.h includes lld/unittests/DriverTests/DriverTest.h which includes utils/unittest/googletest/include/gtest/gtest.h which includes utils/unittest/googletest/include/gtest/internal/gtest-internal.h which includes /usr/include/sys/wait.h which includes /usr/include/signal.h which includes /usr/include/bits/sigcontext.h which includes /usr/include/asm/sigcontext.h which includes /usr/include/asm/elf.h the test could be fixed to include ReaderWriter/ELFLinkingContext.h before including unittests/DriverTests/DriverTest.h, but dealing with this in the *.def files is a more-general solution that localizes the fix to the headers instead of requiring changes to an unbounded number of other source files (both in-tree and external). llvm-svn: 248957
* [x86] enable machine combiner reassociations for 256-bit vector logical ↵Sanjay Patel2015-09-302-2/+49
| | | | | | integer insts llvm-svn: 248955
* [libFuzzer] Marking exported symbols as visible. Patch by Mike AizatskyKostya Serebryany2015-09-301-1/+2
| | | | llvm-svn: 248954
* [AArch64] Remove an unnecessary run line and other cleanup. NFC.Chad Rosier2015-09-302-99/+95
| | | | | | | Unscaled load/store combining has been enabled since the initial ARM64 port. No need for a redundance run. Also, add CHECK-LABEL directives. llvm-svn: 248945
* [SLP] Don't vectorize loads of non-packed types (like i1, i2).Michael Zolotukhin2015-09-302-1/+44
| | | | | | | | | | | | | | | | | | Summary: Given an array of i2 elements, 4 consecutive scalar loads will be lowered to i8-sized loads and thus will access 4 consecutive bytes in memory. If we vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in memory. Hence, we should prohibit vectorization in such cases. PS: Initial patch was proposed by Arnold. Reviewers: aschwaighofer, nadav, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13277 llvm-svn: 248943
* Fix -Wsign-compare warningDavid Blaikie2015-09-301-1/+1
| | | | llvm-svn: 248942
* Move dw_op_minus test to DebugInfo/X86.Evgeniy Stepanov2015-09-301-0/+0
| | | | | | | | The test requires X86 target support, and checks the actual debug info contents, including register numbers which would be different on other platforms. llvm-svn: 248938
* Fix debug info with SafeStack.Evgeniy Stepanov2015-09-308-17/+198
| | | | llvm-svn: 248933
* [AArch64] Remove an unnecessary restriction on pre-index instructions.Chad Rosier2015-09-309-22/+206
| | | | | | | | Previously, the index was constrained to the size of the memory operation for no apparent reason. This change removes that constraint so that we can form pre-index instructions with any valid offset. llvm-svn: 248931
* DeadCodeElimination: rewrite to be fasterFiona Glaser2015-09-301-31/+45
| | | | | | | | | | Same strategy as simplifyInstructionsInBlock. ~1/3 less time on my test suite. This pass doesn't have many in-tree users, but getting rid of an O(N^2) worst case and making it cleaner should at least make it a viable alternative to ADCE, since it's now consistently somewhat faster. llvm-svn: 248927
* [PowerPC] Disable shrink wrappingHal Finkel2015-09-302-2/+3
| | | | | | | Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for now until the problem can be fixed. llvm-svn: 248924
* SLPVectorizer: add a test to check if the minimum region size works.Erik Eckstein2015-09-301-1/+28
| | | | | | This is an addition to rL248917. llvm-svn: 248923
* [ARM] Support for ARMv6-Z / ARMv6-ZK missingArtyom Skrobov2015-09-303-8/+12
| | | | | | | | | | | | | | | As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121 TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK. In particular, there were no tests for TrustZone being supported in these architectures. The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes his test case which hadn't really been testing what it was claiming to test. Differential Revision: http://reviews.llvm.org/D13236 llvm-svn: 248921
* SLPVectorizer: limit the scheduling region size per basic block.Erik Eckstein2015-09-302-9/+122
| | | | | | | | | | | Usually large blocks are not a problem. But if a large block (> 10k instructions) contains many (potential) chains of vector instructions, and those chains are spread over a wide range of instructions, then scheduling becomes a compile time problem. This change introduces a limit for the accumulate scheduling region size of a block. For real-world functions this limit will never be exceeded (it's about 10x larger than the maximum value seen in the test-suite and external test suite). llvm-svn: 248917
* [AArch64] Use helper function to improve readability. NFC.Chad Rosier2015-09-301-2/+1
| | | | llvm-svn: 248914
* [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin ↵Andrea Di Biagio2015-09-302-0/+308
| | | | | | | | | | | | | | | | shuffles if the shuffle mask is constant. This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a builtin shuffle if the mask is constant. Converting byte shuffle intrinsic calls to builtin shuffles can help finding more opportunities for combining shuffles later on in selection dag. We may end up with byte shuffles with constant masks as the result of inlining. Differential Revision: http://reviews.llvm.org/D13252 llvm-svn: 248913
* [CMake] Make the bindir and libdir arguments to set_output_directory optionalJohn Brawn2015-09-301-20/+29
| | | | | | | | | | | | When building a plugin against an installed LLVM toolchain using add_llvm_loadable_module (in the documented manner) doesn't work as nothing sets the *_OUTPUT_INTDIR variables causing an error when set_output_directory is called. Making those arguments optional (causing the default output directory to be used) fixes this. Differential Revision: http://reviews.llvm.org/D13215 llvm-svn: 248911
* Add support for sub-byte aligned writes to lib/Support/Endian.hTeresa Johnson2015-09-303-11/+129
| | | | | | | | | | | | | | | Summary: As per Duncan's review for D12536, I extracted the sub-byte bit aligned reading and writing code into lib/Support, and generalized it. Added calls from BackpatchWord. Also added unittests. Reviewers: dexonsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13189 llvm-svn: 248897
* Refactor computeKnownBits alignment handling codeArtur Pilipenko2015-09-301-53/+38
| | | | | | | | Reviewed By: reames, hfinkel Differential Revision: http://reviews.llvm.org/D12958 llvm-svn: 248892
* [ARM][NEON] Use address space in vld([1234]|[234]lane) and ↵Jeroen Ketema2015-09-3045-530/+746
| | | | | | | | | | | | | | | | | | | | | vst([1234]|[234]lane) instructions This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234], vst[234]lane ARM neon intrinsics and associates an address space with the pointer that these intrinsics take. This changes, e.g., <2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32) to <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32) This change ensures that address spaces are fully taken into account in the ARM target during lowering of interleaved loads and stores. Differential Revision: http://reviews.llvm.org/D12985 llvm-svn: 248887
OpenPOWER on IntegriCloud