Commit message | Author | Age | Files | Lines
...
* [LoopVectorize] Remove an unused private AA pointer
  Author: Hal Finkel | Date: 2014-07-20 | Files: 1 | Lines: -2/+1

  Thanks to the lld-x86_64-darwin13 builder for catching this first.

  llvm-svn: 213488
* [MC] Pass MCSymbolData to needsRelocateWithSymbol
  Author: Ulrich Weigand | Date: 2014-07-20 | Files: 7 | Lines: -13/+38

  As discussed in a previous checkin to support the .localentry directive on PowerPC, we need to inspect the actual target symbol in needsRelocateWithSymbol to make the appropriate decision based on that symbol's st_other bits.

  Currently, needsRelocateWithSymbol does not get the target symbol. However, it is directly available to its sole caller. This patch therefore simply extends needsRelocateWithSymbol with a new parameter "const MCSymbolData &SD", passes in the target symbol, and updates all derived implementations.

  In particular, in the PowerPC implementation, this patch removes the FIXME added by the previous checkin.

  llvm-svn: 213487
* [LoopVectorize] Use AA to partition potential dependency checks
  Author: Hal Finkel | Date: 2014-07-20 | Files: 10 | Lines: -167/+316

  Prior to this change, the loop vectorizer did not make use of the alias analysis infrastructure. Instead, it performed memory dependence analysis using ScalarEvolution-based linear dependence checks within equivalence classes derived from the results of ValueTracking's GetUnderlyingObjects. Unfortunately, this meant that:

    1. The loop vectorizer had logic that essentially duplicated that in BasicAA for aliasing based on identified objects.
    2. The loop vectorizer could not partition the space of dependency checks based on information only easily available from within AA (TBAA metadata is currently the prime example).

  This means, for example, regardless of whether -fno-strict-aliasing was provided, the vectorizer would only vectorize this loop with a runtime memory-overlap check:

    void foo(int *a, float *b) {
      for (int i = 0; i < 1600; ++i)
        a[i] = b[i];
    }

  This is suboptimal because the TBAA metadata already provides the information necessary to show that this check is unnecessary. Of course, the vectorizer has a limit on the number of such checks it will insert, so in practice, ignoring TBAA means not vectorizing more-complicated loops that we should.

  This change causes the vectorizer to use an AliasSetTracker to keep track of the pointers in the loop. The resulting alias sets are then used to partition the space of dependency checks, and potential runtime checks; this results in more-efficient vectorizations.

  When pointer locations are added to the AliasSetTracker, two things are done:

    1. The location size is set to UnknownSize (otherwise you'd not catch inter-iteration dependencies).
    2. For instructions in blocks that would need to be predicated, TBAA is removed (because the metadata might have a control dependency on the condition being speculated).

  For non-predicated blocks, you can leave the TBAA metadata. This is safe because you can't have an iteration dependency on the TBAA metadata (if you did, and you unrolled sufficiently, you'd end up with the same pointer value used by two accesses that TBAA says should not alias, and that would yield undefined behavior).

  llvm-svn: 213486
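The effect of partitioning on the number of runtime checks can be sketched as follows. This is only an illustration, not the vectorizer's code: `LoopPointer`, `aliasClass`, and both counting functions are hypothetical names, and the string class stands in for whatever alias analysis (for example TBAA) is able to tell apart. It also ignores the read/write distinction that a real dependency check would take into account.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a pointer accessed in the loop. `aliasClass`
// models the information that alias analysis can distinguish.
struct LoopPointer {
  std::string name;
  std::string aliasClass;
};

// Number of unordered pairs among n pointers.
static std::size_t pairCount(std::size_t n) { return n * (n - 1) / 2; }

// Checks needed when every pointer is assumed to possibly alias every other.
std::size_t checksWithoutPartition(const std::vector<LoopPointer> &ptrs) {
  return pairCount(ptrs.size());
}

// Checks needed when only pointers in the same alias set are compared.
std::size_t checksWithPartition(const std::vector<LoopPointer> &ptrs) {
  std::map<std::string, std::size_t> setSizes;
  for (const LoopPointer &p : ptrs)
    ++setSizes[p.aliasClass];
  std::size_t total = 0;
  for (const auto &entry : setSizes)
    total += pairCount(entry.second);
  return total;
}
```

For the `foo(int *a, float *b)` loop above, the unpartitioned count is one check and the partitioned count is zero, which matches the commit's claim that the overlap check becomes unnecessary.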
* [PowerPC] ELFv2 MC support for .localentry directive
  Author: Ulrich Weigand | Date: 2014-07-20 | Files: 9 | Lines: -0/+220

  A second binutils feature needed to support ELFv2 is the .localentry directive. In the ELFv2 ABI, functions may have two entry points: one for calling the routine locally via "bl", and one for calling the function via function pointer (either at the source level, or implicitly via a PLT stub for global calls). The two entry points share a single ELF symbol, where the ELF symbol address identifies the global entry point address, while the local entry point is found by adding a delta offset to the symbol address. That offset is encoded into three platform-specific bits of the ELF symbol st_other field.

  The .localentry directive instructs the assembler to set those fields to encode a particular offset. This is typically used by a function prologue sequence like this:

    func:
        addis r2, r12, (.TOC.-func)@ha
        addi r2, r2, (.TOC.-func)@l
        .localentry func, .-func

  Note that according to the ABI, when calling the global entry point, r12 must be set to point to the global entry point address itself; while when calling the local entry point, r2 must be set to point to the TOC base. The two instructions between the global and local entry point in the above example translate the first requirement into the second.

  This patch implements support in the PowerPC MC streamers to emit the .localentry directive (both into assembler and ELF object output), as well as support in the assembler parser to parse that directive.

  In addition, there is another change required in MC fixup/relocation handling to properly deal with relocations targeting function symbols with two entry points: When the target function is known local, the MC layer would immediately handle the fixup by inserting the target address -- this is wrong, since the call may need to go to the local entry point instead. The GNU assembler handles this case by *not* directly resolving fixups targeting functions with two entry points, but always emits the relocation and relies on the linker to handle this case correctly. This patch changes LLVM MC to do the same (this is done via the processFixupValue routine).

  Similarly, there are cases where the assembler would normally emit a relocation, but "simplify" it to a relocation targeting a *section* instead of the actual symbol. For the same reason as above, this may be wrong when the target symbol has two entry points. The GNU assembler again handles this case by not performing this simplification in that case, but leaving the relocation targeting the full symbol, which is then resolved by the linker. This patch changes LLVM MC to do the same (via the needsRelocateWithSymbol routine).

  NOTE: The method used in this patch is overly pessimistic, since the needsRelocateWithSymbol routine currently does not have access to the actual target symbol, and thus must always assume that it might have two entry points. This will be improved upon by a follow-on patch that modifies common code to pass the target symbol when calling needsRelocateWithSymbol.

  Reviewed by Hal Finkel.

  llvm-svn: 213485
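How a consumer might recover the local entry point from the symbol's st_other field can be sketched as below. The bit position (the top three bits) and the power-of-two encoding follow my reading of the ELFv2 ABI and should be treated as assumptions here; the constant and function names are hypothetical and are not taken from this patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: bits 5..7 of st_other hold the encoded local entry offset.
constexpr unsigned kLocalEntryShift = 5;
constexpr unsigned kLocalEntryMask = 0xe0;

// Decode the byte offset from the global to the local entry point.
// Encoded values 0 and 1 yield offset 0 (no separate local entry point);
// values 2..6 yield 4, 8, 16, 32 and 64 bytes. Value 7 is treated as reserved.
unsigned decodeLocalEntryOffset(std::uint8_t stOther) {
  unsigned encoded = (stOther & kLocalEntryMask) >> kLocalEntryShift;
  assert(encoded != 7 && "reserved encoding");
  return ((1u << encoded) >> 2) << 2;
}

// The local entry point is the symbol address plus the decoded offset.
std::uint64_t localEntryAddress(std::uint64_t symbolAddress,
                                std::uint8_t stOther) {
  return symbolAddress + decodeLocalEntryOffset(stOther);
}
```

With the two-instruction prologue shown in the commit message, the offset is 8 bytes, so under this encoding the field would hold the value 3 (st_other bits 0x60).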
* [PowerPC] ELFv2 MC support for .abiversion directive
  Author: Ulrich Weigand | Date: 2014-07-20 | Files: 5 | Lines: -1/+62

  ELFv2 binaries are marked by a bit in the ELF header e_flags field. A new assembler directive .abiversion can be used to set that flag. This patch implements support in the PowerPC MC streamers to emit the .abiversion directive (both into assembler and ELF binary output), as well as support in the assembler parser to parse the .abiversion directive.

  Reviewed by Hal Finkel.

  llvm-svn: 213484
* [PowerPC] Refactor byval handling in LowerFormalArguments_64SVR4
  Author: Ulrich Weigand | Date: 2014-07-20 | Files: 1 | Lines: -31/+35

  When handling an incoming byval argument, we need to possibly write incoming registers to the stack in order to create an on-stack image of the parameter, so we can return its address to common code. This currently uses CreateFixedObject to access the parts of the parameter save area where the argument is (or needs to be) stored.

  However, sometimes we need to access multiple parts of that area, e.g. to write multiple registers. The code currently uses a new CreateFixedObject call for each of these accesses, resulting in a patchwork of overlapping (fixed) stack objects. This doesn't really matter in the case of fixed objects, since any access to those turns into a fixed stackpointer + offset address anyway.

  However, with the upcoming ELFv2 patches, we may actually need to place an incoming argument into our *own* stack frame instead of the caller's. This means we need to use CreateStackObject instead, and we cannot have multiple overlapping instances of those.

  To make the rest of the argument handling code work equally in both situations, this patch refactors it to always use just a single call to CreateFixedObject, and access parts of that object as required using address arithmetic. This way, we can in a future patch substitute CreateStackObject without further changes.

  No change to generated code intended.

  llvm-svn: 213483
* [PowerPC] Fix FrameIndex handling in SelectAddressRegImm
  Author: Ulrich Weigand | Date: 2014-07-20 | Files: 4 | Lines: -20/+18

  The PPCTargetLowering::SelectAddressRegImm routine needs to handle FrameIndex nodes in a special manner, by translating them into a TargetFrameIndex node. This was done in most cases, but seems to have been neglected in one path: when the input tree has an OR of the FrameIndex with an immediate. This can happen if the FrameIndex can be proven to be sufficiently aligned that an OR of that immediate is equivalent to an ADD.

  The missing handling of FrameIndex in that case caused the SelectionDAG instruction selection to miss opportunities to merge the OR back into the FrameIndex node, leading to superfluous addi/ori instructions in the final assembler output.

  llvm-svn: 213482
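The OR/ADD equivalence that this commit relies on can be illustrated with plain integers. The sketch below is not LLVM code, and the function names are hypothetical. It shows that `base | imm` equals `base + imm` whenever the two values share no set bits, which is guaranteed when `base` is a multiple of a power-of-two alignment and `imm` is smaller than that alignment.

```cpp
#include <cassert>
#include <cstdint>

// True when `base | imm` is guaranteed to equal `base + imm`:
// no bit is set in both operands, so the addition produces no carries.
bool orIsAdd(std::uint64_t base, std::uint64_t imm) {
  return (base & imm) == 0;
}

// Sufficient condition expressed via alignment: if base is known to be a
// multiple of `alignment` (a power of two), any imm below it is safe.
bool alignedOrIsAdd(std::uint64_t alignment, std::uint64_t imm) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0 &&
         "alignment must be a power of two");
  return imm < alignment;
}
```

For example, a 16-byte-aligned address such as 0x1000 can absorb any immediate from 0 to 15 through either operation with the same result.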
* Redo THUMB support.
  Author: Joerg Sonnenberger | Date: 2014-07-20 | Files: 3 | Lines: -7/+76

  Discussed with and tested by: Saleem Abdulrasool

  llvm-svn: 213481
* [Mips] Replace assembler code by YAML to make the 'dynlib-fileheader.test' test target independent.
  Author: Simon Atanasyan | Date: 2014-07-20 | Files: 1 | Lines: -13/+43

  llvm-svn: 213480
* Revert r213467, it breaks non-thumb mode.
  Author: Joerg Sonnenberger | Date: 2014-07-20 | Files: 4 | Lines: -182/+20

  llvm-svn: 213479
* Namespace cleanup (no functional change)
  Author: Artyom Skrobov | Date: 2014-07-20 | Files: 1 | Lines: -11/+7

  llvm-svn: 213478
* SIISelLowering.cpp: Define _USE_MATH_DEFINES to let M_PI provided on MS <cmath>.
  Author: NAKAMURA Takumi | Date: 2014-07-20 | Files: 1 | Lines: -0/+6

  FIXME: Would it be better to move it into configure?

  llvm-svn: 213477
* MachineRegionInfo.cpp: Another fix on MachineRegionInfo::MachineRegionInfo::recalculate() to appease msc17.
  Author: NAKAMURA Takumi | Date: 2014-07-20 | Files: 1 | Lines: -5/+4

  llvm-svn: 213476
* Remove braces around single-statement block and rangify outer loop.
  Author: Manuel Jacob | Date: 2014-07-20 | Files: 1 | Lines: -6/+3

  This is a follow-up to r213474.

  llvm-svn: 213475
* [C++11] Add predecessors(BasicBlock *) / successors(BasicBlock *) iterator ranges.
  Author: Manuel Jacob | Date: 2014-07-20 | Files: 41 | Lines: -226/+166

  Summary: This patch introduces two new iterator ranges and updates existing code to use it. No functional change intended.

  Test Plan: All tests (make check-all) still pass.

  Reviewers: dblaikie
  Reviewed By: dblaikie
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D4481

  llvm-svn: 213474
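The idea of an iterator range that plugs into a range-based for loop can be sketched with toy types. Nothing below is LLVM's implementation: `Block`, `IteratorRange`, and this `successors` helper are hypothetical stand-ins that only mirror the shape of the interface named in the commit title.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a basic block; this is not LLVM's BasicBlock.
struct Block {
  int id;
  std::vector<Block *> succs;
};

// Minimal iterator range: any type with begin()/end() works in a
// range-based for loop.
template <typename It> struct IteratorRange {
  It first, last;
  It begin() const { return first; }
  It end() const { return last; }
};

// Hypothetical helper mirroring the shape of successors(BB).
IteratorRange<std::vector<Block *>::const_iterator>
successors(const Block *bb) {
  return {bb->succs.begin(), bb->succs.end()};
}

// The caller no longer spells out begin/end iterator pairs.
int sumSuccessorIds(const Block *bb) {
  int sum = 0;
  for (Block *succ : successors(bb))
    sum += succ->id;
  return sum;
}
```

The design benefit is mostly at the call site: an explicit `for (It I = begin, E = end; I != E; ++I)` loop collapses into a single range-based for statement.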
* R600: Add missing test for concat_vectors
  Author: Matt Arsenault | Date: 2014-07-20 | Files: 1 | Lines: -0/+249

  llvm-svn: 213473
* R600: Remove unused function
  Author: Matt Arsenault | Date: 2014-07-20 | Files: 4 | Lines: -11/+1

  llvm-svn: 213472
* R600/SI: Remove dead code and add missing tests.
  Author: Matt Arsenault | Date: 2014-07-20 | Files: 3 | Lines: -24/+39

  This probably was killed by some generic DAGCombiner improvements in checking the TargetBooleanContents instead of just 1.

  llvm-svn: 213471
* linux process: silence GCC switch coverage warning
  Author: Saleem Abdulrasool | Date: 2014-07-20 | Files: 1 | Lines: -0/+2

  Add missing entry for eExecMessage message type to silence GCC switch coverage warning.

  llvm-svn: 213470
* build: fix cmake warning with newer CMake
  Author: Saleem Abdulrasool | Date: 2014-07-20 | Files: 2 | Lines: -11/+11

  Hoist the compatibility macros out a level and re-use them when adding link dependencies. Silences a warning from CMake.

  llvm-svn: 213469
* Update formatting with clang-format.
  Author: Bill Wendling | Date: 2014-07-20 | Files: 1 | Lines: -1/+1

  llvm-svn: 213468
* ARM: fix division in some cases
  Author: Saleem Abdulrasool | Date: 2014-07-20 | Files: 4 | Lines: -20/+182

  For ARM cores that are ARMv6T2+ but not ARMv7ve or ARMv7-r and not an updated ARMv7-a that has the idiv extension (chips with clz but not idiv), an incorrect jump would be calculated due to the preference to thumb instructions over ARM.

  Rather than computing the target at runtime, use a jumptable instead. This trades a bit of storage for performance. The overhead is 32 bytes for each of the three routines, but this avoids the calculation of the offset.

  Because clz was introduced in ARMv6T2 and idiv in certain versions of ARMv7, the non-clz, non-idiv case implies a target which does not support Thumb-2, and thus we cannot use Thumb on those targets (as it is unlikely that the assembly will assemble).

  Take the opportunity to refactor the IT block macros into assembly.h rather than redefining them in the TUs where they are used.

  Existing tests cover the full change already, so no new tests are added.

  This effectively reverts SVN r213309.

  llvm-svn: 213467
* Fix msc17 build. RegionInfo::RegionInfo::recalculate() doesn't make sense.
  Author: NAKAMURA Takumi | Date: 2014-07-20 | Files: 1 | Lines: -4/+2

  llvm-svn: 213466
* Fix -Asserts build introduced since r213456.
  Author: NAKAMURA Takumi | Date: 2014-07-20 | Files: 2 | Lines: -0/+4

  llvm-svn: 213465
* Shore up ownership passing of the PBQPBuilder by passing unique_ptrs by value rather than lvalue reference.
  Author: David Blaikie | Date: 2014-07-19 | Files: 2 | Lines: -8/+8

  Also removes an unnecessary '.release()' that should've been a std::move anyway. (I'm on a hunt for '.release()' calls)

  llvm-svn: 213464
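The ownership pattern described here, taking a `std::unique_ptr` by value and forwarding it with `std::move` instead of `.release()`, can be sketched as follows. `Builder`, `Allocator`, and `makeAllocator` are hypothetical names chosen for illustration and are not the signatures touched by the patch.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the builder object whose ownership is passed.
struct Builder {
  int id;
};

class Allocator {
  std::unique_ptr<Builder> builder;

public:
  // Taking the unique_ptr by value makes the transfer of ownership
  // explicit at every call site: callers must write std::move.
  explicit Allocator(std::unique_ptr<Builder> b) : builder(std::move(b)) {}
  int builderId() const { return builder->id; }
};

// Forward ownership with std::move; no raw pointer ever escapes, so there
// is no window in which the object is unowned (as there is with .release()).
std::unique_ptr<Allocator> makeAllocator(std::unique_ptr<Builder> b) {
  return std::unique_ptr<Allocator>(new Allocator(std::move(b)));
}
```

A caller would write `makeAllocator(std::move(builder))`, after which its own `builder` pointer is null.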
* MC: permit emitting a symbol value as section relative
  Author: Saleem Abdulrasool | Date: 2014-07-19 | Files: 3 | Lines: -6/+14

  This adds an optional parameter to the EmitSymbolValue method in MCStreamer to permit emitting a symbol value as a section relative value. This is to cover the use in MCDwarf which should not really know about how to emit a section relative value for a given target.

  This addresses post-review comments from Eric Christopher in SVN r213275.

  llvm-svn: 213463
* [Mips] Replace assembler code by YAML to make the test 'dynlib-dynamic.test' target independent.
  Author: Simon Atanasyan | Date: 2014-07-19 | Files: 1 | Lines: -23/+84

  llvm-svn: 213462
* Revert accidentally committed r213459
  Author: Matt Arsenault | Date: 2014-07-19 | Files: 1 | Lines: -3/+1

  llvm-svn: 213461
* Fix build with GCC.
  Author: Matt Arsenault | Date: 2014-07-19 | Files: 1 | Lines: -3/+7

  Seems like a bug in either GCC or clang, but I'm not sure which is right.

  llvm-svn: 213460
* XXX - Increase unroll threshold
  Author: Matt Arsenault | Date: 2014-07-19 | Files: 1 | Lines: -1/+3

  llvm-svn: 213459
* R600/SI: implement range reduction for sin/cos
  Author: Matt Arsenault | Date: 2014-07-19 | Files: 5 | Lines: -14/+53

  These instructions can only take a limited input range, and return the constant value 1 out of range. We should do range reduction to be able to process arbitrary values. Use a FRACT instruction after normalization to achieve this.

  Also add a test for constant folding with the lowered code with unsafe-fp-math enabled.

  v2: use DAG lowering instead of intrinsic, adapt test
  v3: calculate constant, fold pattern into instruction definition
  v4: misc style fixes, add sin-fold testcase, cosmetics

  Patch by Grigori Goronzy

  llvm-svn: 213458
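The normalize-then-FRACT approach can be illustrated numerically. The sketch below is an assumption-laden model, not the backend lowering: `limitedSin` is a hypothetical stand-in for an instruction that only accepts an input in turns within [0, 1), and the exact range and units of the real hardware instruction are not taken from this commit.

```cpp
#include <cassert>
#include <cmath>

const double kTwoPi = 6.283185307179586;

// Hypothetical model of a limited-range sine: input is in turns (fractions
// of a full period) and must lie in [0, 1).
double limitedSin(double turns) {
  assert(turns >= 0.0 && turns < 1.0);
  return std::sin(turns * kTwoPi);
}

// fract(x) = x - floor(x). For tiny negative x the subtraction can round
// up to exactly 1.0, so that case is mapped to 0.0 (same point on the circle).
double fract(double x) {
  double f = x - std::floor(x);
  return f < 1.0 ? f : 0.0;
}

// Range reduction: scale radians to turns, keep only the fractional part,
// then hand the reduced value to the limited-range operation.
double reducedSin(double radians) {
  return limitedSin(fract(radians * (1.0 / kTwoPi)));
}
```

Because sine is periodic, discarding the integer number of turns leaves the result unchanged up to rounding error, which is why arbitrary inputs become acceptable.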
* Update for RegionInfo changes.
  Author: Matt Arsenault | Date: 2014-07-19 | Files: 10 | Lines: -21/+27

  Mostly related to missing includes and renaming of the pass to RegionInfoPass.

  llvm-svn: 213457
* Templatify RegionInfo so it works on MachineBasicBlocks
  Author: Matt Arsenault | Date: 2014-07-19 | Files: 12 | Lines: -1009/+1724

  llvm-svn: 213456
* R600: Implement a few simple TTI queries.
  Author: Matt Arsenault | Date: 2014-07-19 | Files: 1 | Lines: -0/+24

  I'm not sure if these have any effect right now.

  llvm-svn: 213455
* If a module build reports errors, don't try to load it
  Author: Ben Langmuir | Date: 2014-07-19 | Files: 1 | Lines: -13/+21

  ... just to find out that it didn't build.

  llvm-svn: 213454
* [LoopVectorize] Use CreateAligned(Load|Store)
  Author: Hal Finkel | Date: 2014-07-19 | Files: 1 | Lines: -4/+3

  IRBuilder has CreateAligned(Load|Store) functions; use them and we don't need to make a second call to setAlignment.

  No functionality change intended.

  llvm-svn: 213453
* [LoopVectorize] Propagate known metadata to vectorized instructions
  Author: Hal Finkel | Date: 2014-07-19 | Files: 2 | Lines: -4/+89

  There are some kinds of metadata that are safe to propagate from the scalar instructions to the vector instructions (fpmath and tbaa currently).

  Regarding TBAA, one might worry about propagating it on if-converted loads and stores, because the metadata might have had a control dependency on the condition, and thus actually aliased with some other non-speculated memory access when the condition was false. However, this would be caught by the runtime overlap checks.

  llvm-svn: 213452
* [x86] Fix wrong shuffle mask in test 'combine-vec-shuffle-3.ll'. No functional change.
  Author: Andrea Di Biagio | Date: 2014-07-19 | Files: 1 | Lines: -8/+3

  Function @test3c should check that the DAGCombiner is able to fold a pair of shuffles into a new shuffle with a permute mask of <6,7,2,3>. However, one of the shuffles in @test3c had a wrong permute mask; this prevented the DAGCombiner from folding the shuffles into the expected result.

  Now that the shuffle mask is fixed, the backend correctly folds the two shuffles in function @test3c into a single movhlps instruction.

  llvm-svn: 213451
* Revert D3908 due to issues on Mac platforms
  Author: Viktor Kutuzov | Date: 2014-07-19 | Files: 3 | Lines: -218/+0

  llvm-svn: 213450
* Handle AddrSpaceCast in stripAndAccumulateInBoundsConstantOffsets
  Author: Hal Finkel | Date: 2014-07-19 | Files: 1 | Lines: -1/+2

  All of the other similar functions in that part of the file look through addrspacecast in addition to bitcast, and I see no reason why stripAndAccumulateInBoundsConstantOffsets shouldn't do so also.

  llvm-svn: 213449
* MergedLoadStoreMotion.cpp: Fix msc17 build. Member initializer is unavailable.
  Author: NAKAMURA Takumi | Date: 2014-07-19 | Files: 1 | Lines: -2/+3

  llvm-svn: 213448
* Make Value::isDereferenceablePointer handle offsets to pointer types with dereferenceable attributes
  Author: Hal Finkel | Date: 2014-07-19 | Files: 2 | Lines: -0/+103

  When we have a parameter (or call site return) with a dereferenceable attribute, it can specify the size of an array pointed to by that parameter. If we have a value for which we can accumulate a constant offset to such a parameter, then we can use that offset in a direct comparison with the size specified by the dereferenceable attribute.

  This enables us to handle cases like this:

    int foo(int a[static 3]) {
      return a[2]; /* this is always dereferenceable */
    }

  llvm-svn: 213447
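The comparison between an accumulated constant offset and the dereferenceable size can be sketched as a simple bounds check. This is an illustration under stated assumptions, not the code added to Value::isDereferenceablePointer: the function name and parameters are hypothetical, and sizes are plain byte counts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the base pointer is known to be dereferenceable for
// `derefBytes` bytes. An access of `accessSize` bytes at constant byte
// `offset` from that base is safe only if it lies entirely within the range.
bool isAccessDereferenceable(std::uint64_t derefBytes, std::int64_t offset,
                             std::uint64_t accessSize) {
  if (offset < 0)
    return false; // before the start of the known-dereferenceable region
  std::uint64_t off = static_cast<std::uint64_t>(offset);
  if (off > derefBytes)
    return false;
  // Written as a subtraction so that off + accessSize cannot overflow.
  return accessSize <= derefBytes - off;
}
```

For `int a[static 3]` with a 4-byte `int`, the parameter covers 12 bytes, so the load of `a[2]` (offset 8, size 4) is inside the region while `a[3]` (offset 12) is not.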
* Cleanup comparisons to VariableArrayType::Static for non-VLAs
  Author: Hal Finkel | Date: 2014-07-19 | Files: 2 | Lines: -2/+2

  The enum is part of ArrayType, so there is no functional change, but comparing to ArrayType::Static for non-VLAs makes more sense.

  llvm-svn: 213446
* TypePrinter should not ignore IndexTypeCVRQualifiers on constant-sized arrays
  Author: Hal Finkel | Date: 2014-07-19 | Files: 2 | Lines: -0/+14

  C99 array parameters can have index-type CVR qualifiers, and the TypePrinter should print them when present (and we were not for constant-sized arrays). Otherwise, we'd drop the restrict in:

    int foo(int a[restrict static 3]) { ... }

  llvm-svn: 213445
* Use the dereferenceable attribute on C99 array parameters with static
  Author: Hal Finkel | Date: 2014-07-19 | Files: 2 | Lines: -1/+46

  In C99, an array parameter declarator might have the form:

    direct-declarator '[' 'static' type-qual-list[opt] assign-expr ']'

  where the static keyword indicates that the caller will always provide a pointer to the beginning of an array with at least the number of elements specified by the assignment expression. For constant sizes, we can use the new dereferenceable attribute to pass this information to the optimizer. For VLAs, we don't know the size, but (for addrspace(0)) do know that the pointer must be nonnull (and so we can use the nonnull attribute).

  llvm-svn: 213444
* PR20356: Fix all Sema warnings with mismatched ext_/warn_ versus ExtWarn/Warnings.
  Author: Richard Smith | Date: 2014-07-19 | Files: 14 | Lines: -55/+54

  Mostly the name of the warning was changed to match the semantics, but in the PR20356 cases, the warning was about valid code, so the diagnostic was changed from ExtWarn to Warning instead.

  llvm-svn: 213443
* ARM: correct WoA __builtin_alloca handling on O0
  Author: Saleem Abdulrasool | Date: 2014-07-19 | Files: 2 | Lines: -3/+24

  When performing a dynamic stack adjustment without optimisations, we would mark SP as def and R4 as kill. This occurred as part of the expansion of a WIN__CHKSTK SDNode which indicated the proper handling of SP and R4. The result would be that we would double define SP as part of an operation, which is obviously incorrect.

  Furthermore, the VTList for the chain had an incorrect parameter type of i32 instead of Other.

  Correct these to permit proper lowering of __builtin_alloca at -O0.

  llvm-svn: 213442
* clang/test/Misc/backend-optimization-failure.cpp: Appease to add -triple=x86_64.
  Author: NAKAMURA Takumi | Date: 2014-07-19 | Files: 1 | Lines: -1/+2

  FIXME: Could this be made generic?

  llvm-svn: 213441
* Add the ability to suppress the creation of a persistent result variable and use it in "Process::LoadImage" so that, for instance, "process load" doesn't increment the return variable number.
  Author: Jim Ingham | Date: 2014-07-19 | Files: 3 | Lines: -0/+21

  llvm-svn: 213440
* Remove uses of the redundant ".reset(nullptr)" of unique_ptr, in favor of ".reset()"
  Author: David Blaikie | Date: 2014-07-19 | Files: 6 | Lines: -14/+12

  It's also possible to just write "= nullptr", but there's some question of whether that's as readable, so I leave it up to authors to pick which they prefer for now. If we want to discuss standardizing on one or the other, we can do that at some point in the future.

  llvm-svn: 213439
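The three spellings mentioned in this commit message behave identically for `std::unique_ptr`, which a small self-contained sketch can demonstrate. `Tracker` and the three helper functions are hypothetical names introduced only to count live objects.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper type that counts how many instances are alive.
struct Tracker {
  static int live;
  Tracker() { ++live; }
  ~Tracker() { --live; }
};
int Tracker::live = 0;

// Each function creates one Tracker, clears the pointer with a different
// spelling, and reports how many Trackers are still alive afterwards.
int liveAfterReset() {
  std::unique_ptr<Tracker> p(new Tracker);
  p.reset(); // preferred spelling after this commit
  return Tracker::live;
}

int liveAfterResetNullptr() {
  std::unique_ptr<Tracker> p(new Tracker);
  p.reset(nullptr); // redundant argument; same effect as reset()
  return Tracker::live;
}

int liveAfterAssignNullptr() {
  std::unique_ptr<Tracker> p(new Tracker);
  p = nullptr; // also destroys the managed object
  return Tracker::live;
}
```

All three destroy the managed object and leave the pointer empty, so the choice between them is purely one of readability, as the commit message says.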