path: root/llvm/lib
Commit message  Author  Age  Files  Lines
...
* Fixing -Wtype-limits warnings with the asserts (the expression would always ↵Aaron Ballman2014-11-131-3/+3
| | | | | | evaluate to true). Also fixing a -Wcast-qual warning, where the cast expression isn't required. llvm-svn: 221888
* IR: Create the Metadata classDuncan P. N. Exon Smith2014-11-131-2/+2
| | | | | | | | | This will become the root of a new class hierarchy separate from `Value`. As a first step, stick it between `Value` and `MDNode`. This is part of PR21532. llvm-svn: 221886
* AVX-512: SINT_TO_FP cost model and some bugfixesElena Demikhovsky2014-11-132-4/+25
| | | | | | | Checked some corner cases, for example translation of <8 x i1> to <8 x double> llvm-svn: 221883
* Object, COFF: Refactor code to get relocation iteratorsDavid Majnemer2014-11-131-26/+24
| | | | | | No functional change intended. llvm-svn: 221880
* This patch changes the ownership of TLOF from TargetLoweringBase to ↵Aditya Nandakumar2014-11-1337-63/+136
| | | | | | TargetMachine so that different subtargets could share the TLOF effectively llvm-svn: 221878
* Revert r219432 - "Revert "[BasicAA] Revert "Revert r218714 - Make better use ↵Hal Finkel2014-11-131-5/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of zext and sign information.""" Let's try this again... This reverts r219432, plus a bug fix. Description of the bug in r219432 (by Nick): The bug was using AllPositive to break out of the loop; if the loop break condition i != e is changed to i != e && AllPositive then the test_modulo_analysis_with_global test I've added will fail as the Modulo will be calculated incorrectly (as the last loop iteration is skipped, so Modulo isn't updated with its Scale). Nick also adds this comment: ComputeSignBit is safe to use in loops as it takes into account phi nodes, and the == EK_ZeroEx check is safe in loops as, no matter how the variable changes between iterations, zero-extensions will always guarantee a zero sign bit. The isValueEqualInPotentialCycles check is therefore definitely not needed as all the variable analysis holds no matter how the variables change between loop iterations. And this patch also adds another enhancement to GetLinearExpression - basically to convert ConstantInts to Offsets (see test_const_eval and test_const_eval_scaled for the situations this improves). Original commit message: This reverts r218944, which reverted r218714, plus a bug fix. Description of the bug in r218714 (by Nick): The original patch forgot to check if the Scale in VariableGEPIndex flipped the sign of the variable. The BasicAA pass iterates over the instructions in the order they appear in the function, and so BasicAliasAnalysis::aliasGEP is called with the variable it first comes across as parameter GEP1. Adding a %reorder label puts the definition of %a after %b so aliasGEP is called with %b as the first parameter and %a as the second. 
aliasGEP later calculates that %a == %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) - ignoring that %idxprom is scaled by -1 here led the patch to incorrectly conclude that %a > %b. Revised patch by Nick White, thanks! Thanks to Lang for isolating the bug. Slightly modified by me to add an early exit from the loop and avoid unnecessary, but expensive, function calls. Original commit message: Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 221876
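The zext-versus-sext problem from item 1 of the original commit message can be shown with plain integers. This is a minimal sketch, not the BasicAA code: `zextOffset` and `sextOffset` are hypothetical helpers, and widening a 32-bit offset to 64 bits is an assumed example width.

```cpp
#include <cstdint>

// Zero-extension: keep the 32-bit pattern and fill the upper 32 bits with 0.
// A negative offset therefore turns into a large positive value.
std::int64_t zextOffset(std::int32_t Offset) {
  return static_cast<std::int64_t>(static_cast<std::uint32_t>(Offset));
}

// Sign-extension: replicate the sign bit into the upper 32 bits,
// so a negative offset stays negative.
std::int64_t sextOffset(std::int32_t Offset) {
  return static_cast<std::int64_t>(Offset);
}
```

For non-negative offsets both helpers agree, which is why a bug of this kind only shows up when a negative offset is involved.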
* Object, COFF: Increase code reuseDavid Majnemer2014-11-131-24/+32
| | | | | | | | | | Split getObject's smarts into checkOffset, use this to replace the handwritten check in getSectionContents. Similarly, replace checks in section_rel_begin/section_rel_end with getNumberOfRelocations. No functionality change intended. llvm-svn: 221873
* Object, COFF: getRelocationSymbol shouldn't assertDavid Majnemer2014-11-131-1/+1
| | | | | | | | lib/Object is supposed to be robust to malformed object files. Don't assert if we don't have a symbol table. I'll try to come up with a test case later. llvm-svn: 221870
* Object, COFF: Cleanup some code in getSectionNameDavid Majnemer2014-11-131-2/+2
| | | | | | | Use StringRef::startswith to tidy up some code, no functionality change intended. llvm-svn: 221869
* Object, COFF: Fix some theoretical bugsDavid Majnemer2014-11-131-3/+14
| | | | | | | | getObject didn't consider the case where a pointer came before the start of the object file. No test is included, trying to come up with something reasonable. llvm-svn: 221868
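A bounds check of the kind this message describes might look like the sketch below. It is not the lib/Object implementation: `rangeIsInside` is a hypothetical helper that models addresses as plain integers, and it only illustrates why the "pointer before the start of the buffer" case needs its own test.

```cpp
#include <cstdint>

// Returns true if [Addr, Addr + Size) lies entirely inside
// [BufStart, BufStart + BufSize). All values are hypothetical raw addresses.
bool rangeIsInside(std::uint64_t BufStart, std::uint64_t BufSize,
                   std::uint64_t Addr, std::uint64_t Size) {
  // Reject addresses before the start of the buffer first; without this
  // check the subtraction below would wrap around.
  if (Addr < BufStart)
    return false;
  std::uint64_t Offset = Addr - BufStart;
  if (Offset > BufSize)
    return false;
  // Compare against the remaining space instead of computing Addr + Size,
  // which could overflow.
  return Size <= BufSize - Offset;
}
```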
* Read 64 bits at a time in the bitcode reader.Rafael Espindola2014-11-131-4/+3
| | | | | | | The reading of 64 bit values could still be optimized, but at least this cuts down on the number of virtual calls to fetch more data. llvm-svn: 221865
* [x86] Teach the vector shuffle lowering to make a more nuanced decisionChandler Carruth2014-11-131-12/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | between splitting a vector into 128-bit lanes and recombining them vs. decomposing things into single-input shuffles and a final blend. This handles a large number of cases in AVX1 where the cross-lane shuffles would be much more expensive to represent even though we end up with a fast blend at the root. Instead, we can do a better job of shuffling in a single lane and then inserting it into the other lanes. This fixes the remaining bits of Halide's regression captured in PR21281 for AVX1. However, the bug persists in AVX2 because I've made this change reasonably conservative. The cases where it makes sense in AVX2 to split into 128-bit lanes are much more rare because we can often do full permutations across all elements of the 256-bit vector. However, the particular test case in PR21281 is an example of one of the rare cases where it is *always* better to work in a single 128-bit lane. I'm going to try to teach the logic to detect and form the good code even in AVX2 next, but it will need to use a separate heuristic. Finally, there is one pesky regression here where we previously would craftily use vpermilps in AVX1 to shuffle both high and low halves at the same time. We no longer pull that off, and not for any really good reason. Ultimately, I think this is just another missing nuance to the selection heuristic that I'll try to add in afterward, but this change already seems strictly worth doing considering the magnitude of the improvements in common matrix math shuffle patterns. As always, please let me know if this causes a surprising regression for you. llvm-svn: 221861
* llvm-readobj: Print out address table when dumping COFF delay-import tableRui Ueyama2014-11-131-0/+14
| | | | llvm-svn: 221855
* Add an assert and a test that verify r221709's fix.Frederic Riss2014-11-131-2/+4
| | | | llvm-svn: 221854
* [x86] Don't form overly fragmented blends when splitting andChandler Carruth2014-11-131-2/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | re-combining shuffles because nothing was available in the wider vector type. The key observation (which I've put in the comments for future maintainers) is that at this point, no further combining is really possible. And so even though these shuffles trivially could be combined, we need to actually do that as we produce them when producing them this late in the lowering. This fixes another (huge) part of the Halide vector shuffle regressions. As it happens, this was already well covered by the tests, but I hadn't noticed how bad some of these got. The specific patterns that turn directly into unpckl/h patterns were occurring *many* times in common vector processing code. There are sadly still more problems here, but I'm trying to tease them apart incrementally, and this looks like the core of the problem in the splitting logic. There is some chance of regression here, you can see it in the test changes. Specifically, where we stop forming pshufb in some cases, it is possible that pshufb was in fact faster. Intel "says" that pshufb is slower than the instruction sequences replacing it. llvm-svn: 221852
* [CodeGenPrepare] Handle zero extensions in the TypePromotionHelper.Quentin Colombet2014-11-131-111/+143
| | | | | | | | | | | | | | | | | Prior to this patch the TypePromotionHelper was promoting only sign extensions. Supporting zero extensions changes: - How constants are extended. - How sign extensions, zero extensions, and truncate are composed together. - How the type of the extended operation is recorded. Now we need to know the kind of the extension as well as its type. Each change is fairly small, unlike the diff. Most of the diff is comments and variable renaming to say "extension" instead of "sign extension". The performance improvements on the test suite are within the noise. Related to <rdar://problem/18310086>. llvm-svn: 221851
* [FastISel][AArch64] Optimize select when one of the operands is a 'true' or ↵Juergen Ributzka2014-11-131-0/+61
| | | | | | | | | | | 'false' value. Optimize selects of i1 in the presence of 'true' and 'false' operands to simple logic operations. This fixes rdar://problem/18960150. llvm-svn: 221848
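The boolean identities behind this kind of rewrite can be checked exhaustively. The sketch below is not the FastISel code and says nothing about which AArch64 instructions are emitted; all function names are hypothetical, and `bool` stands in for an i1 value.

```cpp
// Reference semantics of "select C, T, F" on i1 values.
inline bool selectRef(bool C, bool T, bool F) { return C ? T : F; }

// select C, true,  X  ==  C | X
inline bool foldTrueFirst(bool C, bool X) { return C | X; }
// select C, false, X  ==  !C & X
inline bool foldFalseFirst(bool C, bool X) { return !C & X; }
// select C, X, true   ==  !C | X
inline bool foldTrueSecond(bool C, bool X) { return !C | X; }
// select C, X, false  ==  C & X
inline bool foldFalseSecond(bool C, bool X) { return C & X; }

// Checks all four identities for every combination of C and X.
inline bool allIdentitiesHold() {
  for (int C = 0; C < 2; ++C)
    for (int X = 0; X < 2; ++X) {
      if (foldTrueFirst(C, X) != selectRef(C, true, X)) return false;
      if (foldFalseFirst(C, X) != selectRef(C, false, X)) return false;
      if (foldTrueSecond(C, X) != selectRef(C, X, true)) return false;
      if (foldFalseSecond(C, X) != selectRef(C, X, false)) return false;
    }
  return true;
}
```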
* [FastISel][AArch64] Fold the cmp into the select when possible.Juergen Ributzka2014-11-131-0/+54
| | | | | | | | | This folds the compare emission into the select emission when possible, so we can directly use the flags and don't have to emit a separate compare. Related to rdar://problem/18960150. llvm-svn: 221847
* [FastISel][AArch64] Extend 'select' lowering to support also i1 to i16.Juergen Ributzka2014-11-131-34/+46
| | | | | | Related to rdar://problem/18960150. llvm-svn: 221846
* Revert "[dwarfdump] Add support for dumping accelerator tables."Frederic Riss2014-11-135-194/+0
| | | | | | | | | | This reverts commit r221836. The tests are asserting on some buildbots. This also reverts the test part of r221837 as it relies on dwarfdump dumping the accelerator tables. llvm-svn: 221842
* Improve long path name support on Windows.Paul Robinson2014-11-131-38/+66
| | | | | | | | | | Windows normally limits the length of an absolute path name to 260 characters; directories can have lower limits. These limits increase to about 32K if you use absolute paths with the special '\\?\' prefix. Teach Support\Windows\Path.inc to use that prefix as needed. TODO: Other parts of Support could also learn to use this prefix. llvm-svn: 221841
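The prefixing idea can be sketched as follows. This is a simplified illustration, not the code in Support\Windows\Path.inc: `addLongPathPrefix` and `MaxPathLen` are hypothetical names, the input is assumed to be an absolute path that already uses backslashes, and details such as UNC paths and the lower directory limit are ignored.

```cpp
#include <cstddef>
#include <string>

// Assumed threshold, taken from the 260-character limit mentioned above.
constexpr std::size_t MaxPathLen = 260;

// Returns AbsPath unchanged when it is short enough or already prefixed,
// otherwise prepends the special \\?\ prefix.
std::string addLongPathPrefix(const std::string &AbsPath) {
  const std::string Prefix = R"(\\?\)";
  if (AbsPath.size() < MaxPathLen)
    return AbsPath;
  if (AbsPath.compare(0, Prefix.size(), Prefix) == 0)
    return AbsPath;
  return Prefix + AbsPath;
}
```

Adding the prefix only when needed keeps short paths byte-for-byte identical to what callers passed in.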
* Teach ScalarEvolution to sharpen range information.Sanjoy Das2014-11-131-0/+60
| | | | | | | | | | | | | | | | | | If x is known to have the range [a, b), in a loop predicated by (icmp ne x, a) its range can be sharpened to [a + 1, b). Get ScalarEvolution and hence IndVars to exploit this fact. This change triggers an optimization to widen-loop-comp.ll, so it had to be edited to get it to pass. This change was originally landed in r219834 but had a bug and broke ASan. It was reverted in r219878, and is now being re-landed after fixing the original bug. phabricator: http://reviews.llvm.org/D5639 reviewed by: atrick llvm-svn: 221839
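The sharpening rule stated above can be modelled on a toy half-open interval. This is only a sketch under simplifying assumptions: `HalfOpenRange` and `sharpenWithNotEqual` are hypothetical, use non-wrapping 64-bit integers, and do not reflect how ScalarEvolution represents ranges internally.

```cpp
#include <cassert>
#include <cstdint>

// A non-wrapping half-open range [Lo, Hi).
struct HalfOpenRange {
  std::int64_t Lo;
  std::int64_t Hi;
};

// If x is in [Lo, Hi) and the guard "x != Excluded" holds with
// Excluded == Lo, the range can be tightened to [Lo + 1, Hi).
HalfOpenRange sharpenWithNotEqual(HalfOpenRange R, std::int64_t Excluded) {
  assert(R.Lo <= R.Hi && "range must not be inverted");
  if (R.Lo < R.Hi && Excluded == R.Lo)
    return {R.Lo + 1, R.Hi};
  // Excluding a value in the middle cannot be expressed as one interval,
  // so the range is returned unchanged.
  return R;
}
```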
* Fix emission of Dwarf accelerator table when there are multiple CUs.Frederic Riss2014-11-123-7/+10
| | | | | | | | The DIE offset in the accel tables is an offset relative to the start of the debug_info section, but we were encoding the offset to the start of the containing CU. llvm-svn: 221837
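The difference between the two offsets is a single addition, sketched here with a hypothetical helper (`sectionRelativeDieOffset` is not an LLVM function). It also suggests why the problem only appears with multiple CUs: when a CU starts at offset 0 of debug_info, both offsets coincide.

```cpp
#include <cstdint>

// Converts a DIE offset measured from the start of its CU into an offset
// measured from the start of the debug_info section.
std::uint64_t sectionRelativeDieOffset(std::uint64_t CUStartInSection,
                                       std::uint64_t DieOffsetInCU) {
  return CUStartInSection + DieOffsetInCU;
}
```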
* [dwarfdump] Add support for dumping accelerator tables.Frederic Riss2014-11-125-0/+194
| | | | | | | The class used for the dump only supports dumping for the moment, but it can (and will) be easily extended to support searching as well. llvm-svn: 221836
* Allow DWARFFormValue::extractValue to be called with a null CU.Frederic Riss2014-11-121-15/+17
| | | | | | | | | | | | | Currently FormValues are only used for attributes of DIEs and thus users always have a CU lying around when calling into the FormValue API. Accelerator tables encode their information using the same Forms as the attributes, thus it is natural to use DWARFFormValue to extract/dump them. There is no CU in that case though. Allow the API to be called with a null CU argument by making the RelocMap lookup conditional on the CU pointer validity. And document this new behavior in the header. (Test coverage for this use of the API comes in the DwarfAccelTable support patch) llvm-svn: 221835
* Remove unused variables.Frederic Riss2014-11-121-2/+0
| | | | llvm-svn: 221834
* [CodeGenPrepare] Replace other uses of EVT::getEVT with TL::getValueType.Ahmed Bougacha2014-11-121-5/+5
| | | | | | | | | | | | | | | | | | | | | r221820 fixed a problem (PR21548) where an iPTR was used in TLI legality checks, which isn't valid and resulted in a failed assertion. The solution was to lower pointer types into the correct target's VT, by using TL::getValueType instead of EVT::getEVT. This commit changes 3 other uses of EVT::getEVT, but without any tests: - One of these non-lowered EVTs is passed to allowsMisalignedMemoryAccesses, which goes into target's TL implementation and doesn't cause any problem (yet.) - Two others are passed to TLI.isOperationLegalOrCustom: - one only looks at extensions, so doesn't concern pointers. - one only looks at binary operators, so also isn't a problem. The latter might some day be exposed to pointers and cause the same assert as the original PR, because there's a comment hinting at also supporting cast ops. For consistency, update all of them and be done with it. llvm-svn: 221827
* [CodeGenPrepare][AArch64] Fix a TLI legality check on iPTR to use a lowered ↵Ahmed Bougacha2014-11-121-2/+2
| | | | | | | | instead. Fixes PR21548. Related to PR20474. llvm-svn: 221820
* Expose the number of Newton-Raphson iterations applied to the hardware's ↵Sanjay Patel2014-11-121-3/+7
| | | | | | | | | | | | reciprocal estimate as a parameter (x86). This is a follow-on to r221706 and r221731 and discussed in more detail in PR21385. This patch also loosens the testcase checking for btver2. We know that the "1.0" will be loaded, but we can't tell exactly when, so replace the CHECK-NEXT specifiers with plain CHECKs. The CHECK-NEXT sequence relied on a quirk of post-RA-scheduling that may change independently of anything in these tests. llvm-svn: 221819
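For context, the textbook Newton-Raphson step for refining an estimate x of 1/a is x' = x * (2 - a * x), and the number of times it is applied is the parameter this commit exposes. The sketch below is plain scalar C++, not the x86 lowering code; `refineReciprocal` is a hypothetical name and the starting estimate is passed in by the caller rather than produced by a hardware instruction.

```cpp
#include <cassert>

// Applies the given number of Newton-Raphson steps to an initial estimate
// of 1/A. Each step roughly doubles the number of correct bits.
float refineReciprocal(float A, float Estimate, unsigned Iterations) {
  assert(A != 0.0f && "reciprocal of zero is undefined");
  for (unsigned I = 0; I < Iterations; ++I)
    Estimate = Estimate * (2.0f - A * Estimate);
  return Estimate;
}
```

With zero iterations the raw estimate is returned unchanged, so the iteration count trades accuracy against extra multiplies.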
* Add fortified (__*_chk) library functions to TLI (NFC)Ahmed Bougacha2014-11-122-17/+15
| | | | | | | | | | | | One of them (__memcpy_chk) was already there, the others were checked by comparing function names. Note that the fortified libfuncs are now part of TLI, but are always available, because they aren't generated, only optimized into the non-checking versions. Differential Revision: http://reviews.llvm.org/D6179 llvm-svn: 221817
* Temporary fix for PR21528 - use mangled C++ function names in COFF debug ↵Timur Iskhodzhanov2014-11-121-1/+8
| | | | | | info to un-break ASan on Windows llvm-svn: 221813
* [COFF] Make it clearer that the symbols subsection holds function display ↵Timur Iskhodzhanov2014-11-121-1/+1
| | | | | | name rather than just name llvm-svn: 221812
* [AVX512] Add integer shift by immediate intrinsics.Cameron McInally2014-11-122-1/+12
| | | | llvm-svn: 221811
* Changing a StringRef::begin() call into StringRef::data(); NFC.Aaron Ballman2014-11-121-1/+1
| | | | llvm-svn: 221808
* Use the return of readBytes to find out if we are at the end of the stream.Rafael Espindola2014-11-121-12/+0
| | | | | | | This allows the removal of isObjectEnd and opens the way for reading 64 bits at a time. llvm-svn: 221804
* CGSCC should not treat intrinsic calls like function calls (PR21403)Sanjay Patel2014-11-121-1/+8
| | | | | | | | | | | | | | | | Make the handling of calls to intrinsics in CGSCC consistent: they are not treated like regular function calls because they are never lowered to function calls. Without this patch, we can get dangling pointer asserts from the subsequent loop that processes callsites because it already ignores intrinsics. See http://llvm.org/bugs/show_bug.cgi?id=21403 for more details / discussion. Differential Revision: http://reviews.llvm.org/D6124 llvm-svn: 221802
* Fix broken doxygen annotations, NFCJingyue Wu2014-11-122-4/+0
| | | | llvm-svn: 221801
* Disable indvar widening if arithmetics on the wider type are more expensiveJingyue Wu2014-11-122-12/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Reapply r221772. The old patch breaks the bot because the @indvar_32_bit test was run whether NVPTX was enabled or not. IndVarSimplify should not widen an indvar if arithmetics on the wider indvar are more expensive than those on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as that on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers. Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148. Test Plan: Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics on the wider type are more expensive. This test is run only when NVPTX is enabled. Reviewers: jholewinski, eliben, meheff, atrick Reviewed By: atrick Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6196 llvm-svn: 221799
* remove function names from comments; NFCSanjay Patel2014-11-121-11/+10
| | | | llvm-svn: 221798
* Return the number of read bytes in MemoryObject::readBytes.Rafael Espindola2014-11-122-15/+31
| | | | | | | Returning more information will allow BitstreamReader to be simplified a bit and changed to read 64 bits at a time. llvm-svn: 221794
* Add support for small-model PIC for PowerPC.Justin Hibbits2014-11-126-63/+117
| | | | | | | | | | | | | | | | | | | | Summary: Large-model was added first. With the addition of support for multiple PIC models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI. This generates more optimal code for shared libraries with fewer than about 16380 data objects. Test Plan: Test cases added or updated Reviewers: joerg, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, mcrosier, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D5399 llvm-svn: 221791
* Reduce code duplication a bit. NFC.Rafael Espindola2014-11-121-2/+2
| | | | llvm-svn: 221785
* Fixing a -Wcast-qual warning; NFC.Aaron Ballman2014-11-121-2/+3
| | | | llvm-svn: 221781
* [mips][micromips] Add predicate 'InMicroMips' at CodeGen patterns for ↵Zoran Jovanovic2014-11-121-1/+2
| | | | | | | | microMIPS instructions Differential Revision: http://reviews.llvm.org/D6198 llvm-svn: 221780
* [x86] Start improving the matching of unpck instructions based on testChandler Carruth2014-11-121-0/+6
| | | | | | | | cases from Halide folks. This initial step was extracted from a prototype change by Clay Wood to try and address regressions found with Halide and the new vector shuffle lowering. llvm-svn: 221779
* AVX-512: Intrinsics for ERIElena Demikhovsky2014-11-125-59/+94
| | | | | | | 3 instructions: vrcp28, vrsqrt28, vexp2, only vector forms. Intrinsics include SAE (Suppress All Exceptions) parameter. http://reviews.llvm.org/D6214 llvm-svn: 221774
* Reverts r221772 which fails testsJingyue Wu2014-11-122-69/+12
| | | | llvm-svn: 221773
* Disable indvar widening if arithmetics on the wider type are more expensiveJingyue Wu2014-11-122-12/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: IndVarSimplify should not widen an indvar if arithmetics on the wider indvar are more expensive than those on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as that on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers. Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148. Test Plan: Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics on the wider type are more expensive. Reviewers: jholewinski, eliben, meheff, atrick Reviewed By: atrick Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6196 llvm-svn: 221772
* [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsicsBill Schmidt2014-11-123-8/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for PowerPC, which provide programmer access to the lxvd2x, lxvw4x, stxvd2x, and stxvw4x instructions. New LLVM intrinsics are provided to represent these four instructions in IntrinsicsPowerPC.td. These are patterned after the similar intrinsics for lvx and stvx (Altivec). In PPCInstrVSX.td, these intrinsics are tied to the code gen patterns, with additional patterns to allow plain vanilla loads and stores to still generate these instructions. At -O1 and higher the intrinsics are immediately converted to loads and stores in InstCombineCalls.cpp. This will open up more optimization opportunities while still allowing the correct instructions to be generated. (Similar code exists for aligned Altivec loads and stores.) The new intrinsics are added to the code that checks for consecutive loads and stores in PPCISelLowering.cpp, as well as to PPCTargetLowering::getTgtMemIntrinsic(). There's a new test to verify the correct instructions are generated. The loads and stores tend to be reordered, so the test just counts their number. It runs at -O2, as it's not very effective to test this at -O0, when many unnecessary loads and stores are generated. I ended up having to modify vsx-fma-m.ll. It turns out this test case is slightly unreliable, but I don't know a good way to prevent problems with it. The xvmaddmdp instructions read and write the same register, which is one of the multiplicands. Commutativity allows either to be chosen. If the FMAs are reordered differently than expected by the test, the register assignment can be different as a result. Hopefully this doesn't change often. There is a companion patch for Clang. llvm-svn: 221767
* Merge StreamableMemoryObject into MemoryObject.Rafael Espindola2014-11-123-10/+8
| | | | | | | | | Every MemoryObject is a StreamableMemoryObject since the removal of StringRefMemoryObject, so just merge the two. I will clean up the MemoryObject interface in the upcoming commits. llvm-svn: 221766