path: root/llvm/include
Commit message | Author | Age | Files | Lines
...
* [CodeGenPrepare] Move sign/zero extensions near loads using type promotion. | Quentin Colombet | 2014-12-16 | 1 | -0/+8
  This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible.
  Although this optimization may be beneficial for all targets, in particular AArch64, it is enabled for X86 only, as I have not benchmarked it for other targets yet.
  ** Context **
  Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context, it is interesting to expose such a pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes this pattern is blocked by instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern.
  ** Motivating Example **
  Let us consider an example:
    define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
      %ld = load i8* %addr1
      %zextld = zext i8 %ld to i32
      %ld2 = load i32* %addr2
      %add = add nsw i32 %ld2, %zextld
      %sextadd = sext i32 %add to i64
      %zexta = zext i8 %a to i32
      %addza = add nsw i32 %zexta, %zextld
      %sextaddza = sext i32 %addza to i64
      %addb = add nsw i32 %b, %zextld
      %sextaddb = sext i32 %addb to i64
      call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
      ret void
    }
  As it is, this IR generates the following assembly on x86_64:
    [...]
    movzbl (%rdi), %eax    # zero-extended load
    movl   (%rsi), %esi    # plain load
    addl   %eax, %esi      # 32-bit add
    movslq %esi, %rdi      # sign extend the result of add
    movzbl %dl, %edx       # zero extend the first argument
    addl   %eax, %edx      # 32-bit add
    movslq %edx, %rsi      # sign extend the result of add
    addl   %eax, %ecx      # 32-bit add
    movslq %ecx, %rdx      # sign extend the result of add
    [...]
  The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.
  Now, by promoting the additions to form more extended loads, we would generate:
    [...]
    movzbl (%rdi), %eax    # zero-extended load
    movslq (%rsi), %rdi    # sign-extended load
    addq   %rax, %rdi      # 64-bit add
    movzbl %dl, %esi       # zero extend the first argument
    addq   %rax, %rsi      # 64-bit add
    movslq %ecx, %rdx      # sign extend the second argument
    addq   %rax, %rdx      # 64-bit add
    [...]
  The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.
  This kind of sequence happens a lot in code using 32-bit indexes on 64-bit architectures.
  Note: The throughput numbers are similar on Sandy Bridge and Haswell.
  ** Proposed Solution **
  To avoid the penalty of all these sign/zero extensions, we merge them into the loads at the beginning of the chain of computation by promoting the whole chain of computation to the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality.
  To achieve this, we extend the existing “move ext to load” optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation:
    ext(promotableInst1(...(promotableInstN(load))))
    =>
    promotedInst1(...(promotedInstN(ext(load))))
  The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the “move ext to load” optimization as it did before this patch.
  ** Performance **
  Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
  Tested Optimization Levels: O3/Os
  Tests: llvm-testsuite + externals.
  Results:
  - No regressions beyond noise.
  - Improvements:
    CINT2006/473.astar: ~2%
    Benchmarks/PAQ8p: ~2%
    Misc/perlin: ~3%
  The results are consistent for both O3 and Os.
  <rdar://problem/18310086>
  llvm-svn: 224351
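  The rewrite ext(promotableInst(load)) => promotedInst(ext(load)) is only value-preserving because the promoted instruction cannot overflow in the narrow type: for an `add nsw`, sign-extending the 32-bit sum gives the same value as adding the 64-bit extended operands. A minimal C++ sketch of that equivalence for the `%add`/`%sextadd` chain of `@foo` follows. It is an illustration, not CodeGenPrepare code: the function names are hypothetical, and the `nsw` flag is modeled as a caller-guaranteed precondition that `addIsNsw` checks.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Original form: zext the i8 load to i32, add in 32 bits (nsw: the caller
// guarantees no signed overflow), then sext the sum to i64.
int64_t extAfterAdd(uint8_t Ld, int32_t Ld2) {
  int32_t ZExtLd = static_cast<int32_t>(Ld);  // zext i8 -> i32
  int32_t Add = Ld2 + ZExtLd;                 // add nsw i32
  return static_cast<int64_t>(Add);           // sext i32 -> i64
}

// Promoted form: extend both loads first, then add in 64 bits.
int64_t addAfterExt(uint8_t Ld, int32_t Ld2) {
  int64_t SExtLd2 = static_cast<int64_t>(Ld2);  // sign-extended load
  int64_t ZExtLd = static_cast<int64_t>(Ld);    // zero-extended load
  return SExtLd2 + ZExtLd;                      // 64-bit add
}

// The precondition that the nsw flag encodes: the 32-bit add cannot overflow.
bool addIsNsw(uint8_t Ld, int32_t Ld2) {
  int64_t Wide = static_cast<int64_t>(Ld2) + Ld;
  return Wide >= std::numeric_limits<int32_t>::min() &&
         Wide <= std::numeric_limits<int32_t>::max();
}
```

  When the 32-bit add could overflow, the two forms would disagree (and the narrow C++ add would itself be undefined), which is why such a promotion has to rely on flags like `nsw` rather than apply to every add.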
* Remove the last unnecessary member variable of mapped_file_region. NFC. | Rafael Espindola | 2014-12-16 | 1 | -3/+0
  llvm-svn: 224312
* Convert a member variable to a local variable. NFC. | Rafael Espindola | 2014-12-16 | 1 | -1/+0
  llvm-svn: 224311
* Remove unused member and simplify. NFC. | Rafael Espindola | 2014-12-16 | 1 | -1/+0
  llvm-svn: 224309
* Start adding thin archive support. | Rafael Espindola | 2014-12-16 | 1 | -5/+4
  This is just sufficient for 'ar t' to work.
  llvm-svn: 224307
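  A thin archive stores member headers (and the symbol and name tables) but not the member contents, which stay in external files referenced by path; listing members with 'ar t' therefore needs little more than recognizing the format. GNU ar marks regular archives with the magic `!<arch>\n` and thin archives with `!<thin>\n`. The hypothetical classifier below works on a `std::string` buffer rather than LLVM's archive reader API and only shows that first step.

```cpp
#include <string>

enum class ArchiveKind { Regular, Thin, NotAnArchive };

// Classify a buffer by its 8-byte magic string.
ArchiveKind classifyArchive(const std::string& Buf) {
  // compare(pos, len, s) compares at most `len` characters starting at `pos`,
  // so a buffer shorter than the magic never matches.
  if (Buf.compare(0, 8, "!<arch>\n") == 0)
    return ArchiveKind::Regular;
  if (Buf.compare(0, 8, "!<thin>\n") == 0)
    return ArchiveKind::Thin;  // members are paths to external files
  return ArchiveKind::NotAnArchive;
}
```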
* Silence more static analyzer warnings. | Michael Ilseman | 2014-12-15 | 3 | -1/+13
  Add in definedness checks for shift operators, null checks when pointers are assumed by the code to be non-null, and explicit unreachables.
  llvm-svn: 224255
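  The first two kinds of fix named in the message can be sketched as follows. Shifting a 64-bit unsigned value by 64 or more bits is undefined behavior in C++, and so is dereferencing a null pointer, so each assumption is stated with `assert` instead of being left implicit. The helper names are hypothetical. Note that `assert` is compiled out when `NDEBUG` is defined, so these checks document and test the precondition rather than enforce it in release builds.

```cpp
#include <cassert>
#include <cstdint>

// A shift by >= the bit width is undefined, so make the precondition explicit.
inline uint64_t shiftLeftChecked(uint64_t Value, unsigned Amount) {
  assert(Amount < 64 && "shift amount must be smaller than the bit width");
  return Value << Amount;
}

// Same idea for a pointer the surrounding code assumes is non-null.
inline unsigned derefChecked(const unsigned* P) {
  assert(P && "caller guarantees a non-null pointer");
  return *P;
}
```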
* Sink store based on alias analysis | Elena Demikhovsky | 2014-12-15 | 1 | -12/+15
  - by Ella Bolshinsky
  Alias analysis is used to determine whether a given instruction is a barrier for store sinking. For two identical stores, the instructions following them are checked in both basic blocks to determine whether they are sinking barriers.
  http://reviews.llvm.org/D6420
  llvm-svn: 224247
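  The barrier check described above can be sketched on a toy instruction model. Everything here is hypothetical: `Inst`, the abstract location numbers, and `mayAlias` stand in for real IR instructions and an alias-analysis query, which would also distinguish reads from writes and reason about access sizes.

```cpp
#include <cstddef>
#include <vector>

// Toy model: each memory instruction touches one abstract location;
// Unknown means "may touch anything" (e.g. an opaque call).
enum class Kind { Load, Store, Call, Other };
constexpr int Unknown = -1;

struct Inst {
  Kind K;
  int Loc;  // abstract memory location, or Unknown
};

// Stand-in for an alias query: unknown locations may alias everything.
bool mayAlias(int A, int B) { return A == Unknown || B == Unknown || A == B; }

// True if an instruction after the store may access the stored location,
// which makes it a barrier to sinking the store into the successor block.
bool hasSinkBarrier(const std::vector<Inst>& Block, std::size_t StoreIdx) {
  int Loc = Block[StoreIdx].Loc;
  for (std::size_t I = StoreIdx + 1; I < Block.size(); ++I) {
    if (Block[I].K == Kind::Other)
      continue;  // does not touch memory
    if (mayAlias(Block[I].Loc, Loc))
      return true;
  }
  return false;
}

// Two identical stores, one per predecessor, can be sunk only if neither
// block has a barrier after its store.
bool canSinkStorePair(const std::vector<Inst>& BB1, std::size_t S1,
                      const std::vector<Inst>& BB2, std::size_t S2) {
  return BB1[S1].Loc == BB2[S2].Loc && !hasSinkBarrier(BB1, S1) &&
         !hasSinkBarrier(BB2, S2);
}
```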
* AVX-512: Added EXPAND instructions and intrinsics. | Elena Demikhovsky | 2014-12-15 | 1 | -0/+102
  llvm-svn: 224241
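  EXPAND reads consecutive elements from memory (or a source vector) and places them, in order, into the destination lanes selected by the mask. A scalar reference model for 16 doubleword lanes with zero-masking is sketched below. The function name is hypothetical, and merge-masking, where unselected lanes keep the destination's previous value instead of being zeroed, is not modeled.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Zero-masked expand over one 512-bit vector of 16 x i32 lanes.
std::array<int32_t, 16> expandZeroMasked(const int32_t* Src, uint16_t Mask) {
  std::array<int32_t, 16> Dst{};
  std::size_t K = 0;  // index of the next consecutive source element
  for (unsigned J = 0; J < 16; ++J) {
    if (Mask & (1u << J))
      Dst[J] = Src[K++];  // selected lane takes the next element
    else
      Dst[J] = 0;         // unselected lane is zeroed
  }
  return Dst;
}
```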
* ThreadLocal: Return a mutable pointer if templated with a non-const type | David Majnemer | 2014-12-15 | 1 | -1/+1
  It makes more sense for ThreadLocal<const T>::get to return a const T* and ThreadLocal<T>::get to return a T*.
  llvm-svn: 224225
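  The signature change amounts to returning `T*` from `get()`, so constness follows the template argument. A minimal sketch, assuming C++11 `thread_local` storage rather than whatever platform mechanism LLVM's `ThreadLocal` actually uses; the class name is hypothetical, and entries for destroyed objects are not cleaned up unless `erase()` is called.

```cpp
#include <cassert>
#include <type_traits>
#include <unordered_map>
#include <utility>

template <class T> class ThreadLocalSketch {
  // One map per thread (and per T), keyed by object so that distinct
  // objects hold distinct per-thread values.
  static std::unordered_map<const void*, void*>& slots() {
    thread_local std::unordered_map<const void*, void*> Map;
    return Map;
  }

public:
  // Returns T*: const X* when T is const X, a mutable X* when T is X.
  T* get() const {
    auto& M = slots();
    auto It = M.find(this);
    return It == M.end() ? nullptr : static_cast<T*>(It->second);
  }
  void set(T* P) {
    slots()[this] = const_cast<void*>(static_cast<const void*>(P));
  }
  void erase() { slots().erase(this); }
};
```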
* Loop Vectorizer minor changes in the code - | Elena Demikhovsky | 2014-12-14 | 1 | -2/+2
  some comments, function names, indentation.
  Reviewed here: http://reviews.llvm.org/D6527
  llvm-svn: 224218
* Fix Doxygen command misspellings. | Benjamin Kramer | 2014-12-13 | 1 | -2/+2
  Found by -Wdocumentation.
  llvm-svn: 224197
* Silencing a *lot* of -Wsign-compare warnings; NFC. | Aaron Ballman | 2014-12-13 | 1 | -1/+2
  llvm-svn: 224194
* Clean up static analyzer warnings. | Michael Ilseman | 2014-12-12 | 2 | -1/+5
  Clang's static analyzer found several potential cases of undefined behavior, use of uninitialized values, and potential null pointer dereferences in tablegen, Support, MC, and ADT. This cleans them up with specific assertions on the assumptions of the code.
  llvm-svn: 224154
* Pass a FD to resize_file and add a testcase. | Rafael Espindola | 2014-12-12 | 1 | -2/+2
  I will add a real use in another commit.
  llvm-svn: 224136
* Remove unused feature. NFC. | Rafael Espindola | 2014-12-12 | 1 | -1/+1
  llvm-svn: 224135
* Emit Tag_ABI_FP_16bit_format build attribute. | Charlie Turner | 2014-12-12 | 1 | -0/+3
  The __fp16 type is unconditionally exposed. Since -mfp16-format is not yet supported, there is no user switch to change this behaviour.
  This build attribute should capture the default behaviour of the compiler, which is to expose the IEEE 754 version of __fp16.
  Once -mfp16-format is supported, it will be the way to control the value of this build attribute.
  Change-Id: I8a46641ff0fd2ef8ad0af5f482a6d1af2ac3f6b0
  llvm-svn: 224115
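  Integer-valued ARM build attributes are emitted in the `.ARM.attributes` section as a ULEB128 tag followed by a ULEB128 value. For `Tag_ABI_FP_16bit_format` (tag 38), value 1 selects the IEEE 754 half-precision format described above, 0 means half precision is not used, and 2 is the VFPv3/Advanced SIMD alternative format. The sketch below shows only that encoding; the helper names are hypothetical and not LLVM's attribute-streamer API.

```cpp
#include <cstdint>
#include <vector>

// Tag number and values as given by the ARM ABI build-attributes addenda.
const unsigned Tag_ABI_FP_16bit_format = 38;
enum FP16FormatValue : unsigned {
  FP16_NotAllowed = 0,   // half-precision values are not used
  FP16_IEEE754 = 1,      // IEEE 754 binary16 (the compiler default)
  FP16_Alternative = 2,  // VFPv3/Advanced SIMD alternative format
};

// Unsigned LEB128: 7 bits per byte, low bits first, high bit set on every
// byte except the last.
void appendULEB128(std::vector<uint8_t>& Out, uint64_t V) {
  do {
    uint8_t Byte = V & 0x7f;
    V >>= 7;
    if (V != 0)
      Byte |= 0x80;  // more bytes follow
    Out.push_back(Byte);
  } while (V != 0);
}

// An integer attribute is encoded as ULEB128(tag) followed by ULEB128(value).
std::vector<uint8_t> encodeIntAttribute(unsigned Tag, unsigned Value) {
  std::vector<uint8_t> Out;
  appendULEB128(Out, Tag);
  appendULEB128(Out, Value);
  return Out;
}
```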
* Update the modules build to match r223802. | Richard Smith | 2014-12-12 | 1 | -1/+2
  llvm-svn: 224091
* Document that PassManager::add() may delete the pass right away. | Matthias Braun | 2014-12-12 | 1 | -10/+2
  Also remove redundant documentation:
  - doxygen will copy documentation to overridden methods.
  - Use \copydoc on PIMPL classes instead of replicating the text.
  llvm-svn: 224089
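  Why a caller must not touch a pass after handing it to `add()` can be shown with a toy owner. In this hypothetical `MiniPassManager`, `add()` takes ownership through `std::unique_ptr` and drops a pass whose name is already scheduled; the duplicate rule is an invented trigger for illustration, not the legacy PassManager's actual logic.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct PassStub {
  std::string Name;
  explicit PassStub(std::string N) : Name(std::move(N)) {}
  virtual ~PassStub() = default;
};

class MiniPassManager {
  std::vector<std::unique_ptr<PassStub>> Passes;

public:
  // Takes ownership of P; P may already be deleted when add() returns.
  void add(PassStub* P) {
    std::unique_ptr<PassStub> Owned(P);
    for (const auto& Existing : Passes)
      if (Existing->Name == P->Name)
        return;  // duplicate: Owned goes out of scope and deletes P
    Passes.push_back(std::move(Owned));
  }
  std::size_t size() const { return Passes.size(); }
};
```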
* Comment and minor code cleanup for GCStrategy (NFC) | Philip Reames | 2014-12-12 | 1 | -45/+83
  Updating comments to reflect the current state of the world after my recent changes to the ownership structure, and to better describe what a GCStrategy is and how it works.
  llvm-svn: 224086
* Add target hook for whether it is profitable to reduce load widths | Matt Arsenault | 2014-12-12 | 1 | -0/+10
  Add an option to disable the optimization that shrinks truncated loads of a larger type into loads of a smaller type. On SI, that shrinking prevents using scalar load instructions in some cases, since there are no scalar extloads.
  llvm-svn: 224084
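  A target hook of this kind is a virtual predicate with a permissive default that a target overrides. The sketch below is hypothetical: the class names, the method name, and the simplified `(OldBits, NewBits)` signature are assumptions and do not reproduce the real `TargetLowering` interface.

```cpp
// Generic lowering: narrowing trunc(load wide) to a narrower load is allowed.
struct LoweringSketch {
  virtual ~LoweringSketch() = default;
  virtual bool shouldReduceLoadWidth(unsigned OldBits, unsigned NewBits) const {
    (void)OldBits;
    (void)NewBits;
    return true;
  }
};

// A target without narrow scalar extloads (the commit cites SI) opts out,
// so the wide scalar load is kept.
struct SILikeLoweringSketch : LoweringSketch {
  bool shouldReduceLoadWidth(unsigned, unsigned) const override { return false; }
};

// Combine step: the width of the load emitted for trunc(load OldBits) to NewBits.
unsigned combineTruncOfLoad(const LoweringSketch& TLI, unsigned OldBits,
                            unsigned NewBits) {
  return TLI.shouldReduceLoadWidth(OldBits, NewBits) ? NewBits : OldBits;
}
```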
* Bitcode: Use unsigned char to record MDStrings | Duncan P. N. Exon Smith | 2014-12-11 | 2 | -0/+10
  `MDString`s can have arbitrary characters in them. Prevent an assertion that fired in `BitcodeWriter` because of sign extension by copying the characters into the record as `unsigned char`s.
  Based on a patch by Keno Fischer; fixes PR21882.
  llvm-svn: 224077
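  The underlying problem is that plain `char` is signed on many ABIs (x86 among them), so a byte such as 0xFF becomes -1 and then a huge value once it is widened into a 64-bit record field. The sketch contrasts the two copies; the function names are hypothetical, and the difference only shows up where `char` is signed.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

bool plainCharIsSigned() { return std::is_signed<char>::value; }

// Widening plain char: a negative char converts to a value near 2^64.
std::vector<uint64_t> recordFromChars(const std::string& S) {
  std::vector<uint64_t> Record;
  for (char C : S)
    Record.push_back(C);
  return Record;
}

// Reading each byte as unsigned char keeps every field in [0, 255].
std::vector<uint64_t> recordFromUnsignedChars(const std::string& S) {
  std::vector<uint64_t> Record;
  for (unsigned char C : S)
    Record.push_back(C);
  return Record;
}
```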
* Bitcode: Add METADATA_NODE and METADATA_VALUE | Duncan P. N. Exon Smith | 2014-12-11 | 1 | -2/+2
  This reflects the typelessness of `Metadata` in the bitcode format, removing types from all metadata operands.
  `METADATA_VALUE` represents a `ValueAsMetadata`, and always has two fields: the type and the value.
  `METADATA_NODE` represents an `MDNode`, and unlike `METADATA_OLD_NODE`, doesn't store types. It stores operands at their ID+1 so that `0` can reference `nullptr` operands.
  Part of PR21532.
  llvm-svn: 224073
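  Storing each operand as ID+1 frees the value 0 to encode a null operand. A hypothetical encode/decode pair, using -1 as the in-memory stand-in for `nullptr`:

```cpp
#include <cstdint>
#include <vector>

// In-memory stand-in for a null operand (hypothetical).
constexpr int64_t NullOperand = -1;

// Record value = ID + 1, with 0 reserved for a null operand.
std::vector<uint64_t> encodeNodeOperands(const std::vector<int64_t>& IDs) {
  std::vector<uint64_t> Record;
  for (int64_t ID : IDs)
    Record.push_back(ID == NullOperand ? 0 : static_cast<uint64_t>(ID) + 1);
  return Record;
}

// Inverse mapping: 0 -> null operand, N -> ID N - 1.
std::vector<int64_t> decodeNodeOperands(const std::vector<uint64_t>& Record) {
  std::vector<int64_t> IDs;
  for (uint64_t V : Record)
    IDs.push_back(V == 0 ? NullOperand : static_cast<int64_t>(V - 1));
  return IDs;
}
```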
* Bitcode: Add `OLD_` prefix to metadata node records | Duncan P. N. Exon Smith | 2014-12-11 | 1 | -2/+2
  I'm about to change these, so move the old ones out of the way.
  Part of PR21532.
  llvm-svn: 224070
* IR: Store MDNodes in a separate LeakDetector container | Duncan P. N. Exon Smith | 2014-12-11 | 1 | -0/+13
  This gives us better leak detection messages, like `Value` has.
  This also has the side effect of papering over a problem where `MachineInstr`s are added as garbage to the leak detector and then deleted without being removed. If `MDNode::getTemporary()` allocates an `MDNodeFwdDecl` in the same spot, the leak detector asserts. By separating `MDNode`s into their own container we lose that assertion. Since `MachineInstr` is required to have a trivial destructor, its usage of `LeakDetector` at all is pretty suspect. I'll be sending a patch soon to strip that out.
  llvm-svn: 224060
* [CodeGen] Add print and verify pass after each MachineFunctionPass by default | Matthias Braun | 2014-12-11 | 1 | -29/+32
  Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging, as you miss intermediate steps, and allows bugs to stay unnoticed when no verification is performed.
  To make this change practical, I added the possibility to explicitly disable verification. I used this option in all places where no verification was performed previously (because a lot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits.
  This is the 2nd attempt at this after realizing that PassManager::add() may actually delete the pass.
  llvm-svn: 224059
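  The scheduling policy described above, a printer and a verifier after every machine pass unless verification is explicitly disabled, can be sketched with strings standing in for passes. `PipelineSketch` and its parameters are hypothetical and only mirror the described behavior, not the interface LLVM's pass configuration actually uses.

```cpp
#include <string>
#include <vector>

class PipelineSketch {
  std::vector<std::string> Schedule;
  bool PrintAfterAll;

public:
  explicit PipelineSketch(bool Print) : PrintAfterAll(Print) {}

  // Schedules the pass, then an optional printer, then a verifier unless the
  // caller opts out (the escape hatch for passes that don't verify yet).
  void addMachinePass(const std::string& Name, bool VerifyAfter = true) {
    Schedule.push_back(Name);
    if (PrintAfterAll)
      Schedule.push_back("print after " + Name);
    if (VerifyAfter)
      Schedule.push_back("verify after " + Name);
  }

  const std::vector<std::string>& schedule() const { return Schedule; }
};
```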
* LeakDetector: Simplify code and fix comments, NFC | Duncan P. N. Exon Smith | 2014-12-11 | 1 | -18/+8
  Rather than requiring overloads in the wrapper and the impl, just overload the impl and use templates in the wrapper. This makes it less error prone to add more overloads (`void *` defeats any chance the compiler has at noticing bugs, so the easier the better).
  At the same time, correct the comment that was lying about not changing functionality for `Value`.
  llvm-svn: 224058
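  The refactoring pattern can be sketched as follows: only the implementation namespace is overloaded, and the public entry point is a single template that forwards to it, so an unsupported type fails to compile instead of silently binding to a `void *` overload. `ValueStub`, `MDNodeStub`, and the function names are hypothetical stand-ins, and the returned strings exist only to make the dispatch observable.

```cpp
#include <string>

struct ValueStub {};
struct MDNodeStub {};

namespace impl {
// The only place that needs a new overload when a new kind is tracked.
inline std::string addGarbage(const ValueStub*) { return "Value"; }
inline std::string addGarbage(const MDNodeStub*) { return "MDNode"; }
} // namespace impl

// Single forwarding template: overload resolution picks the impl, and a
// pointer to any other type is a compile-time error.
template <class T> std::string addGarbageObject(T* Object) {
  return impl::addGarbage(Object);
}
```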
* Remove a convoluted way of calling close by moving the call to the only caller. | Rafael Espindola | 2014-12-11 | 1 | -9/+3
  As a bonus we can actually check the return value.
  llvm-svn: 224046
* This reverts commit r224043 and r224042. | Rafael Espindola | 2014-12-11 | 1 | -32/+29
  check-llvm was failing.
  llvm-svn: 224045
* [CodeGen] Add print and verify pass after each MachineFunctionPass by default | Matthias Braun | 2014-12-11 | 1 | -29/+32
  Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging, as you miss intermediate steps, and allows bugs to stay unnoticed when no verification is performed.
  To make this change practical, I added the possibility to explicitly disable verification. I used this option in all places where no verification was performed previously (because a lot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits.
  llvm-svn: 224042
* [CodeGen] Let MachineVerifierPass own its banner string | Matthias Braun | 2014-12-11 | 1 | -1/+1
  llvm-svn: 224041
* Remove dead code. NFC. | Rafael Espindola | 2014-12-11 | 1 | -18/+0
  llvm-svn: 224029
* [AVX512] Add support for 512b variable bit shift intrinsics. | Cameron McInally | 2014-12-11 | 1 | -0/+25
  llvm-svn: 224028
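  Variable bit shifts shift each lane by its own count. A scalar reference model for 16 doubleword lanes, following the documented semantics of the x86 variable-shift instructions (VPSLLVD, VPSRLVD, VPSRAVD): counts above 31 produce 0 for the logical shifts and fill the lane with its sign bit for the arithmetic shift. The function names are hypothetical, and the intrinsics the commit adds are not modeled.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V16 = std::array<uint32_t, 16>;

// Per-lane logical shift left; counts > 31 yield 0.
V16 sllv(const V16& A, const V16& Count) {
  V16 R{};
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = Count[I] > 31 ? 0u : A[I] << Count[I];
  return R;
}

// Per-lane logical shift right; counts > 31 yield 0.
V16 srlv(const V16& A, const V16& Count) {
  V16 R{};
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = Count[I] > 31 ? 0u : A[I] >> Count[I];
  return R;
}

// Per-lane arithmetic shift right; counts > 31 behave like 31 (all sign bits).
// Written with unsigned ops to avoid relying on signed right-shift behavior.
V16 srav(const V16& A, const V16& Count) {
  V16 R{};
  for (std::size_t I = 0; I < A.size(); ++I) {
    uint32_t C = Count[I] > 31 ? 31u : Count[I];
    uint32_t Sign = (A[I] >> 31) ? ~0u : 0u;
    R[I] = C == 0 ? A[I] : (A[I] >> C) | (Sign << (32 - C));
  }
  return R;
}
```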
* AVX-512: Added all forms of COMPRESS instruction | Elena Demikhovsky | 2014-12-11 | 1 | -0/+102
  + intrinsics + tests
  llvm-svn: 224019
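  COMPRESS is the inverse of EXPAND: the lanes selected by the mask are packed, in order, into consecutive memory locations. A scalar reference model of the memory form on 16 doubleword lanes; the function name is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Writes the selected lanes of Src to Dst[0], Dst[1], ... in lane order and
// returns how many elements were written.
std::size_t compressStore(int32_t* Dst, const std::array<int32_t, 16>& Src,
                          uint16_t Mask) {
  std::size_t K = 0;
  for (unsigned J = 0; J < 16; ++J)
    if (Mask & (1u << J))
      Dst[K++] = Src[J];
  return K;
}
```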
* Make MultiClass::DefPrototypes own their Records to fix memory leaks. | Craig Topper | 2014-12-11 | 1 | -1/+1
  llvm-svn: 223998
* GCStrategy should not own GCFunctionInfo | Philip Reames | 2014-12-11 | 3 | -34/+31
  This change moves the ownership and access of GCFunctionInfo (the object which describes the safepoints associated with a function under GCRoot) to GCModuleInfo. Previously, this was owned by GCStrategy, which was in turn owned by GCModuleInfo. This made GCStrategy module-specific, which is 'surprising' given its name and other purposes.
  There are a few more changes needed, but we're getting towards the point where we can reuse GCStrategy for gc.statepoint as well.
  p.s. The style of this code ends up being a mess. I was trying to move code around without otherwise changing much. Once I get the ownership structure rearranged, I will go through and fix up spacing, naming, comments, etc.
  Differential Revision: http://reviews.llvm.org/D6587
  llvm-svn: 223994
* LiveInterval: Use range based for loops for subregister ranges. | Matthias Braun | 2014-12-11 | 1 | -0/+8
  llvm-svn: 223991
* LiveInterval: Use more range based for loops for value numbers and segments. | Matthias Braun | 2014-12-10 | 1 | -9/+7
  llvm-svn: 223978
* Move three methods only used by MCJIT to MCJIT. | Rafael Espindola | 2014-12-10 | 4 | -43/+24
  These methods are only used by MCJIT and are very specific to it. In fact, they are also fairly specific to the fact that we have a dynamic linker of relocatable objects.
  llvm-svn: 223964
* IR: Move call to dropAllReferences() to MDNode subclasses | Duncan P. N. Exon Smith | 2014-12-10 | 1 | -2/+2
  Don't call `dropAllReferences()` from `MDNode::~MDNode()`; call it directly from `~MDNodeFwdDecl()` and `~GenericMDNode()`.
  llvm-svn: 223904
* MCRegisterInfo: Add MCSubRegIndexIterator. | Matthias Braun | 2014-12-10 | 1 | -0/+33
  This iterator iterates over subregisters and their associated subregister indices at the same time.
  llvm-svn: 223893
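  Walking two parallel tables in lockstep is the core of such an iterator. The sketch below is hypothetical: the zero-terminated tables, the class name, and the method names are assumptions, not the TableGen'erated layout or the exact API of MCSubRegIndexIterator.

```cpp
#include <cstdint>

class SubRegIndexIteratorSketch {
  const uint16_t* SubRegs;     // zero-terminated list of subregisters
  const uint16_t* SubRegIdxs;  // parallel list: index of each subregister

public:
  SubRegIndexIteratorSketch(const uint16_t* SR, const uint16_t* Idx)
      : SubRegs(SR), SubRegIdxs(Idx) {}
  bool isValid() const { return *SubRegs != 0; }
  uint16_t getSubReg() const { return *SubRegs; }
  uint16_t getSubRegIndex() const { return *SubRegIdxs; }
  // Advance both lists together so each subregister stays paired with its index.
  SubRegIndexIteratorSketch& operator++() {
    ++SubRegs;
    ++SubRegIdxs;
    return *this;
  }
};
```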
* LiveIntervalUnion: Allow specification of liverange when unifying/extracting. | Matthias Braun | 2014-12-10 | 1 | -2/+8
  This allows it to add subregister ranges into the union.
  llvm-svn: 223890
* Tablegen'erate lanemasks for register units. | Matthias Braun | 2014-12-10 | 1 | -0/+39
  Now we can relate lanemasks in a virtual register to register units.
  llvm-svn: 223889
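  With a lane mask attached to each register unit, deciding which units of an assigned physical register are live reduces to a mask intersection. A hypothetical sketch; the `LaneMask` width, the struct, and the function name are assumptions.

```cpp
#include <cstdint>
#include <vector>

using LaneMask = uint32_t;

// A register unit of a physical register and the lanes it covers.
struct RegUnitWithLanes {
  unsigned Unit;
  LaneMask Lanes;
};

// Units whose lanes overlap the live lanes of the virtual register.
std::vector<unsigned> liveUnitsForLanes(const std::vector<RegUnitWithLanes>& Units,
                                        LaneMask LiveLanes) {
  std::vector<unsigned> Result;
  for (const auto& U : Units)
    if (U.Lanes & LiveLanes)
      Result.push_back(U.Unit);
  return Result;
}
```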
* RegisterCoalescer: Preserve subregister liveranges. | Matthias Braun | 2014-12-10 | 1 | -5/+6
  llvm-svn: 223888
* LiveInterval: Add removeEmptySubRanges(). | Matthias Braun | 2014-12-10 | 1 | -0/+4
  llvm-svn: 223887
* LiveIntervalAnalysis: Add subregister aware variants of pruneValue(). | Matthias Braun | 2014-12-10 | 1 | -3/+10
  llvm-svn: 223886
* LiveInterval: Introduce LiveQuery accessor for dead or live out values. | Matthias Braun | 2014-12-10 | 1 | -0/+6
  llvm-svn: 223885
* Add a flag to enable/disable subregister liveness. | Matthias Braun | 2014-12-10 | 2 | -0/+14
  llvm-svn: 223884
* LiveIntervalAnalysis: Adapt repairIntervalsInRange() to subregister liveness. | Matthias Braun | 2014-12-10 | 1 | -0/+9
  llvm-svn: 223883
* LiveIntervalAnalysis: Update SubRanges in shrinkToUses(). | Matthias Braun | 2014-12-10 | 1 | -5/+14
  llvm-svn: 223880
* LiveIntervalAnalysis: Make computeDeadValues() private. | Matthias Braun | 2014-12-10 | 1 | -11/+9
  llvm-svn: 223879
OpenPOWER on IntegriCloud