chrismillward.com © Chris Millward 2010-2011

All the information, code samples and anything else contained in the articles below may be used, modified and distributed without restriction of any kind. Feel free to contact me with any comments, etc.

27th September 2010 Windows Handles: STRICT or NO_STRICT?

Ever since time began, windows.h has included typedefs that declare a distinct dummy struct for each handle type. Something more-or-less equivalent to this:

    #define DECLARE_HANDLE(h) typedef struct handle_##h {}* h;
    DECLARE_HANDLE(HWND)

Why would you want to do this? It's just a trick to enforce compiler type checking, so that an HWND (Handle to WiNDow) is a different type from an HBRUSH (Handle to BRUSH). In reality, however, they are both just pointers (addresses, plain old numbers even...). So if you're writing native code and want to avoid a lot of casting, #define NO_STRICT before including windows.h; this means STRICT is not #defined and a handle becomes just a handle, i.e. a (void*). IMHO this actually simplifies writing code: because you're not spending all your time making sure you cast every handle to the correct 'type', you're actually less likely to make mistakes, such as the one I made when writing this (now corrected), where I forgot to call DeleteObject() on hPen.

For example, with STRICT #defined:

    HDC hdc = GetDC(hwnd);
    HPEN hPen = CreatePen(PS_SOLID, 1, GetSysColor(COLOR_BTNFACE));
    HPEN hPenPrev = (HPEN)SelectObject(hdc, hPen);
    ...
    SelectObject(hdc, hPenPrev);
    DeleteObject(hPen);
    ReleaseDC(hwnd, hdc);

Without the type checking enforced by STRICT, this becomes:

    void *hdc = GetDC(hwnd);
    void *hPen = CreatePen(PS_SOLID, 1, GetSysColor(COLOR_BTNFACE));
    void *hPenPrev = SelectObject(hdc, hPen);
    ...
    SelectObject(hdc, hPenPrev);
    DeleteObject(hPen);
    ReleaseDC(hwnd, hdc);

You could also use ‘HANDLE’ in place of ‘void*’. The point is, as far as Windows is concerned, all handles are the same kind of thing: an opaque, pointer-sized value that identifies whatever (Windows-) object you're working with. Understanding this takes you one step closer to understanding what a 'Windows program' actually is and what Windows itself actually does. (Note, STRICT is #defined by default, and if you're using MFC/ATL, you must leave it defined.)

A final point to remember is that casting a handle costs nothing. All a cast between pointer types does is direct the compiler to interpret something as something else, so there is absolutely zero runtime performance hit from repeated (compile-time) casting.

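As a rough illustration of both points (the type checking and the free cast), here is a small sketch that doesn't need windows.h at all. The DEMO_ names are made up for the example and only imitate the general shape of the real macro; the real definitions may differ in detail.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the windows.h macro: one dummy struct per handle type.
#define DEMO_DECLARE_HANDLE(name) struct name##_tag { int unused; }; typedef name##_tag *name

DEMO_DECLARE_HANDLE(DEMO_HWND);
DEMO_DECLARE_HANDLE(DEMO_HBRUSH);

// NO_STRICT-style handle: every handle is the same untyped pointer.
typedef void *DEMO_HANDLE;

// Strict handles are unrelated pointer types, so they don't convert implicitly...
static_assert(!std::is_convertible<DEMO_HWND, DEMO_HBRUSH>::value, "distinct handle types");
// ...but each of them converts to void*, as any object pointer does.
static_assert(std::is_convertible<DEMO_HWND, DEMO_HANDLE>::value, "converts to void*");

// A cast changes only the static type; the address itself is untouched.
bool SameAddressAfterCast(DEMO_HWND hwnd)
{
    DEMO_HBRUSH asBrush = reinterpret_cast<DEMO_HBRUSH>(hwnd);
    return static_cast<void*>(asBrush) == static_cast<void*>(hwnd);
}
```

The static_asserts are the compile-time half of the story; SameAddressAfterCast() simply confirms that the cast leaves the value alone.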

28th September 2010 Stack Frames In Native Code

For anyone not in the loop, stack frames are basically 'snippets' of stack memory used to contain auto variables (more accurately, variables of automatic storage class). The stack is basically a machine/OS-provided, rapidly-accessible data area. Variables are 'added' to the stack from the top down (i.e. the largest stack address is used first, less the size of the variable, and the stack then grows towards lower addresses). Read this article if you're not with me.

The way this works on x86-based processors is that a register, esp (or rsp in 64-bit environments), contains the current position (address) of the top of the stack. The processor itself maintains it: push, pop, call and ret all adjust esp as part of what they do, and the OS just gives each thread its own stack to start from. As you can gather from the previous paragraph, esp will decrease as the number/size of auto variables increases.

There is then a second register (ebp/rbp) which, by convention, is used to store the value of esp/rsp at the beginning of every function call. Its previous value is then restored just before the function returns. (Don't forget, a 'function' is simply a standardised block of code where certain conventions are used to make coding an application and interfacing with the OS less messy.)

A very useful side-effect of this approach is that if you dereference ebp (*ebp in C/C++ speak), you automatically get the previous value of ebp. Why? Because the standard function prologue first pushes the caller's ebp onto the stack and then copies esp into ebp, so ebp ends up holding the very address at which the old ebp was saved. This holds regardless of calling convention, provided the compiler hasn't optimised the frame pointer away. So you have a 'naturally-occurring' singly-linked list of stack frame addresses, one for each function in the current call chain.

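To make the linked list visible without any assembly, here is a toy model of a 32-bit stack in plain C++. Everything in it (ToyStack, EnterFunction, the addresses) is invented for the illustration; it only mimics what call / push ebp / mov ebp, esp do to the two registers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model: 'mem' is the stack memory (one element per 4 bytes) and
// esp/ebp are plain integers. Illustrative only, not real machine state.
struct ToyStack
{
    std::vector<std::uint32_t> mem;
    std::uint32_t esp;
    std::uint32_t ebp;

    explicit ToyStack(std::uint32_t bytes) : mem(bytes / 4, 0), esp(bytes), ebp(0) {}

    void Push(std::uint32_t value) { esp -= 4; mem[esp / 4] = value; }

    // Models: call f / push ebp / mov ebp, esp / sub esp, localBytes
    // (localBytes is assumed to be a multiple of 4).
    void EnterFunction(std::uint32_t returnAddress, std::uint32_t localBytes)
    {
        Push(returnAddress);   // what 'call' does
        Push(ebp);             // push ebp
        ebp = esp;             // mov ebp, esp
        esp -= localBytes;     // room for auto variables
    }

    // Follow [ebp] -> previous ebp until the chain ends at 0.
    std::vector<std::uint32_t> WalkFrames() const
    {
        std::vector<std::uint32_t> frames;
        for (std::uint32_t p = ebp; p != 0; p = mem[p / 4])
            frames.push_back(p);
        return frames;
    }
};
```

After two nested EnterFunction() calls, WalkFrames() returns the innermost frame first and the outermost last, each at a higher address than the one before.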

In brief, it's actually very simple (and inexpensive, performance-wise) to walk the stack and find out whether a given address belongs to an auto variable. The following is an actual function I use in native C++ code (more-or-less verbatim) to help avoid unnecessary memory copying. Note you must turn off MSVC's /Oy optimisation (frame-pointer omission) for this to work. (You can use a #pragma to override the project settings.)

    #pragma optimize("y", off)
    static BYTE __declspec(naked) IsAuto32(void*)
    {
        __asm
        {
            mov eax, [esp+4]    ; eax = the address being tested
            cmp eax, esp
            jbe exit0           ; at or below the stack pointer: not a live auto variable
            push ebx
            mov ebx, ebp        ; start with the caller's frame
        loop0:
            cmp ebx, 0
            je exit1            ; end of the frame chain: not found
            cmp eax, ebx
            mov ebx, [ebx]      ; step to the previous frame (mov leaves the flags alone)
            ja loop0            ; address is above this frame: keep walking
            mov eax, 1
            pop ebx
            ret
        exit1:
            pop ebx
        exit0:
            mov eax, 0          ; both failure paths return 0
            ret
        }
    }


7th October 2010 Binary Search Trees

Binary search trees (BSTs for short) are highly efficient data structures for storing sortable information. Rather than adding information and re-sorting, you insert each item directly into its sorted position, and cheaply performance-wise: around log2(n) comparisons for n items, as long as the tree stays reasonably balanced. Retrieval is equally cheap. There are various types of BSTs, and a wealth of information on the web and elsewhere, so I won't go into too much detail here.

The basic idea, though, is that whereas other complex data types (e.g. a linked list) may have nodes with pointers forwards and backwards, BST nodes have '<' and '>=' pointers (referred to respectively as 'left' and 'right'). When you insert into a binary tree, for example, you follow the pointer trail of lefts (<) and rights (>=) until you find the appropriate place to put the new node. Since at any given node you can 'branch off' in one of two directions, the result can be viewed, naturally, as a tree structure.

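Before looking at a fuller class, the left/right trail can be sketched in a few lines. This is a stripped-down illustration for plain ints (IntNode, Insert, InOrder and SortedViaTree are names made up for it), using std::unique_ptr so there is nothing to clean up by hand.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct IntNode
{
    int value;
    std::unique_ptr<IntNode> left;    // values <  value
    std::unique_ptr<IntNode> right;   // values >= value
    explicit IntNode(int v) : value(v) {}
};

// Follow the trail of lefts (<) and rights (>=) to the first empty slot.
void Insert(std::unique_ptr<IntNode> &root, int v)
{
    std::unique_ptr<IntNode> *slot = &root;
    while (*slot)
        slot = (v < (*slot)->value) ? &(*slot)->left : &(*slot)->right;
    slot->reset(new IntNode(v));
}

// In-order traversal: left subtree, then the node, then the right subtree.
void InOrder(const std::unique_ptr<IntNode> &node, std::vector<int> &out)
{
    if (!node)
        return;
    InOrder(node->left, out);
    out.push_back(node->value);
    InOrder(node->right, out);
}

// Insert everything, then read it back in order.
std::vector<int> SortedViaTree(const std::vector<int> &input)
{
    std::unique_ptr<IntNode> root;
    for (int v : input)
        Insert(root, v);
    std::vector<int> out;
    InOrder(root, out);
    return out;
}
```

Reading the tree back in-order gives the values in ascending order without any separate sorting step, which is the whole attraction.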

A simple BST class might look like this (implementing in declaration for brevity):

    #include <windows.h>   // BYTE, DWORD
    #include <cassert>

    template<class T, typename K>
    class BST_Std
    {
    public:
        BST_Std() :
            _pRootNode(0),
            _cbKeyOffset((DWORD)-1)
        {
        }
        virtual ~BST_Std()
        {
            RemoveAll();
        }
    public:
        // 'key' must be a member of *pItem. Its offset within T is recorded on the
        // first insert and must be the same for every later insert.
        void Insert(T *pItem, K &key)
        {
            assert((BYTE*)&key >= (BYTE*)pItem);
            assert((BYTE*)&key + sizeof(K) <= (BYTE*)pItem + sizeof(T));
            DWORD cbKeyOffset = (DWORD)((BYTE*)&key - (BYTE*)pItem);
            assert(_cbKeyOffset == (DWORD)-1 || cbKeyOffset == _cbKeyOffset);
            _cbKeyOffset = cbKeyOffset;
            _Insert(pItem);
        }
        // Once the key offset is known, the key can be found from the item alone.
        // (A default argument can't refer to another parameter, hence the overload.)
        void Insert(T *pItem)
        {
            Insert(pItem, _GetKey(pItem));
        }
        T *Find(K key)
        {
            return _Find(key);
        }
        void Remove(K key)
        {
            _Remove(key);
        }
        void RemoveAll()
        {
            _RemoveAll();
        }
    protected:
        // Nodes refer to items; they don't own them.
        struct Node
        {
            Node() :
                _pItem(0),
                _pLeft(0),
                _pRight(0)
            {
            }
            Node(T *pItem) :
                _pItem(pItem),
                _pLeft(0),
                _pRight(0)
            {
            }
            T *_pItem;
            Node *_pLeft;
            Node *_pRight;
        };
        Node *_pRootNode;
        DWORD _cbKeyOffset;
    private:
        K &_GetKey(T *pItem)
        {
            assert(_cbKeyOffset != (DWORD)-1);
            K &rKey = *(K*)((BYTE*)pItem + _cbKeyOffset);
            return rKey;
        }
    protected:
        virtual void _Insert(T *pItem)
        {
            Node *pNew = new Node(pItem);
            if(_pRootNode == 0)
            {
                _pRootNode = pNew;
                return;
            }
            const K &newKey = _GetKey(pItem);
            Node *p0 = _pRootNode;
            for(;;)
            {
                // Compare the NEW key with each node: '<' goes left, '>=' goes right.
                Node *&rpNext = (newKey < _GetKey(p0->_pItem)) ? p0->_pLeft : p0->_pRight;
                if(rpNext == 0)
                {
                    rpNext = pNew;
                    return;
                }
                p0 = rpNext;
            }
        }
        // Optionally reports the matching node and its parent (0 for the root).
        virtual T *_Find(K key, Node **ppNode = 0, Node **ppPrevNode = 0)
        {
            Node *pPrev = 0;
            for(Node *p = _pRootNode; p != 0; )
            {
                const K &curKey = _GetKey(p->_pItem);
                if(curKey == key)
                {
                    if(ppNode)
                        *ppNode = p;
                    if(ppPrevNode)
                        *ppPrevNode = pPrev;
                    return p->_pItem;
                }
                pPrev = p;
                p = (key < curKey) ? p->_pLeft : p->_pRight;
            }
            return 0;
        }
        virtual void _Remove(K key)
        {
            Node *pNode = 0, *pPrevNode = 0;
            if(_Find(key, &pNode, &pPrevNode) == 0)
                return;
            if(pNode->_pLeft != 0 && pNode->_pRight != 0)
            {
                // Two children: move the in-order successor's item into this node,
                // then unlink the successor (which has no left child) instead.
                Node *pSucc = pNode->_pRight;
                pPrevNode = pNode;
                while(pSucc->_pLeft != 0)
                {
                    pPrevNode = pSucc;
                    pSucc = pSucc->_pLeft;
                }
                pNode->_pItem = pSucc->_pItem;
                pNode = pSucc;
            }
            // pNode now has at most one child; splice that child into its place.
            Node *pChild = (pNode->_pLeft ? pNode->_pLeft : pNode->_pRight);
            if(pPrevNode == 0)
                _pRootNode = pChild;
            else if(pPrevNode->_pLeft == pNode)
                pPrevNode->_pLeft = pChild;
            else
                pPrevNode->_pRight = pChild;
            delete pNode;
        }
        virtual void _RemoveAll()
        {
            while(_pRootNode != 0)
                _Remove(_GetKey(_pRootNode->_pItem));
            _cbKeyOffset = (DWORD)-1;
        }
    };


18th October 2010 You've heard of Copy-On-Write... Is there such a thing as 'Copy-On-Would-Be-Overwritten?'

Well, yes. At first glance, all you need is a so-called 'smart pointer' class that behaves exactly like a real pointer, except that when the pointer object goes out of scope, the memory it was pointing to (assuming there is a non-0 reference count) is copied to a new location, and this new location is returned by the * operator from then on by any and all other smart pointers pointing to the same data.

However, there is a problem: what happens if the scopes of a given smart pointer and the original data are different? Let's say you have a plain old null-terminated string, which is allocated on the stack in Function1. Function1 then calls Function2, passing the start address of the string. Function2 then assigns it to a smart pointer, does various other things, and finally calls Function3, this time passing the smart pointer object. Function3 performs some operation using the string and, crucially, assigns a global smart pointer the 'value' of the one passed to it, before returning. Function2's smart pointer goes out of scope; its destructor is called, and Function2 returns. Finally, Function1 returns.

    SmartPtr<LPWSTR> _pSmartStr;
    void Function3(SmartPtr<LPWSTR> pSmartStr)
    {
        ...
        _pSmartStr = pSmartStr;
        MessageBox(0, pSmartStr, pSmartStr, 0);
        ...
    }
    void Function2(LPWSTR psz)
    {
        ...
        if(...)
        {
            SmartPtr<LPWSTR> pSmartStr = psz;
            Function3(pSmartStr);
        }
        ...
    }
    void Function1()
    {
        WCHAR szTest[] = L"TestString";
        Function2(szTest);
    }

It's a crude example, but in this scenario, the data only needs to be copied as Function1 returns, not Function2, and not when the smart pointer's destructor is called as the object 'goes out of scope' from 'inside Function2's if statement'. ({} scoping is for syntax and variable name re-use purposes; it has no effect on memory allocation on the stack itself, but does affect construction and destruction of objects.) As you can see, the data does need to be copied at some stage before (actually, up to and including) Function1 returns because of the global _pSmartStr.

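The point about {} scoping and destruction is easy to check with a throwaway class that logs its own construction and destruction. Tracer, g_log and Function2Like are invented for this sketch; they are not part of the smart pointer itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> g_log;   // hypothetical event log, just for the demonstration

struct Tracer
{
    std::string name;
    explicit Tracer(const std::string &n) : name(n) { g_log.push_back("ctor " + name); }
    ~Tracer() { g_log.push_back("dtor " + name); }
};

void Function2Like(bool condition)
{
    if (condition)
    {
        Tracer inner("inner");
        g_log.push_back("end of if-block");
    }   // inner's destructor runs here, at the closing brace...
    g_log.push_back("rest of function");   // ...well before the function returns
}
```

After Function2Like(true) the log shows the destructor entry before "rest of function", which is exactly why a destructor-driven copy would happen too early in the scenario above.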

Implementation

So how do we go about writing a smart pointer class capable of taking the above into account and never unnecessarily copying data, whilst also not producing more of a performance burden than the simple method of just copying when a smart pointer goes out of scope?

Well, firstly, let's deal with the scope vs. function issue. This means we need to 'destruct' each smart pointer object at the end of the function it's declared in, not at the end of its scope. The simplest way to do this (arguably) is to have a single global object that does all the real work, with the smart pointer's ctors and dtors just calling functions of this global object. The global object is then responsible for the functional 'construction' and 'destruction' of each smart pointer object.

    static class _AutoPtrImp
    {
        template<typename T>
        friend class AutoPtr;
    private:
        // Stub: will register the pointer and hand back its index via rIndex.
        void _OnPtrConstruct(void *pAuto, DWORD cbSize, DWORD &rIndex)
        {
        }
        // Stub: will release whatever is associated with dwIndex.
        void _OnPtrDestruct(DWORD dwIndex)
        {
        }
    } _autoPtrImp;
    template<typename T>
    class AutoPtr
    {
    public:
        AutoPtr() :
            _index(0)
        {
        }
        // 'index' is not used yet in this skeleton.
        AutoPtr(T *pAuto, DWORD count, DWORD index) :
            _index(0)
        {
            _autoPtrImp._OnPtrConstruct(pAuto, count * sizeof(T), _index);
        }
        ~AutoPtr()
        {
            _autoPtrImp._OnPtrDestruct(_index);
        }
    private:
        DWORD _index;
    };

Note 'index' is a number we'll use to represent a given set of instances of AutoPtr. (We can't just use the pointer address because if we copy the data, that will change!)

The next thing we need is some kind of symbol table to add to _AutoPtrImp so we can keep track of each pointer (by index) and any data we need to associate with it. We'll use a self-balancing AVL-BST class implementation (see previous article and here for the AVLTree implementation).


13th November 2010 Why won't VS 2005 let me add controls to an Inherited Form in C#?

The answer is twofold. Firstly, the designer will make your controls on the base form private by default, and so anything in the derived form is unable to access those controls. Secondly, it depends what controls are on the base form.

Example scenario: say you're converting a web app to WinForms in C# (not something that happens very often, but bear with me), and you want to imitate some of the web UI... So you have a 'master' form instead of a master page, and all your other forms inherit from this one. (We'll deal with the question of how to switch from one form to another later.) In VS2005, if you open, say, FormDefault (intended to replace Default.aspx, and which you've inherited from FormMaster) in the Designer, and you have a control such as a TableLayoutPanel on FormMaster, you'll find you can't create any new controls on top of, or within it.

The solution to this is fairly simple though. If you make the control on the base form protected or public, you can then manually add, say, a Panel to the TableLayoutPanel from within the Designer code-behind file for your derived form (yes, you have to edit the region that says "DO NOT EDIT!"), set its Dock property to DockStyle.Fill as well, and then you have a new control to start from.

Unfortunately you then have another problem. Every time the designer synchronises itself on the derived form, either during a build or after you add some controls, the bit of code you added to the .Designer.cs will disappear. This is because the designer doesn't actually execute any of the code in InitializeComponent(), it just serialises and deserialises using some internal system to keep track of what should be there and what shouldn't. Since there was never any reason for your code (which would be something like this.layout0.Controls.Add(this.panel0);) to be there in the first place (at least as far as the Designer can tell) it won't survive the designer synchronisation process.

Fortunately, there is a solution to this, as the framework allows you to add your own code to what the designer does when serialising and deserialising. For the example described:

    using System;
    using System.Windows.Forms;
    using System.ComponentModel;
    using System.ComponentModel.Design;
    using System.ComponentModel.Design.Serialization;
    using System.CodeDom;

    namespace YourAppRootNamespace
    {
        [DesignerCategory("")]
        public static class ContentPanelWrapper
        {
            [DesignerSerializer(typeof(ContentPanelSerialiser), typeof(CodeDomSerializer))]
            public class ContentPanel : Panel
            {
            }

            internal class ContentPanelSerialiser : CodeDomSerializer
            {
                public override object Serialize(IDesignerSerializationManager manager, object value)
                {
                    // Let the base class (Panel) serialiser do the normal work first.
                    CodeDomSerializer baseClassSerializer = (CodeDomSerializer)manager.
                        GetSerializer(typeof(ContentPanel).BaseType, typeof(CodeDomSerializer));
                    object codeObject = baseClassSerializer.Serialize(manager, value);
                    if (codeObject is CodeStatementCollection)
                    {
                        // Then append the hand-written lines the designer would otherwise drop.
                        string sHack0 = "contentPanel0.ResumeLayout();";
                        string sHack1 = "contentPanel0.TabStop = false;";
                        string sHack2 = "layout0.Controls.Add(contentPanel0);";
                        CodeStatementCollection statements = (CodeStatementCollection)codeObject;
                        CodeSnippetStatement statement0 = new CodeSnippetStatement(sHack0);
                        CodeSnippetStatement statement1 = new CodeSnippetStatement(sHack1);
                        CodeSnippetStatement statement2 = new CodeSnippetStatement(sHack2);
                        statements.Add(statement0);
                        statements.Add(statement1);
                        statements.Add(statement2);
                    }
                    return codeObject;
                }
            }
        }
    }

You'll then need to add a ContentPanel to your derived form. The easiest way to do this is again to edit the .Designer.cs file, copying the commenting sequence, declaration and 'instantiation'. Bear in mind that if you wrap the ContentPanel as I have, to stop the designer wanting to design the ContentPanel as well, you'll need to refer to ContentPanelWrapper.ContentPanel in both the contentPanel0 = new ...; and the private ... contentPanel0; parts as well.

[Just as a brief aside: every WinForms control, layout panels included, is backed by its own window handle once the form is shown, so this hack does add one more child window at runtime. For a single extra panel that is a small footprint, so employing it doesn't cost as much as you might imagine.]

One final note: be careful about adding comments; if you add them as CodeSnippetStatement objects, the designer will ignore the lot (including the ones that aren't comments). There is a class you can use that represents comments as comments (CodeCommentStatement), if you want to annotate the modified portion of InitializeComponent(), which may or may not work (haven't tested). In my example, I've omitted this to highlight the modified portion, which gets through without issue.


16th December 2010 Why does the forms designer in VS2005 prefix everything with this?

You can work out a bit about how the forms designer works, given the understanding that it does not execute code. The InitializeComponent() function is, however, executed by the program, so of course the designer prefixes everything with this. to avoid conflicts with any properties/methods/anything that exists in the class that you create. The ironic element to this is that it makes no difference to the designer itself at all, when displaying the form in design mode, because it doesn't care about anything that isn't a designable part of the form. I am probably within a couple of standard deviations of the mean if I say that the winforms designer in VS2005 simply ignores the this. snippets when parsing.

26th September 2011 Maintaining the software-User Contract: Startup Delays

Some applications take a long time to start up (and by start up I mean the time it takes for some kind of meaningful, usable UI to appear after double-clicking a shortcut icon or otherwise launching the application).

Of course most if not all applications do need to do at least some initialisation work, and some applications are just very large in size, so the Windows loader takes longer to load them into memory. (UPX can be useful here.) The combined result of these two factors is the type of application startup delay I've just described, and it's measurable: the time taken between asking the Shell to execute an application and a meaningful, functional and, most importantly, responsive UI appearing.

From the user's perspective, these delays are one or more of the following: confusing, irritating, annoying and/or, most importantly, contrary to what the user expects to happen. Of course, if the executable is large, and has a good reason for being large, then it's the OS's fault for not loading it quicker. If it's a .NET application, it's the .NET framework's fault for taking so long to host the MSIL-level executable. The problem is, from the user's perspective, it's your icon they've clicked on, and it's your application that has not responded in a timely fashion.

So what do you do about it? Well, it's probably fair to say most people dislike so-called 'splash screens', which can often be gratuitous with no indication that they perform any actual function, and thus are often dismissed as just a gimmicky waste of time. You might even conclude that the splash screen itself must slow down the startup of the application, although whether that's what actually happens or not is another matter.

The bottom line, however, is that even an experienced user who does understand to some extent how Operating Systems work and, for example, that a .NET application requires the Framework to host it because it's not native machine code, and that these things all take time etc. etc... even those users will experience some frustration and confusion at a startup delay, simply because the expectation isn't matched by what actually happens.

So here's my solution: if your application takes a long time to start up, write a tiny win32 executable that displays a simple window with product name, version and copyright, with a simple progress bar that reaches 100 when the actual application UI is up and ready for user input. What I am advocating here is the inclusion of this tiny executable for no functional reason at all: it doesn't do anything. The progress bar simply moves up in small random steps until the target application is up and running, whereupon it jumps to 100 and remains visible for 200 ms so it's clear 100% has been reached. It does not in any way interfere with the actual startup time of the actual application. And resist the temptation to make the window topmost. If the timing doesn't quite work out, the target application window will naturally appear over the top, and the user is satisfied.

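The stepping logic itself is only a few lines. The following is a sketch of one way to do it, not the code from StartNRT.exe: NextProgress, the cap of 95 and the maximum step of 3 are all assumptions made for the illustration. How the launcher detects that the target UI is ready is left out entirely.

```cpp
#include <cassert>
#include <random>

// Assumed values; the description above only fixes the jump to 100 and the 200 ms hold.
const int kMaxWhileWaiting = 95;   // never show more than this before the app is up
const int kMaxStep = 3;            // upper bound for one 'small random step'

// Returns the next progress-bar position (0..100).
int NextProgress(int current, bool targetReady, std::mt19937 &rng)
{
    if (targetReady)
        return 100;                                   // the real UI is up: jump to 100
    std::uniform_int_distribution<int> step(0, kMaxStep);
    int next = current + step(rng);
    return (next > kMaxWhileWaiting) ? kMaxWhileWaiting : next;
}
```

You would call this from a timer tick, pass the result to the progress bar, and close the window roughly 200 ms after it first returns 100.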

Here's Main.cpp of StartNRT.exe, which uses exactly this approach to launch my .NET 2 C# app, Network Response Tester. The DLUConv functions are self-explanatory, but you can have those as well.

Main.cpp
DLUConv.h
DLUConv.cpp
stdafx.h

Using MSVC 2010 with the CRT statically linked, the win32 executable is 36KB. Packing it with UPX reduces that to 18KB, with the resulting load time remaining approximately the same on Windows 7, i.e. negligible.

2nd October 2011 ‘Chicken Picken’ Performance: The Window Property List

If you want to write high quality applications for Windows (or any platform for that matter) a good skill to incorporate into your process is the ability to identify opportunities 'on the fly' to squeeze out an extra bit of performance (however trivial it might seem at the time) provided there is no impact on other aspects of your project. Don't forget, you might end up spending hours profiling and modifying your code to reach a performance target, and you might have to make decisions where clarity, maintainability, portability, or some other factor is compromised by your performance 'improvement'... so missing opportunities that cost nothing at all for your project, and which also cost nothing at all to notice in the first place... is all but unforgivable.

A good example of this is the Window Property List, provided by the win32 functions SetProp(), GetProp() and RemoveProp(), which basically allows you to store any type for which sizeof(type) <= sizeof(HANDLE) (== sizeof(void*)), i.e. 32 bits in a 32-bit process and 64 bits in a 64-bit one (so pointers included), using a string as key. This can be especially useful if you are using a Window Procedure that is assigned to more than one window. In that case, the traditional 'Petzold method' of using static variables within the wndproc to store window-level attributes (fonts, colours, bitmaps etc.) is no good (but remains fine for 1:1 window:wndproc scenarios).

I'll skip further discussion of how the Window Property List is used; you can find all you need on msdn. There is, however, a very simple way to squeeze some performance from using it, and that is simply to use short strings as your keys. You can use const strings or #defines where the variable name describes what the property holds; the actual string key can be more-or-less anything. Given that, and the fact that however Windows stores your strings and associated properties, comparing short strings will always be faster than comparing long strings, why not make them as short as possible? As in L"_1", L"_2", L"_3" ...

Note using L'_' as a prefix works on Windows 2000 through Windows 7 (all flavours) and I can say that with certainty because I've tested it on all of them. You do run into problems if you use certain other characters and (if memory serves) omitting the prefix char altogether doesn't work under certain circumstances. Of course you don't have to use numbers... nothing wrong with L"_a", L"_b", L"_c" ... Or you could use hex digits (L'0' - L'F'), which give you a 16-value single-char sequence whose ordering is unambiguous, making it the optimum choice.

By substituting meaningless 2-3 char strings for 'descriptive' strings in this way you instantly get two benefits. Firstly, and most obviously, Windows will take less time to create, locate or delete any property you request because it only has to compare short, standardised strings. Secondly, your property key strings are close to sequential, but likely to be used in something like 'ascending-meets-near-random-remains-near-random' order; that is, you use each property close to 'in order' initially, but then access depends on what the user does (likely to be true in the average case). That means you are also providing a near-optimal usage pattern for a binary tree (balanced or unbalanced) or a repeatedly-sorted-list-or-array-with-binary-search, which are the two most likely approaches Microsoft developers would (one imagines...) adopt for implementing the property list. What you can't know without inside knowledge is whether the sequence L"_A", L"_a", L"_B", L"_b" ... is better or worse than L"_A", L"_B", ... L"_a", L"_b" ... so unless you have that inside knowledge, hex is best.

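The naming scheme can be sketched as a tiny helper. MakePropKey and the PropSlot names below are invented for the illustration; the descriptive names stay in your source while the strings handed to Windows stay two characters long.

```cpp
#include <cassert>
#include <string>

// Builds the short key for slot 0..15: L"_0" ... L"_9", L"_A" ... L"_F".
// Only the low four bits of 'slot' are used.
std::wstring MakePropKey(unsigned slot)
{
    static const wchar_t kHexDigits[] = L"0123456789ABCDEF";
    std::wstring key(L"_");
    key += kHexDigits[slot & 0xF];
    return key;
}

// Hypothetical per-window properties: readable names here, short strings at runtime.
enum PropSlot
{
    PROP_FONT = 0,
    PROP_BACK_BRUSH = 1,
    PROP_BITMAP = 10
};
```

In a Unicode build you would then write something along the lines of SetProp(hwnd, MakePropKey(PROP_FONT).c_str(), (HANDLE)hFont), with matching GetProp() and RemoveProp() calls. If the allocation bothers you, plain #defines for L"_0" to L"_F" do the same job.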

5th October 2011 On The 'Synchronicity' of SendMessage() And Related Concerns

Under normal circumstances when you use SendMessage() you can assume that execution of the calling (usually main) thread will immediately stop and the window procedure associated with the HWND you pass as first parameter will be invoked with the message value and WPARAM/LPARAM you pass. But is that what actually happens? When the window was created by the calling thread, the documentation says the window procedure is simply called as a subroutine. When it belongs to another thread, however (well, to the best of my knowledge at least), that might be the effect but there is in fact a queue for sent messages as well as posted messages. Be careful here though... this doesn't mean that sending a message is in any way like posting a message.

The principal difference is that when you call SendMessage() your calling thread does block until the function returns, which only happens after the window procedure has processed the message and returned; whereas, with PostMessage(), your thread continues and the window procedure will only receive and process the message later; there is no return path to your original call. Other than that though, the way both messaging approaches work is actually fairly similar. If you think about it, that must be the case because window handles are system-global. And notwithstanding UAC or any other protection issues, in theory not only can any thread call SendMessage() on your window, but any process can as well. And that means that at the point you send your message, there could already be several others waiting to be processed. So, clearly, the method Windows employs must be based on a queue, otherwise you'd have messages being processed in something other than FIFO order, which for what should be obvious reasons can't be allowed to happen.

So, when your application's message loop calls GetMessage(), or when you call PeekMessage() anywhere outside the main message loop, your thread is actually held up while any queued sent messages are delivered to their window procedures. Without being part of the Microsoft team involved in the scheduling parts of the OS, I can't say whether all pending sent messages are handled at once, or whether certain posted messages are perhaps considered 'harmless' and allowed to go through the message loop, or even whether and how each queued sent message might be distributed across different processes, but it's dangerous to think in those terms anyway, because without the actual knowledge it's just speculation. Always respect the way things are specified when you write code, even if it seems obvious that something could be gained by not doing so.

But note that I said respect the specification, not follow it blindly, because there is some performance leverage you can reasonably gain in some specific cases, without having an adverse effect on the smooth running of the OS. Why would bypassing the sent messages queue interfere with the smooth running of the OS? Well, for instance, if every running process were to call a system window procedure directly, let's say with a WM_SETREDRAW message (which you can reasonably assume will at most set or reset a flag and invalidate the window), then the nice, ordered, FIFO operation that would be guaranteed across the OS by the sending messages paradigm (i.e. queued sent messages) would be lost, and what you'd have left would be a free-for-all. FIAO(?)

Understanding that principle is enlightening when you apply it to how Windows is supposed to work, and how your message loop actually plays a direct role. (There are actually some much more advanced things the message retrieval functions do that are beyond the scope of this piece.) But is it safe to 'play dirty' and gain some performance for your application by calling window procedures directly instead of sending them messages? Well, the answer is sometimes, which is why the rule is always to stick to specification.

OK, so when is it safe to bypass the sent messages queue and directly call a window procedure? The answer is (a) provided you don't do it too often and (b) provided your call doesn't have any system-wide side-effects of any kind whatsoever. And you have to be as certain of that as you can before you decide to do it. For instance, the example I gave before of calling a system window procedure directly with WM_SETREDRAW is likely to be safe, because it doesn't need to do anything other than at the window level. If you do it with WM_SETTEXT, however, you will not only be competing with the entire system if any other application happens to be doing the same thing, you will also be competing with the sent messages queue. And the sent messages queue might be about to call the same window procedure you're trying to call, having set some other context with some other stack-based string that the target wndproc's handler for WM_SETTEXT is supposed to change the text to, and at that point the danger is your call will be lost because you're doing it 'out-of-routine'; you might interrupt the system briefly with your call, but the system (perhaps at the direction of another application's message loop) will resume what it was doing and call the wndproc again, meaning your attempt to change the text of something is 'overridden'.

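The 'overridden' WM_SETTEXT scenario can be reproduced in a toy model. To be clear, nothing below reflects how Windows is actually implemented; ToyWindow and its members are invented purely to show the ordering problem between a queued request and a direct call.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Toy model only: a 'window' with some text and a FIFO of pending sent messages.
struct ToyWindow
{
    std::string text;
    std::deque<std::string> pendingSetText;   // queued WM_SETTEXT-style requests

    // Stands in for the window procedure handling a set-text message.
    void WndProcSetText(const std::string &s) { text = s; }

    // Stands in for another thread sending a message to this window.
    void QueueSetText(const std::string &s) { pendingSetText.push_back(s); }

    // What the owner's message retrieval does in this model: drain in FIFO order.
    void ProcessSentMessages()
    {
        while (!pendingSetText.empty())
        {
            WndProcSetText(pendingSetText.front());
            pendingSetText.pop_front();
        }
    }
};

// Someone else's request is already queued; we call the procedure directly;
// then the owner gets round to processing its queue.
std::string DirectCallGetsOverridden()
{
    ToyWindow w;
    w.QueueSetText("from the queue");
    w.WndProcSetText("direct call");   // jumps the queue...
    w.ProcessSentMessages();           // ...and is then overwritten
    return w.text;
}
```

The final text is the queued one, so the direct call achieved nothing except a brief window in which the two requests were out of order.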

There are many, many pitfalls like this that ignoring the rules could result in. And I suspect Windows probably incorporates some 'backcompat', perhaps even some 'idiot-proofing' to protect against some of those pitfalls, so the potential for entangling your application and every other on the system, perhaps including the system, in a multi-level 'web of mis-timings and mis-firings' is (in reality) huge.

So here are my rules for safely gaining a slight performance advantage in some cases by bypassing the sent messages queue and directly calling window procedures:

1. Make sure the call you make has no side-effects at the system level whatsoever. If you can't establish something close to that by simple application of logic and making better-than-reasonable assumptions, establish it with a throw-away prototype or unit test and try it out on all flavours of Windows under varying conditions.

2. Don't do it in all cases where you could. Why not? Because if, given a particular case where you could do this, your decision as to whether to do so or not is random (which is the effect of my first statement), then it follows that if you do produce a 'clash' of some sort, it becomes extremely unlikely that you'll crash the OS completely. I could prove that with some maths, but it's easier to say it in words. Once you introduce randomness into your decision-making, and you then apply the assumption that all, or at least most, other applications do the same, then it follows statistically (central limit theorem given sufficiently large number of samples, or any other proof you like...) that what I'm calling 'clashes' (i.e. where out-of-order sequence produces unexpected or undefined results) will follow a normal distribution as opposed to any other. And that means that you have to be within (n) standard deviations of the mean for catastrophic consequences to occur. The actual value of (n) will obviously vary, but having this scenario is a lot better than an entirely random pattern (where all applications 'cheat' like this at every opportunity), in which some O(1) percentage of every (n) potential cases become actual cases.

3. Only do it where it will actually improve performance, not just for the sake of doing it at all, which basically boils down to: only do it for small operations (such as a WM_SETREDRAW), where the overhead you get from having to wait for an application or system process to do something that results in a queued sent message being processed, coupled with all the additional function calls involved (including the actual call to SendMessage()) is, while still tiny, likely to be more than a quick call directly to the wndproc (or even DefWindowProc() if that's where it ends up anyway).

And that's it. Just for a bit of further info, if you look at the msdn documentation for MsgWaitForMultipleObjects() for instance, you'll see that one of the amalgamated flags is QS_ALLEVENTS, and that another is QS_ALLINPUT. Well if your program is 'idle' then surely only 'events' should warrant a response? But if you assume (i.e. accept) QS_ALLINPUT (which is == QS_ALLEVENTS|QS_SENDMESSAGE) you have to realise that isn't true, so there's one way of proving that GetMessage() and PeekMessage() do more than check the posted messages queue and return message info if a posted message is in the queue. Not that you need proof, it's documented all over msdn anyway, and common knowledge for any seasoned win32 programmer.