adjusting sound levels for all .wav files

Hi there,

I have many .wav music and some .mp3 files.

The problem is when I set them to play in a playlist I create in VLC Media Player they don’t all play at the same sound level. I’m guessing they were recorded at different sound levels?

Is there a way to make all my .wav and .mp3 files play at the same level?

Is there a magic Audacity wand I can wave over all my .wav and MP3 files so they automatically play at a certain sound level?

I find I have to walk over and crank the speakers on some .wav and/or .mp3 songs and then return to turn the speakers down on other .wav songs as well as some .mp3 songs.

Thank you!

There is a “magic wand”, but not in Audacity :wink:

For MP3 files, it is best to make them “play” at the same volume without actually changing the audio data, because changing the data requires that the file is re-encoded to MP3, which will slightly reduce the sound quality. A program called MP3Gain can do this:

For WAV files, there is a similar program (same algorithm for determining loudness) and it is called “wavegain”.
There is also a graphical front end for wavegain. See:
Wavegain is not completely lossless, because it does change the files, but it is virtually lossless because WAV is such a high quality format.

A man’s praise in his own mouth stinks, but you may have a look at this tool:
Dynamic Audio Normalizer

(Also integrates into Audacity as a VST plug-in or, if you want to avoid re-encoding, into Winamp as a DSP plug-in)

Actually, mp3gain does avoid any quality loss by directly operating on the compressed MP3 data.

Other tools, including standard audio editors, first decompress the MP3 data, then perform the actual normalization on the “raw” PCM data and finally compress back to MP3 again (this is where the loss happens).

Just FYI - MP3Gain (and the other ReplayGain related applications) will reduce the volume of many (maybe most) of your files.

Some people are surprised by that, but there’s a good reason… Most commercial releases are normalized (maximized) so even the quiet-sounding songs often have maximized 0dB peaks. That means you can’t match volumes by boosting the quiet-sounding songs*, so you have to make the loud songs quieter.


  • You can boost the volume by allowing clipping (distortion) or by using dynamic compression and/or limiting. But, these all change the sound/character of the music.

Sorry, I didn’t write that very well.

Take a bit more context:

and put in some parentheses:

For MP3 files, it is best to make them “play” at the same volume without actually changing the audio data, (because changing the data requires that the file is re-encoded to MP3, which will slightly reduce the sound quality). A program called MP3Gain can do this:

Thank you all for an interesting and informative discussion.

I was unable to locate a wavegain forum or group.

Would anyone be so kind as to share the step-by-step process of applying wavegain to a .wav or multiple .wav files?

Thank you so much.

A simple script like this should do the trick:

@echo off
for %%i in (*.wav) do (
   wavegain.exe --apply "%%i"
)

If you want to give the Dynamic Audio Normalizer a try too, it can be done like:

@echo off
for %%i in (*.wav) do (
   DynamicAudioNormalizerCLI.exe -i "%%i" -o "%%~ni.normalized.wav"
)

If you are more comfortable with clicking buttons in an interface, you can use the “WaveGain frontend” instead.


Thanks for the contribution Gunnar, but that is not what the original post is asking for.

Your application will make loud parts of a track quieter and quiet parts louder. The original question was to make tracks of equal loudness - he did not ask to reduce the dynamic range of his tracks (I’ve read your description of the effect, and tried it for myself, and it does reduce the dynamics. That’s fine because that is what you have designed it to do, but it is not what the OP asked for.)

Well, it will make “quiet” parts of the track louder, but it will not (by default) make “loud” parts quieter. Or, more precisely, the “quieter” a part is, the more gain it will get; the “louder” a part is, the less gain it will get. Parts that already have “full” volume will be left as-is. In the end, this means that all parts will have roughly (i.e. as much as possible without applying a psycho-acoustic model) the same volume, which should be pretty close to what has been asked for.

In contrast, with a “traditional” normalization filter it can happen that a single extraordinary peak prevents the entire track from being amplified adequately - which means this track will end up being much quieter than other tracks.

If you look at the whole track, then yes, equalizing the volume of “quiet” and “loud” parts is definitely a kind of dynamic reduction. But it’s not like a “standard” dynamic range compression filter, where peaks above a certain fixed threshold are cut-off (reduced by a fixed ratio). Instead, each part of the track will get as much gain as possible without clipping. If a certain part already has peaks at 0 dBFS, it won’t be modified at all. Peaks are never cut off.

(Or in other words: Within each local neighborhood, the full dynamic range will be retained. If we look at the entire file, that’s not the case, yes)

Which is not what was asked for.

In some types of music (particularly classical music), there can be sections that are, and should be, much louder or quieter than other sections. Your effect reduces that range, and by default it reduces that range very substantially. I’m not saying that is a bad thing - it is a very good thing if that’s what the user wants. Reducing the range between loud parts and quiet parts is exactly what a dynamic range compressor does.

You are also assuming, or at least suggesting, that peak level equates directly to perceived loudness. It doesn’t. Compare the sound of a click track with a peak amplitude of 0.8 with a 1000 Hz square wave of the same peak amplitude - the square wave sounds much louder than the click track. The same can be found with music. A commercial drum and bass track will sound much louder than an acoustic guitar track that has the same peak level.
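To make that point concrete, here is a rough sketch in Python comparing peak level and RMS for the two signals described above. (RMS is only a crude stand-in for perceived loudness — the real Replay Gain measure is more sophisticated — but it already shows that equal peaks do not mean equal loudness.)

```python
# Compare peak level vs RMS ("energy") for two signals with the
# same peak amplitude. Pure Python, no audio libraries needed.
import math

def peak(samples):
    return max(abs(s) for s in samples)

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# A 1000 Hz square wave at 44100 Hz sample rate, peak amplitude 0.8
rate = 44100
square = [0.8 if math.sin(2 * math.pi * 1000 * n / rate) >= 0 else -0.8
          for n in range(rate)]

# A "click track": mostly silence with a few 0.8 spikes
click = [0.0] * rate
for n in range(0, rate, rate // 4):
    click[n] = 0.8

print(peak(square), peak(click))  # both 0.8 - identical peak level
print(rms(square))                # 0.8 - a square wave's RMS equals its peak
print(rms(click))                 # tiny - almost every sample is silence
```

Same peak, wildly different energy — which is why the square wave sounds much louder.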

Replay Gain is an algorithm that estimates “perceived loudness”. The algorithm is used in both MP3Gain and WaveGain to make audio files play with approximately the same loudness. It does so without changing the dynamics - the calculated gain is applied to the entire track (if used in “track” mode) or to the entire album (if used in “album” mode).
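A minimal sketch of that “one gain factor per track” idea (the target level and the plain-RMS loudness estimate below are simplifying assumptions — the actual Replay Gain algorithm uses equal-loudness filtering and a statistical loudness measure instead):

```python
# Simplified "track mode" gain: ONE gain factor for the whole track,
# so the dynamics are untouched. The target level is an assumption,
# not the Replay Gain reference level.
import math

TARGET_RMS = 0.2  # assumed reference loudness for this sketch

def track_gain(samples, target=TARGET_RMS):
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return target / rms if rms > 0 else 1.0

def apply_gain(samples, gain):
    # Same factor everywhere: quiet and loud parts keep their ratio.
    return [s * gain for s in samples]

quiet_track = [0.05, -0.05, 0.1, -0.1]
g = track_gain(quiet_track)
normalized = apply_gain(quiet_track, g)
```

After applying the gain, the track reaches the target loudness, but the ratio between its quiet and loud samples is exactly what it was before.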

To quote the original poster:

Is there a way to make all my .wav and .mp3 files play at the same level?

My interpretation of that question is that he wants to make audio files play with approximately the same loudness.

I’m not averse to you promoting your effect. All of us that write effects like to see them used. However, to recommend your effect every time someone asks about normalization or Replay Gain is misleading.

By the way, I also tried your VST version in Audacity 2.0.6 on Windows XP but it did not show up in the Effect menu. Is it not compatible with XP?

It seems that you could approximate “standard”, “throughout-the-track” normalization by reducing the “Maximum Gain Factor”.

When you say that 100% of dynamic range is retained within each section, how long are these sections? Is it the same as a “frame” which you say is typically half a second?

Another question. The GUI of your plug-in as seen in your docs does not appear in Audacity - you only get the text interface in Audacity even if Graphical Mode is enabled.


It is rather more, since 31 frames are gathered as the “neighbourhood”. However, the whole curve is Gauss-weighted and therefore depends on sigma.

Although this smoothing is applied, the plug-in seems to have a steppy response.
I can’t test it right now but a former test was roughly like this:

  • A tone with fade-in applied (20 seconds or so) and a neighbourhood of about 7 frames for the plug-in.
  • Compare it to Chris’ Compressor with about 0.1 Hardness.

Yes and no. The smallest “unit” for which the peak value is determined and the maximum possible gain factor is computed is a “frame” (typically 500 milliseconds). You can think of it like this: Cut the input audio into separate frames, apply “traditional” normalization individually on each frame and finally join together the normalized frames again.
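That per-frame step can be sketched like this (peak-based mode only; the 10x gain cap is a hypothetical value for illustration, not the tool’s actual default):

```python
# Sketch of the "cut into frames, normalize each frame" idea:
# for each frame, find the largest gain that avoids clipping.
# max_gain is a hypothetical cap, not the real default setting.

def frame_gains(samples, frame_len, max_gain=10.0):
    """Largest gain per frame such that no sample in that frame clips (|s| <= 1)."""
    gains = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        peak = max(abs(s) for s in frame)
        gains.append(min(max_gain, 1.0 / peak) if peak > 0 else max_gain)
    return gains

# A quiet frame, a medium frame, and a frame already peaking at full scale:
audio = [0.1, -0.1, 0.5, -0.5, 1.0, -1.0]
print(frame_gains(audio, frame_len=2))  # [10.0, 2.0, 1.0]
```

Note that the full-scale frame gets a gain of exactly 1.0, i.e. it is left untouched. Applying these raw factors directly would cause audible jumps between frames — hence the Gaussian smoothing described below.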

However, simply applying the maximum possible gain factor to each frame individually could result in very unsteady gain factor “jumps” between neighboring frames, which sounds bad. That’s why a Gaussian smoothing kernel is applied in order to ensure a smooth and steady adaption of the gain factors. The size of the smoothing kernel is expressed in frames. The default is 31, so it will consider the 15 preceding and the 15 subsequent frames around the current one. The “sigma” for the Gaussian filter is computed automatically, based on the selected kernel size. The exact formula used is 1/3 + (((filterSize / 2) - 1) / 3), which is based on the so-called “3-sigma rule”.
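For illustration, the sigma formula and the resulting (normalized) Gaussian weights can be computed like this — a sketch in Python, using integer division for filterSize / 2 as in the C-style formula above:

```python
# Numeric illustration of the smoothing step, using the sigma formula
# quoted above: sigma = 1/3 + (((filterSize / 2) - 1) / 3).
import math

def gaussian_kernel(filter_size):
    sigma = 1.0 / 3.0 + (((filter_size // 2) - 1) / 3.0)
    center = filter_size // 2
    weights = [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
               for i in range(filter_size)]
    total = sum(weights)
    return [w / total for w in weights]  # normalize so the weights sum to 1

kernel = gaussian_kernel(31)
sigma = 1.0 / 3.0 + ((31 // 2 - 1) / 3.0)
print(sigma)        # 5.0 for the default kernel size of 31
print(len(kernel))  # 31: the current frame plus 15 on each side
```

With the default size of 31, sigma comes out as 5.0, so the kernel radius of 15 frames is exactly 3 sigma — which is where the “3-sigma rule” comes in.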

VST only allows the plug-in to specify the number of parameters. Also, for each parameter we can define a name (max. 8 characters). But that’s it! The range of each parameter is always 0.0 to 1.0 and it needs to be mapped to something useful inside the plug-in. How the parameters are presented graphically to the user depends entirely on the individual application! The screenshots in the manual were made in Acoustica. The graphical interface shown by Audacity is a bit more “stripped-down” compared to what Acoustica offers :wink: Alternatively, it’s also possible to write your own VST GUI from scratch, which then completely replaces the application’s “native” interface. But I didn’t go that route…

Not quite sure what you mean with that :question:

If you look at this chart, it shows the “raw” max. gain factors determined for each frame (blue), the minimum filter values (green) as well as the final smoothed values (orange).

Only the final smoothed values will be applied…

Well, as I have mentioned before, ideally we should apply a psycho-acoustic model. That is: we transform everything into the frequency domain and weight each frequency by how sensitive the human ear is to that particular frequency. However, this would be much more complex and much slower compared to the current approach. And, in my experience, the current approach works pretty well. Last but not least, there already is an optional “RMS based” mode available.

Nonetheless, the current code is written in a way that the function, which determines the “frame local” gain factor, can be exchanged easily - like it’s already done with “RMS based” mode. More modes to be added in future versions…

With such “extreme” synthetic examples the limitations of the current approach are apparent. But with “real world” recordings it is not that much of an issue.

As long as only a single gain factor is computed for the entire track, or even the entire album, we still have the problem that the maximum gain that can be applied (without clipping) is restricted by the “loudest” peak. That means: a single extraordinary peak could prevent the entire track, or even the entire album, from being amplified adequately. And then the affected track/album will sound MUCH quieter than other tracks/albums.

(Assuming we do not apply a “compression” filter to destroy the peaks beforehand, of course)
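A tiny numeric illustration of that “one peak caps the gain” problem (Python sketch):

```python
# Whole-track normalization: a single near-full-scale peak limits the
# gain that can be applied to ALL of the (otherwise quiet) material.
track = [0.1] * 100 + [0.99]       # quiet track with one stray peak
peak = max(abs(s) for s in track)
max_safe_gain = 1.0 / peak         # ~1.01: almost no amplification possible
print(max_safe_gain)
```

Without that one peak, the quiet material could have been amplified tenfold; with it, a single global gain factor can barely do anything.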

It is supposed to work under WinXP, if that is still relevant these days.

You probably know this, but I need to ask anyway: Did you set the VST_PATH environment variable correctly, and did you make Audacity re-scan for new VST plug-ins after that?

XP is still relevant in market share.

Your plug-in appears for me on Windows XP SP3.


How exactly did you install it?

I just set Audacity to rescan VST effects, dropped “DynamicAudioNormalizerVST.dll” in the Audacity “Plug-Ins” folder and restarted Audacity.

Might you have used the file of the same name in the “x64” folder? Audacity won’t see that.


That’s exactly what I did, but the effect was not listed in the Effect menu (though that works for other VST effects).

Another thing to consider: If you use the “DLL” build, then you need both “DynamicAudioNormalizerVST.dll” and “DynamicAudioNormalizerAPI.dll”. However, due to the way Windows resolves DLL dependencies, you have to put “DynamicAudioNormalizerAPI.dll” into the Audacity main directory (where the EXE file is located); putting it into the “Plug-Ins” directory does not help. At the same time, “DynamicAudioNormalizerVST.dll” needs to be located in the “Plug-Ins” directory in order to be recognized by Audacity. In addition to that, the Visual C++ 2013 Runtime libraries must be installed, if you haven’t done that already.

If you use the “Static” build, “DynamicAudioNormalizerVST.dll” will work on its own. No dependency on “DynamicAudioNormalizerAPI.dll”. No need to install the Visual C++ Runtime…