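For the first question, I understand that the filters are normalized in energy. Suppose that we first consider uniform or box filters of size $(2L+1)\times(2L+1)$ and unit amplitude. Their Frobenius norm is $2L+1$, since the sum of the $(2L+1)^2$ squared coefficients is $(2L+1)^2$. If you divide the amplitude by $2L+1$, then the Frobenius norm of all filters will be exactly one. (Dividing by $(2L+1)^2$ would instead normalize the sum of the coefficients to one, leaving a Frobenius norm of $1/(2L+1)$.)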
For other filters, such a scale normalization will result in either exactly or approximately unit energy (I may add simulations on Gaussians later).
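
As a quick numerical check for box filters, here is a minimal sketch using NumPy. The helper `box_filter` and its `scale` parameter are my own names for illustration, not taken from any SURF implementation. It prints the Frobenius norm of the unit-amplitude filter, the norm after dividing the amplitude by $2L+1$, and the sum of the coefficients after dividing by $(2L+1)^2$.

```python
import numpy as np

def box_filter(L, scale=1.0):
    """Return a (2L+1) x (2L+1) box filter with constant amplitude `scale`.

    Hypothetical helper for illustration only.
    """
    size = 2 * L + 1
    return scale * np.ones((size, size))

for L in (1, 2, 5):
    n = 2 * L + 1
    unit_amplitude = box_filter(L)             # amplitude 1
    unit_energy = box_filter(L, 1.0 / n)       # amplitude 1/(2L+1)
    unit_sum = box_filter(L, 1.0 / n**2)       # amplitude 1/(2L+1)^2
    print(
        L,
        np.linalg.norm(unit_amplitude, "fro"),  # Frobenius norm, equals 2L+1
        np.linalg.norm(unit_energy, "fro"),     # Frobenius norm, equals 1
        unit_sum.sum(),                         # coefficients sum to 1
    )
```

The same loop could be reused with sampled Gaussian kernels to see how close a given scale normalization comes to unit energy.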
I am not sure I understand the second question. You may find useful explanations of SURF (Speeded-Up Robust Features) in the paper "An Analysis of the SURF Method" by Edouard Oyallon and Julien Rabin, IPOL Journal (Image Processing On Line). Its abstract reads:
The SURF method (Speeded Up Robust Features) is a fast and robust
algorithm for local, similarity invariant representation and
comparison of images. Similarly to many other local descriptor-based
approaches, interest points of a given image are defined as salient
features from a scale-invariant representation. Such a multiple-scale
analysis is provided by the convolution of the initial image with
discrete kernels at several scales (box filters). The second step
consists in building orientation invariant descriptors, by using local
gradient statistics (intensity and orientation). The main interest of
the SURF approach lies in its fast computation of operators using box
filters, thus enabling real-time applications such as tracking and
object recognition. The SURF framework described in this paper is
based on the PhD thesis of H. Bay [ETH Zurich, 2009], and more
specifically on the paper co-written by H. Bay, A. Ess, T. Tuytelaars
and L. Van Gool [Computer Vision and Image Understanding, 110 (2008),
pp. 346–359]. An implementation is proposed and used to illustrate the
approach for image matching. A short comparison with a
state-of-the-art approach is also presented, the SIFT algorithm of D.
Lowe [International Journal of Computer Vision, 60 (2004), pp.
91–110], with which SURF shares a lot in common.
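
The abstract does not spell out how the box filters are evaluated quickly. The usual mechanism in SURF is the integral image (summed-area table), which gives the sum over any axis-aligned box from four lookups, whatever the box size. The sketch below is only an illustration of that idea under my own naming (`integral_image`, `box_sum`); it is not the IPOL implementation.

```python
import numpy as np

def integral_image(img):
    """Zero-padded summed-area table: ii[r, c] is the sum of img[:r, :c]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using four lookups in the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

# Compare the fast box sum with a direct sum on a random test image.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
ii = integral_image(img)

L, r, c = 3, 10, 12  # box half-size and centre (arbitrary test values)
fast = box_sum(ii, r - L, c - L, r + L + 1, c + L + 1)
direct = img[r - L:r + L + 1, c - L:c + L + 1].sum()
print(np.isclose(fast, direct))
```

Dividing `fast` by $2L+1$ gives the response of the unit-energy box filter at that pixel, so the cost per pixel does not grow with the scale.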