Take two scalars $a_1$ and $a_2$ and stack them horizontally, $(a_1,a_2)$, to get a $1 \times 2$ row vector.
Take two scalars $a_1$ and $a_2$ and stack them vertically, $\begin{bmatrix}a_1\\a_2 \end{bmatrix} =(a_1,a_2)^\top$, to get a $2 \times 1$ column vector.
Generalise this by replacing 2 with $n$. Adopt the convention that any vector $a$ is a column vector, so it has dimension $n \times 1$, and therefore $a^\top$ is a $1 \times n$ row vector. The essential point is that only one of the dimensions is larger than 1, and which one it is determines whether you have a row or a column. The adopted convention simply makes it explicit whether a given vector is a row or a column.
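If it helps to see the convention concretely, here is a minimal numpy sketch (the example vector is made up, purely for illustration):

```python
import numpy as np

# Column-vector convention: a is n x 1, its transpose a^T is 1 x n (here n = 3)
a = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1): a column vector
a_T = a.T                             # shape (1, 3): the corresponding row vector

print(a.shape, a_T.shape)             # (3, 1) (1, 3)
```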
Now $a^\top$ and $b^\top$ are row vectors according to the convention. Let their dimensions be $1 \times n$ and $1 \times q$. If you stack these horizontally you get
$$c^\top := (a^\top \; b^\top) = (a_1,\dots,a_n,b_1,\dots,b_q)$$
Here $c$ is written transposed because I know I get a row vector, and according to the adopted convention any row vector should be written as a transposed column vector. It is a row, so the dimension is $1 \times \dots$, and $\dots$ must be $n+q$. I can transpose to get
$$c = (a^\top b^\top)^\top =\begin{bmatrix}a \\ b \end{bmatrix} = \begin{bmatrix}a_1\\ \vdots \\ a_n \\ b_1 \\ \vdots \\ b_q \end{bmatrix}$$
The only thing you are using is that if you have 8 numbers and 7 numbers, stack the 8 numbers in a column and the 7 numbers in a column, and then stack the two columns themselves, you get the same thing as if you had stacked all 15 numbers directly (of course respecting the order).
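If you want to check the stacking identity numerically, here is a small numpy sketch with made-up vectors $a$ and $b$:

```python
import numpy as np

# Illustrative vectors: a is n x 1 (n = 2), b is q x 1 (q = 3)
a = np.array([[1.0], [2.0]])
b = np.array([[3.0], [4.0], [5.0]])

c_row = np.hstack([a.T, b.T])   # stack the row vectors horizontally: shape (1, n + q)
c_col = np.vstack([a, b])       # stack the column vectors vertically: shape (n + q, 1)

print(np.array_equal(c_row.T, c_col))   # True: (a^T b^T)^T equals the stacked column [a; b]
```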
More generally, you can read about the rules for matrix partitioning.
Partitioned Matrices
Partitioning is useful when applied to large matrices because manipulations can be carried out on the smaller blocks. More importantly, when one is multiplying partitioned matrices, the basic rule can be applied to the blocks as though they were single elements. The only restriction is that the blocks must be conformable for multiplication.
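As a quick numerical illustration of that rule (the matrices below are random, chosen only for the sake of example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 3))

# Partition A by columns and B by rows so the blocks are conformable for multiplication:
A1, A2 = A[:, :2], A[:, 2:]   # 4 x 2 and 4 x 3
B1, B2 = B[:2, :], B[2:, :]   # 2 x 3 and 3 x 3

# Treating the blocks as single elements, the product of the partitioned matrices
# is A1 B1 + A2 B2, which agrees with the ordinary product A B:
print(np.allclose(A @ B, A1 @ B1 + A2 @ B2))   # True
```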
A well-known example from statistics arises when you have an $n \times k$ design matrix $\boldsymbol X$. You can partition this into $n$ stacked row vectors $\mathbf x_i^\top$ with $i=1,\dots,n$, so you get
$$\boldsymbol X = \begin{bmatrix}\mathbf x_1^\top \\ \mathbf x_2^\top \\ \vdots \\ \mathbf x_n^\top\end{bmatrix}$$
It is then easy to see that
$$\boldsymbol X^\top \boldsymbol X =\begin{bmatrix}\mathbf x_1^\top \\ \mathbf x_2^\top \\ \vdots \\ \mathbf x_n^\top\end{bmatrix}^\top \begin{bmatrix}\mathbf x_1^\top \\ \mathbf x_2^\top \\ \vdots \\ \mathbf x_n^\top\end{bmatrix} = \begin{bmatrix}\mathbf x_1 & \mathbf x_2 & \dots &\mathbf x_n\end{bmatrix} \begin{bmatrix}\mathbf x_1^\top \\ \mathbf x_2^\top \\ \vdots \\ \mathbf x_n^\top\end{bmatrix}$$
and now comes the super-important psychological trick: think of the vectors $\mathbf x_i$ and $\mathbf x_i^\top$ as scalars, so that you see $\begin{bmatrix}\mathbf x_1 & \mathbf x_2 & \dots &\mathbf x_n\end{bmatrix}$ as a $1\times n$ row (although it is not) and you see $\begin{bmatrix}\mathbf x_1^\top \\ \mathbf x_2^\top \\ \vdots \\ \mathbf x_n^\top\end{bmatrix}$ as an $n \times 1$ column (although it is not). Then apply the standard rule of matrix multiplication to get
$$\begin{bmatrix}\mathbf x_1 & \mathbf x_2 & \dots &\mathbf x_n\end{bmatrix} \begin{bmatrix}\mathbf x_1^\top \\ \mathbf x_2^\top \\ \vdots \\ \mathbf x_n^\top\end{bmatrix} = \sum_i \mathbf x_i \mathbf x_i^\top,$$
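which you can check numerically with a short numpy sketch (a random design matrix, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
X = rng.standard_normal((n, k))   # an n x k design matrix, random for the example

# Sum of the outer products x_i x_i^T over the rows of X (each term is k x k)
S = sum(np.outer(X[i], X[i]) for i in range(n))

print(np.allclose(X.T @ X, S))    # True: X'X equals the sum of outer products
```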
This is just one example; you probably need to go through more examples before you get the aha-experience, so google "matrix partitioning" or "introduction to matrix partitioning".