There isn't anything inherently wrong with your model. However, the model makes a pretty strong assumption regarding the relationship between variables. Consider the alternative model
$$Y = \beta_0 + \beta_1 T + \beta_2 T M + \beta_3 M$$
As you wrote out, your model assumes that $\beta_3 = 0$. As long as that is true, your model will give you unbiased results. However, if $\beta_3 \ne 0$, then your model will produce misleading results, because you are placing a restriction on the model that the data do not adhere to. Essentially, your model is unable to capture the true data generating mechanism.
The advantage of the model estimating $\beta_3$ is that we don't need to assume $\beta_3 = 0$. The model will provide an unbiased answer whether $\beta_3$ is equal to zero or not.
Below is a simple simulation study written using Python 3.5.2 demonstrating this.
Scenario 1: $\beta_3 = 0$
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
# Setting simulation parameters
n = 1000000
sample_size = 500
runs = 1000
# Generating data
df = pd.DataFrame()
df['M'] = np.random.binomial(n=1, p=0.4, size=n)
df['T'] = np.random.binomial(n=1, p=0.5, size=n)
df['Yt1'] = 120 + 5*1 + df['M']*10*1 + np.random.normal(size=n)
df['Yt0'] = 120 + 5*0 + df['M']*10*0 + np.random.normal(size=n)
df['Y'] = np.where(df['T'] == 1, df['Yt1'], df['Yt0'])
# Calculating true values based on full population
truth_m0 = 5
truth_m1 = 5 + 10
# Objects to store results
bias_f1_m0 = []
bias_f1_m1 = []
bias_f2_m0 = []
bias_f2_m1 = []
for i in range(runs):
    dfs = df.sample(n=sample_size, replace=False)
    # Assuming beta_3 == 0
    m1 = smf.ols('Y ~ T + T:M', data=dfs).fit()
    bias_f1_m0.append(m1.params['T'] - truth_m0)
    bias_f1_m1.append(m1.params['T'] + m1.params['T:M'] - truth_m1)
    # No assumption about beta_3
    m2 = smf.ols('Y ~ T + T:M + M', data=dfs).fit()
    bias_f2_m0.append(m2.params['T'] - truth_m0)
    bias_f2_m1.append(m2.params['T'] + m2.params['T:M'] - truth_m1)
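The bias numbers reported below are the averages of these per-run error lists. A minimal sketch of that summary step (the error values here are purely illustrative, not output from the simulation):

```python
import numpy as np

# Illustrative per-run errors (estimate minus truth); in the simulation
# these are the bias_f1_* / bias_f2_* lists filled in the loop above.
errors = [0.04, -0.01, 0.02, -0.03, 0.01]

# Monte Carlo bias estimate: the mean error across simulation runs
bias = np.mean(errors)
print(round(float(bias), 3))  # -> 0.006
```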
For the model excluding $\beta_3$, the bias was 0.005 in both the $M=0$ and $M=1$ strata. For the model including $\beta_3$, the bias was 0.002 for $M=0$ and 0.009 for $M=1$.
Scenario 2: $\beta_3 \ne 0$
For scenario 2, I used the same code but changed the data generating mechanism to include the $\beta_3 M$ term. The new data generating mechanism is below:
# Generating other data
df = pd.DataFrame()
df['M'] = np.random.binomial(n=1, p=0.4, size=n)
df['T'] = np.random.binomial(n=1, p=0.5, size=n)
df['Yt1'] = 120 + 5*1 + df['M']*10*1 + df['M']*7 + np.random.normal(size=n)
df['Yt0'] = 120 + 5*0 + df['M']*10*0 + df['M']*7 + np.random.normal(size=n)
df['Y'] = np.where(df['T'] == 1, df['Yt1'], df['Yt0'])
For the model excluding $\beta_3$, the bias was -2.798 in the $M=0$ stratum and 4.198 in the $M=1$ stratum. The model including $\beta_3$ was approximately unbiased ($M=0$: 0.001, $M=1$: -0.008).
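The scenario-2 biases can be checked by hand. The misspecified model's regressors (intercept, $T$, $TM$) exactly index the three groups $\{T=0\}$, $\{T=1, M=0\}$, and $\{T=1, M=1\}$, so its coefficients converge to differences of group means, with a single intercept shared across $M$. A quick arithmetic sketch, assuming the data generating mechanism above with $P(M=1) = 0.4$:

```python
p_m = 0.4                                # P(M = 1) in the simulation

# The shared intercept converges to E[Y | T=0] = 120 + 7 * P(M=1)
b0 = 120 + 7 * p_m                       # 122.8

# The coefficient on T converges to E[Y | T=1, M=0] - b0
b1 = (120 + 5) - b0
print(round(b1 - 5, 3))                  # bias in the M=0 stratum -> -2.8

# b1 + b2 converges to E[Y | T=1, M=1] - b0
b12 = (120 + 5 + 10 + 7) - b0
print(round(b12 - 15, 3))                # bias in the M=1 stratum -> 4.2
```

These limits match the simulated values of -2.798 and 4.198 up to Monte Carlo error.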
As these two scenarios demonstrate, excluding $\beta_3 M$ from your model is fine as long as the assumption that $\beta_3 = 0$ is true. If it is not, your model can give badly biased results.