Because the user agent is entirely client-controlled, it is worth paying attention to: it can serve as a vector for various attacks.
Allowed Characters in User Agent
what characters are/are not allowed in a proper user-agent string and do some characters need to be in a certain order?
@Stephen Ostermiller already linked to RFC2616. It has since been replaced by RFC7231 (among others), but nothing really changed for this header:
User-Agent = product *( RWS ( product / comment ) )
[...]
product = token ["/" product-version]
product-version = token
It does however link to RFC7230 to specify how comments may look:
comment = "(" *( ctext / quoted-pair / comment ) ")"
ctext = HTAB / SP / %x21-27 / %x2A-5B / %x5D-7E / obs-text
[...]
quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
This is a fancy way of saying that pretty much all characters are allowed in the comment part of the user agent. ()\ are the only ones that cannot be placed freely.
token is a bit more restrictive, as can be seen in RFC7230. It doesn't allow whitespace, the double quote, or any of (),/:;<=>?@[\]{}.
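To make the grammar above concrete, here is a minimal Python sketch of a syntax check for it. The function names (is_valid_user_agent and the helpers) are made up for illustration, and the sketch assumes the header value has already been decoded to a string with surrounding whitespace stripped. It only checks syntax and says nothing about whether a string is safe.

```python
import re

# token = 1*tchar (RFC 7230): visible ASCII minus the delimiters
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# product = token ["/" product-version]
PRODUCT = re.compile(rf"{TOKEN}(?:/{TOKEN})?")
# RWS = 1*( SP / HTAB )
RWS = re.compile(r"[ \t]+")


def _is_ctext(ch):
    # ctext = HTAB / SP / %x21-27 / %x2A-5B / %x5D-7E / obs-text
    o = ord(ch)
    return (ch in "\t " or 0x21 <= o <= 0x27 or 0x2A <= o <= 0x5B
            or 0x5D <= o <= 0x7E or 0x80 <= o <= 0xFF)


def _is_quotable(ch):
    # Allowed after a backslash: HTAB / SP / VCHAR / obs-text
    o = ord(ch)
    return ch in "\t " or 0x21 <= o <= 0x7E or 0x80 <= o <= 0xFF


def _skip_comment(s, i):
    """Return the index just past the comment opening at s[i], or -1."""
    depth = 0
    while i < len(s):
        ch = s[i]
        if ch == "(":
            depth += 1          # comments may nest
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        elif ch == "\\":
            i += 1              # quoted-pair: check the escaped character
            if i >= len(s) or not _is_quotable(s[i]):
                return -1
        elif not _is_ctext(ch):
            return -1
        i += 1
    return -1                   # ran out of input before the closing ")"


def is_valid_user_agent(s):
    """Check s against: product *( RWS ( product / comment ) )."""
    m = PRODUCT.match(s)
    if not m:
        return False
    i = m.end()
    while i < len(s):
        ws = RWS.match(s, i)
        if not ws:
            return False
        i = ws.end()
        if i < len(s) and s[i] == "(":
            i = _skip_comment(s, i)
            if i < 0:
                return False
        else:
            m = PRODUCT.match(s, i)
            if not m:
                return False
            i = m.end()
    return True
```

Note that a string such as Foo/1.0 (<script>alert(1)</script>) passes this check, since <, > and / are all valid ctext and the parentheses are balanced. A string can follow the grammar and still carry a payload.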
How to filter user agents
what are the syntax rules in determining that a user agent string is 100% legitimate and not some hacker-crafted string?
As user agents can contain pretty much any character, reasonable filtering is impossible. And this isn't even considering that not all clients will follow the RFC (filtering should not be very restrictive, for usability reasons).
Filtering user input is a good first line of defense, but it should never be your only one, as it is extremely difficult to prevent all attacks with input filtering.
You need secure coding practices, and you need to implement proper defenses against common attacks. So if the user agent is echoed, you need to encode it to prevent XSS. If the user agent is stored in the database, you need to use prepared statements to defend against SQL injection. If you pass something to the PHP function unserialize, you need to keep object injection in mind (I'm mentioning it because the O:21 looks a bit as if it might have been a test for that). And so on.
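As a rough illustration of the first two defenses, here is a small sketch in Python (not PHP) using only the standard library. The table name visits, the helper names, and the hostile sample string are all invented for the example; a real application would use its own schema and templating layer.

```python
import html
import sqlite3


def render_user_agent(user_agent):
    # Encode on output so markup in the header is shown as text, not executed
    return "<p>Your browser: " + html.escape(user_agent, quote=True) + "</p>"


def store_user_agent(conn, user_agent):
    # Parameterized query: the value is bound, never spliced into the SQL text
    conn.execute("INSERT INTO visits (user_agent) VALUES (?)", (user_agent,))
    conn.commit()


# Hypothetical hostile header value carrying both an XSS and an SQL payload
hostile = "Foo/1.0 (<script>alert(1)</script>) ('); DROP TABLE visits; --)"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, user_agent TEXT)")
store_user_agent(conn, hostile)

page = render_user_agent(hostile)
stored = conn.execute("SELECT user_agent FROM visits").fetchall()
```

The string is stored and read back unchanged, and it is only neutralized at the point where it is written into HTML. Encoding for the output context, rather than trying to clean the input once, is what lets the same value be used safely in several places.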
If you want an additional line of defense, you might think about using a WAF such as mod_security.