This seems like a dumb question, but does PMML have a way to represent a data set? And if not, why not? There's detailed support for defining the name, type, and legal values of each feature that will appear in the data, as well as which of those features to use in the trained model, and what transformations to apply to the values of those features before using them.
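To make concrete what I mean by "detailed support", here is a minimal sketch of the kind of field metadata PMML does carry, read back with Python's standard library. The fragment is hand-written for illustration: the field names `age` and `color` are made up, and I have left out the XML namespace that a real PMML file would declare, so treat this as an assumption-laden toy rather than a valid PMML document.

```python
import xml.etree.ElementTree as ET

# Hand-written illustrative fragment (hypothetical fields, namespace omitted).
PMML_FRAGMENT = """
<PMML version="4.2">
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="color" optype="categorical" dataType="string">
      <Value value="red"/>
      <Value value="blue"/>
    </DataField>
  </DataDictionary>
</PMML>
"""


def describe_fields(pmml_text):
    """Return a dict mapping each DataField name to its declared metadata."""
    root = ET.fromstring(pmml_text)
    fields = {}
    for field in root.iter("DataField"):
        fields[field.get("name")] = {
            "optype": field.get("optype"),
            "dataType": field.get("dataType"),
            # Legal values are listed as child <Value> elements, if any.
            "values": [v.get("value") for v in field.findall("Value")],
        }
    return fields


FIELDS = describe_fields(PMML_FRAGMENT)
```

Note that everything here describes the columns; there is no place in this fragment for the rows themselves.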
However, I haven't been able to find an example, or any discussion, of representing raw data (either training sets or test data) in PMML, or for that matter of representing data in XML at all. PMML in Action (pp. 7-8) explicitly says "...raw input data is usually formatted as a flat file in which columns represent data fields and rows records or transactions." The sample datasets at the DMG website are all in CSV format.
This omission seems particularly strange, since PMML has a model format for support vector machines, which are (almost) just weighted sets of examples. So it would have taken only the tiniest bit of extra effort to define a format for training sets and test sets.
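To spell out the "weighted sets of examples" remark: a kernel SVM's decision function is a weighted sum of kernel evaluations against stored training examples (the support vectors), plus an intercept. The sketch below is generic Python, not PMML's own scoring procedure; the RBF kernel, the `gamma` value, and the toy vectors and coefficients are all assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, coefficients, intercept, kernel=rbf_kernel):
    """Weighted sum of kernel values between x and each stored example."""
    weighted = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefficients))
    return weighted + intercept


# Toy model: two stored examples, one weighted negatively and one positively.
SUPPORT_VECTORS = [(0.0, 0.0), (1.0, 1.0)]
COEFFICIENTS = [-1.0, 1.0]
INTERCEPT = 0.0
```

With these toy values, `svm_decision((1.0, 1.0), SUPPORT_VECTORS, COEFFICIENTS, INTERCEPT)` is positive and the same call at `(0.0, 0.0)` is negative. The point is that `SUPPORT_VECTORS` is already a list of data rows, which is why a general-purpose row format looks like such a small step from what an SVM model has to store anyway.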