Portable devices impose various limitations on user interaction, such as limited display size, small keyboards, and differing input and output capabilities. With advances in speech recognition and speech synthesis, the complementary use of these technologies becomes attractive for implementing truly multimodal user interaction on mobile devices. However, current systems and formats do not sufficiently integrate advanced multimodal interactions. We introduce an advanced, generic multimodal interaction and rendering system (MIRS) dedicated to mobile devices. MIRS incorporates efficient processing of XML specification languages on resource-limited mobile devices and comes with the XML-based dialog and interface specification language (DISL). DISL can be considered a subset of UIML that is extended with state-oriented dialog specifications. The dialog specification is based on ODSN (Object-Oriented Dialog Specification Notation), which was introduced to define user interface control by means of interaction states and transition rules.
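
To illustrate what user interface control by means of interaction states and transition rules can look like, the following is a minimal sketch in Python. It is not DISL or ODSN syntax and not the MIRS implementation; the names `Rule`, `step`, and `RULES`, as well as the example events and state variables, are hypothetical and chosen only for illustration. A rule fires when its event arrives and its condition matches the current state, and events from different modalities (a key press, a spoken command) are handled uniformly.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One transition rule (hypothetical structure, not DISL syntax)."""
    event: str                                      # event that triggers the rule
    condition: dict = field(default_factory=dict)   # state values that must hold
    updates: dict = field(default_factory=dict)     # state values set when fired


def step(state: dict, event: str, rules: list) -> dict:
    """Return the state after applying the first rule that matches.

    A rule matches if its event equals the incoming event and every
    key/value pair in its condition is present in the current state.
    If no rule matches, an unchanged copy of the state is returned.
    """
    for rule in rules:
        if rule.event == event and all(
            state.get(key) == value for key, value in rule.condition.items()
        ):
            return {**state, **rule.updates}
    return dict(state)


# Illustrative rule set: key events and speech events share one state model.
RULES = [
    Rule(event="key:select", condition={"screen": "menu"},
         updates={"screen": "player"}),
    Rule(event="key:back", condition={"screen": "player"},
         updates={"screen": "menu"}),
    Rule(event="speech:mute", updates={"muted": True}),  # valid in any state
]

state = {"screen": "menu", "muted": False}
state = step(state, "key:select", RULES)   # menu -> player
state = step(state, "speech:mute", RULES)  # muted becomes True
```

Keeping the state as a flat set of named values, rather than one monolithic state name, lets a single rule such as the mute command apply regardless of which screen is active.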