We introduce a globally normalized transition-based neural network model that
achieves state-of-the-art part-of-speech tagging, dependency parsing and
sentence compression results. Our model is a simple feed-forward neural network
that operates on a task-specific transition system, yet achieves comparable or
better accuracies than recurrent models. We discuss the importance of global as
opposed to local normalization: a key insight is that the label bias problem
implies that globally normalized models can be strictly more expressive than
locally normalized models.