Abstract
Driven by deep neural networks and large-scale datasets, scene text detection
methods have progressed substantially over the past years, continuously
refreshing the performance records on various standard benchmarks. However,
limited by the representations (axis-aligned rectangles, rotated rectangles or
quadrangles) adopted to describe text, existing methods may fall short when
dealing with more free-form text instances, such as curved text, which are
very common in real-world scenarios. To tackle this problem, we
propose a more flexible representation for scene text, termed TextSnake,
which can effectively represent text instances in horizontal, oriented
and curved forms. In TextSnake, a text instance is described as a sequence of
ordered, overlapping disks centered on its symmetric axis, each of which is
associated with a potentially variable radius and orientation. These geometric
attributes are estimated by a Fully Convolutional Network (FCN) model. In
experiments, the text detector based on TextSnake achieves state-of-the-art or
comparable performance on Total-Text and SCUT-CTW1500, two newly published
benchmarks with special emphasis on curved text in natural images, as well as
on the widely used datasets ICDAR 2015 and MSRA-TD500. Specifically, TextSnake
outperforms the baseline on Total-Text by more than 40% in F-measure.
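To make the disk-sequence representation concrete, here is a minimal sketch, assuming each disk is stored as a center (x, y) on the text's center line, a radius, and an orientation angle in radians. It rasterizes a text instance as the union of its disks and derives a boundary polygon by offsetting each center by its radius along the normal of the local orientation. All names (`Disk`, `text_region_mask`, `text_polygon`, `disks_along_arc`) are hypothetical, the polygon construction is our own illustrative assumption, and the FCN that predicts these attributes is not shown; this is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Disk:
    x: float       # center x, assumed to lie on the text's symmetric axis
    y: float       # center y
    radius: float  # assumed: half the local text height
    theta: float   # assumed: local direction of the center line, in radians


def text_region_mask(disks: List[Disk], width: int, height: int) -> List[List[bool]]:
    """Rasterize a text instance as the union of its (overlapping) disks."""
    mask = [[False] * width for _ in range(height)]
    for d in disks:
        # Only scan the disk's bounding box, clipped to the image.
        x0 = max(0, math.floor(d.x - d.radius))
        x1 = min(width - 1, math.ceil(d.x + d.radius))
        y0 = max(0, math.floor(d.y - d.radius))
        y1 = min(height - 1, math.ceil(d.y + d.radius))
        for py in range(y0, y1 + 1):
            for px in range(x0, x1 + 1):
                if (px - d.x) ** 2 + (py - d.y) ** 2 <= d.radius ** 2:
                    mask[py][px] = True
    return mask


def text_polygon(disks: List[Disk]) -> List[Tuple[float, float]]:
    """Hypothetical boundary: offset each center by +/- radius along the normal
    (-sin(theta), cos(theta)), then walk one side forward and the other back."""
    side_a = [(d.x - d.radius * math.sin(d.theta), d.y + d.radius * math.cos(d.theta))
              for d in disks]
    side_b = [(d.x + d.radius * math.sin(d.theta), d.y - d.radius * math.cos(d.theta))
              for d in disks]
    return side_a + side_b[::-1]


def disks_along_arc(cx: float, cy: float, arc_radius: float,
                    start: float, end: float, n: int, radius: float) -> List[Disk]:
    """Hypothetical helper: place n >= 2 disks along a circular arc,
    mimicking a curved text instance."""
    disks = []
    for i in range(n):
        a = start + (end - start) * i / (n - 1)
        # The arc's tangent at angle a points in direction a + pi/2.
        disks.append(Disk(cx + arc_radius * math.cos(a),
                          cy + arc_radius * math.sin(a),
                          radius, a + math.pi / 2))
    return disks
```

A straight horizontal word is then simply a row of disks with `theta = 0`, while a curved word uses the same structure with orientations that follow the curve, e.g. `disks_along_arc(50, 50, 30, 0, math.pi, 9, 4.0)`. This is the flexibility the fixed-shape representations (rectangles and quadrangles) lack.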