Many people rely on Google Search to find answers to their grammar questions. While Google already has features like “Did you mean” to correct simple typos, it struggles with more complex grammatical errors. Developing new Google Search features that can handle these errors quickly and accurately is a challenge.
One common approach to grammatical error correction (GEC) is to use autoregressive Transformer models, which treat GEC as a translation problem. However, these models can be slow because they generate the output one token at a time, so the generation process cannot be parallelized. Another option is to treat GEC as a text editing problem: most of a corrected sentence is copied from the input, so only the changed tokens need to be generated, which decreases latency. In their study “EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start,” Eric Malmi and Jakub Adamek describe a new text-editing model called EdiT5. This model is based on the T5 Transformer encoder-decoder architecture and powers the new grammar check feature in Google Search.
The EdiT5 model reduces the number of decoding steps by treating GEC as a text editing problem. First, the encoder determines which input tokens to keep and which to delete; the kept tokens form a draft output. The draft is then optionally reordered using a non-autoregressive pointer network. Finally, the decoder generates only the tokens that are missing from the draft and places them in the correct locations to produce a grammatically correct output. Because the decoder handles just these missing tokens, the researchers could reduce it to a single layer and increase the size of the encoder instead, which significantly decreases the latency of the model.
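The three steps above can be illustrated with a toy sketch. This is not the EdiT5 implementation: the function `apply_edits`, its arguments, and the hand-written tags below are hypothetical stand-ins for what the encoder, pointer network, and decoder would predict.

```python
from typing import Dict, List, Optional, Sequence


def apply_edits(
    tokens: Sequence[str],
    keep: Sequence[bool],
    order: Optional[Sequence[int]] = None,
    insertions: Optional[Dict[int, List[str]]] = None,
) -> List[str]:
    """Apply keep/delete tags, an optional reordering, and insertions.

    `insertions` maps a draft position to the tokens placed before it;
    the key `len(draft)` appends tokens at the end. All inputs are
    hypothetical model predictions supplied by hand.
    """
    if len(tokens) != len(keep):
        raise ValueError("tokens and keep must have the same length")

    # Step 1: the kept tokens form the draft.
    draft = [tok for tok, k in zip(tokens, keep) if k]

    # Step 2: optionally reorder the draft (a permutation of draft indices).
    if order is not None:
        if sorted(order) != list(range(len(draft))):
            raise ValueError("order must be a permutation of draft indices")
        draft = [draft[i] for i in order]

    # Step 3: place the missing tokens at their target positions.
    insertions = insertions or {}
    output: List[str] = []
    for pos in range(len(draft) + 1):
        output.extend(insertions.get(pos, []))
        if pos < len(draft):
            output.append(draft[pos])
    return output


source = "She go to school yesterday".split()
keep = [True, False, True, True, True]  # delete "go"
corrected = apply_edits(source, keep, insertions={1: ["went"]})
print(" ".join(corrected))
```

In this example only one token ("went") has to be generated, while the other four are copied from the input, which is where the saving in decoding steps comes from.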
To measure accuracy, the EdiT5 model was evaluated on the public BEA grammatical error correction benchmark. A large EdiT5 model with 391M parameters achieved a higher F0.5 score, which measures the accuracy of the corrections, and a 9x speedup compared to a T5 base model with 248M parameters. The mean latency of the EdiT5 model was only 4.1 milliseconds.
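GEC benchmarks such as BEA are typically scored with F0.5, which weights precision more heavily than recall because a wrong correction is usually worse than a missed one. The sketch below shows the standard F-beta formula; the function name `f_beta` and the edit counts are made up for illustration and are not results from the study.

```python
def f_beta(true_pos: int, false_pos: int, false_neg: int, beta: float = 0.5) -> float:
    """Standard F-beta score computed from edit-level counts."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only: 6 correct edits, 2 spurious edits, 6 missed edits.
score_f05 = f_beta(6, 2, 6)           # precision 0.75, recall 0.5
score_f1 = f_beta(6, 2, 6, beta=1.0)  # same counts, equal weighting
print(round(score_f05, 4), round(score_f1, 4))
```

With these counts F0.5 is higher than F1, since the hypothetical system is more precise than it is complete.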
To generate accurate grammatical corrections, the researchers used a technique called hard distillation. They first trained a teacher language model and then used its outputs as training data for the student EdiT5 model. To improve the quality of this training data, they ran several rounds of self-training and iterative refinement.
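As a rough sketch of the data flow in hard distillation, the teacher's single predicted output for each input becomes the training target for the student. The names `build_hard_distillation_pairs` and `toy_teacher` are hypothetical, and the rule-based teacher merely stands in for a trained language model.

```python
from typing import Callable, Iterable, List, Tuple


def build_hard_distillation_pairs(
    teacher: Callable[[str], str], sources: Iterable[str]
) -> List[Tuple[str, str]]:
    """Pair each source sentence with the teacher's predicted correction."""
    return [(src, teacher(src)) for src in sources]


def toy_teacher(sentence: str) -> str:
    # Hypothetical stand-in for a large teacher model: a tiny lookup of fixes.
    fixes = {"go yesterday": "went yesterday", "a apple": "an apple"}
    for wrong, right in fixes.items():
        sentence = sentence.replace(wrong, right)
    return sentence


pairs = build_hard_distillation_pairs(
    toy_teacher, ["I eat a apple", "We go yesterday"]
)
for src, tgt in pairs:
    print(f"{src!r} -> {tgt!r}")
```

The student would then be trained on `pairs` as ordinary source/target examples; the training loop itself is omitted here.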
The improved GEC data was used to train two EdiT5-based models: a grammatical error correction model and a grammaticality classifier. When the grammar check feature is used in Google Search, the query is first corrected by the correction model, and the result is then checked by the classifier. A correction is shown to the user only if the classifier judges it to be grammatical.
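The correct-then-verify flow can be sketched as follows. The function `grammar_check` and both toy models are hypothetical placeholders, not the production system; returning `None` when the classifier rejects the output is an assumption about how a rejected correction might be handled.

```python
from typing import Callable, Optional


def grammar_check(
    query: str,
    correct: Callable[[str], str],
    is_grammatical: Callable[[str], bool],
) -> Optional[str]:
    """Return the corrected query only if the classifier accepts it."""
    candidate = correct(query)
    return candidate if is_grammatical(candidate) else None


# Hypothetical stand-ins for the correction model and the classifier.
def toy_corrector(query: str) -> str:
    return query.replace("a apple", "an apple")


def toy_classifier(sentence: str) -> bool:
    return "a apple" not in sentence


shown = grammar_check("I eat a apple", toy_corrector, toy_classifier)
# A corrector that fails to fix the error is filtered out by the classifier.
hidden = grammar_check("I eat a apple", lambda q: q, toy_classifier)
print(shown, hidden)
```

Keeping the classifier as a separate gate means a weak correction is suppressed rather than surfaced.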
In conclusion, the new grammar check feature in Google Search is powered by the efficient EdiT5 model. Users can check the grammaticality of their queries by including the phrase “grammar check” in their search. The development of this feature was made possible by the contributions of the research team and the use of advanced techniques like hard distillation.