Time Travel: 2014
Chapter 152 Eve Carly’s Confusion (continued)
It is precisely for the reasons above that, whichever timeline you look at, many countries around the world are researching text.
The progress of record-keeping in human society is, to some extent, reflected in the different condensed forms that texts have taken.
Text research is also an extremely important task for some large enterprises.
Progress in text summarization determines the launch of one product after another.
Research on text not only greatly advances the in-depth study of literature, but also drives progress in science and technology.
All in all, no amount of effort put into text summarization is too much.
After all, this is Lin Hui’s first step in the field of technology.
As for the confusion Eve Carly had run into:
Lin Hui did not expect that it would center on the construction of the LH text summary accuracy measurement model.
As Lin Hui remembered it, he had explained the model construction clearly enough at the time.
To build the model, you first use a language model to evaluate the fluency of the text the algorithm generates, then use a similarity model to evaluate the semantic relatedness between the source text and the summary, and finally, to measure how well entities and specialized terms are reproduced, bring in an original-text information-content model.
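The three evaluation stages named above can be sketched in a few lines. This is only a toy illustration under stated assumptions, not Lin Hui's actual model: the story does not say which language model, similarity model, or information-content model he used, so the add-one-smoothed bigram model, the bag-of-words cosine similarity, the key-term coverage ratio, the equal weights, and every function name here are hypothetical stand-ins.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    # Hypothetical tokenizer: split on whitespace, strip punctuation, lowercase.
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def fluency_score(summary, corpus_sentences):
    """Stage 1 (assumed): add-one-smoothed bigram language model.

    Returns the geometric-mean bigram probability of the summary,
    so a higher value means the word order looks more familiar."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        tokens = ["<s>"] + tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    tokens = ["<s>"] + tokenize(summary)
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    log_prob = sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in pairs
    )
    return math.exp(log_prob / len(pairs))


def similarity_score(source, summary):
    """Stage 2 (assumed): cosine similarity of bag-of-words vectors."""
    a, b = Counter(tokenize(source)), Counter(tokenize(summary))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def coverage_score(key_terms, summary):
    """Stage 3 (assumed): share of the source's key terms that recur in the summary."""
    if not key_terms:
        return 0.0
    summary_tokens = set(tokenize(summary))
    hits = sum(1 for term in key_terms if term.lower() in summary_tokens)
    return hits / len(key_terms)


def summary_score(source, summary, corpus_sentences, key_terms,
                  weights=(1 / 3, 1 / 3, 1 / 3)):
    # Hypothetical combination: a plain weighted sum of the three stages.
    parts = (
        fluency_score(summary, corpus_sentences),
        similarity_score(source, summary),
        coverage_score(key_terms, summary),
    )
    return sum(w * p for w, p in zip(weights, parts))
```

The key terms are passed in by hand here; a real system would need an entity or terminology extractor for that step, which is one of the details the sketch leaves out.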
Admittedly, to avoid the old problem of the apprentice learning everything and starving the master, Lin Hui had deliberately left out some minor steps in between.
But that kind of gap is to scientific researchers what trenches are to tanks.
It will slow them down a little, but it should not be a big problem.
As for really publishing every technical detail?
That wouldn’t be called announcing a technical route; it would be called writing a textbook.
Regarding Lin Hui’s mention of “using language models to evaluate the fluency of algorithm-generated language”,
Eve Carly was puzzled about how Lin Hui had obtained the corpus for training the language model.
A few years from now, this really would not be a problem at all.
By then there will be plenty of ready-made corpora.
For Simplified Chinese alone, there are resources such as the National Language Commission Modern Chinese Corpus, the Peking University Corpus, and Corpus Linguistics Online.
However, at this point in time, Lin Hui obviously cannot tell other researchers that he is using a ready-made corpus.
After all, some of those ready-made corpora were only released around 2016.
Nonetheless, the question of how to explain the source of the corpus does not trouble Lin Hui.
In fact, even without a ready-made corpus, it is not too complicated to build a usable one for training an early generative summarization algorithm.
The simplest way is to build the text corpus automatically with the help of the Internet.
With this method, the user only needs to provide the required text category system.
A large number of websites are then collected from the Internet, and the content hierarchy of each website and the page content corresponding to each keyword are extracted and analyzed.
The texts the user needs are then selected from each website as the candidate corpus.
This process is actually not complicated, and is somewhat similar to the process of crawling a web page.
What is more difficult is how to denoise the corpus formed by this method.
But this is not a problem for Lin Hui.
He only needs to merge the candidate texts that match the same category across multiple websites into one candidate corpus per category.
Denoising the text under each category in the candidate corpus then improves the quality of the corpus.
Once denoising is complete, the corpus can be output.
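The merge-then-denoise step can be sketched as follows. This is a minimal illustration under assumptions: the crawling and category-matching stages are taken as already done, and the story does not say how Lin Hui denoises, so the duplicate check, the minimum word count, and the letter-ratio threshold are hypothetical heuristics, as are the function names and the example site names.

```python
from collections import defaultdict


def merge_candidates(site_candidates):
    """Merge per-site candidates into one candidate list per category.

    site_candidates maps a site name to a list of (category, text) pairs."""
    merged = defaultdict(list)
    for items in site_candidates.values():
        for category, text in items:
            merged[category].append(text)
    return dict(merged)


def is_noise(text, min_words=5, min_alpha_ratio=0.6):
    # Hypothetical heuristics: very short texts and texts dominated by
    # digits or symbols (menus, ads, counters) are treated as noise.
    stripped = text.strip()
    if len(stripped.split()) < min_words:
        return True
    readable = sum(ch.isalpha() or ch.isspace() for ch in stripped)
    return readable / len(stripped) < min_alpha_ratio


def denoise(merged):
    """Drop noisy texts and near-verbatim duplicates within each category."""
    cleaned = {}
    for category, texts in merged.items():
        seen, kept = set(), []
        for text in texts:
            key = " ".join(text.lower().split())  # normalize case and spacing
            if key in seen or is_noise(text):
                continue
            seen.add(key)
            kept.append(text.strip())
        cleaned[category] = kept
    return cleaned


if __name__ == "__main__":
    candidates = {
        "site-a.example": [
            ("sports", "The home team won the final match last night."),
            ("sports", "Click here >>> 12345 !!! ### $$$"),
        ],
        "site-b.example": [
            ("sports", "the home team won the final   match last night."),
            ("finance", "Markets closed slightly higher after a quiet trading day."),
        ],
    }
    for category, texts in denoise(merge_candidates(candidates)).items():
        print(category, texts)
```

Real web text needs far more than this (boilerplate stripping, language detection, category verification), which is where the difficulty mentioned above comes from.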
Admittedly, this process is still not easy to implement.
But in academia, apart from a handful of experts who love to split hairs,
in most cases nobody will press the point as long as the logic holds together.
Apart from being curious about how Lin Hui constructed the corpus,
and regarding “using similarity models to evaluate the semantic relatedness between the text and the summary”,
Eve Carly is also curious about what kind of similarity model Lin Hui uses to evaluate the semantic relatedness between a text and its summary.
Well, this question involves the core of the text summary accuracy model developed by Lin Hui.
The answer to this question cannot be explained in a few words.