Time Travel: 2014

Chapter 201: Seeking Common Ground While Reserving Differences

The dimensionality explosion is also called the dimensionality disaster.

Some people, with a touch of mysticism, like to call it the Curse of Dimensionality.

The term dimensionality explosion was first coined by Richard Bellman while he was studying optimization problems.

It originally described the various problems that arise when analyzing and organizing data in high-dimensional spaces (often hundreds or thousands of dimensions), problems caused by the exponential growth in volume as the number of dimensions increases.

Each additional dimension multiplies the volume of a mathematical space, so its volume grows exponentially with the number of dimensions.

Such difficulties are rarely encountered in low-dimensional spaces.

Physical space, for example, rarely runs into this kind of problem; after all, physics is usually modeled in only three dimensions.

It may sound surprising: although the dimensionality explosion is hard to run into in the physical world,

it is commonplace in natural language processing and machine learning.

In these fields, even a modest amount of information can easily require far more than three dimensions to represent.

In fact, the dimensionality explosion has been noted in many fields, such as sampling, combinatorics, machine learning, and data mining.

The common feature of these problems is that as the dimensionality increases, the volume of the space grows so quickly that the available data becomes very sparse.

In a high-dimensional space, where all the data is sparse and dissimilar from many angles, the commonly used data-organization strategies become extremely inefficient.
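
A minimal sketch of the sparsity effect described above, assuming uniformly sampled points in a unit hypercube (the function name and parameters are illustrative, not anything from the text): as the number of dimensions grows, the same number of points ends up farther and farther apart.

```python
# Illustrative sketch: average nearest-neighbour distance among points
# sampled uniformly in a d-dimensional unit hypercube. As d grows, the
# points spread out and the space "empties": the sparsity described above.
import math
import random

def avg_nearest_neighbour_distance(n_points, n_dims, seed=0):
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(points):
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += nearest
    return total / n_points

for d in (2, 10, 100, 1000):
    print(d, round(avg_nearest_neighbour_distance(200, d), 3))
# The printed distances rise steadily with d, even though the number of
# points never changes -- the data has become sparse.
```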

In fact, Eve Carly and her former team had used a text-similarity measure based on web knowledge.

Analyzing all web pages directly often left the knowledge content sparse and the computation difficult.

That situation was, in fact, caused by the dimensionality explosion.

Eve Carly knew perfectly well that the current approach of introducing vectors to compute semantic text similarity would lead to a dimensionality explosion.
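
A rough sketch of why the vector approach she has in mind explodes in dimension, under the simple assumption of a bag-of-words representation (the function names and the toy documents here are illustrative, not the story's): each text becomes a vector with one dimension per vocabulary word, so a realistic vocabulary pushes the vectors into tens of thousands of mostly zero dimensions.

```python
# Illustrative sketch: naive bag-of-words vectors for text similarity.
# The vector dimensionality equals the vocabulary size, and most entries
# are zero -- exactly the high-dimensional sparsity discussed above.
import math
from collections import Counter

def bow_vector(text, vocabulary):
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = ["the cat sat on the mat", "a dog sat on the rug"]
vocabulary = sorted({w for d in docs for w in d.lower().split()})

vectors = [bow_vector(d, vocabulary) for d in docs]
print(len(vocabulary))              # the dimension equals the vocabulary size
print(cosine_similarity(*vectors))  # a crude similarity score between the texts
# With a real corpus the vocabulary runs to tens of thousands of words,
# so every document vector is enormous and almost entirely zeros.
```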

Why had Lin Hui suddenly asked how she viewed introducing vectors into the calculation of semantic text similarity?

Could it be that Lin Hui really had some way to deal properly with the dimensionality explosion?

But the dimensionality explosion in machine learning and natural language processing is not so easy to solve.

Or did Lin Hui plan to bypass vectors entirely when measuring semantic text similarity?

Although Eve Carly didn't know why Lin Hui had suddenly asked this,

how could she pass up an opportunity that might draw out Lin Hui's advice?

Eve Carly first explained to Lin Hui the role vectors usually play in Western approaches to calculating semantic text similarity.

Then she formally began to answer the question Lin Hui had asked her earlier:

“Introducing vectors makes it easier for machines to process semantic text information.

If we do not introduce vectors, we have few options when dealing with semantic text similarity.

And without vectors, whatever method we choose to calculate semantic text similarity is, frankly, rather low-end.

Take string-based methods, for example, which compare the raw text directly.

These mainly rely on measures such as edit distance, longest common subsequence, and N-gram similarity.

Take edit distance, for example, which measures the similarity between two texts based on the minimum number of edit operations required to convert one text into the other.

The edit operations defined by this algorithm come in three types: insertion, deletion, and substitution.

The longest common subsequence is based on...

This family of metrics is a bit like the rough document-comparison check in Microsoft Word.

Although string-based methods are simple in principle and easy to implement,

they take no account of the meanings of words or the relationships between them.

They cannot handle issues such as synonymy and polysemy.

So string-based methods are rarely used on their own to calculate text similarity.

Instead, their results are folded into more complex methods as features that characterize the text.

In addition to these methods, there are...”
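
A minimal sketch of the two string-based measures Eve Carly has just described, edit distance and the longest common subsequence; the function names are illustrative, and the code is only a plain dynamic-programming version, not anything attributed to the characters.

```python
# Illustrative sketches of two string-based similarity measures:
# Levenshtein edit distance (insert / delete / substitute) and the
# length of the longest common subsequence.

def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i              # delete everything in a[:i]
    for j in range(n + 1):
        dp[0][j] = j              # insert everything in b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[m][n]

def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # 3
print(lcs_length("kitten", "sitting"))     # 4
```

The smaller the edit distance, or the longer the common subsequence relative to the text lengths, the more similar the two strings are judged to be.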

Lin Hui knew a thing or two about all this himself.

He just wanted to use Eve Carly's own words to gauge how far research had progressed in this time and space.

Measuring semantic text similarity with string edit operations and longest common subsequences was indeed a bit low-end.

But low-end does not mean useless; these algorithms could hardly be called worthless.

Imagine a breakthrough in text recognition.

If a method for judging text similarity were combined with a text recognition algorithm,

then the string-based way of determining text similarity would actually be the most appropriate one.

After all, the string-based approach is the closest to the intuitive, character-level form in which computer vision delivers text.

In fact, text recognition algorithms were very common technology in later generations.

Even the screenshot tool built into any chat app could handle text recognition perfectly well.

But in this time and space, there was software that made a gimmick of specializing in text recognition,

when the actual work it performed was merely scanning documents and converting them into PDFs.

When it came to genuine text recognition, the whole lot were woefully inefficient.

Lin Hui felt as if he had stumbled upon another business opportunity.

But although Lin Hui had spotted a business opportunity, now was not the right time to act on it.

After all, text recognition is related to the field of computer vision.

So-called computer vision means letting machines see things.

This is considered a field of artificial intelligence.

Research in this area enables computers and systems to derive meaningful information from images, videos, and other visual inputs.

The machine takes action or provides recommendations based on this information.

If artificial intelligence gives computers the ability to think,

then computer vision gives them the ability to discover, observe, and understand.

Computer vision cannot be called terribly complicated,

but its barrier to entry is much higher than that of natural language processing.

Obviously it was not the right moment for Lin Hui to wade into it.

But Lin Hui was patient, and he quietly kept the matter in mind.

Lin Hui felt that he should not be too short-sighted.

Some things that seem useless now

are not necessarily useless in the long run.

Thinking of this, Lin Hui suddenly felt very lucky.

On the one hand, the experience of his previous life had made things easier for him after rebirth.

On the other hand, rebirth had also changed his way of thinking, and that benefited him just as much.

With many things, Lin Hui would subconsciously consider their long-term value.

He might even find himself inadvertently weighing what would happen ten or twenty years from now.

With this long-term way of thinking,

Lin Hui felt that, given time, he could reach heights few others could reach.

But these thoughts were not for outsiders' ears.

Although he differed somewhat from Eve Carly over string-based methods of evaluating text similarity,

Lin Hui didn't show it; academic exchange is often just a matter of seeking common ground while reserving differences.
