How to determine similarity between a source string and multiple strings in Python?
Assuming I have the following source string:
Humpty dumpty <span id="1">sat</span> on a wall, humpty dumpty had a great
fall. All of <span id="two">the kings</span> horses and all the kings men.
and a few other strings in a list, each separated by a new line:
Humpty dumpty sat on a wall, humpty dumpty had a great fall. All of the
kings horses and all the kings men.
Humpty dumpty sat on the wall, all of the kings horses and all the kings men.
There is a humpty dumpty who had sat on the wall, and all of the kings
horses and all the kings men.
Humpty dumpty sat on some wall, humpty dumpty had a great fall. All of the
kings horses and all the kings men couldn't put him together again.
Humpty dumpty this is a completely related sentence.
Starting with the source string, I want to find out which of the other
strings in the list most closely matches it, using Python. Is there a good
way to compute a "score" for each comparison between the source string and
a target string and, based on that score, determine which target string is
the closest match to the source string? (In this case, the most similar
string should be the 1st string, as it is the source string without any of
the <span> tags.)
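One possible way to get such a score, offered only as a minimal sketch: strip the tags from the source string with a simple regular expression, then compare it against each string using difflib.SequenceMatcher from the standard library, whose ratio() returns a value between 0 and 1. The helper names strip_tags and rank_candidates are made up for this illustration, and the regex assumes simple, well-formed tags like the ones above; real HTML would call for a proper parser.

```python
import difflib
import re

# Assumption: tags are simple and well formed, so a regex is enough here.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text):
    """Remove HTML-like tags and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub("", text).split())


def rank_candidates(source, candidates):
    """Return (score, candidate) pairs, most similar to the source first."""
    clean_source = strip_tags(source)
    scored = []
    for candidate in candidates:
        matcher = difflib.SequenceMatcher(None, clean_source, strip_tags(candidate))
        scored.append((matcher.ratio(), candidate))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


source = (
    'Humpty dumpty <span id="1">sat</span> on a wall, humpty dumpty had a great '
    'fall. All of <span id="two">the kings</span> horses and all the kings men.'
)
candidates = [
    "Humpty dumpty sat on a wall, humpty dumpty had a great fall. All of the "
    "kings horses and all the kings men.",
    "Humpty dumpty sat on the wall, all of the kings horses and all the kings men.",
    "There is a humpty dumpty who had sat on the wall, and all of the kings "
    "horses and all the kings men.",
    "Humpty dumpty sat on some wall, humpty dumpty had a great fall. All of the "
    "kings horses and all the kings men couldn't put him together again.",
    "Humpty dumpty this is a completely related sentence.",
]

ranked = rank_candidates(source, candidates)
best_score, best = ranked[0]
for score, candidate in ranked:
    print(f"{score:.3f}  {candidate}")
```

With the tags removed, the source string is identical to the 1st string, so that pair scores exactly 1.0 and ranks first. SequenceMatcher compares character sequences; a word-level or edit-distance measure might rank the remaining strings differently, so the choice of score depends on what "similar" should mean.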