UK WEB HOSTING FORUM FOR DISCUSSION ON WEB HOSTING SERVICE AND SUPPORT

Posted: 15-12-2006, 20:23
Help Comparing Strings of RSS Feed Headlines using similar_text()

Hello,

I have dumped a number of RSS feeds into a MySQL table. These are news feeds (AP, Reuters, ABC, CNN, CBS News, Digg, Slashdot, etc.).

I am having difficulty writing a script that displays the content of each of the feeds WITHOUT showing duplicate articles. Instead I want duplicate articles to be listed under the initial article as "Related Headlines" -- much like Google News does. The question is: How to do this?

Example:

Title: Rumsfeld Resigns from Iraq.
Source: Yahoo! News Date: Dec 3, 2006
Article Body: Today, Donald Rumsfeld has resigned from Iraq. Millions of Iraqis are partying in the streets.

Related Headlines: "Bush Fires Rumsfeld" - ABC News
"Rumsfeld is No More" - CNN News


Get the idea? I am using similar_text() to compare the headlines of each article. If the similarity is greater than 70%, the compared headline should be removed from the array so that it is not displayed as an independent article, but as a Related Headline under the article it matches.
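
For reference, similar_text() returns the number of matching characters, and the percentage is only available through its optional third argument, which is passed by reference. A minimal sketch of the 70% test follows; headline_similarity() is a hypothetical helper name used for illustration, not part of the script in this post.

```php
<?php
// Hypothetical helper (an assumption, not from the original script):
// returns the similarity of two headlines as a percentage from 0 to 100.
function headline_similarity($a, $b)
{
    // similar_text() returns the count of matching characters and
    // writes the percentage into the third, by-reference argument.
    similar_text($a, $b, $percent);
    return $percent;
}

// Compare two short strings and apply the 70% threshold.
$percent = headline_similarity("World", "Word");
echo round($percent, 2) . "%\n";
echo ($percent > 70) ? "similar\n" : "not similar\n";
```

Note that the percentage is on a 0 to 100 scale, so the threshold is 70 rather than .70.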

So here is my code attempt:




Code:
<?php

$query = "select id, headline, intro, body, author, date, source, vote, xmlsitetype from anews1 where xmlsitetype = 0 ORDER BY date DESC LIMIT 10";

$result = mysql_query($query);    
   

	// Go through each news item from the database table

while ($rownews = mysql_fetch_assoc($result)){
						

			  // find similar_text and flag duplicate stories to be displayed as Related Headlines
			  $dupquery = "SELECT id, headline, intro, body, author, date, source, vote, xmlsitetype from anews1";

			  echo $dupquery;
			  $dupresult = mysql_query($dupquery);
				
			  // strip unneeded characters like quotes out of headlines to clean them up:
			  $cleanheadlines = array("\"", "'");

			  while ($dupcheck = mysql_fetch_assoc($dupresult)){
				   foreach ($dupcheck as $key => $dupcheck['headline']){
	
					  $str1 = str_replace($cleanheadlines, "", addslashes($rownews['headline']));
					  $str2 = str_replace($cleanheadlines, "", addslashes($dupcheck['headline']));
					  echo "<p><b>str1</b> is: ".$str1."<br><b>str2</b> is: ". $str2."<p>";
					  // similar_text() returns the number of matching characters; the percentage (0-100) is written to $p
					  similar_text( $str1, $str2, $p );
					  if ($p > 70){
					  	 echo '<b><u>phrases are similar</u></b>';
					  	 	// flag str2 in the array so that it is not displayed as an independent article, but rather as a Related Headline
					  	} else {
					  		
					  		echo 'phrases not similar';
                // if phrases are not similar then print this article as the next independent news article in the news list
							
					  }
					  echo "Percent: $p%";
	
	   			   }
			   }
			?>
<ol>
				<li>
					<strong><a href="<?php echo $rownews['source']; ?>"><?php echo $rownews['headline']; ?></a></strong><br />
					<span style="font-size:0.8em; color:#999; height: 10px;">&rarr; <a href="<?php echo $rownews['source']; ?>"><?php echo $rownews['date']; ?></a> | <?php echo $rownews['date']; ?></span><br />
					<div style="font-size:1em; color:#000; height: 130px;"><?php echo $rownews['body']; ?></div>
				</li>
			<?php  

		
  }
  
  

	?>

	</ol>
</div>
The problem with this code is that the foreach loop iterates over EVERY element of the row array, so $str2 is assigned each field in turn and every field gets printed and compared. I only want $str2 to be assigned the second array element (the headline) so that I can compare it with $str1.

How do I do this?
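
One possible approach, offered as a sketch rather than a definitive answer: mysql_fetch_assoc() already returns a single row as an associative array, so the headline can be read directly by its key (for example $dupcheck['headline']) and the inner foreach is not needed. The example below groups near-duplicate headlines under the first article that matches them. The function names normalise_headline() and group_by_headline() and the sample rows are assumptions made for illustration; plain arrays stand in for the database rows so the example runs on its own.

```php
<?php
// Hypothetical helper: strip quotes, trim and lower-case a headline
// so that punctuation and case do not affect the comparison.
function normalise_headline($headline)
{
    return strtolower(trim(str_replace(array('"', "'"), '', $headline)));
}

// Hypothetical helper: walk the articles in order. Each article either
// joins the first existing group whose lead headline is more than
// $threshold percent similar, or starts a new group of its own.
function group_by_headline(array $articles, $threshold = 70.0)
{
    $groups = array();
    foreach ($articles as $article) {
        // Read the headline by key; no loop over the row's fields.
        $candidate = normalise_headline($article['headline']);
        $placed = false;
        foreach ($groups as $i => $group) {
            $lead = normalise_headline($group['article']['headline']);
            similar_text($lead, $candidate, $percent); // $percent is 0-100
            if ($percent > $threshold) {
                $groups[$i]['related'][] = $article;
                $placed = true;
                break;
            }
        }
        if (!$placed) {
            $groups[] = array('article' => $article, 'related' => array());
        }
    }
    return $groups;
}

// Sample rows (made up) standing in for mysql_fetch_assoc() results.
$articles = array(
    array('headline' => 'Rumsfeld Resigns', 'source' => 'Yahoo! News'),
    array('headline' => '"Rumsfeld resigns"', 'source' => 'ABC News'),
    array('headline' => 'Stock markets close higher', 'source' => 'CNN News'),
);

$groups = group_by_headline($articles);
foreach ($groups as $group) {
    echo $group['article']['headline'] . "\n";
    foreach ($group['related'] as $related) {
        echo "  Related: " . $related['headline'] . " - " . $related['source'] . "\n";
    }
}
```

Fetching all rows once and grouping them in PHP also avoids re-running the second query for every article. Keep in mind that similar_text() compares characters, not meaning, so differently worded headlines about the same story may score well below 70%.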