Website auslesen (Pagescraper Problem)

phistoh · Beitrag von **phistoh** » 24.08.2005, 03:50

Jo... ich hab mal ne Frage... wenn ich in Samurize das Pagescraper Plugin verwende, um Daten von dieser Seite auszulesen (http://www.last.fm/user/phistoh/) klappt es nicht...

(Also, ich hab halt alles richtig eingestellt und wenn ich auf Download klicke tut sich nichts... andere Seiten klappen jedoch...)

Mit dem Script

Code: Alles auswählen

'------------------------------------------------------------------------ 
' Internet Script Template (v1.0) 
'------------------------------------------------------------------------ 
' 
' *** SCRIPT REQUEST *** 
' 
' A customisable template that should allow you to effectively write your 
' own scripts with very little (or no) VBscripting knowledge. All you 
' have to do is examine the HTML source for the page you want to get 
' data from and change the six variables below. 
' 
' The script is already set up to work with FilePlanet.com, so if you 
' want to have a good idea as to how this script works, I advise that 
' you examine the source of the fileplanet.com website as well. 
' 
' PLEASE NOTE: I will provide very little (if any) support for this 
' ------------ script. I only made it because I was asked by a user of my 
'              portal for a Script Template so he could have a go at 
'              making his own scripts. 
' 
'              There are quite a few things that could go wrong with it 
'              (as far as wetting the variables below is concerned) so 
'              only attempt to use it if you have *SOME* idea of what 
'              you are doing ;) 
'              
' 
'                                       - NeM 
'------------------------------------------------------------------------ 
' Stop by NeM Portal at http://samurize.breezeland.com/ 
'------------------------------------------------------------------------ 

'========================================================================= 
' VARIABLES TO CHANGE: 
'========================================================================= 

'========================================================================= 
' The name of the temp file you want to use for the script (needed for 
' the GetInfoTempFile() function only) 
'========================================================================= 
TMP_FILE = "fileplanet.tmp" 
'========================================================================= 



'========================================================================= 
' The url of the site you want to get your data from 
'========================================================================= 
SITE_URL = "http://www.fileplanet.com/" 
'========================================================================= 



'========================================================================= 
' A string in the HTML source to verify that you have got the right page 
' (eg. use the page title) 
'========================================================================= 
PAGE_CHECK = "<title>FilePlanet" 
'========================================================================= 



'========================================================================= 
' String to search for first - to advance to a spot just before each 
' data item you want to extract. 
' (NOTE: text containing quote marks (") must have two " marks to signify 
' a " in the text.) 
'========================================================================= 
FIRST_CHECK = "class=""feature"">" 
'========================================================================= 



'========================================================================= 
' String marking the end of all text before the text you are interested in 
' (eg. if you were interested in the text of a hyperlink: 
' 
'         <a href="somesite">My Text</a> 
' 
' you would set this to  "somesite"">" for example. 
'========================================================================= 
FIRST_CHECK2 = ".shtml"">" 
'========================================================================= 


'========================================================================= 
' String marking the end of the text you are interested in. eg. if you 
' were interested in the text of a hyperlink: 
' 
'         <a href="somesite">My Text</a> 
' 
' you would set this to "</a>" 
'========================================================================= 
END_CHECK = "</a>" 
'========================================================================= 




'========================================================================= 
' DO NOT MODIFY ANYTHING PAST HERE UNLES YOU KNOW WHAT YOU ARE DOING! 
'========================================================================= 

Function GetInfo()       

   dim htmlResult,output,newsItem,mainFeature 
   htmlResult = ReturnHTML(SITE_URL) 
   output = NULL 

   startPos = instr(htmlResult, PAGE_CHECK) 

   if startPos > 0 then 


      Do While instr(startPos, htmlResult, FIRST_CHECK)>0 

         posOne = instr( startPos, htmlResult, FIRST_CHECK) 
         posOne = instr( posOne, htmlResult, FIRST_CHECK2) + Len(FIRST_CHECK2)    

         posTwo = instr( posOne, htmlResult, END_CHECK) 

         if posOne > 0 AND posTwo > posOne Then 

            'grab the name of the news item 
            newsItem = right(mid( htmlResult, posOne, posTwo-posOne ), posTwo-posOne) 

         end if 

         ' new start position 
         startPos = posTwo 

         'append the user to the list and put a newline char after it 
         output = output & newsItem 

         ' don't add a newline to the last item 
         If instr(startPos, htmlResult, FIRST_CHECK)>0 Then 
            output = output & (Chr(13) & Chr(10)) 
         End If 



      Loop 

   else 
      output = "Could not obtain data." 
   end if 

   GetInfo = TrimHTML(output) 
End Function 

Function GetInfoTempFile() 

   set fs=CreateObject("Scripting.FileSystemObject") 
   set f=fs.CreateTextFile(TMP_FILE,true) 
   f.write(GetInfo()) 
   f.close 
   set f=nothing 
   set fs=nothing 

   GetInfoTempFile = "OK!" 

End Function 

Private Function ReturnHTML(sURL) 
   Dim objXMLHTTP,HTML 
   Set objXMLHTTP = CreateObject("Microsoft.XMLHTTP") 
   objXMLHTTP.Open "GET", sURL, False 
   objXMLHTTP.Send 
   HTML = objXMLHTTP.responseBody 
   Set objRS = CreateObject("ADODB.Recordset") 
   objRS.Fields.Append "txt", 200, 45000, &H00000080 
   objRS.Open 
   objRS.AddNew 
   objRS.Fields("txt").AppendChunk HTML 
   ReturnHTML = objRS("txt").Value 
   objRS.Close 
   Set objRS = Nothing 
   Set objXMLHTTP = Nothing 
End Function 

' ****************************** 
' Following function by Alderaic 
' ****************************** 

'very simple function that will remove all html tags 
Private Function TrimHTML(str) 
pos_deb = InStr(1, str, "<") 
Do Until pos_deb = 0 
    pos_fin = InStr(pos_deb, str, ">") 
    part_d = Mid(str, 1, pos_deb - 1) 
    part_f = Mid(str, pos_fin + 1, Len(str) - pos_fin) 
    str = part_d & part_f 
    pos_deb = InStr(1, str, "<") 
Loop 
TrimHTML = str 
End Function

komm ich auch noch nicht ganz klar... ich kam soweit, dass wenn ich die erste Position der Top-Weekly Artists auslesen wollte er mir zweimal Postion 1 anzeigte, und als ich die 2. Postion auslesen wollte er mir die 2. und die 3. anzeigte...

Aber egal... im Prinzip will ich nur wissen, wieso das mit Pagescraper nicht funktioniert... weil es imho damit komfortabler einzustellen wäre...

Beitrag von **Tante Google** » 24.08.2005, 03:50

flambee1 · Beitrag von **flambee1** » 24.08.2005, 09:44

Ich schlage vor, dass du dich damit im samurize.com-Forum direkt an den Ersteller ("uziq") wendest. Das Plug ist seht neu und hin und wieder zickt das noch rum. Ich selber habe bisher noch nicht viel damit gemacht und kann dir das nicht sonderlich beiseite stehen.

Gruss Flam