Website auslesen (Pagescraper Problem)
Website auslesen (Pagescraper Problem)
Jo... ich hab mal ne Frage... wenn ich in Samurize das Pagescraper Plugin verwende, um Daten von dieser Seite auszulesen (http://www.last.fm/user/phistoh/) klappt es nicht...
(Also, ich hab halt alles richtig eingestellt und wenn ich auf Download klicke tut sich nichts... andere Seiten klappen jedoch...)
Mit dem Script
komm ich auch noch nicht ganz klar... ich kam soweit, dass wenn ich die erste Position der Top-Weekly Artists auslesen wollte er mir zweimal Postion 1 anzeigte, und als ich die 2. Postion auslesen wollte er mir die 2. und die 3. anzeigte...
Aber egal... im Prinzip will ich nur wissen, wieso das mit Pagescraper nicht funktioniert... weil es imho damit komfortabler einzustellen wäre...

Mit dem Script
Code: Alles auswählen
'------------------------------------------------------------------------
' Internet Script Template (v1.0)
'------------------------------------------------------------------------
'
' *** SCRIPT REQUEST ***
'
' A customisable template that should allow you to effectively write your
' own scripts with very little (or no) VBscripting knowledge. All you
' have to do is examine the HTML source for the page you want to get
' data from and change the six variables below.
'
' The script is already set up to work with FilePlanet.com, so if you
' want to have a good idea as to how this script works, I advise that
' you examine the source of the fileplanet.com website as well.
'
' PLEASE NOTE: I will provide very little (if any) support for this
' ------------ script. I only made it because I was asked by a user of my
' portal for a Script Template so he could have a go at
' making his own scripts.
'
' There are quite a few things that could go wrong with it
' (as far as wetting the variables below is concerned) so
' only attempt to use it if you have *SOME* idea of what
' you are doing ;)
'
'
' - NeM
'------------------------------------------------------------------------
' Stop by NeM Portal at http://samurize.breezeland.com/
'------------------------------------------------------------------------
'=========================================================================
' VARIABLES TO CHANGE:
'=========================================================================
'=========================================================================
' The name of the temp file you want to use for the script (needed for
' the GetInfoTempFile() function only)
'=========================================================================
TMP_FILE = "fileplanet.tmp"
'=========================================================================
'=========================================================================
' The url of the site you want to get your data from
'=========================================================================
SITE_URL = "http://www.fileplanet.com/"
'=========================================================================
'=========================================================================
' A string in the HTML source to verify that you have got the right page
' (eg. use the page title)
'=========================================================================
PAGE_CHECK = "<title>FilePlanet"
'=========================================================================
'=========================================================================
' String to search for first - to advance to a spot just before each
' data item you want to extract.
' (NOTE: text containing quote marks (") must have two " marks to signify
' a " in the text.)
'=========================================================================
FIRST_CHECK = "class=""feature"">"
'=========================================================================
'=========================================================================
' String marking the end of all text before the text you are interested in
' (eg. if you were interested in the text of a hyperlink:
'
' <a href="somesite">My Text</a>
'
' you would set this to "somesite"">" for example.
'=========================================================================
FIRST_CHECK2 = ".shtml"">"
'=========================================================================
'=========================================================================
' String marking the end of the text you are interested in. eg. if you
' were interested in the text of a hyperlink:
'
' <a href="somesite">My Text</a>
'
' you would set this to "</a>"
'=========================================================================
END_CHECK = "</a>"
'=========================================================================
'=========================================================================
' DO NOT MODIFY ANYTHING PAST HERE UNLES YOU KNOW WHAT YOU ARE DOING!
'=========================================================================
Function GetInfo()
dim htmlResult,output,newsItem,mainFeature
htmlResult = ReturnHTML(SITE_URL)
output = NULL
startPos = instr(htmlResult, PAGE_CHECK)
if startPos > 0 then
Do While instr(startPos, htmlResult, FIRST_CHECK)>0
posOne = instr( startPos, htmlResult, FIRST_CHECK)
posOne = instr( posOne, htmlResult, FIRST_CHECK2) + Len(FIRST_CHECK2)
posTwo = instr( posOne, htmlResult, END_CHECK)
if posOne > 0 AND posTwo > posOne Then
'grab the name of the news item
newsItem = right(mid( htmlResult, posOne, posTwo-posOne ), posTwo-posOne)
end if
' new start position
startPos = posTwo
'append the user to the list and put a newline char after it
output = output & newsItem
' don't add a newline to the last item
If instr(startPos, htmlResult, FIRST_CHECK)>0 Then
output = output & (Chr(13) & Chr(10))
End If
Loop
else
output = "Could not obtain data."
end if
GetInfo = TrimHTML(output)
End Function
Function GetInfoTempFile()
set fs=CreateObject("Scripting.FileSystemObject")
set f=fs.CreateTextFile(TMP_FILE,true)
f.write(GetInfo())
f.close
set f=nothing
set fs=nothing
GetInfoTempFile = "OK!"
End Function
Private Function ReturnHTML(sURL)
Dim objXMLHTTP,HTML
Set objXMLHTTP = CreateObject("Microsoft.XMLHTTP")
objXMLHTTP.Open "GET", sURL, False
objXMLHTTP.Send
HTML = objXMLHTTP.responseBody
Set objRS = CreateObject("ADODB.Recordset")
objRS.Fields.Append "txt", 200, 45000, &H00000080
objRS.Open
objRS.AddNew
objRS.Fields("txt").AppendChunk HTML
ReturnHTML = objRS("txt").Value
objRS.Close
Set objRS = Nothing
Set objXMLHTTP = Nothing
End Function
' ******************************
' Following function by Alderaic
' ******************************
'very simple function that will remove all html tags
Private Function TrimHTML(str)
pos_deb = InStr(1, str, "<")
Do Until pos_deb = 0
pos_fin = InStr(pos_deb, str, ">")
part_d = Mid(str, 1, pos_deb - 1)
part_f = Mid(str, pos_fin + 1, Len(str) - pos_fin)
str = part_d & part_f
pos_deb = InStr(1, str, "<")
Loop
TrimHTML = str
End Function
Aber egal... im Prinzip will ich nur wissen, wieso das mit Pagescraper nicht funktioniert... weil es imho damit komfortabler einzustellen wäre...

-
Tante Google
Ich schlage vor, dass du dich damit im samurize.com-Forum direkt an den Ersteller ("uziq") wendest. Das Plug ist seht neu und hin und wieder zickt das noch rum. Ich selber habe bisher noch nicht viel damit gemacht und kann dir das nicht sonderlich beiseite stehen.
Gruss Flam
Gruss Flam
Samurize hopefully never dies