• benow over 9 years ago

    benow edited over 9 years ago
    I've created a small Java client library for the XML API. It's based on a MusicBrainz XML REST library I also maintain, so the code is already somewhat refined. It does client-side caching, gzip compression and request throttling. The response XML is parsed into basic objects (Album, Artist, Label, Search, etc.) for easy access to the data. The original XML response is retained in the objects, which allows direct access to anything that has no declared accessor (i.e. it's future-proof should the XML API change). There are no external dependencies.

    It's quick and easy to use:
    // create new client with your api key
    Discogs discogs=new Discogs("1234");
    Artist artist=discogs.getArtist("Richard H. Kirk");
    System.out.println(artist.getProfile());
    System.out.println(artist);


    and can be found at http://benow.ca/projects/discogs-java/
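    A minimal sketch of the request throttling mentioned above, assuming a fixed minimum gap between consecutive requests. RequestThrottle and the interval are hypothetical; this is not the library's actual class or setting.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical request throttle: enforces a minimum gap between requests.
 * The interval is an assumption, not a documented library or server value.
 */
final class RequestThrottle {
    private final long minIntervalMillis;
    private long nextAllowedMillis = Long.MIN_VALUE; // no request made yet

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Reserves the next slot at time 'nowMillis' and returns how long to wait (0 if none). */
    synchronized long reserve(long nowMillis) {
        long start = Math.max(nowMillis, nextAllowedMillis);
        nextAllowedMillis = start + minIntervalMillis;
        return start - nowMillis;
    }

    /** Blocks until the next request is allowed. */
    void acquire() throws InterruptedException {
        long wait = reserve(System.currentTimeMillis());
        if (wait > 0) {
            TimeUnit.MILLISECONDS.sleep(wait);
        }
    }
}
```

    Call acquire() before each HTTP request. reserve(...) takes the current time as a parameter so the spacing logic can be checked without actually sleeping.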

    Thanks to all for the great work on and contributions to discogs.com!
  • benbono over 9 years ago

    That sounds great! Under what license terms are you distributing this software?
  • benow over 9 years ago

    It's currently licensed under the LGPL, http://www.gnu.org/copyleft/lesser.html. Use it in whatever form, as long as changes remain open (preferably pushed back to me). I'm open to changing the license should the discogs guys want it.
  • benow over 9 years ago

    benow edited over 9 years ago
    I've added dump file iteration support, after discovering via the DiscogsNet API that dumps exist. It is now possible to produce discogs objects (Release, Artist, Label) while iterating through a dump file.

    An example of use:
    ArtistProducer prod = new ArtistProducer();
    prod.produce(inFile, new ProductionHandler<Artist>() {
        public void onProduce(ObjectProducer<Artist> producer, Artist produced) {
            doSomethingWith(produced);
        }
    });

    Iteration also includes precalculation, which makes it easy to report progress while iterating.

    It can be found in the usual place.

    EDIT: updated, as the procedure has changed since switching to SAX processing.
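    To illustrate the SAX-based streaming idea with only the JDK parser: the sketch below calls a callback once per artist without building the whole document in memory. ArtistDumpReader is hypothetical (not the library's ArtistProducer), and it assumes a dump laid out as <artists><artist><name>...</name>...</artist>...</artists>.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical streaming reader: reports the direct <name> child of each <artist>. */
final class ArtistDumpReader {
    static void read(InputStream in, Consumer<String> onArtist) {
        DefaultHandler handler = new DefaultHandler() {
            private int depth;          // current element depth (root = 1)
            private boolean inName;     // inside an artist's direct <name> child
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                depth++;
                // depth 3 = direct child of an <artist> that sits under the root
                if (depth == 3 && qName.equals("name")) {
                    inName = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inName) text.append(ch, start, length); // may arrive in several chunks
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (inName && depth == 3 && qName.equals("name")) {
                    inName = false;
                    onArtist.accept(text.toString());
                }
                depth--;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    Nested names (for example under <members>) sit deeper than depth 3 and are ignored.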
  • Sikke303 over 9 years ago

    Hi, :)

    Seems nice, I'll test it soon :)

    Perhaps I'll use part of it in my program.

    Peace

    Sikke
  • audio78 over 9 years ago

    Hi benow,
    excellent work! Thank you :)

    I have found a small bug:
    ArtistRelease.getType() always returns null. In the source code it is:
    return getStringByPath("  MAiN");

    but the "release" element in the XML has no attribute called "main"; the correct attribute is "type":
    return getStringByPath("  type");

    I'm looking forward to using your lib :) Keep up the great work.
  • marcoc1712 over 9 years ago

    marcoc1712 edited over 9 years ago
    Hi,

    I'm trying to use and debug your library, which looks cool, but I can't call the discogs.get_Artist(Name) method on names with non-ASCII characters, like "Stéphane Pompougnac" (returned by the search...).

    I tried URL-encoding the URL (with both UTF-8 and Windows-1252), but I always get the same response from the server: java.io.FileNotFoundException.

    Any advice?

    Marco.

  • benow over 9 years ago

    Heyas,

    audio78, I've fixed this. Thx for the report.

    marcoc1712, yours was not so easy, but it's fixed. This did it:

    String convName = URLEncoder.encode(name, "ISO-8859-1");
    Element resultE = loadResult("/artist/" + convName);

    There have also been fixes for dates and for the link from the project page (thx Tim), and a fix for missing release IDs (thx Rob).

    The datacenter hosting benow.ca is going through electrical upgrades right now, so I can't deploy... I'll post a note here once it's deployed.

    Thx for all the reports. I've been using it to import the dumps into postgres, millions of objects without error... so it's holding up under some heavy use.
  • marcoc1712 over 9 years ago

    Hi,

    I tried modifying DocumentLoader as you suggested; now the URL seems to be correctly encoded, and calling:

    discogs.get_Artist("Stéphane Pompougnac") is OK:

    Hitting url: http://discogs.com/artist/St%E9phane+Pompougnac?f=xml&api_key=xxxxx

    I made the same change in search; then, calling this:

    discogs.search("artist", "Stéphane Pompougnac")

    I get this:

    Hitting url: http://discogs.com/search?type=artist&q=St%E9phane+Pompougnac&f=xml&api_key=xxxx

    java.lang.RuntimeException: Error during fetch
    ...
    Caused by: java.io.FileNotFoundException: http://www.discogs.com/search?type=artist&q=St%E9phane+Pompougnac&f=xml&api_key=xxxx
    ....

    With UTF-8 encoding in search instead, the same call:

    discogs.search("artist", "Stéphane Pompougnac")

    gives this:

    Hitting url: http://discogs.com/search?type=artist&q=St%C3%A9phane+Pompougnac&f=xml&api_key=xxxx

    and the expected results.

    It looks to me as if the search and artist web services use different encodings.

    By the way, are you aware that search doesn't care about the 'type' parameter? I found releases in the results too.

    Thanks for your precious aid.

    Marco.

  • marcoc1712 over 9 years ago

    Oops...

    I was using 'artist' instead of 'artists' as the type parameter in search; the code is OK.

    I think you're missing, maybe by design, the real name in Artist:

    public String getRealName() {
    return getStringByPath("realname");
    }

    regards, Marco.
  • benow over 9 years ago

    Marco, you're right. Artists use ISO-8859-1 and search uses UTF-8, weird. It now works. I've also added realName.
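    The two encodings reported in this thread can be captured in a pair of URL builders. This is a sketch, not the library's code: DiscogsUrls is hypothetical, and the base URL and query layout are copied from the "Hitting url" lines above. The Charset overloads of URLEncoder need Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical URL builders reflecting the encodings observed in this thread. */
final class DiscogsUrls {
    private static final String BASE = "http://discogs.com";

    // Artist paths were observed to need ISO-8859-1 percent-encoding.
    static String artistUrl(String name, String apiKey) {
        return BASE + "/artist/" + URLEncoder.encode(name, StandardCharsets.ISO_8859_1)
                + "?f=xml&api_key=" + apiKey;
    }

    // Search queries were observed to need UTF-8 percent-encoding.
    static String searchUrl(String type, String query, String apiKey) {
        return BASE + "/search?type=" + type
                + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&f=xml&api_key=" + apiKey;
    }
}
```

    For "Stéphane Pompougnac" these produce St%E9phane+Pompougnac and St%C3%A9phane+Pompougnac respectively, matching the URLs Marco posted.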

    FYI, the API is designed to give discogs as much room to move as possible. If the XML schema changes, the changes can be accessed directly from the DOM using getElement(), getByPath(String) or getNodeByPath(String). So artist.getByPath(" @some-new-attribute") would get the value of some-new-attribute on the artist element. If I haven't surfaced something, you can email me (andy@benow.ca) or post here, and I'll add it to the objects... but it doesn't need to be in the objects to be used.
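    The library's own getByPath syntax isn't shown here; as a rough analogue, standard XPath from javax.xml.xpath gives the same kind of escape hatch for attributes or elements that have no accessor. PathAccess is hypothetical, and its paths are plain XPath, which may differ from what getByPath accepts.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical generic accessor: read any value from a DOM element by XPath. */
final class PathAccess {
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unparseable XML", e);
        }
    }

    /** Evaluates an XPath relative to 'context'; a missing node yields "" rather than null. */
    static String byPath(Element context, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpath, context);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + xpath, e);
        }
    }
}
```

    Note that XPath returns an empty string, not null, for a missing attribute, so a check like the one in the getType() bug report above would need to test for "" instead.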

    I've updated the code with all the fixes, it can be found in the usual place:
    http://benow.ca/projects/discogs-java/
  • audio78 over 9 years ago

    Thank you for your quick fixes.
    I have a question regarding UTF-8: it seems my JSP doesn't display UTF-8 characters correctly.
    Do you always return UTF-8 encoded Strings? I have tried several UTF-8 settings in the code.
    E.g.
    http://discogs.com/release/507569?f=xml&api_key=YOURKEY
    track.getTitle() for track 2
    returns garbled characters instead of "Frühlingsblume".
    Thank you.
  • benow over 9 years ago

    Hmm, it was defaulting to the Java platform default charset, which depends on the OS and locale settings. It seemed to be working well here, but I've forced it to UTF-8 and it seems to be working:

    Release r = discogs.getRelease("507569");
    System.out.println("Track name with UTF-8 char: " +r.getTracks().get(1).getTitle());

    Track name with UTF-8 char: Frühlingsblume

    The umlaut is displayed correctly in the console and in the XML dump. Might it be due to how the XML is loaded in your webapp? To make sure output XML is parsed correctly (if that's what you're doing), I've added an XML header to the toString() methods:

    search.toString() now results in
    <?xml version="1.0" encoding="UTF-8"?>
    <resp requests="41" stat="ok" version="1.0">
    <searchresults end="20" numResults="103" start="1">
    ...

    I've done a bit of research, and found this:
    http://hootcook.blogspot.com/2009/04/java-charset-encoding-utf-8.html

    I tried adding the modification to the code directly, but it seems to mess things up. Perhaps you might want to try this:
    String s=track.getTitle();
    System.out.println(new String ( s.getBytes("ISO-8859-1"), "UTF-8"));

    There is no encoding declared in the discogs XML; it seems to be UTF-8, but as seen above it might be ISO-8859-1. All a little hacky, I know. Let me know what you find.
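    Both approaches discussed here can be sketched with the JDK alone: decode bytes with an explicit charset when reading (rather than the platform default), and, only as a last resort, the re-decoding trick quoted above. CharsetDemo is hypothetical. The repair assumes the text was UTF-8 that got decoded as ISO-8859-1; applied to correctly decoded text it corrupts it, which may be why it "messes things up" when added to the library directly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers illustrating explicit decoding and mojibake repair. */
final class CharsetDemo {
    /** Reads a whole stream as text using an explicit charset instead of the platform default. */
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Undoes UTF-8 bytes having been decoded as ISO-8859-1 (e.g. "FrÃ¼hlings..." back to "Frühlings..."). */
    static String repairLatin1Mojibake(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }
}
```

    Decoding correctly at read time is the better fix; the repair only patches the symptom after the fact.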
  • marcoc1712 over 9 years ago

    Hi benow,
    I tried to download from
    http://benow.ca/projects/discogs-java/discogs-java.nightly.tar.gz

    but I get the same original version without the changes we were talking about; surely I'm doing something wrong...

    About audio78's encoding problem: are you storing the data in a database and then reading it back? If so, check that the DB connection and the schema/table/column collation are set to UTF-8.

    It looks like you're double-encoding.

    Marco.
  • audio78 over 9 years ago

    Thank you very much for your tips, benow and Marco.
    I didn't use any DB connection; I'm using the API directly. Anyhow, I solved the problem after some research (I had already set the request/response headers, added a filter, and set UTF-8 in every JSP file): adding the line "-Dfile.encoding=UTF-8" to eclipse.ini (as I'm using Eclipse) did the trick. It's very strange that this setting also affects the rendered JSP/servlet output, but it seems to affect every file in Eclipse.
  • benow over 9 years ago

    Oops, sorry about that... a problem with my deployment:
    http://benow.ca/project/discogs-java.nightly.tar.gz

    Audio78, interesting. I think I might do the same here.
  • marcoc1712 over 9 years ago

    Hi,

    just FYI, I got a compile error:

    Method createStream in DiscogsDumpProducer does not override...; I just commented out the @Override annotation, and now it's fine.

    Thanks, Marco.

    P.S.

    How long does the complete import of the dump files take? How many releases are in the file?

    Thanks again.

  • benow over 9 years ago

    benow edited over 9 years ago
    Hmm, builds ok for me.

    It takes about 7 hours to import all three, and there are 2.8M releases and 1.8M artists. The process is unoptimized and could surely be improved. I'm not breaking everything out into full tables, either; rather, I extract the info I need for lookup (name, title, etc.) into columns and store the XML. When I find what I need, I create the objects from the XML. This means rapid lookup, name completion and no network fetch for the XML. It also means I don't have to replicate the full discogs schema. It works well.

    As far as optimization goes, I'm running into a similar issue with another project, and there seems to be a lot of optimization that can be done:
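    The "lookup columns plus raw XML" design can be illustrated in memory. This is a hypothetical stand-in, not benow's Postgres import: the TreeMap plays the role of an indexed name column (a prefix query such as LIKE 'ri%' in SQL), and the stored XML would only be parsed into objects once a row is actually needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory index: one searchable "column" (name) plus the raw XML per row. */
final class XmlLookupIndex {
    private static final class Row {
        final String name;
        final String xml;
        Row(String name, String xml) { this.name = name; this.xml = xml; }
    }

    // key: lower-cased name, kept sorted so prefix matches are contiguous
    private final TreeMap<String, Row> byKey = new TreeMap<>();

    void put(String name, String xml) {
        byKey.put(name.toLowerCase(Locale.ROOT), new Row(name, xml));
    }

    /** Raw XML for an exact (case-insensitive) name, or null; parse it only when needed. */
    String xmlFor(String name) {
        Row row = byKey.get(name.toLowerCase(Locale.ROOT));
        return row == null ? null : row.xml;
    }

    /** Name completion: up to 'limit' names starting with 'prefix', in sorted order. */
    List<String> complete(String prefix, int limit) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Row> e : byKey.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p) || out.size() >= limit) break;
            out.add(e.getValue().name);
        }
        return out;
    }
}
```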
    http://benow.ca/forum/News/Speeding%20insert%20performance%20for%20Apache%20Derby%20%28Java%20DB%290
  • marcoc1712 over 9 years ago

    Hi,

    thanks for the info. I tried to import the dump files some time ago, but I got so many errors and it was taking so long that I gave up.

    I think it's a good idea to store the XML in the DB and build the object structure only when needed. I'm trying it with Artists and it seems OK; the only problem is the list of releases in the XML, which sometimes takes a while to parse. Maybe I need to strip it out and link to the release table or similar. Have you done something similar?

    By the way, I'm facing another encoding problem: looking up "Elīna Garanča" produces this error:

    Caused by: java.io.IOException: Error parsing: The markup in the document preceding the root element must be well-formed.
    at org.benow.java.rest.XML.loadDocument(XML.java:64)
    at org.benow.java.rest.DocumentLoader.loadDocument(DocumentLoader.java:285)
    at org.discogs.ws.Discogs.loadResult(Discogs.java:160)
    ... 45 more

    Searching reports no results.

    Thanks, Marco.
  • marcoc1712 over 9 years ago

    Correction:

    Searching does return "Elīna Garanča" (in search.getSearchResults()); it's the subsequent get_artist call that fails.

    Marco.

  • benow over 9 years ago

    Marco, yes, it takes time, but it works well. I'm using SAX to parse the XML, and it's very quick and reliable. The database persistence is probably the slow part. Perhaps you'd like to try an in-memory database:
    http://benow.ca/forum/News/Speeding%20insert%20performance%20for%20Apache%20Derby%20%28Java%20DB%290
    That might not be directly applicable for you, but a similar approach might speed things up.

    I'll look into the search result problem.
  • marcoc1712 over 8 years ago

    Hi Benow,

    After some googling and trial and error, I've found the problem with URL encoding: you must give it an ISO-8859-1-compatible string, and "Elīna Garanča" isn't one, so you have to escape it as UTF-8 instead (which is also the W3C recommendation).

    I know a Perl function (CGI::escapeHTML) that does exactly this, but I couldn't find an equivalent in Java. However, I noticed that in search results Discogs provides the URI string already correctly escaped, so I tried this:
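    The failure mode is easy to reproduce in Java: ī (U+012B) and č (U+010D) have no ISO-8859-1 byte, and URLEncoder silently substitutes '?' (%3F) for them, so the server receives a different name. A small check, sketched here with a hypothetical EncodeCheck class (Java 10+ for the Charset overload), rejects such input before a request is made.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical guard against silently lossy ISO-8859-1 URL encoding. */
final class EncodeCheck {
    /** True if every character of s has a representation in the given charset. */
    static boolean representable(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    /** ISO-8859-1 percent-encoding that refuses input it would otherwise turn into '?'. */
    static String encodeLatin1Strict(String s) {
        if (!representable(s, StandardCharsets.ISO_8859_1)) {
            throw new IllegalArgumentException("Not representable in ISO-8859-1: " + s);
        }
        return URLEncoder.encode(s, StandardCharsets.ISO_8859_1);
    }
}
```

    Without the check, URLEncoder.encode("Elīna Garanča", ISO_8859_1) quietly yields El%3Fna+Garan%3Fa.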

    List<org.discogs.ws.search.SearchResult> resultList = null;
    try {
        // get the UTF-8 name "Elīna Garanča" (placeholder for my own DB lookup)
        String name = get_from_db_Elīna_Garanča();

        org.discogs.ws.search.Search search = discogs.search("artists", name);
        resultList = search.getSearchResults();
    } catch (Exception ex) {
        ex.printStackTrace();
        return null;
    }
    for (org.discogs.ws.search.SearchResult res : resultList) {
        org.discogs.model.Artist artist = null;
        try {
            // assuming result URLs look like http://www.discogs.com/artist/<escaped name>:
            // the first 30 characters are that fixed prefix, so what remains
            // is the name as Discogs already escaped it
            String urlStr = res.getURL().toString().substring(30);
            artist = discogs.getArtist(urlStr);
        } catch (Exception ex) {
            return null;
        }
        ...
    }

    And it works.

    I did it this way to leave your code untouched, but it's certainly not the cleanest approach: you get the URI string and build the URL, then I convert it back to a string and extract the 'name' from a fixed position, and then you convert it back to a URL...

    Maybe there's a reason why you didn't use the URI from the beginning, but I think using the one provided by Discogs could be the better solution, avoiding all the encoding and decoding problems we'd otherwise have to face.
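    If the result URI is used directly, java.net.URI can pull out the still-escaped name without relying on a fixed offset like substring(30), which breaks if the host is discogs.com rather than www.discogs.com. This sketch assumes result URIs of the form http://www.discogs.com/artist/<escaped name> with no trailing slash; ResultUris is hypothetical.

```java
import java.net.URI;

/** Hypothetical helper for taking names straight from search-result URIs. */
final class ResultUris {
    /** Returns the last path segment exactly as escaped in the URI (no decoding). */
    static String rawLastSegment(String uri) {
        String path = URI.create(uri).getRawPath(); // e.g. "/artist/El%C4%ABna+Garan%C4%8Da"
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```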

    By the way, are you aware that the ExactResults contents are missing from the search?
    And maybe you could also fix a little bug in get_artist:

    line:
    convAnv = URLEncoder.encode(name, "ISO-8859-1");

    should be:

    convAnv = URLEncoder.encode(anv, "ISO-8859-1");

    Hope this is useful.

    N.B.

    I couldn't paste the code correctly; some characters were lost (e.g. everything between the less-than and greater-than signs in List declarations), so please treat it as a concept. Sorry about that.

    Regards, Marco.

  • marcoc1712 over 8 years ago

    Hi,

    I'm feeling a little bit stupid...

    the name part of the URL in the search results is just the result of:

    encoded = URLEncoder.encode(name, "UTF-8");

    assuming name is a valid UTF-8 string.

    You can't use "encoded" directly, because its escape mapping may not be compatible with ISO-8859-1 (remember Stéphane Pompougnac), so you have to escape the encoded string again:

    escaped = URLEncoder.encode(encoded, "ISO-8859-1");

    So, you could leave your software as it is, but then the caller has to URL-encode the UTF-8 string first (as I'm doing now), OR you could do both the encoding and the escaping yourself (and you'll need this if you call get_artist on a search result).

    In any case, I suggest adding some encoding checks on the input, to avoid double-encoding an already URL-encoded UTF-8 string or trying to UTF-8 URL-encode a non-UTF-8 string... who knows what would happen...

    Any time Unicode is involved it's like this: trouble, trouble, trouble...

    Hopefully this is definitive.
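    Marco's two steps, a way to undo them, and the kind of input check he suggests can be sketched as follows. DoubleEncoding is hypothetical; the %XX test is only a heuristic (a name that literally contains something like "%41" would be misclassified), and whether the server wants this double escaping is as reported in this thread, not verified here.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical illustration of UTF-8 encoding followed by ISO-8859-1 escaping. */
final class DoubleEncoding {
    private static final Pattern PERCENT_ESCAPE = Pattern.compile("%[0-9A-Fa-f]{2}");

    /** Step 1: UTF-8 percent-encode; step 2: percent-encode the result again ('%' becomes %25, '+' becomes %2B). */
    static String encodeTwice(String name) {
        String encoded = URLEncoder.encode(name, StandardCharsets.UTF_8);
        return URLEncoder.encode(encoded, StandardCharsets.ISO_8859_1);
    }

    /** Reverses encodeTwice. */
    static String decodeTwice(String s) {
        return URLDecoder.decode(URLDecoder.decode(s, StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    /** Heuristic input check: does s already contain %XX escapes? */
    static boolean looksPercentEncoded(String s) {
        return PERCENT_ESCAPE.matcher(s).find();
    }
}
```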

    Marco.

  • marcoc1712 over 8 years ago

    Hi Benow,

    I've found a little bug in Release.getExtraArtists()

    the line :

    NodeList cn = aE.getElementsByTagName("artists");

    should be:

    NodeList cn = aE.getElementsByTagName("artist");
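    The fix works because getElementsByTagName matches element names exactly, so asking an <extraartists> element for "artists" finds nothing. A tiny check with made-up sample XML (ExtraArtistsDemo is hypothetical, and the layout assumes each extra artist is an <artist> element):

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical demo of exact tag-name matching in DOM lookups. */
final class ExtraArtistsDemo {
    /** Counts descendants of the root element whose tag name equals 'tag'. */
    static int countTag(String xml, String tag) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return root.getElementsByTagName(tag).getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```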

    Hope it helps.

    Marco.
  • benow over 8 years ago

  • marcoc1712 over 8 years ago

    Hi Benow,

    I'm using your API and it's a great help! Thanks for this.

    Now I need to retrieve more than the first 20 search results, and it seems to me you don't provide any method to do this. If I'm missing something, please tell me how; in the meantime, I've found an easy way to let discogs.search do the job with a very small change to the Discogs.search() method:

    public Search search(
    String type,
    String term,
    String page) {
    try {
    String convTerm = URLEncoder.encode(term, "UTF-8");
    Element resultE = loadResult("/search?" + (type == null ? "" : "type=" + type + "&") + (page == null ? "" : "page=" + page + "&")+"q=" + convTerm);
    if (resultE != null)
    return new Search(resultE,
    this);
    return null;
    } catch (UnsupportedEncodingException e) {
    e.printStackTrace();
    return null;
    }
    }

    This way, you can loop over the search call, asking for the nth page until you get no more results. Even better, you can read the searchresults element in the result header (e.g. searchresults end="60" numResults="65" start="41") and precalculate the number of iterations needed to get the complete list.

    Hope this helps you and others.
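    The precalculation is a ceiling division of numResults by the page size, where the page size can be read off a full page as end - start + 1. A sketch (SearchPaging is hypothetical; the header values are the ones quoted above, and the last page may be shorter, so read the size from a full page such as the first):

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helpers for paging through search results. */
final class SearchPaging {
    /** Pages needed for numResults items at pageSize per page (ceiling division). */
    static int pageCount(int numResults, int pageSize) {
        return (numResults + pageSize - 1) / pageSize;
    }

    /** Reads a <searchresults> element and returns {numResults, pageSize}; use a full page. */
    static int[] readHeader(String searchResultsXml) {
        try {
            Element e = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(searchResultsXml))).getDocumentElement();
            int num = Integer.parseInt(e.getAttribute("numResults"));
            int start = Integer.parseInt(e.getAttribute("start"));
            int end = Integer.parseInt(e.getAttribute("end"));
            return new int[] {num, end - start + 1};
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException(ex);
        }
    }
}
```

    With 65 results at 20 per page this gives 4 pages, which could then be fetched with the modified search(type, term, page) above for pages 1 to 4.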

    Marco.
  • audio78 over 8 years ago

    audio78 edited over 8 years ago
    Hi Benow,
    still a great API, thank you :)
    I have a question about how the caching works. I see that when I send a request to discogs, the XML is stored in the given cache directory.
    How long is this cached XML used, i.e. how long is it considered valid?
    Say I make the same request some hours later: do you first look in the cache directory and load the XML from there if it's found?

    Edit: I've just seen that a new Discogs API 2.0 is out. Are you planning to upgrade your Java API to it? That would be great.
    I've also seen that they now favor JSON over XML, but there's still an option to return everything as XML, so in that case I assume the structure of the Java API may not need to change much.
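    The thread doesn't say what expiry policy the library uses. One common answer to "how long is the cached XML valid?" is a time-to-live, sketched here purely as an assumption: TtlCache is hypothetical, entries are keyed by request URL, and the clock is injected so expiry can be tested without waiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical time-to-live cache for response XML, keyed by request URL. */
final class TtlCache {
    private static final class Entry {
        final String xml;
        final long storedAt;
        Entry(String xml, long storedAt) { this.xml = xml; this.storedAt = storedAt; }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in real use

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(String url, String xml) {
        entries.put(url, new Entry(xml, clock.getAsLong()));
    }

    /** Cached XML, or null if absent or older than the TTL (the caller then re-fetches). */
    String get(String url) {
        Entry e = entries.get(url);
        if (e == null) return null;
        if (clock.getAsLong() - e.storedAt > ttlMillis) {
            entries.remove(url); // expired: drop it so the next fetch replaces it
            return null;
        }
        return e.xml;
    }
}
```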
